As a sysadmin and/or backend engineer, you spend a lot of time digging through logs and responding to the problems you find there. Doing so is often difficult and tedious, for several reasons:
- Logs come from many different sources on a host.
- Each source generates logs in a different format and puts them in a different place.
- Each source signals the urgency of its logs differently.
And that's just within a single host, so multiply the headache by the number of hosts in your cluster!
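To make the problem concrete, here is a minimal sketch of what "normalizing" two sources could look like. Everything in it is an illustrative assumption, not our actual pipeline. The two input formats, the `LogRecord` fields, the `APP_LEVELS` table and the function names are all hypothetical. The one real convention it borrows is syslog's: the `<PRI>` value encodes `facility * 8 + severity`, and severities run from 0 (emergency) to 7 (debug).

```python
import json
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Shared severity scale for all sources, borrowed from syslog (0 = most urgent).
SEVERITY_NAMES = ["emergency", "alert", "critical", "error",
                  "warning", "notice", "info", "debug"]

# Hypothetical mapping from one application's level strings onto that scale.
APP_LEVELS = {"fatal": 2, "error": 3, "warn": 4, "info": 6, "debug": 7}


@dataclass
class LogRecord:
    """Hypothetical normalized record: one shape for every source."""
    host: str
    source: str
    timestamp: datetime
    severity: int
    message: str


def parse_ts(ts: str) -> datetime:
    # Older fromisoformat() rejects a trailing "Z", so rewrite it as an offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


# Hypothetical plain-text format: "<PRI>2024-01-01T12:00:00Z message text"
SYSLOG_LIKE = re.compile(r"^<(\d+)>(\S+) (.*)$")


def from_syslog_like(host: str, line: str) -> LogRecord:
    m = SYSLOG_LIKE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    pri, ts, msg = m.groups()
    # PRI = facility * 8 + severity, so the severity is PRI mod 8.
    return LogRecord(host, "syslog", parse_ts(ts), int(pri) % 8, msg)


def from_json_app(host: str, line: str) -> LogRecord:
    # Hypothetical JSON format with "time", "level" and "msg" keys.
    obj = json.loads(line)
    return LogRecord(host, "app", parse_ts(obj["time"]),
                     APP_LEVELS[obj["level"].lower()], obj["msg"])


if __name__ == "__main__":
    records = [
        from_syslog_like("web-1", "<11>2024-01-01T12:00:00Z disk full"),
        from_json_app("web-2", '{"time": "2024-01-01T12:00:05Z", '
                               '"level": "WARN", "msg": "slow query"}'),
    ]
    # Once normalized, records from different hosts and formats sort together.
    for r in sorted(records, key=lambda r: r.timestamp):
        print(r.timestamp.isoformat(), r.host, SEVERITY_NAMES[r.severity], r.message)
```

Each new source only needs its own small parser into the shared record. Everything downstream, such as sorting, filtering by severity, or alerting, can then work on one shape instead of many.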
Ultimately, our goal was a normalized dataset containing every log line ever emitted anywhere in our system. With that we can then write an