Let me tell you about the still-not-defunct real-time log processing pipeline we built at my now-defunct last job. It handled logs from a large number of embedded devices that our ISP operated [...] Eventually our team's log processing system evolved to become the primary monitoring and alerting infrastructure for our ISP.
You mostly get told that you shouldn't be using unstructured logs anyway; you should be using event streams.
That advice is not wrong, but it's incomplete. There's a file called /dev/kmsg which, if you write to it, injects messages into the kernel's log buffer. Let's do that! For all our messages!
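Here's roughly what that looks like. This is a minimal sketch in Go, not our actual code, and it assumes your process has permission to write to /dev/kmsg, which usually means running as root:

```go
package main

import (
	"fmt"
	"os"
)

// kmsgLog writes one record into the kernel's log buffer via
// /dev/kmsg. The "<N>" prefix sets the syslog priority (6 = INFO).
// The kernel timestamps each record itself, so you get the same
// monotonic timestamps as native kernel messages.
func kmsgLog(prio int, tag, msg string) error {
	f, err := os.OpenFile("/dev/kmsg", os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	// Each write() call becomes exactly one log record.
	_, err = fmt.Fprintf(f, "<%d>%s: %s\n", prio, tag, msg)
	return err
}

func main() {
	if err := kmsgLog(6, "myapp", "hello from userspace"); err != nil {
		fmt.Fprintln(os.Stderr, "kmsg write failed:", err)
	}
}
```

After running this, the message shows up in dmesg alongside the kernel's own output.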
RAM is even more volatile than disk, and you have to reboot after a kernel panic. So the RAM is gone, right? Well, no. Sort of. Not exactly.
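The trick is that on a warm reboot, most of RAM actually survives, and the old kernel's log buffer can be fished back out afterward. Modern kernels ship a stock facility for this, pstore/ramoops; our system used its own mechanism, but assuming a kernel with pstore configured, recovering the previous kernel's last words looks something like this sketch:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// After a panic and reboot, pstore exposes the previous kernel's
// console log as files under /sys/fs/pstore. This sketch just dumps
// whatever it finds; a real uploader would queue the contents for
// shipping, then delete each file to free pstore's small reserved
// RAM region.
func main() {
	files, err := filepath.Glob("/sys/fs/pstore/*")
	if err != nil || len(files) == 0 {
		fmt.Println("no crash logs recovered")
		return
	}
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			continue
		}
		fmt.Printf("=== recovered %s (%d bytes) ===\n", f, len(data))
		os.Stdout.Write(data)
	}
}
```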
The next trick is to have the client stream logs to the server. This is possible using HTTP POST with chunked transfer encoding.
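In Go, for example, you get chunked encoding for free whenever the request body's length isn't known up front. A sketch, with a hypothetical collector URL and made-up log lines:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	// An io.Pipe has no known length, so net/http sends the POST
	// body with chunked transfer encoding: each line goes out as
	// it's written, over one long-lived connection.
	pr, pw := io.Pipe()

	go func() {
		defer pw.Close()
		for i := 0; i < 10; i++ {
			fmt.Fprintf(pw, "log line %d at %s\n",
				i, time.Now().Format(time.RFC3339))
			time.Sleep(100 * time.Millisecond)
		}
	}()

	// "logserver.example" is a placeholder, not a real endpoint.
	resp, err := http.Post("http://logserver.example/upload",
		"text/plain", pr)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
}
```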
The log uploader uses a backoff timer, so if uploads keep failing, it retries less and less often. However, the backoff was capped at the usual inter-upload interval, so once the server came back, the next attempt was never more than one normal upload period away.
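A sketch of that capped backoff (the intervals here are made up for illustration, not the ones we used):

```go
package main

import (
	"fmt"
	"time"
)

const (
	initialDelay   = 1 * time.Second
	uploadInterval = 60 * time.Second // the usual inter-upload period
)

// nextDelay doubles the retry delay on each failure, but never lets
// it exceed the normal upload interval, so retry traffic after an
// outage looks no different from steady-state upload traffic.
func nextDelay(d time.Duration) time.Duration {
	d *= 2
	if d > uploadInterval {
		d = uploadInterval
	}
	return d
}

func main() {
	d := initialDelay
	for i := 0; i < 8; i++ {
		fmt.Println("retry in", d)
		d = nextDelay(d)
	}
}
```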
Someone probably told you that log messages are too slow, or too big, or too hard to read, or too hard to use, or you should use them while debugging and then delete them. All those people were living in the past and they didn't have a fancy log pipeline. Computers are really, really fast now. Storage is really, really cheap.
How much are you paying for someone to run some bloaty full-text indexer on all your logs, to save a few milliseconds per grep?