Before I get too far into this, let me be up front: I went through a lot of iterations of this code. Some of them were really bad or really stupid. I'm going to tell you what I remember of the process, no matter how dumb it may have been or how incompetent it may make me look. Why would I do this? Because creation is a process, and if you hide your mistakes, you won't learn from them. And if I hide my mistakes, how can I teach you not to make them?
How did this first attempt go? In short: bad.
So, admittedly, I went into this version with the lowest possible standards, with a mindset akin to:
As you can imagine, if you just throw "all teh tasks" at the problem, it doesn't turn out well. Once I had worked out the majority of the race conditions, I ended up with something that looked a bit like this:
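Something like this minimal sketch, where the names, the synthetic input, and the fake "parsing" are my own stand-ins rather than the original code:

```csharp
// A hedged reconstruction of the naive first attempt: one reader task and
// ten parser tasks, glued together with two *unbounded* ConcurrentQueues,
// with busy-spinning consumers to boot.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class NaiveParserSketch
{
    // Both queues are unbounded, so nothing stops them from eating all your RAM.
    static readonly ConcurrentQueue<string> RawLines = new ConcurrentQueue<string>();
    static readonly ConcurrentQueue<string> ParsedEntries = new ConcurrentQueue<string>();
    static volatile bool ReadingDone;

    public static async Task<int> RunPipeline(int lineCount)
    {
        // Single reader task (synthetic lines stand in for the log file).
        var reader = Task.Run(() =>
        {
            for (var i = 0; i < lineCount; i++)
                RawLines.Enqueue($"log line {i}");
            ReadingDone = true;
        });

        // Just throw 10 tasks at it and cross your fingers.
        var parsers = Enumerable.Range(0, 10).Select(_ => Task.Run(() =>
        {
            // Busy-spins whenever the queue is momentarily empty: hello, 100% CPU.
            while (!ReadingDone || !RawLines.IsEmpty)
                if (RawLines.TryDequeue(out var line))
                    ParsedEntries.Enqueue(line.ToUpperInvariant()); // fake "parsing"
        })).ToArray();

        await reader;
        await Task.WhenAll(parsers);
        return ParsedEntries.Count;
    }

    static async Task Main() => Console.WriteLine(await RunPipeline(100_000));
}
```

The sketch does terminate cleanly (the volatile flag plus the drain check sees every line exactly once), but the `while` loops spin hot the moment a queue runs dry, which previews the CPU numbers below.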
Now you may notice a few issues here:
- I just throw 10 tasks at it and cross my fingers
- Both queues are unbounded
So what happened when I ran this? Fire, brimstone, melting silicon, smoke, gates exploding left and right. OK, maybe it wasn't that bad, but I did hit 100% CPU while achieving only 3-5 MB/s of read throughput. To put this in perspective, my current implementation can achieve ~140 MB/s in this same basic setup.
## 10 Tasks and Cross Your Fingers!
All of this sits on top of the async/await concurrency model. While I may not be explicitly calling anything as bad as `lock`, there is locking occurring; the framework is just being really nice and hiding it from me. (Interesting tangent: Why Skylake CPUs Are Sometimes 50% Slower – How Intel Has Broken Existing Code.)

What could go wrong? Turns out, just about exactly what you'd expect: I ended up with parsed entries sitting in an unbounded queue, waiting to be dequeued by the writer task.
I tried to get creative with my caching on the first run-through. It ended up working pretty well, but it had some issues. Namely, good compression algorithms tend to be slow, and I wanted my cache files to be not just machine readable but human readable as well, for debugging purposes. So I decided to combine some off-the-shelf components into a cache file.
This file is basically a zip file, with one compressed file for every 10,000 `LogEntry` objects. Each `LogEntry` object is written as JSON to the file, one per line. This yields a structure something like:
```
Zip Container
|- Batch1 Container
|  |- LogEntry1
|  |- LogEntry2
|  |- ...
|  |- LogEntry10000
|- Batch2 Container
|  |- LogEntry1
|  |- LogEntry2
|  |- ...
|  |- LogEntry10000
```
`LogEntry` contains one extra crucial piece of information beyond the data parsed from the line: the original line number in the file. This allows us to tie a point on a graph to a line in a file. A user of the software can then use their own tools to dig deeper into a particular line, or set of lines, after looking at our charts.
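In sketch form, that looks something like the following. The post never shows `LogEntry`'s actual definition, so the `Timestamp` and `Message` fields are assumptions; the point is the `LineNumber` field riding along with the parsed data, serialized as one JSON object per line:

```csharp
using System;
using System.Text.Json;

// Hypothetical shape -- only LineNumber is described in the text; the other
// fields are placeholders for whatever gets parsed out of the log line.
public record LogEntry(long LineNumber, DateTime Timestamp, string Message);

class LogEntryDemo
{
    static void Main()
    {
        var entry = new LogEntry(42, new DateTime(2017, 1, 1), "cache miss");
        // One JSON object per line keeps the cache file human readable.
        Console.WriteLine(JsonSerializer.Serialize(entry));
    }
}
```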
So let's see how this fits into my parsing code above.
Because of how zip files work, they're not really thread safe, at least not in the libraries I tried for this project. So, much like the reader task before it, the writer task is another new bottleneck.
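A single-writer version of the cache might look like this sketch (my own plumbing, not the original code; the `BlockingCollection` hand-off and the method names are assumptions). One task owns the `ZipArchive` outright and writes one compressed entry per batch of JSON lines:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;

class CacheWriterSketch
{
    // Runs on a single writer task, since the zip library isn't thread safe.
    public static void WriteCache(string path, BlockingCollection<string> jsonLines, int batchSize)
    {
        using var zip = ZipFile.Open(path, ZipArchiveMode.Create);
        var batch = 0;
        var written = 0;
        StreamWriter writer = null;
        foreach (var json in jsonLines.GetConsumingEnumerable())
        {
            if (written % batchSize == 0)
            {
                // Finish the previous batch before opening the next entry;
                // ZipArchiveMode.Create allows only one open entry at a time.
                writer?.Dispose();
                var entry = zip.CreateEntry($"Batch{++batch}", CompressionLevel.Optimal);
                writer = new StreamWriter(entry.Open());
            }
            writer.WriteLine(json); // one JSON LogEntry per line
            written++;
        }
        writer?.Dispose();
    }
}
```

Because `GetConsumingEnumerable` blocks until producers call `CompleteAdding`, the parsers can feed this collection from many tasks while the archive itself is only ever touched from one.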
Ultimately, this code is pretty simple, has a few problems, and got me 3-5 MB/s of read throughput. It let me get a POC on the table, and it taught me that you can't just throw Tasks at a problem and have it all automagically work. My next attempt was much better: it took advantage of multiple kinds of parallel programming and avoided some of the memory issues present in this incarnation.