High Performance Log Parsing in C# - A Brief History

7-Aug-2018

About four and a half years ago, I made a big decision: I jumped from full-time development to working as a Technical Services Engineer (TSE) for a software company. I was no longer developing full time, and I needed something to do in my precious spare time to keep my skills sharp. Something every TSE does is read logs, copious amounts of logs. Long, long, long logs. Sometimes these logs are so big that you can’t find a text viewer that can reliably open and search them. I’ve seen improperly configured logrotate setups yield single log files in excess of 15 GB. Even when you have a tool that can render them (less, more, TextPad), it can be tough to grasp the full meaning of all of this information. Enter my project to parse and visualize these logs. After all, humans are very visual creatures. We can see patterns in images that we wouldn’t otherwise recognize in text alone.

When I started as a TSE, there were already some tools out there to do this, but they were written in Python. They also did not work well (if at all) on Windows. Being a Windows user and somewhat of a Windows expert is why I was hired in the first place. Python can be a great language, but a high performance, memory conscious language, it is not. As a C# developer, I thought, perhaps, .NET could do better. After all, it has a better threading model, asynchronous I/O, static typing, and many other “benefits”. But I was a bit younger and a bit more naive back then. Some of those features of .NET would be a godsend, some would be a curse.

So before I started, I decided to lay out a few requirements for myself:

Must be multithreaded
Must not require loading the whole file into memory at the same time
Must not require re-parsing of the file each time it is viewed (some sort of caching layer)
Must run on any operating system
Must use a common GUI across platforms
Must be close to feature parity with existing tools

At first glance, these requirements don’t seem too bad. But each one presents its own challenges once you get into it.

Must Be Multithreaded

This should be simple enough in C#. There are so many different ways to do this:

Threads
Tasks
Parallel.For / Parallel.ForEach

In keeping with the more modern way of doing things, I focused mainly on the Task paradigm. However, I soon discovered this poses some problems of its own. It required me to learn a considerable amount about locking, shared collections, and how async/await work under the hood.
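
To make the Task-plus-shared-collection idea concrete, here is a minimal sketch. It is an illustration only, not the code from my project: the class name, the method name, and the assumption that the first space-delimited token of each line is its log level are all hypothetical. It runs one Task per chunk of lines and tallies results in a ConcurrentDictionary, which does its own synchronization so no explicit lock is needed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of any real tool.
public static class ParallelLevelCounter
{
    // Counts lines per log level across several chunks in parallel.
    // Assumption: the first space-delimited token of a line is its level.
    public static async Task<IDictionary<string, int>> CountLevelsAsync(
        IEnumerable<string[]> chunks)
    {
        // Shared between all tasks; ConcurrentDictionary is thread-safe.
        var counts = new ConcurrentDictionary<string, int>();

        // One Task per chunk; Task.Run queues the work on the thread pool.
        var tasks = chunks.Select(chunk => Task.Run(() =>
        {
            foreach (var line in chunk)
            {
                var level = line.Split(new[] { ' ' }, 2)[0];
                // Insert 1 for a new key, otherwise increment atomically.
                counts.AddOrUpdate(level, 1, (_, current) => current + 1);
            }
        })).ToList();

        // Wait for every chunk to finish before returning the totals.
        await Task.WhenAll(tasks);
        return counts;
    }
}
```

With a plain Dictionary in place of the ConcurrentDictionary, the same code would race between tasks, which is the kind of problem that forces you to learn about locking and shared collections.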

Must Not Require Re-parsing of the Whole File Each Time

This means we need to introduce some sort of caching layer. In the beginning, there was hacking; in the end, perhaps hacking is no longer required.

Must Not Require Loading the Whole File Into Memory at the Same Time

This means we need to be able to read the file in pieces, parse them, and release the memory as quickly as possible. This results in a streaming approach that keeps memory constrained to 100-200 MB (depending on the size of the file) without losing any data.
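
A streaming reader of this kind can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the parser itself: the class and method names are hypothetical, and the "parsing" here is just a substring match. The point is that only one line is held at a time, so memory use does not grow with the size of the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of any real tool.
public static class StreamingLogReader
{
    // Lazily yields one line at a time; the previous line becomes
    // eligible for garbage collection as soon as the caller moves on.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Stand-in for real parsing: counts the lines containing a substring
    // without ever materializing the whole input in memory.
    public static int CountMatches(TextReader reader, string needle)
    {
        var count = 0;
        foreach (var line in ReadLines(reader))
        {
            if (line.Contains(needle))
            {
                count++;
            }
        }
        return count;
    }
}
```

To run it against a file, wrap a StreamReader in a using block and pass it in. The framework's File.ReadLines method gives the same lazy, line-at-a-time enumeration if you don't need control over the reader.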

Must Run Anywhere

In the beginning there was Mono. Then, Microsoft smiled upon us and introduced .NET Core. Oh what a great thing this has been.

Must Have a Common GUI Across Platforms

While Mono and Xamarin provide great cross-platform support, I opted for a web interface approach for this. Some of my earliest attempts used WinForms; however, the performance was lackluster, and there are not many freely licensed charting libraries for WinForms. On the other hand, there are copious freely licensed JavaScript libraries for charting data.

Must Be Close to Feature Parity With Existing Tools

This means we need to support the same types of graphs and extract the same information as the current tools.

In my next few posts, I’ll talk about what I tried, what worked, what didn’t, and what caught fire.

Pete Garafano