Every few months, I would learn something new about .NET through the normal course of my reading. I invested in a copy of R# Ultimate, getting me access to dotTrace and dotMemory. Through profiling my application I learned some very interesting things, which, in hindsight, make perfect sense.
- Regex is slooooooooooooooooooooooooooooooowwwwwwwwwwwwwwwwwww
- Substring allocates a completely new string
- Garbage Collection can be really expensive, I mean really expensive
So it turns out, Regular Expressions are really slow, especially when compared to String.IndexOf. I won’t get into the nitty-gritty details; if you want to see numbers, theburningmonk.com has a great blog post on the subject. So I decided, hey, I’m going to get rid of Regex and start using plain string methods like String.IndexOf instead.
This was an excellent idea. I improved my parsing performance by ~3x. So instead of 3-5 MB/s, I was getting 9-15 MB/s. I patted myself on the back, called myself a programming genius, and grabbed a glass of whiskey from the bar in the office.
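To give a rough idea of what that kind of swap looks like, here is a minimal sketch. The line format ("level=INFO;msg=started"), the FieldParser class, and both method names are made up for illustration; they are not the code from my actual parser. Both methods pull the same field out of a line, one with Regex and one with IndexOf and Substring.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // Hypothetical pattern for a hypothetical line format: "level=INFO;msg=started"
    private static readonly Regex LevelPattern = new Regex(@"level=([^;]*)");

    // Regex version: runs the pattern engine and builds Match/Group objects.
    public static bool TryGetLevelWithRegex(string line, out string level)
    {
        Match m = LevelPattern.Match(line);
        level = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }

    // IndexOf version: two simple scans, then one slice of the source string.
    public static bool TryGetLevelWithIndexOf(string line, out string level)
    {
        const string key = "level=";
        level = string.Empty;

        int start = line.IndexOf(key, StringComparison.Ordinal);
        if (start < 0) return false;      // key not present

        start += key.Length;              // skip past "level="
        int end = line.IndexOf(';', start);
        if (end < 0) end = line.Length;   // last field has no trailing ';'

        level = line.Substring(start, end - start);
        return true;
    }
}
```

For the sample line above, both methods report success and hand back "INFO". How much faster the second one is will depend on your patterns and data, so measure it with a profiler rather than trusting my numbers.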
Substring / GC
I settled in and ran my profiler to see if there was anything else I could improve.
dotTrace showed my GC time tripled and I was spending about 10% of execution time in full GC. Well that’s less than ideal. WHAT HAPPENED?!
Looking at the allocations, I could see hundreds of millions of strings. Uhm, well, that seems like a lot. Then it dawned on me: strings are immutable! Every time I sliced up a string, I got a whole new string and all the glorious memory allocations that come with it. No wonder my heap had turned into Swiss cheese!
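Here is a small sketch of how the slicing adds up. The SliceDemo class and the comma-separated format are assumptions for the sake of the example, not my real parser. It splits one line with IndexOf and Substring, and each slice that is non-empty and shorter than the source line comes back as a separate string object on the heap.

```csharp
using System;
using System.Collections.Generic;

public static class SliceDemo
{
    // Splits a comma-separated line using IndexOf + Substring.
    // Each non-empty slice shorter than the source copies its characters
    // into a newly allocated string.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        int start = 0;

        while (true)
        {
            int comma = line.IndexOf(',', start);
            int end = comma < 0 ? line.Length : comma;

            // One new string per field (in the common case).
            fields.Add(line.Substring(start, end - start));

            if (comma < 0) break;   // that was the last field
            start = comma + 1;      // continue after the comma
        }

        return fields;
    }
}
```

Calling SliceDemo.SplitFields("a,b,c") returns three strings, none of which share memory with the original line. Multiply that by every field on every line of a large file and the allocation count climbs into the millions very quickly, all of it short-lived garbage for the GC to chase.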
At this point I was a bit baffled. I honestly had no idea what to do about this new problem. But was it really a problem? After all, I got a speed improvement and things were humming along happily.
Yes, it was still a problem. The hacker inside me said, there must be a better way. But what was it…