Lately I’ve been doing some interesting work, which I’ve alluded to elsewhere, dealing with the binary communications protocol hosted by Blizzard Entertainment’s Battle.net game service. It’s what brought me into C# development in the first place; I walked away from it for a few years, and now I’ve been digging into it again, having learned a few things in the meantime. I’ve been particularly interested in the under-the-hood workings of the CLR, and so I’m starting a new series on "Speedy C#". Let me be the first to point out that optimizations have a unique way of obfuscating code; in this example especially, if you don’t explain why you’re doing what you’re doing and exactly what result you expect, you could run into trouble, or worse, your colleagues may run into trouble. So while going through this series, keep in mind that these techniques should be weighed against the clarity they cost, and documented whenever you use them.
A little background: the binary protocol used for Battle.net has about 80 message IDs, each of which generally has a different structure. Messages don’t necessarily arrive as a result of sending a request first, so the general pattern is a receive loop that reads the data, parses it, and then raises events back to the client. In fact, the protocol defines no synchronous requests at all.
When I first started programming, I had handlers for every message ID in a switch/case branching construct:
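The original code is long gone, but the shape of it was something like the following. The message IDs and field layouts here are illustrative stand-ins, not the real protocol definitions:

```csharp
using System;
using System.IO;

// Illustrative subset of message IDs; the real protocol defines around 80.
public enum BncsPacketId : byte
{
    EnterChat = 0x0a,
    ChatEvent = 0x0f,
    Ping = 0x25
}

public class SwitchBasedClient
{
    public void HandlePacket(BncsPacketId id, BinaryReader data)
    {
        switch (id)
        {
            case BncsPacketId.EnterChat:
                // Hypothetical field layout; parse, then raise an event.
                string uniqueName = data.ReadString();
                break;

            case BncsPacketId.ChatEvent:
                int eventId = data.ReadInt32();
                break;

            case BncsPacketId.Ping:
                // Echo the cookie back to the server.
                int cookie = data.ReadInt32();
                break;

            // ...one case per message ID. Every case shares the enclosing
            // switch's variable scope, which is where name collisions
            // (and the huge declared stack size) come from.
        }
    }
}
```

With ~80 cases, each declaring its own parsing locals, this one method balloons quickly.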
When I looked at this in ildasm, I noticed that it declared a ridiculously large maximum stack size (sorry, I don’t have a specific number – it was about 6 years ago). I also noticed that there were a LOT of branches, but not necessarily in the order in which I had written them: the compiler had quietly optimized my code to perform a binary search over the message IDs. Fairly interesting – O(log N) lookup, and something most of us wouldn’t have thought to write by hand!
When I last revisited this type of development, I broke all of my handlers out of the branching conditional, calling a separate method to handle each message. This had the nice effect of eliminating the variable name collisions I had to work around in the single giant switch, and it made the code slightly more maintainable. It’s difficult to gauge on paper whether performance would have been better or worse; there was certainly far less stack allocation, but each message now incurred an additional (potentially virtual) method call.
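That second pass might be sketched like so (types repeated so the sketch stands alone; the handler names are illustrative, and the methods are virtual so a derived client can override individual handlers):

```csharp
using System.IO;

// Illustrative message IDs only.
public enum BncsPacketId : byte { EnterChat = 0x0a, ChatEvent = 0x0f }

public class MethodPerMessageClient
{
    public void HandlePacket(BncsPacketId id, BinaryReader data)
    {
        // The switch remains, but each branch is now a single call, so each
        // handler gets its own variable scope and its own stack frame.
        switch (id)
        {
            case BncsPacketId.EnterChat: HandleEnterChat(data); break;
            case BncsPacketId.ChatEvent: HandleChatEvent(data); break;
        }
    }

    // These are the "additional (potentially virtual) method calls"
    // mentioned above.
    protected virtual void HandleEnterChat(BinaryReader data) { /* parse, raise event */ }
    protected virtual void HandleChatEvent(BinaryReader data) { /* parse, raise event */ }
}
```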
The latest code incorporated into my library takes a different approach: I declare a Dictionary<BncsPacketId, ParseCallback>, populate it with default handlers, and allow existing handlers to be replaced and new ones to be added provided certain conditions are met. This has had several benefits:
- According to MSDN, Dictionary<TKey, TValue> approaches O(1), which is (obviously) the fastest lookup we could hope for.
- Adding support for new or changed messages does not require changes to the library code; a new handler simply needs to be registered via a method call.
- Handlers can be switched at runtime.
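A minimal sketch of that dispatch table follows. The ParseCallback signature shown here is an assumption (header information plus the raw contents), and the registration guard stands in for whatever conditions the real library imposes:

```csharp
using System;
using System.Collections.Generic;

// Illustrative message IDs; the real protocol defines around 80.
public enum BncsPacketId : byte { ChatEvent = 0x0f, Ping = 0x25 }

// Assumed delegate shape: header information plus the message contents.
public delegate void ParseCallback(BncsPacketId id, int length, byte[] payload);

public class PacketDispatcher
{
    private readonly Dictionary<BncsPacketId, ParseCallback> _handlers =
        new Dictionary<BncsPacketId, ParseCallback>();

    public PacketDispatcher()
    {
        // Populate default handlers up front.
        _handlers[BncsPacketId.Ping] = HandlePing;
    }

    // Handlers can be added or replaced at runtime, provided certain
    // conditions are met (here, just a null check).
    public void RegisterHandler(BncsPacketId id, ParseCallback callback)
    {
        if (callback == null)
            throw new ArgumentNullException("callback");
        _handlers[id] = callback;
    }

    public bool TryGetHandler(BncsPacketId id, out ParseCallback callback)
    {
        return _handlers.TryGetValue(id, out callback);
    }

    private void HandlePing(BncsPacketId id, int length, byte[] payload)
    {
        // Default behavior: echo the cookie back to the server.
    }
}
```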
In this code, a ParseCallback is a delegate that accepts information provided by the message header along with the message contents themselves. With this change, the body of the parsing thread becomes:
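Something along these lines, assuming a hypothetical ReceiveExactly helper for blocking socket reads and the classic 4-byte BNCS header (0xFF sentinel, message ID, 16-bit length including the header). Types are repeated so the sketch stands alone:

```csharp
using System;
using System.Collections.Generic;

public enum BncsPacketId : byte { Ping = 0x25 }
public delegate void ParseCallback(BncsPacketId id, int length, byte[] payload);

public class ParsingLoop
{
    private readonly Dictionary<BncsPacketId, ParseCallback> _handlers;
    private bool _connected = true;

    public ParsingLoop(Dictionary<BncsPacketId, ParseCallback> handlers)
    {
        _handlers = handlers;
    }

    // Read a header, read the body, look up the handler, invoke it.
    public void Run()
    {
        while (_connected)
        {
            byte[] header = ReceiveExactly(4);              // hypothetical helper
            BncsPacketId id = (BncsPacketId)header[1];
            int length = BitConverter.ToUInt16(header, 2);  // includes the header
            byte[] payload = ReceiveExactly(length - 4);

            ParseCallback handler;
            if (_handlers.TryGetValue(id, out handler))
                handler(id, length, payload);               // near-O(1) dispatch
            // else: unknown message; log it and carry on.
        }
    }

    private byte[] ReceiveExactly(int count)
    {
        // Stub standing in for a blocking read of exactly `count` bytes.
        throw new NotImplementedException();
    }
}
```

Note that the loop itself never changes again: new message support is a matter of registering a delegate, not editing this method.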
Now, obviously, this is a very domain-specific optimization, and I wouldn’t make it unless it made sense in the problem domain. For mine, it does: I am writing the library so that others can integrate new functionality without having to modify code that they may not be familiar with or are worried about breaking. If you absolutely need to use this method, be sure to document why.
The "Speedy C#" Series:
- Part 1: Optimizing Long if-else or switch Branches
- Part 2: Optimizing Memory Allocations – Pooling and Reusing Objects
- Part 3: Understanding Memory References, Pinned Objects, and Pointers
- Part 4: Using – and Understanding – CLR Profiler
- Part 5: Using Threads with Waits, or Don’t Kill your CPU