Running with Code: Like with scissors, only more dangerous

20 Aug 2008

Software Development and LEGOs

Posted by Rob

I recently went to my cousin's birthday party (he turned 6), and since he's apparently become a huge Star Wars fan (read: played LEGO Star Wars and loves shooting off C-3PO's leg), my parents decided to get him a couple small Star Wars LEGO toys designed for that age group.  They're small - maybe 30 pieces or so - but they're still pretty neat, and I can't help but be amazed at how the pieces still come together to form the whole.  I was a LEGO and K'nex fanatic when I was younger, but I don't think I'd really played with them since I was 14 or so - not since I left Illinois.  So when I was asked to help my cousin put them together, my first reaction was, "Hey, I don't do that anymore."  But as I got started, I was drawn in - even on the little 30-piece landspeeder.

And so, I recently made one of the biggest entirely-vanity buys since I've moved into my new place:

The box

The "Ultimate Collector's Edition" Imperial Star Destroyer.  I couldn't bring myself to invest the full $500 in the Millenium Falcon, but I thought that the price wasn't too insane, a paltry 3104 pieces wasn't too daunting, and that I would have enough fun putting together the monstrous 37" set.  The instruction book is about the thickness of my high school year books, and I'm disappointed but not entirely surprised that they didn't include the hyperdrive units as part of the set.  I guess I'm on my own for making it actually fly.

I took the pieces out and set them on my dining room table, and as I analyzed the rather daunting task ahead, I realized that, really, it's a lot like my everyday job.  And the first page of the book seemed to confirm it:

Page 1: The Phantom Frame

Everything is done in components.  We have a frame, then a wing, then another wing.  The bridge.  All of these go together with little regard for how the rest of the ship is built.  Just like we build software.  We try to solve a problem, but we try to make sure that our components can be used or adapted to solve another problem as well. 

So maybe all of my formative years building with those LEGO sets weren't for nought.  Maybe they were setting the stage for software development even then, tuning my problem-solving skills, making me think of things as tasks to be decomposed until the smallest task can be approached.  Who's to say?

Sadly, my software development experience doesn't have a way for me to estimate time to build this bad boy.  But at least my motivation's there!

On the dining room table

Update: Check out the building marathon!

13 Aug 2008

Speedy C#, Part 3: Understanding Memory References, Pinned Objects, and Pointers

Posted by Rob

So often in the managed world we're able to get away with not worrying about memory management.  "But the GC takes care of cleaning my objects for me!"  That's true; but if you want your application to be performant, you should at least understand what's going on in all of those circuits and silicon.

In Part 2, I talked a bit about how creating object pools can help you to avoid garbage collections by keeping memory allocated for a long time.  Here, I'm going to talk a bit more extensively about how objects are stored in memory, what a "pinned object" is, and how pointers can be used quickly in C#.

NOTE: This article assumes you are familiar with pointers and pointer arithmetic.  If not, you may wish to brush up.

Objects in Memory - A Closer Look at the Heap

When you create an instance of a class (not a struct or an enum), your object is stored on the "heap" - a large, contiguous region of memory managed by the runtime.  (For more information on the heap, read up on Part 2.)  This includes, interestingly enough, any Array objects you create (such as a byte[]) - they're reference objects, not value objects.  (The one exception is if you use the stackalloc operator in C#.)  So, suppose I make the following class:

class Sample
{
    public int A;
    public long B;
    public short C;
    public short D;
}

Here's how it would conceptually look in a memory block:

An instance of Sample in memory

As you can see, the class is laid out contiguously (although the CLR does not guarantee this behavior unless it is decorated with [StructLayout(LayoutKind.Sequential)]).  Still, you get the idea.

However, when we create an object and get a reference to it, we don't actually get a pointer to the object - we get a "reference".  This isn't a reference like you might expect in C or C++, either; rather, it's similar to a handle.  We can use it as if it pointed straight at memory laid out the way I showed above, but the CLR hides implementation details; for example, every object on the heap has at least a reference to its RuntimeTypeHandle so that casting can be checked at runtime.  To demonstrate, let's take a byte[].  When it's stored on the heap, it's pretty clear what we're looking at.  Arrays of any type are an interesting edge case in .NET; normally, C# does not allow you to obtain a pointer to a managed type (and in fact you can't do what I'm about to demonstrate with a reference type), but arrays themselves ARE managed types (don't worry about the last two lines of output just yet).

static unsafe void Main(string[] args)
{
    byte[] bytes = new byte[100];
    bytes[0] = 1;
    bytes[1] = 2;
    bytes[2] = 3;
    bytes[3] = 4;

    Type arrayType = null;
    fixed (byte* pMem = &bytes[0])
    {
        Console.WriteLine("{0:x16}", (long)pMem);
        int* pArrayBase = (int*) pMem;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        long rtth = *(long*) pArrayBase;
        RuntimeTypeHandle handle;
        // RTTH is a value type whose only member is an IntPtr; it can be set as a long on x64
        RuntimeTypeHandle* pH = &handle;
        *((long*) pH) = rtth;
        arrayType = Type.GetTypeFromHandle(handle);
    }

    if (arrayType != null)
    {
        Console.WriteLine(arrayType.Name);
    }

    Console.WriteLine("byte[] RTTH: {0:x16}", typeof (byte[]).TypeHandle.Value.ToInt64());
    int a = 1;
    int b = 2;
    int* pA = &a;
    int* pB = &b;
    Console.WriteLine(*pB);
    Console.WriteLine(*(pB - 1));

    Console.ReadLine();
}

Now, just to clarify: I run on x64.  The above code will not function as expected on x86.  There are a few items that will also produce slightly varying results for you; for instance, pMem shouldn't be cast to a long on x86, and to get to the instance's stored RTTH, you only need to decrement the pointer 3 times on x86 (whereas the RTTH on x64 is 8 bytes long).  Here's the output on my machine:

0000000002a31748                Console.WriteLine("{0:x16}", (long)pMem);
04030201                        Console.WriteLine("{0:x8}", *(pMem));
00000000                        Console.WriteLine("{0:x8}", *(pMem - 1));
00000064                        Console.WriteLine("{0:x8}", *(pMem - 2));
00000642                        Console.WriteLine("{0:x8}", *(pMem - 3));
7890a4a8                        Console.WriteLine("{0:x8}", *(pMem - 4));
Byte[]                          Console.WriteLine(arrayType.Name);
byte[] RTTH: 00000642789562c2   Console.WriteLine("{0:x16}", typeof(byte[]).TypeHandle.Value.ToInt64());
2                               Console.WriteLine(*pB);
1                               Console.WriteLine(*(pB - 1));

So, here we see that the runtime type identifier is stored as part of the object reference on the heap; so is the array length (that's the hex value 00000064 that you see on the fourth line of output - it's 100 in decimal).  That's how arrays are stored on the heap, and it's pretty much how objects are stored; when we have an object reference, we can treat it as if it's a pointer into memory.  But it's more than that; below our "pointer" exists additional information about the object.  We don't get to see that additional information because the CLR hides it from us.

What are reference variables then?  Ultimately, they're stack variables that contain our "pointer" that isn't really a pointer.  I said not to worry too much about the last two lines before, but they are intended to show you one thing: stack variables are allocated sequentially on the stack.  I declared a, then b; by obtaining a pointer to b, I was also able to obtain a pointer to a by decrementing the pointer by the size of the variable (in this case, 32 bits).  To show you that my handle is in fact legitimately pointing to a stack variable, take a look at the following code:

static unsafe void Main(string[] args)
{
    Sample s = new Sample {A = 0x01020304, B = 0x0f0e0d0c0b0a0908, C = 0x0706, D = 0x0504};
    long a = 1;
    long b = 2;
    long* pA = &a;
    long* pB = &b;
    Console.WriteLine("{0:x16}", *pB);
    Console.WriteLine("{0:x16}", *(pB - 1));
    Console.WriteLine("{0:x16}", *(pB - 2));

    long prS = (long)(pB - 2); // the location of s on the stack
    long* pS = *(long**)prS;
    Console.WriteLine("{0:x16}", *pS);
    Console.WriteLine("{0:x16}", *(pS + 1));
    Console.WriteLine("{0:x16}", *(pS + 2));

    Console.ReadLine();
}

Again, the above code will not function as expected on x86 (to make it do so, replace all long references with int).  The output of this code is fascinating:

0000000000000002      b
0000000000000001      a
0000000002be16c8      s
00000642801a4400      *pS
0f0e0d0c0b0a0908      *(pS + 1)
0504070601020304      *(pS + 2)

You might notice that s is a pointer to the heap, and that dereferencing it gives us a number that looks suspiciously similar to a RuntimeTypeHandle just like in the last example, and you'd be correct.  The other interesting thing is the variable order: the B variable in the Sample class was aligned so that it would be first (8-byte alignment on x64 appears to be the default).  Applying [StructLayout] to it as noted before makes it look right (although to the untrained eye it will look entirely backwards due to endianness).

In Part 2, I talked about how garbage collection allows us to not worry so much about external fragmentation of the heap, because the GC performs a process called "compaction," by which objects are moved around in memory so that there aren't small areas of free space.  The interesting question is: what happens if a GC compaction happens and we have a pointer to an object?

Accessing Memory Locations with Pinned Objects

The CLR allows us to "pin" an object so that it is not moved during garbage collection.  This can have some big consequences for garbage collection, though; the heap remains fragmented if an object is pinned during a pass.  What's more, if the object becomes eligible for compaction after the pass, it's still considered a gen-0 object even though it should have moved to gen-1.  C# enables us to pin an object via the fixed statement.
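Here's a minimal sketch of the fixed statement in action (the method and buffer here are my own illustration, not part of the benchmark code below):

static unsafe void PinExample()
{
    byte[] buffer = new byte[256];
    fixed (byte* pMem = buffer)    // pins 'buffer'; the GC cannot relocate it
    {
        pMem[0] = 0xFF;            // direct access through the raw pointer
    }                              // the pin is released when the block exits
}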

In truth, the only objects worth pinning are arrays.  You can't pin a regular reference object to get a pointer for the reason shown above (it's not guaranteed to follow any particular pattern), and single value-type objects can be accessed directly on the stack without pinning.  Pinning arrays has some good performance benefits (which I'll get to a bit later), but like I said, not without a cost.

The neatest part about pointers in C# is that a pointer can be cast to a pointer of any other value type; this is exceptionally common in C code (reading a struct's worth of bytes from a file and then treating that memory as a pointer to the struct, for example).  Sometimes it's simply easier for us to do that in C# than it is to use a stream.  Consider the case of reading a PE file header; it's a nightmare!  So many lines of code, when you could simply read in a buffer and call it a PE file header.  Strong typing imposes that limitation, but thankfully, even on edge cases like this, we can work around it.
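To make that concrete, here's a hedged sketch of the idiom; DosHeaderStub is a hypothetical, trimmed-down stand-in for the real IMAGE_DOS_HEADER, declaring only its first two fields:

using System;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct DosHeaderStub
{
    public ushort Magic;           // 0x5a4d ('MZ') in a valid PE file
    public ushort BytesOnLastPage;
}

static class HeaderReader
{
    public static unsafe void Dump(string path)
    {
        byte[] raw = File.ReadAllBytes(path);
        fixed (byte* pRaw = raw)
        {
            // reinterpret the buffer as a header; no field-by-field parsing
            DosHeaderStub* pHeader = (DosHeaderStub*)pRaw;
            Console.WriteLine("Magic: {0:x4}", pHeader->Magic);
        }
    }
}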

I'm not going to discuss the performance characteristics of pinned objects during a garbage collection; for one, they're hard to measure, but more importantly, it's been well-documented to hurt the performance of the garbage collector.

Getting Pointers without the Pinning

There are other means by which to obtain, create, and manage pointers aside from the standard fixed statement.  As mentioned earlier, you can use the stackalloc statement to allocate a block of memory on the stack; it gives you a pointer to the base of an array-like block on the stack.  Alternatively, if you don't care about portability, you can use native Windows functions to allocate memory for you.  These functions might include LocalAlloc, HeapAlloc, VirtualAlloc, or VirtualAllocEx, depending on what your needs are.
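For example, a minimal stackalloc sketch - the block lives on the stack, so there's no GC heap allocation and nothing to pin or free:

static unsafe void StackAllocExample()
{
    int* pValues = stackalloc int[16];  // pointer to the base of the stack block
    for (int i = 0; i < 16; i++)
    {
        pValues[i] = i * i;
    }
    // nothing to free: the block disappears when the method returns
}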

An interesting prospect might be to allocate multiple heaps using the HeapCreate APIs; this would allow you to manage your memory per-area of responsibility; Noel Llopis suggests such a strategy in his book C++ for Game Programmers.  Although all of this memory management might seem like overkill, if you're really hunting for the next tweak to speed up your code, this might help you get over the line.

Performance Characteristics of Unsafe vs. Safe Code

Let's not kid ourselves; unsafe code is inherently unsafe because the runtime doesn't manage the code for us.  So before using code like this in your applications, be absolutely certain that you need it.

The CLR provides the means to access heap memory via the Marshal.AllocHGlobal method.  The documentation notes that it uses LocalAlloc, probably because LocalAlloc doesn't require a heap handle.  Despite the documentation's admonition that you'll get better performance and more features out of the other functions, using LocalAlloc does not seem to be a hindrance in speed relative to using HeapCreate/HeapAlloc/HeapDestroy.  The execution times are shown here:

                                     Debug - 5 iterations      Release - 5 iterations    Debug - 25 iterations     Release - 25 iterations
Normal .NET array [] notation        x86: 17ms; x64: 45ms      x86: 15ms; x64: 65ms      x86: 109ms; x64: 252ms    x86: 95ms; x64: 333ms
Marshal.AllocHGlobal with pointers   x86: 15ms; x64: 36ms      x86: 14ms; x64: 30ms      x86: 95ms; x64: 193ms     x86: 80ms; x64: 148ms
LocalAlloc P/Invoke with pointers    x86: 16ms; x64: 37ms      x86: 14ms; x64: 31ms      x86: 96ms; x64: 193ms     x86: 78ms; x64: 161ms
HeapAlloc P/Invoke with pointers     x86: 16ms; x64: 42ms      x86: 14ms; x64: 32ms      x86: 102ms; x64: 197ms    x86: 88ms; x64: 166ms

Surprisingly, the normal array bracket notation performed significantly worse in release builds than in debug builds on x64; I don't really have an answer for why that would be.  I did not perform extensive statistical regression or even provide averages; I ran each set three times, and if they all looked mostly the same, I used the data.  These data are from x64 machines; the x86 results came from setting the compilation target to x86 and running the program in WOW64.  I was surprised how much slower x64 was, though it might be because we were using machine words on x86 and half-words on x64.  Perhaps memory access would be faster if we used longs on x64.  (Preliminary tests seem to confirm this; I will post a follow-up soon.)

Here are the P/Invoke declarations:

public enum LocalAllocFlags
{
    Fixed = 0,
    Moveable = 2,
    ZeroInit = 0x40,
}

public enum HeapCreateFlags
{
    None = 0,
    EnableExecute = 0x40000,
    GenerateExceptions = 4,
    NoSerialize = 1,
}

public enum HeapAllocFlags
{
    None = 0,
    GenerateExceptions = 4,
    NoSerialize = 1,
    ZeroMemory = 8,
}

static class UnsafeNativeMethods
{
    [DllImport("kernel32.dll")]
    public static extern IntPtr LocalAlloc(LocalAllocFlags flags, UIntPtr uBytes);

    [DllImport("kernel32.dll")]
    public static extern IntPtr LocalFree(IntPtr hMem);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapCreate(HeapCreateFlags flOptions, UIntPtr dwInitialSize, UIntPtr dwMaxSize);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapAlloc(IntPtr hHeap, HeapAllocFlags dwFlags, UIntPtr dwBytes);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapFree(IntPtr hHeap, HeapAllocFlags dwFlags, IntPtr lpMem);

    [DllImport("kernel32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool HeapDestroy(IntPtr hHeap);
}

And finally, here's the benchmarking code:

class Program
{
    private const int ITERATIONS = 25;
    static unsafe void Main(string[] args)
    {
        Console.WriteLine("Press <enter> to start.");
        Console.ReadLine();

        Stopwatch arrayClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunArrayTest();
        }
        arrayClock.Stop();
        Console.WriteLine("{0}ms elapsed for Array test, {1} iterations.  Press <enter> to continue.", arrayClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        Stopwatch marshalClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunMarshalAllocHGlobalTest();
        }
        marshalClock.Stop();
        Console.WriteLine("{0}ms elapsed for Marshal test, {1} iterations.  Press <enter> to continue.", marshalClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        Stopwatch localClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunLocalAllocTest();
        }
        localClock.Stop();
        Console.WriteLine("{0}ms elapsed for LocalAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", localClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        Stopwatch heapClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunHeapAllocTest();
        }
        heapClock.Stop();
        Console.WriteLine("{0}ms elapsed for HeapAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", heapClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();
    }

    private unsafe static void RunHeapAllocTest()
    {
        UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
        IntPtr pHeap = UnsafeNativeMethods.HeapCreate(HeapCreateFlags.None, pSize, UIntPtr.Zero);
        if (pHeap == IntPtr.Zero)
        {
            Console.WriteLine("Could not create heap.");
            return;
        }
        IntPtr pMem = UnsafeNativeMethods.HeapAlloc(pHeap, HeapAllocFlags.ZeroMemory, pSize);
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate heap.");
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        UnsafeNativeMethods.HeapFree(pHeap, HeapAllocFlags.None, pMem);
        UnsafeNativeMethods.HeapDestroy(pHeap);
    }

    private unsafe static void RunLocalAllocTest()
    {
        UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
        IntPtr pMem = UnsafeNativeMethods.LocalAlloc(LocalAllocFlags.ZeroInit, pSize);
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate heap memory.");
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        UnsafeNativeMethods.LocalFree(pMem);
    }

    private unsafe static void RunMarshalAllocHGlobalTest()
    {
        IntPtr pMem = Marshal.AllocHGlobal(1048576 * sizeof(int));
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate memory.");
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        Marshal.FreeHGlobal(pMem);
    }

    private static void RunArrayTest()
    {
        int[] array = new int[1048576]; // 4 MB array
        for (int i = 0; i < 1048576; i++)
        {
            array[i] = i;
        }
    }
}

There isn't anything too complicated here; a 4MB buffer is allocated using the selected method, and then each 32-bit element is populated with its array index.  Unsafe code outperforms safe code in each x64 test, though the difference is much more marginal on x86.  The explanation is simple: safe code checks the array index on every lookup.
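As an aside, the JIT can often elide those checks when it can prove the index is always in range; writing the loop bound as array.Length (rather than a constant, as the benchmark above does) is the classic pattern.  A small sketch:

static void FillChecked(int[] array)
{
    // with the bound written as array.Length, the JIT can frequently
    // prove i is in range and skip the per-access bounds check
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = i;
    }
}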

Summary

Using pointers and unsafe code can be a boost to your application's performance, but you should consider where, when, and how you do it.  Since you don't have control over when the GC is invoked, pinning objects like arrays can be costly.  You might instead consider using Windows API functions or direct memory access through the Marshal class to organize your memory if you absolutely need to squeeze that last bit of speed out of your code, but be warned - it's not safe out there.

The "Speedy C#" Series:

7 Aug 2008

Speedy C#, Part 2: Optimizing Memory Allocations – Pooling and Reusing Objects

Posted by Rob

In C#, Visual Basic .NET, C++/CLI, J# - the list goes on - we're freed from having to worry about our memory management.  Objects take care of themselves, and when the CLR garbage collector detects that an object is no longer in use, it frees the associated memory.  That doesn't mean that we should run around allocating and deallocating objects all willy-nilly; in fact, since we have less control over memory, we arguably have the opportunity to be more careful with the way we use high-frequency objects.

Memory Regions in .NET

In .NET, and generally in most programming, we can think of two places in which we can access memory: the stack and the heap.  We can think of stack memory as temporary workspace, or scratch space; when we leave a function, all of our stack goes away.  Way, way down in the machine architecture, the stack also stores the return addresses of functions, as well as function parameters.  It's generally very orderly and inexpensive, but its transient nature makes it a poor candidate for long-term storage.  In .NET, value types - types that derive from ValueType, which includes anything defined with the struct and enum keywords as well as all of the primitive types except string (int, double, bool, and so on) - are stored on the stack when they are local variables or parameters, unless they are boxed into an object reference.
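A quick illustration of that split (Point here is just a throwaway example type):

struct Point
{
    public int X;
    public int Y;
}

static void BoxingExample()
{
    Point p;                    // a value-type local: lives on the stack
    p.X = 1;
    p.Y = 2;

    object boxed = p;           // boxing: a copy of p is placed on the heap
    Point copy = (Point)boxed;  // unboxing: copies it back onto the stack
}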

Heap memory is another matter.  The heap is a region of memory reserved for the use of the program and is intended to store objects that aren't quite so transient.  This might be something like a database connection, a file or buffer, or a window.

The Enemy: Fragmentation

Over time, objects are allocated and eventually released, and because there's not really any rhyme or reason, the heap becomes chaotic.  Allocations grab the first free block that's large enough (sometimes larger than necessary) and hold onto it until they go into the abyss.  This leads to fragmentation - all the free memory must be tracked somehow, and here's the real killer: contiguous blocks of free memory may not always be recognized as such.  Check this out: let's say we have a heap allocated at memory location 0x4000 that is 32 bytes wide:

An un-used heap

(Yes, my awesome artwork was done with none other than Excel!)

Suppose we allocate an 8-byte object and another 8-byte object, then a 16-byte object.  The first is in red, the second in orange, and the third in gray:

The heap after it was filled

Now I'll free the first and the third objects; we'll have 24 bytes of total free memory:

A fragmented heap

Either we need to keep track of every little piece of memory, which might be the fastest algorithm for releasing but slow for allocating (not to mention potentially VERY wasteful), or try to come up with another solution.  This type of memory fragmentation is referred to as external fragmentation.

The Garbage Collector and Compaction

The garbage collector has two jobs: figuring out which objects are no longer in use, and compacting memory.  The collector determines which objects no longer have references to them by tracing reachability from the application's roots; this frees programmers from having to explicitly destroy objects (as is the practice in C++ with the delete operator, or in C with the free function).  A lazy thread is then able to release and compact memory as needed, avoiding much of the overhead of external fragmentation and also allowing unused memory to be reclaimed.  The garbage collector in .NET is generational; it checks the newest objects first (what are called "gen-0"), and if the newest objects are still in use, they get moved to gen-1.  If memory pressure requires, gen-1 objects are evaluated, and if they are still in use, they get moved to gen-2.  Gen-2 objects are considered long-lasting and are only checked when memory pressure is severe.
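You can watch promotion happen with GC.GetGeneration; a quick sketch, assuming nothing collects the object early:

static void WatchPromotion()
{
    object obj = new object();
    Console.WriteLine(GC.GetGeneration(obj)); // 0: freshly allocated
    GC.Collect();                             // survive one collection...
    Console.WriteLine(GC.GetGeneration(obj)); // 1: promoted to gen-1
    GC.Collect();                             // ...and another
    Console.WriteLine(GC.GetGeneration(obj)); // 2: promoted to gen-2
    GC.KeepAlive(obj);                        // keep obj reachable throughout
}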

Let's go back to our heap example; supposing I had an 8-byte, another 8-byte, and a 12-byte allocation, here's my heap graph:

A new heap

Object 1 (in red) has gone out of scope, but objects 2 and 3 are sticking around.  Using normal memory-freeing rules, the largest object that could be allocated would still only be 8 bytes, because that would be the largest contiguous free space.  However, using .NET's compacting garbage collector, we could expect something along these lines:

A compacted heap graph

Here we can see we've dealt with the problem of external fragmentation by compacting the heap.  This convenience doesn't come without a cost, though; while the garbage collector performed the compaction, all of your application threads were suspended.  The GC can't guarantee object integrity if memory is getting abused during a garbage collection!

Preventing Compaction: Stop Killing off Objects!

Object pooling is a pattern to use that allows objects to be reused rather than allocated and deallocated, which helps to prevent heap fragmentation as well as costly GC compactions.  A pool can be thought of as an object factory; in essence, the most rudimentary pool could look like this:

public class Pool<T> where T : new()
{
    private Stack<T> _items = new Stack<T>();
    private object _sync = new object();

    public T Get()
    {
        lock (_sync)
        {
            if (_items.Count == 0)
            {
                return new T();
            }
            else
            {
                return _items.Pop();
            }
        }
    }

    public void Free(T item)
    {
        lock (_sync)
        {
            _items.Push(item);
        }
    }
}

Here, objects are created entirely on-demand and, when freed, are stored in a stack.  The reason we want to use a Stack is the performance characteristic of adding and removing objects; operations are always performed at the end of the list, which makes it highly efficient to add or remove items.  If possible, it may be prudent to pre-create a number of objects for use throughout the lifetime of your application. 
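A usage sketch for the Pool<T> above - StringBuilder stands in as an arbitrary poolable type here:

static void PoolUsageExample()
{
    Pool<StringBuilder> pool = new Pool<StringBuilder>();

    // optional warm-up: pre-create instances before the hot path begins
    for (int i = 0; i < 16; i++)
    {
        pool.Free(new StringBuilder());
    }

    StringBuilder sb = pool.Get();
    try
    {
        sb.Length = 0;          // reset any recycled state before use
        sb.Append("hello");
    }
    finally
    {
        pool.Free(sb);          // recycle instead of letting it become garbage
    }
}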

Here's a real-world example: the project I've been discussing lately uses a pool of byte arrays to handle incoming network messages received and sent via a Socket.  When pooling is enabled, over the course of the application's lifetime, there were 17 gen-0 collections, 5 gen-1 collections, and 3 gen-2 collections; a total of 270 byte[] instances were allocated, of which 44 were eligible for pooling and were pooled.  When pooling is disabled, there were 22 gen-0 collections, 5 gen-1 collections, and 3 gen-2 collections; a total of 11,660 byte[] instances were allocated, of which approximately 10,900 were eligible for pooling.  That's a lot of memory!

Summary - When and Why

Object pooling is a powerful optimization technique, and if you're already using factory patterns it shouldn't be terribly foreign to you.  The .NET Framework includes the ThreadPool class as part of System.Threading.  Other objects you might consider pooling are database connections, any expensive links to unmanaged code, or anything that needs to be allocated frequently and can then be thrown away.  In my example, byte arrays are exceptionally good for this because they can be overwritten easily.


The "Speedy C#" Series:

5 Aug 2008

Speedy C#, Part 1: Optimizing Long if-else or switch Branches

Posted by Rob

Lately I've been doing some interesting work that I've alluded to elsewhere dealing with the binary communications protocol hosted by Blizzard Entertainment's Battle.net game service.  It's kind of what brought me into C# development in the first place; I walked away from it for a few years, and now I've been digging into it again.  And I've learned a few things between then and now; I've been particularly interested in looking at the under-the-hood workings of the CLR, and so I'm starting a new series on "Speedy C#".  Let me be the first to point out that optimizations have a unique way of obfuscating code; particularly in this example, if you don't explain why you're doing what you're doing, and exactly what result you expect, you could run into trouble, or worse, your colleagues may run into trouble.  So while going through this series, keep that tradeoff in mind.

A little background: the binary protocol used for Battle.net has about 80 or so message IDs, which generally have a different structure for each.  The messages don't necessarily come as a result of sending a message first, and so the general pattern is that a receive loop is in place that receives the data, parses it, and then sends events back to the client.  In fact, there are no synchronous requests defined by the protocol.

When I first started programming, I had handlers for every message ID in a switch/case branching construct:

switch (packetID)
{
    case BncsPacketId.Null:
        break;
    case BncsPacketId.EnterChat:
        string ecUniqueName = pck.ReadNTString();
        string ecStatstring = pck.ReadNTString();
        string ecAcctName = pck.ReadNTString();
        EnteredChatEventArgs ecArgs = new EnteredChatEventArgs(ecUniqueName, ecStatstring, ecAcctName);
        OnEnteredChat(ecArgs);
        break;
    // ... ad nauseum
}

When I looked at this in ildasm, I noticed that it declared a max stack size of something ridiculously large (sorry I don't have a specific number - it was about 6 years ago).  I also noticed that there were a LOT of branches, but not necessarily in the order in which I had written them.  The compiler had intrinsically optimized my code to perform a binary search.  Fairly interesting, optimal speed at O(log N), and something that most of us wouldn't have thought of naturally!

When I last revisited this type of development, I broke all of my handlers out of the branching conditional, calling a separate method to handle each message.  This had a nice effect of making me not have to worry about variable name collisions like I had to in the above example, and it made the code slightly more maintainable.  It's difficult to gauge on paper whether that would have been better or worse performance; there was certainly far less stack allocation, but there was an additional (potentially virtual) method call.

The latest code incorporated into my library takes a different approach: I declare a Dictionary<BncsPacketId, ParseCallback>, populate it with default handlers, and allow existing handlers to be replaced and new ones to be added provided certain conditions are met.  This has had several benefits:

  • According to MSDN, Dictionary<TKey, TValue> approaches O(1), which is (obviously) the fastest lookup we could hope for. 
  • Adding support for new or changed messages does not require change to the code, only that a handler be updated via a method call.
  • Handlers can be switched at runtime.

In this code, a ParseCallback is a delegate that accepts information provided by the message header and the message contents themselves.  This has modified the entire parsing thread to be:

private void Parse()
{
    try
    {
        while (IsConnected)
        {
            m_parseWait.Reset();

            while (m_packetQueue.Count == 0)
            {
                m_parseWait.WaitOne();
            }

            ParseData data = m_packetQueue.Dequeue();
            if (m_packetToParserMap.ContainsKey(data.PacketID))
            {
                m_packetToParserMap[data.PacketID](data);
            }
            else
            {
                switch (data.PacketID)
                {
                    #region SID_NULL
                    case BncsPacketId.Null:
                        break;
                    #endregion
                    default:
                        Trace.WriteLine(data.PacketID, "Unhandled packet");
                        if (!BattleNetClientResources.IncomingBufferPool.FreeBuffer(data.Data))
                        {
                            Debug.WriteLine(data.PacketID, "Incoming buffer was not freed for packet");
                        }
                        break;
                }
            }
        }
    }
    catch (ThreadAbortException)
    {
        // exit the thread gracefully.
    }
}

Now, obviously, this is a very domain-specific optimization that I wouldn't make unless it makes sense in the problem domain.  For mine, it does; I am writing the library so that others are able to integrate functionality without having to worry about modifying code that they maybe are not familiar with or are worried about breaking.  If you absolutely need to use this method, be sure to document why.
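For illustration, registration might look something like this; ParseCallback and ParseData match the names used above, while RegisterHandler is a hypothetical method on the connection class:

public delegate void ParseCallback(ParseData data);

// hypothetical registration method; validation rules elided
public void RegisterHandler(BncsPacketId id, ParseCallback handler)
{
    if (handler == null)
        throw new ArgumentNullException("handler");
    m_packetToParserMap[id] = handler;  // O(1) add-or-replace in the Dictionary
}

// replacing the default EnterChat handler at runtime:
client.RegisterHandler(BncsPacketId.EnterChat, delegate(ParseData data)
{
    // custom parsing and event dispatch here
});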

The "Speedy C#" Series:

1 Aug 2008

My C# 4.0 Wishlist Part 6: Automatic Properties for Enum Variables

Posted by Rob

OK, so I lied; I'm not stopping at 5 parts.

I've been working with enumerations frequently lately; the Battle.net chat protocol is binary and therefore the values that come over the wire have different contextual meanings based on the values that might have preceded them.  For example, a chat message event actually can have about a dozen meanings; it can be a server-broadcasted message, a message from another user, or just an announcement that a user joined the channel.  In addition to the standard values identifying things like message type, messages typically have one form or another of flags; if the event is based on a user, the flags contain information about the user's status on the server (whether the user is an administrator or has operator privileges in the channel).  Others, such as channel information updates, contain information about the chat channel itself, such as whether it is public, silent, or otherwise normal.

The Problem

Having had to deal with enumerations frequently has made me hate code like this:

if (((UserFlags)e.Flags & UserFlags.ChannelOperator) == UserFlags.ChannelOperator)

Especially when working with bitwise values (enumerations decorated with the [Flags] attribute), because of the specific operator precedence constraints that C# places on the developer, this becomes annoying quickly.  So much so, that classes where I have to do that frequently end up with several TestFlag() methods, but even these are limited.  Consider code like this:

bool TestFlag(UserFlags test, UserFlags reference) { ... }
bool TestFlag(ChannelFlags test, ChannelFlags reference) { ... }

Or this:

bool TestFlag<T>(T test, T reference) {
    // hard to implement since no meaningful type constraint can be placed on T
}

Or this:

bool TestFlag(int test, int reference) { ... }

In proposition 1 we have to implement n methods, either repeatedly or in a globally-defined internal utility class; that stinks.  Proposition 2 is difficult to implement: C# doesn't allow enum type constraints (which is a shame, because every enum is backed by an integral value, making such a constraint ideal); the closest we can get is struct, which doesn't guarantee operator | or operator &.  In proposition 3, every time we want to test, we need to cast to int (or long) and lose type information.  I guess that works, but then you worry that you end up with code like this:

if (TestFlag((int)e.User.Flags, (int)UserFlags.ServerAdministrator))
{
    // ...
}
else if (TestFlag((int)e.User.Flags, (int)UserFlags.ChannelOperator))
{
    // ...
} // ...
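Extension methods (new in C# 3.0) can at least hide the casts; here's a sketch of that compromise.  Note that Convert.ToUInt64 boxes and converts on every call, and nothing stops you from passing mismatched enum types:

public static class EnumExtensions
{
    public static bool TestFlag(this Enum value, Enum flag)
    {
        ulong f = Convert.ToUInt64(flag);
        return (Convert.ToUInt64(value) & f) == f;
    }
}

// usage - no casts at the call site:
// if (e.User.Flags.TestFlag(UserFlags.ChannelOperator)) { ... }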

No, there's a cleaner solution, and, like the compiler features added to C# 3.0, it doesn't require a new CLR: automatic properties on enumerations.

The Solution

Internally, enumerations are treated as their base numeric type by the CLR; the variable itself carries around type information, but it's not strong and can be changed by direct casting.  But the compiler always knows the type of a local variable and can apply it directly.  So, consider the compiler generating a property on an enumeration variable for each field of the enumeration - an IsFieldName property.  Consider this [Flags] enumeration, and look at the code that uses it when using this style of coding:
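(The enumeration itself might look something like this; the field names come from the properties used below, and the numeric values are illustrative assumptions.)

[Flags]
public enum UserFlags
{
    None = 0,
    BlizzardRepresentative = 0x01,
    ChannelOperator = 0x02,
    ServerAdministrator = 0x08,
    NoUDP = 0x10,
    BattleNetAdministrator = 0x20,
}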

if (e.User.Flags.IsNone)
{   }
else if (e.User.Flags.IsBlizzardRepresentative || e.User.Flags.IsBattleNetAdministrator)
{   }
else if (e.User.Flags.IsChannelOperator)
{   }
else if (e.User.Flags.IsNoUDP)
{   }

We can easily identify the pattern that the compiler supports; prefix "Is" to the field name and perform the underlying logic.

The great part about this solution is that the emitted code is exactly the same as what you or I would produce right now.  So the compiler can use its clever compiler tricks to translate it to this:

if (e.User.Flags == UserFlags.None) {   }
else if ((e.User.Flags & UserFlags.BlizzardRepresentative) == UserFlags.BlizzardRepresentative
       || (e.User.Flags & UserFlags.BattleNetAdministrator) == UserFlags.BattleNetAdministrator) {   }
else if ((e.User.Flags & UserFlags.ChannelOperator) == UserFlags.ChannelOperator) {   }
else if ((e.User.Flags & UserFlags.NoUDP) == UserFlags.NoUDP) {   }

In that example, I qualified the type name UserFlags nine times.  Can you say "carpal tunnel"?

Future-Proofing

There are some considerations to make about this.  First, there are already going to be some enumerations in the wild with field names that begin with "Is," and it could very easily raise confusion if someone sees code such as user.Flags.IsIsOnline.  Fortunately, the solution is equally simple: create a decorator attribute, just like we did for extension methods:

namespace System
{
    [AttributeUsage(AttributeTargets.Enum)]
    public sealed class EnumPropertiesAttribute : Attribute { }
}

Then, when you create an enumeration on which you'd like to expose this style of properties, simply decorate the enumeration with the attribute.  IntelliSense knows to show the properties, the compiler knows to translate them, and we're in the free and clear.
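Reusing the illustrative UserFlags enumeration from earlier, opting in would be a one-line change:

[Flags]
[EnumProperties]
public enum UserFlags
{
    None = 0,
    ChannelOperator = 0x02,
    // ... as before
}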

Wouldn't it be great?

The C# 4.0 Wishlist series