Running with Code Like with scissors, only more dangerous

13Aug/080

Speedy C#, Part 3: Understanding Memory References, Pinned Objects, and Pointers

So often in the managed world we’re able to get away with not worrying about memory management.  "But the GC takes care of cleaning my objects for me!"  That’s true; but if you want your application to be performant, you should at least understand what’s going on in all of those circuits and silicon.

In Part 2, I talked a bit about how creating object pools can help you to avoid garbage collections by keeping memory allocated for a long time.  Here, I’m going to talk a bit more extensively about how objects are stored in memory, what a "pinned object" is, and how pointers can be used quickly in C#.

NOTE: This article assumes you are familiar with pointers and pointer arithmetic.  If not, you may wish to brush up.

Objects in Memory – A Closer Look at the Heap

When you create an instance of a class (not a struct or an enum), your object is being stored on the "heap" – a large contiguous area of memory that is just there.  (For more information on the heap, read up on Part 2).  This includes, interestingly enough, any Array objects you create (such as a byte[[) – they’re reference objects, not value objects.  (The one exception is if you use the stackalloc operator in C#).  So, suppose I make the following class:

   1: class Sample
   2: {
   3:     public int A;
   4:     public long B;
   5:     public short C;
   6:     public short D;
   7: }

Here’s how it would conceptually look in a memory block:

An instance of Sample in memory

As you can see, the class is laid out contiguously (although the CLR does not guarantee this behavior unless it is decorated with [StructLayout(LayoutKind.Sequential)]).  Still, you get the idea.

However, when we create an object and get a reference to it, we don’t actually get a pointer to the object – we get a "reference".  This isn’t a reference like you might expect in C or C++, either; rather, it’s similar to a handle.  We can use it just like it’s conceptually in memory like I laid out.  However, the CLR hides implementation details; for example, every object on the heap has at least a reference to its RuntimeTypeHandle so that casting can be checked at runtime.  To demonstrate, let’s take a byte[].  When it’s stored on the heap, it’s pretty clear what we’re looking at.  Arrays of any type are an interesting edge case in .NET; normally, C# does not allow you to obtain a pointer of a managed type (and in fact you can’t do what I’m about to demonstrate with a reference type), but arrays themselves ARE managed types (don’t worry about the last two lines of output just yet).

   1: static unsafe void Main(string[] args)
   2: {
   3:     byte[] bytes = new byte[100];
   4:     bytes[0] = 1;
   5:     bytes[1] = 2;
   6:     bytes[2] = 3;
   7:     bytes[3] = 4;
   8:  
   9:     Type arrayType = null;
  10:     fixed (byte* pMem = &bytes[0])
  11:     {
  12:         Console.WriteLine("{0:x16}", (long)pMem);
  13:         int* pArrayBase = (int*) pMem;
  14:         Console.WriteLine("{0:x8}", *pArrayBase);
  15:         pArrayBase--;
  16:         Console.WriteLine("{0:x8}", *pArrayBase);
  17:         pArrayBase--;
  18:         Console.WriteLine("{0:x8}", *pArrayBase);
  19:         pArrayBase--;
  20:         Console.WriteLine("{0:x8}", *pArrayBase);
  21:         pArrayBase--;
  22:         Console.WriteLine("{0:x8}", *pArrayBase);
  23:         long rtth = *(long*) pArrayBase;
  24:         RuntimeTypeHandle handle;
  25:         // RTTH is a value-type whose only member is an IntPtr; can be set as a long on x64
  26:         RuntimeTypeHandle* pH = &handle;
  27:         *((long*) pH) = rtth;
  28:         arrayType = Type.GetTypeFromHandle(handle);
  29:     }
  30:  
  31:     if (arrayType != null)
  32:     {
  33:         Console.WriteLine(arrayType.Name);
  34:     }
  35:  
  36:     Console.WriteLine("byte[] RTTH: {0:x16}", typeof (byte[]).TypeHandle.Value.ToInt64());
  37:     int a = 1;
  38:     int b = 2;
  39:     int* pA = &a;
  40:     int* pB = &b;
  41:     Console.WriteLine(*pB);
  42:     Console.WriteLine(*(pB - 1));
  43:  
  44:     Console.ReadLine();
  45: }

Now, just to clarify: I run on x64.  The above code will not function as expected on x86.  There are a few items that will also produce slightly varying results for you; for instance, pMem shouldn’t be cast to a long on x86, and to get to the instance’s stored RTTH, you only need to decrement the pointer 3 times on x86 (whereas the RTTH on x64 is 8 bytes long).  Here’s the output on my machine:

0000000002a31748                Console.WriteLine("{0:x16}", (long)pMem);
04030201                        Console.WriteLine("{0:x8}", *(pMem));
00000000                        Console.WriteLine("{0:x8}", *(pMem - 1));
00000064                        Console.WriteLine("{0:x8}", *(pMem - 2));
00000642                        Console.WriteLine("{0:x8}", *(pMem - 3));
7890a4a8                        Console.WriteLine("{0:x8}", *(pMem - 4));
Byte[] Console.WriteLine(arrayType.Name); byte[] RTTH: 00000642789562c2 Console.WriteLine("{0:x16}", typeof(byte[]).TypeHandle.Value.ToInt64()); 2 Console.WriteLine(*pB); 1 Console.WriteLine(*(pB - 1));

So, here we see that the runtime type identifier is stored as part of the object reference on the heap; so is the array length (that’s the hex value 00000064 that you see on the fourth line of output – it’s 100 in decimal).  That’s how arrays are stored on the heap, and it’s pretty much how objects are stored; when we have an object reference, we can treat it as if it’s a pointer into memory.  But it’s more than that; below our "pointer" exists additional information about the object.  We don’t get to see that additional information because the CLR hides it from us.

What are reference variables then?  Ultimately, they’re stack variables that contain our "pointer" that isn’t really a pointer.  I said not to worry too much about the last two lines before, but they are intended to show you one thing: stack variables are allocated sequentially on the stack.  I declared a, then b; by obtaining a pointer to b, I was also able to obtain a pointer to a by decrementing the pointer by the size of the variable (in this case, 32 bits).  To show you that my handle is in fact legitimately pointing to a stack variable, take a look at the following code:

   1: static unsafe void Main(string[] args)
   2: {
   3:     Sample s = new Sample {A = 0x01020304, B = 0x0f0e0d0c0b0a0908, C = 0x0706, D = 0x0504};
   4:     long a = 1;
   5:     long b = 2;
   6:     long* pA = &a;
   7:     long* pB = &b;
   8:     Console.WriteLine("{0:x16}", *pB);
   9:     Console.WriteLine("{0:x16}", *(pB - 1));
  10:     Console.WriteLine("{0:x16}", *(pB - 2));
  11:  
  12:     long prS = (long)(pB - 2); // the location of s on the stack
  13:     long* pS = *(long**)prS;
  14:     Console.WriteLine("{0:x16}", *pS);
  15:     Console.WriteLine("{0:x16}", *(pS + 1));
  16:     Console.WriteLine("{0:x16}", *(pS + 2));
  17:  
  18:     Console.ReadLine();
  19: }

Again, the above code will not function as expected on x86 (to make it do so, replace all long references with int).  The output of this code is fascinating:

0000000000000002      b
0000000000000001      a
0000000002be16c8      s
00000642801a4400      *pS
0f0e0d0c0b0a0908      *(ps + 1) 
0504070601020304      *(ps + 2) 

You might notice that s is a pointer to the heap, and that dereferencing it gives us a number that looks suspiciously similar to a RuntimeTypeHandle just like in the last example, and you’d be correct.  The other interesting thing is the variable order: the B variable in the Sample class was aligned so that it would be first (8-byte alignment on x64 appears to be the default).  Applying [StructLayout] to it as noted before makes it look right (although to the untrained eye it will look entirely backwards due to endianness).

In Part 2, I talked about how garbage collection allows us to not worry so much about external fragmentation of the heap, because the GC performs a process called "compaction," by which objects are moved around in memory so that there aren’t small areas of free space.  The interesting question is: what happens if a GC compaction happens and we have a pointer to an object?

Accessing Memory Locations with Pinned Objects

The CLR allows us to "pin" an object so that it is not moved during garbage collection.  This can potentially have some big consequences for garbage collection, though; the heap is still fragmented if an object is pinned during a pass.  What’s more is that if the object becomes eligible for compaction after the pass, it’s still considered a gen-0 object even though it should have moved to gen-1.  C# enables us to pin an object via the fixed statement.

In truth, the only objects worth pinning are arrays.  You can’t pin a regular reference object to get a pointer for the reason shown above (it’s not guaranteed to follow any particular pattern), and single value-type objects can be accessed directly on the stack without pinning.  Pinning arrays has some good performance benefits (which I’ll get to a bit later), but like I said, not without a cost.

The neatest part about pointers in C# is that a pointer can be cast to a pointer of any other value-type; this is exceptionally common in C code (reading a file into memory by reading the length of a struct, and then treating the memory as a pointer to that struct, for example).  Sometimes it’s simply easier for us to do that in C# than it is to use a stream.  Consider the case of reading a PE file header; it’s a nightmare!  So many lines of code when you could simply read in a buffer and call it a PE file header.  Strong typing imposes that limitation, but thankfully even on edge cases like this, we can work around it.

I’m not going to discuss the performance characteristics of pinned objects during a garbage collection; for one, they’re hard to measure, but more importantly, it’s been well-documented to hurt the performance of the garbage collector.

Getting Pointers without the Pinning

There are other means by which to obtain, create, and manage pointers aside from the standard fixed statement.  As mentioned earlier, you can use the stackalloc statement to allocate a block of memory on the stack; it provides a pointer to the stack with the base of an array.  Alternatively, if you don’t care about portability, you can use native Windows functions to allocate memory for you.  These functions might include LocalAlloc, HeapAlloc, VirtualAlloc, or VirtualAllocEx, depending on what your needs are.

An interesting prospect might be to allocate multiple heaps using the HeapCreate APIs; this would allow you to manage your memory per-area of responsibility; Noel Llopis suggests such a strategy in his book C++ for Game Programmers.  Although all of this memory management might seem like overkill, if you’re really hunting for the next tweak to speed up your code, this might help you get over the line.

Performance Characteristics of Unsafe vs. Safe Code

Let’s not kid ourselves; unsafe code is inherently unsafe because the runtime doesn’t manage the code for us.  So before using code like this in your applications, be absolutely certain that you need it.

The CLR provides the means to access heap memory via the Marshal.AllocHGlobal method.  The documentation notes that it uses LocalAlloc, probably because LocalAlloc doesn’t require a pointer to a heap.  Despite the admonition that you’ll get better performance and more features out of the other functions, the use of LocalAlloc does not seem to be a hindrance in speed relative to using HeapCreate/HeapAlloc/HeapDestroy.  The execution times are shown here:

  Debug Mode – 5 Iterations Release Mode – 5 Iterations Debug Mode – 25 Iterations Release Mode – 25 Iterations
Normal .NET Array [] notation x86: 17ms; x64: 45ms x86: 15ms; x64: 65ms x86: 109ms; x64: 252ms x86: 95ms; x64: 333ms
Marshal.AllocHGlobal with pointers x86: 15ms; x64: 36ms x86: 14ms; 30ms x86: 95ms; x64: 193ms x86: 80ms; x64: 148ms
LocalAlloc P/Invoke with Pointers x86: 16ms; x64: 37ms x86: 14ms; x64: 31ms x86: 96ms; x64: 193ms x86: 78ms; x64: 161ms
HeapAlloc P/Invoke with Pointers x86: 16ms; x64: 42ms x86: 14ms; x64: 32ms x86: 102ms; x64: 197ms x86: 88ms; x64: 166ms

Surprisingly, the normal array bracket notation performed significantly worse in release builds than in debug builds on x64; I don’t really have an answer for why that would be.  I did not perform extensive statistical regression or even provide averages; I ran each set three times, and if they all looked mostly the same, I used the data.  These data are from x64 machines; the x86 results were from setting compilation target to x86 and running the program in WOW64.  I was surprised how much slower x64 was, though it might have been because we were using machine words on x86, and half-words on x64.  Perhaps memory access would be faster if we were using longs on x64.  (Prelim tests seem to confirm this; I will post a follow-up soon.)

Here are the P/Invoke declarations:

   1: public enum LocalAllocFlags
   2: {
   3:     Fixed = 0,
   4:     Moveable = 2,
   5:     ZeroInit = 0x40,
   6: }
   7:  
   8: public enum HeapCreateFlags
   9: {
  10:     None = 0,
  11:     EnableExecute = 0x40000,
  12:     GenerateExceptions = 4,
  13:     NoSerialize = 1,
  14: }
  15:  
  16: public enum HeapAllocFlags
  17: {
  18:     None = 0,
  19:     GenerateExceptions = 4,
  20:     NoSerialize = 1,
  21:     ZeroMemory = 8,
  22: }
  23:  
  24: static class UnsafeNativeMethods
  25: {
  26:     [DllImport("kernel32.dll")]
  27:     public static extern IntPtr LocalAlloc(LocalAllocFlags flags, UIntPtr uBytes);
  28:  
  29:     [DllImport("kernel32.dll")]
  30:     public static extern IntPtr LocalFree(IntPtr hMem);
  31:  
  32:     [DllImport("kernel32.dll")]
  33:     public static extern IntPtr HeapCreate(HeapCreateFlags flOptions, UIntPtr dwInitialSize, UIntPtr dwMaxSize);
  34:  
  35:     [DllImport("kernel32.dll")]
  36:     public static extern IntPtr HeapAlloc(IntPtr hHeap, HeapAllocFlags dwFlags, UIntPtr dwBytes);
  37:  
  38:     [DllImport("kernel32.dll")]
  39:     public static extern IntPtr HeapFree(IntPtr hHeap, HeapAllocFlags dwFlags, IntPtr lpMem);
  40:  
  41:     [DllImport("kernel32.dll")]
  42:     [return: MarshalAs(UnmanagedType.Bool)]
  43:     public static extern bool HeapDestroy(IntPtr hHeap);
  44: }

And finally, here’s the benchmarking code:

   1: class Program
   2: {
   3:     private const int ITERATIONS = 25;
   4:     static unsafe void Main(string[] args)
   5:     {
   6:         Console.WriteLine("Press <enter> to start.");
   7:         Console.ReadLine();
   8:  
   9:         Stopwatch arrayClock = Stopwatch.StartNew();
  10:         for (int iter = 0; iter < ITERATIONS; iter++)
  11:         {
  12:             RunArrayTest();
  13:         }
  14:         arrayClock.Stop();
  15:         Console.WriteLine("{0}ms elapsed for Array test, {1} iterations.  Press <enter> to continue.", arrayClock.ElapsedMilliseconds, ITERATIONS);
  16:         Console.ReadLine();
  17:  
  18:         Stopwatch marshalClock = Stopwatch.StartNew();
  19:         for (int iter = 0; iter < ITERATIONS; iter++)
  20:         {
  21:             RunMarshalAllocHGlobalTest();
  22:         }
  23:         marshalClock.Stop();
  24:         Console.WriteLine("{0}ms elapsed for Marshal test, {1} iterations.  Press <enter> to continue.", marshalClock.ElapsedMilliseconds, ITERATIONS);
  25:         Console.ReadLine();
  26:  
  27:         Stopwatch localClock = Stopwatch.StartNew();
  28:         for (int iter = 0; iter < ITERATIONS; iter++)
  29:         {
  30:             RunLocalAllocTest();
  31:         }
  32:         localClock.Stop();
  33:         Console.WriteLine("{0}ms elapsed for LocalAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", localClock.ElapsedMilliseconds, ITERATIONS);
  34:         Console.ReadLine();
  35:  
  36:         Stopwatch heapClock = Stopwatch.StartNew();
  37:         for (int iter = 0; iter < ITERATIONS; iter++)
  38:         {
  39:             RunHeapAllocTest();
  40:         }
  41:         heapClock.Stop();
  42:         Console.WriteLine("{0}ms elapsed for HeapAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", heapClock.ElapsedMilliseconds, ITERATIONS);
  43:         Console.ReadLine();
  44:     }
  45:  
  46:     private unsafe static void RunHeapAllocTest()
  47:     {
  48:         UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
  49:         IntPtr pHeap = UnsafeNativeMethods.HeapCreate(HeapCreateFlags.None, pSize, UIntPtr.Zero);
  50:         if (pHeap == IntPtr.Zero)
  51:         {
  52:             Console.WriteLine("Could not create heap.");
  53:             return;
  54:         }
  55:         IntPtr pMem = UnsafeNativeMethods.HeapAlloc(pHeap, HeapAllocFlags.ZeroMemory, pSize);
  56:         if (pMem == IntPtr.Zero)
  57:         {
  58:             Console.WriteLine("Could not allocate heap.");
  59:             return;
  60:         }
  61:  
  62:         int* pNumbers = (int*)pMem.ToPointer();
  63:         for (int i = 0; i < 1048576; i++)
  64:         {
  65:             pNumbers[i] = i;
  66:         }
  67:         UnsafeNativeMethods.HeapFree(pHeap, HeapAllocFlags.None, pMem);
  68:         UnsafeNativeMethods.HeapDestroy(pHeap);
  69:     }
  70:  
  71:     private unsafe static void RunLocalAllocTest()
  72:     {
  73:         UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
  74:         IntPtr pMem = UnsafeNativeMethods.LocalAlloc(LocalAllocFlags.ZeroInit, pSize);
  75:         if (pMem == IntPtr.Zero)
  76:         {
  77:             Console.WriteLine("Could not allocate heap memory.");
  78:             return;
  79:         }
  80:  
  81:         int* pNumbers = (int*)pMem.ToPointer();
  82:         for (int i = 0; i < 1048576; i++)
  83:         {
  84:             pNumbers[i] = i;
  85:         }
  86:         UnsafeNativeMethods.LocalFree(pMem);
  87:     }
  88:  
  89:     private unsafe static void RunMarshalAllocHGlobalTest()
  90:     {
  91:         IntPtr pMem = Marshal.AllocHGlobal(1048576 * sizeof (int));
  92:         if (pMem == IntPtr.Zero)
  93:         {
  94:             Console.WriteLine("Could not allocate memory.");
  95:             return;
  96:         }
  97:  
  98:         int* pNumbers = (int*) pMem.ToPointer();
  99:         for (int i = 0; i < 1048576; i++)
 100:         {
 101:             pNumbers[i] = i;
 102:         }
 103:         Marshal.FreeHGlobal(pMem);
 104:     }
 105:  
 106:     private static void RunArrayTest()
 107:     {
 108:         int[] array = new int[1048576]; //4mb array
 109:         for (int i = 0; i < 1048576; i++)
 110:         {
 111:             array[i] = i;
 112:         }
 113:     }
 114: }

There isn’t anything to complicated; a 4MB buffer is allocated using the selected method and then each 32-bit element is populated with its array index.  Unsafe code outperforms safe code in each x64 test, though the difference is much more marginal on x86.  The explanation is simple; safe code is checking the array index on every lookup. 

Summary

Using pointers and unsafe code can be a boost to your application’s performance, but you should consider where, when, and how you do it.  Since you don’t have control over when the GC is invoked, pinning objects like arrays can be costly.  You might instead consider using Windows API functions or direct memory access functions through the Marshal class to organize your memory if you absolutely need to chug that last piece of speed out of your code, but be warned – it’s not safe out there.

The "Speedy C#" Series:

Comments (0) Trackbacks (0)

No comments yet.


Leave a comment

ERROR: si-captcha.php plugin says GD image support not detected in PHP!

Contact your web host and ask them why GD image support is not enabled for PHP.

ERROR: si-captcha.php plugin says imagepng function not detected in PHP!

Contact your web host and ask them why imagepng function is not enabled for PHP.

No trackbacks yet.