Running with Code Like with scissors, only more dangerous


The technical interview, part 2: Deck of Cards

Posted by Rob Paveza

Last time, I talked about how to dazzle an interviewer who asks you to write some code to test whether a given input is a palindrome. This time, it's my favorite interview question to ask.

Interviewer: Write some code that creates a deck of cards. If you ask for more details, I'll tell you something to the effect of, Imagine I want to write a game like poker or solitaire. I need to create a standard deck of cards.

This question is deliberately open-ended to test a few things:

  • Can you identify what kinds of objects are needed?
  • Can you design good APIs?
  • Can you choose effective data structures?
  • Can you design and implement good algorithms?

As a candidate, I would want to start out by establishing what cards are: they are a collection of a suit and a rank. These are each discrete values, and could be implemented as an enumeration in languages which support them, or else constants:

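// JavaScript
var Suit = {
    clubs: 1,
    diamonds: 2,
    hearts: 3,
    spades: 4
};

var Rank = {
    ace: 1,
    two: 2,
    three: 3,
    four: 4,
    five: 5,
    six: 6,
    seven: 7,
    eight: 8,
    nine: 9,
    ten: 10,
    jack: 11,
    queen: 12,
    king: 13
};

// C#
public enum Suit
{
    Clubs = 1, Diamonds, Hearts, Spades
}

public enum Rank
{
    Ace = 1, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Jack, Queen, King
}

// TypeScript
enum Suit {
    clubs = 1, diamonds, hearts, spades
}

enum Rank {
    ace = 1, two, three, four, five, six, seven, eight, nine, ten, jack, queen, king
}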
Try this in the TypeScript playground.

An enumeration is a kind of type that provides a set of named constants. Not all languages support enumerations, but they're fairly easy to simulate in most, like I did in JavaScript above.
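
As a minimal sketch of that simulation, the constants can be frozen so that nothing reassigns them later. Object.freeze and the suitName helper are additions for illustration here, not part of the sample above.

```javascript
// Simulated enumeration: a frozen object of named integer constants.
var Suit = Object.freeze({
    clubs: 1,
    diamonds: 2,
    hearts: 3,
    spades: 4
});

// Hypothetical helper: map a value back to its name, e.g. for display.
function suitName(value) {
    var names = Object.keys(Suit);
    for (var i = 0; i < names.length; i++) {
        if (Suit[names[i]] === value)
            return names[i];
    }
    throw new RangeError('Invalid suit.');
}

console.log(Object.isFrozen(Suit)); // true
console.log(suitName(Suit.hearts)); // hearts
```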

Now that we have the suit and rank set up, we can create a class or structure to represent a Card. Why have I chosen to do this? The problem asks for a deck of cards; clearly a Deck is a discrete object which is composed of some number of other discrete objects (Cards).

// JavaScript
function Card(suit, rank) {
    if (suit < Suit.clubs || suit > Suit.spades)
        throw new RangeError('Invalid suit.');
    if (rank < Rank.ace || rank > Rank.king)
        throw new RangeError('Invalid rank.');

    this.suit = suit;
    this.rank = rank;
}

// C#
public class Card
{
    // The readonly modifier should only be used for immutable value-types.
    // In this case since they're enumerations which are value-types, it's good.
    private readonly Suit _suit;
    private readonly Rank _rank;

    public Card(Suit suit, Rank rank)
    {
        // Checking range is also valid (like the other language examples).
        // This is a pretty good API to know about though.
        if (!Enum.IsDefined(typeof(Suit), suit))
            throw new ArgumentOutOfRangeException("suit");
        if (!Enum.IsDefined(typeof(Rank), rank))
            throw new ArgumentOutOfRangeException("rank");

        _suit = suit;
        _rank = rank;
    }

    public Suit Suit
    {
        get { return _suit; }
    }

    public Rank Rank
    {
        get { return _rank; }
    }
}

// TypeScript
class Card {
    // TypeScript supports a shorthand for declaring properties.
    // The code that this emits will be (almost) identical to the JavaScript sample,
    // except that the range check will be done after the property assignment.
    constructor(public suit: Suit, public rank: Rank) {
        if (suit < Suit.clubs || suit > Suit.spades)
            throw new RangeError('Invalid suit.');
        if (rank < Rank.ace || rank > Rank.king)
            throw new RangeError('Invalid rank.');
    }
}

Try this in the TypeScript playground.

In object-oriented programming, a class is the definition of a custom data type. Classes create an abstract "view" of data in the same way we can think of categories of objects (for example, "books" or "animals") which have some common sets of characteristics. Classes aren't the data themselves, however; we refer to individual datums as "objects" or "object instances." Each object of a given class should have the same kinds of data and similar behaviors, but each individual object might have specific nuances or individual data. For example, any given book might have a different number of pages, but all books have *some* number of pages.

Cards are simple data types. They have two properties - rank and suit - and once the card is created, you shouldn't be able to change those values. (The samples for JavaScript and TypeScript don't enforce that restriction, and if you point that out, you'll probably get bonus points). Given these constraints, it may also be worthwhile for the C# developer to point out that a Card could be a struct because structs should represent immutable values, but it's also not invalid to use a class here.
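
One way to earn those bonus points in JavaScript is to freeze the object at the end of the constructor. This is only a sketch: the name ImmutableCard is hypothetical, and the Rank table is abbreviated to the two bounds the range check needs.

```javascript
var Suit = Object.freeze({ clubs: 1, diamonds: 2, hearts: 3, spades: 4 });
var Rank = Object.freeze({ ace: 1, king: 13 }); // abbreviated for the sketch

function ImmutableCard(suit, rank) {
    if (suit < Suit.clubs || suit > Suit.spades)
        throw new RangeError('Invalid suit.');
    if (rank < Rank.ace || rank > Rank.king)
        throw new RangeError('Invalid rank.');

    this.suit = suit;
    this.rank = rank;

    // After this call, assignments to suit or rank are ignored
    // (or throw a TypeError in strict mode).
    Object.freeze(this);
}

var card = new ImmutableCard(Suit.hearts, Rank.king);
console.log(Object.isFrozen(card)); // true
```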

Classes have a desirable characteristic: they represent composition (or aggregation) of multiple values. That leads us to believe that a Deck, which is composed of multiple Cards, should also be a class:

// JavaScript
function Deck() {
    this._cards = [];
}

Deck.prototype.addCard = function(card) {
    // Don't need to do checking specifically but if you want to be really fancy:
    if (!(card instanceof Card))
        throw new RangeError("Can only add a Card to the deck.");

    this._cards.push(card);
};

Deck.prototype.dealCard = function() {
    if (this._cards.length === 0)
        throw new RangeError("No cards to deal.");

    return this._cards.pop();
};

Deck.prototype.shuffle = function(numTimes) {
    // numTimes is an optional parameter.  If it's not a number, set a reasonable default.
    if (typeof numTimes !== 'number')
        numTimes = 5;

    var cards = this._cards;
    var cardCount = cards.length;

    // Do the shuffle operation numTimes times.
    for (var time = 0; time < numTimes; time++) {
        // Visit every card position once per "time"
        for (var index = 0; index < cardCount; index++) {
            // Create a random number in the range of [0, length)
            var numToSwap = Math.floor(Math.random() * cardCount);
            // Swap the cards at index and numToSwap
            var temp = cards[numToSwap];
            cards[numToSwap] = cards[index];
            cards[index] = temp;
        }
    }
};

// Not a prototype function.
Deck.createPokerDeck = function() {
    var result = new Deck();

    // Note these operators should be <= because we want to ensure we include all suits and ranks.
    for (var suit = Suit.clubs; suit <= Suit.spades; suit++) {
        for (var rank = Rank.ace; rank <= Rank.king; rank++) {
            var card = new Card(suit, rank);
            result.addCard(card);
        }
    }

    return result;
};
// C#
using System;
using System.Collections.Generic;

public class Deck
{
    private List<Card> _cards;
    private int _curPos;

    // The keen observer will notice that _curPos always equals _cards.Count - 1.
    // Can optimize this away but it's good for illustrative purposes.
    public Deck()
    {
        _cards = new List<Card>();
        _curPos = -1;
    }

    public void AddCard(Card card)
    {
        if (card == null)
            throw new ArgumentNullException("card");

        _cards.Add(card);
        _curPos++;
    }

    public Card DealCard()
    {
        if (this._curPos < 0 || this._cards.Count == 0)
            throw new InvalidOperationException("There are no cards to deal.");

        var card = this._cards[this._curPos];
        this._curPos--; // Optionally, decrement operation can be combined into the previous line as long as it's postfix.
        // Drop the dealt card so the invariant above holds and a later Shuffle can't return it to play.
        this._cards.RemoveAt(this._cards.Count - 1);

        return card;
    }

    public void Shuffle(int numTimes = 5)
    {
        List<Card> cards = this._cards;
        int cardCount = cards.Count;
        Random rng = new Random();

        // Do the shuffle operation numTimes times.
        for (int time = 0; time < numTimes; time++)
        {
            // Visit every card position once per "time"
            for (var index = 0; index < cardCount; index++)
            {
                // Create a random number in the range of [0, length)
                int indexToSwap = rng.Next(cardCount);
                // Swap the cards at index and indexToSwap
                Card temp = cards[indexToSwap];
                cards[indexToSwap] = cards[index];
                cards[index] = temp;
            }
        }
    }

    public static Deck CreatePokerDeck()
    {
        Deck result = new Deck();

        // Note these operators should be <= because we want to ensure we include all suits and ranks.
        for (int suit = (int)Suit.Clubs; suit <= (int)Suit.Spades; suit++) {
            for (int rank = (int)Rank.Ace; rank <= (int)Rank.King; rank++) {
                var card = new Card((Suit)suit, (Rank)rank);
                result.AddCard(card);
            }
        }

        return result;
    }
}
// TypeScript
class Deck {
    private _cards: Array<Card>;

    constructor() {
        this._cards = [];
    }

    addCard(card: Card): void {
        // You can bypass runtime type checking here if you're using all TypeScript,
        // because the TypeScript compiler will report an error.  Otherwise, see
        // the JavaScript sample for runtime type checking.
        this._cards.push(card);
    }

    dealCard(): Card {
        if (this._cards.length === 0)
            throw new RangeError('No cards to deal.');

        // The length check above guarantees pop() returns a Card.
        return this._cards.pop()!;
    }

    shuffle(numTimes: number = 5): void {
        var cards = this._cards;
        var cardCount = cards.length;

        // Do the shuffle operation numTimes times.
        for (var time = 0; time < numTimes; time++) {
            // Visit every card position once per "time"
            for (var index = 0; index < cardCount; index++) {
                // Create a random number in the range of [0, length)
                var numToSwap = Math.floor(Math.random() * cardCount);
                // Swap the cards at index and numToSwap
                var temp = cards[numToSwap];
                cards[numToSwap] = cards[index];
                cards[index] = temp;
            }
        }
    }

    static createPokerDeck(): Deck {
        var result = new Deck();

        // Note these operators should be <= because we want to ensure we include all suits and ranks.
        for (var suit = Suit.clubs; suit <= Suit.spades; suit++) {
            for (var rank = Rank.ace; rank <= Rank.king; rank++) {
                var card = new Card(suit, rank);
                result.addCard(card);
            }
        }

        return result;
    }
}

Try this in the TypeScript playground.

Let's unpack this piece-by-piece:

Use of an Array to store the Cards in JavaScript/TypeScript, List in C#

An Array is the default collection type in JavaScript and there isn't really a compelling reason to switch. Since an Array automatically grows and shrinks as you add and remove items, and these are such common operations that the engines optimize for them, there isn't a good reason to switch to another data type. In C#, the choice of a List is an interesting one. Alternatives include a linked list, Queue, or Stack, particularly because the JavaScript variants use Stack semantics (push/pop). However, the most computationally-intensive function of a Deck, Shuffle, requires random-access, and none of those alternatives is the right data type for random access.


AddCard

This method is pretty straightforward; it adds a new card to the end of the existing collection. You might call out that you're not checking for equality; this allows a Deck to contain multiple copies of the same value Card (for example, two Kings of Diamonds). One thing that you might want to check for is reference equality (so that you're not adding the same exact Card object twice, which would be impossible with real cards). However, that shouldn't be required unless you're really, really trying to dazzle your interviewer.
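
If you did want that reference check, a minimal sketch might look like the following. The Card here is a bare stand-in without range checks, and the error message is made up for the example.

```javascript
function Card(suit, rank) { this.suit = suit; this.rank = rank; } // minimal stand-in

function Deck() {
    this._cards = [];
}

Deck.prototype.addCard = function (card) {
    // indexOf compares objects by reference (===), not by suit and rank.
    if (this._cards.indexOf(card) !== -1)
        throw new RangeError('That exact Card object is already in the deck.');

    this._cards.push(card);
};

var deck = new Deck();
var king = new Card(2, 13);
deck.addCard(king);
deck.addCard(new Card(2, 13)); // allowed: equal values, but a different object

var rejected = false;
try { deck.addCard(king); } catch (e) { rejected = true; }
console.log(rejected); // true
```

The linear scan makes each add O(n), which is fine at 52 cards but is one more reason to treat this as optional polish.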


DealCard

This method is also pretty straightforward. It should take one card from one end of the array. In JavaScript, it's equally valid to use pop as shift, but I would balk at an implementation using splice as too complex. Range checking is a requirement for this method.

Basic operations and constructor

As an interviewer, I would expect a candidate to have, at minimum, identified AddCard and DealCard as fundamental operations. I would strongly hope that the candidate would also have identified a need to shuffle the cards, and shuffling should also identify the need to have a basic set of cards to populate as well (or, the candidate may want a basic set of cards, and realize that they won't be shuffled). However, I would advise against populating the deck within the constructor. The Deck is the container of Cards; you use a Deck for Euchre, but it doesn't have the same set of Cards within it as a Deck for Texas hold 'em.
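
To make the Euchre point concrete, here is a sketch of a second factory sitting next to a poker one. It assumes the common 24-card Euchre variant (nine through ace in each suit); createEuchreDeck is a hypothetical name, and Card and Deck are trimmed-down stand-ins.

```javascript
var Suit = { clubs: 1, diamonds: 2, hearts: 3, spades: 4 };
var Rank = { ace: 1, nine: 9, ten: 10, jack: 11, queen: 12, king: 13 }; // only the ranks Euchre uses

function Card(suit, rank) { this.suit = suit; this.rank = rank; }

function Deck() { this._cards = []; }
Deck.prototype.addCard = function (card) { this._cards.push(card); };

// Hypothetical factory: the constructor stays empty, the factory decides the contents.
Deck.createEuchreDeck = function () {
    var result = new Deck();
    for (var suit = Suit.clubs; suit <= Suit.spades; suit++) {
        result.addCard(new Card(suit, Rank.ace));
        for (var rank = Rank.nine; rank <= Rank.king; rank++) {
            result.addCard(new Card(suit, rank));
        }
    }
    return result;
};

var euchre = Deck.createEuchreDeck();
console.log(euchre._cards.length); // 24
```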


CreatePokerDeck

The most obvious way to implement this is to go through each of the ranks and suits, creating a Card for each, and adding it to an originally-empty Deck. There are a number of ways to do this; you might see the following implementation:

// JavaScript
Deck.createPokerDeck = function () {
    var result = new Deck();

    var suits = [Suit.clubs, Suit.diamonds, Suit.hearts, Suit.spades];
    var ranks = [Rank.ace, Rank.two, Rank.three, Rank.four, Rank.five, Rank.six, Rank.seven, Rank.eight, Rank.nine, Rank.ten, Rank.jack, Rank.queen, Rank.king];
    suits.forEach(function (suit) {
        ranks.forEach(function (rank) {
            var card = new Card(suit, rank);
            result.addCard(card);
        });
    });

    return result;
};

// C#
    public static Deck CreatePokerDeck()
    {
        Deck result = new Deck();

        // Requires using System.Linq. Enumerable.Range takes (start, count),
        // so this is only correct when both enums start at 1.
        foreach (int suit in Enumerable.Range((int)Suit.Clubs, (int)Suit.Spades))
        {
            foreach (int rank in Enumerable.Range((int)Rank.Ace, (int)Rank.King))
            {
                var card = new Card((Suit)suit, (Rank)rank);
                result.AddCard(card);
            }
        }

        return result;
    }

// TypeScript
    static createPokerDeck(): Deck {
        var result = new Deck();

        var suits: Array<Suit> = [Suit.clubs, Suit.diamonds, Suit.hearts, Suit.spades];
        var ranks: Array<Rank> = [Rank.ace, Rank.two, Rank.three, Rank.four, Rank.five, Rank.six, Rank.seven, Rank.eight, Rank.nine, Rank.ten, Rank.jack, Rank.queen, Rank.king];
        suits.forEach(suit => {
            ranks.forEach(rank => {
                var card = new Card(suit, rank);
                result.addCard(card);
            });
        });

        return result;
    }

Try this in the TypeScript playground.

I would raise a flag about the complexity of this kind of implementation with the candidate. You might ask about performance and if there's a better way, particularly for the JavaScript versions, but also somewhat for the C# version. Each of these introduces new method calls which are really unnecessary, because you can get the same values just by doing addition.


Shuffle

Shuffle is my favorite problem of all. There are any number of ways to do it. It tests if candidates know about and understand how to use the random number generator in their language, and also lets you glimpse into reasoning. Almost always, candidates try to swap two random cards somewhere in the deck. The most obvious question to ask, though, is how do you know how many total times to actually perform swaps? You might shuffle 104 times for a 52-card deck, but even with that, there's a reasonable chance you might have never touched a particular card. Not that such an action is unreasonable; a card in a real deck might not change position either. But iterating through every card in a deck and then generating a random number is a sure way to ensure that every card is visited once.
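
To put a rough number on that chance, here is a small calculation. It assumes a model that is not spelled out above: each swap draws two positions independently and uniformly at random. The function name chanceCardUntouched is made up for the sketch.

```javascript
// Chance that one particular card is never picked across all swaps,
// when every swap makes two independent, uniform picks.
function chanceCardUntouched(deckSize, swaps) {
    var missOnePick = (deckSize - 1) / deckSize;
    return Math.pow(missOnePick, 2 * swaps);
}

var p = chanceCardUntouched(52, 104);
console.log((p * 100).toFixed(1) + '% chance for any one card');
console.log((p * 52).toFixed(2) + ' untouched cards expected per shuffle');
```

Under that model the per-card chance comes out a little under 2 percent, so across 52 cards you should expect close to one card per shuffle that never moved.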

One interesting pivot on this problem is using the JavaScript forEach function on your array, and then swapping. In general, I prefer not to see this; forEach implies an enumeration, and modifying an object being enumerated over is kind of a no-no. JavaScript engines apply a consistent behavior, but C# will barf on it (it's invalid to modify a collection while an enumerator is active on it in C#).

For JavaScript also, the developer needs to have clearly called Math.floor(Math.random() * count). Math.random() generates a double-precision floating point number in the range of [0, 1). Multiplying this value by the length of our array gives us a floating point number in the range of [0, length), but it'll still possibly have a fractional component; calling floor gives an integer in the range of [0, length-1].
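
A candidate who knows it might instead offer the Fisher-Yates shuffle. To be clear, this is a swapped-in technique and not the multi-pass shuffle in the samples above: it makes a single pass and only swaps each position with one at or before it, which gives every ordering the same probability as long as the random source is uniform. The function name is an assumption for the sketch.

```javascript
function fisherYatesShuffle(cards) {
    for (var index = cards.length - 1; index > 0; index--) {
        // Random integer in [0, index]: only positions not yet finalized.
        var indexToSwap = Math.floor(Math.random() * (index + 1));
        var temp = cards[indexToSwap];
        cards[indexToSwap] = cards[index];
        cards[index] = temp;
    }
    return cards;
}

var shuffled = fisherYatesShuffle([1, 2, 3, 4, 5]);
console.log(shuffled.length); // 5
```

Note the same Math.floor(Math.random() * n) idiom; only the upper bound changes on each pass.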


This problem exercises basic problem-solving skills, some algorithms, some bits and bytes, and really focuses on object-oriented design skills. It also happens to be the exact problem I was given during my very first technical screening for a marketing agency in Phoenix.

Next time: Reasoning about linked lists.


The technical interview, part 1: Palindrome

Posted by Rob Paveza

Last time, I talked about what I expect of an interview candidate as a technical job interviewer. This time, we're going to go through an initial question.

Interviewer: Write a function/method/subroutine to determine whether a particular string is a palindrome, i.e., is the same text when reversed as it is forward.

This is a common question for an interview because it allows you to demonstrate basic problem-solving skills and you can also show your interviewer that you know how to dive into the requirements. It requires that you can implement a simple algorithm, have an understanding of how strings work in your particular language, and can work through optimizations of your algorithm. Because the problem itself is so simple, you get a chance to dazzle your interviewer with good programming practices.

Understanding the problem

A palindrome is a string like kayak which you can reverse and end up with the same string. This gives you the naïve implementation:

    function IsPalindrome(test)
        let reversed = ReverseString(test)
        return reversed equals test

Most languages will have a baked-in function to reverse a string. So you could do that, but could you do it better?

Interviewer: Tell me about the performance characteristics of that baseline implementation.

You've allocated a new string, and depending on the particular implementation, that could have been costly. At minimum, you've got O(n)* memory usage. String equality testing is going to be character-by-character, so that's going to be O(n) execution. So, if you're testing a string that has 2 million characters, the naïve algorithm allocates an additional 2-million-character string (2 MB at one byte per character, twice that with 16-bit characters) and does (in theory) up to 2 million comparisons.
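
For reference, the naïve version might look like this in JavaScript, which has no built-in string reverse and so goes through an array. The name isPalindromeNaive is an assumption for the sketch.

```javascript
function isPalindromeNaive(test) {
    // split/reverse/join allocates an array and a second string of the
    // same length as the input: O(n) extra memory.
    var reversed = test.split('').reverse().join('');
    // The comparison then walks both strings: O(n) time.
    return reversed === test;
}

console.log(isPalindromeNaive('kayak'));  // true
console.log(isPalindromeNaive('kayaks')); // false
```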

How can we optimize? Well, let's start with understanding the basic problem again. Let's consider kayak:

k a y a k
0 1 2 3 4

Given this string layout, what would we need to do to go faster?

  1. Compare 0 to 4. Match, continue.
  2. Compare 1 to 3. Match, continue.
  3. Only one uncompared character is remaining, therefore return true.

This can be generalized to an algorithm:

  1. Initialize start and end indices to 0 and length - 1 (or 1 and length for 1-based index programming languages).
  2. Compare characters at the start and end indices. If they don't match, return false (fast exit).
  3. Increment start and decrement end.
  4. If the end index is less than or equal to start, return true. Otherwise, loop back to step 2.

We now know enough to author a simple method to do this:

// JavaScript
function isPalindrome(stringToTest) {
    var start = 0, end = stringToTest.length - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (stringToTest[start] !== stringToTest[end])
            return false;
    }
    // got to the end with no mismatches, implies success
    return true;
}

// C#
class Utilities
{
    public static bool IsPalindrome(string toTest)
    {
        // end must be the last valid index; toTest.Length itself is out of range.
        int start = 0, end = toTest.Length - 1;
        for (/* already initialized */ ; start < end; start++, end--)
        {
            if (toTest[start] != toTest[end])
                return false;
        }
        // got to the end with no mismatches, implies success
        return true;
    }
}

// C++
#include <cwchar>

bool isPalindrome(const wchar_t* const toTest) {
    const size_t len = wcslen(toTest);
    if (len == 0)
        return true; // avoid forming a pointer before the start of the buffer
    const wchar_t* start = toTest;
    const wchar_t* end = toTest + len - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (*start != *end)
            return false;
    }
    return true;
}

This is a fairly optimal solution.

Interviewer: What about error cases? What kind of error cases exist, and how can you work around them?

In JavaScript, because we don't do compile-time type validation, it's possible for types to mismatch. A caller can pass in a number, an object, undefined, no value, null, a Boolean, etc. Each of these has different odd behaviors; for an Array, for example, the behavior will effectively match how things work with strings. For most other values, though, there isn't any string coercion: the math is performed directly against the value's length property, which is usually undefined, so end becomes NaN, the loop condition is false on the first check, and the function returns true. (null and undefined fail differently: reading their length property throws a TypeError.) isPalindrome(true) definitely should return false, but it doesn't!

There are a couple of ways to guard against this. You can test the input parameter for its type and fail with a TypeError. You could combine this with attaching the method to the String prototype, which would enable the function to be called against any String object as an instance method.

In C#, if you pass in a null reference, the method will fail with a NullReferenceException on the first dereference of toTest.Length. This is acceptable but likely not ideal; instead, it would be preferable to fail with an ArgumentNullException. You can also show nice API design intuition by making it into an extension method.

If you're authoring in C++, a great question to ask would be the kind of error environment you need to deal with. In COM, for example, you don't want to use structured exception handling, or at least you don't want to allow SEH exceptions to cross the ABI boundary. Depending on your environment, you may be able to use SEH, but be prepared to author a COM version of the API. Of course, COM also advises a different type system. COM strings tend to be BSTRs, where a null value is equivalent to a zero-length string. (WinRT uses HSTRINGs, which behave similarly to BSTRs).

// JavaScript
String.prototype.isPalindrome = function() { // argument removed, it is "this"
    if (typeof this !== 'string' && !(this instanceof String)) // In case of call, apply, or bind
        throw new TypeError('Can only call against a string.');

    var start = 0, end = this.length - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (this[start] !== this[end])
            return false;
    }
    // got to the end with no mismatches, implies success
    return true;
};
// C#
static class StringExtensions
{
    public static bool IsPalindrome(this string toTest)
    {
        if (toTest == null)
            throw new ArgumentNullException("toTest");

        // end must be the last valid index; toTest.Length itself is out of range.
        int start = 0, end = toTest.Length - 1;
        for (/* already initialized */ ; start < end; start++, end--)
        {
            if (toTest[start] != toTest[end])
                return false;
        }
        // got to the end with no mismatches, implies success
        return true;
    }
}
// C++
// This sample is using WRL and Windows Runtime (because it's the C++ COM library that I know)
// IDL file:
namespace MyApi
{
    interface IPalindromeTest
    {
        HRESULT IsPalindrome([in] HSTRING toTest, [out, retval] boolean* result);
    }
}

// header:
using namespace ABI::MyApi;
class PalindromeTest : public RuntimeClass<IPalindromeTest>
{
    InspectableClass(InterfaceName_MyApi_IPalindromeTest, BaseTrust)

public:
    virtual ~PalindromeTest();
    HRESULT RuntimeClassInitialize();

    IFACEMETHOD(IsPalindrome)(_In_ HSTRING toTest, _Out_ boolean* result);
};

// implementation:
// Other stuff - RuntimeClassInitialize, etc. - goes here
IFACEMETHODIMP PalindromeTest::IsPalindrome(_In_ HSTRING toTest, _Out_ boolean* result)
{
    IfNullReturnError(result, E_POINTER);
    UINT32 len = 0; // WindowsGetStringRawBuffer reports the length as a UINT32
    const wchar_t* strVal = WindowsGetStringRawBuffer(toTest, &len);

    // A null or empty HSTRING is a zero-length string, which reads the same both ways.
    *result = true;
    if (len == 0)
        return S_OK;

    const wchar_t* start = strVal;
    const wchar_t* end = strVal + len - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (*start != *end) {
            *result = false;
            return S_OK;
        }
    }
    return S_OK;
}

Interviewer: This is a pretty good solution. What are some of the pitfalls or problems with it? What are some possible variations that you might do?

This is an open-ended question intended to see what you might consider for your implementation and also to gauge what you know and maybe don't know. Possible answers include:

This implementation fails to work with Unicode characters above U+FFFF. JavaScript, ES6 aside, doesn't support characters outside of the UCS-2 range. Strings can go back and forth to a host which does, but intrinsically, JavaScript doesn't understand the idea of surrogate pairs. C++'s wchar_t (on Windows) and C#'s string don't directly support surrogate pairs because the underlying character type is 16-bit. In order to fix the code, you'd need to get the "real" length of the strings (factoring in the code points) and then go code-point-by-code-point instead of index-by-index.
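
In ES6-era JavaScript, one way to sketch the code-point version is to expand the string with Array.from, which iterates by code point and not by 16-bit unit. The function name is an assumption, the copy costs O(n) memory again, and combining marks (an accent stored as a separate code point) are still not handled.

```javascript
function isPalindromeByCodePoint(text) {
    // Array.from keeps each surrogate pair together as one array element.
    var codePoints = Array.from(text);
    for (var start = 0, end = codePoints.length - 1; start < end; start++, end--) {
        if (codePoints[start] !== codePoints[end])
            return false;
    }
    return true;
}

var text = 'a\uD83D\uDE00a'; // 'a', U+1F600 (one surrogate pair), 'a'
console.log(text.length);                   // 4 (UTF-16 code units)
console.log(isPalindromeByCodePoint(text)); // true
```

The index-by-index version would compare the two halves of the surrogate pair with each other and report false for this input.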

I could implement this function recursively instead of iteratively. Generally, this implementation is better as an iterative function than a recursive function. But recognizing that it's possible, because the algorithm is really about solving progressively simpler sub-problems, is a good thing to do. In languages which support tail-call optimization, this can be as well-performing as the iterative implementation.

(function() {
    function isPalindrome(toCheck, startIndex, endIndex) {
        // Base case: the indices have met or crossed, so every pair matched.
        if (startIndex >= endIndex)
            return true;

        if (toCheck[startIndex] !== toCheck[endIndex])
            return false;

        return isPalindrome(toCheck, startIndex + 1, endIndex - 1);
    }

    String.prototype.isPalindrome = function() {
        if (typeof this !== 'string' && !(this instanceof String))
            throw new TypeError('Can only call against a string.');

        return isPalindrome(this, 0, this.length - 1);
    };
})();

Interviewer: Great. How would you validate this implementation?

This gives you a few options for approaches. This problem is simple enough that you could intuit the sets of cases:

  • A null value
  • A zero-length string
  • An odd-length string
  • An even-length string

You might also walk through your code to identify all of the different code paths. Doing so will help you get to the same set of cases.
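
Those cases translate fairly directly into a small table-driven check. This is a sketch without a test framework; the standalone isPalindrome repeats the iterative version with a type guard so the block runs on its own, and the sample inputs are just illustrations.

```javascript
function isPalindrome(stringToTest) {
    if (typeof stringToTest !== 'string')
        throw new TypeError('Expected a string.');
    for (var start = 0, end = stringToTest.length - 1; start < end; start++, end--) {
        if (stringToTest[start] !== stringToTest[end])
            return false;
    }
    return true;
}

var cases = [
    { input: '',      expected: true  }, // zero-length
    { input: 'kayak', expected: true  }, // odd length, match
    { input: 'kayaj', expected: false }, // odd length, mismatch
    { input: 'abba',  expected: true  }, // even length, match
    { input: 'abca',  expected: false }  // even length, mismatch
];

var failures = cases.filter(function (c) {
    return isPalindrome(c.input) !== c.expected;
});
console.log(failures.length === 0 ? 'all cases pass' : failures);

// The null case: expect a TypeError, not a true/false answer.
var nullThrew = false;
try { isPalindrome(null); } catch (e) { nullThrew = e instanceof TypeError; }
console.log(nullThrew); // true
```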


If you've talked through all of this, then chances are I already think you're a pretty strong candidate. This isn't a complete list of things I might ask you to talk about, but I'm hoping it's taken 15-20 minutes, and we can talk about more next.

Up next: a deck of cards.

* Big-O notation, such as O(n), is a mechanism by which we describe the worst-case performance characteristics of a system. For memory, it's referring to the number of bytes allocated; for CPU utilization, it's the number of times a particular loop operation must be executed. It's generally described as the largest factor in the equation; O(n) is linear time, O(n²) is quadratic (a kind of polynomial) time, O(log n) is logarithmic time, etc. This is a simplification, but I'd suggest having a strong grasp of this concept.


In a disconnected world, robust code is crucial

Posted by Rob Paveza

Probably 99.99% of HTML applications and websites are served over HTTP exclusively. (I'm referring here to HTTP as a transport protocol, not HTTP vs. HTTPS, for example, and I realize that HTTP is an application-layer protocol according to OSI; but developers generally treat it as an abstraction for "the network"). As anybody who has done web programming knows, HTTP is a stateless protocol; that is, it's based on a request-response model, and in general, one request has no knowledge of previous requests. This has posed some challenges for web developers over the years, and some brilliant abstractions of state on top of the statelessness have been devised.

The hard part now, though, isn't to deal with statelessness. It's dealing with the request-and-response model.

All network communication is inherently request-and-response. There are some applications that utilize full-duplex communications to get around that (think of chat software), but for the most part, that isn't really available behind the firewall. Web sockets are still yet to be standardized (and there are some questions about long-term compatibility with WebSocket-ignorant proxies). And typically, corporate firewalls say no to outbound connections except on ports 80 or 443. Some applications (think Meebo) have been able to get around this limitation by cleverly using long-timeout delays on AJAX requests. The client makes a request to the server, and the server either responds immediately (if an event is in queue) or holds the request for 30-90 seconds to see if an event comes in. I even did this once myself with good success, although I never took that app into production. (There was also some question about the total # of clients an ASP.NET server could sustain whilst holding threads in that way).

In many respects, Windows developers haven't had to deal with this. We could issue synchronous requests, and the UI would stand still for a second, and either it would work or it would fail. But usability concerns over this process, as well as issues with high network latency (imagine pressing the "Submit" button and having to wait 20 seconds while your app freezes - by then, I've force-closed the app) have seen platform providers decree that asynchrony is the only way to go.

HTML isn't the only application provider dealing with this limitation. Adobe Flash has long had an asynchronous-communication-only model, and Microsoft Silverlight has carried on this principle; of course, these two have lived predominantly in browsers, where a hanging UI probably means interfering with other apps as well as the one making the request. Interestingly, WinRT - the Windows 8 developer framework - is also going to mandate an asynchronous model, following in the Silverlight-based footsteps blazed by Windows Phone 7.

So as we trek out into the world of asynchrony, well, we have a whole mess of questions to deal with now:

  • If there's an error, does it show up in the calling method or in the callback method? Does it even show up?
  • Does a network (transport-level) error surface differently than an application error? What if the server returned an HTTP 403 Forbidden response?
  • What are all of the different kinds of errors that can crop up? Do I need to handle SocketException or is that going to be abstracted to something more meaningful to my application?
  • What do I do if a network error comes up? Do I assume that I'm offline, panic, and quit? What if my application only makes sense "online"?
  • Do I surface an error to the customer? Silently fail? I might generally fail silently if I'm a background process, but then again, what if it's an important one? What if the customer thought he was saving his draft while all along it was offline, and then the customer closes the browser?
  • During the async operation, should I show the user a timeout spinner or something to that effect?
  • How should I design my async operations? For example, consider a Save operation. Should I capture all of my state at once and send it off, and let the user immediately keep working? Should I make the user wait until saving completes? Should I even use Save, or automatically save whenever something changes?
  • If I use auto-save, how do I handle undo? What if I want to undo between sessions? Is there a way to go back if the hosting application crashes? (Worst case scenario: the user accidentally hit Select All, Delete and then the browser crashed after the auto-save).
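
As one illustration of the transport-versus-application question, here is a sketch using the browser fetch API, which is an assumption on my part and not something this post relies on. A fetch promise rejects only when no response arrives at all; a 403 still resolves, with response.ok set to false. The saveDraft name, the '/drafts' URL, and the result shapes are all hypothetical, and fetchImpl is injectable so the sketch can be exercised without a network.

```javascript
function saveDraft(url, body, fetchImpl) {
    fetchImpl = fetchImpl || fetch;
    return fetchImpl(url, { method: 'POST', body: body }).then(
        function (response) {
            // The server answered; a 403 or 500 lands here, not in the rejection handler.
            if (!response.ok)
                return { kind: 'application-error', status: response.status };
            return { kind: 'saved' };
        },
        function (error) {
            // No usable response at all (offline, DNS failure, connection reset).
            return { kind: 'network-error', message: String(error) };
        });
}

// Stand-ins for the two failure modes.
var forbidden = function () { return Promise.resolve({ ok: false, status: 403 }); };
var offline = function () { return Promise.reject(new TypeError('Failed to fetch')); };

saveDraft('/drafts', 'text', forbidden).then(function (r) { console.log(r.kind); });
saveDraft('/drafts', 'text', offline).then(function (r) { console.log(r.kind); });
```

Folding both outcomes into one result object is a design choice for the sketch; it forces the caller to decide what "offline" and "forbidden" each mean for the customer and keeps one of them from failing silently.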

This merely scratches the surface of the kinds of questions we'll need to begin asking ourselves. It doesn't even deal with the difficulties of programming asynchronously, which C# 5 is going to handle extraordinarily well, though many developers will be unable to take advantage of those updates. For example, suppose I have a widget on my page that monitors the status of a long-running server-based process. I need to have JavaScript on my page that monitors that process and updates my widget accordingly. Should I:

  • Write a singleton object? This might be easier and afford strong member protection, but I can only have one widget, unless I somehow differentiate between them and multiplex, which can become hairy quickly.
  • Should the monitoring function accept a callback, or should it be event-based, so that multiple subscribers can listen? (Maybe an event-based model offers some interesting ways to deal with the complexities of a singleton?)
  • Should the widget manipulate the view directly, or should I write separate code that handles the view based on the state of the object (or objects)?
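
To make the event-based option concrete, here is a minimal sketch in which one monitor serves several subscribers. ProcessMonitor, subscribe, and update are hypothetical names; in a real page, update would be called from whatever polling or long-polling handler receives the server's status.

```javascript
function ProcessMonitor() {
    this._listeners = [];
    this.status = 'unknown';
}

ProcessMonitor.prototype.subscribe = function (listener) {
    this._listeners.push(listener);
};

ProcessMonitor.prototype.update = function (status) {
    this.status = status;
    // Every subscriber hears about the change; none of them owns the monitor.
    this._listeners.forEach(function (listener) {
        listener(status);
    });
};

var monitor = new ProcessMonitor();
var seen = [];
monitor.subscribe(function (s) { seen.push('widget A: ' + s); });
monitor.subscribe(function (s) { seen.push('widget B: ' + s); });
monitor.update('running');
console.log(seen.join('; ')); // widget A: running; widget B: running
```

Because the widgets only receive status values, the code that touches the view can live apart from the monitor, which speaks to the last question in the list as well.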

The list goes on.
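
To make the callback-versus-event question a little more concrete, here is a minimal sketch of the event-based option. I've written it in C# to match the other code on this blog, even though the scenario above is a JavaScript one; StatusMonitor, StatusChanged, and Report are hypothetical names, and the actual polling of the server is left out.

```csharp
using System;

// Hypothetical monitor: instead of accepting one callback, it exposes an
// event, so any number of widgets can subscribe to the same instance.
public class StatusMonitor
{
    public event Action<string> StatusChanged;

    public string CurrentStatus { get; private set; }

    // In a real page this would be driven by a timer or a server response;
    // in this sketch the caller reports each new status by hand.
    public void Report(string status)
    {
        if (status == CurrentStatus)
            return; // nothing changed, so nobody is notified

        CurrentStatus = status;
        Action<string> handler = StatusChanged;
        if (handler != null)
            handler(status);
    }
}
```

Because each widget subscribes on its own, nothing here forces the monitor to be a singleton: two widgets can watch one monitor, or each can own a separate one.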

We're moving faster and faster into an asynchronous world. It is already happening, and we as developers need to be prepared to handle these difficulties. We also need to understand how to communicate these kinds of questions to our business analysts, our supervisors, and our customers. We need to be able to equip ourselves to ask the right questions of our customers, so that when it's time to make a decision, we have the information we need.


Improving Performance with Dynamic Methods Part 1: The Problem Definition

Posted by Rob

One of the problems that a large part of a certain gaming community has wrestled with over the years is version checking.  A common, though now older, method of version checking among this community has been to execute a known algorithm based on a seeded value; however, the algorithm would change based on a formula sent over the wire.  For instance, suppose for every four bytes in a file, there are four state values: A, B, C, and S.  The S value is the current four bytes of the file.  The server might send the following formula as an initialization: A=A-S B=B-C C=C+A A=A+B.  In addition, it sends some startup values for A, B, and C.  This means that for every four bytes of the file, we need to perform the math in the steps outlined in the above initialization string.

Now, one of the common ways to approach this problem has been to, basically, attack it by brute force.  We’d keep track of the state values in an array, then keep track of the indices of the state values in another array offset by their letters, then keep track of operators in another array, and finally do a double dereference (dereferencing the index of the state value, then actually dereferencing the state value).  So you might have code that looks like this:

foreach (step)
    states[Transform('S')] = ReadNext();
    foreach (string formula in list)
        states[Transform(formula[0])] = DoMath(states[Transform(formula[2])], states[Transform(formula[4])], GetOperator(formula));

Here, the “Transform” function translates a character to its index in the state value array.  This is a pretty sub-optimal solution given all of the extra dereferencing, and this is really a pseudo-implementation of this activity.  What would be best is if we could somehow unroll that inner loop and access the values directly (or through a single dereference, as a pointer would do).  In other words, it could be better written like so:

foreach (step)
    S = ReadNext();
    A = A - S;
    B = B - C;
    C = C + A;
    A = A + B;
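
For comparison, the array-based approach can be fleshed out into runnable C#. This is only a sketch under a few assumptions of mine: the state values are unsigned 32-bit integers, every formula is exactly five characters in the form X=YopZ, and FormulaInterpreter is an illustrative name (Transform and DoMath are borrowed from the pseudocode above, not from any real client).

```csharp
using System;

public static class FormulaInterpreter
{
    // Maps a state letter to its slot in the states array: A=0, B=1, C=2, S=3.
    private static int Transform(char name)
    {
        int index = "ABCS".IndexOf(name);
        if (index < 0)
            throw new FormatException("Unknown state value: " + name);
        return index;
    }

    // Applies one of the eight binary operators to two state values.
    private static uint DoMath(uint left, uint right, char op)
    {
        switch (op)
        {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            case '%': return left % right;
            case '&': return left & right;
            case '|': return left | right;
            case '^': return left ^ right;
            default: throw new FormatException("Unknown operator: " + op);
        }
    }

    // Runs every formula once for a single four-byte value read from the file.
    // states holds A, B, C, S in that order and is updated in place.
    public static void ExecuteStep(uint[] states, string[] formulas, uint next)
    {
        states[Transform('S')] = next;
        foreach (string formula in formulas)
        {
            states[Transform(formula[0])] = DoMath(
                states[Transform(formula[2])],
                states[Transform(formula[4])],
                formula[3]);
        }
    }
}
```

Every formula costs three lookups through Transform plus a switch on the operator, and that per-step overhead is exactly what the unrolled version avoids.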

The challenge is that the server provides the verification string, and it changes over time, so the client can’t reliably predict which combination of formulae will be used.  Although in the wild only a fixed set of combinations has ever been observed, there are a number of others that could potentially be presented, with no fixed number of formulas, three potential writeable state values and four readable state values per formula, and eight binary operators (+, -, *, /, %, &, |, and ^).  So, either we keep going with the inner loop, or we figure out some way to get all the benefits of compilation without the headaches of having to know exactly what we’re programming before we program it.  Fortunately, the .NET framework provides a way for us to do exactly that: dynamic methods.

To simplify the code that we need to generate, we’ll rewrite the inner code to look like this:

foreach (step)
    S = ReadNext();
    ExecuteStep(ref A, ref B, ref C, ref S);

Now, all we need to do is dynamically emit the ExecuteStep method.  To do so we’ll need to get into the System.Reflection.Emit namespace – kind of a scary place to be!  Fortunately, Reflector is going to make this easier for us – and we’ll be glad we’re doing this in IL.

In Part 2, we’ll look at how to actually emit the dynamic method by writing the equivalent code in C# and then looking at it in Reflector, then figuring out how to generate it at run-time.  Along the way, we’ll learn a little bit about the .NET evaluation stack.
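
As a rough preview of where this is heading, here is one way ExecuteStep could be emitted with DynamicMethod and ILGenerator. Treat it as a sketch rather than the code Part 2 arrives at: it assumes the state values are unsigned 32-bit integers, that division and remainder are unsigned, and that each formula is exactly five characters (X=YopZ). FormulaCompiler and StepDelegate are names I made up for the example.

```csharp
using System;
using System.Reflection.Emit;

public delegate void StepDelegate(ref uint a, ref uint b, ref uint c, ref uint s);

public static class FormulaCompiler
{
    // Each ref parameter arrives as an address; A, B, C, S are arguments 0..3.
    private static readonly OpCode[] LoadArgument =
    {
        OpCodes.Ldarg_0, OpCodes.Ldarg_1, OpCodes.Ldarg_2, OpCodes.Ldarg_3
    };

    private static OpCode LoadAddressOf(char name)
    {
        int index = "ABCS".IndexOf(name);
        if (index < 0)
            throw new FormatException("Unknown state value: " + name);
        return LoadArgument[index];
    }

    private static OpCode OperatorFor(char op)
    {
        switch (op)
        {
            case '+': return OpCodes.Add;
            case '-': return OpCodes.Sub;
            case '*': return OpCodes.Mul;
            case '/': return OpCodes.Div_Un;
            case '%': return OpCodes.Rem_Un;
            case '&': return OpCodes.And;
            case '|': return OpCodes.Or;
            case '^': return OpCodes.Xor;
            default: throw new FormatException("Unknown operator: " + op);
        }
    }

    // Turns a string such as "A=A-S B=B-C" into a delegate that runs all
    // of the formulas, in order, against the four state values.
    public static StepDelegate Compile(string initialization)
    {
        Type refUInt = typeof(uint).MakeByRefType();
        var method = new DynamicMethod(
            "ExecuteStep", typeof(void),
            new[] { refUInt, refUInt, refUInt, refUInt });
        ILGenerator il = method.GetILGenerator();

        string[] formulas = initialization.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string formula in formulas)
        {
            if (formula.Length != 5 || formula[1] != '=')
                throw new FormatException("Expected X=YopZ, got: " + formula);

            il.Emit(LoadAddressOf(formula[0])); // address to store into
            il.Emit(LoadAddressOf(formula[2])); // address of the left operand
            il.Emit(OpCodes.Ldind_U4);          // replace it with its value
            il.Emit(LoadAddressOf(formula[4])); // address of the right operand
            il.Emit(OpCodes.Ldind_U4);          // replace it with its value
            il.Emit(OperatorFor(formula[3]));   // combine the two values
            il.Emit(OpCodes.Stind_I4);          // store through the first address
        }
        il.Emit(OpCodes.Ret);

        return (StepDelegate)method.CreateDelegate(typeof(StepDelegate));
    }
}
```

The idea is to call Compile once per initialization string and then invoke the delegate for every four bytes. With "A=A-S B=B-C C=C+A A=A+B" and starting values a=10, b=50, c=30, s=4, one call leaves a=26, b=20, and c=36.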

Oh – one more thing – here’s why you should care about all of this.  A simple testing framework indicated a speed increase of a factor of four when changing this to use a dynamic method instead of the previous implementation.  Over 50 iterations, I observed the dynamic method versions taking a little less than 1/4 of the execution time of the original array-based implementation.

Now, if that’s not a marked improvement, I don’t know what is.  But remember, as with all performance optimizations, your mileage may vary.

Improving Performance with Dynamic Methods

  • Part 1: The Problem Definition
  • Part 2: Emit and Execute

Just how Big is Big-O, anyway?

Posted by Rob

For a couple reasons I love to look at programming interview questions around the internet.  Most of them revolve around data structures and algorithms, which is one of my favorite topics in computer science; and as a hiring manager, I find it valuable to try to see what the industry is asking people before bringing them on board.  I came across this site tonight, and while it had a lot of questions that I’ve seen before, this one – a variant of the programming exercise I was given for my first real (non-interning) job application – stood out:

Implement Shuffle given an array containing a deck of cards and the number of cards. Now make it O(n).

This caught my eye for a couple reasons: first, why would the algorithm not ever be O(n)?  Second, if I think its obvious implementation is O(n), then do I have a misconception about what O(n) means?  The problem doesn’t give any other details about what “Shuffle” means either.

Still, this is my naive algorithm:

void Shuffle ( cards[], length )
    for i = 0 to length - 1
        n = Random() % length
        Swap(&cards[i], &cards[n])

Swap() isn’t linear or otherwise; it’s constant time.  The only loop happening here is the single for loop, from 0 to the length of the array.  We can improve the entropy of the resultant shuffle by using a cryptographically-strong random number generator such as RNGCryptoServiceProvider, but it would require more memory.  Here’s an actual C# implementation:


using System;

namespace LinearBigOShuffle
{
    public class Card
    {
        public Rank Rank { get; set; }
        public Suit Suit { get; set; }

        public override string ToString()
        {
            return string.Format("{0,-8}{1}", Suit, Rank);
        }
    }

    public enum Suit
    {
        Club = 1, Diamond = 2, Heart = 3, Spade = 4,
    }

    public enum Rank
    {
        Ace = 1, Two = 2, Three = 3, Four = 4, Five = 5, Six = 6, Seven = 7, Eight = 8, Nine = 9, Ten = 10,
        Jack = 11, Queen = 12, King = 13,
    }
}


using System;

namespace LinearBigOShuffle
{
    class Program
    {
        static void Main()
        {
            // initialize the cards array
            Card[] cards = InitializeStandardDeck();

            // Verify that the cards were initialized correctly
            PrintCards(cards);

            Shuffle(cards);

            PrintCards(cards);

            // Wait for the user to press enter to exit.
            Console.ReadLine();
        }

        private static void Shuffle(Card[] cards)
        {
            Random r = new Random();
            int n = cards.Length;
            for (int i = 0; i < n; i++)
            {
                int nextSwapIndex = r.Next(n);
                Card temp = cards[nextSwapIndex];
                cards[nextSwapIndex] = cards[i];
                cards[i] = temp;
            }
        }

        private static Card[] InitializeStandardDeck()
        {
            Card[] cards = new Card[52];
            int index = 0;
            for (int suit = (int)Suit.Club; suit <= (int)Suit.Spade; suit++)
            {
                for (int rank = 1; rank <= (int)Rank.King; rank++)
                {
                    cards[index++] = new Card { Rank = (Rank)rank, Suit = (Suit)suit };
                }
            }
            return cards;
        }

        private static void PrintCards(Card[] cards)
        {
            foreach (Card c in cards)
            {
                Console.WriteLine(c);
            }
        }
    }
}
In this implementation, I’ve included a couple helper methods and assumed Ace-low, but I don’t think that really matters.  If you run the program, you’ll see the in-order deck and the shuffled deck.  There are a couple caveats with the Shuffle implementation I’ve provided:

  • There is no guarantee that a card won’t end up in the same relative position in which it started, or the same absolute position.
  • There is no guarantee that an individual card won’t move multiple times.

If you think about it though, neither of these caveats is untrue of physically shuffling cards!
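
One more caveat is worth naming: picking the swap index from the whole array on every pass is known to favor some orderings slightly over others. The Fisher-Yates shuffle avoids that while still making a single pass, so it is just as much O(n). Here's a sketch of it; DeckShuffler is a hypothetical name, and taking the Random as a parameter is my choice so that callers can seed it.

```csharp
using System;

public static class DeckShuffler
{
    // Fisher-Yates shuffle: walk from the end of the array, swapping each
    // slot with a randomly chosen slot at or before it. One pass, so O(n).
    public static void Shuffle<T>(T[] items, Random random)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1); // 0 <= j <= i
            T temp = items[j];
            items[j] = items[i];
            items[i] = temp;
        }
    }
}
```

The only real difference from my version is the range passed to Next: it shrinks as the loop advances, so each card is placed exactly once.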

What about Big-O?

The question I raised earlier was, just how big is Big-O anyway?  Well, Big-O is great for analyzing algorithmic complexity, but that’s not always what we want when measuring the performance of an implementation.  I’m depending on a library function in the Random class!  What if it was incredibly slow, because it produced truly random numbers instead of pseudorandom numbers?  In that scenario, I would not have a terribly performant implementation, even if my algorithm was “theoretically” correct.

Big-O is great!  But I guess what I was trying to say here is…

Don’t think it’s the only thing that will impact your code!


Speedy C#, Part 4: Using – and Understanding – CLR Profiler

Posted by Rob

CLR Profiler is a free and incredibly useful tool offered by Microsoft.  I'm fairly certain its primary use (at least from Microsoft's perspective) is to illustrate use of the CLR Profiling COM APIs, which aren't exceptionally clear-cut (in my opinion), particularly from a .NET programmer's point of view.  The really difficult part of using CLR Profiler is becoming accustomed to its interface and the data it presents; however, once you do so, I'm certain you'll find it incredibly helpful in addressing difficulties with memory usage.  This article aims to introduce you to the "important parts" of CLR Profiler - specifically, which graphs you should view, how you should interpret them, and how to address the problems you find.  This article will not review some of the more complicated parts of injecting CLR Profiler into something such as your ASP.NET application; there are other resources for that purpose.

For the purposes of this article, I've re-introduced a wasteful error into BN# that I found by using CLR Profiler.  We'll work through finding it in this article.

Getting Started

Once you have CLR Profiler "installed" - and I use the term loosely - you can start the application from the install path (don't look for a Start Menu item).  There are two versions of binaries, x86 and x64 versions; you should know which edition of the application you'd like to run.  If you're running a platform-neutral application (most .NET apps would fall under this category), and you're on an x64 system, you should use that one.  If you're running 32-bit Windows, or are running a program specifically targeted to x86, then you should run the x86 version of CLR Profiler.

As an important note, for Windows Vista users, if you're running with UAC enabled, make sure to run CLR Profiler as an administrator.  CLR Profiler works by injecting a COM DLL into the target, but it can't do that if you're not running the process as an administrator.

CLR Profiler while it's not running anything

When profiling memory, I turn off Calls tracking: it's located in the bottom-right of the UI window.

If your application requires access to the local application directory - for instance, by using the Application class in Windows Forms - you should go through the explicit Profile Application menu item within the File menu, and set the working directory option of that UI.  Otherwise, go ahead and click Start Application, browse to your application, and go.

During Operation

Other than the fact that your application will be measurably slower, you should be able to run the application as you otherwise would.  Your mileage will vary, but you'll get better results with more memory in your system.  But all developers have at least 4gb powering their boxes now, right?

During the application, you can click on the Show Heap now button on the main CLR Profiler GUI, which will display a heap graph of the current application, displaying the path to all currently allocated memory:

Heap Graph of current profile

To be honest, I find the heap graph to be relatively confusing, but the good news is that you don't need to keep using it.  Once you've dumped that temporary log, you can view the current heap and other interesting information by closing that window and, in the main CLR Profiler window, going to the View menu and choosing Summary, which displays a cool window:

A result summary of a profile

This window helps you understand what's happening:

  • Allocated bytes is really interesting – it relates the total amount of memory that you’ve allocated within managed code.
  • Final Heap Bytes is the amount of managed memory that currently is in use on the heap.  This doesn't necessarily reflect unmanaged items.
  • Relocated Bytes is the amount of memory that has been moved by the garbage collector during compaction operations.
  • Gen X collections shows the number of garbage collections that have occurred for each generation.
  • Garbage Collector Generation Sizes shows the number of bytes being used by each heap.

What's Happening with BN#?

I had a suspicion based on memory usage (reported by Task Manager) that BN# wasn’t quite as efficient as I would have hoped.  I wanted to do some investigation, so I plugged in CLR Profiler.  After a 30-second (or so) connection to, joining Clan Recruitment, this is what I saw:

Profile of BN# with intentional memory bug

That’s pretty heavy – 31mb or so total allocated memory but only ending up with about 3mb on the heap and only 3.5mb were relocated throughout the lifetime of the app – that told me that I was doing a lot of allocating and freeing very rapidly.  What’s the next step?

I clicked on the Allocation Graph button and took a look:

Allocation graph indicating 10mb of byte[] on the heap.

In this we can see that byte arrays are on the heap frequently and account for about 35% of all memory allocations.  That’s a big problem – especially since I pooled their creation already!  CLR profiler helps me track it down though, as I follow the highlighted call chain back to its source:

The culprit

This image indicates that I have a problem with a method called DataReader::get_m_data().  Now, as I mentioned, I had to recreate this problem, and the path of least resistance for me was to change the identifier m_data (used frequently in DataReader) to be a property instead of a field, so originally this said get_Data.  I thought that was odd until I saw its implementation:

        protected virtual byte[] Data
        {
            get {
                byte[] dataCopy = new byte[_m_data.Length];
                Buffer.BlockCopy(_m_data, 0, dataCopy, 0, dataCopy.Length);
                return dataCopy;
            }
        }

So here, for every operation that accesses the Data property (in the original implementation, it was every operation, because the Data property was virtual), I was duplicating the entire array EVERY TIME.

I then changed the implementation so that operations defined within the base class wouldn’t needlessly go through a property, and derived classes had direct access to the buffer by reference (via the UnderlyingBuffer property).  What were my results?
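
The shape of that change can be sketched roughly as follows. This is not the actual BN# source: BufferReader, ChecksumReader, DataCopy, and ReadByte are stand-in names, and only UnderlyingBuffer comes from the description above.

```csharp
using System;

public class BufferReader
{
    private readonly byte[] _data;

    public BufferReader(byte[] data)
    {
        _data = data;
    }

    // The wasteful shape: every read of this property allocates a full copy.
    public byte[] DataCopy
    {
        get
        {
            byte[] copy = new byte[_data.Length];
            Buffer.BlockCopy(_data, 0, copy, 0, copy.Length);
            return copy;
        }
    }

    // The cheaper shape: derived classes get the same array instance.
    protected byte[] UnderlyingBuffer
    {
        get { return _data; }
    }

    // Base-class operations read the field directly, with no property call.
    public byte ReadByte(int index)
    {
        return _data[index];
    }
}

public class ChecksumReader : BufferReader
{
    public ChecksumReader(byte[] data) : base(data) { }

    public int Sum()
    {
        int total = 0;
        foreach (byte b in UnderlyingBuffer) // no copy is made here
            total += b;
        return total;
    }
}
```

The trade-off is that a derived class can now modify the buffer in place, so this only makes sense when you trust the subclasses.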

Final Results

I think that fairly well speaks to the effectiveness of using tools like this. :)  That's a 27% decrease in allocations, a 33% decrease in gen-0 collections, and a 53% decrease in byte[] allocations:

Updated allocation graph



Speedy C#, Part 3: Understanding Memory References, Pinned Objects, and Pointers

Posted by Rob

So often in the managed world we're able to get away with not worrying about memory management.  "But the GC takes care of cleaning my objects for me!"  That's true; but if you want your application to be performant, you should at least understand what's going on in all of those circuits and silicon.

In Part 2, I talked a bit about how creating object pools can help you to avoid garbage collections by keeping memory allocated for a long time.  Here, I'm going to talk a bit more extensively about how objects are stored in memory, what a "pinned object" is, and how pointers can be used quickly in C#.

NOTE: This article assumes you are familiar with pointers and pointer arithmetic.  If not, you may wish to brush up.

Objects in Memory - A Closer Look at the Heap

When you create an instance of a class (not a struct or an enum), your object is being stored on the "heap" - a large contiguous area of memory that is just there.  (For more information on the heap, read up on Part 2).  This includes, interestingly enough, any Array objects you create (such as a byte[]) - they're reference objects, not value objects.  (The one exception is if you use the stackalloc operator in C#).  So, suppose I make the following class:

class Sample
{
    public int A;
    public long B;
    public short C;
    public short D;
}

Here's how it would conceptually look in a memory block:

An instance of Sample in memory

As you can see, the class is laid out contiguously (although the CLR does not guarantee this behavior unless it is decorated with [StructLayout(LayoutKind.Sequential)]).  Still, you get the idea.

However, when we create an object and get a reference to it, we don't actually get a pointer to the object - we get a "reference".  This isn't a reference like you might expect in C or C++, either; rather, it's similar to a handle.  We can use it just like it's conceptually in memory like I laid out.  However, the CLR hides implementation details; for example, every object on the heap has at least a reference to its RuntimeTypeHandle so that casting can be checked at runtime.  To demonstrate, let's take a byte[].  When it's stored on the heap, it's pretty clear what we're looking at.  Arrays of any type are an interesting edge case in .NET; normally, C# does not allow you to obtain a pointer of a managed type (and in fact you can't do what I'm about to demonstrate with a reference type), but arrays themselves ARE managed types (don't worry about the last two lines of output just yet).

static unsafe void Main(string[] args)
{
    byte[] bytes = new byte[100];
    bytes[0] = 1;
    bytes[1] = 2;
    bytes[2] = 3;
    bytes[3] = 4;

    Type arrayType = null;
    // Pin the array so the GC can't move it while we hold a raw pointer.
    fixed (byte* pMem = &bytes[0])
    {
        Console.WriteLine("{0:x16}", (long)pMem);
        int* pArrayBase = (int*) pMem;
        Console.WriteLine("{0:x8}", *pArrayBase);
        // Walk backwards, four bytes at a time, into the array's header.
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        // pArrayBase now points at the stored RuntimeTypeHandle (8 bytes on x64).
        long rtth = *(long*) pArrayBase;
        RuntimeTypeHandle handle;
        // RTTH is a value-type whose only member is an IntPtr; can be set as a long on x64
        RuntimeTypeHandle* pH = &handle;
        *((long*) pH) = rtth;
        arrayType = Type.GetTypeFromHandle(handle);
    }

    if (arrayType != null)
    {
        Console.WriteLine(arrayType.Name);
    }

    Console.WriteLine("byte[] RTTH: {0:x16}", typeof (byte[]).TypeHandle.Value.ToInt64());
    int a = 1;
    int b = 2;
    int* pA = &a;
    int* pB = &b;
    Console.WriteLine(*pB);
    Console.WriteLine(*(pB - 1));

    Console.ReadLine();
}

Now, just to clarify: I run on x64.  The above code will not function as expected on x86.  There are a few items that will also produce slightly varying results for you; for instance, pMem shouldn't be cast to a long on x86, and to get to the instance's stored RTTH, you only need to decrement the pointer 3 times on x86 (whereas the RTTH on x64 is 8 bytes long).  Here's the output on my machine:

0000000002a31748                  Console.WriteLine("{0:x16}", (long)pMem);
04030201                          Console.WriteLine("{0:x8}", *(pMem));
00000000                          Console.WriteLine("{0:x8}", *(pMem - 1));
00000064                          Console.WriteLine("{0:x8}", *(pMem - 2));
00000642                          Console.WriteLine("{0:x8}", *(pMem - 3));
7890a4a8                          Console.WriteLine("{0:x8}", *(pMem - 4));
Byte[]                            Console.WriteLine(arrayType.Name);
byte[] RTTH: 00000642789562c2     Console.WriteLine("{0:x16}", typeof(byte[]).TypeHandle.Value.ToInt64());
2                                 Console.WriteLine(*pB);
1                                 Console.WriteLine(*(pB - 1));

So, here we see that the runtime type identifier is stored as part of the object reference on the heap; so is the array length (that's the hex value 00000064 that you see on the fourth line of output - it's 100 in decimal).  That's how arrays are stored on the heap, and it's pretty much how objects are stored; when we have an object reference, we can treat it as if it's a pointer into memory.  But it's more than that; below our "pointer" exists additional information about the object.  We don't get to see that additional information because the CLR hides it from us.

What are reference variables then?  Ultimately, they're stack variables that contain our "pointer" that isn't really a pointer.  I said not to worry too much about the last two lines before, but they are intended to show you one thing: stack variables are allocated sequentially on the stack.  I declared a, then b; by obtaining a pointer to b, I was also able to obtain a pointer to a by decrementing the pointer by the size of the variable (in this case, 32 bits).  To show you that my handle is in fact legitimately pointing to a stack variable, take a look at the following code:

static unsafe void Main(string[] args)
{
    Sample s = new Sample {A = 0x01020304, B = 0x0f0e0d0c0b0a0908, C = 0x0706, D = 0x0504};
    long a = 1;
    long b = 2;
    long* pA = &a;
    long* pB = &b;
    Console.WriteLine("{0:x16}", *pB);
    Console.WriteLine("{0:x16}", *(pB - 1));
    Console.WriteLine("{0:x16}", *(pB - 2));

    long prS = (long)(pB - 2); // the location of s on the stack
    long* pS = *(long**)prS;
    Console.WriteLine("{0:x16}", *pS);
    Console.WriteLine("{0:x16}", *(pS + 1));
    Console.WriteLine("{0:x16}", *(pS + 2));

    Console.ReadLine();
}

Again, the above code will not function as expected on x86 (to make it do so, replace all long references with int).  The output of this code is fascinating:

0000000000000002      b
0000000000000001      a
0000000002be16c8      s
00000642801a4400      *pS
0f0e0d0c0b0a0908      *(pS + 1)
0504070601020304      *(pS + 2)

You might notice that s is a pointer to the heap, and that dereferencing it gives us a number that looks suspiciously similar to a RuntimeTypeHandle just like in the last example, and you'd be correct.  The other interesting thing is the variable order: the B variable in the Sample class was aligned so that it would be first (8-byte alignment on x64 appears to be the default).  Applying [StructLayout] to it as noted before makes it look right (although to the untrained eye it will look entirely backwards due to endianness).

In Part 2, I talked about how garbage collection allows us to not worry so much about external fragmentation of the heap, because the GC performs a process called "compaction," by which objects are moved around in memory so that there aren't small areas of free space.  The interesting question is: what happens if a GC compaction happens and we have a pointer to an object?

Accessing Memory Locations with Pinned Objects

The CLR allows us to "pin" an object so that it is not moved during garbage collection.  This can potentially have some big consequences for garbage collection, though; the heap is still fragmented if an object is pinned during a pass.  What's more is that if the object becomes eligible for compaction after the pass, it's still considered a gen-0 object even though it should have moved to gen-1.  C# enables us to pin an object via the fixed statement.

In truth, the only objects worth pinning are arrays.  You can't pin a regular reference object to get a pointer for the reason shown above (it's not guaranteed to follow any particular pattern), and single value-type objects can be accessed directly on the stack without pinning.  Pinning arrays has some good performance benefits (which I'll get to a bit later), but like I said, not without a cost.
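
The fixed statement needs an unsafe context, but the framework also offers a way to pin that doesn't: GCHandle.Alloc with GCHandleType.Pinned. The sketch below isn't how the code in this post does it; it's just a small illustration of pinning an array, reading through its raw address, and releasing the pin. PinningDemo and ReadFirstInt32 are names I made up.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    // Pins the array, reads its first four bytes as one 32-bit integer
    // through the raw address, then unpins it again.
    public static int ReadFirstInt32(byte[] buffer)
    {
        if (buffer.Length < 4)
            throw new ArgumentException("Need at least four bytes.", "buffer");

        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadInt32(address);
        }
        finally
        {
            // Always release the pin so the GC is free to move the array again.
            handle.Free();
        }
    }
}
```

Keeping the pin inside a try/finally and holding it as briefly as possible limits the fragmentation cost described above.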

The neatest part about pointers in C# is that a pointer can be cast to a pointer of any other value-type; this is exceptionally common in C code (reading a file into memory by reading the length of a struct, and then treating the memory as a pointer to that struct, for example).  Sometimes it's simply easier for us to do that in C# than it is to use a stream.  Consider the case of reading a PE file header; it's a nightmare!  So many lines of code when you could simply read in a buffer and call it a PE file header.  Strong typing imposes that limitation, but thankfully even on edge cases like this, we can work around it.
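
Here's a small sketch of the "read a buffer and call it a header" idea. To keep it compilable without the unsafe switch, it uses GCHandle and Marshal.PtrToStructure instead of a raw pointer cast, which copies the bytes into the struct rather than aliasing them. SampleHeader is a made-up six-byte layout, not the real PE header, and HeaderReader is likewise hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical six-byte header; this is not the real PE layout.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct SampleHeader
{
    public ushort Magic;
    public uint PayloadLength;
}

public static class HeaderReader
{
    // Interprets the start of the buffer as a SampleHeader in one step,
    // instead of reading it field by field from a stream.
    public static SampleHeader Read(byte[] buffer)
    {
        if (buffer.Length < Marshal.SizeOf<SampleHeader>())
            throw new ArgumentException("Buffer is too small.", "buffer");

        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return Marshal.PtrToStructure<SampleHeader>(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}
```

In an unsafe context the same thing is a one-line cast of a fixed byte* to a SampleHeader*, and that's the version that pays off when there are dozens of fields.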

I'm not going to discuss the performance characteristics of pinned objects during a garbage collection; for one, they're hard to measure, but more importantly, it's been well-documented to hurt the performance of the garbage collector.

Getting Pointers without the Pinning

There are other means by which to obtain, create, and manage pointers aside from the standard fixed statement.  As mentioned earlier, you can use the stackalloc operator to allocate a block of memory on the stack; it gives you a pointer to the base of that block, which you can then index like an array.  Alternatively, if you don't care about portability, you can use native Windows functions to allocate memory for you.  These functions might include LocalAlloc, HeapAlloc, VirtualAlloc, or VirtualAllocEx, depending on what your needs are.
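
For a feel of stackalloc, here's a minimal sketch. It relies on something newer than this post: since C# 7.2, a stackalloc expression can be assigned to a Span<int> in safe code, whereas the classic form is int* scratch = stackalloc int[count]; inside an unsafe block. StackAllocDemo and SumOfSquares are illustrative names, and the 256-element cap is an arbitrary limit I chose to keep the stack usage small.

```csharp
using System;

public static class StackAllocDemo
{
    // Sums the squares of 0..count-1 using a scratch buffer that lives on
    // the stack and disappears when the method returns.
    public static int SumOfSquares(int count)
    {
        if (count < 0 || count > 256)
            throw new ArgumentOutOfRangeException("count");

        Span<int> scratch = stackalloc int[count];
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = i * i;

        int total = 0;
        foreach (int value in scratch)
            total += value;
        return total;
    }
}
```

Nothing here touches the managed heap, so there's nothing for the GC to collect or compact afterwards.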

An interesting prospect might be to allocate multiple heaps using the HeapCreate APIs; this would allow you to manage your memory per-area of responsibility; Noel Llopis suggests such a strategy in his book C++ for Game Programmers.  Although all of this memory management might seem like overkill, if you're really hunting for the next tweak to speed up your code, this might help you get over the line.

Performance Characteristics of Unsafe vs. Safe Code

Let's not kid ourselves; unsafe code is inherently unsafe because the runtime doesn't manage the code for us.  So before using code like this in your applications, be absolutely certain that you need it.

The CLR provides the means to access heap memory via the Marshal.AllocHGlobal method.  The documentation notes that it uses LocalAlloc, probably because LocalAlloc doesn't require a pointer to a heap.  Despite the admonition that you'll get better performance and more features out of the other functions, the use of LocalAlloc does not seem to be a hindrance in speed relative to using HeapCreate/HeapAlloc/HeapDestroy.  The execution times are shown here:

Method                             | Debug Mode - 5 Iterations | Release Mode - 5 Iterations | Debug Mode - 25 Iterations | Release Mode - 25 Iterations
Normal .NET Array [] notation      | x86: 17ms; x64: 45ms      | x86: 15ms; x64: 65ms        | x86: 109ms; x64: 252ms     | x86: 95ms; x64: 333ms
Marshal.AllocHGlobal with pointers | x86: 15ms; x64: 36ms      | x86: 14ms; x64: 30ms        | x86: 95ms; x64: 193ms      | x86: 80ms; x64: 148ms
LocalAlloc P/Invoke with Pointers  | x86: 16ms; x64: 37ms      | x86: 14ms; x64: 31ms        | x86: 96ms; x64: 193ms      | x86: 78ms; x64: 161ms
HeapAlloc P/Invoke with Pointers   | x86: 16ms; x64: 42ms      | x86: 14ms; x64: 32ms        | x86: 102ms; x64: 197ms     | x86: 88ms; x64: 166ms

Surprisingly, the normal array bracket notation performed significantly worse in release builds than in debug builds on x64; I don't really have an answer for why that would be.  I did not perform extensive statistical regression or even provide averages; I ran each set three times, and if they all looked mostly the same, I used the data.  These data are from x64 machines; the x86 results were from setting compilation target to x86 and running the program in WOW64.  I was surprised how much slower x64 was, though it might have been because we were using machine words on x86, and half-words on x64.  Perhaps memory access would be faster if we were using longs on x64.  (Prelim tests seem to confirm this; I will post a follow-up soon.)

Here are the P/Invoke declarations:

// Requires: using System; using System.Runtime.InteropServices;
public enum LocalAllocFlags
{
    Fixed = 0,
    Moveable = 2,
    ZeroInit = 0x40,
}

public enum HeapCreateFlags
{
    None = 0,
    EnableExecute = 0x40000,
    GenerateExceptions = 4,
    NoSerialize = 1,
}

public enum HeapAllocFlags
{
    None = 0,
    GenerateExceptions = 4,
    NoSerialize = 1,
    ZeroMemory = 8,
}

static class UnsafeNativeMethods
{
    [DllImport("kernel32.dll")]
    public static extern IntPtr LocalAlloc(LocalAllocFlags flags, UIntPtr uBytes);

    [DllImport("kernel32.dll")]
    public static extern IntPtr LocalFree(IntPtr hMem);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapCreate(HeapCreateFlags flOptions, UIntPtr dwInitialSize, UIntPtr dwMaxSize);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapAlloc(IntPtr hHeap, HeapAllocFlags dwFlags, UIntPtr dwBytes);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapFree(IntPtr hHeap, HeapAllocFlags dwFlags, IntPtr lpMem);

    [DllImport("kernel32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool HeapDestroy(IntPtr hHeap);
}

And finally, here's the benchmarking code:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class Program
{
    private const int ITERATIONS = 25;

    static unsafe void Main(string[] args)
    {
        Console.WriteLine("Press <enter> to start.");
        Console.ReadLine();

        // Managed (bounds-checked) array
        Stopwatch arrayClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunArrayTest();
        }
        arrayClock.Stop();
        Console.WriteLine("{0}ms elapsed for Array test, {1} iterations.  Press <enter> to continue.", arrayClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        // Marshal.AllocHGlobal
        Stopwatch marshalClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunMarshalAllocHGlobalTest();
        }
        marshalClock.Stop();
        Console.WriteLine("{0}ms elapsed for Marshal test, {1} iterations.  Press <enter> to continue.", marshalClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        // LocalAlloc via P/Invoke
        Stopwatch localClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunLocalAllocTest();
        }
        localClock.Stop();
        Console.WriteLine("{0}ms elapsed for LocalAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", localClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        // HeapCreate/HeapAlloc via P/Invoke
        Stopwatch heapClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunHeapAllocTest();
        }
        heapClock.Stop();
        Console.WriteLine("{0}ms elapsed for HeapAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", heapClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();
    }

    private unsafe static void RunHeapAllocTest()
    {
        UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
        IntPtr pHeap = UnsafeNativeMethods.HeapCreate(HeapCreateFlags.None, pSize, UIntPtr.Zero);
        if (pHeap == IntPtr.Zero)
        {
            Console.WriteLine("Could not create heap.");
            return;
        }
        IntPtr pMem = UnsafeNativeMethods.HeapAlloc(pHeap, HeapAllocFlags.ZeroMemory, pSize);
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate heap.");
            // Don't leak the private heap if the allocation fails.
            UnsafeNativeMethods.HeapDestroy(pHeap);
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        UnsafeNativeMethods.HeapFree(pHeap, HeapAllocFlags.None, pMem);
        UnsafeNativeMethods.HeapDestroy(pHeap);
    }

    private unsafe static void RunLocalAllocTest()
    {
        UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
        IntPtr pMem = UnsafeNativeMethods.LocalAlloc(LocalAllocFlags.ZeroInit, pSize);
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate heap memory.");
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        UnsafeNativeMethods.LocalFree(pMem);
    }

    private unsafe static void RunMarshalAllocHGlobalTest()
    {
        IntPtr pMem = Marshal.AllocHGlobal(1048576 * sizeof(int));
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate memory.");
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        Marshal.FreeHGlobal(pMem);
    }

    private static void RunArrayTest()
    {
        int[] array = new int[1048576]; // 4 MB array
        for (int i = 0; i < 1048576; i++)
        {
            array[i] = i;
        }
    }
}

There isn't anything too complicated here; a 4MB buffer is allocated using the selected method and then each 32-bit element is populated with its array index.  Unsafe code outperforms safe code in each x64 test, though the difference is much smaller on x86.  The explanation is simple: safe code checks the array index on every lookup.
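The four timing loops in Main all have the same shape, so the measurement itself can be factored out.  Here's a minimal sketch, assuming a hypothetical helper named Bench.Time (it is not part of the original benchmark), with the safe array fill reproduced as the workload:

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Runs `test` the given number of times and returns total elapsed milliseconds.
    public static long Time(int iterations, Action test)
    {
        Stopwatch clock = Stopwatch.StartNew();
        for (int iter = 0; iter < iterations; iter++)
        {
            test();
        }
        clock.Stop();
        return clock.ElapsedMilliseconds;
    }

    // Same workload as RunArrayTest: fill a 1,048,576-element int array.
    // Every array[i] access here is subject to a bounds check.
    public static void FillArray()
    {
        int[] array = new int[1048576];
        for (int i = 0; i < 1048576; i++)
        {
            array[i] = i;
        }
    }
}
```

Calling Bench.Time(25, Bench.FillArray) then gives a number comparable to the "Array test" line; the unsafe variants could be passed in the same way.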


Using pointers and unsafe code can be a boost to your application's performance, but you should consider where, when, and how you do it.  Since you don't have control over when the GC is invoked, pinning objects like arrays can be costly.  You might instead consider using Windows API functions or direct memory access functions through the Marshal class to organize your memory if you absolutely need to squeeze that last bit of speed out of your code, but be warned: it's not safe out there.


Speedy C#, Part 2: Optimizing Memory Allocations – Pooling and Reusing Objects

Posted by Rob

In C#, Visual Basic .NET, C++/CLI, J# - the list goes on - we're freed from having to worry about our memory management.  Objects take care of themselves, and when the CLR garbage collector detects that an object is no longer in use, it frees the associated memory.  That doesn't mean that we should run around allocating and deallocating objects all willy-nilly; in fact, since we have less control over memory, we arguably have more reason to be careful with the way we use high-frequency objects.

Memory Regions in .NET

In .NET, and generally in most programming, we can think of two places in which we can access memory: the stack and the heap.  We can think of stack memory as temporary workspace, or scratch space; when we leave a function, all of its stack space goes away.  Way, way down in the machine architecture, the stack also stores function parameters and the return addresses of functions.  It's generally very orderly and inexpensive, but its volatile nature makes it a poor candidate for long-term storage.  In .NET, types that derive from the ValueType class are generally stored on the stack when they are local variables or parameters; when they are boxed into an object reference, or are fields of a class or elements of an array, they live on the heap inside the containing object.  Value types include those defined with the struct and enum keywords, as well as the primitive types such as int, double, and bool (string, by contrast, is a reference type).
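The practical difference shows up in how assignment behaves.  A small sketch, using two made-up types (PointValue and PointRef are illustrative names only): assigning a value type copies the value, assigning a reference type copies only the reference to the heap object, and boxing copies a value into a new heap object.

```csharp
using System;

struct PointValue { public int X; }   // value type (derives from System.ValueType)
class PointRef { public int X; }      // reference type (instances live on the heap)

static class MemoryDemo
{
    public static int CopyValue()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // copies the whole value
        b.X = 2;            // changes only the copy
        return a.X;         // 1
    }

    public static int CopyReference()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;     // copies only the reference
        b.X = 2;            // changes the single shared heap object
        return a.X;         // 2
    }

    public static bool BoxingCopies()
    {
        int n = 5;
        object boxed = n;   // boxing copies the value into a new heap object
        n = 6;              // the boxed copy is unaffected
        return (int)boxed == 5;
    }
}
```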

Heap memory is another matter.  The heap is a region of memory reserved for the use of the program and is intended to store objects that aren't quite so transient.  This might be something like a database connection, a file or buffer, or a window.

The Enemy: Fragmentation

Over time, objects are allocated and eventually released, and because there's not really any rhyme or reason, the heap becomes chaotic.  Allocations grab the first free block that's large enough (sometimes larger than necessary) and hold onto it until they go into the abyss.  This leads to fragmentation - all the free memory must be tracked somehow, and here's the real killer: contiguous blocks of free memory may not always be recognized as such.  Check this out: let's say we have a heap allocated at memory location 0x4000 that is 32 bytes wide:

An un-used heap

(Yes, my awesome artwork was done with none other than Excel!)

Suppose we allocate an 8-byte object and another 8-byte object, then a 16-byte object.  The first is in red, the second in orange, and the third in gray:

The heap after it was filled

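Now I'll free the first and the third objects; we'll have 24 bytes of total free memory, but it is split into an 8-byte block and a 16-byte block, so the largest single allocation we can satisfy is 16 bytes: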

A fragmented heap

Either we need to keep track of every little piece of memory, which might be the fastest algorithm for releasing but slow for allocating (not to mention potentially VERY wasteful), or try to come up with another solution.  This type of memory fragmentation is referred to as external fragmentation.
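The arithmetic of the 32-byte example can be sketched in a few lines.  This is only a toy model of a free list (FreeBlock and FragmentationDemo are hypothetical names, not anything a real allocator exposes); it shows that total free space and the largest allocatable block are different numbers once the heap is fragmented.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a free block: offset into the heap and size in bytes.
struct FreeBlock
{
    public int Offset;
    public int Size;
    public FreeBlock(int offset, int size) { Offset = offset; Size = size; }
}

static class FragmentationDemo
{
    // Free list after freeing the first (8-byte) and third (16-byte) objects;
    // the second object still occupies offsets 8 through 15.
    public static List<FreeBlock> FreeListAfterFrees()
    {
        return new List<FreeBlock>
        {
            new FreeBlock(0, 8),
            new FreeBlock(16, 16)
        };
    }

    // Sum of all free bytes, contiguous or not.
    public static int TotalFree(List<FreeBlock> blocks)
    {
        int total = 0;
        foreach (FreeBlock b in blocks) total += b.Size;
        return total;
    }

    // Size of the biggest single block, i.e. the largest allocation that can succeed.
    public static int LargestContiguous(List<FreeBlock> blocks)
    {
        int largest = 0;
        foreach (FreeBlock b in blocks) largest = Math.Max(largest, b.Size);
        return largest;
    }
}
```

For this free list, TotalFree returns 24 while LargestContiguous returns 16, so a 20-byte request fails even though more than 20 bytes are free.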

The Garbage Collector and Compaction

The garbage collector does two jobs: it works out which objects are still in use, and it compacts the heap.  Rather than counting references, the .NET collector traces them: starting from a set of roots (local variables, static fields, and so on), it marks every object it can reach, and anything left unmarked is garbage.  This frees programmers from having to explicitly destroy objects (as is the practice in C++ with the delete operator, or in C with the free function).  The collector is then able to release and compact memory as needed, avoiding much of the overhead of external fragmentation and also allowing unused memory to be reclaimed.  The garbage collector in .NET is generational; it checks the newest objects first (what are called "gen-0"), and if they are still in use, they get promoted to gen-1.  If memory pressure requires, gen-1 objects are evaluated, and those still in use get promoted to gen-2.  Gen-2 objects are considered long-lasting, and are collected far less often, typically only when memory pressure is severe.
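You can watch an object move through the generations with GC.GetGeneration.  A minimal sketch (GenerationDemo is a made-up name, and forcing collections with GC.Collect is for demonstration only, not something to do in production code):

```csharp
using System;

static class GenerationDemo
{
    // Records the generation of one long-lived object before and after
    // two forced collections.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);   // freshly allocated
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor);   // survived one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);   // survived two collections

        GC.KeepAlive(survivor);                 // keep it reachable until here
        return seen;
    }
}
```

On a typical run I'd expect the three values to climb toward GC.MaxGeneration, but the exact numbers depend on the runtime and GC configuration, so treat them as an observation rather than a guarantee.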

Let's go back to our heap example; supposing I had an 8-byte, another 8-byte, and a 12-byte allocation, here's my heap graph:

A new heap

Object 1 (in red) has gone out of scope, but objects 2 and 3 are sticking around.  Using normal memory freeing rules, the largest object that could be allocated would still only be 8 bytes, because that would be the largest contiguous free space.  However, using .NET's compacting garbage collector, we could expect something along these lines:

A compacted heap graph

Here we can see we've dealt with the problem of external fragmentation by compacting the heap.  This convenience doesn't come without a cost, though; while the garbage collector performed the compaction, all of your application threads were suspended.  The GC can't guarantee object integrity if memory is getting abused during a garbage collection!

Preventing Compaction: Stop Killing off Objects!

Object pooling is a pattern that allows objects to be reused rather than allocated and deallocated, which helps to prevent heap fragmentation as well as costly GC compactions.  A pool can be thought of as an object factory; in essence, the most rudimentary pool could look like this:

public class Pool<T> where T : new()
{
    private Stack<T> _items = new Stack<T>();
    private object _sync = new object();

    public T Get()
    {
        lock (_sync)
        {
            if (_items.Count == 0)
            {
                return new T();
            }
            else
            {
                return _items.Pop();
            }
        }
    }

    public void Free(T item)
    {
        lock (_sync)
        {
            _items.Push(item);
        }
    }
}

Here, objects are created entirely on-demand and, when freed, are stored in a stack.  The reason we want to use a Stack is the performance characteristic of adding and removing objects; operations are always performed at the end of the list, which makes it highly efficient to add or remove items.  If possible, it may be prudent to pre-create a number of objects for use throughout the lifetime of your application. 
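Pre-creating objects only takes a constructor that fills the stack up front.  Here's a sketch of that variation; PreallocatedPool, its initialCount parameter, and the Available property are names I've made up for illustration, and the rest mirrors the Pool<T> above:

```csharp
using System.Collections.Generic;

public class PreallocatedPool<T> where T : new()
{
    private readonly Stack<T> _items;
    private readonly object _sync = new object();

    // Creates `initialCount` instances up front so early calls to Get() don't allocate.
    public PreallocatedPool(int initialCount)
    {
        _items = new Stack<T>(initialCount);
        for (int i = 0; i < initialCount; i++)
        {
            _items.Push(new T());
        }
    }

    // Number of instances currently waiting in the pool.
    public int Available
    {
        get { lock (_sync) { return _items.Count; } }
    }

    public T Get()
    {
        lock (_sync)
        {
            // Fall back to on-demand creation if the pool runs dry.
            return _items.Count == 0 ? new T() : _items.Pop();
        }
    }

    public void Free(T item)
    {
        lock (_sync)
        {
            _items.Push(item);
        }
    }
}
```

How many objects to pre-create is a judgment call; too many and you've simply moved the allocation cost to startup and pinned memory you may never use.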

Here's an example: the project I've been discussing lately uses a pool of byte arrays to handle incoming network messages received and sent via a Socket.  When pooling is enabled, over the course of the application's lifetime, there were 17 Gen-0 collections, 5 Gen-1 collections, and 3 Gen-2 collections; a total of 270 byte[] instances were allocated, of which 44 were eligible for pooling and were pooled.  When pooling is disabled, there were 22 Gen-0 collections, 5 Gen-1 collections, and 3 Gen-2 collections; a total of 11,660 byte[] instances were allocated, of which approximately 10,900 were eligible for pooling.  That's a lot of memory!
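If you want to gather collection counts like these for your own code, GC.CollectionCount reports how many times each generation has been collected since the process started.  The following is one possible way to take before-and-after snapshots (GcCounters is a hypothetical helper, and this is not necessarily how the numbers above were gathered):

```csharp
using System;

static class GcCounters
{
    // Snapshot of collection counts for generations 0, 1 and 2.
    public static int[] Snapshot()
    {
        return new[]
        {
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2)
        };
    }

    // Runs `work` and returns how many collections of each generation occurred meanwhile.
    public static int[] CollectionsDuring(Action work)
    {
        int[] before = Snapshot();
        work();
        int[] after = Snapshot();
        return new[]
        {
            after[0] - before[0],
            after[1] - before[1],
            after[2] - before[2]
        };
    }
}
```

Wrapping the same workload once with pooling enabled and once without gives a rough comparison; other threads allocating at the same time will add noise to the counts.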

Summary - When and Why

Object pooling is a powerful optimization technique, and if you're already using factory patterns it shouldn't be terribly foreign to you.  The .NET Framework includes the ThreadPool class as part of System.Threading.  Other objects you might consider pooling are database connections, any expensive links to unmanaged code, or anything that needs to be allocated frequently and can then be thrown away.  In my example, byte arrays are exceptionally good for this because they can be overwritten easily.


Speedy C#, Part 1: Optimizing Long if-else or switch Branches

Posted by Rob

Lately I've been doing some interesting work, which I've alluded to elsewhere, dealing with the binary communications protocol used by Blizzard Entertainment's game service.  It's kind of what brought me into C# development in the first place; I walked away from it for a few years, and now I've been digging into it again.  And I've learned a few things between then and now; I've been particularly interested in looking at the under-the-hood workings of the CLR, and so I'm starting a new series on "Speedy C#".  Let me be the first to point out that optimizations have a unique way of obfuscating code; particularly in this example, if you don't explain why you're doing what you're doing, and exactly what result you expect, you could run into trouble, or worse, your colleagues may run into trouble.  So while going through this series,

A little background: the binary protocol has about 80 or so message IDs, and generally each has a different structure.  Messages don't necessarily arrive in response to a message we sent first, so the general pattern is a receive loop that receives the data, parses it, and then raises events back to the client.  In fact, there are no synchronous requests defined by the protocol.

When I first started programming, I had handlers for every message ID in a switch/case branching construct:

switch (packetID)
{
    case BncsPacketId.Null:
        break;
    case BncsPacketId.EnterChat:
        string ecUniqueName = pck.ReadNTString();
        string ecStatstring = pck.ReadNTString();
        string ecAcctName = pck.ReadNTString();
        EnteredChatEventArgs ecArgs = new EnteredChatEventArgs(ecUniqueName, ecStatstring, ecAcctName);
        OnEnteredChat(ecArgs);
        break;
    // ... ad nauseam
}

When I looked at this in ildasm, I noticed that it declared a max stack size of something ridiculously large (sorry I don't have a specific number - it was about 6 years ago).  I also noticed that there were a LOT of branches, but not necessarily in the order in which I had written them.  The compiler had intrinsically optimized my code to perform a binary search.  Fairly interesting, optimal speed at O(log N), and something that most of us wouldn't have thought of naturally!

When I last revisited this type of development, I broke all of my handlers out of the branching conditional, calling a separate method to handle each message.  This had a nice effect of making me not have to worry about variable name collisions like I had to in the above example, and it made the code slightly more maintainable.  It's difficult to gauge on paper whether that would have been better or worse performance; there was certainly far less stack allocation, but there was an additional (potentially virtual) method call.

The latest code incorporated into my library takes a different approach: I declare a Dictionary<BncsPacketId, ParseCallback>, populate it with default handlers, and allow existing handlers to be replaced and new ones to be added provided certain conditions are met.  This has had several benefits:

  • According to MSDN, Dictionary<TKey, TValue> approaches O(1), which is (obviously) the fastest lookup we could hope for. 
  • Adding support for new or changed messages does not require changes to the parsing code, only that a handler be registered or replaced via a method call.
  • Handlers can be switched at runtime.

In this code, a ParseCallback is a delegate that accepts information provided by the message header and the message contents themselves.  This has modified the entire parsing thread to be:

private void Parse()
{
    try
    {
        while (IsConnected)
        {
            m_parseWait.Reset();

            while (m_packetQueue.Count == 0)
            {
                m_parseWait.WaitOne();
            }

            ParseData data = m_packetQueue.Dequeue();
            if (m_packetToParserMap.ContainsKey(data.PacketID))
            {
                m_packetToParserMap[data.PacketID](data);
            }
            else
            {
                switch (data.PacketID)
                {
                    #region SID_NULL
                    case BncsPacketId.Null:
                        break;
                    #endregion
                    default:
                        Trace.WriteLine(data.PacketID, "Unhandled packet");
                        if (!BattleNetClientResources.IncomingBufferPool.FreeBuffer(data.Data))
                        {
                            Debug.WriteLine(data.PacketID, "Incoming buffer was not freed for packet");
                        }
                        break;
                }
            }
        }
    }
    catch (ThreadAbortException)
    {
        // exit the thread gracefully.
    }
}
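The registration side of the dictionary isn't shown above, so here is a stripped-down sketch of the idea in isolation.  Only the names BncsPacketId, ParseCallback, ParseData, PacketID, and Data come from this post; the enum members' definitions, the PacketDispatcher class, and its SetHandler and Dispatch methods are stand-ins I've invented for illustration, not the library's actual API.

```csharp
using System;
using System.Collections.Generic;

// Stand-in definitions; the real library's types are richer than this.
public enum BncsPacketId { Null, EnterChat }

public class ParseData
{
    public BncsPacketId PacketID;
    public byte[] Data;
}

public delegate void ParseCallback(ParseData data);

public class PacketDispatcher
{
    private readonly Dictionary<BncsPacketId, ParseCallback> _handlers =
        new Dictionary<BncsPacketId, ParseCallback>();

    // Adds a handler, or replaces the existing one for that ID (possible at runtime).
    public void SetHandler(BncsPacketId id, ParseCallback handler)
    {
        if (handler == null) throw new ArgumentNullException("handler");
        _handlers[id] = handler;
    }

    // Looks up and invokes the handler; returns false if none is registered.
    public bool Dispatch(ParseData data)
    {
        ParseCallback handler;
        if (_handlers.TryGetValue(data.PacketID, out handler))
        {
            handler(data);
            return true;
        }
        return false;
    }
}
```

Note that this sketch uses TryGetValue, which performs a single hash lookup, where ContainsKey followed by the indexer performs two.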

Now, obviously, this is a very domain-specific optimization that I wouldn't make unless it makes sense in the problem domain.  For mine, it does; I am writing the library so that others are able to integrate functionality without having to worry about modifying code that they maybe are not familiar with or are worried about breaking.  If you absolutely need to use this method, be sure to document why.
