Running with Code Like with scissors, only more dangerous

6Apr/141

The technical interview, part 2: Deck of Cards

Posted by Rob Paveza

Last time, I talked about how to dazzle an interviewer who asks you to write some code to test whether a given input is a palindrome. This time, it's my favorite interview question to ask.

Interviewer: Write some code that creates a deck of cards. If you ask for more details, I'll tell you something to the effect of, Imagine I want to write a game like poker or solitaire. I need to create a standard deck of cards.

This question is deliberately open-ended to test a few things:

  • Can you identify what kinds of objects are needed?
  • Can you design good APIs?
  • Can you choose effective data structures?
  • Can you design and implement good algorithms?

As a candidate, I would want to start out by establishing what cards are: they are a collection of a suit and a rank. These are each discrete values, and could be implemented as an enumeration in languages which support them, or else constants:

var Suit = {
    clubs: 1,
    diamonds: 2,
    hearts: 3,
    spades: 4
};
var Rank = {
    ace: 1,
    two: 2,
    three: 3,
    four: 4,
    five: 5,
    six: 6,
    seven: 7,
    eight: 8,
    nine: 9,
    ten: 10,
    jack: 11,
    queen: 12,
    king: 13
};
public enum Suit 
{
    Clubs,
    Diamonds,
    Hearts,
    Spades,
}

public enum Rank
{
    Ace,
    Two,
    Three,
    Four,
    Five,
    Six,
    Seven,
    Eight,
    Nine,
    Ten,
    Jack,
    Queen,
    King,
}
enum Suit {
    clubs,
    diamonds,
    hearts,
    spades,
}

enum Rank {
    ace,
    two,
    three,
    four,
    five,
    six,
    seven,
    eight,
    nine,
    ten,
    jack,
    queen,
    king,
}

Try this in the TypeScript playground.

An enumeration is a kind of type that provides a set of named constants. Not all languages support enumerations, but they're fairly easy to simulate in most, like I did in JavaScript above.

Now that we have the suit and rank set up, we can create a class or structure to represent a Card. Why have I chosen to do this? The problem asks for a deck of cards; clearly a Deck is a discrete object which is composed of some number of other discrete objects (Cards).

function Card(suit, rank) {
    if (suit < Suit.clubs || suit > Suit.spades)
        throw new RangeError('Invalid suit.');
    if (rank < Rank.ace || rank > Rank.king)
        throw new RangeError('Invalid rank.');

    this.suit = suit;
    this.rank = rank;
}
public class Card
{
    // The readonly modifier should only be used for immutable value-types.
    // In this case since they're enumerations which are value-types, it's good.
    private readonly Suit _suit;
    private readonly Rank _rank;

    public Card(Suit suit, Rank rank) 
    {
        // Checking range is also valid (like the other language examples).
        // This is a pretty good API to know about though.
        if (!Enum.IsDefined(typeof(Suit), suit))
            throw new ArgumentOutOfRangeException("suit");
        if (!Enum.IsDefined(typeof(Rank), rank))
            throw new ArgumentOutOfRangeException("rank");

        _suit = suit;
        _rank = rank;
    }

    public Suit Suit 
    {
        get { return _suit; }
    }

    public Rank Rank 
    {
        get { return _rank; }
    }
}
class Card
{
    // TypeScript supports a shorthand for declaring properties.
    // The code that this emits will be (almost) identical to the JavaScript sample,
    // except that the range check will be done after the property assignment.
    constructor(public suit: Suit, public rank: Rank) {
        if (suit < Suit.clubs || suit > Suit.spades)
            throw new RangeError('Invalid suit.');
        if (rank < Rank.ace || rank > Rank.king)
            throw new RangeError('Invalid rank.');
    }
}

Try this in the TypeScript playground.

In object-oriented programming, a class is the definition of a custom data type. Classes create an abstract "view" of data in the same way we can think of categories of objects (for example, "books" or "animals") which have some common sets of characteristics. Classes aren't the data themselves, however; we refer to individual datums as "objects" or "object instances." Each object of a given class should have the same kinds of data and similar behaviors, but each individual object might have specific nuances or individual data. For example, any given book might have a different number of pages, but all books have *some* number of pages.

Cards are simple data types. They have two properties - rank and suit - and once the card is created, you shouldn't be able to change those values. (The samples for JavaScript and TypeScript don't enforce that restriction, and if you point that out, you'll probably get bonus points). Given these constraints, it may also be worthwhile for the C# developer to point out that a Card could be a struct because structs should represent immutable values, but it's also not invalid to use a class here.

Classes have a desirable characteristic: they represent composition (or aggregation) of multiple values. That leads us to believe that a Deck, which is composed of multiple Cards, should also be a class:

function Deck() {
    this._cards = [];
}

Deck.prototype.addCard = function(card) {
    // Don't need to do checking specifically but if you want to be really fancy:
    if (!(card instanceof Card))
        throw new RangeError("Can only add a Card to the deck.");

    this._cards.push(card);
}

Deck.prototype.dealCard = function() {
    if (this._cards.length === 0)
        throw new RangeError("No cards to deal.");

    return this._cards.pop();
}

Deck.prototype.shuffle = function(numTimes) {
    // numTimes is an optional parameter.  If it's not a number, set a reasonable default.
    if (typeof numTimes !== 'number')
        numTimes = 5;

    var cards = this._cards;
    var cardCount = cards.length;

    // Do the shuffle operation numTimes times.
    for (var time = 0; time < numTimes; time++) {
        // Visit every card position once per "time"
        for (var index = 0; index < cardCount; index++) {
            // Create a random number in the range of [0, length)
            var numToSwap = Math.floor(Math.random() * cardCount);
            // Swap the cards at index and numToSwap
            var temp = cards[numToSwap];
            cards[numToSwap] = cards[index];
            cards[index] = temp;
        }
    }
}

// Not a prototype function.
Deck.createPokerDeck = function() {
    var result = new Deck();

    // Note these operators should be <= because we want to ensure we include all suits and ranks.
    for (var suit = Suit.clubs; suit <= Suit.spades; suit++) {
        for (var rank = Rank.ace; rank <= Rank.king; rank++) {
            var card = new Card(suit, rank);
            result.addCard(card);
        }
    }

    return result;
}
public class Deck
{
    private List<Card> _cards;
    private int _curPos;

    // The keen observer will notice that _curPos always equals _cards.Length - 1.
    // Can optimize this away but it's good for illustrative purposes.
    public Deck()
    {
        _cards = new List<Card>();
        _curPos = -1;
    }

    public void AddCard(Card card)
    {
        if (card == null)
            throw new ArgumentNullException("card");

        this._cards.Add(card);
        this._curPos++;
    }

    public Card DealCard()
    {
        if (this._curPos < 0 || this._cards.Count == 0)
            throw new InvalidOperationException("There are no cards to deal.");

        var card = this._cards[this._curPos];
        this._cards.RemoveAt(this._curPos);
        this._curPos--; // Optionally, decrement operation can be combined into the previous line as long as it's postfix.

        return card;
    }

    public void Shuffle(int numTimes = 5)
    {
        List<Card> cards = this._cards;
        int cardCount = cards.Count;
        Random rng = new Random();

        // Do the shuffle operation numTimes times.
        for (int time = 0; time < numTimes; time++)
        {
            // Visit every card position once per "time"
            for (var index = 0; index < cardCount; index++)
            {
                // Create a random number in the range of [0, length)
                int indexToSwap = rng.Next(cardCount);
                // Swap the cards at index and indexToSwap
                Card temp = cards[indexToSwap];
                cards[indexToSwap] = cards[index];
                cards[index] = temp;
            }
        }
    }

    public static Deck CreatePokerDeck()
    {
        Deck result = new Deck();

        // Note these operators should be <= because we want to ensure we include all suits and ranks.
        for (int suit = (int)Suit.Clubs; suit <= (int)Suit.Spades; suit++) {
            for (int rank = (int)Rank.Ace; rank <= (int)Rank.King; rank++) {
                var card = new Card((Suit)suit, (Rank)rank);
                result.AddCard(card);
            }
        }

        return result;
    }
}
class Deck
{
    private _cards: Array<Card>;
    constructor() {
        this._cards = [];
    }

    addCard(card: Card): void {
        // You can bypass runtime type checking here if you're using all TypeScript,
        // because the TypeScript compiler will emit a warning.  Otherwise, see 
        // the JavaScript sample for runtime type checking.

        this._cards.push(card);
    }

    dealCard(): Card {
        if (this._cards.length === 0)
            throw new RangeError('No cards to deal.');

        return this._cards.pop();
    }

    shuffle(numTimes: number = 5): void {
        var cards = this._cards;
        var cardCount = cards.length;

        // Do the shuffle operation numTimes times.
        for (var time = 0; time < numTimes; time++) {
            // Visit every card position once per "time"
            for (var index = 0; index < cardCount; index++) {
                // Create a random number in the range of [0, length)
                var numToSwap = Math.floor(Math.random() * cardCount);
                // Swap the cards at index and numToSwap
                var temp = cards[numToSwap];
                cards[numToSwap] = cards[index];
                cards[index] = temp;
            }
        }
    }

    static createPokerDeck(): Deck {
        var result = new Deck();

        // Note these operators should be <= because we want to ensure we include all suits and ranks.
        for (var suit = Suit.clubs; suit <= Suit.spades; suit++) {
            for (var rank = Rank.ace; rank <= Rank.king; rank++) {
                var card = new Card(suit, rank);
                result.addCard(card);
            }
        }

        return result;
    }
}

Try this in the TypeScript playground.

Let's unpack this piece-by-piece:

Use of an Array to store the Cards in JavaScript/TypeScript, List in C#

An Array is the default collection type in JavaScript and there isn't really a compelling reason to switch. Since an Array automatically grows and shrinks as you add and remove items, and these are such common operations that the engines optimize for them, there isn't a good reason to switch to another data type. In C#, the choice of a List is an interesting one. Alternatives include a linked list, Queue, or Stack, particularly because the JavaScript variants use Stack semantics (push/pop). However, the most computationally-intensive function of a Deck, Shuffle, requires random-access, and none of those alternatives is the right data type for random access.

AddCard

This method is pretty straightforward; it adds a new card to the end of the existing collection. You might call out that you're not checking for equality; this allows a Deck to contain multiple copies of the same value Card (for example, two Kings of Diamonds), for example. One thing that you might want to check for is reference equality (so that you're not adding two of the same exact Card variable, which would be impossible with real cards). However, that shouldn't be required unless you're really, really trying to dazzle your interviewer.

DealCard

This method is also pretty straightforward. It should take one card from one end of the array. In JavaScript, it's equally valid to use pop as shift, but I would balk at an implementation using splice as too complex. Range checking is a requirement for this method.

Basic operations and constructor

As an interviewer, I would expect a candidate to at minimum identified AddCard and DealCard as fundamental operations. I would strongly hope that the candidate would also have identified a need to shuffle the cards, and shuffling should also identify the need to have a basic set of cards to populate as well (or, the candidate may want a basic set of cards, and realize that they won't be shuffled). However, I would advise against populating the deck within the constructor. The Deck is the container of Cards; you use a Deck for Euchre, but it doesn't have the same set of Cards within it as a Deck for Texas hold 'em.

CreatePokerDeck

The most obvious way to implement this is to go through each of the ranks and suits, creating a Card for each, and adding it to an originally-empty Deck. There are a number of ways to do this; you might see the following implementation:

Deck.createPokerDeck = function () {
    var result = new Deck();

    var suits = [Suit.clubs, Suit.diamonds, Suit.hearts, Suit.spades];
    var ranks = [Rank.ace, Rank.two, Rank.three, Rank.four, Rank.five, Rank.six, Rank.seven, Rank.eight, Rank.nine, Rank.ten, Rank.jack, Rank.queen, Rank.king];
    suits.forEach(function (suit) {
        ranks.forEach(function (rank) {
            var card = new Card(suit, rank);
            result.addCard(card);
        });
    });

    return result;
}
    public static Deck CreatePokerDeck()
    {
        Deck result = new Deck();

        foreach (int suit in Enumerable.Range((int)Suit.Clubs, (int)Suit.Spades))
        {
            foreach (int rank in Enumerable.Range((int)Rank.Ace, (int)Rank.King))
            {
                var card = new Card((Suit)suit, (Rank)rank);
                result.AddCard(card);
            }
        }

        return result;
    }
    static createPokerDeck(): Deck {
        var result = new Deck();

        var suits: Array<Suit> = [Suit.clubs, Suit.diamonds, Suit.hearts, Suit.spades];
        var ranks: Array<Rank> = [Rank.ace, Rank.two, Rank.three, Rank.four, Rank.five, Rank.six, Rank.seven, Rank.eight, Rank.nine, Rank.ten, Rank.jack, Rank.queen, Rank.king];
        
        suits.forEach(suit => {
            ranks.forEach(rank => {
                var card = new Card(suit, rank);
                result.addCard(card);
            });
        });

        return result;
    }

Try this in the TypeScript playground.

I would raise a flag about complexity of this kind of implementation with the candidate. You might ask about performance and if there's a better way, particularly for the JavaScript versions, but also somewhat for the C# version. Each of these introduces new method calls which are really unnecessary, because you can get the same values just by doing addition.

Shuffle

Shuffle is my favorite problem of all. There are any number of ways to do it. It tests if candidates know about and understand how to use the random number generator in their language, and also lets you glimpse into reasoning. Almost always, candidates try to swap two random cards somewhere in the deck. The most obvious question to ask, though, is how do you know how many total times to actually perform swaps? You might shuffle 104 times for a 52-card deck, but even with that, there's a reasonable chance you might have never touched a particular card. Not that such an action is unreasonable; a card in a real deck might not change position either. But iterating through every card in a deck and then generating a random number is a sure way to ensure that every card is visited once.

One interesting pivot on this problem is using the JavaScript forEach function on your array, and then swapping. In general, I prefer not to see this; forEach implies an enumeration, and modifying an object being enumerated over is kind of a no-no. JavaScript engines apply a consistent behavior, but C# will barf on it (it's invalid to modify a collection while an enumerator is active on it in C#).

For JavaScript also, the developer needs to have clearly called Math.floor(Math.random() * count). Math.random() generates a double-precision floating point number in the range of [0, 1). Multiplying this value by the length of our array gives us a floating point number in the range of [0, length), but it'll still possibly have a fractional component; calling floor gives an integer in the range of [0, length-1].

Summary

This problem exercises basic problem-solving skills, some algorithms, some bits and bytes, and really focuses on object-oriented design skills. It also happens to be the exact problem I was given during my very first technical screening for a marketing agency in Phoenix.

Next time: Reasoning about linked lists.


3Apr/140

The technical interview, part 1: Palindrome

Posted by Rob Paveza

Last time, I talked about what I expect of an interview candidate as a technical job interviewer. This time, we're going to go through an initial question.

Interviewer: Write a function/method/subroutine to determine whether a particular string is a palindrome, i.e., is the same text when reversed as it is forward.

This is a common question for an interview because it allows you to demonstrate basic problem-solving skills and you can also show your interviewer that you know how to dive into the requirements. It requires that you can implement a simple algorithm, have an understanding of how strings work in your particular language, and can work through optimizations of your algorithm. Because the problem itself is so simple, you get a chance to dazzle your interviewer with good programming practices.

Understanding the problem

A palindrome is a string like kayak which you can reverse and end up with the same string. This gives you the naïve implementation:

    IsPalindrome(test)
        let reversed = ReverseString(test)
        return reversed equals test
    end

Most languages will have a baked-in function to reverse a string. So you could do that, but could you do it better?

Interviewer: Tell me about the performance characteristics of that baseline implementation.

You've allocated a new string, and depending on that particular implementation, it could have been costly. But you've got at least an O(n)* memory usage. String equality testing is going to be character-by-character, so that's going to be O(n) execution. So, if you're testing a string that has 2 million characters, the naïve algorithm allocates an additional 2mb string and does (in theory) up to 2 million comparisons.

How can we optimize? Well, let's start with understanding the basic problem again. Let's consider kayak:

k a y a k
0 1 2 3 4

Given this string layout, what would we need to do to go faster?

  1. Compare 0 to 4. Match, continue.
  2. Compare 1 to 3. Match, continue.
  3. Only one uncompared character is remaining, therefore return true.

This can be generalized to an algorithm:

  1. Initialize start and end indices to 0 and length - 1 (or 1 and length for 1-based index programming languages).
  2. Compare characters at the start and end indices. If they don't match, return false (fast exit).
  3. Increment start and decrement end.
  4. If the end index is less than or equal to start, return true. Otherwise, loop back to the step 2.

We now know enough to author a simple method to do this:

function isPalindrome(stringToTest) {
    var start = 0, end = stringToTest.length - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (stringToTest[start] !== stringToTest[end])
            return false;
    }
    // got to the end with no mismatches, implies success
    return true;
}
class Utilities 
{
    public static bool IsPalindrome(string toTest) 
    {
        int start = 0, end = toTest.Length;
        for (/* already initialized */ ; start < end; start++, end--)
        {
            if (toTest[start] != toTest[end])
                return false;
        }
        // got to the end with no mismatches, implies success
        return true;
    }
}
bool isPalindrome(const wchar_t* toTest const) {
    auto len = wcslen(toTest);
    wchar_t* start = const_cast<wchar_t*>(toTest);
    wchar_t* end = start + len - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (*start != *end)
            return false;
    }
    return true;
}

This is a fairly optimal solution.

Interviewer: What about error cases? What kind of error cases exist, and how can you work around them?

In JavaScript, because we don't do compile-time type validation, it's possible for types to mismatch. A caller can pass in a number, an object, undefined, no value, null, a Boolean, etc. Each of these has different odd behaviors; for an Array, for example, the behavior will effectively match how things work with strings. But, for most other values, because there isn't any string coercion, and math is performed directly against the value's length property (which will generally result in NaN), and therefore that will return true. isPalindrome(true) definitely should return false, but it doesn't!

There are a couple of ways to guard against this. You can test the input parameter for its type and fail with a TypeError. You could combine this by attaching the method to the String prototype, which would enable the function to be called against any String object as an instance method.

In C#, if you pass in a null reference, the method will fail with a NullReferenceException on the first dereference of toTest.Length. This is acceptable but likely not ideal; instead, it would be preferable to fail with an ArgumentNullException. You can also show nice API design intuition by making it into an extension method.

If you're authoring in C++, a great question to ask would be the kind of error environment you need to deal with. In COM, for example, you don't want to use structured exception handling, or at least allow SEH exceptions to cross the ABI boundary. Depending on your environment, you may be able to use SEH, but be prepared to author a COM version of the API. Of course, COM also advises a different type system. COM strings tend to be BSTRs, where a null value is equivalent to a zero-length string. (WinRT uses HSTRINGs, which behave similarly to BSTRs).

String.prototype.isPalindrome = function() { // argument removed, it is "this"
    if (typeof this !== 'string' && !(this instanceof String)) // In case of call, apply, or bind
        throw new TypeError('Can only call against a string.');

    var start = 0, end = this.length - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (this[start] !== this[end])
            return false;
    }
    // got to the end with no mismatches, implies success
    return true;
}
static class StringExtensions
{
    public static bool IsPalindrome(this string toTest) 
    {
        if (toTest == null)
            throw new ArgumentNullException("toTest");

        int start = 0, end = toTest.Length;
        for (/* already initialized */ ; start < end; start++, end--)
        {
            if (toTest[start] != toTest[end])
                return false;
        }
        // got to the end with no mismatches, implies success
        return true;
    }
}
// This sample is using WRL and Windows Runtime (because it's the C++ COM library that I know)
// IDL file:
namespace MyApi 
{
    [uuid(...)]
    [version(...)]
    interface IPalindromeTest 
    {
        HRESULT IsPalindrome([in] HSTRING toTest, [out, retval] boolean* result);
    }
}

// header:
using namespace ABI::MyApi;
class PalindromeTest : public RuntimeClass<IPalindromeTest>
{
    InspectableClass(InterfaceName_MyApi_IPalindromeTest, BaseTrust)

public:
    PalindromeTest();
    virtual ~PalindromeTest();
    HRESULT RuntimeClassInitialize();

    IFACEMETHOD(IsPalindrome)(_In_ HSTRING toTest, _Out_ boolean* result);
}

// implementation:
// Other stuff - RuntimeClassInitialize, etc. - goes here
IFACEMETHODIMP PalindromeTest::IsPalindrome(_In_ HSTRING toTest, _Out_ boolean* result)
{
    IfNullReturnError(result, E_POINTER);
    HRESULT hr;
    size_t len;
    const wchar_t* strVal;

    strVal = WindowsGetStringRawBuffer(toTest, &len);
    wchar_t* start = const_cast<wchar_t*>(strVal);
    wchar_t* end = start + len - 1;
    for (/* already initialized */ ; start < end; start++, end--) {
        if (*start != *end) 
        {
            *result = false;
            return S_OK;
        }
    }
    *result = true;
    return S_OK;
}

Interviewer: This is a pretty good solution. What are some of the pitfalls or problems with it? What are some possible variations that you might do?

This is an open-ended question intended to see what you might consider for your implementation and also to gauge what you know and maybe don't know. Possible answers include:

This implementation fails to work with Unicode characters above U+FFFF. JavaScript, modulo ES6, doesn't support characters outside of the UCS-2 range. They can go back and forth to a host which does, but intrinsically, JavaScript didn't understand the idea of surrogate pairs. C++'s wchar_t and C#'s string don't directly support surrogate pairs because the underlying character type is 16-bit. In order to fix the code, you'd need to get the "real" length of the strings (factoring in the code points) and then going code-point-by-code-point instead of index-by-index.

I could implement this function recursively instead of iteratively. Generally, this implementation is better as an iterative function than a recursive function. But recognizing that it's possible, because the algorithm is really about solving progressively simple sub-problems, is a good thing to do. In languages which support tail-call recursion, this can be as well-performing as the iterative implementation.

(function() {
    function isPalindrome(toCheck, startIndex, endIndex) {
        if (endIndex >= startIndex)
            return true;

        if (toCheck[startIndex] !== toCheck[endIndex])
            return false;

        return isPalindrome(toCheck, startIndex + 1, endIndex - 1);
    }

    String.prototype.isPalindrome = function() { 
        if (typeof this !== 'string' && !(this instanceof String)) 
            throw new TypeError('Can only call against a string.');

        return isPalindrome(this, 0, this.length - 1);
    }
})();

Interviewer: Great. How would you validate this implementation?

This gives you a few options for approaches. This problem is simple enough that you could intuit the sets of cases:

  • A null value
  • A zero-length string
  • An odd-length string
  • An even-length string

You might also walk through your code to identify all of the different code paths. Doing so will help you get to the same set of cases.

Summary

If you've talked through all of this, then chances are I already think you're a pretty strong candidate. This isn't a complete list of things I might ask you to talk about, but I'm hoping it's taken 15-20 minutes, and we can talk about more next.

Up next: a deck of cards.


* O(n) notation is a mechanism by which we describe the worst-case performance characteristics of a system. For memory, it's referring to the number of bytes allocated; for CPU utilization, it's the number of times a particular loop operation must be executed. It's generally described as the largest factor in the equation; O(n) is linear time, O(n2) is polynomial time, O(log n) is logarithmic time, etc. This is a simplification, but I'd suggest having a strong grasp of this concept.


2Apr/140

The technical job interview, part 0

Posted by Rob Paveza

I've had the opportunity to be on interview loops for the last six months or so at Microsoft. I had previously done interviews as part of my job at Terralever as well, and at least for the technical part of the job, it hasn't really changed. This post isn't really about interviewing specifically at Microsoft. I'm a Program Manager, but on a programming language team, which means that my job is much more technical than most PM jobs on, for example, Office products. My expectations as a technical interviewer for PM are probably more in line with a developer's job in other parts of the company. As such, I'd also expect that they're in line for other development jobs throughout the industry.

I'd expect the specific problems are going to vary based on the team or the job you're interviewing for, but some problems are good universal fits. In general, I try to ask questions which will fit within the following buckets:

  • Basic problem-solving skills. If I give you a simple problem statement, can you talk through how you would solve the problem and then turn it into code? Can you do it on a whiteboard without a code editor? What level of detail do I need to provide? What levels of detail are you going to ask me for?
  • Performance analysis and trade-offs. While approaching your basic problems, almost always, the problems have multiple possible solutions. I hope you're identifying the alternatives as we go, but if you don't, can you discuss them when I ask you about them?
  • Bits and bytes. Do you understand the basic number operations, endianness, and two's-complement arithmetic? Do you know about specific implications for your target environment (e.g., number representation and optimization in JavaScript has special considerations)? Do you understand why this post is called "Part 0"?
  • Data structures and algorithms. Do you understand LIFO/FIFO, linked lists vs. arrays, hash tables, etc.? If you're a junior candidate and are unfamiliar altogether with the data structures, do you understand what I'm talking about if I explain them to you?
  • Object-oriented design. Can you decide what should be a class, a function, global variables, etc.? Do you know how to represent these kinds of constructs in your language of choice (e.g., prototype inheritance in JavaScript)?
  • Job-specific skills. On my team, I generally ask about JavaScript questions (looking for a PM), but I know C, C++, C#, Java, and JavaScript well, and so generally, I'm interested in asking about fundamental understanding of at least one programming language. For a web app developer, I'm going to likely ask about jQuery, perhaps Angular, Node.js, or other common libraries. But, unless I'm interviewing a senior person who should be able to join the team and immediately contribute, APIs and libraries generally get de-prioritized, because a fundamental understanding of the language is going to make it easy to quickly grok a library.
  • Debugging. What are the tools you could use to debug a problem that might crop up? Do you know the sequence of steps to diagnose a bug that I might have introduced, or that you introduced?
  • Testing. Do you know how to test and validate your code? What are the ranges of inputs you might want to test?
  • How up to date are you? Do you love your trade? Are you watching for ES6 features and specs coming out of the JavaScript language design committee? Do you have personal projects? (This isn't a required category but it generally gives bonus points).

Over the next few posts, I'm going to talk through some common interview problems, what a candidate's answers might be, and then what the interviewer might try to discuss about the answers and questions. This isn't a comprehensive guide, and you shouldn't expect to get these questions if you actually interview with me. The purpose is really to give you the mindset for an interview. What will make you successful? What do you need to know? How can you build up your skills to be ready?

Up next: testing for a palindrome. (Yes, it's super-common).

9Jan/121

Revealing Prototype Pattern: Pros and Cons

Posted by Rob Paveza

A short while ago, I wrote a post generally saying good things about the Revealing Prototype Pattern but mostly focused tearing down the other part that was presented with it, namely the way that variables were declared in a chain separated by the comma operator. This post will discuss some of the pros and cons of using this pattern, and give some examples about when you should or shouldn't use it.

Advantages

As Dan notes in his post, this features a significant improvement over straight prototype assignment (assignment of an object literal to a prototype), in that private visibility is supported. And because you're still assigning an object to the prototype, you are able to take advantage of prototypal inheritance. (Next time: How prototypal inheritance really works and how it can make your head explode).

Except for chaining his variable declarations, I have to admit Dan had me pretty well sold on the Revealing Prototype Pattern. It's very elegant; it provides good protection to its inner variables, and it uses a syntax we've been seeing more and more of ever since jQuery became a popular platform.

Unfortunately, it has some nasty drawbacks.

Disadvantages

To be fair, Dan lists some of the disadvantages about this pattern; however, he doesn't quite list them as such, and I think he was possibly unaware of some of their implications:

There's something interesting that happens with variables though, especially if you plan on creating more than one Calculator object in a page. Looking at the public functions you'll see that the 'this' keyword is used to access the currNumberCtl and eqCtl variables defined in the constructor. This works great since the caller of the public functions will be the Calculator object instance which of course has the two variables defined. However, when one of the public functions calls a private function such as setVal(), the context of 'this' changes and you'll no longer have access to the two variables.

The first time I read through that I glossed over the problems; I didn't quite understand the issue until I wrote some code. So let's do that - we'll implement the Java StringTokenizer class:

function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    this.sourceString = srcString;
    this.delimiter = delim;
}
StringTokenizer.prototype = (function()
{
    var that = this;
    
    var _tokens = that.sourceString.split(that.delimiter);
    var _index = 0;
    
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    
    var resultPrototype = 
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();

If you've ever written a jQuery plugin, you'll probably recognize what I did with the prototype assignment function; when writing jQuery plugins, it's common to close over the current instance of the jQuery object by assigning var that = $(this); so that you can write event-handler functions without losing access to the overall context. Unfortunately, what I did in this case is wrong; you may already see why.

var that = this;

In this context, this is a reference to the global object, not to the instance of the object - even though the prototype is being set. This is a generalization of what Dan said. Rewriting it to overcome it results in information leaking:

function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    this.sourceString = srcString;
    this.delimiter = delim;
    this.tokens = srcString.split(delim);
    this.index = 0;
}
StringTokenizer.prototype = (function()
{
    var _countTokens = function() { return this.tokens.length; };
    var _hasMoreTokens = function() { return this.index < this.tokens.length; };
    var _nextToken = function()
    {
        if (!this.hasMoreTokens())
            return false;
        
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };
    var _reset = function() { this.index = 0; };
    
    var resultPrototype = 
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();

The code works correctly; but you can see that we have to make public all of the state variables we'll use in the constructor. (The alternatives are to either initialize the state variables in each function, where they would still be public; or to create an init function, which would still cause the variables to be public AND would require the user to know to call the init function before calling anything else).

Dan also indicated that you needed a workaround for private functions:

There are a few tricks that can be used to deal with this, but to work around the context change I simply pass “this” from the public functions into the private functions.

Personally, I prefer to try to avoid things one might call clever or tricky, because that's code for "so complex you can't understand it". But even in the case where you have a public function, you'll still get an error if you don't reference it via a public function call. This error is nonintuitive and could otherwise make you go on a bug hunt for a long time. Consider this change to the above code:

    var _hasMoreTokens = function() { return this.index < this.tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())   // changed from:   if (!this.hasMoreTokens())
            return false;
        
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };

Simply removing the 'this' reference in the caller is enough to cause 'this' to go out-of-scope in the _hasMoreTokens function. This is completely unintuitive behavior for developers who grew up in the classical inheritance model.

Alternatives

I wouldn't want to give you all of these options without giving you an alternative. The alternative I present here is one in which the entire object is populated in the constructor:

"use strict";
function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'sourceString', { value: srcString });
        Object.defineProperty(this, 'delimiter', { value: delim });
    }
    else
    {
        this.sourceString = srcString;
        this.delimiter = delim;
    }
    
    var _tokens = this.sourceString.split(this.delimiter);
    var _index = 0;
    
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'countTokens', { value: _countTokens });
        Object.defineProperty(this, 'hasMoreTokens', { value: _hasMoreTokens });
        Object.defineProperty(this, 'nextToken', { value: _nextToken });
        Object.defineProperty(this, 'reset', { value: _reset });
    }
    else
    {
        this.countTokens = _countTokens;
        this.hasMoreTokens = _hasMoreTokens;
        this.nextToken = _nextToken;
        this.reset = _reset;
    }
}

The advantage of a structure like this one is that you always have access to this. (Note that this example is unnecessarily large because I've taken the additional step of protecting the properties with Object.defineProperty where it is supported). You always have access to private variables and you always have access to the state. The unfortunate side effect of this strategy is that it doesn't take advantage of prototypal inheritance (it's not that you can't do it with this strategy - more of that coming in the future) and that the entire private and public states (including functions) are closed-over, so you use more memory. Although, one may ask: is that really such a big deal in THIS sample?

Usage Considerations

The Revealing Prototype Pattern can be a good pattern to follow if you're less concerned with maintaining data integrity and state. You have to be careful with access non-public data and functions with it, but it's pretty elegant; and if you're working on a lot of objects, you have the opportunity to save on some memory usage by delegating the function definitions into the prototype rather than the specific object definition. It falls short, though, when trying to emulate classical data structures and enforce protection mechanisms. As such, it can require complicated or clever tricks to work around its shortcomings, which can ultimately lead to overly-complex or difficult-to-maintain code.

Like most patterns, your mileage may vary.

6Jan/120

Defining Variables in JavaScript

Posted by Rob Paveza

I've lately been reviewing different patterns and practices recently, and after reading his article about the Revealing Prototype Pattern, I wanted to take some time to analyze Dan Wahlin's approach to defining variables. It's hard to believe I found some common ground with Douglas Crockford, but as they say about broken clocks.... [Addendum: apparently, since JSlint says to 'combine with previous var statement,' I don't agree with Crockford.] Anyway, to begin, this post is inspired by Dan Wahlin's presentation of the Revealing Prototype Pattern; I noticed what I thought was a curious way for him to define his private variables, and looking back through his blog posts he discussed it in the original blog post of the series, Techniques, Strategies, and Patterns for Structuring JavaScript Code. For the most part, I like what Dan has to say, but I'm going to have to disagree when it comes to defining variables.

The Proposition

As Dan points out, this is the standard way of defining variables in JavaScript:

var eqCtl;
var currNumberCtl;
var operator;
var operatorSet = false;
var equalsPressed = false;
var lastNumber = null;

He advocates trading that for this:

var eqCtl,
    currNumberCtl,
    operator,
    operatorSet = false,
    equalsPressed = false,
    lastNumber = null;

It saves on 20 keystrokes, and he claims improved readability. Now, I disagree with Crockford's argument that, because JavaScript hoists variables to the top of the function, that you should always declare the variable there. I believe that, whenever possible, you should try to maximize locality of a variable. This is a principle discussed in Steve McConnell's Code Complete; the reasoning behind maximization of locality is that the human brain can only comprehend so much at once. (This is, of course, another argument in favor of many, simple, and small subroutines). By delaying the declaration of a variable until it needs to be used, we are able to better-comprehend the meaning of the variable and how its use affects and relates to the rest of the program. As such, I believe that one of the premises for moving these declarations into a combined var statement - specifically, to reflect the hoisting - is a poor rationale.

Let's carry on.

Similarities to Other Elements

In The Prototype Pattern, Dan demonstrates the use of a JavaScript object literal in assignment to the Calculator prototype, so that any object created using the Calculator constructor would inherit all of those properties:

Calculator.prototype = {
    add: function (x, y) {
        return x + y;
    },
    subtract: function (x, y) {
        return x - y;
    },
    multiply: function (x, y) {
        return x * y;
    },
    // ...
};

The important thing to note here is that we are simply defining an object literal; we are not writing procedural code, and that comma is not an operator! It is an important part of the JavaScript grammar, to be sure, but the comma here does not have the same semantic meaning as the comma we saw before. This subtle difference may lead to coding errors, in which someone who uses comma syntax with both will mistakenly believe they are declaring an object literal and use colons to separate the identifier from the value; or that they are declaring variables and use assignment syntax to separate the property from its value.

Comma Operator Considered Harmful

It surprises me to learn that JSlint advocates combining var declarations. Crockford's The Elements of JavaScript Style, Part 1 indicates that he isn't terribly fond of it either:

The comma operator was borrowed, like much of JavaScript's syntax, from C. The comma operator takes two values and returns the second one. Its presence in the language definition tends to mask certain coding errors, so compilers tend to be blind to some mistakes. It is best to avoid the comma operator, and use the semicolon statement separator instead.

Whichever way Crockford prefers it, I think what we need to remember is just because you CAN do something does not mean you SHOULD.

Let's consider Dan's full body of JavaScript from the Revealing Prototype Pattern. I'm going to shrink it a little bit, to emphasize the changes I'll make; and I'm removing any of his comments.

var Calculator = function (cn, eq) {
    this.currNumberCtl = cn;
    this.eqCtl = eq;
};

Calculator.prototype = function () {
    var operator = null,
        operatorSet = false,
        equalsPressed = false,
        lastNumber = null,
        add = function (x, y) { return x + y; },
        subtract = function (x, y) { return x - y; },
        multiply = function (x, y) { return x * y; },
        // I'm going to do something evil here.
        divide = function (x, y) {
            if (y == 0) {
                alert("Can't divide by 0");
            }
            return x / y;
        },
        setVal = function (val, thisObj) { thisObj.currNumberCtl.innerHTML = val; },
        setEquation = function (val, thisObj) { thisObj.eqCtl.innerHTML = val; },
        clearNumbers = function () {
            lastNumber = null;
            equalsPressed = operatorSet = false;
            setVal('0',this);
            setEquation('',this);
        },
        setOperator = function (newOperator) {
            if (newOperator == '=') {
                equalsPressed = true;
                calculate(this);
                setEquation('',this);
                return;
            }
            if (!equalsPressed) calculate(this);
            equalsPressed = false;
            operator = newOperator;
            operatorSet = true;
            lastNumber = parseFloat(this.currNumberCtl.innerHTML);
            var eqText = (this.eqCtl.innerHTML == '') ?
                lastNumber + ' ' + operator + ' ' :
                this.eqCtl.innerHTML + ' ' + operator + ' ';
            setEquation(eqText,this);
        },
        numberClick = function (e) {
            var button = (e.target) ? e.target : e.srcElement;
            if (operatorSet == true || 
                this.currNumberCtl.innerHTML == '0') {
                setVal('', this);
                operatorSet = false;
            }
            setVal(this.currNumberCtl.innerHTML + button.innerHTML, this);
            setEquation(this.eqCtl.innerHTML + button.innerHTML, this);
        },
        calculate = function (thisObj) {
            if (!operator || lastNumber == null) return;
            var displayedNumber = parseFloat(thisObj.currNumberCtl.innerHTML),
                newVal = 0;
            switch (operator) {
                case '+':
                    newVal = add(lastNumber, displayedNumber);
                    break;
                case '-':
                    newVal = subtract(lastNumber, displayedNumber);
                    break;
                case '*':
                    newVal = multiply(lastNumber, displayedNumber);
                    break;
                case '/':
                    newVal = divide(lastNumber, displayedNumber);
                    break;
            }
            setVal(newVal, thisObj);
            lastNumber = newVal;
        };
    return {
        numberClick: numberClick,
        setOperator: setOperator,
        clearNumbers: clearNumbers
    };
} ();

Note my comment: "I'm going to do something evil here." Here goes:
console.log('Hello, world')

Do you see what happened here? Let me put it in context.

        subtract = function (x, y) { return x - y; },
        multiply = function (x, y) { return x * y; },
        console.log('Hello, world')
        divide = function (x, y) {
            if (y == 0) {
                alert("Can't divide by 0");
            }
            return x / y;
        },

JavaScript semicolon insertion blew away the var when I inserted any valid statement or expression. In fact, I could have simply put 'Hello, world!' or 5 there, on its own line, and because the next line is a valid statement that stands on its own, JavaScript semicolon insertion blew away the var. As such, divide, setVal, setEquation, clearNumbers, setOperator, numberClick, and calculate were all just elevated to global scope, possibly blowing away existing variables and leaking a whole bunch of information with them. This could happen in any instance in which someone mistakenly types a semicolon (let's be honest - JavaScript is a semicolon-terminated language; it will happen somewhat frequently), or if they forget to put a comma at the end of a line.

As such, joining variable declarations together by using the comma operator is inherently an unsafe operation. You might think of it as a run-on sentence; it's not a good thing to do in English, so why would it be good to do in JavaScript or any other programming language?

And if that's not reason enough, here's another: declaring a variable is a statement. You are stating to the compiler, "I am declaring this variable to be a variable." Use the var statement to make a statement, and use the comma operator to indicate that there are operations. (Specifically, the one and only place I can think of in which a comma operator would be appropriate is if you need a single for-loop with multiple iterator variables, e.g., for (var i = 0, j = myArray.length - 1; i = 0; i++, j--). Of course, I don't want to say you should never use it, or else I'd be like Crockford with his dogmatic "I have never seen a piece of code that was not improved by refactoring it to remove the continue statement," which is patently silly.

But, beware the comma. He is correct in that it is easy to mark programming errors with commas. If you're going to declare a variable, do everyone a favor and declare it, using var, make it feel special by giving it its own line and declaration and semicolon. It will help in maintenance down the line.

28Mar/110

A Recent Discovery: IronJS

Posted by Rob Paveza

Since Microsoft first announced Managed JScript and shortly thereafter announced its demise (more on that here), I’ve been chomping at the bit for an implementation of JavaScript on the DLR.  Actually, I’ve probably been hoping for it for far longer, because JavaScript is near and dear to my heart.  If not for Brinkster offering free ASP-based web hosting back when I was just a young’un, I may never have discovered server-side programming, and may never have ended up where I am today. 

IronJS is a free, open-source implementation of ECMAScript v3 on the DLR based predominantly on F#.  It’s even the right kind of free, currently licensed under the Apache License version 2 (not a copyleft license).  Check out this picture of its testbed application running:

image

As a learning exercise, I intend to figure out a good way to shoehorn this into my Battle.net (1.0) chat client, JinxBot.  It actually shouldn’t be terribly difficult, but while I considered adding it as a plugin, I think I’d like it to be part of the core application instead. 

I’ll more than likely be covering IronJS in several parts in my blog in the coming weeks (possibly months, since I’m getting married on May 1).  But these are topics that I intend to cover:

  • Hosting IronJS in a C# application, loading, and executing JavaScript code
  • Sharing objects between script and .NET
  • Isolating script from .NET (preventing script from having run of the whole runtime)
  • Isolating script from script (running multiple environments in one application, similar to how web pages in browsers like IE and Chrome all run from one instance of the script engine but have different contexts so they have different sets of objects)
  • Performance optimizations
  • Dynamically generating isolation layers
  • Other considerations

Stay tuned!  In the meantime, check out IronJS on GitHub and get it running on your machine!

8Apr/090

Unsung C# Hero: Closure

Posted by Rob

Today I’m going to talk about a feature of C# that has been around since 2.0 (with the introduction of anonymous delegates) but which gets nearly no lip service and, despite the fact that most C# developers have probably used it, they’ve probably used it without thinking about it.  This feature is called closure, and it refers to the ability of a nested function to make reference to the surrounding function’s variables.

This article will make extensive discussion of how delegates are implemented in C#; a review may be appropriate before diving in.  Also, we’ll be making a lot of use of The Tool Formerly Known as Lutz Roeder’s .NET Reflector, which is now owned by Red Gate Software.

Anonymous Methods without Closure

Supposing that I had a reason to do so, I could assign an event handler as an anonymous method.  I think this is generally bad practice (there is no way to explicitly dissociate the event handler, because it doesn’t have a name), but you can:

    public partial class SampleNoClosure : Form
    {
        public SampleNoClosure()
        {
            InitializeComponent();

            button1.Click += delegate
            {
                MessageBox.Show("I was clicked!  See?");
            };
        }
    }

This will work as expected; on click, a small alert dialog will appear.  Nothing terribly special about that, right?  We could have written that as a lambda expression as well, not that it buys us anything.  It looks like this in Reflector:

Anonymous method with no closure.

We see that the compiler auto-generates a method that matches the appropriate signature.  Nothing here should be completely surprising.

Simple Example of Closure

Here is a sample class that includes closure.  The enclosed variable is sum.  You’ll note that everything just makes sense internally, right? 

    public partial class SimpleClosureExample : Form
    {
        public SimpleClosureExample()
        {
            InitializeComponent();

            int sum = 1;
            for (int i = 1; i <= 10; i++)
                sum += sum * i;

            button1.Click += delegate
            {
                MessageBox.Show("The sum was " + sum.ToString());
            };
        }
    }

So, it only makes sense that sum can be part of that anonymous function, right?  But we need to bear in mind that all C# code must be statically-compiled; we can’t just calculate sum.  Besides, what happens if the value was a parameter to the function?  Something that couldn’t be precompiled?  Well, in order to handle these scenarios, we need to think about how this will work.

In order to keep that method state alive, we need to create another object.  That’s how the state can be maintained regardless of threads and regardless of calls to the function.  We can see it as a nested class here, and the anonymous method looks just like it does in code:

Closure supporting class

A More Advanced Example

Whenever you work with a LINQ expression, chances are you’re using closure and anonymous functions (and lambda expressions) and don’t realize it.  Consider this LINQ-to-SQL query:

            int boardID = 811;
            int perPage = 20;
            int pageIndex = 0;

            var topics = (from topic in dc.Topics
                          orderby topic.IsSticky descending, topic.LastMessage.TimePosted descending
                          where topic.BoardID == boardID
                          select new
                          {
                              topic.TopicID,
                              Subject = topic.FirstMessage.Subject,
                              LatestSubject = topic.LastMessage.Subject,
                              LatestChange = topic.LastMessage.ModifiedTime,
                              NameOfUpdater = topic.LastMessage.PosterName,
                              Updater = topic.LastMessage.User,
                              Starter = topic.FirstMessage.User,
                              NameOfStarter = topic.FirstMessage.PosterName,
                              topic.ReplyCount,
                              topic.ViewCount
                          })
                            .Skip(perPage * pageIndex)
                            .Take(perPage);
            foreach (var topic in topics)
            {
                Console.WriteLine("{0} - {1} {2} {3} {4} by {5}", topic.Subject, topic.NameOfStarter, topic.ReplyCount, topic.ViewCount, topic.LatestChange, topic.NameOfUpdater);
            }

The closure here is happening within the where clause; you may recall that the C# where clause evaluates to the IEnumerable<T> extension method Where(Func<TSource, bool> predicate).

Here, it’s very easy to imagine a case where we wanted to write actual parameters.  This query is used to generate and display a topic list for a message board; all “stickied” posts should be at the top and the rest should be sorted by last time posted.  If I’m making that into a web server control, I’m going to need to not hard-code the board ID, the number of topics per page to display, and which page I’m looking at.

Now, this is kind of a hard thing to conceptualize; when I was going through this project, I expected all three variables to be incorporated into the class.  It turns out that Skip() and Take() don’t evaluate a lambda expression – they take straight values – so we don’t ultimately have to store them for evaluation later.  However, as expected, boardID does have to be stored, and that provides us with an interesting question of why.  And you might be asking why that is even the case; LINQ-to-SQL translates this into SQL for us:

SELECT TOP (20) [t0].[TopicID], [t2].[Subject], [t1].[Subject] AS [LatestSubject], [t1].[ModifiedTime] AS [LatestChange], [t1].[PosterName] AS [NameOfUpdater], [t4].[test], [t4].[UserID], [t4].[Username], [t4].[Email], [t4].[PasswordHash], [t6].[test] AS [test2], [t6].[UserID] AS [UserID2], [t6].[Username] AS [Username2], [t6].[Email] AS [Email2], [t6].[PasswordHash] AS [PasswordHash2], [t2].[PosterName] AS [NameOfStarter], [t0].[ReplyCount], [t0].[ViewCount]
FROM [dbo].[Topics] AS [t0]
LEFT OUTER JOIN [dbo].[Messages] AS [t1] ON [t1].[MessageID] = [t0].[LastMessageID]
LEFT OUTER JOIN [dbo].[Messages] AS [t2] ON [t2].[MessageID] = [t0].[FirstMessageID]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t3].[UserID], [t3].[Username], [t3].[Email], [t3].[PasswordHash]
    FROM [dbo].[Users] AS [t3]
    ) AS [t4] ON [t4].[UserID] = [t1].[UserID]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t5].[UserID], [t5].[Username], [t5].[Email], [t5].[PasswordHash]
    FROM [dbo].[Users] AS [t5]
    ) AS [t6] ON [t6].[UserID] = [t2].[UserID]
WHERE [t0].[BoardID] = @p0
ORDER BY [t0].[IsSticky] DESC, [t1].[TimePosted] DESC

So why, if we already have the SQL generated, do we need to bother with it?  Well, you may recall that LINQ-to-SQL doesn’t support all possible operators.  If we break support for the LINQ-to-SQL query and we have to pull back all of the relevant items, we’ll have to use that class.  At this point though, it goes unused.

Review

A closure is when you take the variables of a function and use them within a function declared inside of it – in C#, this is through anonymous delegates and lambda expressions.  C# typically will accomplish the use of closures by creating an implicit child class to contain the required state of the function as it executes, handing off the actual method to the contained class.

Further Reading

1Aug/080

My C# 4.0 Wishlist Part 6: Automatic Properties for Enum Variables

Posted by Rob

OK, so I lied; I'm not stopping at 5 parts.

I've been working with enumerations frequently lately; the Battle.net chat protocol is binary and therefore the values that come over the wire have different contextual meanings based on the values that might have preceded them.  For example, a chat message event actually can have about a dozen meanings; it can be a server-broadcasted message, a message from another user, or just an announcement that a user joined the channel.  In addition to the standard values identifying things like message type, messages typically have one form or another of flags; if the event is based on a user, the flags contain information about the user's status on the server (whether the user is an administrator or has operator privileges in the channel).  Others, such as channel information updates, contain information about the chat channel itself, such as whether it is public, silent, or otherwise normal.

The Problem

Having had to deal with enumerations frequently has made me hate code like this:

   1: if (((UserFlags)e.Flags & UserFlags.ChannelOperator) == UserFlags.ChannelOperator)

Especially when working with bitwise values (enumerations decorated with the [Flags] attribute), because of the specific operator precedence constraints that C# places on the developer, this becomes annoying quickly.  So much so, that classes where I have to do that frequently end up with several TestFlag() methods, but even these are limited.  Consider code like this:

   1: bool TestFlag(UserFlags test, UserFlags reference) { ... }
   2: bool TestFlag(ChannelFlags test, ChannelFlags reference { ... }

Or this:

   1: bool TestFlag<T>(T test, T reference) {
   2:  // hard to implement since no meaningful type constraint can be placed on T
   3: }

Or this:

   1: bool TestFlag(int test, int reference) { ... }

In proposition 1 we have to implement n methods, either repeatedly or in a globally-defined, internal utility class; that stinks.  Proposition 2 is difficult to implement; we can't place a type constraint because C# doesn't allow enum type constraints, and since enums have a type constraint themselves of always being an integral value, this would be ideal; but type constraints in this case are limited to struct, which doesn't guarantee operator | or operator &.  In proposition 3, every time we want to test, we need to cast to int (or long) and lose type information.  I guess that works, but then you worry that you end up with code like this:

   1: if (TestFlag((int)e.User.Flags, (int)UserFlags.ServerAdministrator))
   2: { 
   3:     // ...
   4: } 
   5: else if (TestFlag((int)e.User.Flags, (int)UserFlags.ChannelOperator))
   6: {
   7:     // ...
   8: } // ...

No, there's a cleaner solution, and, like the compiler features added to C# 3.0, it doesn't require a new CLR: automatic properties on enumerations.

The Solution

Internally, enumerations are treated as their base numeric type by the CLR; the variable itself carries around type information, but it's not strong and can be changed by direct casting.  But the compiler always knows the type of a local variable and can apply it directly.  So, consider applying a property to an enumeration variable called IsEnumField.  Consider this [Flags] enumeration, and look at the code that uses it when using this style of coding:

   1: if (e.User.Flags.IsNone)
   2: {   } 
   3: else if (e.User.Flags.IsBlizzardRepresentative || e.User.Flags.IsBattleNetAdministrator)
   4: {   }
   5: else if (e.User.Flags.IsChannelOperator
   6: {   }
   7: else if (e.User.Flags.IsNoUDP)
   8: {   }

We can easily identify the pattern that the compiler supports; prefix "Is" to the field name and perform the underlying logic.

The great part about this solution is that the emitted code is exactly the same as what you or I would produce right now.  So the compiler can know by its clever compiler tricks to do this:

   1: if (e.User.Flags == UserFlags.None) {   }
   2: else if ((e.User.Flags & UserFlags.BlizzardRepresentative) == UserFlags.BlizzardRepresentative
   3:        || (e.User.Flags & UserFlags.BattleNetAdministrator) == UserFlags.BattleNetAdministrator) {   }
   4: else if ((e.User.Flags & UserFlags.ChannelOperator) == UserFlags.ChannelOperator) {   }
   5: else if ((e.User.Flags & UserFlags.NoUDP) == UserFlags.NoUDP) {   }

In that example, I qualified the type name UserFlags nine times.  Can you say "carpal tunnel"?

Future-Proofing

There are some considerations to make about this.  First, there are already going to be some enumerations in the wild with field names that begin with "Is," and it could very easily raise confusion if someone sees code such as user.Flags.IsIsOnline.  Fortunately, the solution is equally simple: create a decorator attribute, just like we did for extension methods:

   1: namespace System
   2: {
   3:     [AttributeUsage(AttributeTargets.Enum)]
   4:     public sealed class EnumPropertiesAttribute : Attribute { }
   5: }

Then, when you create an enumeration that you'd like to expose these style of properties, simply decorate the enumeration with this attribute.  IntelliSense knows to show the properties, the compiler knows to translate the properties, and we're in the free and clear.

Wouldn't it be great?

The C# 4.0 Wishlist series

27Jan/080

My C# 4.0 Wishlist, Part 5 : The raise Keyword

Posted by Rob

One of the more obscure features of C# is the ability to specify custom overloads for adding and removing event registration similarly to properties, via the add and remove keywords.  Known as "event accessors," they implement the parts of event registration that the C# compiler normally handles.  You didn't think that that += operator was implemented on the type, did you?

   1:  class Test
   2:  {
   3:      public event EventHandler Event1;
   4:   
   5:      private EventHandler ev2;
   6:      public event EventHandler Event2
   7:      {
   8:          add
   9:          {
  10:              if (ev2 != null)
  11:                  ev2 = (EventHandler)Delegate.Combine(ev2, value);
  12:              else
  13:                  ev2 = value;
  14:          }
  15:          remove
  16:          {
  17:              if (ev2 != null)
  18:                  ev2 = (EventHandler)Delegate.Remove(ev2, value);
  19:          }
  20:      }
  21:      protected virtual void OnEvent2(EventArgs e)
  22:      {
  23:          if (ev2 != null)
  24:              ev2(this, e);
  25:      }
  26:  }
  27:   

This pattern is actually used extensively throughout the Windows Forms library, where controls add event handlers to base event handler collections implemented within a hashtable.  I can only surmise that this is done to prevent having dozens of event fields cluttering up the classes.

Now, if we were to compile this app and disassemble it in Reflector, we'd get a very similar picture to what we've got.  Reflector would show the compiler-generated add/remove blocks for Event1, though not when the event declaration is selected, and it also indicates that there are compiler directives that show the event accessors are synchronized.

Visual Basic .NET also supports this pattern, but adds an additional keyword: the RaiseEvent keyword:

   1:  Public Class Test
   2:      Public Event Event1 As EventHandler
   3:   
   4:      Private ev2 As EventHandler
   5:      Public Custom Event Event2 As EventHandler
   6:          AddHandler(ByVal value As EventHandler)
   7:              If Not ev2 Is Nothing Then
   8:                  ev2 = CType(System.Delegate.Combine(ev2, value), EventHandler)
   9:              Else
  10:                  ev2 = value
  11:              End If
  12:          End AddHandler
  13:   
  14:          RemoveHandler(ByVal value As EventHandler)
  15:              If Not ev2 Is Nothing Then
  16:                  ev2 = CType(System.Delegate.Remove(ev2, value), EventHandler)
  17:              End If
  18:          End RemoveHandler
  19:   
  20:          RaiseEvent(ByVal sender As Object, ByVal e As System.EventArgs)
  21:              ev2(sender, e)
  22:          End RaiseEvent
  23:      End Event
  24:   
  25:      Protected Overridable Sub OnEvent2(ByVal e As EventArgs)
  26:          If Not ev2 Is Nothing Then
  27:              RaiseEvent Event2(Me, e)
  28:          End If
  29:      End Sub
  30:  End Class

In this example, Visual Basic allows you to implement exactly how Event2 is raised.  When I look at this in Reflector to see how C# uses this, here's what I see:

Reflector view of custom VB event

Reflector gives C# the raise keyword.  Why haven't the C# language experts done so?

How would this be worthwhile?  Well, suppose that we're building an application that can have plugins.  We don't know that plugins are always going to work correctly, so when they handle an event, they may raise an exception.  The problem is, if an event is invoked and the first event handler causes an exception, none of the successive handlers will be invoked.

Arguably, the "state of the application is undefined after an exception is raised, so we should gracefully exit."  But that's not always the case!  What if the way to gracefully do this is to analyze the stack trace within the application, determine which plugin caused the exception, and unload the plugin?  We can't do any of this from C#.

Give us the raise keyword!

This is the end of my "C# 4.0 Wishlist" series.  For reference, here are the other articles:

25Jan/080

My C# 4.0 Wishlist, Part 4 : Constant typeof() Expressions

Posted by Rob

Along with some of the hacks I introduced into ShinyDesign, there was a problem using a generic parameter as an enum - I couldn't cast it back to an integral type, even System.UInt64, because T was not guaranteed to be an integral value (yet again why we should allow a type constraint, but I digress).

In any case, there have been cases where I'd like to, for instance, switch against a Type, particularly since incorporating generics.  Consider:

   1:  switch (typeof(T).GetUnderlyingType())
   2:  {
   3:      case typeof(byte):
   4:      case typeof(sbyte):
   5:          break;
   6:      case typeof(short):
   7:      case typeof(ushort):
   8:          break;
   9:      case typeof(int):
  10:      case typeof(uint):
  11:          break;
  12:      case typeof(long):
  13:      case typeof(ulong):
  14:          break;
  15:  }

This is MUCH cleaner than the alternative, current implementation:

   1:  Type t = typeof(T).GetUnderlyingType();
   2:  if (t == typeof(byte) || t == typeof(sbyte))
   3:  { }
   4:  else if (t == typeof(short) || t == typeof(ushort))
   5:  { } 
   6:  else if (t == typeof(int) || t == typeof(uint))
   7:  { } 
   8:  else if (t == typeof(long) || t == typeof(ulong))
   9:  { }

So this is a working example of how the syntax would be cleaner by allowing us to use the typeof expression result as a constant value.  If you've never tried this, the compiler complains.  Given this code:

 155:  switch (t)
 156:  {
 157:      case typeof(int):
 158:      case typeof(uint):
 159:          break;
 160:  }

I get:

EnumTypeConverter.cs(155,21): error CS0151: A value of an integral type expected

EnumTypeConverter.cs(157,22): error CS0150: A constant value is expected

EnumTypeConverter.cs(158,22): error CS0150: A constant value is expected

I'm sure you've switched over a string, though - it's one of the nice syntactical features of C#.  You might be wondering why, if switching over a string is possible, then why not a Type?

Switching on a string doesn't switch on a string - it shoots the strings into a Dictionary<string, int>, stores the offsets, and then uses a jump table with the IL switch instruction:

image

Yeah, obviously there's a lot of opportunity to misuse the typeof expressions.  But there are going to be legit uses, too, and honestly - if C# can have a compiler trick for strings, it can have a compiler trick for types.  And let's be honest - typeof() expressions aren't ever going to return different values for the same app (that's why people were locking types to synchronize across an AppDomain). 

This - like the inability to constrain a type constraint to an enum - is an artificial constraint that really shouldn't be there.