Running with Code: Like with scissors, only more dangerous

6 Feb 2012

In a disconnected world, robust code is crucial

Posted by Rob Paveza

Probably 99.99% of HTML applications and websites are served over HTTP exclusively. (I'm referring here to HTTP as a transport protocol, not making an HTTP-vs.-HTTPS distinction, and I realize that HTTP is an application-layer protocol according to OSI; but developers generally treat it as an abstraction for "the network.") As anybody who has done web programming knows, HTTP is a stateless protocol; that is, it's based on a request-response model, and in general, one request has no knowledge of previous requests. This has posed some challenges for web developers over the years, and some brilliant abstractions of state on top of that statelessness have been devised.
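Those abstractions mostly boil down to one trick: hand the client an opaque ID and keep the real state on the server, keyed by that ID. Here's a toy in-memory sketch; the sid cookie name and the map-based store are illustrative, not any particular framework, and real servers add expiry, signing, and durable storage:

```javascript
var sessions = {};   // server-side state, keyed by session ID
var nextId = 1;

// Given the request's Cookie header, find or create the caller's session.
function getSession(cookieHeader) {
    var match = /sid=(\d+)/.exec(cookieHeader || '');
    var id = (match && sessions[match[1]]) ? match[1] : String(nextId++);
    if (!sessions[id]) sessions[id] = {};
    return { id: id, data: sessions[id], setCookie: 'sid=' + id };
}
```

Each request is still stateless on the wire; only the little sid token travels back and forth, and the server rebuilds the illusion of continuity from it.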

The hard part now, though, isn't dealing with statelessness. It's dealing with the request-and-response model.

All network communication is inherently request-and-response. Some applications utilize full-duplex communications to get around that (think of chat software), but for the most part, that isn't really available behind the firewall. Web sockets have yet to be standardized (and there are some questions about long-term compatibility with WebSocket-ignorant proxies), and corporate firewalls typically say no to outbound connections except on ports 80 and 443. Some applications (think Meebo) have been able to get around this limitation by cleverly using long-timeout AJAX requests: the client makes a request to the server, and the server either responds immediately (if an event is in queue) or holds the request for 30-90 seconds to see if an event comes in. I even did this once myself with good success, although I never took that app into production. (There was also some question about the total number of clients an ASP.NET server could sustain whilst holding threads that way.)
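The long-poll loop described there can be sketched as follows. The transport function is injected so the same loop works with XMLHttpRequest in a browser or with a fake in a test, and the '/pollEvents' endpoint is a made-up name, not a real API:

```javascript
// Minimal long-poll loop: keep one request parked at the server at all times.
function startLongPoll(transport, onEvent) {
    var stopped = false;
    function poll() {
        if (stopped) return;
        transport('/pollEvents', function (err, events) {
            if (!err && events) {
                for (var i = 0; i < events.length; i++)
                    onEvent(events[i]);
            }
            poll();   // reissue immediately; the server holds the request open
        });
    }
    poll();
    return { stop: function () { stopped = true; } };
}
```

The interesting property is that the loop re-issues a request the moment one completes, so the server always has a pending request it can answer; that's what lets a plain request-response channel imitate push.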

In many respects, Windows developers haven't had to deal with this. We could issue synchronous requests, and the UI would stand still for a second, and either it would work or it would fail. But usability concerns over this process, as well as issues with high network latency (imagine pressing the "Submit" button and having to wait 20 seconds while your app freezes - by then, I've force-closed the app) have seen platform providers decree that asynchrony is the only way to go.

HTML isn't the only platform dealing with this limitation. Adobe Flash has long had an asynchronous-only communication model, and Microsoft Silverlight has carried on that principle; of course, these two have lived predominantly in browsers, where a hanging UI probably means interfering with other apps as well as the one making the request. Interestingly, WinRT - the Windows 8 developer framework - is also going to mandate an asynchronous model, following the Silverlight-based trail blazed by Windows Phone 7.

So as we trek out into the world of asynchrony, well, we have a whole mess of questions to deal with now:

  • If there's an error, does it show up in the calling method or in the callback method? Does it even show up?
  • Does a network (transport-level) error surface differently than an application error? What if the server returned an HTTP 403 Forbidden response?
  • What are all of the different kinds of errors that can crop up? Do I need to handle SocketException or is that going to be abstracted to something more meaningful to my application?
  • What do I do if a network error comes up? Do I assume that I'm offline, panic, and quit? What if my application only makes sense "online"?
  • Do I surface an error to the customer? Silently fail? I might generally fail silently if I'm a background process, but then again, what if it's an important one? What if the customer thought he was saving his draft while all along it was offline, and then the customer closes the browser?
  • During the async operation, should I show the user a timeout spinner or something to that effect?
  • How should I design my async operations? For example, consider a Save operation. Should I capture all of my state at once and send it off, and let the user immediately keep working? Should I make the user wait until saving completes? Should I even use Save, or automatically save whenever something changes?
  • If I use auto-save, how do I handle undo? What if I want to undo between sessions? Is there a way to go back if the hosting application crashes? (Worst case scenario: the user accidentally hit Select All, Delete and then the browser crashed after the auto-save).
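A few of the questions above can at least be forced into the open in code. Here's one hedged sketch of a Save operation that routes transport errors and application errors (like that 403) to separate callbacks, so the caller has to decide what to do in each case; the transport signature and the '/save' endpoint are stand-ins for illustration:

```javascript
// Surface transport errors and application errors separately, so the UI layer
// must explicitly decide: retry, warn the user, or fail silently.
function saveDraft(draft, transport, handlers) {
    transport('/save', draft, function (networkError, status, body) {
        if (networkError) {
            handlers.onTransportError(networkError);     // offline, DNS, timeout...
        } else if (status >= 400) {
            handlers.onApplicationError(status, body);   // e.g. HTTP 403 Forbidden
        } else {
            handlers.onSaved(body);
        }
    });
}
```

The point isn't this particular shape; it's that each of the question-list decisions becomes an explicit branch somebody has to write, rather than an exception lost in a callback.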

This merely scratches the surface of the kinds of questions we'll need to begin asking ourselves. It doesn't even touch the difficulties of programming asynchronously, which C# 5 is going to handle extraordinarily well, though many developers will be unable to take advantage of those updates. For example, suppose I have a widget on my page that monitors the status of a long-running server-side process. I need JavaScript on my page that monitors that process and updates my widget accordingly. Should I:

  • Write a singleton object? This might be easier and afford strong member protection, but I can only have one widget, unless I somehow differentiate between them and multiplex, which can become hairy quickly.
  • Should the monitoring function accept a callback, or should it be event-based, so that multiple subscribers can listen? (Maybe an event-based model offers some interesting ways to deal with the complexities of a singleton?)
  • Should the widget manipulate the view directly, or should I write separate code that handles the view based on the state of the object (or objects)?

The list goes on.
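For what it's worth, the event-based option from the list above might look something like this; ProcessMonitor and the injected checkStatus function are hypothetical names, and real code would drive poll() from a timer:

```javascript
// Event-based monitor: any number of widgets can subscribe, so there's no
// singleton multiplexing problem; checkStatus is the injected server query.
function ProcessMonitor(checkStatus) {
    var subscribers = [];
    this.subscribe = function (fn) {
        subscribers.push(fn);
        return function unsubscribe() {
            var i = subscribers.indexOf(fn);
            if (i >= 0) subscribers.splice(i, 1);
        };
    };
    this.poll = function () {
        checkStatus(function (status) {
            for (var i = 0; i < subscribers.length; i++)
                subscribers[i](status);
        });
    };
}
```

Separating the monitor from the view code that subscribes to it also answers the third bullet: each widget updates itself from the status events, and the monitor never touches the DOM.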

We're moving faster and faster into an asynchronous world. It is already happening, and we as developers need to be prepared to handle these difficulties. We also need to understand how to communicate these kinds of questions to our business analysts, our supervisors, and our customers. We need to equip ourselves to ask the right questions of our customers, so that when it's time to make a decision, we have the information we need.

10 Jan 2012

Please, don’t ever use the “Remove Unused Namespaces” feature

Posted by Rob Paveza

Seriously, I don't know why this feature exists or who thought it would be a good idea. Unused using directives cost essentially nothing: the time it takes to compile even the most complicated C# file is trivial.

On the other hand, when I go to use the .Take() method from System.Linq, or the Encoding class from System.Text, or suddenly need a new List<T> from System.Collections.Generic, and someone has run a tool that stripped the "unused" namespaces from the code file, I'm left wondering why my editor suddenly can't resolve them.

There's a reason that many default namespaces are included with a default C# code file. Leave them there! Maintenance programmers will love you forever.

9 Jan 2012

Revealing Prototype Pattern: Pros and Cons

Posted by Rob Paveza

A short while ago, I wrote a post generally saying good things about the Revealing Prototype Pattern, but mostly focused on tearing down the other practice presented alongside it, namely the way that variables were declared in a chain separated by the comma operator. This post will discuss some of the pros and cons of using the pattern itself, and give some examples of when you should or shouldn't use it.

Advantages

As Dan notes in his post, this offers a significant improvement over straight prototype assignment (assigning an object literal to a prototype), in that private visibility is supported. And because you're still assigning an object to the prototype, you can still take advantage of prototypal inheritance. (Next time: how prototypal inheritance really works and how it can make your head explode.)
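For readers who haven't seen it, here's the pattern in miniature - a stripped-down sketch, not Dan's actual Calculator: an immediately-invoked function keeps add private until the returned literal reveals it, and that literal becomes the shared prototype:

```javascript
function Calculator() {}
Calculator.prototype = (function () {
    var add = function (x, y) { return x + y; };   // private until revealed
    return { add: add };                           // ...revealed here
})();

var calc = new Calculator();
calc.add(2, 3);   // 5 - resolved through the prototype chain, shared by all instances
```

Because add lives on the prototype rather than on each instance, a page full of Calculators pays for the function once.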

Except for chaining his variable declarations, I have to admit Dan had me pretty well sold on the Revealing Prototype Pattern. It's very elegant; it provides good protection to its inner variables, and it uses a syntax we've been seeing more and more of ever since jQuery became a popular platform.

Unfortunately, it has some nasty drawbacks.

Disadvantages

To be fair, Dan lists some of the disadvantages of this pattern; however, he doesn't quite frame them as such, and I think he was possibly unaware of some of their implications:

There's something interesting that happens with variables though, especially if you plan on creating more than one Calculator object in a page. Looking at the public functions you'll see that the 'this' keyword is used to access the currNumberCtl and eqCtl variables defined in the constructor. This works great since the caller of the public functions will be the Calculator object instance which of course has the two variables defined. However, when one of the public functions calls a private function such as setVal(), the context of 'this' changes and you'll no longer have access to the two variables.

The first time I read through that I glossed over the problems; I didn't quite understand the issue until I wrote some code. So let's do that - we'll implement the Java StringTokenizer class:

function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    this.sourceString = srcString;
    this.delimiter = delim;
}
StringTokenizer.prototype = (function()
{
    var that = this;
    
    var _tokens = that.sourceString.split(that.delimiter);
    var _index = 0;
    
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    
    var resultPrototype = 
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();

If you've ever written a jQuery plugin, you'll probably recognize what I did with the prototype assignment function; when writing jQuery plugins, it's common to close over the current instance of the jQuery object by assigning var that = $(this); so that you can write event-handler functions without losing access to the overall context. Unfortunately, what I did in this case is wrong; you may already see why.

var that = this;

In this context, this is a reference to the global object, not to the instance - even though the prototype is being assigned. (In fact, the version above throws a TypeError the moment the prototype function runs, because that.sourceString is undefined.) This is a generalization of what Dan said. Rewriting the code to work around it results in leaking internal state:

function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    this.sourceString = srcString;
    this.delimiter = delim;
    this.tokens = srcString.split(delim);
    this.index = 0;
}
StringTokenizer.prototype = (function()
{
    var _countTokens = function() { return this.tokens.length; };
    var _hasMoreTokens = function() { return this.index < this.tokens.length; };
    var _nextToken = function()
    {
        if (!this.hasMoreTokens())
            return false;
        
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };
    var _reset = function() { this.index = 0; };
    
    var resultPrototype = 
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();

The code works correctly; but you can see that we have to make public all of the state variables we'll use in the constructor. (The alternatives are to either initialize the state variables in each function, where they would still be public; or to create an init function, which would still cause the variables to be public AND would require the user to know to call the init function before calling anything else).

Dan also indicated that you needed a workaround for private functions:

There are a few tricks that can be used to deal with this, but to work around the context change I simply pass “this” from the public functions into the private functions.

Personally, I prefer to avoid things one might call clever or tricky, because that's code for "so complex you can't understand it." But even when a function is public, you'll still get an error if you call it without going through this. The error is nonintuitive and could otherwise send you on a long bug hunt. Consider this change to the above code:

    var _hasMoreTokens = function() { return this.index < this.tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())   // changed from:   if (!this.hasMoreTokens())
            return false;
        
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };

Simply removing the 'this' reference in the caller is enough to leave 'this' bound to the global object (rather than the StringTokenizer instance) inside the _hasMoreTokens function, so this.tokens is undefined and the call blows up. This is completely unintuitive behavior for developers who grew up in the classical inheritance model.
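Dan's workaround from the quote above - passing 'this' from the public functions into the private ones - can be made concrete like this (a standalone sketch, not his code):

```javascript
// The private function takes the instance as an explicit parameter...
var _hasMoreTokens = function (self) { return self.index < self.tokens.length; };
var _nextToken = function () {
    if (!_hasMoreTokens(this))   // ...and the public function forwards 'this'
        return false;
    var next = this.tokens[this.index];
    this.index += 1;
    return next;
};
var tok = { tokens: ['a', 'b'], index: 0, nextToken: _nextToken };
tok.nextToken();   // 'a'
```

It works, but every private call site now has to remember to forward the instance - exactly the kind of ceremony that invites the bug it's meant to prevent.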

Alternatives

I wouldn't want to raise all of these objections without giving you an alternative. The alternative I present here is one in which the entire object is populated in the constructor:

"use strict";
function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'sourceString', { value: srcString });
        Object.defineProperty(this, 'delimiter', { value: delim });
    }
    else
    {
        this.sourceString = srcString;
        this.delimiter = delim;
    }
    
    var _tokens = this.sourceString.split(this.delimiter);
    var _index = 0;
    
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'countTokens', { value: _countTokens });
        Object.defineProperty(this, 'hasMoreTokens', { value: _hasMoreTokens });
        Object.defineProperty(this, 'nextToken', { value: _nextToken });
        Object.defineProperty(this, 'reset', { value: _reset });
    }
    else
    {
        this.countTokens = _countTokens;
        this.hasMoreTokens = _hasMoreTokens;
        this.nextToken = _nextToken;
        this.reset = _reset;
    }
}

The advantage of a structure like this one is that you always have access to this - and you always have access to the private variables and the instance state. (Note that this example is unnecessarily large because I've taken the additional step of protecting the properties with Object.defineProperty where it is supported.) The unfortunate side effect of this strategy is that it doesn't take advantage of prototypal inheritance (it's not that you can't do it with this strategy - more on that in the future), and the entire private and public state, including the functions, is closed over, so every instance uses more memory. Although, one may ask: is that really such a big deal in THIS sample?
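The protection Object.defineProperty buys in the constructor above, shown in miniature as a standalone sketch: a property defined without writable: true defaults to read-only, so accidental reassignment is ignored in non-strict code (and throws in strict mode):

```javascript
var obj = {};
Object.defineProperty(obj, 'greet', { value: function () { return 'hi'; } });

try {
    obj.greet = null;   // silently ignored in non-strict code...
} catch (e) {
    // ...or a TypeError in strict mode; either way, greet survives
}
```

That's the difference between "please don't touch this" and the runtime actually refusing the write.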

Usage Considerations

The Revealing Prototype Pattern can be a good pattern to follow if you're less concerned with maintaining data integrity and state. You have to be careful when accessing non-public data and functions, but it's pretty elegant; and if you're creating a lot of objects, you have the opportunity to save some memory by putting the function definitions on the prototype rather than on each instance. It falls short, though, when trying to emulate classical data structures and enforce protection mechanisms. As such, it can require complicated or clever tricks to work around its shortcomings, which can ultimately lead to overly complex or difficult-to-maintain code.

Like most patterns, your mileage may vary.

6 Jan 2012

Defining Variables in JavaScript

Posted by Rob Paveza

I've been reviewing different patterns and practices lately, and after reading Dan Wahlin's article about the Revealing Prototype Pattern, I wanted to take some time to analyze his approach to defining variables. It's hard to believe I found some common ground with Douglas Crockford, but as they say about broken clocks.... [Addendum: apparently, since JSLint says to 'combine with previous var statement,' I don't agree with Crockford after all.] Anyway, to begin: this post is inspired by Dan's presentation of the Revealing Prototype Pattern; I noticed what I thought was a curious way for him to define his private variables, and looking back through his blog posts, he discussed it in the original post of the series, Techniques, Strategies, and Patterns for Structuring JavaScript Code. For the most part, I like what Dan has to say, but I'm going to have to disagree when it comes to defining variables.

The Proposition

As Dan points out, this is the standard way of defining variables in JavaScript:

var eqCtl;
var currNumberCtl;
var operator;
var operatorSet = false;
var equalsPressed = false;
var lastNumber = null;

He advocates trading that for this:

var eqCtl,
    currNumberCtl,
    operator,
    operatorSet = false,
    equalsPressed = false,
    lastNumber = null;

It saves 20 keystrokes, and he claims improved readability. Now, I disagree with Crockford's argument that, because JavaScript hoists variables to the top of the function, you should always declare them there. I believe that, whenever possible, you should try to maximize the locality of a variable. This is a principle discussed in Steve McConnell's Code Complete; the reasoning behind maximizing locality is that the human brain can only comprehend so much at once. (This is, of course, another argument in favor of many, simple, and small subroutines.) By delaying the declaration of a variable until it needs to be used, we are better able to comprehend the meaning of the variable and how its use affects and relates to the rest of the program. As such, I believe that one of the premises for moving these declarations into a combined var statement - specifically, to reflect the hoisting - is a poor rationale.
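To make the locality argument concrete, a trivial hypothetical example - the declaration sits where the computation starts, not in a block of declarations at the top of the function:

```javascript
function summarize(orders) {
    // ...imagine validation or other unrelated work here...

    var total = 0;   // declared at the point the computation begins
    for (var i = 0; i < orders.length; i++)
        total += orders[i].amount;
    return total;
}
```

A reader who encounters total for the first time sees its initial value and its use in the same breath, instead of scrolling back up to a declaration block.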

Let's carry on.

Similarities to Other Elements

In The Prototype Pattern, Dan demonstrates the use of a JavaScript object literal in assignment to the Calculator prototype, so that any object created using the Calculator constructor would inherit all of those properties:

Calculator.prototype = {
    add: function (x, y) {
        return x + y;
    },
    subtract: function (x, y) {
        return x - y;
    },
    multiply: function (x, y) {
        return x * y;
    },
    // ...
};

The important thing to note here is that we are simply defining an object literal; we are not writing procedural code, and that comma is not an operator! It is an important part of the JavaScript grammar, to be sure, but the comma here does not have the same semantic meaning as the commas we saw before. This subtle difference invites coding errors: someone used to comma-chained syntax in both places may slip into object-literal mode and write colons between identifiers and values inside a var list, or slip into declaration mode and write assignments between properties and values inside a literal.

Comma Operator Considered Harmful

It surprises me to learn that JSLint advocates combining var declarations. Crockford's The Elements of JavaScript Style, Part 1 indicates that he isn't terribly fond of the comma operator either:

The comma operator was borrowed, like much of JavaScript's syntax, from C. The comma operator takes two values and returns the second one. Its presence in the language definition tends to mask certain coding errors, so compilers tend to be blind to some mistakes. It is best to avoid the comma operator, and use the semicolon statement separator instead.

Whichever way Crockford prefers it, I think what we need to remember is just because you CAN do something does not mean you SHOULD.
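To see why the distinction matters, here are the three commas a reader has to keep apart, in isolation:

```javascript
var x = (1, 2);             // the comma OPERATOR: evaluates 1, discards it, yields 2

var obj = { a: 1, b: 2 };   // comma as a separator in an object literal

var c = 3, d = 4;           // comma as a separator in a var declaration list

var e = (c++, d++);         // the operator again: e gets d's old value, 4
```

Three different grammatical roles, one glyph - which is precisely the "masking" problem Crockford describes.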

Let's consider Dan's full body of JavaScript from the Revealing Prototype Pattern. I'm going to shrink it a little bit, to emphasize the changes I'll make; and I'm removing any of his comments.

var Calculator = function (cn, eq) {
    this.currNumberCtl = cn;
    this.eqCtl = eq;
};

Calculator.prototype = function () {
    var operator = null,
        operatorSet = false,
        equalsPressed = false,
        lastNumber = null,
        add = function (x, y) { return x + y; },
        subtract = function (x, y) { return x - y; },
        multiply = function (x, y) { return x * y; },
        // I'm going to do something evil here.
        divide = function (x, y) {
            if (y == 0) {
                alert("Can't divide by 0");
            }
            return x / y;
        },
        setVal = function (val, thisObj) { thisObj.currNumberCtl.innerHTML = val; },
        setEquation = function (val, thisObj) { thisObj.eqCtl.innerHTML = val; },
        clearNumbers = function () {
            lastNumber = null;
            equalsPressed = operatorSet = false;
            setVal('0',this);
            setEquation('',this);
        },
        setOperator = function (newOperator) {
            if (newOperator == '=') {
                equalsPressed = true;
                calculate(this);
                setEquation('',this);
                return;
            }
            if (!equalsPressed) calculate(this);
            equalsPressed = false;
            operator = newOperator;
            operatorSet = true;
            lastNumber = parseFloat(this.currNumberCtl.innerHTML);
            var eqText = (this.eqCtl.innerHTML == '') ?
                lastNumber + ' ' + operator + ' ' :
                this.eqCtl.innerHTML + ' ' + operator + ' ';
            setEquation(eqText,this);
        },
        numberClick = function (e) {
            var button = (e.target) ? e.target : e.srcElement;
            if (operatorSet == true || 
                this.currNumberCtl.innerHTML == '0') {
                setVal('', this);
                operatorSet = false;
            }
            setVal(this.currNumberCtl.innerHTML + button.innerHTML, this);
            setEquation(this.eqCtl.innerHTML + button.innerHTML, this);
        },
        calculate = function (thisObj) {
            if (!operator || lastNumber == null) return;
            var displayedNumber = parseFloat(thisObj.currNumberCtl.innerHTML),
                newVal = 0;
            switch (operator) {
                case '+':
                    newVal = add(lastNumber, displayedNumber);
                    break;
                case '-':
                    newVal = subtract(lastNumber, displayedNumber);
                    break;
                case '*':
                    newVal = multiply(lastNumber, displayedNumber);
                    break;
                case '/':
                    newVal = divide(lastNumber, displayedNumber);
                    break;
            }
            setVal(newVal, thisObj);
            lastNumber = newVal;
        };
    return {
        numberClick: numberClick,
        setOperator: setOperator,
        clearNumbers: clearNumbers
    };
} ();

Note my comment: "I'm going to do something evil here." Here goes:
console.log('Hello, world')

Do you see what happened here? Let me put it in context.

        subtract = function (x, y) { return x - y; },
        multiply = function (x, y) { return x * y; }   // note: the trailing comma is gone
        console.log('Hello, world')
        divide = function (x, y) {
            if (y == 0) {
                alert("Can't divide by 0");
            }
            return x / y;
        },

For the insertion to do its damage, the comma at the end of the multiply line must also be gone - forgotten outright, or mistyped as a semicolon. (With the comma still in place, the inserted line is a plain syntax error, because console.log('Hello, world') is not a valid variable declarator.) Once the var list has been terminated, JavaScript's automatic semicolon insertion closes off each following line as its own statement; in fact, I could have simply put 'Hello, world!' or 5 there, on its own line, with the same effect. As a result, divide, setVal, setEquation, clearNumbers, setOperator, numberClick, and calculate are all silently promoted to the global scope, possibly blowing away existing variables and leaking a whole bunch of internal state with them. This can happen any time someone mistakenly types a semicolon (let's be honest - JavaScript is a semicolon-terminated language; it will happen somewhat frequently), or forgets the comma at the end of a line.

As such, joining variable declarations together into one comma-separated chain is inherently unsafe. You might think of it as a run-on sentence; it's not a good thing to do in English, so why would it be good in JavaScript or any other programming language?

And if that's not reason enough, here's another: declaring a variable is a statement. You are stating to the compiler, "I am declaring this variable to be a variable." Use the var statement to make a statement, and save the comma operator for actual operations. (Specifically, the one and only place I can think of where the comma operator is appropriate is a single for-loop with multiple iterator variables, e.g., for (var i = 0, j = myArray.length - 1; i < j; i++, j--) - and even there, only i++, j-- is the comma operator; the commas in the initializer belong to the var declaration list.) Of course, I don't want to say you should never use it, or else I'd be like Crockford with his dogmatic "I have never seen a piece of code that was not improved by refactoring it to remove the continue statement," which is patently silly.

But beware the comma. He is correct that commas make it easy to mask programming errors. If you're going to declare a variable, do everyone a favor and declare it: give it its own var, its own line, and its own semicolon. It will help in maintenance down the line.

28 Mar 2011

A Recent Discovery: IronJS

Posted by Rob Paveza

Since Microsoft first announced Managed JScript and shortly thereafter announced its demise (more on that here), I’ve been chomping at the bit for an implementation of JavaScript on the DLR.  Actually, I’ve probably been hoping for it for far longer, because JavaScript is near and dear to my heart.  If not for Brinkster offering free ASP-based web hosting back when I was just a young’un, I may never have discovered server-side programming, and may never have ended up where I am today. 

IronJS is a free, open-source implementation of ECMAScript v3 on the DLR, written predominantly in F#.  It’s even the right kind of free, currently licensed under the Apache License version 2 (not a copyleft license).  Check out this picture of its testbed application running:

[screenshot: the IronJS testbed application running]

As a learning exercise, I intend to figure out a good way to shoehorn this into my Battle.net (1.0) chat client, JinxBot.  It actually shouldn’t be terribly difficult, but while I considered adding it as a plugin, I think I’d like it to be part of the core application instead. 

I’ll more than likely be covering IronJS in several parts in my blog in the coming weeks (possibly months, since I’m getting married on May 1).  But these are topics that I intend to cover:

  • Hosting IronJS in a C# application, loading, and executing JavaScript code
  • Sharing objects between script and .NET
  • Isolating script from .NET (preventing script from having run of the whole runtime)
  • Isolating script from script (running multiple environments in one application, similar to how web pages in browsers like IE and Chrome all run from one instance of the script engine but have different contexts so they have different sets of objects)
  • Performance optimizations
  • Dynamically generating isolation layers
  • Other considerations

Stay tuned!  In the meantime, check out IronJS on GitHub and get it running on your machine!

15 Mar 2011

The Microsoft Reactive Extensions

Posted by Rob Paveza

The honest truth is that I’m having difficulty establishing exactly what they could be used for, but they’re still really cool.  The Microsoft Reactive Extensions for the .NET Framework are the dual of LINQ: whereas LINQ operates over objects, or you might say pulls objects out of collections, the Reactive Extensions (Rx) handles push notifications.  It is the ultimate generalization of events and event handling within .NET.

Getting There

First, let’s consider the normal interfaces for IEnumerable:

interface IEnumerable<T>
{
    IEnumerator<T> GetEnumerator();
}

interface IEnumerator<T> : IDisposable
{
    T Current { get; }  // throws exception at end of enumeration
    bool MoveNext();
}

These interfaces (okay, really, the non-generic IEnumerable interface, but let’s not split hairs) are the foundation of the foreach C# keyword (and the For Each… In in Visual Basic).  A foreach can also be written, roughly, as:

foreach (string str in myListOfStrings)
    Console.WriteLine(str);
// rewritten:
using (IEnumerator<string> enumStr = myListOfStrings.GetEnumerator())
{
    while (enumStr.MoveNext())
    {
        Console.WriteLine(enumStr.Current);
    }
}

Keep this example in mind for later, because we’ll revisit how this can be used in Rx programming.

Dualism

Dualism is something of a mathematical concept, and I don’t want to get into it because I don’t completely understand it myself, but most nerdy people reading my blog will probably appreciate an example from particle physics.  Consider a proton: its physical dual is the antiproton (because when they meet, they annihilate each other; it’s not the electron, because while proton and electron have opposite charges, they have substantially different masses).

The core of Rx is the dual of IEnumerable.  That is, IObservable<T> and IObserver<T>.  But let’s deconstruct these piece by piece.  Let’s start at IEnumerator<T>:

interface IObserver<T>
{
    // T Current { get; }
    // That method looks like: T get_Current();
    void OnNext(T next);
    // Current throws an exception if MoveNext() previously returned false, so:
    void OnError(Exception error);

    // bool MoveNext() 
    // returns true while Current is populated, false when we reach the end, so:
    void OnCompleted();
}

You can see that, whereas everything in IEnumerator<T> pulled data, now we’ve transitioned into pushing data.  But the observer isn’t really the cool part; rather, it’s the subject that’s cool:

interface IObservable<T>
{
    // GetEnumerator() returned an object; here we pass one in
    // We still needed to capture the disposable functionality, so we return IDisposable
    IDisposable Subscribe(IObserver<T> observer);
}
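To connect this back to the earlier JavaScript posts, the same pull-to-push inversion can be sketched in a few lines of JavaScript; this mirrors the shape of the interfaces above, not the real Rx library:

```javascript
// A toy push-based counterpart of the pull-based foreach: the observable
// calls the observer, instead of the caller pulling values out one by one.
function fromArray(items) {
    return {
        subscribe: function (observer) {
            var cancelled = false;
            for (var i = 0; i < items.length && !cancelled; i++)
                observer.onNext(items[i]);
            if (!cancelled) observer.onCompleted();
            // The returned object plays the role of Subscribe's IDisposable.
            return { dispose: function () { cancelled = true; } };
        }
    };
}

var seen = [];
fromArray([1, 2, 3]).subscribe({
    onNext: function (v) { seen.push(v); },
    onError: function (e) { /* a failure would land here instead */ },
    onCompleted: function () { seen.push('done'); }
});
```

The subscriber writes no loop at all; the values arrive, then the completion signal, which is exactly the inversion Rx generalizes.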

Now, if you want to see the specifics about how these were constructed, you can check out the Expert-to-Expert video on Channel 9.  I’ve included some high-level notes, but they’re not really as deep as you can get with these guys.

Creating a Subject

Creating a subject is a bit of a challenge; subjects are event-driven, and those are generally kind of difficult to think about because they usually fit into only one of two buckets: user interaction and system I/O.  For the sake of example, I’ve created a simple Windows Forms project to start with, which has a couple of observable Buttons (the class is called ObservableButton, go figure) and an observer, which is the containing form.  You can download the starter project, which requires Visual Studio 2010 and the Rx Framework.

Subjects can be anything, though, and the power you can glean from these is amazing.  For the Red Bull NASCAR team, I created a server for a Twitter feed aggregator using Rx.  It started as reading a socket into HTTP data, then into chunked HTTP data, then into JSON packets, then into POCO objects that were then re-serialized and sent over the wire to N Flash clients.  As you can imagine, network programming, social programming, or any other kind of programming where an event is coming in unpredictably is a great candidate for this.  Why?

Let’s look at the use case I just listed.  As Twitter’s live stream service sends data over the wire, I need to parse it and send it to a lot of listening sockets.  But I don’t want to just say “Oh, I just got the data, let me send it out again” – that could slow down processing on other threads, because I might have to wait – my socket might already be in the middle of sending data, and so it’s in an invalid state to send further data.  If I had tied a server socket directly to the “I’m ready to send” signal, I would have been in trouble.  Rather, I had a utility (an Observer) that aggregated incoming messages until all server sockets were ready to send, at which point it would push those updated messages to the server sockets.
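That aggregating observer can be sketched roughly like this (BufferingRelay and its members are hypothetical names of mine – the production code did considerably more, including thread synchronization):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the buffering observer described above: incoming messages are
// queued instead of being sent immediately, and the backlog is flushed only
// when the downstream senders report that they are all ready.
// (Hypothetical illustration; not the actual server code.)
class BufferingRelay<T>
{
    private readonly Queue<T> _pending = new Queue<T>();
    private readonly Action<IList<T>> _send;

    public BufferingRelay(Action<IList<T>> send)
    {
        _send = send;
    }

    // Called by the upstream source (e.g., the parsed Twitter stream).
    public void OnNext(T message)
    {
        _pending.Enqueue(message);
    }

    // Called when every server socket signals it is ready to send.
    public void OnReadyToSend()
    {
        if (_pending.Count == 0)
            return;

        var batch = new List<T>(_pending);
        _pending.Clear();
        _send(batch);   // push the whole backlog in one burst
    }
}
```

The point of the shape: the producer never blocks on a slow consumer, and the consumer never observes a half-sent message.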

Let’s look at the sample program:

[screenshot of the sample program]

This isn’t really anything spectacular.  I could have done that with regular event handlers.

Aggregating Subjects

The magic of Rx, from my perspective, lies in what you can do with subjects.  My constructor no longer needs two separate subscription lines – I’m merging the two buttons into one observable sequence:

        public Form1()
        {
            InitializeComponent();

            observableButton1.Merge(observableButton2).Subscribe(this);
        }


The result is identical – the events get handled and all is good.

Modifying Sequences

Now I’m going to change the class definition slightly:

    public partial class Form1 : Form, IObserver<Timestamped<string>>
    {
        public Form1()
        {
            InitializeComponent();

            observableButton1.Merge(observableButton2).Timestamp().Subscribe(this);
        }

        public void OnNext(Timestamped<string> value)
        {
            this.textBox1.Text += value.Timestamp.ToString("hh:mm tt   ") + value.Value + Environment.NewLine;
        }

        public void OnError(Exception error)
        {
            this.textBox1.Text += "Exception caught: " + Environment.NewLine + error.ToString() + Environment.NewLine;
        }

        public void OnCompleted()
        {
            this.textBox1.Text += "Sequence completed." + Environment.NewLine;
        }
    }

Note that by adding in the .Timestamp() call, I’ve transformed the observable sequence of strings into an observable sequence of timestamped strings.  That’s pretty cool, right?

This is even cooler: the Delay() method:

observableButton1.Merge(observableButton2).Timestamp()
                .Delay(new TimeSpan(0, 0, 1)).ObserveOn(this).Subscribe(this);

The ObserveOn method accepts a Windows Forms control, a Dispatcher (for WPF), or another scheduler implementation that can be used to marshal the notifications appropriately.  If I didn’t include it, the delayed notifications would arrive on a different thread, and we’d get an InvalidOperationException (because you can’t update a window from any thread other than the one that created it).

Do you want to avoid repetition?

            observableButton1.Merge(observableButton2).Timestamp()
                .DistinctUntilChanged(ts => ts.Value).Subscribe(this);

This produced output that only emitted one message, no matter how many times I clicked the same button, until I clicked the other button.
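If that behavior seems surprising, the semantics of DistinctUntilChanged are easy to state: it suppresses an element only when it equals the element immediately before it.  Here’s a pull-based analogue over IEnumerable<T> (my own sketch, not the Rx source):

```csharp
using System.Collections.Generic;

static class SequenceOps
{
    // Pull-based analogue of Rx's DistinctUntilChanged: yields an item
    // unless it equals the immediately preceding item.
    public static IEnumerable<T> DistinctUntilChanged<T>(IEnumerable<T> source)
    {
        var comparer = EqualityComparer<T>.Default;
        bool hasPrevious = false;
        T previous = default(T);

        foreach (T item in source)
        {
            if (!hasPrevious || !comparer.Equals(item, previous))
                yield return item;
            hasPrevious = true;
            previous = item;
        }
    }
}
```

Running it over { "A", "A", "B", "B", "A" } yields "A", "B", "A" – one message per run of identical clicks, just like the button behavior described above.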

So, What Can We Do?

Well, right now there doesn’t seem to be a lot of tooling for Rx.  There’s a community wiki around the framework, though, and I think we’ll eventually see a lot of good uses.

Some ideas:

  • Develop a way to completely replay ASP.NET requests.  Treat IIS as an IObservable<AspNetRequest>, where AspNetRequest contains all the state data of the request, which would immensely help with debugging.  Imagine if your tester only needed to record a series of test cases once, and otherwise just tested for UI errors.
  • Wrap event-oriented APIs for simplified logging and replaying.  (In JinxBot, an event-oriented chat API named for my cat, I always wanted to capture all the events of the core API and be able to replay them via a subclass, which would have allowed pixel-perfect replay of a chat session).
  • Handle periodic data services like Twitter, SMS, email, or others in a clean and efficient way.

I’d like to see this take off, but it’s a very different way of looking at programming than what most .NET developers are used to.  Enjoy it, take a look, and let’s build it up!

10Aug/10

C# 4.0 and Dynamic Programming @ AZDNUG

Posted by Rob Paveza

I’m officially posting my code samples and slides BEFORE the presentation tonight so that I can actually claim that they’re online.  They are downloadable at http://robpaveza.net/pub/dynamics-presentation.zip – includes the project folder, the “building-up-the-code” part (including the IronPython code), and the slides.

The project requires Visual C# 2010 to open.  Executing the Python code will require IronPython 2.6 for .NET 4.0 (available here).  You may need to modify the paths in the Python code to make it work exactly right, and in fact, the Python code won’t run until you’ve built an executable for the Cs4Dynamics project.

7Aug/09

Just how Big is Big-O, anyway?

Posted by Rob

For a couple reasons I love to look at programming interview questions around the internet.  Most of them revolve around data structures and algorithms, which is one of my favorite topics in computer science; and as a hiring manager, I find it valuable to try to see what the industry is asking people before bringing them on board.  I came across this site tonight, and while it had a lot of questions that I’ve seen before, this one – a variant of the programming exercise I was given for my first real (non-interning) job application – stood out:

Implement Shuffle given an array containing a deck of cards and the number of cards. Now make it O(n).

This caught my eye for a couple reasons: first, why would the algorithm ever not be O(n)?  Second, if I think its obvious implementation is O(n), do I have a misconception about what O(n) means?  The problem doesn’t give any other details about what “Shuffle” means, either.

Still, this is my naive algorithm:

void Shuffle ( cards[], length )
    for i = 0 to length
        n = Random % length
        Swap(&cards[i], &cards[n])

Swap() isn’t linear; it’s constant-time.  The only loop happening here is the single for loop, from 0 to the length of the array.  We could improve the entropy of the resultant shuffle by using a cryptographically strong random number generator such as RNGCryptoServiceProvider, but it would require more memory.  Here’s an actual C# implementation:

Card.cs:

using System;

namespace LinearBigOShuffle
{
    public class Card
    {
        public Rank Rank { get; set; }
        public Suit Suit { get; set; }

        public override string ToString()
        {
            return string.Format("{0,-8}{1}", Suit, Rank);
        }
    }

    public enum Suit
    {
        Club = 1, Diamond = 2, Heart = 3, Spade = 4,
    }

    public enum Rank
    {
        Ace = 1, Two = 2, Three = 3, Four = 4, Five = 5, Six = 6, Seven = 7, Eight = 8, Nine = 9, Ten = 10,
        Jack = 11, Queen = 12, King = 13,
    }
}

Program.cs:

using System;

namespace LinearBigOShuffle
{
    class Program
    {
        static void Main()
        {
            // initialize the cards array
            Card[] cards = InitializeStandardDeck();

            // Verify that the cards were initialized correctly
            PrintCards(cards);

            Console.WriteLine("Shuffling...");

            Shuffle(cards);

            PrintCards(cards);

            // Wait for the user to press enter to exit.
            Console.ReadLine();
        }

        private static void Shuffle(Card[] cards)
        {
            Random r = new Random();
            int n = cards.Length;
            for (int i = 0; i < n; i++)
            {
                int nextSwapIndex = r.Next(n);
                Card temp = cards[nextSwapIndex];
                cards[nextSwapIndex] = cards[i];
                cards[i] = temp;
            }
        }

        private static Card[] InitializeStandardDeck()
        {
            Card[] cards = new Card[52];
            int index = 0;
            for (int suit = (int)Suit.Club; suit <= (int)Suit.Spade; suit++)
            {
                for (int rank = 1; rank <= (int)Rank.King; rank++)
                {
                    cards[index++] = new Card { Rank = (Rank)rank, Suit = (Suit)suit };
                }
            }
            return cards;
        }

        private static void PrintCards(Card[] cards)
        {
            foreach (Card c in cards)
            {
                Console.WriteLine(c);
            }
        }
    }
}

In this implementation, I’ve included a couple helper methods and assumed Ace-low, but I don’t think that really matters.  If you run the program, you’ll see the in-order deck and the shuffled deck.  There are a couple caveats with the Shuffle implementation I’ve provided:

  • There is no guarantee that a card won’t end up in the same relative position in which it started, or the same absolute position.
  • There is no guarantee that an individual card won’t move multiple times.

If you think about it, though, both of these things also happen when you really shuffle a physical deck of cards!
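One refinement worth knowing about: the swap-anywhere loop above is O(n), but it isn’t quite uniform – it generates n^n equally likely swap sequences, and since n^n isn’t divisible by n!, some orderings come out slightly more often than others.  The classic Fisher–Yates shuffle is also a single O(n) pass and is uniform; the only change is that the random index is drawn from the not-yet-visited portion of the array.  A sketch:

```csharp
using System;

static class Shuffler
{
    // Fisher-Yates shuffle: at step i, swap cards[i] with a random element
    // chosen from the remaining range [i, n) rather than from the whole
    // array. Still a single pass -- O(n) -- but the result is uniform.
    public static void Shuffle<T>(T[] cards, Random r)
    {
        for (int i = 0; i < cards.Length - 1; i++)
        {
            int j = i + r.Next(cards.Length - i);   // j is in [i, n)
            T temp = cards[j];
            cards[j] = cards[i];
            cards[i] = temp;
        }
    }
}
```

Note that both caveats above still apply: a card may land back where it started, and Random.Next(maxValue) uses an exclusive upper bound, so the math works out without any off-by-one.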

What about Big-O?

The question I raised earlier was: just how big is Big-O, anyway?  Well, Big-O is great for analyzing algorithmic complexity, but that’s not always what we want when measuring the performance of an implementation.  I’m depending on a library function in the Random class!  What if it were incredibly slow because it produced truly random numbers instead of pseudorandom numbers?  In that scenario, I would not have a terribly performant implementation, even though my algorithm would still “theoretically” be O(n).

Big-O is great!  But I guess what I was trying to say here is…

Don’t think it’s the only thing that will impact your code!

31Mar/09

My Job is Tools

Posted by Rob

For the last few months I’ve been working on a fairly large web application for work.  The application is mostly Flex-based, so I’m working on back-end components, which are accessed via FluorineFx (an ActionScript-to-.NET connector).  The site also needs to integrate with Ektron, a somewhat perilous enterprise-level CMS, because a lot of the company’s web-facing data is already stored there and they want to reuse as much as possible.  The site is a completely new version of a much less ambitious site designed for the same purpose, driven by a 400 MB Access database.

So, as part of our deployment setup, our projects consist of:

  • 3 projects that encompass most of the site’s actual functionality.  One is for Ektron integration, one is for control development, and one is for the service implementations.
  • 1 tool that imports data from the old site.
  • 1 tool that imports localization data from Ektron.
  • 1 tool that creates Ektron Smart Forms in order to create that localization data.
  • 1 tool that encrypts member passwords (because until that’s done we can’t see the old members database).
  • 1 tool that both backs up Ektron staging data and also restores that data to a clean environment.
  • A team test project.
  • A test harness for other parts of the system.
  • Oh yeah, the web site.

I generally find myself taking this approach whenever I’m working on a large system.  Importing data from SMF?  Write a tool.  Need to transform something into an XML format your site understands?  Write a tool.

For me, the hardest part about getting tooling correct is making sure that I’m not wasting time when I’m designing one.  For instance, smart forms in Ektron are a cool idea, at least in concept.  I’ve found them somewhat difficult to use, but it’s still a cool idea for a CMS.  Our client wants to use Ektron to store localization data, but right now all I have are the dictionaries of English text in a Flex ActionScript file.  Here’s our process:

  • Grep the text out of the ActionScript dictionaries into a .CSV file.
  • Using Sébastien Lorion’s CSV reader, read each dictionary in, one at a time.
  • Create a .xml file corresponding to the Ektron smart form exported file format.
  • Manually import or update each corresponding Ektron smart form.

When this eventually goes to the client (or to a new staging database for ourselves), we’ll run a tool on our database in “backup mode,” which will back up all of the smart forms, as well as the corresponding content rows, into a binary file.  Then we’ll run the tool on the new database in “restore mode,” which will update all of the content to the latest versions.

What does this allow us to do?

  • QA content on the test site and be confident that it’s going to be represented on the production site.  Because entering content into Ektron in this manner isn’t particularly easy, we don’t feel comfortable that we could do it with no errors.  By automating the process, we can be reasonably confident that we’ll eliminate typos.
  • Avoid repeated input time.  It might have cost me the same amount of time to write the backup/restore tool as it would have taken to just do it by hand.  However, I was fairly confident (and correct) that I’d have to do it more than once.  We get the added bonus of being able to deliver the tool to the client, who will also have to run it in their production environment.

I love writing tools. It makes everyone’s lives easier, and that’s never, ever bad.

13Mar/09

Facebook UI for ASP.NET Launches

Posted by Rob

I’ve officially launched the Facebook UI for ASP.NET project at CodePlex at http://facebookui.codeplex.com/.  It’s kind of a preview edition of what’s to come – and there’s quite a bit.  Here’s a quick overview:

Now:

  • Design-time and run-time server support for about 30 FBML controls.
  • Visual Studio IntelliSense support for FBML controls
  • Web Site project templates in Visual Studio 2008

Version 1:

  • All FBML tags within: Users/Groups, Profile-Specific, Embedded Media, Visibility on Profile, Tools, Forms, Message Attachments, Additional Permissions, Notifications and Requests, Status Messages, Page Navigation, Dialog, and Wall sections.
  • Complete API support.
  • Extensibility points within the FBML controls framework and API framework to enable updates from Facebook to be rolled in without requiring an immediate release of a new library.

Version 1.5:

  • LINQ-to-FQL (or as some folks at the user group mentioned, perhaps it’ll be called LINQ-to-Facebook).
  • Higher-level controls will be included, such as validators and simple logic controls, like a birthday display.

Version 2:

  • Facebook internationalization should be supported.
  • Mobile platform support should be extended.

Beyond that, I’m not sure where we’re going to take the project.  We’ll have to see!