Running with Code: Like with scissors, only more dangerous

16 Nov 2015

Exploring .dbc files with C# Dynamic Code Generation, part 1: Defining the problem

Posted by Rob Paveza

You know, I look back at my blog after all these years and the particularly infrequent updates and reflect a little bit on just how much things have changed for me. I know that right now I want to be writing code using back-ticks because I'm so accustomed to writing Markdown. But that's neither here nor there.

I recently published a project called DbcExplorer on GitHub. This was just a pet project I'd been working on during the World of Warcraft: Mists of Pandaria timeline; I'd just joined Microsoft and had my Windows Phone, but there's no Mobile Armory on Windows Phone (or even Big Windows, for that matter). A little bit of background: World of Warcraft stores simple databases in files with an extension of .dbc or .db2; these databases allow rapid lookup by ID or simple fast enumeration. There are myriad such files, and they commonly change from version to version. I wanted to read them so I could crawl item and achievement information to create a miniature Mobile Armory for Windows Phone: one that could at least tell you which achievements you were lacking, and let people vote on which achievements were easiest, so that you could quickly boost your score.

(Side note: When Warlords of Draenor was released, Blizzard changed their storage archive format from MPQ to CASC. Ladislav Zezula, who created StormLib, a C library for accessing MPQ files, had made some progress on CASC as well. However, I couldn't get it to work at the time, so I stopped working on this project. Ladik and I recently figured out what the disconnect was, and I've now wrapped his CascLib into CascLibSharp, but I don't know that I'll be resurrecting the other project.)

Anyway, DBC files are pretty easy. They have a header in the following form:

uint32        Magic 'WDBC', 'WDB2', or 'WCH2'
uint32        Number of records
uint32        Number of columns per record
uint32        Number of bytes per record (always 4x # of columns as far as I can tell)
uint32        String block length

The files that aren't of type 'WDBC' have a few additional fields, but the general structure is the same. The files then have the general form:

DbcHeader     Header
Record[Count] Records
uint8         0  (Start of string table, a 0-length string)
uint8[]       String block (UTF-8 encoded, null-terminated strings)

Each column is one of:

  • Int32
  • Float32
  • String (an int32-offset into the String Table)
  • "Flags" (a uint32 but usually has a fixed set of bit combinations)
  • Boolean (just 0 or 1)
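For concreteness, the header can be decoded with a few fixed-offset reads. Here's an illustrative sketch (in JavaScript with a Node.js Buffer, rather than the project's C#; the field names are mine, not the library's):

```javascript
// Decode the WDBC header described above: a 4-byte magic followed by
// four little-endian uint32 fields. Field names here are illustrative.
function readDbcHeader(buf) {
    return {
        magic: buf.toString('ascii', 0, 4),   // 'WDBC', 'WDB2', or 'WCH2'
        recordCount: buf.readUInt32LE(4),
        fieldCount: buf.readUInt32LE(8),      // columns per record
        recordSize: buf.readUInt32LE(12),     // bytes per record, typically 4 * fieldCount
        stringBlockSize: buf.readUInt32LE(16)
    };
}
```

Records then follow at offset 20 (for the plain 'WDBC' variant), with the string block after them.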

So this pretty well defines the problem space. We need to support deserializing from this binary format into plain objects, so that I can say I have a DbcTable<T> and my runtime will be able to enumerate the records in the table. Now, because the CLR doesn't guarantee the order in which an object's properties are enumerated (at least to the best of my knowledge, the order is consistent in practice, but it isn't documented or contractual), I can't rely on member order lining up with column order; I need an explicit mapping from each column index to its target member.

Briefly, let's look at DBFilesClient\CharTitles.dbc. This file (at least as of the most recent patch) has six columns. I don't know for sure, but it looks like the following:

Column    Type       Description
0         Int32      ID
1         Int32      Required achievement ID
2         String     Title
3         String     Title, repeated
4         Int32      Unknown, just seems to continuously increase
5         Int32      Reserved (all records have 0)

Since I don't know what to do with columns 3-5, I can just define the following class:

public class CharacterTitleRecord
{
    [DbcRecordPosition(0)]
    public int ID;
    [DbcRecordPosition(1)]
    public int RequiredAchievementID;
    [DbcRecordPosition(2)]
    public string Title;
}

Next time: We'll see how the naïve implementation deserializes each record.

16 Feb 2012

Bridging the gap between Jurassic and the DLR, Part Two

Posted by Rob Paveza

Part Two: A More Complete Object Model

Once I started implementing the rest of the object model, things started coming together very well.

Let's take a look at the rest of the ObjectInstance implementation:

        public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
        {
            return TryCallMemberFunction(out result, binder.Name, args);
        }

        public override bool TryConvert(ConvertBinder binder, out object result)
        {
            if (binder.ReturnType == typeof(string))
            {
                result = this.ToString();
                return true;
            }
            else
            {
                try
                {
                    result = Jurassic.TypeConverter.ConvertTo(this.engine, this, binder.ReturnType);
                    return true;
                }
                catch
                {
                    result = null;
                    return false;
                }
            }
        }

        public override bool TryDeleteMember(DeleteMemberBinder binder)
        {
            return Delete(binder.Name, false);
        }

        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            this.FastSetProperty(binder.Name, value, PropertyAttributes.FullAccess, true);
            return true;
        }

        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            result = GetNamedPropertyValue(binder.Name, this);
            if (object.ReferenceEquals(null, result))
                return false;

            return true;
        }

For FunctionInstance:

        public override bool TryInvoke(System.Dynamic.InvokeBinder binder, object[] args, out object result)
        {
            try
            {
                result = CallLateBound(this, args);
                return true;
            }
            catch
            {
                result = null;
                return false;
            }
        }

For ArrayInstance, things were a little more interesting. JavaScript doesn't support multidimensional arrays (as in C#, where you can access something via someArr[1,5]). However, it's important to consider the type of the index, since a numeric index and a string key are handled differently. Fortunately, Jurassic provides for that as well, fairly easily.

        public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
        {
            Debug.Assert(indexes != null && indexes.Length > 0);

            result = null;

            if (indexes.Length > 1)
                return false; // multi-dimensional arrays are not supported.

            if (object.ReferenceEquals(null, indexes[0]))
                return false;

            Type indexType = indexes[0].GetType();
            if (indexType.IsEnum)
            {
                indexType = Enum.GetUnderlyingType(indexType);
            }

            if (indexType == typeof(byte) || indexType == typeof(sbyte) || indexType == typeof(short) || indexType == typeof(ushort) || indexType == typeof(int) || indexType == typeof(uint))
            {
                uint index = unchecked((uint)Convert.ToInt64(indexes[0])); // Convert first: unboxing a boxed int directly to uint would throw
                try
                {
                    result = this[index];
                    return true;
                }
                catch
                {
                    result = null;
                    return false;
                }
            }
            else
            {
                string index = indexes[0].ToString();
                try
                {
                    result = this[index];
                    return true;
                }
                catch
                {
                    result = null;
                    return false;
                }
            }
        }

        public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
        {
            Debug.Assert(indexes != null && indexes.Length > 0);

            if (indexes.Length > 1)
                return false; // multi-dimensional arrays are not supported.

            if (object.ReferenceEquals(null, indexes[0]))
                return false;

            Type indexType = indexes[0].GetType();
            if (indexType.IsEnum)
            {
                indexType = Enum.GetUnderlyingType(indexType);
            }

            if (indexType == typeof(byte) || indexType == typeof(sbyte) || indexType == typeof(short) || indexType == typeof(ushort) || indexType == typeof(int) || indexType == typeof(uint))
            {
                uint index = unchecked((uint)Convert.ToInt64(indexes[0])); // Convert first: unboxing a boxed int directly to uint would throw
                try
                {
                    this[index] = value;
                    return true;
                }
                catch
                {
                    return false;
                }
            }
            else
            {
                string index = indexes[0].ToString();
                try
                {
                    this[index] = value;
                    return true;
                }
                catch
                {
                    return false;
                }
            }
        }

        public override bool TryDeleteIndex(DeleteIndexBinder binder, object[] indexes)
        {
            Debug.Assert(indexes != null && indexes.Length > 0);

            if (indexes.Length > 1)
                return false; // multi-dimensional arrays are not supported.

            if (object.ReferenceEquals(null, indexes[0]))
                return false;

            Type indexType = indexes[0].GetType();
            if (indexType.IsEnum)
            {
                indexType = Enum.GetUnderlyingType(indexType);
            }

            if (indexType == typeof(byte) || indexType == typeof(sbyte) || indexType == typeof(short) || indexType == typeof(ushort) || indexType == typeof(int) || indexType == typeof(uint))
            {
                uint index = unchecked((uint)Convert.ToInt64(indexes[0])); // Convert first: unboxing a boxed int directly to uint would throw
                return Delete(index, false);
            }
            else
            {
                string index = indexes[0].ToString();
                return Delete(index, false);
            }
        }

I did add one set of precompilation directives so that I could modify ScriptEngine with one little item:

        public 
#if SUPPORT_DYNAMIC
            dynamic
#else
            object 
#endif
            GetGlobalValue(string variableName)
        {
            if (variableName == null)
                throw new ArgumentNullException("variableName");
            return TypeUtilities.NormalizeValue(this.Global.GetPropertyValue(variableName));
        }

With that gem, we're in good shape. I'll update the test program; let's take a good look:

    class Program
    {
        static void Main(string[] args)
        {
            ScriptEngine engine = new ScriptEngine();

            engine.SetGlobalFunction("write", (Action<string>) ((s) => { Console.WriteLine(s); }));
            
            engine.Execute(@"
var a = {
    A: 'A',
    B: 20,
    C: function() { return 'Hello'; }
};

function double(val)
{
    return val * 2;
}

var array = [1, 5, 9, 13, 21];
");
            dynamic obj = engine.Evaluate<ObjectInstance>("a");
            Console.WriteLine(obj.A);
            Console.WriteLine(obj.B);
            Console.WriteLine(obj.C());
            obj.D = "What's that?";

            Console.WriteLine("C#: " + obj.D);

            engine.Execute(@"
write('JavaScript: ' + a.D);
");

            dynamic dbl = engine.GetGlobalValue("double");
            Console.WriteLine(dbl(20));
            Console.WriteLine(dbl.call(null, 20));

            dynamic array = engine.GetGlobalValue("array");
            Console.WriteLine(array[2]);

            Console.ReadLine();
            
        }
    }

Output is happily correct:

A
20
Hello
C#: What's that?
JavaScript: What's that?
40
40
9

Particularly neat about this implementation is that the calls automatically recurse. Note that I use the intrinsic JavaScript call method on the Function instance (of dbl). This implementation covers a whole bunch of typical scenarios and use-cases, and I'm happy to see that it has worked out fairly well thus far.

One item I've found is that there's a TypeLoadException when targeting .NET 4. This has to do with the security-transparency rules that replaced CAS policy in .NET 4. For now, applying this attribute to the test program as well as the library resolves the issue, though I don't intend for it to be a long-term fix:

[assembly: System.Security.SecurityRules(System.Security.SecurityRuleSet.Level1)]

Next time, we'll do some more fit and finish, with precompilation constants and a security review.

14 Feb 2012

Bridging the gap between Jurassic and the DLR, Part One

Posted by Rob Paveza

Part One: ObjectInstance derives from DynamicObject

A while back I posted that I was joining the Jurassic team; Jurassic is an open-source JavaScript engine for .NET. If you've ever gone through the long search for a JavaScript implementation on .NET (other than JScript.NET, of course), you'll find a bunch of incomplete implementations. And if you're lucky enough to find the blog about the Microsoft project of JScript running on the DLR (once called Managed JScript), you'll learn that it was specifically designed to give design feedback on the DLR itself and was never planned to be carried forward into production. Personally I think that's too bad, but I'm happy to see a couple of projects (notably, Jurassic and IronJS) that have stepped up to fill the gap.

I had considered implementing IDynamicMetaObjectProvider, but inheriting from DynamicObject seems to be a better design decision all-around. Since all of the other JavaScript objects inherit from ObjectInstance, it's a simple matter of overriding its virtual methods instead of creating a new implementation of DynamicMetaObject for each class in the hierarchy.

I've created a new library project within the solution as well as a simple testing project to advise on the API as well as to step into my DynamicObject overrides. Here are some simple components:

// These are new:
using System.Dynamic;
using System.Diagnostics;

// This is updated
    public class ObjectInstance
        : DynamicObject
#if !SILVERLIGHT
        , System.Runtime.Serialization.IDeserializationCallback
#endif
    {

// The class exists as normal
        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            result = GetNamedPropertyValue(binder.Name, this);
            if (result != null)
                return true;

            return false;
        }

        public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
        {
            try
            {
                result = CallMemberFunction(binder.Name, args);
                return true;
            }
            catch
            {
                result = null;
                return false;
            }
        }

        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            this.AddProperty(binder.Name, value, PropertyAttributes.FullAccess, true);
            return true;
        }

    }

This is the source of the test application. It's very straightforward:

        static void Main(string[] args)
        {
            ScriptEngine engine = new ScriptEngine();

            engine.SetGlobalFunction("write", (Action<string>) ((s) => { Console.WriteLine(s); }));
            
            engine.Execute(@"
var a = {
    A: 'A',
    B: 20,
    C: function() { return 'Hello'; }
};
");
            dynamic obj = engine.Evaluate<ObjectInstance>("a");
            Console.WriteLine(obj.A);
            Console.WriteLine(obj.B);
            Console.WriteLine(obj.C());
            obj.D = "What's that?";

            Console.WriteLine("C#: " + obj.D);

            engine.Execute(@"
write('JavaScript: ' + a.D);
");

            Console.ReadLine();
            
        }

We create a global function 'write' which writes a string to the console. Then we create a global object a with properties A, B, and C. We then use C# to retrieve that object as a dynamic; this is what routes member access through the DynamicObject overrides via C#'s support for the DLR. Each property access returns a dynamic, and because Jurassic automatically converts JavaScript primitives to their corresponding .NET types, the values bind to the appropriate Console.WriteLine overload. You can see that we invoke TryGetMember on A and B, TryInvokeMember on C, and TrySetMember and then TryGetMember on D.

OK, it's late, so I'm not going to stick this out anymore right now. I'm not even sure if the previous paragraph was particularly coherent. :-)

There's a lot to update: it doesn't support case-insensitive languages (like Visual Basic), it's not particularly good at error checking, and I haven't dealt with any other components yet. The good news is that it seems like we should be good to go for the rest of the components.

Next time, we'll look at other classes, like ArrayInstance, FunctionInstance, and more.

9 Jan 2012

Revealing Prototype Pattern: Pros and Cons

Posted by Rob Paveza

A short while ago, I wrote a post generally saying good things about the Revealing Prototype Pattern, but mostly focused on tearing down the other part that was presented with it, namely the way that variables were declared in a chain separated by the comma operator. This post will discuss some of the pros and cons of this pattern, and give some examples of when you should or shouldn't use it.

Advantages

As Dan notes in his post, this features a significant improvement over straight prototype assignment (assignment of an object literal to a prototype), in that private visibility is supported. And because you're still assigning an object to the prototype, you are able to take advantage of prototypal inheritance. (Next time: How prototypal inheritance really works and how it can make your head explode).

Except for chaining his variable declarations, I have to admit Dan had me pretty well sold on the Revealing Prototype Pattern. It's very elegant; it provides good protection to its inner variables, and it uses a syntax we've been seeing more and more of ever since jQuery became a popular platform.

Unfortunately, it has some nasty drawbacks.

Disadvantages

To be fair, Dan lists some of the disadvantages about this pattern; however, he doesn't quite list them as such, and I think he was possibly unaware of some of their implications:

There's something interesting that happens with variables though, especially if you plan on creating more than one Calculator object in a page. Looking at the public functions you'll see that the 'this' keyword is used to access the currNumberCtl and eqCtl variables defined in the constructor. This works great since the caller of the public functions will be the Calculator object instance which of course has the two variables defined. However, when one of the public functions calls a private function such as setVal(), the context of 'this' changes and you'll no longer have access to the two variables.

The first time I read through that I glossed over the problems; I didn't quite understand the issue until I wrote some code. So let's do that - we'll implement the Java StringTokenizer class:

function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    this.sourceString = srcString;
    this.delimiter = delim;
}
StringTokenizer.prototype = (function()
{
    var that = this;
    
    var _tokens = that.sourceString.split(that.delimiter);
    var _index = 0;
    
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    
    var resultPrototype = 
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();

If you've ever written a jQuery plugin, you'll probably recognize what I did with the prototype assignment function; when writing jQuery plugins, it's common to close over the current instance of the jQuery object by assigning var that = $(this); so that you can write event-handler functions without losing access to the overall context. Unfortunately, what I did in this case is wrong; you may already see why.

var that = this;

In this context, 'this' is a reference to the global object, not to an instance of the object - the anonymous function runs once, at the moment the prototype is assigned, before any instance ever exists. This is a generalization of what Dan said. Rewriting the code to overcome this results in information leaking:

function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    this.sourceString = srcString;
    this.delimiter = delim;
    this.tokens = srcString.split(delim);
    this.index = 0;
}
StringTokenizer.prototype = (function()
{
    var _countTokens = function() { return this.tokens.length; };
    var _hasMoreTokens = function() { return this.index < this.tokens.length; };
    var _nextToken = function()
    {
        if (!this.hasMoreTokens())
            return false;
        
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };
    var _reset = function() { this.index = 0; };
    
    var resultPrototype = 
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();

The code works correctly; but you can see that we have to make public all of the state variables we'll use in the constructor. (The alternatives are to either initialize the state variables in each function, where they would still be public; or to create an init function, which would still cause the variables to be public AND would require the user to know to call the init function before calling anything else).

Dan also indicated that you needed a workaround for private functions:

There are a few tricks that can be used to deal with this, but to work around the context change I simply pass “this” from the public functions into the private functions.

Personally, I prefer to try to avoid things one might call clever or tricky, because that's code for "so complex you can't understand it". But even when a function is exposed publicly, you'll still get an error if you call it directly instead of through 'this'. This error is nonintuitive and could otherwise send you on a long bug hunt. Consider this change to the above code:

    var _hasMoreTokens = function() { return this.index < this.tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())   // changed from:   if (!this.hasMoreTokens())
            return false;
        
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };

Simply removing the 'this' reference in the caller is enough to break _hasMoreTokens: invoked as a bare function, its 'this' binds to the global object rather than to the tokenizer instance, so this.index and this.tokens are no longer there. This is completely unintuitive behavior for developers who grew up in the classical inheritance model.

Alternatives

I wouldn't want to raise all of these objections without giving you an alternative. The alternative I present here is one in which the entire object is populated in the constructor:

"use strict";
function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);
        
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'sourceString', { value: srcString });
        Object.defineProperty(this, 'delimiter', { value: delim });
    }
    else
    {
        this.sourceString = srcString;
        this.delimiter = delim;
    }
    
    var _tokens = this.sourceString.split(this.delimiter);
    var _index = 0;
    
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'countTokens', { value: _countTokens });
        Object.defineProperty(this, 'hasMoreTokens', { value: _hasMoreTokens });
        Object.defineProperty(this, 'nextToken', { value: _nextToken });
        Object.defineProperty(this, 'reset', { value: _reset });
    }
    else
    {
        this.countTokens = _countTokens;
        this.hasMoreTokens = _hasMoreTokens;
        this.nextToken = _nextToken;
        this.reset = _reset;
    }
}

The advantage of a structure like this one is that you always have access to this. (Note that this example is unnecessarily large because I've taken the additional step of protecting the properties with Object.defineProperty where it is supported). You always have access to private variables and you always have access to the state. The unfortunate side effect of this strategy is that it doesn't take advantage of prototypal inheritance (it's not that you can't do it with this strategy - more of that coming in the future) and that the entire private and public states (including functions) are closed-over, so you use more memory. Although, one may ask: is that really such a big deal in THIS sample?
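To see the constructor-populated approach in use, here's a condensed sketch of the tokenizer above (omitting the Object.defineProperty branch) along with typical usage:

```javascript
// Condensed version of the constructor-based tokenizer above; the
// closed-over _tokens and _index stay genuinely private.
function StringTokenizer(srcString, delim) {
    if (!(this instanceof StringTokenizer))    // enforce constructor usage
        return new StringTokenizer(srcString, delim);

    var _tokens = srcString.split(delim);
    var _index = 0;

    this.countTokens = function () { return _tokens.length; };
    this.hasMoreTokens = function () { return _index < _tokens.length; };
    this.nextToken = function () {
        if (_index >= _tokens.length)
            return false;
        return _tokens[_index++];
    };
    this.reset = function () { _index = 0; };
}

var t = StringTokenizer('a b c', ' ');   // works with or without 'new'
t.countTokens();      // 3
t.nextToken();        // 'a'
t.hasMoreTokens();    // true
```

Note that each instance gets its own copies of the four functions, which is exactly the memory cost discussed above.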

Usage Considerations

The Revealing Prototype Pattern can be a good pattern to follow if you're less concerned with maintaining data integrity and state. You have to be careful with how you access non-public data and functions, but it's pretty elegant; and if you're creating a lot of objects, you have the opportunity to save some memory by putting the function definitions on the prototype rather than on each object. It falls short, though, when trying to emulate classical data structures and enforce protection mechanisms. As such, it can require complicated or clever tricks to work around its shortcomings, which can ultimately lead to overly-complex or difficult-to-maintain code.

Like most patterns, your mileage may vary.

28 Mar 2011

A Recent Discovery: IronJS

Posted by Rob Paveza

Since Microsoft first announced Managed JScript and shortly thereafter announced its demise (more on that here), I’ve been chomping at the bit for an implementation of JavaScript on the DLR.  Actually, I’ve probably been hoping for it for far longer, because JavaScript is near and dear to my heart.  If not for Brinkster offering free ASP-based web hosting back when I was just a young’un, I may never have discovered server-side programming, and may never have ended up where I am today. 

IronJS is a free, open-source implementation of ECMAScript v3 on the DLR based predominantly on F#.  It’s even the right kind of free, currently licensed under the Apache License version 2 (not a copyleft license).  Check out this picture of its testbed application running:

[Screenshot: the IronJS testbed application running]

As a learning exercise, I intend to figure out a good way to shoehorn this into my Battle.net (1.0) chat client, JinxBot.  It actually shouldn’t be terribly difficult, but while I considered adding it as a plugin, I think I’d like it to be part of the core application instead. 

I’ll more than likely be covering IronJS in several parts in my blog in the coming weeks (possibly months, since I’m getting married on May 1).  But these are topics that I intend to cover:

  • Hosting IronJS in a C# application, loading, and executing JavaScript code
  • Sharing objects between script and .NET
  • Isolating script from .NET (preventing script from having run of the whole runtime)
  • Isolating script from script (running multiple environments in one application, similar to how web pages in browsers like IE and Chrome all run from one instance of the script engine but have different contexts so they have different sets of objects)
  • Performance optimizations
  • Dynamically generating isolation layers
  • Other considerations

Stay tuned!  In the meantime, check out IronJS on GitHub and get it running on your machine!

10 Aug 2010

C# 4.0 and Dynamic Programming @ AZDNUG

Posted by Rob Paveza

I’m officially posting my code samples and slides BEFORE the presentation tonight so that I can actually claim that they’re online.  They are downloadable at http://robpaveza.net/pub/dynamics-presentation.zip – includes the project folder, the “building-up-the-code” part (including the IronPython code), and the slides.

The project requires Visual C# 2010 to open.  Executing the Python code will require IronPython 2.6 for .NET 4.0 (available here).  You may need to modify the paths in the Python code to make it work exactly right, and in fact, the Python code won’t run until you’ve built an executable for the Cs4Dynamics project.

27 May 2010

Launching OpenGraph.NET

Posted by Rob Paveza

Tonight I’m publishing to CodePlex a project that I’ve been working on for about a month, which I’ve called OpenGraph.NET.  It’s a C# client for Facebook’s still-new Graph API.  It currently supports regular desktop applications, web sites (using Web Forms and ASP.NET MVC), and to some extent, Silverlight.  All of the groundwork is there – it’s just going to take a bit more work to get it across the finish line.  I’m calling it version 0.9.1 “Beta”.  (Maybe I’ll come up with some clever name like “Froyo,” like the operating system on my phone.)


OpenGraph.NET’s documentation is available at http://robpaveza.net/opengraph.net/docs/ and the project can be downloaded from CodePlex at http://opengraph.codeplex.com/.  There are also a couple demos on the CodePlex site within the download.

OpenGraph.NET is licensed with the new BSD license – basically, you can use it for whatever you want, but if you hand out the project publicly, either compiled or as source code, you should include a copy of my copyright notice and license terms.  I’m not an advocate of copyleft, but I would certainly welcome patch submissions.  Over the weekend, I’ll be porting the source code repository from my web server onto CodePlex.

One more note – it IS indeed working out there.  We’re using it on a currently-undisclosed project at Terralever for an event being hosted by one of our clients, and I am using the Real Time Updates handler for it as well.

Over the coming weeks, I’ll be talking about the internals of how this works, including dynamic methods.

I’d like to mention a big thank-you to James Newton-King, for the awesome Json.NET library which is used extensively throughout OpenGraph.NET.

19Jan/10

Improving Performance with Dynamic Methods Part 1: The Problem Definition

Posted by Rob

One of the problems that a large part of a certain gaming community has wrestled with over the years is version checking.  A common, though now older, method of version checking in this community has been to execute a known algorithm over a seeded value; however, the algorithm changes based on a formula string sent over the wire.  For instance, suppose for every four bytes in a file there are four state values: A, B, C, and S.  The S value holds the current four bytes of the file.  The server might send the following formula as an initialization: A=A-S B=B-C C=C+A A=A+B.  In addition, it sends some startup values for A, B, and C.  This means that for every four bytes of the file, we need to perform the math steps outlined in that initialization string.

Now, one of the common ways to approach this problem has been to, basically, attack it by brute force.  We’d keep track of the state values in one array, the indices of the state values (keyed by their letters) in another array, and the operators in a third, and finally do a double dereference (dereferencing the index of the state value, then the state value itself).  So you might have code that looks like this:

foreach (step)
{
    states[Transform('S')] = ReadNext();
    foreach (string formula in list)   // e.g. "A=A-S"
    {
        states[Transform(formula[0])] = DoMath(
            states[Transform(formula[2])],
            states[Transform(formula[4])],
            GetOperator(formula));
    }
}
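Fleshed out into runnable form, that array-based pseudo-implementation might look like the following.  This is only a sketch: the helper names come straight from the pseudo-code above, the formula layout (`dest '=' left op right`) is my assumption, and the real client naturally reads its input from a file rather than a hard-coded value.

```csharp
using System;

class FormulaInterpreter
{
    // Maps a state-value letter to its index in the state array [A, B, C, S].
    public static int Transform(char c)
    {
        switch (c)
        {
            case 'A': return 0;
            case 'B': return 1;
            case 'C': return 2;
            case 'S': return 3;
            default: throw new ArgumentException("Unknown state value: " + c);
        }
    }

    // Applies one of the eight binary operators; unchecked so overflow wraps,
    // as 32-bit arithmetic would in native code.
    public static uint DoMath(uint left, uint right, char op)
    {
        unchecked
        {
            switch (op)
            {
                case '+': return left + right;
                case '-': return left - right;
                case '*': return left * right;
                case '/': return left / right;
                case '%': return left % right;
                case '&': return left & right;
                case '|': return left | right;
                case '^': return left ^ right;
                default: throw new ArgumentException("Unknown operator: " + op);
            }
        }
    }

    // Runs one round of formulas (e.g. "A=A-S") against the state array.
    public static void RunStep(uint[] states, string[] formulas, uint nextValue)
    {
        states[Transform('S')] = nextValue; // stand-in for ReadNext()
        foreach (string formula in formulas)
        {
            // Layout: dest '=' left op right, so the operator sits at index 3.
            states[Transform(formula[0])] = DoMath(
                states[Transform(formula[2])],
                states[Transform(formula[4])],
                formula[3]);
        }
    }

    static void Main()
    {
        uint[] states = { 100, 60, 30, 0 };
        RunStep(states, new[] { "A=A-S", "B=B-C", "C=C+A", "A=A+B" }, 5);
        Console.WriteLine(string.Join(",", states)); // 125,30,125,5
    }
}
```

Every pass through the inner loop pays for two `Transform` calls per operand plus an array dereference – exactly the overhead the rest of this post sets out to eliminate.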

Here, the “Transform” function translates a state-value character to its index in the state array.  This is a pretty sub-optimal solution given all of the extra dereferencing, and it’s really only a pseudo-implementation of the activity.  What would be best is if we could somehow unroll that inner loop and access the values directly (or through a single dereference, as a pointer would do).  In other words, it could be rewritten better like so:

foreach (step)
{
    S = ReadNext();
    A = A - S;
    B = B - C;
    C = C + A;
    A = A + B;
}

The challenge is that the server provides the verification string, and it changes over time, so the client can’t reliably predict which combination of formulas will be used.  Although only a fixed set of combinations has ever been observed in the wild, many others could potentially be presented: there is no fixed number of formulas, and each formula has three potential writable state values, four readable state values, and eight binary operators (+, –, *, /, %, &, |, and ^).  So, either we keep going with the inner loop, or we figure out some way to get all the benefits of compilation without the headache of having to know exactly what we’re programming before we program it.  Fortunately, the .NET Framework provides a way for us to do exactly that: dynamic methods.

To simplify the code that we need to generate, we’ll rewrite the inner code to look like this:

foreach (step)
{
    S = ReadNext();
    ExecuteStep(ref A, ref B, ref C, ref S);
}

Now, all we need to do is dynamically emit the ExecuteStep method.  To do so we’ll need to get into the System.Reflection.Emit namespace – kind of a scary place to be!  Fortunately, Reflector is going to make this easier for us – and we’ll be glad we’re doing this in IL.
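To give a flavor of where this is headed, here’s a minimal sketch of emitting such a method with DynamicMethod.  This is not the code from Part 2: the delegate name, the A/B/C/S argument ordering, and the hard-coded formula list are all my own assumptions for illustration, and a real builder would parse the server’s string and loop over the formulas instead.

```csharp
using System;
using System.Reflection.Emit;

// The shape of the generated method; each state value is passed by reference.
delegate void ExecuteStepDelegate(ref uint a, ref uint b, ref uint c, ref uint s);

class DynamicStepBuilder
{
    // Emits one "dest = left op right" formula. Argument indices: 0=A, 1=B, 2=C, 3=S.
    static void EmitFormula(ILGenerator il, int dest, int left, int right, OpCode op)
    {
        il.Emit(OpCodes.Ldarg_S, (byte)dest); // push destination address (byref arg)
        il.Emit(OpCodes.Ldarg_S, (byte)left);
        il.Emit(OpCodes.Ldind_U4);            // dereference left operand
        il.Emit(OpCodes.Ldarg_S, (byte)right);
        il.Emit(OpCodes.Ldind_U4);            // dereference right operand
        il.Emit(op);                          // add, sub, mul, ...
        il.Emit(OpCodes.Stind_I4);            // store result through the destination ref
    }

    public static ExecuteStepDelegate BuildExecuteStep()
    {
        var byRefUint = typeof(uint).MakeByRefType();
        var method = new DynamicMethod("ExecuteStep", typeof(void),
            new[] { byRefUint, byRefUint, byRefUint, byRefUint });
        var il = method.GetILGenerator();

        // Hard-coded here for the example string "A=A-S B=B-C C=C+A A=A+B".
        EmitFormula(il, 0, 0, 3, OpCodes.Sub); // A = A - S
        EmitFormula(il, 1, 1, 2, OpCodes.Sub); // B = B - C
        EmitFormula(il, 2, 2, 0, OpCodes.Add); // C = C + A
        EmitFormula(il, 0, 0, 1, OpCodes.Add); // A = A + B
        il.Emit(OpCodes.Ret);

        return (ExecuteStepDelegate)method.CreateDelegate(typeof(ExecuteStepDelegate));
    }

    static void Main()
    {
        var step = BuildExecuteStep();
        uint a = 100, b = 60, c = 30, s = 5;
        step(ref a, ref b, ref c, ref s);
        Console.WriteLine($"{a} {b} {c}"); // 125 30 125
    }
}
```

Note the shape of each emitted formula: the destination address goes on the evaluation stack first, then the two dereferenced operands, then the operator, and finally Stind.I4 pops the result and the address to store it – that stack discipline is exactly what Part 2 digs into.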

In Part 2, we’ll look at how to actually emit the dynamic method by writing the equivalent code in C# and then looking at it in Reflector, then figuring out how to generate it at run-time.  Along the way, we’ll learn a little bit about the .NET evaluation stack.

Oh – one more thing – here’s why you should care about all of this.  A simple testing framework indicated a speed increase of a factor of four when changing this to use a dynamic method instead of the previous implementation.  Over 50 iterations, I observed the dynamic method versions taking a little less than 1/4 of the execution time of the original array-based implementation.

Now, if that’s not a marked improvement, I don’t know what is.  But remember, as with all performance optimizations, your mileage may vary.

Improving Performance with Dynamic Methods

  • Part 1: The Problem Definition
  • Part 2: Emit and Execute