Running with Code
Like with scissors, only more dangerous

16 Feb 2012

Bridging the gap between Jurassic and the DLR, Part Two

Posted by Rob Paveza

Part Two: A More Complete Object Model

Once I started implementing the rest of the object model, things started coming together very well.

Let's take a look at the rest of the ObjectInstance implementation:

        public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
        {
            return TryCallMemberFunction(out result, binder.Name, args);
        }

        public override bool TryConvert(ConvertBinder binder, out object result)
        {
            if (binder.ReturnType == typeof(string))
            {
                result = this.ToString();
                return true;
            }
            else
            {
                try
                {
                    result = Jurassic.TypeConverter.ConvertTo(this.engine, this, binder.ReturnType);
                    return true;
                }
                catch
                {
                    result = null;
                    return false;
                }
            }
        }

        public override bool TryDeleteMember(DeleteMemberBinder binder)
        {
            return Delete(binder.Name, false);
        }

        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            this.FastSetProperty(binder.Name, value, PropertyAttributes.FullAccess, true);
            return true;
        }

        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            result = GetNamedPropertyValue(binder.Name, this);
            if (object.ReferenceEquals(null, result))
                return false;

            return true;
        }

For FunctionInstance:

        public override bool TryInvoke(System.Dynamic.InvokeBinder binder, object[] args, out object result)
        {
            try
            {
                result = CallLateBound(this, args);
                return true;
            }
            catch
            {
                result = null;
                return false;
            }
        }

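For ArrayInstance, things were a little more interesting. JavaScript doesn't support multidimensional arrays (as in C#, where you can access an element via someArr[1,5]), so only a single index is accepted. The index's type also matters: an integral index (or an enum backed by one) addresses an array element, while anything else is converted to a string and treated as a property name. Fortunately, Jurassic makes both cases fairly easy to support.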
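The type test shared by the three index methods below can be pulled out into a standalone helper. This is a hypothetical refactoring (`IndexKey`, `IndexNormalizer`, and `TryNormalizeIndex` are illustrative names, not Jurassic APIs): it unwraps enums to their underlying type, maps the small integral types to a `uint` array index, and falls back to a string property name for everything else. It converts with `Convert.ToInt64` rather than a direct `(uint)` cast, because unboxing to `uint` only succeeds when the boxed value really is a `uint`.

```csharp
using System;

// Hypothetical result type; not part of Jurassic.
public struct IndexKey
{
    public bool IsArrayIndex;     // true => use ArrayIndex, false => use PropertyName
    public uint ArrayIndex;
    public string PropertyName;
}

public static class IndexNormalizer
{
    // Mirrors the checks made by TryGetIndex/TrySetIndex/TryDeleteIndex.
    public static bool TryNormalizeIndex(object[] indexes, out IndexKey key)
    {
        key = default(IndexKey);
        if (indexes == null || indexes.Length != 1 || indexes[0] == null)
            return false; // multi-dimensional or null indexes are not supported

        object raw = indexes[0];
        Type indexType = raw.GetType();
        if (indexType.IsEnum)
            indexType = Enum.GetUnderlyingType(indexType);

        if (indexType == typeof(byte) || indexType == typeof(sbyte) ||
            indexType == typeof(short) || indexType == typeof(ushort) ||
            indexType == typeof(int) || indexType == typeof(uint))
        {
            // Convert.ToInt64 accepts every boxed type above, including enums;
            // the unchecked cast keeps the original wrap-around for negatives.
            key.IsArrayIndex = true;
            key.ArrayIndex = unchecked((uint)Convert.ToInt64(raw));
            return true;
        }

        // Anything else is looked up as a named property.
        key.PropertyName = raw.ToString();
        return true;
    }
}
```

With a helper like this, each of the three overrides reduces to a single branch on `IsArrayIndex`.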

        public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
        {
            Debug.Assert(indexes != null && indexes.Length > 0);

            result = null;

            if (indexes.Length > 1)
                return false; // multi-dimensional arrays are not supported.

            if (object.ReferenceEquals(null, indexes[0]))
                return false;

            Type indexType = indexes[0].GetType();
            if (indexType.IsEnum)
            {
                indexType = Enum.GetUnderlyingType(indexType);
            }

            if (indexType == typeof(byte) || indexType == typeof(sbyte) || indexType == typeof(short) || indexType == typeof(ushort) || indexType == typeof(int) || indexType == typeof(uint))
            {
                uint index = unchecked((uint)Convert.ToInt64(indexes[0])); // a direct (uint) unbox only succeeds for a boxed uint
                try
                {
                    result = this[index];
                    return true;
                }
                catch
                {
                    result = null;
                    return false;
                }
            }
            else
            {
                string index = indexes[0].ToString();
                try
                {
                    result = this[index];
                    return true;
                }
                catch
                {
                    result = null;
                    return false;
                }
            }
        }

        public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
        {
            Debug.Assert(indexes != null && indexes.Length > 0);

            if (indexes.Length > 1)
                return false; // multi-dimensional arrays are not supported.

            if (object.ReferenceEquals(null, indexes[0]))
                return false;

            Type indexType = indexes[0].GetType();
            if (indexType.IsEnum)
            {
                indexType = Enum.GetUnderlyingType(indexType);
            }

            if (indexType == typeof(byte) || indexType == typeof(sbyte) || indexType == typeof(short) || indexType == typeof(ushort) || indexType == typeof(int) || indexType == typeof(uint))
            {
                uint index = unchecked((uint)Convert.ToInt64(indexes[0])); // a direct (uint) unbox only succeeds for a boxed uint
                try
                {
                    this[index] = value;
                    return true;
                }
                catch
                {
                    return false;
                }
            }
            else
            {
                string index = indexes[0].ToString();
                try
                {
                    this[index] = value;
                    return true;
                }
                catch
                {
                    return false;
                }
            }
        }

        public override bool TryDeleteIndex(DeleteIndexBinder binder, object[] indexes)
        {
            Debug.Assert(indexes != null && indexes.Length > 0);

            if (indexes.Length > 1)
                return false; // multi-dimensional arrays are not supported.

            if (object.ReferenceEquals(null, indexes[0]))
                return false;

            Type indexType = indexes[0].GetType();
            if (indexType.IsEnum)
            {
                indexType = Enum.GetUnderlyingType(indexType);
            }

            if (indexType == typeof(byte) || indexType == typeof(sbyte) || indexType == typeof(short) || indexType == typeof(ushort) || indexType == typeof(int) || indexType == typeof(uint))
            {
                uint index = unchecked((uint)Convert.ToInt64(indexes[0])); // a direct (uint) unbox only succeeds for a boxed uint
                return Delete(index, false);
            }
            else
            {
                string index = indexes[0].ToString();
                return Delete(index, false);
            }
        }

I did add one set of conditional compilation directives so that I could make one small change to ScriptEngine: when SUPPORT_DYNAMIC is defined, GetGlobalValue returns dynamic instead of object.

        public 
#if SUPPORT_DYNAMIC
            dynamic
#else
            object 
#endif
            GetGlobalValue(string variableName)
        {
            if (variableName == null)
                throw new ArgumentNullException("variableName");
            return TypeUtilities.NormalizeValue(this.Global.GetPropertyValue(variableName));
        }

With that gem, we're in good shape. I'll update the test program; let's take a good look:

    class Program
    {
        static void Main(string[] args)
        {
            ScriptEngine engine = new ScriptEngine();

            engine.SetGlobalFunction("write", (Action<string>) ((s) => { Console.WriteLine(s); }));
            
            engine.Execute(@"
var a = {
    A: 'A',
    B: 20,
    C: function() { return 'Hello'; }
};

function double(val)
{
    return val * 2;
}

var array = [1, 5, 9, 13, 21];
");
            dynamic obj = engine.Evaluate<ObjectInstance>("a");
            Console.WriteLine(obj.A);
            Console.WriteLine(obj.B);
            Console.WriteLine(obj.C());
            obj.D = "What's that?";

            Console.WriteLine("C#: " + obj.D);

            engine.Execute(@"
write('JavaScript: ' + a.D);
");

            dynamic dbl = engine.GetGlobalValue("double");
            Console.WriteLine(dbl(20));
            Console.WriteLine(dbl.call(null, 20));

            dynamic array = engine.GetGlobalValue("array");
            Console.WriteLine(array[2]);

            Console.ReadLine();
            
        }
    }

Output is happily correct:

A
20
Hello
C#: What's that?
JavaScript: What's that?
40
40
9

Particularly neat about this implementation is that the calls automatically recurse. Note that I use the intrinsic JavaScript call method on the Function instance (of dbl). This implementation covers a whole bunch of typical scenarios and use-cases, and I'm happy to see that it has worked out fairly well thus far.
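To see the TryInvoke path in isolation, without Jurassic, here is a minimal sketch of a DynamicObject that wraps a delegate (requires .NET 4 or later). `DynamicCallable` is a hypothetical name; the point is that a call such as `dbl(20)` on a `dynamic` reference is routed to TryInvoke, just as it is for FunctionInstance.

```csharp
using System;
using System.Dynamic;

// Hypothetical wrapper: makes a Func<object[], object> callable through 'dynamic'.
public sealed class DynamicCallable : DynamicObject
{
    private readonly Func<object[], object> body;

    public DynamicCallable(Func<object[], object> body)
    {
        if (body == null) throw new ArgumentNullException("body");
        this.body = body;
    }

    // Invoked for dynamic call syntax such as dbl(20).
    public override bool TryInvoke(InvokeBinder binder, object[] args, out object result)
    {
        // Exceptions thrown by the body propagate to the caller unchanged.
        result = body(args);
        return true;
    }
}
```

One design note: this sketch lets exceptions propagate instead of catching them and returning false, as the FunctionInstance override above does. When TryInvoke returns false, the C# runtime binder typically reports a generic binding failure, which hides the original script error.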

One problem I've found is that a TypeLoadException is thrown when targeting .NET 4. This appears to be related to the new security transparency rules (Level 2) that .NET 4 applies by default. For now, applying this attribute to both the test program and the library resolves the issue, though I don't intend it to be a long-term fix:

[assembly: System.Security.SecurityRules(System.Security.SecurityRuleSet.Level1)]

Next time, we'll do some more fit and finish, with conditional compilation symbols and a security review.

14 Feb 2012

Bridging the gap between Jurassic and the DLR, Part One

Posted by Rob Paveza

Part One: ObjectInstance derives from DynamicObject

A while back I posted that I was joining the Jurassic team; Jurassic is an open-source JavaScript engine for .NET. If you've ever gone through the long search for a JavaScript implementation on .NET (other than JScript.NET, of course), you've probably found a bunch of incomplete implementations. And if you're lucky enough to find the blog about Microsoft's project to run JScript on the DLR (once called Managed JScript), you'll find that it was specifically designed to give design feedback on the DLR itself and was never planned to be carried forward into production. Personally I think that's too bad, but I'm happy to see a couple of projects (notably, Jurassic and IronJS) that have stepped up to fill the gap.

I had considered implementing IDynamicMetaObjectProvider, but inheriting from DynamicObject seems to be a better design decision all-around. Since all of the other JavaScript objects inherit from ObjectInstance, it's a simple matter of overriding its virtual methods instead of creating a new implementation of DynamicMetaObject for each class in the hierarchy.
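The division of labor is easiest to see in a tiny, Jurassic-free example. The sketch below is hypothetical (`PropertyBag` is an illustrative name, and a dictionary stands in for Jurassic's property storage): it overrides only TryGetMember, TrySetMember, and TryInvokeMember, and leaves DynamicObject to supply all of the DynamicMetaObject plumbing.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical stand-in for ObjectInstance: members live in a dictionary.
public class PropertyBag : DynamicObject
{
    private readonly Dictionary<string, object> members = new Dictionary<string, object>();

    // obj.Name
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return members.TryGetValue(binder.Name, out result);
    }

    // obj.Name = value
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        members[binder.Name] = value;
        return true;
    }

    // obj.Name(args...) when the stored value is a delegate
    public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
    {
        object member;
        Delegate function;
        if (members.TryGetValue(binder.Name, out member) && (function = member as Delegate) != null)
        {
            result = function.DynamicInvoke(args);
            return true;
        }
        result = null;
        return false;
    }
}
```

Every class that derived from PropertyBag would inherit this dynamic behavior automatically, which is the same argument made above for putting the overrides on ObjectInstance.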

I've created a new library project within the solution, as well as a simple test project to exercise the API and to step through my DynamicObject overrides in the debugger. Here are some simple components:

// These are new:
using System.Dynamic;
using System.Diagnostics;

// This is updated
    public class ObjectInstance
        : DynamicObject
#if !SILVERLIGHT
        , System.Runtime.Serialization.IDeserializationCallback
#endif
    {

// The rest of the class is unchanged; these overrides are new:
        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            result = GetNamedPropertyValue(binder.Name, this);
            if (result != null)
                return true;

            return false;
        }

        public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
        {
            try
            {
                result = CallMemberFunction(binder.Name, args);
                return true;
            }
            catch
            {
                result = null;
                return false;
            }
        }

        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            this.AddProperty(binder.Name, value, PropertyAttributes.FullAccess, true);
            return true;
        }

    }

This is the source of the test application. It's very straightforward:

        static void Main(string[] args)
        {
            ScriptEngine engine = new ScriptEngine();

            engine.SetGlobalFunction("write", (Action<string>) ((s) => { Console.WriteLine(s); }));
            
            engine.Execute(@"
var a = {
    A: 'A',
    B: 20,
    C: function() { return 'Hello'; }
};
");
            dynamic obj = engine.Evaluate<ObjectInstance>("a");
            Console.WriteLine(obj.A);
            Console.WriteLine(obj.B);
            Console.WriteLine(obj.C());
            obj.D = "What's that?";

            Console.WriteLine("C#: " + obj.D);

            engine.Execute(@"
write('JavaScript: ' + a.D);
");

            Console.ReadLine();
            
        }

We create a global function 'write' that writes a string to the console. Then we create a global object a with properties A, B, and C. Next, we use C# to retrieve this object and store it in a dynamic variable; that's what routes member access through C#'s DLR support to our DynamicObject overrides. We then access each property (each of which returns a dynamic). Happily, because Jurassic automatically converts primitives to their respective .NET types, these dynamic values can be converted automatically to the appropriate type for Console.WriteLine's parameter. You can see that we invoke TryGetMember on A and B, TryInvokeMember on C, and TrySetMember and then TryGetMember on D.

OK, it's late, so I'm not going to stick this out anymore right now. I'm not even sure if the previous paragraph was particularly coherent. :-)

There's a lot to update: it doesn't support case-insensitive languages (like Visual Basic), it's not particularly good at error checking, and I haven't dealt with any other components yet. The good news is that it seems like we should be good to go for the rest of the components.

Next time, we'll look at other classes, like ArrayInstance, FunctionInstance, and more.

17 Jan 2012

A very fast, random text string generator in C#, Part One

Posted by Rob Paveza

This is about half of a type I designed a few years ago in response to my boss's assertion that I couldn't improve on the speed of random string generators. It's a pretty handy type to have around when you want to generate a character string of a fixed length. It so happens that a six-character "base 36" number fits very well into 32 bits: 36^6 = 2,176,782,336, which is just over 2^31 (a little bigger than int.MaxValue), so the value is stored as a uint.

What is included below is a first step. I want to show how to refactor it for improved performance and where certain design scenarios might be better; for example, repeatedly calling the CreateNew method results in repeatedly creating new RNGCryptoServiceProviders. Although I believe this class is a bit better about seeds as it incorporates hardware-dependent data, it isn't necessarily cryptographically strong. You can add some strength by passing in a single RNG to each subsequent call, for example.
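Here is one way the "pass in a single RNG" suggestion might look. It is a hypothetical helper (`AlphaNumericRandom` and `NextValue` are illustrative names), not necessarily the refactoring the follow-up post performs. It reuses a caller-supplied RandomNumberGenerator and uses rejection sampling, discarding 32-bit draws at or above 36^6, so that every six-character value is equally likely. A plain modulo over a full 32-bit draw would make most values twice as likely as the top few percent of the range, because 2^32 is not a multiple of 36^6.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; not part of the original AlphaNumericNumber type.
public static class AlphaNumericRandom
{
    public const uint MaxValue = 2176782336; // 36^6

    // Draws a uniformly distributed value in [0, 36^6) from a shared RNG.
    public static uint NextValue(RandomNumberGenerator rng)
    {
        if (rng == null) throw new ArgumentNullException("rng");

        byte[] buffer = new byte[4];
        uint candidate;
        do
        {
            rng.GetBytes(buffer); // unlike GetNonZeroBytes, zero bytes are allowed
            candidate = BitConverter.ToUInt32(buffer, 0);
        } while (candidate >= MaxValue); // rejects roughly half of all draws

        return candidate;
    }
}
```

A caller would create one RNG (for example with `RandomNumberGenerator.Create()`), keep it for many calls, and pass each result to the AlphaNumericNumber constructor.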

This sample also only goes one way: from number to string. Next time, we'll show the reverse operation.

Note: the code depends on these using directives:

using System;
using System.Security.Cryptography;
    /// <summary>
    /// Represents a number that may be transitioned to a six-digit alphanumeric (base-36) representation.  This type is not CLS-compliant.
    /// </summary>
    [CLSCompliant(false)]
    public struct AlphaNumericNumber
    {
        private const uint MAX_VALUE = 2176782336; // = RADIX ^ 6
        private const int RADIX = 36; // 10 digits + 26 alphabetics
        private uint m_val;

        /// <summary>
        /// Creates a new <see>AlphaNumericNumber</see> with the specified value.
        /// </summary>
        /// <param name="value">The value with which to assign the number.</param>
        public AlphaNumericNumber(uint value)
        {
            if (value >= MAX_VALUE) // MAX_VALUE itself would need a seventh digit
                value %= MAX_VALUE;
            m_val = value;
        }

        /// <summary>
        /// Creates a new, random <see>AlphaNumericNumber</see>.
        /// </summary>
        /// <returns>A randomly-chosen <see>AlphaNumericNumber</see>.</returns>
        public static AlphaNumericNumber CreateNew()
        {
            byte[] container = new byte[4];
            using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
            {
                rng.GetNonZeroBytes(container);
            }

            return new AlphaNumericNumber(BitConverter.ToUInt32(container, 0));
        }

        /// <inheritdoc />
        public override string ToString()
        {
            char[] result = new char[6];
            uint msd = RADIX * RADIX * RADIX * RADIX * RADIX;

            uint currentDigit = msd;
            uint accumulator = m_val;
            unchecked
            {
                for (int charIndex = 0; charIndex < result.Length; charIndex++)
                {
                    uint currentDigitValue = accumulator / currentDigit;
                    accumulator -= (currentDigitValue * currentDigit);
                    if (currentDigitValue <= 9)
                    {
                        result[charIndex] = (char)('0' + currentDigitValue);
                    }
                    else
                    {
                        currentDigitValue -= 10; // adjust for 'A' equaling 10.
                        result[charIndex] = (char)('A' + currentDigitValue);
                    }

                    currentDigit /= RADIX;
                }
            }

            return new string(result);
        }

        /// <summary>
        /// Gets the value contained by this <see>AlphaNumericNumber</see>.
        /// </summary>
        public uint Value
        {
            get { return m_val; }
        }
    }

10 Jan 2012

Please, don’t ever use the “Remove Unused Namespaces” feature

Posted by Rob Paveza

Seriously, I don't know why this feature exists or who thought it would be a good idea. Removing unused using directives buys you essentially nothing: whatever time they add to compiling even the most complicated C# file is trivial.

On the other hand, if you've used a tool to remove unused namespaces from your C# code file, then when I later try to call the .Take() method from System.Linq, use the Encoding class from System.Text, or suddenly need a new List<T> from System.Collections.Generic, you've left me wondering why my editor isn't recognizing it.

There's a reason that many default namespaces are included with a default C# code file. Leave them there! Maintenance programmers will love you forever.

15 Mar 2011

The Microsoft Reactive Extensions

Posted by Rob Paveza

The honest truth is that I’m having difficulty establishing exactly what they could be used for, but they’re still really cool.  The Microsoft Reactive Extensions for the .NET Framework are the dual of LINQ: whereas LINQ operates over objects, or you might say pulls objects out of collections, the Reactive Extensions (Rx) handle push notifications.  Rx is the ultimate generalization of events and event handling within .NET.

Getting There

First, let’s consider the normal interfaces for IEnumerable:

interface IEnumerable<T>
{
    IEnumerator<T> GetEnumerator();
}

interface IEnumerator<T> : IDisposable
{
    T Current { get; }  // throws exception at end of enumeration
    bool MoveNext();
}

These interfaces (okay, really, the non-generic IEnumerable interface, but let’s not split hairs) are the foundation of the foreach C# keyword (and the For Each… In in Visual Basic).  A foreach can also be written, roughly, as:

foreach (string str in myListOfStrings)
    Console.WriteLine(str);
// rewritten:
using (IEnumerator<string> enumStr = myListOfStrings.GetEnumerator())
{
    while (enumStr.MoveNext())
    {
        Console.WriteLine(enumStr.Current);
    }
}

Keep this example in mind for later, because we’ll revisit how this can be used in Rx programming.

Dualism

Dualism is something of a mathematical concept, and I don’t want to get into it because I don’t completely understand it myself, but most nerdy people reading my blog will probably appreciate an example from particle physics.  Consider a proton: its physical dual is the antiproton, because when they meet they annihilate each other.  (It’s not the electron: while the two have opposite charges, they have substantially different masses.)

The core of Rx is the dual of IEnumerable.  That is, IObservable<T> and IObserver<T>.  But let’s deconstruct these piece by piece.  Let’s start at IEnumerator<T>:

interface IObserver<T>
{
    // T Current { get; }
    // That method looks like: T get_Current();
    void OnNext(T next);
    // Current throws an exception if MoveNext() previously returned false, so:
    void OnError(Exception error);

    // bool MoveNext()
    // returns true while Current is populated, false when we reach the end, so:
    void OnCompleted();
}

You can see that, whereas everything in IEnumerator<T> pulled data, now we’ve transitioned into pushing data.  But the observer isn’t really the cool part; rather, it’s the subject that’s cool:

interface IObservable<T>
{
    // GetEnumerator() returned an object; here we pass one in
    // We still needed to capture the disposable functionality, so we return IDisposable
    IDisposable Subscribe(IObserver<T> observer);
}

Now, if you want to see the specifics about how these were constructed, you can check out the Expert-to-Expert video on Channel 9.  I’ve included some high-level notes, but they’re not really as deep as you can get with these guys.
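One way to see the duality concretely is to turn the foreach rewrite from earlier inside out: the same loop, but instead of the caller reading Current, each value is pushed into an observer. The sketch below uses the IObserver<T> interface that ships in the System namespace as of .NET 4 (whose completion method is OnCompleted). `PushAdapter.PushAll` and `ListObserver` are hypothetical names; Rx itself already provides this conversion as `ToObservable()`.

```csharp
using System;
using System.Collections.Generic;

public static class PushAdapter
{
    // The foreach rewrite, inside out: MoveNext/Current become OnNext,
    // a throwing MoveNext becomes OnError, and the loop's end becomes OnCompleted.
    public static void PushAll<T>(IEnumerable<T> source, IObserver<T> observer)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (true)
            {
                T current;
                try
                {
                    if (!e.MoveNext())
                        break;
                    current = e.Current;
                }
                catch (Exception ex)
                {
                    observer.OnError(ex); // only errors raised by the source land here
                    return;
                }
                observer.OnNext(current);
            }
        }
        observer.OnCompleted();
    }
}

// Hypothetical observer that records what it is told.
public sealed class ListObserver<T> : IObserver<T>
{
    public readonly List<string> Log = new List<string>();
    public void OnNext(T value) { Log.Add("next " + value); }
    public void OnError(Exception error) { Log.Add("error " + error.Message); }
    public void OnCompleted() { Log.Add("done"); }
}
```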

Creating a Subject

Creating a subject is a bit of a challenge. Subjects are event-driven, and events are generally hard to think about in the abstract because they usually fall into only one of two buckets: user interaction and system I/O.  For the sake of example, I’ve created a simple Windows Forms project to start with, which has a couple of observable Buttons (the class is called ObservableButton, go figure) and an observer, which is the containing form.  You can download the starter project, which requires Visual Studio 2010 and the Rx Framework.
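ObservableButton itself lives in the starter project, but the core of any subject is small. Here is a minimal, hypothetical, non-thread-safe sketch that uses only the BCL's IObservable<T> and IObserver<T>: it keeps a list of observers, pushes each published value to all of them, and hands back an IDisposable that unsubscribes. Rx ships ready-made subjects (such as `Subject<T>`) that you would normally use instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal subject; not thread-safe and not part of Rx.
public sealed class SimpleSubject<T> : IObservable<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        if (observer == null) throw new ArgumentNullException("observer");
        observers.Add(observer);
        return new Subscription(observers, observer);
    }

    // Pushes a value to every current observer (e.g., from a Click handler).
    public void Publish(T value)
    {
        foreach (IObserver<T> o in observers.ToArray())
            o.OnNext(value);
    }

    // Disposing the subscription removes the observer.
    private sealed class Subscription : IDisposable
    {
        private readonly List<IObserver<T>> owner;
        private readonly IObserver<T> observer;

        public Subscription(List<IObserver<T>> owner, IObserver<T> observer)
        {
            this.owner = owner;
            this.observer = observer;
        }

        public void Dispose() { owner.Remove(observer); }
    }
}

// Hypothetical observer that counts the values it receives.
public sealed class CountingObserver<T> : IObserver<T>
{
    public int Count;
    public void OnNext(T value) { Count++; }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```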

Subjects can be anything, though, and the power you can glean from these is amazing.  For the Red Bull NASCAR team, I created a server for a Twitter feed aggregator using Rx.  It started as reading a socket into HTTP data, then into chunked HTTP data, then into JSON packets, then into POCO objects that were then re-serialized and sent over the wire to N Flash clients.  As you can imagine, network programming, social programming, or any other kind of programming where an event is coming in unpredictably is a great candidate for this.  Why?

Let’s look at the use case I just listed.  As Twitter’s live stream service sends data over the wire, I need to parse it and send it to a lot of listening sockets.  But I don’t want to just say “Oh, I just got the data, let me send it out again,” because that could slow down processing on other threads: I might have to wait, since my socket might already be in the process of sending data and therefore be in an invalid state to send more.  If I had tied a server socket directly to the “I’m ready to send” signal, I would have been in trouble.  Instead, I had a utility (an Observer) that aggregated incoming messages until all server sockets were ready to send, at which point it pushed those updated messages to the server sockets.

Let’s look at the sample program:

[Screenshot: the sample program]

This isn’t really anything spectacular.  I could have done that with regular event handlers.

Aggregating Subjects

The magic of Rx, from my perspective, lies in what you can do with subjects.  My constructor no longer needs two separate lines to subscribe to each button; instead, I merge the two buttons into one observable sequence:

        public Form1()
        {
            InitializeComponent();

            observableButton1.Merge(observableButton2).Subscribe(this);
        }

 

The result is identical – the events get handled and all is good.

Modifying Sequences

Now I’m going to change the class definition slightly:

    public partial class Form1 : Form, IObserver<Timestamped<string>>
    {
        public Form1()
        {
            InitializeComponent();

            observableButton1.Merge(observableButton2).Timestamp().Subscribe(this);
        }

        public void OnNext(Timestamped<string> value)
        {
            this.textBox1.Text += value.Timestamp.ToString("hh:mm tt   ") + value.Value + Environment.NewLine;
        }

        public void OnError(Exception error)
        {
            this.textBox1.Text += "Exception caught: " + Environment.NewLine + error.ToString() + Environment.NewLine;
        }

        public void OnCompleted()
        {
            this.textBox1.Text += "Sequence completed." + Environment.NewLine;
        }
    }

Note that by adding the .Timestamp() call, I’ve transformed the observable sequence of strings into an observable sequence of timestamped strings.  That’s pretty cool, right?

This is even cooler: the Delay() method:

observableButton1.Merge(observableButton2).Timestamp()
                .Delay(new TimeSpan(0, 0, 1)).ObserveOn(this).Subscribe(this);

The ObserveOn method accepts a Windows Forms control, a Dispatcher (for WPF), or another scheduler implementation, which is used to marshal the delayed notifications back to the UI thread.  If I didn’t include it, the delayed notifications would arrive on a different thread, and we’d get an InvalidOperationException (because you can’t update a window from a thread other than the one that created it).

Do you want to avoid repetition?

            observableButton1.Merge(observableButton2).Timestamp()
                .DistinctUntilChanged(ts => ts.Value).Subscribe(this);

This emitted only one message, no matter how many times I clicked the same button, until I clicked the other button.
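The rule DistinctUntilChanged applies is simple enough to write out by hand. Here is a hypothetical pull-based (IEnumerable<T>) analog, written only to illustrate the semantics rather than to replace the Rx operator: an element passes through only when its key differs from the key of the element immediately before it, so repeats are suppressed until the value changes.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceFilters
{
    // Hypothetical pull-based analog of Rx's DistinctUntilChanged(keySelector).
    public static IEnumerable<T> DistinctUntilChangedPull<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        EqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
        bool hasPrevious = false;
        TKey previousKey = default(TKey);

        foreach (T item in source)
        {
            TKey key = keySelector(item);
            // Emit the first item, then only items whose key changed.
            if (!hasPrevious || !comparer.Equals(previousKey, key))
            {
                hasPrevious = true;
                previousKey = key;
                yield return item;
            }
        }
    }
}
```

Fed the clicks button1, button1, button1, button2, button2, button1, it yields button1, button2, button1, which matches the behavior described above.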

So, What Can We Do?

Well, right now it doesn’t seem like there’s a lot of tooling for Rx.  There’s a community wiki around the framework, though, and I think we’ll eventually see it put to a lot of good use.

Some ideas:

  • Develop a way to completely replay ASP.NET requests.  Treat IIS as an IObservable<AspNetRequest>, where AspNetRequest captures all of the state data associated with a request, which would immensely help with debugging.  Imagine a tester only needing to record a series of test cases once, and otherwise just testing for UI errors.
  • Wrap event-oriented APIs for simplified logging and replaying.  (In JinxBot, an event-oriented chat API named for my cat, I always wanted to capture all the events of the core API and be able to replay them via a subclass, which would have allowed pixel-perfect replay of a chat session).
  • Handle periodic data services like Twitter, SMS, email, or others in a clean and efficient way.

I’d like to see this take off, but it’s a very different way of looking at programming than what most .NET developers are used to.  Enjoy it, take a look, and let’s build it up!

27 May 2010

Launching OpenGraph.NET

Posted by Rob Paveza

Tonight I’m publishing to CodePlex a project that I’ve been working on for about a month, which I’ve called OpenGraph.NET.  It’s a C# client for Facebook’s still-new Graph API.  It currently supports regular desktop applications, web sites (using Web Forms and ASP.NET MVC), and, to some extent, Silverlight.  All of the groundwork is there; it’s just going to take a bit more work to get it across the finish line.  I’m calling it version 0.9.1 “Beta.”  (Maybe I’ll come up with some clever name like “Froyo,” like the operating system on my phone.)


OpenGraph.NET’s documentation is available at http://robpaveza.net/opengraph.net/docs/ and the project can be downloaded from CodePlex at http://opengraph.codeplex.com/.  There are also a couple demos on the CodePlex site within the download.

OpenGraph.NET is licensed under the new BSD license: basically, you can use it for whatever you want, but if you distribute the project publicly, either compiled or as source code, you should include a copy of my copyright notice and license terms.  I’m not an advocate of copyleft, but I would certainly welcome patch submissions.  Over the weekend, I’ll be porting the source code repository from my web server onto CodePlex.

One more note – it IS indeed working out there.  We’re using it on a currently-undisclosed project at Terralever for an event being hosted by one of our clients, and I am using the Real Time Updates handler for it as well.

Over the coming weeks, I’ll be talking about the internals of how this works, including dynamic methods.

I’d like to mention a big thank-you to James Newton-King, for the awesome Json.NET library which is used extensively throughout OpenGraph.NET.

19 Jan 2010

Improving Performance with Dynamic Methods Part 1: The Problem Definition

Posted by Rob

One of the problems that a large part of a certain gaming community has had to deal with over the years is version checking.  A common, though now older, method of version checking in this community has been to execute a known algorithm based on a seeded value; however, the algorithm would change based on a formula sent over the wire.  For instance, suppose that for every four bytes in a file, there are four state values: A, B, C, and S.  The S value is the current four bytes of the file.  The server might send the following formula as an initialization: A=A-S B=B-C C=C+A A=A+B.  In addition, it sends some startup values for A, B, and C.  This means that for every four bytes of the file, we need to perform the math in the steps outlined in the initialization string above.

Now, one of the common ways to approach this problem has been, basically, to attack it by brute force.  We’d keep track of the state values in an array, keep track of the indices of the state values in another array offset by their letters, keep track of operators in yet another array, and finally do double-dereferencing (dereferencing the index of the state value, then actually dereferencing the state value).  So you might have code that looks like this:

foreach (step)
{
    states[Transform('S')] = ReadNext();
    foreach (string formula in list)
    {
        states[Transform(formula[0])] = DoMath(states[Transform(formula[2])], states[Transform(formula[4])], GetOperator(formula));
    }
}

Here, the “Transform” function translates a state letter to its index in the state-value array.  This is a pretty suboptimal solution given all of the extra dereferencing, and it’s really only a pseudo-implementation of the activity.  What would be best is if we could somehow unroll that inner loop and access the values directly (or through a single dereference, as a pointer would do).  In other words, it could be rewritten better like so:
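For reference, here is a runnable version of that array-based approach. It is a hypothetical sketch, not code from the original project, and it assumes formulas of exactly the form `X=Y?Z` (as in `A=A-S`), states indexed A=0, B=1, C=2, S=3, 32-bit unsigned arithmetic, and input that has already been split into 32-bit words. `FormulaInterpreter`, `Transform`, `DoMath`, and `Run` are illustrative names.

```csharp
using System;

// Hypothetical, deliberately naive interpreter for formulas like "A=A-S".
public static class FormulaInterpreter
{
    // Maps a state letter to its slot in the state array.
    private static int Transform(char c)
    {
        switch (c)
        {
            case 'A': return 0;
            case 'B': return 1;
            case 'C': return 2;
            case 'S': return 3;
            default: throw new ArgumentException("Unknown state value: " + c);
        }
    }

    // Applies one of the eight binary operators with wrap-around arithmetic.
    private static uint DoMath(uint left, uint right, char op)
    {
        unchecked
        {
            switch (op)
            {
                case '+': return left + right;
                case '-': return left - right;
                case '*': return left * right;
                case '/': return left / right;
                case '%': return left % right;
                case '&': return left & right;
                case '|': return left | right;
                case '^': return left ^ right;
                default: throw new ArgumentException("Unknown operator: " + op);
            }
        }
    }

    // Runs every formula once per word; returns the final { A, B, C, S }.
    public static uint[] Run(string[] formulas, uint a, uint b, uint c, uint[] words)
    {
        uint[] states = { a, b, c, 0 };
        foreach (uint word in words)
        {
            states[Transform('S')] = word;
            foreach (string formula in formulas)
            {
                // formula[0] = target, [2] = left operand, [3] = operator, [4] = right operand
                states[Transform(formula[0])] = DoMath(
                    states[Transform(formula[2])],
                    states[Transform(formula[4])],
                    formula[3]);
            }
        }
        return states;
    }
}
```

Every formula step pays for three Transform lookups, three array reads, and an operator switch, which is exactly the overhead the unrolled version avoids.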

foreach (step)
{
    S = ReadNext();
    A = A - S;
    B = B - C;
    C = C + A;
    A = A + B;
}

The challenge is that the server provides the verification string, and it changes over time, so the client can’t reliably predict which combination of formulae will be used.  Although only a fixed set of combinations has ever been observed in the wild, a number of others could potentially be presented: there’s no fixed number of formulas, each formula has three potential writable state values and four readable state values, and there are eight binary operators (+, -, *, /, %, &, |, and ^).  So either we keep going with the inner loop, or we figure out some way to get all the benefits of compilation without the headache of having to know exactly what we’re programming before we program it.  Fortunately, the .NET Framework provides a way for us to do exactly that: dynamic methods.

To simplify the code that we need to generate, we’ll rewrite the inner code to look like this:

foreach (step)
{
    S = ReadNext();
    ExecuteStep(ref A, ref B, ref C, ref S);
}

Now, all we need to do is dynamically emit the ExecuteStep method.  To do so we’ll need to get into the System.Reflection.Emit namespace – kind of a scary place to be!  Fortunately, Reflector is going to make this easier for us – and we’ll be glad we’re doing this in IL.
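As a rough preview of the general idea (not necessarily the exact code Part 2 arrives at), here is a hypothetical sketch that emits `ExecuteStep(ref A, ref B, ref C, ref S)` with DynamicMethod and ILGenerator. It assumes formulas of exactly the form `X=Y?Z` like `A=A-S`, with 32-bit unsigned state values. For each formula it loads the target's address, loads both operands through their ref parameters, applies the operator, and stores the result back. `StepDelegate` and `StepCompiler` are illustrative names.

```csharp
using System;
using System.Reflection.Emit;

// Signature of the generated method: ExecuteStep(ref A, ref B, ref C, ref S).
public delegate void StepDelegate(ref uint a, ref uint b, ref uint c, ref uint s);

public static class StepCompiler
{
    // Hypothetical compiler for formulas of the form "A=A-S".
    public static StepDelegate Compile(string[] formulas)
    {
        Type byRefUInt = typeof(uint).MakeByRefType();
        DynamicMethod method = new DynamicMethod(
            "ExecuteStep", typeof(void),
            new[] { byRefUInt, byRefUInt, byRefUInt, byRefUInt },
            typeof(StepCompiler).Module);
        ILGenerator il = method.GetILGenerator();

        foreach (string f in formulas)
        {
            il.Emit(OpCodes.Ldarg_S, ArgIndex(f[0])); // address of the target
            il.Emit(OpCodes.Ldarg_S, ArgIndex(f[2])); // address of the left operand
            il.Emit(OpCodes.Ldind_U4);                // ...dereferenced
            il.Emit(OpCodes.Ldarg_S, ArgIndex(f[4])); // address of the right operand
            il.Emit(OpCodes.Ldind_U4);                // ...dereferenced
            il.Emit(OperatorOpCode(f[3]));            // left op right
            il.Emit(OpCodes.Stind_I4);                // *target = result
        }
        il.Emit(OpCodes.Ret);

        return (StepDelegate)method.CreateDelegate(typeof(StepDelegate));
    }

    // Argument slot for each state letter.
    private static byte ArgIndex(char c)
    {
        switch (c)
        {
            case 'A': return 0;
            case 'B': return 1;
            case 'C': return 2;
            case 'S': return 3;
            default: throw new ArgumentException("Unknown state value: " + c);
        }
    }

    private static OpCode OperatorOpCode(char op)
    {
        switch (op)
        {
            case '+': return OpCodes.Add;
            case '-': return OpCodes.Sub;
            case '*': return OpCodes.Mul;
            case '/': return OpCodes.Div_Un; // unsigned division
            case '%': return OpCodes.Rem_Un; // unsigned remainder
            case '&': return OpCodes.And;
            case '|': return OpCodes.Or;
            case '^': return OpCodes.Xor;
            default: throw new ArgumentException("Unknown operator: " + op);
        }
    }
}
```

The formula string is parsed once, at compile time; afterward each step is a direct delegate call, which is where the speedup over re-interpreting the formulas for every four bytes comes from.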

In Part 2, we’ll look at how to actually emit the dynamic method by writing the equivalent code in C# and then looking at it in Reflector, then figuring out how to generate it at run-time.  Along the way, we’ll learn a little bit about the .NET evaluation stack.

Oh – one more thing – here’s why you should care about all of this.  A simple testing framework indicated a speed increase of a factor of four when changing this to use a dynamic method instead of the previous implementation.  Over 50 iterations, I observed the dynamic method versions taking a little less than 1/4 of the execution time of the original array-based implementation.

Now, if that’s not a marked improvement, I don’t know what is.  But remember, as with all performance optimizations, your mileage may vary.

Improving Performance with Dynamic Methods

  • Part 1: The Problem Definition
  • Part 2: Emit and Execute
8 Apr 2009

Unsung C# Hero: Closure

Posted by Rob

Today I’m going to talk about a feature of C# that has been around since 2.0 (with the introduction of anonymous delegates) but gets almost no lip service; most C# developers have probably used it, but likely without thinking about it.  This feature is called closure, and it refers to the ability of a nested function to refer to the surrounding function’s variables.

This article discusses in some depth how delegates are implemented in C#; a review may be appropriate before diving in.  We’ll also make heavy use of The Tool Formerly Known as Lutz Roeder’s .NET Reflector, which is now owned by Red Gate Software.

Anonymous Methods without Closure

Supposing that I had a reason to do so, I could assign an event handler as an anonymous method.  I think this is generally bad practice (unless you keep a reference to the delegate, there is no way to explicitly detach the event handler later, because it doesn’t have a name), but you can:

    public partial class SampleNoClosure : Form
    {
        public SampleNoClosure()
        {
            InitializeComponent();

            button1.Click += delegate
            {
                MessageBox.Show("I was clicked!  See?");
            };
        }
    }

This will work as expected; on click, a small alert dialog will appear.  Nothing terribly special about that, right?  We could have written that as a lambda expression as well, not that it buys us anything.  It looks like this in Reflector:

Anonymous method with no closure.

We see that the compiler auto-generates a method that matches the appropriate signature.  Nothing here should be completely surprising.
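Written by hand, the generated code amounts to an ordinary method plus a delegate that points at it.  In this sketch, GeneratedClickHandler is a hypothetical readable name (real compiler-generated names look like &lt;.ctor&gt;b__0, and the exact shape varies by compiler version), FakeButton is a stand-in for a WinForms Button so the code runs without a UI, and a counter replaces MessageBox.Show.  Because this version has a name, it can also be detached.

```csharp
using System;

// Hypothetical stand-in for a WinForms Button so the sketch runs without a UI.
public class FakeButton
{
    public event EventHandler Click;
    public void PerformClick() => Click?.Invoke(this, EventArgs.Empty);
}

public class SampleNoClosureDesugared
{
    public static int Clicks;
    public readonly FakeButton button1 = new FakeButton();

    public SampleNoClosureDesugared()
    {
        // Roughly what "button1.Click += delegate { ... };" turns into:
        // a delegate bound to an ordinary method with a compiler-chosen name.
        button1.Click += new EventHandler(GeneratedClickHandler);
    }

    // Stand-in for the generated method. It touches only a static field,
    // so no closure object is needed.
    private static void GeneratedClickHandler(object sender, EventArgs e)
    {
        Clicks++;   // replaces MessageBox.Show so the sketch runs headless
    }

    // A named method can be removed again with -=.
    public void Detach() => button1.Click -= GeneratedClickHandler;
}
```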

Simple Example of Closure

Here is a sample class that uses closure; the captured variable is sum.  You’ll note that everything just makes sense when you read it, right?

    public partial class SimpleClosureExample : Form
    {
        public SimpleClosureExample()
        {
            InitializeComponent();

            int sum = 1;
            for (int i = 1; i <= 10; i++)
                sum += sum * i;

            button1.Click += delegate
            {
                MessageBox.Show("The sum was " + sum.ToString());
            };
        }
    }

So it only makes sense that sum can be part of that anonymous function, right?  But bear in mind that C# code is statically compiled; the compiler can’t just calculate sum and bake the result into the anonymous method.  Besides, what if the value were a parameter to the function, something that couldn’t be precomputed?  To handle these scenarios, we need to think about how this will work.
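Here is a small sketch of both points, using a hypothetical MakeReporter method: the captured value comes from a parameter, and because the lambda captures the variable itself rather than a snapshot of its value, a change made after the lambda is created is still visible when it runs.

```csharp
using System;

public static class CaptureDemo
{
    // 'start' is a parameter, so its value can't be known at compile time.
    public static Func<string> MakeReporter(int start)
    {
        int sum = start;
        Func<string> report = () => "The sum was " + sum.ToString();
        sum += 5;   // happens after the lambda is created, yet the lambda sees it
        return report;
    }
}
```

Calling CaptureDemo.MakeReporter(1)() returns "The sum was 6", not "The sum was 1".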

To keep that method state alive, the compiler creates another object: a hidden class whose fields hold the captured variables, with the anonymous method compiled as an instance method on that class.  That’s how the state is maintained regardless of threads and regardless of how many times the function is called.  We can see it as a nested class here, and the anonymous method looks just like it does in code:

Closure supporting class
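That nested class can be approximated by hand.  In this sketch, DisplayClass stands in for the compiler-generated class (real names look like &lt;&gt;c__DisplayClass1 and vary by compiler version), the local sum has been hoisted into a field so it outlives the constructor’s stack frame, and a Func&lt;string&gt; replaces the button handler so the code runs without a UI.

```csharp
using System;

public static class ClosureDesugared
{
    // Hypothetical stand-in for the compiler-generated closure class.
    private sealed class DisplayClass
    {
        public int sum;   // the captured local, hoisted into a field

        // The anonymous method's body, moved onto the closure class.
        public string Invoke() => "The sum was " + sum.ToString();
    }

    public static Func<string> Build()
    {
        var locals = new DisplayClass();
        locals.sum = 1;
        for (int i = 1; i <= 10; i++)
            locals.sum += locals.sum * i;   // every use of 'sum' goes through the object
        return locals.Invoke;               // the delegate's target is the closure object
    }
}
```

Since each iteration multiplies sum by (i + 1), calling the returned delegate yields "The sum was 39916800" (that is, 11!).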

A More Advanced Example

Whenever you work with a LINQ expression, chances are you’re using closure and anonymous functions (and lambda expressions) and don’t realize it.  Consider this LINQ-to-SQL query:

            int boardID = 811;
            int perPage = 20;
            int pageIndex = 0;

            var topics = (from topic in dc.Topics
                          orderby topic.IsSticky descending, topic.LastMessage.TimePosted descending
                          where topic.BoardID == boardID
                          select new
                          {
                              topic.TopicID,
                              Subject = topic.FirstMessage.Subject,
                              LatestSubject = topic.LastMessage.Subject,
                              LatestChange = topic.LastMessage.ModifiedTime,
                              NameOfUpdater = topic.LastMessage.PosterName,
                              Updater = topic.LastMessage.User,
                              Starter = topic.FirstMessage.User,
                              NameOfStarter = topic.FirstMessage.PosterName,
                              topic.ReplyCount,
                              topic.ViewCount
                          })
                            .Skip(perPage * pageIndex)
                            .Take(perPage);
            foreach (var topic in topics)
            {
                Console.WriteLine("{0} - {1} {2} {3} {4} by {5}", topic.Subject, topic.NameOfStarter, topic.ReplyCount, topic.ViewCount, topic.LatestChange, topic.NameOfUpdater);
            }

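The closure here happens within the where clause.  For an in-memory IEnumerable<T>, the C# where clause compiles to the extension method Enumerable.Where(Func<TSource, bool> predicate); for a LINQ-to-SQL table, which is an IQueryable<T>, it compiles to Queryable.Where(Expression<Func<TSource, bool>> predicate), so the lambda becomes an expression tree that still refers to the captured boardID.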

Here, it’s very easy to imagine a case where we’d want to pass in actual parameters.  This query generates a topic list for a message board: all “stickied” topics should be at the top, and the rest should be sorted by the time of their last post.  If I’m making that into a web server control, I can’t hard-code the board ID, the number of topics per page, or the page I’m looking at.
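Here is a sketch of that parameterized version, run against an in-memory list so it is self-contained.  The Topic class is a trimmed-down hypothetical, not the actual LINQ-to-SQL entity; boardID, perPage, and pageIndex are now method parameters, and only boardID is captured by a lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down topic row.
public class Topic
{
    public int TopicID;
    public int BoardID;
    public bool IsSticky;
    public DateTime LastPosted;
}

public static class TopicPager
{
    // The where-lambda captures the boardID parameter through a closure;
    // Skip() and Take() just receive plain int values.
    public static List<int> GetPage(IEnumerable<Topic> topics, int boardID, int perPage, int pageIndex)
    {
        return topics
            .Where(t => t.BoardID == boardID)
            .OrderByDescending(t => t.IsSticky)      // stickied topics first
            .ThenByDescending(t => t.LastPosted)     // then most recently posted
            .Skip(perPage * pageIndex)
            .Take(perPage)
            .Select(t => t.TopicID)
            .ToList();
    }
}
```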

Now, this is a bit hard to conceptualize; when I was working through this project, I expected all three variables to be stored in the closure class.  It turns out that Skip() and Take() don’t take a lambda expression (they take plain values, evaluated immediately), so those variables don’t have to be stored for later evaluation.  As expected, though, boardID does have to be stored, which raises an interesting question, because LINQ-to-SQL translates the whole query into SQL for us:

SELECT TOP (20) [t0].[TopicID], [t2].[Subject], [t1].[Subject] AS [LatestSubject], [t1].[ModifiedTime] AS [LatestChange], [t1].[PosterName] AS [NameOfUpdater], [t4].[test], [t4].[UserID], [t4].[Username], [t4].[Email], [t4].[PasswordHash], [t6].[test] AS [test2], [t6].[UserID] AS [UserID2], [t6].[Username] AS [Username2], [t6].[Email] AS [Email2], [t6].[PasswordHash] AS [PasswordHash2], [t2].[PosterName] AS [NameOfStarter], [t0].[ReplyCount], [t0].[ViewCount]
FROM [dbo].[Topics] AS [t0]
LEFT OUTER JOIN [dbo].[Messages] AS [t1] ON [t1].[MessageID] = [t0].[LastMessageID]
LEFT OUTER JOIN [dbo].[Messages] AS [t2] ON [t2].[MessageID] = [t0].[FirstMessageID]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t3].[UserID], [t3].[Username], [t3].[Email], [t3].[PasswordHash]
    FROM [dbo].[Users] AS [t3]
    ) AS [t4] ON [t4].[UserID] = [t1].[UserID]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t5].[UserID], [t5].[Username], [t5].[Email], [t5].[PasswordHash]
    FROM [dbo].[Users] AS [t5]
    ) AS [t6] ON [t6].[UserID] = [t2].[UserID]
WHERE [t0].[BoardID] = @p0
ORDER BY [t0].[IsSticky] DESC, [t1].[TimePosted] DESC

So why, if the SQL has already been generated, do we need the closure class at all?  Because the where clause was compiled into an expression tree, boardID appears in that tree as a field on the closure object rather than as a literal, and LINQ-to-SQL reads that field to fill in @p0.  The class would matter even more if part of the query couldn’t be translated (LINQ-to-SQL doesn’t support every operator) and the relevant items had to be pulled back and filtered in memory.
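One way to see where boardID lives, without a database, is to build the same kind of query over an in-memory IQueryable&lt;T&gt; and inspect its expression tree.  In this sketch, CapturedQueryDemo is a hypothetical name and AsQueryable() stands in for a LINQ-to-SQL table.  It checks that the right side of the comparison is a field access on a constant closure object, and that changing boardID after the query is built changes the results.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class CapturedQueryDemo
{
    public static (bool ReadsClosureField, int Before, int After) Inspect(int[] boardIds)
    {
        int boardID = 811;
        IQueryable<int> query = boardIds.AsQueryable().Where(id => id == boardID);

        // query.Expression is the call to Queryable.Where; its second argument
        // is the quoted lambda "id => id == boardID".
        var where = (MethodCallExpression)query.Expression;
        var lambda = (LambdaExpression)((UnaryExpression)where.Arguments[1]).Operand;
        var body = (BinaryExpression)lambda.Body;

        // boardID shows up as closureObject.boardID, not as the literal 811.
        var member = body.Right as MemberExpression;
        bool readsClosureField = member != null && member.Expression is ConstantExpression;

        int before = query.Count();
        boardID = 812;              // updates the field on the closure object
        int after = query.Count();  // the deferred query sees the new value
        return (readsClosureField, before, after);
    }
}
```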

Review

A closure is when a function declared inside another function uses the outer function’s variables; in C#, this happens through anonymous delegates and lambda expressions.  The C# compiler typically implements closures by generating a hidden nested class to hold the required state of the function as it executes, and moving the body of the anonymous method into that class.


2 Apr 2009

Your Own Transactions with LINQ-to-SQL

Posted by Rob

I’m working on porting an existing forum-based community from SMF to a new .NET-based forum platform that I’m authoring.  I’m excited about it; I love SMF, but it doesn’t have what I want and frankly, it’s a scary beast to try to tackle.  I’d considered using some kind of bridge between it and my code, but I knew I wanted deep integration of the forums with the new community site, and I wanted the community site in .NET.  So I made the decision to write an importer to talk between MySQL and my SQL Server-based solution.  I chose LINQ-to-SQL as my O/R mapper because, quite frankly, I find it much easier and more elegant to work with; so far as I know, I’m not the only one who thinks so.

Because of the nature of the data that I’m importing, I needed to run several SubmitChanges() calls to get the data into the database.  But I wanted to make sure that these submissions only worked if they ALL worked.  So I needed a transaction external to the normal LINQ-to-SQL in-memory object mapper.  Unfortunately, when I began a transaction using the underlying Connection property of the DataContext, I was met with an error:

System.InvalidOperationException: SqlConnection does not support parallel transactions.
   at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName)
   at System.Data.SqlClient.SqlInternalConnection.BeginTransaction(IsolationLevel iso)
   at System.Data.SqlClient.SqlConnection.BeginDbTransaction(IsolationLevel isolationLevel)
   at System.Data.Linq.DataContext.SubmitChanges(ConflictMode failureMode)

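The solution was simple: DataContext has a Transaction property!  When it’s null, SubmitChanges() begins its own transaction on the connection, which is what collided with mine.  By setting it to the transaction that I had begun, I was able to run the complete import in a single transaction: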

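using System;
using System.Data;          // IsolationLevel
using System.Data.Common;   // DbTransaction

dc.Connection.Open();
using (DbTransaction transaction = dc.Connection.BeginTransaction(IsolationLevel.ReadCommitted))
{
    // Make SubmitChanges() use our transaction instead of starting its own
    dc.Transaction = transaction;
    try
    {
        // do databasey things
        dc.SubmitChanges();

        transaction.Commit();
    }
    catch (Exception ex)
    {
        transaction.Rollback();
        Console.WriteLine("Exception caught; transaction rolled back.");
        Console.WriteLine(ex.ToString());
    }
    finally
    {
        // Detach the finished transaction so later SubmitChanges() calls don't reuse it
        dc.Transaction = null;
    }
}
dc.Connection.Close();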

It took about 2 minutes to import 37,000 or so messages, plus all users, categories, forums, private messages, and polls from SMF.  The app ends up using something in the neighborhood of 120 MB of memory (I need to keep objects around to reference them for their children, since I assign new IDs), but that’s still a small one-off price to pay.

13 Mar 2009

Speedy C#, Part 4: Using – and Understanding – CLR Profiler

Posted by Rob

CLR Profiler is a free and incredibly useful tool offered by Microsoft.  I'm fairly certain its primary use (at least from Microsoft's perspective) is to illustrate use of the CLR Profiling COM APIs, which aren't exceptionally clear-cut (in my opinion), particularly from a .NET programmer's point of view.  The really difficult part of using CLR Profiler is becoming accustomed to its interface and the data it presents; however, once you do so, I'm certain you'll find it incredibly helpful in addressing difficulties with memory usage.  This article aims to introduce you to the "important parts" of CLR Profiler - specifically, which graphs you should view, how you should interpret them, and how to address the problems you find.  This article will not review some of the more complicated parts of injecting CLR Profiler into something such as your ASP.NET application; there are other resources for that purpose.

For the purposes of this article, I've re-introduced a wasteful error into BN# that I found by using CLR Profiler.  We'll work through finding it in this article.

Getting Started

Once you have CLR Profiler "installed" - and I use the term loosely - you can start the application from the install path (don't look for a Start Menu item).  The binaries come in two versions, x86 and x64, and you should know which one you need.  If you're profiling a platform-neutral application (most .NET apps fall under this category) on an x64 system, use the x64 version.  If you're running 32-bit Windows, or profiling a program specifically targeted to x86, use the x86 version of CLR Profiler.
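If you're not sure how a platform-neutral app actually runs, one quick check is to ask the process itself whether it is 64-bit.  This is a minimal sketch assuming x86/x64 hardware (the mapping doesn't apply on other architectures); ProfilerBitness is a hypothetical helper name.

```csharp
using System;

public static class ProfilerBitness
{
    // Suggests which CLR Profiler build matches the *current* process.
    // Assumes x86/x64 hardware: a 64-bit process maps to the x64 build.
    public static string SuggestedProfilerBuild()
        => Environment.Is64BitProcess ? "x64" : "x86";
}
```

Printing this from inside the target app (for example, at startup in a debug build) tells you which build to launch.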

As an important note for Windows Vista users: if you're running with UAC enabled, make sure to run CLR Profiler as an administrator.  CLR Profiler works by injecting a COM DLL into the target process, and it can't do that without administrator rights.

CLR Profiler while it's not running anything

When profiling memory, I turn off Calls tracking: it's located in the bottom-right of the UI window.

If your application requires access to the local application directory - for instance, by using the Application class in Windows Forms - you should go through the explicit Profile Application menu item within the File menu, and set the working directory option of that UI.  Otherwise, go ahead and click Start Application, browse to your application, and go.

During Operation

Other than the fact that your application will be measurably slower, you should be able to run the application as you otherwise would.  Your mileage will vary, but you'll get better results with more memory in your system.  But all developers have at least 4 GB powering their boxes now, right?

While the application is running, you can click the Show Heap now button on the main CLR Profiler GUI, which will display a heap graph of the current application, showing the path to all currently allocated memory:

Heap Graph of current profile

To be honest, I find the heap graph relatively confusing, but the good news is that you don't need to keep using it.  Once you've dumped that temporary log, you can view the current heap and other interesting information: close that window and, in the main CLR Profiler window, open the View menu and choose Summary, which displays a cool window:

A result summary of a profile

This window helps you understand what's happening:

  • Allocated bytes is really interesting – it reports the total amount of memory that you’ve allocated within managed code.
  • Final Heap Bytes is the amount of managed memory currently in use on the heap.  This doesn't necessarily reflect unmanaged items.
  • Relocated Bytes is the amount of memory that has been moved by the garbage collector during compaction operations.
  • Gen X collections shows the number of garbage collections that have occurred for each generation.
  • Garbage Collector Generation Sizes shows the number of bytes used by each generation's heap.
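For rough in-process numbers to compare against the Allocated bytes and Gen 0 collections figures, the GC class exposes similar counters.  This is a different technique from CLR Profiler, not a replacement for it: GC.GetAllocatedBytesForCurrentThread (available on .NET Core and later) counts only the calling thread's allocations, and AllocationProbe is a hypothetical helper name.

```csharp
using System;

public static class AllocationProbe
{
    // Reports bytes allocated on the calling thread, and gen-0 collections
    // (process-wide), while 'action' runs.
    public static (long AllocatedBytes, int Gen0Collections) Measure(Action action)
    {
        long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
        int gen0Before = GC.CollectionCount(0);

        action();

        long bytesAfter = GC.GetAllocatedBytesForCurrentThread();
        int gen0After = GC.CollectionCount(0);
        return (bytesAfter - bytesBefore, gen0After - gen0Before);
    }
}
```

Wrapping a suspect operation (say, a property getter that copies a buffer) in Measure() makes a per-call allocation cost visible without a full profiling session.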

What's Happening with BN#?

I had a suspicion, based on the memory usage reported by Task Manager, that BN# wasn’t quite as efficient as I would have hoped.  I wanted to investigate, so I plugged in CLR Profiler.  After connecting to Battle.net for 30 seconds or so and joining Clan Recruitment, this is what I saw:

Profile of BN# with intentional memory bug

That’s pretty heavy: 31 MB or so of total allocated memory, yet only about 3 MB ending up on the heap and only 3.5 MB relocated throughout the lifetime of the app.  That told me I was allocating and freeing a lot of memory very rapidly.  What’s the next step?

I clicked on the Allocation Graph button and took a look:

Allocation graph indicating 10mb of byte[] on the heap.

In this graph we can see that byte arrays account for about 35% of all memory allocations.  That’s a big problem, especially since I already pooled their creation!  CLR Profiler helps me track it down, though, as I follow the highlighted call chain back to its source:

The culprit

This image indicates that I have a problem with a method called DataReader::get_m_data().  Now, as I mentioned, I had to recreate this problem, and the path of least resistance for me was to change the identifier m_data (used frequently in DataReader) to be a property instead of a field, so originally this said get_Data.  I thought that was odd until I saw its implementation:

        protected virtual byte[] Data
        {
            get
            {
                byte[] dataCopy = new byte[_m_data.Length];
                Buffer.BlockCopy(_m_data, 0, dataCopy, 0, dataCopy.Length);
                return dataCopy;
            }
        }

So here, every operation that accessed the Data property (in the original implementation, that was every operation, because the Data property was virtual) duplicated the entire array, EVERY TIME.

I then changed the implementation so that operations defined within the base class wouldn’t needlessly go through a property, and derived classes had direct access to the buffer by reference (via the UnderlyingBuffer property).  What were my results?
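A simplified sketch of that change follows.  DataReaderSketch and PacketReaderSketch are hypothetical stand-ins, not BN#'s actual classes: base-class reads use the field directly, and derived classes get the same array by reference through UnderlyingBuffer instead of a fresh copy from Data.

```csharp
using System;

// Hypothetical, heavily simplified reader; not BN#'s real DataReader.
public class DataReaderSketch
{
    private readonly byte[] _m_data;
    private int _index;

    public DataReaderSketch(byte[] data) { _m_data = data; }

    // Wasteful pattern: every access allocates and copies the whole buffer.
    protected virtual byte[] Data
    {
        get
        {
            byte[] dataCopy = new byte[_m_data.Length];
            Buffer.BlockCopy(_m_data, 0, dataCopy, 0, dataCopy.Length);
            return dataCopy;
        }
    }

    // Cheaper pattern: derived classes get the buffer itself, by reference.
    protected byte[] UnderlyingBuffer => _m_data;

    // Base-class operations read the field directly, with no property call or copy.
    public byte ReadByte() => _m_data[_index++];
}

public class PacketReaderSketch : DataReaderSketch
{
    public PacketReaderSketch(byte[] data) : base(data) { }

    public byte PeekFirstByte() => UnderlyingBuffer[0];   // no allocation
    public bool DataReturnsCopy() => !ReferenceEquals(Data, UnderlyingBuffer);
}
```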

Final Results

I think that fairly well speaks to the effectiveness of using tools like this. :)  Allocations decreased by 27%, gen-0 collections by 33%, and byte[] allocations by 53%:

Updated allocation graph

Further Reading

The "Speedy C#" Series: