Revealing Prototype Pattern: Pros and Cons
A short while ago, I wrote a post generally saying good things about the Revealing Prototype Pattern but mostly focused on tearing down the other part that was presented with it, namely the way that variables were declared in a chain separated by the comma operator. This post will discuss some of the pros and cons of using this pattern, and give some examples about when you should or shouldn't use it.
Advantages
As Dan notes in his post, this features a significant improvement over straight prototype assignment (assignment of an object literal to a prototype), in that private visibility is supported. And because you're still assigning an object to the prototype, you are able to take advantage of prototypal inheritance. (Next time: How prototypal inheritance really works and how it can make your head explode).
Except for chaining his variable declarations, I have to admit Dan had me pretty well sold on the Revealing Prototype Pattern. It's very elegant; it provides good protection to its inner variables, and it uses a syntax we've been seeing more and more of ever since jQuery became a popular platform.
Unfortunately, it has some nasty drawbacks.
Disadvantages
To be fair, Dan lists some of the disadvantages about this pattern; however, he doesn't quite list them as such, and I think he was possibly unaware of some of their implications:
There's something interesting that happens with variables though, especially if you plan on creating more than one Calculator object in a page. Looking at the public functions you'll see that the 'this' keyword is used to access the currNumberCtl and eqCtl variables defined in the constructor. This works great since the caller of the public functions will be the Calculator object instance which of course has the two variables defined. However, when one of the public functions calls a private function such as setVal(), the context of 'this' changes and you'll no longer have access to the two variables.
The first time I read through that I glossed over the problems; I didn't quite understand the issue until I wrote some code. So let's do that - we'll implement the Java StringTokenizer class:
function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    if (!(this instanceof StringTokenizer)) // enforce constructor usage
        return new StringTokenizer(srcString, delim);
    this.sourceString = srcString;
    this.delimiter = delim;
}
StringTokenizer.prototype = (function()
{
    var that = this;
    var _tokens = that.sourceString.split(that.delimiter);
    var _index = 0;
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    var resultPrototype =
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();
If you've ever written a jQuery plugin, you'll probably recognize what I did with the prototype assignment function; when writing jQuery plugins, it's common to close over the current instance of the jQuery object by assigning var that = $(this); so that you can write event-handler functions without losing access to the overall context. Unfortunately, what I did in this case is wrong; you may already see why.
var that = this;
In this context, this is a reference to the global object, not to the instance of the object - even though the prototype is being set. This is a generalization of what Dan said. Rewriting it to overcome it results in information leaking:
function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    if (!(this instanceof StringTokenizer)) // enforce constructor usage
        return new StringTokenizer(srcString, delim);
    this.sourceString = srcString;
    this.delimiter = delim;
    this.tokens = srcString.split(delim);
    this.index = 0;
}
StringTokenizer.prototype = (function()
{
    var _countTokens = function() { return this.tokens.length; };
    var _hasMoreTokens = function() { return this.index < this.tokens.length; };
    var _nextToken = function()
    {
        if (!this.hasMoreTokens())
            return false;
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };
    var _reset = function() { this.index = 0; };
    var resultPrototype =
    {
        countTokens: _countTokens,
        hasMoreTokens: _hasMoreTokens,
        nextToken: _nextToken,
        reset: _reset
    };
    return resultPrototype;
})();
The code works correctly; but you can see that we have to make public all of the state variables we'll use in the constructor. (The alternatives are to either initialize the state variables in each function, where they would still be public; or to create an init function, which would still cause the variables to be public AND would require the user to know to call the init function before calling anything else).
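To make the leak concrete, here is a small sketch built on a cut-down tokenizer with the same shape as the one above. The name Tok is made up for this illustration and is not part of the original code; the only point is that any caller can reach in and rewrite the state.

```javascript
// Tok is a hypothetical, trimmed-down stand-in for StringTokenizer.
function Tok(src, delim) {
    this.tokens = src.split(delim);
    this.index = 0;
}

Tok.prototype = (function () {
    var _hasMoreTokens = function () { return this.index < this.tokens.length; };
    var _nextToken = function () {
        if (!this.hasMoreTokens())
            return false;
        var next = this.tokens[this.index];
        this.index += 1;
        return next;
    };
    return { hasMoreTokens: _hasMoreTokens, nextToken: _nextToken };
})();

var t = new Tok('a b c', ' ');
var first = t.nextToken();            // 'a'

// Nothing stops outside code from changing the "private" state.
t.index = 99;
var afterTamper = t.hasMoreTokens();  // false, although 'b' and 'c' were never read

t.tokens = ['injected'];
t.index = 0;
var injected = t.nextToken();         // 'injected'

console.log(first, afterTamper, injected);
```

Whether that matters depends on how much you trust the code that consumes the object.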
Dan also indicated that you needed a workaround for private functions:
There are a few tricks that can be used to deal with this, but to work around the context change I simply pass “this” from the public functions into the private functions.
Personally, I prefer to try to avoid things one might call clever or tricky, because that's code for "so complex you can't understand it". But even in the case where you have a public function, you'll still get an error if you don't reference it via a public function call. This error is nonintuitive and could otherwise make you go on a bug hunt for a long time. Consider this change to the above code:
var _hasMoreTokens = function() { return this.index < this.tokens.length; };
var _nextToken = function()
{
    if (!_hasMoreTokens()) // changed from: if (!this.hasMoreTokens())
        return false;
    var next = this.tokens[this.index];
    this.index += 1;
    return next;
};
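Simply removing the 'this' reference in the caller is enough to change what 'this' means inside the _hasMoreTokens function: called without a receiver, its 'this' is the global object (or undefined in strict mode) rather than the tokenizer, so reading this.tokens.length throws a TypeError. This is completely unintuitive behavior for developers who grew up in the classical inheritance model.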
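The rule underneath both problems is that 'this' is decided by how a function is called, not by where it is defined. The following is a minimal sketch under that assumption; Counter, seenThis, broken, and fixed are hypothetical names invented for the example. It records what 'this' is inside the prototype-building function, then compares a plain call to a private function with one made through Function.prototype.call.

```javascript
// Hypothetical names (Counter, seenThis, broken, fixed), for illustration only.
var seenThis;

function Counter() {
    this.count = 3;
}

Counter.prototype = (function () {
    // Runs once, with no receiver: `this` is the global object in sloppy
    // mode or undefined in strict mode, never a Counter instance.
    seenThis = this;

    var _isPositive = function () { return this.count > 0; };

    var _broken = function () {
        // Plain call, no receiver: `this` inside _isPositive is not the instance.
        return _isPositive();
    };

    var _fixed = function () {
        // Function.prototype.call passes the receiver along explicitly.
        return _isPositive.call(this);
    };

    return { isPositive: _isPositive, broken: _broken, fixed: _fixed };
})();

var counter = new Counter();

var brokenResult;
try {
    brokenResult = counter.broken(); // sloppy mode: false, the global object has no count
} catch (e) {
    brokenResult = e.name;           // strict mode: 'TypeError'
}

console.log(seenThis === counter);   // false
console.log(counter.isPositive());   // true
console.log(counter.fixed());        // true
console.log(brokenResult);           // false or 'TypeError', never true
```

Using call (or passing the instance as an argument, as Dan does) works, but every private function has to remember to do it.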
Alternatives
I wouldn't want to give you all of these options without giving you an alternative. The alternative I present here is one in which the entire object is populated in the constructor:
"use strict";
function StringTokenizer(srcString, delim)
{
    if (typeof srcString === 'undefined')
        throw new ReferenceError("Parameter 0 'srcString' is required.");
    if (typeof srcString !== 'string')
        srcString = srcString.toString();
    if (typeof delim !== 'string')
        delim = ' ';
    if (!(this instanceof StringTokenizer)) // enforce constructor usage
        return new StringTokenizer(srcString, delim);
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'sourceString', { value: srcString });
        Object.defineProperty(this, 'delimiter', { value: delim });
    }
    else
    {
        this.sourceString = srcString;
        this.delimiter = delim;
    }
    var _tokens = this.sourceString.split(this.delimiter);
    var _index = 0;
    var _countTokens = function() { return _tokens.length; };
    var _hasMoreTokens = function() { return _index < _tokens.length; };
    var _nextToken = function()
    {
        if (!_hasMoreTokens())
            return false;
        var next = _tokens[_index];
        _index += 1;
        return next;
    };
    var _reset = function() { _index = 0; };
    if (typeof Object.defineProperty !== 'undefined')
    {
        Object.defineProperty(this, 'countTokens', { value: _countTokens });
        Object.defineProperty(this, 'hasMoreTokens', { value: _hasMoreTokens });
        Object.defineProperty(this, 'nextToken', { value: _nextToken });
        Object.defineProperty(this, 'reset', { value: _reset });
    }
    else
    {
        this.countTokens = _countTokens;
        this.hasMoreTokens = _hasMoreTokens;
        this.nextToken = _nextToken;
        this.reset = _reset;
    }
}
The advantage of a structure like this one is that you always have access to this. (Note that this example is unnecessarily large because I've taken the additional step of protecting the properties with Object.defineProperty where it is supported). You always have access to private variables and you always have access to the state. The unfortunate side effect of this strategy is that it doesn't take advantage of prototypal inheritance (it's not that you can't do it with this strategy - more of that coming in the future) and that the entire private and public states (including functions) are closed-over, so you use more memory. Although, one may ask: is that really such a big deal in THIS sample?
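The memory trade-off can be seen directly by comparing function identity across instances. This is only a sketch with two hypothetical, stripped-down tokenizers (ClosureTok and ProtoTok are names invented here): the closure version creates a new nextToken per instance and hides its state, while the prototype version shares one function and exposes its state.

```javascript
// Hypothetical closure-based tokenizer: state lives in the constructor's scope.
function ClosureTok(src) {
    var tokens = src.split(' ');
    var index = 0;
    this.nextToken = function () {
        return index < tokens.length ? tokens[index++] : false;
    };
}

// Hypothetical prototype-based tokenizer: state lives on the instance.
function ProtoTok(src) {
    this.tokens = src.split(' ');
    this.index = 0;
}
ProtoTok.prototype.nextToken = function () {
    return this.index < this.tokens.length ? this.tokens[this.index++] : false;
};

var c1 = new ClosureTok('a b');
var c2 = new ClosureTok('c d');
var p1 = new ProtoTok('a b');
var p2 = new ProtoTok('c d');

console.log(c1.nextToken === c2.nextToken); // false: one function object per instance
console.log(p1.nextToken === p2.nextToken); // true: a single shared function
console.log(typeof c1.tokens);              // 'undefined': state is not reachable
console.log(typeof p1.tokens);              // 'object': state is public

// Closure methods do not depend on `this`, so they survive being detached.
var detached = c1.nextToken;
console.log(detached());                    // 'a'
```

For a handful of tokenizers the extra function objects are negligible; for many thousands of instances the difference may start to matter.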
Usage Considerations
The Revealing Prototype Pattern can be a good pattern to follow if you're less concerned with maintaining data integrity and state. You have to be careful when accessing non-public data and functions with it, but it's pretty elegant; and if you're working on a lot of objects, you have the opportunity to save on some memory usage by delegating the function definitions into the prototype rather than the specific object definition. It falls short, though, when trying to emulate classical data structures and enforce protection mechanisms. As such, it can require complicated or clever tricks to work around its shortcomings, which can ultimately lead to overly-complex or difficult-to-maintain code.
Like most patterns, your mileage may vary.
Defining Variables in JavaScript
I've been reviewing different patterns and practices lately, and after reading his article about the Revealing Prototype Pattern, I wanted to take some time to analyze Dan Wahlin's approach to defining variables. It's hard to believe I found some common ground with Douglas Crockford, but as they say about broken clocks.... [Addendum: apparently, since JSlint says to 'combine with previous var statement,' I don't agree with Crockford.] Anyway, to begin, this post is inspired by Dan Wahlin's presentation of the Revealing Prototype Pattern; I noticed what I thought was a curious way for him to define his private variables, and looking back through his blog posts he discussed it in the original blog post of the series, Techniques, Strategies, and Patterns for Structuring JavaScript Code. For the most part, I like what Dan has to say, but I'm going to have to disagree when it comes to defining variables.
The Proposition
As Dan points out, this is the standard way of defining variables in JavaScript:
var eqCtl;
var currNumberCtl;
var operator;
var operatorSet = false;
var equalsPressed = false;
var lastNumber = null;
He advocates trading that for this:
var eqCtl,
    currNumberCtl,
    operator,
    operatorSet = false,
    equalsPressed = false,
    lastNumber = null;
It saves on 20 keystrokes, and he claims improved readability. Now, I disagree with Crockford's argument that, because JavaScript hoists variables to the top of the function, you should always declare the variable there. I believe that, whenever possible, you should try to maximize locality of a variable. This is a principle discussed in Steve McConnell's Code Complete; the reasoning behind maximization of locality is that the human brain can only comprehend so much at once. (This is, of course, another argument in favor of many, simple, and small subroutines). By delaying the declaration of a variable until it needs to be used, we are able to better comprehend the meaning of the variable and how its use affects and relates to the rest of the program. As such, I believe that one of the premises for moving these declarations into a combined var statement - specifically, to reflect the hoisting - is a poor rationale.
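For anyone who hasn't run into hoisting, here is a minimal sketch of the behavior the argument is about. The function and variable names are invented for the example. A var declared late in a function already exists (as undefined) at the top of that function, whereas an identifier that is never declared fails with a ReferenceError.

```javascript
function hoisted() {
    // No ReferenceError here: the declaration of `total` below is hoisted
    // to the top of the function, but its assignment is not.
    var early = total;
    var total = 42;
    return [early, total];
}

function notDeclared() {
    try {
        // This identifier is deliberately never declared anywhere.
        return neverDeclaredAnywhere;
    } catch (e) {
        return e.name;
    }
}

console.log(hoisted());     // [ undefined, 42 ]
console.log(notDeclared()); // 'ReferenceError'
```

Declaring total next to its first use keeps the code local to where it matters; the engine treats both placements the same way.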
Let's carry on.
Similarities to Other Elements
In The Prototype Pattern, Dan demonstrates the use of a JavaScript object literal in assignment to the Calculator prototype, so that any object created using the Calculator constructor would inherit all of those properties:
Calculator.prototype = {
    add: function (x, y) {
        return x + y;
    },
    subtract: function (x, y) {
        return x - y;
    },
    multiply: function (x, y) {
        return x * y;
    },
    // ...
};
The important thing to note here is that we are simply defining an object literal; we are not writing procedural code, and that comma is not an operator! It is an important part of the JavaScript grammar, to be sure, but the comma here does not have the same semantic meaning as the comma we saw before. This subtle difference may lead to coding errors, in which someone who uses comma syntax with both will mistakenly believe they are declaring an object literal and use colons to separate the identifier from the value; or that they are declaring variables and use assignment syntax to separate the property from its value.
Comma Operator Considered Harmful
It surprises me to learn that JSlint advocates combining var declarations. Crockford's The Elements of JavaScript Style, Part 1 indicates that he isn't terribly fond of it either:
The comma operator was borrowed, like much of JavaScript's syntax, from C. The comma operator takes two values and returns the second one. Its presence in the language definition tends to mask certain coding errors, so compilers tend to be blind to some mistakes. It is best to avoid the comma operator, and use the semicolon statement separator instead.
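As a quick illustration of that description, here is a small sketch of the comma operator evaluating its operands left to right and yielding the last one, followed by one way it can hide a mistake. The names note, calls, and grid are made up for the example.

```javascript
var calls = [];
function note(value) {
    calls.push(value);
    return value;
}

// Every operand is evaluated, but only the last value is kept.
var last = (note('a'), note('b'), note('c'));
console.log(last);   // 'c'
console.log(calls);  // [ 'a', 'b', 'c' ]

// A masked error: someone used to grid[row, col] from another language
// gets no complaint here. The comma operator discards the 0, so this reads
// grid[1], not grid[0][1].
var grid = [[1, 2], [3, 4]];
var cell = grid[0, 1];
console.log(cell);       // [ 3, 4 ]
console.log(grid[0][1]); // 2
```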
Whichever way Crockford prefers it, I think what we need to remember is just because you CAN do something does not mean you SHOULD.
Let's consider Dan's full body of JavaScript from the Revealing Prototype Pattern. I'm going to shrink it a little bit, to emphasize the changes I'll make; and I'm removing any of his comments.
var Calculator = function (cn, eq) {
    this.currNumberCtl = cn;
    this.eqCtl = eq;
};
Calculator.prototype = function () {
    var operator = null,
        operatorSet = false,
        equalsPressed = false,
        lastNumber = null,
        add = function (x, y) { return x + y; },
        subtract = function (x, y) { return x - y; },
        multiply = function (x, y) { return x * y; },
        // I'm going to do something evil here.
        divide = function (x, y) {
            if (y == 0) {
                alert("Can't divide by 0");
            }
            return x / y;
        },
        setVal = function (val, thisObj) { thisObj.currNumberCtl.innerHTML = val; },
        setEquation = function (val, thisObj) { thisObj.eqCtl.innerHTML = val; },
        clearNumbers = function () {
            lastNumber = null;
            equalsPressed = operatorSet = false;
            setVal('0',this);
            setEquation('',this);
        },
        setOperator = function (newOperator) {
            if (newOperator == '=') {
                equalsPressed = true;
                calculate(this);
                setEquation('',this);
                return;
            }
            if (!equalsPressed) calculate(this);
            equalsPressed = false;
            operator = newOperator;
            operatorSet = true;
            lastNumber = parseFloat(this.currNumberCtl.innerHTML);
            var eqText = (this.eqCtl.innerHTML == '') ?
                lastNumber + ' ' + operator + ' ' :
                this.eqCtl.innerHTML + ' ' + operator + ' ';
            setEquation(eqText,this);
        },
        numberClick = function (e) {
            var button = (e.target) ? e.target : e.srcElement;
            if (operatorSet == true ||
                this.currNumberCtl.innerHTML == '0') {
                setVal('', this);
                operatorSet = false;
            }
            setVal(this.currNumberCtl.innerHTML + button.innerHTML, this);
            setEquation(this.eqCtl.innerHTML + button.innerHTML, this);
        },
        calculate = function (thisObj) {
            if (!operator || lastNumber == null) return;
            var displayedNumber = parseFloat(thisObj.currNumberCtl.innerHTML),
                newVal = 0;
            switch (operator) {
                case '+':
                    newVal = add(lastNumber, displayedNumber);
                    break;
                case '-':
                    newVal = subtract(lastNumber, displayedNumber);
                    break;
                case '*':
                    newVal = multiply(lastNumber, displayedNumber);
                    break;
                case '/':
                    newVal = divide(lastNumber, displayedNumber);
                    break;
            }
            setVal(newVal, thisObj);
            lastNumber = newVal;
        };
    return {
        numberClick: numberClick,
        setOperator: setOperator,
        clearNumbers: clearNumbers
    };
} ();
Note my comment: "I'm going to do something evil here." Here goes:
console.log('Hello, world')
Do you see what happened here? Let me put it in context.
        subtract = function (x, y) { return x - y; },
        multiply = function (x, y) { return x * y; }
        console.log('Hello, world')
        divide = function (x, y) {
            if (y == 0) {
                alert("Can't divide by 0");
            }
            return x / y;
        },
Notice that the multiply line no longer ends with a comma. (Had the comma stayed, the parser would have rejected console.log as a variable name and reported a syntax error, which at least fails loudly.) Without it, JavaScript semicolon insertion blew away the var when I inserted any valid statement or expression. In fact, I could have simply put 'Hello, world!' or 5 there, on its own line, and because the next line is a valid statement that stands on its own, JavaScript semicolon insertion blew away the var. As such, divide, setVal, setEquation, clearNumbers, setOperator, numberClick, and calculate became plain assignments to undeclared identifiers; outside strict mode they were all just elevated to global scope, possibly blowing away existing variables and leaking a whole bunch of information with them. This could happen in any instance in which someone mistakenly types a semicolon (let's be honest - JavaScript is a semicolon-terminated language; it will happen somewhat frequently), or if they forget to put a comma at the end of a line.
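The leak is easy to reproduce in isolation. The sketch below is an assumption-laden miniature, not Dan's calculator: it compiles a three-line body in which the comma after the first declaration has been forgotten, using the Function constructor so that the body runs in sloppy mode regardless of how the surrounding file is loaded. It then repeats the experiment with a "use strict" directive to show the difference.

```javascript
// Function bodies compiled via the Function constructor are sloppy-mode
// unless they opt in to strict mode themselves.
var body =
    "var add = function (x, y) { return x + y; }\n" +   // comma forgotten here
    "divide = function (x, y) { return x / y; },\n" +
    "setVal = function (val) { return val; };\n";

var sloppyVersion = new Function(body);
sloppyVersion();

// Semicolon insertion ended the var statement after `add`, so the other two
// names were created as properties of the global object.
var leakedDivide = typeof globalThis.divide;
var leakedSetVal = typeof globalThis.setVal;
console.log(leakedDivide, leakedSetVal); // function function

// Clean up the accidental globals.
delete globalThis.divide;
delete globalThis.setVal;

// In strict mode the same mistake is reported instead of leaking.
var strictVersion = new Function('"use strict";\n' + body);
var strictError = null;
try {
    strictVersion();
} catch (e) {
    strictError = e.name;
}
console.log(strictError); // ReferenceError
```

Strict mode turns the silent leak into an immediate error, which is one more argument for enabling it.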
As such, joining variable declarations together by using the comma operator is inherently an unsafe operation. You might think of it as a run-on sentence; it's not a good thing to do in English, so why would it be good to do in JavaScript or any other programming language?
And if that's not reason enough, here's another: declaring a variable is a statement. You are stating to the compiler, "I am declaring this variable to be a variable." Use the var statement to make a statement, and use the comma operator to indicate that there are operations. (Specifically, the one and only place I can think of in which a comma operator would be appropriate is if you need a single for-loop with multiple iterator variables, e.g., for (var i = 0, j = myArray.length - 1; i < myArray.length && j >= 0; i++, j--).) Of course, I don't want to say you should never use it, or else I'd be like Crockford with his dogmatic "I have never seen a piece of code that was not improved by refactoring it to remove the continue statement," which is patently silly.
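As a sketch of that one legitimate use, here is a loop that walks an array from both ends at once. The function name reverseInPlace is invented for the example. Strictly speaking, only the i++, j-- part uses the comma operator; the comma in the initializer is part of the var declaration syntax.

```javascript
// Hypothetical helper: reverses an array in place using two indices.
function reverseInPlace(items) {
    // `i++, j--` is the comma operator: both updates run on every iteration.
    for (var i = 0, j = items.length - 1; i < j; i++, j--) {
        var tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }
    return items;
}

console.log(reverseInPlace([1, 2, 3, 4, 5])); // [ 5, 4, 3, 2, 1 ]
console.log(reverseInPlace(['a', 'b']));      // [ 'b', 'a' ]
```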
But, beware the comma. He is correct in that it is easy to mask programming errors with commas. If you're going to declare a variable, do everyone a favor and declare it using var; make it feel special by giving it its own line and declaration and semicolon. It will help in maintenance down the line.
A Recent Discovery: IronJS
Since Microsoft first announced Managed JScript and shortly thereafter announced its demise (more on that here), I’ve been chomping at the bit for an implementation of JavaScript on the DLR. Actually, I’ve probably been hoping for it for far longer, because JavaScript is near and dear to my heart. If not for Brinkster offering free ASP-based web hosting back when I was just a young’un, I may never have discovered server-side programming, and may never have ended up where I am today.
IronJS is a free, open-source implementation of ECMAScript v3 on the DLR based predominantly on F#. It’s even the right kind of free, currently licensed under the Apache License version 2 (not a copyleft license). Check out this picture of its testbed application running:
As a learning exercise, I intend to figure out a good way to shoehorn this into my Battle.net (1.0) chat client, JinxBot. It actually shouldn’t be terribly difficult, but while I considered adding it as a plugin, I think I’d like it to be part of the core application instead.
I’ll more than likely be covering IronJS in several parts in my blog in the coming weeks (possibly months, since I’m getting married on May 1). But these are topics that I intend to cover:
- Hosting IronJS in a C# application, loading, and executing JavaScript code
- Sharing objects between script and .NET
- Isolating script from .NET (preventing script from having run of the whole runtime)
- Isolating script from script (running multiple environments in one application, similar to how web pages in browsers like IE and Chrome all run from one instance of the script engine but have different contexts so they have different sets of objects)
- Performance optimizations
- Dynamically generating isolation layers
- Other considerations
Stay tuned! In the meantime, check out IronJS on GitHub and get it running on your machine!
Unsung C# Hero: Closure
Today I’m going to talk about a feature of C# that has been around since 2.0 (with the introduction of anonymous delegates) but which gets nearly no lip service; most C# developers have probably used it, but without thinking about it. This feature is called closure, and it refers to the ability of a nested function to make reference to the surrounding function’s variables.
This article will make extensive discussion of how delegates are implemented in C#; a review may be appropriate before diving in. Also, we’ll be making a lot of use of The Tool Formerly Known as Lutz Roeder’s .NET Reflector, which is now owned by Red Gate Software.
Anonymous Methods without Closure
Supposing that I had a reason to do so, I could assign an event handler as an anonymous method. I think this is generally bad practice (there is no way to explicitly dissociate the event handler, because it doesn’t have a name), but you can:
public partial class SampleNoClosure : Form
{
    public SampleNoClosure()
    {
        InitializeComponent();

        button1.Click += delegate
        {
            MessageBox.Show("I was clicked! See?");
        };
    }
}
This will work as expected; on click, a small alert dialog will appear. Nothing terribly special about that, right? We could have written that as a lambda expression as well, not that it buys us anything. It looks like this in Reflector:
We see that the compiler auto-generates a method that matches the appropriate signature. Nothing here should be completely surprising.
Simple Example of Closure
Here is a sample class that includes closure. The enclosed variable is sum. You’ll note that everything just makes sense internally, right?
public partial class SimpleClosureExample : Form
{
    public SimpleClosureExample()
    {
        InitializeComponent();

        int sum = 1;
        for (int i = 1; i <= 10; i++)
            sum += sum * i;

        button1.Click += delegate
        {
            MessageBox.Show("The sum was " + sum.ToString());
        };
    }
}
So, it only makes sense that sum can be part of that anonymous function, right? But we need to bear in mind that all C# code must be statically-compiled; we can’t just calculate sum. Besides, what happens if the value was a parameter to the function? Something that couldn’t be precompiled? Well, in order to handle these scenarios, we need to think about how this will work.
In order to keep that method state alive, we need to create another object. That’s how the state can be maintained regardless of threads and regardless of calls to the function. We can see it as a nested class here, and the anonymous method looks just like it does in code:
A More Advanced Example
Whenever you work with a LINQ expression, chances are you’re using closure and anonymous functions (and lambda expressions) and don’t realize it. Consider this LINQ-to-SQL query:
int boardID = 811;
int perPage = 20;
int pageIndex = 0;

var topics = (from topic in dc.Topics
              orderby topic.IsSticky descending, topic.LastMessage.TimePosted descending
              where topic.BoardID == boardID
              select new
              {
                  topic.TopicID,
                  Subject = topic.FirstMessage.Subject,
                  LatestSubject = topic.LastMessage.Subject,
                  LatestChange = topic.LastMessage.ModifiedTime,
                  NameOfUpdater = topic.LastMessage.PosterName,
                  Updater = topic.LastMessage.User,
                  Starter = topic.FirstMessage.User,
                  NameOfStarter = topic.FirstMessage.PosterName,
                  topic.ReplyCount,
                  topic.ViewCount
              }).Skip(perPage * pageIndex).Take(perPage);

foreach (var topic in topics)
{
    Console.WriteLine("{0} - {1} {2} {3} {4} by {5}", topic.Subject, topic.NameOfStarter, topic.ReplyCount, topic.ViewCount, topic.LatestChange, topic.NameOfUpdater);
}
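The closure here is happening within the where clause; you may recall that the compiler translates the C# where clause into a call to the Where extension method. For an in-memory IEnumerable<T>, that is Where(Func<TSource, bool> predicate); for an IQueryable<T> source such as a LINQ-to-SQL table, it is the overload that takes an Expression<Func<TSource, bool>>, which is what lets the provider translate the predicate into SQL.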
Here, it’s very easy to imagine a case where we wanted to write actual parameters. This query is used to generate and display a topic list for a message board; all “stickied” posts should be at the top and the rest should be sorted by last time posted. If I’m making that into a web server control, I’m going to need to not hard-code the board ID, the number of topics per page to display, and which page I’m looking at.
Now, this is kind of a hard thing to conceptualize; when I was going through this project, I expected all three variables to be incorporated into the class. It turns out that Skip() and Take() don’t evaluate a lambda expression – they take straight values – so we don’t ultimately have to store them for evaluation later. However, as expected, boardID does have to be stored, and that provides us with an interesting question of why. And you might be asking why that is even the case; LINQ-to-SQL translates this into SQL for us:
SELECT TOP (20) [t0].[TopicID], [t2].[Subject], [t1].[Subject] AS [LatestSubject], [t1].[ModifiedTime] AS [LatestChange], [t1].[PosterName] AS [NameOfUpdater], [t4].[test], [t4].[UserID], [t4].[Username], [t4].[Email], [t4].[PasswordHash], [t6].[test] AS [test2], [t6].[UserID] AS [UserID2], [t6].[Username] AS [Username2], [t6].[Email] AS [Email2], [t6].[PasswordHash] AS [PasswordHash2], [t2].[PosterName] AS [NameOfStarter], [t0].[ReplyCount], [t0].[ViewCount]
FROM [dbo].[Topics] AS [t0]
LEFT OUTER JOIN [dbo].[Messages] AS [t1] ON [t1].[MessageID] = [t0].[LastMessageID]
LEFT OUTER JOIN [dbo].[Messages] AS [t2] ON [t2].[MessageID] = [t0].[FirstMessageID]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t3].[UserID], [t3].[Username], [t3].[Email], [t3].[PasswordHash]
    FROM [dbo].[Users] AS [t3]
    ) AS [t4] ON [t4].[UserID] = [t1].[UserID]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t5].[UserID], [t5].[Username], [t5].[Email], [t5].[PasswordHash]
    FROM [dbo].[Users] AS [t5]
    ) AS [t6] ON [t6].[UserID] = [t2].[UserID]
WHERE [t0].[BoardID] = @p0
ORDER BY [t0].[IsSticky] DESC, [t1].[TimePosted] DESC
So why, if we already have the SQL generated, do we need to bother with it? Well, you may recall that LINQ-to-SQL doesn’t support all possible operators. If we break support for the LINQ-to-SQL query and we have to pull back all of the relevant items, we’ll have to use that class. At this point though, it goes unused.
Review
A closure is when you take the variables of a function and use them within a function declared inside of it – in C#, this is through anonymous delegates and lambda expressions. C# typically will accomplish the use of closures by creating an implicit child class to contain the required state of the function as it executes, handing off the actual method to the contained class.
Further Reading
- Bill Wagner: Looking Inside C# Closures
- C# in Depth: The Beauty of Closures
My C# 4.0 Wishlist Part 6: Automatic Properties for Enum Variables
OK, so I lied; I'm not stopping at 5 parts.
I've been working with enumerations frequently lately; the Battle.net chat protocol is binary and therefore the values that come over the wire have different contextual meanings based on the values that might have preceded them. For example, a chat message event actually can have about a dozen meanings; it can be a server-broadcasted message, a message from another user, or just an announcement that a user joined the channel. In addition to the standard values identifying things like message type, messages typically have one form or another of flags; if the event is based on a user, the flags contain information about the user's status on the server (whether the user is an administrator or has operator privileges in the channel). Others, such as channel information updates, contain information about the chat channel itself, such as whether it is public, silent, or otherwise normal.
The Problem
Having had to deal with enumerations frequently has made me hate code like this:
if (((UserFlags)e.Flags & UserFlags.ChannelOperator) == UserFlags.ChannelOperator)
Especially when working with bitwise values (enumerations decorated with the [Flags] attribute), because of the specific operator precedence constraints that C# places on the developer, this becomes annoying quickly. So much so, that classes where I have to do that frequently end up with several TestFlag() methods, but even these are limited. Consider code like this:
bool TestFlag(UserFlags test, UserFlags reference) { ... }
bool TestFlag(ChannelFlags test, ChannelFlags reference) { ... }
Or this:
bool TestFlag<T>(T test, T reference) {
    // hard to implement since no meaningful type constraint can be placed on T
}
Or this:
bool TestFlag(int test, int reference) { ... }
In proposition 1 we have to implement n methods, either repeatedly or in a globally-defined, internal utility class; that stinks. Proposition 2 is difficult to implement; we can't place a type constraint because C# doesn't allow enum type constraints, and since enums have a type constraint themselves of always being an integral value, this would be ideal; but type constraints in this case are limited to struct, which doesn't guarantee operator | or operator &. In proposition 3, every time we want to test, we need to cast to int (or long) and lose type information. I guess that works, but then you worry that you end up with code like this:
if (TestFlag((int)e.User.Flags, (int)UserFlags.ServerAdministrator))
{
    // ...
}
else if (TestFlag((int)e.User.Flags, (int)UserFlags.ChannelOperator))
{
    // ...
} // ...
No, there's a cleaner solution, and, like the compiler features added to C# 3.0, it doesn't require a new CLR: automatic properties on enumerations.
The Solution
Internally, enumerations are treated as their base numeric type by the CLR; the variable itself carries around type information, but it's not strong and can be changed by direct casting. But the compiler always knows the type of a local variable and can apply it directly. So, consider applying a property to an enumeration variable called IsEnumField. Consider this [Flags] enumeration, and look at the code that uses it when using this style of coding:
if (e.User.Flags.IsNone)
{ }
else if (e.User.Flags.IsBlizzardRepresentative || e.User.Flags.IsBattleNetAdministrator)
{ }
else if (e.User.Flags.IsChannelOperator)
{ }
else if (e.User.Flags.IsNoUDP)
{ }
We can easily identify the pattern that the compiler supports; prefix "Is" to the field name and perform the underlying logic.
The great part about this solution is that the emitted code is exactly the same as what you or I would produce right now. So the compiler can know by its clever compiler tricks to do this:
if (e.User.Flags == UserFlags.None) { }
else if ((e.User.Flags & UserFlags.BlizzardRepresentative) == UserFlags.BlizzardRepresentative
    || (e.User.Flags & UserFlags.BattleNetAdministrator) == UserFlags.BattleNetAdministrator) { }
else if ((e.User.Flags & UserFlags.ChannelOperator) == UserFlags.ChannelOperator) { }
else if ((e.User.Flags & UserFlags.NoUDP) == UserFlags.NoUDP) { }
In that example, I qualified the type name UserFlags nine times. Can you say "carpal tunnel"?
Future-Proofing
There are some considerations to make about this. First, there are already going to be some enumerations in the wild with field names that begin with "Is," and it could very easily raise confusion if someone sees code such as user.Flags.IsIsOnline. Fortunately, the solution is equally simple: create a decorator attribute, just like we did for extension methods:
namespace System
{
    [AttributeUsage(AttributeTargets.Enum)]
    public sealed class EnumPropertiesAttribute : Attribute { }
}
Then, when you create an enumeration that you'd like to expose this style of properties, simply decorate the enumeration with this attribute. IntelliSense knows to show the properties, the compiler knows to translate the properties, and we're in the free and clear.
Wouldn't it be great?
My C# 4.0 Wishlist, Part 5 : The raise Keyword
One of the more obscure features of C# is the ability to specify custom overloads for adding and removing event registration similarly to properties, via the add and remove keywords. Known as "event accessors," they implement the parts of event registration that the C# compiler normally handles. You didn't think that that += operator was implemented on the type, did you?
class Test
{
    // Field-like event: the compiler generates the add/remove accessors.
    public event EventHandler Event1;

    // Event with explicit accessors backed by our own delegate field.
    private EventHandler ev2;
    public event EventHandler Event2
    {
        add
        {
            if (ev2 != null)
                ev2 = (EventHandler)Delegate.Combine(ev2, value);
            else
                ev2 = value;
        }
        remove
        {
            if (ev2 != null)
                ev2 = (EventHandler)Delegate.Remove(ev2, value);
        }
    }
    protected virtual void OnEvent2(EventArgs e)
    {
        if (ev2 != null)
            ev2(this, e);
    }
}
This pattern is actually used extensively throughout the Windows Forms library, where controls add event handlers to a base handler collection keyed by event (System.ComponentModel.EventHandlerList) rather than to individual fields. I can only surmise that this is done to prevent having dozens of event fields cluttering up the classes.
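The Windows Forms approach can be sketched with System.ComponentModel.EventHandlerList, the collection type behind Component's protected Events property. The Widget class, the Clicked event, and the PerformClick method below are hypothetical names chosen for illustration; this is not code lifted from the framework:

```csharp
using System;
using System.ComponentModel;

public class Widget
{
    // One list holds the handlers for every event on the object,
    // so events nobody subscribes to need no delegate field of their own.
    private readonly EventHandlerList events = new EventHandlerList();

    // Unique key object identifying the Clicked event within the list.
    private static readonly object ClickedKey = new object();

    public event EventHandler Clicked
    {
        add { events.AddHandler(ClickedKey, value); }
        remove { events.RemoveHandler(ClickedKey, value); }
    }

    protected virtual void OnClicked(EventArgs e)
    {
        // The indexer returns null when no handler is registered for the key.
        EventHandler handler = (EventHandler)events[ClickedKey];
        if (handler != null)
            handler(this, e);
    }

    // Public entry point so the sketch can be exercised from outside.
    public void PerformClick()
    {
        OnClicked(EventArgs.Empty);
    }
}
```

Each additional event costs one static key object per class instead of one delegate field per instance.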
Now, if we were to compile this app and disassemble it in Reflector, we'd get a very similar picture to what we've got. Reflector would show the compiler-generated add/remove blocks for Event1, though not when the event declaration is selected, and it also indicates that there are compiler directives that show the event accessors are synchronized.
Visual Basic .NET also supports this pattern, but adds a third accessor, RaiseEvent:
Public Class Test
    Public Event Event1 As EventHandler

    Private ev2 As EventHandler
    Public Custom Event Event2 As EventHandler
        AddHandler(ByVal value As EventHandler)
            If Not ev2 Is Nothing Then
                ev2 = CType(System.Delegate.Combine(ev2, value), EventHandler)
            Else
                ev2 = value
            End If
        End AddHandler

        RemoveHandler(ByVal value As EventHandler)
            If Not ev2 Is Nothing Then
                ev2 = CType(System.Delegate.Remove(ev2, value), EventHandler)
            End If
        End RemoveHandler

        RaiseEvent(ByVal sender As Object, ByVal e As System.EventArgs)
            ev2(sender, e)
        End RaiseEvent
    End Event

    Protected Overridable Sub OnEvent2(ByVal e As EventArgs)
        If Not ev2 Is Nothing Then
            RaiseEvent Event2(Me, e)
        End If
    End Sub
End Class
In this example, Visual Basic allows you to implement exactly how Event2 is raised. When I look at this in Reflector to see how it renders in C#, Reflector gives C# the raise keyword. Why haven't the C# language experts done so?
How would this be worthwhile? Well, suppose that we're building an application that can have plugins. We don't know that plugins are always going to work correctly, so when they handle an event, they may raise an exception. The problem is, if an event is invoked and the first event handler causes an exception, none of the successive handlers will be invoked.
Arguably, the "state of the application is undefined after an exception is raised, so we should gracefully exit." But that's not always the case! What if the way to gracefully do this is to analyze the stack trace within the application, determine which plugin caused the exception, and unload the plugin? C# gives us no place in the event declaration to put any of this.
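The closest approximation available today is to do the work by hand in the raising method: walk the delegate's invocation list and guard each call individually. A rough sketch with hypothetical names (PluginHost, RaiseEvent2, Failures); a raise accessor would let this logic live in the event declaration itself instead of relying on every raiser to remember it:

```csharp
using System;
using System.Collections.Generic;

public class PluginHost
{
    public event EventHandler Event2;

    // Exceptions thrown by handlers, paired with the handler that threw.
    public readonly List<KeyValuePair<Delegate, Exception>> Failures =
        new List<KeyValuePair<Delegate, Exception>>();

    public void RaiseEvent2(EventArgs e)
    {
        // Take a snapshot so unsubscribing during the raise is harmless.
        EventHandler handlers = Event2;
        if (handlers == null)
            return;

        foreach (Delegate d in handlers.GetInvocationList())
        {
            try
            {
                ((EventHandler)d)(this, e);
            }
            catch (Exception ex)
            {
                // d.Target and d.Method identify the offending handler;
                // a real host might unload the owning plugin here.
                Failures.Add(new KeyValuePair<Delegate, Exception>(d, ex));
            }
        }
    }
}
```

A handler that throws no longer prevents the handlers after it from running, and the host keeps enough information to decide what to do with the plugin that failed.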
Give us the raise keyword!
This is the end of my "C# 4.0 Wishlist" series. For reference, here are the other articles:
My C# 4.0 Wishlist, Part 4 : Constant typeof() Expressions
Along with some of the hacks I introduced into ShinyDesign, there was a problem using a generic parameter as an enum - I couldn't cast it back to an integral type, even System.UInt64, because T was not guaranteed to be an integral value (yet again why we should allow a type constraint, but I digress).
In any case, there have been cases where I'd like to, for instance, switch against a Type, particularly since incorporating generics. Consider:
switch (Enum.GetUnderlyingType(typeof(T)))
{
    case typeof(byte):
    case typeof(sbyte):
        break;
    case typeof(short):
    case typeof(ushort):
        break;
    case typeof(int):
    case typeof(uint):
        break;
    case typeof(long):
    case typeof(ulong):
        break;
}
This is MUCH cleaner than the alternative, current implementation:
Type t = Enum.GetUnderlyingType(typeof(T));
if (t == typeof(byte) || t == typeof(sbyte))
{ }
else if (t == typeof(short) || t == typeof(ushort))
{ }
else if (t == typeof(int) || t == typeof(uint))
{ }
else if (t == typeof(long) || t == typeof(ulong))
{ }
So this is a working example of how the syntax would be cleaner by allowing us to use the typeof expression result as a constant value. If you've never tried this, the compiler complains. Given this code:
155: switch (t)
156: {
157: case typeof(int):
158: case typeof(uint):
159: break;
160: }
I get:
EnumTypeConverter.cs(155,21): error CS0151: A value of an integral type expected
EnumTypeConverter.cs(157,22): error CS0150: A constant value is expected
EnumTypeConverter.cs(158,22): error CS0150: A constant value is expected
I'm sure you've switched over a string, though - it's one of the nice syntactical features of C#. You might be wondering: if switching over a string is possible, why not a Type?
Switching on a string doesn't really switch on a string. For larger switch blocks, the compiler shoots the strings into a Dictionary<string, int>, stores the offsets, and then uses a jump table with the IL switch instruction.
Yeah, obviously there's a lot of opportunity to misuse the typeof expressions. But there are going to be legit uses, too, and honestly - if C# can have a compiler trick for strings, it can have a compiler trick for types. And let's be honest - typeof() expressions aren't ever going to return different values for the same app (that's why people were locking types to synchronize across an AppDomain).
This - like the inability to constrain a type parameter to an enum - is an artificial constraint that really shouldn't be there.
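The same trick the compiler plays for strings can be written out by hand for types. The following is only a sketch of what such a compiler transformation might look like, with hypothetical names (UnderlyingTypeSwitch, SizeOfUnderlyingType); it maps each Type to a case index in a dictionary and then switches on the integer:

```csharp
using System;
using System.Collections.Generic;

public static class UnderlyingTypeSwitch
{
    // Maps each integral type to a case index, much like the string
    // switch maps each string to an offset.
    private static readonly Dictionary<Type, int> Cases = new Dictionary<Type, int>
    {
        { typeof(byte), 0 },  { typeof(sbyte), 0 },
        { typeof(short), 1 }, { typeof(ushort), 1 },
        { typeof(int), 2 },   { typeof(uint), 2 },
        { typeof(long), 3 },  { typeof(ulong), 3 }
    };

    // Returns the size in bytes of an enum's underlying integral type.
    public static int SizeOfUnderlyingType(Type enumType)
    {
        Type t = Enum.GetUnderlyingType(enumType);
        int index;
        if (!Cases.TryGetValue(t, out index))
            throw new ArgumentException("Unexpected underlying type: " + t);

        // An ordinary integer switch, which the compiler already handles.
        switch (index)
        {
            case 0: return 1;
            case 1: return 2;
            case 2: return 4;
            case 3: return 8;
            default: throw new InvalidOperationException();
        }
    }
}
```

Because a Type object is unique per type within an AppDomain, it works as a dictionary key with no extra effort, which is what makes the transformation mechanical.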
My C# 4.0 Wishlist, Part 3: The Return of Const-ness
In C++, I can decorate member functions with the const modifier, which indicates that calling the member function will not modify the internal state of the object. Here's a sample class definition:
Test.h:
class CTest
{
private:
    int m_nVal;

public:
    CTest(void);
    ~CTest(void);
    int GetValue() const;
    void SetValue(int value);
    int Add(int value) const;
};
Test.cpp:
#include "Test.h"

// Initialize the wrapped value so GetValue() and Add() never read garbage.
CTest::CTest(void) : m_nVal(0)
{
}

CTest::~CTest(void)
{
}

// const: reads m_nVal but does not modify the object.
int CTest::Add(int value) const
{
    return value + m_nVal;
}

int CTest::GetValue() const
{
    return m_nVal;
}

// Not const: modifies the object's state.
void CTest::SetValue(int value)
{
    m_nVal = value;
}
This example demonstrates wrapping an integer value, and shows how GetValue() and Add() can be const by not modifying any internal values. Now, if I change the Add method to a void type, and add the value to the internal state, I get a compiler error. Here's the updated method:
void CTest::Add(int value) const
{
    return SetValue(value + m_nVal);
}
Error:
error C2662: 'CTest::SetValue' : cannot convert 'this' pointer from 'const CTest' to 'CTest &'
I get a similar error (about lvalue type casting) if I just set the value within the Add method.
So how should this apply in C#? Realistically, I think I'd like it to just apply to member functions and properties. There are a lot of ways to use const in C and C++ - it's almost scary, actually (could you imagine using one parameter and having three const modifiers?). In C#, I'd just like it to be part of the method contract:
public class Class1
{
    private string m_firstName, m_lastName;
    private int m_val;

    public int Value
    {
        get const
        {
            return m_val;
        }
        set
        {
            m_val = value;
        }
    }

    public string GetName() const
    {
        return string.Format("{0}, {1}", m_lastName, m_firstName);
    }
}
In both of these examples, we can tell that the internal state of the object itself isn't modified (note that the const modifier only applies to the get method of the Value property). It provides the user of the class additional information, and it helps to enforce the contract on the side of the class author.
Implementation in the compiler: add a System.Runtime.CompilerServices.ConstMethodAttribute and apply it to the methods as marked. Add a static code analysis rule that checks to see if a method could be marked as const, and if so, flag a warning.
I don't know that there are compiler optimizations that can be made, but one way or another, I think that it's a good method with which to give additional information about method implementations. Sometimes we don't want to call properties or methods if we know that it can cause side effects, because let's be honest: the base class library's documentation isn't always 100% clear. That's why we need tools like .NET Reflector. One more tool to help our code be self-documenting is one more good thing.
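The attribute half of that proposal can be approximated without any compiler support. The sketch below declares a hypothetical ConstMethodAttribute in user code (the post proposes System.Runtime.CompilerServices as its home), applies it by hand where the proposed syntax would say const, and shows how a tool could query it. Nothing here enforces the contract; Person and ConstInspector are made-up names:

```csharp
using System;
using System.Reflection;

// Hypothetical marker attribute; a compiler would emit it for "const" members.
[AttributeUsage(AttributeTargets.Method)]
public sealed class ConstMethodAttribute : Attribute { }

public class Person
{
    private string m_firstName = "Ada", m_lastName = "Lovelace";
    private int m_val;

    public int Value
    {
        [ConstMethod]   // stands in for "get const"
        get { return m_val; }
        set { m_val = value; }
    }

    [ConstMethod]       // stands in for "GetName() const"
    public string GetName()
    {
        return string.Format("{0}, {1}", m_lastName, m_firstName);
    }
}

public static class ConstInspector
{
    // What an IDE or a static analysis rule could ask of any method.
    public static bool IsConst(MethodInfo method)
    {
        return method.IsDefined(typeof(ConstMethodAttribute), false);
    }
}
```

Since property accessors are ordinary methods in metadata, the same attribute covers both get const and const member functions.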
My C# 4.0 Wishlist, Part 1 : Eliminate Type Constraint Constraints
C# 2.0 introduced a great new feature to the .NET Type system: generics. Generics are really cool in that they allow you to define template classes; I can use a single class definition to provide a strongly typed collection, for example. They enable some other tricks that I would tend to consider something of a "hack" as well; for example, this expression evaluates to true:
typeof(IEnumerable<int>) != typeof(IEnumerable<double>)
This expression is nice because there are some odd class design decisions in places like the PropertyGrid's type structure. For example, in order to add a PropertyTab to the list of the PropertyGrid's tabs, you need to add a Type to the PropertyTabCollection exposed by the PropertyGrid's PropertyTabs property. The PropertyGrid caches the tabs that it creates, and so you can't add a single Type to create two property tabs. Consequently, even if you override the CreateTab method, you can't expect to add two tabs with the same Type.
My solution, then, was to create ExtensionPropertyTab<T>. This class's type parameter is utterly useless; I create an arbitrary Type using Reflection Emit, close my ExtensionPropertyTab generic type definition with it, and then add the PropertyTab with that closed type. Works great! This stuff will be in an upcoming blog post about my PropertyGridEx project.
All of that is leading up to my next hack and, ultimately, my wishlist item #1 for C# 4.0.
There's a simple design-time class called EnumConverter. This class is the default type converter for all enumeration types; EnumConverter is what displays the Enum names in the property grid when you're choosing items. I'm creating a type surrogate class that allows you to customize the names of properties, and I've also been working on displaying better values for enumerations. To this end, I created the EnumTypeConverter<T> class - this class provides enumeration names, but also retrieves friendly names from attributes on each enum entry. Using generics, I'm able to cache the friendly names so that reflection only needs to be invoked once; System.Enum does something similar.
What I'd like to do, however, is say this:
public class EnumTypeConverter<T> : TypeConverter where T : enum
C# doesn't allow me to do this. I get two errors:
error CS1031: Type expected
error CS1001: Identifier expected
So I try using the type name instead:
public class EnumTypeConverter<T> : TypeConverter where T : Enum
C# doesn't like this either:
error CS0702: Constraint cannot be special class 'System.Enum'
Why? I get the same error with System.ValueType, even though ultimately it means the same thing as "struct" (though I could understand this difference). But I can't do this with System.Delegate either (how about calling Invoke() or BeginInvoke() on T?).
Being able to specify Enum as a base would allow me to:
- Explicitly cast between T and integral numeric types.
- Specify 0 as the default value of T rather than using default(T).
- Perform bitwise operations on values of T.
There's really no reason to have a constraint like "Constraint cannot be special class 'System.Enum'." Let's eliminate this artificial barrier - no changes to the CLR should be needed.
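For comparison, here is roughly what the workaround looks like without the constraint: constrain T to struct, verify at run time that it is really an enum, and round-trip through System.UInt64 to do anything bitwise. This is a sketch with hypothetical names (EnumBits and its members), not the EnumTypeConverter<T> code itself:

```csharp
using System;

public static class EnumBits<T> where T : struct
{
    // Runs once per closed type. A failure here surfaces to the caller
    // wrapped in a TypeInitializationException.
    static EnumBits()
    {
        if (!typeof(T).IsEnum)
            throw new ArgumentException(typeof(T).Name + " is not an enum type.");
    }

    // A direct (ulong)value cast does not compile for this T, so the value
    // is boxed and converted through IConvertible instead.
    // Negative enum values will throw an OverflowException.
    public static ulong ToUInt64(T value)
    {
        return Convert.ToUInt64(value);
    }

    public static T FromUInt64(ulong bits)
    {
        return (T)Enum.ToObject(typeof(T), bits);
    }

    // Bitwise OR, performed on the integral representation.
    public static T Or(T left, T right)
    {
        return FromUInt64(ToUInt64(left) | ToUInt64(right));
    }

    public static bool HasFlag(T value, T flag)
    {
        ulong f = ToUInt64(flag);
        return (ToUInt64(value) & f) == f;
    }
}
```

Every operation boxes at least once, and passing a non-enum struct is only caught at run time. A real enum constraint would turn both into compile-time matters.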
My C# 4.0 Wishlist, Part 2 : Default/Optional Parameters
When I was first getting into C# (about .NET 1.0 Beta 2), I saw that it didn't support optional parameters. The explanation was simple enough: method overloads offered an alternative to default or optional parameters. I thought that it was probably a useful choice. But, check this out:
public static class MessageBox
{
    static void Show(string message)
    {
        Show(message, null, MessageBoxButtons.OK, MessageBoxIcon.Information);
    }

    static void Show(string message, string title)
    {
        Show(message, title, MessageBoxButtons.OK, MessageBoxIcon.Information);
    }

    static void Show(string message, string title, MessageBoxButtons buttons, MessageBoxIcon icon)
    {
        // actually perform the showing
    }

    // and more overloads
}
This is pretty lame, isn't it? Why can't I just do everything with a single method?
public static class MessageBox
{
    static void Show(string message, string title = "", MessageBoxButtons buttons = MessageBoxButtons.OK,
        MessageBoxIcon icon = MessageBoxIcon.Information);
}
So, the question is, how precisely could this work? Well, let's take a look at how this works in Visual Basic.
Public Class MessageBox
    Public Shared Sub Show(ByVal message As String, Optional ByVal title As String = "", _
            Optional ByVal buttons As MessageBoxButtons = MessageBoxButtons.OK, _
            Optional ByVal icon As MessageBoxIcon = MessageBoxIcon.Information)
    End Sub
End Class
Visual Basic turns this into a method and annotates the parameter list with attributes. In C# we could express it like this:
// OptionalAttribute and DefaultParameterValueAttribute live in this namespace.
using System.Runtime.InteropServices;

public static class MessageBox
{
    static void Show(string message,
        [Optional]
        [DefaultParameterValue("")]
        string title,
        [Optional]
        [DefaultParameterValue(MessageBoxButtons.OK)]
        MessageBoxButtons buttons,
        [Optional]
        [DefaultParameterValue(MessageBoxIcon.Information)]
        MessageBoxIcon icon)
    {
    }
}
In languages that support optional parameters, the compiler supplies the missing arguments, so that a call that leaves off optional parameters looks (in IL) like a call that included them.
I suggest that we use a compiler trick - dump the attributes and actually implement the overloads. This has the awesome benefit of being entirely compiler-dependent and entirely backwards-compatible even to .NET 2.0. We can even include one overload with the attributes included, so that development tools and compilers that use the attributes can tell the user about the optional parameter information, and the existing compilers can compile against a library using the new methods.
Consider this overload based on the above demonstrated Show method.
public static class MessageBox
{
    static void Show(string message, MessageBoxIcon icon)
    {
        Show(message, "", MessageBoxButtons.OK, icon);
    }
}
This function has the advantage of being inline-able, even if it means a slight (and I mean ever-so-slight) hit to file size. I'm not saying it'd be good or effective to have 10 optional parameters - just that it wouldn't be bad to have a few.
Finally, my suggestion for the syntax of how this all should work out - use the default keyword for each item when you want to specify options:
void Go()
{
    MessageBox.Show("This is a test.", default, MessageBoxButtons.OK);
}
The CLI implementation provides an attribute on the actual implementing method so that we can figure out what the actual default value is. The compiler then has the option of either substituting the actual value at the call site (as VB does now) or calling the correct overload (which is what the current C# compiler would do).
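To illustrate the metadata side of this, the sketch below marks parameters with the existing OptionalAttribute and DefaultParameterValueAttribute and then reads the defaults back through reflection, which is the information a compiler or tool would need in order to pick either strategy. Dialogs, DefaultsReader, and the int-typed buttons parameter are hypothetical stand-ins so the example stays self-contained:

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

public static class Dialogs
{
    // Hypothetical stand-in for MessageBox.Show; it returns its arguments
    // instead of showing anything.
    public static string Show(string message,
        [Optional, DefaultParameterValue("")] string title,
        [Optional, DefaultParameterValue(1)] int buttons)
    {
        return message + "|" + title + "|" + buttons;
    }
}

public static class DefaultsReader
{
    // Describes each parameter as "name" or "name=default",
    // using nothing but the method's metadata.
    public static string Describe(MethodInfo method)
    {
        string result = "";
        foreach (ParameterInfo p in method.GetParameters())
        {
            if (result.Length > 0)
                result += ", ";
            result += p.IsOptional ? p.Name + "=" + p.DefaultValue : p.Name;
        }
        return result;
    }
}
```

Whether the compiler inlines the default at the call site or routes to a generated overload, both decisions can be made from exactly this information.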
