Running with Code Like with scissors, only more dangerous


Exploring .dbc files with Dynamic Code Generation, part 4: Optimizing string reads

So far, we’ve written a simple parser for the .dbc file format. I’ve also pointed out that the .db2 file format is the same in principle, differing primarily in its header.

We do know that premature optimization is the root of all evil. However, I’m going to wager that this optimization is not premature. Let’s consider Item-sparse.db2 for a moment; it has 101 columns and a string table that is 2,762,301 bytes long.

Recall that a string-typed column in .dbc is stored as an integer offset into the string block, a region of UTF-8 encoded strings at the end of the table. This is nice for a lot of reasons; the tables can be read very fast because rows are all the same size, strings can be efficiently localized, and the strings don’t need to be decoded until they’re needed. This last optimization is the one we’re going to look at today.
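
To make the offset scheme concrete, here is a minimal sketch of resolving one string out of a string block. It assumes the entries are null-terminated, and StringBlock.ReadAt is a hypothetical helper I made up for illustration; it is not the DbcTable.GetString implementation from this series.

```csharp
using System;
using System.Text;

public static class StringBlock
{
    // Hypothetical helper: decodes the null-terminated UTF-8 string that
    // starts at 'offset' inside 'block'. Null termination is an assumption.
    public static string ReadAt(byte[] block, int offset)
    {
        if (block == null)
            throw new ArgumentNullException("block");
        if (offset < 0 || offset >= block.Length)
            throw new ArgumentOutOfRangeException("offset");

        // Scan forward to the terminator (or to the end of the block).
        int end = Array.IndexOf(block, (byte)0, offset);
        if (end < 0)
            end = block.Length;

        return Encoding.UTF8.GetString(block, offset, end - offset);
    }
}

public static class StringBlockDemo
{
    public static void Main()
    {
        // A tiny made-up block holding an empty string, "Sword", and a
        // second name containing two accented (2-byte) characters.
        byte[] block = Encoding.UTF8.GetBytes("\0Sword\0\u00C9p\u00E9e\0");

        Console.WriteLine(StringBlock.ReadAt(block, 1)); // a row storing offset 1
        Console.WriteLine(StringBlock.ReadAt(block, 7)); // a row storing offset 7
    }
}
```

The row itself only ever holds the integer (1 or 7 here), which is why every row can stay the same size no matter how long the text is.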

Delay-loading the strings is a nice optimization. UTF-8 is a variable-width encoding: any character outside the ASCII range takes more than one byte, so the strings do not have a uniform number of bytes per character. Reading them inline while we read the tables is likely to cause many processor cache misses. How can we optimize? By bypassing them altogether, of course. The simplest way would be to create a property such as “TitleStringOffset” instead of “Title” for these columns, type it as an integer, and expect the user to call DbcTable.GetString. That would be a fine approach but, in my opinion, it would leak implementation details to the user.
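
A quick way to see the variable-width point is to ask the encoder how many bytes a few strings need. This is only an illustration; the Utf8Widths class and its sample strings are mine, not part of the parser.

```csharp
using System;
using System.Text;

public static class Utf8Widths
{
    // Number of bytes the text occupies once encoded as UTF-8.
    public static int ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        Console.WriteLine(ByteCount("A"));               // 1 byte: ASCII
        Console.WriteLine(ByteCount("\u00E9"));          // 2 bytes: e with acute accent
        Console.WriteLine(ByteCount("\u20AC"));          // 3 bytes: euro sign
        Console.WriteLine(ByteCount("Sword"));           // 5 characters, 5 bytes
        Console.WriteLine(ByteCount("\u00C9p\u00E9e"));  // 4 characters, 6 bytes
    }
}
```

Because the byte length cannot be derived from the character count, a reader that decodes inline has to walk every string byte by byte before it can move on to the next row.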

Instead, let’s wrap that information into an object – call it DbcStringReference – and allow that string to be retrieved later. What would we need in order to retrieve it later? A reference to the DbcTable that produced it, and the integer offset into the string table. Storing the reference to the DbcTable would keep the DbcTable from being garbage-collected as long as I held any of my entities, so we’ll use a WeakReference instead.

public class DbcStringReference
{
    private WeakReference<DbcTable> _owner;
    private int _pos;
    private Lazy<string> _val;

    internal DbcStringReference(DbcTable owner, int position)
    {
        if (owner == null)
            throw new ArgumentNullException("owner");
        if (position < 0)
            throw new ArgumentOutOfRangeException("position");

        _owner = new WeakReference<DbcTable>(owner);
        _pos = position;
        // Defer decoding until the string is first requested.
        _val = new Lazy<string>(() =>
        {
            DbcTable table;
            if (!_owner.TryGetTarget(out table))
                throw new ObjectDisposedException("DbcTable");

            return table.GetString(_pos);
        });
    }

    public override string ToString()
    {
        return _val.Value;
    }
}

There really isn’t anything stellar here. We validate the constructor arguments, take a weak reference to the owner table, and then create a Lazy<string> that tries to resolve the weak reference into a strong one, throws if the table is gone, and otherwise returns the string. Once the string has actually been retrieved, the Lazy<string> caches it, so later calls reuse the same value. Because the value of the string is exposed via the ToString override, string formatting methods like Console.WriteLine and string.Format automatically get the correct value (as do concatenation with strings and usage in UIs).
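
Here is a small, self-contained sketch of that behavior. LazyText is a stand-in I invented so the example runs without a DbcTable: it has the same Lazy<string> plus ToString shape as DbcStringReference, but it is fed by a delegate, and the Resolves counter exists only to show when the value is actually produced.

```csharp
using System;

// Stand-in for DbcStringReference (an assumption for this demo): the
// string comes from a delegate rather than from a DbcTable.
public sealed class LazyText
{
    private readonly Lazy<string> _val;

    public LazyText(Func<string> resolve)
    {
        if (resolve == null)
            throw new ArgumentNullException("resolve");
        _val = new Lazy<string>(resolve);
    }

    public override string ToString()
    {
        return _val.Value;
    }
}

public static class LazyTextDemo
{
    public static int Resolves; // how many times the string was "decoded"

    public static LazyText Create(string text)
    {
        return new LazyText(() => { Resolves++; return text; });
    }

    public static void Main()
    {
        LazyText name = Create("Sample Item");

        Console.WriteLine(Resolves);                          // nothing resolved yet
        Console.WriteLine(name);                              // resolves via ToString
        Console.WriteLine(string.Format("Item: {0}", name));  // reuses the cached value
        Console.WriteLine("Item: " + name);                   // so does concatenation
        Console.WriteLine(Resolves);                          // still a single resolve
    }
}
```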

In order to add DbcStringReference to the list of supported types, we’re going to need to make some modifications to our non-compiled and our compiled code sets. Fortunately, the ConvertSlow method itself doesn’t need to change; all we need to change is the TargetInfo.SetValue<TTarget> method. Add this to the switch statement:

                    case TargetType.StringReference:
                        DbcStringReference sref = new DbcStringReference(table, inputVal);
                        SetValue(target, sref);
                        break;

Don’t forget to add a StringReference value to your TargetType enum:

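        // StringReference is the new value; the others are the ones GetTargetTypeFromType returns.
        internal enum TargetType { Int32, Float32, String, StringReference }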

And add in that return value to GetTargetTypeFromType:

        internal static TargetType GetTargetTypeFromType(Type type)
        {
            if (type == typeof(int))
                return TargetType.Int32;
            if (type == typeof(float))
                return TargetType.Float32;
            if (type == typeof(DbcStringReference))
                return TargetType.StringReference;
            if (type == typeof(string))
                return TargetType.String;

            throw new InvalidDataException("Invalid data type.");
        }

Let’s force slow mode on. Here’s a record definition for item-sparse.db2. (Yes, this is code-generated; I found the updated source).

public class Itemsparsedb2
{
    /// <summary>ID</summary>
    public int ID;

    /// <summary>Quality</summary>
    public int Quality;

    /// <summary>Flags</summary>
    public int Flags;

    /// <summary>Flags2</summary>
    public int Flags2;

    /// <summary>Column4</summary>
    public int Column4;

    /// <summary>Column5</summary>
    public float Column5;

    /// <summary>Column6</summary>
    public float Column6;

    /// <summary>Column7</summary>
    public int Column7;

    /// <summary>Price</summary>
    public int Price;

    /// <summary>SellPrice</summary>
    public int SellPrice;

    /// <summary>Column10</summary>
    public int Column10;

    /// <summary>Column11</summary>
    public int Column11;

    /// <summary>Column12</summary>
    public int Column12;

    /// <summary>ItemLevel</summary>
    public int ItemLevel;

    /// <summary>Column14</summary>
    public int Column14;

    /// <summary>Column15</summary>
    public int Column15;

    /// <summary>Column16</summary>
    public int Column16;

    /// <summary>Column17</summary>
    public int Column17;

    /// <summary>Column18</summary>
    public int Column18;

    /// <summary>Column19</summary>
    public int Column19;

    /// <summary>Column20</summary>
    public int Column20;

    /// <summary>Column21</summary>
    public int Column21;

    /// <summary>Column22</summary>
    public int Column22;

    /// <summary>Column23</summary>
    public int Column23;

    /// <summary>Column24</summary>
    public int Column24;

    /// <summary>Column25</summary>
    public int Column25;

    /// <summary>Column26</summary>
    public int Column26;

    /// <summary>Column27</summary>
    public int Column27;

    /// <summary>Column28</summary>
    public int Column28;

    /// <summary>Column29</summary>
    public int Column29;

    /// <summary>Column30</summary>
    public int Column30;

    /// <summary>Column31</summary>
    public int Column31;

    /// <summary>Column32</summary>
    public int Column32;

    /// <summary>Column33</summary>
    public int Column33;

    /// <summary>Column34</summary>
    public int Column34;

    /// <summary>Column35</summary>
    public int Column35;

    /// <summary>Column36</summary>
    public int Column36;

    /// <summary>Column37</summary>
    public int Column37;

    /// <summary>Column38</summary>
    public int Column38;

    /// <summary>Column39</summary>
    public int Column39;

    /// <summary>Column40</summary>
    public int Column40;

    /// <summary>Column41</summary>
    public int Column41;

    /// <summary>Column42</summary>
    public int Column42;

    /// <summary>Column43</summary>
    public int Column43;

    /// <summary>Column44</summary>
    public int Column44;

    /// <summary>Column45</summary>
    public int Column45;

    /// <summary>Column46</summary>
    public int Column46;

    /// <summary>Column47</summary>
    public int Column47;

    /// <summary>Column48</summary>
    public int Column48;

    /// <summary>Column49</summary>
    public int Column49;

    /// <summary>Column50</summary>
    public int Column50;

    /// <summary>Column51</summary>
    public int Column51;

    /// <summary>Column52</summary>
    public int Column52;

    /// <summary>Column53</summary>
    public int Column53;

    /// <summary>Column54</summary>
    public int Column54;

    /// <summary>Column55</summary>
    public int Column55;

    /// <summary>Column56</summary>
    public int Column56;

    /// <summary>Column57</summary>
    public int Column57;

    /// <summary>Column58</summary>
    public int Column58;

    /// <summary>Column59</summary>
    public int Column59;

    /// <summary>Column60</summary>
    public int Column60;

    /// <summary>Column61</summary>
    public int Column61;

    /// <summary>Column62</summary>
    public int Column62;

    /// <summary>Column63</summary>
    public int Column63;

    /// <summary>Column64</summary>
    public int Column64;

    /// <summary>Column65</summary>
    public int Column65;

    /// <summary>Column66</summary>
    public int Column66;

    /// <summary>Column67</summary>
    public int Column67;

    /// <summary>Column68</summary>
    public int Column68;

    /// <summary>Column69</summary>
    public int Column69;

    /// <summary>Name</summary>
    public string Name;

    /// <summary>Name2</summary>
    public string Name2;

    /// <summary>Name3</summary>
    public string Name3;

    /// <summary>Name4</summary>
    public string Name4;

    /// <summary>Description</summary>
    public string Description;

    /// <summary>Column75</summary>
    public int Column75;

    /// <summary>Column76</summary>
    public int Column76;

    /// <summary>Column77</summary>
    public int Column77;

    /// <summary>Column78</summary>
    public int Column78;

    /// <summary>Column79</summary>
    public int Column79;

    /// <summary>Column80</summary>
    public int Column80;

    /// <summary>Column81</summary>
    public int Column81;

    /// <summary>Column82</summary>
    public int Column82;

    /// <summary>Column83</summary>
    public int Column83;

    /// <summary>Column84</summary>
    public int Column84;

    /// <summary>Column85</summary>
    public int Column85;

    /// <summary>Column86</summary>
    public int Column86;

    /// <summary>Column87</summary>
    public int Column87;

    /// <summary>Column88</summary>
    public int Column88;

    /// <summary>Column89</summary>
    public int Column89;

    /// <summary>Column90</summary>
    public int Column90;

    /// <summary>Column91</summary>
    public int Column91;

    /// <summary>Column92</summary>
    public int Column92;

    /// <summary>Column93</summary>
    public int Column93;

    /// <summary>Column94</summary>
    public int Column94;

    /// <summary>Column95</summary>
    public int Column95;

    /// <summary>Column96</summary>
    public int Column96;

    /// <summary>Column97</summary>
    public int Column97;

    /// <summary>Column98</summary>
    public float Column98;

    /// <summary>Column99</summary>
    public int Column99;

    /// <summary>Column100</summary>
    public int Column100;

    /// <summary>Column101</summary>
    public int Column101;
}

Running our test harness against item-sparse.db2 and enumerating all the records 5 times takes 173959ms. Replacing the five string fields with DbcStringReference reduces the time to 173150ms. Hardly any improvement! Now, let’s add in dynamic compilation support.
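
For anyone who wants to reproduce this kind of measurement, a timing loop along these lines would do. This is a sketch rather than my actual harness: Harness.TimePasses is a hypothetical helper, and the workload in Main is a placeholder where the real code would enumerate every record in the table.

```csharp
using System;
using System.Diagnostics;

public static class Harness
{
    // Runs 'enumerateAll' the given number of times and returns the
    // total elapsed wall-clock time in milliseconds.
    public static long TimePasses(Action enumerateAll, int passes)
    {
        if (enumerateAll == null)
            throw new ArgumentNullException("enumerateAll");
        if (passes < 1)
            throw new ArgumentOutOfRangeException("passes");

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < passes; i++)
            enumerateAll();
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Placeholder workload standing in for "enumerate all records".
        long sink = 0;
        long ms = TimePasses(() => { for (int i = 0; i < 1000000; i++) sink += i; }, 5);
        Console.WriteLine("5 passes took " + ms + "ms (sink=" + sink + ")");
    }
}
```

Timing several passes together smooths out one-off costs such as JIT compilation of the first call.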

Add this below the MethodInfo declarations in DbcTableCompiler:

private static ConstructorInfo DbcStringReference_ctor = typeof(DbcStringReference).GetConstructor(BindingFlags.NonPublic | BindingFlags.Instance, null, new Type[] { typeof(DbcTable), typeof(int) }, null);

Add this into the EmitTypeData in DbcTableCompiler:

                case TargetType.StringReference:
                    generator.EmitCall(OpCodes.Callvirt, BinaryReader_ReadInt32, null);
                    generator.Emit(OpCodes.Newobj, DbcStringReference_ctor);
                    break;
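
If the IL is hard to picture, here is a standalone sketch of the same read-an-int-then-Newobj pattern, built with a DynamicMethod. OffsetRef and EmitDemo are stand-ins I made up so it runs on its own; unlike DbcStringReference, OffsetRef's constructor takes only the int, so there is no table to push onto the stack first.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Reflection.Emit;

// Stand-in for DbcStringReference that only records the offset.
public sealed class OffsetRef
{
    public readonly int Offset;
    public OffsetRef(int offset) { Offset = offset; }
}

public static class EmitDemo
{
    // Builds a delegate equivalent to: reader => new OffsetRef(reader.ReadInt32())
    public static Func<BinaryReader, OffsetRef> Build()
    {
        MethodInfo readInt32 = typeof(BinaryReader).GetMethod("ReadInt32", Type.EmptyTypes);
        ConstructorInfo ctor = typeof(OffsetRef).GetConstructor(new Type[] { typeof(int) });

        DynamicMethod method = new DynamicMethod(
            "ReadOffsetRef", typeof(OffsetRef), new Type[] { typeof(BinaryReader) });
        ILGenerator generator = method.GetILGenerator();

        generator.Emit(OpCodes.Ldarg_0);             // push the reader
        generator.Emit(OpCodes.Callvirt, readInt32); // pop reader, push the int offset
        generator.Emit(OpCodes.Newobj, ctor);        // pop int, push the new OffsetRef
        generator.Emit(OpCodes.Ret);

        return (Func<BinaryReader, OffsetRef>)method.CreateDelegate(
            typeof(Func<BinaryReader, OffsetRef>));
    }

    public static void Main()
    {
        Func<BinaryReader, OffsetRef> read = Build();

        // 1234 as a little-endian 32-bit integer, which is what BinaryReader expects.
        byte[] data = new byte[] { 0xD2, 0x04, 0x00, 0x00 };
        using (BinaryReader reader = new BinaryReader(new MemoryStream(data)))
        {
            Console.WriteLine(read(reader).Offset); // prints 1234
        }
    }
}
```

The point to notice is that no string decoding appears anywhere in the emitted code: the generated method reads four bytes, allocates one small object, and returns.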

And that’s it. The test harness running with string properties takes 13715ms. But… the test harness running with DbcStringReference takes only 2896ms!

Why such a substantial difference? I don’t know for sure, but I can make a supposition. The call to the DbcStringReference constructor can probably be inlined, which would let the processor cache be used very efficiently. Since we’re not digging into the Encoding.GetString method, we’re saving even more time. Of course, we don’t actually have a copy of the string yet, but if I’m looking for an item by its ID, I’m saving quite a bit of time.

So in summary:

  • String fields, no dynamic compilation: 173959ms
  • DbcStringReference fields, no dynamic compilation: 173150ms, 0.5% improvement
  • String fields, dynamic compilation: 13715ms, 92.1% improvement
  • DbcStringReference fields, dynamic compilation: 2896ms, 98.3% improvement
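
Each percentage is the time saved relative to the 173959ms baseline, that is, (baseline - measured) / baseline. A few lines are enough to check the arithmetic; the Improvement class is just an illustration.

```csharp
using System;
using System.Globalization;

public static class Improvement
{
    // Percentage of the baseline time saved by the measured time.
    public static double PercentSaved(long baselineMs, long measuredMs)
    {
        return (baselineMs - measuredMs) * 100.0 / baselineMs;
    }

    // Same value, rounded to one decimal place with a trailing percent sign.
    public static string Format(long baselineMs, long measuredMs)
    {
        return PercentSaved(baselineMs, measuredMs)
            .ToString("F1", CultureInfo.InvariantCulture) + "%";
    }

    public static void Main()
    {
        const long baseline = 173959;
        Console.WriteLine(Format(baseline, 173150)); // 0.5%
        Console.WriteLine(Format(baseline, 13715));  // 92.1%
        Console.WriteLine(Format(baseline, 2896));   // 98.3%
    }
}
```

Put another way, with dynamic compilation on in both cases, the DbcStringReference run is roughly 4.7 times faster than the string run (13715ms versus 2896ms).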

Next time? Not sure yet. I’ll get back to you.
