Running with Code: Like with scissors, only more dangerous


Exploring .dbc files with C# Dynamic Code Generation, part 1: Defining the problem

Posted by Rob Paveza

You know, I look back at my blog after all these years, and at how infrequently I've updated it, and reflect a little on just how much things have changed for me. I know that right now I want to be writing code using back-ticks because I'm so accustomed to writing Markdown. But that's neither here nor there.

I recently published a project called DbcExplorer on GitHub. It was a pet project I'd been working on during the World of Warcraft: Mists of Pandaria era; I'd just joined Microsoft and had my Windows Phone, but there's no Mobile Armory on Windows Phone (or even on Big Windows, for that matter). A little background: World of Warcraft stores simple databases in files with a .dbc or .db2 extension; these databases allow rapid lookup by ID or simple, fast enumeration. There are a great many of them, and they commonly change from version to version. I wanted them so that I could crawl item and achievement information and build a miniature Mobile Armory for Windows Phone. It could at least tell you which achievements you were missing, and people could vote on which achievements were easiest, so that you could quickly boost your score.

(Side note: When Warlords of Draenor was released, Blizzard changed its storage archive format from MPQ to CASC. Ladislav Zezula, who created StormLib, a C library for accessing MPQ files, had made some progress on CASC as well. However, I couldn't get it to work back then, so I stopped working on this project. Ladik and I recently figured out what the disconnect was, and I've now wrapped his CascLib into CascLibSharp, but I don't know that I'll be resurrecting the other project.)

Anyway, DBC files are pretty easy. They have a header in the following form:

uint32        Magic 'WDBC', 'WDB2', or 'WCH2'
uint32        Number of records
uint32        Number of columns per record
uint32        Number of bytes per record (always 4 × the number of columns, as far as I can tell)
uint32        String block length
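To make the header concrete, here is a minimal sketch of reading those five fields with BinaryReader, which always reads little-endian. The DbcHeader type and its MakeMagic helper are hypothetical names of my own, and the extra header fields that non-'WDBC' files carry are deliberately not handled.

```csharp
using System;
using System.IO;

// Hypothetical type holding the five header fields shared by all variants.
// 'WDB2' and 'WCH2' files have extra header fields that this sketch does not read.
public struct DbcHeader
{
    public uint Magic;
    public uint RecordCount;
    public uint ColumnCount;
    public uint RecordSize;
    public uint StringBlockLength;

    // Packs a four-character tag the way it appears on disk when read back as a
    // little-endian uint32; "WDBC" becomes 0x43424457.
    public static uint MakeMagic(string tag) =>
        (uint)(tag[0] | tag[1] << 8 | tag[2] << 16 | tag[3] << 24);

    // BinaryReader always reads little-endian, matching the file format.
    public static DbcHeader Read(BinaryReader reader)
    {
        // Object initializers run in textual order, so the fields are read in file order.
        var header = new DbcHeader
        {
            Magic = reader.ReadUInt32(),
            RecordCount = reader.ReadUInt32(),
            ColumnCount = reader.ReadUInt32(),
            RecordSize = reader.ReadUInt32(),
            StringBlockLength = reader.ReadUInt32(),
        };

        if (header.Magic != MakeMagic("WDBC") &&
            header.Magic != MakeMagic("WDB2") &&
            header.Magic != MakeMagic("WCH2"))
        {
            throw new InvalidDataException($"Unrecognized magic 0x{header.Magic:X8}.");
        }
        return header;
    }
}
```

Comparing the magic as a packed integer avoids allocating a string per file, at the cost of a slightly less obvious comparison.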

The files that aren't of type 'WDBC' have a few additional fields, but the general structure is the same. The files then have the general form:

DbcHeader     Header
Record[Count] Records
uint8         0  (Start of string table, a 0-length string)
uint8[]       String block (UTF-8 encoded, null-terminated strings)
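Because records are fixed-size and follow the header directly, finding the string block is just arithmetic, and a String column's value is an offset into it. The sketch below assumes a plain 'WDBC' file (a 20-byte header) and treats the leading 0 byte as offset 0 of the string table, so an offset of 0 decodes to the empty string. DbcLayout and its methods are hypothetical names.

```csharp
using System;
using System.Text;

public static class DbcLayout
{
    // Five uint32 fields. Non-'WDBC' files have additional header fields,
    // which this sketch does not account for.
    public const int WdbcHeaderSize = 5 * sizeof(uint);

    // Records immediately follow the header, so the string block begins
    // right after the last record.
    public static long StringBlockOffset(uint recordCount, uint recordSize) =>
        WdbcHeaderSize + (long)recordCount * recordSize;

    // Reads the null-terminated UTF-8 string starting at 'offset' within the
    // string block. Offset 0 is the leading 0 byte, i.e. the empty string.
    public static string ReadString(byte[] stringBlock, int offset)
    {
        int end = Array.IndexOf(stringBlock, (byte)0, offset);
        if (end < 0)
            throw new ArgumentException("String is not null-terminated.", nameof(offset));
        return Encoding.UTF8.GetString(stringBlock, offset, end - offset);
    }
}
```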

Each column is one of:

  • Int32
  • Float32
  • String (an Int32 offset into the string table)
  • "Flags" (a uint32 but usually has a fixed set of bit combinations)
  • Boolean (just 0 or 1)
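Since every column is four bytes wide, each of these types is just a different reading of the same 32-bit cell. Here is a rough sketch of that idea; the DbcColumnType enum and DbcColumns.Decode are hypothetical, String columns are returned as the raw offset (resolving them needs the string table), and Boolean treats any nonzero value as true.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical enum naming the column kinds listed above.
public enum DbcColumnType { Int32, Float32, String, Flags, Boolean }

public static class DbcColumns
{
    // Decodes one column of a raw record. Every column is a 4-byte,
    // little-endian cell; the column type only changes how those bits are read.
    public static object Decode(byte[] recordBytes, int column, DbcColumnType type)
    {
        uint raw = BinaryPrimitives.ReadUInt32LittleEndian(recordBytes.AsSpan(column * 4, 4));
        switch (type)
        {
            case DbcColumnType.Int32:   return (int)raw;
            case DbcColumnType.Float32: return BitConverter.Int32BitsToSingle((int)raw);
            case DbcColumnType.String:  return (int)raw;  // offset into the string table, not yet resolved
            case DbcColumnType.Flags:   return raw;       // kept as uint; bit meanings vary per file
            case DbcColumnType.Boolean: return raw != 0;
            default: throw new ArgumentOutOfRangeException(nameof(type));
        }
    }
}
```

Returning object keeps the sketch short but boxes every value; a real reader would more likely write directly into typed members.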

So this pretty well defines the problem space. We need to support deserializing from this binary format into plain objects, so that I can say I have a DbcTable<T> and my runtime will be able to enumerate the records in the table. One catch: to the best of my knowledge, the CLR doesn't guarantee the order in which reflection returns an object's properties. It probably keeps a consistent order based on some ethereal thing, but I don't know what that order is based on, so before going any further I'll need some explicit way to tie each member to its column.
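For what it's worth, the .NET documentation for Type.GetFields and Type.GetProperties states that they don't return members in any particular order, such as alphabetical or declaration order. One common way around that is to tag each member with its column index and sort on it. The sketch below shows that idea; it is only one possible approach, not necessarily what this series ends up doing, and DbcColumnAttribute, DbcMapping, and AnnotatedTitleRecord are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute: records which DBC column a member maps to, so we
// don't depend on the order in which reflection happens to return members.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class DbcColumnAttribute : Attribute
{
    public DbcColumnAttribute(int index) => Index = index;
    public int Index { get; }
}

public static class DbcMapping
{
    // Returns the public instance fields of T that carry [DbcColumn],
    // sorted by column index rather than by reflection order.
    public static FieldInfo[] GetColumnFields<T>() =>
        typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Where(f => f.GetCustomAttribute<DbcColumnAttribute>() != null)
            .OrderBy(f => f.GetCustomAttribute<DbcColumnAttribute>().Index)
            .ToArray();
}

// Example: declared out of order on purpose; the attribute decides the order.
public class AnnotatedTitleRecord
{
    [DbcColumn(2)] public string Title;
    [DbcColumn(0)] public int ID;
    [DbcColumn(1)] public int RequiredAchievementID;
}
```

Unannotated members are simply skipped, which also covers columns you don't care about yet.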

Briefly, let's look at DBFilesClient\CharTitles.dbc. This file (at least as of the most recent patch) has six columns. I don't know for sure, but it looks like the following:

Column    Type       Description
0         Int32      ID
1         Int32      Required achievement ID
2         String     Title
3         String     Title, repeated
4         Int32      Unknown, just seems to continuously increase
5         Int32      Reserved (all records have 0)

Since I don't know what to do with columns 3-5, I can just define the following class:

public class CharacterTitleRecord
{
    public int ID;                     // column 0
    public int RequiredAchievementID;  // column 1
    public string Title;               // column 2
    // Columns 3-5 are not mapped yet.
}

Next time: We'll see how the naïve implementation deserializes each record.