05 May 2008

When is a bitmap not an image?

I've been reading Programming Pearls lately and I'm really enjoying it. The book is a collection of thought-provoking articles on fundamental topics in programming and software development. (That's right: it's Pearls, not Perl) To be fair, I like going back to basics and discussing things like algorithms and data structures.

Being a light and easy read (unlike so many dry comp-sci books out there) it's tempting to just whiz through the articles in a few days. However, the true benefit comes from working on the end-of-chapter exercises and actually thinking about the questions the author proposes throughout the book.

Before you start to wonder, I'm not about to turn this into a book review. What I really want to do is put my proverbial money where my proverbial mouth is. I decided to tackle one of the first exercises in the book and write my own implementation of a Bitmap data structure. Don't worry, I promise not to bore you with all the details, I'll just focus on the interesting bits.

Let's get started with this Bitmap thing. That's an image file format, right? Well, not quite. In this case, we're talking about an incredibly space-efficient way of storing numbers. I don't know about you, but I was blown away by just how simple and ingenious it is. Sure, it only works for sets of unique integers but, even then, how come I'd never heard of it before? I must've over-slept and missed that class at university. Come on, I even had to implement a red-black tree. I think the Bitmap deserved at least a brief mention.

Incidentally, after learning about this other Bitmap, I realized I was already
indirectly dealing with it in one of the tools I use on a daily basis. You might have seen it there too: SQL Server uses a Bitmap in the execution plan for some types of queries. And you thought the whole thing was just an academic brain teaser.

Anyway, the gist of the structure is that instead of storing the numbers themselves, we store an ordered checklist of the numbers. If we want to store a certain number, we check it on the list. Otherwise, we leave that number's position unchecked. As you can imagine, it only takes one bit to store each checked/unchecked state. Pack those bits into bytes and you've put a lot of data into a very small space.

I decided to store everything in a simple byte array. Although a byte array in C# knows how long it is, I had to keep track of the length of the whole bitmap since the last byte might only be partially used. Any implementation I could think of would have to deal with the bitwise operations to set and unset the bits inside the bytes so that was my first challenge. I brushed up on the mechanics of ANDing and ORing bytes (it's not that complicated — just find the right byte in the array, then set up a mask to single out the bit you want in there) and implemented the two fundamental operations of my data structure as a bool-valued indexer:

public bool this[int index] { get; set; }

My initial implementation used a fixed length for the array that was set when the object was created. Auto-expanding collections are all the rage these days so making the bitmap resize itself on the fly was the next item on my list. That was enough to obtain a pretty usable collection class. Did it work? It sure did! As predicted, it took up very little memory and it was also really fast.

I could keep tweaking the class and adding more features but I think that's enough for a 30 minute exploratory coding session. That's right, it only took about that long to get the basic pieces in place. Now, if only writing about it were as easy as that...

No comments: