05 May 2008

When is a bitmap not an image?

I've been reading Programming Pearls lately and I'm really enjoying it. The book is a collection of thought-provoking articles on fundamental topics in programming and software development. (That's right: it's Pearls, not Perl) To be fair, I like going back to basics and discussing things like algorithms and data structures.

Being a light and easy read (unlike so many dry comp-sci books out there) it's tempting to just whiz through the articles in a few days. However, the true benefit comes from working on the end-of-chapter exercises and actually thinking about the questions the author proposes throughout the book.

Before you start to wonder, I'm not about to turn this into a book review. What I really want to do is put my proverbial money where my proverbial mouth is. I decided to tackle one of the first exercises in the book and write my own implementation of a Bitmap data structure. Don't worry, I promise not to bore you with all the details, I'll just focus on the interesting bits.

Let's get started with this Bitmap thing. That's an image file format, right? Well, not quite. In this case, we're talking about an incredibly space-efficient way of storing numbers. I don't know about you, but I was blown away by just how simple and ingenious it is. Sure, it only works for sets of unique integers but, even then, how come I'd never heard of it before? I must've over-slept and missed that class at university. Come on, I even had to implement a red-black tree. I think the Bitmap deserved at least a brief mention.

Incidentally, after learning about this other Bitmap, I realized I was already
indirectly dealing with it in one of the tools I use on a daily basis. You might have seen it there too: SQL Server uses a Bitmap in the execution plan for some types of queries. And you thought the whole thing was just an academic brain teaser.

Anyway, the gist of the structure is that instead of storing the numbers themselves, we store an ordered checklist of the numbers. If we want to store a certain number, we check it on the list. Otherwise, we leave that number's position unchecked. As you can imagine, it only takes one bit to store each checked/unchecked state. Pack those bits into bytes and you've put a lot of data into a very small space.

I decided to store everything in a simple byte array. Although a byte array in C# knows how long it is, I had to keep track of the length of the whole bitmap since the last byte might only be partially used. Any implementation I could think of would have to deal with the bitwise operations to set and unset the bits inside the bytes so that was my first challenge. I brushed up on the mechanics of ANDing and ORing bytes (it's not that complicated — just find the right byte in the array, then set up a mask to single out the bit you want in there) and implemented the two fundamental operations of my data structure as a bool-valued indexer:

public bool this[int index] { get; set; }

My initial implementation used a fixed length for the array that was set when the object was created. Auto-expanding collections are all the rage these days so making the bitmap resize itself on the fly was the next item on my list. That was enough to obtain a pretty usable collection class. Did it work? It sure did! As predicted, it took up very little memory and it was also really fast.

I could keep tweaking the class and adding more features but I think that's enough for a 30 minute exploratory coding session. That's right, it only took about that long to get the basic pieces in place. Now, if only writing about it were as easy as that...

01 May 2008

Fiddling With a Live Site

It all started with a problem on a live site. Not that the site didn't work — it really did. Except when it didn't. But most of the time it did. You see, it sometimes had a glitch on a specific browser (probably not the one you're thinking of) and even then it worked most of the time. We thought of slapping an appropriate badge on the page and calling it a day but management wasn't too impressed by that. (Who said they always fall for the Certification buzzword?)

There was nothing left to do but actually debug and fix. Sure, no sweat: just reproduce the error on a development server, figure out why it's breaking and come up with a patch to solve the problem. I got stuck on the frst step. I told you the problem was on the live site. No one mentioned anything about the site not working in development. The team would've picked it up if that had happened.

Right, so I was stuck with a bug I could only repro sometimes (does that even count as "repro"?) on a specific browser accessing the live site. Even if I could come up with a patch, I couldn't just apply it to the live site without testing. My first idea was to use FireBug. I'm not that good at it but I've heard it can do some amazing things. For reasons I won't go into here, that wasn't an option.

That's when inspiration struck me. I had to change the Javascript my browser saw without changing what was on the server. Another nifty tool that can do that sort of thing is Fiddler. I use it mainly to see what data is being passed back and forth but it can also change the data for you. (If you're going to try this, do yourself a favor and install the FiddlerScript Editor as well.)

Step number one was to identify what file contained the Javascript code I wanted to debug. At first, all I wanted was to insert some alert() calls to see what functions were being executed and what the data looked like. The trick was to find a "hook" to attach to. For example, if I had a line of code like this:


var i = 0;

And I wanted to check that i was actually being set to zero, I would replace that line of code with:


var i = 0;
alert(i);

That seems pretty obvious but when I say "replace", I really mean it. I was making textual replacements in the source code before the browser got a chance to run it. (Did I just hear "C macros"?) Of course, by doing that, every line of code that looked the same as this one would have an alert attached to it (or maybe just the first one would — it depends on the exact replacement function). So the trick was to find bits of code (it doesn't even have to be a valid statement — it's just text, remember?) that looked unique enough so I could attach an alert.

Once I'd figured out why the code didn't work, I had to test the patch. Using the same technique, I could either remove chunks of code altogether or comment them out by inserting comment delimeters in the right places. All that was left was to insert the fixed Javascript and check that it worked.

While, in the end, the patch altered less than five lines of code, the same idea can be extended to much bigger changes. A larger intervention could require modifying the HTML and CSS as well as the Javascript of a website. The practical limitation is the complexity of the code you have to write in FiddlerScript. When you start dealing with code that's modifying other code that, in turn, is modifying HTML, it can be pretty hard to mentally juggle all those different abstraction levels. I wonder if there's a gap in the market for a tool that can hide away that complexity while still enabling a developer to debug a website without having access to either server or browser.