Sunday, September 25, 2005

The Case For 64 bits

Hardly anyone has a clue as to why 64-bit computing is almost just as cool as the step from 16 bits to 32 bits.

The hard core of the database weenies can tell you that you get a larger address space, and that this is nice.
Nothing much to make the hacker's heart beat faster.

To make matters worse, there have been some benchmarks recently where folks took a 32-bit source, compiled it for 64-bits, ran it, and noticed that it ran slower (one of the reasons is the fact that 64-bit code is bigger and therefore puts more strain on the cache, another reason is that bigger code & data takes longer to fetch from memory).

It is clear that not many really understand what 64-bit code can do for software, apart from being able to address more memory.

So I had a look at my code and I will show you where I use 64-bit variables, and why this makes the code simpler and sometimes extremely much faster (a factor of 4 is not uncommon). Just to be clear: Of course my code only runs that fast when it's compiled by a 64-bit capable compiler, and running on a 64-bit machine running a 64-bit OS.

The point in using 64-bit variables is speed. Unadulterated, blazing speed. What I used to do with hand-optimized MMX assembly, I now do with 64-bit variables. MMX is severely limited (read: slower) compared to full-fledged 64-bit code.

My software for the Asiatic board game of Go could well use an address space larger than 2 GB, in fact I am at that limit right now. During pattern harvesting, I use an array of 134,217,728 64-bit variables. I need more space for other data structures, and some spare RAM to be able to work on my computer while the thing spends weeks number crunching. (I have another PC but it's a bit slower and it doesn't have 2 GB, only 1.5 GB which is just too little for pattern harvesting). So much for the boring part of 64-bits, the increased address space.

64-bit operations afford very major speedups in applications like encryption/decryption, video/audio encoding/decoding, Chess programs, image manipulation and data compression/decompression and I leave it to others to explain the details. I will give concrete examples on how 64 bit variables occur "naturally" in my computer Go program.

1. Patterns
Moyo Go Studio has a pattern matcher. Patterns are, for efficiency's sake, stored as 64-bit hash values. 64 bits is needed to achieve the required reliability. The pattern matcher uses 64-bit random values, it XOR's those values with each other, it looks those values up in an array with 64-bit values, and so on and so forth.

2. Bitboards
Chess programs use them: for every piece-class, they use a 64-bit variable that has a set bit for a piece of that class on the corresponding chessboard coordinate. In Go, bitboards are highly useful as well, but unfortunately they do not fit into a single 64-bit variable. That does not mean we should revert to a bunch of 32-bit variables - of course we want the largest width available that the compiler supports. Our code will be smaller, simpler and faster. Moyo Go's bitboards are records consisting of seven 64-bit variables, and operations on those bitboards go about three times faster when they are done in native 64-bit as opposed to 32-bit (or 32-bit opcodes simulating 64-bit variables by combining two 32-bit values).

3. Operations on bitfields
Moyo Go uses all kinds of databases (games, patterns, players) and the amount of data is so massive that every trick in the book is used to compress the data. Otherwise the software would not fit on a CD. Of course you can use bitfields in 32-bit variables, but as soon as a unit of data exceeds 32 bits and you need to sort them, you need to do two separate swaps of two 32-bit values instead of a single swap of a 64-bit value. The same goes for manipulations on bitfields. Things go faster when everything is in a single CPU register, and you save registers as well. 64-bit CPU's have not only the ability to operate on two 32-bit values in a single instruction, they also have more registers, something the optimizer can take advantage of. I use 64-bit variables to hold data collections. a 64-bit variable can hold the (max.) four coordinates of any adjacent chains and their color. So instead of using four variables, I use a single 64-bit variable.

4. Copying state
Search algorithms can be sped up by clever "state"-bookkeeping and that involves the copying back of previous states. Memory is most efficiently copied 64 bits or more at a time. Often, the OS should be able to do this most efficiently but whenever you use a loop to copy memory, 64-bit variables are faster than 32-bits.

There are many more places where 64-bit variables are the most natural and fastest solution, and I am simply flabberghasted that so few programmers realize it. Apart perhaps from simple databases, I can't think of any kind of software that would not benefit of their use. (In fact, people have told me that even "trivial" databases would benefit).