Friday, June 09, 2006

Tactical Module Progress Report

The state machine, after three weeks of coding, is done. It looks outlandishly complex but I used indentation, empty lines, column-alignment, meaningful identifiers and comments to alleviate some of that.

It has around 1250 "real" lines, and Pascal is a little more verbose than C so It would be around 1000 LOC in C. But only when the same optimization hacks are used, otherwise it's around 600 lines.

All these weeks I have been coding, I haven't done a single test run. Neither will I, in the very near future. First I'll draw a large flowchart of what this code is doing, because it's quite complex and very easy to oversee details. I have a pile of A3 sheets for this purpose. It will take several attempts and sticky tape to create the flowchart (Nassi-Schneiderman diagrams suck).

After the flowchart has been studied and shown to be an accurate reflection of how the state machine should look like, I'll add visualisation code to show the state of the state machine, and a method of adding stones. In spite of Delphi's Windows/GUI capabilities, this has to be stdin/stdout stuff because still I need to be able to test the EXE when it has been compiled as a console app by Free Pascal, and when it's assembled by MASM, after I hand-optimized Free Pascal's 64-bit MASM output. Many tricks have been used to take advantage of modern CPU's 64-bit registers, but Delphi can compile the tactical module to run on 32-bit machines as well, by emulating 64-bit words in two 32-bit doublewords.

So from now on it's documenting, testing, bugfixing and benchmarking the core TsumeGo state machine - a major milestone has been reached - exciting times ahead!

I want to stress again how important a dedication to speed is, with this kind of code. I have learned a lot. 16-bit words need one more cycle to load than larger sizes. Bytes are very slow (I'm talking of the only serious 64-bit contenders out there - made by AMD). I've been carefully weighing the benefit of gaining a cycle by using lookup tables with the drawback of straining the cache. Realistic "stress code" will be used to experiment with alternative data representations and the algoritms acting upon them. Speed in this module is of prime importance so I will not rest until it can't go any faster.

The speed of state maintenance is very dependent on what you want to know about a tactical situation of course. It's a complex equilibrium of speed of delta-updating the state and the speed of extracting what you need to know from that state.