Saturday, May 13, 2006

Inline Assembly Still Useful?

Since the AGA published a devastating bogus "review" full of lies about MoyoGo (like: "It does not have a Help file"), my sales dropped to almost zero. The consequence is that I am not able to move to Czechia as I planned, to be a full-time Go programmer. So indeed, a little collusion amongst friends does work to deny the competition a livelihood and suppress innovative Go software. The AGA eJournal has 9,000 readers, and when you combine the negative "reviews" there with the boycott of ads on Sensei's Library and of links from sites like GoBase, it becomes very hard to be a viable Go software developer.

Since this blog therefore isn't doing much to persuade potential customers, I'll be heavier on the programming side of things, so that at least fellow Go programmers might be able to benefit from it.

Which brings us to the topic: inline assembly language in our (C, C++, Pascal, etc.) code. The knee-jerk response to this question is: "Because optimization on modern CPUs is so complex, modern compilers generate code that is almost always better optimized than anything a human can write".

Which is 100% correct, but totally irrelevant to the issue!

Because we are not interested in optimized C++ code; we are interested in optimal machine code! Let me explain. C++ and Pascal have "stuff they can do", like adding variables. This translates perfectly to machine opcodes, so a compiler can optimize it well.

But the processor in your computer is able to do a few things that C++ does not provide an interface to. Not all mnemonics have their equivalent higher-level language representation. What if it is possible to design the speed-sensitive part of a TsumeGo move maker/unmaker in such a way that it takes advantage of such instructions? (Hint: It is possible).

Then you're screwed, with C++ or Pascal or Java or whatever you use. Your compiler simply isn't going to compile it to efficient machine instructions, and it never will.

Compilers emit chunks of pre-defined code, mapping a higher-level construct to lower-level instructions, and then they optimize that output. But that's all they do. They are not creative with exotic opcodes. Yes, they can do AND and OR and XOR, but there are quite a few more interesting instructions that aren't mapped to C++ operators.
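To make that concrete, take x86's BSF (bit scan forward), which finds the lowest set bit of a word in a single instruction. There is no C++ operator for it, so plain C++ has to loop. The following is only a minimal sketch, not code from MoyoGo: it assumes a GCC or Clang toolchain on x86-64, which accepts extended inline assembly, and falls back to the loop everywhere else. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable version: walk the bits one at a time until a set bit is found.
inline unsigned lowest_set_bit_portable(std::uint64_t bits) {
    assert(bits != 0);  // a zero word has no set bit to find
    unsigned index = 0;
    while ((bits & 1u) == 0) {
        bits >>= 1;
        ++index;
    }
    return index;
}

// Same result from a single BSF instruction where inline assembly is
// available (assumption: GCC/Clang extended asm on x86-64).
inline unsigned lowest_set_bit(std::uint64_t bits) {
    assert(bits != 0);  // BSF leaves its destination undefined for a zero source
#if defined(__GNUC__) && defined(__x86_64__)
    std::uint64_t index;
    __asm__("bsfq %1, %0" : "=r"(index) : "rm"(bits) : "cc");
    return static_cast<unsigned>(index);
#else
    return lowest_set_bit_portable(bits);
#endif
}
```

With a 64-bit word used as a bitboard, `lowest_set_bit(board)` gives the index of the first occupied point without a loop, and both functions should agree on every non-zero input.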

This is why, when you are serious about making a tactics module for Go (I mean serious in a quest for total world domination), you don't have much of a chance when you stick to Java, or perhaps even to the latest C++ compilers. I want to point out that the mainstream C++ compilers for 64-bit Windows are no longer able to do inline assembly! Both Microsoft and Intel removed this feature. And it's a major hassle to maintain separate MASM assembly.

Free Pascal does allow 64-bit inline assembly. I develop MoyoGo's TsumeGo/Tactics module in 64-bit Pascal with inline assembly language. The most successful TsumeGo program, GoTools, is written in Pascal as well.

When you're a twentysomething, you probably know next to nothing about optimization techniques. Comp. Sci. does not have it in the curriculum. Unlike myself, you haven't had to tweak code running on a 1 MHz 8-bit processor. And large multinationals want to own the runtime environment, and because universities are closely affiliated with the corporate world, they have started to focus on "slow" languages like Java and C# on .NET.

Yes, I know there are compilers for those languages, but the resulting code is not fast, for various reasons, one of which is the concept of a garbage collector. A language designed with the utmost contempt for speed, designed to be used exclusively where speed is of no concern, is never going to get highly optimizing compilers. Writing a good optimizer is rocket science, done by rocket scientists. No rocket scientist is going to write an optimizer for a Java or C# compiler. It just isn't going to happen, even if it were possible.

Arguments about .NET or Java JITters being able to optimize on the fly for the specific CPU are true, but in practice this never happens because, by definition, Java and C# programmers couldn't care less about speed. And even if they do, the JITter simply doesn't have time to optimize thoroughly.

Assembly language is absolutely necessary if you want to reach the top in Go programming. Even more so with Go than with Chess, because Go needs even more speed.

Of course, your algorithms need to be optimal and exploit the CPU's capabilities to the max.
In assembly, you can do cache prefetches, align code and data and insert NOPs. I don't think you can do that in 64-bit C++. This alone makes inline assembly worthwhile.
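
As a rough illustration of the prefetch part, here is a minimal sketch, again not MoyoGo code. It assumes a GCC or Clang toolchain on x86-64 that accepts extended inline assembly, and it compiles to a plain loop anywhere else. The function names and the lookahead distance of 8 elements are hypothetical; the right distance depends on the CPU and the access pattern, and has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ask the CPU to start loading the cache line that holds `address`.
// PREFETCHT0 is only a hint: it changes timing, never program results.
inline void prefetch_for_read(const void* address) {
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__("prefetcht0 %0"
                         :
                         : "m"(*static_cast<const char*>(address)));
#else
    (void)address;  // no inline assembly here: do nothing
#endif
}

// Sum an array while prefetching `lookahead` elements ahead of the one
// being read (the default of 8 is an assumed value, not a tuned one).
inline std::uint64_t sum_with_prefetch(const std::uint64_t* values,
                                       std::size_t count,
                                       std::size_t lookahead = 8) {
    assert(values != nullptr || count == 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (i + lookahead < count) {
            prefetch_for_read(&values[i + lookahead]);
        }
        total += values[i];
    }
    return total;
}
```

The prefetch never touches memory past the end of the array, because the bounds check runs before the hint is issued. Whether it actually helps is something only a benchmark on the target machine can tell.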