ZAP [eax], [mm1]

Assembly language...

(scared yet?)

Writing it is like riding a bike without training wheels. No, more like riding a bike without training wheels or brakes, and it's a fixie. And you're riding down Mt Doom with a dwarf standing on the back pegs.

So, a few years ago I did a major revision of one of my software packages, and added support for 64-bit processors as part of the work. Along with a lot of nit-picky work, there were a few architectural changes, and one of them was reworking some in-line assembly I was using for performance reasons. The code is used to blend two images: pixel_top + pixel_bottom = pixel_new. It's the kind of thing that absolutely flies when using MMX because you can do 8 pixels at a time. Anyway, in x64, the MS C++ compiler doesn't allow in-line ASM the way it does in Win32. Not exactly sure why. Doesn't matter. No big deal. I just moved each chunk of ASM to an external .ASM file, put a "PROC" and "ENDP" around each, added parameter lists, etc. Only slightly more complex than copy/paste.
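For anyone who wants to see the shape of that blend without any assembly, here is a minimal sketch. It is not the code from the package: the name blend_add is made up, and it assumes 8-bit channel values and a saturating add (sums clamp at 255 instead of wrapping). Where the compiler offers SSE2 intrinsics it handles 16 bytes per step, and a plain loop picks up whatever is left.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define BLEND_HAS_SSE2 1
#endif

// Hypothetical helper: out[i] = min(top[i] + bottom[i], 255) for each byte.
// The saturating behavior is an assumption; the post only says
// pixel_top + pixel_bottom = pixel_new.
void blend_add(const std::uint8_t* top, const std::uint8_t* bottom,
               std::uint8_t* out, std::size_t count)
{
    assert(count == 0 || (top && bottom && out));

    std::size_t i = 0;
#ifdef BLEND_HAS_SSE2
    // 16 bytes per iteration. The unaligned load/store forms are used so the
    // buffers do not need 16-byte alignment.
    for (; i + 16 <= count; i += 16) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(top + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bottom + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_adds_epu8(a, b));  // unsigned saturating add
    }
#endif
    // Scalar tail (or the whole buffer when SSE2 is unavailable).
    for (; i < count; ++i) {
        unsigned sum = unsigned(top[i]) + unsigned(bottom[i]);
        out[i] = static_cast<std::uint8_t>(std::min(sum, 255u));
    }
}
```

Calling blend_add(top, bottom, out, width * height) on three equally sized byte buffers is the whole interface. The point of the sketch is that the compiler stays in charge of the registers, so there is nothing to declare or preserve by hand.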

And that worked fine for years. Until yesterday.

Yesterday I learned that one of those functions had started crashing in certain situations. And I stared at the code for hours and didn't see anything wrong with the logic - the code looked like it should do exactly what it was supposed to do. And it was! But it wasn't. For some reason, in release mode, with full optimizations turned on and with a certain data size, a pointer was going crazy and running off into the woods, where it encountered an angry wizard who killed the process dead.

Then I looked under a rock I found on the internet and learned something!

When you write in-line assembly with Microsoft's C/C++ compiler, you don't have to worry much about the contents of most of the common CPU registers when entering or leaving your _asm{...} blocks. The registers are almost like local variables. You can put anything you want in them and the compiler doesn't care and it will clean up after you. But when you move your ASM to an external (non-inline) function and use the MASM compiler (aka the 'assembler'), you are expected to behave as a responsible adult. You are supposed to give the assembler a list of all the registers you are going to use. You are supposed to do that so that the compiler can then coordinate its own register usage with yours - so that the two of you don't interfere with each other.

You're supposed to tell it; the assembler doesn't actually care one way or another - want to ride your bike without a seat? ASM doesn't give a crap. Go ahead. Checking that you've done this would be easy, but the assembler doesn't bother. And if you don't tell the compiler that you're using a given register - EAX for example - it will assume EAX is available for other uses and that it doesn't have to worry about you meddling with EAX. But then if you do meddle with EAX in your function, and the compiler was using EAX to hold a pointer, and your program comes out of your function with that pointer gone crazy and run off into the woods where the angry wizard lives... well, your program is going to get killed.

Or maybe it won't! Maybe three years of new revisions will go by and never once will the compiler decide it cares about EAX when calling your function. It will be using EBX or EDX or whatever to hold that one pointer. But then one day (yesterday, in my case) it will decide to use EAX... and you won't know it until you hear the ZAP! of the angry wizard's staff flicking your program out of existence.
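The whole mess can be modeled in a few lines of C++. This is a toy, not real hardware and not the real function: Regs is a pretend register file, the function names are invented, and RBX simply stands in for any register the caller expects to get back unchanged. One callee scribbles on it and walks away; the other saves it on entry and restores it on exit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model: a tiny "register file" shared by caller and callee.
enum Reg : std::size_t { RAX, RBX, RSI, RDI, RegCount };
using Regs = std::array<std::size_t, RegCount>;

// Hypothetical callee that uses RBX as scratch and never restores it.
void blend_careless(Regs& r)
{
    r[RBX] = 0xDEAD;  // clobbers whatever the caller had parked here
    r[RAX] = 1;       // "return value"
}

// Hypothetical callee that saves RBX on entry and restores it on exit.
void blend_careful(Regs& r)
{
    const std::size_t saved = r[RBX];
    r[RBX] = 0xDEAD;  // same scratch use as above
    r[RAX] = 1;
    r[RBX] = saved;   // put it back before returning
}

// The "compiler-generated" caller: parks a value in RBX, calls the function,
// then checks whether the value is still there afterwards.
template <typename Callee>
bool caller_survives(Callee callee)
{
    Regs r{};
    r[RBX] = 42;      // stands in for the pointer from the story
    callee(r);
    return r[RBX] == 42;
}

// Usage: the careless version loses the caller's value, the careful one keeps it.
void demo()
{
    assert(!caller_survives(blend_careless));
    assert(caller_survives(blend_careful));
}
```

The real thing is nastier than the toy because the caller only sometimes parks anything in that register, which is how the bug can sit quietly for years.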

And that's why one should avoid assembly whenever possible.

5 thoughts on “ZAP [eax], [mm1]”

  1. Rob Caldecott

    I love these posts.

    I haven’t written a line of assembler in over 20 years thank god.

    I once wrote a replacement UK keyboard driver to fix a serial port interrupt issue which was loaded as a DOS TSR. 1989 I think. That was my crowning achievement when it comes to asm. Back then Borland tools were king.

  2. cleek Post author

    ya can’t beat MMX when it comes to doing the same simple thing to a zillion bytes. being able to do 8 parallel adds (or subs or AND, OR, etc) is just awesome. SSE2 lets you do 16 parallel 8-bit ops, but the memory alignment requirements are a killer. either way, it’s such a crazy world down at that level. you have to worry about instruction pipelining and cache usage and all kinds of stuff that no C++ programmer should ever have to even hear about. that’s why i haven’t written any new ASM in years… and the stuff i claim i did write i probably modified from something i found on CodeProject.

    but ASM on my C64 in 1983? that was fun!

    and i did some VAX assembly in college… it’s so high-level that it has a string output function built in.

  3. John Weiss

    Assembly? Gods how I hated it. C was a revelation.

    As an aside, I knew a guy who was *very good* at it: he was a lead programmer for IBM and one of the most peculiar people I’ve ever met. He was the archetypical ‘geek’: he could hardly construct a sentence in English conversation. Coincidence?
