Category Archives: Programming

Intrinsically Unpredictable

I found this article on CodeProject, that shows a way to do "alpha blending" of one image onto another using Intel's sexy SIMD instructions. Whassat? SIMD = Single Instruction Multiple Data. Instead of adding sixteen numbers to sixteen other numbers, one at a time, you can add sixteen numbers to sixteen other numbers all at once. One instruction, "add", performed on multiple values in parallel. Instant sixteen-fold speedup!

The drawback is that these SIMD instructions (Intel's MMX, SSE, SSE2, etc.) are fairly limited in utility, kindof arcane, and sometimes come with some pretty harsh restrictions. So, you end up having to rework your process to fit what the SIMD instructions allow. The potential benefits are great (16x! for every bit of arithmetic!), but you generally sacrifice a ton of speed in order to get all your data lined up just right so you can do those parallel adds. And as always, things that are simple in C/C++ become challenging in Assembly, and reworking your process to fit what the SIMD instructions demand on top of that can become an excruciating nightmare.

But excruciating is my first name. Well, not really, but is actually kindof close, in a theological and etymological way.

So, this CodeProject article shows how to blend one image onto another with variable opacity. It's not a really complicated process: for every pixel, Pixelnew = (Pixel1 * opacity) + (Pixel2 * (1.0 - opacity). The trick is to make it fast.

I have some C++ code that burns through it pretty quickly and I was satisfied with it. But, I tried this SIMD code from the article, and had to admit defeat. It was nearly four times as fast as my beautiful C++. So, I reworked it a bit and replaced my C++ code with the result.

That got me to thinking.

What the article calls "alpha blending" is not what I call alpha blending. The code just does a blend of two images with a global opacity value - every pixel uses the same opacity. In my experience, "alpha blending" is what happens when each pixel has its own opacity value, aka "per-pixel alpha." Where a standard pixel has a Red, Green, and Blue value, a pixel with per-pixel alpha has a Red, Green, Blue and Alpha value. That's much more flexible (at the cost of 33% more memory per image).

What I wanted was a way to blend an RGBA (Red, Blue, Green and Alpha) image onto an RGB (no alpha) image, using the SIMD instructions. Every pixel in the 'top' image has its own opacity value, but the bottom image doesn't.

Problem 1.
RGBA and RGB pixels are different sizes. RGBA pixels are four bytes wide and RGB pixels are three. So I have to shuffle things around to get the Rs, Gs and Bs all lined up. That's going to eat up time.

Problem 2.
The SSE instructions work on chunks of data 128 bits wide. You read 128 bits, do your stuff on those 128 bits, then write 128 bits out. That means we can read 4 RGBA pixels at a time (8 bits per byte, 1 byte each for R,G,B,A; 8 x 4 x 4 = 128). But 128 bits would hold 5.3 RGB pixels. Since we need to have an RGBA for every RGB, we're going to have to discard that last 1.3 RGB pixels at each pass. Not a problem, but the inefficiency gnaws on me.

Problem 3.
Due to the nature of computer math, you can't do that alpha blending equation using just 8 bits per R,G or B. The intermediate values will exceed what 8 bits can hold. So, you have to expand to 16-bit data. That means we can process just 2 pixels at a time, not 4. All's not lost, since there's a lot of initial work that can be done before we do that expansion into 16-bit data. So, we'll just split those four initial pixels of 3x8 into two 3x16 and do one after the other, then recombine at the end, saving all the initial work at the cost of a little more work at the end.

We're still ahead of C++, which has to do each pixel as three separate operations (and SIMD will do them 2 at a time), but you can see we're running out of potential, quick.

Problem 4.
After we process our two pixels, we have to store the result. But again, 128 bits = 5.3 RGB pixels, and we're only working on 4. So we have to keep those extra 1.3 pixels around, stick them on the end of the four pixels we actually processed and then write the whole thing out - otherwise, we'd end up writing garbage over those 1.3 pixels on the end (which are the first 1.3 pixels of the next round!)

Problem 5.
Given that we can't read or write past the end of the image, because we are processing 4 RGBA/RGB pixels at a time, always, we can't do anything if we have fewer than four pixels in the image. And if we end up with 1, 2 or 3 pixel at the end or a row, this code can't do anything with them, either. So, we have to limit ourselves to images that are multiples of 4 wide.

Did I solve these problems? Tune in tomorrow-ish!

(pt 2)

ZAP [eax], [mm1]

Assembly language...

(scared yet?)

Writing it is like riding a bike without training wheels. No, more like riding a bike without training wheels or brakes, and it's a fixie. And you're riding down Mt Doom with a dwarf standing on the back pegs.

So, a few years ago I did a major revision of one of my software packages, and added support for 64-bit processors as part of the work. Along with a lot of nit-picky work, there were a few architectural changes, and one of them was reworking some in-line assembly I was using for performance reasons. The code is used to blend two images: pixel_top + pixel_bottom = pixel_new. It's the kind of thing that absolutely flies when using MMX because you can do 8 pixels at a time. Anyway, in x64, the MS C++ compiler doesn't allow in-line ASM as it does in Win32. Not exactly sure why. Doesn't matter. No big deal. I just moved each chunk of ASM to an external .ASM file, put a "PROC" and "END" around each, added parameter lists, etc.. Only slightly more complex than copy/paste.

And that worked fine for years. Until yesterday.

Yesterday I learned that one of those functions had started crashing in certain situations. And I stared at the code for hours and didn't see anything wrong with the logic - the code looked like it should do exactly what it was supposed to do. And it was! But it wasn't. For some reason, in release mode, with full optimizations turned on and with a certain data size, a pointer was going crazy and running off into the woods, where it encountered an angry wizard who killed the process dead.

Then I looked under a rock I found on the internet and learned something!

When you write in-line assembly with Microsoft's C/C++ compiler, you don't have to worry much about the contents of most of the common CPU registers when entering or leaving your _asm{...} blocks. The registers are almost like local variables. You can put anything you want in them and the compiler doesn't care and it will clean up after you. But when you move your ASM to an external (non-inline) function and use the MASM compiler (aka the 'assembler'), you are expected to behave as a responsible adult. You are supposed to give the assembler a list of all the registers you are going to use. You are supposed to do that so that the compiler can then coordinate its own register usage with yours - so that you don't interfere on each other. You're supposed to tell it; the ASM compiler doesn't actually care one way or another - want to ride your bike without a seat? ASM doesn't give a crap. Go ahead. Checking that you've done this would be easy, but the assembler doesn't bother. And if you don't tell the compiler that you're using a given register - EAX for example - it will assume EAX is available for other uses and that it doesn't have to worry about you meddling with EAX. But then if you do meddle with EAX in your function, and the compiler was using EAX to hold a pointer, and your program comes out of your function but now that pointer has gone crazy and run off into the woods where the angry wizard lives... well, your program is going to get killed.

Or maybe it won't! Maybe three years of new revisions will go by and never once will the compiler decide it cares about EAX when calling your function. It will be using EBX or EDX or whatever to hold that one pointer. But then one day (yesterday, in my case) it will decide to use EAX... and you won't know it until you hear the ZAP! of the angry wizard's staff flicking your program out of existence.

And that's why one should avoid assembly whenever possible.

The Loop

How to piss off a customer.

int i;
for (i=0; i < theList.size(); i++);
{
   theList[i].something = 0;
}

= crash.

Since I couldn't run the code (couldn't come up with data that would ever cause that code to run), it took me a day to figure out what the problem was. Any of you C/C++/etc peeps see the problem?

Mochi! I Hibaried On Your Joomla! Xajax!

I've had this job:

We programmers know a few tools but not the ones the architect knows. That's ok - we've learned lots of tools over the years so we keep quiet and think to ourselves "I don't know what this guy is talking about but I can learn these tools". So when the architect says "exclude slf4j from the library's build sequence or modify the pom file dependency list" we don't say "what the hell are you talking about". We don't say anything. We go to google and spend the next two weeks learning slf4j and Ivy and Maven, and RESTful WebServices and Grails, and the proper syntax for the BuildConfig file. Then we reboot the computer three or four times for good measure; download security patches for the IDE; get the latest version of the JDK; clone a few repositories from GitHub; study the Artifactory website; look for new docs on the wiki; and hope to god someone has figured out why the WAR file doesn't deploy to the 3.2 version of the app server. In all this time no code has been written. No problems have been solved. No user interfaces have been created. But everyone has been terribly busy. And the architect has been studying newer, better versions of the three or four tools we have now almost learned, but which have become obsolete. Eventually, the project is cancelled. Management decides to continue using the prototype version written in Objective-C, even tho the source code has been lost and it doesn't use any of the company-standard tools, because it does 80% of what the one customer wants, and no one else really needs the application anymore.

Luckily for me, these days, I'm down in low-level C/C++ land, mostly pushing string data around. Because we're cross platform, we don't have a lot of external libraries to fuck with; even parts of the C++ STL are off-limits.

But I've been in the job where there answer to every requirement is a toolkit someone read about in last month's IT Mid-level-manager Journal. It's a great way to do nothing but add acronyms to your resume.

Urgentz! CodePlz?!

In a postscript to her article "Plagiarism and the mechanics of privilege", Teresa @ Making Light has some fun with a student who is trolling the Yahoo! Answers boards, looking for someone to write her essay on "Wuthering Heights". Teresa replies:

Here’s a thesis statement: Wuthering Heights is an early work of science fiction that takes place in a dimensional bubble separate from our own universe. We know this because any time someone gets too far from the Wuthering Heights/Thrushcross Grange/Gimmerton Kirk triangle (think Bermuda Triangle), they cease to exist as far as the story is concerned. Nothing outside the bubble is a solution to anything that happens inside it, no matter how logical that should be. Characters living inside it only leave if they’re desperate, and for some strange reason, they keep coming back. …

Which is awesome.

Unfortunately, begging strangers to do your homework isn't limited to English Lit. essays; it happens all the time on my favorite programming website, Code Project, too. People constantly post Please Do My Homework questions. And they usually get mocked pretty roundly by the rest of us.

Sometimes their questions are simply the text of the assignment (which they've copy/pasted), with no commentary or even a hint that they've done any thinking about the problem.

For example:

This is my question: please do my homework
Implement a standalone procedure to read in a file containing words and white space and produce a compressed version of the file in an output hie. The compressed version should contain all of the words in the input file and none.

Which generated the following replies:

should contain all of the words in the input file and none.

Well that's quite a challenge, I wonder how you will solve it?

and

You should assume your teacher is not as dumb as you, and knows how to use google. They probably already know you are a cheat, you should talk to them before they talk to you, to increase your chances of not being kicked off your course.

Sometimes they put in the effort of retyping the assignment:

write a cprogram using nested for statements
write a cprogram usung for nested statements to dicplay the following output:
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5

Which received, in reply, two complete programs. Some people are kind. I usually tell them to do their own homework.

For whatever reason, most of these pleas are from people for whom English is definitely not a first language, though maybe textspeak is:

I need a code
Pls help me to sort numbers in both ascending and descending orders in javascript with HTML.I need a code for this.Pls send it as soon as possible.

... and ...

will u help me plz
dear Mr. basu
this is KAUSHAL VARSHNEY from aligarh. i m NIITian and persuing GNIIT from ther. we are in some coding trouble, actually we aer governing a project on USB security system, so will u plz help me in coding of my job. specially in identifying default antiviruse on a button click event by which we can provide more security to our project.
i will very thankful of urs. there is not much time we had. so respond me as soon as possible.

Or, the ever-popular chunk-of-code post:

How to convert c++ to c#...pls..
[...1000+ lines of C++ omitted...]

No question, no hint that the poster has tried anything, just a giant chunk of code. Even better, this was the kind of code that really couldn't be translated; it was the kind of code that's intimately tied to a C++ framework for which there is no C# equivalent. It'd be like translating e.e. cummings into Chinese. This person had no idea what he/she was doing. Or it was a joke.

There are plenty of wanna-be game developers:

can anyone give me the source code of pacman in java?
tnx!
i really need it to be able to do my assignment!
i hope you guys can help me on dis! tnx!

As the first reply notes, instead of typing that long, detailed question, the person could have just Googled "source code pacman java" and found the answer. "Have you tried Google?" or links to Let Me Google That For You are popular ways to answer these questions.

And...

How to develop a game from screenshots
hai i am dng project

i m dng blackberry gaming application prjct.
here i want 2 develop the code from screenshots.
plz suggest 2 me how to do it?

Replies included:

Look at the screen shots, and then write code to duplicate them.

..and..

here i want 2 develop the code from screenshots.

Usually the flow goes the opposite way...

But this one might be my favorite of recent questions:

I demand a lexical analyzer
prgram that convert from lex.l to lex.yy.a
plz
in lexical analizer

I realize there's a language issue here, but I just love the phrase "I demand a lexical analyzer". I think it would make a great T-shirt.

The Story of O

In addition to my day job, I run a small business writing image processing software. Sales have been pretty shitty for the past few months, so I've kindof lost interest in writing new stuff for it. It's no fun writing new features when you don't expect anyone to care - I work much better when people ask for things. But over the Christmas break, I found a research paper that got me interested in programming again; it describes a magic beast: an O(1) Median Filter.

What?

Come with me. I will tell you.
Continue reading

Bi-lateral

My day job is "Senior Programmer". That means I have at least 5 years of professional programming experience (14 actually). So, bascially, I know what I'm doing when it comes to programming. Sure, the young kids are going to have a better grasp of the nuances of C# or MSIL or whatever Microsoft has decided we all need to know this month. But programming isn't really about the programming language you use; programming is really about knowing what you need to do with whatever language you have to use, in order to satisfy your customers. And after a while, you hit a point where you realize they're not really asking for anything new, they're just asking for it in different ways or with slight variations on a common theme. And there are dozens of books on this - written well before I figured it out for myself. So, at one level, my day jobs have been all pretty much the same kind of thing.

I imagine it's something like house-building: the plans say a wall goes here, and there's a window here and it might have a strange angle for a dormer or something. But people who have a lot of experience building houses know how to build the kinds of walls that go into standard 2x4 and sheet-rock houses. And if a given plan mixes walls and ceilings in new and unusual ways, it's not a huge problem - it's just a matter of working it through using techniques you know like the back of your hand. Sure. maybe some new guy will show up with a tool that can cut wood like an Exacto knife through paper - but that doesn't tell him how to build a wall, and it can't teach him. That knowledge comes from practice. In many ways, like house building, programming is a craft.

My side job is also as a programmer. But I write image processing tools - a fun little niche, in my opinion. Of course I can do all the basics: read a JPG, rotate it, resize it, filter it, put text on it, etc.. But that kind of stuff has been understood for decades. I didn't invent any of it, I just learned from looking at what other people have done. Image processing is very technical at its foundations, but there has been so much research and development done on this stuff over the years that, at this point, a lot of it is like following a cookbook: need to know how to rotate an image? there are a million places to look for good ways to do it; so pick your algorithm and type it in. The only thing left for much of it is to make your implementation as fast as you possibly can. Because besides accuracy, image processing users want their image to rotate as fast as fucking possible, and not a millisecond slower. But, it's still pretty much cookbook stuff at heart. So, at that level, it's just like my day job - the standard problems have already been solved, it's up to you to adapt the well-known solutions to the task at hand. The fun for me is in the optimization (the speed improvements - an art form of its own).

But, unlike my typical day-job stuff (ask the database for the data, put it in a list, tell the buttons what to do, etc) there's also a cutting-edge to image processing. There's a pretty substantial number of real scientists working on new things all the time. They work in things like computer vision, 3D rendering, and morphological processing (the intersection of set theory and image processing) - esoteric stuff, by most standards. The things they come up with are useful, exciting and often almost magical. But they are also so far ahead of the mainstream that the professional programmer community hasn't had time to come up with ways of implementing them - think of a set of cutting edge architects who keep inventing buildings made out of exotic materials and using techniques that the average subdivision home builder couldn't possibly use and still get that job done on schedule and budget. It's nice to look at in drawings, but how the hell can the average crew build that kind of thing?

Still, my customers will read about this stuff in a magazine and ask me if I can do it; "This is cool! I need this now!" I hate to say "No", so I always try to see what I can do. And it's always fun to be able to find new things to work on. Most of the time, I can Google around enough to find a recipe I can use, and that's that. But sometimes, the thing the customer has asked about is something that is so new and esoteric that the only information I can find are technical papers, submitted to conferences or academic journals by professors and grad students who approach the problem from a mathmatical or theoretical viewpoint, written for an audience of academics, scientists, and theoreticians. There is rarely a plain-English explanation of what they're doing; there's always a bunch of long horrible equations written in terse notation where every variable has multiple super- and sub-scripts, lots of summations, glossing over details and "... this is as explained by Xhiao Lung and Frederic Grimenschtrudel in their 1986 paper, Techinques for invariant monological comprendium derivatitions and tri-quadrant bi-noodling"; and there are always graphs comparing the results of their idea to some other academics' idea - whose work I don't understand either. Once in a while I find a paper written by a student of these professors who has implemented what the professor described, but only describes a high-level summary of results (a picture of the finished house - never a description of how they actually built it). Tease.

So, my latest attempt is to do something called "tone mapping". It's basically an attempt to automatically adjust the brightness of an image so that previously invisible details in light and dark areas are made visible without horribly distorting the overall brightness of the image - ex. given an image of a dark room with a bright window, it would bring out details in the shadows and details in the bright areas at the same time (and in a way that looks natural) but wouldn't affect the middle tones much. Try that in Photoshop sometime, to see how difficult it is using standard tools. Well, the heart of the trick lies in knowing what constitutes a "detail", and the latest techniques for this rely heavily on something called a "bilateral filter". Roughly, this is a blurring filter that can recognize abrupt changes in image intensity - and if it sees a sharp change in intensity in a group of neighboring pixels, it assumes that it's looking at a detail and tones down the blur effect in that area. Incidentally, this reminds me of how automatic focus works in cameras: they look at a small part of the image (usually the center) and adjust the focus in and out in order to find the spot that maximizes the intensity differences between pixels - higher intensity difference = higher contrast = sharper focus. Squint your eyes, contrast goes down, image gets blurry.

Now, blurring an image happens to be a cookbook technique. It's very basic; a simple blur is one of the first things a budding image programmer is likely to learn. The typical bi-lateral filter is done with a "Gaussian" filter (a well-known cookbook filter) but with a twist. And that twist is the key to the whole thing. Now, I spent a week over Christmas playing with my Gaussian filter to try to improve its performance; I certainly understand how it works in practice (it's just a simple weighted average), but I'm not sure about the mathematical theory behind it (the weightings used are what makes it a "Gaussian", and I don't know why you need those particular weights or why they do what they do). And when the academic papers talk about modifying their Gaussian filters, they're doing it from the deeper mathematical viewpoint, which I don't understand, and can't seem to make heads nor tails of. And, of course, no cookbook has caught up to what they're talking about. So I suffer through these papers, hoping one will offer me a plain-English explanation - sometimes I never find it.

It's unfortunate for me that you need a post-grad degree in mathematics to understand the state-of-the-art in computer imaging. But, that's the way it is.

Too much typing for a Saturday night.