Hi folks, been a while, I know. The job has a way of sucking up all free time. The autoconf project has stalled, not for lack of desire but because my time and resources have gone to solving larger problems. Those larger problems are what this column is about. Probably no one but me will read it but even so, it will give me a place to organize some thoughts while I do a drunken-chicken Debian installation on a Toshiba laptop I have here. It had a Gutsy Gibbon pre-release on it but I needed it to be more stable for other things. Since Gutsy is, as of this writing, getting close to release, the updates are coming more and more frequently, and since I only occasionally use that box, anytime I turn it on I have 300+ megs of updates to deal with. Bite me. So while I am playing Syphon Filter on the PSP and Resistance: Fall of Man on the PS3, updating JBCobb.net and doing other things on this long weekend, the Tosh is cranking away at a Debian install. I just want to stick something solid on it and then only pull it out when needed.
Speaking of which, a small digression is in order. Over the years I have gravitated more towards laptops than desktops for my local computing resources. I still have two towers cranking away in the server room doing basic data handling tasks (file serving, backup, additional processing for video encoding, occasionally housing virtual machines and so on) but most of what is here is laptops of various vintages, usually strong indications of my spending ability at any given time. Laptops can be problematic for Linux due to hardware configurations so linux-laptops.org has been a life-saver more times than I can count. Recently I purchased a cutting edge Sony Vaio, which will likely be the last laptop I buy for a long time thanks to the Friend of the Court in Michigan taking more than half of my income every month (and a big KMA to them, too). After trying a number of 64-bit distros on it, each failing in some way (no wifi, no sound, crappy video, etc.), I decided to go low-tech and make it all work with a stock Debian installation. At the end I had 3D accelerated video, wifi, sound, everything but the camera working, so I documented the process (including recompiling the kernel to get mac80211 enabled for the Intel 4965 wifi driver) and posted it to JBCobb.net here:
And submitted it to linux-laptops.org. My way of saying thanks for the help over the years and here is a little something back. My access logs show at least some folks are reading it so I hope it helps.
As you might know I am a software developer by trade and have been since the early 80’s. I started with BASIC like many of that era, moved to assembly on the 6502/6510 processor, moved to x86 and found the segmented memory model sucked so moved to C, then C++, picking up Perl, Python and other scripting languages along the way. Next came distributed programming, then multicore programming, distributed multicore programming and now the Cell Broadband Engine architecture. The paucity of tools available for the Cell makes development for it seem kind of surreal: I am working with cutting edge processors using tools not unlike the ones I was using 20 years ago, and fighting many of the same problems.
The thing is, we as an industry should have evolved past this by now. Multicore systems have been COTS for some time now, so every non-trivial app should be taking advantage of both cores. Yet most code only thinks of the task at hand and not how to leverage the power inherent in the platform. This power has been here for years yet we are still coding like it is 1994. By now, most mainstream languages should have some form of concurrency built into them, yet that functionality remains the work of platform-specific libraries. Thus what works well on, say, a Windows dual-core box won’t even compile on the same box running Linux even though the hardware and the language are the same (and the language was supposed to be cross-platform from the beginning). Even something like Pthreads, which has an implementation on both platforms, doesn’t work the same everywhere; just look at thread priorities for example.
Then look at how these separate processes share data, even in the easiest of situations, shared memory models. It is up to the programmer to pick one of a number of different ways of protecting common resources, and the way that typically winds up getting used is locking primitives such as the mutex and semaphore. The messed up part is that the whole data structure tends to wind up with all accesses being serialized, thus crippling the throughput of every subprocess waiting on that data. In the end, you can have the best multicore design but unless your data access requests are completely predictable, serialization is the only way to go and you make everybody wait on data. This is another problem that has been around for a while, a long while, yet we are still solving it through bottlenecks.
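To make the bottleneck concrete, here is a minimal Python sketch (Python only because it is short; the same shape applies to Pthreads in C). It contrasts one mutex around an entire table with lock striping, where keys are hashed onto several independent locks so that threads touching different keys rarely wait on each other. The class and function names are made up for illustration, and striping is just one way to narrow the bottleneck, not a cure for it. Note also that standard CPython's global interpreter lock keeps this from showing a real speedup; the point is the locking structure, not a benchmark.

```python
import threading


class CoarseCounter:
    """One mutex guards the whole table, so every access is serialized."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount

    def total(self):
        with self._lock:
            return sum(self._data.values())


class StripedCounter:
    """Keys are hashed onto several independent locks (lock striping)."""

    def __init__(self, stripes=8):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._shards = [{} for _ in range(stripes)]

    def add(self, key, amount):
        i = hash(key) % len(self._locks)
        # Only threads whose keys land on the same stripe contend here.
        with self._locks[i]:
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + amount

    def total(self):
        # Locks one stripe at a time, so this is not an atomic snapshot
        # while writers are still running.
        result = 0
        for lock, shard in zip(self._locks, self._shards):
            with lock:
                result += sum(shard.values())
        return result


def hammer(counter, workers=4, updates=1000):
    """Hypothetical driver: several threads update the counter at once."""

    def work(worker_id):
        for n in range(updates):
            counter.add((worker_id, n % 16), 1)

    threads = [threading.Thread(target=work, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()
```

Both versions give the same answer for the same workload; the difference is how many threads can be inside add() at the same moment. The striped version still serializes any two threads that hit the same stripe, which is exactly the unpredictable-access case described above.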
So at this point you have the same (business) problem, the same data, the same hardware (both in type and instance) and the same language, and after a lot of work you have an application that uses both cores, and then only poorly, due to the bottleneck at the data access point. What is wrong with this picture?
Sometimes it feels like that Darwinian progression of caveman to modern man is going in reverse, or at best stagnating, for software development. Back in the 80’s someone at Sun uttered the phrase “the network is the computer” (a BFO in retrospect) and Sun then proceeded to take it to such an extreme that they lost all credibility and the import of that phrase got lost in the noise. However, the concept of distributed computing is here to stay … for a while or forever, depending on your perspective. Let me put it this way: it is here to stay as a concept but the distinction between a computer as a stand-alone device and as a node of a larger computing task is blurring fast. For an example of this one needs to look no farther than the Folding@home project. This harnesses thousands of what are normally conceived of as “my computer” and “your Playstation” into solving “our protein folding problem”. How that applies to this rant is that SETI@home was already doing this in the late 90’s, yet we are still fumbling mightily with the next part of what ails our industry: distributed memory. As if the problems outlined above were not bad enough, now we have to deal with memory that is not our own too. We currently accomplish this through a variety of methods that all wind up as 49 flavors of mailing chunks of memory to each other, usually using some combination of sockets and network connectivity. Yes, those same sockets we have been using for 20 years: serializing data at the socket, pushing it out to another host which re-assembles it into a coherent data structure of some kind, and presenting it to a process at the other end. Yes, it works, but it presents another kind of bottleneck to the computing problem, further crippling the potential of our new-found hardware power. We have known about the problem of non-local memory for a decade or more and yet are still dealing with it in the same old and broken ways.
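For anyone who has not had the pleasure, the "mailing chunks of memory" routine looks roughly like the Python sketch below: flatten a list of numbers into bytes, push them through a socket with a length prefix, and rebuild the list on the other side. The wire format (a 4-byte count followed by 8-byte doubles) and all function names are assumptions invented for this illustration, and socket.socketpair() stands in for a real connection between two hosts.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian item count
SAMPLE = struct.Struct("!d")  # each item travels as an 8-byte double


def send_samples(sock, samples):
    """Serialize the list into one flat byte string and push it out."""
    payload = b"".join(SAMPLE.pack(s) for s in samples)
    sock.sendall(HEADER.pack(len(samples)) + payload)


def recv_exact(sock, nbytes):
    """recv() may return fewer bytes than asked for, so loop until done."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_samples(sock):
    """Re-assemble the byte stream into a list on the receiving side."""
    (count,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    payload = recv_exact(sock, count * SAMPLE.size)
    return [value for (value,) in SAMPLE.iter_unpack(payload)]


def round_trip(samples):
    """Hypothetical demo: both ends live in one process via socketpair()."""
    left, right = socket.socketpair()
    try:
        send_samples(left, samples)
        return recv_samples(right)
    finally:
        left.close()
        right.close()
```

Every value gets copied at least twice (once into the payload, once back out of it), and the receiver cannot touch a single element until the bytes have arrived and been unpacked. That copy-and-wait step is the bottleneck in question.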
The awkwardness of the way this has been handled has been written off as a network problem but now it has come home to roost in the Cell Broadband Engine. I chose that as an example only because it is what I have to deal with at work. It does however work as a perfect example of the new issue we need to deal with: multiple cores (8, 6 usable) in a single box, each with a token amount of memory (256K and no, that is not a typo), with an additional 256 MB of RAM available for the PPU (main processing unit). Ironically, the PPU has only a fraction of the processing power of the other cores (known as SPUs) but is the only one with direct IO and other system service access. Thus the additional cores are only good for number crunching but man, are they ever good at that. Typically the SPU cores operate so much faster than the PPU that the SPU can complete the execution of an entire block of code in the time it takes the PPU to finish a single instruction. Therefore it is imperative that work is intelligently farmed out to the SPUs but guess what? None of the SPUs can (directly) share memory with each other or with the PPU. Starting to sound like the distributed computing problem all over again? Seems like we would have learned how to deal with this by now but we are largely doing things the way we have for years….
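The shape of the farming-out problem can be modeled without any Cell hardware. The Python sketch below is not Cell SDK code and uses none of its APIs; it only imitates the constraint. A large array is cut into blocks small enough to fit a local memory budget, each block is copied out to a worker that sees nothing but its own copy, and the partial results are gathered at the end. The 256K local store and the six usable cores come from the text above; the 128K data budget (leaving the rest for code and stack) and every name in the code are assumptions made for illustration.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor

LOCAL_STORE_BYTES = 256 * 1024           # per-SPU local store, from the text
DATA_BUDGET_BYTES = LOCAL_STORE_BYTES // 2  # assumption: rest is code and stack
USABLE_WORKERS = 6                       # usable SPUs, from the text


def split_into_blocks(values, budget_bytes=DATA_BUDGET_BYTES):
    """Cut the array into blocks that each fit inside the data budget.

    Slicing an array makes a copy, which loosely mirrors moving a block
    into a worker's private memory.
    """
    per_block = budget_bytes // values.itemsize
    return [values[i:i + per_block] for i in range(0, len(values), per_block)]


def crunch(block):
    """Stand-in number crunching: sum of squares over the local copy only."""
    return sum(x * x for x in block)


def farm_out(values):
    """Hand blocks to a pool of workers and combine their partial results."""
    blocks = split_into_blocks(values)
    with ThreadPoolExecutor(max_workers=USABLE_WORKERS) as pool:
        partials = list(pool.map(crunch, blocks))
    return sum(partials)
```

With 8-byte doubles a block holds 16,384 values, so 100,000 values become seven blocks spread over six workers. The interesting design questions all live in split_into_blocks: how big the blocks are, who schedules them, and how much time is spent moving data instead of crunching it.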
The final leg of this problem (or at least the last one I am going to outline today; my pizza is ready) is the problem of PEBCAK: Programmer Exists Between Chair and Keyboard. I am not saying programmers are stupid or anything but I will stand behind the idea that the programmers of today need to think like the problems of today. Far too many programmers (and often I include myself in this lot although I am striving to change that with every project I start) start by solving a problem sequentially and then apply concurrency to the solution after the fact, and the resulting software reflects this. The problems with this are two-fold. First, since the path did not start with parallelism in mind, an inordinate amount of time is spent debugging the concurrency after it has been applied. Second, rarely if ever does the resulting multithreaded solution really achieve the performance gains envisioned by the programmer applying the threads in the first place. A sad fact is that with most software projects that evolve in this manner, the most you can hope for is a multithreaded app that doesn’t poop in its diaper too frequently. It is not (entirely) the fault of the programmer; it’s just that there are decades of sequential development thinking to unlearn. Of course, sometimes the programmer is an idiot, which can be hard to distinguish from one who simply refuses to adapt with the times.
The solution?
The pizza is ready so I will post some thoughts on this tomorrow morning.
Till then, time to munch!
JeffC
