So as I write we’re in the grip of the worst virus attack ever. I’m getting dozens of them per hour in my mailbox, and dozens more automatic replies from mail servers that think I have sent them a virus, because they don’t realize that there is such a thing as mail address spoofing.
On reflection, I should say "the worst virus attack so far" because there are bound to be more to come. And what do we so often hear is the root cause of the exploits that allow the viruses and worms to take hold? Buffer overruns.
What does that mean? It is a polite way of saying that programmers aren’t checking their array indexes. They are writing programs that at some stage copy data from their input into a region of memory, and forgetting to check if the end of the region has been reached. Some virus writer has discovered this, and managed to find a way of getting the program to copy just the right bits of data to the right places in memory, to let the virus writer take control of the program (in a way similar to how bootstrap loaders work).
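To make this concrete, here is a minimal sketch in C of the check that is being forgotten. The helper name `copy_checked` and its return convention are assumptions made for illustration; they do not come from any particular program or library. The unchecked equivalent would be a bare `strcpy(dst, src)`, which keeps writing past the end of `dst` whenever the input is too long.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): copy src into dst, but only if
 * the whole string, including its terminating NUL, fits in dst_size bytes.
 * Returns 0 on success, -1 if the input is too long. On failure dst is
 * left untouched.
 */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    /* The bounds check that an overrun-prone program leaves out. */
    if (len >= dst_size) {
        return -1;
    }

    memcpy(dst, src, len + 1); /* len + 1 also copies the NUL terminator */
    return 0;
}
```

A caller with `char buf[8]` would write `copy_checked(buf, sizeof buf, input)` and treat a return value of -1 as an input error rather than letting the copy run on into neighbouring memory.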
So why don’t programmers check their array bounds? I surmise it is for two reasons: It’s easy not to, and it is perceived as slowing programs down.
Programming languages used to offer the option of checking array bounds for you automatically, although for some odd reason it was often switched off when the program went into production. I remember one amusing occasion at a university when they installed a new compiler where array bound checking was switched on by default, rather than off as before. There was uproar, as users demanded that it be switched off, because all of a sudden their previously working programs had stopped working!
Unfortunately, much system programming is done these days in C or its derivatives, and because of C’s design, it is more or less impossible to do automatic array bound checking in it.
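One way to see the difficulty: once a C array is passed to a function it is just a pointer, and the language no longer knows how long the array was. A bounds check therefore needs the length to be carried along by hand. The sketch below shows that bookkeeping; the names `int_slice`, `slice_get` and `slice_set` are hypothetical and only illustrate the kind of check a bounds-checking language would insert for you.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a pointer bundled with the length that a
 * plain C pointer does not carry. */
struct int_slice {
    int   *data;
    size_t len;
};

/* Checked read: stores the element in *out and returns true, or returns
 * false (leaving *out alone) if index is out of range. */
static bool slice_get(struct int_slice s, size_t index, int *out)
{
    if (index >= s.len) {
        return false;
    }
    *out = s.data[index];
    return true;
}

/* Checked write: returns false instead of writing outside the array. */
static bool slice_set(struct int_slice s, size_t index, int value)
{
    if (index >= s.len) {
        return false;
    }
    s.data[index] = value;
    return true;
}
```

Nothing in C forces a programmer to go through such helpers; a direct `s.data[index]` still compiles and is still unchecked, which is the heart of the problem.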
This is unfortunate. As Dijkstra so famously said: Testing doesn’t show the absence of bugs, only their presence. But even testing is unlikely to show the presence of buffer overruns: Even if there is a data set used during the testing that is longer than the piece of memory reserved for it, the bit of damage inflicted is unlikely to have immediate effect, and at best will show up as an apparent random error later on.
My conclusion from this is that C and its derivatives are inherently unsafe.
If a plane crashed or a nuclear power station blew up, and it turned out to be due to a buffer overrun in C, I would be willing to appear as an expert witness and testify that using C is negligence when a program is required to be secure. Why? Because it has been known since the 1970s that it is possible to design a programming language that guarantees at compile time that no array index will go out of bounds (and, by the way, that no null pointer will be dereferenced). In other words, buffer overruns could be detected at compile time. (If you think that this is impossible because of the halting problem, then you may not have properly understood the halting problem.)
Why am I writing about this in a publication that is about interaction? Well let’s pretend for a moment that programming languages are an interface between humans and computers. Then you would want your programming languages to be usable, wouldn’t you? Usability means allowing users to achieve their tasks quickly, error-free, while enjoying doing it. Error-free, there’s the rub.
Oddly, no programming language I know of, bar one, has been designed with usability for the programmer in mind (and that one exception, ABC, I co-designed). Programming languages are always designed with the machine principally in mind, not the programmer; that is to say, features in programming languages reflect properties of the computers the programs are meant to run on, not properties of the programmers who are meant to write them. This is wrong. Programming languages should be designed to reduce programmers’ time and errors, and optimization should be used to keep the machines happy.
©2003 ACM 1072-5220/03/1100 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.