If you read any press on computer security problems, at some point
you are likely to come across the phrase “Buffer Overflow”–it’s by
far the most common security error that programmers make. It’s common
for several reasons.
It has nothing to do (by itself) with security.
It’s an easy error to make, and a hard one to detect.
It’s human nature not to expect the unexpected.
So what is a buffer overflow? I’ll start off extremely non-technical
here, and gradually bump up the level until the final section, at
which point if you don’t understand programming and call stacks you
may want to stop reading, and if you do understand them, you may
decide to start reading.
You need to tell a co-worker something important, so you go to their
office, expecting a conversation something like this:
“I thought you should know about this new thing.”
“Oh? What is it?”
You tell them the important thing.
Instead the conversation goes like this:
“Hey! Just the person I wanted to see! Did you hear about this
crazy election thing,”…followed by five minutes of political
diatribe. By the end of the conversation, not only have you
forgotten what you came in to say, you’re on the way out the door
with a poster to protest something.
Your buffer just overflowed, and you were hijacked for a purpose
other than your original intent. You had an expectation of how the
conversation would go (the protocol) and it was violated, with the
result that you ended up doing something different. That’s exactly
what happens to a program when someone exploits a buffer overflow.
When a program is designed, it is designed with an interface to the
outside world. That interface is not just what you see on the
screen, but also how it communicates with other programs and the
operating system. The interface is typically defined in terms of
either an API (a set of programming conventions for direct
communication with another piece of code) or a protocol (a definition
of a set of data and commands to be passed between programs). Think
of the API as how your brain tells your arm to pick something up, and
the protocol as how you ask someone to pass the salt. Of course the
protocols are not always executed directly. Your brain tends to use
the mouth API to tell someone to pass the salt, rather than using
telepathy directly, and many programs use standard sets of code
provided by the operating system when they want to use a protocol.
Now, these APIs and protocols specify the form of the information to
be passed back and forth. For instance, a specification might say
that the correct response to an initial communication is no more than
five letters long (e.g. “Hello”). In the days before people had to
worry about hostile programs, code was written assuming that the
program you were talking to was going to be following the rules of
the protocol. If the protocol said “five letters” then there wasn’t
a lot of point in leaving room for six. Sure, your program might
crash if there were six, but it wasn’t your bug, it was a bug in
the program talking to you–it should have sent five letters.
So that’s a buffer overflow. You expect one thing, and somebody
sends you something much bigger. The “buffer” that you had set aside
to store that information doesn’t have room for what you get, and you
end up writing those six (or six hundred) letters on top of other
things that you were trying to remember. Obviously that’s not going
to be a good thing for the continued functioning of your program, but
it turns out it’s also a major security problem.
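To make the five-letter example concrete, here is a minimal sketch in C.
The struct layout and the function names are my own inventions for
illustration, and the copy is deliberately capped at the size of the
struct so that the "overflow" stays inside memory the demo owns. A real
overflow has no such cap, which is exactly the problem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a five-letter greeting buffer sitting right next
   to something else the program is trying to remember. */
struct memory {
    char greeting[5];   /* room for exactly five letters, per the protocol */
    char reminder[8];   /* unrelated data stored right after it */
};

/* Trusting receiver: copies whatever arrives, starting at the greeting.
   The copy is capped at the size of the whole struct so this demo stays
   well-defined. Returns how many bytes landed past the greeting buffer. */
size_t receive_trusting(struct memory *m, const char *msg, size_t len)
{
    assert(m != NULL && msg != NULL);
    if (len > sizeof *m)
        len = sizeof *m;
    memcpy(m, msg, len);
    return len > sizeof m->greeting ? len - sizeof m->greeting : 0;
}

/* Checking receiver: refuses anything longer than the buffer.
   Returns 0 on success, -1 on a protocol violation. */
int receive_checked(struct memory *m, const char *msg, size_t len)
{
    assert(m != NULL && msg != NULL);
    if (len > sizeof m->greeting)
        return -1;
    memcpy(m->greeting, msg, len);
    return 0;
}
```

Hand receive_trusting() seven letters and the last two end up on top of
the reminder. receive_checked() just says no.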
Computers tend to think in terms of two things–code and data. Code
consists of the instructions for the computer, telling it what to do.
Data is what those instructions operate on. When you run a program, it
loads into memory both the code and the data that code needs. When
that program communicates with some other program, it is receiving
data, and it will then use the code that it already has to figure out
what to do next. This makes remote communication relatively safe.
The remote program can only tell the local program to act within the
constraints of the original code. Assuming nobody has done anything
stupid (which is not generally a good assumption), the remote program
cannot tell the local program to do anything that wasn’t originally
written into that code.
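Here is a rough sketch of what "data in, existing code decides" looks
like. The command names and the enum are hypothetical, made up for this
example. The thing to notice is that the remote side only gets to pick
from a menu the local code already wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of actions the local program was written to perform. */
enum action { ACT_REJECT, ACT_GREET, ACT_PASS_SALT };

/* The remote side only supplies data (a command string). The local code
   decides what, if anything, that data means. */
enum action choose_action(const char *command)
{
    assert(command != NULL);
    if (strcmp(command, "HELLO") == 0)
        return ACT_GREET;
    if (strcmp(command, "SALT") == 0)
        return ACT_PASS_SALT;
    return ACT_REJECT;      /* anything unrecognized is simply refused */
}
```

No matter what string shows up, the program can only ever greet, pass
the salt, or refuse.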
Modern computer architectures have an unfortunate design, however.
They don’t really know the difference between data and code. If
somebody can convince your program to try running the data that it
has in memory, it will do so quite happily. So a malicious program
has two goals. First it wants to get some code to your machine, and
then it wants to persuade somebody to run it. This is, of course, no
different from an email virus writer’s goal. In that case, they
expect you to run it; in the case of a buffer overflow, they expect
the broken program to run it. Email viruses are so successful
because users often don’t know the difference between data and code
either (and some operating systems helpfully try to hide the
difference so as not to confuse them).
It turns out that if a malicious programmer can find a target program
that didn’t check for a buffer overflow, it can be trivial to
get that program to execute code provided by the remote program. So
easy, in fact, that there are standard packages out there that
provide the entire payload for the overflow–all the script kiddie
(we’ll define that sometime, but suffice to say it isn’t a compliment
to someone’s hacking prowess) has to do is find the right length for
the buffer overflow and bang–they have control of your computer.
Before you panic, remember that doing this requires that they have
remote access to a program on your computer already, and that that
program have a buffer overflow problem. That means (for an internet
exploit) that your computer has to have some program that is
listening to external connections (e.g. print server, file
sharing…) or that you have a malicious user at your computer (or
you helpfully downloaded and ran their software).
How does a buffer overflow exploit work from a programmer’s perspective?
First you find some place in that program where it’s reading data and
assuming that it’s going to be reading something rational. E.g.
    char buf[4];  /* Room for 4 bytes: 3 characters plus the
                     terminating NUL */
    gets(buf);    /* Read any number of characters from the input
                     and put them in buf */
where the input turns out to be 4 or more characters long.
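For contrast, here is a sketch of the bounded version of that read.
gets() has no way to know how big the buffer is, which is why it was
removed from the C standard in C11. fgets() takes the size as an
argument. The helper name read_bounded is hypothetical, just something
to hang the example on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size-1 characters of one line from
   `in` into `buf` and always NUL-terminates it. Strips the trailing
   newline if it fit. Returns 0 on success, -1 on end-of-file or error. */
int read_bounded(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

With a 4-byte buffer, a long input line gets cut off after three
characters instead of spilling into whatever sits next to buf. The rest
of the line stays in the input for the next read.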
Now the question is, where is the data stored in “buf” located?
If “buf” is a global variable, then that data is probably allocated
in a data segment somewhere, and you’re going to try and overwrite
some other piece of data which will result in something useful (e.g.
a place where the program was going to execute one program, now
executes another). That’s tricky and hard to do without source code.
However “buf” is probably a local variable, allocated on the stack.
So instead of overwriting data, your goal is to overwrite the stack
itself. So you are going to put in buf some amount of padding (that
will overwrite the rest of the data stored on the stack), followed by
a new value for the saved return address (the part of the stack that
says which code to run next), pointing at some machine code that you
also supply. You’ll set things up so that your code will be executed
(possibly when this particular function returns) instead of the code
that normally would have been executed. Now you’re home free. Since
there are plenty of examples of sample exploit machine code, all you
need to do when you find a new buffer overflow is figure out the
appropriate offset–the rest of the work has been done already. You
don’t need to transfer very much data, just enough to run something
that connects you to the remote machine–from there you can transfer
the rest of the software you want to install remotely.
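The padding-then-offset idea above can be shown with a harmless
simulation. Nothing here touches the real stack: the "frame" is an
ordinary struct I made up, the "return address" is a function pointer,
and the "attacker's code" is a function already in the program. All the
names are hypothetical. It only shows why the offset is the one thing
the attacker needs to work out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*handler_fn)(void);

static const char *normal_return(void) { return "normal"; }
static const char *attacker_code(void) { return "hijacked"; }

/* Hypothetical stand-in for a stack frame: a local buffer followed by
   the saved return address (modelled here as a function pointer). */
struct fake_frame {
    char buf[8];
    handler_fn ret;
};

/* Builds the attacker's input: padding up to the offset of `ret`, then
   the bytes of the address to jump to. Returns the payload length. */
size_t build_payload(unsigned char *out, size_t out_size)
{
    size_t offset = offsetof(struct fake_frame, ret);
    handler_fn target = attacker_code;
    assert(out != NULL && out_size >= offset + sizeof target);
    memset(out, 'A', offset);
    memcpy(out + offset, &target, sizeof target);
    return offset + sizeof target;
}

/* Simulates the vulnerable function: copies input over the frame without
   checking it against sizeof buf, then "returns" through frame.ret. */
const char *run_vulnerable(const unsigned char *input, size_t len)
{
    struct fake_frame frame;
    size_t ret_off = offsetof(struct fake_frame, ret);

    memset(frame.buf, 0, sizeof frame.buf);
    frame.ret = normal_return;

    /* Keep the demo well-defined: either the whole pointer is replaced
       or none of it is. A partial overwrite would leave a garbage
       pointer, which in real life usually means a crash. */
    if (len >= ret_off + sizeof frame.ret)
        len = ret_off + sizeof frame.ret;
    else if (len > ret_off)
        len = ret_off;

    memcpy(&frame, input, len);   /* starts at buf, spills into ret */
    return frame.ret();
}
```

Feed run_vulnerable() a short input and it returns through
normal_return. Feed it the output of build_payload() and it returns
through attacker_code instead.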
This is where security-by-obscurity comes in handy. Want to lessen
the chance of buffer-overflow attacks? Just run some obscure piece
of hardware. Run a Mac, or even Linux on the PowerPC. It’s not that
there aren’t buffer-overflow problems, but there are fewer handy
examples of how to exploit them running around. Fewer examples, fewer
successful attacks. It’s not a solution of course (especially if
everyone does it :-), but it is one way to slightly increase your
odds of remaining secure.
There are machine/OS architectures that would make buffer overflows
much harder to exploit. Disable dynamic creation and execution of
code on the stack for one. Or keep a separate data stack. And there
are tools out there which will put watchdog data on the stack, and
then watch it to make sure it doesn’t get overwritten (effective, but
rather painful from a performance standpoint). But fundamentally,
where there are bugs, there are exploits. And modern software, with
its layers and layers of abstraction that no one person can fully
grok, has a hell of a lot of bugs.
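For the curious, the watchdog-data idea mentioned above looks roughly
like this. It is a sketch under assumptions: the struct, the function
names and the marker value are all made up for illustration. Real tools
typically pick the marker at random when the program starts, so an
attacker can't simply write the expected value back. The fixed constant
here is only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WATCHDOG 0x5AFEC0DEu   /* arbitrary marker chosen for this sketch */

/* Hypothetical frame: the watchdog value sits between the buffer and
   whatever must not be overwritten. */
struct guarded_frame {
    char buf[8];
    unsigned int watchdog;
    char important[8];
};

void frame_init(struct guarded_frame *f)
{
    assert(f != NULL);
    memset(f, 0, sizeof *f);
    f->watchdog = WATCHDOG;
}

/* Unchecked copy into the frame, capped at the struct size so the demo
   stays well-defined. */
void frame_fill_unchecked(struct guarded_frame *f, const char *src, size_t len)
{
    assert(f != NULL && src != NULL);
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy(f, src, len);
}

/* Returns 1 if the watchdog is intact, 0 if something wrote over it. */
int frame_intact(const struct guarded_frame *f)
{
    assert(f != NULL);
    return f->watchdog == WATCHDOG;
}
```

The check has to run before anything trusts the data past the watchdog,
and that extra work on every call is where the performance cost comes
from.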