Дима Рубинштейн (dimrub) wrote in gotchas,

Out of "space"

This is not really a gotcha, but rather a bug I solved a while ago. I was reminded of it recently, so I decided to put it up here to make it easy to find and refer to. (I described it once before in my Russian-language blog, so I apologize if you've read about it already.)

I was working on an embedded system (VxWorks) at the time. It's a single-address-space, single-application, multi-threaded environment (threads are called tasks), where any sufficiently severe bug brings down the whole system. Among other things, I was responsible for the management interfaces, which included a terminal application that gave access to (and control over) the internal state of the device. The bug, as assigned to me, said that listing a certain table (one I had recently added) caused the device to crash (in free(), of all places, with a stack dump pointing to the terminal library). I tried to reproduce it for hours, using exactly the same configuration as QA and exactly the steps described (which were quite simple: run the command that shows the table, scroll to the end of the table, repeat a few times, watch the device crash and burn in spectacular flames), but in vain. Finally I gave up, went upstairs to QA, and watched closely as the QA engineer reproduced the problem time after time. Eventually I noticed that his pattern of reproduction looked like this:

- bring up the last command from history (arrow up)
- hit enter
- hit space many times, to scroll through the table
- repeat until the device has crashed

He performed these actions in quick succession. Usually he would hit space many more times than necessary to scroll through the table. And occasionally he would brush the up arrow without actually pressing it, so he would end up hitting enter (submitting a command line) on a line full of spaces. That's when the crash would occur. At this point it was easy to establish that a command line consisting of 16 to 31 spaces causes the crash, and that the particular table had nothing to do with it.

I then dived into the terminal library code and found a function that removes trailing whitespace from a command line. This function would start from the end of the line and walk backwards, replacing whitespace characters with null characters until it encountered a non-whitespace character, NOT CHECKING WHETHER IT HAD REACHED THE BEGINNING OF THE STRING! It so happens that an allocated memory block in this system is preceded by a one-byte prefix, in which a value of 0 means "the block is free", and any other value means that the block is allocated and indicates its size. Memory is allocated in blocks whose sizes are powers of 2. If the string is 17 to 32 characters long (including the terminating null), the value of the prefix is 32, which happens to be the ASCII code for the space character. So on a line consisting entirely of spaces, the function would run past the start of the string and happily overwrite the prefix with 0, upon seeing which (much later) free() would abort().
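For readers who prefer code, here is a minimal sketch of the mechanism in C. It is a reconstruction from the description above, not the original library code: the names block_size_for, trim_trailing_buggy, trim_trailing_fixed and prefix_after_trim are all hypothetical, only the plain space character is treated as whitespace, and the "heap" is just a local array with a guard byte, a one-byte size prefix, and then the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the allocator's size classes:
   round a request up to the next power of two. */
size_t block_size_for(size_t request)
{
    size_t size = 1;
    while (size < request)
        size <<= 1;
    return size;
}

/* Reconstruction of the bug: walk backwards from the end of the string,
   zeroing spaces, and never check whether the start has been passed. */
void trim_trailing_buggy(char *s)
{
    char *p = s + strlen(s);
    while (*--p == ' ')
        *p = '\0';
}

/* Same loop with the missing bounds check: never step before s. */
void trim_trailing_fixed(char *s)
{
    char *p = s + strlen(s);
    while (p > s && p[-1] == ' ')
        *--p = '\0';
}

/* Simulated block layout (an assumption made for this sketch):
     heap[0]   guard byte, not a space, so the buggy loop stops there
     heap[1]   one-byte prefix holding the block size (0 would mean "free")
     heap[2..] the command line, n_spaces spaces plus the terminating null
   Returns the value of the prefix byte after trimming. */
unsigned char prefix_after_trim(size_t n_spaces, void (*trim)(char *))
{
    char heap[66];

    assert(n_spaces + 1 <= 64);  /* keep the block inside the local array */
    heap[0] = 'X';
    heap[1] = (char)block_size_for(n_spaces + 1);
    memset(heap + 2, ' ', n_spaces);
    heap[2 + n_spaces] = '\0';

    trim(heap + 2);
    return (unsigned char)heap[1];
}
```

Under these assumptions, prefix_after_trim(20, trim_trailing_buggy) returns 0 (the block now looks free), while the same call with trim_trailing_fixed leaves the prefix at 32. With 15 or 32 spaces the prefix is 16 or 64, neither of which is a space, so even the buggy loop stops at the prefix and does no damage. That matches the 16-to-31 window.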