Adjusting to C
I am witnessing an odd devolution of otherwise reasonable high-level language programmers to the C programming language. Python folks, Ruby dudes, and Java programmers are being drawn to the C flame. I guess it's the Linuxes and vims and nginxes and other assorted C wonders. Maybe it's that if you write your code in C, it can be used by all those other highfalutin languages, no apologies, no hassles. Maybe it's speed, though I really wonder about that, given our blistering fast processors and endless banks of RAM.
I have noticed a bit of a trend with the Python/Ruby/Java mindset when returning to their roots, so's to speak. It's odd, but they seem to love malloc(). I will leave the whys and whatnots as an exercise for the reader, but I thought I might offer a perspective.
The Stack
If you've spent any time around a C programmer, you will notice that they talk about 'the stack' and 'the heap' a lot. If they are feeling particularly compsciey, they might even refer to 'automatic variables'. They talk about it in a way that there is a pretty simple choice you must make about where your variables' data lives. You might hear them ask: "So, why is variable 'glaxxor' on the heap? You could have easily put it on the stack and passed it's address". You might pick up the reverent staccato of their words and concluded there is some reverence for this stack. The heap doesn't seem as good. Most things we put in heaps aren't. There's a reason for that.
The Heap
C programmers hate malloc(). This is the most important thing to know to become a grade A (or B) C programmer. They mostly hate it because it's a pain in the ass to free(), it's the primary cause of memory leaks (the other, I suppose, is infinite recursion), it's slow, and it fragments your once ordered memory like so many leaves in the wind. The canonical C program will only have calls to malloc when they are absolutely necessary, because C programmers see malloc() for what it really is: a one-size-fits-few solution to a difficult but very specific problem.
Please note that malloc() and free() are just regular functions in the C language. They aren't built-in, they aren't magic, they are just plain functions. Is it odd that a language, whose primary abstraction is the memory address has no built-in way of allocating or deallocating memory?
Garbage
You won't understand C programming until you understand the C programmer's disdain for garbage collected languages. The C programmer approaches malloc with a plan, a clever way to 'get in' and 'get out' with the fewest casualties possible. The GC language is carpet bombing your memory, your performance. You see, a C program is always about moving bytes in and out of memory and external devices. The GC languages are about something that is decidedly not moving bytes around. It might be objects or CAR/CDRs or OOPs or JVM objects (whatever their acronym is), but it's not about memory any more. The language has decided it knows better how to handle memory than you. To the C programmer, this is like giving the keys to the car to some super (and most likely brain dead) version of malloc. Heck, they didn't like malloc when they could control it, now it's going to interrupt their program to clean up 'garbage' that shouldn't have been created in the first place?
Be Free from the Heap
If I haven't dissuaded my high-level colleagues from dipping their toes in the C waters, let me give a few tips on how to not avoid shying away from malloc():
- avoid libs that allocate memory "on your behalf"
- prefer libs that force the client to worry about memory management. This usually takes the form of the lib API accepting pointers to structs to populate them. The good thing about this is that you can malloc the memory, or leave it 'on the stack' - up to you, but...
- use the stack! Remember that functions you call don't care if the memory is on the heap or stack. Use automatic variables when you can. Prefer 'populating' data structures to allocating memory. The sign of the master is a pointer to a stack variable.
- use strings + lengths. A common pattern is finding stuff in strings. Instead of returning a new copy of a found substring, just return a pointer to it in it's original source string + it's length
- use lookup tables. Return pointers into statically allocated structures
- use buffers. Don't allocate memory for every single item you read from a file or socket. Statically allocate a big horking buffer and re-use that memory!
- use reasonable maximums. We often use malloc when we don't know how big something is going to be (usually accompanied by hand wringing). Put a reasonable limit on it's size, then truncate it into a buffer. Log the 'error'.
- use 'const' and 'static' properly when declaring variables and arguments. Be thankful you have a free CC compiler that isn't K&R!
The Problem
Of course, malloc() is there for a reason. At some point your will convince yourself that you need a linked list or red-black tree or something. Sometimes there's no way around it:
- make a rule that whoever allocates the memory is responsible for freeing it (I am personifying functions)
- where that's impossible make another rule that covers all of the possible edge cases. Do this before writing any code
- use cacheing. If you are mallocing 8 bytes over and over to hold the string 'GLOXXOR', heaven help you. Malloc it once, store it in some useful hash based structure and return the same pointer every time it's needed. Added bonus is that there is a single data structure to free up when the time comes.
- prefer big mallocs to little. Malloc is a very inefficient function. Calling it often will eat away at performance and will cause 'heap fragmentation' (another useful google search!)
- check out the entire malloc family. There are a bunch of other flavours of malloc, use the right one. I like realloc() because it feels like I'm getting one free free().
What this doesn't solve
Pointers are gnarly. They are hard to reason about and easy to get wrong. When they go wrong in C, the best thing that can happen is a sudden death of your program and a dumpage of core. That won't always happen though. You may just corrupt some nearby memory location and happily continue. These are the worst. These are the reasons programmers invented Java, Python, Ruby and Smalltalk in the first place!
Although that's not why they invented C++.