Converting Numbers to Strings in C11
While working on my bachelors thesis this past weekend, I came across an interesting problem: Given an int
or float
of unknown size, how can I write it to a string with no bytes to spare?
Conventionally, you’d probably write something like this when having to compute the string representation of an int
:
char *inttostr_conventional(int n) {
char *s = (char*) malloc(sizeof(char) * 42);
sprintf(s, "%d", n);
return s;
}
However, that’s not quite what I wanted. While 42 bytes is certainly sufficient for an arbitrary 32-bit integer1, some of this space will go unused more often than not.
One way to improve upon this is to use the function. When represented as a string of characters, the “length” of a positive integer is . Except that is not defined2, so you’ll have to add a special case for that. And there’s these pesky negative numbers too, for which you’ll have to add another byte to account for the minus sign. And then, in all three cases, another byte for the null terminator:
char *inttostr_log(int n) {
int len;
if (n > 0) {
len = (int) floor(log10(n)) + 1;
} else if (n == 0) {
len = 1;
} else if (n < 0) {
len = (int) floor(log10(-n)) + 2;
}
len += 1;
char *s = (char*) malloc(sizeof(char) * len);
sprintf(s, "%i", n);
return s;
}
That’s fairly verbose, and I’m sure there’s an edge case that breaks this implementation. Plus this approach only works for ints
, we haven’t even gotten started with float
s yet!
Meet snprintf
After some research, I came up with the folling three-line function that seems to magically work in all cases (even for float
s if you change the parameter and format string accordingy):
char *inttostr_magic(int n) {
int len = snprintf((char*) 0, 0, "%i", n) + 1;
char *s = (char*) malloc(sizeof(char) * len);
snprintf(s, len, "%i", n);
return s;
}
The function snprintf
was added as a part of the C11 standard and has an additional argument over the good old sprintf
:
int sprintf(char *str, const char *format, ...); int snprintf(char *str, size_t size, const char *format, ...);
…
The functions
snprintf()
andvsnprintf()
do not write more thansize
bytes (including the terminating null byte ('\0'
)). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available.
We can exploit this behavior by passing a size
of 0
. That results in snprintf
not even trying to write to the pointer given in its first argument (so we can safely pass a null pointer) and instead returning the number of characters that would have been written if we hadn’t been so sneaky.
Now we can simply malloc
this number of characters (plus one, for the null terminator), use snprintf
again to actually write our number to the string we’ve just created and return it!
Analysis
When considering time efficiency as well, things look a bit different than what we saw when only looking at how much memory was allocated. Here’s a simple benchmark, with – spoiler alert – an averaged runtime measurement next to each function call:
void inttostr_benchmark() {
char *s = (char*) 0;
for (int i = -1000000; i < 1000000; i++) {
s = inttostr_conventional(i); // => 0.365s
//s = inttostr_log(i); // => 0.379s
//s = inttostr_magic(i); // => 0.544s
}
}
Depending on your needs, it might be your best bet to go with the conventional wisdom. If performance isn’t too important to you, go with the snprintf
approach. As a tradeoff between runtime and memory usage3, manually computing the required string length is an option too – just make sure to cover every edge case!
The code is available as a Gist.
Background
This problem arose when I was faced with the task of serializing data structures4 for logging purposes. I chose to implement a tostr
function for each data type, which recursively calls the tostr
functions of its children and concatenates the results (with some syntactic glue in-between).
At some point, even highly nested data structures – which is the primary use case – boil down to atomic values: int
s, float
s and bool
s (which are really just tiny int
s), which led me to look into what I’ve covered in this article.
-
Actually, it’s even enough for 128-bit (unsigned) integers: . My point is that while it’s guaranteed to be enough space, it’s a bit wasteful for small numbers, which tend to occur more frequently. ↩
-
Its limit is , which is what’s commonly implemented: On my machine, running
printf("%f", log10(0));
prints-inf
, andprintf("%i", (int) log10(0));
yields-2147483648
. ↩ -
And when dealing with bases . ↩
-
Lists, dictionaries and a special flavor of dictionary used to represent database rows. ↩