
The ups and downs of strlcpy()

By Michael Kerrisk
July 18, 2012

Adding the strlcpy() function (and the related strlcat() function) has been a perennial request (1, 2, 3) to the GNU C library (glibc) maintainers, commonly supported by a statement that strlcpy() is superior to the existing alternatives. Perhaps the earliest request to add these BSD-derived functions to glibc took the form of a patch submitted in 2000 by a fresh-faced Christoph Hellwig.

Christoph's request was rejected, and subsequent requests have similarly been rejected (or ignored). It's instructive to consider the reasons why strlcpy() has so far been rejected, and why it may well not make its way into glibc in the future.

A little prehistory

In the days before programmers considered that someone else might want to deliberately subvert their code, the C library provided just:

    char *strcpy(char *dst, const char *src);

with the simple purpose of copying the bytes from the string pointed to by src (up to and including the terminating null byte) to the buffer pointed to by dst.

Naturally, when calling strcpy(), the programmer must take care that the bytes being copied don't overrun the space available in the buffer pointed to by dst. The effect of such buffer overruns is to overwrite other parts of a process's memory, such as neighboring variables, with the most common result being to corrupt data or to crash the program.

If the programmer can with 100% certainty predict at compile time the size of the src string, then it's possible (if unwise) to preallocate a suitably sized dst buffer and omit any argument checks before calling strcpy(). In all other cases, the call should be guarded with a suitable if statement to check the size of its argument. However, strings (in the form of input text) are one of the ways that humans interact with computers, and thus quite commonly the size of the src string is controlled by the user of a program, not the program's creator. At that point, of course, it becomes essential for every call to strcpy() to be guarded by a suitable if statement:

    char dst[DST_SIZE];
    ...
    if (strlen(src) < DST_SIZE)
        strcpy(dst, src);

(The use of < rather than <= ensures that there's at least one extra byte available for the null terminator.)
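A minimal sketch of that guard wrapped in a function, so that the check cannot be forgotten at individual call sites. The name copy_if_fits() and the 16-byte DST_SIZE are hypothetical, chosen only for illustration:

```c
#include <stdbool.h>
#include <string.h>

#define DST_SIZE 16   /* hypothetical buffer size */

/* Copy src into dst only if the whole string, including its terminating
   null byte, fits. On failure dst is left untouched. */
static bool copy_if_fits(char dst[DST_SIZE], const char *src)
{
    if (strlen(src) < DST_SIZE) {   /* '<' leaves room for the '\0' */
        strcpy(dst, src);
        return true;
    }
    return false;                   /* the caller decides what "too long" means */
}
```

With this sketch, copy_if_fits(dst, "short string") succeeds, while any input of 16 or more characters is rejected and dst keeps its previous contents.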

But it was easy for programmers to omit such checks if they were forgetful, inattentive, or cowboys. And later, other more attentive programmers realized that by carefully controlling what was written into the overflowed buffer, and overrunning into more exotic places such as function call return addresses stored on the stack, they could do much more interesting things with buffer overruns than simply crashing the program. (And because code tends to live a long time, and the individual programmers creating it can be slow to learn about the sharp edges of the tools they use, even today buffer overruns remain one of the most commonly reported vulnerabilities in applications.)

Improving on strcpy()

Prechecking the arguments of each call to strcpy() is burdensome. A seemingly obvious way to relieve the programmer of that task was to add an API that allowed the caller to inform the library function of the size of the target buffer:

    char *strncpy(char *dst, const char *src, size_t n);

The strncpy() function is like strcpy(), but copies at most n bytes from src to dst. As long as n does not exceed the space allocated in dst, a buffer overrun can never occur.

Although choosing a suitable value for n ensures that strncpy() will never overrun dst, it turns out that strncpy() has problems of its own. Most notably, if there is no null terminator in the first n bytes of src, then strncpy() does not place a null terminator after the bytes copied to dst. If the programmer does not check for this event, and subsequent operations expect a null terminator to be present, then the program is once more vulnerable to attack. The vulnerability may be more difficult to exploit than a buffer overflow, but the security implications can be just as severe.
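The missing terminator is easy to demonstrate and easy to repair, but the repair is exactly the kind of extra check that strncpy() was supposed to make unnecessary. A sketch, using a hypothetical wrapper name:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical wrapper: strncpy() plus the two checks it leaves to the
   caller. n is the size of dst and must be greater than zero.
   Returns true if src was truncated. */
static bool strncpy_terminated(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);

    /* If src has no '\0' in its first n bytes, strncpy() copies n bytes
       and writes no terminator; otherwise it pads dst with '\0' bytes,
       so dst[n - 1] is '\0' exactly when src fit. */
    bool truncated = (dst[n - 1] != '\0');

    dst[n - 1] = '\0';   /* force termination (losing one more byte) */
    return truncated;
}
```

For example, copying "overlong input" into an 8-byte buffer returns true and leaves "overlon" in dst. The caller still has to decide whether that truncation is acceptable.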

One iteration of API design didn't solve the problems, but perhaps a further one can… Enter strlcpy():

    size_t strlcpy(char *dst, const char *src, size_t size);

strlcpy() is similar to strncpy() but copies at most size-1 bytes from src to dst, and always adds a null terminator following the bytes copied to dst.
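For systems without strlcpy() (such as glibc at the time of writing), here is a minimal portable sketch of the semantics just described. It is called my_strlcpy(), a hypothetical name, to avoid clashing with a system-provided strlcpy(). Its return value, strlen(src), is what lets a caller detect truncation:

```c
#include <string.h>

/* Sketch of strlcpy() semantics: copy at most size-1 bytes, always
   null-terminate when size > 0, and return strlen(src). A return value
   >= size means the copy was truncated. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Copying "overlong input" into an 8-byte buffer leaves "overlon" in dst and returns 14. Because 14 >= 8, the caller can tell that data was lost, but only if it looks.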

Problems solved?

strlcpy() avoids buffer overruns and ensures that the output string is null terminated. So why have the glibc maintainers obstinately refused to accept it?

The essence of the argument against strlcpy() is that it fixes one problem—sometimes failing to terminate dst in the case of strncpy(), buffer overruns in the case of strcpy()—while leaving another: the loss of data that occurs when the string copied from src to dst is truncated because it exceeds size. (In addition, there is still an unusual corner case where the unwary programmer can find that strlcat(), the analogous function for string concatenation, leaves dst without a null terminator.)
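The strlcat() corner case arises when dst contains no null byte within its first size bytes, for example because the caller passed the wrong size. The following sketch of strlcat() semantics uses the hypothetical name my_strlcat() and is modelled on the behavior described in the BSD manual page. It shows that, in this case, nothing is appended, no terminator is written, and the return value is size + strlen(src):

```c
#include <string.h>

/* Sketch of strlcat() semantics: append src to the string in dst, where
   size is the full size of the dst buffer. Returns the length of the
   string it tried to create (initial length of dst + strlen(src)). */
static size_t my_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;
    size_t slen = strlen(src);

    /* Find the end of dst, but never look beyond size bytes. */
    while (dlen < size && dst[dlen] != '\0')
        dlen++;

    /* Corner case: no '\0' within size bytes. Nothing is appended and
       dst is left WITHOUT a terminator. */
    if (dlen == size)
        return size + slen;

    size_t room = size - dlen - 1;           /* bytes available for src */
    size_t n = (slen < room) ? slen : room;
    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;
}
```

Given char buf[4] = { 'a', 'b', 'c', 'd' }, my_strlcat(buf, "xyz", sizeof(buf)) returns 7 and leaves buf unterminated. The only warning sign is that the return value exceeds sizeof(buf).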

At the very least, (silent) data loss is undesirable to the user of the program. At the worst, truncated data can lead to security issues that may be as problematic as buffer overruns, albeit probably harder to exploit. (One of the nicer features of strlcpy() and strlcat() is that their return values do at least facilitate the detection of truncation—if the programmer checks the return values.)
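Because glibc lacks strlcpy(), this sketch substitutes standard snprintf(). Its return value follows the same convention: the length of the string it tried to create. Treating a return value >= size as an error turns silent truncation into an explicit failure. The helper name copy_checked() is hypothetical:

```c
#include <stdio.h>

/* Copy src into dst (size bytes), treating truncation as an error.
   Returns 0 on success and -1 if src did not fit. On failure with
   size > 0, dst holds a truncated string that the caller should not use. */
static int copy_checked(char *dst, size_t size, const char *src)
{
    int n = snprintf(dst, size, "%s", src);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

This is the "guard" described in the next paragraph, folded into the copy itself. The caller must still decide what to do when -1 comes back.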

All of which brings us full circle: to avoid unhappy users and security exploits, in the general case even a call to strlcpy() (or strlcat()) must be guarded by an if statement checking the arguments, if the state of the arguments can't be predicted with certainty in advance of the call.

Where are we now?

Today, strlcpy() and strlcat() are present on many versions of UNIX (at least Solaris, the BSDs, Mac OS X, and Irix), but not all of them (e.g., HP-UX and AIX). There are even implementations of these functions in the Linux kernel for internal use by the kernel code. Meanwhile, these functions are not present in glibc, and were rejected for inclusion in the POSIX.1-2008 standard, apparently for similar reasons to their rejection from glibc.

Reactions among core glibc contributors on the topic of including strlcpy() and strlcat() have been varied over the years. Christoph Hellwig's early patch was rejected in the then-primary maintainer's inimitable style (1 and 2). But reactions from other glibc developers have been more nuanced, indicating, for example, some willingness to accept the functions. Perhaps most insightfully, Paul Eggert notes that even when these functions are provided (as an add-on packaged with the application), projects such as OpenSSH, where security is of paramount concern, still manage to either misuse the functions (silently truncating data) or use them unnecessarily (i.e., the traditional strcpy() and strcat() could equally have been used without harm); such a state of affairs does not constitute a strong argument for including the functions in glibc.

The appearance of an embryonic entry on this topic in the glibc FAQ, with a brief rationale for why these functions are currently excluded, and a note that "gcc -D_FORTIFY_SOURCE" can catch many of the errors that strlcpy() and strlcat() were designed to catch, would appear to be something of a final word on the topic. Those who still feel that these functions should be in glibc will have to make do with the implementations provided in libbsd for now.

Finally, in case it isn't obvious by now, it should of course be noted that the root of this problem lies in the C language itself. C's native strings are not managed strings of the style natively provided in more modern languages such as Java, Go, and D. In other words, C's strings have no notion of bounds checking (or dynamically adjusting a string's boundary) built into the type itself. Thus, when using C's native string type, the programmer can never entirely avoid the task of checking string sizes when strings are manipulated, and no replacements for strcpy() and strcat() will ever remove that need. One might even wonder if the original C library implementers were clever enough to realize from the start that strcpy() and strcat() were sufficient—if it weren't for the fact that they also gave us gets().


Index entries for this article
Security: Glibc
Security: Vulnerabilities/Buffer overflow



Posted Jul 19, 2012 2:01 UTC (Thu) by Ben_P (guest, #74247) [Link] (7 responses)

I've been told that strcpy on x86 receives some substantial help from hardware; more so than strlcpy and strncpy. Can anyone confirm?

Posted Jul 19, 2012 11:29 UTC (Thu) by cladisch (✭ supporter ✭, #50193) [Link]

x86 has several string instructions that essentially implement mem* functions: rep movs for memcpy(), repne scas for memchr(), rep stos for memset(), and repe cmps for memcmp().

As far as I can see, strcpy(), strncpy(), and strlcpy() could be implemented equally well on top of these primitives.

Posted Jul 19, 2012 22:00 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

Halfway so. On x86-64 with a recent enough glibc, there are multiple assembler versions of strcpy() using plain assembler, SSE2 and SSSE3, using the ifunc mechanism to choose between them: strncpy() has almost as many assembler implementations, lacking only a plain assembler one (indeed it uses the same code as strcpy(), with a tiny macro replacement). stpcpy() is similarly optimized. Furthermore, all three of these functions can be expanded inline in some situations by GCC, without calling down to glibc at all.

strlcpy() gets none of this. (However, a countervailing caveat: some of the assembler implementations are so huge and unrolled that I'm not sure they don't cost more in icache pressure than they gain in speed...)

Posted Jul 19, 2012 22:04 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

>Furthermore, all three of these functions can be expanded inline in some situations by GCC, without calling down to glibc at all
Quite often that actually _hurts_ the performance, because glibc versions are so über-optimized.

Posted Jul 20, 2012 2:45 UTC (Fri) by Ben_P (guest, #74247) [Link] (2 responses)

Hmm, thanks. Do you happen to know if SPARC, Niagara 2's specifically, give strcpy and strncpy similar assistance?

Posted Jul 20, 2012 12:54 UTC (Fri) by nix (subscriber, #2304) [Link] (1 responses)

glibc/sysdeps/sparc/sparc32/sparcv9/str*cpy.S redirect to the corresponding files in glibc/sysdeps/sparc/sparc64/, which use the 64-bit SPARC instruction set but do *not* use VIS or other extensions yet. There are VIS-using implementations of memcpy() and memset() in glibc/sysdeps/sparc/sparc64/multiarch/, but nothing similar for string instructions so far. (This is all in glibc trunk.)

Obviously the GCC builtins for strcpy() et al still work.

Posted Jul 23, 2012 17:58 UTC (Mon) by BenHutchings (subscriber, #37955) [Link]

David Miller was doing some work on string optimisations on SPARC and more generally a while back; see the March and April 2010 entries in <http://vger.kernel.org/~davem/cgi-bin/blog.cgi/index.html>.

Posted Apr 28, 2014 16:56 UTC (Mon) by mirabilos (subscriber, #84359) [Link]

If you want that, *and* if you want to safely use strcpy(), you’d better use memcpy() anyway.
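Here is a minimal sketch of that approach, assuming the caller wants an error rather than truncation. It measures once with strlen(), checks, and then lets memcpy() copy the known number of bytes, terminator included. The helper name copy_known_len() is hypothetical:

```c
#include <string.h>

/* Returns 0 on success, or -1 (leaving dst untouched) if src does not
   fit in dst_size bytes including its terminating null byte. */
static int copy_known_len(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);   /* the only scan of src */

    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);  /* len + 1 also copies the '\0' */
    return 0;
}
```

Because the length is already known, memcpy() does not have to search for the terminator again. As the SPARC discussion above suggests, memcpy() is also a frequent target for architecture-specific optimization.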

Posted Jul 19, 2012 2:25 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (45 responses)

Microsoft came much closer to getting these sorts of functions right than the BSD people did. Consider strcpy_s and strncpy_s. strcpy_s's signature looks like this:

    errno_t strcpy_s(
        char *strDestination,
        size_t numberOfElements,
        const char *strSource
    );

By default, the function checks that we're copying at most (numberOfElements - 1) bytes plus a terminating null byte into strDestination. If we try to exceed this buffer, strcpy_s crashes explicitly with the equivalent of abort(3). This way, we still deflect a potential security hole and turn it into a relatively safe and controlled (if inconvenient) crash, thereby closing some security holes without introducing new truncation-based attacks.
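strcpy_s() itself belongs to Microsoft's C runtime (and to C11's optional Annex K, as noted below), so it may not be available elsewhere. The following minimal portable sketch of the default behavior described above uses the hypothetical name copy_or_abort(). It treats an overlong source as a bug and stops the process:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy src into dst, which is dst_size bytes long. Invalid arguments,
   or a string that does not fit (terminator included), are treated as
   a programming error: report it and abort() rather than overflowing
   dst or silently truncating. */
static void copy_or_abort(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        abort();

    size_t len = strlen(src);
    if (len >= dst_size) {
        fprintf(stderr, "copy_or_abort: %zu-byte string does not fit in "
                "%zu-byte buffer\n", len + 1, dst_size);
        abort();
    }
    memcpy(dst, src, len + 1);
}
```

Unlike strcpy_s(), this sketch has no counterpart to _set_invalid_parameter_handler, so the abort cannot be switched off.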

If you do want the truncation behavior, you can use strncpy_s:

    errno_t strncpy_s(
        char *strDest,
        size_t numberOfElements,
        const char *strSource,
        size_t count
    );

strncpy_s works just like strncpy, except that 1) it has the same buffer-checking safeties as strcpy_s, 2) it always null-terminates the destination buffer, aborting if there's not enough room, and 3) it doesn't zero-fill the remainder of the buffer if strlen(strSource) < count. This function is useful enough on its own, but you can get strlcpy-like truncation behavior by passing the special value _TRUNCATE for the count parameter. In this case, when the input string is too long, strncpy_s truncates it and, instead of aborting, returns the special value STRUNCATE.
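The decoupled size and count parameters can also be sketched portably. Here COPY_TRUNCATE is a hypothetical sentinel standing in for _TRUNCATE, and copy_n_checked() is a hypothetical name. The behavior follows the description above, not Microsoft's actual implementation:

```c
#include <stdlib.h>
#include <string.h>

#define COPY_TRUNCATE ((size_t)-1)   /* hypothetical stand-in for _TRUNCATE */

enum copy_status { COPY_OK, COPY_TRUNCATED };

/* Copy at most count bytes of src into dst (dst_size bytes) and always
   null-terminate. Abort if the requested bytes do not fit. If count is
   COPY_TRUNCATE, truncate to fit instead and report that it happened. */
static enum copy_status copy_n_checked(char *dst, size_t dst_size,
                                       const char *src, size_t count)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        abort();

    size_t len = strlen(src);
    if (count == COPY_TRUNCATE) {
        size_t n = (len < dst_size) ? len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
        return (n == len) ? COPY_OK : COPY_TRUNCATED;
    }

    size_t n = (len < count) ? len : count;   /* bytes the caller asked for */
    if (n >= dst_size)
        abort();                              /* requested bytes don't fit */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return COPY_OK;
}
```

Truncation happens only when the caller explicitly asks for it, and even then the caller learns that it happened.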

By decoupling the output buffer size from the expected number of bytes to copy, we can sidestep a lot of the issues that this article raises for strlcpy. There are also neat C++ template overloads that let you call these functions and have the compiler fill in numberOfElements in the case that you're using C++ and strDestination is an array.

The singular flaw of the entire *_s family of functions is the _set_invalid_parameter_handler function, which allows you to turn off the abort behavior above. There's no good reason to do so, and nobody in practice does, making this family of functions a much better alternative to strlcpy and friends.

Posted Jul 19, 2012 7:35 UTC (Thu) by piman (guest, #8957) [Link] (3 responses)

The _s functions are also part of an optional annex to C11, but I am not sure anyone but Microsoft has implemented them so far. (Probably the only place where MS has led any support for C standards in decades...)

Posted Jul 19, 2012 22:04 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

The biggest problem with these functions is that their parameter order is abominable and entirely unlike any other function in the C library, or indeed any other function I have ever seen. dest, n, src, count? What were they *thinking*? Did they just shuffle the formal parameter list at random?

Posted Jul 19, 2012 23:26 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (1 responses)

The parameter order seems fine to me --- you have (buffer, size_of_buffer) pairs. How would you do it?

Posted Jul 20, 2012 12:46 UTC (Fri) by nix (subscriber, #2304) [Link]

The problem with that parameter order is simply that it's completely different from everything else in the C library. Everything else packs the size-of arguments together, although one could argue that this is because they normally relate to element and array size of a single entity.

Ah well, I'm picking nits anyway. The argument order is abominable *to me*, and I'd never get it right without looking it up, but this is a personal stylistic foible.

Posted Jul 20, 2012 4:06 UTC (Fri) by cmccabe (guest, #60281) [Link] (40 responses)

Please. You really think that aborting the program is the right behavior when a string is too long?

Clue: it's not. And so you're back to checking "if strlen(...)" which you could have done without the _s functions.

If mandatory checks are what you want, use something like electric fence, -D_FORTIFY_SOURCE, or, best of all, a managed language!

strcpy_s is about as useful as a screen door on a submarine.

Posted Jul 20, 2012 4:15 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (8 responses)

> You really think that aborting the program is the right behavior when a string is too long?

Yes, I do. It's difficult to turn abort() into an escalation-of-privilege. In a well-written program, you never get to the abort() call because you've already checked the length of the input string and done something sensible, which is rarely to just truncate it.

> Clue: it's not.

I'm glad we're civil around here.

> If mandatory checks are what you want, use something like electric fence, -D_FORTIFY_SOURCE, or, best of all, a managed language!

Programs can't run normally under electric fence. -D_FORTIFY_SOURCE is nice, but it only works when the compiler knows the size of the destination buffer. Sometimes it doesn't, but you do, and you can tell the compiler about the destination buffer. strcpy_s isn't a substitute for length checks; as I mentioned in my first post, you can use strncpy_s to tell the compiler (and your reviewer!) explicitly that you want string truncation. strcpy_s ensures that if you _do_ screw up, your program fails in an obvious and controlled fashion instead of veering off into exciting undefined behavior. -D_FORTIFY_SOURCE can't make the same guarantees in all cases.

> strcpy_s is about as useful as a screen door on a submarine.

It'd reflect better on you if you used metaphors that had some relationship to your argument.

Posted Jul 20, 2012 7:20 UTC (Fri) by cmccabe (guest, #60281) [Link] (7 responses)

> Yes, I do. It's difficult to turn abort() into an escalation-of-privilege

However, it's easy to turn abort() into a denial of service.

Look, I realize you are serious, and I'm sorry if I was overly snarky. But your idea just does not make sense. You can't magically make C into a managed language by adding more layers of bureaucracy. It's been tried before and it just doesn't work. And that's all I'm going to say about that.

Posted Jul 20, 2012 13:25 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

DoS is much preferable to complete pwning. And the point is, abort()-ing means that there would have been a buffer overflow more likely than not.

Posted Jul 20, 2012 17:33 UTC (Fri) by bronson (subscriber, #4806) [Link] (5 responses)

Who said anything about turning C into a managed language?

> You really think that aborting the program is the right behavior when a string is too long?

Yes, obviously yes. You are now outside the parameters of the program as written and the only 100% safe thing to do is just stop.

Or do you know of a magic solution that's not subject to silent truncation attacks?

Posted Jul 20, 2012 17:51 UTC (Fri) by jimparis (guest, #38647) [Link] (4 responses)

> > You really think that aborting the program is the right behavior when a string is too long?

> Yes, obviously yes. You are now outside the parameters of the program as written and the only 100% safe thing to do is just stop.

It's not obvious, and it's not always true. Security is hard and there's not always one single answer.

If my code is trying to concatenate "/etc/passwd" and ".bak", then yes, it is likely better to stop executing rather than fail to append the suffix.

But if my code is a web server reading someone's preferred subtitle from a form, it's likely better to truncate "Jimparis the magnificent" to just "Jimparis the magni" if it can't fit in my buffer -- the rest of the code will behave no differently than if the user had just typed the truncated version in the first place, while bringing down a whole server process can easily turn into a DoS.

Posted Jul 20, 2012 18:29 UTC (Fri) by quotemstr (subscriber, #45331) [Link]

> it's likely better to truncate "Jimparis the magnificent" to just "Jimparis the magni"

If you want that behavior, you can ask for it. If the programmer doesn't specify, the safer thing to do is abort. You'll notice an abort and fix it fast. You might not notice a truncation vulnerability until it's too late.

Posted Jul 20, 2012 20:31 UTC (Fri) by bronson (subscriber, #4806) [Link] (2 responses)

Yes, but is libc told how the string is being used? No.

So what's the only safe thing for libc to do when it notices that initial conditions are invalid?

Posted Jul 21, 2012 4:19 UTC (Sat) by cmccabe (guest, #60281) [Link] (1 responses)

libc can't "notice that the conditions are invalid," because C IS NOT A MANAGED LANGUAGE.

We all make copy and paste errors and all other things being equal, long, hard to inspect C code is less secure than short and clear code.

Posted Jul 25, 2012 2:00 UTC (Wed) by bronson (subscriber, #4806) [Link]

You don't need a managed language to make strlcpy abort instead of truncating.

I agree with the rest of your comment.

Posted Jul 20, 2012 8:06 UTC (Fri) by renox (guest, #23785) [Link] (30 responses)

> Please. You really think that aborting the program is the right behavior when a string is too long?

There is a "fail fast" design to abort as early as possible and let the parent process handle the error: Erlang's programs tend to act like this.

So aborting early is a reasonable design choice (Rust will be like this too), which can be used with C too, being snarky only makes you look foolish|ignorant.

Posted Jul 23, 2012 2:19 UTC (Mon) by cmccabe (guest, #60281) [Link] (29 responses)

Calling exit() because of a routine error is very bad form in C. For one thing, it makes your code impossible to use in a library.

From https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;...
> Never call exit(), abort(), be very careful with assert()
> - Always return error codes.
> - Libraries need to be safe for usage in critical processes that
> need to recover from errors instead of getting killed (think PID 1!).

Similar coding standards exist for Java. In fact the findbugs static code checker will flag calls to System.exit as problems.

Exiting an "erlang process" (really a green thread) doesn't terminate the whole application. I'm pretty sure you know this, so I don't know why you're bringing it up. You are comparing apples and oranges.

Posted Jul 23, 2012 2:24 UTC (Mon) by quotemstr (subscriber, #45331) [Link]

There's a difference between runtime errors and logic errors. The former are things that can go wrong for reasons outside the program's control. These should be reported in a way that allows recovery, and for these errors, exiting the program is inappropriate. The latter class of error always indicates a problem in the structure of the program, and the safest way to react to them is to abort the program. The idea behind strcpy_s is that an overlong string that makes it as far as strcpy_s represents a logic error in the program, and that there is no sensible way to continue past that point. If a program receives an untrusted string of unknown length as input, the program should first check the string's length, reject it with an actionable error if too long, and only then pass it to a lower layer that might use strcpy_s. strcpy_s should be used only on strings that _should_ be valid according to the programmer's mental model of the program. The function exists because it's easy to get these models subtly wrong.

Posted Jul 23, 2012 7:53 UTC (Mon) by renox (guest, #23785) [Link] (27 responses)

So? I didn't claim that the 'fail fast' design applies to all the cases, just that it can be a reasonable way to implement an application.

> I don't know why you're bringing it up. You are comparing apples and oranges.

Nope. Many programs use the 'fail fast' design (a big percentage of Erlang programs do); what is useful in Erlang can be useful in C.

Posted Jul 23, 2012 12:11 UTC (Mon) by nix (subscriber, #2304) [Link] (26 responses)

Quite. Not only is this useful for logic errors, it's useful for runtime error paths that are almost impossible to test and that it is nearly impossible to continue execution past.

The classic example of this is OOM. I would divide this into two subsets: if you're writing a library routine whose primary purpose is memory allocation and that can be expected to allocate a lot (e.g. a data structure's initialization function), and it runs out of memory, then by all means free what you allocated and return NULL. But if you're writing a library routine whose primary purpose is something else, and recovery from OOM is going to be tricky, then just exit() (and document this policy, of course). Your caller is unlikely to be able to do anything much on OOM anyway, exiting will free up memory at once, and if your caller is desperate to clean up or even jump out and keep going after freeing up memory, that's what atexit() is for.
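The policy described above is often packaged as an "allocate or die" wrapper, conventionally named something like xmalloc(). The following is a minimal sketch, not any particular project's implementation:

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate size bytes or exit. Callers never see NULL, so they need no
   OOM-recovery paths; cleanup, if any, belongs in atexit() handlers. */
static void *xmalloc(size_t size)
{
    /* malloc(0) may legitimately return NULL, so ask for one byte. */
    void *p = malloc(size ? size : 1);

    if (p == NULL) {
        fputs("xmalloc: out of memory\n", stderr);
        exit(EXIT_FAILURE);   /* exit() runs registered atexit() handlers */
    }
    return p;
}
```

A library that adopts this policy should say so in its documentation, since its callers lose the option of recovering from OOM themselves.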

But, be honest: your caller isn't going to jump away and keep going on OOM; your caller will just die, because anything else is too hard to test properly. If you're lucky your caller might arrange to clean up in atexit() handlers, though I note the X server never does this and just appears to *hope* that none of its libraries exit on OOM. But perhaps this is because you can't even rely on cleaning up in atexit() handlers, because if you happen to OOM in a stack allocation the kernel is just going to kill you. So it doesn't matter if you have lots of complex cleanup-and-continue OOM code, you have to cater for an immediate exit without cleanup *anyway*. And you can't avoid this merely by not using malloc(): you have to not call functions either, at least not without 'preallocating' stack space by doing a deep recursion in advance. A few programs actually do this, but it's rare.

There is one thing I wish we could get, but which is really hard to do properly: an automatic backtrace on OOM, so we could tell roughly which allocation was failing and why. Unfortunately on most platforms that requires a modicum of debugging information for everything, and that's huge and not loaded by default, even if it's present, so you'd be unlikely to be able to consult it at OOM time anyway.

Posted Jul 23, 2012 13:21 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (24 responses)

You can produce backtrace without symbol names and lookup symbol names later. Windows mini-dumps certainly allow this, though it's sometimes tricky to keep all the required debugging information at hand.

Posted Jul 24, 2012 17:16 UTC (Tue) by nix (subscriber, #2304) [Link] (23 responses)

Producing a backtrace of any sort without frame pointers is very hard, and x86-64 GCC disables them by default (as does x86-32 modern GCC). To produce backtraces on such systems, you need DWARF debugging info -- though perhaps the exception frame section would serve the same purpose, though of course it too is not loaded by default. I suppose you could write the entire stack to disk on OOM (as long as it's not too big -- coredumping may be much harder, as if you're out of memory the full coredump is likely to be huge).

Posted Jul 24, 2012 18:16 UTC (Tue) by renox (guest, #23785) [Link] (3 responses)

> x86-64 GCC disables them by default (as does x86-32 modern GCC)

I wonder why the GCC developers chose this default behaviour; x86-64 isn't register-starved like x86-32.

Posted Jul 24, 2012 23:19 UTC (Tue) by nix (subscriber, #2304) [Link] (2 responses)

Because the ABI allows it, because it still provides a performance improvement (somewhere between 1% and 5%, not insignificant, though well below the 8--12% I've seen reported for x86-32), and because it's useless -- everything from GDB through libgcj and now I find even glibc backtrace() uses the DWARF unwinder tables instead. Why maintain a 'feature' which costs a register and adds runtime overhead to every function call when nobody needs it?

Posted Jul 25, 2012 15:11 UTC (Wed) by paulj (subscriber, #341) [Link] (1 responses)

One reason is debugging stack corruption, where normal tools may not give meaningful backtraces. With frame pointers, you can easily figure out where earlier, uncorrupted frames really are, and figure out why it crashed.

Posted Jul 25, 2012 17:17 UTC (Wed) by nix (subscriber, #2304) [Link]

Yep. That's why frame pointers should be *enableable*. It doesn't mean they should be on by default.

Posted Jul 24, 2012 19:48 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Windows does both. It uses the FS register pointing to thread information block to track frame pointers (since Windows uses SJLJ exceptions) and it also does crashdumps that contain offending threads' stacks. Minidumps are usually quite small.

Posted Jul 29, 2012 15:34 UTC (Sun) by pbonzini (subscriber, #60935) [Link]

Exception handler information pointed to by FS only tracks a (possibly very small) subset of frames, since most frames do not install an exception handler.

Posted Jul 24, 2012 20:37 UTC (Tue) by foom (subscriber, #14868) [Link] (16 responses)

The eh_frame section *is* loaded by default. And the backtrace function in glibc uses it, even.

So, no, it's not hard to produce a backtrace. You just call the function.
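A minimal sketch using glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. backtrace_symbols_fd() writes straight to a file descriptor rather than malloc()ing an array of strings, which suits low-memory situations. The backtrace(3) manual page does caution that these functions live in libgcc, which may be loaded (and may allocate) on first use, so calling the function once at startup is a reasonable precaution. Function names generally appear only for symbols in the dynamic symbol table (with the GNU linker, link with -rdynamic). Otherwise raw addresses are printed and can be resolved later, as suggested earlier in this thread. The wrapper name dump_backtrace() is hypothetical:

```c
#include <execinfo.h>
#include <unistd.h>

/* Print the current call stack to stderr and return the number of
   frames captured. backtrace_symbols_fd() does not call malloc() itself. */
static int dump_backtrace(void)
{
    void *frames[64];
    int n = backtrace(frames, (int)(sizeof(frames) / sizeof(frames[0])));

    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```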

Posted Jul 24, 2012 23:10 UTC (Tue) by nix (subscriber, #2304) [Link] (15 responses)

Sorry, I misspoke, was thinking of other DWARF sections. .eh_frame is loadable, but is, of course, like all ELF sections, *mapped* in. In a severe OOM situation, it's very likely that you won't be able to map more pages in, and that not much of it is going to be mapped in at any given time.

(I must have been unlucky with backtrace(). It's never been willing to do more than coredump for me without frame pointers. Mind you I haven't tried it for years because I was so sure it was broken :) time to try it out in my next debugging blitz.)

Posted Jul 25, 2012 2:19 UTC (Wed) by quotemstr (subscriber, #45331) [Link] (12 responses)

> In a severe OOM situation, it's very likely that you won't be able to map more pages in, and that not much of it is going to be mapped in at any given time.

If you run your system in a sane configuration, the kernel should have swap somewhere it can use to evict the pages it'll need to hardfault the pages from the unwind section. It might be slow, but it'll work. If you run overcommit, there's no such guarantee, but if you run overcommit, why the hell are you complaining about OOM behavior?

Posted Jul 25, 2012 13:45 UTC (Wed) by nix (subscriber, #2304) [Link] (11 responses)

If you have swap free somewhere, you are not in an OOM situation. OOM happens when you're out of RAM *and* swap (or when you hit your RLIMIT_AS boundaries, I suppose).

If you are in overcommit state 2 with an overcommit ratio of 0, you may be right, but neither of these are the default -- and even then, I don't believe Linux reserves swap pages for every in-memory page, like Solaris does (a good thing too, it's fantastically annoying as even very-much-not-OOM systems can find fork() failing because there's not enough swap to back every page that might get dirtied in the new address space, even if it's only going to exec() and throw them all away).

The ups and downs of strlcpy()

Posted Jul 27, 2012 19:44 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (10 responses)

> OOM happens when you're out of RAM *and* swap

OOM happens when you're out of commit. If you're doing it right, you paid for the commit for the pages you'll need when you loaded the image, so making the backtrace tables resident should still be possible.

> a good thing too, it's fantastically annoying as even very-much-not-OOM systems can find fork() failing because there's not enough swap to back every page that might get dirtied in the new address space, even if it's only going to exec() and throw them all away

I disagree: strict commit accounting makes a system more predictable in practice. If you find fork failing, you should either add more swap (which won't actually get used, as you note, except in the worst case) or change your program to use vfork or posix_spawn instead, both of which don't have the intrinsic commit-accounting problems of fork.
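For readers who have not used posix_spawn(): the work normally done between fork() and exec() is declared up front as "file actions". Below is a minimal sketch, assuming a POSIX system. `spawn_with_stdout` is a hypothetical helper, and redirecting stdout is just one example of such an action.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) with its stdout
   redirected to out_path; returns the child's exit status, or -1. */
int spawn_with_stdout(char *const argv[], const char *out_path)
{
  posix_spawn_file_actions_t fa;
  pid_t pid;
  int status, rc;

  if (posix_spawn_file_actions_init(&fa) != 0)
    return -1;
  /* Runs in the child before exec, replacing an open()+dup2() after fork(). */
  if (posix_spawn_file_actions_addopen(&fa, STDOUT_FILENO, out_path,
                                       O_WRONLY | O_CREAT | O_TRUNC, 0644) != 0)
  {
    posix_spawn_file_actions_destroy(&fa);
    return -1;
  }

  rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
  posix_spawn_file_actions_destroy(&fa);
  if (rc != 0)           /* returns an error number rather than setting errno */
    return -1;

  if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

The trade-off is that anything not expressible as a file action or spawn attribute cannot be done this way.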

The ups and downs of strlcpy()

Posted Jul 28, 2012 10:41 UTC (Sat) by nix (subscriber, #2304) [Link] (9 responses)

> If you find fork failing, you should either add more swap (which won't actually get used, as you note, except in the worst case) or change your program to use vfork or posix_spawn instead, both of which don't have the intrinsic commit-accounting problems of fork.
Right. So I'm a mere user on a system with 250 users. fork() is failing in my Emacs so I can't start a shell (Emacs is much bigger than a shell). And your proposal for fixing this awful user interface failure is either to beg the sysadmin to add swap (I did, he said no, of course turning overcommit off was out of the question as this machine was running a database, never mind that it was a test instance that nobody was using, also it was 'like Solaris does it' and he liked Solaris) or spend time hacking at Emacs and every other program that uses fork()/exec() -- i.e. nearly everything in Unix -- so it no longer does?! This despite the fact that vfork() cannot do many of the things you do between a fork() and exec(), and posix_spawn() cannot do any of them unless the developer of posix_spawn() thought of it, hence the appallingly insane complexity of the interface? And this on a machine with almost no memory left? And this when I'm supposed to be getting something else done?

Your former proposal betrays your single-user roots. Your latter proposal betrays your ignorance of what makes fork()/exec() better than the Windows model in the first place. Neither is at all times practical: the latter in particular is absolutely crackpot.

Thank goodness I can turn overcommit off on my own systems.

The ups and downs of strlcpy()

Posted Jul 28, 2012 10:43 UTC (Sat) by nix (subscriber, #2304) [Link] (3 responses)

The reason why my rant above sounds terribly specific is that this scenario actually happened to me. And kept happening to me, every week or so, for *years*, costing me perhaps time begging people to close other jobs down each time.

Needless to say the thought of rewriting (then X)Emacs's ferociously complex subprocess-handling infrastructure to use posix_spawn() never crossed my mind. (I tried vfork(), but that was clearly out of the question.)

The ups and downs of strlcpy()

Posted Jul 31, 2012 1:30 UTC (Tue) by khc (guest, #45209) [Link] (1 responses)

Never mind that posix_spawn() uses fork/exec on Linux anyway.

The ups and downs of strlcpy()

Posted Jul 31, 2012 23:38 UTC (Tue) by nix (subscriber, #2304) [Link]

True, so the underlying overcommit problem isn't actually fixed by it, except inasmuch as it sometimes falls back to vfork() for you. It just makes your software much much uglier, and makes it work better on major platforms such as MMU-less embedded systems, the Hurd, and Cygwin.

The ups and downs of strlcpy()

Posted Aug 1, 2012 15:53 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

Emacs already uses vfork if it's available. (Read the source.) Perhaps something else was wrong with that system.

The ups and downs of strlcpy()

Posted Jul 29, 2012 2:18 UTC (Sun) by foom (subscriber, #14868) [Link] (3 responses)

fork() is pretty evil, especially now that we have multi-threaded programs.

It would be pretty cool if you could spawn an empty process in a stopped state, and then poke at it from the parent for a bit (open up new file descriptors/etc) before causing it to exec a real subprocess.

Doing things that way would avoid all the memory accounting issues, the performance issue of copying the page table for no good reason, and the significant complication of not actually being allowed to do anything that's not async-signal-handler-safe between fork() and exec(). (And nearly nothing actually falls into that category!)
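To make the async-signal-safety constraint concrete, here is a sketch whose child side uses only functions POSIX lists as async-signal-safe (dup2(), close(), execv(), _exit()) between fork() and exec(). `run_with_stdin` is a hypothetical helper. Note what is absent: no malloc(), no stdio, nothing that might take a lock another thread held at fork() time.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run path with argv, using in_fd as its stdin.
   Returns the child's exit status, or -1 on failure. */
int run_with_stdin(const char *path, char *const argv[], int in_fd)
{
  pid_t pid = fork();
  if (pid == -1)
    return -1;

  if (pid == 0)
  {
    /* Child: in a multithreaded parent only async-signal-safe calls
       are allowed here, so no malloc(), stdio, or locking. */
    if (in_fd != STDIN_FILENO)
    {
      if (dup2(in_fd, STDIN_FILENO) == -1)
        _exit(127);
      close(in_fd);
    }
    execv(path, argv);
    _exit(127);          /* exec failed; _exit() skips atexit handlers and stdio */
  }

  int status;
  if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```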

The ups and downs of strlcpy()

Posted Jul 29, 2012 13:26 UTC (Sun) by nix (subscriber, #2304) [Link] (2 responses)

> It would be pretty cool if you could spawn an empty process in a stopped state, and then poke at it from the parent for a bit (open up new file descriptors/etc) before causing it to exec a real subprocess.
You can do that with PTRACE_O_TRACEFORK or PTRACE_O_TRACEEXEC, but as with anything involving ptrace() there are so many tentacles that virtually any alternative is preferable.

The ups and downs of strlcpy()

Posted Jul 30, 2012 1:51 UTC (Mon) by foom (subscriber, #14868) [Link] (1 responses)

Apparently not *any* alternative, or a new userspace API would have been merged upstream by now. :)

The ups and downs of strlcpy()

Posted Jul 30, 2012 8:46 UTC (Mon) by nix (subscriber, #2304) [Link]

Yeah, true. But if ptrace() was something everyone had to use, a replacement would have been merged by now, because ptrace() is just so odious in so very many ways. (Though the improvements in recent kernels have been substantial, and in maybe as few as five to ten years I'll be able to rely on them enough to actually use them in real software, which these days means "meant to be portable between Linux distros, including the dinosaur-era RHELs too many people insist on running their bleeding-edge software on". sigh.)

The ups and downs of strlcpy()

Posted Aug 1, 2012 15:50 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

> Right. So I'm a mere user on a system with 250 users

That's a rare edge case these days, like it or not. If you do regularly use such a system, it's the administrator's job to make sure system resources are adequate. The kernel is there to accurately account for system resources, not work around your sysadmin's snobbery.

> This despite the fact that vfork() cannot do many of the things you do between a fork() and exec()

Such as?

> hence the appallingly insane complexity of the interface

I don't think the interface is particularly complex. It's less complex than pthreads, certainly.

> the latter in particular is absolutely crackpot.

Do you really need to make it personal?

> Thank goodness I can turn overcommit off on my own systems

I think you mean "on".

The ups and downs of strlcpy()

Posted Jul 25, 2012 15:40 UTC (Wed) by mmorrow (guest, #83845) [Link] (1 responses)

Backtracing on x86_64 is actually quite reasonable. Here are two methods:
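#if defined(USE_BACKTRACE)
/*
  Build with: -DUSE_BACKTRACE -rdynamic
  (-rdynamic exports symbols so backtrace_symbols() can name them)
*/
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

void print_trace(void)
{
  enum { MAX_FRAMES = 10 };
  void *array[MAX_FRAMES];
  int size = backtrace(array, MAX_FRAMES);          /* frames captured */
  char **strings = backtrace_symbols(array, size);  /* one malloc()ed block */
  if (strings == NULL)
    return;
  for (int i = 0; i < size; i++)
    fprintf(stderr, "%s\n", strings[i]);
  free(strings);                                    /* frees all the strings */
}
#elif defined(USE_LIBUNWIND)
/*
  Build with: -DUSE_LIBUNWIND -lunwind-x86_64
*/
#include <libunwind.h>
#include <stdio.h>

void print_trace(void)
{
  unw_cursor_t cur;
  unw_context_t cxt;
  unw_getcontext(&cxt);          /* capture the current register state */
  unw_init_local(&cur, &cxt);    /* start unwinding in this process */
  while (unw_step(&cur) > 0)     /* step to the caller; <= 0 ends the walk */
  {
    unw_word_t off = 0, pc = 0;
    char fname[64] = {[0] = '\0'};
    unw_get_reg(&cur, UNW_REG_IP, &pc);
    unw_get_proc_name(&cur, fname, sizeof(fname), &off);
    printf("%p: (%s+0x%lx) [%p]\n",
           (void *)pc, fname, (unsigned long)off, (void *)pc);
  }
}
#endif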

The ups and downs of strlcpy()

Posted Jul 25, 2012 15:55 UTC (Wed) by mmorrow (guest, #83845) [Link]

9c9
<   const size_t n = 10
---
>   const size_t n = 10;

The ups and downs of strlcpy()

Posted Jul 24, 2012 6:37 UTC (Tue) by kleptog (subscriber, #1183) [Link]

There are programs that attempt to recover from OOM, PostgreSQL for example. It has a pre-allocated area which it uses to create the error message to send to the client and the rip-cord allocator will quickly release any memory allocated to the current query context.

It's not perfect of course, if the client is using SSL then you have to rely on the SSL library to not do anything silly but it's worked every time for me. On the client you get a nice message along the lines of "server ran out of memory". Your transaction is aborted, but the rest of the server is still running.

This obviously only works if malloc() returns NULL, so memory overcommit needs to be off. OOM during stack growth is uncatchable; you can only try to mitigate the risk by keeping your stack small.

I just wanted to point out that it is possible to create code that handles OOM, and it's not helpful if libraries assume they can just die in that case.
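The "pre-allocated area" idea can be sketched as a reserve block that is released when malloc() fails, so that the error path itself has some memory to work with. This illustrates the general pattern under stated assumptions and is not PostgreSQL's actual allocator. `reserve_init()` and `xmalloc_or_report()` are hypothetical names, and as noted above it only helps if malloc() can actually return NULL.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical emergency reserve, grabbed at startup while memory is plentiful. */
static void *reserve;
static const size_t reserve_size = 64 * 1024;

int reserve_init(void)
{
  reserve = malloc(reserve_size);
  return reserve != NULL ? 0 : -1;
}

/* Try to allocate; on failure, release the reserve so the error report
   and the cleanup that follows have some headroom, then return NULL. */
void *xmalloc_or_report(size_t n)
{
  void *p = malloc(n);
  if (p == NULL && reserve != NULL)
  {
    free(reserve);
    reserve = NULL;
    fputs("out of memory; aborting current operation\n", stderr);
  }
  return p;
}
```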

strncpy() history

Posted Jul 19, 2012 3:49 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (5 responses)

I was led to believe that strncpy() is the way it is not because the developers were trying to "fix" strcpy() to be secure (for which it's a terrible design) but because they wanted to truncate strings in fixed-width buffers that aren't themselves terminated anyway. Writing an "extra" NUL byte would be either wasteful or actively disastrous depending on whether it fitted inside the fixed width buffer or destroyed the next byte after it.

Such structures hardly exist at all today, but were very common back when we had things like record-oriented file operations and people worried about "wasting space" with the redundant century fields of a date...

strncpy() history

Posted Jul 19, 2012 5:40 UTC (Thu) by akanaber (subscriber, #23265) [Link] (4 responses)

Yes, that's why strncpy pads out the rest of the destination with NULs if strlen(src) < n.
It's not terribly well named, it's really a function for converting between C-style null-terminated strings (src) and fixed-length records (dst). As you say it would be an insane design for a "safer strcpy", and as the article mentioned people who try to use it that way just introduce a different security hole.
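A short demonstration of the record-oriented behaviour: shorter names are NUL-padded to the full width, while a name that exactly fills the field gets no terminator at all. `struct rec` is a hypothetical type with a 14-byte name field, modeled loosely on the old directory-entry layout mentioned below; it is not a real system structure.

```c
#include <string.h>

/* Hypothetical fixed-width record: name is NOT necessarily NUL-terminated. */
struct rec {
  unsigned short ino;
  char name[14];
};

/* Store a C string in the field, strncpy()-style: short names are padded
   with NULs to 14 bytes, a 14-byte name fills the field with no
   terminator, and longer names are silently cut at 14 bytes. */
void rec_set_name(struct rec *r, const char *name)
{
  strncpy(r->name, name, sizeof r->name);
}
```

Treating `r->name` afterwards as an ordinary C string is exactly the "different security hole" the article describes.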

strncpy() history

Posted Jul 19, 2012 12:33 UTC (Thu) by Yorick (guest, #19241) [Link]

I agree that strncpy() is badly named, and that this has led people to misunderstand its function and purpose. It is still useful for (mostly legacy) wire protocols and on-disk formats where, in addition to the fixed record size, the null byte padding is important (in order not to leak any information).

By the way, strncpy() does not seem to have an obvious inverse in the C library. Perhaps sscanf can be used, but it feels a little hacky and not what I would use in code that needs to go fast.
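One way to do the reverse conversion without sscanf() is to find the field's real length with memchr() and copy that many bytes into a buffer one byte larger than the field. `field_to_cstr` is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical inverse of strncpy(): turn a fixed-width, possibly
   unterminated field of width w into a C string.  out must hold w + 1 bytes. */
char *field_to_cstr(char *out, const char *field, size_t w)
{
  const char *nul = memchr(field, '\0', w);   /* first NUL inside the field, if any */
  size_t len = nul ? (size_t)(nul - field) : w;
  memcpy(out, field, len);
  out[len] = '\0';
  return out;
}
```

For output only, printf("%.*s", (int)w, field) also works, since a precision stops %s from reading past w bytes even without a terminator.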

strncpy() history

Posted Jul 19, 2012 14:04 UTC (Thu) by danscox (subscriber, #4125) [Link] (2 responses)

The original reason for strncpy() was when directory names were limited to 14 chars. The other two bytes contained the inode number. For that particular case, strncpy() worked quite well. Yes, I've been at this for 'way too long now ;-).

strncpy() history

Posted Jul 19, 2012 19:21 UTC (Thu) by smoogen (subscriber, #97) [Link]

Dear lord... that sentence brought back flashbacks from the '80s. Thanks.

strncpy() history

Posted May 25, 2017 1:36 UTC (Thu) by rlhamil (guest, #6472) [Link]

I thought that was the case too (14 character filenames in a directory entry only null-terminated if shorter than that), but looking through the v7 code, I didn't really see much in the way of examples of that, certainly not in the kernel. Here's everything I found (this is running on v7 on an emulated PDP-11 that is so much faster than the real thing that it reminds me how painfully slow they were by comparison):

unixv7# time find /usr/src /usr/sys -type f -name '*.[ch]' -exec grep strncpy /dev/null {} \;
/usr/src/libc/gen/strncpy.c:strncpy(s1, s2, n)
/usr/src/cmd/crypt.c: strncpy(buf, pw, 8);
/usr/src/cmd/ed.c: strncpy(buf, keyp, 8);
/usr/src/cmd/login.c:#define SCPYN(a, b) strncpy(a, b, sizeof(a))
/usr/src/cmd/mkdir.c: strncpy(pname, d, slash);
/usr/src/cmd/atrun.c: strncpy(file, dirent.d_name, DIRSIZ);
/usr/src/cmd/xsend/lib.c: strncpy(buf, s, 10);
/usr/src/cmd/ranlib.c: strncpy(firstname, arp.ar_name, 14);

real 7.0
user 2.0
sys 4.5

Without looking further, three appear to deal with pathname components, three deal with other things, one is a macro (turns out it was for a not-necessarily-terminated utmp.ut_name field), and one is the definition of strncpy.

The ups and downs of strlcpy()

Posted Jul 19, 2012 4:26 UTC (Thu) by wahern (subscriber, #37304) [Link] (22 responses)

The premise is completely wrong here.

First of all, checking and branching on the return value of strlcpy is easier than other interfaces. The return value gives you all the information you need to know: specifically, whether truncation occurred and how much you need to grow your buffer. Other interfaces require more logic to get at this information. In other words, strlcpy is a more elegant abstraction.
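To illustrate the point about the return value: strlcpy() returns strlen(src), so a result >= the buffer size signals truncation and also tells you how big the buffer needed to be. Since strlcpy() was not in glibc when this was written, the sketch uses a local reimplementation of the BSD semantics, `my_strlcpy`; it is an illustration, not the OpenBSD source. `copy_name` is likewise hypothetical.

```c
#include <string.h>

/* Local sketch of BSD strlcpy() semantics: copy at most size-1 bytes,
   always NUL-terminate when size > 0, and return strlen(src). */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
  size_t len = strlen(src);
  if (size > 0)
  {
    size_t n = len < size - 1 ? len : size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
  }
  return len;
}

/* One comparison both detects truncation and yields the required size. */
int copy_name(char *buf, size_t bufsize, const char *name, size_t *needed)
{
  size_t n = my_strlcpy(buf, name, bufsize);
  if (n >= bufsize)
  {
    *needed = n + 1;   /* bytes required, including the terminator */
    return -1;         /* truncated; buf still holds a terminated prefix */
  }
  return 0;
}
```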

But more importantly, the idea that you _need_ to check the return value is wrong. If truncation occurs, it's likely because there's input that is unexpected by the developer. This may be because the developer was lazy, or because there's some implicit contract or protocol.

Either way, the input is garbage. Garbage in is garbage out. Now, you can try to catch that garbage, but why? If someone gives me a domain length of 300 characters, should I bail or just silently truncate? It depends. You could bail, but then you may have to add a new error path that is unlikely to be tested much, if ever. Sometimes it's better to just handle garbage as sanely as possible, and that means just going through the motions.

More bugs occur in "exception" blocks than perhaps anywhere else. NTP bug last month? Exception. Memory errors. Signals. Etc. I mean, come on people!

So, sometimes the sanest thing to do with garbage is to keep trucking along, and let the user reap what he has sown, at least as long as the garbage output is direct function of the garbage input. Thus, strlcpy has a third useful mode: truncation.

No other alternative to strlcpy does all three of these things. And no other alternative or set of alternatives is as simple to use. Simple code is better code, _always_.

This debate over strlcpy is plain stupid. Of course there are always better ways to address any particular scenario. But the job of a C library isn't to give you the perfect tool for each and every job. It's to provide a small collection of tools with higher average utility. This reductive obsession with telling people to resort to things like memcpy, or to use dynamic string libraries, is insane. It's like taking away condoms and telling people to get married before they have sex. It ain't gonna happen. The world isn't that simple.

The ups and downs of strlcpy()

Posted Jul 19, 2012 15:55 UTC (Thu) by RobSeace (subscriber, #4435) [Link] (8 responses)

Indeed, it seems to me the same arguments against strlcpy() could be made against snprintf(), and yet everyone seemed to accept that THAT was clearly worth having as a superior alternative to sprintf()... Maybe it's just some people like "n" and really hate "l"? ;-) Admittedly, I do find the strl*() names rather nonstandard and hard to get used to... But, given that strn*() was already taken, they didn't have much choice, and nothing else would be much of an improvement... Maybe nstrcpy()?

The ups and downs of strlcpy()

Posted Jul 19, 2012 22:08 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

snprintf() has the advantage that there was no sane way to use sprintf() securely -- you couldn't tell how big a buffer it would need in the general case without reimplementing the guts of sprintf() yourself. You couldn't do it by tracking some extra value, the way you can with string lengths. snprintf() lets you do it with two calls, one with a tiny size, then one with the size returned from the previous call, plus one. Much better.
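The two-call idiom can be sketched as follows; since C99 the measuring call may even pass a NULL buffer with size 0. `xsprintf` is a hypothetical name, and the va_copy() is needed because a va_list cannot be reused after vsnprintf() has consumed it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly malloc()ed buffer of exactly
   the right size.  Returns NULL on allocation or formatting failure. */
char *xsprintf(const char *fmt, ...)
{
  va_list ap, ap2;
  va_start(ap, fmt);
  va_copy(ap2, ap);

  int n = vsnprintf(NULL, 0, fmt, ap);        /* pass 1: measure only */
  va_end(ap);
  if (n < 0)
  {
    va_end(ap2);
    return NULL;
  }

  char *buf = malloc((size_t)n + 1);          /* +1 for the terminating NUL */
  if (buf != NULL)
    vsnprintf(buf, (size_t)n + 1, fmt, ap2);  /* pass 2: format for real */
  va_end(ap2);
  return buf;
}
```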

The ups and downs of strlcpy()

Posted Jul 19, 2012 22:44 UTC (Thu) by RobSeace (subscriber, #4435) [Link] (3 responses)

> snprintf() has the advantage that there was no sane way to use sprintf()
> securely -- you couldn't tell how big a buffer it would need in the
> general case without reimplementing the guts of sprintf() yourself.

I'm not sure how long glibc has had open_memstream(), but you could've done it with that and fprintf() instead... Or, asprintf(), if that's been around longer... Or, hell, you could always have just fprintf()'d to a temp file, checked the size, allocated a buffer, and read the file back in... Oh, wait, you said "sane"... ;-)

I just thought the same "Oh noes, truncation!!!" worries applied to snprintf(), as well... And, frankly, I haven't heard of that causing major security nightmares anywhere yet... Has it?
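For comparison, the open_memstream() route looks roughly like this. It is POSIX.1-2008, so it was not universally portable at the time of this thread. `format_to_string` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: let stdio grow the buffer as needed.
   The caller frees the result; returns NULL on failure. */
char *format_to_string(const char *name, int value)
{
  char *buf = NULL;
  size_t len = 0;
  FILE *f = open_memstream(&buf, &len);
  if (f == NULL)
    return NULL;

  int ok = fprintf(f, "%s=%d", name, value) >= 0;
  /* buf and len are only valid after fflush() or fclose(). */
  if (fclose(f) != 0 || !ok)
  {
    free(buf);
    return NULL;
  }
  return buf;
}
```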

The ups and downs of strlcpy()

Posted Jul 19, 2012 23:11 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

I said 'sane' but I assumed 'portable': this stuff had to run on Solaris as well as Linux. Neither asprintf() nor open_memstream() are portable, alas.

The ups and downs of strlcpy()

Posted Jul 20, 2012 10:09 UTC (Fri) by RobSeace (subscriber, #4435) [Link] (1 responses)

Ah, I thought we were just talking about the rationale for getting accepted into glibc or not... Was it just that snprintf() was accepted by POSIX/ISO, so glibc was forced to accept it as a standard, while strl*() wasn't?

The ups and downs of strlcpy()

Posted Jul 20, 2012 12:44 UTC (Fri) by nix (subscriber, #2304) [Link]

Ah, right. From a *user's* perspective we cannot use open_memstream() et al because they are not portable. From a *library developer's* perspective we shouldn't introduce strlcpy() because it sucks and everyone should just use open_memstream(). :)

Obviously the only real solution is to write your own string abstraction using counted strings or whatever floats your boat...

The ups and downs of strlcpy()

Posted Jul 20, 2012 4:18 UTC (Fri) by cmccabe (guest, #60281) [Link] (2 responses)

Yeah, the debate over strlcpy doesn't make a whole lot of sense.

Programmers who want to check for truncation will find it easier to do with strlcpy. It requires fewer lines of code. That makes it easier to audit the code and find bugs. Programmers who don't want to check for truncation aren't going to do so anyway, no matter what API you add or don't add to the standard library.

C is a fundamentally different language than most of the modern languages out there. It's not managed. There will always be a way for the programmer to screw up. People seem to not grasp this concept. They think strlcpy should not be added, because somewhere-- maybe hiding under the couch-- is a perfect function which will magically make copying strings in C safe, like in the managed languages. There isn't.

The ups and downs of strlcpy()

Posted Jul 20, 2012 21:08 UTC (Fri) by smurf (subscriber, #17840) [Link] (1 responses)

Not to mention the fact that you don't need two passes through your string.

Look, we all know that there's no magic bullet in programming. Different tools do different parts of the job well, others … not so much.

Fortunately, there are two quite simple workarounds for not having strlcpy in libc:

* add -lbsd to your GCC command line.

* #define strlcpy(d,n,s) snprintf((d),(n),"%s",(s))

Which of these is more efficient is left as an exercise to the reader. :-P

The ups and downs of strlcpy()

Posted Aug 24, 2013 23:57 UTC (Sat) by tjc (guest, #137) [Link]

> #define strlcpy(d,n,s) snprintf((d),(n),"%s",(s))

I just tried this (better late than never!), and parameters 2 and 3 are reversed. For the sake of posterity, it should be:

#define strlcpy(d,s,n) snprintf((d),(n),"%s",(s))
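Writing the same idea as a function rather than a macro avoids that kind of argument-order slip and lets the compiler check types. This is a sketch with the hypothetical name `strlcpy_compat`. Note that snprintf() returns int, so it still pays the cost of format parsing, and a source longer than INT_MAX would make it report an error instead of a length.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical strlcpy() stand-in built on snprintf(): same argument order
   as strlcpy(dst, src, size) and returns strlen(src), so ">= size" still
   means "truncated".  snprintf() writes at most size-1 chars plus a NUL. */
size_t strlcpy_compat(char *dst, const char *src, size_t size)
{
  int n = snprintf(dst, size, "%s", src);
  return n < 0 ? strlen(src) : (size_t)n;
}
```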

The ups and downs of strlcpy()

Posted Jul 20, 2012 12:13 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

The problem with your "just keep on trucking with garbage input" is that that case will be as little tested as the error handling code it replaces, and I'm not so sure it will be handled with any real care for wacky data by your average programmer.

The ups and downs of strlcpy()

Posted Jul 20, 2012 21:21 UTC (Fri) by epa (subscriber, #39769) [Link] (11 responses)

Silently truncating sounds like a nice way to deal with garbage input, but it creates security holes if not done consistently. Suppose one function uses char[300] for a domain name while another has char[200]. An attacker can pass a string of 250 characters where different initial substrings will be seen by the two functions. If the function doing validation truncates silently to 200 characters but other code uses a longer length, the attacker can sneak in nasty stuff after the first 200 chars.
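A toy illustration of this scenario: the validator works on a 200-byte copy while the consumer works on a 300-byte copy of the same 250-character input. The function names and the "no semicolon" validation rule are hypothetical; `trunc_copy` stands in for any silently truncating copy whose result is not checked.

```c
#include <stdbool.h>
#include <string.h>

/* Silently truncating copy; size must be > 0. */
static void trunc_copy(char *dst, const char *src, size_t size)
{
  size_t n = strlen(src);
  if (n >= size)
    n = size - 1;
  memcpy(dst, src, n);
  dst[n] = '\0';
}

/* Hypothetical validator: inspects only its own 200-byte copy. */
bool domain_is_safe(const char *input)
{
  char check[200];
  trunc_copy(check, input, sizeof check);
  return strchr(check, ';') == NULL;        /* sees at most 199 bytes */
}

/* Hypothetical consumer: works on a 300-byte copy of the same input. */
bool domain_used_contains_semicolon(const char *input)
{
  char use[300];
  trunc_copy(use, input, sizeof use);
  return strchr(use, ';') != NULL;          /* sees at most 299 bytes */
}
```

With a ';' at offset 220, the validator approves the input while the consumer still receives the ';'.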

The _s family of functions mentioned elsewhere in the discussion sound like the right approach. If you want truncation, and you've thought about the consequences, then you can ask for it. If you haven't thought about it and you don't explicitly check for the too-long error case, then the fail-safe behaviour is just to abort if this happens.

The ups and downs of strlcpy()

Posted Jul 20, 2012 21:36 UTC (Fri) by dlang (guest, #313) [Link] (10 responses)

_silently_ truncating is a problem.

truncating and returning an error saying that you did so leaves the error checking up to the programmer where it belongs.

sometimes programmers will not check error conditions properly, if they don't their software will have problems no matter what the library routines do.

If I have a service serving thousands of users per second, shutting the entire service down because someone entered a too-long string is unlikely to be what I want to have happen.

The ups and downs of strlcpy()

Posted Jul 20, 2012 23:48 UTC (Fri) by apoelstra (subscriber, #75205) [Link] (9 responses)

>If I have a service serving thousands of users per second, shutting the entire service down because someone entered a too-long string is unlikely to be what I want to have happen.

But if this happened, and it had never occurred to you that it -could- happen, better that the program crashes than to do something unpredictable.

On the other hand, if it did occur to you, you'd have put a check in.

The ups and downs of strlcpy()

Posted Jul 21, 2012 0:27 UTC (Sat) by dlang (guest, #313) [Link] (8 responses)

it all depends on what you are doing.

not everything is so critical that being protected against every possible exploit is the most important thing.

If you are running a game server, doing something "unexpected" for someone who puts in a 200 character name may be preferred to shutting down the game for everyone else.

It also depends on what the worst 'unexpected' thing that it could do is. If it's "gain a shell prompt on the server", it's a lot more significant than "put the wrong thing in a high score list".

You are exhibiting the biggest failing of security people, mistaking security as an end in and of itself as opposed to being a tool to support everything else. As a security person myself, it's a tendency that I trip over regularly in myself. Everything has a cost and sometimes the cost of something is higher than the thing it's preventing.

The ups and downs of strlcpy()

Posted Jul 23, 2012 20:12 UTC (Mon) by mmeehan (subscriber, #72524) [Link] (5 responses)

Most of the debate seems to center around a split in what contract developers feel they have with their libraries. The camps are:
* If I send garbage to a function it should fail immediately (abort). The state of my program is undefined.
* If I send garbage to a function it should deal with it somehow (truncate & null pad). No library should ever abort, because that may make my program crash when the error was survivable.
* If I send garbage to a function it's my own fault and my code should check pre and post conditions (silently do whatever). If you fail to do this you will have buffer overflows and get pwned.

I like how glib handles this (G_DISABLE_CHECKS and G_DISABLE_ASSERT). Normally function calls are made safe with g_assert and g_return_if_fail macros, but if you'd like to be unsafe (and slightly faster), you can disable them with compile-time options. By default you're safer security-wise, but you can remove the brakes if you desire.
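A rough sketch of that approach: a precondition macro that reports the failed condition and returns from the calling function, and that compiles away when a flag is defined. `CHECK_RETURN_VAL_IF_FAIL` and `MY_DISABLE_CHECKS` are hypothetical names that only imitate the idea behind g_return_val_if_fail() and G_DISABLE_CHECKS; they are not glib's definitions.

```c
#include <stdio.h>
#include <string.h>

#ifdef MY_DISABLE_CHECKS
#define CHECK_RETURN_VAL_IF_FAIL(expr, val) ((void)0)
#else
/* On failure: report the condition and return val from the caller. */
#define CHECK_RETURN_VAL_IF_FAIL(expr, val)                              \
  do {                                                                   \
    if (!(expr)) {                                                       \
      fprintf(stderr, "%s: assertion '%s' failed\n", __func__, #expr);   \
      return (val);                                                      \
    }                                                                    \
  } while (0)
#endif

/* Copy src into dst; with checks enabled, refuses (returns -1) rather
   than overflowing.  With MY_DISABLE_CHECKS the brakes are off. */
int copy_checked(char *dst, size_t size, const char *src)
{
  CHECK_RETURN_VAL_IF_FAIL(dst != NULL && src != NULL, -1);
  CHECK_RETURN_VAL_IF_FAIL(strlen(src) < size, -1);
  memcpy(dst, src, strlen(src) + 1);
  return 0;
}
```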

The ups and downs of strlcpy()

Posted Jul 25, 2012 19:13 UTC (Wed) by bronson (subscriber, #4806) [Link] (4 responses)

Agreed. On top of that, I don't think anybody's saying that strlcpy must always abort, just that it should be safe by default. If you don't want safety, no problem, turn it off and see what you get.

It would be nice if safety was always the default. Alas, libc (as standardized) only started thinking this way relatively recently.

The ups and downs of strlcpy()

Posted Jul 25, 2012 20:58 UTC (Wed) by smurf (subscriber, #17840) [Link] (3 responses)

Aborting is not "safe". Aborting is one of at least five ways to handle this particular error. Whether that is 'safe' depends on the context, i.e. your definition of that word.

The fundamental point is that you cannot know beforehand whether looking at the string twice is a [performance] problem, whether truncating (with or without fixing incomplete UTF-8 codes) is better than not starting to fill the buffer in the first place, whether calling abort() is a good idea (I'd say that if you are a library, it almost never is), whether to return something negative or the new length or the source length or …, and a host of related questions, all of which do not lend themselves to consensus answers. As this discussion shows quite clearly, IMHO.

My point is that, with the sole exception of leaving the destination buffer undisturbed when the source won't fit, any of the aforementioned behaviors can be implemented with a reasonably-trivial O(1) wrapper around strlcpy(). Therefore, keeping strlcpy() out of libc is … kindof stupid. Again, IMHO.

Instead, people are told to use strncpy(). Which they'll do incorrectly. Let's face it, running off the end of a string into la-la land is always worse than truncating it.
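As an example of the "reasonably-trivial wrapper" argument, here are two policies layered on a strlcpy()-like primitive: one reports truncation as an error, the other aborts. `copy_trunc`, `copy_or_error` and `copy_or_abort` are hypothetical names; `copy_trunc` reimplements the BSD strlcpy() contract so that the sketch is self-contained, and the choice of ERANGE is purely illustrative.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* strlcpy()-style primitive: truncates, NUL-terminates (size > 0),
   returns strlen(src). */
size_t copy_trunc(char *dst, const char *src, size_t size)
{
  size_t len = strlen(src);
  if (size > 0)
  {
    size_t n = len < size - 1 ? len : size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
  }
  return len;
}

/* Policy 1: truncation is an error (dst still holds a terminated prefix). */
int copy_or_error(char *dst, const char *src, size_t size)
{
  if (copy_trunc(dst, src, size) >= size)
  {
    errno = ERANGE;
    return -1;
  }
  return 0;
}

/* Policy 2: truncation is fatal. */
void copy_or_abort(char *dst, const char *src, size_t size)
{
  if (copy_trunc(dst, src, size) >= size)
    abort();
}
```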

The ups and downs of strlcpy()

Posted Jul 26, 2012 1:28 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (2 responses)

Here's a suggestion (only partly sarcastic):
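#include <stdlib.h>
#include <string.h>

typedef size_t (*strxcpy_handler_t)(char *dst, const char *src, size_t size, void *data);

size_t strxcpy(char *dst, const char *src, size_t size, strxcpy_handler_t overflow_fn, void *overflow_data)
{
  size_t i;

  for (i = 0; src[i] != '\0'; ++i)
  {
    /* need room for this byte *and* the terminating NUL */
    if (i + 1 >= size)
    {
      return overflow_fn(dst, src, size, overflow_data);
    }

    dst[i] = src[i];
  }

  if (size == 0)   /* empty src, but not even the NUL fits */
    return overflow_fn(dst, src, size, overflow_data);

  /* get here only if strlen(src) < size */
  dst[i] = '\0';
  return i;        /* strlen(src), as strlcpy() returns */
}

/* On overflow: terminate what fits, return strlen(src) like strlcpy(). */
size_t strxcpy_truncate(char *dst, const char *src, size_t size, void *data)
{
  (void)data;
  if (size == 0) abort();
  dst[size - 1] = '\0';
  return strlen(src);
}

size_t strxcpy_abort(char *dst, const char *src, size_t size, void *data)
{
  (void)dst; (void)src; (void)data;
  abort();
  return size;     /* not reached */
}

/* One possible substitution handler: store the string passed via data. */
size_t strxcpy_subst(char *dst, const char *src, size_t size, void *data)
{
  const char *subst = data;
  (void)src;
  if (size == 0) abort();
  strncpy(dst, subst, size - 1);
  dst[size - 1] = '\0';
  return strlen(subst);
}

void strxcpy_examples(char *dst, const char *src, size_t dst_size)
{
  if (strxcpy(dst, src, dst_size, strxcpy_truncate, NULL) >= dst_size) { /* truncated */ }
  (void)strxcpy(dst, src, dst_size, strxcpy_abort, NULL);
  (void)strxcpy(dst, src, dst_size, strxcpy_subst, "(input too long)");
  /* ... */
}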

The ups and downs of strlcpy()

Posted Jul 26, 2012 8:53 UTC (Thu) by renox (guest, #23785) [Link] (1 responses)

I'm not sure what the point of the wrapper function is; calling strxcpy_abort, strxcpy_truncate, etc. directly would be simpler.

That said, one size doesn't fit all so having different function is reasonable, the biggest issue is that there is no sane default behaviour..

The ups and downs of strlcpy()

Posted Jul 26, 2012 16:27 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

The strxcpy function isn't just a wrapper; it does all of the real work. The strxcpy_abort, strxcpy_truncate functions only run when an overflow condition is detected. This allows you to substitute your own preferred method of error-handling.

This is actually rather similar to the way exceptions are handled in Common Lisp or Scheme programs, except that the Lisp version would use dynamic variables rather than explicit arguments for the handler code, which results in less cluttered code.

(define (default-error-handler error-value) (abort))
(define current-error-handler (make-parameter default-error-handler))

(define (do-something)
  (... (if (ok? var) var ((current-error-handler) var)) ... ))

; aborts on error
(do-something)

; evaluates to #t on success, or #f on error
(let/cc return
  (parameterize ([current-error-handler (lambda _ (return #f))])
    (do-something)
    #t))

; uses "value" in place of var on error
(parameterize ([current-error-handler (lambda _ value)])
  (do-something))

Scheme-style parameters are attached to the current continuation, meaning that they're not only thread-safe, but that the bindings only affect the inner dynamic scope of the (parameterize) form, even in exceptional cases such as non-local return (like the middle example above) and even re-entry into a dynamic scope which was previously exited.

The ups and downs of strlcpy()

Posted Jul 25, 2012 12:30 UTC (Wed) by NAR (subscriber, #1313) [Link] (1 responses)

Why would aborting a single process shut down the whole gaming server? I mean if you have a sufficiently large and complicated software, there will be terminal bugs there that you have to handle (e.g. have a lightweight supervisor process that starts a separate server for each new client). But if you already handle these problems, then you might as well crash on purpose in the worker processes.

The ups and downs of strlcpy()

Posted Jul 26, 2012 11:51 UTC (Thu) by hppnq (guest, #14462) [Link]

It would not make sense to handle the relatively complex case of failing modules correctly but not string manipulation routines.

The ups and downs of strlcpy()

Posted Jul 19, 2012 7:40 UTC (Thu) by paulj (subscriber, #341) [Link] (9 responses)

As the last paragraph hints at, I'd hope anyone let out of programmer kindergarten to work on real code knows to hide away and abstract the bothersome C str* functions behind a more sane string API that tracks sizes.
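A minimal sketch of what "a more sane string API that tracks sizes" might look like: a buffer that carries its capacity and current length, with appends that either fit completely or fail without touching memory past the end. `struct sbuf` and its functions are hypothetical, not any particular library's API.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical size-tracking string buffer over caller-provided storage. */
struct sbuf {
  char  *data;   /* always NUL-terminated when cap > 0 */
  size_t len;    /* bytes in use, excluding the NUL */
  size_t cap;    /* total bytes available, including the NUL */
};

void sbuf_init(struct sbuf *s, char *storage, size_t cap)
{
  s->data = storage;
  s->cap = cap;
  s->len = 0;
  if (cap > 0)
    storage[0] = '\0';
}

/* Append src; if it does not fit, leave the buffer unchanged and fail. */
bool sbuf_append(struct sbuf *s, const char *src)
{
  size_t n = strlen(src);
  if (s->cap == 0 || n > s->cap - 1 - s->len)
    return false;
  memcpy(s->data + s->len, src, n + 1);   /* includes the terminator */
  s->len += n;
  return true;
}
```

Because the storage is supplied by the caller, a design like this does not require dynamic allocation.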

The ups and downs of strlcpy()

Posted Jul 19, 2012 9:36 UTC (Thu) by renox (guest, #23785) [Link] (2 responses)

From my experience, your hopes are not realistic *at all*!

That's why I find the rejection of strlcpy *stupid*: it's not perfect but it's much better than strcpy or strncpy, and it's a step in the right direction for the many projects which don't use a library for strings handling.

Oh, and the "good" code in the article, which uses strlen before strcpy, is bad from a performance point of view, especially when the strings are bigger than the cache.
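One way to get a bounded copy in a single pass over the source is POSIX memccpy(), which copies up to and including the first NUL but never more than size bytes. Unlike strlcpy(), it does not go on to measure the rest of an over-long source. A sketch, with `copy_one_pass` as a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700
#include <string.h>

/* Single-pass bounded copy: memccpy() stops right after copying the NUL,
   or after size bytes if no NUL was seen.  Returns 0 on success, -1 if
   src did not fit (dst is then cut and terminated at size-1).
   size must be > 0. */
int copy_one_pass(char *dst, const char *src, size_t size)
{
  if (memccpy(dst, src, '\0', size) != NULL)
    return 0;
  dst[size - 1] = '\0';
  return -1;
}
```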

The ups and downs of strlcpy()

Posted Jul 19, 2012 18:53 UTC (Thu) by Fats (guest, #14882) [Link]

> That's why I find the rejection of strlcpy *stupid*: it's not perfect but it's much better than strcpy or strncpy, and it's a step in the right direction for the many projects which don't use a library for strings handling.

+1
I am probably missing some strange security twist in my brain, but IMHO it's not the glibc maintainers' task to tell their users what not to use unless it has been proven bad (in this case, not better than strcpy/strncpy).
They should just look at whether these functions add something to their library, and I do think they do. That people can and probably will still misuse these functions is IMHO not a good reason to keep them out of the library. And we are talking about only a few bytes of code.
Sometimes I can really relate to Linus's comments on the security purists (avoiding the 4-letter word here).

greets,
Staf.

The ups and downs of strlcpy()

Posted Jul 20, 2012 22:18 UTC (Fri) by gch (guest, #63880)

It seems it is so common to call strlen() and then another str* function that a new optimization pass was added to GCC 4.7 to keep track of string lengths and thus avoid redundant passes over strings.
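For comparison, a guarded copy can measure the source once and reuse that length with memcpy(). This is roughly the rewrite that GCC's string-length pass (controlled, as far as I know, by `-foptimize-strlen`) may perform automatically on a strlen()/strcpy() pair. The function name `copy_measured()` is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical guarded copy that scans src once and reuses the length,
 * instead of letting strcpy() walk the string a second time. */
static bool copy_measured(char *dst, const char *src, size_t size)
{
    size_t n = strlen(src);
    if (n >= size)
        return false;             /* would not fit with its terminator */
    memcpy(dst, src, n + 1);      /* n bytes plus the NUL */
    return true;
}
```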

See gcc 4.7 release notes:
http://gcc.gnu.org/gcc-4.7/changes.html

The ups and downs of strlcpy()

Posted Jul 19, 2012 9:46 UTC (Thu) by etienne (guest, #25256)

> [leave] programmer kindergarten [and use] a more sane string API that tracks sizes.

And obviously lose the use of any string whatsoever in a place where you cannot allocate memory.
And lose the ability to use multi-lingual constant strings, whose size can never exceed the total size of your code; I mean:
const char *error_mlstr = "error\0erreur\0erro\0Ошибка\0";
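To make the packed-string idea concrete, here is a minimal sketch of how code might select one entry from such a string without any allocation or pointer table. The helper name `ml_lookup()` is hypothetical, and it assumes the caller passes an index smaller than the number of entries.

```c
#include <string.h>

/* Hypothetical lookup in a packed string such as "error\0erreur\0...":
 * skip over n NUL-terminated entries and return the next one.
 * The caller must pass n smaller than the number of entries. */
static const char *ml_lookup(const char *packed, unsigned n)
{
    while (n-- > 0)
        packed += strlen(packed) + 1;  /* step past one entry and its NUL */
    return packed;
}
```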
But the main problem anyway is that strlcpy() does not even try to handle UTF-8; cutting a string in the middle of a multibyte character may create bigger security problems.

The ups and downs of strlcpy()

Posted Jul 19, 2012 15:49 UTC (Thu) by smurf (subscriber, #17840)

There are easy ways around that:

* if you reallocate the buffer anyway, or if your program does not care about the character set, this is not a problem.

* if your program blindly assumes that its input is valid UTF-8, don't bother – you're going to fail anyway.

* otherwise, a wrapper which NULs out an incomplete UTF-8 character at the end of your buffer is ten lines of C and left as an exercise for the reader. ;-)
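One way that "exercise" might be solved, as a sketch: after any truncating copy has produced a NUL-terminated buffer, the hypothetical helper `utf8_trim_partial()` removes a trailing UTF-8 sequence that was cut short. It assumes the input was otherwise valid UTF-8 and does not try to repair other encoding errors.

```c
#include <stddef.h>
#include <string.h>

/* Expected length of a UTF-8 sequence given its lead byte;
 * 0 for a continuation byte or an invalid lead byte. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical helper: if buf (already NUL-terminated) ends with an
 * incomplete UTF-8 sequence, overwrite that sequence's lead byte with NUL. */
static void utf8_trim_partial(char *buf)
{
    size_t len = strlen(buf);
    size_t j = len;

    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (j > 0 && len - j < 3 && ((unsigned char)buf[j - 1] & 0xC0) == 0x80)
        j--;
    if (j == 0)
        return;  /* empty string, or no lead byte found: leave it alone */

    j--;  /* buf[j] should now be the lead byte of the last character */
    size_t need = utf8_seq_len((unsigned char)buf[j]);
    if (need > 1 && len - j < need)
        buf[j] = '\0';  /* the sequence was cut short: drop it */
}
```

For example, copying "ab" followed by a two-byte Cyrillic letter into a 4-byte buffer with snprintf() keeps only the first byte of that letter; the helper then cuts the buffer back to "ab".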

The ups and downs of strlcpy()

Posted Jul 19, 2012 22:10 UTC (Thu) by nix (subscriber, #2304)

> And obviously lose the use of any string whatsoever in a place where you cannot allocate memory.
Places where you cannot allocate memory are vanishingly rare (except in OOM situations, where the only sane thing to do is to terminate the process and let a parent deal with it). It is not worth crippling the string API just for this.
> const char *error_mlstr = "error\0erreur\0erro\0Ошибка\0";
And for this, you want a string table abstraction. C gives you all the tools you need to write proper ADTs; why do so many C programmers persist in trying to do everything without such help?

The ups and downs of strlcpy()

Posted Jul 20, 2012 9:56 UTC (Fri) by etienne (guest, #25256)

>> const char *error_mlstr = "error\0erreur\0erro\0Ошибка\0";
>And for this, you want a string table abstraction. C gives you all the tools you need to write proper ADTs; why do so many C programmers persist in trying to do everything without such help?

And replace three- to eight-byte strings with arrays of eight-byte pointers?
Some ways to write code wants you to use small strings:
cout << "The " << (pet ? "cat" : "dog") << " " << (nb > 1 ? "are" : "is") << " black.";

> Places where you cannot allocate memory are vanishingly rare

Places where you need non-standard memory allocation (fail if the allocation would sleep; allocate as virtual or physical memory; fail if the allocation is obviously too big, such as 10 MB for a string; force a stack allocation) may not be so rare.

The ups and downs of strlcpy()

Posted Jul 20, 2012 12:59 UTC (Fri) by nix (subscriber, #2304)

> And replace three- to eight-byte strings with arrays of eight-byte pointers?
No. I'd expect such an abstraction to return a struct (for information-hiding purposes) which has one member, an offset into the string table. No pointers needed.
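A minimal sketch of the kind of abstraction nix describes, assuming a single constant table and a small integer offset as the only member of the handle. The names `msg_table`, `msg_id`, `MSG_ERROR_EN`, `MSG_ERROR_FR`, and `msg_str()` are hypothetical, and the offsets are written out by hand here, whereas a real project would more likely generate them.

```c
/* All messages live in one constant array, separated by NULs. */
static const char msg_table[] = "error\0erreur\0";

/* Opaque handle: its only member is an offset into msg_table,
 * so no pointer table is needed. */
typedef struct { unsigned short off; } msg_id;

static const msg_id MSG_ERROR_EN = { 0 };  /* "error" occupies bytes 0..5 */
static const msg_id MSG_ERROR_FR = { 6 };  /* "erreur" starts after that NUL */

/* Resolve a handle to its NUL-terminated text. */
static const char *msg_str(msg_id m)
{
    return msg_table + m.off;
}
```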
> Places where you need non-standard memory allocation (fail if the allocation would sleep; allocate as virtual or physical memory; fail if the allocation is obviously too big, such as 10 MB for a string; force a stack allocation) may not be so rare.
To a first approximation these are all things that are only going to happen in kernel coding. If you're writing kernel code I expect you to be smart enough to use the language you're writing in, or at the very least to have appropriate abstractions that can be told things like 'do not allocate now' (and indeed the kernel's various internal abstractions can be told just this).

However, most people are not kernel programmers, and don't operate under such harsh constraints. For them, there's no excuse to not use appropriate abstractions other than a pointlessly minimalist C coding style more appropriate for the tiny systems of the 1970s than for now.

The ups and downs of strlcpy()

Posted Jul 19, 2012 17:18 UTC (Thu) by Aliasundercover (subscriber, #69009)

Managed strings in allocated memory come with their own baggage. Often allocated memory is not consistent with other important goals or constraints of the program. Programs which use managed string libraries must often deal with classic strings at their boundaries for communication or storage.

Pushing people to managed strings is not a legitimate goal of C library maintenance. It is reasonable to wish that the library maintainers would not treat the C library as a kindergarten, declining useful, simple advances in order to push people into another language or library.

The ups and downs of strlcpy()

Posted Jul 19, 2012 15:27 UTC (Thu) by walters (subscriber, #7396)

But note that if you're programming in C, you also have the option to skip the crappy variants of the C string concatenation routines entirely and use, e.g.:
http://git.gnome.org/browse/glib/tree/glib/gstring.h

Data structures like that make it *hard* to do things incorrectly, and are quite efficient enough.

There's very little reason not to use a framework like GLib, APR, or the Samba stuff.

The ups and downs of strlcpy()

Posted Jul 19, 2012 15:40 UTC (Thu) by msbrown (guest, #38262)

First, an apology to Michael; he sent me (as a member of the joint POSIX/ISO/SUS working group that made the decision not to include strl*()) a request for more information on the decision -- but his email came to me while I was in a set of meetings and I did not respond in time for this article.

At the time that strl*() was being proposed, the *_s() routines were being discussed in ISO C. As POSIX (and SUS) defer to ISO C in places where there are conflicts, this was one reason not to add strl*().

Additionally, the glibc participants pointed out that using strl*() *correctly* typically required more lines of code than coding the operation from scratch, as noted in Paul Eggert's observations.
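To show what "correct" use involves, here is a minimal sketch of the strlcpy() contract (copy at most size - 1 bytes, NUL-terminate when size is nonzero, return strlen(src)) under the hypothetical name `my_strlcpy()`, so it cannot clash with a C library that provides its own strlcpy(). It is followed by a hypothetical caller, `set_name()`, that checks the return value; the error policy there is an assumption, and the point is that the caller still has to write both the check and the handling of truncation.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the BSD strlcpy() contract under a hypothetical name. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);
    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;   /* length the caller would have needed, minus the NUL */
}

/* Hypothetical caller: truncation is detected by comparing the return
 * value with the buffer size, and treated here as an error. */
static int set_name(char *buf, size_t bufsize, const char *name)
{
    if (my_strlcpy(buf, name, bufsize) >= bufsize)
        return -1;   /* truncated: do not use the partial name */
    return 0;
}
```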

The result was that there was no consensus in the WG for adding the functions at the time the 2008 specification was being worked.

The ups and downs of strlcpy()

Posted Jul 20, 2012 0:16 UTC (Fri) by ewen (subscriber, #4772)

Apparently not mentioned in the article is that strncpy() and friends will not only NUL-terminate the string (so long as there is room), they will also NUL-fill the entire rest of the buffer, doing much more work than is likely to be required if the destination is big. strlcpy() and friends just add the one NUL to make a valid string. So, performance-wise, you can safely use it in place of strncpy() all the time.
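The strncpy() behavior described above can be seen in a small sketch; `strncpy_demo()` and the buffer sizes are arbitrary choices for illustration.

```c
#include <string.h>

/* A short source in a big buffer: strncpy() pads the whole remainder
 * with NULs. A source that fills the buffer: no terminator at all. */
static void strncpy_demo(char big[64], char small[4])
{
    strncpy(big, "hi", 64);       /* writes 'h', 'i', then 62 NUL bytes */
    strncpy(small, "hello", 4);   /* writes "hell" with no terminating NUL */
}
```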

IMHO (and that of the *BSD folks), strlcpy() is the least bad trade off of performance and reliability of all the variations. It's most unfortunate that "just use strlen()" and other slow suggestions keep coming up as a viable alternative -- if performance is that little a concern that you can scan the source string twice, you might as well just use a language with managed strings anyway.

This seems to be a "perfect is the enemy of the good" situation, in a problem space where there is no "safe, fast, easy -- pick any three" solution.

Ewen

The ups and downs of strlcpy()

Posted Jul 20, 2012 17:39 UTC (Fri) by bronson (subscriber, #4806)

Great article! Balanced coverage, flows well, and the final sentence is gold.

Amazing that it's been over a decade and the strl* calls are still under discussion.

Another

Posted Jul 21, 2012 3:08 UTC (Sat) by ncm (guest, #165)

Where I work we use

int str_to(char* dest, char const* src, size_t bufsiz, size_t *off);

It returns zero if the copy succeeds without truncating. If off is non-NULL, then it starts copying at dest + *off, and writes the lesser of (*off + strlen(src)) and (bufsiz - 1) into *off. If it's obliged to truncate, it returns the number of bytes it abandoned.

If you are calling it just once, you pass NULL to off. But if you're appending a series of strings, you initialize "size_t end = 0;" first, and pass &end to each subsequent call. All the calls get the same dest, the same bufsiz, and the same off.

It's not perfect. Forgetting to (re-)initialize end is easy. It would be better if checking only the final result of a sequence of calls sufficed; as it is, if *src is empty in the final call, you won't know whether one of the previous calls was obliged to truncate. But C is limited.
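ncm's actual code was not posted, so the following is only a reconstruction from the description above. The treatment of `bufsiz == 0` and of an `*off` that is already past the end of the buffer are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Reconstruction of str_to() from its description (not the original).
 * Returns 0 if src fit, otherwise the number of bytes of src dropped. */
int str_to(char *dest, char const *src, size_t bufsiz, size_t *off)
{
    size_t start = off ? *off : 0;
    size_t srclen = strlen(src);

    if (bufsiz == 0)
        return (int)srclen;          /* assumption: no room even for a NUL */
    if (start > bufsiz - 1)
        start = bufsiz - 1;          /* assumption: clamp a bad offset */

    size_t room = bufsiz - 1 - start;          /* bytes before the NUL */
    size_t n = srclen < room ? srclen : room;  /* bytes actually copied */

    memcpy(dest + start, src, n);
    dest[start + n] = '\0';
    if (off)
        *off = start + n;            /* lesser of *off + strlen(src) and bufsiz - 1 */
    return (int)(srclen - n);        /* 0 when nothing was truncated */
}
```

Used as described, with `size_t end = 0;` passed as `&end` to each call, appending "abc" and then "defgh" to an 8-byte buffer returns 0 and then 1, leaving "abcdefg" in the buffer and `end == 7`.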

Often snprintf is a better choice.
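As a sketch of the snprintf() approach: formatting all the pieces in one call means a single return value covers the whole sequence, so only the final result needs checking. The helper name `join3()` is hypothetical; the rule it relies on (snprintf() returns the length the output would have had, excluding the NUL, or a negative value on error) is standard C99 behavior.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: concatenate three strings into buf with one
 * snprintf() call; returns false on an encoding error or truncation. */
static bool join3(char *buf, size_t size,
                  const char *a, const char *b, const char *c)
{
    int n = snprintf(buf, size, "%s%s%s", a, b, c);
    return n >= 0 && (size_t)n < size;
}
```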

Another

Posted Aug 26, 2013 1:18 UTC (Mon) by hummassa (guest, #307)

I had
const char *stringcopy(char *dst, size_t size, const char *orig)
and it was used like:
for( const char *neworig = stringcopy(dst, size, orig); neworig; neworig = stringcopy(dst, size, neworig) ) {
  // do something with dst buffer of size
}
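The comment does not spell out what stringcopy() returns, so here is one guess at semantics that let the loop above process every chunk: copy up to `size - 1` bytes, NUL-terminate, return a pointer to the unconsumed rest of `orig`, and return NULL once nothing is left. The `size < 2` guard is an added assumption that keeps the loop from spinning forever when no progress is possible.

```c
#include <stddef.h>

/* One possible implementation matching the loop above (semantics assumed). */
const char *stringcopy(char *dst, size_t size, const char *orig)
{
    if (size < 2 || *orig == '\0')
        return NULL;   /* nothing left, or no room for one byte plus NUL */
    size_t i = 0;
    while (i < size - 1 && orig[i] != '\0') {
        dst[i] = orig[i];
        i++;
    }
    dst[i] = '\0';
    return orig + i;   /* may point at "", which the next call maps to NULL */
}
```

With a 4-byte dst and the input "abcdefg", the loop body sees "abc", "def", and "g", and then the loop ends.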

The ups and downs of strlcpy()

Posted Nov 4, 2013 20:07 UTC (Mon) by rmongiovi (guest, #93769)

Let me see if I've got this straight. Linux doesn't get strlcpy, which would be quite useful for code portability, because it could cause you to silently lose data?

Well, golly. I guess we'd better undefine snprintf then.

Or maybe we just ought to admit that even though they aren't perfect, having a consistent suite of routines that work with strings and have a maximum length parameter is a useful thing.

No one is forcing anyone to use these calls. And no one is preventing anyone from checking for truncation after the call. But I will go on record as saying that it's effing annoying to have someone decide for me what's good for me. I'm a grownup who can make that decision for myself.....

The ups and downs of strlcpy()

Posted Apr 28, 2014 16:56 UTC (Mon) by mirabilos (subscriber, #84359)

“apparently for similar reasons to their rejection from glibc”

Well, looking at the list of POSIX 2008 authors, half of them look like a who’s who of glibc developers…

The ups and downs of strlcpy()

Posted May 1, 2014 18:32 UTC (Thu) by netghost (guest, #54048)

You wrote such a long article on the "ups" and "downs" to justify strlcpy() not going into glibc, and the ONLY "down" side you point out in the article is "if you do not check the return value of strlcpy(), then you can lose data"? Sorry, this seems a very far-fetched argument to me, since how many C library functions do EXACTLY what you want them to do WITHOUT checking the return value and handling the error accordingly?


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds