Linux and Glibc API Changes

AceJohnny2 · on May 6, 2021

I note a lot of references to LWN.net articles, most of them written by Jon Corbet.

As a longtime subscriber of LWN, I can say the man is a treasure to the Linux community (and industry!).

itamarst · on May 5, 2021

"The Linux Programming Interface" book referenced in the intro is an amazingly useful book if you do Linux (or POSIX) system programming, massive amount of detailed and clearly explained information.

https://man7.org/tlpi/

I do hope there's a new edition eventually, but as he says the core is still the same.

froh · on May 5, 2021

are there any interface removals or backwards incompatible glibc and syscall changes? to my understanding kernel and glibc manage to be binary compatible since iirc the switch to posix threads?

if so that would be glibc and kernel abi additions only?

JoshTriplett · on May 5, 2021

The Linux kernel very occasionally removes interfaces, but generally only via "this turns out to have been broken for years and nobody has been using it".

glibc does deprecate and remove things, but it uses very careful symbol versioning, such that code compiled against previous versions of glibc continues to run but new code can't use those interfaces. It's a rare example of being ABI-compatible but not API-compatible.

anitil · on May 6, 2021

Could you go in a bit more about how this is done?

The only way I can think of this working is if the symbols are exported in the lib file but not exposed in a header. Is that what you mean?

JoshTriplett · on May 6, 2021

That's part of it. The other half is that whenever you link the library, you get the current version of each symbol, so it's possible to change the behavior of a new version while preserving a bug-compatible previous version. There's an extensive mapping that provides the symbols for ranges of versions.
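
To make the versioned-symbol idea concrete, here is a minimal sketch that asks the dynamic loader for one specific version of a symbol through dlvsym(), a GNU extension. The helper name lookup_versioned is made up for illustration, and the version strings you pass are assumptions that differ per architecture and glibc build.

```c
#define _GNU_SOURCE /* dlvsym() is a GNU extension */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the address of one specific version of a
 * symbol, or NULL if no loaded object provides `name` at `version`.
 */
void *lookup_versioned(const char *name, const char *version)
{
    assert(name != NULL && version != NULL);
    return dlvsym(RTLD_DEFAULT, name, version);
}
```

On x86-64 glibc, for instance, lookup_versioned("memcpy", "GLIBC_2.2.5") and lookup_versioned("memcpy", "GLIBC_2.14") should both resolve, since an old binary keeps binding to the older version while newly linked code gets the newer one. Older glibc releases may need -ldl at link time.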

lathiat · on May 6, 2021

That's basically it, this might help: http://peeterjoot.com/2019/09/20/an-example-of-linux-glibc-s...

rwmj · on May 5, 2021

glibc APIs are occasionally deprecated, usually because they are broken or dangerous. A well-known example is gets[1] which is impossible to use without introducing a buffer overflow. Less well-known ones include readdir_r, sigpause, register_printf_function.
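
For comparison, a minimal sketch of the usual bounded replacement for gets(), built on fgets(). The helper name read_line is invented here; the point is only that the caller states the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line into buf (capacity `size`, including the terminating NUL).
 * Returns 0 on success, -1 on EOF or error. Input longer than the buffer
 * stays in the stream for the next call instead of overflowing buf.
 * Assumes size fits in an int, as fgets() requires.
 */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline, if any */
    return 0;
}
```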

APIs are occasionally broken too, although that would be a bug.

At least one API is known buggy and hasn't been fixed: fts_open. It only works properly if used on 32 bit architectures. [Edit: I just noticed that glibc got around to fixing this - yay!]

glibc+Linux is not entirely conforming to POSIX - threads being a good example where it differs in some significant respects from what POSIX requires.

Sometimes POSIX itself isn't well defined. Until very recently it was not properly specified if a file descriptor is closed if close(2) returns an error. Some POSIX systems closed it, some didn't, some closed it on some errors but not on others; and most programs wouldn't retry the close on error so would leak the fd. It has since been changed so that the fd is always closed even in the error case.
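
A small sketch of the "never retry close()" pattern, assuming the Linux behavior documented in close(2), where the descriptor is released even if the call reports an error. The name close_once is hypothetical, and treating EINTR as success is a choice of this sketch, not something POSIX mandates.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Close fd exactly once. On Linux the descriptor is gone even when close()
 * fails, so a retry could close an unrelated descriptor that another thread
 * was just handed with the same number.
 */
int close_once(int fd)
{
    assert(fd >= 0);
    if (close(fd) == -1 && errno != EINTR)
        return -1; /* real error (EBADF, EIO, ...): report it, do not retry */
    return 0;      /* closed, or interrupted after the fd was released */
}
```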

[1] https://www.man7.org/linux/man-pages/man3/gets.3.html

cesarb · on May 5, 2021

> A well-known example is gets[1] which is impossible to use without introducing a buffer overflow.

Actually, it is possible to use gets() safely, under a sufficiently contrived set of circumstances. Since it reads from stdin, you just have to make sure that stdin reads from a pipe which has its write end under the control of the same process, and be careful to only ever write a limited amount of data to that pipe, smaller than the buffer of any gets() call in the process.

jcranmer · on May 5, 2021

Fun fact: the linker complains at you when you try to use gets. It makes it rather annoying when you're trying to use it in a testsuite making sure your tool can handle some idiot using gets correctly...

zlynx · on May 5, 2021

I like to nitpick and point out gets() can be used safely, as a stunt.

Memory map a read/write page and after that memory map a no-permissions guard page. Now you can safely use gets() to read a page size string without allowing a buffer overflow.
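
A rough sketch of just the mapping part of that stunt, assuming Linux-style anonymous mmap(). The function name map_guarded_page is invented for illustration. It deliberately does not call gets(), which as far as I know recent glibc headers no longer declare in C11 and later modes.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one read/write page followed directly by one inaccessible guard page.
 * Returns the start of the writable page (or NULL) and stores the page size.
 */
char *map_guarded_page(size_t *page_size)
{
    assert(page_size != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;

    /* Reserve both pages as PROT_NONE, then open up only the first one. */
    char *base = mmap(NULL, 2 * (size_t)ps, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base, (size_t)ps, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, 2 * (size_t)ps);
        return NULL;
    }
    *page_size = (size_t)ps;
    return base;
}
```

A write that runs one byte past the first page then faults on the guard page instead of silently corrupting a neighbor.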

kevincox · on May 5, 2021

Does gets() guarantee that it will write its output in order? If not it could in theory write after your guard page before touching the guard page itself. Of course I don't know if either the kernel or glibc would ever do this.

I think the only safe way to use gets() is with trusted input.

Arnavion · on May 5, 2021

Why does the order matter? It'll only write to the guard page if the input string is long enough to necessitate it, in which case it was going to fault anyway regardless of which page it touched first.

Edit: I guess you're considering "used safely" to include reading a truncated string, in which case writing in order would allow the program to be written such that it recovers from the fault and reads the valid page-worth of string.

kelnos · on May 5, 2021

I think the parent meant that if your string was longer than the size of the read/write page plus the size of the guard page, and if gets() is allowed to, say, write the string from the end to the beginning (I think this is unlikely, but let's say it could), then it would try to write the last character (first) all the way out beyond your guard page, possibly scribbling on some memory that the application had allocated for something else.

tsimionescu · on May 5, 2021

If it can write past the guard page, even if your program faults afterwards, it could have already compromised the larger system. Not claiming that it can, just entertaining the what-if.

jmgao · on May 5, 2021

You could hypothetically have a situation where libc has an arbitrarily large internal FILE* buffer (instead of reading a block, looking for a newline, and copying everything over immediately), and then copies in reverse, corrupting data after the guard page before it hits the guard page.

If there are other threads accessing data that happens to be placed after the guard page, bad things could happen, but this seems rather unlikely to be a real problem.

Arnavion · on May 5, 2021

Ah, right. >2 pages-worth of input would break the scheme.

gpm · on May 5, 2021

Or if you own both sides of stdin...

Denvercoder9 · on May 5, 2021

If we're nitpicking, doesn't this technically still allow a buffer overflow, and just negate the consequences of it?

pjmlp · on May 5, 2021

Actually gets() was deprecated in ISO C11.

einpoklum · on May 5, 2021

... but this does not apply retroactively; nor are you required to use C11 to interface with libc. So maybe this will be impactful in, oh, 20-30 years from now? :-(

pjmlp · on May 6, 2021

Whatever, the point being that this wasn't a decision from glibc developers.

As for what C version one is required to use alongside libc, UNIX is C's platform, on other OSes libc is part of the compiler not OS.

Also a post C11 compliant compiler is not required to still provide gets() in their libc for the case you use C89/C99 language mode.

mhitza · on May 5, 2021

> are there any interface removals or backwards incompatible glib

Yes https://abi-laboratory.pro/?view=timeline&l=glibc

arunc · on May 5, 2021

glibc breaks ABI quite often. Linus has openly ranted about it in the past: https://www.youtube.com/watch?v=Pzl1B7nB9Kc

Notable quote from that: If there's a bug that people rely on, it's not a bug, it's a feature.

wahern · on May 5, 2021

Linux famously removed the sysctl syscall (the original, BSD-derived syscall version of /proc). It was justified because distros had already removed it. The removal was a huge API breakage and even broke security sensitive software, like Tor, for countless deployed systems. But because the distros removed it first (RedHat, specifically), Linus got to claim that "nobody was using it" and was shielded from the fallout.

Otherwise, both the kernel and glibc regularly break things accidentally. You rarely hear about it, though, because it's the nature of software development that the areas most likely to be broken are those where people rarely lurk. glibc makes at least as much effort as Linux in terms of supporting backward compatibility, but glibc's job is in some ways much more difficult, and they have far fewer contributors to help out. There's no shortage of bugs in glibc, and I have plenty of my own gripes, but by the standards of the industry (particularly of FOSS), they do an outstanding job of maintaining ABI compatibility.

Once upon a time people would claim that glibc's efforts were feeble as compared to proprietary OSs like Solaris, AIX, or Windows. But these days those backward compat stories are far more complex and less pristine, and glibc has well over a decade (or two decades?) of using ELF symbol versioning to maintain compat.

Denvercoder9 · on May 5, 2021

> The removal was a huge API breakage and even broke security sensitive software, like Tor, for countless deployed systems.

Honestly, I'd say that is on them. Its use has been discouraged since basically forever (it has been noted in all-caps in the manpage since at least 2001), the kernel started complaining about its usage in Linux 2.6.24, which was released in January 2008, and it finally disappeared in Linux 5.5, released in January 2020. That's a two-decade deprecation period.

wahern · on May 6, 2021

Sure[1], but it was nonetheless a backward break that caused substantial trouble. I'm only trying to push back on the claim that Linux has a pristine and principled record in this regard, not that the removal wasn't reasonable for Linux. Linux can make certain claims because distros make many of the hard decisions for them. If projects were fronting glibc (like eglibc for awhile), glibc might be able to make similar claims resting on technicalities.

Also, the removal of sysctl by distros took away a facility, descriptor-less kernel entropy consumption via sysctl+RANDOM_UUID, that wouldn't be restored until getrandom was added many years later. Until then jail'd processes (or other code that couldn't make too many assumptions about its environment) had no easy way to seed their RNGs. Indeed, it likely created many unknown security issues that have [hopefully] been accidentally fixed with the adoption of getrandom by various libraries.
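
For reference, a minimal sketch of descriptor-less seeding with getrandom(2), assuming glibc 2.25 or later for the wrapper in <sys/random.h>. The helper name fill_seed is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf with len bytes of kernel entropy without opening any file. */
int fill_seed(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t done = 0;
    while (done < len) {
        ssize_t n = getrandom(buf + done, len - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: try again */
            return -1;    /* e.g. ENOSYS on kernels without getrandom */
        }
        done += (size_t)n;
    }
    return 0;
}
```

Because nothing is opened, this works in a chroot or jail with no /dev or /proc, which is the gap the sysctl+RANDOM_UUID trick used to fill.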

To this day Linux is still resolving issues and dilemmas caused by the removal of sysctl. There are many scenarios where /proc can't and shouldn't be accessible. (In most of those scenarios sysctl shouldn't be accessible, either, but especially since the addition of seccomp BPF it's easier to filter scalar syscall arguments than /proc opens.)

[1] Though, I don't remember any man page warning prior to 2008. (Or after, for that matter. I just remember the dmesg warnings, which because of the aforementioned dilemma regarding /proc put you between a rock and a hard place, waiting for the sword to fall, presuming you even caught it in time. Embedded developers might revisit a particular codebase only every couple of years.) Perhaps you're referring to notes that it wasn't portable? But there are countless interfaces that glibc documents as non-portable but infinitely less likely to disappear than even a Linux syscall. Do you have a link to a 2001 manual page?

Denvercoder9 · on May 6, 2021

> I'm only trying to push back on the claim that Linux has a pristine and principled record in this regard

For sure, I agree that Linux's record isn't perfectly clean. Just wanted to point out that if you were hit by that removal, part of the blame is on you.

> Do you have a link to a 2001 manual page?

I got it from the oldest manpages package from archive.debian.org. The git history on kernel.org doesn't go as far back.

The note I was referring to was the following:

  BUGS
       The object names vary between kernel versions.  THIS MAKES THIS SYSTEM CALL WORTHLESS FOR APPLICATIONS.  Use the /proc/sys interface instead.

Which in 2007 got replaced with the following (partly bolded):

  NOTES
       Glibc does not provide a wrapper for this system call; call it using syscall(2).

       Or rather... don't call it: use of this system call has long been discouraged, and it is so unloved that it is likely to disappear in a future kernel version.  
       Remove it from your programs now; use the /proc/sys interface instead.

matheusmoreira · on May 5, 2021

Linux kernel system call interface is considered stable. I checked the changelog and didn't see any removals. Don't know about glibc.

MichaelMoser123 · on May 6, 2021

can someone explain to me why they added faccessat2 ? ( https://man7.org/linux/man-pages/man2/faccessat.2.html ) Why would one need to do a relative access(2) check? Isn't this just bloat?

remexre · on May 6, 2021

I think there's a general push to have dirfd versions of all syscalls, since it makes it easier to write libraries that don't break when the application or another library chdir()s.
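
A short sketch of why the dirfd variants help a library: the check below is anchored to a directory descriptor rather than to the process-wide working directory. The helper name exists_under is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 1 if relpath exists relative to the directory open on dirfd,
 * 0 otherwise. The answer does not depend on the current working directory,
 * so a chdir() elsewhere in the process cannot change it.
 */
int exists_under(int dirfd, const char *relpath)
{
    assert(relpath != NULL);
    return faccessat(dirfd, relpath, F_OK, 0) == 0;
}
```

A library would typically open its base directory once with open(path, O_RDONLY | O_DIRECTORY) and pass that descriptor to every later call.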

MichaelMoser123 · on May 6, 2021

thanks, makes sense. Still interesting that nowadays they are not afraid to add non posix syscalls to linux and glibc (like c); i think that several years ago they were less likely to do that.

I would be afraid to use all these new additions in an application; you couldn't use that with a docker container on an older system (docker uses the same kernel/glibc as the host)

Denvercoder9 · on May 6, 2021

> docker uses the same kernel/glibc as the host

Docker containers use the same kernel as the host, but the libc comes from the container.
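
One way to see this for yourself is to print both versions from inside and outside a container. A minimal glibc-only sketch follows; describe_runtime is a hypothetical helper name.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Write "kernel <release>, glibc <version>" into out.
 * Returns 0 on success, -1 on failure or if out is too small.
 */
int describe_runtime(char *out, size_t size)
{
    struct utsname u;
    assert(out != NULL && size > 0);
    if (uname(&u) != 0)
        return -1;
    /* Kernel release comes from the running kernel, glibc from the loaded libc. */
    int n = snprintf(out, size, "kernel %s, glibc %s",
                     u.release, gnu_get_libc_version());
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Run in a container, the kernel part should match the host while the glibc part reflects the image.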

MichaelMoser123 · on May 6, 2021

but for syscalls libc calls down into the kernel; libc and the kernel are a pair of shoes.

Denvercoder9 · on May 6, 2021

> libc and the kernel are a pair of shoes.

Not really. The kernel obviously is compatible with older versions of libc, but glibc is also compatible with older kernel versions (down to 3.2 for the latest version), and it even attempts to emulate some system calls if your kernel doesn't support them. You've quite some freedom to mix versions.

MichaelMoser123 · on May 7, 2021

here is the present source code for glibc where faccessat2 is used. What you say is true for some cases; if glibc is built for kernel versions below 0x050800 then it doesn't assume faccessat2, here it falls back to faccessat for some cases - and faccessat was added in 2.6.16. But if you have flags other than AT_SYMLINK_NOFOLLOW or AT_EACCESS the call just fails without doing the fallback. See https://github.com/bminor/glibc/blob/595c22ecd8e87a27fd19270...
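
The general shape of such a fallback can be sketched with raw syscalls. This is a simplified illustration and not glibc's actual code: the name access_at is hypothetical, and unlike glibc this sketch does not try to emulate any flags on old kernels.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Try faccessat2 first; on ENOSYS fall back to the flag-less faccessat. */
int access_at(int dirfd, const char *path, int mode, int flags)
{
    assert(path != NULL);
#ifdef SYS_faccessat2
    /* Preferred path: the newer syscall understands the flags argument. */
    long rc = syscall(SYS_faccessat2, dirfd, path, mode, flags);
    if (rc == 0 || errno != ENOSYS)
        return (int)rc;
#endif
    /* Old kernel: the original syscall has no flags parameter at all. */
    if (flags == 0)
        return (int)syscall(SYS_faccessat, dirfd, path, mode);
    errno = ENOSYS; /* this sketch gives up instead of emulating flags */
    return -1;
}
```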

TIL that this glibc mirror is kept in sync with the real one https://github.com/bminor/glibc