I owe my career to Beej. I was getting burnt out on web development and his guides (along with Jesse Storimer's books) made programming fun for me again. I highly recommend learning how Unix systems work. It's a lot of fun and opens up a whole new world of programming!
I'm feeling the same way about web, I'd like to transition to something similar. I quite enjoy programming in C, so hopefully that helps. What are you doing at your job now? How did you go about making the switch from web to systems? Has it paid off with an increase in enjoyment?
Truth be told, a significant portion of my day job is still web application stuff. The web, whether that's HTML/CSS/JS applications or APIs, is inescapable at this point for a huge portion of programmers. Only the lucky few get to hack on databases, hypervisors and file systems for a living!
But, armed with the knowledge of sockets, processes, etc. and how this correlates to performance/scaling, I've been able to carve out roles where I get to spend a decent amount of time working on problems that I get excited about.
But even before I was able to work the knowledge into my day job, nothing beats the dopamine rush of learning things that fascinate you! The rabbit holes that learning the basics of Unix has opened feel like they could occupy my hobbyist hours for the rest of my life.
It's amazing how we think the grass is greener. I have been doing systems programming for the past 7 years and have been thinking about moving over to JS-based web development for the past few months.
Actually both of you are right.
You need a break from what you have been doing for a long time. Perhaps exchanging your jobs with one another might help. :)
Agreed. I think we humans just need change every now and then. Even when the problems and tools we use to solve those problems never change, we still somehow get upset with the tools or the problems.
Something I've struggled to implement on Linux is cross-process multicast notifications, where any process can post and multiple subscribed processes receive it. FIFOs and SysV IPC are unicast, and I think DBus is too.
On OS X there's notify(3) which is very nice. Any good options on Linux, other than writing my own socket server?
The solution for this kind of problem will depend heavily on what type of messages you're trying to send. When messaging becomes that complex, there are often other things that impact the overall design in important ways and need to be considered.
For example, if I was implementing something that is usually associated with user events (rare-ish, basically zero bandwidth, complex signal with stateful messaging semantics), I would probably just write a simple server to manage it all. This has the advantage of centralizing any messaging complexity and lets you manage any multi-message state easily. Rebroadcasting messages to allow peer-to-peer messaging would be a trivial addition. This would probably use UNIX sockets if the connections are persistent (which can be changed to AF_INET{,6} sockets easily, if you wanted to add network support).
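To make that concrete, a bare-bones rebroadcast server over a UNIX socket might look roughly like this (the socket path, buffer size, and error handling are placeholders, not a finished design):

    /* Sketch: accept clients on a UNIX socket and fan every message
     * back out to all the other connected clients. */
    #include <poll.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    #define MAX_CLIENTS 64

    int main(void)
    {
        int srv = socket(AF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, "/tmp/notify.sock", sizeof(addr.sun_path) - 1);
        unlink(addr.sun_path);                  /* clear out a stale socket file */
        bind(srv, (struct sockaddr *)&addr, sizeof(addr));
        listen(srv, 16);

        struct pollfd fds[MAX_CLIENTS + 1] = { { .fd = srv, .events = POLLIN } };
        int nfds = 1;

        for (;;) {
            poll(fds, nfds, -1);

            if ((fds[0].revents & POLLIN) && nfds < MAX_CLIENTS + 1)
                fds[nfds++] = (struct pollfd){ .fd = accept(srv, NULL, NULL),
                                               .events = POLLIN };

            for (int i = 1; i < nfds; i++) {
                if (!(fds[i].revents & POLLIN))
                    continue;
                char buf[512];
                ssize_t n = read(fds[i].fd, buf, sizeof(buf));
                if (n <= 0) {                   /* client went away */
                    close(fds[i].fd);
                    fds[i] = fds[nfds - 1];     /* swap in the last entry */
                    nfds--;
                    i--;                        /* re-check the swapped-in slot */
                    continue;
                }
                for (int j = 1; j < nfds; j++)  /* rebroadcast to the other peers */
                    if (j != i)
                        write(fds[j].fd, buf, n);
            }
        }
    }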
For something that requires very low latency (e.g. audio/MIDI), in the past I have had to use shared memory, which has zero overhead once it is setup (no syscalls or context switches). Here, the need for low latency dictated the design. Of course, this means managing locks. Not fun, but a cost that is sometimes worth paying.
There really isn't a one-size-fits all solution.
edit:
Or, as troglobit said, TIPC. I keep forgetting that we now have it as an option. :(
Thanks for the reply. My use case is very simple: a stateless "something happened" notification, which can be delivered asynchronously. Coalescing or even occasional drops are fine.
I did originally use a Unix domain socket server, but that added a lot of complexity: one has to arrange for it to be launched, guard against the possibility that it gets stuck, version it, deal with permissions, etc.
My new solution on Linux is a total hack: there's a FIFO, and to post a notification, you write to it. Clients see that the FIFO became readable, and that change represents the notification. The sender then drains the data it wrote, so that the FIFO becomes unreadable again. This is a total abuse of FIFOs, but it's proven to be much simpler than trying to manage a separate server.
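Concretely, the trick looks roughly like this (the path is a placeholder, it assumes the FIFO was already created with mkfifo, and it glosses over the race where a client polls after the byte has already been drained):

    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    /* Sender: make the FIFO readable for a moment, then drain it again.
     * O_RDWR on a FIFO is a Linux-ism, but it avoids blocking in open(). */
    void post_notification(void)
    {
        int fd = open("/tmp/myapp.notify", O_RDWR | O_NONBLOCK);
        char byte = 0;
        write(fd, &byte, 1);    /* FIFO becomes readable -> watchers wake up */
        read(fd, &byte, 1);     /* drain it so the FIFO is unreadable again  */
        close(fd);
    }

    /* Client: block in poll(2) until the FIFO becomes readable; the wakeup
     * itself is the notification, so nothing is actually read. */
    void wait_for_notification(int fifo_fd)
    {
        struct pollfd pfd = { .fd = fifo_fd, .events = POLLIN };
        poll(&pfd, 1, -1);
    }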
I've never heard of TIPC. From a little searching it looks like it's very capable but geared towards clusters, and is overkill for my use case. What do you think?
> there's a FIFO, and to post a notification, you write to it. Clients see that the FIFO became readable, and that change represents the notification. The sender then drains the data it wrote, so that the FIFO becomes unreadable again.
Beware: I tried that once, and it was unreliable. Only some clients woke up.
We had a similar problem to solve for iOS and OS X and settled on using FIFOs in that manner as well. A colleague wrote up a blog post about the various alternatives that were evaluated before settling on that approach: https://realm.io/news/thomas-goyne-fast-inter-process-commun...
If drops are fine, TIPC is probably overkill. I would probably just wrap something generic using UNIX domain sockets up into a library and re-use that as needed.
Depending on your permissions requirements[1], and if you really only need a signaling flag, have you considered the filesystem? Just touch a file in a well-defined directory named after the event that happened, and poll it periodically. Removing the file clears the flag. Signals can coalesce, but you should never drop any. You can poll a directory (that will normally be empty) without much CPU load (the directory inode will be cached most of the time). You could set up multiple listeners by giving each its own "inbox" directory, e.g.:
# send a notification
touch "${HOME}/.${app_name}rc/messages/${destination}/${signal_flag_name}"
Using the filesystem opens up the possibility of a message sender being anything that can generate - even indirectly - an open(O_CREAT). Your signals also persist across program shutdowns and crashes - you can send and receive even when the other side isn't running - and your state can persist across reboots. Also, you can leverage some of the guarantees provided by the kernel's vfs layer. For example, rename(2) is atomic, so you can send small data payloads by writing to a different name first.
FLAG_PATH="${HOME}/.${app_name}rc/messages/${destination}/${signal_flag_name}"
# using the PID ($$) to not collide with other message senders
TEMP_PATH="${FLAG_PATH}-new-$$"
echo -e "foo=bar\nbaz=quux\ncount=42" > "${TEMP_PATH}"
mv "${TEMP_PATH}" "${FLAG_PATH}"
As an optional Linux-specific feature, you can extend that technique to be event-driven (no polling loop) by telling the kernel to notify you about file-create events: listen on the directory (NOT the file) with inotify(7) for IN_CREATE events. Those events can be received in a simple blocking style by letting poll(2) wake up your process, or in a non-blocking style with poll(2) if you give inotify_init1(2) the IN_NONBLOCK flag. The man page inotify(7) has an example.
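For reference, the blocking flavor is only a handful of lines. A rough sketch (the watched directory is made up, and the inotify(7) man page example is more careful about error handling):

    #include <stdio.h>
    #include <sys/inotify.h>
    #include <unistd.h>

    int main(void)
    {
        int ifd = inotify_init1(0);     /* blocking mode; no IN_NONBLOCK */
        inotify_add_watch(ifd, "/home/me/.myapprc/messages/worker", IN_CREATE);

        /* buffer aligned for struct inotify_event, as in the man page example */
        char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
        for (;;) {
            ssize_t len = read(ifd, buf, sizeof(buf));  /* blocks until events arrive */
            for (char *p = buf; p < buf + len;
                 p += sizeof(struct inotify_event) + ((struct inotify_event *)p)->len) {
                struct inotify_event *ev = (struct inotify_event *)p;
                if (ev->len)
                    printf("flag created: %s\n", ev->name);
            }
        }
    }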
[1] this can get annoyingly complicated - but certainly not impossible - if you have to care about user/group permissions, esp. on the directory. Making a group specific to the message sending can help.
Thank you for this thoughtful reply. There's a variety of options if I'm willing to poll, including shared memory or the filesystem idea you outline, but I hope to avoid polling for hygienic reasons. I also explored inotify but found it to be unreliable (https://github.com/travis-ci/travis-ci/issues/2342).
That's why I like inotify in blocking mode - the call to poll() is just to wake up the process (I think you could just blocking-read the inotify file handle? I haven't tried it directly). The point of using inotify is that you don't need to poll, because the kernel sends your process reliable events over a file handle instead. The use of poll(2) is just a consequence of the interface using a file handle.
As I originally said, though, there is certainly no one-size-fits-all solution; these are just a few of the available options, which may not be appropriate for your situation.
I like blocking inotify in principle - the problem is that it just didn't work! I think there is a gap in the Linux APIs in this area. Its multicast IPC mechanisms are just too heavyweight.
You can do it with shared memory. One writer writes and multiple readers can observe. I did this for both low latency and throughput reasons.
In general you have to be very careful how you handle it and consider various consistency and failure scenarios.
The main part of memory layout looks something like this:
[write_counter][......buffer.....]
This is owned and updated by the writer. Readers have a read_counter that they maintain in their own context (not shared).
You'd probably have to declare this using the 'volatile' keyword. Otherwise compilers will optimize away accesses to these (seemingly) unused variables.
Then it works like this:
The write_counter value always counts up as the writer writes. Data gets written to the buffer, then write_counter is incremented. Both reader and writer index into the buffer using {write,read}_counter % buffer_size.
Also, note that these counters function as total counts of items written, so each reader can determine how far ahead the writer is.
Another note: depending on the size of your counter, it will not necessarily be updated atomically. The compiler could split the update into multiple instructions and, say, increment the lower part of the value, then the upper part. The writer could get preempted between those two instructions, so you could read a strange, torn value. In that case, because of the %, you'd still fall within the range of the buffer, but you might be reading data you didn't expect. Whether that works for your use case or not, you'll have to see.
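Putting those pieces together, a rough sketch (the shm name, sizes, and types are arbitrary, and it keeps the volatile approach described above rather than dealing properly with atomics and memory ordering):

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define BUF_SIZE 1024

    struct ring {
        volatile uint64_t write_counter;     /* owned and updated by the writer */
        volatile uint32_t buffer[BUF_SIZE];  /* indexed by counter % BUF_SIZE   */
    };

    /* Both sides map the same shared-memory object. */
    struct ring *ring_attach(void)
    {
        int fd = shm_open("/example_ring", O_RDWR | O_CREAT, 0600);
        ftruncate(fd, sizeof(struct ring));
        return mmap(NULL, sizeof(struct ring), PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    }

    /* Writer: store the item first, then bump the counter. */
    void publish(struct ring *r, uint32_t item)
    {
        r->buffer[r->write_counter % BUF_SIZE] = item;
        r->write_counter++;
    }

    /* Reader: keeps its own read_counter in non-shared memory. */
    int try_read(struct ring *r, uint64_t *read_counter, uint32_t *out)
    {
        if (*read_counter == r->write_counter)
            return 0;                        /* nothing new yet */
        *out = r->buffer[*read_counter % BUF_SIZE];
        (*read_counter)++;
        return 1;
    }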
EDIT: Don't forget about the slow and stupid multicast mechanism either -- writing to a file. Some file operations can be atomic (renaming a file). And some operating systems let you watch files for changes.
I haven't tried it, but you should be able to have multiple processes listen on the same UDP port bound to localhost using SO_REUSEPORT. Send broadcast UDP packets (using SO_BROADCAST) so they all get it.
"More than one process may bind to the same SOCK_DGRAM UDP port if the bind()
is preceded by:
int one = 1;
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one))
"In this case, every incoming multicast or broadcast UDP datagram destined to
the shared port is delivered to all sockets bound to the port. For backwards
compatibility reasons, THIS DOES NOT APPLY TO INCOMING UNICAST DATAGRAMS --
unicast datagrams are never delivered to more than one socket, regardless of
how many sockets are bound to the datagram's destination port."
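A quick, unverified sketch of both sides (the port and broadcast address are arbitrary choices of mine):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Subscriber: every process that binds the port this way should receive
     * each broadcast datagram sent to it. */
    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        int one = 1;
        setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in addr = {
            .sin_family = AF_INET,
            .sin_port = htons(12345),
            .sin_addr.s_addr = htonl(INADDR_ANY),
        };
        bind(sock, (struct sockaddr *)&addr, sizeof(addr));

        char buf[512];
        for (;;) {
            ssize_t n = recvfrom(sock, buf, sizeof(buf) - 1, 0, NULL, NULL);
            if (n > 0) {
                buf[n] = '\0';
                printf("got: %s\n", buf);
            }
        }
    }

    /* Publisher (in some other process): send one broadcast datagram. */
    void publish(const char *msg)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        int one = 1;
        setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &one, sizeof(one));
        struct sockaddr_in dst = {
            .sin_family = AF_INET,
            .sin_port = htons(12345),
            .sin_addr.s_addr = htonl(INADDR_BROADCAST),
        };
        sendto(sock, msg, strlen(msg), 0, (struct sockaddr *)&dst, sizeof(dst));
        close(sock);
    }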
You can actually implement a basic pubsub (both one-to-many and many-to-many) mechanism using FIFOs and file system permissions in a particular fashion known as a fifodir:
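The idea, roughly: each subscriber creates its own FIFO inside a shared directory, and a publisher delivers an event by writing a byte into every FIFO it finds there; the directory's permissions control who may subscribe or publish. A rough sketch of the publisher side (the directory path and event byte are arbitrary):

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write one event byte into every subscriber FIFO in the directory. */
    void publish(const char *fifodir, char event)
    {
        DIR *d = opendir(fifodir);
        struct dirent *ent;
        while (d && (ent = readdir(d)) != NULL) {
            if (ent->d_name[0] == '.')
                continue;                      /* skip . and .. */
            char path[4096];
            snprintf(path, sizeof(path), "%s/%s", fifodir, ent->d_name);
            /* O_NONBLOCK: skip FIFOs whose subscriber has gone away */
            int fd = open(path, O_WRONLY | O_NONBLOCK);
            if (fd >= 0) {
                write(fd, &event, 1);
                close(fd);
            }
        }
        if (d)
            closedir(d);
    }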
dbus signals can be multicast. Specifically, dbus signals without a destination are routed to all connections with match rules (added with org.freedesktop.DBus.AddMatch on the message bus) which match the signal.
Ahh.... summer 98, discovering sockets, too poor to afford Stevens, 14.4 kbps modem, netscape 4.x rendering Beej's .edu site (forgotten now), me staring my future in the face.
I'll add to the chorus of praise for Beej's work. I've often consulted this particular guide and the companion piece for networks as well. Helped me write web and other servers, a great way to learn about important technologies, providing knowledge that stays useful even if the production server runs on Node.js.
Besides, I'm biased. "BJ" happens to be my wife's unofficial name. She's a remarkably smart person, so I was predisposed to think "BeeJ" would know what he's talking about and it turns out he did.
[0] http://beej.us/guide/bgc/ [1] http://beej.us/guide/bgnet/