Yes, can you describe your SPF library? I'd love to learn more about how using a...

wahern · on Sept 22, 2016

You can find it here:

  https://github.com/wahern/dns/blob/master/src/spf.rl

(The Ragel stuff is for parsing the SPF records and is a rather small detail. I always intended to remove it--it's a dependency that turns a lot of people off--but never got around to it. But, FWIW, Ragel is awesome when you don't mind the build-time dependency.)

Basically what happened was that when I set out to write an async SPF library I found myself constantly refactoring the internal API and devising ever more clever hacks for implementing continuations in C. The biggest problems were

1) It had to support asynchronous I/O, and specifically non-blocking I/O.

2) I explicitly DID NOT want to use callbacks because one design goal of the library was that it be easy to integrate into other code and frameworks, regardless of event loop and even without an event loop. I had recently implemented a non-blocking DNS library after years of using and hacking on ADNS and c-ares taught me to hate callbacks in C.

Instead, you repeatedly call spf_check() until you get something other than EAGAIN. When you get EAGAIN, you use spf_pollfd(), spf_events(), and spf_timeout() for the polling parameters. (The file descriptor can change dynamically, e.g. if the DNS query had to switch to TCP from UDP. The events can be POLLIN or POLLOUT, though usually POLLIN, and the timeout is necessary because DNS send/recv timeouts can be much smaller than the timeout for the whole SPF check operation.)
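To make the calling pattern concrete, here is a minimal sketch in C. It does not use the real library: toy_init(), toy_step(), toy_pollfd(), toy_events(), toy_timeout(), and struct toy_check are hypothetical stand-ins I made up for illustration, and a pipe plays the role of the DNS socket. Only the shape of the loop (step until the result is not EAGAIN, poll in between using parameters supplied by the object) reflects the description above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for an SPF processor object (NOT the real API). */
struct toy_check {
    int fd;       /* descriptor the caller should poll on */
    char verdict; /* valid once toy_step() has returned 0 */
};

/* Put the descriptor in non-blocking mode and attach it to the object. */
static int toy_init(struct toy_check *c, int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return errno;
    c->fd = fd;
    c->verdict = 0;
    return 0;
}

/* Polling parameters; a real resolver could change these between steps. */
static int toy_pollfd(struct toy_check *c) { return c->fd; }
static short toy_events(struct toy_check *c) { (void)c; return POLLIN; }
static int toy_timeout(struct toy_check *c) { (void)c; return 1000; /* ms */ }

/* Make as much progress as possible without blocking.
 * Returns 0 when done, EAGAIN when the caller must wait, else an errno. */
static int toy_step(struct toy_check *c) {
    char byte;
    ssize_t n;

    assert(c->fd >= 0);
    n = read(c->fd, &byte, 1);
    if (n == 1) {
        c->verdict = byte;
        return 0;
    }
    if (n == 0)
        return EPIPE; /* writer went away before answering */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return EAGAIN;
    return errno;
}

/* The caller owns the waiting; any event loop could replace this poll(). */
static int run_to_completion(struct toy_check *c) {
    int error;

    while ((error = toy_step(c)) == EAGAIN) {
        struct pollfd p = { .fd = toy_pollfd(c), .events = toy_events(c) };
        if (poll(&p, 1, toy_timeout(c)) < 0 && errno != EINTR)
            return errno;
    }
    return error;
}
```

Because the object never calls back into the application, the same toy_step() could be driven from select, epoll, kqueue, or a plain blocking loop like run_to_completion(). This sketch retries forever on timeouts; real code would also enforce an overall deadline.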

3) SPF is recursive--policies can include other policies, which can include other policies. So you're already dealing with a logical stack situation, but you can't rely on the language's runtime stack. Also, for each rule there can be an iteration over DNS records with intermediate lookups, and it's especially ugly in C to pause and resume loops with inner function calls.

4) I had never implemented SPF before. I kept running into areas where my approach turned out to be deficient as I became more familiar with the spec. And because of constraints 1-3, this was very costly in time and effort. Were I to implement an SPF library again I probably wouldn't use a VM; instead I'd go back to using a basic state machine, function jump table, and a simpler state stack. But at the time it made sense because it was a more flexible and conceptually simpler approach given the unknowns. Plus, it was an excuse to try out a concept I had long had, which was to implement a VM for non-blocking I/O operations.

So basically what it does is decompose the terms in an SPF policy into logical opcodes. So the "ip4:1.2.3.4" term decomposes into two opcodes: one that pushes a typed address (1.2.3.4) onto the stack, and OP_IP4, which matches the address at the top of the stack against the address of the connecting host (which is part of the context of an instantiated SPF processor object).
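As a rough illustration of that decomposition, here is a toy stack machine in C. Only the name OP_IP4 comes from the description above; OP_PUSH, OP_HALT, struct vm, ip4(), and vm_run() are hypothetical names for this sketch, and the instruction encoding is invented. It also ignores the optional CIDR prefix length that a real ip4 term can carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OP_IP4 is named in the text; the other opcodes are illustrative. */
enum { OP_HALT, OP_PUSH, OP_IP4 };

#define STACK_MAX 8

struct vm {
    uint32_t stack[STACK_MAX];
    size_t sp;          /* number of values on the stack */
    uint32_t client_ip; /* address of the connecting host */
};

/* Pack a dotted quad into a host-order 32-bit value. */
static uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Run until OP_HALT. Returns the top of the stack (1 = match,
 * 0 = no match), or -1 on an unknown opcode. */
static int vm_run(struct vm *vm, const uint32_t *code) {
    size_t pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: /* operand follows the opcode */
            assert(vm->sp < STACK_MAX);
            vm->stack[vm->sp++] = code[pc++];
            break;
        case OP_IP4: { /* pop an address, push whether it matches */
            uint32_t addr;
            assert(vm->sp > 0);
            addr = vm->stack[--vm->sp];
            vm->stack[vm->sp++] = (addr == vm->client_ip);
            break;
        }
        case OP_HALT:
            return vm->sp ? (int)vm->stack[vm->sp - 1] : 0;
        default:
            return -1;
        }
    }
}
```

Under these assumptions, "ip4:1.2.3.4" would compile to the four words { OP_PUSH, ip4(1, 2, 3, 4), OP_IP4, OP_HALT }, and vm_run() leaves a 1 on the stack only when client_ip is 1.2.3.4.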

The term "mx:foo.bar", for example, requires querying MX records for foo.bar, iterating over each host and recursively querying the A or AAAA records, and then matching the IP address like above. The "mx:" term is first emitted like the "ip4:" term--push the domain string, followed by OP_MX. But when the OP_MX opcode is executed it dynamically generates more bytecode to do the query. I don't remember why I did it this way--probably because it seemed simpler/easier at the time.
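One way an opcode can generate more bytecode at run time is sketched below. This is a guess at the general technique, not a description of what the library actually emits: OP_PUSH, OP_OR, OP_JMP, OP_HALT, emit(), and the fake_mx_addrs table are all hypothetical, and the table stands in for the MX and A lookups, which in real code would be asynchronous DNS queries that can return EAGAIN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_HALT, OP_PUSH, OP_IP4, OP_OR, OP_JMP, OP_MX };

#define CODE_MAX 64
#define STACK_MAX 16

struct vm {
    uint32_t code[CODE_MAX];
    size_t end; /* first free slot in code[] */
    uint32_t stack[STACK_MAX];
    size_t sp;
    uint32_t client_ip;
};

/* Hypothetical pre-resolved MX host addresses: 10.0.0.1 and 10.0.0.2. */
static const uint32_t fake_mx_addrs[] = { 0x0A000001, 0x0A000002 };

static void emit(struct vm *vm, uint32_t word) {
    assert(vm->end < CODE_MAX);
    vm->code[vm->end++] = word;
}

static void push(struct vm *vm, uint32_t v) {
    assert(vm->sp < STACK_MAX);
    vm->stack[vm->sp++] = v;
}

static uint32_t pop(struct vm *vm) {
    assert(vm->sp > 0);
    return vm->stack[--vm->sp];
}

static int vm_run(struct vm *vm) {
    size_t pc = 0;

    for (;;) {
        assert(pc < vm->end);
        switch (vm->code[pc++]) {
        case OP_HALT: return vm->sp ? (int)pop(vm) : 0;
        case OP_PUSH: push(vm, vm->code[pc++]); break;
        case OP_IP4:  push(vm, pop(vm) == vm->client_ip); break;
        case OP_OR: { uint32_t b = pop(vm), a = pop(vm); push(vm, a | b); break; }
        case OP_JMP:  pc = vm->code[pc]; break;
        case OP_MX: {
            size_t start = vm->end, i;
            (void)pop(vm); /* domain operand; the fake resolver ignores it */
            emit(vm, OP_PUSH); emit(vm, 0); /* running result: no match yet */
            for (i = 0; i < sizeof fake_mx_addrs / sizeof fake_mx_addrs[0]; i++) {
                emit(vm, OP_PUSH); emit(vm, fake_mx_addrs[i]);
                emit(vm, OP_IP4);
                emit(vm, OP_OR);
            }
            emit(vm, OP_JMP); emit(vm, (uint32_t)pc); /* resume after OP_MX */
            pc = start; /* continue in the freshly generated code */
            break;
        }
        default: return -1;
        }
    }
}
```

Here "mx:foo.bar" would be loaded as OP_PUSH, a domain id, OP_MX, OP_HALT. When OP_MX runs it appends one push/match/or triple per address plus a jump back, so the per-host loop lives in bytecode rather than on the C stack, which is what would let a resumable VM pause in the middle of it.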

It works well, but objectively is over-engineered. At the very least the VM is _too_ general. Nonetheless, AFAIK it's the most comprehensive async I/O SPF library in C that doesn't use callbacks. It only passes ~80% of the SPF test suite, but that's mostly because of the policy parser--IIRC the ABNF grammar for SPF is broken because the proof-of-concept was implemented using a naive regular expression to chop up the string (the ABNF translation of the pattern is ambiguous); and I never bothered or needed to handle some of the corner cases necessary to match the behavior of, e.g., libspf2. But my implementation handles many millions (if not billions) of queries every day as it's used in several commercial and cloud products.

linkregister · on Sept 22, 2016

Thanks for responding. Great explanation! I agree that adhering to the SPF standard has diminishing returns as support is added for the more esoteric macros. Support for every single macro in the standard is almost certainly unnecessary.

I've seen a simple recursive descent parser to process SPF records, but never considered it could be done with a VM or a stack-based machine. Even with the DNS Lookup Limit, it appears you made the right choice to avoid the call stack for a fully async I/O interface.

Your coding style is simple, but since I have never studied a VM before, I have trouble telling what's going on. You've inspired me to learn more!