The cycle-accurate assembly language has enabled a number of timing-sensitive FIFO data processes. Work that previously had to be done with FPGAs now has some limited support with PIO. I hope Raspberry Pi increases the number of instructions and the number of simultaneously running PIO state machines in the future.
It's great to have the PIO in a smaller, cheaper device, but if that is the kind of thing you really like, I do want to mention that the BeagleBone PRUs and the Parallax Propeller / Propeller 2 are similar, more powerful implementations of this concept.
I've done similar things (for a PlayStation 1/2 controller) by bit-banging with 8-bit microcontrollers. Their datasheets often include instruction timing information, so you can count cycles by hand.
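For anyone who hasn't done bit-banging, here is a rough sketch of the idea in Python. It is not the code I used, just an illustration: the `LoopbackPins` class and the `bitbang_transfer` function are made-up names, the pins are simulated, and the framing (LSB first, clock idle high, output changed on the falling edge, input sampled on the rising edge) is my assumption about a controller-style serial link. On a real microcontroller the `delay` would be a cycle-counted loop sized from the datasheet's instruction timings.

```python
class LoopbackPins:
    """Hypothetical simulated pins: the input line simply echoes the output line."""

    def __init__(self):
        self.clk = 1          # clock idles high (assumed)
        self.mosi = 0
        self.clk_edges = 0    # counts clock transitions for inspection

    def set_clk(self, level):
        if level != self.clk:
            self.clk_edges += 1
        self.clk = level

    def set_mosi(self, level):
        self.mosi = level

    def read_miso(self):
        return self.mosi      # loopback: read back what was written

    def delay(self):
        pass                  # on real hardware: a cycle-counted busy loop


def bitbang_transfer(tx_byte, set_clk, set_mosi, read_miso, delay):
    """Exchange one byte, least significant bit first."""
    rx = 0
    for bit in range(8):
        set_clk(0)                          # falling edge: present the next output bit
        set_mosi((tx_byte >> bit) & 1)
        delay()
        set_clk(1)                          # rising edge: sample the input bit
        rx |= (read_miso() & 1) << bit
        delay()
    return rx


pins = LoopbackPins()
received = bitbang_transfer(0xA5, pins.set_clk, pins.set_mosi,
                            pins.read_miso, pins.delay)
```

With the loopback pins, `received` equals the byte that was sent. The point is that every edge is driven by the CPU, so the timing is only as good as your cycle counting, which is exactly the work PIO takes off your hands.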