I don't get why you process the whole connector board at once. If I understand correctly, you're connecting individual, identical boards to your board. So each connector on your giant board is actually dealing with a bunch of small boards, right?
In that case, can't you exploit the inherent symmetry in the design to route only a quarter of your connectors and then mirror/rotate the result for the others? Or, if you have an X*X matrix, route one side minus the corners and replicate it to the other sides?
Also, with such a huge connection board, this smells like an NIH issue. I think you'd be better off serializing the IO onto a bus (whatever kind) with a few lines and performing the connection in software (in a GoWin FPGA, for example, which is both extremely cheap and quite powerful). Just think of the harness you'll need to build to fit the connectors in, the inevitable routing bugs, and so on. Any maintenance will be a nightmare if you need to swap 2 pins on a connector or re-run the routing.
Hi, author here. For this project, the backplane is as much a part of the computer as the 'daughter cards'. Think of it like the wire-wrap boards of _really old_ minicomputers. I'm using the PDP straight-8 as an analogy here because that's the oldest computer I've been inside of, but the backplane connects the different daughter cards together in such a way that the backplane _is_ the computer.
As far as symmetry goes, there really isn't any. For example, Board 0 connects to 1, 2, 4, and 8. Board 1 connects to 0, 3, 5, and 9. Board 3 connects to 1, 2, 7, and 11.
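The three examples above are consistent with each board linking to the boards whose index differs from its own in exactly one bit (a 4-dimensional hypercube). That is an inference from those three examples only, not something stated in this thread, and the `neighbors` helper below is a hypothetical name. A minimal sketch under that assumption:

```python
def neighbors(board: int, dims: int = 4) -> list[int]:
    """Boards whose index differs from `board` in exactly one bit.

    Assumes a hypercube pattern inferred from the three examples in the text.
    """
    return sorted(board ^ (1 << bit) for bit in range(dims))


for b in (0, 1, 3):
    print(b, neighbors(b))
# 0 [1, 2, 4, 8]
# 1 [0, 3, 5, 9]
# 3 [1, 2, 7, 11]
```

Even if the connection graph follows a regular pattern like this, the physical pin-to-pin nets on the backplane need not mirror or rotate onto each other, which matches the point about there being no symmetry to exploit when routing.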
There's one way I can think of to make this routing easier: give each of the 16 daughter boards its own unique pinout. If I were doing this as a product, for manufacturing, this is exactly what I would do. I'd rearrange the pins on each daughter card so it would be easier to route. The drawback of this technique is that there would be 16 different varieties of daughter cards, which is not economical if you're just building one of these things.
So, with those constraints, the only real optimization I have left is ensuring that the existing net plan is optimal. I already did that when I generated the netlist; I used simulated annealing to minimize the net length for the board before I even imported it into KiCad.
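For readers unfamiliar with simulated annealing, here is a toy sketch of the idea. It is not the code used for this netlist: the cost function and move set actually used are not described here. This version assumes 16 boards placed in a 4x4 grid of slots, a hypercube-style connection list (each board linked to the boards whose index differs in one bit), and Manhattan distance between slots as a stand-in for net length. All names (`wire_length`, `anneal`, `slot_of`) are hypothetical.

```python
import math
import random


def wire_length(slot_of, edges, cols=4):
    """Total Manhattan distance between connected boards on a cols-wide grid."""
    total = 0
    for a, b in edges:
        ra, ca = divmod(slot_of[a], cols)
        rb, cb = divmod(slot_of[b], cols)
        total += abs(ra - rb) + abs(ca - cb)
    return total


def anneal(edges, n=16, steps=20000, t_start=5.0, t_end=0.01, seed=0):
    """Swap-based simulated annealing over board-to-slot assignments."""
    rng = random.Random(seed)
    slot_of = list(range(n))
    rng.shuffle(slot_of)  # random starting placement
    cost = wire_length(slot_of, edges)
    best, best_cost = slot_of[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        slot_of[i], slot_of[j] = slot_of[j], slot_of[i]
        new_cost = wire_length(slot_of, edges)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = slot_of[:], cost
        else:
            slot_of[i], slot_of[j] = slot_of[j], slot_of[i]  # undo the swap
    return best, best_cost


# Assumed connection list: one edge per pair of boards differing in one bit.
edges = [(a, a ^ (1 << bit)) for a in range(16) for bit in range(4)
         if a < (a ^ (1 << bit))]

placement, length = anneal(edges)
print(length, placement)
```

Accepting an occasional worse swap while the temperature is high is what lets the search escape local minima; as the temperature drops, it settles into plain hill climbing.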
And yeah, serializing the IO would be better, but even better than that would be to emulate the entire system in a giant black box of compute. But then I wouldn't have written a GPU autorouter. I'm trying not to, but there is some optimization for _cool_ here, you know?