Fresh Hacker News | OrthoRoute – GPU-accelerated autorouting for KiCad

▲OrthoRoute – GPU-accelerated autorouting for KiCad(bbenchoff.github.io)

161 points by wanderingjew 13 hours ago | 4 comments

▲morellt 12 hours ago

I would love to know the application of this ludicrous PCB, and I'd be even more interested to see the quote price

Hey, guy who made this here. This probably deserves a little explanation. First off, I'd like to tell you I'm really, really unemployed, and have the freedom to do some cool stuff. So I came up with a project idea. This is only a small part of a project I'm working on, but you'll see where this is going.

I was inspired by this bitluni video of a cluster of $0.10-0.20 RISC-V microcontrollers: https://www.youtube.com/watch?v=HRfbQJ6FdF0 For ten or twenty cents, these have a lot of GPIOs compared to other extremely low-cost microcontrollers: 18 GPIOs on the CH32V006F4U6. This got me thinking: what if I built a cluster of these chips? Basically re-doing bitluni's build.

But then I started thinking, at ten cents a chip, you could scale this to thousands. But how do you connect them? That problem was already solved in the 80s, with the Connection Machine. The basic idea here is to get 2^(whatever) chips, and connect them so each chip connects to (whatever) many other chips. The Connection Machine sold this as a hypercube, but it's better described as a Hamming-distance-one graph or something.

So I started building that. I did the LEDs first, just to get a handle on thousands of parts: https://x.com/ViolenceWorks/status/1987596162954903808 and started laying out the 'cards' of this thing. With a 'hypercube topology' you can split up the cube into different parts, so this thing is made of sixteen cards (2^4), with 256 chips on each card (2^8), meaning 4096 (2^12) chips in total. This requires a backplane. A huge backplane with 8196 nets. Non-trivial stuff.
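As a rough illustration of the topology described above, here is a minimal Python sketch that enumerates the Hamming-distance-one neighbors of each chip and counts how many links stay on a card versus cross between cards. The assumption that the top 4 address bits select the card is mine, not something stated in the comment, and `neighbors`, `card_of` and `count_links` are hypothetical names.

```python
from collections import Counter

DIMS = 12                      # 2^12 = 4096 chips in total
CARD_BITS = 4                  # ASSUMPTION: top 4 address bits pick one of 16 cards
CHIP_BITS = DIMS - CARD_BITS   # remaining 8 bits pick one of 256 chips on a card


def neighbors(node, dims=DIMS):
    """Addresses that differ from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dims)]


def card_of(node):
    """Card index under the assumed top-bits-select-card layout."""
    return node >> CHIP_BITS


def count_links():
    """Count each undirected link once, split into on-card and card-to-card."""
    counts = Counter()
    for node in range(1 << DIMS):
        for other in neighbors(node):
            if node < other:  # visit each undirected link only once
                same_card = card_of(node) == card_of(other)
                counts["on_card" if same_card else "backplane"] += 1
    return counts


if __name__ == "__main__":
    print(count_links())
```

Under that assumption each chip has 8 links to chips on its own card and 4 links to chips on other cards, so the card-to-card count lands in the same range as the backplane net count quoted above.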

So the real stumbling block for this project is the backplane, and this is basically the only way I could figure out how to build it: write an autorouter. It's a fun project that really couldn't have been done before the launch of KiCad 9; the new IPC API was a necessity to make this a reality. After that it's just some CuPy because of sparse matrices and a few blockers trying to adapt PathFinder to circuit boards.
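For readers unfamiliar with PathFinder, the core idea is negotiated congestion: every net is repeatedly ripped up and rerouted, and cells that stay overused get more expensive until the nets stop sharing them. The sketch below is a toy, CPU-only version on a single-layer grid where each cell holds one net. It is not the OrthoRoute implementation (which the comment says uses CuPy and sparse matrices); the grid model, the cost constants and the names `route` and `pathfinder` are all assumptions for illustration.

```python
import heapq
from collections import defaultdict


def route(src, dst, width, height, cost):
    """Dijkstra on a 4-connected grid; cost(cell) is the price of entering a cell."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == dst:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height:
                nd = d + cost(nxt)
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = cell
                    heapq.heappush(heap, (nd, nxt))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def pathfinder(nets, width, height, max_iters=50):
    """Return {net: path} with no shared cells, or None if negotiation fails."""
    history = defaultdict(float)  # grows on cells that stay congested
    usage = defaultdict(int)      # how many nets currently occupy each cell
    paths = {}
    pres_fac = 1.0                # present-congestion penalty, raised each pass
    for _ in range(max_iters):
        for name, (src, dst) in nets.items():
            for cell in paths.pop(name, []):
                usage[cell] -= 1  # rip up this net's previous route

            def cost(cell):
                # (base + history) * present penalty, with cell capacity 1
                return (1.0 + history[cell]) * (1.0 + pres_fac * usage[cell])

            paths[name] = route(src, dst, width, height, cost)
            for cell in paths[name]:
                usage[cell] += 1
        overused = [c for c, n in usage.items() if n > 1]
        if not overused:
            return paths
        for c in overused:
            history[c] += 1.0
        pres_fac *= 1.5
    return None


if __name__ == "__main__":
    # Two nets whose shortest paths collide at (2, 2); one has to detour.
    demo = {"A": ((0, 2), (4, 2)), "B": ((2, 1), (2, 3))}
    print(pathfinder(demo, 5, 5))
```

A real board router also has to handle layers, vias, multi-pin nets and design rules, none of which this toy attempts.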

Last week I finished up the 'cloud routing' functionality and was able to run this on an A100 80GB instance on Vast.ai; the board wouldn't fit in the 16GB 5080 I used for testing. That instance took 41 hours to route the board, and now I have the result back on my main battlestation ready for the bit of hand routing that's still needed. No, it's not perfect, but it's an autorouter. It's never going to be perfect.

This was a fun project but what I really should have been doing the past three months or so is grinding leetcode. It's hard out there, and given that I've been rejected from every technician job I've applied to, I don't think this project is going to help me. Either way, this project.... is not useful. There's probably a dozen engineers out there in the world that this _could_ help.

So, while it's working for my weird project, this is really not what hiring managers want to see.

▲vrinsd 8 hours ago

Author: Thanks for taking the time to reply.

I read the write-up with a LOT of interest. This is really amazing work; there aren't a lot of good options for auto-routing with open-source PCB tools (i.e. KiCad). I have also used the other autorouter you mentioned for "low-complexity" boards in KiCad, and it helped do the job but was painful.

In my career I've also used the autorouters built into the "high-end" PCB tools, and they could handle the complexity of boards you outlined WITHOUT needing a massive GPU, but those vendors also paid people to improve this stuff over 15 to 20 years, and development happened when single-core computers with limited RAM were the norm.

On the technical side, somewhat more recent FPGA 'placement' algorithms used a simulated annealing algorithm. While what you did isn't about placement, that approach could possibly help with 'net cross-over reduction' type of passes, and maybe help with designs where you can do port swap / pin swap.

I'm amused you made a RISC-V array with discrete parts -- I'm sure you considered using an FPGA? Jan Gray has done 1000+ RISC-V cores (https://fpga.org/grvi-phalanx/) in "older" Xilinx FPGAs.

If you're trying to emulate Thinking Machines / CM-x or anything else, frankly I think a "mondo" FPGA is still the way to go.

Job-wise: A suggestion might be to reach out to the guys at AllSpice ( allspice.io ) who make revision control software for Altium and possibly KiCad. The work you did to enable IPC, etc. seems like exactly the type of skillset these guys might need (contractor, maybe full-time?) to interoperate with KiCad.

If I see anything that might be up your alley I'd also reach out. I'm not in a position to hire anyone and while "some companies" may not be impressed by what you did, the right organization WOULD be.

I share your sentiment that at "modern" companies like Apple, MSFT, etc., the hiring process is really tailored to "I want a guy who can do X" and rarely "I want a guy who's shown he can learn Y and Z so he can certainly do X".

▲wanderingjew 7 hours ago

> On the technical side, somewhat more recent FPGA 'placement' algorithms used a simulated annealing algorithm. While what you did isn't about placement, that approach could possibly help with 'net cross-over reduction' type of passes, and maybe help with designs where you can do port swap / pin swap.

Yeah, that was the first step in creating the netlist for the backplane. Simulated annealing on the 8196 nets. TO BE FAIR, this would be a lot easier to route if I didn't explicitly want each of the 16 cards to be identical, but I think that's the most cost-effective way to do it.
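To make the pin-swap idea concrete, here is a toy simulated-annealing sketch. It is not the netlist code described above: it assumes a made-up model where top pin i connects to bottom slot `order[i]`, pins may only be swapped within fixed-size groups, and the cost is the number of crossing pairs. The names `crossings` and `anneal` and all parameter values are hypothetical.

```python
import math
import random


def crossings(order):
    """Crossing pairs when top pin i is wired to bottom slot order[i]."""
    n = len(order)
    return sum(1 for i in range(n) for j in range(i + 1, n) if order[i] > order[j])


def anneal(order, group_size, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Swap pins within groups to reduce crossings; returns (best_order, best_cost).

    Assumes len(order) is a multiple of group_size and group_size >= 2.
    """
    rng = random.Random(seed)
    current = list(order)
    cur_cost = crossings(current)
    best, best_cost = list(current), cur_cost
    groups = len(current) // group_size
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        base = rng.randrange(groups) * group_size
        i, j = rng.sample(range(base, base + group_size), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = crossings(current)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_cost


if __name__ == "__main__":
    start = list(range(16))
    random.Random(1).shuffle(start)
    best, cost = anneal(start, group_size=4)
    print(crossings(start), "->", cost)
```

In this toy the optimum within each group is just a sort, so annealing is overkill; the point is only to show the accept/reject loop that a real cost function (wire length, layer changes, congestion) would plug into.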

As far as an FPGA.... I don't know if I see the point. The nodes in the original CM-1 were basically _only_ ALUs. Very little processing power. The CM-5 was a little better, but this entire thing is batshit crazy. I might as well go for four thousand individually programmable cores. Like, what even is a MISD computer? I have no idea, so let's build one. See what it can actually do.

▲vrinsd 7 hours ago

If you're open to technical feedback on your last comment: I've worked on these kinds of systems, and have architected and built things even far "weirder". These products have shipped and are out in the real world, in silicon, in FPGAs and things between.

The reason an FPGA is a more suitable platform is you can translate "physical effort of making PCBs" into "creating a design in an infinitely re-programmable platform" and change your design as needed to your heart's content.

In fact, the original design of RISC-V included a bus called 'TileLink' to enable 'Many core' arrays of RISC-V processors.

Translation: You can pare down open-source RISC-V cores and use TileLink and emulate CM or build something more complex as you see fit since that was built into the original open-source RISC-V specs.

FPGAs are their own joy and pain for sure and it's not as "cool" to re-program a blackbox on a PCB as it might be to make your own thing, so all depends on your goals.

▲farkanoid 4 hours ago

Would you happen to be the "what the god damn shit is this fuck" Brian Benchoff of benchoffisms[1] fame?

[1] https://hackaday.io/project/7986-benchoffisms

▲thebeardisred 3 hours ago

yes, that is the same Brian, an editor emeritus of Hackaday.

▲abraae 10 hours ago

It would be interesting to know if the finished board would work at all, given there must be some non-zero failure rate for each via.
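A back-of-the-envelope way to frame this: if every via fails independently with probability p, a board with N vias is fully good with probability (1 - p)^N. The sketch below just evaluates that formula; the via count and failure rates are made-up placeholders, not figures from the write-up, and `board_yield` is a hypothetical helper name.

```python
def board_yield(via_count, via_failure_rate):
    """Probability that every via is good, assuming independent failures."""
    return (1.0 - via_failure_rate) ** via_count


if __name__ == "__main__":
    HYPOTHETICAL_VIA_COUNT = 20_000  # placeholder, not the real board's count
    for rate in (1e-6, 1e-5, 1e-4):  # placeholder per-via failure rates
        y = board_yield(HYPOTHETICAL_VIA_COUNT, rate)
        print(f"p={rate:g}: yield ~ {y:.3f}")
```

The independence assumption is generous, since real defects tend to cluster, but it shows why per-via reliability matters so much at this scale.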

▲NoiseBert69 11 hours ago

A4-sized, 32 layers.. I'd guess something around 1500€ at JLC.

▲farkanoid 3 hours ago

~$320USD each at ALLPCB with a MOQ of 5 for their standard 32-layer stackup, but that doesn't include the blind vias. Probably close to $400?

▲15155 1 hour ago

JLC will not do blind vias for you under any circumstances.

▲RicoElectrico 10 hours ago

Me too. I can't imagine a backplane where the connections would be so irregular as to require bringing out such big guns.

▲zeroping 2 hours ago

Beautiful closing to the write-up. "Never trust the autorouter. But at least this one is fast."

▲bsder 2 hours ago

Was there a particular reason why "Think real hard on the symmetry and write a Python script." wouldn't do the job? Or was it simply "It would be cool to abuse some GPUs"?

Either option is cool, though.

▲NoiseBert69 13 hours ago

Being that engineer in a PCB Fab in China who has to punch all the vias by hand

heart attack

▲varispeed 12 hours ago

Are these pushed by hand? I'd believe if you said it happens in the UK. Fabs are still stuck in the 90s.

▲NoiseBert69 12 hours ago

Joking.

There are videos from JLCPCB (one of the biggest). That stuff is 90% automated.

▲sitzkrieg 12 hours ago

vias (even blind) are certainly automated these days. only time i've seen people manually via in the last several years is using a pcb mill, and little copper rivets you could slam thru a fr2 board for quick in house top/bottom prototypes