Fresh Hacker News | GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz

▲GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz (twitter.com)

28 points by laxmena 2 hours ago | 3 comments

Transformers scale poorly with context window size and parameter count.

Which means performance can be really impressive when those N’s are small!

I’m only a pundit in this area, so I don’t know much. But one wonders if there’s a future in burning larger models into FPGAs: whether big enough FPGAs exist (or can be built), and whether placing specialized compute right next to the memory it needs can speed things up.

It would likely need a lot of work on algorithm parallelism, work that would also translate back to CPUs/GPUs.

▲T-A 33 minutes ago

▲genxy 1 hour ago

The context window is 16 characters. Talking about tokens per second is meaningless.

▲dominotw 57 minutes ago

It’s not meaningless. There could be use cases like spell correction.

▲genxy 21 minutes ago

It is only interesting as an academic exercise in EDA design, just like microGPT. Advertising perf for something with n^2 complexity is clickbait.
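To put rough numbers on the n^2 point, here is a minimal Python sketch. It only counts query-key dot products per attention head per layer when generating a sequence one token at a time; the function name and the counting model are illustrative assumptions, not anything taken from GateGPT, and it ignores the MLP and projection work that scales with parameter count.

```python
def attention_dot_products(context_len: int, kv_cache: bool = True) -> int:
    """Count query-key dot products (per head, per layer) needed to
    generate `context_len` tokens autoregressively.

    Hypothetical cost model for illustration only.
    """
    if kv_cache:
        # With a KV cache, token t attends to t keys (itself plus the
        # t - 1 cached ones): 1 + 2 + ... + n = n(n + 1) / 2.
        return context_len * (context_len + 1) // 2
    # Without a cache, step t recomputes causal attention over all t
    # tokens seen so far, which costs t(t + 1) / 2 on its own.
    return sum(t * (t + 1) // 2 for t in range(1, context_len + 1))


if __name__ == "__main__":
    for n in (16, 2048):
        print(n, attention_dot_products(n), attention_dot_products(n, kv_cache=False))
```

Under this model a 16-token window needs 136 dot products with a cache, while a 2048-token window needs 2,098,176, roughly 15,000 times more. That is why a tokens-per-second figure measured at a tiny context says little about larger ones.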

▲amelius 2 hours ago