After three major generations of models the "intuition" I've built isn't about what AI can do, but about what a specific model family can do.
No one cares what the gotchas in gpt3 are because it's a stupid model. In two years no one will care what they were for gpt5 or Claude 4 for the same reason.
We currently have the option of wasting months of our lives to get good at a specific model, or burning millions to try and get those models to do things by themselves.
Neither option is viable long term.
Trying to outsmart the models at core behaviors over time is asking to re-learn the bitter lesson though.
Meanwhile, human thoughtpower cannot really be improved. Once the tipping point is reached where computers exceed humans, humans will never be able to catch up by definition.
Humans can also only maintain so much contextual information and scope. They can only learn so much in the time scale they have to get up to speed. They can only do so much within the timescale of their own mental peak before they fall off and go senile or die. While these limits are bound by evolution, they change on the order of thousands of generations, and require strong selection for these changes at that.
The turtle has marched far already, but the hare in the speeding car they continually improve is not far behind. Efficiency doesn't matter. What is inefficient now will be trivial to parallelize and scale in the future, as it's always been in the history of compute. We'd have to engage in something like the Bene Gesserit breeding program if we are to have human thoughtpower be competitive against compute in the future.
The AI companies and their frontier models have already ingested the whole internet and reoriented economic growth around data center construction. Meanwhile, Google throttles my own Gemini Pro usage with increasingly tight constraints. The big firms are feeling the pain on the compute side.
Substantial improvements must now come from algorithmic efficiency, which is bottlenecked mostly by human ingenuity. AI-assisted coding will help somewhat, but only with the drudgery, not the hardest parts.
If we ask a frontier AI researcher how they do algorithmic innovation, I am quite sure the answer will not be "the AI does it for me."
I believe AGI is probably coming, but not on a predictable timeline or via blind scaling.
I don't think the sci-fi definition of AGI is happening soon, but something more boring may arrive in the meantime that is perhaps nearly as destructive to life as we know it as knowledge workers today. That is, still using humans, but increasingly fewer of them and of lower and lower skill, as the models are able to output more and more complete solutions. And naturally, there are no geographic or governmental barriers to protect employment in this sector, or physical realities that demand the jobs take place in a certain part of the world. Long term, this path is ripe for offshoring to the lowest-cost internet-connected labor available. Other knowledge work professions like lawyer or doctor set up legal moats to protect their field and compensation decades ago, whereas there is nothing similar to protect the domestic computer science engineer.
By all accounts they are on this trajectory already. You often see comments on here from developers who say something along the lines of: years ago the models needed careful oversight, and now they can trust them to do more of the project accurately with less oversight. Of course you will find anecdotes either way, but as the years go on I see more and more devs reporting useful output from these tools.
I wonder how they hold up when there's a big enough benefit to using AI over human work. Like, how are politicians going to explain these moats to the masses when your AI doctor costs 10x less and, according to a multitude of studies, is much better at diagnosis?
Or in law? I've read China is pushing AI judges because people weren't happy with the impartiality of the human ones. I think in general people overestimate how much these legal moats are worth in the long run.
[1] https://www.scientificamerican.com/article/first-proof-is-ai...
Had no idea it was possible to put a live url in the abstract of an arxiv listing
These math LLMs seem very different from humans. A person has a specialty. If an LLM were as skilled as, say, a middling PhD recipient (not superhuman), but were that skilled in literally every field, maybe somebody could argue that’s superhuman (“smarter” than any one human). By this standard a room full of people or an academic journal could also be seen as superhuman. Which is not unreasonable; communication is our superpower.
On the human side, mathematical silos reduce our ability to notice opportunities for cross-silo applications. There should be lots of opportunity available.
https://en.wikipedia.org/wiki/List_of_cognitive_biases
LLMs are good at search, but plagiarism is not "AI".
Leonhard Euler discovered many things by simply trying proofs everyone knew were impossible at the time. Additionally, folks like Isaac Newton and Gottfried Leibniz simply invented new approaches to solve general problems.
The folks that assume LLMs are "AI" are also biased to turn a blind eye to clear isomorphic plagiarism in the models. Note too, LLM activation capping only reduces aberrant offshoots from the reasoning model's expected behavioral vector (it can never be trusted). Thus, a model will spew nonsense when faced with some unknown domain search space.
Most exams do not have ambiguous or unknown contexts in the answer key, and a machine should score 100% matching documented solutions without fail. However, LLMs would also require >75% of our galaxy's energy output to reach the error rates of one human-level intelligence in general.
YC has too many true believers with "AI" hype, and it is really disturbing. =3
citation needed
https://www.anthropic.com/research/assistant-axis
The estimated energy consumption versus error rate is likely projected from agent test and hidden-agent coverage.
You are correct, in that such a big number likely includes large errors itself given models change daily. =3
Although the word "energy" does not appear on that page, so I'm not sure where you get the galaxy energy consumption from.
The activation capping effect on LLM behavior is available in this paper:
https://www.anthropic.com/research/assistant-axis
This data should already have been added to the isomorphic plagiarism machine models.
Some seem to want to bury this thread, but I think you are hilarious. =3
So it is not a space of proofs in the sense that everything in a vector space is a vector. More like a space of sequences of statements, which have some particular pattern, and one of which might be a proof.