If you've been writing Perl for 30 years, you might not want to learn JavaScript just to build the fun little idea in your head to show your wife. Vibe code that shit, man. Who cares? Your wife does not care about LOC or the internal design decisions you made.
If you're trying to learn something new like an algorithm, protocol, or API, write that shit by hand. You learn by doing, and when you know how the thing works and have that mental context, you will always be faster than an AI. Also, when did we stop liking to learn? Why is it a bad thing to know all the ins and outs of a programming language? To write and make all the decisions yourself? That shit is fun. I don't care if you disagree.
If you're at work and they really care about getting something out of the door, do whatever you think is best. If you just wanna ship vibed code and review PRs all day, all the power to you. If you wanna write it by hand, and use AI like a scalpel to write up boilerplate, review code, do PR audits, etc... go for it!
A hammer is a really great tool that has thousands of purpose-designed uses. I still prefer my key to get into my car. It's all tools; you are a person.
A lot of this stuff is coming top-down from people who do not have the experience you do. Wouldn't a smart employee use their expertise to advise the organization? If you work at a company where that would not be okay, maybe it's time to start looking for another firm.
I suspect it happened when we achieved a level of such constant stimulation (there is a pocket computer always on us with infinite effortless distraction) that we’re never bored and never engage the default mode network.
https://en.wikipedia.org/wiki/Default_mode_network
https://www.youtube.com/watch?v=orQKfIXMiA8
When you’re bored, your mind goes to places it wouldn’t otherwise go. Curiosity kicks in. Curiosity is a precursor to learning. Learning engages the brain and is fun. But it’s not fun all the time; some of it is challenging and frustrating (which is good, that’s the process that teaches you).
When you have the digital equivalent to infinite candy and the brain equivalent to a sweet tooth, it’s hard to resist the siren’s call. The consequence is the brain equivalent to a stomachache—depression and loss of meaning—but unfortunately it doesn’t hit you the same way so you don’t make the immediate connection to make yourself stop. When you think about it, it’s ridiculous from several angles: the candy is infinite, it’s never going to run out, so you don’t need to gorge! But then we justify ourselves as only a true addict would, that while the candy is infinite, the flavours are limited editions and always rotating, and what if I miss that really good one everyone is on?! Then you miss it, is the answer. No one will be talking about it in fifteen minutes anyway.
> when you know how the thing works and have that mental context, you will always be faster than an AI
That's just plain false, honestly. No one can type at the speed AI can code, even factoring in the time you need to spend to properly write out the spec & design rules the AI needs to follow when implementing your app/feature/whatever. And that gap will only increase as LLMs get more intelligent.
The more specific your requirements, the closer you get to the point where natural language stops being useful.
What do you use them for? For most AI users it's usually CRUD, and I've never seen a web server or frontend in APL-like languages.
The reason why programming is hard is that most languages force you to use a hammer when you need a screwdriver. LLMs are very good at misusing hammers, and most people find them useful for that reason.
If you use a sane DSL instead, the natural language description of a problem is always more complex and much longer than the equivalent description in the DSL. It's also usually wrong, to boot.
This is what algebra used to look like before variables: https://en.wikipedia.org/wiki/Archimedes%27s_cattle_problem#...
I don't think you will find anyone who can do better than an LLM at one-shotting the prose version of the problem. Both will of course be wrong.
But I also don't think you will find an LLM that can solve the problem faster than a human with Prolog when you have to use the prose description of the problem.
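To make the prose-vs-DSL gap concrete, here is a toy constraint in Z3's Python bindings, standing in for Prolog (the cattle problem itself is vastly larger; this tiny system is made up purely for illustration):

```python
# Prose: "find two positive whole numbers whose sum is 10 and whose
# difference is 2" -- already longer and fuzzier than the code below.
from z3 import Ints, Solver, sat  # pip install z3-solver

x, y = Ints("x y")
s = Solver()
s.add(x > 0, y > 0, x + y == 10, x - y == 2)

if s.check() == sat:
    print(s.model())  # [y = 4, x = 6]
```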
The volume of people successfully adopting agentic engineering practices suggests this stuff isn't rocket science, but it is a learned skill and takes setup.
A year into heavy AI coding, my experience is that what you're describing should let you run 5+ agents simultaneously on a project, because you know what you're doing, you set it up right, and you know how to tell agents to leverage that properly.
What's your definition of "successfully"?
More LOC committed per day is probably the only one that's guaranteed when you let spicy autocomplete take the wheel.
I don't think it's at all possible to reason about the other more meaningful metrics in software development, because we simply don't have the context of what each human is working on, and as with the WYSIWYG fad of 3 decades ago, "success" is generally self-reported, by people who don't know what they don't know, and thus they don't know what spicy autocomplete is getting woefully wrong.
"But it {compiles,runs,etc}" isn't a meaningful metric when a large portion of the code in question is dynamic/loosely typed in a non-compiled language (JavaScript, Python, Ruby, PHP, etc).
To give an example of where I hear this: it is indistinguishable from the things I hear from my coworkers: "You just need the right setup!" (IMO the actual difference is that I need to turn off the part of my brain that cares about what the code actually does, or considers edge cases at all.) What I actually see, in practice, is constant bugs where nobody ever addresses the root cause; instead they pave over it with a new Claude mass-edit that inevitably introduces another bug, and we repeat the same process when we hit the next production issue.
We end up making no actual progress, but boy do we close tickets, push PRs, and move fast and oh man do we break things. We're just doing it all in-place. But at least we're sucking ourselves off for how fast we're moving and how cutting edge we are, I guess.
I dunno, maybe I'm doing it wrong, maybe my team is all doing it wrong. But like I said the things they say are indistinguishable from the common HN comment that insists how this stuff is jet fuel for them, and I see the actual results, not just the volume of output, and there's no way we're occupying the same reality.
Translating that into code can happen directly, by you, or through prompt iterations that need to converge on the same or similar code.
In other words, when it matters how something works and it is full of intricate details, you do not need to specify it, you just do it. (E.g., knowing how to avoid an N+1 query performance issue: you do not need a ticket or spec to be explicit, you can just do it at no extra effort. Models are probably OK at this one since it is such a pervasive gotcha, but there are so many more.)
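For anyone who hasn't been bitten by it, the N+1 gotcha mentioned above looks roughly like this (a sketch using Python's sqlite3; the users/orders schema is hypothetical):

```python
import sqlite3

conn = sqlite3.connect("app.db")  # assumed schema: users(id), orders(user_id, total)

# N+1: one query for the users, then one more query per user.
users = conn.execute("SELECT id FROM users").fetchall()
for (user_id,) in users:
    conn.execute("SELECT total FROM orders WHERE user_id = ?", (user_id,)).fetchall()

# The fix you "just do" without a ticket: one joined query instead of N+1.
rows = conn.execute(
    "SELECT u.id, o.total FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()
```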
You set up the environment and then you do the work. Unless you are switching employers every week, you invest in writing that stuff down so the generation is right-ish, and you generate validation tooling so it auto-detects the mistakes and self-repairs.
Imagine you have to implement a specific algorithm for a quantum computer.
There's no value in setting up AI to do the writing for you. That might be orders of magnitude harder than writing the algorithm directly.
For highly specialized one-off features, it doesn't always pay off.
On the other hand, if all you do are some generic items that AI can do well... then I'm not sure you're going to have a job long term, your prompts and automation will be useful for the new junior hires that will be specialized in using these and cost effective.
It's like talking legalese vs plain English; or formal logic vs English. Some people have the formal stuff come more naturally, and then spitting code out is not a burden.
But man AI is phenomenal for getting stuff out of your head and working quick.
You speak as if AI development is frozen, and you ignore the poster's point:
> that gap will only increase as LLMs get more intelligent
Current AI systems are extremely serial, in that very little of the inherent parallelism of the problem is utilized. Current-gen AI systems run at most a few hundreds of thousands of operations in parallel, while for frontier models, billions of operations could be run in parallel. Or in other words, what currently takes AI 8 hours will take it barely long enough for you to perceive the delay after you release the enter key.
For a demo, play around with https://chatjimmy.ai/ , the AI chatbot of Taalas, where they etched the model into silicon in a distributed way, instead of storing it in RAM and sucking it into execution units through a straw. It's an 8B-parameter model, so it's unsuitable for complex problems, but the techniques used for it will work for larger models too, and they are working to get there.
And even Taalas is very far from the limits. Modern better quality LLM chatbots operate at ~40 tokens per second. The Taalas chatbot operates at 17000 tokens/s. If you took full advantage of parallelism, you should be able to have a latency of low hundreds of clock cycles per token, or single request throughput of tens of millions of tokens per second. (With a fully pipelined model able to serve one token per clock cycle, from low hundreds of requests.) Why doesn't everyone do it like that right now? Because to do this, you need to etch your model into silicon, which on modern leading edge manufacturing is a very involved process that costs hundreds of millions+ in development and mask costs (we are not talking about single chips here, you can barely fit that 8B model into one), and will take around a year. So long as the models keep improving so much that a year-old model is considered too old to pay back the capital costs, the investment is not justified. But when it will be done, it will not just make AI faster, it will also make it much more energy-efficient per token. Most of the energy costs are caused by moving data around and loading/storing it in memory.
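The back-of-the-envelope version of that claim, with my own assumed numbers for the clock rate and requests in flight (neither comes from Taalas):

```python
clock_hz = 2e9            # assumption: ~2 GHz silicon clock
tokens_per_cycle = 1      # fully pipelined: one token emitted per cycle
requests_in_flight = 200  # "low hundreds" of requests filling the pipeline

aggregate = clock_hz * tokens_per_cycle       # 2e9 tokens/s across all requests
per_request = aggregate / requests_in_flight  # 1e7 tokens/s for a single request
print(f"{per_request:,.0f} tokens/s per request")  # 10,000,000: tens of millions
```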
And I want to stress that none of the above is dependent on any kind of new developments or inventions. We know how to do it; it's held back only by the pace of model improvement and economics. When models reach a state of truly "good enough", it will happen. It feels perverse to me that people are treating this situation as "there was a pre-AI period that worked like X, now we are in a post-AI period and we have figured out that it will work like Y". No. We are at the very bottom of a very steep curve, and everything will be very different when it's over.
Everything I do to interact with my computer is through an agent now.
Everything I do to interact with my computer is still the same.
See how boring you are?
Telling the agent your high-level plan that you are extremely familiar with and then having the agent execute on 2000 lines of code is FASTER than having you execute on those 2000 lines of code. There is no reality where that can be physically beaten, even by someone who's typing really quickly with zero pause. Physically impossible.
Less boring or not? Another way to put it: although my answer is boring, I think I'm right. He is either a liar or, like many other people, lacks skill in using AI. Because the transition to AI is happening so fast, not many people are fully utilizing AI to its maximum potential. Many still use IDEs, many still interact with the terminal. Many people still don't use it to configure infrastructure, do database administration, deploy code, etc.
Honestly, I'm still faster than AI at cooking scrambled eggs, but definitely not faster than either the AI or the compiler at translating stuff into code.
No way you beat an LLM on this, even on trivial ones. LLMs have been better at that since at least 2024; if you haven't noticed, then perhaps you're not doing enough SQL.
But, of course it took years for people to realize they cannot outpace Visual Studio in the 90s by being very good at x86 assembly.
But during feature development? Not possible. And I consider myself a very fast developer.
Bugs happen during feature development, as you say, but then Claude is in the context, and I don't need to tell it where to go; it sees the bug with failing tests, or something similar.
BTW, one thing that helps my Claude with debugging harder problems is telling it to apply the scientific method to debugging: generate hypotheses, gather pros/cons evidence, write to a journal file debug-<problem>.md, design minimal experiments to debunk hypotheses.
You can add that as a skill, and sometimes it will pick it up automatically, but it works wonders just as a single sentence in the input.
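As a rough example, the single-sentence version might read like this (the wording is mine, not a canonical skill):

```
Debug this scientifically: generate hypotheses, gather evidence for and
against each one, keep a journal in debug-<problem>.md, and design minimal
experiments to debunk each hypothesis before changing any code.
```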
Besides, it is a system that you query, and it responds. I'm sure your DBs are not always 'right' either, particularly when you ask the wrong questions.
I can say with full confidence that the code AI writes is more robust and safe than if I had written it myself. The code definitely becomes more bloated, though.
Don't we already have a weekly post nowadays explaining, again, that typing isn't the bottleneck?
Going fast isn’t the difficult bit.
You can definitely be faster than frontier models. The number of tokens per second is not that high and they require a lot of tokens for thinking and navigating things.
The Spicy Autocomplete koolaid club is out in force today I see.
We clearly have different ideas of what the word "intelligent" means.
But there are other nerds who care, just not about the code quality, but about conversion, testing out business ideas quickly, getting to know their customers better.
There are nerds who care about business strategy.
There are nerds who care about accounting principles and clean financial reporting.
There are nerds who care about sales targets and partnerships.
There are many types of nerds out there. Don’t limit nerds to engineers, because “tech” world is not just an engineering world anymore. All these nerds you can team up with to build meaningful things, because they do care.
(Though, granted, the results are a lot better if you craft it by hand)
How often have engineers decried yet another rewrite that some project is doing? Or talked about "over-engineering" something that isn't needed, or that another person in the team has set up a full Kubernetes GitOps thing that's glorious to them, but you just want to scp a Go binary and be done with it?
I've seen truly excellent engineers hit this issue, I worked in a team years ago and people disagreed on the approach to take on a new project. So we all made a prototype and presented it, so we could pick a direction. There was a requirement that it be done in ruby since that was the language most of the developers were most fluent in. One of the engineers, remarkably smart, wrote a lisp interpreter in ruby so that technically it'd be "in ruby" but have the benefits of lisp.
He cared about the quality and process in one area. Deeply. However, focussing on that would be to the detriment of the rest of the actual product we wanted to ship. If you considered the quality of the product as a whole and the process at the level of the organisation, you'd do something very different.
Now, none of this means all business people are good at this or long term vision or anything, just as it doesn't mean all engineers have a very narrow focus. But I've seen engineers focus on the quality or engineering of some component without looking at what it is you're actually trying to achieve as a business, and so push for a worse overall process and lower "quality" result. It's the same sort of disconnect that leads a lot of engineers to rail against meetings and PMs that slow them down without seeing from the other side that it's often better to build the right thing more slowly than the wrong thing more quickly.
This means different things to different people. A lot of people enjoy the process of engineering solutions with LLM agents, building out tailored skills and custom approaches that make up their own flavour of "agentic" workflow. There are also people who find joy in JavaScript, and other people cannot understand why. And others again love systems languages, or even tinkering with assembly, etc.
What I wanted to say is that LLM use does not automatically mean people just want to get results faster, there are still nerds enjoying the process of working with these new tools.
But the real trick isn't "number of personal projects", but how weird they are. There's no "rational" reason to do them, they don't increase the person's marketability / hireability. They are done purely for intrinsic reasons.
(On reflection, this also seems to be a pretty robust predictor of autism. :)
and the whole world suffers for it.
Good code for a business is robust code that's functionally correct, efficient where it needs to be, and does not cost too much.
I believe most developers who care about good code are trying to articulate this, they care about a strong system that delivers well, which comes from good architecture.
LLMs actually deliver pretty well on the more trivial code-cleanliness stuff, or can be made to pretty trivially with linters, so I don't think devs working with them should be worried about that aspect.
What is changing fast is that last point I mentioned, "that doesn't cost too much" because if you can get 70% of the requirements for 10% of the perceived up front cost, that calculus has changed. But you are not going to be getting the same level of system architecture for that time/cost ratio. That can bite you later, as it does often enough with human coders too.
I played with Image Playground last year some time. It was really fun. You know why? I can't draw, and I can't paint, to save my life. It's letting me do something I can't do well/at all on my own.
Using an LLM to do something I can do, with the caveat that it's pretty mediocre at the task, and needs to be constantly monitored to check it isn't doing stupid things? If I wanted that I'd just get an intern and watch them copy crappy examples from StackOverflow all day.
The same logic explains the use of LLMs to write emails and other long-form text.
It makes accessible something that people otherwise cannot do well. Go look at submissions on community writing sites. The people who write because they're good at it, are adamant they don't use an LLM.
People use LLMs to do things they're otherwise not able to do. I will die on this hill.
That would imply that either the person in question has infinite time, or has access to all software that could ever be of utility to them, which seems unlikely.
So yeah, I guess the value of doodles has shot up simply because of optics.
Somewhere else in this comment section someone tried to broaden the definition of nerd so much so that pretty much anybody who is a consummate professional is also a nerd. The hill I will die on is that people don't actually dislike all this new AI stuff but more so the attitude of people heavily invested in it.
And to add another data point regarding your hill my drawing/painting moment was NLP stuff. Now if I want to do (rudimentary) sentiment analysis or keyword extraction I can lean on a local LLM. Yet I don't go around yelling Snowball (I think?) is obsolete.
Exactly.
LLM bros are just the new blockchain/crypto bros, but they aren't necessarily even writing their own spruiking comments any more.
I do not know the ins and outs of the assembly my high-level code ends up as. It's not because I don't like to learn; it's because I genuinely don't need to. At a certain level of AI performance, how will this be any different?
An LLM is not coupled to anything and can generate output that simply does not relate to the input. This doesn’t happen with compilers, and if it does, then it’s a specific bug to be addressed. An LLM can never guarantee certain output based on the input.
If I write x < 100, I know exactly how the compiler will treat that code every single time, and I know what < means and how it differs from <=
If I tell an LLM that “I want numbers up to 100.” Will that give me < or <= and will it be consistent every single time, even the ten thousandth program that I write?
The language is ambiguous where the code is specific
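Five lines of Python make the point; both readings of "numbers up to 100" are plausible, and only the code pins one down:

```python
# "I want numbers up to 100" -- which one did you mean?
exclusive = list(range(100))  # 0..99,  the `< 100` reading
inclusive = list(range(101))  # 0..100, the `<= 100` reading

# The code is specific; the sentence is not. An LLM has to guess, and nothing
# guarantees it guesses the same way on the ten thousandth program.
assert exclusive[-1] == 99 and inclusive[-1] == 100
```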
I have a co-worker in another team who writes Java endpoints we consume. I can tell him what I need and I trust the output. I don't need to know Java to trust him; it doesn't mean I don't want to learn.
There are a thousand examples like this across every stack and abstraction level, from SSH handshakes to GPS.
Sure my co-worker is fundamentally different from a compiler which is fundamentally different from an LLM.
My argument is that the chain-of-trust where you offload knowledge to an external source is identical. We do it all the time but somehow doing it with an LLM means we no longer want to learn?
Multibillion-dollar companies are now the gateway for every line of code you need to write. That’s dystopian. It sucks.
I made a checklist for my kids to stamp off items after they get back from school (sort bag, get changed, etc). I had two goals: 1) I was trying to solve a problem at home, and would have pip installed a library that just straight up did this already, and 2) I wanted to check out what the Claude website's output was like at the time. My time was best spent poking at Claude a bit but mostly playing with my kids - so vibe coding it was.
Client test-speedup work: I'm trying to speed up tests for them while spending as little time as possible doing so. Vibe coded some analysis and visualisation tools, mostly AI but with some review; guided multiple prototypes for timing and let it just fix whatever. More dedicated review for the actual solutions.
Learning a new thing - goal is to learn that thing. AI there is good for doing a lot of the work around that. Maybe I'm focussing on, say, Z3. AI there can help with debugging, finding docs, setting up an environment and leave me to do the central part.
To split the difference, I now try to hand code as much as I can from the beginning, leave TODO comments for the agent to mop up and I'll ask it to complete the issue with reference to the current diff. It reduces the surface for agents to make stupid assumptions. If I can get it done fast on my own, win for me, if the agent finds issues or there's logic that needs checking, also a win. This way you stay sharp, but you have access to an oracle if you get stuck and it costs you fewer tokens.
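Concretely, the handoff looks something like this (a sketch; the function and the TODO wording are mine, not a prescribed format):

```python
def parse_events(path: str) -> list[dict]:
    """Parse a CSV of events. Core logic hand-written; edge cases left for the agent."""
    rows = []
    with open(path) as f:
        for line in f:
            name, ts = line.strip().split(",")
            rows.append({"name": name, "ts": ts})
    # TODO(agent): handle malformed/blank lines -- log and skip instead of raising
    # TODO(agent): add unit tests for those edge cases, with reference to the current diff
    return rows
```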
The problem is mixing vibe-coding and agentic-eng, and switching the brain in 2 different modes (fast-feedback gratification vs deep-focus gratification).
There’s no clear cut rule on what works. Different people, different brains, and especially amongst devs some optimized low-key neurodivergence.
And then there’s waiting mode, those N seconds/minutes that agents take to think and write.
What’s the right mix? Keep a main focused project and … what do you do in the meantime? Vibe code something else? HN? Social media? Draw lines on a paper sheet? Wood carving? Exercise? Rewatch some old TV series?
I have experimented….
There are side activities that help you go back to the task at hand in the correct mental framework for it. Not just for productivity, but for efficiency and enhancing critical thinking on the main task. Or whatever you choose to optimize for. Can anyone point me towards some people talking about this?
It's not just fun (I agree it is); it is also essential for creation.
What we have done with the 'AI' is to create a lot of ignorant morons who think they can create a lot of things without knowledge. This is not gonna end well.
Who said "managers"?
Now we have an influx of people without a single shred of technical knowledge thinking they can create something.
To me, it means expensive to evolve.
Many that wore the Ring had pure and righteous intentions. The thought of, "If I were in power, I would..." was the arrogance and corruption which the Ring amplifies.
So, I cannot agree that it is AI doing the harm. Rather, AI just gives us the power to do the harm, the shortcuts, the cheats, etc. we have always desired. And just like the Ring, I believe much of the harm from LLMs often comes from people that started with good intentions, and the power it grants is just too tempting for many.
> If you're at work and they really care about getting something out of the door, do whatever you think is best.
If you don’t mind being jobless, sure do whatever you think is best. Not all of us can simply switch companies easily. Folks need to realise that AI in a company setting works for the benefit of the company, not for the individual.
It's the practitioner who eventually figures out what really works. I see this the same way the agile movement emerged. It was initiated by people who were hands-on programmers and showed enough benefit at minimizing software waste before it took a life of its own and started getting peddled by people who didn't really understand the underlying principles.
And no, this isn't playing what ifs.
I have seen it happening with offshoring, migration to cloud, serverless, SaaS and iPaaS products, and now AI powered automations via agents.
Fewer DevOps people, fewer backend devs, no translation team, no asset-creation team, ...
I have been laid off a few times and had to do competence transfer to offshoring teams; the quality of the output is something C-suites don't care about at all.
Do you wanna bet what is behind Microslop, Apple Tahoe bugs and so forth?
Plenty of engineers have loose (or no!) standards and practices over how they write code. Similarly, plenty of engineering teams have weak and loose standards over how code gets pushed to production. This concept isn't new; it's just a lot easier now for individuals and teams who have never really adhered to any sort of standards in their SDLC to produce a lot more code and flesh out ideas.
I personally don’t know any colleagues who were good engineers just because they wrote code faster. The best engineers I know were ones who drew on experience and careful consideration and shared critical insights with their team that steered the direction of the system positively.
> Claude, engineer a system for me, but do it good. Thanks!
I don't know if good engineers can necessarily continue to be good. There is a limit to how much careful consideration one can give if everything is on an accelerated timeline. And good or not, there is a limit to how much influence you have over setting those timelines. The whole playing field is changing.
There's a cycle that is needed for good system design. Start with a problem and an approach, and write some code. As you write the code, you reify the design and flesh out the edge cases, learning where you got the details wrong. As you learn the details, you go back to the drawing board and shuffle the puzzle pieces, and try again.
Polished, effective systems don't just fall out of an engineer's head. They're learned as you shape them.
Good engineers won't continue to be good when vibe-coding, because the thing that made them good was the learning loop. They may be able to coast for a while, at best.
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
When there's a lot of complexity, it's often repetitive translation layers, and not something fundamental to the problem being solved.
We mocked these "architects" from experience. We knew that if you weren't feeling the friction yourself, you wouldn't learn enough to do good design.
Maybe you don't care about engineering great systems. Most companies don't. It's good for profit. This isn't new, though AI enables less care.
In my experience, in a lot of organizations, a lot of people either lacked the ability or the willingness to achieve any level of technical competence.
Many of these people played the management game, and even if they started out as devs (very mediocre ones at best), they quickly transitioned out from the trenches and started producing vague technical guidance that usually did nothing to address the problems at hand, but could be endlessly recycled to any scenario.
People who care about craft will care about the quality of what they produce whether they use AI or not.
The code I ship now is better tested and better thought through than before I used AI, because I can do a lot more. That extra time goes into additional experiments, jumping down more rabbit holes, and trying out ideas I previously couldn’t due to time constraints. It’s freeing to be able to spend more time improving quality, because the ROI on time spent experimenting has gone up dramatically.
I couldn't get exercises done when they relied on tricks/shortcuts that are learned by doing a lot of exercises; but for many people, these are still the same tricks/shortcuts used in proofs.
This was indeed rare among students, but let's not discount that there are people who _can_ learn from well systemized material and then apply that in practice. Everyone does this to an extent or everyone would have to learn from the basics.
The problem with SW design is that it is not well systemized, and we still have at least two strong opposing currents (agile/iterative vs waterfall/pre-designed).
- I've taken a controversial new pill that accelerates my brain.
-- So you're smart now?
- I'm stupid faster!
That being said, being stupid faster can work if validation is cheap (and exists in the first place).
Turns out "eh close enough" for AGI is just stupidity in an "until done" loop. (Technically referred to as Ralphing.)
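A minimal sketch of the loop in question, assuming Claude Code's non-interactive `claude -p` mode as the agent command (swap in whatever tool you use):

```python
import subprocess

def checks_pass() -> bool:
    # cheap validation; without this the loop is pure stupidity with no exit
    return subprocess.run(["pytest", "-q"]).returncode == 0

while not checks_pass():
    # fire the same prompt again, "until done"
    subprocess.run(["claude", "-p", "make the failing tests pass"])
```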
I've optimized my game's code and it finally runs at 1000 FPS.
--So your game is good now?
It's shit faster.
That has always been the case. That is why weeks or even months of programming and other project busywork could replace a couple of days spent getting properly fleshed-out requirements down.
Good engineers are also capable of managing expectations. They can effectively communicate with stakeholders what compromises must be made in order to meet accelerated timelines, just as they always have.
We’ve already had conversations with overeager product people about the ramifications of introducing their vibe-coded monstrosities:
- Have you considered X?
- Have you considered Y?
Their contributions are quickly shot down by other stakeholders as being too risky compared to the more measured contributions of proper engineers (still accelerated by AI, but not fully vibe-coded). If that’s not the situation where you work, then unfortunately it’s time to start playing politics or find a new place to work that knows how to properly assess risk.
I estimate that I'm now spending about 10 to 30 hours less time a week in the mechanical parts of writing and refactoring code, researching how to plumb components together, and doing "figure out how to do unfamiliar thing" research.
All of those hours are time that can now be spent doing "careful consideration" (or just being with my family or at the gym or reading a book, which is all cognitively valuable as well).
Now, I suppose I agree that if timelines accelerate ahead of that amount of regained time, then I'm net worse off, but that's not the current situation at the moment, in my experience.
What you said about "figure out how to do unfamiliar thing" is correct, and will get things done, but overall quality, maintainability, or understanding how the individual pieces work... that's what you don't get. One can argue who cares about all that, as AI can take care of it, or already can. I don't think it's true today, at least.
What I find is actually necessary for me to have a mental model of the system is not typing out the definitions of the classes and such, but rather operating and debugging the system. I really do need to try to do things, and dig into logs, and figure out what's going on when something is off. And pretty much always ends up requiring reading and understanding a bunch of the implementation. But whether I personally typed out that implementation, or one of my colleagues, or an AI, is less important.
I mean, I already had to be able to build a mental model of a system that I didn't fully implement myself! I essentially never work on anything that I have developed in its entirety on my own.
10 to 30 hours saved on not learning new things! Hurray!
What do you mean by "barely working"? I can now put more iterations into getting things working better, more quickly, with less effort. That seems good to me.
10 to 30 hours a week is 25% to 75% of my time working. Seems like a pretty good trade?
I do understand that the calculation is different for people who are new to this. And I worry a lot about how people will build their skills and expertise when there is no incentive to put in all the tedious legwork. But that just isn't the phase of my career that I'm in...
My time is spent more on editing code than writing new lines. Because code is so repetitive, I mostly do copy-pasting, using the completion and the snippets engine, reorganize code. If I need a new module, I just copy what’s most similar, remove everything and add the new parts. That means I only write 20 lines of that 200 lines diff.
Also my editor (emacs) is my hub where I launch builds and tests, where I commit code, where I track todo and jot notes. Everything accessible with a short sequence of keys. Once you have a setup like this, it’s flow state for every task. Using LLM tools is painful, like being in a cubicle reading reports when you could be mentally skiing on code.
Or at least, the limit is increasing by the day.
Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower, less panicked when production was down and could reason their way (slowly) through pretty much anything thrown at them.
Opposite experience: I've sat next to developers who are trying their fastest to restore production and then making more mistakes that make it even worse, or developers who rush through the first implementation idea they had for a feature, missing so many considerations, and so on.
Unfortunately, a lot of workplaces are ignoring this, believing their engineers are assembly line workers, and the ones who complete 10 widgets per minute are simply better than the ones who complete 5 widgets per minute.
Companies want workflows that work with mediocre programmers because they are more like interchangeable parts. This is the real secret to why AI programming will work in a lot of places. If you look at the externalities of employing talented people, shitty code actually looks better than great code.
This is the earworm the leaders of these companies have allowed into their minds. Like Agent Mulder, they Want To Believe in this so badly...
If you assume they are not idiots and analyze the FOMO incentives via a little game-theory, it becomes clear why.
Assuming the competition has adopted AI, leadership can ignore it, or pursue it. If they adopt it, then they are level with the competition whether AI actually succeeds or fails - they get to keep their executive job.
If leadership ignores AI, and it actually delivers the productivity gains to the competition, they will be fired. If they ignore AI and it's a bust, they gain nothing.
The company does better than the money-burning competition, but the executives personally gain nothing; there are no bonuses just because the competition took a misstep.
To me, none of this feels like "going faster", it feels like "opening up possibilities to try more things, with a lot less tedious work".
For things that have a visual element, like UI and UX, you can start with sketches (analog or digital), eliminate the bad ideas, and refine the good ones with higher-quality rendering. Then choose one concept and implement it. By that time, the code is trivial. What I found with LLM usage is that people will settle on the first one, declaring it good enough, and not explore further (because that is tedious for them).
The other types of problems mostly fall into three categories (mathematical, logical, or data/information/communication). For the first type you have to find the formula, prove it is correct, and translate it faithfully to code. But we rarely have that kind of problem today unless you’re in a research lab or dealing with floating-point issues.
The second type is more common, where you’re enacting rules based on axioms originating from the systems you depend on. That leads to the creation of constraints and invariants. Again, I’m not seeing LLMs helping there, as they lack internal consistency for this type of activity. (Learning Prolog helps in solving that kind of problem.)
The third type is about modeling real-world elements as data structures and designing how they transform over time and how they interact with each other. To do it well, you need deep domain knowledge about the problem. If an LLM can help you there, that means one of two things: a) your knowledge is lacking and you ought to talk to the people you’re building the system for; b) the problem is solved and you’d do well to learn from the solution. (Basically what the DDD books are all about.)
Most problems are a combination of subproblems of those three categories (recursively). But from my (admittedly small amount of) interactions with pro LLM users, they don’t want to solve a problem, they want it to be solved for them. So it’s not about avoiding tediousness, it’s sidestepping the whole thing.
Unfortunately I have seen some really good software engineering peers regress into bad engineers through an increasing reliance on AI.
Conversely some very bad engineers (undeserving of the title) have been producing better outputs than I ever expected possible of them.
For someone with 3-4 kids who lives far from the city, WFH and time flexibility can be important motivators.
However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out. It's precisely that speed that enables a process like "let's try X, hmm, how about Y, no... ok, Z is nice; ok team, here are the tradeoffs...". Then they remember their experience with X, Y, and Z, and use it to shape their thinking going forward.
Meanwhile, other engineers have gotten X to finally mostly work and are invested in shipping it because they just want to be done. In my experience, this is how a lot of coding agents seem to act.
It's not obvious to me how to apply the expert loop to agentic coding. Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier...
> Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier
The ideal solution increasingly seems to be encoding everything that differentiates a good engineer from a bad engineer into your prompt.
But at that point the LLM isn’t really the model as much as the medium. And I have some doubts that LLMs are the ideal medium for encoding expertise.
The way you apply the expert loop is to be the expert. "Can we try this...", "have you checked that...", "but what about...".
To some degree you can try to get agents to work like this themselves, but it's also totally fine (good, actually) to be nudging the work actively.
The Pragmatic Programmer book has whole chapters about this. Ultimately, you either solve the problem analytically (whiteboard, deep thinking on a sofa), or you get fast at trying out stuff AND keeping the good bits.
That's not my experience... mostly it's about first interrogating the actual problem with the customer and conditions under which it occurs. Maybe we even have appropriate logging in our production application? We usually do, because you know, we usually need to debug things that have already happened.
(If it's new/unreleased code, sure fine, let's find a debugger.)
Unfortunately, thoughtful design and engineering doesn't get recognised.
The risk isn't that agents write bad code. It's that developers lose the sense that tells them where code is bad. Code review is perception. Writing code is proprioception. They're different senses and one doesn't substitute for the other.
The question for the agent era isn't "is the code good enough to ship" — it's "do I still have enough coupling to the codebase to know when it isn't?"
I figure if it can't code when it has all of the necessary context available and when obscure failures are easily detected, then why would I trust it when building features and fixing bugs?
It never did get good enough at refactoring.
Loss of discipline can be a result of panic or greed.
Perhaps believing that your own costs or your competitors' costs are suddenly becoming 10x lower could inspire one of those conditions?
(Also for greenfield projects specifically, it can plausibly be an experiment just to verify what happens. Some orgs are big enough that of course they can put a couple people on a couple-month project that'll quite likely fall flat.)
I do this too, but then I sit and observe how the agent gets very creative about going around all of these layers just to get to the finish line faster.
Say, for example, I needlessly pass a mutable reference and the linter screams at me. I know that either the linter is wrong in this case, or I should listen to it and change the signature. If I make the lazy choice, I will be dissatisfied with myself; I might even get scolded, or even fired if I keep making lazy choices.
LLM doesn't get these feelings.
LLM will almost always go for silencing it because it prevents it from reaching the 'reward'. If you put guardrails so that LLM isn't allowed to silence anything, then you get things like 'ok, I'll just do foo.accessed = 1 to satisfy the linter'.
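In miniature, the difference between fixing the signal and silencing it (a contrived Python example; F841 is flake8's unused-variable code):

```python
def total(items):
    count = 0          # linter: local variable `count` is assigned but never used
    return sum(items)

# The silencing move: the warning disappears, the dead code stays.
def total_silenced(items):
    count = 0  # noqa: F841
    return sum(items)

# The fix a human makes, if only to avoid getting scolded:
def total_fixed(items):
    return sum(items)
```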
Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?
Claude is remarkably good at figuring this out. I asked it to look at a failing test in a large and messy Python codebase. It found the root cause, asked whether the failure was a regression or an insufficiently specified test, performed its own investigation, and found that the test harness was missing mocks that were exposed by the bug fix.
It has become amazingly good at investigating.
I can generate a lot of tests amounting to assert(true). Yeah, LLM generated tests aren't quite that simplistic, but are you checking that all the tests actually make sense and test anything useful? If no, those tests are useless. If yes, I don't actually believe you.
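The failure mode in miniature, pytest-style (contrived on purpose):

```python
def apply_discount(price: float, pct: float) -> float:
    return price * (1 - pct / 100)

# Looks like coverage, asserts nothing about behavior: a 90%-off bug still passes.
def test_apply_discount_runs():
    assert apply_discount(100.0, 10.0) is not None

# A test that would actually catch a regression:
def test_apply_discount_value():
    assert apply_discount(100.0, 10.0) == 90.0
```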
It's the typical 10 line diff getting scrutinized to death, 1000 line diff: Instant LGTM.
Pay attention to YOUR OWN incentives.
Lead engineer says something is not workable? PM overrides, saying that Claude Code could do it. Problems are found months later at launch, and now the engineers are on the hook.
New junior onboardee declares that their new vision is the best and gets management onto it cuz it’s trendy -> broken app.
It’s made collaboration nearly unbearable as you are beholden to the person with the lowest standards.
Exactly right.
As models get better, they seem to be biased to doing most of these things without needing to be told. Also, coding tools come with built in skills and system prompts that achieve similar things.
Two years ago I was copy-pasting together a working Python FastAPI server for a client from ChatGPT. This was pre-agentic tooling. It could sort of do small systems and work on a handful of files. I'm not a regular Python user (most of my experience is Kotlin based), but I understand how to structure a simple server product. Simple CRUD stuff. All we're talking about here was some APIs, a DB, and a few other things. I made it use async IO and generate integration tests for all the endpoints. Took me about a day to get it to a working state. Python is simple enough that I can read it and understand what it's doing. But I had never used any of the frameworks it picked.
That's 2 years ago. I could probably condense that in a simple prompt and achieve the same result in 15 minutes or so. And there would be no need for me to read any of that code. I would be able to do it in Rust, Go, Zig, or whatever as well. What used to be a few days of work gets condensed into a few minutes of prompt time. And that's excluding all the BS scrum meetings we'd have to have about this that and the other thing. The bloody meetings take longer than generating the code.
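For scale, the kind of endpoint involved, as a minimal sketch assuming FastAPI and an in-memory store (not necessarily what it actually generated back then):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

class Item(BaseModel):
    id: int
    name: str

app = FastAPI()
items: dict[int, Item] = {}  # in-memory stand-in for the real DB

@app.post("/items")
async def create_item(item: Item) -> Item:
    items[item.id] = item
    return item

@app.get("/items/{item_id}")
async def get_item(item_id: int) -> Item:
    if item_id not in items:
        raise HTTPException(status_code=404, detail="not found")
    return items[item_id]
```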
A few weeks ago I did a similar effort around banging together a Go server for processing location data. I've been working against a pretty detailed specification with a pretty large API surface and I wanted an OSS version of that. I have almost no experience with Go. I'd be fairly useless doing a detailed code review on a Go code base. So, how can I know the thing works? Very simple, I spent most of my time prompting for tests for edge cases, benchmarking, and iterating on internal architecture to improve the benchmark. The initial version worked alright but had very underwhelming performance. Once I got it doing things that looked right to me, I started working on that.
To fix performance, I iterated on trying to figure out what was on the critical path and why and asking it for improvements and pointed questions about workers, queues, etc. In short, I was leaning on my experience of having worked on high throughput JVM based systems. I got performance up to processing thousands of locations per second; up from tens/hundreds. This system is intended for processing high frequency UWB data. There probably is some more wiggle room there to get it up further. I'm not done yet. The benchmark I created works with real data and I added generated scripts to replay that data and play it back at an accelerated rate with lots of interpolated position data. As a stress test it works amazingly well.
This is what agentic engineering looks like. I'm not writing or reviewing code. But I still put in about a week plus of time here and I'm leaning on experience. It's not that different from how I would poke at some external component that I bought or sourced to figure out if it works as specified. At some point you stop hitting new problems and confidence levels rise to a point where you can sign off on the thing without ever having seen the code. Having managed teams, it's not that different from tasking others to do stuff. You might glance at their work but ultimately they do the work, not you.
I feel like this is just not true. A JSON API endpoint also needs several decisions made:
- How should the endpoint be named
- What options do I offer
- How are the properties named
- How do I verify the response
- How do I handle errors
- What parts are common in the codebase and should be re-used.
- How will it potentially be changed in the future.
- How is the query running, is the query optimized.
…
If I know the answer to all these questions, wiring it together takes me LESS time than passing it to Claude Code.
If I don’t know the answer the fastest way to find the answer is to start writing the code.
Additionally, whilst writing it I usually realize additional edge cases, optimizations, better logging, observability, and whatnot.
The author clearly stated the context for this quote is production code.
I don’t see any benefits in passing it to Claude Code. It’s not that I need 1000s of JSON API endpoints.
That's just not true, and if it is in your case, then you're not great at writing prompts yet.
> Take the todo_items table in Postgres and build a Micronaut API based around it. The base URL should be /v1/todo_items. You can connect to Postgres with pguser:pgpass@1.2.3.4
That's about all it takes these days. Fewer lines of code than your average controller.
And every day I do something else where the LLM output is off enough that I end up spending the same amount of time on it as if I'd done it by hand. It wrote a nice race-condition bug into a race I was trying to fix today, but it was pretty easy for me to spot at least.
And once a week or so I ask for something really ambitious that would save days or even weeks, but 90% of the time it's half-baked or goes in weird directions early and would leave the codebase a mess in a way that would make future changes trickier. These generally suggest that I don't understand the problem well enough yet.
But the interesting things are:
1) many of the things it saves 90% of the time on are saving 5+ hours
2) many of the things I have to rework only cost me 2+ hours
3) even the things that I throw away make it way faster to discover that 'oh, we don't understand this problem well enough yet to make the right decisions here yet' conclusion that it would be just starting out on that project without assistance
so I'm generally coming out well ahead.
Now that ratio is swinging way over towards the LLMs favor.
How do you reconcile that with your example prompt, which demonstrates no skill requirement whatsoever? It’s the first thing any developer would think of.
Your comment exemplifies what a lot of people complain about vibe coding: it works great for greenfielding CRUD apps, but it’s a bitch to use in a real code base.
Communicating in words is extremely hard. I don't think this should be as controversial as it seems in the prompt era.
Versus: someone has mastered one of the myriad OpenAPI generators, and it's shipped.
Letting the tool figure out your assumed intent on those things is a double-edged sword. Better than you never even thinking of them. But potentially either subtle broken contracts that test coverage missed (since nobody has full combinatoric coverage, or the patience to run it) or just further steps into a messy codebase that will cost ever-more tokens to change safely.
"I'll go in the other direction and say that if you're spending a lot of your time learning to [program] better then you're wasting it because [computer]s are only going to get better at [computing] regardless of "[software] engineering". The JSON API example to wire up a database can be [run] pretty easily by the latest [computer]s without much [design] and without setting up any [optimizations]. The more time you spend perfecting your [program], the more time you would have wasted when the next [computer] comes out to make it obsolete."
I think 3.5 would probably need more frequent intervention than a lot of harnesses give. But I bet 4 could do a simple JSON API one-shot with the right harness. Just back then I had to manually be the harness.
I started as a skeptic and have similarly drank the kool-aid. The reality is AI can read code faster than I can, including following code paths. It can build and keep more context than I can, and do it faster as well. And it can write code faster than I can type. So the effort to learn how to tell it what to do is worthwhile.
Time-wise, it's easy-mode vs easy-mode at that point.
The human is more likely to make copypasta errors, though!
> provides not great prompt
How so?
When I write code, every character I type has less ambiguity than when I write in human language. I also have the help of LSPs, linters, and autocomplete.
- that you spend no amount of time looking things up, reorganising, or otherwise getting stuck
- that you have a solution to the problem ready to go at all times
- that your solution is better than the LLM's solution
I highly, highly doubt that all 3 of these are true. I doubt even 1 of them is true, I think you just don't know how to use LLMs in a focused way.
I've been trying to get into agentic coding, and there are non-refactoring instances where I might reach for it (like any time I need to work on something using Tailwind; I'm dyslexic, and I'd get actual headaches, not exaggerating, trying to decipher Tailwind gibberish while juggling their docs before AIs came around).
Let's say on that JSON API I want to extract part of the logic into a repository file. I Ctrl+W the function, and then I have almost all of my shortcuts on left Alt + two-character chords. So once it's marked, I do LAlt+E+M for Extract Method, which puts me in an intermediate step to rename the function, and then LAlt+M+V for MoVe, which puts me in an interface to name the function.
Once you're used to it, it's like a gamer doing APM: deterministic and fast. I also have R+N (rename), G+V (generate vitest), Q+C (query console), Q+H (query history), and many more. Really useful. Probably also doable with other editors.
It took quite some time to figure out what works and what triggers it. However, I don't know if it's the same for RSI.
I’m grateful for the ability to use speech as a second option, but even utilizing both, I can’t say that speaking is even remotely close to typing :/
Understanding that limiting number of “design patterns” in a codebase made it better (easier to code and understand) was a good proxy for seniority before LLMs.
Now it’s even better: if all of a sudden “unusual code” is in a PR, either the person opening the PR or the one reviewing it has lost touch with the codebase. Very important signal, since you don’t want that to happen with code you care about.
LLMs amplify this behaviour.
Just be outside and present.
Every verb implemented, and implemented correctly, per the obscure IETF specs and in the most compatible way where the IETF never made it clear.
Intuitively named routes; errors and authentication all easily done and swappable for another approach if necessary.
I feel like our timeline split if you’re not seeing this
Use-cases differ; you described a complete REST API, which can be as much of a problem as too little.
It'll even suggest it.
You want a single RPC WebSocket? Go for it.
2. There absolutely are cases where modifying code "manually" is unquestionably faster than prompting an LLM. There are trivial examples for this - eg only an insane person would ask an LLM to rename a variable rather than using an LSP for that. It would provably and consistently take more keystrokes. There are less trivial examples as well, like, you know, having an understanding of your codebase and using good abstractions/libraries within it that let you make large changes to the program's behavior with little boilerplate code.
One can argue that producing a lot of complex changes through an LLM is faster, which I would agree with, but then see point #1. Sustainable software development has up to this point relied on iterative discovery of the right small components that together form a complete, functional, stable system (see "Programming as Theory Building").
There's zero indication so far that LLMs are capable of speeding up the process of creating complete, functional, stable systems. What every org within my career and friend circle is seeing (and research into productivity impacts of LLMs on software development is showing) is the same story - fast prototypes that either turn into abandonware, personal tools, or maintenance nightmares.
If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.
If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's harder to spot but doesn't reduce the review burden whatsoever.
If anything, "truthy" code is more mentally taxing to review than just obviously bad code.
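A classic specimen of "compiles and works, wrong in an edge case", the kind of truthy code that is exhausting to review:

```python
# Passes the happy-path test and reads as plausible, but is quietly wrong:
def add_tag(tag, tags=[]):  # mutable default argument, shared across calls
    tags.append(tag)
    return tags

add_tag("urgent")  # ["urgent"]
add_tag("spam")    # ["urgent", "spam"] -- state leaked from the first call
```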
The current fever pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging and often career-limiting as to wear the fabric of one's psyche threadbare. With all the obvious problems being pointed out to people, there are just as many workarounds; and these workarounds, as is often revealed shortly thereafter, have their own problems, which beget new solutions, ad infinitum.
At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: The real goal has become obscured at so many firms today, that all that remains is the LLM. Are the people betting the farm and helping implement the visions of those who have done so guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?
Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?
1. They're low stakes to get wrong.
2. The most common is MCPs or similar ai-tooling.
3. Making them look good takes time and effort still. It's a multiplier, not a replacement.
4. Quality and maintainability require investment. I had to restart an agentic project several times because it painted itself into a corner.
It’s an absolute game changer, and it can now multiply your productivity fivefold if it’s a solo greenfield project.
Maybe half a year ago it was as you said. You had to wait for the agent to finish, you had to review carefully, and often the result was not that great. You did not save a lot of time.
Now I can spin up 3+ parallel conversations in Codex, each in a git worktree. My work is mainly QA testing the features, refining the behavior, and sometimes making architectural decisions.
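The mechanics are roughly this (a minimal Python sketch; the git commands are real, but the branch names are placeholders and the agent launch is deliberately left out, since it differs per tool):

    import subprocess

    tasks = ["feature-auth", "feature-billing", "bugfix-thumbnails"]  # hypothetical

    for task in tasks:
        path = f"../wt-{task}"
        # One isolated checkout per agent session, so parallel edits never
        # trample each other's working tree.
        subprocess.run(["git", "worktree", "add", "-b", task, path], check=True)
        # Launch your coding agent inside `path` here (invocation omitted on
        # purpose; use whatever CLI you actually drive).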
The results are now undeniable. In the past I could not have developed a product of that scope in my free time.
That is what is possible today. I suspect many engineers have not yet tried things that became feasible over the last months. Like parallel agents, resolving merge conflicts, separating out functionality from a large branch into proper PRs.
I have heard this statement every single day for 2 years and yet we still have no companies compressing 10 years into 1 year thus exploding past all the incumbents who don't "get it".
> if it’s a solo greenfield project
which is a pretty large caveat. Anecdotally, I've found my side projects (which are solo greenfield projects, and don't need to be supported to the same standards as enterprise software) have gained the boost the GP was talking about.
At work, it's different, since design, review, and maintenance are much more onerous.
The first line of code was written on November 25th. It achieved adoption in the "personal agents" space that far exceeded the other companies that had tried the same thing.
(Whether or not you trust the quality of the software you can't deny the impact it had in such a short time. It defined a new category of software.)
Like, look at e.g. YC minus the AI and AI-adjacent companies. Are those startups meaningfully more impressive or feature-rich compared to a couple years ago?
I expect we will start seeing the impact of the new coding agent enhanced development processes over the next few months.
If agents could really compress 10 years of development into 1 year, you'd see people making e.g. HFT platforms and becoming obscenely rich, not making a fun open-source project and getting hired by OpenAI as an employee.
https://tools.simonwillison.net/github-repo-stats?repo=OpenC...
I meant a month for the initial release, not current state.
Regardless, much like lines of code, number of commits is not a good metric, not even as a proxy, for how much "work" was actually done. Quickly browsing there are plenty[0] of[1] really[2] small[3] commits[4]. Agentic coding naturally optimizes for small commits because that's what the process is meant to do, but it doesn't mean that more work is being done, or that the work is effective. If anything, looking at the changelog[5] OpenClaw feels like a directionless dumpster fire right now. I would expect a lot more from a project if it had multiple people working on it for 5 years, pre-AI.
[0] https://github.com/openclaw/openclaw/commit/e43ae8e8cd1ffc07...
[1] https://github.com/openclaw/openclaw/commit/377c69773f0a1b8e...
[2] https://github.com/openclaw/openclaw/commit/ffafa9008da249a0...
[3] https://github.com/openclaw/openclaw/commit/506b0bbaad312454...
[4] https://github.com/openclaw/openclaw/commit/512f777099eb19df...
[5] https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md
> (Whether or not you trust the quality of the software you can't deny the impact it had in such a short time. It defined a new category of software.)
I brought up OpenClaw here because the challenge was:
> we still have no companies compressing 10 years into 1 year thus exploding past all the incumbents who don't "get it".
I don't know anything about the code quality of OpenClaw, but telling me the number of commits tells me precisely nothing of use.
If that were true, all of these anti-AI greybeards who have been in the game for 30 years would all own their own jets.
Which is exactly why you can't use it as an example, there is no control. This is basic stuff.
https://www.reuters.com/technology/openclaw-enthusiasm-grips...
Cryptocurrencies? Barely any other use than money laundering, buying drugs and betting on the outcome of battles in war. And NFTs? No use at all other than money laundering and setting money ablaze.
It's like I never wrote them, because I didn't. I've got the gist of them, but it's the same way I get the gist of something like Numpy: I know how it works theoretically, but certainly not specifically enough to jump in and write some working Fortran that fixes bugs or adds features.
I now have a bunch of stalled projects I'm not very familiar with. I no longer do solo green field projects that way.
Why do I not see 5x as many interesting greenfield projects than before?
That's a big if. I don't have numbers, but most professional engineers are not working on such projects.
The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.
There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI, and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be driven to use this, in the same way that some developers once refused to use Docker or k8s or whatever.
Perhaps these “obstinate” engineers have good reason in their decision. And it should be their decision!
To be so confident in what is “the right way (TM)” and try to force it onto others is... revealing.
Sounds like a human? The ‘statistical’ part is arguable, I suppose.
I'm sure I will have no problem whatsoever remaining in the employ of a firm that trusts me to make products and tooling that still push the envelope of what's possible without having to resort to the sheer brute force of trillion parameter-scale models.
After 18 months the hard evidence is in place. And much like the move off bare-metal servers for the many use cases where the evidence justified the burden of k8s, or the replacement of shell scripts with Terraform, it's time to move on.
I don't really see a place for no AI usage in line-of-business software apps anymore.
Honest question: what about the counter-argument that humans make subtle mistakes all the time, so why do we treat AI any differently?
A difference to me is that when we manually write code, we reason about the code carefully with a purpose. Yes we do make mistakes, but the mistakes are grounded in a certain range. In contrast, AI generated code creates errors that do not follow common sense. That said, I don't feel this differentiation is strong enough, and I don't have data to back it up.
But another answer is that human autonomy is coupled to responsibility. For most line employees, if they mess up badly enough, it's first and foremost their problem. They're getting a bad performance review, getting fired, end up in court or even in prison. Because you bear responsibility for your actions, your boss doesn't have to watch what you're up to 24x7. Their career is typically not on the line unless they're deeply complicit in your misbehavior.
LLMs have no meaningful responsibility, so whoever is operating them is ultimately on the hook for what they do. It's a different dynamic. It's probably why most software engineers are not gonna get replaced by robots - your director or VP doesn't want to be liable for an agent that goes haywire - but it's also why the "oh, I have an army of 50 YOLO agents do the work while I'm browsing Reddit" is probably not a wise strategy for line employees.
Isn’t this just because you have seen a lot of PRs from inexperienced engineers? People learn LLM behavior over time, too.
Yes, as an engineer I make mistakes, but I could never make as many mistakes per day as an LLM can.
Their mental model doesn't map cleanly enough to yours, and so where for a human you'd have some way to follow their thought patterns and identify mistakes, here the alien makes mistakes that don't add up.
Like the alien has encyclopedic knowledge of op codes in some esoteric soviet MCU but sometimes forgets how to look for a function definition, says "It looks like the read tool failed, that's ok, I can just make a mock implementation and comment out the test for now."
People used to like them and they used to be legends (even if not everyone liked them)
Notch, Woz, Linus and Geohot come to mind
The Metasploit creator Dean McNamee worked for me and he was just like that and a total monster at engineering hard tech products
I have no strong idea why people can't accept that intelligence formed separately of a human brain can truly be alien: not in the hyperbolic sense of "that person is so unique it's like they're a different species", but "that thing does not have a brain, so it can have intelligence that is not human-like".
A human without a brain would die. An LLM doesn't have a brain and can do wondrous things.
It just does them in ways that require first accepting that no homo sapiens thinks like an LLM.
We trained it on human language so often times it borrows our thought traces so to speak, but effective agentic systems form when you first erase your preconceived notions of how intelligence works and actually study this non-human intelligence and find new ways to apply it.
It's like the early days of agents when everyone thought if you just made an agent for each job role in a company and stuck them in a virtual office handing off work to each other it'd solve everything, but then Claude Code took off and showed that a simple brain dead loop could outperform that.
Now subagents almost always are task specific, not role specific.
I feel like we could leap ahead a decade if people could divorce themselves from "we use language, and it uses language, so it is like us", but I think there's just something really challenging about that, because the situation has never existed before.
Nothing had this level of mastery over human language before that wasn't a human. And funnily enough, the first times we even came close (like Eliza) the same exact thing happened: so this seems like a persistent gap in how humans deal with non-humans using language.
Or maybe just maybe... the thing should be much better designed around the human.
That's how personal computers made their way into homes. People like yourself are comical: you can't understand that widespread adoption is how value gets obtained from what the thing intrinsically possesses.
Firms literally exist to take care of the hassle so that the person can get the value from the thing closer to the present - like hello...?
We can't choose if the LLM is like us unless you want to go back 10-20 years in time and choose a new direction for AI/ML.
We stumbled upon an architecture with mostly superficial similarities to how we think and learn, and instead focused on being able to throw more compute and more data at our models.
You're talking about ergonomics that exist at a completely different layer: even if you want to make LLM based products for humans, around humans, you have to accept it's not a human and it won't make mistakes like a human (even if the mistakes look human) -
If anything you're going to make something that burns most people if you just blindly pretend it's human-like: a great example being products that give users a false impression of LLM memory to hide the nitty gritty details.
In the early days ChatGPT would silently truncate the context window at some point and bullshit its way through recalling earlier parts of the conversation.
With compaction it does better, but still degrades noticeably.
If they'd exposed the concept of a context window to the user through top level primitives (like being able to manage what's important for example), maybe it'd have been a bit less clean of a product interface... but way more laypeople today would have a much better understanding of an LLM's very un-human equivalent to memory.
Instead we still give users lossy incomplete pictures of this all with the backends silently deciding when to compact and what information to discard. Most people using the tools don't know this because they're not being given an active role in the process.
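To make the two strategies concrete, here's a toy sketch (measured in characters rather than tokens for simplicity, with a caller-supplied summarizer; real products do this server-side and far more carefully):

    def truncate(messages, budget):
        # Silent truncation: keep only the most recent messages that fit.
        # Everything older simply vanishes, and the model will happily
        # bullshit about what it "remembers".
        kept, used = [], 0
        for msg in reversed(messages):
            if used + len(msg) > budget:
                break
            kept.append(msg)
            used += len(msg)
        return list(reversed(kept))

    def compact(messages, budget, summarize):
        # Compaction: spend part of the budget on a lossy summary of the
        # overflow instead of dropping it outright. Better, but the user
        # still has no say in what gets kept.
        recent = truncate(messages, budget // 2)
        older = messages[: len(messages) - len(recent)]
        return ([summarize(older)] if older else []) + recent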
Despite what the headlines say, these systems aren’t inscrutable.
We know how these things work and can build around and within and change parameters and activation functions etc…and actually use experience and science and guidance.
However, those are not technical problems; those are organizational, social, and quite frankly resource-allocation problems.
> but effective agentic systems form when you first erase your preconceived notions of how intelligence works and actually study this non-human intelligence and find new ways to apply it.
There's no reason you can't make good use of them and learn how to do it more reliably and predictably; it's just that chasing those gains through a human-intelligence-like model, because they use human language, leads to more false starts and local maxima than trying to understand them as their own systems.
I don't think it should even be a particularly contentious point: we humans think differently based on the languages we learn and grew up with, what would you expect when you remove the entire common denominator of a human brain?
Software developers get paid big money because they can speak alien, the only thing that is changing is the dialect.
I'm an engineer's engineer: I get that the job isn't LOC but being able to communicate and translate meatspace into composable and robust systems.
So I mean an alien when I say an alien.
Not human.
Not in the cute "oh that guy just hears what everyone else hears and somehow interprets it entirely differently like he's from a different planet" alien way, but in the, "it is a different definition of intelligence derived from lacking wetware" alien way.
Intelligence is such a multidimensional concept that all of humanity, as varied as we are, can fit in a part of the space that has no overlap with an LLM.
-
Now none of that is saying it can't be incredibly useful, but 99% of the misuse and misunderstanding of LLMs stems from humans refusing to internalize that a form of intelligence can exist that uses their language but doesn't occupy the same "space" of thinking that we all operate in, no matter how weird or unique we think we are.
I swear I'm living through mass hysteria.
I’m not saying that it’s all hunky dory, but you use AI for straight up test driven development to catch edge cases and correct sloppy implementations before they even get coded by your giant chaos machine.
You instruct it to write the code you want to be written. You still have to know how to develop, it just makes you faster.
If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.
The same goes for if my AI generated code leads to data loss, or downtime, or if uses too many resources, or it doesn't scale, or it gives out error messages like candy.
The buck stops with me and therefore I have to read the code, line-by-line, carefully.
It's not even a formality. I constantly find issues with AI generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.
You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues where the fix creates new problems. So I still have to go through the AI generated review issue-by-issue and weed out any harmful criticism.
First of all, building a system that constrains the output of the AI sufficiently, whether that's typing, testing, external validation, or manual human review in extremis. That gets you the best result out of whatever harness or orchestration you're using.
Secondly, there's the level at which you're intervening, somewhere along the hierarchy from "validate only usage from the customer perspective" to "review, edit, and validate every jot and tittle of the codebase and environment". I think for relatively low-importance things, reviewing at the feature level (all code, but not interim diffs) is fine, but if you're writing a network protocol you'd better at least validate everything carefully with fuzzing and property testing or something like that.
And then you've got how you structure your feedback to the LLM itself - is it an in-the-loop chat process, an edit-and-retry spec loop, go-nogo on a feature branch, or what? How does the process improve itself, basically?
I agree with you entirely that the responsibility rests on the human, but there are a variety of ways to use these things that can increase or decrease the quality of code to time spent reviewing, and obviously different tasks have different levels of review scrutiny, as well.
My nonexistent backend isn’t going to be pwned if there is a bug in the thumbnail generation.
After the QA testing on my device, a quick scroll through of the code is enough.
Maybe prompt "are errors during thumbnail generation caught to prevent app crashes?" if we're feeling extra cautious today.
And just like that it saved a day of work.
Hmm. Historically image editing was one of the easier to exploit security holes in many systems. How do you feel about having unknown entities having shell inside your datacenter or vpc?
- webview fallback with canvas capture for codecs not supported in the default player
- detecting blank frames and diff between thumbnails to maximize variety (see the sketch after this list)
- UI integration to visualize progress and pending thumbnails, batched updates to the gallery
- versioning scheme and backfill for missing/outdated thumbnail formats
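Even the blank-frame/variety item alone is nontrivial. A rough sketch of it, assuming numpy and Pillow (function names and thresholds invented for illustration):

    import numpy as np
    from PIL import Image

    def is_blank(path, std_threshold=8.0):
        # A frame with almost no pixel variance is probably black or blank.
        gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
        return gray.std() < std_threshold

    def frame_distance(path_a, path_b):
        # Mean absolute pixel difference between downscaled frames; keep a
        # candidate thumbnail only if it differs enough from those already
        # chosen, to maximize variety.
        def load(p):
            return np.asarray(
                Image.open(p).convert("L").resize((64, 64)), dtype=np.float32
            )
        return float(np.abs(load(path_a) - load(path_b)).mean())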
Honestly, a day seems rather optimistic to me. Maybe if I was an expert for this platform and would have implemented a similar feature before, then I could hope to do it in a day.
If I had to handwrite it and estimate it for Scrum at work, I'd budget a week.
Video thumbnails are a different beast altogether. And you might want to double check your assumptions about security considerations. If any of your ffmpeg, opencv, pyscenedetect code is running on your server, it might well be exploitable.
Ironically, already another user in this comment section was concerned about the security of my nonexistent backend.
But it’s good to know, I was not previously aware that video processing on the backend is a common source of vulnerabilities.
It is so embarrassing that LOC is being used as a metric for engineering output.
I have worked with code where 1000s of lines are very straightforward and linear.
I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.
The skills and effort required to review and understand those situations are quite different.
One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.
So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200 lines of code.
Furthermore how do you know the AI generated lines are the open highway lines of code and not the mountain road ones? There might be hallucinations that pattern match as perfectly reasonable with a hard to spot flaw.
It depends on the code. If you’re comparing code of the same complexity then, sure, 2000 lines will take longer than 200.
I was comparing straight linear code to far more complex code. The bug/line rate will be different and the time to review per line will be different.
> Furthermore how do you know the AI generated lines are the open highway lines of code and not the mountain road ones?
Again, it depends on the code. Which was my point.
Linear code lacks branches, loops, indirection, and recursion. That kind of code is easy to reason about and easy to review. The assumptions are inherently local. You still have to be alert and aware to avoid driving into the cornfields.
It’s a different beast than something like a doubly-nested state machine with callbacks, though. There you have to be alert and aware, and it’s inherently much harder to review per line of code.
Very far from the truth in practice: not every line of code is as difficult (or easy) to review as every other.
    {x{x,sum -2#x}/0 1}

or

    def f(n):
        if n <= 1:
            return n
        else:
            return f(n-1) + f(n-2)

They're both the same program.

Objectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process. "Deceptively dense" is what I'd call software engineers who can't accept that the process is actually generalizable to a degree, and that lines of code are one of the few tangible things that can be used as a metric. Can you deliver value without lines of code?
This assumes that shorter code is faster to write. To quote Blaise Pascal, "I would have written a shorter letter, but I did not have the time."
> Can you deliver value without lines of code?
No, but you can also depreciate value when you stuff a codebase full of bloated, bug-ridden code that no man or machine can hope to understand.
“All models are wrong, some are useful”. What’s not useful is constantly bitching about how there’s no way to measure your work outside of the binary “is it done” every time process efficiency is brought up.
It's still useful, however, because that is the only metric that is instantly intuitively understandable and comparable across a wide variety of contexts, i.e. across companies and teams and languages and applications.
As we know, within the same team working on the same product, a 1000 LoC diff could take less time than a 1 line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you'd bet everyone would use it, but it's unfeasible basically for this very reason.
So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, a team A working on product P at company C using tech stack T with specific software quality processes Q produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact.
(As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.)
I rewrote the same program using my own brain, just using ChatGPT as Google and autocomplete (my normal workflow), and produced the same thing in 1500 LOC.
The effort difference was not that significant either, tbh, although my hand-coded approach probably benefited from designing the vibe-coded one first, so I had already thought through what I wanted to build.
My experience was the same as you when I started using agents for development about a year ago. Every time I noticed it did something less-than-optimal or just "not up to my standards", I'd hash out exactly what those things meant for me, added it to my reusable AGENTS.md and the code the agent outputs today is fairly close to what I "naturally" write.
> It is so embarrassing that LOC is being used as a metric for engineering output.
In one of my previous orgs, LOC added in the previous year was a metric used to separate a good engineer from a PIP (bad) engineer. Also, LOC removed was treated as a negative metric. I hope they've changed this methodology for the LLM code-spitting era...
We should have gone the other way; generated a lot of code and demanded pay raises; look at the LOC I cranked out! Company is now in my debt!
If they weren't going to care enough as managers to learn, and "line go up" is all that matters to them, then make all lines go up = winning.
You all think there's more to this than performative barter for coin to spend on food/shelter.
Although this requires you to take pride in your profession and what you do.
Got it.
...ok fine; lack of political action to put us all on the hook for your healthcare is your choice to take a gamble on a paycheck. It's a choice to say your own existence is not owed the assurance of healthcare.
So I will honor your choice and not care you exist.
Good way of putting it.
Do you reject all stats that treat the number of people involved (e.g. "2 million people protested X") as "embarrassing"... because they lump incredibly varied people together and pretend they're equal?
AI helps eng ship more and faster, I think that’s the takeaway.
We're also assuming LOC vibe coded by competent engineers who should be able to tell when something is overengineered.
If we shift the paradigm of how we approach a coding problem, the coding agents can close that gap. Ten years ago every 10 or 15 minutes I would stop coding and start refactoring, testing, and analyzing making sure everything is perfect before proceeding because a bug will corrupt any downstream code. The coding agents don't and can't do this. They keep that bug or malformed architecture as they continue.
The instinct is to get the coding agents to stop at these points. However, that is impossible for several reasons. Instead, because it is very cheap, we should find the first place the agent made a mistake and update the prompt. Instead of fixing it, delete all the code (because it is very cheap), and run from the top. Continue this iteration process until the prompt yields the perfect code.
Ah, but you say, that is a lot of work done by a human! That is the whole point. The humans are still needed. The process using the tool like this yields 10x speed at writing code.
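In pseudocode, the loop looks something like this (every helper here is hypothetical; the point is the shape: patch the prompt, never the output):

    def iterate_prompt(prompt, run_agent, first_mistake, refine, reset_workspace):
        while True:
            code = run_agent(prompt)           # generate from a clean slate
            mistake = first_mistake(code)
            if mistake is None:
                return code                    # the prompt now yields what we want
            prompt = refine(prompt, mistake)   # fix the instructions, not the code
            reset_workspace()                  # code is cheap: throw it all away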
You could get to "something that works" rather fast but it took a long time to 1) evaluate other options (maybe before, maybe after), 2) refine it, 3) test it and build confidence around it.
I think your point stands but no one really knows where. The next year or so is going to be everyone trying to figure that out (this is also why we hear a lot of "we need to reinvent github")
Shame that what is left for the humans is the shitty, tedious part of the work. It reminds me of the quote:
"I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do laundry and dishes."

I believe the LLM providers went with the wrong approach from the off: the focus should've been on complementing labour, not displacing it. And I believe they have learned an expensive lesson along the way.
But the first time I say “No, it should be …” it’s nearly game over. If you say it 3+ times in a row, you’re basically doomed.
Sure, you can get it to fix the bug, but it comes at the cost of future prompts often barely working.
The moment I hit the "no, it should be.." point, I know it's the end of it.
Sometimes I can salvage something by asking for a summary of the work and reasoning done, and doing a fresh restart. But often times, it's manual corrections and full restart from there.
The person who builds an agentic IDE or GitHub alternative that natively does the process you describe will be a multibillionare.
Do you want a demo of what this is capable of?
And it's not just easier because it's cheap, it's easier because you're not emotionally attached to that code. Just let it produce slop, log what worked, what didn't, nuke the project and start over.
It just gets incredibly boring.
Vibe coding: one shot or few shot, smoke test the output, use it until it breaks (or doesn't). Ideal for lightweight PoC and low stakes individual, family or small team apps.
Agentic engineering:

- You care about a larger subset of concerns, such as functional correctness, performance, infrastructure, resilience/availability, scalability, and maintainability.
- You have a multi-step pipeline for managing the flow of work.
- Stages might be project intake, project selection, project specification, epic decomposition, story decomposition, coding, documentation, and deployment.
- Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of the proposed project, comprehensiveness of the spec, elegance of the code, rigor and simplicity of the ubiquitous language, etc.).
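A sketch of what such a pipeline might look like in code (stage names, gates, and the tiny framework are all illustrative, not a real tool):

    from dataclasses import dataclass, field

    @dataclass
    class Stage:
        name: str
        gates: list = field(default_factory=list)    # deterministic: tests, benchmarks
        reviews: list = field(default_factory=list)  # adversarial: LLM/human critics

    pipeline = [
        Stage("spec", gates=["schema_valid"], reviews=["comprehensiveness"]),
        Stage("code", gates=["tests_pass", "perf_benchmark"], reviews=["elegance"]),
        Stage("deploy", gates=["canary_healthy"]),
    ]

    def advance(artifact, stage, run_gate, run_review):
        # An artifact moves to the next stage only if every deterministic
        # gate and every adversarial review passes.
        return all(run_gate(g, artifact) for g in stage.gates) and all(
            run_review(r, artifact) for r in stage.reviews
        )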
And it's a slider. Sometimes I throw a ticket into my system because I don't want to have to do an interview and burn tokens on three rounds of adversarial reviews, estimating potential value and then detailed specification and adversarial reviews just to ship a feature.
I've been using Opus, GPT-5.5, and some lesser models on a daily basis, but not having them handle entire tasks for me. Even when I go to significant effort to define and refine specs, they still do a lot of dumb things that I wouldn't allow through human PR review.
It would be really easy to just let it all slide into the codebase if I trusted their output or had built some big agentic pipeline that gave me a false sense of security.
Maybe 10 years from now the situation will be improved, but at the current point in time I think vibe coding and these agentic engineering pipelines are just variations of a same theme of abdicating entirely to the LLM.
This morning I was working on a single file where I thought I could have Opus on Max handle some changes. It was making mistakes or missing things on almost every turn that I had to correct. The code it was proposing would have mostly worked, but was too complicated and regressed some obvious simplifications that I had already coded by hand. Multiply this across thousands of agentic commits and codebases get really bad.
But resisting that impulse is just another part of being a professional. If your standards involve a certain level of test coverage, but your tests haven't flagged any issues in a long time, you might be tempted to write fewer tests as you continue to write more code. Being a professional means not giving in to that temptation. Keep to your quality standards.
Sure, standards are ultimately somewhat arbitrary, and experience can and should cause you to re-evaluate your standards sometimes to see if they need tweaking. But that should be done dispassionately, not in the middle of rushing to complete a task.
And hell, maybe someday the agents will get so good that our standards suggest that vibe coding is ok, and should be the norm. But you're still the one who's going to be responsible when something breaks.
If I hire a plumber, it's certainly not cheaper than doing it myself, but when I am paying money I want to make sure the result is better quality than what I could vibe-plumb myself.
Let's assume AI is 10x better than humans in accuracy, produces 10x fewer bugs, and increases speed 1000x compared to a very capable software engineer.
Now imagine this: a car travels on a road that has 10x more bumps, but it is traveling at 1/1000th the pace. Even though there are 10x more bumps, your ride will feel less bumpy because you're encountering them at a far lower rate.
Now imagine a road that has 10x fewer bumps, but you're traveling at 1000x the speed. Your ride will be a lot more bumpy.
That's agentic coding for you. Your ride will be a lot more painful. There's a lot of denial around that, but as time progresses it'll become very hard to deny.
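Spelling out the arithmetic behind the analogy (using the hypothetical numbers above): defects you have to absorb per day scale with density times speed, so density alone tells you little.

    human_bugs_per_kloc = 10.0
    human_kloc_per_day = 1.0
    ai_bugs_per_kloc = human_bugs_per_kloc / 10   # "10x fewer bugs"
    ai_kloc_per_day = human_kloc_per_day * 1000   # "1000x the speed"

    print(human_bugs_per_kloc * human_kloc_per_day)  # 10.0 bugs/day to absorb
    print(ai_bugs_per_kloc * ai_kloc_per_day)        # 1000.0 bugs/day to absorb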
Lastly: vibe coding is honest, but agentic coding is snake oil [0]. These arguments about harnesses with dozens of memory, agent, and skill files, with pages and pages of rules sprinkled through them, are absolutely wrong as well. Such a paradigm assumes that LLMs are perfectly reliable, super-accurate rule followers, and that the only problem we have as an industry is not being able to specify enough rules clearly enough.
Such a belief could only be held by someone who hasn't worked with LLMs long enough, or by a totally non-technical person not knowledgeable enough to know how LLMs work; holding on to such a wrong belief system within a highly technical community is highly regrettable.
And AI that has been helping all this time will suddenly stop helping out with this one use case. I have experienced AI running in circles, in this case trying to find a root cause. It failed, and the user is left holding the bag. That is when you feel like you have just been dropped into a vast ocean without a lifeboat. Then you'll have to just start looking through those massive chunks of vibe-coded crap to understand what is going on.
AI is good in terms of improving speed, but I am afraid we are massively taking it the wrong way as engineers. Everyone is just letting it go on autopilot and make it do things completely from start to end. The ideal solution lies where every piece of code it writes is reviewed by authors, and they make sure they are not checking in crazy stuff day in and day out.
I maxed out the Claude Max $200 subscription, and before that I justified spending $100/day.
And it was worth it, not because it wrote such good code, but because I learned the lessons of software engineering fast. I had the exact ride you are describing. My software was incredibly broken.
Now I see all the cracks, lies, and "barking up the wrong tree" issues clearly.
NOW I treat it as an untrustworthy search engine for domains I'm behind in. I also use predict-next-edit and auto-complete, but I don't let AI make any edits to my codebase anymore.
Yeah. I'm not sure how other people work, but I almost never need to write formal tests because I essentially test locally as I write, one method at a time, and at that moment I have a complete mental map of everything that can potentially go wrong with a piece of code. I write and test constantly in tandem. I can write a test afterwards to prove what I already know, but I already know it. This is time consuming, anal, and obsessive-compulsive, and luckily that kind of work perfectly suits my personality. The end result is perfect before I commit it.
It is a lot of fun asking LLMs to write code around my code. Make 10 charts with chartjs in an html page that show something and put it behind a reverse proxy so the client can see it. Wow. Spot on, would've taken me an hour. I can even rely on Claude to somewhat honestly reason about things in personal projects.
But knowing every implementation decision makes a huge difference when anything real is at stake. "Guilt" wouldn't begin to describe the sense I'd have if my software did something because of a piece of code I hadn't personally reviewed and fully understood; at that point I probably should have just written it myself.
> The enterprise version of that is I don’t want a CRM unless at least two other giant enterprises have successfully used that CRM for six months. [...] You want solutions that are proven to work before you take a risk on them.
Perhaps not for every category of software and every company. But in practice, any SaaS app that is just CRUD with some business logic + workflows is, imo, absolutely vulnerable to losing customers because people within their customers' orgs vibe coded a replacement.
They are perhaps even more at risk because would-be new customers don't ever even bother searching to find them as an option because they just vibe code a competitor in-house.
The vulnerability lies primarily in the fact that most of these SaaS apps we're talking about are _wrong_ to some meaningful degree. They don't fully fit how your company works, and they never did. There is something about them that you are forced to work around in some way. This is true because it is impossible to build a universally perfect product, to perfectly fit every business requirement of every user in every company.
But now it is relatively cheap to build the perfect version for your company in-house. Or maybe even just for YOU.
I think medium/long-term this will mean a redistribution of technical talent from SaaS companies to industry companies. Instead of paying millions for SaaS subscriptions, industry companies will spend fewer millions building precisely what they need in-house with the help of AI. Not every SaaS and not every company, but I already see this happening at my company right now.
No, it was never designed around that. All methodologies of software dev focus not so much on writing the code but on everything else: requirement definition, quality, maintenance, speed of integrating features, scaling the work, ...
Personally, with 20 years of experience, I have never seen a single company where writing the code was the bottleneck.
It's the bad, semi-coherent submissions that eat up your time, because you do want to award some points and tell students where they went wrong. It's the Anna Karenina principle applied to math.
Code review is the same thing. If you're sure Claude wrote your endpoint right, why not review it anyway? It's going to take you two minutes, and you're not going to wonder whether this time it missed a nuance.
Note: I still review pretty much every line of code that I own, regardless of who generates it, and I see the problems with agents very clearly... but I can also see the trends.
My take: instead of crafting code, engineering will shift to crafting bespoke, comprehensive validation mechanisms for the results of the agents' work, such that it is technically (maybe even mathematically) provable as far as possible, and any non-provable validations can be reviewed quickly by a human. I would also bet the review mechanisms will be primarily visual, because that is the highest-bandwidth input available to us.
By comprehensive validations I don't mean just tests, but multiple overlapping, interlocking levels of tests and metrics. Like, I don't just have an E2E test for the UI, I have an overlapping test for expected changes in the backend DB. And in some cases I generate so many test cases that I don't check for individual rows, I look at the distribution of data before and after the test. I have very few unit tests, but I do have performance tests! I color-code some validation results so that if something breaks I instantly know what it may be.
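For example, a distribution-level check might look like this (a sketch; the keys and tolerance would be specific to your schema): assert the shape of the data, not individual rows.

    def distribution(rows, key):
        # Fraction of rows per value of `key`.
        counts = {}
        for row in rows:
            counts[row[key]] = counts.get(row[key], 0) + 1
        total = sum(counts.values()) or 1
        return {k: v / total for k, v in counts.items()}

    def assert_no_drift(before, after, key, tolerance=0.02):
        # Fail if any value's share of the data moved more than `tolerance`
        # across the test run.
        d_before, d_after = distribution(before, key), distribution(after, key)
        for k in set(d_before) | set(d_after):
            delta = abs(d_before.get(k, 0.0) - d_after.get(k, 0.0))
            assert delta <= tolerance, f"distribution drift in {k!r}: {delta:.3f}"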
All of this is overkill to do manually but is a breeze with agents, and over time really enables moving fast without breaking things. I also notice I have to add very few new validations for new code changes these days, so once the upfront cost is paid, the dividends roll in for a long time.
Now, I had to think deeply about the most effective set of technical constraints that give me the most confidence while accounting for the foibles of the LLMs. And all of this is specific to my projects, not much can be generalized other than high-level principles like "multiple interlocking tests." Each project will need its own custom validation (note: not just "test") suites which are very specific to its architecture and technical details.
So this is still engineering, but it will be vibe coding in the sense that we almost never look at the code, we just look at the results.
Other than for your own pet projects, almost all of what you said has no place in "vibe engineering" or "vibe coding" for serious software engineering products that are needed in life-and-death situations.
And not all "production-grade, hundred billion dollar systems" are that critical. Like, Claude Code as we all know is clearly vibe-coded and is already a 10-billion (and rapidly increasing!) dollar system. Google Search and various Meta apps meet those criteria and people are already using LLMs on that code, and will soon be "vibe coding" as I described it.
AWS meets those criteria and has already had an LLM-caused outage! But that's not stopping them from doing even more AI coding. In fact, I bet they will invest in more validation suites instead, because those are a good idea anyways. After all, all the cloud providers were having outages long before the age of LLMs.
The thing most people are missing is that code is cheap, and so automated validations are cheap, and you get more bang for the buck by throwing more code in the form of extensive tests and validations at it than human attention.
Edited to add: I think I can rephrase the last line better thus: you get more bang for the buck by throwing human attention at extensive automated tests and validations of the code rather than at the code itself.
>> I think all coding will become vibe coding...
Nope. First of all, let's get the true definition of "vibe coding" completely clear, from the first mention of it by Karpathy. From [0]:
>> "There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." [0]
>> "I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away." [0]
So with the true definition, you are arguing that all coding will become "vibe coding", and that includes mission-critical software. Not even Karpathy would go as far as that, and he himself is only sure that it works... "mostly".
Responsibility is what cannot be vibe-coded. The major cloud providers and the tech companies that own them have contracts with their customers worth billions in revenue. That is why they cannot afford to "vibe-code" infra that causes them to lose $100M+ an hour when a key part of it goes down or stops working.
So:
> Like, Claude Code as we all know is clearly vibe-coded and is already a 10-billion (and rapidly increasing!) dollar system.
That is not vibe-coded anymore; it is maintained by software engineers who look at the code at all times, daily, before merging any changes, AI-generated or not.
> Google Search and various Meta apps meet those criteria and people are already using LLMs on that code, and will soon be "vibe coding" as I described it.
Nope. As Karpathy described it, that would never happen: human software engineers will be reviewing the agents' code at all times. But then it would not be vibe coding, would it?
> AWS meets that criteria and has already had an LLM-caused outage!
Are they vibe coding now after that outage? I bet that they are not.
> After all, all the cloud providers have been having outages long before the age of LLMs.
That isn't the point. Someone was held to account for the outages and had to explain why it happened.
They will lose trust, plus billions of dollars, if they admit that they vibe-coded their entire infra and had 0 engineers who understand why it went wrong.
> The thing most people are missing is that code is cheap, and so automated validations are cheap, and you get more bang for the buck by throwing more code in the form of extensive tests and validations at it than human attention.
The risk is amplified with the company's reputation on the line, and that is very expensive to lose. I'm talking hundreds of billions annually; a 10% loss of global revenue due to constant outages can cause the stock to fall.
So do you understand that the contradiction you mentioned earlier about AWS in fact strengthens my point on the limitations of vibe coding, especially for mission-critical software?
> So this is still engineering, but it will be vibe coding in the sense that we almost never look at the code, we just look at the results.
It is pretty clear that "giving in to the vibes" is simply "looking at the results." But I'm predicting that it is going to be an engineering discipline in itself. Note that I started with (emphasis added):
> I think all coding will become vibe coding but it will be no less an engineering discipline.
And then I went on to explain the engineering aspect as extensive technical validation. There is a role called Validation Engineers in many industries including semiconductors, and I posit that it's going to be everybody's primary role soon.
> Responsibility is what cannot be vibe-coded. ... That isn't the point. Someone was held to account for the outages and had to explain why it happened.
I never implied a loss of accountability anywhere, but I completely agree, and have posted about it before: https://news.ycombinator.com/item?id=46319851
That is still orthogonal to vibe-coding. People have been sloppy without vibe-coding and were still held accountable. The flaw is assuming all vibe-coding is slop, because my point is that validation will matter much more than the code, which means soon we may never look at the code. In fact, extensive automated validation is probably a better signal for accountability than "We looked at the code very, very carefully."
There are people who write software for hedge funds, quant firms, aviation and defense systems, data center providers, major telecom services used by hospitals and emergency services, semiconductor firms, and the big oil and energy companies. That is NOT "almost no-one", and these companies see and make hundreds of billions of dollars a year on average.
This is even before me mentioning big tech.
Perhaps the work most people here on this site are doing is not serious: toy projects that can be totally vibe-coded and bring in close to $0, so the company doesn't care.
What I am talking about is the software that is responsible for being the core revenue driver of the business and it being also mission critical.
E.g. there are 100s of millions of lines of code in a car, but the vast majority of that concerns non-critical parts like the dashboard; the primary Engine Control Unit has like ~10K LoC, and the number of people that work on it are proportionally smaller.
And if you think that is very well-designed code, here's something to help you sleep better: https://www.reddit.com/r/coding/comments/384mjp/nasa_softwar...
Favorite quote:" There are a whole bunch of reasons I’m not scared that my career as a software engineer is over now that computers can write their own code, partly because these things are amplifiers of existing experience. If you know what you’re doing, you can run so much faster with them. [...]
I’m constantly reminded as I work with these tools how hard the thing that we do is. Producing software is a ferociously difficult thing to do. And you could give me all of the AI tools in the world and what we’re trying to achieve here is still really difficult. [...]"
Pretty soon there is no code reuse and we're burning money reinventing the wheel over and over.
With LLMs, you can race right for that horizon, go right through, and continue far beyond! But then of course you find yourself in a place without reason (the real hell), with all the horror and madness that that entails.
Isn't this a bit like IDE-heavy languages, like old Java/C#? If you tried to make Android apps back in the early days, you HAD to use an IDE; writing the ridiculous amount of boilerplate needed to display a "Hello World" alert after clicking a button was soul-destroying.
If the barrier is too high, code is refactored.
They really are bad for creating a healthy codebase
This is spot on. I think the tooling is evolving so much, particularly on the design side, that it's not worth the "translation cost" to stay (or even be) on the Figma side anymore.
Disclaimer: I'm doing a CAD-like engineering desktop app, and I'm using VS 2026 Copilot, so YMMV.
When I get a Jira ticket, I will first diagnose the problem, and then ask AI to write a test case that reproduces it, with guidance on what/how to test (you would be surprised how many geometry problems, seemingly visual, can be unit tested). If necessary I provide clues (like which files to read, etc.) for the AI to look at, and ask the AI to just go and fix the test.
Often AI can do that; AI can make the test pass and make sure that adjacent tests also pass. If in doubt, I will check the output reasoning. I then verify that the fix is done properly via visual inspection (remember, this is a desktop app), and I ask for clarification if needed.
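The shape of such a test, for illustration (the geometry API here is hypothetical; the point is pinning the reported behavior in a failing test before the agent touches any code):

    def test_offset_of_near_collinear_polygon_does_not_self_intersect():
        # Reproduces the ticket: offsetting a nearly flat triangle produced
        # a self-intersecting outline in the viewport.
        polygon = [(0.0, 0.0), (10.0, 0.001), (20.0, 0.0)]
        outline = offset_polygon(polygon, distance=0.5)  # hypothetical function
        assert not self_intersects(outline)              # hypothetical helper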
Then at night I'll let my automated test suites run... and oops! Regression found! Who broke it? AI or human? Who cares. I just tell AI that between these times one of the commits must have broken the code — can you please fix it for me? And AI can do that.
This works for small or medium feature implementation, trivial bugfixes, or even annoying geometrical problems that require me to dig the needle out of the haystack. So the productivity gain is very real. But I haven't tried it on a feature that requires weeks or months of implementation; maybe I should try that next time.
It's hard to describe the feeling. It's just that the AI is working like a very capable (junior?) programmer; both might not have full domain knowledge, but with strong test suites and senior guidance, both can go very far. And of course AI is cheaper and a lot more effective.
It's seriously the thing that worries (and bothers) me the most. At a minimum, I almost never let unedited LLM comments pass.
Most of the time, I use my own vibe-coded tool to run multiple GitHub-PR-review-style reviews, and send them off to the agent to make the code look and work fine.
It also struggles with doing things the idiomatic way for huge codebases, or sometimes it's just plain wrong about why something works, even if it gets it right.
And I say this despite the fact that I don't really write much code by hand anymore, only the important ones (if even!) or the interesting ones.
Also, don't even get me started on AI-generated READMEs... I use Claude to refine my Markdown or automatically handle dark/light-mode, but I try to write everything myself, because I can't stand what it generates.
"Ugh, no! Why would you say it like that? That's not even how it works! Now, I need to write a full paragraph instead of a short snippet to make sure that no future agents get confused in the same way."
How is producing more lines of code any good? How does quality assurance work with immeasurable code bloat? I want good software not slopware with 2000 different features. A good product does few things, but does these really well. There is no need to constantly add lines of code to a working product.
Because most of the complexity in software comes from interfacing with external components, when you don't need to adapt to this you can write simpler and better code.
Rather than relying on an external library, you just write your own and have full control and can do quality control.
The Linux kernel is 30,000,000 LOC. At 100 tokens/s, say 1 LOC per second produced by a single 4090 GPU, one year of continuous running gives 3600 * 24 * 365 = 31,536,000 seconds, so everyone can have their own OS.
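The envelope math, spelled out:

    seconds_per_year = 3600 * 24 * 365       # 31_536_000
    loc_per_second = 1                       # ~100 tokens/s on one 4090
    kernel_loc = 30_000_000
    print(kernel_loc / (loc_per_second * seconds_per_year))  # ~0.95 years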
It's the "Apps" story all over again : there are millions of apps, but the average user only have 100 max and use 10 daily at most.
Standardize data and services and you don't need that much software.
What will most likely happen is one company with a few millions GPUs will rewrite a complete software ecosystem, and people will just use this and stop doing any software because anything can be produced on the fly. Then all compute can be spent on consistent quality.
We've known this since close to the advent of computing, and yet every generation of technology has taken us further away from this goal, largely driven by jealous resource-guarding, particularly when it comes to data. Why don't I have a generic media player app that can stream Netflix, Disney, Hulu, etc.? Those brands want control over my experience. They will continue to want that control indefinitely. That basic human desire for control won't evaporate with a "single unified codebase".
People have been running crappy code commercially for over half a century now. Not many companies successfully differentiate by running good code - it usually does not matter to the end consumer, other things are much more important. So now companies will pay less for code, and maybe it is a bit worse (though I personally can't believe AI can do worse than corporate software developers on average). Hobbyists will remain hobbyists, and precious few will be lucky enough to have someone pay them to handcraft stuff. Exactly what happened to woodworkers and other craftsmen.
I work on database optimizers and other database-related stuff, and I can assure you Claude Code - with all the highest settings - does make mistakes. It will generate a test that does not actually test what it "thinks" it tests. It will confidently break stuff.
Do not get me wrong. It is still awesome! It takes much of the grunt work off me. It can game out design decisions, even when that requires refactoring a lot of code. If you point out a mistake, more often than not it can fix it itself.
It's just for a critical project I would never ship it without understanding every line of code - with the exception perhaps of some of the test code. Maybe in a year or two that will be different.
> If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.
The distance of accountability of the output from its producer is an important metric. Who will be held accountable for which output: that's important to maintain and not feel the "guilt".
So organizations would need to focus on building better and more granular incentive and punishment mechanisms for large-scale software projects.
Which is the same issue of lack of understanding and care and accountability from the human operator, with extra steps and a false sense of security.
Property-based testing in particular has uncovered a number of invariants in every code base I've introduced it to.
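For anyone who hasn't seen it, a minimal example with Hypothesis (the run-length encoder is a stand-in; the pattern of generating arbitrary inputs and asserting an invariant is the point):

    from hypothesis import given, strategies as st

    def rle_encode(s):
        out = []
        for ch in s:
            if out and out[-1][0] == ch:
                out[-1][1] += 1
            else:
                out.append([ch, 1])
        return out

    def rle_decode(pairs):
        return "".join(ch * n for ch, n in pairs)

    @given(st.text())
    def test_roundtrip(s):
        # Invariant: decode(encode(x)) == x for any string.
        assert rle_decode(rle_encode(s)) == s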
tbf depending on the agent/model a lot of the tests end up being thrown out so it's possible I _should_ handwrite more tests, but having better prompts and detailed plans seems to mitigate that somewhat
Repeat after me: it follows that most of the money the software makes occurs during the maintenance phase.
Repeat after me: our industry still does not understand this after almost 100 years of being in existence.
Alan Kay was 100% right when he said that the computer revolution hasn't occurred yet. For all of our current advancements all tools are more or less in the Stone Age.
My great hope is that AI will actually accelerate us to a point where the existing paradigm fully breaks beyond healing and we can finally do something new, different, and better.
So for now - squeee! - put a jetpack on your SDLC with AI and go to town!!! Move fast and break things (like, for real).
My favorite JIRAs are the ones I prevent from being worked on in the first place because they were unnecessary.
The ideal prompt is the one I don't fire because it would be a waste.
In an application with an LLM component, the ideal amount of inference is zero.
Ultimately this seems to lead to "the ideal amount of computers in the world is none" but for the sake of my continued employment let's let that one go by. :)
I'd say if you're a semi-competent developer, as probably many people reading the article and commenting already are, this comment adds nothing new to the discussion and would already be a very vanilla usage example of "AI".
I think the point is that while you can "do things" like extracting the stripe integrations out into their own service in ten minutes, you're not stepping into other problems, such as how do you handle failures, how do you scale the stripe service, how do you structure all your other micro services so they can communicate in a coherent way, basically you're speed running yourself into harder decisions when using AI.
on the contrary, I freed myself from the burden of having to find all the places in the code base where we used stripe and patched them in one go along with the tests to prevent regressions. That represents DAYS of work that I condensed into a few hours.
who cares if it can't know good structure and how to handle failures? I know how to do that. I have a skills file I created that tells stripe our policy for handling error failures, defaults for structures as well as guidelines for how we should deal with communications between different systems. Before i spent hours building this stuff out. now I just spend 20-30 min reviewing a pr to make sure it follows my directives and move onto other problems.
That said, I agree with you in principle. I hand-coded an app from being a solo dev to now managing a team and getting ready for an imminent Series A. AI doesn't save you from scaling issues; you still need to have a clear idea of what you want from the AI and build processes that give it the context to do its job.
I call that job security :)
I am not a developer and have very basic code knowledge. I recently built a small and lightweight Docker container using Codex 5.5/5.4 that ingests logs with rsyslog and has a nice web UI and an organized log storage structure. I did not write any code manually.
Even without writing code, I still had to use common sense to get it to a place I was happy with. If I truly knew nothing, the AI would have made some very poor decisions. Examples: it would have kept everything in main.go, it would have hardcoded the timezone, the settings were all hardcoded in the Go code, the crash handling was nonexistent, and a missing config would have prevented startup. And that is on a ~3000 line app. I cannot imagine unleashing an AI on a large, complex codebase without some decent knowledge and reviewing.
Opus 4.7 built it about 90% the same way I would, but had way more convenience methods and step-validations included.
It's great, and really frees me up to think about harder problems.
Just having ~13 years of experience heavily weighted in one language, with some formal study of others, makes directing LLMs a lot simpler.
Learning syntax, primitives, package managers, testing, etc isn't that much of a lift compared to how I used to program.
Was helping a non-dev colleague who's using Claude cowork/code to automate reporting the other day. They understand the business intelligence side well, but were struggling with the basic diction to vibe code a pyautogui wrapper that pulls up RDP and fills out an MS Access abstraction on a vendor DB.
Think we'll be fine for another 5-10 years as a profession.
But using an agentic LLM to complete boilerplate is attractive simply because we've created a mountain of accidental and intentional complexity in building software. It's more a regression to the mean: going back to the cognitive load we had when we simply built desktop applications.
I find the LLM as interactive tutor reviewing my work in a proof checker to be a really killer combo.
"Agentic engineer" does not make much sense as a label for a developer.
It is weird and confusing to call a web designer who uses AI-assisted coding tools an "agentic engineer".
>I firmly staked out my belief that “vibe coding” is a very different beast from responsible use of AI to write code, which I’ve since started to call agentic engineering
Disturbing? Really? I admit I don't do agentic coding and am going only by vibes, but for me agentic engineering is basically vibe coding in an automated loop with some ornamentation. They both stem from the same LLM root, and positioning them as significantly different is weird and unconvincing to me. There may be merit to this article (I gave up after a few sentences), but I reject this specific premise.
It's the difference between caring and not caring.
The future is going to dynamically budget and route different parts of the SDLC through different models and subagents running in the cloud. Over time, more and more of that process will be owned by robots, and a level of economic thinking will be incorporated into what is thought of today as "software engineering." At some point vibe coding _is_ coding, and we're maybe closer to that point than popularly believed.
Without pre-defined definitions and locked procedures, it's extremely easy to mistake iterative adaptation for genuine signal.
* The first agent's claim that was 3.x-only was wrong
* is nice-to-have but doesn't target our exact case as cleanly as the agent claimed.
* The agent's "direct fix for yyy" is overstated.
* not 57% as the earlier agent claimed
etc etc etc
And I've lost count of how many times my session with Claude starts with: did you read my personal CLAUDE.md and use background agents for long-running operations?
I use an enterprise subscription at max effort; this was with both 4.6 and 4.7.
And please refrain from comments like "you're using it wrong", as the drop in output quality is very clear and noticeable.
What standard of result are you pursuing and are you willing to discipline yourself enough to achieve it?
AI can't make you un-lazy, no matter how many tokens you pay for.
No one is suggesting that.
An ace software engineer is not an ace because of tooling.
It's not the plane, it's the pilot, or something like that.
Claude Code in particular seems really uninterested in this aspect of the problem, and I've stopped using it entirely because of this.
To me it’s a spectrum with varying levels of structure provided, review etc.
Basically oneshot vibes on one side, fully hand coded on other.
I believe this is a common fault of not being able to zoom out and look at what trade-offs are being made. There are always trade-offs; the question is whether you can define them and then do the analysis to determine whether the result leaves you in a net-benefit state.
Coding agents are also upending how software development works, in a way that we are still very much figuring out.
I don't think anyone has a confident answer for how best to apply them yet, especially on larger production-ready projects.
But building software still requires domain knowledge, understanding data structures, architecture, which services to use. We probably have 2-5 years before that's fully automated.
So the number of bugs to find remains constant but the amount of code to review scales with the capability of the agent.
"I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good."
This really is Wordpress and early PHP all over again, but it's the seasoned folks rather than the amateurs that buy into it.
I believe these tools will be refined and locked down and eventually turn into RAD stuff used by certified enterprise consultants, much like SAP and Salesforce and IBM solutions and so on. From this I come to the conclusion that it is not a good idea to become dependent on them at this stage, which is corroborated by the pecuniary expense as well as excruciatingly fast change in available products.
E.g., I change the velocity of the player to '200' and of the bullets to '300', and it only updated the bullet velocity. Then it told me the player was already 'at the correct value' even though it was set to 150. Things like that... :)
If you mean 'passes tests', that can be tackled by AI. Although AI writing its own tests and then implementing its own code is definitely not a foolproof strategy.
How do you manage/orchestrate this? I'm genuinely curious.
I'm working on a licensing system for a product I'm building. I've used Claude a little bit to help out with it, but it's also made a lot of very dumb decisions that would have had large (security!) consequences if I hadn't caught them. And a lot of them are braindead things. Like, I asked it to create a configurable limit on a certain resource for the trial version of the application. When I said configurable, I mostly meant: put the number in a constant so I can update it later. What Claude thought I asked was "make it so the user can modify the limits of the trial version in the settings panel" (which defeats the entire purpose of a free trial!). Another thing it messed up recently: I was setting up email-magic-link authentication, and it defaulted to creating an account for anyone who typed in an email, which could allow a bad actor to both spam people with login requests (probably getting me kicked off Resend) and create a lot of bogus accounts.
These things do not think. You cannot outsource your thinking to them.
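For contrast with the magic-link failure described above, a minimal Python sketch of the safer default. Every name here is hypothetical; the in-memory dicts stand in for a real user store, rate limiter, and email provider:

    import time

    USERS = {"alice@example.com": {"id": 1}}  # stand-in user store
    LAST_REQUEST = {}                         # email -> last request time
    COOLDOWN_SECONDS = 60

    def request_magic_link(email):
        # Rate-limit per address so a bad actor can't spam inboxes
        # (and get you kicked off your email provider).
        now = time.time()
        if now - LAST_REQUEST.get(email, 0) < COOLDOWN_SECONDS:
            return
        LAST_REQUEST[email] = now
        if email not in USERS:
            # Do NOT create an account for unknown emails; return
            # silently so the endpoint also doesn't leak which
            # addresses have accounts.
            return
        send_magic_link(email)

    def send_magic_link(email):
        print(f"magic link sent to {email}")  # stand-in for a real sender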
Can agentic engineers adhere to a similar code of ethics that a professional engineer is sworn to uphold?
https://www.nspe.org/career-growth/nspe-code-ethics-engineer...
Can software engineers?
You can use these tools wisely without letting them run carelessly and unverified.
That's the spirit, I always say - _others_ will deal with AI slop during code review. Eventually they will get tired and start 'reviewing' this AI stuff with AI - so it's a win win. Right?
Fast feedback loops and delegating tasks to sub-agents have been pretty common for vibers since well before they were canonicalized by agenteers. Same thing, different day, hardly even any difference in quality: they evolve together, though vibe tends to lead and agents follow and refine... which vibers then use too.
If you think of vibe coders as agentic alpha testers it makes a lot more sense.
There are certain codebases and pieces of code where we definitely want every line to be reasoned about and understood. But like his API endpoint example, there's no reason to fuss with the boilerplate.
This has definitely been my shift over the past few months, and the advantage is I can spend much more time and energy on getting the code architecture just right, which automatically prevents most of the subtle bugs that have people wringing their hands. The new bar is architecting code so that it's as well defined as an API endpoint -> service structure, so you can rely on LLMs to paint by numbers for new features/logic.
Spend a lot more time on architecting and testing than hand rolling most repos now.
Hats off to people who enjoy the minutia of programming everything by hand, but turns out I enjoy the other aspects of software development more.
The most important part, and why slop isn't the same as code written by someone else: the model doesn't care, it just produces whatever it is asked to produce. It doesn't have pride, it doesn't have ego, it doesn't have artisanal qualities, it doesn't have ownership.
Do this enough times, and I will have forgotten how to think.
Companies are shipping things and nobody understands what they're shipping.
Like many people I have used AI to generate crap I really don't care about. I need an image. Generate something like that, whatever. Great, hey, a good-looking image! Now that's done, I can do something I find more interesting.
But it's slop. The image does not fit the context. It's just off. And you can tell that no one really cared.
This isn't good.
You can't do that for images and text.
Makes me want to just give up programming forever and never use a computer again.
If LLMs stop improving at the pace of the last few years (I believe they are already slowing down), they will still manage to crank out billions of lines of code which they themselves won't be able to grep and reason through, leading to a drop in quality and lost revenue for the companies that choose to go all-in with LLMs.
But let’s be realistic - modern LLMs are still a great and useful tool when used properly so they will stay. Our goal will be to keep them on track and reduce the negative impact of hallucinations.
As a result, the software industry will move away from large, complex, interconnected systems with millions of features of which only a few are actively used, toward small, high-quality, targeted tools, because their work will be easier to verify and their side effects easier to control.
Depending on how you measure "improvement" they already have or they never will :-/
Measuring capability of the model as a ratio of context length, you reach the limits at around 300k-400k tokens of context; after that you get diminishing returns. We've passed this point.
Measuring capability purely by output, smarter harnesses in the future may unlock even more improvements in outputs; basically a twist on the "Sufficiently Smart Compiler" (https://wiki.c2.com/?SufficientlySmartCompiler=)
That's the two extremes but there's more on the spectrum in between.
You can also execute larger tasks than this by using subagents to divide the work so each segment doesn't exceed the usable context window. I regularly execute tasks that require hundreds of subagents, for example.
In practice the context window is effectively unlimited, or at least exceptionally high — 100M+ tokens. It just requires you to structure the work so it can be done effectively — not so dissimilar to what you would do for a person.
How to organize code like you said, and how agents interact with it, to keep the actual context window small is the fundamental challenge.
I looked at that response by GP (rgbrenner) and refrained from replying because if someone is both running hundreds of agents at a time AND oblivious to what "context window" means, there is no possible sane discourse that would result from any engagement.
Doesn't change my point: the amount of code the agent can operate on is very large, if not unlimited, as long as you put even a little bit of thought into structuring things so it can be divided along a boundary.
If you let the codebase degrade into spaghetti, then the LLM is going to have the same problem any engineer would have with that. The rules for good code didn't disappear.
It's like this: if your context window with one agent is n, with 10 agents each one only needs to carry n/10 of the problem. It takes some skill, but that is also where a lot of the advances are coming in.
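A rough Python sketch of the splitting being described. The 4-chars-per-token heuristic and the run_subagent call are hypothetical stand-ins for whatever harness you use; the point is only that work gets greedily packed into segments that each fit one agent's window:

    def count_tokens(text):
        return len(text) // 4  # rough heuristic, not a real tokenizer

    def split_into_segments(paths, budget):
        # Greedily pack files into segments that fit under the budget.
        segments, current, used = [], [], 0
        for path in paths:
            cost = count_tokens(open(path).read())
            if current and used + cost > budget:
                segments.append(current)
                current, used = [], 0
            current.append(path)
            used += cost
        if current:
            segments.append(current)
        return segments

    def run_task(paths, task, budget=100_000):
        # Each segment goes to its own subagent with a fresh context.
        return [run_subagent(task, seg)  # hypothetical harness call
                for seg in split_into_segments(paths, budget)]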
Assistant: “I propose A”
User: “Actually B is better”
Assistant: “you’re absolutely right”
User: “actually let’s go with C”
Assistant: “Good choice, reasons”
User: “wait A is better”
Assistant: “Great decision!”
Eh, what a waste. Can't we just stimulate the optic nerve? Or better yet, whatever region of the brain is responsible for me being able to 'see' anything? And perhaps we can finally get smell-o-vision too.
Second, LLM code can be less of a hot mess than human written code if you put in the time to train/prompt/verify/review.
Generating perfect well patterned SOLID and unit tested code with no warnings or anti-patterns has never been easier.
Write lots of code now and statistically look great, while the impact won't be felt until much later.
With the job search and whatnot then yeah, caring becomes a lot more important. That’s true.
It's not immediate, it still takes weeks if you want to actually do QA and roll out to prod, but it's definitely better than the pre-LLM alternatives.
AI will make this dynamic worse, and it has the extra danger that the default, banal way of applying the technology in fact encourages its application to that end.
I also don't think that the commodification of programming is a substitute for things like understanding your customers, having good taste for design, and designing software in a way that is maximally iterable.
With the right investment, we could certainly have tooling that creates and maintains very good designs out of the box. My bet is that we'll continue chasing quick and hacky code, mostly because that's the majority of the code that it was trained on, and because the majority of people seem to be interested in a quick result vs a long-term maintainable one.
That the industry was already routinely dealing with fires of its own creation is not a valid reason to start cooking with gasoline.
What would normally be considered overengineered gold plating is "free" now.
Same thing happens in other fields. A rich country and a poor country might build equivalent roads, but they won't pay the same price for them.
The system that makes it have an opinion about good vs bad architecture or engineering sensibilities will be something on top of the transformer and probably something more deterministic than a prompt.
"Shit's in the Game!"
"Chunder Everything"
"Maddening NFL 26"
"FIFiAsco 26"
"UFC 26 (Un Finished Code)"
"The Shits 4"
"Battlefailed"
"Need for Greed"
What you're suggesting is a negative flywheel where quality spirals down, but I'm hoping it becomes a positive loop and the quality floor goes up. We had plenty of slop before LLMs, and not all LLM output is slop. Time will tell, but I think LLMs will continue to improve their coding abilities and push overall quality higher.
We are used to thinking about software like in the article, a program that runs deterministically in an OS. Where we are headed might be more like where the LLM or AI system is the OS, and accomplishes things we want through a combination of pre-written legacy software, and perhaps able to accomplish new things on the fly.
Whether that happens or not is a different question, but I believe that's what they're suggesting.
Programming is taking ambiguous specs and turning them into formal programs. It's clerical work: taking each term of the specs and each statement, ensuring that they have a single definition, and then writing that definition in a programming language. The hard work here is finding that definition and ensuring that it's singular across the specs.
Software engineering is ensuring that programming is sustainable. Specs rarely stay static and are often full of unknowns. So you research those unknowns and try to keep the cost of changing the code (to match the new version of the specs) low. The former is where I spend the majority of my time. The latter is why I write code that's not necessary right now, or in a way that doesn't matter to the computer, so that I can be flexible in the future.
While both activities are closely related, they're not the same. Using an LLM to formalize statements is gambling. And if your statement is already formal, what you want is a DSL or a library. Using an LLM for research can help, but mostly as a stepping stone toward the real research (to eliminate hallucinations).
With the rise of LLMs that do all of that... those people shut up, and shut up real fast.
That's what the Tech-Priests are for.
How many of us remember that VSCode is actually a browser wrapped inside a native frame?
The new standard: web apps. Why update 3 separate binaries for Win/Lin/Mac when you can build 1 on a web framework and call it a day?
With such a low baseline, there is an optimistic perspective that LLMs could improve the situation. LLMs can produce excellent code when prompted or reviewed well. Unlike human employees, the model does not worry about getting a 'partially meets expectations' rating or avoid the drudgery of cleaning up other people's code.
AI certainly has the potential to make the underlying code/design a lot cleaner. We will also be working with dramatically more code, at a much higher rate of change. That alone will be a big challenge to keep sustainable.
The ones making the decision to under-invest in design are either unaware of the real costs, or are aware and deliberately choosing that path - that's not new, and I don't expect it to change.
As a piece of meat, I look forward to charging rates of $10,000 an hour to fix the code that comes out of vibe code generation.
--
It's just as likely that people will be surprised that we used to have billions of lines of human generated code, that no LLM ever approved.
By then AI would be good enough to clean them all up...
[citation needed]
To make my comment more on-topic: why do you think this is going to be the case? What will newer LLMs be trained on?
Now with LLMs we are talking about millions and millions of lines of code that could be generated in a single day. The scale of the problem might not be the same at all.
LLMs aren’t the first thing to come along and change how people develop applications.
You had the rise of frameworks like Django, Rails, etc. Also the rise of SPAs. And also the rise of JS as a frontend+backend language.
In 3-5 years we'll have adapted to the new norm, like we have in the past.
Also, companies are pressuring employees towards adoption in novel ways. There was no such industry-wide pressure by employers in the 90s, 2000s or 2010s for engineers to use a specific tech.
Companies have been enforcing technology mandates since time immemorial. In the early 2000s there were definitely a lot of mandates to move away from commercial UNIX to Linux. Lots of companies began enforcing the switch to PHP, Ruby and Python for new projects.
Good luck disliking LLM babysitting these days
I use AI tools daily (because they feel like they're helping me) but it's not exactly hard to imagine scenarios where an explosion of slop piling up plus harm to learning by outsourcing all thinking results in systemic damage that actually slows the pace of technological progress given enough time.
The history of new technologies tends to average into a positive trend over a long enough time scale, but that doesn't mean there aren't individual ups and downs, including WTF moments looking back at what now seems like baffling decision-making with the benefit of hindsight.
If it is, the fallout will be way worse than if AI ends up living up to (reasonable) expectations.
If it doesn’t, we are going to see over a trillion dollars of capital leave the tech sector, which I think will have worse impacts on the livelihood of tech workers than if AI ends up panning out.
This is something the naysayers need to grapple with. We’ve crossed a line where this tech needs to work simply because of the amount of money depending on that fact.
I don't think it will be worse; if AI pans out the world would be able to continue without a single programmer left. If a trillion dollars leave the tech sector, all those programmers employed outside of the tech sector will still have jobs.
The damage would come much later, well beyond the point where it could be simply pulled out and replaced without spending massive amounts of money and would also basically necessitate training an entire new generation of engineers.
Then the AI giants would start appearing vulnerable like cigarette companies in the 90s while an AI Superfund and interstate class action are being planned but Sam Altman would already be a centitrillionaire at that point so it would be someone else's problem.
a) The stuff output by the existing LLMs is too unwieldy even for them to handle, even if the product itself is a glorified chatbot.
b) If all software is throwaway, then the value of all software drops to, effectively, the price of an AI subscription. We'll all be drowning in a market of lemons (https://en.wikipedia.org/wiki/The_Market_for_Lemons), whilst also being producers in said market.
I think this highlights a problem that has always existed under the surface, but it's being brought into the light by the proliferation of vibeslop and openclaw and their ilk. Even in the beforetimes you could craft a 100.0% pure, correct-looking github repo that had never stood the test of production. Even if you had a test suite that covers every branch and every instruction, without putting the code in production you aren't going to uncover all the things your test suite didn't catch--performance issues, security issues, unexpected user behavior, etc.
As an observer looking at this repo, I have no way to tell. It's got hundreds of tests, hundreds of commits, dozens of stars... how am I to know nobody has ever actually used it for anything?
I don't know how to solve this problem, but it seems like there's a pretty obvious tooling gap here. A very similar problem is something like "contributor reputation", i.e. the plague of drive-by AI generated PRs from people (or openclaws) you've never seen before. Stars and number of commits aren't good enough, we need more.
> where you fully give in to the vibes, embrace exponentials, and forget that the code even exists [...] It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
So clearly we need a term for what happens when experienced, professional software engineers use LLM tooling as part of a responsible development process, taking full advantage of their existing expertise and with a goal to produce good, reliable software.
"Agentic engineering" is a good candidate for that.
It's shifted so much for me. I used to think that I had a solemn duty to read every line and understand it, or to write all the test cases. Then I started noticing that tools like CodeRabbit or Cursor would find things in my code that I would rarely find myself.
I think right now it's shifted my perception of my role to one where I am responsible for "tilting" the agentic coding loop; ultimately the goal is a matter of ensuring the agent learns from its mistakes, self-organizes, and embraces a spirit of Kaizen.
Btw thank you for your work on Django, last 20 years with it were life changing (I did .NET before).
> Claude Code does not have a professional reputation!
how come?
I'm not checking the code since the code doesn't really matter anymore anyway - I just have the agent write passing tests for the changes or additions I make, so even if something breaks I can just point to the tests.
Some days the tickets are completed much faster than I expect and I don't hit my daily token expenditure goal, so I have my own custom harness that hooks an agent up to TikTok: basically it splits up the reel into 1-second increments and then feeds those frames to the LLM for its own consumption. I can easily burn 10M tokens a day on this, and Claude seems to enjoy it.
Personally I want to thank you Simon for putting me onto this "vibe engineering" concept, I really didn't expect an archaeology major like myself to become a real engineer but thanks to AI now I can be! Truly gatekeeping in tech is now dead.
My side project is 80% vibe code. Every now and then I look and see all the bad stuff, then I scold Codex a bit and it refactors it for me. So I do see the author's point.
I took a rock carving course in school that really enlightened me about software engineering, and it still applies today, especially to AI. You can't just decide what you want to carve, hold the chisel in just the right spot, and whack it with a hammer just perfectly so all the rock you want falls away leaving a perfect statue behind.
"I saw the angel in the marble and carved until I set him free." -Michelangelo
It's a long drawn out iterative process of making millions of tiny little chips, and letting the statue inside find its way out, in its natural form, instead of trying to impose a pre-determined form onto it.
Vibe coding is hoping your first whack of the hammer is going to make a good statue, then not even looking at the statue before shipping it!
But AI assisted conscientious coding (or agentic engineering as Simon calls it) is the opposite of that, where you chip away quickly and relentlessly, but you still have to carefully control where you chisel and what you carve away, and have an idea in your mind what you want before you start.
> But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?
Answer: it wholly depends upon what management has dictated be the goal for GenAI use at the time.
There seems to be a trend of people outside of engineering organizations thinking that the "iron triangle" of software (and really, all) engineering no longer holds. Fast, cheap, good: now we can pick all three, and there's no limit to the first one in particular. They don't see why you can't crank out 10x productivity. They've been financially incentivized to think that way, and really, they can't lose if they look at it from an "engineer headcount" standpoint. The outcomes are:
1) The GenAI-augmented engineer cranks out 10x productivity without any quality consequences down the line, and keeps them from having to pay other people
or
2) The GenAI-augmented engineer cranks out 10x productivity with quality consequences down the line, at which point the engineer has given another exhibit in the case as to why they should no longer be employed at that organization. Let the lawyers and market inertia deal with the big issues that exist beyond the 90-day fiscal reporting period.
Either way, they have a route to the destination of not paying engineers, and that's the end goal.
If you don't like that way of running a software engineering organization, well, you're not alone, but if nothing else, you could use GenAI to make working for yourself less risky.
Just piggybacking on this post since I'm early:
Would love to see your take on how the AI and Django worlds will collide.
Rather, I just feel like I have to constantly remind myself of the impermanence of all things. Like snow, from water come to water gone.
Perhaps I put too much of my identity into being a programmer. Sure, LLMs cannot replace most of us in their current state, but what about 5 years, 10 years, ..., 50 years from now? I just cannot help but feel a sense of nihilism and existential dread.
Some might argue that we will always be needed, but I am not certain I want to be needed in such a way. Of course, no one is taking hand-coding away from me. I can hand-code all I want on my own time, but occupationally that may be difficult in the future. I have rambled enough, but all in all, I do not think I want to participate in this society anymore, and I do not know how to escape it either.
The job, as you have done it at least, was also not here 50 years before you started doing it.
Did you have any of the same feelings knowing that you were doing a job that has not existed in the world very long? That seems like a strange requirement for a meaningful job, that it should remain the same for 50+ years.
In truth, our world and what we do for our careers is entirely shaped by the time that we live in. Even people that ostensibly do the same thing people have done for centuries (farmer, teacher, etc) are very different today than 100 years ago.
My dad (now retired) was always super practical about stuff. He'd tell me pretty nonchalantly things like "yeah we're dealing with xyz constraint, we may have to cut a corner over here, but that's ok", when I asked him about it he gave me a little spiel that you can be thoughtful about how you do things, including when you can cut a corner and more importantly, what corners are ok to cut.
I really took that to heart - especially the "be thoughtful about the corners you cut"
If an LLM has consistently one shotted certain tasks and they are rote/mechanical - not reviewing that code is probably ok.
Are you getting lazy and not reviewing stuff that should be reviewed even if a human wrote it? That's probably not ok
I can live with some basic code that broke because it used outdated syntax somewhere (provided the code isn't part of a mission-critical application), but I can't live with it fucking up JWT signing, etc.
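The JWT case is a good example of where the hand review is tiny anyway. A minimal PyJWT sketch, assuming symmetric HS256 signing (the secret is a placeholder):

    import jwt  # PyJWT

    SECRET = "change-me"  # placeholder; keep real keys out of code

    token = jwt.encode({"sub": "user-1"}, SECRET, algorithm="HS256")

    # Pin the accepted algorithms explicitly. Trusting whatever the
    # token header claims is the classic signing bug that's easy to
    # miss in an unreviewed diff.
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    assert claims["sub"] == "user-1"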
I don't buy this argument at all. I think if we could pay $20/month to a service that would send over a junior plumber/carpenter/electrician with an encyclopedic knowledge of the craft, did the right thing the majority of the time, and we could observe and direct them, we'd all sign up for that in a heartbeat. Worst case, you have to hire an experienced, expensive person to fix the mess. Yes, I can hear everyone now, "worst case is they burn your house down." Sure, but as we're reminded _constantly_ when we read stories about AI agent catastrophes -- a human could wipe your prod database too. wHy ArE yOu HoLdInG iT tO a DiFfErEnT sTaNdArD???
The business side of the house is getting to live that scenario out right now as far as software goes. Sure you've got years of expertise that an LLM doesn't have _yet_. What makes you think it can't replace that part of your job as well?
But that's not what the author is talking about in that passage you quoted. What he's saying is that, if you can pay $20 for an AI plumber, then it stands to reason that eventually you will be able to pay $30 to a company that manages AI plumbers for you, so that you don't even have to go to the trouble of supervising the plumber. Most people will choose the $30.
The implication here is software engineer jobs are still safe despite basically free labor/material being available to do said jobs because he thinks other people would prefer to pay experienced professionals to do it right at a significantly higher cost. My point is, I think most people will take the low-stakes gamble of having the cheap AI agent do it with self-supervision[0]. He's naive in thinking people are really going to care about artisanal software built by experienced professionals in the future.
0: Even if you subscribe to the "your job will be to supervise the agents" train of thought, you're kinda glossing over the fact that it's probably gonna involve a pretty significant pay cut, plus the looming problem of "how do new experienced professionals get created if they don't need to get their hands dirty?"
I don’t think this comparison quite works (or maybe I think it works and is wrong) and I think it has something to do with creativity or the initial ideation.
I would do this, but I’m a jack of all trades. I built my own diner booth in my kitchen recently. But my wife, who loves the diner booth, just doesn’t really want to get over the hump of figuring out what she might want. I think most people want to offload the mental load of figuring out where to start.
Most people aren’t just bored by coding, they’re bored or overwhelmed by the idea of thinking about software in the first place. Same with plumbing or construction, most people aren’t hiring someone to direct, they’re hiring a director.
Even I have this about some things, sometimes I choose to outsource the full stack of something to give me more space to do creativity elsewhere.
And AI-generated code should be different from human code. AI has infinite memory for details. AI doesn't need organizational patterns like classes. Potentially AI can write code that is more performant than any human's.
Will it look like garbage? Sure. Will the code be more suited to the task? Yes.
The code produced will only be understandable by AI. You could use locally hosted LLMs, but they won't be as performant as the AI run by the big guys. And there is nothing stopping greedy companies from implementing some ridiculous pattern that only their model can reasonably work with.
So what will you do in a situation where you can't understand "your" codebase and you have to make changes or fix a bug?
It will be a black box, and the code will be generated just in time by ai for each api request
The open-weight models are nipping at the heels of the frontier models. The frontier labs have to make forward progress and keep tokens cheap in order to maintain market share.
Eventually, we'll have a Mythos-level model running on integrated hardware on every PC.
Code that is organized well and operates coherently in the first place, by an LLM or not, will be easier to iterate on, by an LLM or not.
No, just no.