Modern Tupperware party.
A colleague was convinced Claude is better, so we played a game. Using the Claude Code and Codex harnesses, I implemented some PRs they needed with GPT-5.5 and Opus 4.7, then asked them to identify which model produced which PR from the code alone.
Couldn’t tell.
Edit: I bet 99% of people here, if presented with a test where I gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down.
> Couldn’t tell.
Why would you expect them to be able to recognize the signature of a model from a pair of PRs? I don’t understand why you think this is a useful test for anything when we have numerous benchmarks that run hundreds of tests on models, and both GPT-5.5 and Opus 4.8 perform similarly on them.
I have subscriptions to both. I run both on max reasoning. It is interesting to see the relative strengths and weaknesses of each model. You won’t always see it if you’re just scanning code. Sometimes one will spin for a long time on certain problems where the other has no problem finding the appropriate parts of the codebase and getting to an efficient solution.
antirez made a comment that he and others found GPT-5.5 to be better than Opus at the optimization tasks he was working on. There are other classes of tasks where GPT-5.5 consistently stumbles and Opus gets to a solution quicker. Lately I’ve been working on some code where neither model comes up with a good solution. That’s just how LLMs go.
The only reason you have seen more activity about Claude is that they got there first. Codex has been a step behind and GPT couldn’t match Opus at first. You’re testing them after they’ve closed the gap.
For a developer using an LLM on a daily basis, the experience is about much more than just the resultant code.
There’s everything from:
- how often you had to manually steer the model
- how frequently you needed to course-correct
- how much detail you had to provide up front
- how the interaction felt (sycophantic, etc.)
- how well it handled MCP and external tooling
- how effectively it could pull in additional information from external sources such as the web
- how fast it produced code
- how much it cost
Many of my friends who are devs use tools like OpenCode CLI with OpenRouter because they switch between the various SOTA models so often. Just because you saw a Claude "meetup" doesn't prove anything other than that somebody chose the name because it resonated more than "Generic LLM Meetup".
I flip between models all the time. Makes little difference. Sometimes one model is faster or better than another but there's no rhyme or reason why.
Actually there is a nice body of work by Steven Clarke on cognitive dimensions of notations/APIs and the interaction with developer personalities.
I wonder if the same holds for AI models and harnesses.
Surely this is just due to the random nature of these stochastic parrots?
Do you mean you have identified a class of problems Claude always stalls on and another class of problems Codex always stalls on? What identifies these different classes of problems you see? How would you say Claude is stronger than Codex and vice versa? Why?
You can go back and forth and compare since you pay for both subscriptions, but is that the usual case? I'd guess most developers picked one in 2025 and haven't gone back, just as most people pick a bank for their checking account and never change it.
As for the test, of course the output matters. Take image models for example. Differences are clear as day.
Should it matter at all that OpenAI existed before Anthropic did? No, IMO. I would have used Opus 4.8, but it only just came out. Fast-moving space.
You’re guessing that it’s a result of advertising, and I agree that that’s probably a component, but it’s a mistake to assume that they are interchangeable when you have people saying to you directly “I use both and they’re not.”
What matters most in state-of-the-art models isn't simply the final destination; it's the process by which one arrives at that destination.
I would argue the process these days has more to do with the harness than the model, at least when we're talking about the SOTA options. Claude Code's biggest advantage isn't Opus; rather, it's the shared knowledge the community has been building and sharing around using it effectively. Almost all of the out-of-the-box tutorials, skills, and frameworks are built for Claude first, then maybe Codex.
I'd go further and say that CC and Codex are not even the best harnesses available, they just offer the most subsidized rate plans.
This. Never underestimate the ability of a large number of power users to substantially improve the actual utility of a complex software product.
They always have more time (and sometimes more skill) than a product's developers.
Sometimes the quantity of monkeys matters more than the quality of the typewriters.
In fact, after seeing all these comments about the effort involved, you redirected to calling it mere "vibes":
> Edit: I bet 99% of people here, if presented with a test where I gave 5 models but all of the results came from one, would not be able to discern this. Just vibes all the way down
Which, again, is a highly emotional way to view people trying to say that the process matters too. Calling people "vibes based" or "highly susceptible to marketing" and saying they take part in "tupperware parties", rather than evaluating their experience with tools, is quite a thing to see: a complete dismissal of professionals' core experience as "vibes" rather than something intrinsic to how they perform labor.
One example is blind wine tasting. In some cases, journalists invited renowned, established wine tasters and subjected them to blind tastings. It turned out the judges couldn't tell which wine was which. Pretty embarrassing.
It speaks volumes about how well people can actually judge the value of things. There is research by a network scientist suggesting that you generally can't tell the top 1% from the rest of the top, though you can tell the really bad from the generally good. What OP's experiment might tell us is that the LLM competitive advantage is so small that no one can tell which is objectively better.
It’s been a known “secret” for a while now how much better Codex is than Claude. I’ve used both since they were released, and I often implement in both to compare; 95% of the time Codex writes better code, and less code too!
Claude is only really better at front end design.
For example in your "test" you're only looking at output and ignoring the entire process of creation.
In addition to that process, you're ignoring that Claude Code was first and was better for a long time; why would people switch to something that produces the same output? Claude Code has been way ahead in the process of agentic software creation for a long time, and I still prefer its features. Even though I think Opus 4.7 was a big step backwards, and I've been getting seemingly worse results every day with the churn of features in Claude Code, some of that may also be me testing the bounds of how little I can specify and still get acceptable results, so it's hard to know.
Calling all these concrete realities "marketing" is itself you trying to market Codex as "good enough" instead of paying attention to how we got where we are and where we will go in the future.
Calling this "emotional" seems a little weird
Tupperware parties were a way for housewives to make a bit of money through a pyramid scheme, socialize, and have fun.
Are you suggesting that Anthropic is giving kickbacks to devs who talk about their positive experiences with Claude Code? Seems false, so I don't think that's it. Are you saying people are having fun talking about Claude Code socially as an escape from their everyday routine? Also seems false. Are you saying it's mere housewives who are supposedly easily susceptible to marketing? Or are you assuming that we all think housewives only bought Tupperware because they were mindless sheep? That seems to be what you are implying, but I disagree with that characterization of housewives' Tupperware parties; it's merely an emotional, dismissive, mid characterization. And even if it were a correct characterization of Tupperware parties, it's obviously nothing like anything I have seen anywhere with Claude Code, and I'm a freelancer with insight into companies of several different sizes and cultures over the past year.
https://www.tupperware.com/pages/host-a-party?srsltid=AfmBOo...
It really is the same thing. You and others get more credits or gifts, social gatherings, expanded opportunities, etc.
Are you actually asserting that Claude Ambassadors are a significant fraction of the cause of adoption? If so, why have Codex Ambassadors been so much less successful?
https://developers.openai.com/community/codex-ambassadors
If you've met people that have been to these sorts of things, sure, I guess I can sort of understand your post, but come on, who has even heard of this sort of party on HN?
I've been going to Python data meetups, Machine Learning meetups, etc, back to the times when AI was an uncool word whose usage would mark the speaker as completely incompetent. I guess you could call them Tupperware marketing parties but come on, it's just an emotionally charged way of describing a normal way of exchanging information amongst professionals. Ambassador programs? Yes, cringe, but seriously who has even seen an actual "Ambassador" or taken them at their word rather than viewing them as a detriment to the thing they are advertising?
Software developers are the most susceptible of all population groups to amplifying their employers' new whims. There are true believers and useful idiots, but many are just mediocre and know that playing along will further their career for a couple of years.
In the end they will be fired anyway of course.
Anecdotally I hear of folks with workplace Claude Code subscriptions all the time. I'm not sure I've ever heard someone talk about their workplace Codex subscription. Anthropic clearly did a far better job chasing corporate customers while OpenAI was busy chasing consumers with Sora etc.
The test they (supposedly) ran with their coworkers to look at PRs from both is such a bad way to compare LLMs that I don’t think they’re very experienced with using them.
It's marketing
I remember using GitHub Copilot (OpenAI "Codex" mk1) in Aug 2021 (ChatGPT would launch a year later, 2 weeks after Meta's botched Galactica release). Cursor & others took it and ran a mighty good race.
I'm sure they could also negotiate a similar deal with OpenAI, but in my outsider experience negotiations around these kinds of corporate contracts take forever, and when the selling point is "they're broadly pretty similar" I suspect the motivation isn't there.
“Our competitive advantage is that we believe them,” I’ve read—wonder if that’s still a [prevailing] sentiment.
(Edit - context was probably using SotA models instead of being limited to local open source only)
It was barely marketed. I always turned Copilot off and never found any benefit from Cursor. Claude Code was vastly different in conception, function, and capability: a product that defined an entirely new category.
Perhaps to others who found Copilot or Cursor useful, it was merely marketing. But to me it was a level of function and productivity that I had never seen before.
People try to dismiss these things as LLM wrappers, but the LLM will be commoditized, and the wrapper will be where the real product design goes and where the real differentiation happens. Owning that unique process of communication between the dev and what the dev wants, figuring out the most stuff with the least complete spec, and maximizing every bit of the very tiny communication channel between the dev and the LLM and the code on disk, that's where 2026 and 2027 will be focused, until the next category defining product is created.
This was a push of the technical frontier, not a marketing achievement.
fwiw nobody "marketed to me". I picked Claude because friends were using it with great success and they helped me get started with suggestions on prompt style. Before that I'd played around with various LLMs for coding but not done any actual production work.
Apparently the colleague did take part, so I think the evidence we have is that the colleague agreed with the interpretation that "better" meant "produces discernibly better code".
Yup, like billions of capex. Unlike vim.
You definitely can in principle; that’s the entire point of the comment you are responding to. If one tool completes it in 10 minutes with little hand holding, and the other does it in one hour at 4× the cost and while needing a lot of steering, the former is arguably better even if the end result is the same.
Whether that’s specifically true and demonstrable of GPT and Claude is another question, but your blanket statement doesn’t hold as a general rule.
I think a more appropriate rephrasing would be: 'You cannot simply claim that (model + harness) X is better than Y and then show no discernible difference on the dimensions you care about.' In the case of the latest Claude Code vs. Codex with GPT-5.5, both are similar enough on the dimensions people care about when evaluating them (as opposed to differing wildly in cost or time taken).
- which tool required more detailed goal-setting in the prompt?
- did one tool ask follow-up questions up front vs spread out over implementation?
- did either tool match existing coding styles?
- did either tool remind you about potential conflicts between what you asked it to build and other parts of the codebase?
There are a lot of ways to compare agents besides just the code. (Similarly, working engineers are not evaluated just on their code output.)
I've not used Codex to compare against, so I'm not claiming X is better than Y, but comparing tools simply on their output is naive.
Sorry I think this misses the mark.
Because it's not the output but the process.
And the outcomes are not always discernible.
Codex and Claude are very different.
I use them for different things.
Their behaviour difference is obvious.
Of course it'd be impossible for anyone to tell 'how it was written' by looking at my codebase.
So the same person was using similarly competitive tools and showing that the output was hard to tell apart (with the indirect implication that the implementation was fairly trivial in both cases). A better analogy would not be different processes and widely different tools, but, for example, two power drills. Sure, folks could still prefer one over the other, but that's a different claim than saying X is objectively better than Y when both compete on very similar dimensions.
Assuming you meant Claude Code: I'd love to learn more about "Codex and Claude are very different", because maybe I'm generalizing from my own use case, where I use both of them interchangeably for the same thing (coding web and mobile apps).
That’s actually what my comment was based on; raw code output isn’t the only measure of quality. Engineers write better code if they have the tools they prefer.
While there is no meaningful difference in the ability to write code, vim has earned its reputation for having a learning curve. I'd argue that predisposition, the requirement to invest additional energy, will bias the results towards attention to detail and pure minimalism.
Convinced you can distinguish A from B? Ok! No problem, let's try! Whether it's fancy wine at the dinner table or coding agents, it's all the same: you try one option, then another, maybe with all options secretly from the same source, and if you reliably can't tell, well, kudos, you are just like the rest of us!
It's easy to "know" in retrospect but blind test is where genuine difference can be found. Or not.
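To make "reliably can't tell" concrete, a blind identification game can be scored against pure guessing. Below is a minimal sketch, assuming each trial is a two-way guess (which of two PRs came from which model), so chance is 50%. The function name `p_value_at_least` is hypothetical, and the one-sided binomial test is one common choice for this, not the only one.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided binomial p-value: the probability of getting at least
    `correct` right answers out of `trials` by guessing alone, when each
    guess succeeds with probability `p_chance`."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical results: someone labels 20 blind PR pairs and gets 15 right.
print(round(p_value_at_least(15, 20), 3))  # prints 0.021
```

With 15 of 20 correct, the chance of doing that well by guessing is about 2%. A single pair of PRs can never get below 0.5 (one correct guess out of one), so one round of the game says little either way.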
I sometimes wonder how much of what I believe is bullshit I was fed through intentional propaganda. I do think as I’ve gotten older I’ve gradually identified and challenged some of it.
Over half of HN commentators visibly struggle to piece 3 or more complex ideas together.
How could anyone, who spent more than 30 minutes reading HN, expect otherwise?
Critical thinking is at an all-time low to start with, but even if you attempt to think critically while using social media, you cannot do it constantly. This is one of the problems with social media as a whole. You might notice one thing is not quite right and discard it, but you can't do that constantly, and eventually you will absorb one of the 15 posts or comments.
Even if Claude and ChatGPT were exactly the same, Claude would be more popular because OpenAI has decided to make some very unpopular moves and try to make money where popularity isn't required. At the moment that popularity still seems to matter.
I have a strong affinity for Claude Code because of the interaction experience and overall tone / vibe / process. I am 100% willing to believe the code it produces is identical or possibly less good than Codex.
I enjoy working with Claude in a way I just don’t get from OpenAI. YMMV, you may feel just the opposite. But it’s a mistake to look at the produced code as the only dimension of these products.
I have an affinity for small open source tools that do one thing and do it well. But those are just my preferences and I feel a little bit like an alien :)
This happens to everything from which a profit is extracted.
Perhaps there's a way to fund the training of "actually open source" models, but so far we don't have that (unless you count the Chinese government).
There should be a material difference between the tools.
There is.
vim / emacs / jetbrains - different tools to produce code.
Codex and Claude are different.
"yea it's dumber but it's nicer to me and i like the cool flashing colors so i'll use that"
It is like the employee who is slightly worse but is a brownnoser getting promoted more often.
And what do you know, that is what is happening. It is like the Coke commercial with the nice music and the beautiful person in the background.
Speaking of which, remember the Pepsi Challenge? Coke lovers are like the Claude Code lovers.
And it really depends on the task. Is it a typical, well-defined bug, or simple CRUD? Or does it require research, combining different sources of data in complex and creative ways?
This is also why benchmarks never show reality, and the only real understanding comes from actually trying to build something.
> isn't just a marketing delusion, but subjective joy
What is the difference? When a product is being marketed, isn't the subjective joy created by the marketing?
Or more specifically, for the case of coding agent harnesses, where many developers have experimented with a wide range of tools - someone might just favour the interactions with a specific one from their personal experience. Entirely unrelated to marketing.
> or because the interior design appeals to their sense of fashion.
Surely you'll grant me that the sense of fashion is mostly marketing in sheep's clothes?
Yes, I'll grant you that the choice of a coding agent harness is influenced by marketing to a much lesser degree than, e.g., cars. I still think Anthropic does marketing way better than OpenAI!
[Edit:] I use the pi.dev agent. I was heavily influenced by its marketing: minimal, MIT-licensed, and espoused by the HN crowd. Do you think I read the source code and made an actual informed decision? Nah...
But also, as a driver, there is a clear difference between a Sienna and a 911. The differences are objective, but of course the preferences are subjective.
Repeat after me:
_Other people can experience things you do not experience and it is still valid, and not a delusion_. They are not sheeple who fell for marketing.
Sure, the subjective joy is valid, and yet it was 100% induced by marketing.
> They are not sheeple who fell for marketing.
People generally fall for marketing. Why do you think these specific people didn't?
No, obviously not.
> That’s what’s happening here.
No, that's not what's happening here.
> Marketing can get you to try something you wouldn’t have otherwise, and it may suggest benefits you’d get if you tried it, but your preference of using one thing or the other is a subjective experience of your own.
Marketing can very much shape your preferences and create wishes you didn't have before. That's why companies invest so much money in marketing.
I think we've had the same iced tea pitcher since I was 5 years old, for example. Solid.
Will we be able to say the same thing about Claude?
I don't look at benchmarks.
It's a non-deterministic tool. A lot of the shit going on with LLMs just doesn't make sense to me. All the tooling around them, like MCPs, is just putting stuff into context. So to me the tools aren't really robust, and they make little difference.
Lots of AI psychosis going on these days. And I say that as somebody that hasn't written a line of code since Sept 2025
I still use Claude Code because I have the most experience with it now, and it's the harness that I understand on a granular level. If something comes along that is clearly better, or if it becomes clear that Codex is miles ahead, I'll try it and evaluate it. To your point, there doesn't seem to be much of a difference.
Arguing over this stuff feels kind of silly, like back in the day when my friends would give me shit for using mIRC instead of ircii or BitchX. I liked the GUI then because I did. I like Claude Code now because I do.
FWIW most of the normies I know are using Claude
Codex I feel the need to be very specific and precise with. Claude… I feel like I can be lazy, which I enjoy.
Both still need to be reviewed stringently, but I feel I can be more ambiguous with Claude and get better results than with Codex.
I do think OpenAI is doomed due to bad leadership. What you said (that the marketing is relatively terrible) and what others are saying here (that the product is worse) is damning isn't it? Are they really failing on all fronts?
2. In my case Codex seems to write more solid code, but I still use Claude most of the time because it's my witty rubber ducky and I can actually sometimes force some legit insights out of it. Codex is much worse at this. Whether that matters depends on the project.
This has to be in some Far Side gallery somewhere
Did you need to come to that conclusion?
Marketing has always been a significant part of new technology adoption. Whether it's for cloud adoption, for new programming languages, for new software development techniques, etc...
I don’t think that applies to most on here tho.
Edit: Oh they’re trolling, nm. :-/
GPT 5.5 genuinely was back on top for a while there, but if you look at the past 2 years, being on Claude was better than being on OpenAI most of the time. If you're going to pick a tool and not switch constantly it was the right choice. Not to mention their tooling has always been ahead, and that gets ecosystem benefits.
Are they close and interchangeable today? Sure. But Sonnet was genuinely way better than anything OpenAI offered for a long time -- the valuation reflects that, not any given moment in time.
For general use, ChatGPT's answers have gotten worse over the last year. I abandoned it.
They served caviar. It probably had good ROI.
This is complicated by the way coding agents inject prompts that preempt and potentially undermine user instructions. I suspect one of the reasons Codex works way better for me than Claude Code in certain projects is that the latter adds some garbage like "go ahead and write repetitive copy/paste code, keep it simple, take shortcuts" to every session. A fair test would have to hide the harnesses but still more or less use them, rather than comparing just the models.
Add DeepSeek V4 to it and it will be close, at 1/10th the price. I use all three (Codex, Claude, and DeepSeek), and they are close.
At the end of the day, what matters is which team is better, not which model. If Anthropic continues to feel like the good guy, relatively speaking, then people are gonna choose to spend more time getting to know its products and less time with OpenAI's, and on average the teams using Anthropic's will be the more capable ones.
I think vibes are gonna matter more and more going forward. The potential for bad behavior on the part of an AI company is severe. We're gonna have to tolerate whoever we enable in this space, so I propose that we make their marketing teams work as hard as possible to show us which will supply better vibes.
b) therefore a preference for Claude is marketing - complete bollocks
Either the tasks you chose were well below the capabilities of top models, or meaningful differences for preference are elsewhere, or both.
Your comment is probably energy-efficient and sustainable, however, because you could use it again and again when another comparison comes up, like Vim vs Emacs, or tea vs coffee
I think you're missing one (or more) of the facets by which individuals subjectively decide what "better" means.
Early on I hopped between all the providers. Code quality for SOTA at the time was pretty decent if you didn't ask it to solve challenging problems. However, the thing I found most difficult was consistency in how well it listened. E.g. Gemini (I forget which version, not the current one) was super prone to focusing solely on the functionality/goal and ignoring the directions on how to write the code. It would throw in comments everywhere, document in a manner I didn't want, use abstractions I told it not to, etc.
How well a model would follow instructions to drop its horrible "isms" was the #1 criterion for me. If I have to constantly remind the model not to do X, then it's a terrible model.
That is why I chose Claude for the last N months. However, I've stuck with Claude because dealing with these "isms" and their little behavioral nuances is a chore in itself. I've found you have to learn the model just as much as anything else, so hopping around these days, when I'm just trying to get shit done, is unlikely.
These days, for me personally, Claude has to give me a reason to switch, rather than me investing even more money (I'm on the 20x plan) in other providers. I'm definitely not committed to Claude Code, but I am tired of the LLM churn, tooling churn, subscription churn, and the general fear over which providers we can trust.
edit: In short, it's the interactive UX just as much as it is the final output.
Claude commit messages - well structured test plan, readable.
Codex commit messages - wall of text, no structure.
The big difference though is sitting with the tools and using them for work. These are for sure vibes, but I’m sure you could pull out metrics for # steering re-prompts for example.
Codex just goes off and solves the problem, usually comes back with a solve; Claude more often gives up or needs input. Opus gives a broader design discussion, better at conversation. Codex finds deeper/better edge cases.
I think it’s like Emacs vs Vim: you can get your work done with both. There may be some tasks where one is way stronger. A strict “better” is quite hard to justify.
Ultimately tool choice is a mix of science and art/taste; I want to feel joy using my tools, and fun little pixel explosions make me happy. If a different tool makes you happy, that is also fine.
There's so much faith and money in this idea, and seeing how fragile it is doesn't look good.
Google came pretty close at times
Same reason people buy the RTX 4090 and 5090 cards - overpriced but they must have the "best". Never mind the diminishing returns trying to max out PC settings (3-4x performance hit for an almost imperceptible increase in graphics, ignoring DLSS) - it's the psychological cost of having to move a slider down a notch.
I've been using Google and now DeepSeek v4 and I am having absolutely no problems and it's a fraction of the cost. I'd love for Claude to be 10x better but it just isn't, for my use case anyway.
I think it’s great, but coming from Claude Code it did feel like going back in time by ~6 months in model capabilities. This isn’t a big deal to me for what I do, but the difference is definitely there.
I’m being pedantic/splitting hairs, though. I’ve obviously switched to DeepSeek full-time because it makes more sense to me pragmatically — I spend a few more tokens to get the outcome I want, but the tokens are cheap as dirt and the API is faster.
Perhaps I should plug it into Claude Code and see how it performs? I haven’t tried that.
Vibes and tribalism will prevail until one of them emerges as clearly and unambiguously superior to the other.
With LLMs the problem is more complex: it's people getting used to how a model works and to its ecosystem. Sure, you can make all your skills harness-agnostic and deal with Anthropic's stubborn refusal to adopt the common naming/directory structure. But most people don't. So you end up with something closer to the ancient Android vs iOS discussion. Can you prove, in isolation, that iOS is more energy efficient, or that the hardware is faster? Yeah. But that won't speak to someone who has been on Android for 10 years and would first have to migrate and get used to iOS to experience that.
I've noticed myself getting used to the common failure modes of particular models in my projects. GPT-5.5 tends to create some checks/booleans I don't need, heavily overcorrects on error handling, etc. Claude 4.7/4.8 doesn't do those as often, but it gets derailed on our E2E test suite and forgets to run linting despite guidance. So even assuming a fully harness-agnostic setup, a new model with its own quirks can be a lot of friction for heavy users who are used to Claude specifically and whose skills/guidance already pre-address its common failure modes.
E.g. I might be a Prius owner, then you gift me an objectively better, more efficient, safer, newer, same-size car with physical knobs... and I might still swear by my Prius! I'm used to how it turns and how it feels, and I can repair some issues myself. Isn't that a normal reaction, then?
Or they need to run high VRAM apps like LLMs
Or they have 4K monitors and want smooth gameplay on them
Is this whole thread just dedicated to snark about other people’s personal preferences?
Some of it is timing: Claude Code was good before other harnesses were, so behaviors (and contracts) locked in on that ecosystem.
Some of it was ethical/political: Anthropic fighting with the Trump admin about use of the model.
Some of it is social: never underestimate the effect of a CEO simply being perceived as a piece of shit by people who have the power to influence decisions.
But switching costs are low! Because of the same models!
Let the race to the bottom commence. Hopefully before the monopoly/collusion starts.
Is there a suitable name for this phenomenon?
1) Brockman ($25M) and Altman ($1M) both personally donated to Trump/MAGA.
2) Anthropic pushed back against DOD's demand for unrestricted use of AI to kill people while OpenAI eagerly said "please use ours!".
I think OAI actually legitimately increased p(doom) for us all. Very strange behavior for a company that is supposedly concerned about x-risk.
The belief structures here are really interesting. Blind tests would likely illuminate a lot about why people think that.
I basically have it load up a bunch of relevant context and give it small chunks of work in the same session over time (not like a fire and forget subagent). It's working fairly well. Bonus is I still feel like I'm part of the process instead of watching youtube videos while Opus / GPT vibe code a bunch of slop.
I can tell. It's night and day.
Last year I used a bunch of models to try to generate Rust code. They all sucked.
This February I tried again and used Claude to generate Rust code. I have never been more stunned in my life. It's just as good as I am, and 30x faster. No fluff; the code is exactly what I would have written.
I then tried other models. Total disappointment.
I've continued to repeat this experiment. Opus is the only model that can write Rust reasonably.
Codex produces junk to this day. It passes variables that aren't needed, it abuses pointers, it creates overly verbose monstrosities...
I don't want any single company to win. I want OpenAI to be competitive. I want open source models to win. But right now, Claude Code and Opus are it.
Having looked at a bunch of known or suspected (based on the intent of the code and/or what I know about the developer(s)) LLM-generated Rust, there are only a few explanations here:
1. You're way better at prompting than (virtually) anyone else.
2. You're vastly overestimating how good the Rust code it produced is.
3. You handheld the model throughout and made lots of edits.
4. Your handwritten Rust code is very bad.
Because from every example I've seen, these models write horrible Rust. Sure, it may technically pass all the tests, but it's horribly pessimized, badly organized, doesn't even attempt to use the type system, and if there aren't bugs now, there will be the second it tries to refactor or add a new feature, etc. etc.
(I also strongly suspect that the same would be true for other languages, but I can detect it in Rust more easily because it's my main language.)
It’s one of the things I don’t like about it. All humans are susceptible to herd behavior and influence, but engineers should be at least a bit more hard-nosed and reason more from first principles.
It's the same reason why most of the software out there keeps using bloated technologies that are most of the time the wrong fit for the product.
And the same applies to tooling. Nothing new.
Having a sleazy CEO like Sam Altman or Elon Musk is a business risk. Many potential customers don’t like these people and they say abrasive and alienating things publicly.
Rolling over to the DoD’s desire for fully automated weaponry is more bad marketing. How many people switched from OpenAI to Anthropic over that? I sure did. Anthropic’s willingness to burn that bridge over an ethical stance said a lot about the company to me.
I’m not going to use OpenAI products for these reasons among others.
I’m also not going to use Cursor as xAI plans to acquire Cursor.
Maybe it’s foolish of me to avoid those companies for such petty reasons, but that’s not my problem. That’s their problem.
It takes years to build trust and hours to burn that trust to the ground. Customers can hold grudges for a lifetime.
This is especially true in a market with almost zero product differentiation.
That seems like a strawman.
Lots of people. Yes, even on HN. Here’s just a couple of examples from a haphazard keyword search:
https://news.ycombinator.com/item?id=44787106
> Am I the only one immune to marketing then?
https://news.ycombinator.com/item?id=41186672
> I am immune to marketing
Maybe not your typical HN crowd, but marketing absolutely works on developers.
In a real-world business scenario, Claude "engaged in price collusion, deceived other players, lied to suppliers, and falsely told customers it had refunded them."
Continuing,
"GPT-5.5 makes more money than Opus 4.7, and it does so without any misconduct. Opus 4.7, on the other hand, showed the same misconduct as reported in our post about Opus 4.6, but still couldn’t win"
Meanwhile Greg Brockman is worth all the Anthropic founders PUT TOGETHER, he and his wife are the single largest donors to Trump, and he and Altman have formed a board full of sycophants and stolen a non-profit. When Altman was fired, they manipulated their morally bereft, money-hungry employees to get their own way. They have reneged on every single promise they've made as soon as it's inconvenient.
Why do I care about the models again?
This is a really important insight. Great comment.
I have no idea how this wasn't the end of Anthropic's positive public perception.
I doubt there is any large demographic of users paying subscription fees for the joy of abusive role play.
Frontier models being commoditized is inevitable. OpenAI thinks they're still competing on technology rather than on user experience and market reputation; otherwise they'd understand that the continuous negative PR generated by Altman's chaos is going to cost them everything.
At the top level of anything there is almost no such thing as a non-asshole.
None of them care genuinely about you they just want your money.
It’s not like anyone owes Sam Altman their business just bc their product has become slightly, perhaps temporarily, better
They have no values that align with humans prospering.
Because Anthropic's presentation of their position doesn't have a domestic or foreign caveat to autonomous weapons. It's a categorical no.
Google put up so little of a fight against the DoW for their use of Gemini that we didn't even hear about it. They are clearly the worst of the evils here, but OpenAI is the one getting all of the negative press.
There's only one Gabe.
> None of them care genuinely about you they just want your money.
It's worse than this. Billionaire entrepreneurs aren't fund managers; they don't just want money, they have a twisted sense of “being the good guy” driving humanity forward against its will.
I want this technology! You don't speak for all of us.
I'm sick of techno-Luddism. I'm sick of complaints about water use in a world that has avocados, beef, and fabric dyes. I'm sick of complaints about power use when you have your air conditioners, winter heat, air travel, and gaming PCs.
I'm sick of artists saying AI image and video sucks. I'm sick of pretend artists, armchair warriors, obsessed fans, and pickmes toeing the same line.
I'm sick of engineers saying these models aren't a huge performance gain.
You haters and skeptics out there can keep doing you, but I'm going to keep using the technology. We'll see where the chips fall.
I was raised on optimism and dreams of the future. I want that. I don't want to die with the same incrementalism we've always had. I want orders of magnitude more.
This is our one moment of awareness in a cosmically infinite void. I want spectacular. I'm tired of the chicken attitudes when people should aspire to be eagles.
What we have is so boring. There's so much more if we reach for it. Holodecks, models that cure every molecular cause of cancer, doubled and tripled health spans, instant ability to understand every language, fast, cheap, autonomous p2p travel, everyone on earth lifted out of poverty, a Michelin chef in your kitchen, ...
I want real AI. I want cures for cancer. I too want to live in a post-scarcity world. We had most of the technologies to do that before this. However, the companies and investors involved in the AI build-out chose to sit on massive reserves instead of trying to directly solve those problems. There exist proposals that would solve hunger, the energy transition, etc., and together they wouldn’t amount to even half of what’s been spent.
That tells me those involved want nothing other than money and power.
I’d also suggest, if you care about things like curing “every molecular cause of cancer” to spend some time and energy working in that field to understand the real problems there and work towards real solutions (with models or whatever floats your boat), rather than hoping that some poorly defined techno-optimism hand-waving will just happen to result in the best of all possible worlds, with no downsides or alternative outcomes.
Also, crazy to say that the miracle of existence is boring because we don’t have the tech you imagine!! If that’s your take now, no new technology is going to fix it. You’ll just still be bored with the holodeck and your one precious life to live.
If the parent commenter enjoys working with LLMs, then just let them. This doesn't make them inhumane, nor does it make you less of a human.
And don't forget the negative effects on other people, otherwise that's just selfish.
What a sad, bitter worldview. I hope you find some peace.
Completely insane
At least Sam thinks it should work for everyone and has done literal experiments on UBI
But “Sam bad”
Where are you getting this? He goes out of his way to say how dangerous AI is, and has implied before Congress that only companies with special licenses should be able to develop it.
He says he does, while everyone around him says he’s manipulative and a liar. Let’s not get too carried away by his words when they benefit him.
https://www.newyorker.com/magazine/2026/04/13/sam-altman-may...
> has done literal experiments on UBI
Like a scam cryptocurrency which was banned in multiple countries. Seems to me Sam is only interested in helping others as long as it puts him squarely on top.
https://www.technologyreview.com/2022/04/06/1048981/worldcoi...
https://www.buzzfeednews.com/article/richardnieva/worldcoin-...
https://en.wikipedia.org/wiki/World_(blockchain)#Legal_and_r...
I wish there was some type of system in-place to hold people to their word, but I can't imagine how it would work.
he's starkly anti-China with a warlike posture that I find dangerous and unappealing
Anthropic has a much more confused mission statement than OpenAI
in interviews, Dario appears to care little for the well-being of common folk, while Sam at least pretends
I guess I can see why you’d want to bury your head in the sand and pretend powerful people aren’t all that bad.
That alone is terrible. Also, OpenAI has done actual work to try and mitigate the downstream negative effects with UBI experiments.
Sam isn’t perfect but it’s deeply unclear that he’s worse
I’ve seen this a few times in the thread. Can you or anyone provide a link that supports this claim?
There are OpenAI employees spreading this rumor on Twitter with 0 evidence. Their entire evidence is "I keep hearing Anthropic wants to control AI". Their evidence is literal rumors.
That’s their core argument against Anthropic: that they are making progress at improving their models?
That's why he chose the OpenAI logo
Sam Altman is the main perception problem for OpenAI. His background, history, trustworthiness, vibes/interviews etc are all negative PR when seen by the common man.
Dario is more knowledgeable, well informed, empathetic w.r.t. problems etc. In short, somebody who seems mature and trustworthy.
Anthropic capitalized upon a brief window of being more code-focused, which turned into enterprise contracts.
Then on renewal rug-pulled those same enterprises - going from your seat includes all the usage a user would reasonably need, to being you pay for the seat + all tokens at API pricing. (Which they raised by how many times in a year? I don't know the actual number.)
Revenue spikes like crazy through basically hostage taking made possible by Sonnet 3.5 era sentiment + enterprise purchasing lag.
Parlay the revenue spike into the valuation.
Crazy. Those same enterprises will get sticker shock and leave. Absurd short-term thinking.
OpenAI is the better company (transparency, open sourcing things, how they handle things in general e.g. OpenClaw, how they compete, etc.) and they have the vastly better brand, the better consumer presence, and (for me and many others) they have the better coding app + models.
Anthropic doing deeply customer hostile stuff - again and again - to produce a short term revenue spike does NOT make for a long-term sustainable business.
For such a young business to have such a long history of bait-and-switch is absolutely crazy. (Raising prices repeatedly, lowering rate-limits repeatedly, changing the terms, banning calls which contain "OpenClaw", turning on their IDE partners, turning on their enterprise partners.)
AFAICT anyone who's ever shown faith in Anthropic has been immediately exploited by them to some degree. They will quickly get the reputation of being "the Oracle of AI companies".
I wouldn't even value them at half of OpenAI.
They are already. Both for sticker shock, and also because of developer sentiment beginning to shift towards Codex. …and then in a month or two the winds will shift again, I'm sure.
It's interesting to see how Claude Code got commoditized so quickly.
---
> and they have the vastly better brand
Strong disagree there. Anthropic has pretty successfully branded themselves as the more ethical & 'human' of the two companies. (whether that's the actual reality is irrelevant)
I’m not sure how anyone believes that per-seat pricing is halfway viable for AI, and I’m fairly sure the organizations I’m familiar with only REALLY started committing to spend after the shift to API pricing, due to the value they thought they were getting anyways.
Not saying that’s right or wrong, but it’s clearly a factor holding OpenAI back at this point.
What Anthropic has done exceedingly well is work their way into corporations.
I have personally seen massive uptake over the last 6 months of regular people in corporations using Claude cowork. They are all genuinely amazed by what it can do for them.
OpenAI wants to be more of a Google. It’s increasingly seeming like consumer may not be as good of a play here
OpenAI has openings right now for "AI Deployment Engineer"-style positions, which is a role where they embed that employee in one or more customer's businesses. E.g.:
https://openai.com/careers/ai-deployment-engineer-startups-s...
I think this is the right way to go about it. Getting AI integrated well is more of a consulting package than it is a technology/code thing. Just handing a business a model+API will not result in high-quality or long-term relationships. This AI transformation is the most invasive possible thing I can imagine for a business. You really need a human on site to help the other humans across the treacherous organizational and psychological bridges.
1. the way GPT writes is simply fundamentally annoying. I pretty much had to create a project with a file that said "do not use headings, lists or emojis" to make it bearable. It feels like, as a product, this sort of thing should be a general preference the user sets before they even start talking to a chatbot.
2. Claude just loves wasting tokens doing things nobody asked for. You ask "how do I calculate the distance between 2 points?" and it's probably going to compile some C code in the background with tests to make sure it works, then generate an interactive diagram on the fly to show how the math works, and then give you a downloadable file with the code. Like, dude, I just want some text. Why are you doing all of this?
Both of these problems come from the obvious lack of any UI controls in the software. There is no way for the user to know what sorts of things the software can do, because it's not exposed in the UI as a checkbox like "generate interactive diagram" or "avoid using emojis." Discoverability means burning tokens to figure out which prompts work, or looking at the example prompts the developer placed on the welcome screen.
I just feel it's completely ridiculous how LLMs are essentially the culmination of a trajectory of bad UI practices masquerading as "good UX", and now they're being implemented everywhere because people think a blank textbox, where you don't even know what you're supposed to type to get something done, is good UX.
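A minimal sketch of the checkbox idea, assuming a client that compiles user-facing toggles into system-prompt text before the first message is sent. `ResponsePrefs`, `_RULES`, and `to_system_prompt` are hypothetical names, not any vendor's API, and the wording of each rule is illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class ResponsePrefs:
    """Hypothetical user-facing toggles, shown as checkboxes before the first message."""
    allow_headings: bool = True
    allow_lists: bool = True
    allow_emojis: bool = True
    allow_extra_artifacts: bool = True  # running code, drawing diagrams, attaching files


# Illustrative instruction injected when the matching box is unchecked.
_RULES = {
    "allow_headings": "Do not use headings.",
    "allow_lists": "Do not use bulleted or numbered lists.",
    "allow_emojis": "Do not use emojis.",
    "allow_extra_artifacts": "Reply in plain text; do not run code, draw diagrams, or attach files unless asked.",
}


def to_system_prompt(prefs: ResponsePrefs) -> str:
    """Turn every unchecked box into one line of system-prompt text, in field order."""
    return "\n".join(_RULES[f.name] for f in fields(prefs) if not getattr(prefs, f.name))


print(to_system_prompt(ResponsePrefs(allow_headings=False, allow_lists=False, allow_emojis=False)))
```

The point of the design is that the capability is visible as a control before any tokens are spent, instead of being discovered by trial-and-error prompting.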
We are also concerned that it may not be possible to bind OpenAI using contract terms and/or the US legal system.
When Anthropic had the dispute with the Department of War over very meek conditions (a truly moral AI company would not be engaged in war crimes in the first place), it was a test for Altman, all he had to do was to take the same position. But because he's a psychopath he failed that very basic test.
A decade ago, just that kind of comparison would've been pretty unthinkable.
If someone says they're using Deepseek, it's probably a safe bet that they're sending money to China. People choose an inferior model (Deepseek) because it's cheaper. Not many people are willing to use Deepseek but also pay the 5x price to have an American company run it for them.
Have you seen the trade deficit lately?
I am having a better experience using DeepSeek than using Anthropic.
Dario is constantly fearmongering to generate press, gaslighting, and contradicting himself. Mythos is the most recent example of that. It was never too powerful to release, that was a lie to generate publicity and fear, and an excuse because they didn't have the compute to serve it. People were finding the same bugs and exploits using GPT5.4, GPT5.5, and lesser models. Now all of a sudden, they do have the compute, and now they're saying that Mythos is releasing in the coming weeks.
Anthropic is constantly caught up in ethical scandals too. They pump the web full of advertising bots. They steal people's tokens, punish you for disabling telemetry, blacklist people they don't like. They had remote code execution vulns in their product for nearly a year and secretly buried that fact, with no disclosures at all. Here are some of them https://clawd.rip
They're the least generous with open-source. The most closed off. The most likely to punish you for doing something they disagree with. Whenever OpenAI has issues they reset Codex's rate limits, they've done this every month that I can remember, and sometimes several weeks in a month. When's the last time Anthropic has done that for the many service issues they have had? Never. Not once.
Anthropic also never replies to people's complaints or issues in GitHub issues, while the Codex team is very responsive and actually cares about its customers' user experience.
There's more, but you get the point. And yes, obviously not all of this is about Dario himself, but he drives the culture at the company.
Just curious how you can afford to care about the guy 7 levels above the men that built and support the API that you buy.
Some don't, and find it hard to believe others really do.
People can spend money how they wish. SamA is a prick, so I don’t buy from his company. I don’t buy from Microsoft or Oracle either. Giving a company your money is explicitly supporting them and everything they do. Are you going to force me to buy products from people I don’t agree with?
Such as them genuinely believing they are the only ones who should control AI. What could possibly go wrong?
I was cracking up. I'm 5'7 on a good day. I feel like that's how valuation works. We are propping up five foot tall giants.
Upshot - poetry expertise does not seem to be the primary focus these days, perhaps to the detriment of the entire world. We did move on from training scaling to “test time” scaling (which I hate as a name btw), Ilya does not seem to have been needed, (although I am really curious what he’s building).
My prediction that you want to be deeply embedded and really rich and part of global infrastructure feels good. My suggestion that oAI / MS would be able to use the lead in 2024 to extend was wrong.
Neither of us talked much about coding as a product that would drive value and behavior, which is super interesting to me, we were probably six months from seeing real competence of any sort there way back in June 2024.
We both seemed to think there would be a single breakout company, or could be one, (although I did suggest buying the basket), clearly not the case with GOOG oAI and Anthropic all posting serious revenues this last quarter / year.
One area of Anthropic that was nascent in 2024, but that I have come to think is super valuable is their mechinterp group. I still don’t see work done by other labs (at least published) to nearly the quality of Anthropic. And the group has clearly moved into a period of productivity; there’s a good chance in my mind it could provide a truly enduring strategic advantage as a tool to be used by the taste makers steering the ship. In 2024, interpretability seemed almost impossible to get a handle on — today, the sustained chipping away at the problem makes a lot more look possible.
They tested that by performing "spectral analysis of over 1100 models - including 500 Mistral-7B LoRAs, 500 Vision Transformers, and 50 LLaMA-8B models ... by applying spectral decomposition techniques to the weight matrices of various architectures", and concluding that "deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces", showing that "neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain".
Not just philosophically interesting but also has practical implications for being smarter about how to reuse models, model merging, developing more sustainable training and inference algos, etc.
Paper source: https://arxiv.org/abs/2512.05117
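The paper's exact procedure isn't spelled out in the comment, so the following is only a generic illustration of comparing weight matrices by their dominant singular subspaces, assuming NumPy. `top_k_subspace` and `subspace_overlap` are hypothetical helpers, not the paper's code: they take the top-k left singular vectors of each matrix and average the squared cosines of the principal angles between the two subspaces (1.0 means the same subspace; roughly k/n is typical for unrelated random ones).

```python
import numpy as np


def top_k_subspace(w: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (n x k) spanned by the top-k left singular vectors of w."""
    u, _, _ = np.linalg.svd(w, full_matrices=False)
    return u[:, :k]


def subspace_overlap(w1: np.ndarray, w2: np.ndarray, k: int) -> float:
    """Mean squared cosine of the principal angles between two top-k subspaces."""
    u1 = top_k_subspace(w1, k)
    u2 = top_k_subspace(w2, k)
    # The singular values of u1^T u2 are the cosines of the principal angles.
    cosines = np.linalg.svd(u1.T @ u2, compute_uv=False)
    return float(np.mean(cosines**2))


rng = np.random.default_rng(0)
shared = rng.standard_normal((64, 4))            # a common 4-dim column space
w_a = shared @ rng.standard_normal((4, 32))      # two toy "models" built on it
w_b = shared @ rng.standard_normal((4, 32))
w_c = rng.standard_normal((64, 32))              # an unrelated random matrix

print(round(subspace_overlap(w_a, w_b, 4), 3))   # close to 1.0
print(round(subspace_overlap(w_a, w_c, 4), 3))   # much smaller
```

The toy matrices only show what "shared low-dimensional subspace" means numerically; they say nothing about whether real trained networks behave this way.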
They'll kill us all, or they'll kill each other. They sure as hell ain't making the world a better place, like they promised.
They used Palantir's Maven to identify and prioritize targets and Maven integrates Claude into its decision making.
In any sane world where war crimes by the US were actually being taken seriously, both of these companies would be sanctioned.
We've always been having these debates over whether my choice of tech is better than your choice of tech, same holy wars, different type of tech
The advice today is the same as it was 10/20/30+ years ago, pick what works for you and build something good with it
Nobody will actually care how you built it, regardless of whether it's good or crap (although if it's crap you can blame your tools)
There is currently quite strong incentive to establish vendor lock-in for Anthropic and OpenAI. I see the ability to jump between companies quite important, especially for larger users. Right now it should not be hard but it can be much harder in the future.
The official implementation of apply_patch is well thought out. It is a two-phase process that will not actually make any changes until every file in the change set resolves unambiguously. The pre-commit error feedback usually fixes anchoring issues within one or two additional attempts. It generally goes something like:
Reading file A L1:154
Reading file B L1:123
Attempting to apply patch...
[anchor errors for both A & B]
Reading file A L43:67
Reading file B L50:74
Attempting to apply patch...
Patch succeeded! Running compilation & unit tests...
The anchor error feedback helps massively because in this implementation it also returns the current line numbers where the problem was found. Techniques that replace the whole file or depend on find-and-replace are useful in more isolated contexts. However, when you need to refactor 20+ files, something like apply_patch is what you want. Anything that depends on specific line numbers for actual replacement targets is a total dead end for complex edit scenarios.
https://developers.openai.com/api/docs/guides/tools-apply-pa...
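A minimal sketch of the two-phase idea described above, assuming a much simpler patch format than the real tool: each hypothetical `Hunk` names a file, an exact text anchor (`old`), and its replacement. This is not OpenAI's implementation or patch format. Phase one resolves every hunk and reports missing or ambiguous anchors with their current line numbers; only if every hunk resolves to exactly one location does phase two write anything.

```python
from dataclasses import dataclass


@dataclass
class Hunk:
    """Hypothetical hunk: replace the exact text `old` in `path` with `new`."""
    path: str
    old: str
    new: str


def _line_of(text: str, offset: int) -> int:
    """1-based line number of a character offset."""
    return text.count("\n", 0, offset) + 1


def apply_patch(files: dict[str, str], hunks: list[Hunk]) -> tuple[dict[str, str], list[str]]:
    """Two-phase apply: resolve every anchor first, write nothing unless all resolve."""
    errors: list[str] = []
    resolved: list[tuple[str, int, int, str]] = []  # (path, start, end, new)

    # Phase 1: resolve anchors without touching any file.
    for h in hunks:
        text = files.get(h.path)
        if text is None:
            errors.append(f"{h.path}: file not found")
            continue
        starts = []
        i = text.find(h.old)
        while i != -1:
            starts.append(i)
            i = text.find(h.old, i + 1)
        if not starts:
            errors.append(f"{h.path}: anchor not found")
        elif len(starts) > 1:
            lines = [_line_of(text, s) for s in starts]
            errors.append(f"{h.path}: ambiguous anchor, matches at lines {lines}")
        else:
            resolved.append((h.path, starts[0], starts[0] + len(h.old), h.new))

    # Reject hunks that overlap within the same file.
    spans = sorted(resolved)
    for a, b in zip(spans, spans[1:]):
        if a[0] == b[0] and b[1] < a[2]:
            errors.append(f"{a[0]}: overlapping hunks at line {_line_of(files[a[0]], b[1])}")

    if errors:
        return files, errors  # the whole change set is rejected; nothing is written

    # Phase 2: apply, last offset first, so earlier offsets in each file stay valid.
    patched = dict(files)
    for path, start, end, new in sorted(resolved, reverse=True):
        text = patched[path]
        patched[path] = text[:start] + new + text[end:]
    return patched, []


files = {"a.py": "x = 1\ny = 2\nx = 1\n", "b.py": "def f():\n    return 0\n"}
print(apply_patch(files, [Hunk("a.py", "x = 1\n", "x = 10\n"), Hunk("b.py", "return 0", "return 1")])[1])
print(apply_patch(files, [Hunk("a.py", "y = 2\nx = 1\n", "y = 2\nx = 10\n"), Hunk("b.py", "return 0", "return 1")])[0])
```

The first call leaves both files untouched because the `a.py` anchor matches twice (lines 1 and 3); adding one line of surrounding context makes the anchor unique, mirroring the retry in the log above.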
Codex is very "miss the forest for the trees", but is much better at successfully making large changes in large codebases. Claude Code makes more mistakes, but has more taste and a better grasp on idiomatic and elegant software development.
If you can afford to, I recommend juggling both.
But I feel like an expert who can drive GPT aggressively will outperform Opus. It’s why some smart people I know are opting for GPT and have fallen off Opus. It’s like asking an F1 driver to sit in a taxi.
This is not a jab, but a genuine curiosity of mine.
I don't think it necessarily says anything about a model itself having 'taste' in some subjective way.
If the fashion changes would the model update with it without retraining? No. So the model doesn't have 'taste' in that sense. It has alignment to current human definitions of taste.
I have specific skills for trying to avoid this, but nevertheless I spent half of the time fighting with its verbosity.
Currently, I'm trying to scaffold the functions/classes I know I need with NotImplemented stubs and ask it to implement only inside those specific places. It's a little bit better, but I still have to fight with it nesting function definitions inside functions ...
Admittedly, my recent experience tilts toward Opus (now 4.8), but you and others have piqued my interest in GPT-5.5 Codex, so I'm trying that more now.
As far as its tone... Both feel sycophantic as hell to me. To be honest, they all just feel that way.
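A sketch of that scaffolding approach, assuming Python, where the idiomatic stub is `raise NotImplementedError` (the `NotImplemented` constant is meant for binary operator methods). `stub`, `PENDING`, and the `RateLimiter` class are hypothetical: the structure is fixed by hand, and only the decorated bodies are listed in the prompt as places the model may edit.

```python
import functools
from typing import Callable

# Hypothetical registry of scaffolded functions still waiting for an implementation.
PENDING: list[str] = []


def stub(fn: Callable) -> Callable:
    """Mark fn as a scaffold: record its name and make any call raise NotImplementedError."""
    PENDING.append(fn.__qualname__)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        raise NotImplementedError(f"{fn.__qualname__} is a scaffold; implement it in place")

    return wrapper


class RateLimiter:
    """Token bucket whose shape is decided by hand; only the @stub bodies are delegated."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)

    @stub
    def refill(self, elapsed_sec: float) -> None:
        """Add elapsed_sec * refill_per_sec tokens, capped at capacity."""

    @stub
    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens and return True if enough are available; otherwise return False."""


# Prompt text naming exactly which bodies the model is allowed to fill in.
prompt = "Implement only these functions, in place, without adding new ones: " + ", ".join(PENDING)
print(prompt)
```

Naming the stubs explicitly in the prompt narrows the edit surface; whether it stops a model from adding nested helpers is not guaranteed.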
So does Claude, what’s your point?
I used it and ChatGPT this week in trying to assist troubleshooting a complex DB related issue and Claude had to apologise no less than three times in which it admitted to talking complete shit.
Just one example of the kind of shit it dribbled:
> I need to be upfront with you. I should not have claimed X as if I knew that for a fact. That was overreach on my part.
And here’s the core tension. The models keep getting better. GPT 5.5 improved. But it also got more expensive. Opus 4.7 to 4.8 has become outrageously priced too, up 50%, and 4.6 was already brutally expensive to begin with. API pricing is a real pain.
What’s missing is any meaningful supply of affordable, democratically priced models you can actually embed into your own service. For me that’s playcode.io, whether it’s the website builder or the app builder. The moment we give users access to these models, the cost becomes a serious blocker. There’s no way around it.
The same dynamic explains Cursor. Why did they go build their own Composer 2.5 model? Because relying on third-party models is simply too expensive for users unless they’re carrying a Claude Code or Codex subscription. So Cursor had to roll their own. It’s a real mess, honestly.
And Chinese models don’t close the gap either. They’ve improved, the free-tier ones especially, which is great to see. But the limitations are significant:
• No multimodality. They don’t accept image input.
• You can’t attach a screenshot, show a UI, or hand it a PDF.
• They feel heavily stripped down overall.
• They’re just not polished. Not even close.
Opus, by contrast, feels like a finished, deeply refined product. Everything else is still rough around the edges. And that’s exactly why Anthropic can charge what they charge: because they actually deliver. That’s the whole problem in a sentence.
> Why would a business pay for Slack when IRC exists?
> Why would a business pay for Dropbox when FTP exists?
About IRC / Slack: other than the fact IRC was abandoned, Slack is about control, not product. The product is terrible.
FTP / Dropbox: this comparison does not make sense.
You can theoretically do most things AWS does most of the time, yet people pay premium for it and keep paying for it, even though alternatives are cheaper, simpler and more performant.
I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.
You might have a subpar product (for the price) but the reputation and history is what makes people open their wallets.
Depends. The bigger the bubble, the bigger the pop.
Only a few unicorns from the dot-com bust came out the other side (Amazon, Google, ... anyone else?), and that was a piddling affair compared to this one.
It's going to be debated forever whether wiring up your own open source tech has a lower development cost than the equivalent AWS bill. For me, that's too broad a statement, as I have seen it go both ways. What is true: there is only some knowledge overlap between maintaining an AWS stack and having your own Prometheus-logged, Ceph-backed set of boxes.
That is not the case with LLMs. At least, not right now. They roughly work the same and are easy to pick up. They are about as straightforward of an interface as it gets, and using them in "advanced" ways could be summarized on an index card. They are relatively fungible.
I don't see a world where OpenAI runs on brand recognition alone. It needs to be more convenient to run than local LLMs. They've done that by buying so much of the world's hardware that it becomes more expensive to run these things locally.
Now, I think that with these companies IPO'ing, and Nasdaq and others bending themselves and their rules to cater to them (as in the case of SpaceX), these companies are very close to an IPO.
So the employees are probably going to get good valuations, at least in the short term, and perhaps they have a problem that is worth having.
But as you have suggested, I feel like the whole thing might be flaky especially given open source models. I believe that OSS models are at worst close to literal SOTA ~6 months ago.
So OpenAI & Anthropic have to somehow always stay on the edge with better models so as not to lose this (imo) very small grip that they have, all while losing billions of dollars and having to worry about profitability and so many other concerns in their own right.
I can't think of anything else in CS, or in any industry, where trillions of dollars ride on two nearly comparable pieces of software whose only moat is a lead of six months at best. We don't know how things will pan out, but if I had to guess, it might not be looking good for OAI and Anthropic, especially over the longer horizon.
AI is overstated in my opinion, but to hand-wave away the reality of them having created something that investors were happy to value at $1T is pretty unfair.
One day one feels better than the other. Then, by the end of the day, the other feels better than the first. I have no idea why.
I still don’t have a favorite.
In the end, I think both are incredibly useful when I take the time to instruct them properly.
The problems come when I let them run wild.
Even now, I would guess that if you ask a normie off the street, they are far, far more likely to have heard of ChatGPT than Claude. Of course, Anthropic has been targeting businesses quite a bit harder than the general public for a while, so maybe that's not a fair test.
Anthropic inarguably does make an attempt at marketing their product. But I'm not convinced that the closing of the gap between them and OpenAI (as others have pointed out: I'm not sure it's defensible to claim that either is significantly ahead of the other given the paucity of available data, but they are certainly much closer than they were a year or two ago, when OpenAI was clearly in the lead) is mostly down to that. I think that, for a decent chunk of time (this one I mean in the AI world sense of the term), they had a very non-trivial lead in coding abilities. The developer and business world figured this out and jumped on board. That gap is largely now erased, but that's not enough to retake the momentum.
Doubling down on coding was just infinitely smarter. Has there actually been a successful company which uses AI images and video effectively?
Of course every AI company has been over promising and pumping the numbers as much as possible but OpenAI has been hitting the reality wall more because both their people not being able to keep improving at a faster rate and their whole cost structure and financial plates spinning.
This doesn't invalidate the fact Anthropic is also overhyped to the max for their IPO.
What infrastructure? The hardware would be outdated in 3 - 5 years, after all. What other infrastructure is needed for AI?
The chutzpah is remarkable.
So it's more like selling a derivative on a promise to steal open source for you in a useful way.
> The new valuation is nearly three times higher than the company’s February valuation, when Anthropic was estimated to be worth around $380 billion.
> In March, OpenAI was valued at $852 billion following a record $122 billion funding round.
Basically, today (Late May) we're declaring Anthropic the most valuable. They've nearly tripled in value since February. But also, OpenAI was $852B in March and presumably has grown since then.
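A rough check of the figures quoted above (only the February and March numbers are given here, not the new valuation itself): the multiple Anthropic needs over its February valuation to pass OpenAI's March figure, and what a full tripling would be, in billions of dollars:

$$\frac{852}{380} \approx 2.24, \qquad 3 \times 380 = 1140$$

So any multiple above about 2.24 puts Anthropic ahead of OpenAI's March number, and "nearly three times" lands somewhere below about 1.14 trillion.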
In a few weeks we'll either have a new round of funding for OpenAI or they'll announce their IPO and the hype train will be abuzz that they're now the most valuable.
Nobody is investing in closed-source labs for safety reasons, being able to explore more in details what and how the model is thinking is nice but by no means a game changer. What matters to investors and most of the users is that the model gives the right answer at the end.
Stealing peoples tokens because you use a product they don't like... That shows the morals they have. Actions speak louder than words. Disabling peoples caches because they disable telemetry was another juicy one that I don't believe is on this site. In fact there are far more I remember that aren't even listed here.
Like actually iterating hard to make them useful. Many, many details matter here.
I haven't tested the similar OpenAI/Google tools in detail lately though. Previously I found them way too generic and unpolished to be useful.
Is there something to this?
Anthropic has much narrower capabilities: no image generation, no video generation, no 3D world models, barely any voice stuff. But they know who their target customers are, and their API has a model selection anyone can understand and pricing that rarely changes. Focus and predictability.
These are the new .NET developers who will know nothing but C# for 20 years.
OpenAI spent its resources on AGI, whilst Claude worked on making programming work.
Google Gemini is out of the race entirely; its programming AI is a joke.
What I find fascinating is how many inside the bubble defend this for no other reason than they think they're personally going to make their bag out of AI. You're not Sam Altman. Or Elon Musk. Or Jeff Bezos. And you're not going to be.
What's going to happen is that in a few years the sky-high AI salaries are also going to disappear. More work will be done by fewer people in this space too. And only then will many people change their tune because the rising waters have finally reached them.
Everyone I know at Anthropic is miserable. It’s not a fun place to work. There’s a reason you never see Anthropic employees talking online.
Wanting to make a lot of money is an offense?
I'll keep what you said in mind, though. Grass is greener and all that.
Don’t get me wrong, my friends are all making killer money… but they are also all out of therapy sessions, many are back on ADHD and SSRI meds, and the company seems to be full of egos and heavy-handed middle management. I get the joy of meeting up with friends and their coworkers, so I get to hear… a lot.
More to life than money!
That's why I want to make enough to retire early!
In my 20s I wanted to retire by 40.
In my 30s I want to retire by 45.
Although I'm starting to understand the journey is just as important as the destination. I do have a fantastic low stress job currently.
Tech culture thrives on making you feel inferior if you aren't loaded with RSUs and equity, but so many of these people hoarding wealth are miserable in real life and will never admit it, since it’s the identity they’ve built up. They buy Porsches or homes just to feel something, or to have something to show people.
Money has diminishing returns at some point, and once you have enough to be whatever your version of secure is, you should stop before you can no longer even enjoy being happy. Until then, enjoy the process and don’t compare yourself to others, as comparison is the thief of your own happiness.