So, maybe this is just sloppiness and not intentionally misleading. But still, not a good look when the company burning through billions of dollars in cash and promising to revolutionize all human activity can't put together a decent powerpoint.
I pasted the image of the chart into ChatGPT-5 and prompted it with
>there seems to be a mistake in this chart ... can you find what it is?
Here is what it told me:
> Yes — the likely mistake is in the first set of bars (“Coding deception”).
> The pink bar for GPT-5 (with thinking) is labeled 50.0%, while the white bar for OpenAI o3 is labeled 47.4% — but visually, the white bar is drawn shorter than the pink bar, even though its percentage is slightly lower.
So they definitely should have had ChatGPT review their own slides.
Funny, isn't it - makes me feel like it's kind of over-fitted to be logical now, so when it needs to express a contradiction it actually can't.
My unjustified and unscientific opinion is that AI makes you stupid.
That's based solely on my own personal vibes after regularly using LLMs for a while. I became less willing, and less able, to think critically and carefully.
It also scares me how good they are at persuasion and social engineering. They have made me feel good about poor judgment and bad decisions at least twice (which I noticed later on, still in time). Give them a new, strict system prompt and they offer the opposite opinion and recommend against their previous suggestion. They are so good at arguing that they can justify almost anything and make you believe it's what you should do, unless you are among the 1% of experts in the topic.
> They are so good at arguing that they can justify almost anything
This honestly just sounds like distilled intelligence. Because a huge pitfall for very intelligent people is that they're really good at convincing themselves of really bad ideas.
That, but commoditized en masse to all of humanity, will undoubtedly produce tragic results. What an exciting future...
> They are so good at arguing that they can justify almost anything
To sharpen the point a bit, I don't think it's genius "arguing" or logical jujitsu, but some simpler factors:
1. The experience has reached a threshold where we start to anthropomorphize the other end as a person interacting with us.
2. If there were a person, they'd be totally invested in serving you, with nearly unlimited amounts of personal time, attention, and focus given to your questions and requests.
3. The (illusory) entity is intrinsically shameless and appears ever-confident.
Taken together, we start judging the fictional character like a human, and what kind of human would burn hours of their life tirelessly responding and consoling me for no personal gain, never tiring, breaking character, or expressing any cognitive dissonance? *gasp* They're my friend now and I should trust them. Keeping my guard up is so tiring anyway, so I'm sure anything wrong is either an honest mistake or some kind of misunderstanding on my part, right?
TLDR: It's not mentat-intelligence or even eloquence, but rather stuff that overlaps with culty indoctrination tricks and con[fidence]-man tactics.
Using AI to completely offload thinking is a total misuse of the technology.
But at the same time, the fact that this technology can be misused and cause real psychological harm feels like kind of a new thing, right? There are reports of AI psychosis - I don't know how real they are, but if they are real, I don't know of any other tool that's produced that kind of side effect.
That would still be a basic fail. You don't label a chart by hand: you enter data, and the pre-AGI computer program does the rest - draws the bars and shows labels that match the data.
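To make it concrete, here's a minimal matplotlib sketch (using the two numbers quoted in this thread) where the bar heights and the labels are derived from the same list, so they physically can't disagree:

```python
import matplotlib.pyplot as plt

# The two values quoted upthread; heights and labels both
# come from this one list, so they can't contradict each other.
models = ["GPT-5 (with thinking)", "OpenAI o3"]
rates = [50.0, 47.4]

fig, ax = plt.subplots()
bars = ax.bar(models, rates)
ax.bar_label(bars, labels=[f"{r:.1f}%" for r in rates])
ax.set_ylim(0, 100)  # axis starts at zero, no truncation tricks
ax.set_ylabel("Coding deception (%)")
plt.show()
```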
Clearly the error is in the number; most likely the actual value is 5.0 instead of 50.0, which matches both the bar height and the other single-digit GPT-5 results for metrics on the same chart.
This half makes sense to me - 'deception' is an undesirable quality in an LLM, so less of it is 'better/more' from their audience's perspective.
However, I can't think of a sensible way to actually translate that to a bar chart where you're comparing it to other things that don't have the same 'less is more' quality (the general fuckery with graphs not starting at 0 aside - how do you even decide where '0' goes when the metric gets better as it approaches it?), and what they've done seems like total nonsense.
It would be interesting to know how this occurred. I assume there may have been last-minute high-level feedback suggesting: "We can't let users see that the new model is only slightly better than the old one. Adjust the y-axis to make the improvement appear more significant."
It actually took me several prompts to get Claude and ChatGPT to identify this. From a screenshot, they recognized that labeled axes that don't start at zero can be misleading, but missed the actual issue.
"ChatGPT, this slide deck feels a bit luke warm, help me make a better impression"
I could completely believe someone who is all-in on the tech, working in marketing, and not really that familiar with the failure modes, using a prompt like this and just missing the bad edit.
I mean this is the industry standard. For example every time Nvidia dumps a new GPU into the ether, they do the same thing. Apple with M series CPUs. They even go a step further and compare a few generations back.
The other chart on that slide was actually to scale. My suspicion is that it was super rushed to meet the presentation deadline, that they didn't use Excel or anything automatic for the charts so they'd look better, and that they missed the detail under time pressure.
The next management consulting flavor of the month will be full spectrum, panopticon RTO employee monitoring to ensure employees are doing work themselves, not using LLMs, and not working other jobs. It will be scored by AI, of course.
The 69.1 column has the same height as the 30.8 column. My guess is they just duplicated the 30.8 column and forgot to adjust the height to the number, which passed a cursory check because it was simply lower than the new model.
This doesn't explain the 50.0 column height though.
Eyeballing it, that bar looks to be around 15% in height. Typing "50" instead of "15" is a plausible typo - albeit one you might expect from a high-schooler giving a class presentation, not in a flagship launch by one of the most hyped startups in history.
Just remember, everyone involved with these presentations is getting a guaranteed $1.5 million bonus. Then cry a little.
Yep. It sounds more like a dictation error as “fifteen” and “fifty” sound similar. No idea why this should matter in the slide production process though.
> The 69.1 column has the same height as the 30.8 column. My guess is they just duplicated the 30.8 column and forgot to adjust the height to the number
Why would you make charts for a professional presentation by a mechanism that involves manually creating the bars and the labels separately in the first place, unless specifically to make inaccurate and misleading inconsistencies of this type possible? Maybe if you were doing something artistic with the style that wasn't supported in charting software, but these are the most basic generic bar charts except for the inconsistencies.
Why are they so sloppy? Is it because they want to go viral with le funny bad graphs? I'm sure AI could handle "converting test results in an excel document to a visual graph"
I tend to blame performance issues on the developer writing the code on a top-of-the-line computer. There are too many WebGL effects on startup websites that were built to run on an M4 Max.
> There are too many WebGL effects on startup websites that were built to run on an M4 Max.
Tale as old as time. When the Retina display Macs first came out, we saw web design suddenly stop optimizing for 1080p or smaller displays (and at the time, 1366x768 was the default resolution for Windows laptops).
As much suffering as it'd be, I swear we'd end up with better software if we stopped giving devs top-of-the-line machines and just issued whatever budget laptop is on sale at the local Best Buy on any given day.
At my work every dev had two machines, which was great. The test machine is cattle: you don't install GCC on it, you reflash it whenever you need to, and you test on it routinely. And it's also the cheapest model a customer might have. Then your dev machine is a beast with your pet packages installed on it.
I wouldn't go that far, but maybe split the difference at a modern i3 or the lowest spec Mac from last year.
It would be awesome if Apple or someone else could have an in-OS slider to drop the specs down to that of other chips. It'd probably be a lot of work to make it seamless, but being able to click a button and make an M4 Max look like an M4 would be awesome for testing.
No no no... go one better for the Mac. It should be whichever devices are next to be made legacy under Apple’s 7-year support window. That way you’re actually catering to the lowest common denominator.
Weren’t some people unironically expecting an AGI announcement for GPT-5? I have heard a water cooler (well, coffee machine) conversation about how OpenAI's master plan was to release GPT-5 and invoke the AGI clause in their contract with Microsoft. I was shaking my head so hard.
They are both using the "capitalist" definition of AGI, that is "an AI system that can generate at least $100 billion in profits". I think it's short for "A Gazillion Idiots"...
It is actually incredible how they managed to find an even more unscientific definition than "can perform a majority of economically useful tasks." At least that definition requires a little thought to recognize it has problems[1]. $100bn in profits is just cartoonishly dumb, like you asked a high schooler to come up with a definition.
[1] If a computer can perform the task its economic usefulness drops to near zero, and new economically useful tasks which computers can't do will take its place.
OpenAI has always known that "data" is part of marketing, and treated it as such. I don't think this is intentional, but they damn well knew, even back in the Dota 2 days, how to present data in such a way as to overstate the results and hide the failures.
There's a social media engagement tactic where people will deliberately add specially tailored elements to content in order to bait people into commenting about it. I wonder if there was some deeper strategy to this chart being used during their presentation or if it really was just a blunder.
Maybe the fact that there were additional blunders, such as the incorrect explanation of the Bernoulli Effect, suggests that the team responsible for organizing this presentation didn't review every detail carefully. Maybe I'm reading too much into a simple mistake.
The jumping game it created is trivially easy to beat: you can keep jumping indefinitely and never hit an obstacle. An extra prompt would probably fix that, but it's funny they published it as is.
GPT-5 is probably going to be a meaningful improvement for most of my non-technical family members who like ChatGPT but have never used anything other than 4o. In fact, most of the people I know who use ChatGPT pay no attention to the model being used, except for the developers I know. This update is going to be a big deal for them.
For me, it's just another nice incremental improvement. Nothing special, but who doesn't like smarter better models? The drop in hallucination rates also seems meaningful for real-world usage.
GPT-5 models are actually great models for the API, the nano model is finally good enough to handle complex structured responses and it's even cheaper than GPT-4.1-nano.
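For illustration, here's a minimal sketch of the kind of structured-response call I mean, using the OpenAI Python SDK's JSON-schema response format (the "gpt-5-nano" model id and the schema are my own assumptions for this sketch, not anything from the announcement):

```python
from openai import OpenAI

client = OpenAI()

# Ask the nano model to extract benchmark results into a fixed schema.
# "gpt-5-nano" is an assumed model id for this sketch.
resp = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "system", "content": "Extract benchmark results as JSON."},
        {"role": "user", "content": "GPT-5 scored 50.0 on coding deception; o3 scored 47.4."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "benchmark_results",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "results": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "model": {"type": "string"},
                                "score": {"type": "number"},
                            },
                            "required": ["model", "score"],
                            "additionalProperties": False,
                        },
                    },
                },
                "required": ["results"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # JSON conforming to the schema
```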
No, I mean: what is the context? Who created this originally? Where is the link to OpenAI or whoever created this chart, or the context behind the misinformation, if any? I checked the comments and stories about GPT-5 and there is no reference to this, so I'm at a loss.
Ok, I see there was a bug on the site and it wasn't scrolling on iOS. They fixed that now, although the background context is still unclear, and none of the links in the site seem to explain it.
Yes, that second image was initially hidden on iOS due to the scrolling bug in the site (now fixed).
So they spotted what seems to be an unintentional error in a chart in a youtube video, and created a completely different chart with random errors to make a point, while due to their own coding error the (somewhat obtuse) explanation wasn't even visible on mobile devices.
Not sure why this was voted to the top of the first page of HN, although I can surmise.
I think the stock market has just proven time and time again that a large proportion of investors (and VCs) do basically no due diligence or critical thinking about what they're throwing money at, and businesses actually making a profit hasn't mattered for a long time - which was the only thing tethering their value to the actual concrete stuff they're building. If you can hype it well, your share price goes up, and even the investors that do proper due diligence can see that, so they're all in too.
By and large, people do not have the integrity to even care that numbers are obviously being fudged, and they know that the market is going to respond positively to blustering and bald-faced lies. It's a self-reinforcing cycle.
Oh trust me I know. I worked at Palantir well before it was public and had firsthand experience of Alex Karp. He would draw incomprehensible stick figure box diagrams on a whiteboard for F100 CEOs, ramble some nonsensical jargon, and somehow close a multimillion dollar pilot. The guy is better at faking it than high-end escorts. It doesn't surprise me that this has fooled degens around the world, from Wall Street to r/wallstreetbets. Incredibly, even Damodaran has thrown in the towel and opened a position, while still admitting he has no idea what they do.
It’s more terrifying that no one anywhere, it seems, cares about the truth. Vibeworld: we are all selling vaporware, and if you don’t build it, who cares - move on to the next hype cycle that pumps the stock / gets VC funding. Absurd industry.
Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and we've asked you many times to stop.
@dang why is this post allowed to be flagged off its #1 spot? Is that not clearly a misuse of the flagging system, which is not supposed to be just for posts that people don’t like?
From the HN FAQ:
> What does [flagged] mean?
> Users flagged the post as breaking the guidelines or otherwise not belonging on HN.
> Moderators sometimes also add [flagged] (though not usually on submissions), and sometimes turn flags off when they are unfair.
I mean it in the kindest way, but scientists might be the sloppiest group I've worked with (on average, at least). They do amazing work, but they're willing to hack it together in the craziest ways sometimes. Which is great in a way. They're very resourceful and focused on the science, not necessarily the presentation or housekeeping. That's fine.
This was a big COVID-era lesson: places like the CDC and NIH really need a well-trained PR wing for things like Presidential press conferences, to communicate with the public.
The engineers, sure. Product team... well, we've seen over the past 2-3 years that AI isn't necessarily sold on quality and accuracy. They are also at the top of their game in terms of how to optimize revenue.
Poor OpenAI workers, they worked so hard for the GPT-5 release and now discussions about the model are side by side with discussions about their badly-done graphs.
I don’t believe they intentionally fucked up the graphs, but it is nonetheless funny to see how much of an impact that has had. Talk about bad luck…
Everybody including the employee who put those graphs into the slide deck just got $1.5M just for showing up at work. So there's not a lot of room for sympathy.
Let's entertain a completely imagined, made up thought experiment.
Imagine a revolutionary technology comes out that has the potential to increase quality of life, longevity and health, productivity and the standard of living, or lead to never before seen economic prosperity, discover new science, explain things about the universe, or simply give lonely people a positive outlet.
Miraculously, this technology is free to use, available to anyone with an internet connection.
But there was one catch: during its release, an error was made on a chart.
Let's ignore for the moment that we're talking about a word generator that relies on an infinite amount of pirated data input to "learn" anything. Let's also ignore that the primary goal of "AGI" for the people pushing it is to replace workers en masse and to enrich themselves, and not any naive notion of progress or whatever.
So this miraculous technology that can do everything, cure diseases, reverse human aging, absolve us of our sins etc. can't accurately make a bar chart? Something kids learn in 5th grade mathematics? (At least I did, mileage might vary there)
After all of this hype, this is the best they can do? This is the forefront company (arguably) of the forefront tech, and no one can review slides before they're shipped out? I think the reason this has resonated with people is that it gives off a "vibe" of not giving a shit: they'll ship whatever next slop generator they want and expect people to gladly lap it up. Either that, or they're eating their own dog food and the result is this mess. Do the stats even matter anymore? Is that what they're banking on?