Ask HN: Am I missing something with AI (news.ycombinator.com)

19 points|by vasko|5d ago|23 comments

I constantly hear developers around me talk about how AI has completely changed their lives and how they don't even program anymore, they just prompt. But any time I've used it, the output has always been off. And when the output is off I have to go and read through everything, learn how it works and fix it, at which point I might as well write it myself.

I just don't understand what other people are seeing. I've mainly used Claude and ChatGPT, and I got a free trial for premium, but it's just underwhelming. Their only use so far for me has been as a search engine, but they're a search engine that's wrong 20% of the time, so even that use is questionable.

Comments (23)

1. dejan_kocic|5d ago

I think AI is good for creating a foundation, then branching out and adding features. You shouldn't overdo it with AI.
2. morisil|5d ago

I am shocked by how different my experience is from yours. I wrote Claudine, my own version of Claude Code, almost 2 years ago. This experience gave me an understanding of how the technology works. Since then I've produced maybe 300k lines of open source code, and all of it meaningful to the bones. What kind of projects are you working on? Maybe it's the specificity of your domain?
3. JohnFen|5d ago

You might be even more shocked to learn that the author's experience isn't rare.
4. yellow_lead|4d ago

Can you share the code, since it's open source?
5. Festro|5d ago

We're reaching a point currently where output quality is very much determined by input quality. Previously output quality was hampered fundamentally by model knowledge, hallucinations, and model quality.
Now, we have better knowledge of prompting as people have learnt what to say, models are better, models make use of memory from other conversations, they have skills written by humans or even themselves on how to do things, access to the internet to get live info, access to project files to check info, and the built-in 'thinking' to challenge their own assumptions and loop on outputs until they're refined.
You're right that output is always off still, but a lot of people have reached a point where it's only 'off' by an amount that is less than the effort required to do the task themselves, and considerably so.
My example today is prompting Claude to do a technical audit of a new client site.
It has skills for UX and SEO audits. Connects to an SEO tool. Pulls client info from OneDrive. Outputs to Word from a template for our agency. I even had it drive a remote pagespeed testing tool in Chrome because they don't have an MCP server currently.
Doing that report myself is 3.5-7 hours depending on what's found. Claude did it in 0.5 hours. Now I'm sorting out the oddities and anything that feels 'off'. I know and understand the full content of the report and can get on with actioning the recommendations or prioritising them for others. I've got maybe 1 hour of review and writing to do. It's not a 10x improvement but I'm happy with it.
Although, whilst Claude did its bit I was doing other work. So, perhaps the multiplier is higher than I give it credit for.
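As a rough sketch of that arithmetic, using only the hours quoted above: the function name `speedup` is made up for illustration, and the "unattended" case assumes the 0.5 hours Claude spent running cost no human time at all, which is a simplification.

```python
def speedup(manual_hours: float, ai_hours: float, review_hours: float,
            ai_runs_unattended: bool = False) -> float:
    """Ratio of manual effort to human time spent when the AI does the first draft."""
    # If the human does other work while the AI runs, only review time counts.
    human_cost = review_hours if ai_runs_unattended else ai_hours + review_hours
    return manual_hours / human_cost


# Manual estimate was 3.5 to 7 hours; Claude took 0.5 hours; review is about 1 hour.
for manual in (3.5, 7.0):
    attended = round(speedup(manual, 0.5, 1.0), 2)
    unattended = round(speedup(manual, 0.5, 1.0, ai_runs_unattended=True), 2)
    print(f"manual={manual}h attended={attended}x unattended={unattended}x")
```

Counting the wait, the multiplier is roughly 2.3x to 4.7x; if the wait is free, it is 3.5x to 7x.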
6. vasko|5d ago

The way AI is able to interact with outside resources is pretty impressive, but the quality of code it produces to me is still questionable, more so in the larger scope, and the errors it produces are sometimes hard to catch because they're not normal human errors.
Recently I tried to get Claude to write a script that produces large amounts of code so I could profile a compiler. The script ended up outputting code that uses variables outside of their scope, didn't utilize like 90% of the features of the language, and basically ended up being something that I could make by spamming copy paste.
The script itself was also written in a really weird way, utilizing recursion for pretty much everything when most of what it did could be done in simple loops. It ended up being a bit of a nightmare to fix and the entire time I was asking myself "why didn't I just write this in 30 minutes instead of going through all of this".
7. Festro|5d ago

I can't speak to coding as it's not my area but certainly the pattern I've spotted is that it's best at grunt work. That's where the time savings kick in.
Browsing sites, linking up data, spotting anomalies, writing documentation, formatting documents, etc.
If a task isn't repetitive or doesn't involve ingesting data, then I think the time savings shrink rapidly and the need for oversight increases massively. I think some people are managing to set up enough automated oversight to get round that, but it's adding a layer that multiplies your token usage to do so and still has no guarantee. But certainly all these layers being added are increasing success rates.
Andrej Karpathy is speaking about barely coding now. He has a bias, and a comment from him like that is marketing for Anthropic, but I believe he's found some groove with his setup to achieve that.
I think the current status quo, this month in 2026, is that the best tips and tricks for getting usable answers out of ChatGPT a year ago have been consolidated into what we now call memory and skills in Claude and other agent-harness-type systems. You might need to explore those more; in fact, I think for Claude Code/Cursor there are even more layers for checking outputs that I've not even seen in Claude Desktop.
And I think people with your exact issue, and the vast numbers who share that experience, are an audience that the app makers want to better convince. The free tiers and marketing sites are going to step up their game gradually and there will be new features that lower failure rates even more.
8. SyneRyder|4d ago

This caught my eye:
> The script ended up outputting code that uses variables outside of their scope, didn't utilize like 90% of the features of the language
Using variables outside of their scope sounds very unusual to me for Claude. You are using Claude Opus (4.5 or higher) and have set the thinking to High or above, right? Make sure you're not using Claude Haiku. Sonnet can be okay, but I'm sure the developers you've heard raving about it are all using Opus 4.8 or GPT 5.5, and all using it from within Claude Code or Codex (or OpenCode or Pi, tools like that anyway).
Claude should catch something like variables being outside of scope immediately when compiling, and fix it as soon as it notices the compiler error.
> The script itself was also written in a really weird way, utilizing recursion for pretty much everything when most of what it did could be done in simple loops...
That's actually a great opportunity to develop a new prompt to give to Claude. AI is really good at pattern matching. Take one of those weird recursion methods Claude came up with, then rewrite it as that simple loop that you would prefer, and show both to Claude. Then ask in the same turn: "This is how I prefer to write this code. Can you suggest a prompt to me that would encourage you to write this style of code instead in future?"
See if you can get Claude to reduce that down to a simple maxim or principle you can include in a startup prompt you provide at the start of each session, or into your global CLAUDE.md file that is loaded at the start of every Claude Code session. It might end up being a guideline like "Prefer simple loops over recursion whenever appropriate."
It's possible that the developers you've heard raving about AI have already developed startup prompts / CLAUDE.md files filled with similar maxims & principles, tailored specifically to how they like to code & work, evolved from months of working with AI.
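A before/after pair to show Claude might look something like the sketch below. It is purely illustrative: the function names and the C-like output are assumptions, not the original poster's script. Both versions emit the same list of generated function definitions, one by recursion and one with a plain loop.

```python
def emit_functions_recursive(n: int, i: int = 0) -> list[str]:
    """The 'weird' style: recursion used where a counter would do."""
    if i == n:
        return []
    line = f"int f{i}(int x) {{ return x + {i}; }}"
    return [line] + emit_functions_recursive(n, i + 1)


def emit_functions_loop(n: int) -> list[str]:
    """The preferred style: the same output from a simple loop."""
    return [f"int f{i}(int x) {{ return x + {i}; }}" for i in range(n)]


# Both styles produce identical output for small inputs.
assert emit_functions_recursive(5) == emit_functions_loop(5)
print(emit_functions_loop(2))
```

For a generator meant to produce large amounts of code there is also a practical reason to prefer the loop: the recursive version uses one stack frame per generated function, so in Python it fails with a `RecursionError` once `n` exceeds the interpreter's recursion limit (1000 by default).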
9. bawis|4d ago

>> Now, we have better knowledge of prompting as people have learnt what to say
Can you back up this claim? What do you mean exactly by "better knowledge"?
10. jr_isidore|5d ago

When you join a new company, is it faster to fix a bug by rewriting everything from scratch or by modifying what's there? Seriously, get your head out of your ass.
11. ex-aws-dude|4d ago

Are you treating it like a genie to build huge things in one shot or working on small incremental changes?
I’ve found the latter works way better.
12. hash0|4d ago

This I found to be true, too. "One-shotting" a prompt and getting the AI to build you a working "mock-up" or "pre-Prototype" is satisfying but won't scale. As soon as you want to add features on top of that which you have not specified in the first prompt, AI will drag you down into bugfixing both the code and trying to make the AI behave. My personal best practice for using AI is this: Describe the problem you have, then let the AI explain to you the common solutions to that. After all, its training data contains the aggregated information of the internet, including the newest paradigms, frameworks, and best practices. I then let it teach me how these work so that I can build them into the code myself.
13. dhruvyads|4d ago

Depends heavily on the models you use. SOTA models (Fable 5, Opus 4.8, GPT 5.5) are quite good in their native harness.
14. montfort|4d ago

This is a common experience for everyone. A combination of small models with overly abbreviated prompts will produce poor and faulty output. This is where what's called "prompt engineering" comes in. With practice, you'll see that detailed, well-structured, and often lengthy requests will yield better results. You'll also find that you can establish dialogues with the agents to refine ideas and ask them to take structured notes before starting implementation work. The idea of achieving good results with a lot of agent work stemming from a small prompt should be dismissed. It requires a lot of conversation, planning, and design on our part. Don't give up; it's just a matter of getting used to this "language" for communicating with the agents.

15. Kuyawa|4d ago

Use this prompt, thank me later...

  Create an online sports betting platform following these rules:

  - Create a web app for betting on sports like baseball, basketball, football, hockey and soccer
  - App name is BetMax

  Use this tech stack:
  - Use node js and express for server code, ejs for templates and UI, postgresql for database management
  - Use only HTML, JS and CSS for web pages
  - Create a public folder with fonts, icons, media, scripts and styles subfolders
  - Place respective files in each folder according to its function

  Styling:
  - Design in light mode and dark mode
  - Design in responsive mode for mobile devices
  - Use one single file for shared styles named common.css, and one file for every web page for custom styles related to that page if necessary, each stylesheet file will be named like the page that it customizes but using .css extension
  - Create a header and footer component, include them in every html file directly at the top and the bottom according to ejs, passing vars when needed

  Authentication:
  - User registration, login, logout, forgot password, and all the forms for user and session management
  - Once registered, users can top up their wallet with money, we will use Binance wallet for that
  - Binance integration will be done later, for now leave placeholders
  - No avatar is required
  - Use node:crypto SHA-256 hashing instead of bcrypt

  Betting info:
  - Once registered, landing page will show five buttons, one for every sport: baseball, basketball, football, hockey, soccer
  - Once clicked on the desired sport, daily schedule will be shown with all the games for that day, showing teams, times, odds
  - Users will be able to pick one or more winners in parlay mode, recalculating the max winning amount based on the odds of selected teams
  - They can bet all the money they want up to a limit per user, stored in the user config info
  - Once ready, the user will finish the bet, deduct a payment from their wallet balance, and the betting ticket will be stored in the database and shown on the screen
  - Keep in mind some matches can be postponed or cancelled, use a status field for that

  Other pages needed:
  - Landing page: make it professional, light/dark themed, responsive, minimalist but be creative
  - Once the user is registered, show the sports selection page
  - Once the sport is selected, show the schedule for games of the day
  - They can pick their teams, confirm the purchase and get their betting ticket
  - Once the match has finished and results available, admins will enter them in the database and calculate winners, payouts and commissions
  - Keep a history of tickets so users can know their outcome, winner or loser

  Rules:
  - Place views in 'views' folder
  - html file extensions as .html not .ejs, make changes in express for that
  - Keep the style in all pages, make it uniform
  - Try not to use popups, use linked pages most of the time, save user state while navigating between pages
  - Design the database tables and fields, create a database.sql file with all commands to create the DB
  - Use dummy data for now, create a dummydata.sql file for that. Make it easy to replace dummy data for db calls in the future
  - Place all db commands and queries in a class stored in a single database.js file that can be accessed from any module
  - Use an .env file for all keys, secrets and configuration

  Thank you
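
The prompt leaves the parlay calculation to the model, so it can help to know what a correct result looks like when reviewing the output. A minimal sketch follows, assuming decimal odds (the prompt does not say which odds format is used) and a made-up function name; the generated app would be JavaScript, and real money handling would want a decimal type rather than floats.

```python
from math import prod


def parlay_max_payout(stake: float, decimal_odds: list[float], user_limit: float) -> float:
    """Maximum winnings for a parlay: the stake times the product of all selected odds.

    Assumes decimal odds (e.g. 2.0 means the stake doubles on a win).
    """
    if not decimal_odds:
        raise ValueError("a parlay needs at least one selection")
    if stake <= 0:
        raise ValueError("stake must be positive")
    if stake > user_limit:
        raise ValueError("stake exceeds the per-user limit")
    # Floats are used here for brevity only.
    return round(stake * prod(decimal_odds), 2)


# A 10.00 stake on two picks at odds 2.0 and 1.5 pays 10 * 2.0 * 1.5.
print(parlay_max_payout(10.0, [2.0, 1.5], user_limit=100.0))
```

Every added pick multiplies the payout, which is why the prompt asks for the amount to be recalculated whenever the selection changes.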

16. Kuyawa|4d ago

Oh and btw, it costs less than 5 cents using DeepSeek
17. xpnsec|4d ago

It’s like I asked the exact question :) Your experience is certainly not unique. I see the stories, and often feel like it is just a skill issue on my part. So I ask for advice to help correct my workflow, and I receive a mixture of suggestions on getting the LLM to produce maintainable code (which is always welcomed when it helps to improve the output). But then I noticed that there is a small pocket of people who tell you that you are wrong, or it is a skill issue, or their view is completely at odds with yours. That they have been 10x’ing their output for months. So you ask if they can share a codebase or GitHub link which demonstrates how they have managed to tame the LLM on larger projects, and people go silent.
So now I’m trying to let the code do the talking as one method of learning. Hunting through GitHub looking for SDD projects and trying to understand what works vs what is parroted on X.
18. drrob|3d ago

You're really not missing anything I'm afraid; if you're a shit coder it'll seem like voodoo magic, if you're a good coder it's good in certain circumstances (parsing logs, interpreting debug messages, that sort of thing) but won't replace you.
19. jamesli233|3d ago

Perhaps you should give those programming agents a try, such as Codex. Try to go beyond the IDE and organize tasks at a higher level.
20. al_borland|3d ago

It’s the same for me. Even if it technically works, the solution looks overly complicated and is hard to parse, which makes me think it will be hard to maintain.
This happened to me last week. I went back and forth with the AI for 2 days. My company then ran out of tokens for the month, so I just did it myself and came up with a solution that I feel is a lot more straightforward. That, plus all finishing touches, and testing were done by noon.
I find more and more that AI turns into a procrastination machine. I’ve only found it useful for things that are so basic the AI can one-shot them, low stakes (logic issues won’t be a major issue), and completely independent, where I don’t really have to worry about maintaining it. For anything else I’m finding more and more that it’s faster to not try and have AI do anything.
21. rkochanowski|3d ago

That's why it's so important to design a solution before letting AI implement it. I noticed that AI often can't see clean, elegant and primitive solutions that fit best for a given problem. Even when I ask if it can be designed in a simpler way, it can't see what I can see. I think this is one of the most important points where humans should be involved.
22. al_borland|3d ago

I often figure out my design by writing code. By the time the design is figured out, it’s done. I’m not sure what AI adds to that equation.
I could do some of that in pseudo code, but it’s usually just as easy to make it work and actually test the hypothesis.
23. imron|2d ago

As someone with over 25 years of experience in software engineering, 6 months ago I used to feel the same way (https://news.ycombinator.com/item?id=46389417).
What changed:
- Opus. This was the first model family for me that produced good enough output _and_ could also be correctly steered to correct itself when not good enough. ChatGPT 5 level models are also good enough here but Opus still has an edge I think.
- OpenCode. The UX of OpenCode just seems to fit well with how I work - enough information about what the agent is doing that I can stop it if it's getting stupid/doing something wrong, high enough level that I don't need to constantly babysit it. I keep trying Claude Code every now and then but continually get unsatisfactory results even with the same underlying model. Codex works better in this regard.
- Tokenmaxxing. At first I got the standard $30/month plan but would hit session limits in about 30 mins, then I needed to wait a few hours before I could continue so no net benefit in productivity. Then I upgraded to the 5x plan and could go 1-2 hours before hitting session limits. This also was no net benefit. Then I upgraded to the 20x plan and was swimming in a sea of tokens. The problem then becomes figuring out how to use them all so you aren't 'wasting' any of them.
It's the last one that really helped shift the mindset for me. My process now is something like this:
1. use the agent to build and refine an overview of what I'm trying to do and what I'd like to build. This gets saved to the docs folder in the repo.
2. use the agent to build out specific plans to build out what I need. Plans are reasonably high level and describe the what and the why along with important design decisions and measurements of success. Each plan is about enough to implement in a given session. I purposefully do not get it to specify code or tests in the plan as too much specificity in the plan causes the implementing agent to get hung up on the details rather than trying to find a good solution. These are saved to plans/backlog/NNNNN-plan-name
3. Use the agent to help me review all plans and make sure they are consistent and fit with the overview, and also figure out dependencies between the plans, and which ones can be done in parallel.
4. Use the agent to start implementing - this involves moving the plan to plans/active/... creating a worktree and a branch and working on the feature. I will kick off multiple agents working in parallel where the dependency graph allows it. I review each implemented plan thoroughly (I've written my own review tool for this) and iterate until the code meets my standards and the requirements. Then I move the plan to plans/completed/.. merge to main, remove the worktree and then kick off the next agent. Usually I'll be switching between reviewing code, kicking off the next plan in a separate agent, planning out new features, all in parallel.
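The bookkeeping in step 4 could be scripted roughly as below. This is a dry-run sketch, not the poster's actual tooling: the plan name `00042-example-plan`, the `../worktrees` location, the commit messages, the reuse of the plan name as the branch name, and the exact order of the finishing steps are all assumptions. It only builds and prints the git commands, and it assumes they would be run from the main checkout while on the main branch.

```python
from pathlib import PurePosixPath

PLANS = PurePosixPath("plans")


def start_plan(name: str, worktree_root: str = "../worktrees") -> list[list[str]]:
    """Commands to activate a plan and give it its own branch and worktree."""
    return [
        ["git", "mv", str(PLANS / "backlog" / name), str(PLANS / "active" / name)],
        ["git", "commit", "-m", f"Activate plan {name}"],
        # Creates a new branch named after the plan, checked out in a separate directory.
        ["git", "worktree", "add", "-b", name, f"{worktree_root}/{name}"],
    ]


def finish_plan(name: str, worktree_root: str = "../worktrees") -> list[list[str]]:
    """Commands to merge a reviewed plan, mark it completed, and clean up."""
    return [
        ["git", "merge", name],
        ["git", "mv", str(PLANS / "active" / name), str(PLANS / "completed" / name)],
        ["git", "commit", "-m", f"Complete plan {name}"],
        ["git", "worktree", "remove", f"{worktree_root}/{name}"],
    ]


# Dry run: print the commands instead of executing them.
for cmd in start_plan("00042-example-plan") + finish_plan("00042-example-plan"):
    print(" ".join(cmd))
```

Because each plan gets its own worktree, several agents can work on independent plans at once without touching each other's files.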
This is the real productivity enabler. You need to have a backlog of well-scoped work and can then have multiple agents working on different parts of it. Human review is essential if you care about long-term maintainability of the code and ease of future improvement because the AI will still make many flawed decisions.
I tend to avoid other people's skills. I've found it more productive to build my own as I go if I find myself repeating myself to the agent. Agents will regularly ignore instructions in skills anyway so it's all a bit hit and miss. I try to keep any skills that I make brief and to the point (the more concise, the less likely the agent will skip over it/ignore it).
Overall I've found I've managed to build things more quickly, and the things that I build are now very well documented and explained which helps both agents and humans understand the codebase.