GLM-5.2 is a step change for open agents (interconnects.ai)

365 points|by vantareed|5d ago|223 comments|Read full story on interconnects.ai

Comments (223)

1. Balinares|5d ago|context

I can't help wondering what kind of models we'll see coming out of China once it gets its own chip fabs up and running. Right now it sounds like the US's export ban is not slowing them down a whole lot.
2. ceejayoz|4d ago|context

> Right now it sounds like the US's export ban is not slowing them down a whole lot.
It may wind up being a massive boost to them in the long run, even.
Necessity is the mother of invention.
3. pkroll|4d ago|context

If this pans out, you're not at all kidding: https://www.youtube.com/watch?v=8ekndZwyOzo
4. verdverm|4d ago|context

Trump allowed more advanced chips (H200s) to be sold after his visit, because some people in the admin still believe the US can "addict" China to the hardware. It seems China is only letting a token few in; the ban is more on their side now, as Xi really wants indigenous capability.
5. briga|4d ago|context

With subsidization from the Chinese government they will probably be equal to or better than the models here. I mean, have you looked at the author list of any given AI paper published within, say, the past 5 years? I wouldn't be surprised if half or more of AI researchers are from China.
6. buzzin__|4d ago|context

Can you compare the amount to the USA subsidization? Which one is bigger? Per Capita? Per unit of economic growth achieved?
7. usef-|4d ago|context

You mean from the private investors? It seems the labs on both sides of the ocean are quite negative in their profitability right now due to the competitiveness. Though Anthropic claims they will have a profitable quarter this year (despite the huge build-out), so their margins on API costs are likely quite decent.
8. pianopatrick|4d ago|context

There does not seem to be a big penalty for going slow anyways. People seem to just switch on cost as soon as a model can do a task well enough. There do not seem to be strong network effects or vendor lock in.
Seems to me that going slow is the better long term tactic. China can just let the USA pay the high R&D costs to figure out what works, then just copy what works.
9. khurs|3d ago|context

>Right now it sounds like the US's export ban is not slowing them down a whole lot.
Just costing them a lot more money as they pay multiples more buying on the underground grey market.
10. jerojero|5d ago|context

Open weight models from Chinese labs tend to be significantly cheaper.
I think they're absolutely needed. I can't afford 200 USD a month for personal use of coding AI, and I don't think such prices are reasonable for most of the world economy anyway. Not to mention US firms might be giving their employees a lot more than that.
It's increasingly feeling, to me, that there's a gap building up between haves and have nots. But then, we get news of these open weight models that are reasonably priced in inference with reasonable capabilities. Yes, they take maybe 6-9 months to get there but, tbh, that's not a bad trade off at all.
11. ttoinou|5d ago|context

200 is much less than the value you’re supposed to get out of it. If it’s not, then yeah, go ahead and use cheaper models with worse quality
12. Dayshine|5d ago|context

I'm not sure how I'm supposed to get $200 of value out of personal use!
13. LPisGood|4d ago|context

Note that 200 dollars of value is different than 200 dollars of profit.
14. devmor|4d ago|context

I personally don’t find it that useful for most tasks, but if say, you get paid $50/hr for your work and it saves you more than 4 hours of work in a month, there you go.
15. selcuka|4d ago|context

Obviously this assumes that you can find 4+ extra hours of $50/hr work every month, or you can work 4 hours less. Neither of these assumptions is correct for people who work for a fixed salary.
16. devmor|3d ago|context

That doesn’t change value. It’s value whether or not you can maintain a profit over it.
17. selcuka|3d ago|context

That's the definition of "value" in a broad, economics sense but I don't think it applies to the parent comment of this thread:
> I'm not sure how I'm supposed to get $200 of value out of personal use!
18. devmor|2d ago|context

If you are not working paycheck to paycheck, but still need to commit that time to work, you may prefer $200 worth of free time that you otherwise could not have - that is my general point of view there.
19. windexh8er|3d ago|context

I think this is the rub the enterprise will be forced to grapple with. Not everyone is going to get $200 worth of value for the organization. In fact since it's not a restricted tool some will waste time and company resources using it. Undoubtedly some will get the value out of it, but it's very likely, that these are the same people providing more than what they're paid already. Nothing has changed other than, potentially, time savings and (hopefully) output improvement. Neither of those are any sort of guarantee though, either. Subjective systems are hard to show value, especially in the long term.
20. holoduke|4d ago|context

Here most of my colleagues have +200 dollar rates. It's really a no brainer. But sure, in south America or some Asian countries maybe it is. But still most devs need it anyway. Also in the poor regions.
21. HDBaseT|4d ago|context

$200/h is on the extreme end and I would argue most people here aren't anywhere close to that.
The median hourly wage in the US is $28/h, this equates to nearly 7.5 hours. A full day of work a month for the average person to use Claude with reasonable limits.
Yes, the people on $28/h may not be the software development types, so their income might not be as high, but these are the people who would probably be vibe coding the most since they aren't day to day programmers!
22. ray_kay777|4d ago|context

I suspect the reply above is referring to charge out rates rather than wages.
23. HDBaseT|4d ago|context

My fault, thanks for the correction (:
24. folkrav|4d ago|context

Most of the world's developers, even in not-poor regions, make significantly less than what your colleagues charge.
25. Kuxe|3d ago|context

In Sweden $200 is ~5% of average programmer monthly income after tax. $200/h rate is not a representative salary for SEs in South America, Asian countries nor Europe.
If you're running a business I agree it's a no-brainer, but the context here is for personal projects.
26. holoduke|3d ago|context

Come on. The 200 spend on Claude is easily earned back. A few hours of work maximum.
27. selcuka|3d ago|context

According to your parent comment, even in Sweden, it's 8.7 hours of work, more than a full day.
28. martinjc|4d ago|context

Are you aware of how much purchasing power 200 dollars is in China, Brazil, Thailand or India? This is an extremely arrogant take.
29. nwienert|4d ago|context

I’ve hired many asian developers anywhere from 1-4k a month.
I get a lot more out of a 200/mo subscription now in a week than I did from them in a month.
Now obviously in today’s world they’d be using a 200/mo subscription themselves. But it’s not like money is nothing, software development doesn’t scale down below 1k/mo for anyone competent even in the poorest areas.
30. xydone|3d ago|context

The point the post you replied to is making is that while you get value out of it, and in your case it's not that expensive, it's just simply not the case worldwide
31. nwienert|3d ago|context

I don’t think you’re really reading between the lines.
32. matheusmoreira|4d ago|context

For the record, 200 USD is around 60% of the brazilian minimum wage.
33. ttoinou|3d ago|context

How about brazilian median software developer wage ?
34. matheusmoreira|3d ago|context

According to Glassdoor statistics, brazilian developers make between 600-1600 USD per month on average. Seniors might rise above 2000 USD.
So a 200 USD subscription falls between 10% and 33% of an average brazilian developer's salary.
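For anyone who wants to check those percentages, here is a minimal sketch. The function name is made up for illustration; the salary figures are the ones quoted above. Note that the 10% end of the range only appears at the 2000 USD senior figure, while 1600 USD works out to 12.5%.

```python
def subscription_share(subscription_usd: float, monthly_salary_usd: float) -> float:
    """Return the subscription cost as a fraction of monthly salary."""
    if monthly_salary_usd <= 0:
        raise ValueError("monthly salary must be positive")
    return subscription_usd / monthly_salary_usd


SUBSCRIPTION_USD = 200  # the monthly plan price discussed in this thread

# Salary points taken from the comment: low average, high average, senior.
for salary in (600, 1600, 2000):
    share = subscription_share(SUBSCRIPTION_USD, salary)
    print(f"${salary}/month -> {share:.1%} of salary")
```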
35. ttoinou|2d ago|context

So it’s 10x to 3x cheaper…
36. dash2|3d ago|context

Parent’s point was that many many people will get much more than $200 value from the “expensive” model. Sure, a Bihar farmer won’t, but even an Indian software developer may easily do if he or she has Western clients.
37. mrngld|3d ago|context

What's that got to do with the cost of a thing? Are tradesmen in Thailand entitled to Makita tools just because American plumbers can afford them? I'm struggling to understand the entitlement in some of the comments. And even though it doesn't matter I'd point out it's not like OpenAI or Anthropic are making enormous profits at the moment.
38. uberex|4d ago|context

Unless that value is $200 cash in hand it will be hard to afford it for people who just don't have $200.
39. smrtinsert|4d ago|context

I've actually come to believe the overwhelming majority of use cases require nowhere near frontier quality, so there's that. Much faster execution is just a bonus on top of the much reduced cost.
40. margalabargala|4d ago|context

Last time you bought a computer, did you buy the absolute fastest best CPU available?
41. girvo|4d ago|context

Yes, but that was because I could see the writing on the wall with respect to hardware prices being cooked by AI demand, so I built the best computer possible at the time knowing it'd probably need to last me the next 5+ years
So not really comparable. I use Step 3.7 Flash locally, models are good enough for so many coding tasks even at the lower end! (Though I note that calling a 200B model "lower end" is kind of amusing)
42. tacomagick|4d ago|context

DeepSeek through their own API has saved me tons of tokens honestly. Even though it is not as smart as Kimi or Claude, their level of entry is very low with a top up of 2$ and Pay as you go compared to the subscription of Claude or 20$ top up of Kimi
43. praveer13|4d ago|context

For personal use I’m considering using the frontier models from openai or anthropic to create a plan with research and brainstorming etc with enough details for cheap models to be able to follow (glm, deepseek etc) - with openrouter - will monitor how cheap and effective that turns out to be.
44. ImaCake|4d ago|context

You should try out the cheaper models first. I find Deepseek v4 models pretty comparable to sonnet 4.6 but at a fraction of the cost. You might find you just don't need to use the American models at all.
45. tacomagick|3d ago|context

For my case Openrouter breaks Deepseek caching and charges me multiple times over what I pay for Deepseek's API, with 2$ I was able to get around 120M tokens from deepseek easily when Openrouter could only barely do 250k
46. jabroni_salad|3d ago|context

deepseek's direct API is super loosey goosey about caching. On multiple occasions I have gotten cache hits resuming a session from the previous day.
47. lionkor|3d ago|context

Seconding the recommendation to use Deepseek directly via the API. I've burnt 287 million tokens in the last couple of days, costing me a whopping $5.77 USD.
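To put that in per-token terms, a rough sketch follows. The helper name is hypothetical, and the result is a blended rate: the comment does not say how the tokens split between input, cached input and output, so this is not any provider's published price.

```python
def usd_per_million_tokens(total_usd: float, total_tokens: int) -> float:
    """Blended cost per one million tokens for a given total spend."""
    if total_tokens <= 0:
        raise ValueError("token count must be positive")
    return total_usd / total_tokens * 1_000_000


# Figures quoted in the comment above: 287 million tokens for $5.77.
rate = usd_per_million_tokens(5.77, 287_000_000)
print(f"${rate:.4f} per million tokens")
```

That comes out to roughly two cents per million tokens, which is presumably dominated by cache hits.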
48. mdjxnxnxnd|3d ago|context

I call this the reviewer/implementer pattern: Opus for planning, then ds4/qwen/kimi for implementation, then Opus for PR review.
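A minimal sketch of what that pattern could look like in code, assuming a "model" is just a function from prompt text to response text. Everything here is hypothetical (the names, the prompts, the stand-in models); a real version would call an actual provider where the fakes are.

```python
from typing import Callable

# Assumption: a model is any callable that maps a prompt to a response.
Model = Callable[[str], str]


def plan_implement_review(
    task: str, planner: Model, implementer: Model, reviewer: Model
) -> dict[str, str]:
    """Run one pass of plan -> implement -> review and return all three outputs."""
    plan = planner(f"Write a step-by-step plan for this task:\n{task}")
    patch = implementer(f"Follow this plan exactly and produce the change:\n{plan}")
    review = reviewer(
        f"Review this change against the plan.\nPlan:\n{plan}\nChange:\n{patch}"
    )
    return {"plan": plan, "patch": patch, "review": review}


# Stand-in models so the sketch runs without any API access.
def fake_expensive(prompt: str) -> str:
    return f"[expensive model] {prompt.splitlines()[0]}"


def fake_cheap(prompt: str) -> str:
    return f"[cheap model] {prompt.splitlines()[0]}"


result = plan_implement_review("center a div", fake_expensive, fake_cheap, fake_expensive)
for stage, text in result.items():
    print(stage, "->", text)
```

The expensive model only sees two short prompts per task (plan and review), while the token-heavy implementation step goes to the cheap one.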
49. Fr0styMatt88|4d ago|context

If we can agree that the AI model is at least as capable as a junior engineer or new contractor, how’s that different to saying “software engineering isn’t worth $200 a month”?
Has a very race-to-the-bottom feel to it.
Though in the grand scheme of it, $200/mo probably isn’t the real price either. Also looking at it not just in a vacuum - paying for a product that can change what you get from under you doesn’t seem great anyway.
At least with a locally-hosted model you know what you’re getting.
50. matheusmoreira|4d ago|context

Yeah. There's no way to verify what these providers are doing. The real future is running these models at home. Opus level inference on our own hardware would be a dream come true.
51. IncreasePosts|4d ago|context

How will anyone running home instances be able to compete against people paying some money running much more powerful models on much more powerful hardware?
52. Fr0styMatt88|4d ago|context

It’ll be interesting.
I’m using Qwen3.6:27B at home and mostly Sonnet/Opus (depending on the complexity of the task) at work.
You have to break things down into smaller chunks for the local models. For the bigger cloud ones they can do a lot of the broader thinking.
53. fragmede|3d ago|context

Time is money, but apparently now thinking is money as well. How much is it going to cost to think harder? If it's, say, $10 to use a bigger cloud model, it becomes easier to qualify the cost of thinking.
54. jimbokun|3d ago|context

At some point it will be hard for us to tell the difference.
55. Bombthecat|3d ago|context

Yeah. There will always be a gap. And it will keep growing for the next years...
56. baq|3d ago|context

I dream of having an LLM in a box over usb bought off AliExpress for a year and change now.
The LLM in a box is something you can buy today, but it 1. doesn’t serve over usb by default 2. costs $100k for hardware (not counting electricity) at 100 tps 3. can’t buy this from AliExpress.
Better to put that $100k in t-bills and just buy tokens even at api prices.
57. rescbr|3d ago|context

I understand your point (and definitely want the same), but I do have an almost-AliExpress-LLM-in-a-box: it's a Thunderbolt eGPU dock (that I got from AliE, and it is USB-C...) with an RTX 4060 Ti with 16 GB of VRAM (bought locally for gaming before the price boom).
It's been awesome for embeddings and document OCR!
3D printing a case for it is on my todo list.
58. RazorBucksICO|3d ago|context

The appropriate price is what the output is worth to you. Some people could pay $10,000/month, some $5 and feel like they were breaking even. There is a big jump between convenience and curiosity uses versus business critical.
OpenAI already charges enterprise users a premium purely for that title over on-demand, no-contract usage. Retail users get a good deal. People make a lot of hay about subsidies but this is a very sane approach if you want exposure to these three different types of customers.
59. ImaCake|4d ago|context

Significantly cheaper than comparable models if you are using openrouter [0]. Just yesterday I spent roughly 13 cents centering some divs using Deepseek in a personal project. It would have been north of $1 to do that with a US frontier model.
0. https://openrouter.ai/compare/z-ai/glm-5.2/anthropic/claude-...
60. ipaddr|3d ago|context

For centering divs the free models opencode offers can easily handle that work. DeepSeek V4 Flash is pretty decent.
61. ImaCake|3d ago|context

Sure, but something that is “sonnet tier” is going to get there faster and with less pain. Well worth the 13 cents!
62. ipaddr|3d ago|context

Flash will get there faster than the sonnet tier, which involves reasoning, which is slow. And you don't need reasoning to center divs.
The sonnet tier sits below claude or chatgpt in terms of price but costs so much more than free models. If you are breaking down tasks now, I'm not sure that 13 cents is worth it.
63. arikrahman|4d ago|context

Someone else on this forum put it well, U.S. is trying to achieve AGI at all costs, while Chinese models are seeking widespread adoption.
64. azinman2|4d ago|context

I don't think anthropic/openai/google aren't also seeing widespread adoption. In fact they already have the marketshare.
65. Turskarama|3d ago|context

The difference is that the US companies are using it as a means to an end, they need to make just enough profit that the investors don't all get cold feet before they get to AGI. The Chinese companies on the other hand are trying to be profitable immediately, which means that they're going slower to save development costs.
66. lionkor|3d ago|context

None of the AI companies in the US are on the path to AGI. They are, however, on the path to claiming they have AGI, then subsequently not releasing it and only giving it to the US government to make drones that can bomb the homes of political dissidents.
67. dotancohen|3d ago|context

What kind of off topic political ideology spam is this? Do you not think that the Chinese kill their enemies?
The Chinese are genociding Uyghurs as we speak, purely for being Muslim, in numbers that dwarf any harm the US has done.
68. lionkor|3d ago|context

> in numbers that dwarf any harm the US has done.
The list of wars the US is or was actively involved in[0] is SO LONG that the Wikipedia page is split into multiple different pages.
The main relevant ones are 20th[1] and 21st century[2], for which you better get a good grip on your mouse to scroll down.
I urge you to use your favorite AI to give you a rough summary of direct and indirect casualties of just those wars directly caused, started, or provoked by the US, from these lists.
For example, the "war on terror" alone has, so far, seen around 4.5–4.6 million+ people killed, and at least 38 million people displaced.
[0]: https://en.wikipedia.org/wiki/Lists_of_wars_involving_the_Un...
[1]: https://en.wikipedia.org/wiki/List_of_wars_involving_the_Uni...
[2]: https://en.wikipedia.org/wiki/List_of_wars_involving_the_Uni...
69. metobehonest|3d ago|context

The US funds Israel and it is only those funds and military aid that keep it from collapsing unto itself. That's the state that orchestrates the largest scale genocide by a "first world" power since WW2, as recognized by the United Nations and independent organizations like the Amnesty International.
https://amnesty.ca/wp-content/uploads/2024/12/Amnesty-Intern...
Nothing China did comes close to this.
70. andriy_koval|3d ago|context

> as recognized by the United Nations
It's not; this would require a voted resolution to declare genocide. It was a report from an inquiry by individuals with unknown bias.
71. tsss|3d ago|context

Everyone wants widespread adoption, of course. I'm sure that China is also working on more expensive frontier intelligence models behind doors, but they're lagging behind America on that front. Going for cost-optimized open weight models is their bet to stay relevant in a market where they can't compete for the "luxury" segment. It is important for them to get a foot in the door and maintain a presence in the press to attract future customers, given the general animosity towards China in the west that they need to overcome. Similarly, European providers like Mistral are hopelessly outclassed in every respect and thus try to carve out a niche in the market with regulation and anti-American fearmongering. They position themselves as "privacy-conscious" not out of goodwill but because it is their only chance to survive as a company with an utterly inferior product.
72. arikrahman|3d ago|context

I didn't think of the European angle, that's something I can update in my synthesis.
73. rglullis|3d ago|context

> U.S. is trying to achieve AGI at all costs
If that was true, they would be collaborating with each other and opening up all the results from their work.
74. HappMacDonald|2d ago|context

.. "NOBUS" AGI with a moat at all costs..
75. throwaway-blaze|4d ago|context

Just don't ask it to tell you the events of June 4, 1989.
76. girvo|3d ago|context

Not that it matters but most of the open weight models aren’t actually censored that way: they run another layer on top to do that. At least some of them do; Step 3.7 Flash locally happily tells me about the Tiananmen Square massacre
77. swingboy|3d ago|context

My work involves asking LLMs about both Tiananmen Square and what’s going on in Gaza, so I can’t use Chinese or American models!
78. matheusmoreira|4d ago|context

> It's increasingly feeling, to me, that there's a gap building up between haves and have nots.
People speak of a permanent underclass.
https://www.nytimes.com/2026/04/30/opinion/ai-labor-work-for...
79. fbrncci|4d ago|context

You made me realize something. I routinely spend upwards of 500$ per month on LLMs for coding (expensed towards clients). However I live in a place where 500$ is around the avg. salary. I’m lucky that I know my way around western clients. Clients who pay these expenses and are happy to work with me because I am still about 50% cheaper than local talent in EU/US, while my salary at home converts to an upper class income at the highest tax bracket.
Which of course causes some unfairness on both ends. Nobody here can compete with me. I often use left over tokens on local client projects; which despite lower pay, still pays off because they now take hours not days or weeks to complete. And nobody in the local clients talent pool can compete with me; unless they charge about half the market rate.
Take away my 500$ monthly grant; and I’d be more or less screwed. Better open models will more or less start to reduce this advantage. It’s not like I positioned myself here on purpose. But it’s definitely a „right place, right time“ situation.
80. swader999|4d ago|context

If you are running multiple agents, your cost for them should be multiples less than what their ROI is.
81. fbrncci|4d ago|context

My costs are 0$ as any token or subscription spend on agents is invoiced as an expense to my clients.
82. kreelman|4d ago|context

Thanks so much for being bold enough to be fairly open about the costs, how you arrange billing and the advantages that's given you.
I've been fooling around with DeepSeek 4 agentically. It's probably not as good as Anthropic offerings, but even those seem to be roiled in politics and strife and DeepSeek 4 is very good IMHO. I'll later try out GLM.
I'm in Australia. The government has set up a "return and earn" scheme to keep aluminium cans, plastic bottles and paper drink cartons out of the waste stream. A laudable project. The money you make from return drink containers is pretty low, $AU 0.1 per container. I've participated to get the rubbish out of natural water streams and to make a nano amount of money on the side.
When I looked at the costs of an app I was getting DeepSeek to help me with, I realised that the several hours I'd spent learning and building had cost something like 8 recycled containers. In my head after doing some DeepSeek stuff, I calculate a "cans per app" metric for myself for fun. I may even setup a simple graph to view my costs that way.
I kind of hope the Anthropics of the world get enough price competition from sources like DeepSeek and GLM to drop their prices significantly. Time will tell.
I'm using the Chinese DeepSeek provider, so everything done there could potentially be taken and used by the CCP... But this is hobbyist learning.
There is probably a market for Deepseek/GLM served from non CCP available servers. I might even look into how hard that would be to setup here.
I also hope that inference focused hardware will come to the fore, reducing energy use and cost. Realistically this will take time though, on the order of years.
Here in Oz, we have community batteries that community members can charge and later draw from. Their electricity prices are competitive. I wonder if someone could setup something like a community battery to run data centres... That way reasonable environmental consideration could be given to inference power generation... This might not work in a market like the US or Europe, but small market size might be an advantage... Who knows.
83. esperent|3d ago|context

> I'm using the Chinese DeepSeek provider, so everything done there could potentially be taken and used by the CCP
As opposed to Anthropic or OpenAI where everything done could potentially be taken and used by the US government.
Also, replace "could potentially" with "will definitely" in both cases, there's no conspiracy here.
We're stuck between two bad positions, so just use the one that's best for you, and wait for a better solution to arrive.
84. usef-|3d ago|context

It's very easy to use other providers. See https://openrouter.ai/ which also lets you filter by where the provider is hosted and their data retention policy.
Jeremy Howard was recommending fireworks.ai as a host if you want to go direct. Or there's Cloudflare.
For subscription alternatives people here on HN seem to mention Open Code Go a lot too https://opencode.ai/go
85. SyneRyder|3d ago|context

> There is probably a market for Deepseek/GLM served from non CCP available servers. I might even look into how hard that would be to setup here.
Please do. There is definitely a market for Deepseek / GLM hosted from non-China servers, there's over 20 providers for GLM 5.2 on OpenRouter alone... and they're all either Singapore (home of Z.AI / GLM), China, or US. There is nothing yet listed on OpenRouter from Europe (Inceptron still only has GLM 5.1). And of course, there is absolutely nothing hosted in Australia.
We're in a particularly dire situation in Australia. We're about to be cut off from Claude Fable and premium American models. The European Mistral models are garbage, at least in comparison to US models. Our only hope is going to be Chinese models (GLM 5.2 is good), and we're not even hosting them in Australia.
By the way, if you haven't tried an Anthropic model, it's worth spending at least $20 one month to give Opus 4.8 a try. I only got one night of access to Fable before I was cut off, but one single evening of Fable provided plans that I've been working through for about a week afterwards with Opus 4.8... and that was only Fable, not even Mythos. That's the kind of intelligence lead Australia is about to be cut off from.
(And kudos on the Containers For Change, that's something I do as well - mostly as an exercise incentive to walk to the local recycling machine, because the money certainly doesn't compensate for the time spent on the recycling.)
86. Sanzig|3d ago|context

Same issue in Canada - domestic inference capability for the open models is woefully behind.
87. trollbridge|3d ago|context

Canada has fewer excuses, given sparsely populated places that are cold with nearly infinite water and extremely cheap electricity.
88. Sanzig|3d ago|context

Yep, agreed. Main issue in Canada is a notoriously slow and stingy investment ecosystem. Resource-wise we're incredibly well positioned.
89. forshaper|3d ago|context

Would you happen to know why there are so many Canadian investments in American telecom?
90. trollbridge|1d ago|context

Canadian telecom is very obstructionist to outside investment and basically structured around rent seeking for existing telecoms.
American telecom lets basically anyone come here and spend money.
91. trollbridge|2d ago|context

Canadian firms can easily access U.S. capital markets. So the question remains of why we aren’t building all kinds of data centers out in the tundra next to giant hydro plants.
92. trollbridge|3d ago|context

Hosting in Australia is not feasible at Australian electricity prices.
(Speaking as a not-so-proud Australian.)
93. Mossy9|3d ago|context

Cortecs (EU router) lists GLM 5.2 from Tensorix and Nebius https://cortecs.ai/detailedServerlessView/glm-5.2
So two European providers at least
94. SyneRyder|1d ago|context

Thank you, I haven't heard of Cortecs before. Might see if I can integrate this into my harness, or at least wire up Tensorix.
Also, I don't know how accurate that tokens/per/second measure for GLM 5.2 is, but if that is even remotely true, then I won't complain about the mild markup Tensorix have for GLM ;) Thank you for the heads-up!
95. dudisubekti|3d ago|context

You don't seem to like the "CCP" and their political views, but why are you using their sponsored models?
Why don't you exclusively host and use the open-weight western models, even if right now they don't perform as well?
I'd like to know the psychology behind this, because your actions feel contradictory to me.
96. listic|4d ago|context

Thanks for sharing your insight.
Mind if I ask you for a few vibe coding tips? I failed to solve your gh puzzle in the profile though.
97. lanthissa|3d ago|context

AI is the first technology that doesn't incentivize offshoring, and incentivizes co-location of talent.
A NYC dev and a dev in India have the same AI costs; based on the tokens/salary ratio, it becomes less of a comparative disadvantage to be in NYC.
Now combine that with the fact that AI makes the act of generating code less a % time of the job, and the ability to get/refine requirements more of the job and you have a decent shift.
98. Sammi|3d ago|context

Errr you just responded to someone that is offshore and is using AI to be much cheaper than local talent.
99. fbrncci|3d ago|context

The tokens/salary ratio is not relevant at all. Because while 200-500$ is a lot of money, it’s still a fraction of the salary you’d pay any dev in the world. It just comes out as a tooling expense. It also matters how those devs use the tools; you can’t assume everyone gets the same out of it. So that amount can last a day or it can last a month. I would say a dev in a developing nation would be more budget aware than someone being used to everything being priced in NYC rates.
For example I build other AI products and I have been hyper aware of the token spend of our users. I was going crazy seeing that some users were having 5$ conversations. So that was optimized and I found ways to use sub agents to get it down to 1-2$. Just for management asking me why I was worrying to begin with? The users using these are consultants being paid 120$ per hour. They have a daily 10-20$ token expense, no problem. “But amazing job on the cost reduction.”.. well 5$ for me is what I spend on food daily. While the consultant is slamming: “yes” 10 times in a chat , for whatever reason for the same cost. Would the NYC dev care as much natively? No.
You can still hire three devs in India for the price of a dev in NYC. Now you give them AI and you might only need 1-2. That makes offshoring even more appealing, not less. And the dev in India now having tooling to out compete local talent. Well that’s my reality (I am not in India though).
100. whazor|3d ago|context

The problem is that the differences between flagship and local models are compounding heavily. A 4% difference could be massive when you keep iterating on the same code base.
101. swiftcoder|3d ago|context

> The problem is that the differences between flagship and local models are compounding heavily
This depends a lot on how you work, and how much of the architectural thinking you do yourself.
People seem to lose sight of the fact that a flash model today is as powerful as a frontier model from a year ago. If you were happy with GPT 4.x, you should be ecstatic that equivalent power is now basically free...
102. wolttam|3d ago|context

I am one of those ecstatic folk :)
103. jmalicki|2d ago|context

I find that with a lot of the cheaper models, I end up spending a lot more time correcting the easy stuff.
If I am 100% spot on on the architectural stuff, I have anecdata that some of the frontier models might actually be cheaper than "cheaper" alternatives once you look at what it takes to get to good output, since they require less correction.
But that is on pure token costs. When you value the human overseer's time, there is just no competition. A model that is 10x more expensive that requires 10% less oversight is just a plain win.
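A back-of-envelope version of that argument, as a sketch only. The 10x and 10% come from the comment; the token budgets, oversight hours and hourly rate are illustrative assumptions, not measurements.

```python
def monthly_total(token_cost_usd: float, oversight_hours: float, hourly_rate_usd: float) -> float:
    """Total monthly cost: tokens plus the human time spent overseeing the model."""
    return token_cost_usd + oversight_hours * hourly_rate_usd


HOURLY_RATE_USD = 50        # assumed value of the overseer's time
CHEAP_TOKENS_USD = 10       # assumed monthly token spend on the cheaper model
CHEAP_OVERSIGHT_HOURS = 40  # assumed hours of correction per month

cheap = monthly_total(CHEAP_TOKENS_USD, CHEAP_OVERSIGHT_HOURS, HOURLY_RATE_USD)
# 10x the token cost, 10% fewer oversight hours (40 -> 36).
frontier = monthly_total(CHEAP_TOKENS_USD * 10, 36, HOURLY_RATE_USD)

print(f"cheap model:    ${cheap}")
print(f"frontier model: ${frontier}")
print("frontier wins" if frontier < cheap else "cheap wins")
```

Under these assumptions the frontier model wins because the 4 saved hours ($200) outweigh the extra $90 in tokens. With only a few hours of oversight per month, or a much lower hourly rate, the result flips.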
104. swiftcoder|2d ago|context

> A model that is 10x more expensive that requires 10% less oversight is just a plain win
I think you are drastically underestimating the cost delta here. We're talking models we can run pretty much continuously for $10/month in tokens.
105. jmalicki|2d ago|context

For $400/mo I can run multiple continuous sessions of frontier models.
106. brian-armstrong|3d ago|context

I read these stories and I can never figure out how people are managing to use these $200 plans. If I really go full bore, I can sometimes max out the $20 plan. Even then, it already produces more code than I can reasonably review and merge.
107. ipaddr|3d ago|context

I maxed out my ChatGPT Plus in the first week, and that included an SMF forum rewrite. Trying my best, I haven't been able to max it out again. Things are set up so that you need to max out your 5 hour window multiple times, which becomes a job in itself.
At work I'm struggling to keep my Claude bill around $500.
108. girvo|3d ago|context

Simple: a lot of the people claiming they’re reviewing the output of these models are lying.
Also if you run the “loops” they’re now yapping about, it will burn through enormous amounts of usage as well.
109. hgomersall|3d ago|context

I can't even keep up with the chain of thought needed to manage a single session, let alone review. I typically never exceed 30% of a 5x plan. Fable took me almost to the limits, but not Opus. Claude design hits things harder, but still not to saturation.
110. theoli|3d ago|context

Exactly this, it’s the loops. The first 50k tokens of a task are by far the most valuable. But when left to run independently, the agent will consume millions of tokens of error messages from running tests and discovering a minor syntax error, a missing import, a method call with incorrect parameters, etc. Then it will write some helper program while debugging the main task and get into the same loop debugging minor errors in the helper. From my experience, the vast majority of tokens consumed by Claude Code on totally independent tasks are consumed fixing minor mistakes it just made.
111. RugnirViking|3d ago|context

Do you do it for a job (8 hours a day)? And do you work in large, mature projects (more than 5 team members)? A big part of it is dealing with frankly terrible architecture and 15 people's different ideas of how things should work (and the spam they've been able to generate with their own agents makes this worse).
112. alpineman|3d ago|context

With open weight models there is true inference competition. Whoever can serve the model at the lowest price. And the consumer wins. Capitalism, served by China.
113. narrator|3d ago|context

The tokens cost the same everywhere on earth. This does hurt some cost advantages of outsourcing when tokens start to become a bigger part of development costs.
114. giancarlostoro|3d ago|context

As much as I don't like Mark Zuckerberg, part of me wishes he would get his head in the game and compete with these models. He's literally got all the capability to do so, and he could easily sell the model through deals with GCP, AWS, and Azure. Hell, I feel like Amazon needs a hot model they can host that's exclusive to them; maybe he can work something out with them. Whatever the case, it seems so glaringly obvious to me. I'm not sure why he hasn't taken a stab at competing with Claude Code, or at least with frontier open models, and then cutting a deal with cloud providers to recoup the costs of maintaining said models.
He's sitting on a frontier model letting it burn a hole in his wallet that could actually pay for itself.
115. khurs|3d ago|context

Meta internally have been using Google Gemini
"Meta has been using Google’s Gemini large language model for most of its moderation and customer support, but staff have recently been told to switch to Meta’s new foundational model, Muse Spark, the people said."
https://www.ft.com/content/39251a31-4a9d-4870-b86c-dc6353d67...
116. giancarlostoro|3d ago|context

It feels really insane to me that they have a model that could be better, but it's just sitting there burning a hole in his wallet instead as he chases trying to recreate Grok's companion thing.
117. cameldrv|3d ago|context

Yes, but you’re paying with your data unless you’re hosting with a provider you trust or self-hosting.
118. sixothree|3d ago|context

My first instinct has been - well this is an open source project, what does it matter. But even then, I am guessing that using their service even for open source projects still provides them some value.
119. cookiengineer|1d ago|context

Kind of funny that you're assuming that you are not paying with your data in both cases.
Do I need to remind you how LLMs are being trained? ...or that Anthropic claimed their codebase is 100% vibecoded, making it uncopyrightable by their own logic? ...or that Anthropic took down all Claude Code leaks they could find using DMCA takedown notices? ...or how do you think the caching mechanisms work when there's allegedly no data stored to be able to cache it?
I'm just saying. Anything you build with online models is their training data anyways. Assuming otherwise is pretty stupid at this point.
120. themgt|4d ago|context

I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context.
But the reasoning traces became increasingly hilarious, with it getting confused and going in loops, doubting itself. I began to feel almost sad; it was like listening to the internal monologue of someone with an anxiety disorder.
It made pretty good progress but wound up going in a lot of goofy loops and doing things a bit "off" from standards I'd hoped it would infer, and finally started going a bit nuts, "This is very confusing.", "OH WAIT", seemingly hallucinating a whole side-quest that didn't make sense and looking at making internal system changes to try to achieve its (now very confused) goal when I pulled the plug.
Without seeing the reasoning traces from Claude/GPT it's hard to really know, but it definitely didn't feel like the same quality of reasoning, even if dogged persistence does wind up actually working eventually.