The gap between open weights LLMs and closed source LLMs (blog.doubleword.ai)

300 points|by kkm|1d ago|241 comments|Read full story on blog.doubleword.ai

Comments (241)

1. samat|1d ago|context

Article confuses open source models with open weights models.
Not the same thing.
It’s used correctly in the article’s body, but the title is misleading.
2. NitpickLawyer|1d ago|context

Literally no one cares. There are "full" open certified GMO free grass fed training data blah blah models. Apertus, Olmo, etc. No one cares. For all intents and purposes people use the term to describe a model that you can run locally and are allowed to modify and re-release. The rest is useless semantics. No one can "rEpRoDuCe" a model anyway.
3. throwuxiytayq|1d ago|context

No-one cares to quit social media or stop using Windows, but it’s a goal worthy of discussion all the same.
The name is bad, doesn’t even make any fucking sense and it gives open source a bad rep.
4. komadori|1d ago|context

I wouldn't say that no one cares, but obviously many fewer people care when the cost of "recompiling" a model from its open source training pipeline is so high. Also, if you only have the weights, you can still use them to generate training data for a new model (i.e. distillation), so it's inherently less locked down than closed source binaries were.
5. judge2020|1d ago|context

open source vs source-available. Companies taking an extremely cautious approach to AI can't use source data that is potentially a violation of copyright (pending worldwide court decisions and/or regulation on said topic). Although that cat is already out of the bag for basically every stock-traded company using LLMs trained on non-licensed data, so I don't see there being much actual risk in using them.
6. reinitctxoffset|1d ago|context

I was advocating for "available weight" as a value neutral term for a while.
I gave up. No one cares. And no one will ever tell the truth about the training anyways.
Substantial and growing freedom beats zero freedom ever again.
7. jackconsidine|1d ago|context

Achilles and the tortoise [0] is usually treated as a fallacy: if the tortoise has a head start, Achilles will never catch it, because in the time it takes Achilles to reach the tortoise's position, the tortoise has moved some distance further, ad infinitum. Obviously that's not what happens, because Achilles does pass the tortoise. I think it's a fallacy because the framing creates a fake asymptote (both of them will pass the point where they're approaching a tie).
In this case it may actually apply though, no? Open models get better from closed model distillation?
[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes
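The standard resolution of the paradox, written as a worked sum (the head start $d$ and the speeds $v_A > v_T$ are symbols introduced here for illustration): Zeno's stages are infinitely many, but stage $n$ lasts $(d/v_A)(v_T/v_A)^n$, a geometric series with ratio $v_T/v_A < 1$, so the total time is finite.

```latex
t_{\text{catch}}
  = \sum_{n=0}^{\infty} \frac{d}{v_A}\left(\frac{v_T}{v_A}\right)^{n}
  = \frac{d}{v_A}\cdot\frac{1}{1 - v_T/v_A}
  = \frac{d}{v_A - v_T}
```

The catch-up happens in finite time only because each stage shrinks by a constant factor; if the gap per stage stayed constant, the sum would diverge.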
8. igravious|23h ago|context

comparing a thought experiment about relative movement through an alleged continuum over the sum of infinitesimal quasi-instants to the release cadence and maturation of open weights to proprietary LLMs is super bizarro guy
9. profsummergig|1d ago|context

IMHO, the biggest problem with the future of open weights models is that, currently, open weights models are the result of philanthropy by some private org (e.g. DeepSeek).
The spigot can be turned off at any time.
Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.
10. Shitty-kitty|1d ago|context

It's just a smart business decision that allows their models to compete and gain market-share against much pricier private models. No philanthropy there.
11. foxglacier|1d ago|context

It depends how you define philanthropy - obviously corporations don't just donate such valuable products to the world to make it a better place, but in effect that's what they end up doing in their effort to gain market share or brand recognition. Actual human philanthropists are sometimes doing it for similar reasons of self-promotion.
12. Shitty-kitty|1d ago|context

Open source, Open weights, these are core business decisions.
13. NitpickLawyer|1d ago|context

Yeah, but the biggest plus for open models is that they can never be taken away. In other words, whatever capabilities they reach (even if there will never be another model), those stay forever. That can't be said for API-based models, where a provider can sunset models whenever they feel like it (e.g. gpt5-mini will soon be gone and replaced by a more expensive 5.4-mini; same for goog, etc).
And there will always be incentivised parties that release models. Nvda for one has every incentive to keep the nemotron line going, as they're directly profiting from people running this. And the models aren't really far from open SotA anyway.
Goog will probably continue to release the small models, since they'll use them for browser stuff anyway, and know that they'll leak. So for them it's a win-win to release the small models and gain some dev market share.
And the Chinese labs also have incentives to keep releasing models, and will likely continue to get gov support to do so (yay commercial wars between nations).
14. felooboolooomba|1d ago|context

> they can never be taken away
Your right to 3d print whatever you want is about to be taken away (in California).
What software you can run on your computer can already be restricted.
Absolutely everything can be taken away. The simplest way to remove open models is probably to declare them a tool that terrorists could use. Crazy? Yes, the world is totally crazy these days.
15. redox99|1d ago|context

That only affects people in California. Whereas Fable being shut down affects people all over the world.
16. anticorporate|1d ago|context

There's also, importantly, a distinction between what we are told we can no longer use and what can actually be taken away.
Open source and open hardware can be called illegal by a government, but, if we collectively invest our energy into open alternatives, they can't be taken away in the same sense. I can build a RepRap printer and I can use a local AI model. It's on all of us to make sure that the open alternatives are viable, maybe in the current global political reality now more than ever.
Making something illegal isn't a disincentive for everyone. When they start banning books, some of us start assembling printing presses.
17. echoangle|1d ago|context

Believe me, if the government wants to stop you from having access to something like that, they could do it. Just give people some incentive to report you and make the punishments really harsh, and everyone will be thinking really hard about how badly they want to have access.
18. dvngnt_|1d ago|context

They can stop piracy or child predators. what makes you think they can prevent access to running models that require no internet access to run
19. NemoNobody|1d ago|context

Piracy is in a practical Golden Age rn and the Epstein Files exist - so the Government doesn't really do either of those things very well at all.
Plus, for a certain type of person, "piracy" is more of a philosophical belief or political position: there are fundamentalist, very proficient "pirates" who will under no circumstances stop and are not doing it for money. There are obviously also an enormous number who are in it for the money; "big brand names" now reportedly make up as much as 63% of the advertising on illicit piracy sites. I'm too lazy to get the link, but that sentence ought to be enough if you want to look into that bizarre reality.
I'm not certain either of those things are in the Government's direct control - both require society at large to share the belief and essentially choose not to do said activities.
(Regarding your second example: unfortunately, most abusers are people the children know, and the "Epstein Class" was supposed to be just QAnon crazy conspiracy stuff. None of this is OK in any fashion. Both exist: one is local and entirely beyond the government's reach, while the other appears to have incorporated people from the government.)
My point is simply this - WE determine what the Government can do. What we believe matters more than anything else. Don't ever discredit The People's ability - we are pretty awesome.
20. taneq|1d ago|context

They can’t even stop people typing “can” when they mean “can’t”. :P
21. bijowo1676|1d ago|context

the government is not God, they can't do much beyond declaring anything bad.
It is on people to realize we have the ultimate power and oppose the overreach of government in all ways we can to keep our freedoms.
Freedom is not free, after all
22. anticorporate|1d ago|context

Well, sure. The same could be said of any freedom they want to take away. The responsibility is on us to preserve those freedoms. Free software, open hardware, right to repair, privacy tools, etc. will all be the weapons of the people in the fight against totalitarianism.
23. danny_codes|1d ago|context

Fortunately we have both a democracy and a constitution, making those sorts of things hard for the government to do.
24. Zetaphor|1d ago|context

Because that has worked so well for:
* Drugs
* Media piracy
* Alcohol
* Sex work
* Unlicensed gambling
The government is not an all powerful entity with absolute control over its people. Even in countries under past and present dictatorship there are examples of people getting access to what the government deemed as illegal.
25. echoangle|1d ago|context

I was thinking of this one:
https://en.wikipedia.org/wiki/Executive_Order_6102
Of course you’ll always be able to get access but the risk can be made so high that most people won’t try it.
There are countries that have the death penalty for dealing drugs and really severe prison terms just for having a small amount of drugs. There are still people who do it, but most people are effectively deterred because it’s just not worth it.
26. exe34|1d ago|context

Did that cause the complete disappearance of gold from private hands?
27. echoangle|1d ago|context

Probably not, but I never claimed that’s what happens. But for a regular person, it’s probably a high enough risk to stop doing the thing the government wants to disincentivize.
28. roenxi|1d ago|context

> but the risk can be made so high that most people won’t try it.
Possibly I've been misled about how easy it is to access illegal drugs. I get the impression most people have actually tried them, or at least have easy access if they feel like it. Although I haven't bothered to look up the stats, so maybe that is mistaken.
Hypothetically... say I rent out some server space in Russia, host GLM 5.2 and charge/pay for everything with crypto (one of the privacy coins). How exactly would the US government shut that little operation down? Or make it more risky for any participant than marijuana or torrenting? Even detecting it is an interesting technical challenge. It seems like it'd be low-skill and low-risk, and take insane resources on the part of the US to stamp out for something so harmless. The hydra would be growing heads faster than they could cut them off.
This isn't bars of gold; they can search my house all they like and there isn't going to be a lot in it. They would probably struggle to figure out who I am to do a targeted raid, let alone all the other small fry who could pull off a similar scheme.
29. woctordho|1d ago|context

Fun fact: Hacker News is officially banned in China, but I'm still talking here. There are plenty of techniques to work around region blocks. The reward for reporting somebody is comically called '50w' (500k CNY), and no one gives a shit about it in real life.
30. echoangle|1d ago|context

What’s the penalty if you get caught?
31. felooboolooomba|1d ago|context

They know. They just don't want an international incident to deal with. But if the shit hits the fan, they know.
32. Zetaphor|1d ago|context

Hello from the US! I'll never not be amazed by the fact that we live in a point in human history where language and distance are no longer barriers to the exchange of ideas, despite the efforts of our governments.

33. felooboolooomba|1d ago|context

> That only affects people in California

  First they came for the Californians
  And I did not speak out
  Because I was not a Californian

34. Y_Y|1d ago|context

  Then they came for me
  But it was unsuccessful 
  Because I was not in California

35. pyvpx|1d ago|context

That’s not how any of it (human nature) works
36. vitally3643|1d ago|context

Just like declaring piracy illegal stopped piracy and removed pirated materials from everyone's computers.
Everything cannot, in fact, be taken away. Don't propagandize yourself. Some things, like information, are free. Not even China can prevent all its citizens from accessing Western internet. USGov simply does not have the resources to find and audit every hard drive and USB stick in the country for illegal files. The internet cannot be censored 100% without literally cutting every cable and confiscating every radio.
The software that runs on my computer cannot, in fact, be restricted. It can be declared illegal, but there literally is no mechanism by which it can be enforced other than a government goon standing over my shoulder 24/7.
Some freedoms really cannot be removed without utterly implausible amounts of effort. Arguing otherwise is helping to erode freedom. So stop it.
37. Simran-B|1d ago|context

Remote attestation?
38. advael|1d ago|context

On PCs, the best you could really do is restrict access to certain websites on certain boxes with TPMs the users can't disable. Remote attestation can lock people out of your stuff, but not out of their own stuff. For that you need control of the device. Of course, most mobile phones aren't easy for the user to have control of, but most PCs still are, so long as you scrub the rootkits (e.g. Windows) off 'em
39. bijowo1676|1d ago|context

it doesn't even work on the government's own servers to protect their own shit
40. NamlchakKhandro|1d ago|context

You wouldn't download a car?
41. psychoslave|1d ago|context

In Soviet Russia, one couldn't download a car. In modern America, cars upload you.
42. citadel_melon|1d ago|context

Maybe we can each get assigned an AI government goon to look over our shoulders 24/7. Maybe each neuron in my brain will have its own subagent goon. Each mitochondrion gets its own subagent government goon. The government will perfectly model my every move. They will perfectly model the aroma of my asparagus piss.
43. jgalt212|1d ago|context

> What software you can run on your computer can already be restricted.
Are laws that are inherently unenforceable even laws?
44. fsflover|1d ago|context

With the age verification and whatnot, these laws are getting more enforceable with time.
45. kageroumado|1d ago|context

There’s going to be more and more propaganda about open models being a tool to program your children into being Chinese spies or some other absurd reason, and then a new beautiful law will be enacted unanimously, banning their use. And “thankfully," child-protecting measures will by then be implemented at the OS level.
46. gspr|1d ago|context

Anything can be taken away, yes.
But in a free and democratic society, there's an enormous difference between "the democratically chosen state powers may take something from me" and "a private entity takes away something from me on an inscrutable whim with no recourse".
Neither is good if you don't want the thing taken away. But removing the second mechanism is still a laudable goal.
47. BryanBigs|1d ago|context

I think one is worse than two. Only governments have the power of violence.
48. felooboolooomba|1d ago|context

Currently there's very much a blur between "democratically chosen state" and "a private entity" (with a lot of money).
49. gspr|1d ago|context

And we want them to. And to have the power to take things away. It's all a matter of degree and details.
50. andai|1d ago|context

Well that's true isn't it? If your goal is to blow up the proverbial Death Star, you want to be running your AI locally, right?
51. tarpitt|1d ago|context

Your right to 3d print has been taken away, but not your ability.
52. drnick1|23h ago|context

> What software you can run on your computer can already be restricted.
Yes, over my dead body.
53. jason_oster|9h ago|context

> Your right to 3d print whatever you want is about to be taken away (in California).
What are they going to do? Fine me for not updating my printer's firmware?
54. oompydoompy74|3h ago|context

Legality doesn’t matter as long as you use open source tools. Buy hardware that you can load your own software or firmware on. Keep backups of the software that you use. Nobody can take away an open weight model sitting on my NAS. Almost everything my family uses is self hosted. Stop depending on the government and companies to do the right thing.
55. jfim|1d ago|context

True, but the capabilities and knowledge of that model are also frozen in time, so the value of that model declines over time.
A model that writes code without knowledge of any language or library changes for half a decade is less useful. A 2021 era chatgpt would be quite quaint in 2026.
Right now the Chinese labs might have incentives to release their models for free, and maybe Google is happy to release open weights today, but I'm sure there are already bean counters at Google salivating at the idea of having Gemini in Chrome as part of a Google AI monthly subscription just like YouTube premium and other Google subscriptions.
56. api|1d ago|context

Fine tuning and updating is far cheaper than training from scratch.
57. charcircuit|1d ago|context

The weights are not frozen in time. You can train the model on new data. It's just a matter of economics of whether you have a leading lab pay for the training or you pay for it. For the past few years having the labs do it has been the economical choice but if they stop doing so the choice will shift back to the users.
58. teleforce|1d ago|context

>True, but the capabilities and knowledge of that model are also frozen in time, so the value of that model declines over time.
Correction: The capabilities and knowledge of that model can be improved via self-distillation, so the value of that model increases over time.
This is where I think self-distillation is the main way forward, and probably the second-best thing that has ever happened to AI/LLMs after the transformer.
With self-distillation, the value of open weights models will increase over time through sub-specialization via post-training and fine-tuning.
Please check these very promising recent works and results from MIT/ETH, UCLA and Apple [1], [2], [3]. For example, the MIT/ETH self-distillation approach was demonstrated on a single H200 GPU. Apple's approach is even simpler; it's simply called Simple Self-Distillation (SSD), pun intended.
[1] Self-Distillation Enables Continual Learning:
https://arxiv.org/abs/2601.19897
[2] Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models:
https://arxiv.org/abs/2601.18734
[3] Embarrassingly Simple Self-Distillation Improves Code Generation:
https://arxiv.org/abs/2604.01193
59. anon373839|1d ago|context

> capabilities and knowledge of that model are also frozen in time
I think this matters less than you think. If the spigot turns off, open LLM research is going to have a powerful incentive to focus on post-training to refresh stale base models. And post-training, in general, is so much cheaper and faster than pre-training anyway. I was pretty surprised to learn that GLM-5.2's entire RL training (the part that makes it reliable at agentic tasks) was completed in just TWO DAYS.
60. NemoNobody|1d ago|context

If the world ends, all I have to do is power my desktop and I'll have my local models: a decent iteration of DeepSeek and a few smaller models, some focused, some just older versions. Having several is key, though. They can be cross-referenced to limit hallucinations and inaccurate information, which means I can confidently say that I have on my desktop all of human history, knowledge, discoveries, maths, and languages, at least in summary or truncated form (another bonus of multiple models: together they often produce more comprehensive output than one model alone). And none of those models has any restrictions beyond the broadest limits allowed by current laws, so practically no limits (I bet I could get them all to explain splitting the atom with minor effort).
I realize that my amazing tool/system of local AI is out of date; I still very much like having it, and it is not at all a bad thing to have. In theory, everyone ought to have a local backup, just in case.
In this one, albeit extreme, example, the fact that people have this would most definitely matter in the event of a societal collapse. Not everyone will have it, and can they run those giant data centers off a few solar panels like a desktop PC?
For this one existential reason alone, I recommend everyone at least play around with local models enough to have a few of them working.
61. UncleOxidant|1d ago|context

> Nvda for one has every incentive to keep the nemotron line going
Their releases so far have been kind of lackluster compared to Qwen and other Chinese models. My suspicion is that Nvidia won't be releasing models that appear to compete with frontier models, because that would upset their big customers.
62. anon373839|1d ago|context

Nvidia's future incentives are not clear to me. Their big customers are actively working to develop custom silicon, see e.g. "Open"AI's Broadcom announcement. The more independence their whale customers attain, the more attractive cutting them off at the knees and selling sovereign AI inference hardware directly to businesses and consumers becomes.
This is pure speculation, but I have a hunch that the Nemotron line is intended as a shot across the bow, and that's why its capabilities have been strong but not quite open-frontier level.
63. Bolwin|1d ago|context

> Yeah, but the biggest plus for open models is that they can never be taken away. In other words, whatever capabilities they reach (even if there will never be another model), those stay forever.
In theory yes, but the average person can't really run the big open models.
This is already happening: try to find a provider that still hosts older open models, especially less popular or superseded ones.
For me personally, I've been trying to access Kimi K2-0711. There seems to be only one provider left on openrouter (NovitaAI) and 3/4 requests error out
64. veqq|1d ago|context

> NovitaAI is a low cost provider who's strategy seems to be to host as many models as possible for the lowest cost possible so that OpenRouter's routing algorithm will default to them as often as possible. The problem is that they clearly don't spend much time on actually testing and configuring all of the models they provide. There's a reason they are very often the first provider to host a new model. I also suspect that they run models at lower quants than they claim but that is not something I can prove. https://www.reddit.com/r/LocalLLaMA/comments/1mk4kt0/be_care...
65. alfiedotwtf|1d ago|context

> And the chinese labs also have incentives to keep releasing models
Not really.
66. CTDOCodebases|1d ago|context

Is this a valid point when we live in an evolving world? Language changes, facts change, etc. Or can everything just be grabbed from webpages and stored in the context window?
67. fridder|1d ago|context

We need a SETI@Home but for model training
68. kamranjon|1d ago|context

Have been thinking about this a lot lately.
69. 0x3f|1d ago|context

Consumer hardware over the internet is not really suitable for this, AFAIK.
70. baby_souffle|1d ago|context

There's some really early days work on making training loops robust to failure but they all have trade-offs right now.
I remain hopeful that we'll be able to democratize the entire tech stack for this tech.
71. Azantys|1d ago|context

I think model training is pretty hard to do efficiently on a vastly distributed network. If the model can't fit into a node's VRAM, performance becomes so bad it's useless, so a distributed model could only be trained properly if its size doesn't exceed the VRAM of most of the nodes. Maybe there is a different way of doing training, but this is the only way I can see. And it would still be much worse than just using a big datacenter where everything is fully interconnected. BOINC projects work great because they usually need only small amounts of compute and memory per task, so every old desktop and laptop can contribute. Training a model that can compete and is not tiny requires a lot of both compute and memory. BOINC tasks usually take minutes, sometimes hours, not weeks or months like training a model from scratch. But something like 7B or smaller could maybe be trained like this. I'm not sure, but I think someone is already working on something like this; I just don't remember the name of the project.
72. wuschel|1d ago|context

My understanding is that in addition to your comment and the development of a method to separate the training data for distributed learning, the latency/bandwidth of systems connected on the internet is a challenge, too. Information has to be sent around before and after any hypothetical number crunching.
73. charcircuit|1d ago|context

You would probably not be able to go down to the scale of a single PC, but it should be possible to train models focusing on different specialties on different nodes and then have them periodically "mix" together.
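A minimal sketch of the periodic "mix" idea, assuming a FedAvg / local-SGD style scheme (my framing, not necessarily what the commenter meant): each node runs a few local training steps toward its own specialty, then all replicas are averaged element-wise into one shared model. The dict-of-lists parameter format, the toy quadratic objective standing in for real training, and every function name here are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical parameter format: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def average_params(replicas: List[Params], weights: Optional[List[float]] = None) -> Params:
    """Weighted element-wise average of several model replicas (the 'mix' step)."""
    if not replicas:
        raise ValueError("need at least one replica")
    if weights is None:
        weights = [1.0] * len(replicas)
    total = sum(weights)
    mixed: Params = {}
    for name, values in replicas[0].items():
        mixed[name] = [
            sum(w * r[name][i] for w, r in zip(weights, replicas)) / total
            for i in range(len(values))
        ]
    return mixed


def local_steps(params: Params, target: Params, steps: int, lr: float) -> Params:
    """Toy stand-in for local training: gradient descent on sum((p - t)^2)."""
    out = {name: list(vals) for name, vals in params.items()}  # copy; don't mutate input
    for _ in range(steps):
        for name, vals in out.items():
            for i, v in enumerate(vals):
                grad = 2.0 * (v - target[name][i])
                vals[i] = v - lr * grad
    return out


def train_and_mix(start: Params, targets: List[Params],
                  rounds: int, steps: int, lr: float) -> Params:
    """Each node trains locally on its own 'specialty', then all replicas are averaged."""
    shared = start
    for _ in range(rounds):
        replicas = [local_steps(shared, t, steps, lr) for t in targets]
        shared = average_params(replicas)
    return shared
```

With `start = {"w": [0.0, 0.0]}`, node targets `[1.0, 0.0]` and `[3.0, 4.0]`, and `rounds=50, steps=5, lr=0.1`, the shared weights settle near `[2.0, 2.0]`, the average of the two specialties. Plain averaging behaves this nicely only because the toy objectives are convex; with real networks, replicas that drift far apart between mixes can average into a worse model, which is one reason the mixing interval matters.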
74. nmfisher|1d ago|context

With current paradigms, yes. I'm hoping to see more focus on architectures that are more amenable to distributed training in the near future.
75. ainka-ainka|1d ago|context

Here's a project trying that - https://nousresearch.com/nous-psyche
76. alecco|1d ago|context

Also https://pluralis.ai/ has distributed training (though they reach limits within seconds).
And I think https://allenai.org/ has something like this, too.
77. calebkaiser|1d ago|context

This has been a (noble) goal of lots of different projects in the community for a long time. Federated learning projects like Flower have been chipping away at it for years. There are many, many hurdles to clear before anything in this area is really feasible as an alternative, but I applaud everyone who works on it.
78. g023|1d ago|context

Slap the gpus in a car and offset the cost of ownership by supplying the grid for GPU power on the go. Either get paid in rebates or tokens. Contribute to a distributed training/inferencing network.
79. recursive|1d ago|context

This seems backwards. Access to Fable can be removed. I don't see how an open weight model can ever be put back into the bag though.
80. Smaug123|1d ago|context

The model itself, sure; the comment is about the production of more advanced models (to keep open weights near the frontier).
81. recursive|1d ago|context

The proprietary spigots can be turned off at any time also. To me, that seems more likely.
82. jfaat|1d ago|context

I'd call 'more likely' an extremely safe take given that it's exactly what's happening right now
83. ItsMonkk|21h ago|context

Governments can always offer prizes, and any model that sufficiently meets the criteria of the prize would win it and claim a large cash prize. Once claimed, the model would then be free to all forever.
84. ForHackernews|1d ago|context

It's not pure philanthropy: https://gwern.net/complement
85. notnullorvoid|1d ago|context

> Until there's some sort of "community owned hardware"
Or until some bright people figure out drastically more efficient means of training.
86. UncleOxidant|1d ago|context

> The spigot can be turned off at any time.
True. And it's possible that this has already happened at Alibaba Qwen - at least for the smaller models that people had a chance of running at home (122B and smaller).
87. gunalx|1d ago|context

We'll see. The Qwen team has always released a few close-to-SOTA but proprietary models in between their open releases. We did get 3.6 35B and 27B, so it's not all set in stone yet.
It's highly unlikely we get another open Llama model, though, after the Llama 4 flop, even if their Muse Spark seems pretty good.
88. trollbridge|1d ago|context

Has it though? They've been releasing free models interspersed with the "Max" models for quite some time.
89. slashdave|1d ago|context

Training these models is not a "hardware" problem.
90. nomel|1d ago|context

I think that simplifies it a bit. You can't train without hardware, which is why the Chinese companies are illegally importing Nvidia cards [1].
[1] https://www.theinformation.com/articles/deepseek-using-banne...
91. adrian_b|1d ago|context

The usefulness of the smuggled NVIDIA GPUs for AI has greatly diminished, because NVIDIA's elimination as a competitor has allowed domestic GPU production to grow.
Moreover, China has just demonstrated a supercomputer faster than any US supercomputer. Unlike US supercomputers, which rely on GPUs, it achieves its high computational throughput with custom CPUs designed in China (implementing an Armv9-A ISA with SME, i.e. the Scalable Matrix Extension, and with BF16/INT8 operations for AI).
The CPUs used in that supercomputer can reach both a computational throughput and a memory bandwidth sufficiently high for training any LLMs (they have fast HBM memory). Their only disadvantage in comparison with the best NVIDIA GPUs is a slightly lower energy efficiency, but China has abundant cheap energy so this is not a serious disadvantage for them.
92. menaerus|1d ago|context

SIMD programmers must be paid very well in China, then... Jokes aside, some 2 or 3 years ago I thought it was becoming inevitable for CPU designs to turn into extended versions of their already quite capable vectorized execution units.
93. trollbridge|1d ago|context

There is significant evidence they are transitioning to Huawei and other home-grown CPUs and NPUs.
94. 0xbadcafebee|1d ago|context

It was announced in April that Deepseek v4 ran at launch on Huawei Ascend chips. They then shared details of their implementation with other Chinese providers to strengthen the Chinese market against import restrictions (more people buying Huawei leads to more production, cheaper capacity)
95. jmyeet|1d ago|context

How is this a complaint? Once you have the model, you have the model. Download DeepSeek-R1 671B and you have it. You might not get improvements in the future, just like you may not ever get a future release of an open source project. Is that an indictment of open source?
But consider the alternative. OpenAI and Anthropic can shut off your account or API key at any time for any reason. How is this better? You have way more security when you're running your own model.
96. girvo|1d ago|context

> Download DeepSeek-R1 671B
Dunno why you'd want to though, considering v4 Pro (and even Flash) outpace it drastically
97. throwawayffffas|1d ago|context

I don't think that's the case, it's not philanthropy, they are getting something out of it. The labs are learning from one another from the shared models.
Plus I am certain it makes financial sense. I am guessing here, but fully utilizing a subscription's limits probably costs the operator more than the subscription revenue; that is why Anthropic is making such a big stink about the Chinese data harvesting. By releasing the weights, you relieve yourself of that burden, because the competition does not need to hammer your subscription service: they can just download your model, analyze it, and run it all day.
Also, for the largest models it makes no sense to run them yourself unless you are a major player. Renting the hardware costs tens of thousands of dollars, ludicrously more than their subscription, and buying the hardware to run them costs hundreds of thousands of dollars.
98. yorwba|1d ago|context

The primary benefit of releasing weights is the attention it generates. Some people have the hardware to run it, try it out because it's free, tell everyone about it, and then even people who don't have the hardware might get interested and pay the original developer. So it's a marketing expense, basically.
The most popular LLM product in China is ByteDance's Doubao. You probably haven't heard of it since they never released weights and it doesn't benchmark particularly well, but ByteDance already had enough users on its other apps that it could advertise Doubao to them directly.
99. bijowo1676|1d ago|context

I believe we are still very, very early in AI development, so it doesn't even make sense to close models.
Open source and open weights models are how you can harness the potential of all humans to continue developing and improving the SOTA of your model. Literally every student on the planet wants to play with and improve these models for their own use case.
Plus the ecosystem: once you have users building on your open weights model, that is a giant leverage point in itself.
100. FooBarWidget|1d ago|context

That's not meaningfully different from philanthropy. If Chinese AI products generate sufficient revenue with cheaper marketing strategies, then the incentives for releasing open models will go away.
Right now, there is a shortage of talented researchers, and the attention that open models generate allows them to attract good hires. But this is a fragile dynamic that can break in the future. It's not very different from commercial open source work, except it's much more capital intensive and lower volume.
101. gwerbin|1d ago|context

Isn't this also true of a lot of FOSS software and libraries? TensorFlow and PyTorch, for example, among many others.
102. 40four|1d ago|context

We should address the elephant in the room. The problem with the future of open weight models is not that they are created as a result of philanthropy by some private org. All of the top contenders are created by the Chinese government.
I don't think we should describe these companies as simply releasing these highly capable open weight models out of the goodness of their hearts.
103. psychoslave|1d ago|context

Bhutan hasn't released any model yet as far as I know, if the level of care a government gives to its people's actual happiness is what we are supposed to be concerned about here.
Among other countries that consistently rank at the top for gross national happiness are Finland, Denmark, Iceland, Switzerland, and the Netherlands. Their current ability to release open models is there for anyone to observe.
The USA unfortunately continues to fall quickly in the World Happiness Report rankings, and that's not because many other countries made great progress.
104. nmfisher|1d ago|context

None of those companies are created by the Chinese government. They're obviously subject to the Chinese government, whose whims may change at any given moment, but as we're seeing at the moment, so are the American companies.
And while I don't have a very positive view of the Chinese government, last I checked, they haven't been dropping bombs on innocent schoolchildren recently.
105. cheesecakegood|1d ago|context

Bombs are a bit of a non sequitur here. The point is that Chinese companies are demonstrably hostile to American ones historically (and threatening in some specific structural ways to the American consumer). The presentation may be similar but to attribute American ethics to a Chinese decision is dubious.
106. defrost|1d ago|context

Isn't the nature of capitalism such that many companies are demonstrably competitive (aka 'hostile' ?) with one another?
Chinese companies have also demonstrably pandered to the American consumer for many decades now.
To further muddy the waters, US companies have, some would argue, been openly hostile to the American consumer via monopoly practices, restricting access to purchased devices, etc.
107. 40four|1d ago|context

Hey I hear you, I’m not trying to make this a political argument of who’s dropping bombs on who, or the American government is better than or worse than the Chinese. But what I said is a matter of fact.
We can debate the semantics of whether “created by” or “subject to” means the same thing in regards to the Chinese government, but that is neither here nor there.
I’m happy to take your wording that they are obviously “subject to” the Chinese government. That logically means they are subject to carrying out the CCP’s long term strategy. And as you said “whose whims may change at any given moment”.
That directly relates to the OP’s fears, that these models could be taken away at any given moment. “The spigot can be turned off at any time” as they put it.
Or another possibility is they will never turn the spigot off, but they will engineer it in a way to best achieve their goals. My bet is that’s the more likely outcome.
I simply disagree with the OP’s description of the problem as “open weights models are the result of philanthropy by some private org”. I think the problem is much more complicated than that.
108. nmfisher|1d ago|context

What you said is not "a matter of fact" because it's simply untrue.
These companies were not "created by" the Chinese government. Specifically, I'm talking about DeepSeek, Zhipu, MiMo (Xiaomi), Kimi (Moonshot), Qwen (Alibaba). "Subject to" certainly does not mean "created by", it just means that the government ultimately has the power to tell them what to do. The US government has the exact same power, hence why none of us has access to Fable at the moment, but you wouldn't say that OpenAI or Anthropic were "created by" the US government.
There is zero evidence that open-sourcing their models is part of some grand strategy from the Chinese government. In DeepSeek's case, I think it probably is a genuine commitment to open source, for the others I think it's probably just a convenient business decision to gain market share (though Zhipu is probably more aligned, given their academic lineage from Tsinghua).
At some point in the future, the Chinese government may decide it's not in their national interest for Chinese companies to open source their frontier AI models, and DeepSeek et al will be restricted from doing so. I'm well aware of that. But until that point in time, the rest of the world is unanimously better off with open-source Chinese models. We should put as much reliance on Chinese companies long-term as we do on American companies - zero.
109. mkozlows|1d ago|context

Go to chat.z.ai right now and ask it about what happened in Tiananmen Square. Do you think it's good for the world if software is written by the model that answers that question that way?
110. nmfisher|15h ago|context

It makes no difference to me if a coding model has an opinion about Tiananmen Square, Americans bombing schoolgirls in Iran, how many genders there are, or anything else other than designing and writing code.
A coding model is a tool, as long as it follows its user's instructions for building software I don't really care what opinion it spits out about world history.
Yes, it is important to ensure that there aren't hidden guardrails affecting its ability to perform its function. But the great thing about open weight models is that you can actually evaluate this rigorously, and retrain to remove any prejudices you don't like.
111. alfiedotwtf|1d ago|context

Exactly my worry. I’m optimistic that in the future the EU, the EFF, the GNU project, or the Linux Foundation could be the umbrella to run a LARGE open model for everyone.
It’s sad to think that Mozilla spent years and millions on virtual reality and AI. They would have been perfect for this, but let’s face it: who knows if Mozilla will even be around 5 years from now.
112. Eridrus|1d ago|context

I think the bigger issue is the ever increasing capital requirements, which may cause even the closed weight companies to fall away from the frontier, e.g. Google & Meta are barely hanging on. For Google it feels a bit existential to remain at the frontier, but even then they're barely there.
I hope that we find ways of continuing to improve these models besides continuing to exponentially increase capex spend until all but one of your competitors falls away.
113. Onavo|1d ago|context

Google and Meta's failures are more due to mismanagement, no?
114. disgruntledphd2|1d ago|context

At times of rapid change, having a working business model can be a disadvantage.
For instance, Facebook were able to optimize their core ads product for mobile, in a way that was much more difficult for Google.
115. Eridrus|1d ago|context

I have no idea what's going wrong inside Google/Meta, they certainly have capital. But when you need this much cash, not many people are going to be able to have a shot on goal. It would not surprise me if Meta threw in the towel. Microsoft and Apple aren't even trying.
116. ehsankia|1d ago|context

Isn't another issue that most successful open models are distilled from closed models, but closed models are putting more and better safeguards against distillation?
117. alecco|1d ago|context

> Until there's some sort of "community owned hardware"
The hardware is already available for renting at reasonable prices. We need community funding. I wish people pooled a fraction of the money they burn on local GPU rigs on funding training/testing/etc.
A big problem is the same as in open source: it's way too atomized. Just one competitive ground-up community LLM would require tens of millions of dollars. But who gets to pick?
IMHO the only chance is highly specialized, smaller LLMs instead. And those still cost millions to train.
And remember, LLMs stay competitive for only a handful of months.
118. matheusmoreira|1d ago|context

I wish we had some kind of distributed training capability... Like Folding@home, but for LLMs.
119. woctordho|1d ago|context

See the recent advance of DiLoCo at Nous Research and Prime Intellect.
120. matheusmoreira|17h ago|context

Really interesting!! This gives me hope!