Ask HN: Running local LLMs? What's your model and hardware (news.ycombinator.com)

11 points|by alfiedotwtf|1d ago|9 comments

Comments (9)

1. roscas|1d ago|context

qwen3-coder:30b
codestral:22b
codegemma:7b
codellama:34b
north-mini-code-1.0:q8_0
laguna-xs.2:latest
Currently testing the models above on an AMD Ryzen 5 3600X with 48GB of RAM and an NVIDIA 3080 with 10GB of VRAM.
Favorite model is laguna-xs.2 because it is really fast on CPU and very good.
2. alfiedotwtf|1d ago|context

If you’re able to run qwen3-coder, have you thought about 3.6 27B or 35B? Looking at benchmarks, 3.6 looks like it’s gained a lot over qwen3-coder.
3. alfiedotwtf|1d ago|context

Oh! Looks like I’ve been sleeping on Laguna!
4. cyanydeez|1d ago|context

qwen 3.6 35B on 128GB strix halo.
perfect speed to not melt the brain and can extend context for well scoped projects.
need to work with dynamic context pruning to ensure full reuse in larger projects.
deer-flow seems to work well for project scoping and high level evals. opencode for coding.
5. msalsas|21h ago|context

Qwen3.6 35B A3B on MSI laptop with RTX 5080 (16GB VRAM)
6. dlcarrier|20h ago|context

I have a 16 GB Intel A770 and before that used an AMD Mi25.
I've had SDXL stable diffusion working on both, but struggled to get LLMs going. The entire field of software development is already well known for its technical debt and lack of interest in testing (see also: https://xkcd.com/2030/), but anything having to do with AI brings it to an all new level.
You pretty much need to run the same stack the developer used, down to the correct outdated version of Python and every library in use, as well as the same GPU drivers and OS version, or the whole thing falls apart.
Of course, various hardware vendors port everything to their hardware, so I could for example run Intel's OpenVINO version of llama.cpp, but I have the wrong Linux version to run their binaries. I didn't want to put in the effort of running a new OS, and when I tried building it myself my computer couldn't finish compiling it overnight, so I gave up on it.
Of course, I could put it all in a VM, but then I'd take a performance hit and need even more RAM.
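One way to make the "same stack" problem above visible is to record the interpreter, OS, and library versions of an environment and compare them against what a project expects. The snippet below is only a minimal sketch: `environment_fingerprint` and `diff_packages` are hypothetical helper names, the package list is whatever you choose to pass in, and it does not cover GPU driver versions.

```python
import platform
from importlib import metadata


def environment_fingerprint(packages):
    """Return interpreter, OS, and installed versions for the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": versions,
    }


def diff_packages(expected, actual):
    """List (name, expected, actual) for every package whose version differs."""
    return [
        (name, want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    ]


if __name__ == "__main__":
    # Package names here are examples only; substitute the ones a project pins.
    print(environment_fingerprint(["numpy", "torch"]))
```

Comparing the `packages` dict from a working machine against your own with `diff_packages` gives a short list of mismatches to chase first.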
7. vunderba|16h ago|context

Yeah. It's an unfortunate reality, but anything other than NVIDIA CUDA is effectively a second-class citizen where a lot of this generative stuff is concerned (though AMD with ROCm is getting better).
8. da-x|4h ago|context

Qwen3.6 27B on RTX PRO 6000
9. K0IN|1h ago|context

Qwen 3.6 27B @ 90k context on an RTX 5090 (vLLM)
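
For anyone matching model sizes to the hardware mentioned in this thread, a rough back-of-the-envelope for the weights alone is parameter count times bits per weight, divided by 8 to get bytes. The sketch below assumes that simple formula; `weight_memory_gib` is a hypothetical helper, the bit widths are illustrative assumptions (nobody above states which quantization they use), and it ignores KV cache, activations, and runtime overhead, which grow with context length.

```python
def weight_memory_gib(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB.

    Excludes KV cache, activations, and framework overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Bit widths are assumed for illustration, not taken from the thread.
    for bits in (4, 8, 16):
        print(f"27B at {bits}-bit: ~{weight_memory_gib(27, bits):.1f} GiB")
```

The real footprint will be higher than this number, so treat it as a lower bound when deciding whether a model fits in a given amount of VRAM.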