NewsLab
Jun 28 17:12 UTC

Ask HN: Running local LLMs? What's your model and hardware (news.ycombinator.com)

11 points | by alfiedotwtf | 9 comments

Comments (9)

  1. roscas
    qwen3-coder:30b

    codestral:22b

    codegemma:7b

    codellama:34b

    north-mini-code-1.0:q8_0

    laguna-xs.2:latest

    Currently testing the models above on an AMD Ryzen 5 3600X with 48GB of RAM and an NVIDIA RTX 3080 with 10GB of VRAM.

    Favorite model is laguna-xs.2 because it is really fast on CPU and very good.

  2. alfiedotwtf
    If you’re able to run qwen3-coder, have you thought about 3.6 27B or 35B? Looking at benchmarks, 3.6 looks like it’s gained a lot over qwen3-coder.
  3. alfiedotwtf
    Oh! Looks like I’ve been sleeping on Laguna!
  4. cyanydeez
    qwen 3.6 35B on 128GB strix halo.

    perfect speed to not melt the brain and can extend context for well scoped projects.

    need to work with dynamic context pruning to ensure full reuse in larger projects.

    deer-flow seems to work well for project scoping and high level evals. opencode for coding.

  5. msalsas
    Qwen3.6 35B A3B on MSI laptop with RTX 5080 (16GB VRAM)
  6. dlcarrier
    I have a 16 GB Intel A770 and before that used an AMD MI25.

    I've had SDXL stable diffusion working on both, but struggled to get LLMs going. The entire field of software development is already well known for its technical debt and lack of interest in testing (see also: https://xkcd.com/2030/), but anything having to do with AI brings it to an all new level.

    You pretty much need to run the same stack the developer used, down to the correct outdated version of Python and every library in use, as well as the same GPU drivers and OS version, or the whole thing falls apart.

    Of course, various hardware vendors port everything to their hardware, so I could for example run Intel's OpenVINO version of llama.cpp. But I have the wrong Linux version to run their binaries, and I didn't want to put in the effort of running a new OS. Compiling it myself didn't work out either: my computer couldn't finish the build overnight, so I gave up on it.

    Of course, I could put it all in a VM, but then I'd take a performance hit and need even more RAM.

  7. vunderba
    Yeah. It's an unfortunate reality, but anything that isn't NVIDIA CUDA is effectively a second-class citizen where a lot of this generative stuff is concerned (though AMD with ROCm is getting better).
  8. da-x
    Qwen3.6 27B on RTX PRO 6000
  9. K0IN
    Qwen 3.6 27B @ 90k context on an RTX 5090 (vLLM)
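
Most of the replies above pair a parameter count (27B, 35B) with a memory budget (10GB, 16GB, 128GB). As a rough way to compare the two, here is a minimal Python sketch that estimates the memory taken by the weights alone at a given number of bits per weight. The function names and the 4/8/16-bit levels are assumptions for illustration and are not taken from the thread. The estimate ignores KV cache, activations, runtime overhead, and CPU offloading, so it is a lower bound and does not describe any commenter's actual setup.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes.

    Hypothetical helper: parameters * bits per weight / 8 gives bytes.
    KV cache, activations, and runtime overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def weights_fit(params_billions: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in memory_gb; a necessary, not sufficient, check."""
    return weight_memory_gb(params_billions, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    # Parameter counts mentioned in the thread; bit widths are illustrative assumptions.
    for params in (27, 35):
        for bits in (4, 8, 16):
            size = weight_memory_gb(params, bits)
            print(f"{params}B at {bits}-bit: ~{size:.1f} GB of weights")
```

Long contexts add a KV cache on top of this, and its size depends on the model architecture, so treat the numbers as a starting point only.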