Hey guys,
What’s currently the best LLM for a low-VRAM machine with only 6 GB of VRAM? I’ve got 32 GB of RAM as well.
I’m experimenting a little with SillyTavern and I’m curious which model gets the most out of my setup. Should be multilingual and suitable for “casual chatting”.
I know I will probably not get very far with this, but I’m still interested in how far we’ve already come.
(Using KoboldCPP if that matters).
~sp3ctre


There are many excellent options, far too many to list. In brief: there are some really nice models around the 4B mark (like Qwen3-4B HIVEMIND, Nanbeige, IBM Granite 3B) which you should be able to run at higher quants (Q6 and up) quite nicely. Of course, there are always newer models (Gemma, Qwen3.6, soon 3.7, etc.).
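As a rough sanity check on the "Q6 and up" point, here is a back-of-the-envelope sketch of how much VRAM the weights of a model take at a given quant. The bits-per-weight figures are approximate values for common GGUF quant types, and the 1.5 GiB overhead for context, buffers and the desktop is purely my guess, so treat the result as a ballpark and not a guarantee.

```python
# Approximate bits per weight for some common GGUF quant types.
# Real files differ a little because some tensors are stored at other precisions.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.5625, "Q8_0": 8.5}


def weight_size_gib(params_billion: float, quant: str) -> float:
    """Estimated size of the quantized weights in GiB."""
    total_bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30


def fits_in_vram(params_billion: float, quant: str,
                 vram_gib: float = 6.0, overhead_gib: float = 1.5) -> bool:
    """True if weights plus a guessed overhead fit in VRAM.

    overhead_gib is an assumed allowance for KV cache, compute buffers
    and whatever the desktop is already using; adjust it for your setup.
    """
    return weight_size_gib(params_billion, quant) + overhead_gib <= vram_gib


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        size = weight_size_gib(4.0, quant)
        print(f"4B @ {quant}: ~{size:.1f} GiB weights, "
              f"fits in 6 GiB: {fits_in_vram(4.0, quant)}")
```

With these assumptions a 4B model at Q6_K comes out at roughly 3.1 GiB of weights, which leaves room for context on a 6 GB card, while a 7B model at the same quant does not fit fully. Anything that doesn't fit can still be split between GPU and system RAM in KoboldCPP, at a speed cost.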
Best bet is to poke around Hugging Face, in TheBloke’s, Unsloth’s or DavidAU’s archives, and see what they have in the 3-7B range that tickles your fancy. Don’t immediately jump for the newest releases; the old ones are still good. Qwen3-4B 2507 Instruct is still a favourite of mine, and more recently Qwen3.5-2B shows promise.