With the release of Qwen 3.6, we have reached the point where, for roughly 80% of what you do, there is no need for a premium cloud model. A quantized model running on a decent consumer GPU is more than enough. Beyond the obvious privacy benefits of running locally, the real question is not quality. It is freedom.

What is better: firing off iteration after iteration without thinking about token costs, or weighing every single question because one miss could cost real money? I like testing things, throwing out a bad idea, and rewriting it three times just to see what happens. That is not how you work when every generation costs cents that quickly add up to euros.
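To make the "cents that add up to euros" point concrete, here is a minimal back-of-the-envelope sketch. The function name `cloud_cost_eur` and every figure in it (tokens per iteration, price per million tokens) are illustrative assumptions, not real pricing for any provider; plug in your own numbers.

```python
def cloud_cost_eur(iterations, tokens_per_iteration, eur_per_million_tokens):
    """Return the total cost in euros for a number of paid generations."""
    total_tokens = iterations * tokens_per_iteration
    return total_tokens * eur_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical figures: 20k tokens per agent iteration at 10 EUR per
    # million tokens. Neither number comes from a real price list.
    for iterations in (10, 100, 1000):
        cost = cloud_cost_eur(
            iterations,
            tokens_per_iteration=20_000,
            eur_per_million_tokens=10.0,
        )
        print(f"{iterations:>5} iterations: {cost:.2f} EUR")
```

On a local model the marginal cost of one more iteration is essentially electricity, which is why throwing away a bad idea and trying again stops feeling like a decision.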

That does not mean premium cloud models no longer have a place. But given how capable current open-source models are, you can often compensate for a smaller model with a good agent, the right tooling, and web access.

With a 27B model for coding and a 35B-A3B model for conversation, I easily get 150+ tokens per second through Hermes, and I have not even bothered with MTP yet.

Once everything is set up, you can tune, adapt, and improve your own system with the help of a harness. That is what makes it fun.
