Google's Gemma 4 is a mixture-of-experts (MoE) model that runs well on local hardware: zero API costs, no data leaving the machine, and consistent availability. The 26B-A4B variant (26B total parameters, roughly 4B active per token) is a sweet spot for local inference, delivering quality comparable to a ~10B dense model at the inference cost of a 4B one. That makes it well suited to everyday tasks like code review, drafting, and prompt testing.
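The "26B total, 4B active" split is the core of why MoE inference is cheap: each token is routed to only a few experts, so most parameters sit idle on any given forward pass. The sketch below illustrates the arithmetic under assumed numbers (the shared-parameter size, expert size, expert count, and routing top-k are illustrative guesses, not Gemma's actual configuration).

```python
# Illustrative MoE sizing sketch. The "26B-A4B" naming convention reads as
# ~26B total parameters with only ~4B active per token; the split below is
# an assumption chosen to land near those figures, not the real config.

def moe_params(shared: float, expert_size: float,
               n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts in billions."""
    total = shared + n_experts * expert_size
    # Only the top_k routed experts run per token, plus the shared layers
    # (attention, embeddings), so the per-token compute is far below total.
    active = shared + top_k * expert_size
    return total, active

# Assumed split: 2B shared + 64 experts of 0.375B each, 5 routed per token.
total, active = moe_params(shared=2.0, expert_size=0.375,
                           n_experts=64, top_k=5)
print(f"total = {total:.1f}B, active = {active:.1f}B")
# total = 26.0B, active = 3.9B
```

Under these assumed numbers, a token touches under 4B parameters even though the full model holds 26B, which is why the variant fits the memory budget of a large model but runs at the speed of a small one.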