The author runs a 26B-parameter Mixture-of-Experts model on a 2016 Intel Xeon with 128 GB of DDR3 RAM and no GPU, using a custom fork of the inference engine and a series of optimizations to generate text at roughly reading speed. The author argues that understanding how the inference engine works is key to overcoming the usability moat in open-weight AI.