Running an LLM locally used to mean settling for something that felt like typing into a blender. That’s no longer true. A mid-range gaming GPU now runs Llama 3.3 70B at speeds that are genuinely useful for coding and document work, and a Mac Studio with 192GB of unified memory fits the same model at 16-bit precision (roughly 140GB of weights) without any quantization. The question is just which hardware makes sense for what you actually want to run.
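
A rough way to sanity-check whether a model fits a given machine is to multiply the parameter count by the bytes stored per weight. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes a nominal 70 billion parameters, counts weights only (no KV cache, activations, or runtime overhead), and treats "4-bit" as exactly 4 bits per weight, which real quantization formats slightly exceed. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Assumption: ignores KV cache, activations, and framework overhead,
    so real usage will be higher than this figure.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal 70B parameter count (assumption for illustration).
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(70, bits)
        print(f"70B at {label}: ~{gb:.0f} GB of weights")
```

Under these assumptions the 16-bit weights come to about 140GB, which is why a 192GB machine can hold them unquantized, while a 4-bit build needs about a quarter of that.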