Best Open Source Self-Hosted LLMs for Coding in 2026


Updated on Mar 31, 2026
· 14 min read


The gap between proprietary and open source AI models for coding is narrowing fast. A year ago, self-hosting an LLM for development meant settling for significantly worse performance than cloud-based alternatives like GPT-4 or Claude. In 2026, the best open source models are closing in on proprietary leaders across benchmarks like LiveBench, and some even outperform them on specific tasks like code generation and completion.

Whether you’re a solo developer who wants to keep code off third-party servers, a startup looking to cut API costs, or an enterprise with strict data compliance requirements, self-hosted open source LLMs have become a genuinely viable option for professional software development. In this guide, we’ll cover the best open source models you can self-host for coding, the tools to deploy them, and the hardware you need to get started.

Summary

Top Open Source LLMs for Coding (Self-Hostable):

  1. GLM-5 - 744B MoE, 90% HumanEval, MIT license - Get GLM-5
  2. Kimi K2.5 - 1T MoE with agent swarm, 99% HumanEval, 85% LiveCodeBench - Get Kimi K2.5
  3. DeepSeek V3.2 - 671B MoE, 75.69 LiveBench Coding Avg, MIT license - Get DeepSeek V3.2
  4. Devstral 2 - 123B model, 72.2% SWE-bench Verified (self-reported), 256K context - Get Devstral 2
  5. Qwen3-Coder - 480B MoE, agentic coding model with CLI tool - Get Qwen3-Coder
  6. Devstral Small 2 - 24B model, 68% SWE-bench Verified (self-reported), runs on consumer GPU - Get Devstral Small 2
  7. Qwen 2.5 Coder 32B - Best mid-range coding model, Apache 2.0 - Get Qwen 2.5 Coder

Best Self-Hosting Tools:

  • Ollama - Easiest way to get started locally
  • vLLM - Best for production serving
  • LM Studio - Best GUI for desktop users

Open Source vs Proprietary: How Close Is the Gap?

Before diving into individual models, it’s worth understanding where open source stands relative to the proprietary options you might already be using. LiveBench is a contamination-free benchmark with 23 diverse tasks spanning Coding, Agentic Coding, Data Analysis, Language, Instruction Following, Math, and Reasoning. Questions refresh regularly and are delay-released to minimize training contamination, making it one of the most reliable benchmarks available. The scores below are sourced from LiveBench (March 2026 snapshot).

LiveBench Agentic Coding Average (March 2026)

| Model | Organization | Type | Agentic Coding Avg | Context Window |
| --- | --- | --- | --- | --- |
| GPT-5.4 Thinking xHigh Effort | OpenAI | Proprietary | 70.00 | 1.05M |
| GPT-5.3 Codex xHigh | OpenAI | Proprietary | 66.67 | 400K |
| GLM-5 | Z.AI | Open Source | 55.00 | 200K |
| MiniMax M2.5 | MiniMax | Open Source | 51.67 | 200K |
| MiniMax M2.7 | MiniMax | Open Source | 50.00 | 205K |
| Kimi K2.5 Thinking | Moonshot AI | Open Source | 48.33 | 256K |
| DeepSeek V3.2 | DeepSeek | Open Source | 46.67 | 160K |
| Devstral 2 | Mistral | Open Source | 43.33 | 256K |
| GLM-4.7 | Z.AI | Open Source | 41.67 | 200K |
| DeepSeek V3.2 Thinking | DeepSeek | Open Source | 40.00 | 160K |
| Kimi K2 Thinking | Moonshot AI | Open Source | 38.33 | 256K |
| MiMo V2 Pro | Xiaomi | Open Source | 30.00 | 1M |

LiveBench Coding Average (March 2026)

| Model | Organization | Type | Coding Avg | Context Window |
| --- | --- | --- | --- | --- |
| GPT-5.2 Codex | OpenAI | Proprietary | 83.62 | 1.05M |
| GPT-5.1 Codex Max | OpenAI | Proprietary | 81.38 | 1.05M |
| Kimi K2.5 Thinking | Moonshot AI | Open Source | 77.86 | 256K |
| DeepSeek V3.2 | DeepSeek | Open Source | 75.69 | 160K |
| Kimi K2 Instruct | Moonshot AI | Open Source | 74.28 | 128K |
| GLM-5 | Z.AI | Open Source | 73.64 | 200K |
| GLM-4.7 | Z.AI | Open Source | 73.13 | 200K |
| MiniMax M2.5 | MiniMax | Open Source | 70.70 | 200K |
| MiMo V2 Pro | Xiaomi | Open Source | 68.85 | 1M |
| Kimi K2 Thinking | Moonshot AI | Open Source | 67.44 | 256K |
| Devstral 2 | Mistral | Open Source | 66.79 | 256K |

The LiveBench data shows that on agentic coding tasks, GLM-5 leads the open source pack at 55.00, followed by MiniMax M2.5 (51.67) and Kimi K2.5 Thinking (48.33). Proprietary models still hold a lead, with GPT-5.4 Thinking at 70.00. On standard coding tasks, the gap is much smaller. Kimi K2.5 Thinking scores 77.86 compared to GPT-5.1 Codex Max’s 81.38, and several open source models cluster in the 70-78 range alongside proprietary options.

For the latest scores and full model list, visit the LiveBench leaderboard directly.

Best Open Source LLMs for Coding

1. GLM-5 (Zhipu AI) - #1 Agentic Coding (55.00)


GLM-5 from Zhipu AI ranks #1 among open source models on LiveBench Agentic Coding with a score of 55.00, and scores 73.64 on LiveBench Coding Average. Released in February 2026, it uses a Mixture of Experts architecture with 744 billion total parameters but only 40 billion active per token, spread across 256 experts with 8 active at any given time. It was trained on 28.5 trillion tokens and supports a 200K token context window.
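
The MoE arithmetic behind those numbers is worth spelling out: memory footprint scales with total parameters, while per-token compute scales only with active parameters. A quick sanity check with the figures above (simple integer math, nothing model-specific assumed):

```shell
# Fraction of GLM-5's weights that actually run for each token
total_b=744      # total parameters, in billions
active_b=40      # active parameters per token, in billions
pct=$(( active_b * 100 / total_b ))
echo "$pct"      # prints 5 -> roughly 5% of the network runs per token
```

So a 744B MoE needs the memory of a 744B model, but serves each token at roughly the compute cost of a 40B dense one.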

What makes GLM-5 particularly noteworthy is its training infrastructure. The entire model was trained on 100,000 Huawei Ascend 910B chips rather than NVIDIA GPUs, making it a significant milestone for non-NVIDIA AI hardware. Zhipu also introduced a novel reinforcement learning infrastructure called “Slime” that reduced hallucination rates from 90% to 34%.

On coding benchmarks, GLM-5 scores 90% on HumanEval and 77.8% on SWE-bench Verified according to Zhipu AI’s official reports. Its 200K context window is generous enough to handle analysis of large codebases in a single session.

Key Specs

  • Architecture: MoE, 744B total / 40B active parameters
  • Context Window: 200K tokens
  • License: MIT (free commercial use)
  • LiveBench Agentic Coding Avg: 55.00 (#1 open source)
  • LiveBench Coding Avg: 73.64
  • SWE-bench Verified: 77.8% (self-reported by Zhipu AI)
  • HumanEval: 90%
  • Self-hosting: Supported via vLLM and SGLang; weights available on Hugging Face and ModelScope
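
Since the listed serving path is vLLM, a deployment sketch looks like the following. The Hugging Face repo id here is an assumption for illustration; confirm the actual id, GPU count, and recommended flags on the official model card before running anything:

```shell
# Sketch only: "zai-org/GLM-5" is an assumed repo id, not confirmed
MODEL="zai-org/GLM-5"

# vLLM shards the MoE across GPUs via tensor parallelism and exposes
# an OpenAI-compatible API at http://localhost:8000/v1
vllm serve "$MODEL" \
  --tensor-parallel-size 8 \
  --max-model-len 200000
```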

2. Kimi K2.5 (Moonshot AI) - #1 Coding (77.86)


Kimi K2.5 from Moonshot AI ranks #1 among open source models on LiveBench Coding Average with a score of 77.86, and scores 48.33 on Agentic Coding. It builds on the already impressive Kimi K2 foundation. The original K2, released in July 2025, was a 1 trillion parameter MoE model with 32B active parameters that scored 65.8% on SWE-bench Verified and pioneered the use of the MuonClip optimizer during training on 15.5 trillion tokens.

K2.5, released in January 2026, takes this further by adding native multimodal capabilities (text, vision, and video) trained on an additional 15 trillion multimodal tokens. Its standout feature is Agent Swarm, a capability that lets the model self-direct up to 100 sub-agents across 1,500 coordinated steps using Parallel-Agent Reinforcement Learning (PARL). This makes K2.5 uniquely powerful for complex agentic coding workflows where the model needs to plan, execute, and verify changes across an entire codebase.

On benchmarks, Moonshot AI reports K2.5 delivers 76.8% on SWE-bench Verified, 99% on HumanEval, and 85% on LiveCodeBench. The 99% HumanEval score is the highest among any open source model, demonstrating exceptional raw code generation ability.

Key Specs

  • Architecture: MoE, ~1T total / 32B active parameters
  • Context Window: 128K (256K with updates)
  • License: Modified MIT (commercial use allowed)
  • LiveBench Coding Avg: 77.86 (#1 open source)
  • LiveBench Agentic Coding Avg: 48.33
  • SWE-bench Verified: 76.8% (self-reported by Moonshot AI)
  • HumanEval: 99%
  • Self-hosting: Recommended via vLLM or SGLang; production requires 2x H100 80GB or 4x A100 80GB, 512GB RAM

3. DeepSeek V3.2 - Coding 75.69, Agentic 46.67


DeepSeek has consistently pushed the boundaries of what open source models can achieve for coding. The V3.2 release scores 75.69 on LiveBench Coding Average and 46.67 on Agentic Coding, placing it among the top open source contenders. It features 671 billion total parameters with 37 billion active, using a Mixture of Experts architecture and a 160K context window. It’s released under the MIT license.

DeepSeek’s lineage in code-specific models runs deep. The original DeepSeek Coder series (1B to 33B) was trained on 2 trillion tokens composed of 87% code and 13% natural language. DeepSeek Coder V2 expanded to support 338 programming languages. V3.2 combines these strengths into a general model that excels at coding, scoring 73.1% on SWE-bench Verified.

The model’s API pricing is remarkably low at roughly $0.27 to $0.55 per million tokens, making it one of the most cost-effective options even before considering self-hosting. For local deployment, the smaller DeepSeek Coder models (6.7B) run comfortably on consumer hardware through Ollama or LM Studio, while the full V3.2 requires enterprise-grade infrastructure.
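
To put that pricing in perspective, here is a quick back-of-the-envelope check; the daily token volume below is an illustrative assumption, not a measured figure:

```shell
# Illustrative usage assumption: ~2M input tokens per day of coding assistance
tokens_per_day=2000000
price_cents_per_m=27    # $0.27 per million tokens, expressed in cents
days=30
monthly_cents=$(( tokens_per_day / 1000000 * price_cents_per_m * days ))
echo "$monthly_cents"   # prints 1620 -> about $16/month at the low-end rate
```

Even at the $0.55 rate that roughly doubles to about $33/month, which is useful context when deciding whether dedicated self-hosting hardware pays for itself.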

Key Specs

  • Architecture: MoE, 671B total / 37B active parameters
  • Context Window: 160K tokens
  • License: MIT
  • LiveBench Coding Avg: 75.69
  • LiveBench Agentic Coding Avg: 46.67
  • SWE-bench Verified: 73.1% (self-reported by DeepSeek)
  • Self-hosting: Full model requires multi-GPU setup; smaller Coder variants run on consumer GPUs via Ollama
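
For the consumer-hardware path, the smaller Coder variants are a one-command start through Ollama (the `deepseek-coder` tags were current at the time of writing; check the Ollama library for available sizes):

```shell
# Pull and chat with the 6.7B code-specific DeepSeek variant locally
ollama pull deepseek-coder:6.7b
ollama run deepseek-coder:6.7b "Write a Python function that merges two sorted lists."
```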

4. Devstral 2 (Mistral AI) - Coding 66.79, Agentic 43.33


Devstral 2 from Mistral AI is a 123 billion parameter model specifically designed for agentic software engineering. It scores 66.79 on LiveBench Coding Average and 43.33 on Agentic Coding. Released in December 2025, it scores 72.2% on SWE-bench Verified with a 256K context window, making it one of the most capable code-focused models available. Mistral describes it as 7x more cost-efficient than Claude Sonnet and 5x smaller than DeepSeek V3.2 while remaining competitive in benchmarks.

What makes the Devstral family compelling for self-hosting is the smaller sibling, Devstral Small 2 (24B parameters), which scores an impressive 68% on SWE-bench Verified. That’s remarkable for a model that runs on a single RTX 4090 or a Mac with 32GB of RAM. It also supports image inputs and comes with Apache 2.0 licensing, making it one of the most permissive options available.
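
The single-GPU claim holds up under a rough VRAM estimate. A common rule of thumb, an approximation rather than an official requirement, is weight size at the quantization bit-width plus about 20% for KV cache and runtime overhead:

```shell
# Back-of-the-envelope VRAM estimate for a 24B model at 4-bit quantization
params_b=24                             # parameters, in billions
bits=4                                  # e.g. a Q4 GGUF quantization
weights_gb=$(( params_b * bits / 8 ))   # 12 GB of weights
total_gb=$(( weights_gb * 12 / 10 ))    # ~14 GB with ~20% overhead
echo "$total_gb"                        # prints 14 -> fits a 24GB RTX 4090
```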

Mistral also offers Vibe CLI, an open source terminal coding assistant powered by Devstral, giving you a ready-made development workflow out of the box.

Key Specs (Devstral 2)

  • Parameters: 123B
  • Context Window: 256K tokens
  • License: Modified MIT
  • LiveBench Coding Avg: 66.79
  • LiveBench Agentic Coding Avg: 43.33
  • SWE-bench Verified: 72.2% (self-reported by Mistral AI)
  • Self-hosting: Multi-GPU recommended for full model

Key Specs (Devstral Small 2)

  • Parameters: 24B
  • Context Window: 128K tokens
  • License: Apache 2.0
  • SWE-bench Verified: 68.0% (self-reported by Mistral AI)
  • Self-hosting: Single RTX 4090 or Mac with 32GB RAM

5. Qwen3-Coder (Alibaba) - Best Agentic CLI Tool


The Qwen3-Coder family from Alibaba represents one of the most comprehensive open source coding model lineups available. The flagship model features 480 billion parameters with a Mixture of Experts design, and Alibaba describes it as “our most agentic code model to date.” There’s also a smaller 30B variant (3B active) for resource-constrained environments.

The more recent Qwen3-Coder-Next (80B total, 3B active) pushes the envelope further with hybrid attention combined with MoE, trained with large-scale reinforcement learning specifically for agentic tasks. It scores 70.6% on SWE-bench Verified, an impressive result for a model with only 3B active parameters.

Alibaba also provides Qwen Code, an open source terminal coding agent optimized for Qwen3-Coder models. This gives developers a Claude Code or Aider-like experience powered entirely by open source infrastructure.

The broader Qwen ecosystem also includes Qwen 2.5 Coder (available in sizes from 0.5B to 32B), which remains one of the best mid-range options. The 32B Instruct variant scores 73.7 on the Aider benchmark (comparable to GPT-4o) and is readily available through Ollama.
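
Since the 32B Instruct build ships as a standard Ollama tag, trying it is a two-command affair (expect roughly a 20GB download for the default quantized build):

```shell
# qwen2.5-coder:32b is the 32B Instruct build in the Ollama library
ollama pull qwen2.5-coder:32b
ollama run qwen2.5-coder:32b "Explain the difference between a mutex and a semaphore."
```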

Key Specs

  • Architecture: MoE, up to 480B total parameters
  • License: Apache 2.0
  • SWE-bench Verified: 70.6% (Qwen3-Coder-Next, self-reported by Alibaba)
  • Self-hosting: Qwen 2.5 Coder 32B runs on consumer hardware via Ollama; larger variants require multi-GPU

6. Llama 4 (Meta) - Largest Context Window (10M)


Llama 4 from Meta continues to be the most widely deployed open source model family, with over 650 million total downloads and roughly 9% of enterprise production workloads running on Llama variants. The Llama 4 family released in April 2025 includes Scout (109B total, 17B active, 10M context window), Maverick (400B total, 17B active, 1M context), and the announced but unreleased Behemoth (~2T total, 288B active).

While Llama 4 isn’t specifically a coding model, its massive context windows and multimodal capabilities (text and image input across 12 languages) make it highly versatile for development workflows. The code-specific Llama 4 Coder variant brings improved code generation, debugging, and completion accuracy.

The main caveat is licensing: Llama’s license does not meet the OSI Open Source Definition and includes restrictions for companies with very large user bases. For most developers and smaller organizations, this is a non-issue, but it’s worth noting compared to the MIT or Apache 2.0 licenses of other models on this list.

Key Specs

  • Architecture: MoE, up to 400B total / 17B active (Maverick)
  • Context Window: Up to 10M tokens (Scout)
  • License: Llama Community License (restrictions for very large companies)
  • Self-hosting: Scout and Maverick available via Ollama, vLLM; smaller variants run on consumer hardware

7. StarCoder 2 (BigCode / Hugging Face) - Most Auditable Training Data


StarCoder 2 is a collaboration between Hugging Face and ServiceNow under the BigCode project. Available in 3B, 7B, and 15B sizes, it was trained on 3.3 to 4.3 trillion tokens from The Stack v2, covering 619 programming languages. It uses Grouped Query Attention with a 16K context window.

StarCoder 2’s standout quality is its data transparency. Every training data source is documented with Software Heritage Identifiers (SWHIDs), making it the most auditable coding model available. This matters for enterprises concerned about IP and licensing compliance. The 15B model matches or outperforms CodeLlama 34B (a model twice its size), demonstrating strong efficiency.

While it doesn’t compete with the larger MoE models on raw benchmarks, StarCoder 2 remains an excellent choice for teams that need a lightweight, well-documented coding model they can run on modest hardware.

Key Specs

  • Sizes: 3B, 7B, 15B
  • Context Window: 16K tokens
  • License: BigCode OpenRAIL-M (training data fully documented)
  • Self-hosting: Runs on consumer hardware via Ollama; 3B variant works on laptops
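
The laptop claim is easy to verify yourself: the `starcoder2:3b` tag is in the Ollama library. Note that StarCoder 2 is a base completion model rather than a chat model, so prompt it with code to continue:

```shell
# StarCoder 2 completes code prefixes rather than following chat instructions
ollama run starcoder2:3b "def quicksort(arr):"
```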

Honorable Mentions

Several other open source models deserve recognition for specific strengths:

  • IBM Granite Code - Available from 350M to 34B parameters under Apache 2.0, trained on 116 programming languages with license-permissible data. Granite 4.0 introduces hybrid Mamba-2/transformer architecture using 70% less memory. Best choice for enterprise compliance.
  • NVIDIA Nemotron-Cascade 2 - A 30B MoE with only 3B active parameters that achieves Gold Medal-level performance on competitive programming benchmarks (IMO, IOI, ICPC) with 20x fewer parameters than comparable models. Remarkable efficiency.
  • Yi-Coder - From 01.AI, available in 1.5B and 9B sizes with 128K context and Apache 2.0 license. Yi-Coder 9B scores 85.4% on HumanEval, on par with DeepSeek Coder 33B at a fraction of the size.
  • Qwen 3.5 - Released February 2026 with a 397B MoE model, featuring unified vision-language capabilities and support for 201 languages. One of the top-ranked open-weight models across multiple benchmarks.

How to Use These Models with a Coding Agent

If you want a Claude Code or Aider-style workflow with self-hosted models, one of the easiest setups is OpenCode + Ollama. This combination gives you a local coding agent with a simple terminal workflow and no cloud dependency.

Easiest Setup: OpenCode + Ollama

Ollama’s built-in Applications flow keeps this setup simple: the current Qwen 3.5 Ollama page lists a direct OpenCode launch command, so there is no separate model gateway or API key to configure.

  1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
  2. Install OpenCode
curl -fsSL https://opencode.ai/install | bash
  3. Launch OpenCode directly through Ollama Applications
ollama launch opencode --model qwen3.5
  4. Open your project and start working

Once OpenCode starts, point it at your repository and use it like any other terminal coding agent for explaining code, refactoring files, writing tests, or implementing features.
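
Under the hood, OpenCode is talking to the local Ollama server, which also exposes an OpenAI-compatible HTTP API on port 11434. You can sanity-check the model outside the agent with a single request (the `qwen3.5` tag follows the setup above):

```shell
# Ollama serves an OpenAI-compatible endpoint at /v1/chat/completions
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5",
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```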

If you want a smaller local footprint, Ollama also provides smaller Qwen 3.5 tags such as qwen3.5:4b, qwen3.5:9b, and qwen3.5:27b.

Why This Setup Works Well

  • Fastest setup path because Ollama can launch OpenCode directly as an application
  • Runs fully local with no separate model gateway to configure
  • Easy to scale up or down by swapping the Ollama model tag based on your hardware

How to Self-Host These Models Locally

Once you’ve picked a model, you need the right tools and hardware to run it. We’ve covered deployment tooling and hardware requirements in detail in our previous guides.

Quick Decision Guide

| Your Need | Recommended Model | Why |
| --- | --- | --- |
| Best overall coding | GLM-5 or Kimi K2.5 | Highest SWE-bench scores among open source |
| Best on consumer hardware | Devstral Small 2 or Qwen 2.5 Coder 32B | Strong performance on a single GPU |
| Best tiny model (<10B) | Yi-Coder 9B or StarCoder2-3B | Runs on laptops, punches above its weight |
| Best for agentic workflows | Kimi K2.5 or Qwen3-Coder-Next | Built-in agent orchestration |
| Best for enterprise compliance | IBM Granite Code | Apache 2.0, ethics-vetted training data |
| Best efficiency per parameter | NVIDIA Nemotron-Cascade 2 | Gold-medal competitive programming at 3B active params |

Conclusion

The open source LLM landscape for coding has matured dramatically. Models like GLM-5 and Kimi K2.5 are closing in on proprietary leaders, while smaller models like Devstral Small 2 and Qwen 2.5 Coder 32B bring genuinely useful coding assistance to consumer hardware. The tooling ecosystem around Ollama, vLLM, and llama.cpp has made self-hosting accessible to anyone with a decent GPU.

For most developers, the practical recommendation is to start with Ollama and Qwen 2.5 Coder 32B if you have 24GB of VRAM, or Devstral Small 2 if you have less. These will handle the vast majority of code completion, generation, debugging, and refactoring tasks. If you need the absolute best performance and have access to enterprise hardware, GLM-5 or Kimi K2.5 will get you within a few percentage points of the best proprietary models while keeping your code entirely on your own infrastructure.

The 44% of organizations that cite data privacy as their top concern with LLM adoption now have no reason to hold back. Self-hosted open source models are production-ready for coding, and the gap with proprietary alternatives continues to shrink with each new release.