ADE-Bench Benchmark Results

altimate-code + DeepSeek V4 Pro achieves a 78.0% pass rate (32/41 tasks on DuckDB, best run), matching altimate-code on Sonnet 4.6 at a fraction of the cost.

About ADE-Bench

ADE-Bench is a benchmark created by Benn Stancil (founder of Mode) in collaboration with dbt Labs. It evaluates AI agents on real-world analytics and data engineering tasks using actual dbt projects and databases. Each task runs in a Docker container sandbox; the agent attempts to resolve the task, and success is measured by whether all dbt tests pass afterward. Tasks include realistic data problems: vague requests like "it’s broken," debugging, schema issues, and complex analytics queries.
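The pass criterion described above is all-or-nothing: a task counts as resolved only when every dbt test passes. A minimal sketch of that rule follows; the function name and arguments are hypothetical and are not part of ADE-Bench itself.

```python
def is_resolved(tests_passed: int, tests_total: int) -> bool:
    """Return True only if every dbt test for the task passed.

    Hypothetical helper: it models the criterion described in the text,
    not ADE-Bench's actual implementation.
    """
    if tests_total <= 0:
        raise ValueError("a task needs at least one test")
    return tests_passed == tests_total


# A task with one failing test out of many still counts as failed.
print(is_resolved(10, 10))
print(is_resolved(16, 17))
```

Under this rule, partial credit does not exist at the task level: passing most of a task's tests is still a failed task.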

Test Configuration

Harness and LLM: altimate-code (DeepSeek V4 Pro)
Model source: OpenRouter (deepseek/deepseek-chat-v4-pro)
Database: DuckDB (local)
Total tasks: 41
Max retries on failure: 3
Best run (pass@3): 32/41 (78.0%)
Single-run range: 26–28/41 (63–68%)
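The percentages above follow directly from the task counts, and the gap between the single-run range and the pass@3 figure is consistent with counting a task as passed when any of up to three attempts succeeds. The sketch below illustrates both calculations. That reading of pass@3 is an assumption, and the attempt logs are hypothetical, not benchmark data.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


def pass_at_k(attempts_by_task: dict[str, list[bool]], k: int) -> int:
    """Count tasks with at least one passing attempt among the first k.

    Assumed interpretation of pass@k; the benchmark page does not spell it out.
    """
    return sum(any(attempts[:k]) for attempts in attempts_by_task.values())


# Figures reported in the configuration above.
print(pass_rate(32, 41))  # 78.0
print(pass_rate(26, 41))  # 63.4
print(pass_rate(28, 41))  # 68.3

# Hypothetical attempt logs, for illustration only.
attempts = {
    "task_a": [True],
    "task_b": [False, True],
    "task_c": [False, False, False],
}
print(pass_at_k(attempts, 1))  # 1
print(pass_at_k(attempts, 3))  # 2
```

In the hypothetical logs, retries lift the count from one task to two, which is the same effect that would separate a 26–28 single-run result from a 32-task pass@3 result.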

Benchmark Comparison

Agents evaluated on ADE-Bench with DuckDB.

Agent          Configuration                    Passed  Pass Rate
altimate-code  DeepSeek V4 Pro · DuckDB         32/41   78%
altimate-code  Sonnet 4.6 · DuckDB              32/41   78%
dbt Labs       Sonnet 4.5 · DuckDB              ~25/43  59%
Claude Code    Sonnet 4.6 · baseline · DuckDB   ~17/43  40%

Key Insight: The Harness Matters More Than the Model

Across both benchmarks, altimate-code on Sonnet 4.6 beats competitors running Opus 4.6 — a more capable, more expensive model. Purpose-built tooling and deterministic operations outperform raw model capability alone.

The harness — not the model — is the differentiator.

Per-Task Results — DuckDB

Best Run — 32 passed, 9 failed out of 41 tasks

#   Task                      Result  Score  Pass Rate
1   airbnb001                 Pass    10/10  100%
2   airbnb002                 Pass    11/11  100%
3   airbnb003                 Pass    7/7    100%
4   airbnb004                 Pass    2/2    100%
5   airbnb005                 Pass    4/4    100%
6   airbnb006                 Pass    7/7    100%
7   airbnb007                 Pass    11/11  100%
8   airbnb008                 Pass    4/4    100%
9   airbnb009                 Fail    0/1    0%
10  analytics_engineering001  Pass    1/1    100%
11  analytics_engineering002  Pass    2/2    100%
12  analytics_engineering003  Pass    2/2    100%
13  analytics_engineering004  Fail    1/2    50%
14  analytics_engineering005  Pass    3/3    100%
15  analytics_engineering006  Fail    4/7    57%
16  analytics_engineering007  Pass    10/10  100%
17  analytics_engineering008  Pass    1/1    100%
18  asana001                  Pass    2/2    100%
19  asana002                  Pass    3/3    100%
20  asana003                  Fail    16/17  94%
21  asana004                  Fail    5/6    83%
22  asana005                  Fail    7/8    87%
23  f1001                     Pass    6/6    100%
24  f1002                     Fail    9/10   90%
25  f1003                     Pass    4/4    100%
26  f1004                     Pass    2/2    100%
27  f1005                     Pass    4/4    100%
28  f1006                     Pass    4/4    100%
29  f1007                     Pass    6/6    100%
30  f1009                     Pass    1/1    100%
31  f1010                     Pass    2/2    100%
32  f1011                     Fail    5/6    83%
33  intercom001               Pass    2/2    100%
34  intercom002               Pass    4/4    100%
35  intercom003               Pass    2/2    100%
36  quickbooks001             Pass    12/12  100%
37  quickbooks002             Pass    8/8    100%
38  quickbooks003             Fail    5/14   35%
39  quickbooks004             Pass    48/48  100%
40  simple001                 Pass    1/1    100%
41  simple002                 Pass    1/1    100%
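As a cross-check, the per-task scores above can be tallied to confirm the 32 passed / 9 failed summary. The sketch below transcribes the scores from the table and treats a task as passed only when all of its tests pass. The parsing helper is hypothetical, and the use of integer division reflects an observation that the table's percentages appear truncated rather than rounded (7/8 is shown as 87%).

```python
# Scores transcribed from the per-task table: task=passed/total.
SCORES = """
airbnb001=10/10 airbnb002=11/11 airbnb003=7/7 airbnb004=2/2 airbnb005=4/4
airbnb006=7/7 airbnb007=11/11 airbnb008=4/4 airbnb009=0/1
analytics_engineering001=1/1 analytics_engineering002=2/2
analytics_engineering003=2/2 analytics_engineering004=1/2
analytics_engineering005=3/3 analytics_engineering006=4/7
analytics_engineering007=10/10 analytics_engineering008=1/1
asana001=2/2 asana002=3/3 asana003=16/17 asana004=5/6 asana005=7/8
f1001=6/6 f1002=9/10 f1003=4/4 f1004=2/2 f1005=4/4 f1006=4/4 f1007=6/6
f1009=1/1 f1010=2/2 f1011=5/6
intercom001=2/2 intercom002=4/4 intercom003=2/2
quickbooks001=12/12 quickbooks002=8/8 quickbooks003=5/14 quickbooks004=48/48
simple001=1/1 simple002=1/1
"""


def parse_scores(text: str) -> list[tuple[str, int, int]]:
    """Parse 'task=passed/total' tokens into (task, passed, total) tuples."""
    rows = []
    for token in text.split():
        task, score = token.split("=")
        passed, total = (int(part) for part in score.split("/"))
        rows.append((task, passed, total))
    return rows


rows = parse_scores(SCORES)

# A task passes only when every one of its tests passes.
failed = [task for task, passed, total in rows if passed != total]
n_passed = len(rows) - len(failed)

# Per-task percentage, truncated with integer division to match the table.
percent = {task: 100 * passed // total for task, passed, total in rows}

print(f"{n_passed} passed, {len(failed)} failed out of {len(rows)} tasks")
print("Failed tasks:", ", ".join(failed))
```

The tally matches the summary line for the best run, and the nine failed tasks are exactly those with a pass rate below 100%.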

Sources

ADE-Bench FAQs