An evolution of the Karpathy LLM Wiki Pattern

Compiled Knowledge
Pattern LLM

A file-native approach to LLM knowledge bases that pre-compiles semantic relationships at ingest time — delivering faster, higher-quality responses without vector databases, embedding models, or infrastructure.

Author: Alessandro Marocchini
Published: 20 May 2026
Format: Any (.md, .html, .txt)
Infrastructure: Zero
Works with: Any LLM, any agent
Builds on: Karpathy LLM Wiki

01 — The Problem

RAG retrieves. Wikis compile.
Neither routes.

Karpathy's LLM Wiki pattern is a genuine improvement over RAG: instead of retrieving and forgetting, it compiles knowledge into structured files loaded directly into context. The heavy reasoning happens at write time, not at query time.

But even compiled wikis have a structural gap: when you load a knowledge base, you load all of it. The LLM receives 20 files to answer a question that needs 3. Context is noisy. Response quality drops. Speed suffers.

And the connections between concepts — the "see also" links — are plain text. The LLM must infer the strength and type of every relationship on every query, from scratch, every time.

RAG
- Retrieves chunks, forgets context
- Runtime embedding cost
- Vector database required
- Probabilistic, misses precision

LLM Wiki (Karpathy)
- Loads all files into context
- No retrieval step, clean
- No infrastructure needed
- Relationships are implicit prose

The missing piece

A lightweight routing layer that tells the LLM which files to load, and pre-declares how concepts relate — computed once at ingest, reused on every query.

"Load less. Reason more. Compute once, reuse forever."
02 — The Pattern

Five additions.
Same files. Same workflow.

CKP (Compiled Knowledge Pattern) adds five structured fields to the header of each knowledge file. No new tools. No new infrastructure. The same folder, the same Git workflow, the same "load into context" approach, just richer files.

The five fields
| Field | Purpose | Max size | Benefit |
|---|---|---|---|
| TLDR | One-sentence summary optimised for LLM consumption | 1 sentence | Immediate answer without reading the full file |
| ANSWERS_WHEN | Keywords that signal this file is relevant | 5–10 words | Fast keyword routing, BM25-style |
| SIMILAR_HIGH | Concepts this file directly depends on or extends | Max 3 | Always load these together |
| SIMILAR_MID | Concepts in the same domain, often co-relevant | Max 5 | Load conditionally based on query |
| VALIDATED | Timestamp per relationship, not per file | YYYY-MM per entry | Staleness detection at relationship level |

There is no SIMILAR_LOW field. Weak relationships add noise, not signal — the LLM infers them from content.

Why categorical tiers instead of decimal scores

LLMs are significantly more consistent when classifying into categories than when assigning decimal scores. A score of 0.73 from one session may be 0.61 in another. But HIGH vs MID — with a fixed rubric and anchor examples — produces stable, reproducible results across any LLM and any session.

| Tier | Definition | Cap | Anchor example |
|---|---|---|---|
| HIGH | One concept requires understanding the other. Direct dependency or extension. | 3 | jwt ↔ oauth2 |
| MID | Same domain, frequently relevant together. No direct dependency. | 5 | jwt ↔ http-headers |
| NONE | Loosely related, rarely co-relevant. Not stored. | 0 | jwt ↔ database |

03 — File Anatomy

A knowledge file,
fully annotated.

The CKP header is format-agnostic. The example below uses a plain text block that works in Markdown, HTML, or any other format. The body below the header is unchanged — write it however you normally would.

knowledge file — any format
---
CONCEPT:      JWT Authentication
TLDR:         Stateless token-based auth — server signs a token, client stores and sends it.
ANSWERS_WHEN: authentication, token, login, bearer, stateless, jwt, auth

SIMILAR_HIGH: oauth2:2025-03, refresh-token:2025-03, bearer-token:2025-03
SIMILAR_MID:  http-headers:2025-03, cors:2025-03, api-security:2025-03

CONFIDENCE:   high
VALIDATED:    2025-03
---

# JWT Authentication
Your normal content here. Nothing changes below the header.
Write in Markdown, HTML, plain text — whatever fits your workflow.
Timestamp per relationship, not per file

The date on each SIMILAR entry (oauth2:2025-03) records when that specific relationship was last validated — not when the file was last touched. This makes staleness detectable at the relationship level, not just the file level.

If oauth2 was updated after 2025-03, that specific relationship is potentially stale and should be re-evaluated at next ingest. The rest of the file's relationships remain valid.
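
A minimal sketch of relationship-level staleness detection, assuming the `concept:YYYY-MM` entry syntax shown in the header example. The function names and the `target_validated` mapping (concept name to that file's own VALIDATED date) are illustrative assumptions, not part of the pattern's specification.

```python
def parse_similar(value: str) -> dict[str, str]:
    """Turn 'oauth2:2025-03, cors:2025-03' into {'oauth2': '2025-03', 'cors': '2025-03'}."""
    entries = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon: concept name on the left, YYYY-MM on the right.
        concept, _, validated = item.partition(":")
        entries[concept] = validated
    return entries


def stale_relationships(similar: dict[str, str], target_validated: dict[str, str]) -> list[str]:
    """Return concepts whose target file was validated after the relationship was."""
    # Zero-padded YYYY-MM strings sort chronologically, so plain string comparison works.
    return [
        concept
        for concept, relationship_date in similar.items()
        if concept in target_validated and target_validated[concept] > relationship_date
    ]


high = parse_similar("oauth2:2025-03, refresh-token:2025-03, bearer-token:2025-03")
# Hypothetical VALIDATED dates read from the target files' own headers.
targets = {"oauth2": "2025-06", "refresh-token": "2025-03"}
stale = stale_relationships(high, targets)
```

With these inputs only `oauth2` is flagged: its file was validated in 2025-06, after the relationship's 2025-03 stamp, while the other entries are left alone.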

The index file

A single index file at the root of your knowledge base aggregates only the headers — no body content. It is the only file always loaded into context. Everything else is loaded on demand.

index file
---
CONCEPT:      jwt-authentication
TLDR:         Stateless token-based auth — server signs a token, client stores and sends it.
ANSWERS_WHEN: authentication, token, login, bearer, stateless, jwt
---

---
CONCEPT:      oauth2
TLDR:         Delegated authorisation framework. Issues access tokens on behalf of users.
ANSWERS_WHEN: oauth, authorization, delegate, scope, grant, flow
---

... one header block per file. No body content.

The index stays small regardless of knowledge base size because it contains only header fields (in the example above: CONCEPT, TLDR, and ANSWERS_WHEN), never the concept body. With 50 files, it remains a few hundred lines.
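
Building an index entry can be sketched as reading a file's header and keeping only the fields that appear in the index example. This assumes the header is the block between the first two `---` lines, written as `KEY: value` pairs. `read_header`, `index_block`, and `INDEX_FIELDS` are hypothetical names chosen for illustration.

```python
# Fields copied into the index, following the index example above.
INDEX_FIELDS = ("CONCEPT", "TLDR", "ANSWERS_WHEN")


def read_header(text: str) -> dict[str, str]:
    """Parse the 'KEY: value' lines between the first pair of '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    header = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of header; the body is never read
        key, sep, value = line.partition(":")
        if sep and key.strip():
            header[key.strip()] = value.strip()
    return header


def index_block(header: dict[str, str]) -> str:
    """Render one index entry, aligned like the examples in this document."""
    rows = [(name + ":").ljust(14) + header[name] for name in INDEX_FIELDS if name in header]
    return "\n".join(["---", *rows, "---"])


sample = (
    "---\n"
    "CONCEPT: oauth2\n"
    "TLDR: Delegated authorisation framework.\n"
    "ANSWERS_WHEN: oauth, authorization, scope\n"
    "SIMILAR_HIGH: jwt:2025-03\n"
    "---\n"
    "\n"
    "# OAuth2\n"
    "Body text: never copied into the index.\n"
)
block = index_block(read_header(sample))
```

The body line containing a colon is ignored because parsing stops at the closing `---`, and SIMILAR_HIGH is parsed but left out of the rendered index entry.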

04 — The Ingest Process

Computation happens once.
Reused on every query.

Ingest is the only moment where semantic work occurs. At query time, the LLM reads pre-computed relationships — it never re-derives them.

When ingest runs
Trigger: every time you create or modify a knowledge file. In code: a git hook. In an agent: a rule in AGENT.md.
01 LLM reads the new or modified file and the current index.
02 LLM applies the fixed rubric (see below) to classify relationships against all existing concepts.
03 LLM writes the updated header into the file: SIMILAR_HIGH, SIMILAR_MID, timestamps, VALIDATED.
04 LLM updates the index with the new header block. No other files are touched.
The fixed rubric — copy this into your ingest prompt
ingest prompt — rubric block
You are updating the header of a knowledge file.
Classify the relationship between this concept and each existing concept
using exactly these three tiers:

HIGH — one concept requires understanding the other.
     Direct dependency, extension, or prerequisite.
     Anchor: jwt ↔ oauth2 = HIGH

MID  — same domain, frequently used together.
     No direct dependency.
     Anchor: jwt ↔ http-headers = MID

NONE — loosely related or rarely co-relevant. Do not store.
     Anchor: jwt ↔ database = NONE

Rules:
- SIMILAR_HIGH: select at most 3. If fewer qualify, write fewer.
- SIMILAR_MID:  select at most 5. If fewer qualify, write fewer.
- Do not store NONE relationships.
- Append today's date (YYYY-MM) to each entry: concept:YYYY-MM
- Only update the header. Do not modify the body content.
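
Once the LLM has returned a tier for each existing concept, writing the two SIMILAR lines is mechanical. The sketch below assumes the classification arrives as a plain mapping of concept name to tier. The function name is hypothetical, and raising an error when a cap is exceeded (rather than truncating) is a design choice made here, not something the rubric prescribes.

```python
from datetime import date

CAPS = {"HIGH": 3, "MID": 5}  # caps from the rubric above


def similar_lines(tiers: dict[str, str], today: date) -> list[str]:
    """Build the SIMILAR_HIGH and SIMILAR_MID header lines from an LLM classification."""
    stamp = today.strftime("%Y-%m")  # YYYY-MM, as the rubric requires
    high = [concept for concept, tier in tiers.items() if tier == "HIGH"]
    mid = [concept for concept, tier in tiers.items() if tier == "MID"]
    if len(high) > CAPS["HIGH"] or len(mid) > CAPS["MID"]:
        # Choosing which entries to drop is a semantic decision, so hand it back to the LLM.
        raise ValueError("cap exceeded: re-run the classification with the rubric")
    # NONE relationships are simply never written.
    return [
        "SIMILAR_HIGH: " + ", ".join(f"{c}:{stamp}" for c in high),
        "SIMILAR_MID:  " + ", ".join(f"{c}:{stamp}" for c in mid),
    ]


lines = similar_lines(
    {"oauth2": "HIGH", "http-headers": "MID", "database": "NONE"},
    date(2025, 3, 14),
)
```

Passing the date in explicitly keeps the function deterministic, which makes it straightforward to test from a git hook or an agent step.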
Using in AGENT.md

For coding agents, add this rule to the agent's instruction file (AGENT.md in this example; Claude Code, for instance, reads CLAUDE.md). The agent treats it as a mandatory step after every file write inside your knowledge folder.

AGENT.md — knowledge base rule
## Knowledge Base — Mandatory Rule

Whenever you create or modify a file inside /knowledge/:

1. Re-read the file you just wrote.
2. Apply the CKP ingest rubric (see below) to classify
   relationships against the current index.
3. Update the file header: SIMILAR_HIGH (max 3),
   SIMILAR_MID (max 5), timestamps, VALIDATED.
4. Update /knowledge/index with the new header block.

This step is part of the task. It is not optional.

[paste the rubric block here]
05 — Query Time

Load less. Answer faster.
Zero computation.

At query time the LLM performs no semantic computation. It reads pre-compiled structure and routes accordingly.

01 Load index only. The index contains all TLDRs and ANSWERS_WHEN keywords. It is always in context. Body files are not.
02 Keyword match. Query terms are matched against ANSWERS_WHEN across all index entries. Relevant files are identified in one pass.
03 Load matched files + their SIMILAR_HIGH. Files that match the query are loaded. Their SIMILAR_HIGH entries are loaded unconditionally — they are always needed together.
04 Load SIMILAR_MID if relevant. MID entries are loaded only if the query also relates to their domain. Checked against ANSWERS_WHEN of each MID candidate.
05 TLDR first. If the TLDR in the index already answers the query, the full file may not need to be read. Speed gain for simple queries.
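
The routing steps above can be sketched in a few lines. This is one possible reading, not a reference implementation: it takes "match" to mean the file with the most ANSWERS_WHEN keyword hits, and loads a MID entry when at least one of its own keywords appears in the query. It assumes the headers are already parsed into a dictionary, and the keywords for `http-headers` and `cors` are invented for the example. Step 05 (answering from the TLDR alone) is left to the LLM.

```python
import re

# Hypothetical parsed headers. The jwt and oauth2 keywords come from the examples
# in this document; the http-headers and cors keywords are made up for illustration.
HEADERS = {
    "jwt-authentication": {
        "ANSWERS_WHEN": "authentication, token, login, bearer, stateless, jwt, auth",
        "SIMILAR_HIGH": ["oauth2", "refresh-token", "bearer-token"],
        "SIMILAR_MID": ["http-headers", "cors", "api-security"],
    },
    "oauth2": {"ANSWERS_WHEN": "oauth, authorization, delegate, scope, grant, flow"},
    "http-headers": {"ANSWERS_WHEN": "header, http, request"},
    "cors": {"ANSWERS_WHEN": "cors, origin, preflight"},
}


def words(text: str) -> set[str]:
    """Lowercased word set, a simple stand-in for keyword matching."""
    return set(re.findall(r"\w+", text.lower()))


def route(query: str, headers: dict[str, dict]) -> list[str]:
    """Return the concepts to load, in load order."""
    q = words(query)
    # Step 02: count keyword hits against every ANSWERS_WHEN in the index.
    scores = {name: len(q & words(h["ANSWERS_WHEN"])) for name, h in headers.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return []  # out-of-domain query: load nothing
    matched = [name for name, score in scores.items() if score == best]
    load = list(matched)
    # Step 03: SIMILAR_HIGH entries are loaded unconditionally.
    for name in matched:
        for dep in headers[name].get("SIMILAR_HIGH", []):
            if dep not in load:
                load.append(dep)
    # Step 04: SIMILAR_MID entries only if their own keywords appear in the query.
    for name in matched:
        for dep in headers[name].get("SIMILAR_MID", []):
            if dep not in load and scores.get(dep, 0) > 0:
                load.append(dep)
    return load


in_domain = route("how do I send a jwt bearer token in a header?", HEADERS)
out_of_domain = route("how do I configure Stripe webhooks?", HEADERS)
```

Here `in_domain` contains `jwt-authentication`, its three HIGH entries, and `http-headers` (a MID entry whose keyword "header" appears in the query), while `out_of_domain` is an empty list.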
Result

A query that previously loaded 20 files now loads 2–4. Context is denser, response quality is higher, and no vector database or embedding model was invoked at any point.

Staleness at query time

Query time is read-only. The LLM never writes during a query. If it notices a relationship timestamp is older than the target file's VALIDATED date, it notes this internally but still loads the most recent version of the file. Re-evaluation happens at the next ingest — not now.

06 — Examples

The same pattern,
two different domains.

Coding agent — authentication module
Query: "how do I implement refresh token rotation?"
Index scan:
  "refresh" + "token" → matches jwt-authentication

Load:
  jwt-authentication          ← query match
  oauth2:HIGH                 ← always load with jwt
  refresh-token:HIGH          ← always load with jwt
  http-headers:MID            ← relevant to token transport

Skip:
  cors, api-security, database, cryptography ← not relevant

Result: 4 files loaded instead of 18.
Civic intelligence platform — territorial security
knowledge/sicurezza-territoriale
---
CONCEPT:      sicurezza-territoriale
TLDR:         Crime statistics by geographic area via ISTAT SDMX API, with trend and anomaly detection.
ANSWERS_WHEN: crime, security, reati, statistics, ISTAT, territory, area, criminalità

SIMILAR_HIGH: istat-sdmx-api:2025-04, geographic-filtering:2025-04, crime-statistics:2025-04
SIMILAR_MID:  sentiment-analysis:2025-04, civis-reporting:2025-04, anomaly-detection:2025-04

CONFIDENCE:   high
VALIDATED:    2025-04
---
Query: "quanti reati a Fiumicino nel 2023?" (Italian: "how many crimes in Fiumicino in 2023?")
Index scan:
  "reati" + "Fiumicino" → matches sicurezza-territoriale

Load:
  sicurezza-territoriale     ← query match
  istat-sdmx-api:HIGH        ← data source, always needed
  geographic-filtering:HIGH  ← "Fiumicino" signals location
  civis-reporting:MID        ← civic context, conditionally relevant

Skip:
  rag-chat, multilingual-support, social-analytics
07 — Benchmarks

Measured results on
real codebases.

The following benchmark was run on a real NestJS codebase (Mentat — a civic intelligence platform) using Claude Sonnet. Two environments were compared: Environment A loaded all knowledge files with no routing (no CKP); Environment B used full CKP routing. Each of 5 queries was run 3 times per environment.

- 66%: token reduction at 11 files
- ~85%: projected reduction at 30 files
- 100%: accuracy with CKP
- $0: extra infrastructure

Token reduction by knowledge base size
| Files in memory-bank | Avg tokens, No CKP | CKP tokens | Reduction |
|---|---|---|---|
| 3 files | 564 | 522 | 8% (below break-even) |
| 11 files | 4,800 | 1,618 | 66.3% |
| 30 files (projected) | ~13,000 | ~1,900 | ~85% |

Answer accuracy
| Query | No CKP (11 files) | CKP (11 files) |
|---|---|---|
| Q1: ISTAT SDMX fetch | correct | correct |
| Q2: Anomaly notifications | correct | correct |
| Q3: Auth (JWT Guards) | incorrect | correct (auth file only) |
| Q4: Stripe (out of domain) | hallucinated | correct (no hallucination) |
| Q5: Project TLDR | correct | correct (from index only) |
| Accuracy | 60% (9/15) | 100% (15/15) |

The most important finding

On out-of-domain queries, No-CKP hallucinates connections between unrelated modules. CKP loads nothing and answers honestly. Reduced context is not just a cost saving — it is a hallucination risk reduction.

Estimated cost saving — Claude Sonnet ($3 / 1M input tokens)
| Team size | Queries / day | Saving / day | Saving / month |
|---|---|---|---|
| Small team | 1,000 | $9.55 | $286 |
| Mid-size team | 5,000 | $47.75 | $1,432 |
| Enterprise | 10,000 | $95.50 | $2,865 |

Based on 3,182 tokens saved per query. Model: Claude Sonnet (Thinking). Date: 2026-05-20.
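
The arithmetic behind the 11-file row and the cost table can be reproduced as follows. The token counts and price are the figures quoted above; the 30-day month is an assumption, since the source does not state how the monthly figure was derived.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, the Claude Sonnet price quoted above
TOKENS_NO_CKP = 4_800                  # average tokens per query, 11 files, no routing
TOKENS_CKP = 1_618                     # tokens per query with CKP routing
DAYS_PER_MONTH = 30                    # assumption, not stated in the source

saved_per_query = TOKENS_NO_CKP - TOKENS_CKP
reduction_pct = 100 * saved_per_query / TOKENS_NO_CKP


def daily_saving(queries_per_day: int) -> float:
    """USD saved per day on input tokens at the quoted price."""
    return queries_per_day * saved_per_query * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


small_team_day = daily_saving(1_000)
small_team_month = small_team_day * DAYS_PER_MONTH
```

This gives 3,182 tokens saved per query, a 66.3% reduction, and about $9.55 per day or $286 per month at 1,000 queries per day. The larger rows in the table appear to be multiples of the rounded $9.55 figure; computing from the unrounded value gives roughly $47.73 and $95.46 per day.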

08 — Integration

Layering CKP on top of
an existing memory bank.

If you already use a structured memory bank — with files like activeContext.md, projectbrief.md, progress.md, and per-project files — CKP does not replace it. It layers on top, adding one thing the memory bank pattern is missing: explicit semantic relationships between files.

Rule of thumb

Apply CKP headers to your projects/ files. Leave activeContext.md, dailyLog.md, and decisionLog.md unchanged — they are session-scoped, not knowledge-scoped.

Combined header — CKP + memory bank metadata
memory-bank/projects/prescient.md
---
# CKP fields
CONCEPT:      prescient
TLDR:         Predictive analytics engine with ISTAT SDMX data and social sentiment.
ANSWERS_WHEN: analytics, predictive, ISTAT, trend, forecast, anomaly, dati
SIMILAR_HIGH: mentat:2025-04, istat-sdmx:2025-04
SIMILAR_MID:  civis:2025-04, social-analytics:2025-04, rag-chat:2025-04
CONFIDENCE:   high
VALIDATED:    2025-04

# Memory bank metadata (unchanged)
updated_at:   2025-04-01T10:00:00
updated_by:   agente
ttl:          30d
project_id:   prescient
---
How the boot flow changes
Memory bank (current boot)
1. Read _index.md
2. Detect project from path/context
3. Load projects/[detected].md
4. Agent infers which other files are needed

Memory bank + CKP
1. Read _index.md (TLDR + ANSWERS_WHEN included)
2. Match query against ANSWERS_WHEN
3. Load matched file + SIMILAR_HIGH automatically
4. Load SIMILAR_MID only if query domain matches

Which memory bank files get CKP headers
| File | CKP header? | Why |
|---|---|---|
| projects/[name].md | YES | Knowledge files: semantic relationships between projects matter |
| _index.md | PARTIAL | Add TLDR + ANSWERS_WHEN per entry. Keep existing table structure. |
| projectbrief.md | NO | Always loaded, session-independent. No routing needed. |
| activeContext.md | NO | Session-scoped, changes every session. Not a knowledge file. |
| dailyLog.md | NO | Append-only log. Loaded only when explicitly requested. |
| decisionLog.md | NO | Architectural archive. Loaded on demand, not by routing. |

The AGENT.md rule to add
AGENT.md / GEMINI.md — add this block
## CKP LLM: Compiled Knowledge Pattern (mandatory)

### ABSOLUTE RULE: zero questions to the user
NEVER ask questions during the CKP flow.
Make reasonable assumptions. State them in one line.

---

### BOOT: ALWAYS runs on the first message of every session
1. Read PROJECT_ROOT/memory-bank/_index.md.
2. If it exists → ROUTING. If it does not exist → INIT.

---

### ROUTING: selective loading
1. Compare the query's keywords with ANSWERS_WHEN.
2. Load the file with the most matches + all of its SIMILAR_HIGH entries.
3. Load SIMILAR_MID only if its ANSWERS_WHEN matches the query.
4. If the TLDR already answers, do not load the full file.

State: [CKP: loaded X/Y files, match: [name] via [keyword]]

---

### UPDATE: knowledge base maintenance
Whenever you create/modify a file in memory-bank/projects/:
- HIGH: direct dependencies. Max 3. concept:YYYY-MM
- MID:  same domain. Max 5. concept:YYYY-MM
1. Update the file's CKP header (header only, not the body).
2. Update its row in memory-bank/_index.md.
This step is part of the task. Do not finish without completing it.
09 — Rules

The complete rule set.

Header rules
Ingest rules
Query time rules
10 — FAQ

Common questions.

Does this replace RAG?
No. CKP is designed for structured, maintained knowledge bases of 10–200 files. For millions of documents, RAG remains the right tool. CKP replaces RAG for the use cases where Karpathy's wiki already works — it just makes those use cases faster and more precise.
Do the SIMILAR tiers need to be re-computed if I use a different LLM?
Not necessarily. The fixed rubric with anchor examples produces consistent results across major LLMs. If you switch LLMs, a re-ingest pass is recommended but not urgent — the tier categories are stable enough to be useful even across model changes.
What happens if a concept belongs to two SIMILAR_HIGH groups?
This is expected and fine. Each file independently declares its outgoing relationships. If oauth2 is HIGH for both jwt and pkce, both files declare it independently. At query time, whichever file is matched first will trigger loading oauth2.
Is there a minimum number of files where this makes sense?
Below ~10 files, loading everything into context is still cheap and CKP overhead isn't worth it. The pattern pays off starting around 15–20 files, and becomes increasingly valuable as the knowledge base grows.
Does the file format matter?
No. The header block works in Markdown frontmatter, an HTML comment, a plain text block, or any other format. The pattern is format-agnostic by design.
What does "unidirectional relationships" mean in practice?
File A may list B as HIGH, while B lists A as MID — or not at all. This is intentional. Each file describes its own view of the knowledge space. Asymmetric relationships often reflect real semantic asymmetry.
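
A small sketch of what asymmetry looks like as data, assuming each file's outgoing relationships are held as a mapping of target concept to tier. The specific tiers below are hypothetical and chosen only to show one asymmetric pair and one symmetric pair.

```python
# Each key is a file; each value is that file's own outgoing view (hypothetical tiers).
GRAPH = {
    "pkce": {"oauth2": "HIGH"},
    "oauth2": {"pkce": "MID", "jwt": "HIGH"},
    "jwt": {"oauth2": "HIGH"},
}


def asymmetric_pairs(graph: dict[str, dict[str, str]]) -> list[tuple[str, str, str, str]]:
    """List (a, b, tier a→b, tier b→a) wherever the two files disagree."""
    pairs = []
    for a, edges in graph.items():
        for b, tier_ab in edges.items():
            # A relationship the other file does not declare counts as NONE.
            tier_ba = graph.get(b, {}).get(a, "NONE")
            if tier_ab != tier_ba:
                pairs.append((a, b, tier_ab, tier_ba))
    return pairs


pairs = asymmetric_pairs(GRAPH)
```

In this example `pkce` lists `oauth2` as HIGH while `oauth2` lists `pkce` as MID, so the pair is reported from both sides; `jwt` and `oauth2` agree and are not reported. Nothing needs fixing in either case, since each header is only that file's view.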