CKP LLM — Compiled Knowledge Pattern

01 — The Problem

RAG retrieves. Wikis compile.
Neither routes.

Karpathy's LLM Wiki pattern is a genuine improvement over RAG: instead of retrieving and forgetting, it compiles knowledge into structured files loaded directly into context. The heavy reasoning happens at write time, not at query time.

But even compiled wikis have a structural gap: when you load a knowledge base, you load all of it. The LLM receives 20 files to answer a question that needs 3. Context is noisy. Response quality drops. Speed suffers.

And the connections between concepts — the "see also" links — are plain text. The LLM must infer the strength and type of every relationship on every query, from scratch, every time.

RAG

Retrieves chunks, forgets context

Runtime embedding cost

Vector database required

Probabilistic, misses precision

LLM Wiki (Karpathy)

Loads all files into context

No retrieval step — clean

No infrastructure needed

Relationships are implicit prose

The missing piece

A lightweight routing layer that tells the LLM which files to load, and pre-declares how concepts relate — computed once at ingest, reused on every query.

02 — The Pattern

Five additions.
Same files. Same workflow.

CKP adds five structured fields to the header of each knowledge file. No new tools. No new infrastructure. The same folder, the same Git workflow, the same "load into context" approach — just richer files.

The five fields

Field         Contents                                             Size               Purpose
TLDR          One-sentence summary optimised for LLM consumption   1 sentence         Immediate answer without reading the full file
ANSWERS_WHEN  Keywords that signal this file is relevant           5–10 words         Fast keyword routing, BM25-style
SIMILAR_HIGH  Concepts this file directly depends on or extends    Max 3              Always load these together
SIMILAR_MID   Concepts in the same domain, often co-relevant       Max 5              Load conditionally based on query
VALIDATED     Timestamp per relationship, not per file             YYYY-MM per entry  Staleness detection at relationship level

There is no SIMILAR_LOW field. Weak relationships add noise, not signal — the LLM infers them from content.

Why categorical tiers instead of decimal scores

LLMs tend to be more consistent when classifying into categories than when assigning decimal scores. A score of 0.73 from one session may be 0.61 in another. A choice between HIGH and MID, made with a fixed rubric and anchor examples, is far more stable and reproducible across models and sessions.

Tier  Definition                                                                     Max stored  Example
HIGH  One concept requires understanding the other. Direct dependency or extension.  3           jwt ↔ oauth2
MID   Same domain, frequently relevant together. No direct dependency.               5           jwt ↔ http-headers
NONE  Loosely related, rarely co-relevant. Not stored.                               0           jwt ↔ database

03 — File Anatomy

A knowledge file,
fully annotated.

The CKP header is format-agnostic. The example below uses a plain text block that works in Markdown, HTML, or any other format. The body below the header is unchanged — write it however you normally would.

knowledge file — any format

---
CONCEPT:      JWT Authentication
TLDR:         Stateless token-based auth — server signs a token, client stores and sends it.
ANSWERS_WHEN: authentication, token, login, bearer, stateless, jwt, auth

SIMILAR_HIGH: oauth2:2025-03, refresh-token:2025-03, bearer-token:2025-03
SIMILAR_MID:  http-headers:2025-03, cors:2025-03, api-security:2025-03

CONFIDENCE:   high
VALIDATED:    2025-03
---

# JWT Authentication
Your normal content here. Nothing changes below the header.
Write in Markdown, HTML, plain text — whatever fits your workflow.
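To make the header layout concrete, here is a minimal Python sketch of a parser for the block shown above. The `CkpHeader` class and the `parse_header` and `split_entries` functions are hypothetical names chosen for illustration, and the sketch assumes the header is the first `---` delimited block in the file.

```python
from dataclasses import dataclass, field


@dataclass
class CkpHeader:
    concept: str = ""
    tldr: str = ""
    answers_when: list[str] = field(default_factory=list)
    similar_high: dict[str, str] = field(default_factory=dict)  # concept -> YYYY-MM
    similar_mid: dict[str, str] = field(default_factory=dict)   # concept -> YYYY-MM
    confidence: str = ""
    validated: str = ""


def split_entries(value: str) -> dict[str, str]:
    """Turn 'oauth2:2025-03, cors:2025-03' into {'oauth2': '2025-03', ...}."""
    entries = {}
    for item in value.split(","):
        name, _, stamp = item.strip().partition(":")
        if name:
            entries[name.strip()] = stamp.strip()
    return entries


def parse_header(text: str) -> CkpHeader:
    """Parse the first '---' delimited block; the body below it is ignored."""
    header = CkpHeader()
    inside = False
    for line in text.splitlines():
        if line.strip() == "---":
            if inside:
                break  # closing delimiter: stop before the body
            inside = True
            continue
        if not inside:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a colon
        key, value = key.strip().upper(), value.strip()
        if key == "CONCEPT":
            header.concept = value
        elif key == "TLDR":
            header.tldr = value
        elif key == "ANSWERS_WHEN":
            header.answers_when = [w.strip() for w in value.split(",") if w.strip()]
        elif key == "SIMILAR_HIGH":
            header.similar_high = split_entries(value)
        elif key == "SIMILAR_MID":
            header.similar_mid = split_entries(value)
        elif key == "CONFIDENCE":
            header.confidence = value
        elif key == "VALIDATED":
            header.validated = value
    return header


SAMPLE = """---
CONCEPT:      JWT Authentication
TLDR:         Stateless token-based auth — server signs a token, client stores and sends it.
ANSWERS_WHEN: authentication, token, login, bearer, stateless, jwt, auth

SIMILAR_HIGH: oauth2:2025-03, refresh-token:2025-03, bearer-token:2025-03
SIMILAR_MID:  http-headers:2025-03, cors:2025-03, api-security:2025-03

CONFIDENCE:   high
VALIDATED:    2025-03
---

# JWT Authentication
Body text is never parsed.
"""

header = parse_header(SAMPLE)
print(header.concept, sorted(header.similar_high))
```

Unknown keys are skipped rather than rejected, so extra metadata placed in the same block would pass through without breaking the parse.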

Timestamp per relationship, not per file

The date on each SIMILAR entry (oauth2:2025-03) records when that specific relationship was last validated — not when the file was last touched. This makes staleness detectable at the relationship level, not just the file level.

If oauth2 was updated after 2025-03, that specific relationship is potentially stale and should be re-evaluated at next ingest. The rest of the file's relationships remain valid.
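The staleness check described here can be sketched in a few lines of Python. This is an illustration under one assumption: the related file's own VALIDATED value is used as the signal that it "was updated". The function name `stale_relationships` and its arguments are hypothetical.

```python
def stale_relationships(similar: dict[str, str],
                        validated_by_concept: dict[str, str]) -> list[str]:
    """Return related concepts whose file was validated after the relationship stamp.

    similar:              concept -> YYYY-MM stamp taken from a SIMILAR_* field
    validated_by_concept: concept -> YYYY-MM taken from that concept's VALIDATED field
    """
    stale = []
    for concept, stamp in similar.items():
        target = validated_by_concept.get(concept)
        # Zero-padded YYYY-MM strings sort chronologically, so string comparison is enough.
        if target is not None and target > stamp:
            stale.append(concept)
    return stale


similar_high = {"oauth2": "2025-03", "refresh-token": "2025-03"}
validated = {"oauth2": "2025-06", "refresh-token": "2025-03"}
print(stale_relationships(similar_high, validated))
```

Only `oauth2` is flagged in this example; the `refresh-token` relationship keeps its stamp and stays valid.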

The index file

A single index file at the root of your knowledge base aggregates only the headers — no body content. It is the only file always loaded into context. Everything else is loaded on demand.

index file

---
CONCEPT:      jwt-authentication
TLDR:         Stateless token-based auth — server signs a token, client stores and sends it.
ANSWERS_WHEN: authentication, token, login, bearer, stateless, jwt
---

---
CONCEPT:      oauth2
TLDR:         Delegated authorisation framework. Issues access tokens on behalf of users.
ANSWERS_WHEN: oauth, authorization, delegate, scope, grant, flow
---

... one header block per file. No body content.

The index stays small as the knowledge base grows because it contains only header fields (CONCEPT, TLDR, and ANSWERS_WHEN in the example above), never the concept body. With 50 files, it remains a few hundred lines.
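As a rough sketch of how such an index could be assembled, the Python below keeps only the CONCEPT, TLDR, and ANSWERS_WHEN lines from each file header, mirroring the index example above. The names `INDEX_FIELDS`, `index_block`, and `build_index` are hypothetical, and the sketch assumes each input string is a header block in the `KEY: value` layout used in this document.

```python
INDEX_FIELDS = ("CONCEPT", "TLDR", "ANSWERS_WHEN")


def index_block(header_text: str) -> str:
    """Reduce one file header to the lines that belong in the index."""
    kept = []
    for line in header_text.splitlines():
        key = line.partition(":")[0].strip().upper()
        if key in INDEX_FIELDS:
            kept.append(line.rstrip())
    return "\n".join(["---", *kept, "---"])


def build_index(header_texts: list[str]) -> str:
    """Concatenate one reduced block per knowledge file, separated by blank lines."""
    return "\n\n".join(index_block(text) for text in header_texts) + "\n"


jwt_header = """---
CONCEPT:      jwt-authentication
TLDR:         Stateless token-based auth.
ANSWERS_WHEN: authentication, token, login, bearer, stateless, jwt
SIMILAR_HIGH: oauth2:2025-03
VALIDATED:    2025-03
---"""

index_text = build_index([jwt_header])
print(index_text)
```

Each file contributes five lines plus a separator, which is consistent with an index of a few hundred lines at 50 files.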

04 — The Ingest Process

Computation happens once.
Reused on every query.

Ingest is the only moment where semantic work occurs. At query time, the LLM reads pre-computed relationships — it never re-derives them.

When ingest runs

Trigger: every time you create or modify a knowledge file. In code: a git hook. In an agent: a rule in AGENT.md.

01 LLM reads the new or modified file and the current index.

02 LLM applies the fixed rubric (see below) to classify relationships against all existing concepts.

03 LLM writes the updated header into the file: SIMILAR_HIGH, SIMILAR_MID, timestamps, VALIDATED.

04 LLM updates the index with the new header block. No other files are touched.
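For the git hook trigger, a pre-commit script first has to decide which staged files need ingest. The Python sketch below shows one way to do that. It assumes the knowledge base lives in a top-level `knowledge` folder with an index file named `index`; the function names are hypothetical, and the actual LLM call for steps 01 to 04 is left out.

```python
import subprocess
from pathlib import PurePosixPath


def staged_paths() -> list[str]:
    """Paths staged for commit (added, copied, or modified), as reported by git."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [path for path in out.splitlines() if path]


def files_to_ingest(paths: list[str], root: str = "knowledge",
                    index_name: str = "index") -> list[str]:
    """Keep only knowledge files; the index itself is never ingested."""
    selected = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) > 1 and parts[0] == root and parts[-1] != index_name:
            selected.append(path)
    return selected


# Inside a hook this would be files_to_ingest(staged_paths()).
example = files_to_ingest(["knowledge/jwt.md", "src/app.ts", "knowledge/index"])
print(example)
```

Filtering out the index keeps the hook from re-triggering on the file that ingest itself rewrites in step 04.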

The fixed rubric — copy this into your ingest prompt

ingest prompt — rubric block

You are updating the header of a knowledge file.
Classify the relationship between this concept and each existing concept
using exactly these three tiers:

HIGH — one concept requires understanding the other.
     Direct dependency, extension, or prerequisite.
     Anchor: jwt ↔ oauth2 = HIGH

MID  — same domain, frequently used together.
     No direct dependency.
     Anchor: jwt ↔ http-headers = MID

NONE — loosely related or rarely co-relevant. Do not store.
     Anchor: jwt ↔ database = NONE

Rules:
- SIMILAR_HIGH: select at most 3. If fewer qualify, write fewer.
- SIMILAR_MID:  select at most 5. If fewer qualify, write fewer.
- Do not store NONE relationships.
- Append today's date (YYYY-MM) to each entry: concept:YYYY-MM
- Only update the header. Do not modify the body content.
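Because the caps and the date format are mechanical rules, they can be checked in code after the LLM returns its classification. The Python sketch below assumes the LLM output has already been turned into a mapping from concept name to tier; `format_similar` and the constants are hypothetical names.

```python
import re

HIGH_MAX, MID_MAX = 3, 5


def format_similar(classified: dict[str, str], stamp: str) -> dict[str, str]:
    """Build the SIMILAR_HIGH and SIMILAR_MID field values from a tier mapping.

    classified: concept -> "HIGH", "MID", or "NONE"
    stamp:      ingest date as YYYY-MM
    """
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", stamp):
        raise ValueError(f"stamp must be YYYY-MM, got {stamp!r}")
    high = [c for c, tier in classified.items() if tier == "HIGH"]
    mid = [c for c, tier in classified.items() if tier == "MID"]
    # Reject rather than truncate: choosing which entries to drop is a semantic
    # decision that belongs to the rubric, not to this check.
    if len(high) > HIGH_MAX:
        raise ValueError(f"SIMILAR_HIGH allows at most {HIGH_MAX} entries")
    if len(mid) > MID_MAX:
        raise ValueError(f"SIMILAR_MID allows at most {MID_MAX} entries")
    return {
        "SIMILAR_HIGH": ", ".join(f"{c}:{stamp}" for c in high),
        "SIMILAR_MID": ", ".join(f"{c}:{stamp}" for c in mid),
    }


fields = format_similar(
    {"oauth2": "HIGH", "http-headers": "MID", "database": "NONE"}, "2025-03"
)
print(fields)
```

NONE entries simply never reach the output, which matches the rule that they are not stored.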

Using in AGENT.md

For coding agents like Claude Code, add this rule to your AGENT.md, or to whichever instruction file your agent reads (Claude Code, for example, reads CLAUDE.md). The agent treats it as a mandatory step after every file write inside your knowledge folder.

AGENT.md — knowledge base rule

## Knowledge Base — Mandatory Rule

Whenever you create or modify a file inside /knowledge/:

1. Re-read the file you just wrote.
2. Apply the CKP ingest rubric (see below) to classify
   relationships against the current index.
3. Update the file header: SIMILAR_HIGH (max 3),
   SIMILAR_MID (max 5), timestamps, VALIDATED.
4. Update /knowledge/index with the new header block.

This step is part of the task. It is not optional.

[paste the rubric block here]

05 — Query Time

Load less. Answer faster.
Zero computation.

At query time the LLM performs no semantic computation. It reads pre-compiled structure and routes accordingly.

01 Load index only. The index contains all TLDRs and ANSWERS_WHEN keywords. It is always in context. Body files are not.

02 Keyword match. Query terms are matched against ANSWERS_WHEN across all index entries. Relevant files are identified in one pass.

03 Load matched files + their SIMILAR_HIGH. Files that match the query are loaded. Their SIMILAR_HIGH entries are loaded unconditionally — they are always needed together.

04 Load SIMILAR_MID if relevant. MID entries are loaded only if the query also relates to their domain. Checked against ANSWERS_WHEN of each MID candidate.

05 TLDR first. If the TLDR in the index already answers the query, the full file may not need to be read. Speed gain for simple queries.
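Steps 01 to 04 above can be sketched as a small routing function in Python. The sketch makes several assumptions: the index and the file headers have already been parsed into one dictionary, the primary match is the entry with the most keyword overlaps (as in the AGENT.md routing rule later in this document), and the keyword sets for `http-headers` and `cors` are invented for the example. Step 05 (answering from the TLDR) is not modelled.

```python
import re


def route(query: str, index: dict[str, dict]) -> list[str]:
    """Return the concepts to load for a query, in load order.

    index: concept -> {"answers_when": set of keywords,
                       "high": list of concepts, "mid": list of concepts}
    """
    terms = set(re.findall(r"\w+", query.lower()))
    scores = {c: len(terms & entry["answers_when"]) for c, entry in index.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return []  # out-of-domain query: load nothing
    primary = [c for c, score in scores.items() if score == best]

    load = list(primary)
    for concept in primary:
        for related in index[concept]["high"]:
            if related not in load:
                load.append(related)  # HIGH: unconditional
    for concept in primary:
        for related in index[concept]["mid"]:
            if related in load or related not in index:
                continue
            if terms & index[related]["answers_when"]:
                load.append(related)  # MID: only if its own keywords match
    return load


index = {
    "jwt-authentication": {
        "answers_when": {"authentication", "token", "login", "bearer",
                         "stateless", "jwt", "auth"},
        "high": ["oauth2", "refresh-token", "bearer-token"],
        "mid": ["http-headers", "cors", "api-security"],
    },
    "oauth2": {
        "answers_when": {"oauth", "authorization", "delegate", "scope", "grant", "flow"},
        "high": [], "mid": [],
    },
    # Hypothetical keyword sets, invented for this example.
    "http-headers": {"answers_when": {"header", "http", "request"}, "high": [], "mid": []},
    "cors": {"answers_when": {"cors", "origin", "preflight"}, "high": [], "mid": []},
}

print(route("how do I send the jwt as a bearer header?", index))
print(route("stripe webhook retries", index))
```

The second call returns an empty list, which is the behaviour wanted for out-of-domain queries: nothing matches, so nothing is loaded.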

Result

A query that previously loaded 20 files now loads 2–4. Context is denser, response quality is higher, and no vector database or embedding model was invoked at any point.

Staleness at query time

Query time is read-only. The LLM never writes during a query. If it notices a relationship timestamp is older than the target file's VALIDATED date, it notes this internally but still loads the most recent version of the file. Re-evaluation happens at the next ingest — not now.

06 — Examples

The same pattern,
two different domains.

Coding agent — authentication module

Query: "how do I implement refresh token rotation?"

Index scan:
  "refresh" + "token" → matches jwt-authentication

Load:
  jwt-authentication          ← query match
  oauth2:HIGH                 ← always load with jwt
  refresh-token:HIGH          ← always load with jwt
  http-headers:MID            ← relevant to token transport

Skip:
  cors, api-security, database, cryptography ← not relevant

Result: 4 files loaded instead of 18.

Civic intelligence platform — territorial security

knowledge/sicurezza-territoriale

---
CONCEPT:      sicurezza-territoriale
TLDR:         Crime statistics by geographic area via ISTAT SDMX API, with trend and anomaly detection.
ANSWERS_WHEN: crime, security, reati, statistics, ISTAT, territory, area, criminalità

SIMILAR_HIGH: istat-sdmx-api:2025-04, geographic-filtering:2025-04, crime-statistics:2025-04
SIMILAR_MID:  sentiment-analysis:2025-04, civis-reporting:2025-04, anomaly-detection:2025-04

CONFIDENCE:   high
VALIDATED:    2025-04
---

Query: "quanti reati a Fiumicino nel 2023?" (Italian: "how many crimes in Fiumicino in 2023?")

Index scan:
  "reati" + "Fiumicino" → matches sicurezza-territoriale

Load:
  sicurezza-territoriale     ← query match
  istat-sdmx-api:HIGH        ← data source, always needed
  geographic-filtering:HIGH  ← "Fiumicino" signals location
  civis-reporting:MID        ← civic context, conditionally relevant

Skip:
  rag-chat, multilingual-support, social-analytics

07 — Benchmarks

Measured results on
real codebases.

The following benchmark was run on a real NestJS codebase (Mentat — a civic intelligence platform) using Claude Sonnet. Two environments were compared: Environment A loaded all knowledge files with no routing (no CKP); Environment B used full CKP routing. Each of 5 queries was run 3 times per environment.

66% Token reduction at 11 files

~85% Projected at 30 files

100% Accuracy with CKP

$0 Extra infrastructure

Token reduction by knowledge base size

Knowledge base size    Tokens without CKP   Tokens with CKP   Reduction
3 files                564                  522               8% (below break-even)
11 files               4,800                1,618             66.3%
30 files (projected)   ~13,000              ~1,900            ~85%

Answer accuracy

No CKP — 11 files

Q1 — ISTAT SDMX fetch: correct

Q2 — Anomaly notifications: correct

Q3 — Auth (JWT Guards): incorrect

Q4 — Stripe (out of domain): hallucinated

Q5 — Project TLDR: correct

Accuracy: 60% (9/15)

CKP — 11 files

Q1 — ISTAT SDMX fetch: correct

Q2 — Anomaly notifications: correct

Q3 — Auth (JWT Guards): correct — auth file only

Q4 — Stripe (out of domain): correct — no hallucination

Q5 — Project TLDR: correct — from index only

Accuracy: 100% (15/15)

The most important finding

On out-of-domain queries, No-CKP hallucinates connections between unrelated modules. CKP loads nothing and answers honestly. Reduced context is not just a cost saving — it is a hallucination risk reduction.

Estimated cost saving — Claude Sonnet ($3 / 1M input tokens)

Team            Queries per day   Saved per day   Saved per month (30 days)
Small team      1,000             $9.55           $286
Mid-size team   5,000             $47.75          $1,432
Enterprise      10,000            $95.50          $2,865

Based on 3,182 tokens saved per query. Model: Claude Sonnet (Thinking). Date: 2026-05-20.
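The arithmetic behind these figures can be reproduced with a short Python calculation. It uses the 11-file measurement (4,800 tokens without CKP, 1,618 with) and the stated price of $3 per million input tokens. Reading the second figure in each row as a per-day saving and the third as a 30-day month is an assumption inferred from the numbers, and the variable names are illustrative.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, as stated in the heading above

tokens_without_ckp = 4_800
tokens_with_ckp = 1_618

saved_per_query = tokens_without_ckp - tokens_with_ckp  # 3,182 tokens
reduction = saved_per_query / tokens_without_ckp        # fraction of input tokens avoided


def saving_usd(queries: int) -> float:
    """Input-token cost avoided for a given number of queries."""
    return queries * saved_per_query * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


print(f"saved per query: {saved_per_query} tokens ({reduction:.1%})")
for queries in (1_000, 5_000, 10_000):
    daily = saving_usd(queries)
    print(f"{queries:>6} queries: ${daily:.2f} per day, about ${daily * 30:,.0f} per 30 days")
```

The larger rows in the table appear to be scaled from the rounded $9.55 figure, so recomputing from raw token counts can differ from them by a dollar or two.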

08 — Integration

Layering CKP on top of
an existing memory bank.

If you already use a structured memory bank — with files like activeContext.md, projectbrief.md, progress.md, and per-project files — CKP does not replace it. It layers on top, adding one thing the memory bank pattern is missing: explicit semantic relationships between files.

Rule of thumb

Apply CKP headers to your projects/ files. Leave activeContext.md, dailyLog.md, and decisionLog.md unchanged — they are session-scoped, not knowledge-scoped.

Combined header — CKP + memory bank metadata

memory-bank/projects/prescient.md

---
# CKP fields
CONCEPT:      prescient
TLDR:         Predictive analytics engine with ISTAT SDMX data and social sentiment.
ANSWERS_WHEN: analytics, predictive, ISTAT, trend, forecast, anomaly, dati
SIMILAR_HIGH: mentat:2025-04, istat-sdmx:2025-04
SIMILAR_MID:  civis:2025-04, social-analytics:2025-04, rag-chat:2025-04
CONFIDENCE:   high
VALIDATED:    2025-04

# Memory bank metadata (unchanged)
updated_at:   2025-04-01T10:00:00
updated_by:   agente
ttl:          30d
project_id:   prescient
---

How the boot flow changes

Memory bank — current boot

Read _index.md

Detect project from path/context

Load projects/[detected].md

Agent infers which other files are needed

Memory bank + CKP

Read _index.md (TLDR + ANSWERS_WHEN included)

Match query against ANSWERS_WHEN

Load matched file + SIMILAR_HIGH automatically

Load SIMILAR_MID only if query domain matches

Which memory bank files get CKP headers

File                 CKP header   Reason
projects/[name].md   YES          Knowledge files. Semantic relationships between projects matter.
_index.md            PARTIAL      Add TLDR + ANSWERS_WHEN per entry. Keep existing table structure.
projectbrief.md      NO           Always loaded, session-independent. No routing needed.
activeContext.md     NO           Session-scoped, changes every session. Not a knowledge file.
dailyLog.md          NO           Append-only log. Loaded only when explicitly requested.
decisionLog.md       NO           Architectural archive. Loaded on demand, not by routing.

The AGENT.md rule to add

AGENT.md / GEMINI.md — add this block

## CKP LLM — Compiled Knowledge Pattern (mandatory)

### ABSOLUTE RULE — Zero questions to the user
NEVER ask questions during the CKP flow.
Make reasonable assumptions. State them in one line.

---

### BOOT — ALWAYS runs on the first message of every session
1. Read PROJECT_ROOT/memory-bank/_index.md.
2. If it exists → ROUTING. If it does not exist → INIT.

---

### ROUTING — Selective loading
1. Compare the query keywords against ANSWERS_WHEN.
2. Load the file with the most matches + all of its SIMILAR_HIGH.
3. Load SIMILAR_MID only if its ANSWERS_WHEN matches the query.
4. If the TLDR already answers the query, do not load the full file.

Declare: [CKP: loaded X/Y files — match: [name] via [keyword]]

---

### UPDATE — Updating the knowledge base
Every time you create/modify a file in memory-bank/projects/:
- HIGH: direct dependencies. Max 3. concept:YYYY-MM
- MID:  same domain. Max 5. concept:YYYY-MM
1. Update the file's CKP header (header only, not the body).
2. Update the row in memory-bank/_index.md.
This step is part of the task. Do not finish without completing it.

09 — Rules

The complete rule set.

Header rules

Every knowledge file must have a CKP header block.
SIMILAR_HIGH contains at most 3 entries. Write fewer if fewer qualify.
SIMILAR_MID contains at most 5 entries. Write fewer if fewer qualify.
Do not store SIMILAR_LOW. Weak relationships are noise, not signal.
Every SIMILAR entry carries a timestamp: concept:YYYY-MM.
Relationships are unidirectional — each file declares its own outgoing relationships only.

Ingest rules

Ingest runs every time a knowledge file is created or modified.
The fixed rubric must be included verbatim in every ingest prompt.
Ingest updates only the file being ingested and the index. No cascading updates.
The index contains only header blocks — never body content.

Query time rules

Query time is read-only. The LLM never writes during a query.
The index is always in context. Body files are loaded on demand.
SIMILAR_HIGH files are always loaded alongside their parent.
SIMILAR_MID files are loaded only if relevant to the current query.
Staleness is noted but not acted upon at query time. Re-evaluated at next ingest.

10 — FAQ

Common questions.

Does this replace RAG?

No. CKP is designed for structured, maintained knowledge bases of 10–200 files. For millions of documents, RAG remains the right tool. CKP replaces RAG for the use cases where Karpathy's wiki already works — it just makes those use cases faster and more precise.

Do the SIMILAR tiers need to be re-computed if I use a different LLM?

Not necessarily. The fixed rubric with anchor examples produces consistent results across major LLMs. If you switch LLMs, a re-ingest pass is recommended but not urgent — the tier categories are stable enough to be useful even across model changes.

What happens if a concept belongs to two SIMILAR_HIGH groups?

This is expected and fine. Each file independently declares its outgoing relationships. If oauth2 is HIGH for both jwt and pkce, both files declare it independently. At query time, whichever file is matched first will trigger loading oauth2.

Is there a minimum number of files where this makes sense?

Below ~10 files, loading everything into context is still cheap and CKP overhead isn't worth it. The pattern pays off starting around 15–20 files, and becomes increasingly valuable as the knowledge base grows.

Does the file format matter?

No. The header block works in Markdown frontmatter, an HTML comment, a plain text block, or any other format. The pattern is format-agnostic by design.

What does "unidirectional relationships" mean in practice?

File A may list B as HIGH, while B lists A as MID — or not at all. This is intentional. Each file describes its own view of the knowledge space. Asymmetric relationships often reflect real semantic asymmetry.
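A small Python sketch can make the asymmetry visible. It assumes the SIMILAR fields of all files have been collected into one mapping of outgoing relationships; the data and the `asymmetric_pairs` function are hypothetical, and the output is meant as a report, not as a list of errors.

```python
# Each file declares only its own outgoing relationships.
outgoing = {
    "jwt": {"oauth2": "HIGH"},
    "oauth2": {"jwt": "MID"},
    "pkce": {"oauth2": "HIGH"},
}


def asymmetric_pairs(outgoing: dict[str, dict[str, str]]) -> list[tuple]:
    """List (source, target, tier, reverse_tier) where the two directions differ.

    reverse_tier is None when the target does not list the source at all.
    """
    pairs = []
    for source, relations in outgoing.items():
        for target, tier in relations.items():
            reverse = outgoing.get(target, {}).get(source)
            if reverse != tier:
                pairs.append((source, target, tier, reverse))
    return pairs


for pair in asymmetric_pairs(outgoing):
    print(pair)
```

Here `jwt` lists `oauth2` as HIGH while `oauth2` lists `jwt` only as MID, and `pkce` points at `oauth2` with no link back. Both cases are valid under the unidirectional rule.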
