How Much Context Does Your ~/.claude Folder Eat? Run This Audit

Your ~/.claude folder is a context tax you pay on every session, whether or not you know it. Skills, agents, and always-on instruction files accumulate by accretion — you install one to try it, another to evaluate, a third for a one-off task, and none of them ever leave. The five ClaudeKit v2 kits together measure 82,197 tokens across 101 commands, 19 skills, and 13 read-only agents — and they were designed lean. An uncurated folder can easily double that. Here is how to measure yours, what the numbers mean, and how to act on them.

Why Does Your ~/.claude Folder Hurt Claude Code Performance?

Claude Code performance is in large part a context-budget problem. Skills use progressive disclosure: a short description is always available, and the full SKILL.md body loads only when the skill is triggered. Agent definitions and always-on instruction files, however, tend to sit in context whether or not you use them. A ~/.claude folder that has accumulated dozens of installed skills and agents over months can carry meaningful always-loaded weight before you type a single prompt.

The math is not subtle. Suppose you have installed three skill repos to evaluate them, each carrying a few hundred kilobytes of markdown. At roughly four characters per token, only 40-80 KB of that needs to be always loaded to add 10,000-20,000 estimated tokens of passive overhead to every session. That is context Claude has to hold in working memory while helping you ship something entirely unrelated. The first step to fixing this is measuring it.

The estimate used throughout this guide is the common rule of thumb: roughly four characters per token for English text. That is an approximation, not a guarantee. Actual tokenization depends on the specific tokenizer and the content: code, punctuation, and non-English text tokenize differently. It is good enough to find your biggest offenders, but not good enough to bill against. For exact figures, use ck tokens after the triage.
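To make the rule concrete, here is a minimal sketch of the chars/4 estimate for a single file. The function name estimate_tokens is ours, chosen for illustration, and the result is a heuristic, not a tokenizer count.

```shell
# estimate_tokens FILE
# Prints a rough token estimate for one file: character count divided by 4.
# This is the rule of thumb described above, not a real tokenizer.
estimate_tokens() {
  chars=$(wc -m < "$1")      # wc -m counts characters (locale-dependent)
  echo $(( chars / 4 ))      # integer division, so the result rounds down
}

# Example call (path taken from the sample output later in this guide):
#   estimate_tokens "$HOME/.claude/skills/pdf-tools/SKILL.md"
```

A 400-character file comes out at 100 estimated tokens. Files heavy in code or non-English text can land well away from that ratio.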

What Does the Audit Script Actually Do?

This script walks your ~/.claude tree, counts the characters in every .md file, and prints a token estimate for the 25 heaviest files, sorted largest first, followed by a grand total. It is read-only. It changes nothing.

#!/usr/bin/env bash
# claude-context-audit.sh — estimate token weight of ~/.claude markdown.
# Read-only. Token estimate = characters / 4 (rough heuristic for English text, not a tokenizer count).
 
ROOT="${1:-$HOME/.claude}"
 
if [ ! -d "$ROOT" ]; then
  # Report the problem on stderr so it does not mix with audit output
  echo "No directory at $ROOT" >&2
  exit 1
fi
 
echo "Auditing: $ROOT"
echo
 
# Total on-disk size of the folder
echo "On-disk size:"
du -sh "$ROOT"
echo
 
# Per-file char count -> token estimate, sorted heaviest first
echo "Heaviest markdown files (estimated tokens = chars / 4):"
find "$ROOT" -type f -name '*.md' -print0 \
  | while IFS= read -r -d '' f; do
      chars=$(wc -m < "$f")
      tokens=$(( chars / 4 ))
      printf '%8d tokens  %s\n' "$tokens" "$f"
    done \
  | sort -rn \
  | head -25
echo
 
# Grand total estimate across all markdown
total_chars=$(find "$ROOT" -type f -name '*.md' -exec cat {} + | wc -m)
echo "Total markdown chars: $total_chars"
echo "Total estimated tokens (chars / 4): $(( total_chars / 4 ))"

Save it, make it executable, and run it:

chmod +x claude-context-audit.sh
./claude-context-audit.sh

If you keep project-scoped skills in a repo, point the script at that path instead: ./claude-context-audit.sh ./.claude.

How Do You Read the Audit Output?

The output gives you three things: total on-disk size, a ranked list of the heaviest files, and a grand-total token estimate. Start with the grand total, then work through the ranked list.

The grand total is your ceiling, not your per-session cost. The total estimated tokens across all markdown is the absolute most these files could contribute if everything loaded at once — which does not happen, because progressive disclosure keeps most skill bodies dormant until triggered. Treat the grand total as a ceiling and a trend line. If it doubles over a few months of installing repos, that is the signal to prune even if no single session feels slow.

The ranked list finds the always-loaded offenders. Two patterns to flag immediately:

A single SKILL.md over roughly 2,000 estimated tokens. Large skill bodies are not inherently bad, but they should justify their weight. A 3,600-token skill you trigger twice a year is a poor trade.
Agent definitions and always-on instruction files. These are more likely to sit in context regardless of use. A heavy agent roster you installed to evaluate once and never removed is pure overhead.
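The first pattern can be checked mechanically. The sketch below lists every markdown file whose chars/4 estimate exceeds a threshold, defaulting to the rough 2,000-token cutoff above. The function name flag_heavy is illustrative, and the loop assumes file names without embedded newlines.

```shell
# flag_heavy ROOT [THRESHOLD]
# Lists markdown files under ROOT whose chars/4 estimate exceeds THRESHOLD
# (default 2000). Read-only: it only reads files and prints.
flag_heavy() {
  root=$1
  threshold=${2:-2000}
  find "$root" -type f -name '*.md' | while IFS= read -r f; do
    tokens=$(( $(wc -m < "$f") / 4 ))
    if [ "$tokens" -gt "$threshold" ]; then
      printf '%8d tokens  %s\n' "$tokens" "$f"
    fi
  done
}

# Example calls:
#   flag_heavy "$HOME/.claude"          # default cutoff
#   flag_heavy "$HOME/.claude" 1500     # stricter cutoff
```

The output is a candidate list, not a verdict. Whether a flagged file earns its weight still depends on how often you trigger it.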

Cross-check high-ranking files against real numbers. The chars/4 estimate is directional. If a file ranks high and you want a firmer figure, run ck tokens on a ClaudeKit-managed install or use a tokenizer-based counter. Keep in mind that tiktoken implements OpenAI's encodings, so its counts are still an approximation for Claude. The point of the audit is triage, finding the candidates, not final accounting.

Sample Audit Output and Diagnosis

Suppose the script prints this top of the ranking:

    3612 tokens  /Users/you/.claude/skills/pdf-tools/SKILL.md
    2104 tokens  /Users/you/.claude/agents/code-reviewer.md
    1890 tokens  /Users/you/.claude/agents/data-scientist.md
     710 tokens  /Users/you/.claude/skills/blog-writer/SKILL.md
     680 tokens  /Users/you/.claude/skills/tweet-gen/SKILL.md

The diagnosis writes itself. The PDF skill at 3,612 tokens is fine if you process PDFs daily and it loads on trigger. It is a liability if you installed it once and it sits resident. The two agent definitions at roughly 2,000 each are the more likely problem — agent definitions tend to be always-considered, so 4,000 estimated tokens of agents you rarely invoke is pure overhead. The two small skills at around 700 each are individually harmless, but if there are twenty more like them, the aggregate is the real cost.
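The same diagnosis can be done by summing the ranking per category. This is a small sketch, assuming the audit's output format of a token count followed by a path and classifying lines by whether the path contains /agents/ or /skills/. The function name sum_by_category is ours.

```shell
# sum_by_category
# Reads audit lines ("<tokens> tokens  <path>") on stdin and totals the
# estimates for paths containing /agents/, /skills/, or neither.
sum_by_category() {
  awk '
    /\/agents\// { agents += $1; next }
    /\/skills\// { skills += $1; next }
    { other += $1 }
    END { printf "agents=%d skills=%d other=%d\n", agents, skills, other }
  '
}

# The sample ranking from above
sum_by_category <<'EOF'
    3612 tokens  /Users/you/.claude/skills/pdf-tools/SKILL.md
    2104 tokens  /Users/you/.claude/agents/code-reviewer.md
    1890 tokens  /Users/you/.claude/agents/data-scientist.md
     710 tokens  /Users/you/.claude/skills/blog-writer/SKILL.md
     680 tokens  /Users/you/.claude/skills/tweet-gen/SKILL.md
EOF
# prints: agents=3994 skills=5002 other=0
```

The skills total is larger, but most of it is one on-demand file. The agents total is the one to question first.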

How Does ClaudeKit v2 Compare to a Typical Free Skill Repo?

One reason we measured every kit at install time is precisely this problem. Here is how the five ClaudeKit v2 kits compare against the kinds of context loads we have seen in public repos evaluated in our free-skills token study.

Kit	Commands	Skills	Agents	Measured Tokens
EngineerKit (/eng)	25	4	4	20,413
MarketingKit (/mkt)	20	3	2	16,714
SEOKit (/seo)	19	4	2	16,004
EcomKit (/ecom)	20	3	2	16,464
VideoKit (/video)	17	5	3	12,602
All 5 kits	101	19	13	82,197
Typical free repo (large)	varies	varies	varies	40,000-120,000+

The comparison is not about size alone. The important difference is that ClaudeKit v2 publishes its measured token cost at install time via the token ledger, and all 13 agents are read-only specialist reviewers and researchers — not blocking quality-gate agents that stay resident and add passive overhead. Commands end with evidence (a report, a diff, a verified file), not a gatekeeper that holds up the loop.

The token ledger prints on every ck install <kit>. You can recount at any time with ck tokens <kit>. There is no guessing. See the measured token costs post for the methodology.

What Is the Right Pruning Strategy?

Once you have the ranking, prune in three passes:

Remove what you do not use. Uninstall skills and agents you have not triggered in the last month. This is the highest-leverage step and the safest. If you cannot remember what a file does, it probably does not earn its context cost.
Demote heavy-but-rare to project scope. If a large skill is genuinely useful but infrequent, keep it project-scoped rather than global. Install it inside ./.claude in the repo that needs it so it only loads in that context, not in every other session.
Prefer kits that publish their weight. When a skill declares its measured token cost, you can budget a multi-skill command before running it instead of discovering the cost after a slow session. The ck tokens <kit> command gives you the exact figure for any ClaudeKit kit.
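Budgeting with published weights is simple arithmetic. The sketch below sums measured kit costs from the table above and compares them with a budget. The 40,000-token budget is a hypothetical figure for illustration, and kit_budget is not part of any CLI.

```shell
# kit_budget BUDGET COST...
# Sums the given token costs and reports whether they fit within BUDGET.
kit_budget() {
  budget=$1
  shift
  total=0
  for cost in "$@"; do
    total=$(( total + cost ))
  done
  if [ "$total" -le "$budget" ]; then
    echo "total=$total within budget=$budget"
  else
    echo "total=$total over budget=$budget by $(( total - budget ))"
  fi
}

# EngineerKit (20413) + SEOKit (16004) against a hypothetical 40000-token budget
kit_budget 40000 20413 16004
# prints: total=36417 within budget=40000
```

Treat the published totals as ceilings in this calculation, since not every file in a kit loads in every session.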

Make the Audit a Habit, Not a One-Off

The most useful version of this audit is a recurring one. Because a ~/.claude folder grows by accretion, one trial install at a time, the folder that was lean in January is heavy by June without any single decision causing it. Running the audit monthly and watching the grand-total trend line is how you catch creep before it costs you.

If the total estimated tokens climbs steadily while your actual usage has not changed, that delta is accumulated cruft. Schedule the script the way you would any maintenance task. It takes seconds and changes nothing, so there is no reason not to run it on a cadence.
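One way to get a trend line is to append each run's grand total to a dated log. This is a minimal sketch. The function name log_total, the log path, and the cron schedule are placeholders to adapt.

```shell
# log_total ROOT LOGFILE
# Appends "YYYY-MM-DD,<estimated tokens>" for ROOT's markdown to LOGFILE.
# Reads ROOT only; the single file it writes is LOGFILE.
log_total() {
  root=$1
  log=$2
  chars=$(find "$root" -type f -name '*.md' -exec cat {} + | wc -m)
  printf '%s,%d\n' "$(date +%Y-%m-%d)" "$(( chars / 4 ))" >> "$log"
}

# Example call (log path is a placeholder; keep it outside ROOT):
#   log_total "$HOME/.claude" "$HOME/claude-context-trend.csv"
#
# Example crontab entry running the audit script at 09:00 on the 1st of each month:
#   0 9 1 * * /path/to/claude-context-audit.sh >> "$HOME/claude-context-audit.log" 2>&1
```

Each run adds one line, so a year of monthly checks is a twelve-line CSV you can read at a glance.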

A principle worth stating explicitly: progressive disclosure is the feature that makes large skill libraries affordable. A well-built skill keeps its always-available surface tiny and pushes detail into the body that loads on demand. When you audit a folder and find skills that are heavy even before triggering, that is usually a missing progressive-disclosure implementation — the pattern we measured across public repos in our real-cost analysis.

What Are the Automated Audit Tools?

The bash audit is the manual, zero-dependency version. If you run ClaudeKit kits, two CLI commands do this with real measurements rather than estimates.

# Exact token accounting for what is installed
ck tokens <kit>
 
# Diagnose install health: orphaned skills, version drift, heavy loads
ck doctor
 
# List what you have installed
ck list

ck tokens <kit> reports measured context cost rather than a chars/4 estimate, so you get accounting you can actually budget against. ck doctor is the health check — it surfaces install problems and context-weight issues and is the first command to run when a session feels heavier than it should.

The CLI is claudekits v0.1.3 on npm. Install it, authenticate with ck auth <key>, then install a kit with ck install <kit>. The token ledger prints automatically at install time. You can also install kits via /plugin marketplace add Madni-Aghadi/claudekit-<kit> if you prefer the Claude UI flow.

For a full comparison of kit architectures and what the v2 read-only agent model means for context budget, the EngineerKit page walks through the specific commands, agents, and measured costs in detail.

How Do Kit Namespaces Work in v2 vs Older Installs?

ClaudeKit v2 uses five namespaces: /eng, /mkt, /seo, /video, and /ecom. If you have an older install from before June 2026, you may have files under previous namespace conventions sitting in your ~/.claude folder and contributing passive weight. The audit script will surface them — look for files that do not map to any of the five current namespaces.

The correct resolution is to uninstall the old kit and reinstall via ck install <kit> with the current CLI. Running ck doctor will also flag orphaned files from version mismatches. Do not try to manually rename or reorganize the files; the install tooling handles the right structure.
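To spot candidates before reinstalling, a read-only listing can help. The sketch below rests on an assumption that is not documented ClaudeKit behavior: that kit files sit under a directory named after their namespace (eng, mkt, seo, video, ecom). Adjust the pattern to whatever layout you actually see. The function name list_unmapped is ours.

```shell
# list_unmapped ROOT
# Prints markdown files under ROOT (as paths relative to ROOT) that have no
# directory component named eng, mkt, seo, video, or ecom.
# ASSUMPTION: kit files live under a directory named after their namespace.
# Read-only: it lists files and changes nothing.
list_unmapped() {
  ( cd "$1" && find . -type f -name '*.md' ) \
    | grep -Ev '/(eng|mkt|seo|video|ecom)/' || true   # no matches is not an error
}

# Example call:
#   list_unmapped "$HOME/.claude"
```

The list will also include your own files and anything from other tools, so treat it as a starting point for ck doctor and a reinstall, not as a delete list.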

A few examples of v2 commands to orient you on what each namespace covers:

/eng debug — root-cause-first debugging, ends with a verified diff
/seo quick-wins — positions 8-20 and low-CTR pages ranked by opportunity
/mkt humanize — strips 14 AI tells from a draft
/ecom no-sales — store triage against AOV-band benchmarks
/video clone — recreate a reference video's style in Remotion and verify the match

Each command ends with evidence you can inspect, not a reviewer gate that adds more context overhead. That is the architectural choice that keeps the token ledger predictable. More on how commands, skills, and agents differ in the commands vs skills vs agents guide.

FAQ

Is dividing characters by four accurate enough to act on?

It is a deliberate approximation, not a precise measurement. Roughly four characters per token holds for typical English prose, but code, punctuation, and non-English text tokenize differently. Use chars/4 to rank and triage; use ck tokens <kit> when you need a number you can budget against. A tiktoken-compatible counter gives a tokenizer-based figure, but it uses OpenAI's encodings, so treat it as approximate for Claude as well. The script is designed to find your biggest offenders quickly, not to produce billable accounting.

Does everything in ~/.claude load on every session?

No. Skills use progressive disclosure — a short description is always available, and the full body loads only when the skill is triggered. Agent definitions and always-on instruction files are the more likely always-resident items, which is why the ranked list in the audit output matters more than the grand total. The grand total is a ceiling; the ranked list is where actual overhead lives.

Will pruning actually make Claude Code faster?

Removing unused agents and always-loaded instruction files reduces the context Claude holds in working memory every session. The improvement is real but not dramatic on its own. Context budget is one factor in responsiveness, not the only one. The more important outcome of pruning is predictability: your sessions become more consistent because you know what is loaded and why. Prune for a leaner, auditable baseline rather than expecting a single dramatic speedup.

What is the difference between ck tokens and ck doctor?

ck tokens <kit> gives you exact, measured context cost for an installed kit — the accounting view. ck doctor is the diagnostic: it checks install health, flags context-weight problems, surfaces orphaned files from version drift, and is the right starting point when something feels broken or heavy. Run ck doctor when something seems wrong; run ck tokens when you want to budget a command before you run it.

Should I install kits globally or project-scoped?

Global install (ck install <kit>, default path ~/.claude) makes the commands available in every project. Project-scoped install (ck install <kit> --local) puts the kit in ./.claude so it only loads in that directory. Heavy-but-infrequent skills belong project-scoped. Daily-use kits like EngineerKit belong global. The audit script works on either path — just pass the directory as the first argument.
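To see how weight splits between the two scopes, you can total each directory separately. A small sketch using the same chars/4 estimate; total_tokens is an illustrative name, and a missing directory counts as zero.

```shell
# total_tokens DIR
# Prints the chars/4 estimate for all markdown under DIR, or 0 if DIR is missing.
total_tokens() {
  if [ ! -d "$1" ]; then
    echo 0
    return
  fi
  chars=$(find "$1" -type f -name '*.md' -exec cat {} + | wc -m)
  echo $(( chars / 4 ))
}

# Compare the global folder with the current project's folder
for dir in "$HOME/.claude" "./.claude"; do
  printf '%8d tokens  %s\n' "$(total_tokens "$dir")" "$dir"
done
```

Run it from a repo root. If the global line dwarfs the project line and most of that weight belongs to skills you use in one repo, those are the candidates to move.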

How often should I run the audit?

Once a month is a reasonable cadence. The script is read-only and takes a few seconds, so there is no cost to running it frequently. The metric to watch is the grand-total trend line. If it climbs more than 20% month over month while your actual usage has not changed, that is the signal to spend ten minutes on a pruning pass. Most people only need one pruning session to establish a clean baseline, then the monthly check is just confirmation.
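The 20% check is easy to automate once you have two monthly totals. A minimal sketch with made-up numbers; growth_pct and needs_prune are illustrative names, and the integer math truncates toward zero, so a true 20.9% reads as 20.

```shell
# growth_pct PREVIOUS CURRENT
# Prints the integer percentage change from PREVIOUS to CURRENT.
# PREVIOUS must be non-zero.
growth_pct() {
  echo $(( ($2 - $1) * 100 / $1 ))
}

# needs_prune PREVIOUS CURRENT
# Succeeds (exit 0) when growth exceeds the 20% threshold suggested above.
needs_prune() {
  [ "$(growth_pct "$1" "$2")" -gt 20 ]
}

# Example with made-up totals: 10000 last month, 12500 this month
if needs_prune 10000 12500; then
  echo "grew $(growth_pct 10000 12500)%: time for a pruning pass"
fi
# prints: grew 25%: time for a pruning pass
```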

If you want a kit where the token cost is published, measured, and auditable before you install, the EngineerKit and MarketingKit are good starting points. Both show their exact token ledger at install time, all agents are read-only, and every command ends with evidence rather than adding passive overhead. Check pricing for the options, or start with a single kit at $14.99/month and expand from there.