
Measuring Claude Code Context Token Costs: The Full Breakdown (2026)

Every skill, command, and agent you install costs context tokens on every session. How ClaudeKit measures 82,197 tokens across 5 kits and how to audit your own setup.

Every skill, command, and agent you install into Claude Code is text the model may load — and you pay for context in tokens, attention, and money. Across ClaudeKit's five v2 kits we measured 82,197 total tokens: EngineerKit at 20,413, MarketingKit 16,714, EcomKit 16,464, SEOKit 16,004, VideoKit 12,602. Almost nobody publishes these numbers. We print them on every install.

Why does context footprint matter at all?

Install a few popular packs from GitHub and your baseline session can quietly grow by tens of thousands of tokens before you type a single prompt. Symptoms: slower first-token latency, earlier compaction, instructions the model forgets because they got crowded out, and a bill that climbs without an obvious cause. The culprit is invisible because nobody put a number on the box.

Context pressure is real at any model tier. Even with Opus 4.8's extended context window (shipped May 28, 2026), the cost math still applies: that model runs at $5/$25 per million tokens input/output. A bloated setup that adds 50,000 tokens of dead-weight skills across a 20-turn session is costing you real money — and degrading quality by crowding out your actual project files.
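To put a number on that scenario, here is a minimal sketch of the arithmetic. It assumes the 50,000 extra tokens are resent as uncached input on each of the 20 turns, and it uses the $5 per million input rate quoted above; prompt caching or a different rate would change the result. The function name session_input_cost is ours, for illustration only.

```shell
# Estimated input cost in USD for context that is resent on every turn.
# Assumption: no prompt caching, so each turn pays for the full amount again.
#   $1 = extra tokens per turn, $2 = turns, $3 = USD per million input tokens
session_input_cost() {
  LC_ALL=C awk -v tokens="$1" -v turns="$2" -v rate="$3" \
    'BEGIN { printf "%.2f\n", tokens * turns * rate / 1000000 }'
}

session_input_cost 50000 20 5   # prints 5.00
```

Under those assumptions the dead weight costs about five dollars of input per session, before any output tokens are counted.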

The fix is not "install less." It is "know what you're installing." Measurement first, then judgment.

What actually loads into context, and when?

Not everything you install costs the same. Claude Code loads installed content in two tiers:

Always-loaded (every turn, every session):

  • Every command's name and description line
  • Every skill's frontmatter description field
  • Every agent's description block

This is how the model knows what tools are available. It rides along whether you invoke anything or not.

Loaded on demand (only when triggered):

  • The full skill body (the actual instructions) — only enters context when the model decides the skill is relevant to the current task

This split is why a pile of 200 free skills from a random GitHub repo hurts even if you use five of them: 200 description lines are burning context on every single turn. It is also why a well-written description matters twice — it is both the selection trigger and the always-on tax.

ClaudeKit's architecture takes this seriously. Each kit's skills are written with tight descriptions that are specific enough to trigger correctly but short enough to cost almost nothing at baseline.
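To see what one description costs at baseline, a rough sketch can pull the description out of a skill file and apply the chars/4 rule to it. This assumes the file starts with a frontmatter block delimited by --- lines and that description: sits on a single line; multi-line YAML values are not handled. The function names skill_description and description_tokens are hypothetical.

```shell
# Print the single-line description value from a file's leading frontmatter.
skill_description() {
  awk '
    NR == 1 && $0 != "---" { exit }   # file has no frontmatter block
    NR > 1 && $0 == "---"  { exit }   # reached the end of the frontmatter
    /^description:/ { sub(/^description:[ \t]*/, ""); print; exit }
  ' "$1"
}

# Estimate the always-loaded cost of that description as ceil(chars / 4).
description_tokens() {
  desc=$(skill_description "$1")
  echo $(( (${#desc} + 3) / 4 ))
}

# Usage (hypothetical path):
#   description_tokens ~/.claude/skills/my-skill/SKILL.md
```

A file without frontmatter, or without a description line, reports 0, which is a prompt to inspect it by hand rather than proof that it costs nothing.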

How do we measure? The ceil(chars/4) method

We compute every token count with one reproducible rule: characters divided by four, rounded up, per file. The result is an estimate, and choosing it over a tokenizer run is a deliberate engineering decision.

Why chars/4 instead of running a real tokenizer?

  1. Verifiable by anyone. You can check our numbers with wc -c and a calculator. No tokenizer version to install, no API call to make, no trust required.
  2. Stable across model versions. Tokenizer outputs shift between releases. Character counts do not. A published number from today should still be correct when Anthropic ships the next model.
  3. Conservative in the right direction. For English prose and code, chars/4 runs slightly high against Claude's actual tokenizer. Our published footprints err toward overestimate, not marketing numbers that err low.

This matters because we are making a commitment when we publish a token count. If the real number is 18,000 and we say 16,714, we have misled you. If the real number is 15,400 and we say 16,714, we have given you a safe upper bound and you will never be surprised.
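The rule is small enough to sketch in a few lines of shell. This is a minimal version under our own assumptions, not ClaudeKit's actual implementation: the function names tokens_for_chars and estimate_tokens are illustrative, and wc -c counts bytes, which equals characters only for ASCII text.

```shell
# ceil(n / 4) using integer arithmetic only.
tokens_for_chars() {
  echo $(( ($1 + 3) / 4 ))
}

# Sum the per-file estimates for every file passed as an argument.
# Rounding happens per file, so the result can be slightly higher than
# dividing the combined size by four.
estimate_tokens() {
  total=0
  for f in "$@"; do
    [ -f "$f" ] || continue                      # skip anything that is not a regular file
    chars=$(wc -c < "$f" | tr -d ' ')            # byte count; strip padding some wc builds add
    total=$(( total + (chars + 3) / 4 ))
  done
  echo "$total"
}

tokens_for_chars 4001   # prints 1001
```

Rounding up per file is what keeps the estimate on the high side: a 4,001-character file counts as 1,001 tokens, not 1,000.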

What are the measured token counts for each ClaudeKit kit?

We measure total context footprint for everything each kit installs — commands, skills, and agents combined. The ledger prints automatically when you run ck install <kit>.

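Kit                 | Commands | Skills | Agents | Measured Tokens
--------------------|----------|--------|--------|----------------
EngineerKit (/eng)  | 25       | 4      | 4      | 20,413
MarketingKit (/mkt) | 20       | 3      | 2      | 16,714
EcomKit (/ecom)     | 20       | 3      | 2      | 16,464
SEOKit (/seo)       | 19       | 4      | 2      | 16,004
VideoKit (/video)   | 17       | 5      | 3      | 12,602
Totals              | 101      | 19     | 13     | 82,197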

EngineerKit is the heaviest because it ships the most commands (25) and has four always-on specialist agents covering code review, debugging, security, and test coverage. VideoKit is the lightest because its work is mostly procedural — generate, verify, render — and five focused skills cover the domain without needing verbose instruction bodies.

The per-kit numbers appear in each kit's manifest file and get reprinted at install. If we add a command or skill, the number moves and you see it move on the next ck install or ck tokens <kit> run.
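The totals row can be checked from the per-kit rows with plain arithmetic. The sketch below hard-codes the figures from the table above; kit_totals is an illustrative name, not a ck command.

```shell
# Sum commands, skills, agents, and measured tokens across the five kits.
# Columns: kit, commands, skills, agents, tokens (figures copied from the table).
kit_totals() {
  awk '{ c += $2; s += $3; a += $4; t += $5 } END { print c, s, a, t }' <<'EOF'
EngineerKit 25 4 4 20413
MarketingKit 20 3 2 16714
EcomKit 20 3 2 16464
SEOKit 19 4 2 16004
VideoKit 17 5 3 12602
EOF
}

kit_totals   # prints 101 19 13 82197
```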

How do the v2 kits compare to a typical free skill pack?

This question comes up constantly, so let's put real numbers on it. Free GitHub skill repos typically ship between 80 and 200 skill files with no token reporting. We audited several popular repos for the full comparison, but here is the summary:

A 150-skill free pack scenario vs. a single ClaudeKit kit:

Factor                         | 150-skill free pack | Single ClaudeKit kit
-------------------------------|---------------------|---------------------
Reported token count           | None                | Printed at install
Estimated always-loaded tokens | ~30,000-60,000      | 4,000-8,000
Commands (slash workflows)     | 0                   | 17-25
Specialist agents              | 0-2                 | 2-4
Token ledger tool              | None                | ck tokens <kit>
Recalculate after updates      | Manual              | ck tokens <kit>

The free pack's actual footprint depends entirely on how verbose each skill file is. We have seen repos where a single skill file runs 4,000 characters, which is 1,000 tokens for one skill. Multiply by 150 files and the pack carries about 150,000 tokens of installed text, with every description line in it riding along on each turn, before you write a line of code.

ClaudeKit's discipline is not just about knowing the number. It is the pressure that number creates on the writing itself. If a skill description costs 120 tokens instead of 30, that four-times cost forces the question: does this description earn its place? Usually the answer is a tighter rewrite, not a pass.

How to audit your own Claude Code setup in two minutes

You do not need our tooling to apply this method to whatever you have already installed. Run these two commands:

# Total byte count across the Markdown files Claude Code can load from this project
# (-exec cat {} + copes with spaces in file names and with very long file lists)
find .claude -name "*.md" -exec cat {} + | wc -c

# Same audit for your global ~/.claude setup
find ~/.claude/skills ~/.claude/agents ~/.claude/commands \
  -name "*.md" -exec cat {} + 2>/dev/null | wc -c

Divide the output by four. If the global number surprises you (and after a year of "I'll just install this one repo" it usually does), list the individual files and sort them by size:

find ~/.claude -name "*.md" -exec wc -c {} \; | sort -rn | head -20

That shows your twenty heaviest files. For each one, ask: is this skill triggered at least weekly? Does its benefit outweigh what its description costs in the sessions where I do not use it? If the answer to either question is no, remove it.
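If you would rather read the ranking in tokens than in bytes, the same listing can be converted in one pass. This is a sketch under the chars/4 rule, not a ck feature; heaviest_files is a hypothetical helper, and file names containing runs of spaces are printed with the runs collapsed.

```shell
# List the heaviest Markdown files under a directory as "tokens<TAB>path".
#   $1 = directory to scan, $2 = number of rows to show (default 20)
heaviest_files() {
  find "$1" -type f -name '*.md' -exec wc -c {} \; 2>/dev/null |
    awk '{
      n = $1                          # byte count reported by wc
      $1 = ""; sub(/^ /, "")          # drop the count, keep the path
      printf "%d\t%s\n", int((n + 3) / 4), $0
    }' |
    sort -rn | head -n "${2:-20}"
}

# Usage:
#   heaviest_files ~/.claude 20
```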

We have written in detail about where the waste typically hides inside free skill packs — description bloat, duplicate files, skills that were written as demos and never trimmed.

What about context spirals — does token footprint compound over a session?

Yes, and this is the part most people miss. The always-loaded footprint is a fixed cost per turn. But Claude Code sessions also accumulate conversation history, file reads, and command outputs in the context window. A long debugging session can add 40,000-80,000 tokens of conversation context on top of your installed footprint.

This means the installed footprint sets your floor. Start a session with 20,413 tokens of EngineerKit loaded, run a five-turn debug loop that pulls in three files and two command outputs, and you might hit 60,000 tokens before you have done anything interesting. That is still well within modern context windows, but the trajectory is what matters.

We covered the compaction mechanics in detail in our context spirals post. The short version: a heavy always-loaded footprint makes you hit compaction sooner, and it leaves less room afterward, because the compacted summary has to carry forward the skeleton of everything still in context, including your installed skills and agents. A lean kit footprint gives compaction more room to work with your actual project state.
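The trajectory is easy to project once you pick a per-turn growth figure. The sketch below is illustrative only: turns_until is a hypothetical helper, and the 8,000 tokens of growth per turn is an assumed average that we chose because it reproduces the five-turn, 60,000-token example above, not a measured value.

```shell
# Number of turns before a session reaches a token limit.
#   $1 = installed floor, $2 = assumed average growth per turn, $3 = limit
turns_until() {
  floor=$1; per_turn=$2; limit=$3
  if [ "$floor" -ge "$limit" ]; then
    echo 0                                         # already at or past the limit
    return
  fi
  echo $(( (limit - floor + per_turn - 1) / per_turn ))   # ceiling division
}

turns_until 20413 8000 60000   # prints 5
```

Swap in your own floor and limit, such as the point where compaction starts in your sessions, to see how many turns a heavier install takes away.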

How does ClaudeKit's v2 architecture keep footprint low?

The v2 architecture made three deliberate choices to manage context cost:

  1. Commands over skills for workflows. A slash command like /eng debug or /seo quick-wins is a single trigger that runs a structured workflow. The workflow logic lives in the command file, which only loads when you invoke it. A skill that tries to do the same job has to keep its full instruction set in the always-loaded description to work reliably — much more expensive.

  2. Read-only specialist agents, not orchestrators. ClaudeKit's 13 agents are all read-only specialists: reviewers, auditors, and researchers. They do not spawn subagents, they do not block command completion, and they do not add orchestration overhead to the always-loaded context. An agent that reads your diff and returns a structured finding costs a fraction of one that coordinates a pipeline.

  3. Evidence-ending commands. Every v2 command ends with a concrete deliverable — a report, a diff, a verified file, a structured JSON — not a hand-off to a reviewer gate. This matters for token efficiency because reviewer gates typically require the reviewer agent's full instruction context to stay live throughout the command run. Evidence-ending commands can release that context the moment the deliverable is produced.

The result is that installing all five kits costs 82,197 measured tokens, or roughly 329,000 characters of text under the chars/4 rule. You get 101 commands, 19 skills, and 13 specialist agents for that price.

How do I recount tokens after updating a kit?

Three commands:

# Recount a specific kit's footprint
ck tokens engineerkit
 
# Recount all installed kits
ck tokens --all
 
# Full diagnostic including version, install path, and any size warnings
ck doctor

The ck doctor command also flags if any individual file has grown past a threshold that suggests it is pulling more than its weight in always-loaded context. It is the fastest way to catch a kit update that added a verbose skill without updating the manifest total.
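The same drift check can be approximated without ck. The sketch below is an assumption-heavy stand-in, not how ck doctor works: it treats a kit as a directory of .md files, recomputes the per-file ceil(chars/4) sum, and compares it with a published total that you pass in. The function name check_footprint and its arguments are hypothetical.

```shell
# Compare a recomputed footprint with a published total.
#   $1 = directory holding the kit's Markdown files, $2 = published token total
# Returns non-zero when the two numbers differ.
check_footprint() {
  actual=$(find "$1" -type f -name '*.md' -exec wc -c {} \; 2>/dev/null |
    awk '{ t += int(($1 + 3) / 4) } END { print t + 0 }')
  if [ "$actual" -eq "$2" ]; then
    echo "ok $actual"
  else
    echo "drift: measured $actual, published $2"
    return 1
  fi
}

# Usage (the directory layout is an assumption):
#   check_footprint .claude 20413
```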

Numbered checklist: how to keep your Claude Code setup lean

  1. Audit your current global setup with the find ~/.claude command above. Know your floor before adding anything.
  2. Check the token count before installing any new kit or skill pack. If the source does not publish one, run the chars/4 calculation yourself on the downloaded files.
  3. Remove skills with always-loaded descriptions over 500 characters if you cannot point to a session where they triggered correctly in the last week.
  4. Prefer command-based kits over skill-heavy packs for recurring workflows. Commands load on demand; description-heavy skill files do not.
  5. Run ck tokens --all after any update. A kit update that added three skills without trimming three others has increased your footprint without you knowing.
  6. Check your context window usage in Claude Code's session stats at the end of long runs. If you are hitting 60-70% before the interesting work starts, your installed footprint is too heavy.
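Step 3 of the checklist can be partly automated. The sketch below lists files whose description exceeds a character limit; it assumes each file opens with a ----delimited frontmatter block containing a single-line description: field, and long_descriptions is a hypothetical helper. Whether a flagged skill triggered in the last week is still a judgment you have to make yourself.

```shell
# Print "length<TAB>path" for every Markdown file whose frontmatter
# description is longer than the limit.
#   $1 = directory to scan, $2 = character limit (default 500)
long_descriptions() {
  find "$1" -type f -name '*.md' 2>/dev/null | while IFS= read -r f; do
    len=$(awk '
      NR == 1 && $0 != "---" { exit }   # no frontmatter block
      NR > 1 && $0 == "---"  { exit }   # end of frontmatter
      /^description:/ { sub(/^description:[ \t]*/, ""); print length($0); exit }
    ' "$f")
    if [ "${len:-0}" -gt "${2:-500}" ]; then
      printf '%s\t%s\n' "$len" "$f"
    fi
  done
}

# Usage:
#   long_descriptions ~/.claude/skills 500
```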

FAQ

Does the token footprint change between Claude models?

The character count stays constant because the files do not change based on which model you are using. The actual token count may vary slightly because tokenizers differ between model versions. Our ceil(chars/4) method gives a stable estimate that, for English prose and code, runs slightly above the real count. When Anthropic ships a new tokenizer, the character-based figure does not move; only its margin over the real count changes.

Do all installed skills load on every turn, or only when triggered?

Only descriptions load on every turn. The full skill body loads on demand when the model determines the skill is relevant to the current prompt. This is why description length is the real tax and why ClaudeKit writes very tight descriptions: a 30-token description that triggers correctly is strictly better than a 120-token description that also triggers correctly.

How does ClaudeKit's token count compare to installing a full free skill repo?

Popular free repos range from 30,000 to 150,000+ estimated tokens depending on file count and description verbosity. A single ClaudeKit kit ranges from 12,602 (VideoKit) to 20,413 (EngineerKit). The difference is not just the number — free repos typically have no token reporting, so you would not know what you installed without running the audit yourself.

What happens to token costs when I install multiple kits?

Costs add roughly linearly. Installing all five ClaudeKit kits totals 82,197 measured tokens. That is the full installed footprint for every session where you have all five active; the description lines are the always-loaded share, and skill bodies enter context on demand. In practice, most users install one or two kits relevant to their current role. An engineer who occasionally writes marketing copy might run EngineerKit globally and MarketingKit as a project-level install.

Can I install a kit just for one project to limit global context cost?

Yes. ck install <kit> --local installs to the project's .claude/ directory instead of the global ~/.claude/. The kit only loads when you are inside that project. This is the recommended pattern for role-specific kits like VideoKit or EcomKit where the commands are only useful in specific projects.

Why measure with chars/4 instead of using the official Anthropic tokenizer?

Three reasons: verifiability (anyone can check with wc -c), stability across model versions, and honest directional error (estimates run slightly high, not low). The Anthropic tokenizer would give more precise numbers but would require API access to verify, would change with model updates, and would leave our published numbers optimistic if a new tokenizer produced more tokens for the same text. We would rather publish a conservative number that never misleads you.


Token transparency is one of the things we feel strongly about at ClaudeKit — it is the constraint that forces everything else to be tight. If you are shopping for a kit or trying to figure out which one fits your work, the pricing page has the full breakdown by role, and each kit page (/engineer, /marketing, /seo, /video, /ecom) lists the exact commands, skills, and agents included. The footprint numbers are there because you should know what you are buying before you install it.
