Customise and contribute
Where the data comes from, how to add your own scorer, tune thresholds, consume the history programmatically, and contribute back.
By Level 300 you’ve internalised the headline, run the weekly MOT a few times, and one or two scores keep nagging at you. This level is for changing the tool itself — your thresholds, your scorers, your pipeline.
Background reading. The architectural patterns Level 300 customisations enable — subagents, skills as on-demand context, hooks as pre-filters, MCP hygiene — are covered in depth in Part 3: Scaling the Discipline. When you’re tuning thresholds for a team rather than yourself, that’s the framing to read alongside.
Where the data comes from
Everything TokenSquirrel scores is derived from files Claude Code already maintains:
- `~/.claude/history.jsonl` — every prompt you’ve sent, grouped into sessions. Powers slash-command frequency scorers (`/clear`, `/context`, `/cost`, `/model`, `/mcp`, plan mode), session hygiene, and paste hygiene.
- `~/.claude/stats-cache.json` — token counts, model usage, daily activity. Powers cache efficiency, tool efficiency, and the cost estimate.
- `~/.claude/settings.json` and project-local `.claude/settings.json` — MCP servers, plugins, hooks. Powers all setup-weight scorers.
- Your `CLAUDE.md` files (project + user) — word count for the CLAUDE.md weight scorer.
- `~/.claude/skills/` and installed plugins — count and frontmatter token weight for the skill bloat scorer.
- `~/.claude/projects/**/*.jsonl` — transcripts of past sessions, used by `skills` and `skills --audit` to compute per-skill last-used dates and invocation counts. Read by `src/parsers/transcripts.ts` (cached so repeat runs are cheap).
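Since these inputs are plain JSONL, you can preview the kind of query a scorer runs yourself. Below is a minimal sketch, using an inline sample in place of the real `~/.claude/history.jsonl`; the field names (`sessionId`, `display`) are assumptions for illustration, so inspect your own file for the actual schema.

```shell
# Count /clear invocations in a history-style JSONL stream.
# NOTE: "sessionId" and "display" are assumed field names, not the real schema.
cat <<'EOF' > /tmp/sample-history.jsonl
{"sessionId":"a1","display":"/clear"}
{"sessionId":"a1","display":"fix the failing test"}
{"sessionId":"b2","display":"/clear"}
EOF
clears=$(jq -s 'map(select(.display == "/clear")) | length' /tmp/sample-history.jsonl)
echo "/clear count: $clears"   # → /clear count: 2
```

The same pattern — slurp with `-s`, filter, aggregate — covers most of the frequency-style questions the scorers ask.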
CLI flags for customisation
- `--claude-dir <path>` — point at a different `~/.claude/`. Useful if you have multiple installs, or want to audit a teammate’s exported data.
- `--history-file <path>` — store the audit history JSONL somewhere other than the default. Handy if you want it in a synced folder, or split per-machine.
- `--no-save` — run an audit without appending to history. Use it for one-off “what does this look like with `--hours 6`?” experiments that you don’t want polluting your trend.
- `--label <text>` — tag a run in the history with a snapshot label, e.g. `--label "before-mcp-cleanup"`. Then `--trend` shows the label so you can see the before/after.
- `--format md` — applies to compact, detail, trend, and MOT modes.
Adding your own scorer
The full walk-through is in CLAUDE.md at the repo root. Short version:
- Add a function to `tools/tokensquirrel/src/scoring/scorer.ts` that returns a `ScoreResult`.
- Use the helpers in `scoring/benchmarks.ts` — `ratioToScore` for “more is better”, `inverseScore` for “less is better” — so your grade thresholds line up with the rest.
- Append the function call to the array returned from `runAllScorers()`.
- Add a narrative case in `output/narratives.ts` so the compact/MOT views can phrase your scorer’s output as a sentence rather than just a number.
- The terminal and markdown formatters pick it up automatically — no formatter changes needed.
Tweaking thresholds and pricing
- Grade thresholds live in `tools/tokensquirrel/src/scoring/benchmarks.ts`. If you think 50 messages/session is too generous for an A, raise it there. The change applies to every scorer that uses that helper.
- Pricing lives in `tools/tokensquirrel/src/scoring/cost.ts` — input/output/cache rates per model. Update when Anthropic changes prices, or override locally if you’re on a custom contract.
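As a rough mental model for how a “less is better” helper such as `inverseScore` might map a metric onto a score, here is a hypothetical sketch in jq. The formula, the cap, and the numbers are invented for illustration; the real maths and thresholds live in `benchmarks.ts`.

```shell
# Hypothetical scaling: score = benchmark / value * 100, capped at 100.
# 25 msgs/session against a 50-msg benchmark caps at 100; 100 msgs scores 50.
scores=$(jq -nc '
  def inverse_score(value; benchmark):
    (benchmark / value * 100) | if . > 100 then 100 else round end;
  [inverse_score(25; 50), inverse_score(100; 50)]')
echo "$scores"   # → [100,50]
```

The point of centralising this in one helper is visible even in the sketch: nudge the benchmark and every scorer built on it shifts together.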
Consuming the history programmatically
The history file is line-delimited JSON: one `AuditRecord` per line (schema in `src/types.ts`). Drop it into anything that reads JSONL:
```shell
# average overall score across all stored runs
jq -s 'map(.overall) | add / length' ~/.tokensquirrel/history.jsonl

# pull every run with a specific label
jq 'select(.label == "before-mcp-cleanup")' ~/.tokensquirrel/history.jsonl
```
If you want to ship audits somewhere — a dashboard, a Slack bot, a quarterly review — that JSONL is the integration point. Append-only, no schema migration baggage.
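For example, a per-label average makes the before/after comparison concrete. The sketch below runs against inline sample data so it is self-contained; swap in your real history file. Only the `overall` and `label` fields come from the schema used above.

```shell
# Average overall score per label -- useful for before/after snapshots.
cat <<'EOF' > /tmp/sample-audit-history.jsonl
{"label":"before-mcp-cleanup","overall":62}
{"label":"before-mcp-cleanup","overall":64}
{"label":"after-mcp-cleanup","overall":81}
EOF
avgs=$(jq -sc 'group_by(.label)
  | map({label: .[0].label, avg: (map(.overall) | add / length)})' \
  /tmp/sample-audit-history.jsonl)
echo "$avgs"
```

Note that `group_by` sorts by the grouping key, so labels come back alphabetically rather than in run order.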
Contributing back
TokenSquirrel is on GitHub. If you’ve added a scorer, tuned thresholds for a use case the defaults don’t handle well, or fixed something — open a PR. The codebase is small, has no runtime dependencies, and the architecture (under `tools/tokensquirrel/src/`) splits cleanly into parsers, scoring, and output. New scorers are usually 50 lines plus a narrative case.
Issues, ideas, and “this score doesn’t reflect how I actually work” reports are equally welcome.