caveman

token optimization · medium Token Cost · intermediate

Prompt provenance

Inspired by JuliusBrussee/caveman.

Source: JuliusBrussee/caveman

Reverse-engineered from real GitHub workflow.

Updated 3 weeks ago · 1 copy

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

What this solves

No specific use case defined.

Prompt Variants

I need you to create a token-reduction system for Claude Code and other AI coding agents. The core idea is that AI models use far fewer output tokens when they respond in extremely terse, caveman-like speech, while keeping technical accuracy fully intact.

Here's what to build:

Create a skill/plugin system that integrates with Claude Code, Cursor, and other agent platforms. The system should make the AI agent talk like a caveman — using minimal words, dropping articles and pronouns, being super direct. Think "New object ref each render. Wrap in useMemo" instead of "The reason your React component is re-rendering is likely because..."

The main components:
- A core caveman mode that rewrites Claude's responses into terse caveman-speak while preserving technical accuracy. It should support multiple intensity levels (lite, medium, deep, ancient).
- A token compression tool that pre-processes input with similar terse techniques to cut input tokens by ~46%.
- Integrations for different platforms: Claude Code skill files, Cursor rules, Windsurf configs, and generic Claude plugins.
- Special features: a 文言文 (classical Chinese) mode for variety, terse commit message generation, and one-line code review summaries.
- A benchmark system to measure token savings across different use cases.
- An evaluation suite to verify that technical quality doesn't degrade when tokens are cut.
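To make the intensity levels concrete, here is a minimal sketch of a rule-based compressor. It is an illustration only, not the project's actual method: the word lists, the `LEVELS` mapping, and the `compress` function are all hypothetical, and the "ancient" level is left out because the prompt does not say what it should do. Anything inside backticks is left untouched so code identifiers survive.

```python
# Hypothetical word lists; the real project may use entirely different rules.
ARTICLES = {"a", "an", "the"}
FILLERS = {"just", "really", "basically", "actually", "simply", "very"}
PRONOUNS = {"i", "you", "we", "it", "your", "my", "our"}

# Each level drops a larger set of words (level names taken from the list above).
LEVELS = {
    "lite": ARTICLES,
    "medium": ARTICLES | FILLERS,
    "deep": ARTICLES | FILLERS | PRONOUNS,
}


def compress(text: str, level: str = "medium") -> str:
    """Drop low-information words, leaving backtick-quoted code untouched."""
    drop = LEVELS[level]
    kept = []
    in_code = False  # True while inside a multi-word `code span`
    for word in text.split():
        ticks = word.count("`")
        if in_code or ticks:
            kept.append(word)
            if ticks % 2 == 1:  # an odd number of backticks opens or closes a span
                in_code = not in_code
            continue
        # Compare without trailing punctuation; a dropped word takes its punctuation with it.
        if word.strip(".,;:!?").lower() in drop:
            continue
        kept.append(word)
    return " ".join(kept)


print(compress("The fix is really simple.", "medium"))
print(compress("Wrap the value in `useMemo`.", "lite"))
```

A word filter like this only approximates the style. The prompt asks the model itself to answer tersely, so in practice the rules would more likely live in a skill or rules file than in post-processing.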

The tech stack should be Python-based since that's what the project uses. Store different integration formats (Claude skill files, JSON plugin configs, text-based rules) in separate directories. Include comprehensive documentation for installing across different platforms and usage examples showing before/after token counts.

The whole point is proving you can cut 65-75% of output tokens and ~46% of input tokens without losing technical accuracy. Make it dead simple to install, ideally with a one-line setup. Include benchmarks and evals to back up the token savings claims.
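The savings targets come down to simple arithmetic on before/after token counts. The sketch below shows one way a benchmark could check them. The function names and the sample counts are hypothetical, chosen only to illustrate the calculation, and a real benchmark would take its counts from the model's tokenizer.

```python
def percent_saved(before: int, after: int) -> float:
    """Percentage of tokens removed, relative to the original count."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    return 100.0 * (before - after) / before


def meets_target(before: int, after: int, target_pct: float) -> bool:
    """True if the measured saving reaches the target percentage."""
    return percent_saved(before, after) >= target_pct


# Hypothetical counts for illustration, not measured results.
output_before, output_after = 200, 70
input_before, input_after = 100, 54

print(percent_saved(output_before, output_after))     # 65.0
print(percent_saved(input_before, input_after))       # 46.0
print(meets_target(output_before, output_after, 65))  # True
```

Reporting the saving per use case, rather than as a single average, would make it easier to see where the terse style helps and where it does not.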