Inspired by alexgreensh/token-optimizer.
Source: alexgreensh/token-optimizer. Reverse-engineered from a real GitHub workflow.
Find the ghost tokens. Fix them. Survive compaction. Avoid context quality decay.
I need you to build a token optimization tool for AI coding agents. This is a Python project that helps developers reduce unnecessary token usage in their LLM interactions, especially across context windows that suffer quality decay.
The core problem you're solving: AI agents waste tokens on bloated configs, duplicate system prompts, unused skills, redundant memory, and context lost during compaction. Most tools address only 15-25% of the problem by compressing output. You need to tackle all of it, including the other 75-85% of wasted context.
Here's what to build:
A modular Python system (3.8+, zero third-party dependencies, zero telemetry) that identifies and removes "ghost tokens": dead weight in context. It should work as a Claude Code plugin and integrate with OpenClaw. Create an installer script and organize the project into a `commands` folder for CLI operations, a `hooks` folder for integrations, an `openclaw` folder for that framework's specific implementation, and a `skills` folder for modular optimization routines.
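As a starting point, the folder layout above could be scaffolded with a few lines of standard-library Python. This is a minimal sketch, not the installer itself: the four folder names come from the brief, while the `scaffold` function, the `LAYOUT` constant, and the choice to drop an empty `__init__.py` in each folder are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Folder names come from the brief; everything else is a hypothetical sketch.
LAYOUT = ("commands", "hooks", "openclaw", "skills")


def scaffold(root: Path) -> List[Path]:
    """Create the four top-level folders under root and return their paths."""
    created = []
    for name in LAYOUT:
        folder = root / name
        # exist_ok keeps the call safe to re-run on an existing checkout.
        folder.mkdir(parents=True, exist_ok=True)
        # An empty __init__.py makes the folder importable as a package
        # (an assumption; the brief does not require packages).
        (folder / "__init__.py").touch()
        created.append(folder)
    return created
```

Calling `scaffold(Path("."))` from the project root would create any missing folders and leave existing ones untouched.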
The tool should scan and optimize multiple layers: command output compression, config bloat removal, system prompt deduplication, stale memory cleanup, and intelligent context compaction that preserves quality. Each pass should report the tokens it saved and its effect on context quality.
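One way to structure "each pass measures its own savings" is a small pipeline that counts tokens before and after every pass. The sketch below is illustrative only: `estimate_tokens` uses a rough 4-characters-per-token heuristic (an assumption, not a real tokenizer), the two passes are toy stand-ins for the real layers, and context quality measurement is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return (len(text) + 3) // 4


def dedupe_paragraphs(text: str) -> str:
    """Drop exact repeats of a paragraph, keeping the first occurrence."""
    seen = set()
    kept = []
    for para in text.split("\n\n"):
        key = " ".join(para.split())  # normalize whitespace before comparing
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(para)
    return "\n\n".join(kept)


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


@dataclass
class PassReport:
    name: str
    tokens_before: int
    tokens_after: int

    @property
    def saved(self) -> int:
        return self.tokens_before - self.tokens_after


def run_passes(
    text: str, passes: List[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, List[PassReport]]:
    """Apply each pass in order and record the estimated tokens it saved."""
    reports = []
    for name, fn in passes:
        before = estimate_tokens(text)
        text = fn(text)
        reports.append(PassReport(name, before, estimate_tokens(text)))
    return text, reports
```

A caller might run `run_passes(context, [("dedupe", dedupe_paragraphs), ("whitespace", strip_trailing_whitespace)])` and print `report.saved` for each entry, which keeps the per-pass accounting visible to the user.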
Build a local dashboard that refreshes after every session and shows token usage, cost per turn, and whether optimizations actually helped. Make it visual and informative: people need proof that the tool works.
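The cost-per-turn figure on the dashboard could be derived from per-turn token counts roughly as follows. Treat this as a sketch under stated assumptions: the prices in `PRICE_PER_MTOK` are placeholders rather than any provider's real rates, and the `input_tokens` / `output_tokens` record shape is hypothetical.

```python
from typing import Dict, List

# Placeholder prices in USD per million tokens (assumption; real rates
# depend on the model and provider).
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def turn_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single turn in USD under the placeholder prices."""
    return (
        input_tokens * PRICE_PER_MTOK["input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000


def summarize(turns: List[Dict[str, int]]) -> Dict[str, float]:
    """Aggregate a session's turns into the numbers a dashboard would show."""
    costs = [turn_cost(t["input_tokens"], t["output_tokens"]) for t in turns]
    total = sum(costs)
    return {
        "turns": len(costs),
        "total_cost": total,
        # Guard against an empty session to avoid dividing by zero.
        "cost_per_turn": total / len(costs) if costs else 0.0,
    }
```

Running `summarize` once on a baseline session and once on an optimized session gives two directly comparable summaries, which is one simple way to show whether an optimization helped.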
The system needs to survive context window compactions without losing critical information, and it should gracefully handle being run across multiple platforms (macOS, Linux, Windows). Design it so it can eventually support Windsurf and Cursor, not just Claude Code and OpenClaw.
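To make the compaction and cross-platform requirements concrete, here is one possible shape: write the critical facts to a checkpoint file on disk before compaction and reload them afterwards, storing the file in each platform's conventional per-user data directory. The function names, the `token-optimizer` directory name, and the `checkpoint.json` file name are all assumptions for illustration, as is the idea that critical information fits in a flat string dictionary.

```python
import json
import os
import sys
from pathlib import Path
from typing import Dict, Mapping, Optional


def data_dir(
    app: str = "token-optimizer",  # hypothetical directory name
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Return a per-user data directory following each OS's convention."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    if platform.startswith("win"):
        base = env.get("APPDATA")
        return (Path(base) if base else home / "AppData" / "Roaming") / app
    if platform == "darwin":
        return home / "Library" / "Application Support" / app
    # Linux and other Unix-likes: honor XDG_DATA_HOME when it is set.
    base = env.get("XDG_DATA_HOME")
    return (Path(base) if base else home / ".local" / "share") / app


def save_checkpoint(directory: Path, facts: Dict[str, str]) -> Path:
    """Persist critical facts so they can be restored after compaction."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "checkpoint.json"
    tmp = directory / "checkpoint.json.tmp"
    tmp.write_text(json.dumps(facts, indent=2, sort_keys=True), encoding="utf-8")
    # os.replace swaps the file in one step, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    os.replace(str(tmp), str(target))
    return target


def load_checkpoint(directory: Path) -> Dict[str, str]:
    """Load the last checkpoint, or an empty dict if none exists yet."""
    target = directory / "checkpoint.json"
    if not target.exists():
        return {}
    return json.loads(target.read_text(encoding="utf-8"))
```

A hook that fires before compaction could call `save_checkpoint(data_dir(), facts)`, and a hook that fires afterwards could call `load_checkpoint(data_dir())` to re-inject whatever was saved. Which hook events are actually available depends on the host tool and is not assumed here.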
Keep the architecture plugin-first—hooks and modular skills that can be enabled or disabled per use case. The philosophy is transparency and measurement: users should see exactly where tokens go, what gets cut, and what quality trade-offs they're making.
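A plugin-first design with skills that can be switched on and off might look like the registry below. It is a hypothetical sketch: `SkillRegistry`, its method names, and the two example skills are invented for illustration, and the log reports characters removed rather than real token counts.

```python
from typing import Callable, Dict, List, Tuple

Skill = Callable[[str], str]


class SkillRegistry:
    """Holds named skills and runs only the ones currently enabled."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self._enabled: Dict[str, bool] = {}

    def register(self, name: str, enabled: bool = True) -> Callable[[Skill], Skill]:
        """Decorator that adds a skill under the given name."""
        def decorator(fn: Skill) -> Skill:
            self._skills[name] = fn
            self._enabled[name] = enabled
            return fn
        return decorator

    def set_enabled(self, name: str, enabled: bool) -> None:
        if name not in self._skills:
            raise KeyError(name)
        self._enabled[name] = enabled

    def run(self, text: str) -> Tuple[str, List[Dict[str, object]]]:
        """Apply enabled skills in registration order and log what each cut."""
        log: List[Dict[str, object]] = []
        for name, fn in self._skills.items():
            if not self._enabled[name]:
                continue
            new_text = fn(text)
            log.append({"skill": name, "chars_removed": len(text) - len(new_text)})
            text = new_text
        return text, log


registry = SkillRegistry()


@registry.register("strip_blank_lines")
def strip_blank_lines(text: str) -> str:
    return "\n".join(line for line in text.split("\n") if line.strip())


@registry.register("collapse_spaces", enabled=False)
def collapse_spaces(text: str) -> str:
    return "\n".join(" ".join(line.split()) for line in text.split("\n"))
```

With this shape, `registry.run(text)` returns both the optimized text and a log of what each skill removed, so the user sees where the cuts came from, and `registry.set_enabled("collapse_spaces", True)` opts in to a more aggressive skill for a given use case.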