5 Open Source Repos That Make Claude Code Unstoppable

2026-05-16 12 minutes


Five open source repositories dropped in March 2026 that expand what Claude Code can do. Karpathy’s AutoResearch runs overnight ML experiments without you. OpenSpace makes agent skills fix and improve themselves. CLI-Anything turns GUI software into agent-ready command-line tools. Claude Peers MCP lets many Claude Code sessions coordinate on one machine. And Google Workspace CLI opens Gmail, Drive, Calendar, and Sheets to agents. All five are free, open source, and plug right into Claude Code.

Here is a quick comparison before the deep dives:

Repository	Stars	Category	Install Complexity	Best Use Case
AutoResearch	42,000+	Autonomous ML	Medium (NVIDIA GPU required)	Overnight ML optimization loops
OpenSpace	1,500+	Skill Evolution	Low (MCP server install)	Self-healing, self-improving agent skills
CLI-Anything	21,000+	Software Bridge	Low (Claude Code plugin)	Making GUI apps agent-accessible
Claude Peers MCP	500+	Multi-Agent	Low (Bun + MCP)	Multi-session coordination
Google Workspace CLI	2,000+	Productivity	Medium (OAuth setup required)	Gmail, Drive, Sheets automation

AutoResearch - Autonomous Machine Learning Experiments on Autopilot

AutoResearch is the breakout repo of March 2026. Andrej Karpathy pushed 630 lines of Python to GitHub on March 6. He went to sleep and woke up to 50 finished experiments. Within two weeks the repo had over 42,000 stars. That makes it one of the fastest-growing repositories in GitHub history.

You point Claude Code at an ML training task, and it enters a loop: run an experiment, measure the result, keep the wins, drop the regressions, repeat. Each experiment runs on a fixed 5-minute training budget. That makes the results easy to compare, no matter what the agent changed between runs. At about 12 experiments per hour, you get roughly 100 experiments overnight.
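The loop above is greedy hill climbing under a fixed per-experiment budget. Here is a minimal sketch that uses a toy objective in place of real training. `run_experiment`, `propose_change`, and the 8-hour night are hypothetical stand-ins, not code from the repo. The real agent edits train.py and uses git rather than an in-memory config.

```python
import random

BUDGET_MINUTES = 5    # fixed training budget per experiment (from the article)
NIGHT_HOURS = 8       # hypothetical length of "overnight"


def experiments_per_night(hours: int = NIGHT_HOURS, budget: int = BUDGET_MINUTES) -> int:
    # Ignores startup and evaluation overhead, so this is an upper bound.
    return hours * 60 // budget


def run_experiment(config: dict) -> float:
    # Hypothetical stand-in for "train for 5 minutes, report val_bpb".
    # Toy objective: lower is better, and the best learning rate is 0.01.
    return 1.0 + abs(config["lr"] - 0.01) * 10


def propose_change(config: dict, rng: random.Random) -> dict:
    # Hypothetical edit: rescale one hyperparameter.
    candidate = dict(config)
    candidate["lr"] = max(1e-5, config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0]))
    return candidate


def hill_climb(config: dict, n_experiments: int, seed: int = 0):
    rng = random.Random(seed)
    best = run_experiment(config)
    history = [best]                      # best score so far after each experiment
    for _ in range(n_experiments):
        candidate = propose_change(config, rng)
        score = run_experiment(candidate)
        if score < best:                  # win: keep it (the repo commits)
            config, best = candidate, score
        # otherwise: regression, discard it (the repo runs a git reset)
        history.append(best)
    return config, best, history


if __name__ == "__main__":
    n = experiments_per_night()           # 8 * 60 // 5 = 96
    config, best, _ = hill_climb({"lr": 0.1}, n)
    print(n, config, round(best, 4))
```

Because only improvements are kept, the best-so-far score can never get worse. This is also why a noisy metric hurts so much: a lucky measurement gets locked in.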

The repo has only three files that count. prepare.py handles fixed constants, one-time data prep (it downloads training data and trains a BPE tokenizer), and runtime helpers. You don't touch this file. train.py is the single file the agent edits. It holds the full GPT model, the optimizer (Muon + AdamW), and the training loop. Architecture, hyperparameters, optimizer settings, and batch size are all fair game. Finally, program.md tells the agent what to optimize. It is the only file you edit as a human, and it works as a lightweight skill definition.

The feedback loop is built on git. Every improvement gets committed. Every regression triggers a git reset. The metric is val_bpb, short for validation bits per byte. Lower is better, and the score does not depend on vocabulary size, so the agent can compare architectural changes fairly.
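The vocabulary independence comes from how the loss is normalized. Validation loss is summed over tokens but divided by the number of UTF-8 bytes, not the number of tokens. Here is a minimal sketch of that standard definition; the repo's evaluation code may differ in details. It compares two hypothetical tokenizations of the same text: their per-token losses differ sharply, but their bits per byte are identical.

```python
import math


def bits_per_byte(token_nll_nats: list[float], text: str) -> float:
    """Sum of per-token cross-entropy (in nats), converted to bits,
    divided by the UTF-8 byte length of the text being predicted."""
    total_bits = sum(token_nll_nats) / math.log(2)
    return total_bits / len(text.encode("utf-8"))


text = "hello world"                                 # 11 bytes
small_vocab = bits_per_byte([0.5] * 11, text)        # 11 tokens, 0.5 nats each
large_vocab = bits_per_byte([2.75, 2.75], text)      # 2 tokens, 2.75 nats each
# Per-token loss differs 5.5x, yet both come to about 0.7213 bits per byte.
```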

Figure: AutoResearch tracks val_bpb across experiments, with each improvement committed to git.

Shopify CEO Tobi Lütke ran AutoResearch on an internal 0.8B-parameter model. Over 8 hours and 37 experiments, he reported a 19% efficiency gain, with no human intervention.

When AutoResearch Works and When It Does Not

AutoResearch is strong at anything with a numeric or pass/fail score. Think script speed tuning, prompt pass/fail rates, system prompt format checks, or hyperparameter tuning. If a machine can score the output, AutoResearch will grind through variations all night.
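To make "a machine can score the output" concrete, here is a minimal sketch of a pass/fail format check. It turns the check into the single number such a loop could push upward. The required keys and the sample outputs are hypothetical.

```python
import json

REQUIRED_KEYS = {"summary", "tags"}   # hypothetical format requirement


def passes(output: str) -> bool:
    """Pass if the output is a JSON object containing every required key."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and REQUIRED_KEYS.issubset(obj)


def pass_rate(outputs: list[str]) -> float:
    """Fraction of outputs that pass: the single number an optimizer can raise."""
    if not outputs:
        return 0.0
    return sum(passes(o) for o in outputs) / len(outputs)
```

Subjective qualities have no equivalent of `passes`, which is why the loop degrades on them.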

It falls apart on subjective tasks. Creative writing quality, social media engagement, cold email results: anything that needs human judgment turns the loop into a random walk. The agent has no reliable signal to optimize against, so any apparent gains are likely noise.

Getting Started

You need an NVIDIA GPU, Python 3.10+, and the uv package manager. The repo was tested on an H100, but community forks exist for macOS, Windows, and AMD.

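# Install uv (skip if you already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/karpathy/autoresearch.git
cd autoresearch
# Install dependencies
uv sync
# One-time data prep: download training data and train the BPE tokenizer
uv run prepare.py
# One manual baseline training run
uv run train.py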

Once that single manual run finishes, start Claude Code in the repo directory with permission prompts disabled (for example, with the --dangerously-skip-permissions flag). Then tell it to read program.md and start experimenting. Walk away, and check the git log in the morning.

For smaller hardware, Karpathy suggests a few changes. Switch to the TinyStories dataset, drop vocab_size to 2048 or even 256, cut MAX_SEQ_LEN to 256, and set DEPTH to 4. The community forks handle most of these tweaks for you.

OpenSpace - Self-Evolving AI Agent Skills

OpenSpace comes from the Data Intelligence Lab at the University of Hong Kong (HKUDS), the same group behind LightRAG. Where AutoResearch tunes ML models, OpenSpace tunes the skills themselves. It is an MCP server that watches how your Claude Code skills perform, then sorts them into three buckets to act on.

The three buckets are AUTO-FIX, AUTO-IMPROVE, and AUTO-LEARN. When a skill errors out, AUTO-FIX reads the error logs, finds the root cause, writes a repair patch, and checks the fix on its own. When a task finishes smoothly, AUTO-IMPROVE studies the run, spots patterns it can tune, and turns the winning practices into standard workflows. When a skill keeps performing well, AUTO-LEARN locks it down as a template that future skills can inherit.
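The routing between the three buckets can be sketched as a small state machine per skill. This minimal sketch assumes a hypothetical threshold of three consecutive clean runs for "keeps performing well". The article does not describe OpenSpace's actual criteria, and `SkillStats` and `route` are illustrative names, not OpenSpace's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    AUTO_FIX = "AUTO-FIX"
    AUTO_IMPROVE = "AUTO-IMPROVE"
    AUTO_LEARN = "AUTO-LEARN"


LEARN_STREAK = 3  # hypothetical: clean runs in a row before a skill becomes a template


@dataclass
class SkillStats:
    name: str
    clean_streak: int = 0
    error_logs: list[str] = field(default_factory=list)


def route(stats: SkillStats, errored: bool, log: str = "") -> Action:
    """Decide which bucket a finished run belongs to."""
    if errored:
        stats.clean_streak = 0
        stats.error_logs.append(log)   # AUTO-FIX starts from the error logs
        return Action.AUTO_FIX
    stats.clean_streak += 1
    if stats.clean_streak >= LEARN_STREAK:
        return Action.AUTO_LEARN       # promote to an inheritable template
    return Action.AUTO_IMPROVE         # mine the smooth run for tunable patterns
```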

Under the hood, OpenSpace tracks four kinds of skill change. Captured stores patterns that worked. Fixed writes repair code for failures. Derived merges two patterns into a stronger one. Import pulls external skills from the community cloud at open-space.cloud and adapts them for local use.

The Numbers

HKUDS tested OpenSpace on 220 real professional tasks across 44 jobs using Qwen 3.5+. Average quality jumped from a 40.8% baseline to 70.8%, a 30-point absolute gain. Agents that used the improved skills also burned 46% fewer tokens. HKUDS argues that the MCP monitoring pays for itself, because it cuts repeated failures and wasted tokens.

Figure: the OpenSpace self-evolution framework from the HKUDS Data Intelligence Lab, with its auto-fix, auto-improve, and auto-learn phases.

Their showcase project, “My Daily Monitor,” shows the approach at scale. Starting from just 6 skills, the agent built 60+ more from scratch to create a multi-panel live dashboard. No human-written code. The skills grew through use, and each round fixed bugs and improved on the last one.

Setting It Up

Install the OpenSpace package, then register it as an MCP server in your Claude Code configuration:

pip install openspace-ai

Copy the base skills (delegate-task and skill-discovery) to your skills directory. For Claude Code, that is ~/.claude/skills for personal skills or .claude/skills inside a project. OpenSpace also supports a direct execution mode and a Python API for code integration. It works with Claude Code, Codex, and other agent frameworks that support skills.
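Copying the two base skills is a plain directory copy. This minimal sketch assumes you have cloned the OpenSpace repo and that each skill is its own folder. The source path inside the clone is a hypothetical placeholder, so check the repo layout first. The destination defaults to Claude Code's personal skills directory.

```python
import shutil
from pathlib import Path

BASE_SKILLS = ("delegate-task", "skill-discovery")  # named in the article


def install_skills(src_root: Path,
                   dest_root: Path = Path.home() / ".claude" / "skills") -> list[Path]:
    """Copy each base skill folder from src_root into dest_root."""
    dest_root.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in BASE_SKILLS:
        src = src_root / name
        if not src.is_dir():
            raise FileNotFoundError(f"skill folder not found: {src}")
        dest = dest_root / name
        shutil.copytree(src, dest, dirs_exist_ok=True)  # safe to re-run
        installed.append(dest)
    return installed
```

Call it as `install_skills(Path("OpenSpace") / "skills")`, where the path is a hypothetical location within your clone.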

CLI-Anything - Turn Any Software into an Agent-Native CLI

CLI-Anything is also from HKUDS, and it tackles a different problem: Claude Code cannot operate GUI applications directly. You can tell it to edit a Blender scene or adjust audio in Audacity, but it has no interface to do so. CLI-Anything builds tested, self-documenting command-line interfaces for any open source app, giving Claude Code real control over software it could not reach before.

Figure: CLI-Anything bridges AI agents and existing desktop software such as Blender, GIMP, Audacity, and OBS.

The repo has picked up over 21,000 GitHub stars since its March 2026 release. The project homepage at clianything.org and the community hub at CLI-Hub offer pre-built CLIs and docs.

The 7-Phase Pipeline

When you point CLI-Anything at a software project, it runs a 7-phase pipeline. First it analyzes the codebase and maps GUI actions to the APIs and entry points behind them. Then it designs command groups that mirror what the app does. Next it implements them as a Click-based CLI with REPL mode, JSON output, and undo/redo support. After that it plans the tests and writes full unit and end-to-end test suites. It then documents the CLI, generating a SKILL.md file for agent discovery and full --help docs. Last, it publishes the CLI to your system PATH.

The result is a stateful CLI with structured JSON output, clean state handling, and predictable behavior. Each CLI organizes individual endpoints into coherent command groups, so the agent gets one tool instead of dozens of raw API calls.
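To show what "stateful, JSON output, undo/redo" means for an agent, here is a minimal sketch of a command group in that style. CLI-Anything generates Click-based CLIs. This sketch swaps in the standard library's argparse so it runs without dependencies. The `layer add` / `layer list` / `undo` / `redo` commands and the JSON project file are hypothetical.

```python
import argparse
import json
from pathlib import Path


def load(path: Path) -> dict:
    """Read persisted state, or start fresh. State survives between invocations."""
    if path.exists():
        return json.loads(path.read_text())
    return {"layers": [], "undo": [], "redo": []}


def save(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state))


def main(argv: list[str]) -> dict:
    parser = argparse.ArgumentParser(prog="demo-cli")
    parser.add_argument("--project", type=Path, default=Path("project.json"))
    sub = parser.add_subparsers(dest="command", required=True)
    layer = sub.add_parser("layer").add_subparsers(dest="action", required=True)
    layer.add_parser("add").add_argument("name")
    layer.add_parser("list")
    sub.add_parser("undo")
    sub.add_parser("redo")
    args = parser.parse_args(argv)

    state = load(args.project)
    if args.command == "layer" and args.action == "add":
        state["undo"].append(list(state["layers"]))  # snapshot before mutating
        state["redo"].clear()                        # a new edit invalidates redo
        state["layers"].append(args.name)
    elif args.command == "undo" and state["undo"]:
        state["redo"].append(list(state["layers"]))
        state["layers"] = state["undo"].pop()
    elif args.command == "redo" and state["redo"]:
        state["undo"].append(list(state["layers"]))
        state["layers"] = state["redo"].pop()
    save(args.project, state)

    result = {"ok": True, "layers": state["layers"]}
    print(json.dumps(result))                        # structured output for the agent
    return result
```

Wire it to a real command line with `main(sys.argv[1:])`. The agent then reads one JSON object per call instead of scraping human-oriented text.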

Pre-Built CLIs and Getting Started

CLI-Anything ships with pre-built CLIs for Blender, GIMP, Krita, Inkscape, Audacity, OBS Studio, LibreOffice, Draw.io, Ollama, ComfyUI, and 30+ more apps. You can also generate a CLI for any other software whose source code you have.

Installation takes two commands in Claude Code:

/plugin marketplace add HKUDS/CLI-Anything
/plugin install cli-anything

Then point it at any codebase:

/cli-anything:cli-anything ./target-software

After the first pass, you can run further refinement passes to add anything it missed. The March 23 launch of CLI-Hub added a meta-skill that lets agents find and install CLIs from a community registry on their own.

Why CLI Over MCP

HKUDS makes a clear design argument: CLIs work better than MCP servers for agent use. A CLI runs on any agent and any platform. It composes into bigger workflows through piping and chaining. It is lightweight, with no server process to keep alive. It documents itself through --help. And it behaves the same way every time, which makes automation reliable. An MCP server needs a running process and protocol setup; a CLI is just a binary on your PATH.
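From the agent's side, "just a binary on your PATH" means each tool call is one subprocess invocation with JSON on stdout. Composition is then ordinary data flow. The minimal sketch below uses `run_json`, a hypothetical helper, and stands in for a real generated CLI by running a one-line Python program.

```python
import json
import subprocess
import sys


def run_json(cmd: list[str]) -> object:
    """Run a CLI command and parse its JSON stdout; raises if the command fails."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def fake_cli(payload: dict) -> list[str]:
    # Stand-in for a real CLI: a child process that prints a JSON payload.
    return [sys.executable, "-c", f"import json; print(json.dumps({payload!r}))"]


# No server to start or keep alive: call, parse, pass the data to the next step.
layers = run_json(fake_cli({"layers": ["bg", "fg"]}))["layers"]
summary = run_json(fake_cli({"count": len(layers)}))
```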

Claude Peers MCP - Let Your Claude Code Instances Talk to Each Other

Claude Peers MCP solves an isolation problem. When you run five Claude Code sessions across different projects, each one sits in its own bubble. They cannot find each other, share context, or coordinate work. Claude Peers breaks that wall down.

The design is simple. A broker daemon runs on localhost:7899, backed by SQLite. Each Claude Code session spawns its own MCP server, which registers with the broker and polls it for messages every second. New messages are then pushed into the session over Claude's channel protocol, so the recipient sees them within about a second without having to check.
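The broker's job is small: it records who is registered and holds messages until the recipient's server polls for them. Here is a minimal sketch of that bookkeeping with Python's sqlite3, using a hypothetical two-table schema. The real broker runs on Bun (its server is TypeScript), and its schema may differ. The HTTP layer and the one-second polling timer are omitted.

```python
import sqlite3
import uuid
from typing import Optional


class Broker:
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS peers (
                id TEXT PRIMARY KEY, cwd TEXT, summary TEXT DEFAULT '');
            CREATE TABLE IF NOT EXISTS messages (
                seq INTEGER PRIMARY KEY AUTOINCREMENT,
                to_id TEXT, from_id TEXT, body TEXT, delivered INTEGER DEFAULT 0);
        """)

    def register(self, cwd: str) -> str:
        peer_id = uuid.uuid4().hex[:8]
        self.db.execute("INSERT INTO peers (id, cwd) VALUES (?, ?)", (peer_id, cwd))
        self.db.commit()
        return peer_id

    def set_summary(self, peer_id: str, summary: str) -> None:
        self.db.execute("UPDATE peers SET summary = ? WHERE id = ?", (summary, peer_id))
        self.db.commit()

    def list_peers(self, cwd: Optional[str] = None) -> list:
        """All peers, or only those in one directory (one of the scopes list_peers offers)."""
        if cwd is None:
            return self.db.execute("SELECT id, cwd, summary FROM peers").fetchall()
        return self.db.execute(
            "SELECT id, cwd, summary FROM peers WHERE cwd = ?", (cwd,)).fetchall()

    def send(self, from_id: str, to_id: str, body: str) -> None:
        self.db.execute("INSERT INTO messages (to_id, from_id, body) VALUES (?, ?, ?)",
                        (to_id, from_id, body))
        self.db.commit()

    def poll(self, peer_id: str) -> list:
        """Return undelivered messages in order and mark them delivered."""
        rows = self.db.execute(
            "SELECT seq, from_id, body FROM messages "
            "WHERE to_id = ? AND delivered = 0 ORDER BY seq", (peer_id,)).fetchall()
        self.db.executemany("UPDATE messages SET delivered = 1 WHERE seq = ?",
                            [(seq,) for seq, _, _ in rows])
        self.db.commit()
        return [(from_id, body) for _, from_id, body in rows]
```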

Four tools become available to every connected session:

Tool	What It Does
`list_peers`	Find other Claude Code instances - scoped by machine, directory, or repo
`send_message`	Send a message to another instance by ID (arrives instantly via channel push)
`set_summary`	Describe your current work context (visible to other peers)
`check_messages`	Manual message retrieval fallback if not using channel mode

If you set an OPENAI_API_KEY in your environment, each instance writes a work summary on startup using gpt-5.4-nano. Each call costs a fraction of a cent. The summary guesses what you are working on from your directory, git branch, and recent files. Without the API key, Claude sets its own summary through set_summary.
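The inputs to that startup summary are cheap to collect locally. This minimal sketch gathers them, assuming "recent files" means the most recently modified files under the working directory. The function names and the five-file default are hypothetical, and the call to the summarization model is left out.

```python
import subprocess
from pathlib import Path
from typing import Optional


def current_branch(cwd: Path) -> Optional[str]:
    """Current git branch, or None outside a repo or without git installed."""
    try:
        proc = subprocess.run(["git", "rev-parse", "--abbrev-ref", "HEAD"],
                              cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None


def work_context(cwd: Path, n_recent: int = 5) -> dict:
    """Directory name, branch, and most recently modified files (newest first)."""
    files = [p for p in cwd.rglob("*") if p.is_file() and ".git" not in p.parts]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return {
        "directory": cwd.name,
        "branch": current_branch(cwd),
        "recent_files": [str(p.relative_to(cwd)) for p in files[:n_recent]],
    }
```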

The Multi-Agent Harness Pattern

The real value of Claude Peers ties to Anthropic's own engineering post from March 24, 2026, on harness design for long-running application development. The key insight is simple: Claude tends to grade its own work too generously. The fix is role separation. A planner session defines the task, an executor session builds it, and a separate evaluator session grades the result independently.

Claude Peers makes this three-session pattern work in practice. The planner sends task definitions to the executor. The executor builds and commits. The evaluator pulls the result, runs checks, and sends feedback back for another round. Each session keeps its own context window, so it avoids the compaction problems that hit single long-running sessions.
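The control flow of the three-role harness is a loop in which only the evaluator decides when the work is done. In this minimal sketch, each role is a plain function standing in for a separate Claude Code session, and in-process calls stand in for Claude Peers messages. The role functions, the acceptance checks, and the round limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    feedback: str


def planner(goal: str) -> dict:
    # Defines the task and how it will be judged.
    return {"goal": goal, "acceptance": ["has_tests", "has_docs"]}


def executor(task: dict, feedback: str, done: set) -> set:
    # Stand-in for building and committing: each round addresses one item.
    return done | {feedback or task["acceptance"][0]}


def evaluator(task: dict, artifact: set) -> Verdict:
    # Grades against the planner's criteria, not the executor's own opinion.
    missing = [c for c in task["acceptance"] if c not in artifact]
    return Verdict(passed=not missing, feedback=missing[0] if missing else "")


def run_harness(goal: str, max_rounds: int = 5) -> tuple:
    task = planner(goal)
    artifact, feedback = set(), ""
    for round_no in range(1, max_rounds + 1):
        artifact = executor(task, feedback, artifact)
        verdict = evaluator(task, artifact)
        if verdict.passed:
            return True, round_no
        feedback = verdict.feedback          # sent back for another round
    return False, max_rounds
```

The round limit matters in practice: without it, an executor and evaluator that disagree forever would never stop.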

Quick Start

git clone https://github.com/louislva/claude-peers-mcp.git ~/claude-peers-mcp
cd ~/claude-peers-mcp
bun install
claude mcp add --scope user --transport stdio claude-peers -- bun ~/claude-peers-mcp/server.ts

Launch Claude Code with the channel enabled:

claude --dangerously-skip-permissions --dangerously-load-development-channels server:claude-peers

The broker starts on its own the first time you use it. Open a second terminal with the same command, then ask either session to list peers. You need the Bun runtime, Claude Code v2.1.80+, and a claude.ai login. Channels require that login, and API key auth will not work.

Google Workspace CLI - Full Google Suite Access for Claude Code

Google Workspace CLI (gws) is a single command-line tool that covers Drive, Gmail, Calendar, Sheets, Docs, Chat, and Admin. Google developers built it and released it under the Apache-2.0 license, though it is not an officially supported Google product. It gives Claude Code clean, structured access to the whole Google productivity stack.

The dynamic command surface sets gws apart from other Google integrations. The CLI reads Google’s own Discovery Service at runtime and builds its commands on the fly. When Google adds an API endpoint, gws picks it up at once. No updates, no new releases, no waiting.
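Google's Discovery documents (served at https://www.googleapis.com/discovery/v1/apis/{api}/{version}/rest) describe each API as nested `resources`, each with a `methods` map. A command tree can therefore be derived by walking that structure. This minimal sketch of the idea uses tiny hand-written excerpts shaped like Discovery documents. It is not gws's implementation, and the `service resource method` command shape is an assumption for illustration.

```python
def commands(service: str, doc: dict) -> list[str]:
    """Flatten a Discovery-style resource tree into command strings."""
    out: list[str] = []

    def walk(resources: dict, path: list[str]) -> None:
        for name, res in resources.items():
            for method in res.get("methods", {}):
                out.append(" ".join([service, *path, name, method]))
            walk(res.get("resources", {}), path + [name])  # nested resources

    walk(doc.get("resources", {}), [])
    return sorted(out)


# Excerpts only: real method entries carry httpMethod, parameters, and more.
DRIVE_EXCERPT = {"resources": {
    "files": {"methods": {"list": {}, "get": {}, "create": {}}},
    "permissions": {"methods": {"list": {}}},
}}
GMAIL_EXCERPT = {"resources": {
    "users": {"methods": {"getProfile": {}},
              "resources": {"messages": {"methods": {"list": {}, "send": {}}}}},
}}
```

A new method added to the Discovery document shows up as a new command the next time the tree is built, with no change to the walker.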

Agent Skills and MCP Server

The repo ships 100+ agent skills sorted by service. Gmail gets +send, +reply, and +triage for the inbox. Sheets has +append and +read for spreadsheets. Calendar offers +agenda and +insert for scheduling. Drive includes +upload for files. Higher-level helpers like +standup-report, +meeting-prep, and +weekly-digest chain several services together.

Every command returns structured JSON, and for paged results --page-all streams NDJSON (one JSON object per line). The CLI can also run as an MCP server through gws mcp, so it works with Claude Desktop, Gemini CLI, and VS Code.
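NDJSON puts one JSON object on each line, so a consumer can process pages as they stream in instead of waiting for one large array. This minimal sketch reads such a stream. It assumes each line is one page shaped like a Drive v3 `files.list` response with a `files` array. The field name comes from the Drive API; the exact shape of gws output is an assumption.

```python
import json
from typing import Iterable, Iterator


def iter_items(lines: Iterable[str], key: str = "files") -> Iterator[dict]:
    """Yield items from each NDJSON page as it arrives, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        page = json.loads(line)
        yield from page.get(key, [])
```

Pass `sys.stdin` as `lines` to consume output piped from a gws command run with --page-all.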

Security: Model Armor Integration

Giving an AI agent access to your email and documents raises real security concerns. Anyone who can put content into your Drive or inbox, for example by sharing a file or sending an email, could craft text that hijacks the agent's behavior the moment the agent reads it. The --sanitize flag hooks into Google Cloud Model Armor to scan every API response before it reaches the agent, running malicious URI detection, responsible AI filters, and prompt injection classifiers. It has two modes: warn logs the finding and lets the response through, while block stops the response before the agent sees it.
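The two modes differ only in what happens after a scan flags a response. This minimal sketch of that gate reduces a scan result to a list of finding labels. It does not call Model Armor, and `gate`, `ScanResult`, and the labels are hypothetical.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("sanitize")


@dataclass
class ScanResult:
    findings: list[str] = field(default_factory=list)  # e.g. "prompt_injection"


def gate(response: str, scan: ScanResult, mode: str) -> Optional[str]:
    """Return what the agent gets to see, or None if the response is blocked."""
    if mode not in ("warn", "block"):
        raise ValueError(f"unknown mode: {mode}")
    if not scan.findings:
        return response
    if mode == "warn":
        log.warning("flagged response passed through: %s", ", ".join(scan.findings))
        return response          # logged, but the agent still reads it
    return None                  # blocked: the agent never reads it
```

Warn is useful for measuring false positives before committing to block. Only block actually keeps a malicious document away from the agent.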

This does not remove every risk. Insecurely stored service account credentials, overly broad OAuth scopes, and unverified testing-mode apps all create exposure. The project suggests a simple sandbox approach: do not give Claude Code access to everything. Start with just Drive, or with a separate email account and shared folders. Clone the repo and have Claude Code walk you through which skills fit your use case.

Installation and Authentication

npm install -g @googleworkspace/cli
gws auth setup
gws auth login

The auth setup command walks through Google Cloud project setup, turns on the required APIs, and handles OAuth. Credentials are encrypted at rest with AES-256-GCM. For headless or CI use, you can export credentials from an interactive session and set GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE on the target machine. Service account keys also work for server-to-server setups.

One catch: if your OAuth app is unverified (testing mode), Google caps consent at about 25 scopes. The default scope preset includes 85+ scopes and will fail for unverified apps. Use gws auth login -s drive,gmail,sheets to pick only the services you need.
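The failure comes down to simple arithmetic: the requested scope count must stay under the cap. This minimal sketch checks a service selection before logging in, assuming one scope per service. The mapping uses real Google OAuth scope URLs, but which scopes gws actually requests for each service is an assumption. The limit of 25 is the approximate cap quoted above.

```python
SCOPES = {  # assumed one-scope-per-service mapping
    "drive": "https://www.googleapis.com/auth/drive",
    "gmail": "https://www.googleapis.com/auth/gmail.modify",
    "sheets": "https://www.googleapis.com/auth/spreadsheets",
    "calendar": "https://www.googleapis.com/auth/calendar",
}
UNVERIFIED_SCOPE_CAP = 25  # approximate limit for testing-mode apps


def scopes_for(services: str) -> list[str]:
    """Mirror the comma-separated form used by `gws auth login -s drive,gmail,sheets`."""
    names = [s.strip() for s in services.split(",") if s.strip()]
    unknown = [n for n in names if n not in SCOPES]
    if unknown:
        raise ValueError(f"no scope mapping for: {unknown}")
    return [SCOPES[n] for n in names]


def fits_unverified_app(scopes: list[str]) -> bool:
    return len(set(scopes)) <= UNVERIFIED_SCOPE_CAP
```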

At the time of writing, the project is at version 0.4.4 and headed toward v1.0, so expect breaking changes. Besides npm, it is also available on Homebrew (brew install googleworkspace-cli), Cargo (cargo install google-workspace-cli), and Nix.

What Ties These Repos Together

These five repositories hit different parts of the same bottleneck. Claude Code is strong inside a single coding session, but it runs into real walls at the edges. Those walls are autonomous experiments, skill upkeep, GUI software, multi-instance coordination, and productivity suite access.

Each repo pushes one of those walls outward. You can run all five, or just the one that fits your workflow. Either way, they show how fast community tooling is closing the gap between what Claude Code does on its own and what your work actually needs. The code is on GitHub, MIT or Apache licensed, and installs in minutes.