I Stopped Writing Better Prompts and Started Writing Better Files

Project state belongs on disk, not in your chat history.

Here’s the workflow I ran for most of last year: open Claude Code, paste a 4,000-word project brief into the system prompt, type “let’s start with the PRD,” and pray that by the end of the session Claude still remembered what we were building.

It worked. It also burned twenty minutes at the start of every session and roughly the same again when I’d inevitably forget which version of the brief I’d last updated. By the time I had three projects in flight, I was managing four documents per project across a directory tree I couldn’t keep straight — and a single bad paste was good for an afternoon of Claude confidently building the wrong feature.

So I did what every senior developer eventually does. I built another framework. It’s called TStack. The framework isn’t the point — what’s worth your time is the handful of small design choices it forced me to make.

The Problem Isn’t Prompts

“Just write better prompts” is the wrong answer. The actual problem is that project state — the brief, the PRD, the architecture, the roadmap, where you are in the roadmap — lives in your head or in a chat tab. Claude has to be hand-fed all of it every session, and the file you paste from drifts faster than the project does.

The fix is moving state to disk and giving Claude a way to find it on its own. That’s the bet behind Anthropic’s Agent Skills: a folder, a SKILL.md, a YAML frontmatter that auto-triggers when you say something matching its description. I took my paste-the-guide workflow, broke it into stages, turned each stage into a skill, and chained them. That’s TStack.

The Lifecycle

Seven skills. Four form a one-time setup chain. Two form a loop you run every milestone. The seventh is for iteration after launch.

discover → product → architect → roadmap → [plan → build] → ...
                                            ↑
                                       specify (post-launch additions)

The setup chain takes you from “I want to build an app that helps freelancers track time” through a business brief, a PRD, an architecture doc set, and a roadmap broken into milestones. Then tstack-plan reads a single milestone and writes an approved implementation plan; tstack-build ships it. Loop those two until the roadmap is empty. tstack-specify is the entry point after launch — it folds a new feature into the existing docs and roadmap without starting over.

Each stage reads its input from disk and writes its output to disk. The next stage’s auto-trigger requires the previous stage’s output to exist before it’ll fire. You don’t have to remember where you are; the file system remembers for you.

The key insight: this isn’t a workflow. It’s a state machine, and the disk is the state.

The Description Field Is Load-Bearing

The thing about skills that took me longest to internalize: the description in your SKILL.md frontmatter isn’t documentation. It’s a prompt fragment Claude reads every turn to decide whether to load this skill into context.

Wishy-washy descriptions miss the trigger. Too-aggressive ones fire when you didn’t want them to. Overlapping ones lose coin flips to whichever sibling skill looks more relevant in the moment.

After several rewrites, the shape that worked for me is four parts in this order:

What it does, in one sentence.
“Use when…” clauses with the trigger phrases.
Input/Output, stated explicitly. The Input clause is what makes Claude check prerequisites before triggering.
Handoff — what the next skill expects.

Here’s the trick. tstack-product’s description includes “Input: a completed docs/1 - Discovery/business-brief.md.” On a fresh project that file doesn’t exist — so when the user says “let’s write the PRD,” Claude reads the Input requirement, sees nothing to read from, and suggests running tstack-discover first. Free prerequisite checking via a string in YAML.

The other trick: disambiguating mutually exclusive skills. tstack-product creates an initial PRD; tstack-specify amends an existing one. Both could plausibly trigger on “add a thing about pricing.” So each description carries an explicit negative phrase pointing at the other — “Not for amending an existing PRODUCT.md (use tstack-specify for that).” Without that, you get coin-flip triggering.

The key insight: the description is load-bearing prose, not docstring. Treat it like production code.

Three Guarantees Worth Stealing

The lifecycle is the part that looks impressive in a diagram. The parts that actually shipped useful behavior are smaller and uglier.

1. Force the boring decisions upstream

Every project I’ve worked on has eventually rediscovered that nobody made an explicit decision about logging, or that “we’ll figure out accessibility later” was a decision that just never got made. The conversation is dramatically cheaper before there’s code to defend.

So tstack-architect won’t write a line of architecture until four ADRs exist: security, observability, accessibility, privacy. AI/LLM products add a fifth covering model choice, evals, and fallback. The skill refuses to proceed otherwise. Thirty seconds of annoyance, three months of saved arguments.

2. Make infrastructure unskippable

Every project I’d built with AI assistance had the same shape: dazzlingly fast for three weeks, then a grinding halt the moment I needed to deploy or debug a production-shaped problem. The dazzling part was Claude shipping features against non-existent infrastructure. The halt was me catching up.

So tstack-roadmap refuses to save a roadmap unless M0 — Infrastructure baseline is the first milestone: CI, branch protection, secrets, deployment skeleton, observability, lint, test runner. Making it unskippable doesn’t make infrastructure interesting. It just means you can’t put it off long enough for the bill to compound.

3. Refuse “Verified ✓” without the bytes

Models claim things work because the idea of it working is in their context. The cheapest possible hallucination check is to demand the actual command output — npm test, xcodebuild, curl, whatever the criterion is. tstack-build requires it. A few extra lines per milestone. It’s saved me from shipping a “passing” test suite where the test file had never been imported.

The key insight: pick three or four small guarantees the skill won’t skip. Those refusals are what turn a workflow from suggestion into discipline.

Eval-Based Acceptance for AI Features

One more refusal, big enough to deserve its own section.

tstack-product accepts Given/When/Then for normal features. For AI features — anything where behavior depends on a model — it doesn’t. The required shape is three parts:

Eval set. Versioned, on disk, in the repo. A list of inputs paired with expected outputs or quality signals.
Quality bar. A measurable threshold the eval set must hit. (“≥85% pass on evals/summarization-v3.jsonl.”)
Fallback. A deterministic path when the model is unavailable or below threshold.

The point isn’t that Given/When/Then is bad. The point is that “Given a long article, when I summarize, then I get a summary” is not a test. It’s a vibe. Eval sets are slightly annoying to maintain and they catch regressions that nothing else catches.

If you’re building AI features and you don’t have eval sets in your repo, you’re flying blind. Whether you adopt TStack or not, the format is worth stealing.

When Multiple Plugins Want To Trigger

Most people don’t install one plugin. I run TStack alongside superpowers, whose writing-plans, executing-plans, and brainstorming skills overlap with three of mine. In theory, every “plan it” should be a coin flip. In practice it isn’t, and the reason is the Input clause again.

TStack descriptions reference specific files: docs/PRODUCT.md, docs/ROADMAP.md, the milestone IDs. Generic-named skills from other plugins don’t. When docs/ROADMAP.md exists and you say “plan the next milestone,” Claude sees TStack’s prereqs satisfied and the others’ not, and picks TStack. When docs/ROADMAP.md doesn’t exist — say, on a repo that isn’t using TStack — TStack steps aside and the generic skill takes the call.

It’s not perfect. The repo includes a small table of disambiguating phrases for the cases where it does collide, and an explicit slash-command escape hatch (/tstack-plan, /tstack-build, etc.) for when you want to force the choice.

The general lesson is more interesting than the specific table: if you want your skill to lose gracefully to other plugins in unrelated contexts, name your prerequisites in the description. The presence or absence of those files is what tells Claude whether the skill belongs in this conversation.

AGENTS.md as Single Source of Truth

One more pattern. AGENTS.md is an open standard for cross-tool agent config — boring, sensible, exactly what should have existed from the start. Make it the only source of truth; every tool-specific file points at it:

Tool	Config file	Contents
Claude Code	`CLAUDE.md`	`See @AGENTS.md`
Cursor	`.cursorrules`	`See @AGENTS.md`
Codex	`codex.md`	`See @AGENTS.md`

Cursor and Claude Code chase each other on config-file naming. The only sane move is to write the conventions once and let everyone read from the same file.

Mental Model

If you take one thing from this, take this table:

Layer	Where it lives	What it’s for
Lifecycle	The chain of skills	Stages and handoffs
Triggering	`description` frontmatter	Deciding when a skill loads
Prerequisites	Filenames inside the description	Checking that state exists
State	Files in `docs/`	Surviving session boundaries
Guarantees	Skill bodies	What the skill refuses to skip
Cross-tool sync	`AGENTS.md` + pointer files	One source, many readers

Practical habits:

Treat your skill description as production code. Audit it the way you audit a regex.
Name your prereq files explicitly in descriptions. Free prerequisite checks and graceful disambiguation.
Pick the three things your skill should refuse to skip. Make those refusals loud.
Whatever your domain, write the eval-set / quality-bar / fallback contract before you write the AI feature.
One AGENTS.md. Every other config file points at it.

If the thing you keep pasting at the start of every session is more than a paragraph, it should probably be a skill. The disk is right there, and the auto-trigger is doing more work than you think.