
Design Debt Dashboard: Auditing a Design System Against Itself

A build-time scanner that checks this portfolio's own source for design-token drift, instead of proving the idea against a mock file.

97% of color declarations using a real token
Role: Designer & Developer
Timeline: 2026
Tools: Node.js, Astro
Team: Solo

The Problem

Design systems decay the moment they ship. A token gets defined once, with real intention behind it, and then nothing keeps checking whether anyone actually used it. Six months later half the codebase still calls var(--accent) and the other half has quietly reintroduced #c2752e as a literal, usually because someone was moving fast in a file that wasn’t open next to the token sheet. Nothing catches this out of the box: lint rules that flag arbitrary hex literals exist, but they are opt-in, and nobody adds them retroactively to a system that already feels “done.”

This is also the code-side sibling of a paused Figma-plugin concept elsewhere in this portfolio: the same drift problem, but on the production side of the handoff instead of the design-file side.

The Process

I didn’t want to demo this against a synthetic fixture file built to produce a clean before/after. That’s the easy version, and it proves nothing about whether the idea holds up against a real, organically-grown codebase. So the scanner points at this portfolio’s own src/ directory: the same repo this case study lives in, scanned at the moment I’m writing this sentence.

Design Debt Score: 63
Token Adoption Rate: 97%
Files Scanned: 24

Color drift

| File | Literal | Matches token |
| --- | --- | --- |
| src/components/BranchGraph.astro | #c2752e | --accent-amber |
| src/pages/design-system.astro | rgba(56, 43, 95, 0.08) | --accent-indigo |

Spacing — no token system exists yet

76 hardcoded spacing declarations across 27 distinct px/rem values were found in 9 files. There's no spacing token to compare these against, so this isn't scored — the absence of a system here is itself the finding.


Distinct values

1px, 4px, 8px, 16px, 24px, 320px, 0.15rem, 0.2rem, 0.25rem, 0.3rem, 0.35rem, 0.4rem, 0.45rem, 0.5rem, 0.6rem, 0.65rem, 0.75rem, 0.9rem, 1rem, 1.1rem, 1.25rem, 1.5rem, 1.75rem, 2rem, 2.5rem, 3rem, 4rem

Files

  • src/components/ConstellationMap.astro
  • src/components/DebtScanChart.astro
  • src/components/ReadingProgressRing.astro
  • src/pages/404.astro
  • src/pages/about.astro
  • src/pages/colophon.astro
  • src/pages/contact.astro
  • src/pages/design-system.astro
  • src/pages/index.astro
Scan last run Jun 23, 2026, 11:10 AM — triggered manually via node scripts/scan-design-debt.mjs, not on every build.

The mechanics are intentionally plain. A small Node script reads the seven color tokens out of global.css’s :root block (the single source of truth, not a duplicated literal list living somewhere else), then walks every .astro and .css file under src/ — excluding global.css itself, since that file is where tokens and their legitimate theme/season re-definitions live, not a drift target. For every hex color literal it finds, it checks whether the normalized value matches a token’s value; it does the same for rgb()/rgba() literals, converting each token’s hex to its decimal channels so a duplicate written either way gets caught. If a literal matches and isn’t already written as var(--token-name), that’s a drift finding. Separately, it inventories hardcoded px/rem values used in spacing-shaped declarations — margin, padding, gap, inset, and the four directional properties — without scoring them against anything, because there is no spacing token in this system to violate.
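
The color-matching step above can be sketched roughly as follows. This is an illustration, not the actual contents of scripts/scan-design-debt.mjs: the function names (readColorTokens, normalizeHex, hexToChannels, findColorDrift) are hypothetical, the sample CSS is made up, and the hex value shown for --accent-indigo is inferred from the rgba(56, 43, 95, 0.08) finding rather than copied from global.css. It only handles 3- and 6-digit hex and comma-separated rgb()/rgba(), and it works on strings so the file walking is left out.

```javascript
// Sketch only: all names and sample inputs here are assumptions.

const HEX = "#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})";

// Lowercase the hex and expand 3-digit shorthand (#abc -> #aabbcc).
function normalizeHex(hex) {
  const h = hex.slice(1).toLowerCase();
  return "#" + (h.length === 3 ? [...h].map((c) => c + c).join("") : h);
}

// Convert a hex color to its decimal [r, g, b] channels.
function hexToChannels(hex) {
  const h = normalizeHex(hex).slice(1, 7);
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16));
}

// Read `--name: #hex;` pairs out of the first :root { ... } block.
function readColorTokens(css) {
  const tokens = new Map();
  const root = css.match(/:root\s*\{([^}]*)\}/);
  if (!root) return tokens;
  const decl = new RegExp(`(--[\\w-]+)\\s*:\\s*(${HEX})\\s*;`, "g");
  for (const [, name, hex] of root[1].matchAll(decl)) {
    tokens.set(name, normalizeHex(hex));
  }
  return tokens;
}

// Report every hex or rgb()/rgba() literal whose color equals a token's value.
function findColorDrift(source, tokens) {
  const byHex = new Map([...tokens].map(([name, hex]) => [hex, name]));
  const byChannels = new Map(
    [...tokens].map(([name, hex]) => [hexToChannels(hex).join(","), name])
  );
  const findings = [];
  for (const [literal] of source.matchAll(new RegExp(`${HEX}\\b`, "g"))) {
    const token = byHex.get(normalizeHex(literal));
    if (token) findings.push({ literal, token });
  }
  const rgb = /rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)[^)]*\)/g;
  for (const m of source.matchAll(rgb)) {
    const token = byChannels.get([m[1], m[2], m[3]].map(Number).join(","));
    if (token) findings.push({ literal: m[0], token });
  }
  return findings;
}

// Made-up inputs standing in for global.css and one scanned file.
const css = ":root { --accent-amber: #C2752E; --accent-indigo: #382b5f; }";
const page =
  ".pill { background: rgba(56, 43, 95, 0.08); color: #c2752e; border-color: var(--accent-amber); }";

const tokens = readColorTokens(css);
const findings = findColorDrift(page, tokens);
console.log(findings);
```

In this sample the var(--accent-amber) usage is ignored because it contains no literal, while the hex and rgba() duplicates are each paired with the token they shadow. A pattern this loose would also flag an ID selector that happens to look like a hex color, so a real scanner would need to restrict matches to declaration values.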

It’s read-only. It doesn’t rewrite anything or propose a one-click fix; auto-remediation is a different, riskier tool than the one I wanted to build first.

The Decision

The harder design decision wasn’t the regex, it was what to do with the spacing data once I had it. A token-vs-drift table is satisfying because it has a clear pass/fail shape. Spacing has no such shape here — there’s no --space-sm to compare a hardcoded 1.5rem against, so “27 distinct spacing values across 9 files” isn’t a violation count, it’s just a description of a gap. I decided not to force it into the same scored framing as the color findings. The dashboard says so directly: this section isn’t a score, it’s an honest admission that the system stops at color.
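
To make the "inventory, not a score" distinction concrete, here is a minimal sketch of what an unscored spacing pass could look like. The names (SPACING_DECL, inventorySpacing) and the sample file contents are hypothetical, and reading "the four directional properties" as top, right, bottom, and left is my assumption for the example. The point is the return shape: counts and lists, with no score field.

```javascript
// Sketch only: names, property list, and sample inputs are assumptions.

// Spacing-shaped declarations: margin/padding (and their sides), gap, inset,
// plus top/right/bottom/left. The leading character class keeps "border-top"
// and "row-gap" from matching on their suffix.
const SPACING_DECL =
  /(?:^|[;{\s])((?:margin|padding)(?:-(?:top|right|bottom|left))?|gap|inset|top|right|bottom|left)\s*:\s*([^;}]+)/g;

const LENGTH = /\d*\.?\d+(?:px|rem)\b/g;

// files: object mapping a path to that file's source text.
function inventorySpacing(files) {
  const valueToFiles = new Map(); // "1.5rem" -> Set of paths that use it
  let declarations = 0;

  for (const [path, source] of Object.entries(files)) {
    for (const [, , declared] of source.matchAll(SPACING_DECL)) {
      const lengths = declared.match(LENGTH) ?? [];
      if (lengths.length === 0) continue; // e.g. `margin: auto`
      declarations += 1;
      for (const value of lengths) {
        if (!valueToFiles.has(value)) valueToFiles.set(value, new Set());
        valueToFiles.get(value).add(path);
      }
    }
  }

  const filesWithSpacing = new Set(
    [...valueToFiles.values()].flatMap((paths) => [...paths])
  );

  // Deliberately no score: there is no spacing token to measure against.
  return {
    declarations,
    distinctValues: [...valueToFiles.keys()],
    files: [...filesWithSpacing],
  };
}

// Made-up inputs standing in for two scanned files.
const report = inventorySpacing({
  "a.astro": ".card { padding: 1.5rem 1rem; margin-top: 8px; border-top: 1px solid; }",
  "b.astro": ".hero { gap: 1.5rem; margin: auto; color: var(--accent); }",
});
console.log(report);
```

Keeping the result as plain counts means the dashboard can print a sentence like the one above without inventing a denominator. If a spacing scale is ever introduced, a score can be layered on top of the same inventory.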

I also kept the scan itself manual rather than wiring it into astro build. A build-blocking design-debt gate is a reasonable thing to want eventually, but bolting one on before establishing what “acceptable drift” even means for this codebase would just be noise on every deploy. Better to let the number move only when someone deliberately asks the question.

The Outcome

Run against this repo’s actual src/ directory, the scan found:

The second finding is the more useful one, in an unflattering way. It means this audit, as built, can’t see drift in the one file where the tokens themselves live — global.css has its own rgba()-literal duplicates of --accent-indigo (the same .tag-pill rule, plus a few others documented in that file’s own header comment), invisible to a scanner that exempts the whole file on the reasonable-sounding assumption that the token source is automatically the token’s best citizen. It isn’t, always. Catching that required a scan-the-tool-against-itself moment, not the tool catching it on its own — worth naming rather than quietly patching, since the case study’s whole premise is reporting what’s true.

Reflection

The spacing finding is the uncomfortable one, and it’s the part I almost framed more gently. 27 distinct spacing values with nothing to check them against is a real gap, not a stylistic choice — it means every 1.5rem versus 1.25rem decision in this codebase has been made independently, file by file, with no system enforcing consistency the way the color tokens do. Calling that “27 distinct values, 9 files” instead of scoring it felt like underselling it at first. But scoring it against nothing would have been worse: a fake number with false precision is exactly the kind of dashboard-for-its-own-sake this case study is arguing against. The self-referential angle only works if I hold the tool to its own standard — report what’s true, not what makes a tidier chart.