Design Debt Dashboard: Auditing a Design System Against Itself
A manually run scanner that checks this portfolio's own source for design-token drift, instead of proving the idea against a mock file.
- Role: Designer & Developer
- Timeline: 2026
- Tools: Node.js, Astro
- Team: Solo
The Problem
Design systems decay the moment they ship. A token gets defined once, with real intention behind
it, and then nothing keeps checking whether anyone actually used it. Six months later half the
codebase still calls var(--accent) and the other half has quietly reintroduced #c2752e as a
literal, usually because someone was moving fast in a file that wasn’t open next to the token
sheet. Nothing catches this out of the box: lint rules that flag arbitrary hex literals exist but
are opt-in, and nobody adds them retroactively to a system that already feels “done.”
This is also the code-side sibling of a paused Figma-plugin concept elsewhere in this portfolio: the same drift problem, but on the production side of the handoff instead of the design-file side.
The Process
I didn’t want to demo this against a synthetic fixture file built to produce a clean before/after.
That’s the easy version, and it proves nothing about whether the idea holds up against a real,
organically grown codebase. So the scanner points at this portfolio’s own src/ directory: the
same repo this case study lives in, scanned at the moment I’m writing this sentence.
Color drift
| File | Literal | Matches token |
|---|---|---|
| src/components/BranchGraph.astro | #c2752e | --accent-amber |
| src/pages/design-system.astro | rgba(56, 43, 95, 0.08) | --accent-indigo |
Spacing — no token system exists yet
The scan found 76 hardcoded spacing declarations, using 27 distinct px/rem values, across 9 files. There's no spacing token to compare these against, so this isn't scored; the absence of a system here is itself the finding.
Distinct values
1px, 4px, 8px, 16px, 24px, 320px, 0.15rem, 0.2rem, 0.25rem, 0.3rem, 0.35rem, 0.4rem, 0.45rem, 0.5rem, 0.6rem, 0.65rem, 0.75rem, 0.9rem, 1rem, 1.1rem, 1.25rem, 1.5rem, 1.75rem, 2rem, 2.5rem, 3rem, 4rem
Files
- src/components/ConstellationMap.astro
- src/components/DebtScanChart.astro
- src/components/ReadingProgressRing.astro
- src/pages/404.astro
- src/pages/about.astro
- src/pages/colophon.astro
- src/pages/contact.astro
- src/pages/design-system.astro
- src/pages/index.astro
Run manually with node scripts/scan-design-debt.mjs, not on every build.
The mechanics are intentionally plain. A small Node script reads the seven color tokens out of
global.css’s :root block (the single source of truth, not a duplicated literal list living
somewhere else), then walks every .astro and .css file under src/ — excluding global.css
itself, since that file is where tokens and their legitimate theme/season re-definitions live, not
a drift target. For every hex color literal it finds, it checks whether the normalized value
matches a token’s value; it does the same for rgb()/rgba() literals, converting each token’s
hex to its decimal channels so a duplicate written either way gets caught. If a literal matches and
isn’t already written as var(--token-name), that’s a drift finding. Separately, it inventories
hardcoded px/rem values used in spacing-shaped declarations — margin, padding, gap, inset, and
the four directional properties — without scoring them against anything, because there is no
spacing token in this system to violate.
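The color-matching step described above can be sketched roughly as follows. This is not the actual scan-design-debt.mjs, whose source isn't shown here: the function names and regexes are assumptions, file walking is left out, and the sketch only handles 3- and 6-digit hex tokens inside a single :root block.

```javascript
// Hypothetical sketch of the color-drift check; all names are illustrative.

// Lowercase and expand 3-digit shorthand so #FFF and #ffffff compare equal.
function normalizeHex(hex) {
  let h = hex.slice(1).toLowerCase();
  if (h.length === 3) h = [...h].map((c) => c + c).join("");
  return "#" + h;
}

// Convert a 6-digit hex color to a "r,g,b" decimal string for rgb()/rgba() comparison.
function hexToChannels(hex) {
  const h = normalizeHex(hex).slice(1);
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16)).join(",");
}

// Read `--name: #hex;` pairs from the first :root block of a stylesheet string.
function readColorTokens(css) {
  const tokens = new Map();
  const root = css.match(/:root\s*\{([^}]*)\}/);
  if (!root) return tokens;
  const decl = /(--[\w-]+)\s*:\s*(#(?:[0-9a-fA-F]{3}){1,2})\s*;/g;
  for (const m of root[1].matchAll(decl)) {
    tokens.set(m[1], normalizeHex(m[2]));
  }
  return tokens;
}

// Report every hex or rgb()/rgba() literal in `source` whose color equals a token value.
// References written as var(--token-name) contain no literal, so they never match.
function findColorDrift(source, tokens) {
  const findings = [];
  const hexLiteral = /#(?:[0-9a-fA-F]{3}){1,2}\b/g;
  const rgbLiteral = /rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)[^)]*\)/g;
  for (const [name, value] of tokens) {
    const channels = hexToChannels(value);
    for (const m of source.matchAll(hexLiteral)) {
      if (normalizeHex(m[0]) === value) findings.push({ literal: m[0], token: name });
    }
    for (const m of source.matchAll(rgbLiteral)) {
      if (`${m[1]},${m[2]},${m[3]}` === channels) findings.push({ literal: m[0], token: name });
    }
  }
  return findings;
}
```

With this approach the alpha channel is ignored on purpose: rgba(56, 43, 95, 0.08) has the same decimal channels as the hex value #382b5f, so it counts as a duplicate of that color at any opacity.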
It’s read-only. It doesn’t rewrite anything or propose a one-click fix; auto-remediation is a different, riskier tool than the one I wanted to build first.
The Decision
The harder design decision wasn’t the regex, it was what to do with the spacing data once I had
it. A token-vs-drift table is satisfying because it has a clear pass/fail shape. Spacing has no
such shape here — there’s no --space-sm to compare a hardcoded 1.5rem against, so “27 distinct
spacing values across 9 files” isn’t a violation count, it’s just a description of a gap. I decided
not to force it into the same scored framing as the color findings. The dashboard says so directly:
this section isn’t a score, it’s an honest admission that the system stops at color.
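A minimal sketch of what "describe, don't score" can look like in code, under stated assumptions: the property list follows the description of the scanner (margin, padding, gap, inset, and, as a guess at "the four directional properties", top/right/bottom/left), longhands such as margin-top are deliberately not matched, and inventorySpacing and its input shape are hypothetical names, not the real script's API.

```javascript
// Hypothetical sketch of a spacing inventory that returns counts only, never a score.

// Match a spacing-shaped declaration and capture its value. The lookbehind keeps
// "top" from matching inside "margin-top" and "gap" from matching inside "row-gap".
const SPACING_DECL =
  /(?<![\w-])(margin|padding|gap|inset|top|right|bottom|left)\s*:\s*([^;}]+)/g;

// Match individual px/rem lengths inside a declaration value.
const LENGTH = /(?<![\w.])\d*\.?\d+(?:px|rem)\b/g;

// `sources` maps a file path to that file's text.
function inventorySpacing(sources) {
  const distinct = new Set();
  const files = new Set();
  let declarations = 0;
  for (const [file, text] of Object.entries(sources)) {
    for (const decl of text.matchAll(SPACING_DECL)) {
      const values = decl[2].match(LENGTH) ?? [];
      if (values.length === 0) continue; // e.g. `left: 0` or `margin: auto`
      declarations += 1;
      files.add(file);
      values.forEach((v) => distinct.add(v));
    }
  }
  // No pass/fail field: with no spacing tokens there is nothing to compare against.
  return { declarations, distinctValues: [...distinct], files: [...files] };
}
```

The return value has no score or violation field, which mirrors the decision above: the inventory reports how many declarations, distinct values, and files there are, and stops there.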
I also kept the scan itself manual rather than wiring it into astro build. A build-blocking
design-debt gate is a reasonable thing to want eventually, but bolting one on before establishing
what “acceptable drift” even means for this codebase would just be noise on every deploy. Better to
let the number move only when someone deliberately asks the question.
The Outcome
Run against this repo’s actual src/ directory, the scan found:
- 97% Token Adoption Rate: of 73 total color declarations that should resolve to a token, 71 correctly reference var(--token-name) and 2 are hardcoded literals that duplicate a token value
- 2 color drift findings. One is a #c2752e literal in BranchGraph.astro’s demo commit data, which happens to match --accent/--accent-amber. That is not a coincidence; the component fabricates a fictional commit history for a teaching demo, and the realistic-looking accent color it invented for that fiction turned out to be this site’s actual accent. The other is stranger: an rgba(56, 43, 95, 0.08) string inside design-system.astro’s component-annotation data. It is not live CSS at all, but a documentation string that quotes the real .tag-pill rule’s background color so the /design-system page can show it in a hover tooltip. That real rule lives in global.css itself, which this scanner deliberately doesn’t touch (it’s the token source, not a drift target), so the actual hardcoded value has been sitting in plain sight in the stylesheet this whole time, and the only reason the scanner caught anything related to it is that a second, unrelated file happened to quote it back as a string
- A Design Debt Score of 63: a simple composite (100 minus a penalty per drift finding, minus a smaller penalty per distinct hardcoded spacing value), included as a secondary summary number; the raw counts above are the more honest read
- 76 hardcoded spacing declarations, 27 distinct px/rem values, across 9 files — no score attached, because there’s no spacing token system yet to measure adoption against
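The arithmetic behind the two headline numbers can be checked with a short sketch. The penalty weights are not stated anywhere in this write-up; the values below (5 per drift finding, 1 per distinct spacing value) are an assumption, chosen because they are the only positive whole-number pair that yields 63 from 2 findings and 27 values (2a + 27b = 37). The function names and the clamp at zero are also hypothetical.

```javascript
// Hypothetical sketch: weights are inferred from the reported result, not confirmed.
const DRIFT_PENALTY = 5; // assumed penalty per color drift finding
const SPACING_PENALTY = 1; // assumed (smaller) penalty per distinct spacing value

// Share of color declarations that reference a token, as a whole percentage.
function adoptionRate(tokenRefs, hardcodedLiterals) {
  return Math.round((tokenRefs / (tokenRefs + hardcodedLiterals)) * 100);
}

// 100 minus both penalties; clamped at 0 as an added assumption.
function debtScore(driftFindings, distinctSpacingValues) {
  const raw =
    100 - driftFindings * DRIFT_PENALTY - distinctSpacingValues * SPACING_PENALTY;
  return Math.max(0, raw);
}

console.log(adoptionRate(71, 2)); // 97  (71 / 73 = 0.9726)
console.log(debtScore(2, 27)); // 63  (100 - 10 - 27)
```

Under these assumed weights the spacing inventory accounts for 27 of the 37 points lost, which is one reason to read the composite as secondary to the raw counts.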
The second finding is the more useful one, in an unflattering way. It means this audit, as built,
can’t see drift in the one file where the tokens themselves live — global.css has its own
rgba()-literal duplicates of --accent-indigo (the same .tag-pill rule, plus a few others
documented in that file’s own header comment), invisible to a scanner that exempts the whole file
on the reasonable-sounding assumption that the token source is automatically the token’s best
citizen. It isn’t, always. Catching that required a scan-the-tool-against-itself moment, not the
tool catching it on its own — worth naming rather than quietly patching, since the case study’s
whole premise is reporting what’s true.
Reflection
The spacing finding is the uncomfortable one, and it’s the part I almost framed more gently. 27
distinct spacing values with nothing to check them against is a real gap, not a stylistic choice —
it means every 1.5rem versus 1.25rem decision in this codebase has been made independently,
file by file, with no system enforcing consistency the way the color tokens do. Calling that “27
distinct values, 9 files” instead of scoring it felt like underselling it at first. But scoring it
against nothing would have been worse: a fake number with false precision is exactly the kind of
dashboard-for-its-own-sake this case study is arguing against. The self-referential angle only
works if I hold the tool to its own standard — report what’s true, not what makes a tidier chart.