How we designed, tested, and shipped a 3-type content classification system through data-driven iteration, user simulation, and relentless simplification.
Final result: A 3-type content taxonomy (SPRINT/EXPERIMENT/NOTE) that achieved 90% classification accuracy, 4.25/5 clarity rating, and 100% "just right" consensus from 8 simulated user personas.
This seemingly simple system required aggressive consolidation (8 types → 5 types → 3 types), 5 test rounds with 116 total classification attempts, and 6 design iterations to land on a hover interaction that felt right.
**SPRINT**: Focused deployments that ship to production and measure results. Fast iterations that go live and show real impact. Example: "Customer Support Triage Agent" (90 mins, 40% faster response time).
**EXPERIMENT**: Testing hypotheses or exploring approaches to learn something new. Long-form investigations documenting what we tried and learned. Example: "Local LLMs vs API Calls" (3 months, 80% cost savings).
**NOTE**: Quick insights, patterns, observations, and lessons learned. Intentionally broad to capture everything from single tips to reusable frameworks. Example: "Token Limits Break Everything" (tactical observation + code fix).
Supreme clarity was the top priority. The site (tinylab.com) needed visitors to instantly understand what each post type meant and confidently choose the right filter. Too many types = cognitive overload. Too few = forced bucketing.
We landed on 3 types because:
"Supreme clarity is the top priority over number of types."
This constraint drove every decision. When data showed 8 types hit 95% accuracy but users said "too many," we cut. When 5 types hit 100% accuracy but filters wrapped to two lines, we cut again. The final 3-type system sacrificed 10% accuracy to gain unanimous clarity.
Context: Tiny Lab (tinylab.com) is an AI deployment lab documenting fast shipping cycles, experiments, and lessons learned. The site needed a taxonomy to classify posts in a way that aids navigation while maintaining technical clarity.
Paper background (#f4f2eb) limited color options. Standard web colors failed WCAG AA on this warm cream background. We had to darken Sprint green and Note yellow to meet contrast ratios.
Minimal cognitive load. The target audience (technical builders) needed instant comprehension, not a categorization puzzle. If they had to think too hard about which filter to click, the taxonomy failed.
The hardest constraint wasn't in the brief—it emerged during testing: users don't read descriptions. They glance at type names and make snap judgments. This meant type labels had to be self-explanatory, not rely on hover tooltips or help text.
Approach: Evidence-based design using synthetic personas + simulated classification tasks. We didn't guess what would work—we tested every hypothesis with quantitative data.
We created 8 synthetic personas representing different user archetypes:
Each persona was run by qwen2.5:14b (local LLM) with embedded personality traits, decision-making patterns, and vocabulary preferences.
Classification task: Given 10 post descriptions, classify each into the appropriate type and explain reasoning.
Metrics collected:
Test scenarios included:
Real user testing with 8 people across 4 iterations = expensive and slow. Synthetic personas let us iterate daily, test edge cases systematically, and eliminate interviewer bias. The trade-off: personas lack real-world messiness, so we validated key findings with Hiten's gut check at each stage.
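The scoring side of this setup is straightforward to automate. Below is a minimal sketch of how a round's classifications might be graded against gold labels; the function name, persona names, and post slugs are illustrative, not the actual test harness.

```python
def score_round(classifications, gold):
    """Grade one test round.

    classifications: persona -> {post_slug: predicted_type}
    gold: post_slug -> correct_type
    Returns overall accuracy across all persona/post pairs.
    """
    correct = total = 0
    for persona, preds in classifications.items():
        for post, predicted in preds.items():
            total += 1
            correct += (predicted == gold[post])
    return correct / total

# Illustrative data, not the real test set
gold = {"triage-agent": "SPRINT", "local-llms": "EXPERIMENT", "token-limits": "NOTE"}
runs = {
    "skeptical-engineer": {"triage-agent": "SPRINT", "local-llms": "EXPERIMENT", "token-limits": "NOTE"},
    "busy-founder":       {"triage-agent": "SPRINT", "local-llms": "NOTE",       "token-limits": "NOTE"},
}
print(round(score_round(runs, gold), 2))  # 5 of 6 correct -> 0.83
```

Keeping the grader separate from the LLM calls meant each round's raw persona outputs could be re-scored when a taxonomy changed, without re-running the models.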
The consolidation story. We started ambitious and cut ruthlessly based on data.
Round 1 (8 types). Result: 95% accuracy, 3.54/5 clarity, only 37.5% said "just right"
Problem: 5 out of 8 personas said "too many types." Filters wrapped to two lines. Users confused INSIGHT vs NOTE, PATTERN vs PLAYBOOK.
Decision: Consolidate. INSIGHT/PATTERN/LESSON can fold into NOTE. BUILD can fold into SPRINT.
Round 2 (5 types). Result: 100% accuracy, 4.20/5 clarity, 100% said "clearer than 8 types"
Problem: Filters still wrapped on mobile. Navigation test showed 94% accuracy but INSIGHT vs NOTE confusion remained.
Decision: Push further. Can we get to 3 types?
Round 3a (BUILD/TEST/LEARN). Result: 45% accuracy — FAILED
Problem: Too abstract. "BUILD" felt vague, "LEARN" was unclear. Users couldn't map real posts to these conceptual categories.
Decision: Abandoned. Abstract labels don't work for technical content.
Round 3b (SPR/EXP/LOG). Result: 75% accuracy — FAILED
Problem: "LOG" was unclear. Users expected chronological logs, not insights/patterns. Confusion between LOG and EXP.
Decision: Renamed LOG → NOTE. "Note" is familiar and broad enough.
Round 3c (SPR/EXP/NOTE). Result: 90% accuracy, 4.25/5 clarity, 100% said "just right" ✅
Success factors: Clear boundaries, familiar terms, intentionally broad NOTE category catches everything else.
The only errors: 2 edge cases where a breakthrough discovery could plausibly be either NOTE or EXPERIMENT (acceptable ambiguity).
Decision: SHIP IT.
| System | Accuracy | Clarity | "Just Right" | Filters Fit One Line |
|---|---|---|---|---|
| 8 types | 95% | 3.54/5 | 37.5% | ❌ Wraps |
| 5 types | 100% | 4.20/5 | 100% | ⚠️ Wraps on mobile |
| 3 types | 90% | 4.25/5 | 100% | ✅ Yes |
The trade-off: We gave up 10% accuracy (90% vs 100%) to gain supreme clarity. The 10% drop came from intentional ambiguity—NOTE is deliberately broad to avoid forced categorization.
Fewer types with intentional breadth beats more types with forced precision. Users prefer a slightly fuzzy "NOTE" bucket over choosing between "INSIGHT," "PATTERN," and "LESSON." The cognitive cost of fine-grained distinction outweighs the benefit of specificity.
Design philosophy: Color as accent, not content. Type colors appear only on navigation elements—all post content stays ink blue for readability.
Why these colors:
Contrast testing: All type colors meet WCAG AA (4.5:1) on paper background. Standard web yellow (#FFD700) failed at 3.2:1, so we darkened to #C79100 (5.1:1).
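WCAG contrast checks like these are easy to script rather than eyeball in a design tool. A minimal sketch of the WCAG 2.x relative-luminance formula follows; function names are ours, and exact ratios can differ slightly from design-tool readouts depending on rounding.

```python
def _linear(channel):
    """sRGB channel (0-255) to linear-light value per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(hex_color):
    """Relative luminance of a 6-digit hex color."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)

def contrast(fg, bg):
    """Contrast ratio, always >= 1, order-independent."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

print(round(contrast("#000000", "#ffffff"), 2))  # 21.0 sanity check
# AA for normal text requires >= 4.5:1, e.g. contrast("#C79100", "#f4f2eb")
```

Running every candidate color against every background it appears on turns "looks dark enough" into a pass/fail check.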
Type badges: 8% colored background + 4px colored left border
Section headings: 4px colored left border (post content only)
Filter buttons: 3px colored accent on active state
Navigation menu: Colored top border + hover tint
Where color NEVER appears:
8/8 personas unanimously agreed: "Keep content text blue, use type colors only for badges and navigation." Colored body text reduced readability and felt gimmicky. Color works as wayfinding, not decoration.
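Because the badge backgrounds are an 8% tint of the type color over the paper background, the flat color a browser actually renders can be precomputed by alpha compositing. A small sketch (hex values from the palette above; the helper name is ours):

```python
def blend(fg_hex, bg_hex, alpha):
    """Alpha-composite fg over bg; returns the flat hex a browser would render."""
    fg = [int(fg_hex.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    bg = [int(bg_hex.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    out = [round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg)]
    return "#{:02x}{:02x}{:02x}".format(*out)

# 8% Sprint green over the paper background
print(blend("#00A878", "#f4f2eb", 0.08))  # -> #e0ece2
```

Precomputing the flat tint also lets you contrast-test the badge background itself, not just the border color.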
The hover struggle. It took 6 iterations to land on the right hover treatment for the navigation menu.
Failed attempts:
Final solution (Option 5): Shadow + 4px lift, no color change
Why this works:
What we learned the hard way.
Problem discovered: Individual post pages were 720px max-width, but category "See All" pages were 800px. Users would experience jarring width jumps when navigating.
Why it happened: Individual pages were designed first (optimized for readability), then category pages were added later (optimized for showcasing multiple posts). No one noticed the inconsistency until Hiten flagged it.
Fix: Changed all pages to 800px max-width. The wider container handles both single posts and multi-post grids better.
Lesson: Establish layout constraints (max-width, padding, grid systems) BEFORE building individual pages, not after.
Problem discovered: AI couldn't see the actual hover effect rendering, so multiple attempts failed to match Hiten's expectations.
Why it happened: Describing hover states in CSS is easy. Predicting how they'll FEEL in the browser is hard without visual feedback.
Fix: Created hover-exploration.html with 5 side-by-side variations. Hiten could hover over each one and pick the winner visually instead of relying on descriptions.
Lesson: For subjective aesthetics (hover, animation, spacing), show multiple options side-by-side instead of iterating blind.
"this hover thing has been tricky for you, so you probably need to figure out how to simulate it better!"
Accurate diagnosis. The solution: stop describing, start showing. The hover-exploration page let Hiten experience all 5 options in 30 seconds instead of 5 rounds of back-and-forth.
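A comparison page like hover-exploration.html can be generated from a small script. The sketch below builds a page with five hoverable cards side by side; only option 5 reflects the shipped "shadow + 4px lift" treatment, and the other four styles are hypothetical stand-ins for the failed attempts.

```python
# Hover styles to compare; 1-4 are illustrative, 5 is the shipped treatment.
HOVER_STYLES = {
    "1-underline":   "text-decoration: underline;",
    "2-bg-tint":     "background: rgba(0, 168, 120, 0.08);",
    "3-color-shift": "color: #00A878;",
    "4-border":      "border-bottom: 3px solid #00A878;",
    "5-shadow-lift": "box-shadow: 0 6px 16px rgba(0,0,0,.12); transform: translateY(-4px);",
}

def build_page(styles):
    """Return an HTML page with one hoverable card per style."""
    cards = "\n".join(
        f"<style>.v{i} {{ transition: all .15s; }} .v{i}:hover {{ {css} }}</style>"
        f'<div class="card v{i}">{name}</div>'
        for i, (name, css) in enumerate(styles.items())
    )
    return f"<!doctype html><body style='background:#f4f2eb'>{cards}</body>"

with open("hover-exploration.html", "w") as f:
    f.write(build_page(HOVER_STYLES))
```

Open the generated file, hover each card, pick the winner. One page, one pass, no back-and-forth.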
Problem discovered: BUILD/TEST/LEARN system crashed at 45% accuracy (vs 90% for SPR/EXP/NOTE).
Why it happened: We thought abstract conceptual labels would feel elegant and scalable. Users thought they were vague and confusing.
What failed:
Lesson: Concrete labels (Sprint, Experiment, Note) beat abstract concepts (Build, Test, Learn) for classification tasks. Users need tangible boundaries, not philosophical categories.
Problem discovered: Standard web colors (bright yellow, bright green) failed WCAG AA contrast on paper background.
Iterations:
Lesson: Never assume web-safe colors meet WCAG on custom backgrounds. Test every color against every background it'll appear on.
Surprise insight: When Hiten saw the homepage, he said "So simple and me... love it."
What we almost did: Added featured posts, category breakdowns, trending tags, recent activity feeds.
What we shipped: Reverse chronological list with 4 filter buttons. That's it.
Why it worked: The content is the hero. Adding navigation chrome would have diluted the focus. Filters provide structure without prescribing paths.
Lesson: Resist the urge to fill white space. Simplicity is a feature, not a compromise.
What shipped: 7 complete pages, all tested and ready for production.
All 7 pages are production-ready HTML. They can be:
How to apply this process to other projects.
Before designing taxonomy, define what "clarity" means for your audience:
For Tiny Lab: Clarity = instant comprehension, no reading descriptions, filters fit one line.
Build 6-8 personas representing user diversity. Run classification tasks:
Key insight: Test multiple systems (8-type vs 5-type vs 3-type) in parallel. Don't assume your first design is optimal.
When personas say "too many types," they're right. Consolidation steps:
Warning: Don't consolidate below 3 types. Two buckets force binary thinking. Three provides structure without overload.
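Consolidation is ultimately a remapping of labels, which makes migrating existing posts mechanical. A sketch, using the fold decisions described above (INSIGHT/PATTERN/LESSON → NOTE, BUILD → SPRINT); PLAYBOOK's destination is our assumption, since the write-up folds it without naming where it lands:

```python
# Legacy 8-type labels -> shipped 3-type labels.
# PLAYBOOK -> NOTE is an assumption, not stated in the original decisions.
FOLD = {
    "SPRINT": "SPRINT", "BUILD": "SPRINT",
    "EXPERIMENT": "EXPERIMENT",
    "NOTE": "NOTE", "INSIGHT": "NOTE", "PATTERN": "NOTE",
    "LESSON": "NOTE", "PLAYBOOK": "NOTE",
}

def migrate(posts):
    """Remap each post's legacy type label to the shipped 3-type system."""
    return [{**p, "type": FOLD[p["type"]]} for p in posts]

old = [{"title": "Token Limits Break Everything", "type": "INSIGHT"}]
print(migrate(old)[0]["type"])  # NOTE
```

An explicit mapping table also doubles as documentation of the consolidation decisions, so later contributors can see where each retired type went.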
Concrete beats conceptual. Compare:
| Abstract (Failed) | Concrete (Won) |
|---|---|
| BUILD | SPRINT (specific timeframe + outcome) |
| TEST | EXPERIMENT (clear hypothesis-testing frame) |
| LEARN | NOTE (familiar, low-stakes) |
Rule: If a label requires explanation, it's too abstract.
For subjective decisions (hover, spacing, color intensity):
Why this works: "Shadow + lift" means nothing until you see it. Hover-exploration.html let Hiten pick in 30 seconds what took 6 iterations to describe.
Before building pages, lock in:
Catch width inconsistencies BEFORE building 7 pages, not after.
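A width-drift check like this is small enough to script. A sketch that scans stylesheet text for `max-width` declarations and surfaces inconsistencies (the regex and sample CSS are illustrative, not the site's actual stylesheets):

```python
import re

def max_widths(css_text):
    """Collect every distinct max-width value so layout drift is caught early."""
    return sorted(set(re.findall(r"max-width:\s*([\d.]+px)", css_text)))

# Illustrative sample reproducing the 720px/800px mismatch described above
css = """
.post    { max-width: 720px; }  /* individual post pages   */
.listing { max-width: 800px; }  /* category 'See All' pages */
"""
print(max_widths(css))  # ['720px', '800px'] -> two widths: flag before building more pages
```

More than one value in the result means two pages will jump in width on navigation; run it over every stylesheet before cloning a layout across pages.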
Start ambitious, cut ruthlessly, test everything, ship simple.
We began with 8 types (ambitious scope). Cut to 5 (ruthless consolidation). Tested 3 variations (evidence-based). Shipped 3 types with universal hover (simple result).
The final product feels obvious. The process was anything but.
| Test Round | System Tested | Personas | Scenarios | Total Classifications |
|---|---|---|---|---|
| Round 1 | 8-type (SPR/EXP/INS/PLY/NOTE/BLD/PAT/LES) | 8 | 5 | 40 |
| Round 2 | 5-type (SPR/EXP/PLY/INS/NOTE) | 8 | 6 | 48 |
| Round 3a | 3-type (BUILD/TEST/LEARN) — FAILED | 4 | 5 | 20 |
| Round 3b | 3-type (SPR/EXP/LOG) — FAILED | 4 | 5 | 20 |
| Round 3c | 3-type (SPR/EXP/NOTE) — WINNER | 4 | 5 | 20 |
Total effort: 5 test rounds, 116 classification attempts, 8 unique personas
| Color | Hex | Contrast Ratio (on #f4f2eb) | WCAG AA |
|---|---|---|---|
| Sprint Green | #00A878 | 4.8:1 | ✅ PASS |
| Experiment Blue | #0d3b9c | 8.9:1 | ✅ PASS |
| Note Yellow (Final) | #C79100 | 5.1:1 | ✅ PASS |
| Note Yellow (Try 1) | #FFD700 | 3.2:1 | ❌ FAIL |
| Note Yellow (Try 2) | #E6B800 | 4.2:1 | ❌ FAIL |