From Medieval Manuscripts to Training Cards: The HEMA Data Pipeline
I wanted to make HEMA (Historical European Martial Arts) training cards: physical flashcards with medieval sword techniques on them. The data existed in manuscripts, wikis, and academic translations. Getting it into a format that could generate print-ready cards, QR-linked landing pages, and digital training tools took a pipeline spanning five languages and six hundred years of source material.
The pipeline
Stage 1: Scraping. A Go application reads Wiktenauer via the MediaWiki API, parses the eight-column table layout, identifies which manuscript each illustration belongs to, and outputs structured JSON.
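A minimal sketch of the two halves of this stage, not the actual scraper. It assumes the wiki exposes the standard MediaWiki api.php endpoint at the URL shown (an assumption; check the real location), and that a table row arrives in the inline "| a || b" wikitext form. The function names and the sample row are hypothetical. Real rows contain file links and templates with their own pipe characters, so a production parser needs more than a string split.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// apiBase is an assumed endpoint, not confirmed by the project.
const apiBase = "https://wiktenauer.com/api.php"

// parseRequestURL builds a MediaWiki action=parse request that asks for
// the raw wikitext of one page, returned as JSON.
func parseRequestURL(base, title string) string {
	q := url.Values{}
	q.Set("action", "parse")
	q.Set("page", title)
	q.Set("prop", "wikitext")
	q.Set("format", "json")
	// Encode sorts parameters by key and escapes the title.
	return base + "?" + q.Encode()
}

// splitRow splits one wikitext table row written in the inline
// "| a || b || c" form into trimmed cell strings.
func splitRow(row string) []string {
	row = strings.TrimPrefix(strings.TrimSpace(row), "|")
	parts := strings.Split(row, "||")
	cells := make([]string, len(parts))
	for i, p := range parts {
		cells[i] = strings.TrimSpace(p)
	}
	return cells
}

func main() {
	fmt.Println(parseRequestURL(apiBase, "Example page"))

	// Hypothetical eight-column row: placeholders, not real manuscript data.
	cells := splitRow("| img || en || a || b || c || d || e || f")
	fmt.Println(len(cells), cells[0])
}
```

Checking that every row yields exactly eight cells is a cheap way to catch layout changes on the wiki before they corrupt the JSON output.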
Stage 2: Enrichment. An AI layer generates card metadata: type classification, one-line summaries, training notes, technique tags, and relationship data.
Stage 3: Card generation. A Node.js tool generates 78 print-ready card images at 300dpi using HTML templates rendered via Puppeteer.
Stage 4: Web platform. A Laravel application serves technique pages, a filterable index, training tools, and a shop with Stripe pre-orders. Every card has a QR code that resolves through a redirect layer to its technique page.
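The redirect layer can be sketched as a lookup from the short code printed on a card to the current technique URL. The real platform is Laravel; this sketch is in Go only to match the scraper, and the /q/ path prefix, the code format, and the target path are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// targets maps the short code encoded in a card's QR to the page it
// should currently land on. Codes and paths here are made up.
var targets = map[string]string{
	"fl-001": "/techniques/example-technique",
}

// resolve looks up a code case-insensitively.
func resolve(code string) (string, bool) {
	target, ok := targets[strings.ToLower(code)]
	return target, ok
}

// qrHandler serves requests such as /q/fl-001.
func qrHandler(w http.ResponseWriter, r *http.Request) {
	code := strings.TrimPrefix(r.URL.Path, "/q/")
	target, ok := resolve(code)
	if !ok {
		http.NotFound(w, r)
		return
	}
	// A temporary redirect (302), so clients do not cache a target
	// that may later move.
	http.Redirect(w, r, target, http.StatusFound)
}

func main() {
	target, ok := resolve("FL-001")
	fmt.Println(target, ok)
	// To serve it: http.HandleFunc("/q/", qrHandler) followed by
	// http.ListenAndServe(":8080", nil).
}
```

The printed QR only ever encodes the short code, so changing a technique page's URL means editing one entry in the lookup rather than reprinting a card.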
Stack
Go (scraper), Node.js (card generation), Laravel 12 + Blade + Alpine.js + HTMX (web platform), Stripe (payments), SQLite, Tailwind.
What made it interesting
The data modelling was the hard part. A technique can appear in four different manuscripts. The type system had to work across Fiore's entire curriculum and eventually across different masters. The QR redirect layer means printed cards never go stale: the code on the card points at the redirect, not at the page itself, so a technique page can move without a reprint.
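One way to model a technique that appears in several manuscripts is to keep the technique as the primary record and attach a list of sources to it. This is a sketch under that assumption, not the project's actual schema: the field names, the type values, and the Ref strings are hypothetical placeholders. Only the four manuscript labels correspond to the known Fiore manuscripts.

```go
package main

import (
	"fmt"
	"sort"
)

// Source records one appearance of a technique in one manuscript.
type Source struct {
	Manuscript string // short label such as "Getty"
	Ref        string // placeholder for a folio or page reference
}

// Technique is one card's worth of content, independent of any single source.
type Technique struct {
	ID      string
	Master  string // explicit, so other masters can be added later
	Weapon  string
	Type    string // card type, e.g. "guard" or "play" (hypothetical values)
	Sources []Source
}

// Manuscripts returns the distinct manuscripts a technique appears in, sorted.
func (t Technique) Manuscripts() []string {
	seen := map[string]bool{}
	var out []string
	for _, s := range t.Sources {
		if !seen[s.Manuscript] {
			seen[s.Manuscript] = true
			out = append(out, s.Manuscript)
		}
	}
	sort.Strings(out)
	return out
}

// example builds a made-up technique with five sources in four manuscripts.
func example() Technique {
	return Technique{
		ID:     "example-technique",
		Master: "Fiore",
		Weapon: "longsword",
		Type:   "guard",
		Sources: []Source{
			{Manuscript: "Getty", Ref: "ref-a"},
			{Manuscript: "Getty", Ref: "ref-b"},
			{Manuscript: "Morgan", Ref: "ref-c"},
			{Manuscript: "Pisani Dossi", Ref: "ref-d"},
			{Manuscript: "Paris", Ref: "ref-e"},
		},
	}
}

func main() {
	t := example()
	fmt.Println(t.ID, len(t.Manuscripts()))
}
```

Keeping Master and Weapon as plain fields rather than baking them into the ID is what would let the same structure hold a Dagger deck or a different master later.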
Current state
Fiore Longsword deck (78 cards) at pre-order stage. Fiore Dagger next.