I've been building a research tool that automatically extracts entities from historical document collections to create structured knowledge databases. It's called 'hinbox' (i.e. 'historian in a box'), via McChrystal's framing, IYKYK 🫠
The project connects my historian background (manually building Afghan media databases in the 2000s) with current AI capabilities. It processes documents to identify people, organisations, locations, and events, then intelligently merges similar entities across sources.
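To make the "merge similar entities across sources" step concrete, here's a minimal sketch of one way it could work. This is not hinbox's actual code: the `normalise`, `similar` and `merge_entities` names, the dict layout for mentions, and the 0.85 threshold are all assumptions for illustration, and plain string similarity from Python's standard `difflib` stands in for whatever smarter matching the real tool does.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lowercase, drop full stops and collapse whitespace (illustrative only)."""
    return " ".join(name.lower().replace(".", "").split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two names are close enough after normalisation.

    The threshold is an assumed value, not one taken from the project.
    """
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold


def merge_entities(mentions, threshold: float = 0.85):
    """Greedily group mentions of the same type whose names look alike.

    Each mention is a dict with 'name', 'type' and 'source' keys
    (a hypothetical layout for this sketch).
    """
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            same_type = mention["type"] == cluster["type"]
            if same_type and any(
                similar(mention["name"], known, threshold)
                for known in cluster["names"]
            ):
                # Record the new spelling and where it was seen.
                if mention["name"] not in cluster["names"]:
                    cluster["names"].append(mention["name"])
                cluster["sources"].add(mention["source"])
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append(
                {
                    "type": mention["type"],
                    "names": [mention["name"]],
                    "sources": {mention["source"]},
                }
            )
    return clusters
```

Greedy string matching like this is exactly the kind of thing that "needs work": it will happily miss aliases and transliteration variants, which is where evals come in.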
What makes this interesting: I'm using it as a practical testbed for systematic AI evaluation techniques from the @HamelHusain / @sh_reya evals course I've been taking. Rather than abstract methodology discussions, I want to document rigorous approaches to prompt optimisation and error analysis applied to real research problems.
The tool isn't production-ready: entity merging needs work and the prompts require iteration. (Frontend also needs prettifying.) But that's the point. The meaningful learning happens in systematic improvement, not just initial builds.
Upcoming posts will show concrete examples of structured evaluation frameworks improving extraction accuracy. Perfect case study for moving beyond intuitive AI development toward measurable approaches.
Full technical breakdown and context linked in the thread 👇
#AI #Research #History #EntityExtraction #Evals
ALT Frontpage of the entity browser UI that comes with the project.
ALT Some processing logs while the system crunches through the articles.