CarExtract - Medical Document Field Extraction Platform
CarExtract is a full-stack platform for extracting structured fields from medical and care documents — patient intake forms, clinical notes, prescriptions, referrals, and more — using vision-capable language models.
Developed as an open-source reference implementation under the Cloud2 Labs Innovation Hub, CarExtract demonstrates how user-defined field schemas, OpenAI-compatible provider routing, and inline ground truth editing can be packaged into a production-grade microservices architecture. Upload typed or handwritten care documents, define the fields you need to extract, connect any vision model via a single endpoint config, and measure extraction accuracy across providers — against ground truth you review and correct yourself.
It showcases a two-service architecture: a FastAPI backend handling field schema management, provider CRUD, dynamic prompt construction, batch async extraction, and accuracy evaluation — paired with a React + Vite + TypeScript + Tailwind frontend for document management, live analysis runs, result visualisation, and CSV export. All services are containerised via Docker Compose.
What It Demonstrates
CarExtract illustrates how to:
- Extract user-defined structured fields from medical document images (JPG, PNG, PDF) — patient names, dates of birth, phone numbers, addresses, diagnosis codes, medications, and more — using any vision LLM served over an OpenAI-compatible API
- Build a dynamic prompt engine that generates system and user prompts at runtime from a persisted field schema, with type-aware hints for strings, dates, phones, addresses, and numbers — no hard-coded templates
- Enable inline ground truth editing so users review and correct extracted values in the UI before running accuracy analysis — evaluation reflects real, human-verified labels rather than a static pre-stored file
- Route extraction requests to multiple providers simultaneously and evaluate per-field accuracy across all models in a single analysis run
- Apply type-aware field comparison: exact string matching, date normalisation, phone digit stripping, fuzzy address scoring, and numeric parsing
- Deploy a two-service full-stack application with Docker Compose, with nginx in production and Vite dev proxy in development
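The two-service deployment can be sketched as a minimal Docker Compose file. Service names, ports, and build paths here are illustrative assumptions, not the project's actual compose file:

```yaml
services:
  backend:
    build: ./backend              # FastAPI app (assumed path)
    ports:
      - "8000:8000"
    volumes:
      - ./config_data:/app/config_data   # persisted field schema
  frontend:
    build: ./frontend             # React + Vite app, served by nginx in production
    ports:
      - "8080:80"
    depends_on:
      - backend
```

In development, the Vite dev server proxies API calls to the backend instead of nginx handling them.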
Designed for AI engineers, healthtech teams, and document automation builders who need a practical reference for multi-model LLM evaluation over real-world medical document datasets.
Key Capabilities

Built for Care Documents
Optimised for patient intake forms, clinical notes, prescriptions, and referrals. Works on typed and handwritten documents alike. Adapts to any medical schema via user-defined fields — no re-engineering required when document types change.

Dynamic Field Schema
Users define extraction fields directly in the UI: key, display name, data type (string, date, phone, address, number), description, and optional example. Fields are stored in config_data/fields.json and injected into prompts at runtime. Add, edit, reorder, or delete fields without touching code. Field definitions are snapshotted with every analysis run for full reproducibility.
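As a sketch of what config_data/fields.json might contain (the exact key names are assumptions based on the field properties listed above):

```json
[
  {
    "key": "patient_name",
    "display_name": "Patient Name",
    "type": "string",
    "description": "Full legal name of the patient",
    "example": "Jane Doe"
  },
  {
    "key": "date_of_birth",
    "display_name": "Date of Birth",
    "type": "date",
    "description": "Patient's date of birth",
    "example": "1984-03-12"
  }
]
```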

Document Extract with Human-in-the-Loop Editing
A dedicated Document Extract page lets users select specific documents, choose a provider, and run on-demand extraction per file. Extracted field values are displayed inline as editable inputs, so clinicians can correct wrong or missing values before saving entries as verified ground truth. Supports rapid spot-checking and iterative ground truth labelling without leaving the interface.

Extraction Instructions
Per-run instructions can be appended to the system prompt to guide the model on document-specific context: date format conventions, abbreviation standards, handwriting characteristics, or specialty-specific field rules — without modifying the base field schema.
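A minimal sketch of how a runtime prompt engine like this could assemble the system prompt from persisted field definitions, with type-aware hints and optional per-run instructions appended. Function and key names are hypothetical, not the project's actual API:

```python
from typing import Optional

# Illustrative type-aware hints for the five supported data types.
TYPE_HINTS = {
    "string": "Return the value exactly as written.",
    "date": "Normalise to ISO 8601 (YYYY-MM-DD).",
    "phone": "Return digits only, no separators.",
    "address": "Return the full address on one line.",
    "number": "Return a bare number, no units.",
}

def build_system_prompt(fields: list, instructions: Optional[str] = None) -> str:
    """Fold field definitions (as loaded from fields.json) into a system prompt."""
    lines = [
        "Extract the following fields from the document image.",
        "Return a JSON object keyed by field key; use null when a field is absent.",
        "",
    ]
    for f in fields:
        hint = TYPE_HINTS.get(f["type"], "")
        example = f" Example: {f['example']}." if f.get("example") else ""
        lines.append(f"- {f['key']} ({f['type']}): {f['description']}. {hint}{example}")
    if instructions:
        # Per-run instructions are appended without touching the base schema.
        lines += ["", "Additional run instructions:", instructions]
    return "\n".join(lines)
```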

Provider-Agnostic Architecture
Connect any OpenAI-compatible endpoint — GPT-4o, Claude, Gemini, local Ollama, vLLM, OpenRouter, or custom inference servers. Configure base URL, model ID, API key, temperature, and max tokens per provider. Live connectivity tests confirm reachability before a run.
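A provider record and request body for an OpenAI-compatible /chat/completions endpoint might look like the sketch below. The dataclass fields mirror the per-provider options listed above; the payload shape follows the standard chat-completions format with an inline base64 image, though the project's actual internals are assumptions here:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    """Hypothetical provider record: one row of the provider CRUD."""
    name: str
    base_url: str       # any OpenAI-compatible endpoint
    model: str
    api_key: str
    temperature: float = 0.0
    max_tokens: int = 1024

def chat_payload(p: Provider, system_prompt: str, image_b64: str) -> dict:
    """Build an OpenAI-compatible chat-completions body with an inline image."""
    return {
        "model": p.model,
        "temperature": p.temperature,
        "max_tokens": p.max_tokens,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ]},
        ],
    }
```

Because every provider speaks the same wire format, swapping GPT-4o for a local Ollama model is a config change, not a code change.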

Multi-Provider Analysis Runs
Trigger a single analysis run across all selected providers simultaneously. Async extraction with bounded concurrency maximises throughput. Progress is polled live in the frontend — current model and document count visible in real time. Results redirect automatically to the visualisation dashboard on completion.
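Bounded-concurrency fan-out over the (provider, document) grid can be sketched with an asyncio semaphore. This is an illustrative pattern, not the project's actual scheduler; `extract_one` stands in for the real API call:

```python
import asyncio

async def run_analysis(providers, documents, extract_one, max_concurrency: int = 5):
    """Run extract_one for every (provider, document) pair, capping in-flight calls."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(provider, doc):
        async with sem:  # at most max_concurrency extractions run at once
            return (provider, doc, await extract_one(provider, doc))

    tasks = [bounded(p, d) for p in providers for d in documents]
    return await asyncio.gather(*tasks)
```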

Type-Aware Accuracy Evaluation
Each extracted field is evaluated against the human-edited ground truth with type-specific logic: date formats normalised, phone numbers stripped to digits, addresses fuzzy-matched, numbers parsed to floats. Status codes (TRUE_POSITIVE, TRUE_NEGATIVE, FALSE_POSITIVE, FALSE_NEGATIVE, INCORRECT, PARSE_ERROR) feed per-model accuracy metrics, per-field radar charts, latency percentiles, hallucination rate, and token cost telemetry.
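The type-specific comparators can be sketched as below. The accepted date formats and the fuzzy-match threshold are illustrative assumptions, not the project's actual values:

```python
import re
from datetime import datetime
from difflib import SequenceMatcher

def normalise_date(v: str):
    """Try common date formats; return ISO 8601 or None if unparseable."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %b %Y"):
        try:
            return datetime.strptime(v.strip(), fmt).date().isoformat()
        except ValueError:
            pass
    return None

def fields_match(extracted: str, truth: str, ftype: str) -> bool:
    """Type-aware comparison of an extracted value against ground truth."""
    if ftype == "date":
        a, b = normalise_date(extracted), normalise_date(truth)
        return a is not None and a == b
    if ftype == "phone":
        return re.sub(r"\D", "", extracted) == re.sub(r"\D", "", truth)
    if ftype == "address":
        # Fuzzy match; 0.85 is an assumed threshold.
        return SequenceMatcher(None, extracted.lower(), truth.lower()).ratio() >= 0.85
    if ftype == "number":
        try:
            return float(extracted) == float(truth)
        except ValueError:
            return False
    return extracted.strip() == truth.strip()  # exact match for plain strings
```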

Full Telemetry & CSV Export
Every completed analysis run surfaces per-model accuracy cards, latency (avg, P50, P95), hallucination rate, parse failure rate, and cost-per-extraction estimates. Per-document extraction results are viewable and comparable inline. A full audit CSV exports extracted values, ground truth values, and field-level status for every field, document, and provider — ready for offline analysis or clinical audit trails.
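The audit CSV described above amounts to one row per (provider, document, field). A minimal sketch with Python's csv module, using assumed column names:

```python
import csv
import io

def export_audit_csv(rows: list) -> str:
    """Serialise per-field audit rows to CSV text; column names are illustrative."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["provider", "document", "field", "extracted", "ground_truth", "status"],
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```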

