Core Capabilities
1. What This System Is
This system converts unstructured incident records into structured, source-traceable data. AI applies a fixed set of rules from the codebook — a rubric that defines how to classify each aspect of an incident and what evidence is required to support it — while human oversight provides quality control, ensures evidentiary standards are met, and handles edge case judgment calls. The result is a dataset that makes university behavior comparable across incidents, allowing researchers to test whether patterns of bias or selective enforcement emerge.
2. Coding, Defined
At the core of the framework is the codebook — the structured logic that drives the entire system and represents the project’s key intellectual property.
What is “coding” in this context?
In social science research, “coding” means systematically categorizing qualitative information to identify patterns, themes, and relationships. It’s not about programming in the computer science sense — it’s about understanding complex human events.
How the system works:
- Variables: 20+ aspects of each incident (e.g., administrative response, severity, tone)
- Values: Each variable has specific classification options (see the schema sketch below):
  - Binary (yes/no)
  - Categorical (e.g., target group)
  - Ordinal (low/moderate/high)
  - Quantitative (number of days)
  - Structured qualitative (interpreted but systematic)
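As an illustration only, here is a minimal Python sketch of how a few codebook variables and their allowed value types might be declared. The variable names and options are placeholders, not the project's actual codebook.

```python
# Minimal codebook schema sketch. Variable names and options are
# illustrative placeholders, not the project's actual codebook.
CODEBOOK = {
    "administrative_response": {   # binary variable
        "type": "binary",
        "values": ["Yes", "No"],
    },
    "target_group": {              # categorical variable
        "type": "categorical",
        "values": ["students", "faculty", "staff", "external"],
    },
    "severity": {                  # ordinal variable
        "type": "ordinal",
        "values": ["low", "moderate", "high"],
    },
    "days_to_response": {          # quantitative variable
        "type": "quantitative",
        "values": None,            # any non-negative integer
    },
}

def is_valid(variable: str, value) -> bool:
    """Check that a coded value is allowed for the given variable."""
    spec = CODEBOOK[variable]
    if spec["type"] == "quantitative":
        return isinstance(value, int) and value >= 0
    return value in spec["values"]
```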
What makes this approach unique:
Evidentiary requirements pair every coded value with its supporting context: coding "administrative_response = Yes", for example, requires finding and citing exact quotes. Each coded value is then stored in YAML with three fields (a record sketch follows below):
- value (the coded outcome)
- justification (quotes and final reasoning)
- sources (document IDs)
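For illustration, a sketch of how one such record could be assembled and serialized; the quote, document IDs, and wording are invented, and PyYAML is assumed as the serialization library.

```python
# Illustrative record only: the quote and document IDs are invented.
import yaml  # PyYAML, assumed as the serialization library

record = {
    "administrative_response": {
        "value": "Yes",
        "justification": (
            'Coded Yes because the provost\'s statement says '
            '"the university has opened a formal review of the incident."'
        ),
        "sources": ["DOC-014", "DOC-027"],
    }
}

# Emit the record in the YAML layout described above.
print(yaml.safe_dump(record, sort_keys=False, allow_unicode=True))
```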
Why this approach matters:
Traditional manual coding tools such as NVivo or Atlas.ti provide rigor but are time-intensive, while automated systems like GDELT scale quickly but often miss context. This framework is designed to combine the strengths of both: each classification remains tied to the source text for traceability, while the system can process a larger number of incidents more efficiently than manual methods.
3. The AI System is Analytical, Not Creative
This system uses the AI model to apply a predefined set of decision rules — a process a human could follow step-by-step — within a controlled, auditable pipeline.
How this differs from typical AI use:
AI is often used for open-ended tasks such as summarizing, generating content, or flagging patterns, typically without showing its work.
This framework instead uses the model as an analytical decision engine that:
- Executes production rules derived from the codebook (IF-THEN logic)
- Can only output what it can prove with direct, relevant quotes
- Applies deterministic, procedural logic — the same inputs always yield the same outputs
- Documents its reasoning chain inside structured tags for auditability
- Operates as part of a decision support system, not a creative assistant
Example: If coding for “police involvement,” the model must confirm police presence in the sources, determine the role played (e.g., monitoring, enforcement), and classify accordingly — all based solely on direct, relevant quotes.
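A minimal sketch of how such a production rule could be expressed in Python; the function name, keyword matching, and role categories are assumptions for illustration, not the project's actual rule set.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """A direct quote drawn from a source document."""
    doc_id: str
    quote: str

def code_police_involvement(evidence: list[Evidence]) -> dict:
    """IF-THEN sketch: classify police involvement only from direct quotes.

    Keyword matching stands in for the model's reading of the quotes; the
    real system applies the codebook's evidentiary rules.
    """
    presence = [e for e in evidence if "police" in e.quote.lower()]
    if not presence:
        # No quoted evidence of police presence, so the rule cannot code "Yes".
        return {"value": "No",
                "justification": "No quoted evidence of police presence.",
                "sources": []}

    # Determine the role played from the quoted evidence.
    enforcement = [e for e in presence
                   if any(w in e.quote.lower() for w in ("arrest", "detain", "disperse"))]
    role = "enforcement" if enforcement else "monitoring"
    cited = enforcement or presence
    return {
        "value": "Yes",
        "role": role,
        "justification": "; ".join(f'"{e.quote}"' for e in cited),
        "sources": [e.doc_id for e in cited],
    }
```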
Built-in verification:
The system enforces a three-layer audit trail common to expert systems:
- Quote-level — Values tied to source text
- Reasoning-level — Decision steps in structured tags
- Output-level — Validated YAML output
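As an illustration of the output-level check, a minimal sketch of a validator that rejects any coded record missing its required fields or cited sources; the field names follow the record sketch above and are assumptions, not the project's actual validation code.

```python
import yaml  # PyYAML

REQUIRED_FIELDS = ("value", "justification", "sources")

def validate_output(yaml_text: str) -> list[str]:
    """Return a list of problems found in a coded record; an empty list means valid."""
    problems = []
    record = yaml.safe_load(yaml_text)
    for variable, coded in record.items():
        # Structural check: every coded variable needs all three fields.
        for field in REQUIRED_FIELDS:
            if field not in coded:
                problems.append(f"{variable}: missing '{field}'")
        # Evidence check: a positive coding with no cited sources is flagged.
        if coded.get("value") == "Yes" and not coded.get("sources"):
            problems.append(f"{variable}: coded 'Yes' without any cited source documents")
    return problems
```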
Why this approach matters:
Unlike black-box AI applications or conventional manual coding, this system is designed so that every classification is transparent, reproducible, and falsifiable. Each decision can be checked by tracing from the source document to the quoted evidence, the applied rule, and the resulting output, so researchers, journalists, or oversight bodies can verify any coding decision by checking the quoted evidence against the stated rule. This structure preserves the context needed for qualitative interpretation while enforcing strict evidence standards at scale.
4. Human-in-the-Loop Oversight
While the AI handles systematic evidence processing, human expertise drives the framework’s critical decisions and quality controls.
System Design & Architecture
I developed the framework’s core components — from codebook and protocol design to Claude API configuration and workflow optimization — integrating engineering methods with social science research standards to ensure both technical precision and methodological rigor.
Active Oversight Functions
- Incident Inclusion: Reviewing candidate incidents from the broader scrape to ensure they meet the defined inclusion rule
- Evidence Scope Validation: Spot-checking that collected source evidence falls within the incident scope as generated by the system
- Edge Case Resolution: When evidence conflicts or classification is ambiguous, human judgment decides whether to flag, investigate further, or exclude
- Gap Investigation: When the documentary record is incomplete, targeted outreach is initiated through FOIA requests, interview protocols, and supplementary searches
Quality Control Framework
In proof-of-concept testing, the model consistently produced accurate, quote-bound outputs without hallucinations or logically inconsistent classifications. The following validation layers are maintained as a safeguard, confirming system reliability and ensuring accountability as the dataset scales (a sketch of the consistency check follows the list):
- Consistency Testing: Running incidents through multiple coding passes to ensure stable outputs
- Pattern Validation: Statistical checks flag logically inconsistent outputs for human review
- Source Verification: Spot-checking quote accuracy and attribution to confirm correct sourcing
- Temporal Stability: Monitoring for drift in how edge cases are resolved over time
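A minimal sketch of how the consistency check could work in practice; the `code_incident` function and the number of passes are placeholders, not the project's actual tooling.

```python
from collections import Counter

def consistency_check(incident_id: str, code_incident, passes: int = 3) -> dict:
    """Run one incident through several coding passes and flag unstable variables.

    `code_incident` is a placeholder for whatever function calls the model and
    returns a {variable: value} mapping; three passes is an arbitrary number
    chosen for illustration.
    """
    runs = [code_incident(incident_id) for _ in range(passes)]
    flagged = {}
    for variable in runs[0]:
        observed = Counter(run.get(variable) for run in runs)
        if len(observed) > 1:                   # passes disagreed on this variable
            flagged[variable] = dict(observed)  # route to human review
    return flagged
```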
This hybrid approach keeps human judgment central: when evidence is ambiguous, the system flags the case for review rather than forcing a classification, preserving both efficiency and accountability.