Engineering · Feb 18, 2026 · 10 min read

How We Built an AI That Understands Financial Regulation


Jordan Kim

Co-Founder & CTO

When we started building StackTalk, one of the hardest problems we faced was seemingly simple: can an AI actually understand financial regulation well enough to be useful to compliance professionals? Not just surface-level keyword matching, but genuine comprehension of regulatory intent, applicability, and nuance.

After eighteen months of development, the answer is yes — but the path to getting there was anything but straightforward. Here's how we did it.

The challenge with regulatory text

Financial regulation is uniquely difficult for AI systems. Unlike most natural language tasks, regulatory text has several properties that make it adversarial to standard NLP approaches:

Precision matters enormously. The difference between "shall" and "may" in a regulation is the difference between a mandatory requirement and optional guidance. The difference between "consumer" and "customer" can determine whether a regulation applies to you at all. Getting these wrong isn't a minor error — it's a compliance failure.

Context spans documents. A single regulation rarely stands alone. Understanding what Reg E requires means also understanding the Electronic Fund Transfer Act, relevant CFPB commentary, enforcement actions that establish precedent, and state-level variations. The context window for any regulatory question is enormous.

Applicability is conditional. Whether a regulation applies to a specific product depends on dozens of factors: the product type, the customer segment, the states of operation, the charter type, the partner bank arrangement, and more. The same regulation can mean very different things to different institutions.

Our approach: domain-specific RAG with structured reasoning

We didn't try to fine-tune a model to memorize regulations. Instead, we built a retrieval-augmented generation (RAG) system specifically designed for the regulatory domain, with several key innovations:

Regulatory knowledge graph. We constructed a graph that captures relationships between regulations, agencies, enforcement actions, guidance documents, and product types. When the system needs to answer a question about UDAAP compliance for a payment product, it doesn't just retrieve the relevant regulation — it pulls in the full context of related guidance, enforcement precedent, and product-specific interpretations.
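To make the idea concrete, here is a minimal sketch of graph-based context expansion. The node names, edges, and two-hop limit are illustrative assumptions for this post, not StackTalk's actual schema:

```python
from collections import deque

# Toy regulatory knowledge graph as an adjacency list. Node names and
# edge structure are illustrative only.
GRAPH = {
    "UDAAP": ["CFPB-guidance-2023", "enforcement-XYZ", "payments-products"],
    "CFPB-guidance-2023": ["UDAAP"],
    "enforcement-XYZ": ["UDAAP", "payments-products"],
    "payments-products": ["UDAAP", "Reg-E"],
    "Reg-E": ["EFTA", "payments-products"],
    "EFTA": ["Reg-E"],
}

def expand_context(start, max_hops=2):
    """Breadth-first expansion: collect every node within max_hops of
    the regulation the question is about, so retrieval can pull in
    related guidance, enforcement actions, and product nodes."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen
```

Starting from the UDAAP node, one hop reaches the guidance and enforcement nodes; two hops also pull in Reg-E via the payment-product node, which is the kind of cross-document context a flat retriever would miss.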

Multi-step reasoning chains. For complex regulatory questions, we decompose the problem into steps: first determine applicability, then identify specific requirements, then assess the institution's current compliance posture. Each step uses the most appropriate retrieval strategy and reasoning approach.
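The decomposition above can be sketched as a three-stage pipeline. Each stage below is a stand-in for a retrieval-plus-LLM call; the stub logic, field names, and example requirements are hypothetical, chosen only to show the pipeline's shape:

```python
def determine_applicability(question):
    # Step 1: does the regulation apply to this product at all?
    # (Stubbed with a keyword rule; in practice this is its own
    # retrieval and reasoning step.)
    return {"applies": "payment" in question["product"], "basis": "product type"}

def identify_requirements(question):
    # Step 2: retrieve the concrete obligations (stubbed list).
    return ["error-resolution procedures", "disclosure of fee terms"]

def assess_posture(question, requirements):
    # Step 3: compare requirements against the institution's
    # documented controls.
    met = [r for r in requirements if r in question.get("controls", [])]
    return {"met": met, "gaps": [r for r in requirements if r not in met]}

def answer(question):
    applicability = determine_applicability(question)
    if not applicability["applies"]:
        return {"applies": False, "basis": applicability["basis"]}
    reqs = identify_requirements(question)
    return {"applies": True, **assess_posture(question, reqs)}
```

Short-circuiting on applicability matters: if the regulation does not apply, the later, more expensive steps never run.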

Compliance professional feedback loops. Every output from our system is designed to be verified by a human compliance professional. We built explicit uncertainty quantification into the system — when the AI isn't confident, it says so, and explains why. This feedback flows back into our training pipeline.
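A minimal version of that gate looks like the sketch below. The threshold value and field names are assumptions for illustration; the point is that a low-confidence draft is routed to a reviewer together with an explanation of the uncertainty, rather than returned as an answer:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    answer: str
    confidence: float  # calibrated probability the answer is correct
    reason: str        # why confidence is low, e.g. conflicting guidance

def route(draft, threshold=0.9):
    """Return the answer directly only when confidence clears the
    threshold; otherwise flag it for human review with the reason."""
    if draft.confidence >= threshold:
        return {"status": "answered", "answer": draft.answer}
    return {
        "status": "needs_review",
        "answer": draft.answer,
        "why_uncertain": draft.reason,
    }
```

The reviewer's verdict on each flagged draft is exactly the signal that feeds back into the training pipeline.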

Validation: the hardest part

Building the system was one challenge. Proving it works was another. We partnered with experienced compliance professionals to create a benchmark of 2,000+ regulatory questions with expert-verified answers. Our system achieves 94% accuracy on this benchmark — and more importantly, it achieves 99.5% accuracy on the subset of questions where it expresses high confidence.

The remaining cases where the system is uncertain? Those get flagged for human review. We designed the system to be a force multiplier for compliance teams, not a replacement.

What we learned

Three lessons from building this system:

Domain expertise is non-negotiable. We hired compliance professionals before we hired ML engineers. Understanding the domain deeply — not just the text, but how practitioners actually use it — was the single most important factor in building something useful.

Calibrated uncertainty beats raw accuracy. A system that's right 90% of the time and doesn't know when it's wrong is dangerous. A system that's right 85% of the time but knows exactly when to ask for help is invaluable. We optimized for the latter.
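The trade-off in this lesson reduces to simple arithmetic. The flagging rate below is an illustrative assumption, not a measured figure; what matters is which errors reach the user unflagged:

```python
# System A: 90% accurate, never abstains, so every error reaches
# the user silently.
silent_errors_a = 1.0 - 0.90

# System B: 85% accurate, but catches 80% of its own errors by
# flagging them as uncertain (illustrative flagging rate).
error_rate_b = 1.0 - 0.85
flag_recall = 0.80
silent_errors_b = error_rate_b * (1 - flag_recall)
```

Under these assumptions, the "less accurate" system exposes users to roughly 3% silent errors versus 10% for the more accurate one, which is why we optimized for calibration.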

The product is the whole workflow, not the model. The AI model is maybe 30% of the value. The other 70% is how it integrates into compliance workflows, how it presents information, how it handles edge cases, and how it helps teams take action on its outputs.

We're continuing to improve the system every week, and we're excited to share more about our technical approach in future posts. If you're an engineer interested in this space, we're hiring — check out our careers page.
