Governance Layer | DataCrawl

There is a moment every team hits when they start deploying AI agents seriously. It usually happens around the third or fourth agent, when someone asks a question that sounds simple: "If one of these agents does something it shouldn't, how do we know?"

The honest answer, for most teams, is that they don't. Not in real time. Not with any proof. Not with anything they could show an auditor or an angry client or a CFO who just found out an agent issued twelve refunds it wasn't supposed to.

We built DataCrawl because we kept running into that moment. And the more we looked at how teams were trying to solve it, the more we realized the solutions people were reaching for were solving the wrong problem.

The guardrail instinct

The first thing most engineers do when they realize their agents need constraints is add guardrails inside the agent itself. Prompt instructions. Conditional logic. Hard-coded limits. It feels like the right move because it's fast and it's close to the thing you're trying to control.

The problem is that guardrails are local. They live inside one agent, built by one team, in one framework. If you have a LangChain agent and an AutoGen agent and an n8n workflow all doing things that affect the same customer records, you have three separate guardrail implementations with no shared policy and no shared audit trail. When something goes wrong, you're correlating logs across three systems and hoping the story adds up.

And guardrails don't answer the governance question. They can slow an agent down or redirect it. But they can't tell you, six weeks later, exactly what policy was active when a specific action was taken, who approved it, and what the agent's payload looked like at that moment. That's not a guardrail problem. That's an infrastructure problem.

What the architecture is actually for

DataCrawl sits before execution, not beside it. When an agent wants to act, it submits a typed action to the governance layer first. The layer checks the agent's identity, validates the action against a policy and returns a structured decision before anything runs.
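
As a rough illustration of that flow, here is a minimal Python sketch. Every name in it (Action, Decision, KNOWN_AGENTS, POLICY, evaluate) and every threshold is a hypothetical stand-in chosen for the example, not DataCrawl's actual API or schema. It only shows the ordering described above: identity first, then policy, then a structured decision returned before anything executes.

```python
from dataclasses import dataclass
from typing import Literal

# All names and numbers here are illustrative assumptions, not a real API.
Verdict = Literal["allow", "deny", "needs_approval"]


@dataclass(frozen=True)
class Action:
    agent_id: str
    kind: str      # e.g. "refund"
    amount: float


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    reason: str


# Assumed registry: which agents exist and which action kinds each may submit.
KNOWN_AGENTS = {"support-bot": {"refund"}}

# Assumed policy: small refunds run automatically, mid-sized ones need a human,
# anything larger is refused outright.
POLICY = {"refund": {"auto_limit": 50.0, "review_limit": 500.0}}


def evaluate(action: Action) -> Decision:
    """Return a decision for a proposed action. Nothing is executed here."""
    allowed_kinds = KNOWN_AGENTS.get(action.agent_id)
    if allowed_kinds is None:
        return Decision("deny", "unknown agent")
    if action.kind not in allowed_kinds:
        return Decision("deny", "agent not authorized for this action kind")

    rule = POLICY.get(action.kind)
    if rule is None:
        return Decision("deny", "no policy covers this action kind")
    if action.amount <= 0:
        return Decision("deny", "amount must be positive")

    if action.amount <= rule["auto_limit"]:
        return Decision("allow", "within automatic limit")
    if action.amount <= rule["review_limit"]:
        return Decision("needs_approval", "above automatic limit")
    return Decision("deny", "above review limit")
```

In this sketch the agent would call evaluate(Action("support-bot", "refund", 120.0)) and act only on the verdict it gets back, rather than deciding for itself.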

The reason this works where guardrails don't comes down to three design decisions we made early.

The policy lives outside the agent. When a refund threshold changes, one policy update covers every agent authorized to issue refunds. Nobody touches agent code. Nobody forgets to update one of the five implementations. The governance is centralized and the agents are just callers.
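
To make the "agents are just callers" idea concrete, here is a small Python sketch under stated assumptions. PolicyStore and the two agent functions are hypothetical names invented for the example. The point is only that the threshold lives in one place, so one update changes the answer for every caller.

```python
class PolicyStore:
    """Hypothetical central store. Agents hold no thresholds of their own."""

    def __init__(self) -> None:
        self._limits = {}   # action kind -> maximum allowed amount
        self.version = 0    # bumped on every change

    def set_limit(self, action_kind: str, limit: float) -> int:
        self._limits[action_kind] = limit
        self.version += 1
        return self.version

    def within_limit(self, action_kind: str, amount: float) -> bool:
        limit = self._limits.get(action_kind)
        return limit is not None and amount <= limit


store = PolicyStore()
store.set_limit("refund", 100.0)


# Stand-ins for agents built in different frameworks. Neither one
# contains a refund limit; both simply ask the shared store.
def framework_a_agent_may_refund(amount: float) -> bool:
    return store.within_limit("refund", amount)


def framework_b_workflow_may_refund(amount: float) -> bool:
    return store.within_limit("refund", amount)
```

Calling store.set_limit("refund", 50.0) afterwards tightens the limit for both callers at once, with no change to either function.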

Every decision is reproducible. Each evaluation produces a trace ID and snapshots the exact policy version that was active at that moment. This isn't logging in the usual sense, where you can see that something happened. It's a verifiable record of why a decision was made, against what rules and with what data. That distinction matters the moment anyone questions a decision your agent made three months ago.
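
One way to picture the difference between a log line and a reproducible record is the Python sketch below. The record layout, the SHA-256 fingerprint of the policy and the verify function are all assumptions made for illustration. The text above only says that each evaluation produces a trace ID and snapshots the active policy version.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass

# Assumed: old policy versions are kept, never overwritten.
POLICY_VERSIONS = {
    1: {"refund_limit": 100.0},
    2: {"refund_limit": 50.0},
}


def decide(policy: dict, payload: dict) -> str:
    """Toy rule standing in for real policy evaluation."""
    return "allow" if payload["amount"] <= policy["refund_limit"] else "deny"


def fingerprint(obj: dict) -> str:
    """Stable hash of a JSON-serializable object."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class TraceRecord:
    trace_id: str
    policy_version: int
    policy_hash: str
    payload: dict
    verdict: str


def evaluate_and_record(version: int, payload: dict) -> TraceRecord:
    policy = POLICY_VERSIONS[version]
    return TraceRecord(
        trace_id=uuid.uuid4().hex,
        policy_version=version,
        policy_hash=fingerprint(policy),
        payload=dict(payload),  # copy, so later mutation cannot alter the record
        verdict=decide(policy, payload),
    )


def verify(record: TraceRecord) -> bool:
    """Re-run the decision against the snapshotted version and compare."""
    policy = POLICY_VERSIONS[record.policy_version]
    same_policy = fingerprint(policy) == record.policy_hash
    same_verdict = decide(policy, record.payload) == record.verdict
    return same_policy and same_verdict
```

Months later, verify(record) re-derives the verdict from the stored payload and the pinned policy version. A record whose verdict or policy was altered after the fact fails the check.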

The human layer is a first-class part of the system. High-risk actions don't just get flagged. They stop. They sit in an approval queue until a human reviews the full context and makes a call. On approval, the agent receives a short-lived execution token. Without it, the action cannot proceed. This makes human oversight a mechanical guarantee rather than a process hope.
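
The stop-then-token mechanic can be sketched in a few lines of Python. ApprovalQueue, its method names and the five-minute lifetime are hypothetical choices for this example; the text above specifies only that approval yields a short-lived execution token and that the action cannot proceed without it.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

TOKEN_TTL_SECONDS = 300  # assumed lifetime, purely illustrative


@dataclass
class PendingAction:
    action_id: str
    description: str
    status: str = "pending"


class ApprovalQueue:
    """Hypothetical queue: actions wait here until a human approves them."""

    def __init__(self) -> None:
        self._pending = {}  # action_id -> PendingAction
        self._tokens = {}   # token -> (action_id, expires_at)

    def submit(self, description: str) -> str:
        action_id = secrets.token_hex(8)
        self._pending[action_id] = PendingAction(action_id, description)
        return action_id

    def approve(self, action_id: str, now: Optional[float] = None) -> str:
        """Called by the human reviewer. Returns a token bound to one action."""
        item = self._pending[action_id]
        item.status = "approved"
        issued_at = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (action_id, issued_at + TOKEN_TTL_SECONDS)
        return token

    def can_execute(self, action_id: str, token: str,
                    now: Optional[float] = None) -> bool:
        """True only for an unexpired token issued for this exact action."""
        entry = self._tokens.get(token)
        if entry is None:
            return False
        token_action_id, expires_at = entry
        current = time.time() if now is None else now
        return token_action_id == action_id and current < expires_at
```

The now parameter exists only so the expiry behavior can be exercised without waiting; a real deployment would presumably rely on the server clock.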

Why the chain has to be unbroken

The part we spent the most time on was making sure there were no gaps. An agent that can bypass the evaluation layer breaks everything downstream. An approval that doesn't require a token means the audit trail has a hole in it. A policy that can change without versioning means yesterday's decisions can't be reconstructed.

Each piece of the architecture exists because we found a way around it during testing and decided to close it. The agent key system means requests can't be spoofed. The canonical action schema means agents can't submit malformed payloads that slip past validation. The token expiry and single-use enforcement mean an approved action can't be replayed later.
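
The three closures named above can each be sketched with Python's standard library. This is an illustration under assumptions, not the product's implementation: the HMAC-SHA256 signing scheme, the field list in REQUIRED_FIELDS, the demo key and the TokenLedger class are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumed per-agent secrets. A real system would not hard-code these.
AGENT_KEYS = {"support-bot": b"demo-secret-key"}

# Assumed canonical schema: exactly these fields, with exactly these types.
REQUIRED_FIELDS = {"kind": str, "amount": float, "customer_id": str}


def sign(agent_id: str, payload: dict) -> str:
    """What an agent would do with its own key before submitting."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(AGENT_KEYS[agent_id], body, hashlib.sha256).hexdigest()


def signature_valid(agent_id: str, payload: dict, signature: str) -> bool:
    """Reject requests that were not signed with the claimed agent's key."""
    key = AGENT_KEYS.get(agent_id)
    if key is None:
        return False
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes are used so any input string is accepted.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


def schema_valid(payload: dict) -> bool:
    """Reject payloads with missing, extra or wrongly typed fields."""
    if set(payload) != set(REQUIRED_FIELDS):
        return False
    return all(isinstance(payload[name], expected_type)
               for name, expected_type in REQUIRED_FIELDS.items())


class TokenLedger:
    """Tracks unused execution tokens. Redeeming one removes it for good."""

    def __init__(self) -> None:
        self._unused = {}  # token -> expires_at

    def issue(self, token: str, expires_at: float) -> None:
        self._unused[token] = expires_at

    def redeem(self, token: str, now: float) -> bool:
        # pop() removes the token whether or not it has expired,
        # so a second attempt with the same token always fails.
        expires_at = self._unused.pop(token, None)
        return expires_at is not None and now < expires_at
```

A request signed with the wrong key fails signature_valid, a payload with an extra field fails schema_valid, and calling redeem twice with the same token succeeds at most once.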

None of this required rebuilding any agents. That constraint shaped every decision we made. The governance layer had to be something teams could adopt without touching what they'd already built, because the alternative was a product that was technically correct and practically unused.

The problem we're really solving

Broken automations taught us something that turned out to apply to AI agents too. The failure is rarely dramatic. It's quiet. A field gets renamed and data flows through wrong for three weeks. An agent issues a refund slightly outside its intended range and nobody notices until the end of the month. The damage accumulates before anyone knows there's a problem.

The architecture exists to make that kind of invisible failure impossible. Not by making agents smarter. Not by writing better prompts. But by putting a layer between every proposed action and its execution, one that asks a single question, "is this authorized?", and keeps an answer that can be proven.

That's what governance infrastructure means in practice. Not a dashboard. Not a log. A chain of custody for every action your AI takes, from the moment it's proposed to the moment it runs.