When half your evidence is a model's chain-of-thought


The DFIR analysts I trust most all have the same instinct: when something feels off, they reach for evidence. They want to see what happened — packet captures, process trees, audit logs, the raw substrate of what the machine actually did.

That instinct stops working the moment an AI agent is in the loop.

Not because the evidence disappears. Because half of it has changed shape.

The new evidence model, in three layers

When you investigate an incident involving an AI agent, you’re really investigating three nested layers, each with different evidence properties:

Layer 1 — Substrate evidence. The classical layer. Identity events, network flows, process executions, file modifications, API calls. Same as it ever was. Still essential. Often the only thing that’s logged at all.

Layer 2 — Tool-call evidence. The agent’s structured outputs that turned into actions. Function calls, parameters, return values. Sometimes logged by the orchestration framework. Sometimes only inferable from substrate evidence. This is where most of the “what did the agent actually do” question gets answered.

Layer 3 — Reasoning evidence. The model’s intermediate thoughts. Chain-of-thought traces, plan trees, the internal monologue that decided which tool to call and why. This is where intent lives. And in most production systems, it is not retained.

The gap between Layer 2 and Layer 3 is the new evidence frontier.

Why this changes the hunter’s job

A traditional hunt asks: given a hypothesis, what observable evidence would confirm or refute it?

When the evidence is substrate logs, this is mechanical. You know what’s logged. You can write the query. The hunt either lights up or doesn’t.

When the evidence is reasoning, the question warps:

  • Is the reasoning even captured? (Usually no.)
  • If it is captured, is it faithful — does it actually reflect what the model used to make the decision, or is it post-hoc rationalization? (Open research question. Bet against faithfulness.)
  • If it’s faithful, can you query it? Reasoning traces are unstructured prose. Detection-as-code doesn’t translate.

This is why “AI threat hunting” can’t just be “regular threat hunting with extra log sources.” The shape of what you’re hunting for is different.

What I do about it

I’ve started building hunts in three passes:

  1. Substrate first. Run the classical hunt against host/network/identity evidence. If the answer’s there, great — you don’t need Layer 3.
  2. Tool-call diff. Compare what the agent did (substrate) against what it was supposed to be allowed to do (its policy / system prompt / role). Mismatches are leads even when the substrate looks clean.
  3. Reasoning forensics, when available. If trace logs exist, look at them — but treat them as suspect. Faithful or not, they’re useful for generating new hypotheses, not for confirming them.

The order matters. I’ve watched analysts get pulled into hours of chain-of-thought reading before checking whether the substrate evidence already answered the question.

What I’d ask vendors

If you’re shipping an AI agent into a production environment, the question isn’t “do you have logs?” The question is:

  • Are you capturing all three layers, or just the cheap ones?
  • Is Layer 3 retained long enough for an analyst to read it during an investigation? (Hint: the model API’s default isn’t.)
  • Can your audit trail withstand an attacker who’s also smart enough to inject instructions into the reasoning layer itself?

If the answer to any of those is “we’ll get to it in the next release,” you’ve already lost.


This is the kind of question the AI Threat Hunt Builder tries to operationalize — building hunts that explicitly map to which evidence layer they can use.