The attack surface starts before agents: the LLM integration trust boundary


Threat-modeling guide for the LLM integration boundary where prompts, tools, and production systems first intersect.

What this page is: a vendor-agnostic threat-modeling worksheet for the earliest trust boundary where LLM I/O can touch production systems.
What this page is not: a claim about LLM internals or any specific vendor trace.

Executive summary

Before you adopt an agent framework, many high-leverage security controls are decided at the LLM integration trust boundary: the first interface where model inputs/outputs can (a) read production data, (b) write to production systems, or (c) enter production observability (logs/telemetry/traces).

This page treats that interface as a trust boundary and maps it to OWASP GenAI LLM Top 10 (2025) risk categories.

How to use this worksheet

1) Fill the read paths table with every source that can enter context (including retrieval and tool outputs).
2) Fill the write paths table with every sink where model outputs can land (including observability and persistence).
3) For each boundary crossing, record the owner, the server-side enforcement point, and the minimum audit evidence required to reconstruct an incident.

Scope and evidence boundary

This article is a threat-modeling guide for a specific control point before agent frameworks: the earliest interface where LLM I/O can touch production data or production observability.

It does not claim mechanism-level properties about LLMs. Where risk categories are referenced, they are pinned to OWASP GenAI LLM Top 10 (2025) (see References).

Definition: the LLM integration trust boundary

LLM integration trust boundary = the first interface where LLM inputs/outputs can read from or write to:

  • production data stores, or
  • production observability systems (logs / telemetry / traces), or
  • tool/API surfaces that can cause side effects.

Operationally, this is the trust boundary between model I/O and production systems.

Why this boundary matters (even without agents)

OWASP’s LLM Top 10 (2025) includes risks that apply to non-agent LLM apps when model I/O is connected to real systems:

  • LLM01:2025 Prompt Injection — untrusted inputs steer behavior/output.
  • LLM02:2025 Sensitive Information Disclosure — sensitive data leaks via outputs or context handling.
  • LLM05:2025 Improper Output Handling — unsafe downstream consumption of model outputs.
  • LLM06:2025 Excessive Agency — the system grants tools/permissions/autonomy beyond minimum needed.

Threat scenarios at the trust boundary (protocol-level)

Scenario A — indirect prompt injection via external content + tool access

If the system ingests untrusted content (email/docs/web) and also enables tool calls, an attacker can place instruction-like payloads in that content.

OWASP’s Excessive Agency guidance includes mailbox-assistant scenarios where untrusted inputs can trigger sensitive-data access and exfiltration. Mitigations include minimizing extensions, least-privilege scopes, and requiring user approval for high-impact actions.
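A minimal sketch of such an approval gate, enforced server-side. The tool names, registry, and risk tiers here are illustrative assumptions; the pattern is that high-impact actions never execute on model output alone:

```python
# Tools the model may request; high-impact ones require explicit approval.
HIGH_IMPACT = {"send_email", "delete_record"}

REGISTRY = {
    "search_inbox": lambda query: f"results for {query!r}",
    "send_email": lambda to, body: f"sent to {to}",
}

def dispatch_tool_call(name: str, args: dict, *, user_approved: bool = False) -> dict:
    """Execute a model-requested tool call only if policy allows it.

    The gate runs outside the prompt: the model cannot talk its way past it.
    """
    if name not in REGISTRY:
        return {"status": "denied", "reason": "unknown tool"}
    if name in HIGH_IMPACT and not user_approved:
        return {"status": "pending_approval", "tool": name}
    return {"status": "executed", "result": REGISTRY[name](**args)}
```

Note that `user_approved` must come from an out-of-band user action (a UI confirmation), never from the model's own output.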

Scenario B — improper output handling downstream

If model output is passed into:

  • command execution,
  • templating/HTML rendering,
  • policy/routing decisions,
  • database writes

without strict validation/sanitization, the output becomes an injection surface (even if the user never sees it).
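The routing case can be sketched as strict parse-then-allowlist validation. The route names are hypothetical; the pattern is that model output is treated as untrusted input, never executed or trusted directly:

```python
import json

# Only these routes may ever be selected, regardless of what the model says.
ALLOWED_ROUTES = {"billing", "support", "sales"}

def parse_routing_decision(model_output: str) -> str:
    """Parse model output as data and validate it against an allowlist."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        raise ValueError("model output is not valid JSON")
    route = data.get("route")
    if route not in ALLOWED_ROUTES:
        raise ValueError(f"route {route!r} is not in the allowlist")
    return route
```

The same shape applies to the other sinks: command arguments, template values, and database fields each get their own schema and allowlist before use.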

Scenario C — sensitive data exposure via stored artifacts

If sensitive inputs/outputs are stored (logs/telemetry/analytics/memory/RAG indexes), the exposure surface includes retention, access control, and replay into future prompts.
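One common mitigation is redaction before persistence. A minimal sketch with illustrative regex patterns (a production system would use a vetted PII-detection pipeline, not two regexes):

```python
import re

# Illustrative patterns only; real redaction needs a reviewed PII pipeline.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_for_storage(text: str) -> str:
    """Replace matched sensitive spans with labeled placeholders before logging."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

Redacting at write time (rather than read time) also shrinks the replay surface: whatever later flows from storage back into prompts is already minimized.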

Mapping worksheet

1) Read paths into the model (inputs)

Document every source that can enter context:

| Source | Trust level | Sensitivity | Transformations before model | Notes |
|---|---|---|---|---|
| User message | Untrusted | Varies | Redaction? | |
| Retrieved docs / web | Untrusted | Varies | Filtering / allowlist | |
| Tickets/CRM/email summaries | Untrusted (default) | Often sensitive | Redaction + minimization | |
| Database reads | Trusted (system) | Often sensitive | Field-level selection | |
| Tool outputs (if re-injected) | Untrusted (default) | Varies | Sanitization + provenance tags | |

2) Write paths from the model (outputs)

Document where outputs can land:

| Sink | Persisted? | Retention/TTL | Readers | Replay into prompts? | Controls |
|---|---|---|---|---|---|
| Product UI | No/Yes | — | End user | Maybe | Output policies |
| Logs / telemetry / traces | Yes | Defined TTL | Operators | Possible | Redaction + access controls |
| Analytics events | Yes | Defined TTL | Analysts | Possible | Minimization |
| Memory / context store | Yes | Defined TTL | System | Yes | Scoped + gated writes |
| Tools / internal APIs | Yes | — | Systems | — | Server-side authz + validation |
| Routing / feature flags | Yes | — | System | Yes | Deterministic gating |
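The "replay into prompts?" column can be enforced mechanically: stored text re-enters context only if its sink's policy says so. A minimal sketch (sink names and TTLs are hypothetical):

```python
# Per-sink policy table mirroring the write-paths worksheet.
SINK_POLICY = {
    "memory_store": {"replay": True, "ttl_days": 30},
    "logs": {"replay": False, "ttl_days": 14},
}

def may_replay(sink: str) -> bool:
    """Default-deny: unknown sinks and replay-disabled sinks never re-enter prompts."""
    policy = SINK_POLICY.get(sink)
    return bool(policy and policy["replay"])
```

Default-deny matters here: a sink missing from the table is treated as non-replayable, so adding a new store cannot silently widen the replay surface.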

3) Owner, enforcement, and audit evidence

For each boundary crossing, record:

  • Owner (accountable control owner),
  • Server-side enforcement (not “prompted”),
  • Audit evidence (minimum data required to reconstruct an incident).

Minimum controls at the trust boundary (vendor-agnostic)

Control 1 — data policy for model-visible content (enforced)

Define:

  • what the model may see,
  • what must be redacted/minimized,
  • what can be stored, where, and for how long,
  • what can be replayed into future prompts.
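Such a policy is most useful when it is declarative config that the enforcement layer actually reads, rather than a document. A minimal sketch with assumed field names:

```python
# Illustrative data policy for model-visible content (field names assumed).
DATA_POLICY = {
    "model_may_see": ["ticket_subject", "ticket_body_redacted"],
    "must_redact": ["email", "phone", "card_number"],
    "storage": {"prompt_log": {"where": "observability", "ttl_days": 14}},
    "replayable": ["ticket_subject"],
}

def visible_fields(record: dict) -> dict:
    """Project a record down to the fields policy allows into model context."""
    return {k: v for k, v in record.items()
            if k in DATA_POLICY["model_may_see"]}
```

Projection at the boundary means a new sensitive field added to the record is invisible to the model until someone deliberately adds it to the policy.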

Control 2 — server-side authorization and validation for any side effects

If outputs can influence tools, writes, routing, or flags:

  • enforce authorization + validation outside the prompt,
  • minimize permissions and available functions,
  • require user review/approval for high-impact actions.
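A minimal sketch of the first two points, assuming a hypothetical scope model: authorization is checked against the end user's permissions, outside the prompt, so the model cannot grant itself scope:

```python
# Hypothetical per-user scopes; in practice these come from your identity system.
USER_SCOPES = {"alice": {"read:tickets"}, "bob": {"read:tickets", "write:tickets"}}

def authorize_side_effect(user: str, required_scope: str) -> None:
    """Server-side check: unknown users and missing scopes are denied."""
    if required_scope not in USER_SCOPES.get(user, set()):
        raise PermissionError(f"{user!r} lacks scope {required_scope!r}")

def write_ticket(user: str, ticket_id: int, body: str) -> str:
    authorize_side_effect(user, "write:tickets")
    if not (0 < len(body) <= 10_000):      # validate model output before the write
        raise ValueError("ticket body fails validation")
    return f"ticket {ticket_id} updated"
```

The key design choice: `write_ticket` enforces both authorization and validation itself, so no prompt-level instruction can bypass either check.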

Control 3 — treat retrieved content and tool outputs as untrusted data

  • maintain a strict instruction hierarchy (policy/controller > tool data > user data),
  • constrain untrusted content so it cannot act as policy or permissions,
  • attach provenance (source, time, workflow) when re-injecting content into context.
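Provenance tagging can be sketched as wrapping every re-injected item in a typed record, then framing it in the prompt as data rather than instructions (the tag format below is an assumption, not a standard):

```python
from dataclasses import dataclass
import time

@dataclass(frozen=True)
class ContextItem:
    text: str
    source: str        # e.g. "retrieval:kb", "tool:search_inbox"
    trust: str         # "untrusted" by default for external content
    fetched_at: float

def tag_untrusted(text: str, source: str) -> ContextItem:
    return ContextItem(text=text, source=source, trust="untrusted",
                       fetched_at=time.time())

def render_for_prompt(item: ContextItem) -> str:
    # Untrusted content is presented as labeled data, never as policy.
    return (f"<data source={item.source!r} trust={item.trust}>"
            f"{item.text}</data>")
```

Delimiting is not a complete defense against injection, but the provenance record also survives into audit logs, which is what lets you reconstruct how a payload entered context.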

Control 4 — auditability without over-collection

You should be able to reconstruct:

  • what entered context (at least pointers + provenance),
  • what output was produced,
  • what actions were requested/performed/blocked,
  • why.
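A minimal audit-event sketch that records pointers and hashes rather than raw sensitive text, which is what "auditability without over-collection" amounts to in practice (field names are illustrative):

```python
import hashlib
import time

def ref_for(text: str) -> str:
    """Stable pointer to a payload without storing the payload itself."""
    return hashlib.sha256(text.encode()).hexdigest()[:16]

def audit_event(kind: str, payload_ref: str, decision: str, reason: str) -> dict:
    """Minimal reconstructable record: what, pointer, decision, and why."""
    return {
        "ts": time.time(),
        "kind": kind,                # "context_in" | "output" | "action"
        "payload_ref": payload_ref,  # hash/pointer, avoids over-collection
        "decision": decision,        # "allowed" | "blocked" | "approved"
        "reason": reason,
    }
```

With the raw payloads retained (briefly) in a separately access-controlled store, the hash links the timeline to the evidence without copying sensitive text into every log line.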

What you should have when done

  • A complete inventory of context inputs (sources + trust level + transformations).
  • A complete inventory of output sinks (persistence + retention + readers + replay risk).
  • A list of server-side enforcement points for side effects (authz + validation + least privilege).
  • A minimum audit evidence set sufficient to reconstruct a timeline (inputs → decisions → outputs → actions).

Copy/paste checklist

  • I can point to the first place model I/O touches production data, observability, or tools.
  • Every context source is classified (trust + sensitivity) and transformed before ingestion.
  • Every output sink is documented (persistence, retention, readers, replay risk).
  • Side-effect actions are gated server-side (authz + validation + least privilege).
  • High-impact actions require explicit review/approval.
  • Audit evidence exists to reconstruct an incident timeline.

References (pinned)

OWASP GenAI Security Project (2025): OWASP Top 10 for LLM Applications (2025) — LLM01 Prompt Injection; LLM02 Sensitive Information Disclosure; LLM05 Improper Output Handling; LLM06 Excessive Agency.

OWASP (legacy v1.1 — numbering differs from the GenAI 2025 list):

OWASP cheat sheets:

OpenAI:

NIST: