AI articles
Long-form technical articles on AI agent security, agent architecture, prompt engineering, model evaluation, and evidence-based AI workflows.
Browse articles by recency or topic.
Use How-to for procedures and checklists; use Reference for stable lookup pages and diagrams.
Latest articles
Newest published pages (auto-generated).
Using ChatGPT Effectively at Work: A Practical Guide
A practical guide to choosing the right ChatGPT layer for work: modes, search, deep research, agent mode, personalization, memory, and projects.
Connected apps expand the capability and authorization surface of LLM systems
Why app-connected and MCP-enabled LLM systems should be analyzed as capability, scope, approval, and side-effect control problems—not only as prompt-processing systems.
Web-retrieved content is a prompt-injection boundary in tool-using LLM systems
Why retrieved web content must stay non-authoritative in browsing-enabled or tool-using LLM systems, and how to keep it from steering routing, tool arguments, or side effects.
Why “Almost Human, But Not Quite” Feels Wrong: From Clowns to AI-Generated Images and Text
Two separable mechanisms behind the “something feels off” reaction: cue-level perceptual mismatch (uncanny/cue conflict) vs AI-label effects on credibility and sharing.
Theory of mind in LLMs — what benchmarks test (and what they don’t)
Evidence-anchored overview of how ToM is defined in psychology, how it is operationalized for LLM evaluation, and what current results do and do not justify.
Sycophancy in LLM Assistants: What It Is, How Training Creates It, and Why It Shows Up in Production
A technically grounded explanation of sycophancy (belief-agreement bias): what it is, what the evidence supports about prevalence, how preference optimization can produce it, and what changes in training and release practice reduce it.
Browse by topic
Open a topic page to see its core articles and section resources.
Topic
Agent security
Trust boundaries, authorization & access control, orchestration (control-flow mechanisms), policy enforcement, and observability.
Topic
Agent architecture
Workflows, state and lifecycle management, tool invocation patterns, retrieval and context management, and evaluation harnesses.
Topic
Model training and evaluation
Reliability, evaluation methods, calibration, grounding, and benchmark interpretation limits.
Topic
Prompt engineering
Operational prompting notes, evidence contracts, and reusable prompt templates and workflow files.
All articles
Grouped by topic; within each topic sorted by published date (newest first).
Agent security (10)
- Connected apps expand the capability and authorization surface of LLM systems
Why app-connected and MCP-enabled LLM systems should be analyzed as capability, scope, approval, and side-effect control problems—not only as prompt-processing systems.
- Web-retrieved content is a prompt-injection boundary in tool-using LLM systems
Why retrieved web content must stay non-authoritative in browsing-enabled or tool-using LLM systems, and how to keep it from steering routing, tool arguments, or side effects.
- Agentic Systems: 8 Trust-Boundary Audit Checkpoints
A practical audit checklist of 8 trust checkpoints where untrusted artifacts can steer routing, tool use, and write-path actions in chained LLM systems.
- Control-Plane Failure Patterns in Tool-Using LLM Systems
Two vendor-agnostic control-plane failure patterns—privilege persistence across interaction boundaries and non-enforcing integrity signals—that allow untrusted state to steer tool execution across steps.
- Social engineering in AI systems: attacking the decision pipeline (not just people)
Threat model of social engineering against AI decision pipelines; maps prompt injection to enforcement controls outside the model (PDP/PEP, validation, budgets).
- Security report (client-captured): control-plane assurance failures at the LLM boundary
Client-only security report on text-only confirmations of privileged state/actions without verifiable signed audit artifacts; backend state changes not verified.
- The attack surface is the orchestration loop, not the model
How multi-step orchestration (controller) loops change the threat model in tool-using systems, and where to enforce separation, authorization, validation, and budgets to reduce prompt injection, tool misuse, unsafe writes, and unbounded consumption.
- Request Assembly Threat Model (Author-Mapped): Reading the “ChatGPT Request Assembly Architecture” Diagram
A reviewer-oriented explanation of the request path (S1–S5), context sources, and R1–R8 checkpoints in an author-mapped request-assembly model.
- Prompt Assembly Policy Enforcement: Typed Provenance to Prevent Authority Confusion
Prevent authority confusion in prompt assembly by enforcing typed provenance separation between authoritative policy and untrusted content at ingress.
- The attack surface starts before agents: the LLM integration trust boundary
Why agent-layer threat modeling is incomplete: the first high-leverage control point is the LLM integration trust boundary (before agent frameworks exist).
Agent architecture (3)
- LLM-Led vs Orchestrator-Led Tool Execution: Control-Plane Placement Tradeoffs
A control-plane placement comparison across reliability, observability, latency, cost governance, and security for tool-using LLM systems.
- LLM Memory Boundary Model: Context Construction (Eligibility, Selection, Persistence) and Why Answers Change
A vendor-agnostic model of context construction—what can enter context (eligibility), what gets used per response (selection), and what is retained for later (persistence)—and the security controls that must live outside the prompt.
- Human vs GenAI capability map (engineering view)
A practical mapping of human cognitive capabilities to GenAI limitations, engineering substitutes, and residual gaps.
Model training and evaluation (5)
- Why “Almost Human, But Not Quite” Feels Wrong: From Clowns to AI-Generated Images and Text
Two separable mechanisms behind the “something feels off” reaction: cue-level perceptual mismatch (uncanny/cue conflict) vs AI-label effects on credibility and sharing.
- Theory of mind in LLMs — what benchmarks test (and what they don’t)
Evidence-anchored overview of how ToM is defined in psychology, how it is operationalized for LLM evaluation, and what current results do and do not justify.
- Sycophancy in LLM Assistants: What It Is, How Training Creates It, and Why It Shows Up in Production
A technically grounded explanation of sycophancy (belief-agreement bias): what it is, what the evidence supports about prevalence, how preference optimization can produce it, and what changes in training and release practice reduce it.
- Orders of Intentionality and Recursive Mindreading: Definitions and Use in LLM Evaluation
A precise reference for nested mental-state attribution (“orders of intentionality” / “recursive mindreading”) and how these constructs are operationalized in evaluations of humans and LLMs—without implying mechanism-level Theory of Mind.
- Fluency Is Not Factuality: Why LLMs Can Sound Right and Be Wrong
Why fluent LLM outputs can still be wrong, and how to enforce evidence-locked answers (retrieval + provenance + fail-closed gates).
Prompt engineering (2)
- Using ChatGPT Effectively at Work: A Practical Guide
A practical guide to choosing the right ChatGPT layer for work: modes, search, deep research, agent mode, personalization, memory, and projects.
- Prompt Engineering Guide for Daily Work (Deep Dive)
A deep dive into why prompts fail in daily work, how to design evidence-bounded prompt specifications (grounded outputs), and how to evaluate them.