LLM Agents in Retail: Structured Outputs, RAG, Tool Calling, and Data Boundaries
Series: Foundations of Agentic AI for Retail (Part 6 of 10)
Based on the book: Foundations of Agentic AI for Retail
At 9:47pm, someone forwards a supplier email into a chat: "Can you get this order in tonight?"
The LLM answers instantly. Confident. It even formats the response like an ERP screen.
And that is the problem.
Retail doesn't fail because we can't generate text. Retail fails because we execute the wrong action for the right reason, or the right action for the wrong data, and we only discover it after the KPI damage is real.
Production LLM agents in retail need contracts, retrieval discipline, tool gateways, and data boundaries. Not as "best practices", but as an operating model you can defend when something goes sideways.
Jump to: Rule | Pattern | Contracts | RAG | Data boundaries | Tool gateway | 30-day checklist
TL;DR
- LLMs are powerful, but retail needs contracts: schemas, tools, policies, and monitoring.
- RAG is not a feature; it is a reliability dependency.
- "Never send" rules should be written as policy, not tribal knowledge.
The One-Sentence Rule
In retail, LLMs should propose structured actions; a policy gate and tool gateway decide what actually executes.
The Pattern: Propose -> Validate -> Gate -> Execute -> Audit
This is the production shape that prevents most "agent" failures.
If you skip validation and gating, you are betting your KPIs on prompt stability.
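A minimal sketch of that shape in TypeScript, assuming the AgentProposal schema defined in the next section; policyGate, executeViaGateway, and audit are hypothetical stand-ins for your own gate, gateway, and audit log (declared here as stubs):

import { AgentProposal } from './contracts'; // hypothetical module; schema shown in the next section

// Hypothetical stand-ins for your own policy gate, tool gateway, and audit log.
declare function policyGate(action: unknown): Promise<{ status: 'approved' | 'blocked' | 'needs_review'; reason?: string }>;
declare function executeViaGateway(traceId: string, action: unknown): Promise<unknown>;
declare function audit(event: Record<string, unknown>): Promise<void>;

export async function runAgentTurn(rawModelOutput: unknown): Promise<void> {
  // Validate: anything outside the contract is rejected before it can act.
  const parsed = AgentProposal.safeParse(rawModelOutput);
  if (!parsed.success) {
    await audit({ stage: 'validate', ok: false, issues: parsed.error.issues });
    return;
  }
  const { traceId, actions } = parsed.data;
  for (const action of actions) {
    // Gate: policy decides approve / block / route to a human.
    const decision = await policyGate(action);
    await audit({ stage: 'gate', traceId, action, decision });
    if (decision.status !== 'approved') continue;
    // Execute: only through the tool gateway, never a direct system call.
    const result = await executeViaGateway(traceId, action);
    await audit({ stage: 'execute', traceId, action, result });
  }
}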
Structured Output Contracts (You Need More Than "JSON Mode")
The point of structured outputs is not formatting. The point is enforceable contracts.
import { z } from 'zod';

// Each action the agent may propose gets its own strict schema.
const SetPriceAction = z.object({
  kind: z.literal('set_price'),
  payload: z.object({
    sku: z.string().min(1),
    newPrice: z.number().positive()
  })
});

const CreatePoAction = z.object({
  kind: z.literal('create_replenishment_order'),
  payload: z.object({
    sku: z.string().min(1),
    locationId: z.string().min(1),
    qty: z.number().int().positive(),
    needBy: z.string().min(1) // e.g. an ISO-8601 date
  })
});

const FlagForReviewAction = z.object({
  kind: z.literal('flag_for_review'),
  payload: z.object({
    reason: z.string().min(1)
  })
});

// A discriminated union: any action kind outside this list fails validation.
export const AgentProposal = z.object({
  traceId: z.string().min(1),
  actions: z.array(
    z.discriminatedUnion('kind', [SetPriceAction, CreatePoAction, FlagForReviewAction])
  )
});
This lets you reject invalid actions deterministically (before you reach any downstream system).
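For instance, one way to wire that rejection in, assuming the model's raw output arrives as a JSON string (a sketch, not a full error taxonomy):

function parseProposal(modelOutput: string) {
  let raw: unknown;
  try {
    raw = JSON.parse(modelOutput);
  } catch {
    return { ok: false as const, issues: [{ message: 'model output is not valid JSON' }] };
  }
  const result = AgentProposal.safeParse(raw);
  if (!result.success) {
    // Contract violation: nothing downstream ever sees this payload.
    return { ok: false as const, issues: result.error.issues };
  }
  return { ok: true as const, proposal: result.data };
}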
Then the next failure mode shows up: the model proposes a "valid" action using the wrong facts. That is where retrieval discipline earns its keep.
RAG in Retail (What to Retrieve and Why)
Retail agents often need:
- policy documents: brand rules, pricing ladders, promo constraints
- operational facts: lead times, supplier constraints, substitution maps
- semantic context: taxonomy, category rules, planograms
Two pragmatic RAG rules:
- Retrieve only what you can cite or log in a trace.
- Treat retrieval as part of the state contract (versioned, auditable).
A Retrieval Record (What to Log)
If you want RAG to be reliable, treat it like an input contract. Log what you retrieved the same way you log what you executed.
{
  "event_type": "retrieval.record.v1",
  "trace_id": "trace_abc",
  "as_of": "2025-07-31T03:12:00Z",
  "query_hash": "sha256:...",
  "sources": [
    { "id": "policy/pricing_rules@2025-07", "chunks": [12, 13] },
    { "id": "supplier/terms@vendor-17@2025-06", "chunks": [4] }
  ]
}
When someone asks "why did the agent propose that?", this is the difference between an answer and a shrug.
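If you want that record enforced like the action contract, a minimal Zod sketch of the same shape (field names mirror the JSON above; the sha256: prefix check is an assumption about your hashing convention):

import { z } from 'zod';

const RetrievalRecord = z.object({
  event_type: z.literal('retrieval.record.v1'),
  trace_id: z.string().min(1),
  as_of: z.string().datetime(),                 // retrieval timestamp, ISO-8601
  query_hash: z.string().startsWith('sha256:'), // assumption: SHA-256 hashing convention
  sources: z.array(
    z.object({
      id: z.string().min(1), // versioned source ID, e.g. "policy/pricing_rules@2025-07"
      chunks: z.array(z.number().int().nonnegative())
    })
  ).min(1) // a retrieval with zero sources is itself a signal worth surfacing
});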
Data Boundaries ("Never Send" Rules)
In retail, data boundary mistakes are catastrophic because they are silent.
| Data class | Examples | Agent rule |
|---|---|---|
| Public | published catalogs, public web content | OK to send |
| Internal | price rules, margin targets, supplier terms | send only to approved systems/models |
| PII | emails, phone, loyalty identifiers | never send to external models; tokenize or aggregate |
| Highly sensitive | contracts, legal, credentials | never send; isolate |
Write these as policy checks, not as guidance.
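As one possible shape for those checks, a minimal sketch: classify every field and refuse to build any prompt or tool payload that violates the destination's rules. The field-to-class map here is illustrative; in practice it should come from your data catalog, not code:

type DataClass = 'public' | 'internal' | 'pii' | 'highly_sensitive';
type Destination = 'external_model' | 'approved_internal';

// Illustrative classifications only.
const fieldClass: Record<string, DataClass> = {
  sku: 'public',
  margin_target: 'internal',
  loyalty_id: 'pii',
  supplier_contract: 'highly_sensitive'
};

function assertSendable(payload: Record<string, unknown>, dest: Destination): void {
  for (const field of Object.keys(payload)) {
    // Unknown fields default to the most restrictive class.
    const cls = fieldClass[field] ?? 'highly_sensitive';
    const allowed =
      cls === 'public' ||
      (cls === 'internal' && dest === 'approved_internal');
    // PII would need tokenization/aggregation before any send; not handled here.
    if (!allowed) {
      throw new Error(`policy violation: never send ${field} (${cls}) to ${dest}`);
    }
  }
}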
Once you have the contracts and the boundaries, you still need one more thing: a single place where writes actually happen.
Tool Gateway Pattern (Where Safety Actually Lives)
The tool gateway is the layer that makes tool calling safe.
Tool gateway responsibilities:
- validate payloads again (defense in depth)
- attach idempotency keys
- enforce allowlists
- log trace ID + input hash
- return structured tool results
A Tool Gateway Envelope (Copy/Paste)
The LLM should not talk to your ERP, pricing engine, or ticketing system directly. It should talk to your gateway, with an envelope you can validate, dedupe, and audit.
{
  "message_type": "tool.request.v1",
  "trace_id": "trace_abc",
  "idempotency_key": "pricing.write:2025-07-31:sku=SKU-001",
  "requested_by": "llm_agent:pricing_assistant",
  "tool": "pricing.write",
  "payload": { "sku": "SKU-001", "new_price": 19.99 },
  "policy_context": {
    "policy_version": "pricing_policy@2025-07",
    "approval_mode": "thresholds"
  }
}
And your gateway should return something you can store:
{
  "message_type": "tool.result.v1",
  "trace_id": "trace_abc",
  "idempotency_key": "pricing.write:2025-07-31:sku=SKU-001",
  "status": "blocked",
  "blocked_reason": "requires_approval (delta > 3%)",
  "action_id": null
}
That is how you replay proposals, prove what executed, and explain why something did not.
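A minimal sketch of the gateway loop those envelopes imply: validate again, enforce the allowlist, dedupe on the idempotency key, and always return a structured result. The allowlist contents and the in-memory store are assumptions; use a durable store in production:

import { z } from 'zod';

const ToolRequest = z.object({
  message_type: z.literal('tool.request.v1'),
  trace_id: z.string().min(1),
  idempotency_key: z.string().min(1),
  requested_by: z.string().min(1),
  tool: z.string().min(1),
  payload: z.record(z.unknown()),
  policy_context: z.object({
    policy_version: z.string().min(1),
    approval_mode: z.string().min(1)
  })
});

const ALLOWED_TOOLS = new Set(['pricing.write', 'replenishment.create']); // assumption
const results = new Map<string, object>(); // in-memory for the sketch; durable in production

function handleToolRequest(raw: unknown): object {
  const req = ToolRequest.parse(raw); // validate again: defense in depth
  const base = {
    message_type: 'tool.result.v1',
    trace_id: req.trace_id,
    idempotency_key: req.idempotency_key
  };
  if (!ALLOWED_TOOLS.has(req.tool)) {
    return { ...base, status: 'blocked', blocked_reason: 'tool_not_allowlisted', action_id: null };
  }
  const prior = results.get(req.idempotency_key);
  if (prior) return prior; // same key, same result: no double writes
  // ...call the downstream system here, behind its own timeout/retry policy...
  const result = { ...base, status: 'executed', blocked_reason: null, action_id: `act_${req.idempotency_key}` };
  results.set(req.idempotency_key, result);
  return result;
}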
Failure Modes (And How To Prevent Them)
| Failure mode | What you will see | Prevention |
|---|---|---|
| hallucinated actions | nonsensical tool calls | strict schemas + allowlists |
| prompt injection | tool misuse via untrusted text | isolate untrusted inputs + policy gate |
| data leakage | sensitive fields in prompts | data classification + "never send" policy |
| drift | quality decay over time | eval harness + monitoring + replay |
Implementation Checklist (30 Days)
- Define one action surface and write a strict schema for it.
- Build a policy gate (approve/block/route) and make it auditable.
- Add a tool gateway with idempotency and structured results.
- Add RAG with versioned sources (log retrieved doc IDs).
- Run shadow mode and evaluate proposals vs baseline (see the sketch below).
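For that last item, a minimal scoring sketch, assuming you capture each agent proposal alongside what the existing process actually did; the relative price-delta tolerance is illustrative:

interface ShadowSample {
  traceId: string;
  proposedPrice: number; // what the agent would have set
  baselinePrice: number; // what the current process actually set
}

// Fraction of proposals within `tolerance` (relative) of the baseline decision.
function shadowMatchRate(samples: ShadowSample[], tolerance = 0.03): number {
  if (samples.length === 0) return 0;
  const matches = samples.filter(
    s => Math.abs(s.proposedPrice - s.baselinePrice) / s.baselinePrice <= tolerance
  ).length;
  return matches / samples.length;
}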
FAQ
Do I need to fine-tune for retail agents?
Often no. The first wins come from contracts, retrieval discipline, and gating.
Why is RAG a dependency?
Because retail decisions depend on policies and constraints that change. You need grounded context.
Is tool calling the same as autonomy?
No. Tool calling is capability. Autonomy requires control-plane decisions (policies, approvals, evidence).
Talk Abstract (You Can Reuse)
LLMs make demos look like agents. Production quickly teaches you where that breaks.
This talk shows how to build LLM agents in retail that cannot hallucinate writes: strict action schemas, retrieval records you can audit, a policy gate, and a tool gateway with idempotency. The goal is not maximum autonomy. The goal is safe, measurable automation you can explain, monitor, and roll back.
Talk title ideas:
- LLM Agents in Retail: Contracts, Not Prompts
- Tool Calling Without Chaos: Policy Gates and Gateways
- Data Boundaries for Retail AI: "Never Send" Rules That Work
Next in the Series
Next: Perception for Retail Agents: Sensors, Edge Latency Budgets, Knowledge Graphs, and Causality
Series Navigation
- Previous: /blog/planning-vs-rl-retail-production-ladder
- Hub: /blog
- Next: /blog/perception-retail-agents-sensors-knowledge-graphs-causality
Work With Me
- Talks/workshops on LLM agents that cannot hallucinate writes (schemas, RAG, gateways, boundaries): /contact (see /conferences)
- Book: /publications/foundations-of-agentic-ai-for-retail
- If you want an agent control plane (schemas, gates, gateways, replay): OODARIS AI