Decision Theory for Retail Agents: Optimization, Bayesian Reasoning, and Counterfactuals
Series: Foundations of Agentic AI for Retail (Part 3 of 10)
Based on the book: Foundations of Agentic AI for Retail
At 9:07am your forecast dashboard is green. At 9:09am someone asks, "So what do we do today?"
Forecasts are useful. They are also where a lot of retail AI projects quietly stop.
Agents live one step to the right: action under constraints, with a plan to prove KPI impact. Decision theory is what keeps that honest: objectives and constraints, uncertainty updates, and counterfactual evaluation.
The practical decision stack is signals -> beliefs -> actions -> evidence. A production agent turns that stack into a decision contract you can copy and an evaluation menu that survives real organizations.
Jump to: Thesis | Decision stack | Decision contract | Evaluation | 30-day checklist
TL;DR
- Retail is uncertainty + constraints. Decision theory is the bridge between predictions and actions.
- Optimization chooses best feasible actions. Bayesian reasoning updates beliefs under uncertainty.
- Counterfactual evaluation is how you avoid "we shipped it and hoped".
The One-Sentence Thesis
A retail agent is only as good as its decision theory: objectives, constraints, uncertainty updates, and a plan to prove KPI impact.
The Decision Stack (Signals -> Beliefs -> Actions -> Evidence)
Most teams build the left side (signals) and skip the right side (decision and proof).
A production agent needs all four boxes. Otherwise you are building insight, not autonomy.
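A minimal sketch of one pass through the stack, with every function a stub standing in for the real component (all names here are illustrative, not a real API):

from dataclasses import dataclass

@dataclass(frozen=True)
class Beliefs:
    demand_mean: float
    demand_std: float

def read_signals() -> dict:
    # Stub: in production, pull sales, stock, promo, and competitor feeds.
    return {"units_last_week": 120}

def update_beliefs(signals: dict) -> Beliefs:
    # Stub: in production, a forecast or a posterior update lives here.
    return Beliefs(demand_mean=float(signals["units_last_week"]), demand_std=30.0)

def choose_action(beliefs: Beliefs) -> int:
    # Stub: in production, an optimizer with explicit constraints lives here.
    return int(beliefs.demand_mean + beliefs.demand_std)

def log_evidence(proposal: int, baseline: int) -> None:
    # Stub: record proposal vs baseline so impact can be proved later.
    print(f"proposal={proposal} baseline={baseline}")

log_evidence(choose_action(update_beliefs(read_signals())), baseline=120)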
Optimization vs Bayesian vs Causal (A Practical Contrast)
These are not competing religions. They are different tools in the same stack.
| Tool | The question it answers | Typical retail use |
|---|---|---|
| Optimization | "Given constraints, what action maximizes my objective?" | pricing, replenishment, allocation, scheduling |
| Bayesian reasoning | "Given new evidence, how should my belief change?" | demand risk, uplift estimation, uncertainty-aware decisions |
| Causal/counterfactual evaluation | "What would have happened if we acted differently?" | proving impact, avoiding confounded wins |
If you are building agents, you usually need all three.
Before you argue about which one to start with, write the decision down like a contract.
Start With a Decision Contract (Not a Model)
Before you talk about LLMs or RL, write down the decision in a contract-like form.
Decision: set price for SKU i in store cluster c for week t
Objective: maximize expected margin subject to availability and brand rules
Constraints: floors/ceilings, volatility caps, legal rules, promo calendar locks
Uncertainty: elasticity, competitor response, cannibalization
Action surface: pricing API write OR approval ticket
Evidence plan: shadow -> holdout -> gated autonomy
This is what makes later model choices meaningful.
A Typed Decision Contract (Copy/Paste)
If you want to keep teams aligned, make the contract executable.
type EvidenceMethod = 'shadow' | 'holdout' | 'ab_test' | 'backtest';

export type DecisionContract = {
  decisionId: string;           // stable key for replay and audit
  asOf: string;                 // ISO timestamp
  objective: string;
  constraints: string[];        // policy IDs, floors, caps, legal rules
  uncertaintyDrivers: string[]; // elasticity, competitor response, cannibalization
  actionSurface: {
    mode: 'api_write' | 'approval_ticket' | 'recommend_only';
    tools: string[];            // allowlisted tool ids
  };
  baselinePolicy: string;       // how the org would act without the agent
  evidencePlan: { metric: string; method: EvidenceMethod }[];
};
You can start with objective as a plain-English string and mature toward a formal objective later. The important part is that the organization agrees on what "good" means and how you will prove it.
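For example, a filled-in contract as plain data (Python here; every value is illustrative), which is also what you would log as JSON for replay:

contract = {
    "decisionId": "price-sku123-cluster-north-2025w07",  # illustrative key
    "asOf": "2025-02-10T09:09:00Z",
    "objective": "maximize expected weekly margin subject to availability",
    "constraints": ["price_floor_policy", "max_weekly_change_5pct", "promo_calendar_lock"],
    "uncertaintyDrivers": ["elasticity", "competitor_response", "cannibalization"],
    "actionSurface": {"mode": "approval_ticket", "tools": ["pricing.propose_change"]},
    "baselinePolicy": "category manager weekly price review",
    "evidencePlan": [{"metric": "margin_per_unit", "method": "shadow"}],
}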
With that in place, you can talk about engines. In retail, optimization is often the first one worth shipping.
Optimization: "Best Feasible Action"
Optimization is not about finding a perfect answer. It is about finding the best action that satisfies real constraints.
A simple mental model:
- objective is what you want (margin, availability, service level)
- constraints are what you must respect (brand, legal, capacity)
- decision variables are the actions (prices, orders, allocations)
A tiny example (replenishment intuition)
If demand is uncertain, you still choose an order quantity. The decision is not "predict demand". The decision is "place an order under risk".
from dataclasses import dataclass

@dataclass(frozen=True)
class Inputs:
    unit_margin: float
    holding_cost: float
    stockout_cost: float
    demand_mean: float

def expected_utility(qty: int, x: Inputs) -> float:
    # Toy utility: trade margin vs holding vs stockout.
    # In practice: use a demand distribution + constraints + service level targets.
    sold = min(qty, x.demand_mean)
    leftover = max(0.0, qty - x.demand_mean)
    stockout = max(0.0, x.demand_mean - qty)
    return sold * x.unit_margin - leftover * x.holding_cost - stockout * x.stockout_cost
This is deliberately simple, but it makes the point: agents are about actions, trade-offs, and constraints.
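To turn the utility into an action, search the feasible set and take the argmax. A minimal sketch, assuming a hypothetical case-pack-of-10 constraint and a 300-unit cap:

x = Inputs(unit_margin=4.0, holding_cost=0.5, stockout_cost=6.0, demand_mean=130.0)
feasible = range(0, 301, 10)  # case packs of 10, capped at 300 units
best_qty = max(feasible, key=lambda q: expected_utility(q, x))
print(best_qty)  # 130: ordering exactly the expected demand wins in this toy setup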
Bayesian Reasoning: "Update Beliefs as the World Changes"
Retail does not stand still. Bayesian reasoning is a disciplined way to update beliefs when new evidence arrives.
A practical use case: demand risk or uplift estimation when you have small data, shifting conditions, or lots of noise.
def beta_posterior(alpha: float, beta: float, successes: int, failures: int) -> tuple[float, float]:
    # Conversion-rate or success-probability update.
    # Prior: Beta(alpha, beta)
    return (alpha + successes, beta + failures)

# Example: you believe conversion is ~2% with uncertainty.
# After observing outcomes, update your belief before the next action.
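A worked update under an assumed prior: Beta(2, 98) encodes roughly 2% conversion with the weight of about 100 observations.

alpha, beta = beta_posterior(2.0, 98.0, successes=30, failures=970)
posterior_mean = alpha / (alpha + beta)
print(round(posterior_mean, 4))  # ~0.0291: the evidence pulled the belief from 2% toward 3%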
You do not need Bayesian machinery everywhere. You need it wherever uncertainty should change actions.
Counterfactual Evaluation (How You Prove KPI Impact)
Retail agents fail in two opposite ways:
- they do not move KPIs, but you cannot tell why
- they appear to move KPIs, but the "win" is confounded (seasonality, promos, competitor moves)
A practical evaluation menu
| Method | When it works | Typical agent stage |
|---|---|---|
| Shadow mode | safe to compare proposals vs baseline | first rollout |
| Holdout stores/SKUs | stable segmentation available | early proof |
| A/B test | you can randomize safely | later maturity |
| Backtest | you can replay past decisions | continuous improvement |
| Quasi-experimental | no randomization possible | high-stakes orgs |
The key is not picking the fanciest method. The key is attaching an evidence plan to every decision surface.
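A minimal sketch of the simplest comparison on that menu, a holdout diff-in-means (toy numbers; without randomized assignment or covariate adjustment, treat this as a starting point, not proof):

# Weekly margin index: agent stores vs holdout stores, same weeks, same KPI definition.
treated = [102.0, 98.5, 110.2, 105.1]
holdout = [100.0, 97.0, 104.5, 101.3]

naive_lift = sum(treated) / len(treated) - sum(holdout) / len(holdout)
print(f"naive lift: {naive_lift:.2f} index points")
# Caution: anything that differs between the groups (seasonality, promos,
# competitor moves) confounds this number. Randomize or adjust before claiming a win.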
Failure Modes (And How They Show Up)
| Failure mode | What you will see | Fix |
|---|---|---|
| objective mismatch | agent optimizes the wrong thing | KPI owner + explicit objective |
| constraint drift | rules change, agent breaks silently | policy as code + versioning |
| uncertainty ignored | aggressive actions during high variance | uncertainty-aware thresholds |
| evaluation theater | dashboards, no causal story | counterfactual design up front |
Implementation Checklist (30 Days)
- Pick one decision surface and write the objective + constraints.
- Implement the action surface through a tool gateway (validation + idempotency); see the sketch after this list.
- Add uncertainty estimates for the top 1-2 drivers (demand risk, uplift risk).
- Run shadow mode and log action proposals with trace ids.
- Define one counterfactual method you can actually execute in your org.
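For the gateway item above, a minimal sketch of validation plus idempotency (all names hypothetical; in production the dedupe store must be durable, not process memory):

import hashlib
import json

_seen: set[str] = set()  # hypothetical in-memory dedupe store

def propose_price(sku: str, price: float, floor: float, ceiling: float) -> str:
    # Validation: reject anything outside policy bounds before it reaches any API.
    if not (floor <= price <= ceiling):
        raise ValueError(f"price {price} violates bounds [{floor}, {ceiling}]")
    # Idempotency: hash the action so a retried proposal is not applied twice.
    payload = json.dumps({"sku": sku, "price": price}, sort_keys=True)
    key = hashlib.sha256(payload.encode()).hexdigest()
    if key in _seen:
        return f"duplicate:{key[:8]}"
    _seen.add(key)
    return f"accepted:{key[:8]}"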
FAQ
Do I need a perfect causal setup to start?
No. Start with shadow mode and holdouts. The requirement is honesty and a baseline.
Where do LLMs fit in this stack?
LLMs can help with reasoning, summarizing evidence, and generating structured proposals. They do not replace objectives, constraints, or evaluation.
Is optimization too rigid for retail?
No. It is often the opposite: optimization is how you safely encode business flexibility while respecting hard constraints.
What is a baseline policy?
The current process (rules, human workflow, or existing model). If you do not define it, you cannot prove improvement.
Talk Abstract (You Can Reuse)
Teams often treat "AI agents" as a model problem. In retail, it is a decision problem: objectives, constraints, uncertainty, and proof.
This talk turns the 9:09am question ("So what do we do today?") into a framework you can run: a decision contract, an uncertainty-aware decision stack, and an evidence plan that survives reality. You will leave with a template you can copy, plus an evaluation menu (shadow, holdouts, backtests, A/B) for proving KPI impact without wishful thinking.
Talk title ideas:
- Decision Theory for Retail Agents: From Predictions to Actions
- Optimization + Uncertainty: The Real Engine of Retail Autonomy
- How to Prove Agent Impact: Counterfactuals for Practitioners
Next in the Series
Next: MDPs and POMDPs in Retail: Sequential Decisions, Reward Design, and Failure Modes
Series Navigation
- Previous: /blog/raom-retail-agent-operating-model
- Hub: /blog
- Next: /blog/mdp-pomdp-retail-sequential-decisions
Work With Me
- Keynote/workshop on decision contracts, uncertainty, and KPI proof (shadow, holdouts, counterfactuals): /contact (see /conferences)
- Book: /publications/foundations-of-agentic-ai-for-retail
- If you're turning decision logic into a governed product (contracts, gates, evidence): OODARIS AI