Short judge cards first. Full evidence on demand.
The default view now prioritizes verdict, score logic, risk, and tricky questions instead of slide-by-slide reading.
Judge Quick Read
Clear use-case choice, real prototype evidence, deterministic controls for regulated work, and honest value assumptions.
Claims like secure, permission-aware, private cloud, autonomous, immutable, or ROI are not evidence until they can explain implementation.
Ask for identity, ACL filtering, secret handling, audit logs, model data boundary, tool permissions, and incident owner.
Ask what is missing from Year 1 cost: integration, SME validation, model usage, monitoring, support, security review, and data cleanup.
Suggested scores are calibration guidance only, using the Day 1 weights: Problem 15, Priority 20, AI Governance 25, Value 20, POC Feasibility 20. Public-code evidence now adjusts governance and feasibility where it contradicts slide claims.
Published Work Security Addendum
Scope checked on 23 June 2026: public repositories linked by slides or supplied by the judge, Git history hygiene, static code review, Bandit, pip-audit/npm audit, and light HTTP checks for the Hoppers live app. No secret values are displayed.
Repo review materially weakens the security story: historical .env exposure, unauthenticated approval/admin/SAP endpoints, role checks declared but not enforced, SSRF/TLS validation gaps.
Repo contains Teams bot/RAG code and committed vector data. Live Render app behaves like a static Vite site; POST /api/messages returns 404, so the deployed demo does not prove the submitted bot backend.
Repo shows a real desktop prototype, but Docker defaults expose weak DB credentials and unauthenticated Mongo Express; crawler accepts custom URLs; dependency audit flags ChromaDB/lxml issues.
Only a Figma link was found. Treat as prototype/UI evidence, not executable code or security proof.
Day 1 Rubric Anchor
What Good Looks Like: Enterprise RAG And Agentic AI Controls
Use this as the interpretation layer. A team saying "permission-aware RAG" or "secure AI agent" should be able to explain these controls without hand-waving.
Inventory source owners, classify data, run DLP/secret scans, remove passwords/API keys, capture source ACLs/groups/sensitivity labels, and reject sources without ownership or freshness rules.
Each chunk should carry tenant/site, source system, document ID, owner, classification, sensitivity label, allowed groups/users, effective date, version, and expiry/review date.
The query must include the current user's identity/groups and filter out unauthorized chunks before semantic search results reach the LLM. Qdrant-style payload filters, Azure AI Search ACL filters, Pinecone metadata filters, or equivalent controls are expected.
The LLM should receive only authorized chunks, cite sources, refuse unsupported answers, redact sensitive content, and avoid turning retrieved document text into executable instructions.
Agents should use read-only tools by default. Write actions, workflow approvals, ERP updates, ticket changes, code changes, and file exports need separate authorization, human approval, and immutable audit logs.
Ask for permission-change sync timing, access recertification, stale-document workflow, prompt-injection tests, hallucination evals, cost quotas, monitoring alerts, incident owner, and rollback path.
Additional Judge Lenses
Does the team prove that source data is complete, current, authorized, and governed?
Is the human role a real decision point, or decorative sign-off after automation has effectively decided?
Are roles, permissions, approval rights, data scopes, and privileged actions explicit?
Can every AI recommendation be traced to input data, source version, model/prompt version, reviewer, and decision?
Are retrieval grounding, confidence thresholds, deterministic calculations, test sets, and escalation paths defined?
Does the team identify which outputs can create compliance, customs, financial, production, or safety risk?
Does the POC account for ERP/SAP/e2open/Teams/SharePoint/MES/CI-CD integration and ownership?
Who maintains sources, rules, prompts, models, logs, monitoring, incidents, and user support after the pitch?
Are benefits based on validated volumes, rates, error probabilities, and implementation/operating costs?
Will users trust it, use it, and avoid duplicating the old manual process?
MITRE ATLAS Supplemental AI Threat Lens
Use this as an AI-security challenge lens, not a replacement for the Day 1 rubric. Source: MITRE ATLAS, Adversarial Threat Landscape for AI Systems, release 2026.05, modified 2026-05-27. External framework evidence is separate from slide evidence.
Can supplier PDFs, SOPs, tickets, web pages, or user prompts override instructions, expose data, or trigger unsafe actions?
Can the model reveal restricted SOPs, trade data, source code, prompts, credentials, or decision history?
Are secrets, environment variables, API keys, bot credentials, or service accounts kept out of repositories and logs?
Can an agent call ERP, workflow, email, storage, CI/CD, or ticketing tools beyond the user's authority?
Can a tool-enabled agent send data to an unauthorized destination through a permitted integration?
Can high-volume, high-complexity, or agent fan-out requests create unbounded cloud/model costs?
Are models, packages, plugins, datasets, and agent tools verified before being trusted?
Do controls block tool execution when instructions originate from untrusted documents or user content?
Suggested Score Ranking
Vertical scorecards are easier to scan while judging: rank, score, rubric split, judge read, score rationale, best question, and value claim stay together per team.