
ECP Bootcamp 3.0 Judge Report

A visual, evidence-backed review surface for scoring, hard questioning, security critique, and published-work findings across all 11 teams.

11 teams | Suggested scores | Security-heavy Q&A | MITRE ATLAS lens
[Illustration: dashboard with scorecards, audit trails, AI workflow nodes, and security review panels]

Short judge cards first. Full evidence on demand.

The default view now prioritizes verdict, score logic, risk, and tricky questions instead of slide-by-slide reading.

11 teams reviewed
Dropdown plus all-team score matrix.
5 rubric areas
Problem, priority, governance, value, POC.
4 published checks
Nexa, Hoppers, Suchaiirmon, Girls Power.

Judge Quick Read

Reward
Clear use-case choice, real prototype evidence, deterministic controls for regulated work, and honest value assumptions.
Challenge
Claims such as "secure", "permission-aware", "private cloud", "autonomous", "immutable", or "ROI" are not evidence until the team can explain how each one is implemented.
Security minimum
Ask for identity, ACL filtering, secret handling, audit logs, model data boundary, tool permissions, and incident owner.
Budget minimum
Ask what is missing from Year 1 cost: integration, SME validation, model usage, monitoring, support, security review, and data cleanup.

Suggested scores are calibration guidance only. They use the Day 1 weights, which total 100: Problem 15, Priority 20, AI Governance 25, Value 20, POC Feasibility 20. Where public-code evidence contradicts slide claims, it now adjusts the AI Governance and POC Feasibility scores.
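The weighting above can be sketched as a small calculation. This is a minimal illustration, not the scoring tool used for this report: the 0 to 5 rating scale, the function name weighted_score, and the example ratings are all assumptions chosen for the example and do not describe any team.

```python
# Day 1 rubric weights as stated in the report (they sum to 100).
WEIGHTS = {
    "problem": 15,
    "priority": 20,
    "governance": 25,
    "value": 20,
    "poc": 20,
}


def weighted_score(ratings: dict[str, float], scale: float = 5.0) -> float:
    """Convert per-area ratings on a 0..scale range into a 0..100 total.

    The 0..5 default scale is an assumption made for this sketch.
    """
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing rubric areas: {sorted(missing)}")
    total = 0.0
    for area, weight in WEIGHTS.items():
        rating = ratings[area]
        if not 0 <= rating <= scale:
            raise ValueError(f"{area} rating {rating} is outside 0..{scale}")
        total += weight * rating / scale
    return round(total, 1)


# Hypothetical ratings, for illustration only.
example = {"problem": 4, "priority": 3, "governance": 2, "value": 3, "poc": 4}
print(weighted_score(example))  # 62.0
```

Because AI Governance carries the largest weight, a one-point drop there (for example after contradicting repo evidence) costs 5 points of the total, compared with 3 for Problem.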

Published Work Security Addendum

Scope checked on 23 June 2026: public repositories linked by slides or supplied by the judge, Git history hygiene, static code review, Bandit, pip-audit/npm audit, and light HTTP checks for the Hoppers live app. No secret values are displayed.

Nexa
Repo review materially weakens the security story: historical .env exposure, unauthenticated approval/admin/SAP endpoints, role checks declared but not enforced, and SSRF/TLS validation gaps.
Hoppers
Repo contains Teams bot/RAG code and committed vector data. Live Render app behaves like a static Vite site; POST /api/messages returns 404, so the deployed demo does not prove the submitted bot backend.
Suchaiirmon
Repo shows a real desktop prototype, but Docker defaults expose weak DB credentials and unauthenticated Mongo Express; crawler accepts custom URLs; dependency audit flags ChromaDB/lxml issues.
Girls Power
Only a Figma link was found. Treat as prototype/UI evidence, not executable code or security proof.
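A Git history hygiene check of the kind listed in the scope can be approximated with a short script. This is a sketch under stated assumptions, not the procedure used for this review: the file-name patterns and both function names are hypothetical, and a match only means the path deserves manual inspection. It lists paths, never file contents, so no secret values are displayed.

```python
import subprocess
from pathlib import PurePosixPath

# Assumed patterns for this sketch; a real review would use a fuller list.
SENSITIVE_NAMES = {".env", "id_rsa", "credentials.json"}
SENSITIVE_SUFFIXES = {".pem", ".key", ".pfx"}


def flag_sensitive_paths(paths: list[str]) -> list[str]:
    """Return the unique paths whose file name looks like secret material."""
    flagged = set()
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # git prints blank lines between commits
        name = PurePosixPath(path).name
        if (
            name in SENSITIVE_NAMES
            or name.startswith(".env.")
            or PurePosixPath(name).suffix in SENSITIVE_SUFFIXES
        ):
            flagged.add(path)
    return sorted(flagged)


def paths_in_history(repo_dir: str) -> list[str]:
    """List every path touched by any commit on any ref of a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--all", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Example with hard-coded paths, so it runs without a repository.
sample = ["src/app.py", "backend/.env", "", "certs/server.key", "backend/.env"]
print(flag_sensitive_paths(sample))  # ['backend/.env', 'certs/server.key']
```

A file deleted in a later commit still appears in this listing, which is why a historical .env exposure remains a finding even when the current tree looks clean.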

Day 1 Rubric Anchor

Business Problem Clarity & Relevance 15%
Real enterprise problem, impacted stakeholders, urgency, and why leadership should care.
Ability to Prioritize Work Items 20%
Justifies the chosen use case over alternatives using value, risk, and feasibility trade-offs.
AI Partnership & Governance Awareness 25%
Positions AI as productivity support with human accountability, risk awareness, governance, monitoring, and controls.
Quantify Potential Value 20%
Quantifies hard/soft value and explains assumptions, methodology, and measurable outcomes.
Feasibility of POC / Prototype 20%
Shows realistic scope, execution path, prototype evidence, and ability to defend the plan in Q&A.

What Good Looks Like: Enterprise RAG And Agentic AI Controls

Use this as the interpretation layer. A team saying "permission-aware RAG" or "secure AI agent" should be able to explain these controls without hand-waving.

Before indexing
Inventory source owners, classify data, run DLP/secret scans, remove passwords/API keys, capture source ACLs/groups/sensitivity labels, and reject sources without ownership or freshness rules.
Vector/index metadata
Each chunk should carry tenant/site, source system, document ID, owner, classification, sensitivity label, allowed groups/users, effective date, version, and expiry/review date.
Retrieval-time security trimming
The query must include the current user's identity/groups and filter out unauthorized chunks before semantic search results reach the LLM. Qdrant-style payload filters, Azure AI Search ACL filters, Pinecone metadata filters, or equivalent controls are expected.
Generation-time containment
The LLM should receive only authorized chunks, cite sources, refuse unsupported answers, redact sensitive content, and avoid turning retrieved document text into executable instructions.
Agent/tool boundary
Agents should use read-only tools by default. Write actions, workflow approvals, ERP updates, ticket changes, code changes, and file exports need separate authorization, human approval, and immutable audit logs.
Operations and proof
Ask for permission-change sync timing, access recertification, stale-document workflow, prompt-injection tests, hallucination evals, cost quotas, monitoring alerts, incident owner, and rollback path.
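The metadata and retrieval-time trimming controls above can be sketched in a few lines. This is an in-memory illustration under assumptions: the Chunk fields, the authorized_top_k function, and the sample documents are hypothetical, and a production system would push the same filter into the vector store query (for example a payload or metadata filter) rather than filtering after the search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ACL captured from the source before indexing
    score: float               # similarity score from the vector search


def authorized_top_k(candidates, user_groups, top_k=3):
    """Keep only chunks the user may read, then rank by similarity.

    Deny by default: a chunk with no allowed groups is never returned.
    """
    user = frozenset(user_groups)
    permitted = [c for c in candidates if c.allowed_groups & user]
    permitted.sort(key=lambda c: c.score, reverse=True)
    return permitted[:top_k]


# Hypothetical search results for a user who belongs only to "finance".
candidates = [
    Chunk("sop-017", "Line changeover procedure", frozenset({"production"}), 0.91),
    Chunk("fin-204", "Quarterly cost breakdown", frozenset({"finance"}), 0.88),
    Chunk("hr-009", "Salary bands", frozenset({"hr"}), 0.95),
    Chunk("sop-021", "Safety checklist", frozenset({"production", "finance"}), 0.70),
]
context = authorized_top_k(candidates, {"finance"})
print([c.doc_id for c in context])  # ['fin-204', 'sop-021']
```

The highest-scoring chunk (hr-009) never reaches the prompt, which is the behavior a judge should ask teams to demonstrate: relevance ranking happens only inside the set the user is already allowed to read.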

Additional Judge Lenses

Data readiness and source quality
Does the team prove that source data is complete, current, authorized, and governed?
Human-in-the-loop accountability
Is the human role a real decision point, or decorative sign-off after automation has effectively decided?
Access control and least privilege
Are roles, permissions, approval rights, data scopes, and privileged actions explicit?
Auditability and traceability
Can every AI recommendation be traced to input data, source version, model/prompt version, reviewer, and decision?
Hallucination and model-risk controls
Are retrieval grounding, confidence thresholds, deterministic calculations, test sets, and escalation paths defined?
Regulatory and legal exposure
Does the team identify which outputs can create compliance, customs, financial, production, or safety risk?
Integration burden
Does the POC account for ERP/SAP/e2open/Teams/SharePoint/MES/CI-CD integration and ownership?
Operational ownership
Who maintains sources, rules, prompts, models, logs, monitoring, incidents, and user support after the pitch?
Budget and ROI credibility
Are benefits based on validated volumes, rates, error probabilities, and implementation/operating costs?
Adoption and change management
Will users trust it, use it, and avoid duplicating the old manual process?
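The auditability lens above asks whether each recommendation can be traced to its inputs, versions, reviewer, and decision. One way to make such a trail tamper-evident is a hash chain, sketched below. The field names, the append_entry and verify functions, and the sample values are assumptions for illustration; a hash chain detects edits but does not by itself make a log immutable, so storage controls are still needed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(body: dict) -> str:
    """Hash a record body with a stable key order."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list, entry: dict) -> dict:
    """Append an audit entry that commits to the previous record's hash."""
    if "hash" in entry or "prev_hash" in entry:
        raise ValueError("entry must not set hash or prev_hash")
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"prev_hash": prev_hash, **entry}
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


# Hypothetical entries carrying the trace fields named in the lens.
log = []
append_entry(log, {"input_ref": "po-1001", "source_version": "v3",
                   "model_prompt_version": "m1/p7", "reviewer": "analyst-a",
                   "decision": "approved"})
append_entry(log, {"input_ref": "po-1002", "source_version": "v3",
                   "model_prompt_version": "m1/p7", "reviewer": "analyst-b",
                   "decision": "escalated"})
print(verify(log))  # True
```

A useful follow-up question for any team claiming an immutable audit log is who can write to the store that holds these records, and whether the latest hash is anchored anywhere outside it.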

MITRE ATLAS Supplemental AI Threat Lens

Use this as an AI-security challenge lens, not a replacement for the Day 1 rubric. Source: MITRE ATLAS, Adversarial Threat Landscape for AI Systems, release 2026.05, modified 2026-05-27. External framework evidence is separate from slide evidence.

LLM Prompt Injection (AML.T0051)
Can supplier PDFs, SOPs, tickets, web pages, or user prompts override instructions, expose data, or trigger unsafe actions?
LLM Data Leakage (AML.T0057)
Can the model reveal restricted SOPs, trade data, source code, prompts, credentials, or decision history?
Unsecured Credentials (AML.T0055)
Are secrets, environment variables, API keys, bot credentials, or service accounts kept out of repositories and logs?
AI Agent Tool Invocation (AML.T0053)
Can an agent call ERP, workflow, email, storage, CI/CD, or ticketing tools beyond the user's authority?
Exfiltration via AI Agent Tool Invocation (AML.T0086)
Can a tool-enabled agent send data to an unauthorized destination through a permitted integration?
Cost Harvesting (AML.T0034)
Can high-volume, high-complexity, or agent fan-out requests create unbounded cloud/model costs?
AI Supply Chain Compromise (AML.T0010)
Are models, packages, plugins, datasets, and agent tools verified before being trusted?
Restrict Agent Tool Invocation On Untrusted Data (AML.M0030)
Do controls block tool execution when instructions originate from untrusted documents or user content?
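The tool-invocation questions above reduce to a gate that runs before any agent tool call. The sketch below is one possible shape under assumptions: the ToolCall fields, the source labels, and the gate function are hypothetical and are not defined by MITRE ATLAS. It denies calls whose instruction came from untrusted content, denies tools outside the user's own permissions, and routes write actions to human approval.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    tool: str
    is_write: bool
    instruction_source: str      # where the instruction to call the tool came from
    user_permissions: frozenset  # tools the requesting user may use


# Assumed labels: anything else (retrieved documents, web pages, uploads) is untrusted.
TRUSTED_SOURCES = {"operator_policy", "authenticated_user"}


def gate(call: ToolCall) -> Decision:
    """Decide whether a proposed tool call may run."""
    if call.instruction_source not in TRUSTED_SOURCES:
        return Decision.DENY  # instruction injected via untrusted data
    if call.tool not in call.user_permissions:
        return Decision.DENY  # agent must not exceed the user's authority
    if call.is_write:
        return Decision.NEEDS_APPROVAL  # write actions need a human decision
    return Decision.ALLOW  # read-only call within the user's permissions


perms = frozenset({"erp.read", "erp.update"})
print(gate(ToolCall("erp.read", False, "authenticated_user", perms)).value)    # allow
print(gate(ToolCall("erp.update", True, "authenticated_user", perms)).value)   # needs_approval
print(gate(ToolCall("erp.update", True, "retrieved_document", perms)).value)   # deny
print(gate(ToolCall("email.send", True, "authenticated_user", perms)).value)   # deny
```

The ordering is deliberate: provenance is checked first, so an instruction hidden in a supplier PDF is refused even when the user would have been allowed to make the same call directly.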

Suggested Score Ranking

Vertical scorecards are easier to scan while judging: rank, score, rubric split, judge read, score rationale, best question, and value claim stay together per team.

Default view is the short judge briefing. Open the detail sections only when you need backup evidence.