AI Security & Governance Frameworks Analysis

Analysis of four frameworks applicable to the Agents-eval multi-agent evaluation system (PydanticAI-based MAS evaluating academic papers via LLM providers).

Category: Research / Informational
Authority: security-advisories.md for CVE status; mas-security.md for MAESTRO implementation
Created: 2026-03-01

Framework Overview

Framework	Type	Certifiable	Focus	MAS Relevance
OWASP MAESTRO	Threat model	No	Multi-agent system threats (7 layers)	Direct — designed for MAS
MITRE ATLAS	Attack taxonomy	No	Adversarial tactics/techniques for AI/ML	High — maps attacker TTPs
NIST AI RMF 1.0	Risk framework	No (voluntary)	AI lifecycle risk management	Medium — governance structure
ISO 42001 / 23894	Standards	Yes (42001)	AI management system / AI risk guidance	Medium — certification path

1. OWASP MAESTRO

Source: OWASP MAESTRO v1.0
Existing coverage: Comprehensive (see mas-security.md)

7-Layer Threat Model

Layer	Focus	Key Concern
1. Model	LLM security	Prompt injection, data leakage
2. Agent Logic	Agent behavior	Input validation, type safety
3. Integration	External services	Service failures, API key exposure
4. Monitoring	Observability	Log injection, sensitive data in traces
5. Execution	Runtime safety	Resource exhaustion, race conditions
6. Environment	Infrastructure	Container isolation, secret management
7. Orchestration	Coordination	Registration hijacking, execution order

Implementation Status in Agents-eval

Controls implemented across sprints 5-6:

Layer 1: Structured outputs with Pydantic schema validation; prompt injection sanitization (tested in tests/security/)
Layer 3: SSRF prevention with domain allowlisting (src/app/utils/url_validation.py); HTTPS enforcement
Layer 4: Log scrubbing for sensitive data (API keys, tokens); structured logging with loguru
Layer 5: Per-component timeouts; bounded iteration
Layer 6: .env excluded from VCS; credentials from environment variables
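A minimal sketch of the Layer 3 HTTPS and domain-allowlist check, using only the standard library. The validate_url name and the ALLOWED_DOMAINS entries are illustrative assumptions, not the contents of src/app/utils/url_validation.py.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the project keeps its real list in its own module.
ALLOWED_DOMAINS = frozenset({"arxiv.org", "api.semanticscholar.org"})


def validate_url(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> str:
    """Return the URL unchanged if it is HTTPS and its host is allowlisted.

    Raises ValueError otherwise. Subdomains of an allowlisted domain are
    accepted (e.g. export.arxiv.org); look-alike hosts are not.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"HTTPS required: {url!r}")
    # .hostname strips userinfo and port and lowercases the host.
    host = (parsed.hostname or "").rstrip(".")
    if not host:
        raise ValueError(f"Missing host: {url!r}")
    if not any(host == d or host.endswith("." + d) for d in allowed):
        raise ValueError(f"Domain not allowlisted: {host!r}")
    return url
```

Comparing the parsed hostname, rather than substring-matching the raw URL, rejects look-alike hosts such as arxiv.org.evil.example and userinfo tricks such as https://arxiv.org@evil.example/. A hostname check alone does not stop an allowlisted server from redirecting to an internal address, so redirects also need to be disabled or re-validated in the HTTP client.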

Unique Value

Purpose-built for multi-agent architectures — no adaptation needed
Prescriptive control mappings (what to implement, not just what to watch for)
Agent lifecycle governance (provisioning, access, deprovisioning)
Direct regulatory alignment (NIST AI RMF, EU AI Act)

2. MITRE ATLAS

Source: MITRE ATLAS
Existing coverage: Minimal (name and scope referenced in security-advisories.md)

Framework Structure

ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) extends the ATT&CK methodology to AI/ML systems. It uses the same Tactics → Techniques → Procedures hierarchy, with technique IDs of the form AML.Txxxx.

Tactics cover the full ML attack lifecycle: Reconnaissance, Resource Development, Initial Access, ML Model Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Collection, ML Attack Staging, Exfiltration, and Impact, plus Agentic Behaviors (2024-2025 addition).

Key Techniques for Multi-Agent Systems

Technique	Name	MAS Relevance
AML.T0051	LLM Prompt Injection	One agent’s output becomes another’s prompt — injection propagates across the agent graph
AML.T0054	LLM Jailbreak	Bypassing safety guardrails on individual agents corrupts downstream reasoning
AML.T0056	Meta-Prompt Extraction	Recovering agent system prompts reveals orchestration logic and trust assumptions
AML.T0010	ML Supply Chain Compromise	Compromising agent frameworks (PydanticAI), tool registries (MCP), or model weights
AML.T0043	Craft Adversarial Data	Poisoning evaluation datasets to corrupt benchmark integrity
AML.T0024	Exfiltration via ML Inference API	Using agent output channels to exfiltrate sensitive data extracted during inference
AML.T0096	AI Service API Abuse	Credential theft, cost amplification, rate limit bypass on LLM provider APIs

Agentic AI Attack Surfaces (2024-2025)

ATLAS expanded to cover AI agents that take autonomous actions:

Multi-hop injection: Poisoned output from one agent cascades to downstream agents in the evaluation pipeline
Tool parameter injection: Attacker-controlled content modifies tool call arguments
Credential abuse: Agents manipulated to exfiltrate API keys via outputs or logs
Tool scope creep: Agent convinced to use tools beyond its operational envelope
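Tool scope creep can be bounded with a deny-by-default, per-agent tool allowlist checked before every dispatch. This is a framework-agnostic sketch: ToolGate, ToolScopeError, and the agent and tool names used below are hypothetical, and the sketch does not model PydanticAI's own tool registration API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet


class ToolScopeError(PermissionError):
    """Raised when an agent requests a tool outside its operational envelope."""


@dataclass
class ToolGate:
    # agent name -> tool names that agent may call (hypothetical policy)
    scopes: Dict[str, FrozenSet[str]]
    registry: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.registry[name] = fn

    def call(self, agent: str, tool: str, **kwargs: Any) -> Any:
        # Deny by default: an unknown agent gets an empty scope.
        if tool not in self.scopes.get(agent, frozenset()):
            raise ToolScopeError(f"{agent!r} may not call {tool!r}")
        if tool not in self.registry:
            raise KeyError(f"Unknown tool {tool!r}")
        return self.registry[tool](**kwargs)
```

For example, with scopes={"reviewer": frozenset({"read_paper"})}, a reviewer call to read_paper is dispatched, while a call to http_get raises ToolScopeError even if injected paper content asks for it. Keeping the check outside the model means a successful injection cannot widen the agent's scope.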

Applicability to Agents-eval (ATLAS)

Component	ATLAS Threat	Specific Risk
PydanticAI agents	AML.T0051, AML.T0054	System prompts overridden via injected inputs in evaluation datasets
PeerRead dataset ingestion	AML.T0043	Poisoned papers skew evaluation metrics
Tool registry / function calls	Agentic Behaviors	Evaluation tools (file I/O, HTTP) are attack surfaces if scope is unbounded
API credentials	AML.T0096	Prompt injection could exfiltrate keys via agent outputs or logs
Agent graph orchestration	AML.T0056	Extracted system prompts reveal orchestration logic, enabling targeted attacks that corrupt downstream assessments
Trace/artifact collection	AML.T0024	Execution traces may contain sensitive model outputs
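For the API credentials row, a common mitigation is to scrub secret-shaped strings from agent outputs and log records before they leave the process. The regex patterns below are illustrative assumptions, not the project's actual scrubbing rules: key formats change, and pattern matching misses secrets without a recognizable shape, so redacting the exact values of known environment variables is included as a complement.

```python
import os
import re

# Illustrative patterns only; extend to match the providers actually in use.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),                 # "sk-" prefixed keys
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]{16,}"),  # bearer tokens
    re.compile(r"(?i)(api[_-]?key\s*[=:]\s*)\S+"),        # api_key=value pairs
]


def scrub(text: str, env_vars: tuple = ()) -> str:
    """Replace secret-like substrings with a fixed placeholder."""
    # Redact exact values of the named environment variables first.
    for name in env_vars:
        value = os.environ.get(name)
        if value:
            text = text.replace(value, "[REDACTED]")
    for pattern in SECRET_PATTERNS:
        if pattern.groups:
            # Keep the "api_key=" prefix, redact only the value.
            text = pattern.sub(lambda m: m.group(1) + "[REDACTED]", text)
        else:
            text = pattern.sub("[REDACTED]", text)
    return text
```

With loguru, one option is to apply scrub to record["message"] in a function passed to logger.patch(), so every log call is filtered without changing call sites.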

How ATLAS Complements MAESTRO

Dimension	ATLAS	MAESTRO
Perspective	Attacker (red-team TTPs)	Defender (control mappings)
Evidence base	Real-world case studies and incidents	Prescriptive checklists
Coverage	Broad ML/AI attack surface (non-LLM included)	Multi-agent topology-specific
Detection	Per-technique detection signals	Operational monitoring controls
ATT&CK integration	Direct mapping to ATT&CK for unified threat modeling	Standalone
Regulatory alignment	Indirect	Direct (NIST, EU AI Act)

Combined use: ATLAS enumerates the threat landscape (what attacks exist); MAESTRO maps those threats to operational controls (what to implement). For example, pairing ATLAS AML.T0051 (LLM Prompt Injection) with the MAESTRO Layer 1 threat table yields both the attack-vector taxonomy and the corresponding control set.
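The combined use can be sketched as a lookup from ATLAS technique IDs (attacker view) to MAESTRO layers and implemented controls (defender view). The two entries summarize this document's tables; ControlMapping, controls_for, and coverage_gaps are hypothetical helpers, not tooling shipped by either framework.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class ControlMapping(NamedTuple):
    layer: str
    controls: Tuple[str, ...]


# ATLAS technique -> MAESTRO layer and the controls implemented for it.
ATLAS_TO_MAESTRO: Dict[str, ControlMapping] = {
    "AML.T0051": ControlMapping(
        "L1 Model",
        ("Pydantic schema validation of outputs", "prompt injection sanitization"),
    ),
    "AML.T0024": ControlMapping(
        "L4 Monitoring",
        ("log scrubbing of API keys and tokens", "structured logging"),
    ),
}


def controls_for(technique_id: str) -> Optional[ControlMapping]:
    """Return the mapped layer and controls, or None if the technique is unmapped."""
    return ATLAS_TO_MAESTRO.get(technique_id)


def coverage_gaps(technique_ids: List[str]) -> List[str]:
    """Techniques from the threat landscape that have no mapped control yet."""
    return [t for t in technique_ids if t not in ATLAS_TO_MAESTRO]
```

Running coverage_gaps over the full technique table makes unmapped threats explicit; with only the two entries above, AML.T0054 (LLM Jailbreak) would be reported as a gap.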

3. NIST AI Risk Management Framework

Source: NIST AI 100-1 (January 2023)
Companion: NIST AI 600-1 Generative AI Profile (July 2024)
Existing coverage: Brief reference in security-advisories.md

Four Core Functions

The framework organizes AI risk management into four iterative, interconnected functions applied continuously throughout the AI lifecycle.

GOVERN: Culture, Policies, Accountability

Establishes organizational structures that enable risk management. Without governance, Map/Measure/Manage are ad hoc.

Category	Description
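GOVERN 1	Policies, processes, procedures, and practices for AI risk management in place and implemented
GOVERN 2	Accountability structures in place; teams and individuals empowered, responsible, and trained
GOVERN 3	Workforce diversity, equity, inclusion, and accessibility prioritized in risk management
GOVERN 4	Organizational culture that considers and communicates AI risk
GOVERN 5	Engagement with relevant AI actors, including external feedback
GOVERN 6	Risks from third-party software, data, and supply chain addressed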

MAS application: Define risk appetite for evaluation confidence thresholds; assign risk owner; review LLM provider API terms; establish disclosure process for evaluation errors.
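A risk appetite for evaluation confidence can be made concrete as a small, reviewed policy object that routes low-confidence results to a human. RiskAppetite, Route, and the 0.7 / 0.3 thresholds are hypothetical placeholders for values the assigned risk owner would set; they are not taken from Agents-eval.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


@dataclass(frozen=True)
class RiskAppetite:
    # Hypothetical thresholds; the risk owner signs off on these values.
    min_confidence_to_publish: float = 0.7
    min_confidence_to_keep: float = 0.3
    owner: str = "unassigned"

    def route(self, confidence: float) -> Route:
        """Decide what happens to an evaluation with confidence in [0, 1]."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        if confidence >= self.min_confidence_to_publish:
            return Route.PUBLISH
        if confidence >= self.min_confidence_to_keep:
            return Route.HUMAN_REVIEW
        return Route.REJECT
```

Freezing the dataclass means thresholds cannot be changed at runtime; changing them becomes a reviewed code change, which gives the GOVERN function an audit trail.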

MAP: Context, Risk Identification, Categorization

Establishes the context in which the AI system operates and identifies risks before measurement begins.

Category	Description
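MAP 1	Context established and understood (purpose, deployment setting, stakeholders)
MAP 2	AI system categorized (tasks, methods, knowledge limits, scientific integrity and testing considerations)
MAP 3	Capabilities, targeted usage, goals, and expected benefits and costs understood
MAP 4	Risks and benefits mapped for all components, including third-party software and data
MAP 5	Impacts on individuals, groups, communities, organizations, and society characterized (likelihood and magnitude)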

MAS application: Document intended use vs foreseeable misuse; catalog AI supply chain (PydanticAI, LLM APIs, PeerRead); identify stakeholder impact (paper authors, reviewers, institutions).
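A first pass at the supply chain catalog can record the installed versions of Python dependencies with importlib.metadata. The package names passed in are assumptions about the project's dependencies; components that are not Python packages (LLM provider APIs, the PeerRead dataset) must be recorded by hand, shown here as hypothetical entries.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List


def package_inventory(packages: List[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    inventory: Dict[str, str] = {}
    for name in packages:
        try:
            inventory[name] = version(name)
        except PackageNotFoundError:
            inventory[name] = "not installed"
    return inventory


# Non-package components (hypothetical entries) are catalogued manually.
EXTERNAL_COMPONENTS = {
    "PeerRead dataset": "data source",
    "LLM provider APIs": "external service",
}

if __name__ == "__main__":
    for name, ver in package_inventory(["pydantic-ai", "pydantic", "loguru"]).items():
        print(f"{name}: {ver}")
    for name, kind in EXTERNAL_COMPONENTS.items():
        print(f"{name}: {kind}")
```

Committing this output alongside each evaluation run also supports the MANAGE item on pinning model and dependency versions.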

MEASURE: Assessment, Metrics, Monitoring

Applies quantitative and qualitative methods to analyze and track risks, converting subjective awareness into evidence-based understanding.

Category	Description
MEASURE 1	Appropriate methods and metrics identified and applied
MEASURE 2	AI system evaluated for trustworthy characteristics
MEASURE 3	Mechanisms for tracking identified risks over time in place
MEASURE 4	Feedback on measurement efficacy gathered and assessed

Trustworthiness characteristics (evaluated under MEASURE 2): valid and reliable; safe; secure and resilient; accountable and transparent; explainable and interpretable; privacy-enhanced; fair, with harmful bias managed.

MAS application: Benchmark evaluations against human ground truth; disaggregate scores for bias testing; implement confabulation detection; red-team for prompt injection via adversarial paper content.

MANAGE: Treatment, Response, Communication

Implements risk treatment decisions and establishes response processes.

Category	Description
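MANAGE 1	Risks prioritized, responded to, and managed based on MAP and MEASURE outputs
MANAGE 2	Strategies to maximize benefits and minimize negative impacts planned, implemented, documented
MANAGE 3	Risks and benefits from third-party entities managed
MANAGE 4	Risk treatments, response and recovery, and communication plans documented and monitored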

Treatment options: Avoid, Mitigate, Transfer, Accept (with documented rationale).

MAS application: Pin LLM model versions; implement provider failover; add verification agents for confabulation; attach confidence intervals to scores.
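One standard way to attach a confidence interval to a score is a percentile bootstrap over repeated evaluation runs of the same paper. This is a generic sketch, not necessarily the method Agents-eval uses; the resample count and fixed seed are illustrative choices.

```python
import random
from statistics import fmean
from typing import List, Tuple


def bootstrap_ci(
    scores: List[float],
    confidence: float = 0.95,
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    if not scores:
        raise ValueError("scores must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps reported intervals reproducible
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - confidence) / 2.0
    lo_idx = int(alpha * (n_resamples - 1))
    hi_idx = int((1.0 - alpha) * (n_resamples - 1))
    return fmean(scores), means[lo_idx], means[hi_idx]
```

With only a handful of runs the percentile bootstrap tends to produce intervals that are too narrow, so the run count should be reported alongside the interval.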

Generative AI Profile (AI 600-1)

Extends AI RMF with twelve GenAI-specific risk categories. The six most relevant to Agents-eval:

Risk Category	MAS Relevance
Confabulation	HIGH — LLM agents may fabricate paper details or citations
Data Privacy	MEDIUM — papers may contain author PII
Human-AI Configuration	HIGH — over-reliance on automated scores
Information Security	HIGH — prompt injection, credential exfiltration
Harmful Bias and Homogenization	HIGH: LLM bias in paper scoring
Value Chain / Component Integration	HIGH — three external LLM API dependencies

Agentic AI risks (AI 600-1 extensions): prompt injection in tool outputs, uncontrolled tool use, goal misalignment, multi-hop trust degradation, reduced human oversight, context window manipulation.

4. ISO/IEC 42001 and ISO/IEC 23894

Sources: ISO 42001:2023, ISO 23894:2023
Existing coverage: Brief reference in security-advisories.md

ISO 42001: AI Management System (AIMS)

First international certifiable standard for AI governance. Follows ISO High-Level Structure (Annex SL), enabling integration with ISO 27001, ISO 9001.

Key clauses (PDCA structure):

Clause	Title	Content
4	Context of the Organization	Internal/external issues, stakeholder needs, AIMS scope
5	Leadership	Top management commitment, AI policy, roles
6	Planning	Risk/opportunity assessment, AI objectives, AI impact assessment
7	Support	Resources, competence, awareness, communication
8	Operation	Lifecycle controls, data management, supplier assessment
9	Performance Evaluation	Monitoring, internal audit, management review
10	Improvement	Corrective action, continual improvement

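Annex A controls: 38 controls across 9 domains (A.2 to A.10):

Domain	Key Controls
A.2 Policies Related to AI	AI policy, alignment with other policies, policy review
A.3 Internal Organization	AI roles and responsibilities, reporting of concerns
A.4 Resources for AI Systems	Data, tooling, computing, and human resources
A.5 Assessing Impacts of AI Systems	Impact assessment process, documentation, impacts on individuals, groups, and society
A.6 AI System Life Cycle	Requirements, design, verification and validation, deployment, operation and monitoring, technical documentation, event logs
A.7 Data for AI Systems	Data acquisition, quality, provenance, preparation
A.8 Information for Interested Parties	User documentation, external reporting, incident communication
A.9 Use of AI Systems	Processes and objectives for responsible use, intended use
A.10 Third-Party and Customer Relationships	Allocation of responsibilities, suppliers, customers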

ISO 23894: AI Risk Management Guidance

Extends ISO 31000 (generic risk management) with AI-specific considerations. Guidance document (not certifiable).

Risk management process (adapted for AI):

Scope and context: AI system purpose, stakeholders, risk criteria
Risk identification: Source-based (data, algorithms, environment) and event-based (failure modes, misuse, emergent behaviors)
Risk analysis: Likelihood estimation (accounting for AI non-determinism), consequence assessment (individual, group, societal)
Risk evaluation: Compare against criteria, prioritize for treatment
Risk treatment: Avoid, Modify, Share, Retain
Monitoring: Ongoing performance monitoring, incident tracking, register updates
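Because LLM agents are non-deterministic, the likelihood step can be estimated empirically by repeating the same evaluation and counting failures (for example, schema violations or scores outside an agreed tolerance). The sketch below uses the Wilson score interval for a binomial proportion; what counts as a failure, and how many runs to make, are assumptions to be fixed in the risk criteria.

```python
from math import sqrt
from typing import Tuple


def failure_rate_interval(
    failures: int, runs: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Observed failure rate and its Wilson score interval (z=1.96 gives ~95%)."""
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    p = failures / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

Unlike the naive p ± z·sqrt(p(1-p)/n) interval, the Wilson interval stays informative at zero observed failures: 0 failures in 20 runs still yields an upper bound of roughly 16%. Comparing that upper bound, rather than the point estimate, against the risk criterion is the conservative choice during risk evaluation.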

AI-specific risk categories:

Category	Examples
Data risks	Bias in training data, data poisoning, distribution shift
Model risks	Adversarial vulnerability, lack of robustness, unexplainability
Integration risks	Automation bias, unsafe human-AI interaction, feedback loops
Operational risks	Misuse by operators, out-of-distribution deployment, model drift
Lifecycle risks	Inadequate testing, insufficient monitoring, uncontrolled updates

ISO 42001 vs ISO 23894

Dimension	ISO 42001	ISO 23894
Type	Requirements (shall)	Guidance (should)
Certifiable	Yes	No
Scope	Entire AI management system	Risk management process only
Output	AIMS with controls, Statement of Applicability (SoA), audits	Risk register, treatment plan
When to use	Certification needed; full AIMS	Building risk assessment process

Integration: ISO 23894 provides the risk methodology that ISO 42001 Clause 6.1 requires. ISO 23894 answers “how do we identify AI risks?”; ISO 42001 answers “how do we govern the entire AI management process?”

Applicability to Agents-eval (ISO)

Highest-priority ISO 42001 controls:

A.5 Assessing Impacts of AI Systems: Evaluation outputs influence research decisions, so a documented impact assessment is required
A.7 Data for AI Systems (A.7.4 quality, A.7.5 provenance): PeerRead dataset provenance, quality, and bias assessment
A.6.2.4 AI System Verification and Validation: Multi-agent evaluation accuracy validated against ground truth
A.5.4 Impact on Individuals or Groups: LLM judges inherit training-data biases, so bias testing is required
A.10.3 Suppliers: LLM API providers require supplier assessment

Highest-priority ISO 23894 risks:

Risk	Likelihood	Consequence	Treatment
LLM evaluation bias	HIGH	HIGH	Bias testing, multiple judge models, HITL validation
Specification gaming (Goodhart’s Law)	MEDIUM	HIGH	Multi-dimensional evaluation, periodic metric review
Data distribution shift	HIGH	MEDIUM	Scope documentation, out-of-distribution testing
Agent coordination failures	MEDIUM	MEDIUM	Pydantic schema enforcement, circuit breakers
Over-reliance by downstream users	MEDIUM	HIGH	Limitation documentation, confidence indicators
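The risk table above can double as a lightweight ISO 23894 risk register. The sketch ranks entries by an ordinal likelihood × consequence product; mapping LOW/MEDIUM/HIGH to 1/2/3 is an assumption (ISO 23894 does not prescribe a scoring scale), as are the status values.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Risk:
    name: str
    likelihood: Level
    consequence: Level
    treatment: str
    status: str = "open"  # hypothetical lifecycle: open -> treating -> monitored

    @property
    def score(self) -> int:
        return int(self.likelihood) * int(self.consequence)


REGISTER = [
    Risk("LLM evaluation bias", Level.HIGH, Level.HIGH,
         "Bias testing, multiple judge models, HITL validation"),
    Risk("Specification gaming (Goodhart's Law)", Level.MEDIUM, Level.HIGH,
         "Multi-dimensional evaluation, periodic metric review"),
    Risk("Data distribution shift", Level.HIGH, Level.MEDIUM,
         "Scope documentation, out-of-distribution testing"),
    Risk("Agent coordination failures", Level.MEDIUM, Level.MEDIUM,
         "Pydantic schema enforcement, circuit breakers"),
    Risk("Over-reliance by downstream users", Level.MEDIUM, Level.HIGH,
         "Limitation documentation, confidence indicators"),
]


def prioritized(register: List[Risk]) -> List[Risk]:
    """Highest score first; ties keep register order (sorted() is stable)."""
    return sorted(register, key=lambda r: r.score, reverse=True)
```

Three of the five risks tie at a score of 6, which is a known weakness of multiplicative ordinal scoring; the register should record the tie-break rule it uses rather than rely on list order.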

Cross-Framework Mapping

How the four frameworks relate to each other:

MITRE ATLAS (attack taxonomy — what adversaries do)
      |
      | informs threat identification
      v
OWASP MAESTRO (threat model — what to defend against in MAS)
      |
      | maps threats to controls
      v
NIST AI RMF (risk framework — how to govern/map/measure/manage)
      |
      | operationalized by
      v
ISO 42001 + 23894 (certifiable management system + risk methodology)

Unified Mapping Table

Concern	ATLAS Technique	MAESTRO Layer	NIST Function	ISO Control
Prompt injection	AML.T0051	L1 Model	MEASURE 2.7	A.6.2.4
API credential theft	AML.T0096	L3 Integration	MEASURE 2.7	A.10.3
Log data leakage	AML.T0024	L4 Monitoring	MEASURE 2.10	A.6.2.8
Resource exhaustion	N/A	L5 Execution	MANAGE 2	A.6.2.6
Supply chain compromise	AML.T0010	L6 Environment	MAP 4.1	A.10.3
Agent hijacking	AML.T0056	L7 Orchestration	MEASURE 2.7	A.6.2.4
Evaluation bias	AML.T0043	L2 Agent Logic	MEASURE 2.11	A.5.4

Recommendations for Agents-eval

Given the project’s open-source research context, full certification (ISO 42001) is not warranted. A lightweight alignment approach:

Continue using MAESTRO as the primary threat model — existing implementation in mas-security.md is comprehensive
Tag security tests with ATLAS technique IDs in comments or docstrings (e.g., # ATLAS: AML.T0051) to ground existing tests in the adversary taxonomy
Adopt NIST AI RMF MEASURE function for evaluation quality — benchmark against ground truth, disaggregate for bias, track confabulation rates
Implement ISO 23894 risk register as a lightweight governance artifact — track the top 5-7 risks identified above with treatment status
Document an AI impact assessment (ISO 42001 A.5 / NIST MAP) covering evaluation output consequences on research community stakeholders