Federal teams already running an RMF program don't need a parallel universe for AI — they need a crosswalk. This guide maps the four NIST AI RMF functions (Govern, Map, Measure, Manage) to the NIST SP 800-53 control families you already implement, and shows where traditional controls like AC and AU need to evolve when the system in scope is an LLM or agentic AI.
Why a crosswalk, not a rewrite
Federal contractors and program offices already document, assess, and authorize systems against NIST SP 800-53. The AI RMF is intentionally compatible with that catalog — it is a risk framework layered on top, not a replacement. Treating it as a separate compliance regime duplicates work and creates two sets of evidence the AO has to reconcile.
The practical move is to extend your existing control implementations to cover AI-specific failure modes, then map the resulting evidence back to the AI RMF's Govern / Map / Measure / Manage structure. One set of controls, two views: 800-53 for authorization, AI RMF for trustworthy-AI assurance.
Govern → PM, PL, PT, RA, SR
The Govern function lines up with the program-management and planning families. PM (Program Management) and PL (Planning) cover the AI risk management policy, roles, and decision rights. PT (PII Processing and Transparency) and RA (Risk Assessment) cover transparency obligations and the risk-tolerance framing the AI RMF expects an accountable executive to set.
SR (Supply Chain Risk Management) is where Govern gets sharper for AI. Foundation models, embeddings APIs, fine-tuning datasets, and vector stores are all third-party dependencies. Your existing SR-3 and SR-6 evidence should be extended to cover model providers, dataset provenance, and the contractual terms governing training-data use.
Map → CA, CM, RA, SA
The Map function — understanding the system, its context, and its risks — maps cleanly to CA (Assessment, Authorization, and Monitoring), CM (Configuration Management), RA (Risk Assessment), and SA (System and Services Acquisition). The SSPP system description, authorization boundary, and data-flow diagrams are already Map artifacts; they just need to be extended with the AI-specific context: model type, training data sources, intended users, and out-of-scope use cases.
SA-8 (Security and Privacy Engineering Principles) and SA-15 (Development Process, Standards, and Tools) are where you document AI-specific design decisions: retrieval architecture, tool permissions, prompt construction, and human-in-the-loop checkpoints. RA-3 risk assessments should explicitly enumerate AI failure modes — prompt injection, training-data leakage, hallucinations, bias — alongside traditional threats.
Measure → CA, RA, SI, SR
Measure is about testing for the risks you mapped. CA-2 (Control Assessments) and CA-8 (Penetration Testing) are the natural home for adversarial AI evaluations: prompt-injection corpora, jailbreak suites, tool-abuse scenarios, and training-data extraction attempts. Run them against the model in its real deployment configuration — system prompt, retrieval sources, and tools attached — not in a sanitized lab.
SI-4 (System Monitoring) covers ongoing measurement: input/output logging, drift detection, and anomaly alerts on model behavior. RA-5 (Vulnerability Monitoring) extends to the AI supply chain — new model versions, dataset changes, and newly disclosed jailbreak techniques. Document methodology, test data, and results so the evidence is reusable across both 800-53 assessments and AI RMF reviews.
Manage → CA, CP, IR, RA, SI
Manage is treatment and continuous monitoring. The POA&M (CA-5) and continuous monitoring strategy (CA-7) you already maintain are the right home for AI risk treatments — input validation, output filtering, least-privilege tool scopes, human review gates. IR (Incident Response) extends to AI-specific incidents: confirmed prompt-injection exploitation, unsafe tool actions, leaked training data, or model-output harms that trigger the IR-4 process.
CP (Contingency Planning) is worth a second look for agentic systems. What happens when the model provider has an outage, deprecates a model, or changes safety behavior in a new version? Your CP-2 plan should address model-availability and model-behavior contingencies, not just infrastructure failover.
Where traditional controls evolve: AC and AU
AC (Access Control) was written for users and processes acting on data. For LLMs, the access decision often happens implicitly — the model decides which retrieved document to surface, which tool to call, and which fields to include in a response. AC-3 enforcement and AC-6 least privilege now apply to tool scopes (the minimum set of actions the model can take) and retrieval scopes (the minimum corpus it can read). AC-4 information-flow enforcement extends to preventing prompt content from one tenant influencing another's outputs.
AU (Audit and Accountability) gets harder and more important. Traditional audit logs assume deterministic actions you can replay. LLM interactions need richer logging: full prompts (including system and retrieved context), model version, sampling parameters, tool calls and their arguments, and final outputs. AU-2 event selection should explicitly enumerate AI events — model invocations, tool calls, safety-filter triggers, and refusals — so AU-6 review can detect abuse patterns that don't look like traditional intrusions.
Other families worth revisiting
SC (System and Communications Protection): SC-7 boundary protection now includes the model-provider API boundary and any retrieval source. SC-8 protects prompts and outputs in transit, including to third-party model APIs. SI (System and Information Integrity): SI-10 input validation extends to prompt construction; SI-15 output filtering is the natural home for response moderation and PII redaction on generated content.
MP (Media Protection) and PT (PII Processing) deserve a fresh review whenever a model is fine-tuned on agency data — the fine-tuned weights are themselves a form of media containing distilled training content, and the disposal, sharing, and access controls on those weights need to match the sensitivity of the source data.
How to use this in your authorization package
Don't create a separate AI control catalog. In your SSPP, extend the implementation statements for the families above with AI-specific language and evidence. In your SAR, include adversarial AI test results under the relevant CA-2 and CA-8 sections. In your POA&M, track AI-specific findings with the same rigor as any other control gap.
Then, for stakeholders who want the AI RMF view — customers, oversight bodies, internal AI governance committees — produce a crosswalk artifact that maps your 800-53 evidence to Govern / Map / Measure / Manage. One implementation, two presentations. That is the cleanest way to demonstrate trustworthy AI without doubling the compliance workload.
Related service
AI security assessment
AGILE ARMORY