This year in tech began with the rise of Agentic AI. A little less than two months into 2026, the AI debate has already been hijacked by AI agents, their capabilities, and their benefits for businesses. Between agents inventing Crustafarian religions overnight and sci-fi scenarios, a more prosaic set of questions emerges, to name a few: the governance risks of delegating tasks to machines, the impact on the human workforce, the increased need for human control and oversight.

Since I am allergic to any form of tech hype, I will not give in to the narrative that sees AI agents taking over the planet by Christmas at the latest. But companies are indeed exploring the possibility of implementing AI agents to optimise workflows. The growing interest in these solutions seems confirmed by the surfacing of Agentic AI governance frameworks. Let's see a couple of them.

Singapore's early move on Agentic AI governance

In January 2026, Singapore's Infocomm Media Development Authority ("IMDA") published its Agentic AI governance framework.

First of all, the (voluntary) framework acknowledges that the agents' "access to sensitive data and ability to make changes to their environment" raises a whole new profile of risks. The complex interactions among agents substantially increase the risk of outcomes becoming more unpredictable. Since agents may be performing financial transactions or altering databases containing personal data, the magnitude of these potential risks cannot be minimised.

Singapore's model is not about rewriting governance but about adapting AI considerations and translating them for agents. For instance, the principles of fairness and transparency continue to apply more than ever. So too do human accountability, oversight, and control, which need to be continuously implemented across the AI lifecycle, to the extent possible.

Agentic AI risks

Singapore's framework recognises that Agentic AI risks are not too dissimilar from the traditional LLM-related risks (SQL and prompt injection, hallucination, bias, data leakage, etc.). What changes is the way they manifest themselves: an agent may hallucinate by making a wrong plan to complete a task, or at a later stage, during execution, by calling non-existent tools or calling them in a biased manner.

Risks are even higher when agents interact with each other. A mistake by one agent may produce a cascading effect if the wrong output is passed on to other agents and propagates across the system. As mentioned above, complex interactions may lead to unpredictable outcomes and unexpected bottlenecks in the chain of actions.

The model identifies five key categories of potentially harmful risks:

- Erroneous action. Imagine an AI agent failing to escalate an IT incident to human operators because the anomaly detected does not match predefined thresholds. Depending on the context, the wrongful action may cause system compromise.
- Unauthorised actions. This risk arises when an agent takes actions that sit outside of its permitted scope.
- Biased or unfair actions. We are familiar with bias, a frequent problem with traditional AI, especially binary classification models. The rationale here is the same: think of an agent making a biased hiring decision.
- Data breaches. A classic scenario is an agent inadvertently disclosing sensitive information without recognising it as sensitive, or a security breach by malicious actors who gain access to private information via agents.
- Disruption to connected systems. This risk covers cases where a wrongful action by an agent interacting with other systems propagates, disrupting the flow of information or actions (e.g., mistakenly deleting a production codebase).

Governance model

The IMDA's Agentic AI governance model is based on four pillars.

1. Assessing risks upfront

Essentially, this step involves determining risks and use cases for agent deployment, and designing a risk control system.

Central to determining use cases is the identification of risk, described as a function of impact and likelihood (music to my risk management ears…), and threat modelling. The model illustrates a series of factors affecting the potential impact of AI agents (deployment domain, access to sensitive data and external systems, scope and reversibility of agents' actions) and their likelihood (agents' level of autonomy, task complexity). In IMDA's view, threat modelling is complementary to risk assessments, inasmuch as it identifies potential external attack scenarios. Common threats include memory poisoning, tool misuse, and privilege compromise.

The next logical step is to define agents' limits and permissions. This means producing policies, procedures, and protocols that clearly outline the limits of agents in terms of access to tools and systems, their level of autonomy, and area of impact (e.g., deploying agents in "self-contained environments" with limited network and data access, particularly when they are carrying out high-risk tasks such as code execution).

The problem of agents' identity management and access control is trickier, as current authentication systems designed for humans do not smoothly translate to complex systems like AI agents. While new solutions and standards are being developed to address this issue, a mix of traditional identity and access management and human supervision is required.
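The pattern behind these recommendations – an explicit tool allow-list, a sandboxed-by-default environment, and escalation of out-of-scope actions to a human – can be sketched in a few lines of code. The following is a minimal, illustrative sketch only; the class, field, and tool names are hypothetical and are not taken from the IMDA framework.

```python
"""Illustrative sketch: a least-privilege permission policy for an AI agent.
All names (AgentPolicy, ToolRequest, the tools) are hypothetical examples."""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Declares what a single agent is allowed to do before it is deployed."""
    agent_id: str
    allowed_tools: frozenset[str]   # explicit tool allow-list
    autonomy_level: str             # e.g. "suggest-only", "act-with-approval", "autonomous"
    sandboxed: bool = True          # run in a self-contained environment by default
    network_access: bool = False    # no outbound network unless explicitly granted


@dataclass
class ToolRequest:
    agent_id: str
    tool: str
    arguments: dict = field(default_factory=dict)


def authorise(request: ToolRequest, policy: AgentPolicy) -> str:
    """Return 'allow', 'deny', or 'escalate' for a single tool call."""
    if request.agent_id != policy.agent_id:
        return "deny"        # the caller's identity does not match the policy
    if request.tool not in policy.allowed_tools:
        return "escalate"    # out-of-scope action: hand over to a human
    if policy.autonomy_level == "act-with-approval":
        return "escalate"    # allowed tool, but human approval is still required
    return "allow"


if __name__ == "__main__":
    policy = AgentPolicy(
        agent_id="it-helpdesk-agent",
        allowed_tools=frozenset({"read_ticket", "post_comment"}),
        autonomy_level="autonomous",
    )
    print(authorise(ToolRequest("it-helpdesk-agent", "post_comment"), policy))     # allow
    print(authorise(ToolRequest("it-helpdesk-agent", "delete_database"), policy))  # escalate
```

The conservative default – escalating rather than executing when a request falls outside the declared scope – mirrors the framework's emphasis on human oversight for out-of-scope actions.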
2. Making humans truly accountable

The second pillar concerns establishing clear responsibilities within and outside the organisation, and enabling meaningful human oversight. IMDA's fundamental premise is that organisations and individuals remain accountable for their agents' actions.

Within the organisation, responsibilities should be defined for:

a) key decision makers, including setting agents' high-level goals, limits, and the overall governance approach;
b) product teams, including defining agents' requirements, design, controls, safe implementation, and monitoring;
c) the cybersecurity team, including establishing baseline security guardrails and security testing procedures;
d) users, including ensuring responsible use of agents and complying with relevant policies.

Outside actors may include, for instance, model developers or agentic AI providers, and for these too the organisation should set clear responsibilities.

Designing meaningful human oversight involves three measures. First, companies need to define action boundaries requiring human approval, such as high-stakes or irreversible actions (editing sensitive data or permanently deleting data), or outlier and atypical behaviours (agents acting beyond their scope). Secondly, they must ensure the continued effectiveness of human oversight, for instance by training humans to identify common failure modes and regularly auditing human control practices. Finally, they should introduce automated real-time alert monitoring (a minimal sketch of this pattern appears at the end of this section).

3. Implementing technical and control processes

On top of the traditional LLM-related technical controls, the third pillar recommends adding new controls required by the novelty of Agentic AI across the lifecycle.

For instance, companies should introduce strict pre-deployment controls, using test agents to observe how actual agents will operate once deployed. Companies should take a holistic approach when testing agents, including evaluating new risks, workflows, and realistic environments across datasets, and evaluating test results at scale. Just like traditional AI, agents should be continuously monitored and tested post-deployment, so that humans can intervene in real time and debug where necessary. This will not be easy, as agents work at speed and companies may struggle to keep up.

4. Enabling end-user responsibility

Finally, to ensure responsibility and accountability of end users – that is, those who will use and rely on AI agents – companies should focus on transparency (communicating agents' capabilities and limitations) and education (training users on proper use and oversight of agents). Organisations may focus on transparency for users who interact with agents (external-facing users, such as customer service or HR agents) and on education for users who integrate agents into their work processes (internal-facing users, such as coding assistants).
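As a rough illustration of the approval boundaries and real-time alerts described under the second pillar, here is a minimal sketch. All function names, action categories, and thresholds are hypothetical; the point is simply the pattern of pausing high-stakes or irreversible actions for a human decision and alerting on out-of-scope behaviour.

```python
"""Illustrative sketch: a human-approval boundary with real-time alerting.
Function and category names are hypothetical examples."""
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-oversight")

# Actions the organisation has classified as high-stakes or irreversible.
REQUIRES_HUMAN_APPROVAL = {"edit_sensitive_data", "permanently_delete_data", "send_payment"}


def request_human_approval(action: str, details: dict) -> bool:
    """Stand-in for a real approval workflow (ticket, chat prompt, dashboard)."""
    log.info("Approval requested for %s: %s", action, details)
    return False  # conservative default: nothing proceeds without an explicit 'yes'


def execute_with_oversight(action: str, details: dict, in_scope: bool) -> str:
    """Run an agent action only if it passes the oversight boundary."""
    if not in_scope:
        log.warning("ALERT: agent attempted out-of-scope action %r", action)
        return "blocked"                      # atypical behaviour: block and alert
    if action in REQUIRES_HUMAN_APPROVAL:
        approved = request_human_approval(action, details)
        return "executed" if approved else "pending-approval"
    return "executed"                         # low-stakes, reversible action


if __name__ == "__main__":
    print(execute_with_oversight("summarise_ticket", {"id": 42}, in_scope=True))
    print(execute_with_oversight("permanently_delete_data", {"table": "users"}, in_scope=True))
    print(execute_with_oversight("send_payment", {"amount": 10_000}, in_scope=False))
```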
UC Berkeley's Agentic AI framework

In February 2026, a group of researchers from UC Berkeley's Center for Long-Term Cybersecurity published the Agentic AI Risk-Management Standards Profile, a risk framework broadly reflecting the NIST AI Risk Management Framework (AI RMF).

Similarly to IMDA, the paper recognised the increased risks introduced by agents, including "unintended goal pursuit, unauthorized privilege escalation or resource acquisition, and other behaviors, such as self-replication or resistance to shutdown". These unique challenges "complicate traditional, model-centric risk-management approaches and demand system-level governance".

UC Berkeley's framework was explicitly designed for developers and deployers of single- and multi-agent AI systems. However, the authors say, it can also be used by policymakers and regulators "to assess whether agentic AI systems have been designed, evaluated, and deployed in line with leading risk-management practices".

Agentic AI risks

Compared to IMDA, the paper identifies a broader array of risks:

- Discrimination and toxicity, including feedback loops, propagation of toxic content, and disparities in availability, quality, and capability of agents.
- Privacy and security, including unintended disclosure of personal or sensitive data, data leakage, and resulting misaligned outcomes.
- Misinformation, especially when hallucination and erroneous outputs from one agent are reused by other agents.
- Malicious actors and misuse, including easier execution of complex attacks, automated misuse, mass manipulation, fraud, and coordinated influence campaigns.
- Human-computer interaction, such as reduced human oversight, socially persuasive behaviour, and users' difficulty in understanding or contesting agent behaviours.
- Loss of control, comprising oversight subversion, rapid execution outrunning monitoring and response, and behaviours that undermine shutdown or containment mechanisms.
- Socioeconomic and environmental harms, including inequalities in accessing agentic capabilities, collective disempowerment, and large-scale economic and environmental impacts.
- AI system safety, failures, and limitations, including autonomous replication, misalignment, deception, collusion, goal-driven planning, real-world impact, and insufficient human oversight.
Focus on human control

Much like IMDA, UC Berkeley's standards primarily aim to enhance human oversight, focusing on:

- Human control and accountability (clear roles and responsibilities, including role definitions, intervention checkpoints, escalation pathways, and shutdown mechanisms)
- System-level risk assessment (particularly useful for multi-agent interactions, tool use, and environment access)
- Continuous monitoring and post-deployment oversight (agentic behaviour may evolve over time and across contexts)
- Defence-in-depth and containment (treating agents as untrusted entities due to the limitations of current evaluation techniques)
- Transparency and documentation (clear communication of system boundaries, limitations, and risk-mitigation decisions to stakeholders)

The authors acknowledge the limitations of their own standard. Firstly, Agentic AI taxonomies vary widely and are inconsistently applied across the world, which limits "the ability to harmonize recommendations across organizations and jurisdictions". Secondly, complex multi-system behaviour and increased autonomy make it difficult to ensure robust human control and the correct attribution of liability. Finally, many risk metrics remain underdeveloped, especially "with respect to emergent behaviours, deceptive alignment, and long-term harms".

For this reason, the authors warn, the paper adopts a "precautionary approach, emphasizing conservative assumptions, layered safeguards, and continuous reassessment". Rather than a static governance checklist, it should be viewed as "a living framework intended to evolve alongside agentic AI research, deployment practices, and governance norms".

NIST design

As mentioned above, the framework's design mirrors that of the NIST AI RMF, structuring the Agentic AI effort around the four core functions: Govern, Map, Measure, and Manage. This is an intentional decision by the authors, intended to help companies apply risk management procedures on a structure they are familiar with and to build a framework that is consistent with existing practices.
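Picking up the "intervention checkpoints, escalation pathways, and shutdown mechanisms" and "continuous monitoring" items from the list above, the underlying idea can be pictured as a loop that re-checks an agent's behaviour between steps. The following is a minimal, illustrative sketch with hypothetical names and thresholds; it is not part of the UC Berkeley profile.

```python
"""Illustrative sketch: a monitoring loop with an intervention checkpoint and a
shutdown mechanism. Names and thresholds are hypothetical examples."""
from dataclasses import dataclass


@dataclass
class AgentStatus:
    steps_taken: int
    errors: int
    budget_spent: float


def checkpoint(status: AgentStatus, max_steps: int = 50, max_errors: int = 3,
               budget_limit: float = 100.0) -> str:
    """Decide, between agent steps, whether to continue, escalate, or shut down."""
    if status.errors >= max_errors:
        return "shutdown"      # containment: repeated failures, stop the agent
    if status.budget_spent > budget_limit:
        return "shutdown"      # resource use beyond the agreed budget
    if status.steps_taken >= max_steps:
        return "escalate"      # long-running task: hand control back to a human
    return "continue"


def run_with_oversight(max_iterations: int = 200) -> None:
    """Toy loop: one agent 'step' per iteration, with a checkpoint after each step."""
    status = AgentStatus(steps_taken=0, errors=0, budget_spent=0.0)
    for _ in range(max_iterations):
        # ... the agent would plan and act here ...
        status.steps_taken += 1
        status.budget_spent += 2.5
        decision = checkpoint(status)
        if decision != "continue":
            print(f"Oversight decision after {status.steps_taken} steps: {decision}")
            return
    print("Task completed within limits")


if __name__ == "__main__":
    run_with_oversight()
```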
More Agentic AI frameworks

IMDA's and UC Berkeley's frameworks have been published only recently, but they are not the only Agentic AI governance programmes to be proposed. Various other models outline processes and procedures to address the risks posed by AI agents. Let's have a look at four of them.

Agentsafe

In December 2025, three Irish IBM experts published a paper proposing Agentsafe, a tool-agnostic governance framework for LLM-based agentic systems.

In practice, Agentsafe "operationalises the MIT AI Risk Repository by mapping abstract categories of risk into a structured set of technical and organisational mechanisms", tailored to agent-specific risks. It also introduces constraints on risky behaviours, escalates high-impact actions to human oversight, and assesses systems based on pre-deployment incident scenarios covering security, privacy, fairness, and systemic safety. According to the authors, the framework provides assurance through evidence and auditability, offering a methodology that links risks to tests, metrics, and provenance.

Agentsafe appears to be a very promising framework and a natural extension of traditional AI technical governance to the realm of Agentic AI. It builds on ethical principles (accountability, transparency, and safety), is shaped by structured risk management processes aligned with international standards, and seems to bear the potential to address two key challenges of Agentic AI: timely containment and effective human oversight.

AAGATE

In November 2025, on a decidedly more technical side, 11 entrepreneurs, researchers, and industry experts published a paper proposing the Agentic AI Governance Assurance & Trust Engine (AAGATE), defined as a "NIST AI RMF-aligned governance platform for Agentic AI". The paper is based on the assumption that "traditional AppSec and compliance tools were designed for deterministic software, not self-directed reasoning systems capable of improvisation".

To close this gap, AAGATE operationalises the above-mentioned NIST AI RMF principles (Govern, Map, Measure, Manage), integrating "specialized security frameworks for each RMF function: the Agentic AI Threat Modeling MAESTRO framework for Map, a hybrid of OWASP's AIVSS and SEI's SSVC for Measure, and the Cloud Security Alliance's Agentic AI Red Teaming Guide for Manage". The authors explain that this layered architecture will enable "safe, accountable, and scalable deployment".

You can have a look at a simplified summary of AAGATE published by the Cloud Security Alliance.
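Agentsafe's idea of linking each risk to the tests, metrics, and provenance that evidence its mitigation – a pattern that also returns in the ARC framework discussed below – can be pictured as a small risk register. The sketch below is purely illustrative; the entries and field names are hypothetical and are not taken from the paper.

```python
"""Illustrative sketch: a tiny risk register linking each risk to the controls
and tests that address it. All entries and field names are hypothetical."""
from dataclasses import dataclass, field


@dataclass
class RiskEntry:
    risk_id: str
    description: str
    controls: list[str] = field(default_factory=list)  # mitigations applied
    tests: list[str] = field(default_factory=list)      # evidence that the controls work


REGISTER = [
    RiskEntry(
        risk_id="R-001",
        description="Prompt injection via retrieved documents",
        controls=["input sanitisation", "tool-call validation"],
        tests=["injection red-team suite", "tool-call audit log review"],
    ),
    RiskEntry(
        risk_id="R-002",
        description="Unauthorised action outside permitted scope",
        controls=["least-privilege tool allow-list", "human approval gate"],
        tests=["out-of-scope action simulation"],
    ),
]


def untested_risks(register: list[RiskEntry]) -> list[str]:
    """Flag risks that have controls but no test evidence backing them."""
    return [r.risk_id for r in register if r.controls and not r.tests]


if __name__ == "__main__":
    for entry in REGISTER:
        print(f"{entry.risk_id}: {len(entry.controls)} controls, {len(entry.tests)} tests")
    print("Risks lacking test evidence:", untested_risks(REGISTER))
```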
NVIDIA's Agentic AI risk framework

November 2025 also witnessed the publication of an Agentic AI safety and security framework by a group of experts from NVIDIA and Zurich-based AI company Lakera. The framework introduces the novel idea of using auxiliary AI models and agents, supervised by humans, to "assist in contextual risk discovery, evaluation, and mitigation".

In a nutshell, the risk framework involves four actors:

- Global Contextualized Safety Agent, which sets and enforces system-wide policies, risk thresholds, and escalation rules across all agents, with full visibility and auditability.
- Local Contextualized Attacker Agent, which acts as an embedded red team, probing the system with realistic and context-aware attacks to surface emergent risks.
- Local Contextualized Defender Agent, which applies in-band protections at runtime, enforcing least privilege, validating tool use, and containing unsafe behaviour.
- Local Evaluator Agent, which monitors agent behaviour to measure safety, reliability, and deviations, triggering alerts and governance actions.

The framework operates in two phases:

Phase 1: Risk Discovery and Evaluation. This phase takes place in a sandboxed environment and is designed to uncover emergent risks that do not appear in static testing. An embedded attacker may simulate adversarial attacks (prompt injection, poisoned retrieval data, or unsafe tool chaining), while an evaluator monitors full execution traces to measure safety, reliability, and policy compliance. The goal is to identify vulnerabilities, assess risk thresholds, and design pre-deployment defensive controls.

Phase 2: Embedded Mitigation and Continuous Monitoring. This phase applies those controls in production. The system runs with in-band defences that enforce least-privilege access, validate tool calls, apply guardrails, and contain unsafe behaviour in real time. A monitoring component continuously evaluates system behaviour against expected trajectories and predefined risk thresholds, triggering alerts or human escalation when necessary. This ensures that safety is an adaptive, ongoing governance process that addresses behavioural drift, changing contexts, and newly emerging threats.
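To make the two phases more concrete, here is a minimal sketch of the underlying pattern: an attacker agent probes a sandboxed system while an evaluator scores the execution traces (Phase 1), and the resulting risk threshold then drives a runtime defender (Phase 2). Everything below is a simplified illustration with hypothetical names (the global policy-setting safety agent is omitted for brevity); it is not code from NVIDIA or Lakera.

```python
"""Illustrative sketch: the attacker / evaluator / defender pattern behind the
two phases. All classes, probe strings, and scores are hypothetical."""
import random


class AttackerAgent:
    """Phase 1: probes the system with context-aware attacks in a sandbox."""
    PROBES = ["prompt injection", "poisoned retrieval data", "unsafe tool chaining"]

    def probe(self) -> str:
        return random.choice(self.PROBES)


class EvaluatorAgent:
    """Monitors execution traces and scores how risky each one is (0.0-1.0)."""

    def score(self, trace: str) -> float:
        return round(random.random(), 2)  # stand-in for a real safety evaluation


class DefenderAgent:
    """Phase 2: applies in-band protections at runtime against a risk threshold."""

    def __init__(self, risk_threshold: float):
        self.risk_threshold = risk_threshold

    def handle(self, trace: str, risk: float) -> str:
        if risk > self.risk_threshold:
            return f"contained ({trace}, risk={risk})"  # block or escalate to a human
        return f"allowed ({trace}, risk={risk})"


if __name__ == "__main__":
    random.seed(0)
    attacker, evaluator = AttackerAgent(), EvaluatorAgent()

    # Phase 1: sandboxed risk discovery sets a risk threshold for production use.
    sandbox_scores = [evaluator.score(attacker.probe()) for _ in range(20)]
    threshold = sorted(sandbox_scores)[int(0.9 * len(sandbox_scores))]  # 90th percentile

    # Phase 2: the same evaluator feeds a runtime defender in production.
    defender = DefenderAgent(risk_threshold=threshold)
    for trace in ["routine tool call", "suspicious tool chaining"]:
        print(defender.handle(trace, evaluator.score(trace)))
```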
Agentic Risk & Capability (ARC) Framework

The Responsible AI team in GovTech Singapore's AI Practice published on GitHub the Agentic Risk & Capability (ARC) framework, a technical governance programme "for identifying, assessing, and mitigating safety and security risks in agentic AI systems".

Interestingly, the team developed a capability-centric taxonomy that categorises agents' capabilities into three main domains:

- Cognitive capabilities (reasoning, planning, learning, and decision-making)
- Interaction capabilities (how agents perceive, communicate, and influence environments or humans)
- Operational capabilities (whether agents execute actions safely and efficiently)

They also produced a risk register linking capabilities to specific risks:

- Component risks (failures or vulnerabilities in system modules)
- Design risks (issues in architecture, logic, or decision loops)
- Capability-specific risks (threats arising from the agent's abilities, such as reward hacking)

Each risk is then mapped to specific technical controls (guardrails, policies, monitoring) to mitigate it, providing direct risk-control traceability. This helps governance teams see which controls are applied for each capability and risk. Find out more on GitHub.

Getting ahead of the singularity

We're a long way from the horrors of the AI singularity; we're aware of that. Yet it is no surprise that our altered perception of what AI agents really are – complex software systems, as opposed to humanoid robots ready to exterminate us in our sleep – pushes us toward worrying about the latter rather than the former.

At present, these fears are irrational and must be put into the right context. And the context is that of AI agents bringing as many benefits as potential dangers to companies and individuals. The governance frameworks emerging globally signal that Agentic AI is here to stay, the potential risks are certainly real, and some actors are working to address them proactively.