Agentic AI Governance Frameworks 2026: Risks, Oversight, and Emerging Standards

Written by giovannicoletta | Published 2026/02/22
Tech Story Tags: agentic-ai | agentic-ai-systems | ai-governance-framework | agentic-ai-governance | singapore-imda-ai-framework | uc-berkeley-ai-standards | ai-compliance-frameworks | hackernoon-top-story

TL;DR: Agentic AI is rapidly gaining traction in 2026, raising new governance and safety challenges. Frameworks from Singapore, UC Berkeley, and industry leaders outline key risks (erroneous actions, bias, security breaches, and system disruption) and emphasize human oversight, accountability, and structured lifecycle management to ensure safe and responsible deployment.

In tech, this year began with the rise of Agentic AI. Less than two months into 2026, the AI debate has already been hijacked by AI agents, their capabilities, and their benefits for businesses. Between agents inventing Crustafarian religions overnight and outright sci-fi scenarios, a more prosaic set of questions emerges, to name just a few: the governance risks of delegating tasks to machines, the impact on the human workforce, and the increased need for human control and oversight.

Since I am allergic to any form of tech hype, I will not give in to the narrative that sees AI agents taking over the planet by Christmas at the latest. But companies are indeed exploring the possibility of deploying AI agents to optimise workflows, and the growing interest in these solutions seems confirmed by the surfacing of Agentic AI governance frameworks. Let's look at a couple of them.

Singapore’s early move on Agentic AI governance

In January 2026, Singapore’s Infocomm Media Development Authority (“IMDA”) published its Agentic AI governance framework. First of all, the (voluntary) framework acknowledges that the agents’ “access to sensitive data and ability to make changes to their environment” raises a whole new profile of risks. The complex interactions among agents substantially increase the risk of outcomes becoming more unpredictable. Since agents may be performing financial transactions or altering databases containing personal data, the magnitude of these potential risks cannot be minimised.

Singapore’s model is not about rewriting governance but about adapting existing AI considerations and translating them for agents. For instance, the principles of fairness and transparency continue to apply more than ever. So too do human accountability, oversight, and control, which need to be implemented continuously across the AI lifecycle, to the extent possible.

Agentic AI risks

Singapore’s framework recognises that Agentic AI risks are not too dissimilar from traditional LLM-related risks (SQL and prompt injection, hallucination, bias, data leakage, etc.). What changes is how they manifest: an agent may hallucinate by making a wrong plan to complete a task, or later, during execution, by calling non-existent tools or calling them in a biased manner.

Risks are even higher when agents interact with each other. A mistake by one agent may produce a cascading effect, if the wrong output is passed on to other agents and propagates across the system. As mentioned above, complex interactions may lead to unpredictable outcomes and unexpected bottlenecks in the chain of actions.

The model identifies five key, potentially harmful categories of risks:

  1. Erroneous actions. Imagine an AI agent failing to escalate an IT incident to human operators because the detected anomaly does not match predefined thresholds. Depending on the context, such an erroneous action may lead to system compromise.
  2. Unauthorised actions. This risk arises when an agent takes actions outside its permitted scope.
  3. Biased or unfair actions. We are familiar with bias as this is a frequent problem with traditional AI, especially binary classification models. The rationale here is the same: think of an agent making a biased hiring decision.
  4. Data breaches. A classic scenario is an agent inadvertently disclosing sensitive information without recognising it as sensitive, or a security breach in which malicious actors gain access to private information via agents.
  5. Disruption to connected systems. This risk covers cases where an erroneous action by an agent interacting with other systems propagates, disrupting the flow of information or actions (e.g., mistakenly deleting a production codebase).

Governance model

The IMDA’s Agentic AI governance model is based on four pillars.

1.    Assessing risks upfront

Essentially, this step involves determining risks and use cases for agent deployment, and designing a risk control system.

Central to determining use cases are the identification of risk, described as a function of impact and likelihood (music to my risk management ears…), and threat modelling. The model lists a series of factors affecting the potential impact of AI agents (deployment domain, access to sensitive data and external systems, scope and reversibility of agents’ actions) and their likelihood (agents’ level of autonomy, task complexity). In IMDA’s view, threat modelling is complementary to risk assessment, inasmuch as it identifies potential external attack scenarios. Common threats include memory poisoning, tool misuse, and privilege compromise.
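To make the impact-times-likelihood idea concrete, here is a minimal sketch of how an upfront assessment could be encoded. The factor names and weights are my own illustrative assumptions, loosely inspired by the factors IMDA lists; they are not part of the framework itself.

```python
from dataclasses import dataclass

# Hypothetical impact/likelihood factors, loosely based on those listed by IMDA.
IMPACT_FACTORS = {"regulated_domain": 3, "sensitive_data_access": 3,
                  "external_system_access": 2, "irreversible_actions": 3}
LIKELIHOOD_FACTORS = {"high_autonomy": 3, "complex_tasks": 2}

@dataclass
class AgentUseCase:
    name: str
    traits: set[str]  # which factors apply to this deployment

    def impact(self) -> int:
        return sum(w for f, w in IMPACT_FACTORS.items() if f in self.traits)

    def likelihood(self) -> int:
        return sum(w for f, w in LIKELIHOOD_FACTORS.items() if f in self.traits)

    def risk_score(self) -> int:
        # Risk expressed as a function of impact and likelihood.
        return self.impact() * self.likelihood()

finance_agent = AgentUseCase(
    name="invoice-processing-agent",
    traits={"sensitive_data_access", "irreversible_actions", "high_autonomy"},
)
print(finance_agent.risk_score())  # 18 -> route this use case to stricter controls
```

A score like this would only be the starting point; threat modelling then adds the external attack scenarios the framework mentions.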

The next logical step is to define agents’ limits and permissions. This means producing policies, procedures, and protocols that clearly outline the limits of agents in terms of access to tools and systems, their level of autonomy, and area of impact (e.g., deploying agents in “self-contained environments” with limited network and data access, particularly when they are carrying out high-risk tasks such as code execution). The problem of agents’ identity management and access control is trickier, as current authentication systems designed for humans do not translate smoothly to complex systems like AI agents. While new solutions and standards are being developed to address this issue, a mix of traditional identity and access management and human supervision is required.
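As a minimal, hypothetical sketch of what such limits could look like in practice, an orchestrator might run a default-deny permission check before every tool call. The policy format, agent names, and tool names below are invented for illustration and are not prescribed by IMDA.

```python
# Hypothetical per-agent policy: allowed tools and actions requiring escalation.
AGENT_POLICIES = {
    "hr-screening-agent": {
        "allowed_tools": {"read_cv", "schedule_interview"},
        "needs_human_approval": {"send_rejection_email"},
    },
    "devops-agent": {
        "allowed_tools": {"read_logs", "restart_service"},
        "needs_human_approval": {"run_migration", "delete_resource"},
    },
}

def authorise(agent_id: str, tool: str) -> str:
    """Return 'allow', 'escalate', or 'deny' for a requested tool call."""
    policy = AGENT_POLICIES.get(agent_id)
    if policy is None:
        return "deny"                      # unknown agents get nothing
    if tool in policy["needs_human_approval"]:
        return "escalate"                  # route to a human reviewer
    if tool in policy["allowed_tools"]:
        return "allow"
    return "deny"                          # default-deny anything unlisted

assert authorise("devops-agent", "delete_resource") == "escalate"
assert authorise("devops-agent", "format_disk") == "deny"
```

The design choice worth noting is the default-deny posture: anything not explicitly listed is blocked, which matches the framework's emphasis on sandboxing and limited scope for high-risk tasks.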

2.    Making humans truly accountable

The second pillar concerns establishing clear responsibilities within and outside the organisation, and enabling meaningful human oversight. IMDA’s fundamental premise is that organisations and individuals remain accountable for their agents’ actions.

Within the organisation, responsibilities should be defined for:

  • Key decision makers: setting agents’ high-level goals and limits, and defining the overall governance approach
  • Product teams: defining agents’ requirements, design, controls, safe implementation, and monitoring
  • The cybersecurity team: establishing baseline security guardrails and security testing procedures
  • Users: ensuring responsible use of agents and compliance with relevant policies

Outside actors may include, for instance, model developers or Agentic AI providers, for whom the organisation should also set clear responsibilities.

Designing meaningful human oversight involves three measures. First, companies need to define action boundaries requiring human approval, such as high-stakes or irreversible actions (editing sensitive data or permanently deleting data), or outlier and atypical behaviours (agents acting beyond their scope). Second, they must ensure the continued effectiveness of human oversight, for instance by training humans to identify common failure modes and regularly auditing human control practices. Finally, they should introduce automated real-time alert monitoring.
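A simple way to picture an oversight gate of this kind is shown below. The action names, scope, and alerting logic are hypothetical; the point is only to illustrate how high-stakes actions can wait for approval while out-of-scope behaviour raises a real-time alert.

```python
import logging

logger = logging.getLogger("agent-oversight")

# Hypothetical rules: which actions are high-stakes and which are in scope.
HIGH_STAKES = {"edit_customer_record", "delete_data", "transfer_funds"}
EXPECTED_SCOPE = {"summarise_ticket", "draft_reply", "edit_customer_record"}

def oversee(action: str, approved_by_human: bool = False) -> bool:
    """Decide whether an agent action may proceed, alerting on anomalies."""
    if action not in EXPECTED_SCOPE:
        # Outlier behaviour: raise a real-time alert and block by default.
        logger.warning("Out-of-scope action attempted: %s", action)
        return False
    if action in HIGH_STAKES and not approved_by_human:
        # Irreversible or high-stakes actions wait for explicit approval.
        logger.info("Action %s queued for human approval", action)
        return False
    return True

print(oversee("draft_reply"))            # True: routine, in-scope action
print(oversee("edit_customer_record"))   # False: held until a human approves
print(oversee("transfer_funds"))         # False: out of scope, alert raised
```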

3.    Implementing technical and control processes

On top of traditional LLM-related technical controls, the third pillar recommends adding new controls, required by the novel characteristics of Agentic AI, across the lifecycle.

For instance, companies should introduce strict pre-deployment controls, using test agents to observe how actual agents will operate once deployed. Companies should take a holistic approach when testing agents, evaluating new risks across datasets, workflows, and realistic environments, and evaluating test results at scale. Just like traditional AI, agents should be continuously monitored and tested post-deployment, so that humans can intervene in real time and debug where necessary. This activity will not be without challenges, as agents work at speed and companies may struggle to keep up.
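As a rough sketch of what evaluating test results at scale could involve, a pre-deployment harness might aggregate simple metrics from sandboxed execution traces. The trace format, tool names, and metrics below are assumptions made for the example, not part of the IMDA framework.

```python
# Hypothetical trace record produced by a sandboxed test run of an agent.
trace = [
    {"step": "plan", "tool": None, "ok": True},
    {"step": "act", "tool": "search_kb", "ok": True},
    {"step": "act", "tool": "nonexistent_tool", "ok": False},  # hallucinated tool
]

KNOWN_TOOLS = {"search_kb", "update_ticket"}

def evaluate_trace(trace: list[dict]) -> dict:
    """Aggregate simple pre-deployment metrics from an execution trace."""
    tool_calls = [s for s in trace if s["step"] == "act"]
    hallucinated = [s for s in tool_calls if s["tool"] not in KNOWN_TOOLS]
    failed = [s for s in trace if not s["ok"]]
    return {
        "tool_calls": len(tool_calls),
        "hallucinated_tool_rate": len(hallucinated) / max(len(tool_calls), 1),
        "failure_rate": len(failed) / len(trace),
    }

print(evaluate_trace(trace))
# {'tool_calls': 2, 'hallucinated_tool_rate': 0.5, 'failure_rate': 0.333...}
```

The same metrics could then feed post-deployment monitoring, where thresholds trigger human intervention rather than a test report.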

4.    Enabling end-user responsibility

Finally, to ensure the responsibility and accountability of end users – that is, those who will use and rely on AI agents – companies should focus on transparency (communicating agents’ capabilities and limitations) and education (training users on the proper use and oversight of agents). Organisations may focus on transparency for users who interact with external-facing agents (such as customer service or HR agents) and on education for users who integrate internal-facing agents into their work processes (such as coding assistants).

UC Berkeley’s Agentic AI framework

In February 2026, a group of researchers from UC Berkeley’s Center for Long-Term Cybersecurity published the Agentic AI Risk-Management Standards Profile, a risk framework broadly reflecting the NIST AI Risk Management Framework (AI RMF). Similarly to IMDA, the paper recognised the increased risks introduced by agents, including “unintended goal pursuit, unauthorized privilege escalation or resource acquisition, and other behaviors, such as self-replication or resistance to shutdown”. These unique challenges “complicate traditional, model-centric risk-management approaches and demand system-level governance”.

UC Berkeley’s framework was explicitly designed for single- or multi-agentic AI systems developers and deployers. However, the authors say, it can also be used by policymakers and regulators “to assess whether agentic AI systems have been designed, evaluated, and deployed in line with leading risk-management practices”.

Agentic AI risks

Compared to IMDA, the paper identifies a broader array of risks:

  1. Discrimination and toxicity, including feedback loops, propagation of toxic content, and disparities in availability, quality, and capability of agents.
  2. Privacy and security, including unintended disclosure of personal or sensitive data, data leakage, and resulting misaligned outcomes.
  3. Misinformation, especially when hallucination and erroneous outputs from one agent are reused by other agents.
  4. Malicious actors and misuse, including easier execution of complex attacks, automated misuse, mass manipulation, fraud, and coordinated influence campaigns.
  5. Human-computer interaction, such as reduced human oversight, socially persuasive behaviour, and users’ difficulty in understanding or contesting agent behaviours.
  6. Loss of control, comprising oversight subversion, rapid execution outrunning monitoring and response, and behaviours that undermine shutdown or containment mechanisms.
  7. Socioeconomic and environmental harms, including inequalities in accessing agentic capabilities, collective disempowerment, and large-scale economic and environmental impacts.
  8. AI system safety, failures, and limitations, including autonomous replication, misalignment, deception, collusion, goal-driven planning, real-world impact, and insufficient human oversight.

Focus on human control

Much like IMDA, UC Berkeley’s standards primarily aim to enhance human oversight, focusing on:

  • Human control and accountability (clear roles and responsibilities, including clear role definitions, intervention checkpoints, escalation pathways, and shutdown mechanisms)
  • System-level risk assessment (particularly useful for multi-agent interactions, tool use, and environment access)
  • Continuous monitoring and post-deployment oversight (agentic behaviour may evolve over time and across contexts)
  • Defence-in-depth and containment (treating agents as untrusted entities due to the limitations of current evaluation techniques)
  • Transparency and documentation (clear communication of system boundaries, limitations, and risk-mitigation decisions to stakeholders)

The authors acknowledge the limitations of their own standard. Firstly, Agentic AI taxonomies vary widely and are inconsistently applied across the world, which limits “the ability to harmonize recommendations across organizations and jurisdictions”. Secondly, complex multi-system behaviour and increased autonomy make it difficult to ensure robust human control and the correct attribution of liability. Finally, many risk metrics remain underdeveloped, especially “with respect to emergent behaviours, deceptive alignment, and long-term harms”.

For this reason, the authors warn, the paper adopts a “precautionary approach, emphasizing conservative assumptions, layered safeguards, and continuous reassessment”. Rather than a static governance checklist, it should be viewed as “a living framework intended to evolve alongside agentic AI research, deployment practices, and governance norms”.

NIST design

As mentioned above, the framework’s design overlaps with that of the NIST AI RMF, structuring Agentic AI efforts around its four core functions: Govern, Map, Measure, and Manage. This is an intentional decision by the authors to help companies apply risk management procedures within a structure they already know, and to build a framework consistent with existing practices.
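Purely as an illustration of that structure, an organisation could index its agent-specific controls by the four functions. The control names below are my own examples, not taken from the Profile.

```python
# Illustrative mapping of agent-specific controls onto the NIST AI RMF functions.
agentic_rmf_controls = {
    "Govern":  ["assign accountable owners for each agent",
                "define shutdown and escalation authority"],
    "Map":     ["inventory agents, tools, and environment access",
                "threat-model multi-agent interactions"],
    "Measure": ["test agents against adversarial scenarios",
                "track deviation from expected behaviour in production"],
    "Manage":  ["enforce least-privilege tool access",
                "contain and roll back unsafe actions"],
}

for function, controls in agentic_rmf_controls.items():
    print(f"{function}: {len(controls)} controls")
```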

More Agentic AI frameworks

The IMDA and UC Berkeley frameworks were published recently, but they are not the only Agentic AI governance programmes to have been proposed. Various other models outline processes and procedures to address the risks posed by AI agents. Let’s have a look at four of them.

Agentsafe

In December 2025, three Irish IBM experts published a paper proposing Agentsafe, a tool-agnostic governance framework for LLM-based agentic systems.

In practice, Agentsafe “operationalises the MIT AI Risk Repository by mapping abstract categories of risk into a structured set of technical and organisational mechanisms”, tailored to agent-specific risks. It also introduces constraints on risky behaviours, escalates high-impact actions to human oversight, and assesses systems against pre-deployment incident scenarios covering security, privacy, fairness, and systemic safety. According to the authors, the framework provides assurance through evidence and auditability, offering a methodology that links risks to tests, metrics, and provenance.
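That link between risks, tests, metrics, and provenance can be pictured as a simple traceability record. The structure below is my own illustration under that reading of the paper, not Agentsafe’s actual schema, and every field name is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RiskTraceabilityRecord:
    """Illustrative link between a risk, its tests, metrics, and evidence."""
    risk_id: str
    description: str
    tests: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)
    evidence: list[str] = field(default_factory=list)  # provenance artefacts

record = RiskTraceabilityRecord(
    risk_id="PRIV-01",
    description="Agent discloses personal data to an external tool",
    tests=["redaction_unit_tests", "prompt_injection_suite"],
    metrics={"leakage_rate": 0.0, "redaction_coverage": 0.97},
    evidence=["test_run_2025-12-01.json", "tool_call_logs/"],
)
print(record.risk_id, record.metrics)
```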

Agentsafe appears to be a very promising framework and a natural extension of traditional AI technical governance to the realm of Agentic AI. It builds on ethical principles (accountability, transparency, and safety), is shaped by structured risk management processes aligned with international standards, and seems to bear the potential to address two key challenges of Agentic AI: timely containment and effective human oversight.

AAGATE

In November 2025, on a decidedly more technical note, 11 entrepreneurs, researchers, and industry experts published a paper proposing the Agentic AI Governance Assurance & Trust Engine (AAGATE), defined as a “NIST AI RMF-aligned governance platform for Agentic AI”. The paper is based on the assumption that “traditional AppSec and compliance tools were designed for deterministic software, not self-directed reasoning systems capable of improvisation”.

To close this gap, AAGATE operationalises the above-mentioned NIST AI RMF principles (Govern, Map, Measure, Manage), integrating “specialized security frameworks for each RMF function: the Agentic AI Threat Modeling MAESTRO framework for Map, a hybrid of OWASP’s AIVSS and SEI’s SSVC for Measure, and the Cloud Security Alliance’s Agentic AI Red Teaming Guide for Manage”. The authors explain that this layered architecture will enable “safe, accountable, and scalable deployment”.

You can have a look at a simplified summary of AAGATE published by the Cloud Security Alliance.

NVIDIA’s Agentic AI risk framework

November 2025 also witnessed the publication of an Agentic AI safety and security framework by a group of experts from NVIDIA and Zurich-based AI company Lakera. The framework introduces the novel idea of using auxiliary AI models and agents, supervised by humans, to “assist in contextual risk discovery, evaluation, and mitigation”.

In a nutshell, the risk framework involves four actors:

  1. Global Contextualized Safety Agent, which sets and enforces system-wide policies, risk thresholds, and escalation rules across all agents, with full visibility and auditability.
  2. Local Contextualized Attacker Agent, which acts as an embedded red team, probing the system with realistic and context-aware attacks to surface emergent risks.
  3. Local Contextualized Defender Agent, which applies in-band protections at runtime, enforcing least privilege, validating tool use, and containing unsafe behaviour.
  4. Local Evaluator Agent, which monitors agent behaviour to measure safety, reliability, and deviations, triggering alerts and governance actions.

The framework operates in two phases:

  • Phase 1: Risk Discovery and Evaluation. It takes place in a sandboxed environment and is designed to uncover emergent risks that do not appear in static testing. An embedded attacker may simulate adversarial attacks (prompt injection, poisoned retrieval data, or unsafe tool chaining), while an evaluator monitors full execution traces to measure safety, reliability, and policy compliance. The goal is to identify vulnerabilities, assess risk thresholds, and design pre-deployment defensive controls (a simplified sketch of this loop follows after the list).

  • Phase 2: Embedded Mitigation and Continuous Monitoring. It applies those controls in production. The system runs with in-band defenses that enforce least-privilege access, validate tool calls, apply guardrails, and contain unsafe behaviour in real time. A monitoring component continuously evaluates system behaviour against expected trajectories and predefined risk thresholds, triggering alerts or human escalation when necessary. This system ensures that safety is an adaptive, ongoing governance process that addresses behavioural drift, changing contexts, and newly emerging threats.
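Here is a highly simplified sketch of what the Phase 1 loop could look like in code. The agent roles follow the paper’s naming, but the functions, stubs, and threshold are hypothetical assumptions, not the framework’s implementation.

```python
# Hypothetical sandboxed red-teaming loop inspired by Phase 1 of the framework.
ATTACK_SCENARIOS = ["prompt_injection", "poisoned_retrieval", "unsafe_tool_chain"]
RISK_THRESHOLD = 0.2  # maximum tolerated unsafe-outcome rate (illustrative)

def attacker_agent(scenario: str) -> str:
    """Local Contextualized Attacker: produce an adversarial input (stubbed)."""
    return f"adversarial payload for {scenario}"

def target_system(payload: str) -> dict:
    """The agentic system under test, running in a sandbox (stubbed)."""
    return {"payload": payload, "unsafe": "unsafe_tool_chain" in payload}

def evaluator_agent(traces: list[dict]) -> float:
    """Local Evaluator: score the unsafe-outcome rate across execution traces."""
    return sum(t["unsafe"] for t in traces) / len(traces)

traces = [target_system(attacker_agent(s)) for s in ATTACK_SCENARIOS]
unsafe_rate = evaluator_agent(traces)
print(f"unsafe rate: {unsafe_rate:.2f}")
if unsafe_rate > RISK_THRESHOLD:
    print("Design additional defensive controls before deployment (Phase 2).")
```

In Phase 2, the same checks would run in-band as defences and continuous monitoring rather than as a sandboxed test.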

Agentic Risk & Capability (ARC) Framework

The Responsible AI team in GovTech Singapore's AI Practice published the Agentic Risk & Capability (ARC) framework on GitHub: a technical governance programme “for identifying, assessing, and mitigating safety and security risks in agentic AI systems”.

Interestingly, the team developed a capability-centric taxonomy that categorises AI agents into three main domains:

  1. Cognitive capabilities (reasoning, planning, learning, and decision-making)
  2. Interaction capabilities (how agents perceive, communicate, and influence environments or humans)
  3. Operational capabilities (how agents execute actions safely and efficiently)

They also produced a risk register linking capabilities to specific risks:

  • Component risks (failures or vulnerabilities in system modules)
  • Design risks (issues in architecture, logic, or decision loops)
  • Capability-specific risks (threats arising from the agent’s abilities, such as reward hacking)

Each risk is then mapped to specific technical controls (guardrails, policies, monitoring) to mitigate it, providing direct risk-control traceability. This helps governance teams see which controls are applied for each capability and risk.
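One way to picture that capability-to-risk-to-control traceability is a nested register like the sketch below. The capability domains follow the ARC taxonomy, but the specific risks and controls are invented for the example, not copied from the actual register.

```python
# Illustrative risk register: capability domain -> risks -> mitigating controls.
ARC_STYLE_REGISTER = {
    "interaction": [
        {"risk": "prompt injection via external content",
         "controls": ["input sanitisation guardrail", "tool-call validation"]},
    ],
    "cognitive": [
        {"risk": "reward hacking in goal-driven planning",
         "controls": ["plan review checkpoint", "bounded objective functions"]},
    ],
    "operational": [
        {"risk": "irreversible action on production systems",
         "controls": ["human approval gate", "sandboxed execution"]},
    ],
}

def controls_for(capability: str) -> list[str]:
    """List every control mapped to risks under a given capability domain."""
    return [c for entry in ARC_STYLE_REGISTER.get(capability, [])
            for c in entry["controls"]]

print(controls_for("operational"))  # ['human approval gate', 'sandboxed execution']
```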

Find out more on GitHub.

Getting ahead of the singularity

We are a long way from the horrors of the AI singularity, and we know it. Yet it is no surprise that a distorted perception of what AI agents really are – complex software systems rather than humanoid robots ready to exterminate us in our sleep – pushes us toward worrying about the robots rather than the software.

At present, these fears are irrational and must be put into the right context: AI agents bring companies and individuals as many benefits as potential dangers. The governance frameworks emerging globally signal that Agentic AI is here to stay, that the risks are real, and that some actors are working to address them proactively.


Written by giovannicoletta | Founder of Periskope Consulting.
Published by HackerNoon on 2026/02/22