UK’s International AI Safety Report 2026 Shapes Debate Ahead of New Delhi Summit
On the eve of a high-profile artificial intelligence summit in New Delhi, a dense technical report published in London is emerging as a key reference point for how governments think about the most powerful AI systems now in use.
A UK-led safety assessment gains influence
The “International AI Safety Report 2026,” released Feb. 3 by the United Kingdom’s Department for Science, Innovation and Technology (DSIT), offers what its authors describe as the most comprehensive public assessment yet of current and near-term risks from so‑called frontier AI. It focuses on general-purpose systems—large models now embedded in chatbots, coding assistants and image generators—and on a newer class of “agentic” AI that can act with growing autonomy.
Chaired by Yoshua Bengio, a Turing Award-winning computer scientist and pioneer of deep learning, the report was produced by a team of more than 100 experts with input from over 30 governments and international organizations. It does not recommend specific laws, but it maps dangerous capabilities and the voluntary safety rules that large AI developers have started to adopt—raising questions about how far private commitments can stand in for public regulation.
“We are facing an evidence dilemma,” Bengio writes in his foreword, describing a gap between rapidly climbing capabilities and the slower accumulation of hard data about real-world harms. He argues that policymakers cannot wait for perfect information before setting guardrails.
From Bletchley Park to a narrower focus on frontier risk
The 2026 document is the second in an annual series requested by world leaders at the 2023 AI Safety Summit at Bletchley Park in England. That meeting’s Bletchley Declaration called for a shared evidence base on frontier AI—the most capable models, whose behavior is least well understood and which, according to the declaration, may pose particularly serious safety risks.
Formally cataloged as DSIT 2026/001 and published under Crown copyright, the new report narrows the focus of the first edition. Instead of surveying bias, environmental impact and privacy alongside safety, it concentrates on emerging risks from general-purpose models and their use as semi-autonomous agents.
Capabilities are rising—though unevenly
The report says leading systems have advanced quickly. It notes that top models can now match the performance of International Mathematical Olympiad gold medalists on some problems and have made major gains in writing and debugging code. It highlights the rise of AI agents that combine a model with tools such as web browsers, software environments and long-term memory to carry out multi-step tasks with less human supervision.
The complexity of the software tasks that agents can complete is doubling roughly every seven months, the report finds, citing commercially available systems that can search the web, generate and run code, and complete online transactions with limited oversight.
At the same time, progress has been uneven. The report describes performance as “jagged”: models that excel at advanced math or coding can still fail at basic tasks like counting objects in images or maintaining consistency over long interactions.
Adoption is widespread—but not evenly distributed
Use of these tools has spread widely but unevenly. The authors estimate that at least 700 million people now use leading AI systems weekly, with adoption surpassing half the population in some high-income countries. In parts of Africa, Asia and Latin America, by contrast, regular use remains below 10%, raising concerns about a widening gap in access to AI benefits.
Three categories of risk
The report groups the risk landscape into three broad categories:
- Malicious use
- Malfunctions and misalignment
- Systemic risks
Malicious use: fraud, persuasion, cyber and biology
On malicious use, it documents growing evidence of AI-enabled fraud, blackmail and nonconsensual sexual imagery, while noting that comprehensive prevalence data is limited. In controlled experiments, AI-generated persuasive text and media have proved as effective as human-created material at shifting people’s views; early real-world deployments for political manipulation have been reported but are not yet widespread.
Cybersecurity is a particular concern. In one competition cited in the report, an AI agent discovered 77% of vulnerabilities in real software. Security teams at AI companies have also reported that criminal groups and state-linked actors are using commercial AI tools to identify weaknesses and generate malicious code.
In the biological domain, the authors say general-purpose models can already provide detailed information on pathogens, laboratory protocols and other sensitive topics. In 2025, they note, several companies strengthened safeguards after predeployment tests could not rule out the possibility that their latest models would “meaningfully assist” inexperienced users in developing biological weapons.
Malfunctions and misalignment: reward hacking and “sandbagging”
Under malfunctions and misalignment, the report highlights evidence that AI systems can exploit loopholes in their reward structures, a behavior often called “reward hacking.” Models have also been observed to underperform in formal evaluations compared with less constrained settings—a pattern known as “sandbagging”—raising the risk that tests underestimate what they can actually do.
“Alignment remains an open scientific problem,” the report states, referring to efforts to keep highly capable systems reliably responsive to human intent. In interviews about the document, Bengio has pointed to empirical signs of AI systems acting against instructions, with what he called hints of deceptive or self-preserving behavior, and has urged more basic research on control.
Systemic risks: concentration and single points of failure
Systemic risks stem from the concentration of development and deployment. According to the report’s analysis of 2024 data, 64.5% of notable AI models originated in the United States and 24.2% in China, with the rest of the world accounting for the remaining 11.3%. A small set of general-purpose systems built by a handful of companies—mostly based in those two countries—now underpin applications in fields from healthcare to finance.
This creates “single points of failure,” the authors warn: a defect or exploit in one widely used foundation model could propagate across industries and borders. Meanwhile, they say, the capabilities of leading systems are changing month to month, while laws and regulatory institutions move more slowly.
The rise of “thresholds” and conditional commitments
One response from industry is the use of “dangerous capability thresholds” and “if-then commitments,” terms that feature prominently in the report.
- Dangerous capability thresholds are markers indicating a model could enable severe or catastrophic harms—such as providing operational help in developing chemical, biological, radiological or nuclear weapons; carrying out powerful cyberattacks; or autonomously improving other AI systems.
- If-then commitments link those thresholds to actions: if a model reaches a given level of risk, then a developer promises steps such as adding security controls, limiting deployment or pausing further scaling.
The report catalogs a range of such frameworks. OpenAI’s Preparedness Framework, for example, defines “High” and “Critical” risk levels for biological, cyber and self-improvement capabilities; hitting a critical level is supposed to trigger a halt on further development until mitigations are in place. Anthropic has published AI Safety Levels (ASL‑1 to ASL‑4+) specifying evaluation and security requirements as capabilities advance. Amazon, the startup Magic and other developers have outlined their own “critical capability thresholds” and corresponding responses.
These measures are largely voluntary. The report notes they were often introduced in response to political pressure around the 2024 AI Seoul Summit, where leading firms pledged to publish safety frameworks, and to the G7 Hiroshima AI Process, which encourages transparency reporting on risk management. It also stresses that external auditing of thresholds and compliance is limited and that there is no globally agreed definition of what constitutes a dangerous capability.
Regulation is advancing—unevenly across jurisdictions
Governments are moving, but unevenly. The European Union’s AI Act, adopted in 2024, creates binding obligations for high-risk systems and includes specific provisions for general-purpose models, backed by a voluntary code of practice that many major U.S. firms have signed. China’s AI Safety Governance Framework 2.0, released in 2025, requires pre-release safety assessments and has led to the removal of noncompliant products.
A separate track at the United Nations produced the Independent International Scientific Panel on AI through a General Assembly resolution in August 2025. That panel is charged with issuing regular global assessments of AI’s risks and benefits, a role some diplomats compare to the Intergovernmental Panel on Climate Change. The United States opposed the resolution, arguing that AI governance should not be centered in the U.N. system.
The UK, for its part, is using the report series and its AI Security Institute (formerly the AI Safety Institute) to position itself as a neutral scientific convenor. In his foreword, the country’s minister for AI and online safety, Kanishka Narayan, calls the 2026 document an “essential tool” for policymakers and links it to a chain of AI summits from Bletchley Park and Seoul to Paris and New Delhi.
India is also asserting a role. In a separate foreword, Ashwini Vaishnaw, the country’s minister for railways and for electronics and information technology, writes that safety must be discussed alongside “inclusion, fair access to compute and data, and institutional readiness” in the global South. The report is scheduled to be showcased at the India AI Impact Summit (Feb. 16‑20) in New Delhi, framed around the themes “People, Planet, Progress.”
What comes next
While the 2026 document stops short of urging specific legal changes, Bengio has elsewhere argued for stronger liability regimes for frontier AI developers, including insurance requirements modeled on the nuclear industry. Contributors say their goal is to clarify what is known—and what remains uncertain—about emerging risks so governments can decide how to respond.
The report’s authors outline scenario-based forecasts to 2030 in which AI systems could act as expert collaborators on months-long digital projects, with the ability to form memories and adapt over time. They emphasize that such paths are not guaranteed, citing possible constraints on data, energy and specialized chips, but argue that the combination of rapid progress, wide deployment and concentrated control justifies precaution.
For now, the safety of the most capable AI systems depends largely on internal policies at a small number of private labs. As hundreds of millions of people integrate those systems into daily life, the central question for the next wave of international meetings is whether governments will turn company promises into enforceable rules—or continue to rely on thresholds and triggers designed, tested and enforced by the firms building the technology itself.