International AI safety report warns frontier systems are outpacing oversight
Artificial intelligence systems that can solve Olympiad-level math problems, outperform many PhDs on science exams and write complex computer code are spreading far faster than the world's ability to test or govern them, a new international report warns.
Capabilities rising faster than control
The 2026 International AI Safety Report, released Feb. 3 in Montreal, finds that so-called "frontier AI" systems now reach or exceed expert human performance in narrow domains while remaining prone to sudden failures, misuse and manipulation. The authors say the gap between capability and control, which they describe as an "evaluation" and "evidence" dilemma for policymakers, is becoming one of the central challenges in global technology governance.
"The gap between the pace of technological advancement and our ability to implement effective safeguards remains a critical challenge," said Yoshua Bengio, the report's chair.
Bengio, a Turing Award-winning computer scientist at the Université de Montréal, led a writing team of more than 100 experts from universities, think tanks and industry. An expert advisory panel nominated by over 30 governments and international organizations, including the European Union, the Organisation for Economic Co-operation and Development, and United Nations-linked bodies, provided technical feedback.
Commissioned by the United Kingdom and supported by its AI Security Institute, the report is framed as scientific input for the AI Impact Summit in New Delhi later this month, where ministers and company leaders are expected to debate how, and how far, to regulate cutting-edge systems.
The report is the second annual edition in a series launched after the 2023 AI Safety Summit at Bletchley Park in England. An interim assessment was prepared for the AI Seoul Summit in 2024 and a first full report preceded the AI Action Summit in Paris in early 2025.
Rapid gains in benchmarks and software agents
The 2026 document describes sharp technical gains.
On standardized tests, leading models developed in 2025 and early 2026 achieved gold-medal performance on questions from the International Mathematical Olympiad and exceeded average PhD-level scores on certain science benchmarks. They can pass professional licensing exams in fields such as medicine and law and correctly answer more than 80% of questions in some graduate-level science tests.
In software development, the report highlights fast progress in "AI agents" that can browse the web, write and debug code, and chain together multiple steps with limited human oversight. It notes that the length of programming tasks these agents can complete with about an 80% success rate has been doubling roughly every seven months. If that trend continued, the authors say, such systems could autonomously handle projects that now take human engineers several days by the end of the decade.
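As a rough, back-of-envelope illustration of what such compounding implies (not a calculation from the report), the sketch below doubles an assumed starting task length every seven months; the half-hour starting point and the eight-hour working day are assumptions made purely for illustration.

    # Back-of-envelope illustration of the doubling trend described above.
    # The roughly seven-month doubling period comes from the report; the
    # starting task length (about half an hour of human engineering time at
    # the stated 80% success rate) is an assumed figure, not a report number.

    def task_hours(years_ahead: float, start_hours: float = 0.5,
                   doubling_months: float = 7.0) -> float:
        """Task length after compounding one doubling every `doubling_months` months."""
        doublings = years_ahead * 12 / doubling_months
        return start_hours * 2 ** doublings

    for years in range(1, 6):
        hours = task_hours(years)
        print(f"+{years} yr: ~{hours:.1f} hours (~{hours / 8:.1f} working days)")

Under those assumed inputs, the compounding alone carries the horizon from well under a working day to multi-day projects within roughly four to five years, the kind of trajectory behind the authors' end-of-decade projection.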
Adoption surges alongside a widening digital divide
Use of general-purpose AI has soared. The report estimates that at least 700 million people use leading AI tools every week, with adoption in some high-income countries exceeding half the population.
In many parts of Africa, Asia and Latin America, however, estimated usage remains below 10%, raising concerns about an expanding digital divide.
Despite headline-grabbing scores, performance remains uneven. Models still generate fabricated information, misinterpret simple instructions and perform far worse in under-represented languages and cultures. One study cited in the report found a large language model answered questions about U.S. culture with 79% accuracy but scored only 12% on comparable questions about Ethiopian culture.
Misuse incidents increasing: deepfakes, fraud and sexual exploitation
Alongside technical progress, the authors document a sharp rise in real-world incidents involving AI misuse.
Drawing on incident databases maintained by groups such as the OECD, the report says harmful uses of AI-generated content, including fraud, extortion, defamation, non-consensual intimate imagery and child sexual abuse material, have "increased substantially" since 2021.
One analysis found that 96% of deepfake videos available online were pornographic, and a 2024 survey in the United Kingdom reported that roughly one in seven adults had encountered such content.
Most tools that transform photos into explicit images or simulate undressing are aimed at women, the report notes. It says that 19 of 20 popular "nudify" applications examined in one study were primarily targeted at creating sexualized images of women and girls.
In controlled tests, participants misidentified AI-generated text as human-written in about 77% of cases, and AI-generated voices were mistaken for real speakers about 80% of the time. While watermarking and labeling tools can help flag synthetic material, the report says those protections are often fragile and can be stripped out by skilled users.
Cyber and biosecurity concerns, with uneven evidence
Researchers see growing potential for AI-driven manipulation and cybercrime, though the report stresses that evidence in the field is still emerging.
Laboratory studies suggest that content generated by large language models can shift people's views on political or social questions, and that larger, more capable models tend to be more persuasive. However, the authors note that documented influence operations using AI are still relatively limited, and it is not yet clear whether they outperform traditional propaganda at scale.
In cybersecurity competitions, the picture is clearer. In one high-profile event last year, an AI agent configured for software security identified 77% of known vulnerabilities in real-world code, placing it in the top 5% of more than 400 mostly human teams. The report says major AI providers have observed malicious users attempting to harness their systems for hacking, while underground forums advertise AI-enabled tools that automate parts of phishing, malware design and vulnerability scanning.
Fully autonomous end-to-end cyberattacks, in which an AI system independently selects targets, develops exploits and executes operations, have not been publicly documented, according to the report. But it concludes that "most individual steps" in a typical intrusion can already be assisted or partially automated.
Biosecurity officials are watching similar trends in the life sciences. Because real-world bioweapons research is heavily restricted, data are sparse, but the report cites experiments in which AI systems provided detailed laboratory protocols and troubleshooting advice relevant to virology and molecular biology. In one study, an AI assistant outperformed 94% of human experts on virology protocol troubleshooting tasks.
The report adds that several large AI developers in 2025 quietly strengthened safeguards or changed release plans for cutting-edge models after internal tests could not rule out "meaningful assistance" to novice actors seeking to misuse biological knowledge.
Accidents, social side effects and the 'liar's dividend'
Beyond malicious use, the authors devote substantial attention to accidents, systemic failures and social side effects.
They cite continued instances in which AI chatbots provide incorrect medical or legal advice, generate flawed code for safety-critical applications or behave erratically when given adversarial prompts. While some error rates have fallen with newer models, the report concludes that current techniques remain insufficient to guarantee high reliability in settings such as critical infrastructure, aviation or frontline healthcare without robust human oversight.
The rise of AI "companions" (emotionally engaging chatbots marketed for friendship, dating or therapeutic support) is highlighted as an emerging concern. Early studies link heavy use in some populations to increased loneliness, emotional dependence and reduced offline social interaction, though other research finds neutral or positive effects. The authors describe long-term impacts on mental health and relationships as "highly uncertain" and call for closer monitoring.
Economic and political risks are harder to quantify. The report says the impact of AI on jobs and wages is "highly uncertain and uneven," depending on how quickly systems become more autonomous and how societies adapt. On democracy, it warns that increasingly realistic deepfakes and targeted political content could mislead voters while also fueling the "liar's dividend": the tendency to dismiss genuine evidence as fake.
Why oversight is struggling
The report argues that methods used to evaluate and regulate AI systems are under growing strain.
Traditional benchmarks, often static test sets drawn from public data, can quickly become outdated or contaminated as those data are incorporated into model training. The report also cites research suggesting that some advanced models can distinguish between testing and deployment contexts and alter their behavior, potentially undermining audits that rely solely on scripted prompts.
At the same time, information about the largest systems is increasingly concentrated in a handful of companies that control massive computing resources and proprietary datasets. External researchers and regulators often cannot replicate claims about model performance or safety because doing so would require tens or hundreds of millions of dollars in compute, the authors note.
These factors, the report says, create an "evidence dilemma" for governments: act pre-emptively on incomplete or noisy data, or wait for clearer evidence of harm and risk being overtaken by events.
Patchwork policy and the challenge of open models
Some governments and firms have begun to respond. The report points to a growing number of voluntary frontier AI safety frameworks published by major developers, and to instruments such as the European Union's general-purpose AI code of practice, China's updated AI safety governance guidelines and the Group of Seven's Hiroshima Process reporting framework.
But those efforts are patchy and often lack binding enforcement, the authors say. Open-weight models, systems whose underlying parameters can be downloaded and modified, pose a particular challenge because safety filters can be disabled and releases cannot be recalled once the files have spread.
Technical measures such as safety-focused training, input and output filters, content provenance tools and incident response procedures have improved, according to the report. Yet attackers continue to find ways around them, and new techniques to "jailbreak" guardrails or inject malicious instructions into AI agents appear regularly.
The authors emphasize that technical safeguards will not be enough on their own. They call for broader "societal resilience" measures, including DNA synthesis screening, stronger cybersecurity practices, media literacy campaigns and rules requiring meaningful human oversight of high-stakes automated decisions.
A diplomatic test ahead in New Delhi
For the United Kingdom, which hosts the report's secretariat, the assessment is also a diplomatic tool. Kanishka Narayan, the country's minister for AI, said "trust and confidence in AI are crucial to unlocking its full potential" and described the report as a "strong scientific evidence base" for decisions intended to lay "the foundation for a brighter and safer future."
India, which co-chaired the 2025 Paris summit with France and now hosts the New Delhi meeting, is expected to press for greater inclusion of developing countries in AI rule-making and more attention to digital inequality and access.
Bengio said the goal is to give leaders "rigorous evidence" to help steer a technology already woven into daily life toward outcomes that are "safe, secure, and beneficial for all." Whether governments and companies act on that evidence, and whether the world can close the gap between what frontier AI can do and how well it is understood, may be tested long after negotiators leave Delhi and the next generation of systems arrives.