Open-Source ‘DNA GPT’ Evo 2 Hits the Internet, Raising Hopes for Medicine—and New Biosecurity Questions
In a lab in Palo Alto, Calif., scientists recently spelled “EVO2” not in ink or pixels but in the chemical marks that tell a cell which pieces of DNA to read.
They started with a blank stretch of genetic code. An artificial intelligence system proposed precise strings of A, C, G and T that, in theory, would cause pockets of that DNA to open up in specific places inside a cell’s nucleus. When the team inserted the sequences into mouse stem cells and read out the results, bands of “open” DNA traced a pattern that looked like Morse code, matching the intended letters.
Those sequences do not exist in nature. They were designed by Evo 2, a new large-scale AI model that its creators describe as a “biological foundation model” for the genome—and that they have now placed, in full, on the open internet.
On March 4, Nature published a paper detailing Evo 2, a 40‑billion‑parameter system trained on 9.3 trillion DNA bases from more than 128,000 species. The model can predict which mutations in human DNA are likely to be harmful, recognize regulatory elements that control gene activity, and generate new genome-scale sequences. The authors have released the model’s weights, the training and inference code, and the underlying dataset.
That combination of reach and radical openness is being hailed by many scientists as a milestone for genomics and medicine, and cited by security experts as a live test of how societies will handle dual-use AI in biology.
“We have made Evo 2 fully open, including model parameters, training code, inference code and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity,” the authors wrote in the Nature paper.
A ‘language model’ for DNA
Evo 2 was developed by researchers at the Arc Institute, an independent biomedical research center in Palo Alto, along with collaborators at Stanford University, the University of California, Berkeley, the University of California, San Francisco, NVIDIA, and others.
Like large language models that predict the next word in a sentence, Evo 2 learns by predicting the next nucleotide in long stretches of DNA. It uses a StripedHyena 2 architecture, which combines attention mechanisms with fast convolution-like operators to process up to 1 million bases of DNA at once. That long “context window” is intended to capture how regulatory regions far from a gene can influence its behavior.
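The training objective described above can be illustrated with a toy stand-in. The snippet below uses a simple bigram (count-based) model over the DNA alphabet to show what "predict the next nucleotide" means in practice; Evo 2 itself is a 40-billion-parameter StripedHyena 2 network, not a count table, and this sketch is purely illustrative.

```python
# Toy illustration of the next-nucleotide prediction objective.
# A bigram count model stands in for the real 40B-parameter network.
from collections import Counter, defaultdict

ALPHABET = "ACGT"

def train_bigram(seq):
    """Estimate P(next base | current base) from observed transitions."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur, ctr in counts.items():
        total = sum(ctr.values())
        probs[cur] = {b: ctr[b] / total for b in ALPHABET}
    return probs

def predict_next(probs, context):
    """Predict the most likely next base given the end of the context."""
    dist = probs[context[-1]]
    return max(dist, key=dist.get)

model = train_bigram("ACGTACGTACGTACGT")
print(predict_next(model, "TAC"))  # in this toy data, C is always followed by G
```

The real model plays the same game at vastly larger scale, conditioning each prediction on up to a million preceding bases rather than one.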
The model was trained on OpenGenome2, a curated dataset of more than 8.8 trillion nucleotides from bacteria, archaea, eukaryotes, and bacteriophages, drawn from reference genomes and metagenomic samples. (The 9.3 trillion bases seen during training exceed the dataset's size because some portions were sampled more than once.)
For safety reasons, the team excluded all viral genomes that infect eukaryotic hosts—in practice, viruses that can infect humans, animals, or plants, including common human pathogens.
“By excluding genomic sequences of viruses that infect eukaryotes from our training data, we aimed to ensure our openly shared model did not disseminate the capability to manipulate and design pathogenic human viruses,” the Nature authors wrote.
The project builds on an earlier system, Evo, that focused on prokaryotic genomes and was published in Science in 2024. Evo 2 expands both the scale of the data and the biological scope, adding human and other eukaryotic DNA and extending the model’s view of the genome by an order of magnitude.
Early signals for medicine
In tests reported in Nature, Evo 2 showed strong performance on predicting the effects of changes in human DNA, a central challenge in genetic medicine.
Using data from ClinVar, a public archive of genetic variants linked to disease, the system’s scores could distinguish between pathogenic and benign variants in both protein‑coding and noncoding regions of the genome, without task‑specific training. The model was especially strong on small insertions and deletions, where many existing tools struggle.
The team also evaluated Evo 2 on large experimental datasets for the BRCA1 and BRCA2 genes, which are associated with inherited breast and ovarian cancer. There, the likelihood the model assigned to each variant closely tracked measured functional scores from lab assays.
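The scoring recipe behind these results can be sketched in a few lines: compare the likelihood the model assigns to the reference sequence against the likelihood it assigns to the variant sequence. The `sequence_loglik` function below is a toy stand-in (independent per-base probabilities); with Evo 2 it would be the sum of the model's per-base log-probabilities over a window around the variant.

```python
# Minimal sketch of zero-shot variant scoring via likelihood comparison.
# `sequence_loglik` is a toy stand-in for a genomic language model's score.
import math

def sequence_loglik(seq, base_probs):
    """Toy stand-in: sum of independent per-base log-probabilities."""
    return sum(math.log(base_probs[b]) for b in seq)

def variant_delta(ref_seq, pos, alt_base, base_probs):
    """Score = loglik(variant) - loglik(reference).
    Strongly negative deltas flag likely-deleterious variants."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return sequence_loglik(alt_seq, base_probs) - sequence_loglik(ref_seq, base_probs)

# Toy background in which T is rare at these positions.
probs = {"A": 0.3, "C": 0.3, "G": 0.3, "T": 0.1}
delta = variant_delta("ACGGA", 2, "T", probs)  # G -> T substitution scores negative
```

Because no disease labels enter the calculation, this kind of scoring is "zero-shot": the model's sense of what sequences are plausible does all the work.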
In a news release, Arc said Evo 2 achieved “over 90% accuracy” in distinguishing benign from potentially pathogenic BRCA1 mutations in those tests. The model’s authors argue that such capabilities could help prioritize variants of uncertain significance for further study, potentially speeding diagnoses in cancer genetics and rare disease.
The model also performed well on benchmarks for regulatory DNA, such as recognizing promoters and enhancers and predicting how specific variants affect chromatin accessibility—how tightly DNA is packed—in different cell types.
“These models are beginning to capture the rules by which genomes encode regulatory programs,” said Brian Hie, a chemical engineering professor at Stanford and one of the senior authors, in an earlier university interview. He described Evo 2 as a tool that can “help us reason about and design DNA” across species.
Writing patterns in the epigenome
To show that Evo 2 could move from prediction to design, the team combined it with existing sequence‑to‑function models to generate thousands of short DNA sequences predicted to produce user‑specified patterns of chromatin openness.
In one series of experiments, they asked the system to create stretches of DNA whose accessibility rose and fell along the sequence in ways that would encode simple messages, such as “ARC” and “EVO2,” when read out. After the team integrated the designs into mouse embryonic stem cells and measured chromatin accessibility, the observed patterns closely matched the intended designs, with statistical metrics known as AUROCs between 0.92 and 0.95.
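Those AUROC figures can be read as follows: treat the intended pattern as binary labels (open = 1, closed = 0) and the measured accessibility as scores; the AUROC is then the probability that a randomly chosen "open" position scores higher than a randomly chosen "closed" one. A minimal implementation, with made-up toy data:

```python
# AUROC: probability that a random positive outranks a random negative.
def auroc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

intended = [1, 1, 0, 0, 1, 0]               # designed open/closed pattern
measured = [0.9, 0.8, 0.2, 0.3, 0.7, 0.4]   # toy accessibility readout
print(auroc(intended, measured))  # 1.0: every open site outranks every closed one
```

An AUROC of 0.5 would mean the measured accessibility carried no information about the design; values of 0.92 to 0.95 indicate the cells reproduced the intended patterns with high fidelity.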
In other tests in human HEK293T and K562 cells, 92% of designs achieved strong control of accessibility intensity across a sequence, and a smaller share produced marked differences between the two cell types. The authors suggest that similar approaches might eventually be used to craft regulatory elements that activate a gene therapy only in specific tissues.
The model was also evaluated on deep mutational scanning data for dozens of proteins and RNA molecules from bacteria and eukaryotes. Its scores correlated well with measured fitness in many cases, though specialized protein models still led on some benchmarks. The authors emphasize Evo 2 as a generalist system that can be adapted for specific tasks, not a replacement for all specialized tools.
Open weights, open data
Beyond the science, the Evo 2 release stands out for how much the team has shared.
Model checkpoints for the 40‑billion‑parameter version and a 7‑billion‑parameter variant are available on the Hugging Face platform, along with smaller models. The full training and inference code is posted on GitHub. The OpenGenome2 dataset is also downloadable, and the model is accessible through NVIDIA’s BioNeMo cloud service. The team has released a browser‑based interface, Evo Designer, and a visualization tool that links the model’s internal features to known genomic elements.
Nature’s editors noted that Evo 2 is “one of the largest‑scale fully open models thus far, even across other modalities, such as language and vision.”
Arc’s chief technology officer, Dave Burke, has compared Evo 2 to an “operating system kernel” for biology—a shared base upon which researchers and companies can build task‑specific applications.
Safeguards and stress tests
The Evo 2 developers devoted a section of their paper to safety, security, and ethics. In addition to filtering viral data, they said they conducted pre‑publication risk assessments with experts in health security, law, and policy, and tested the model’s behavior on sequences from high‑risk pathogens.
They reported that Evo 2 performs poorly on predicting the effects of mutations in human viral proteins and select‑agent viral genomes, and tends to produce random‑like outputs when prompted to generate sequences resembling such viruses. In a statement, Arc said the team “ensured that the model would not return productive answers to queries about these pathogens.”
Outside researchers have welcomed those steps but say they do not eliminate concern.
A blog post from Princeton University’s Center for Information Technology Policy last year reported that fine‑tuning Evo 2 on relatively small datasets of harmful viral sequences could partially restore its capabilities in those domains, suggesting that data filtering is not a complete safeguard once a powerful base model is public.
Another group, in a preprint describing an attack method dubbed “GeneBreaker,” said they were able to “jailbreak” several DNA language models, including Evo 2, by crafting adversarial prompts that elicited potentially harmful viral protein sequences. In some cases, the reported attack success rate approached 60% for certain viral families.
Separate work has raised questions about privacy, showing that representations produced by genomic models can, under some conditions, leak information about the underlying training sequences. Evo 2 was evaluated alongside other models in that study.
Policy analysts at organizations such as the Center for Strategic and International Studies have pointed to Evo 2 in reports on AI‑enabled biosecurity, arguing that it exemplifies the tension between open science and the need to manage dual‑use risks. Those reports note that existing regulatory tools—from export controls to DNA synthesis screening guidelines—were not designed with open‑weight bio‑AI models in mind.
A global test case
For many biologists, the benefits of openness are straightforward. A lab without the resources to train massive models can now download Evo 2 or access it through an API and apply it to problems in microbial ecology, agriculture, or human disease. Researchers in low‑ and middle‑income countries, advocates say, gain access to the same core computational tool as well‑funded centers.
Critics question whether that promise will be fully realized. They note that turning AI‑generated sequences into therapies or engineered organisms still requires advanced laboratories, regulatory know‑how, and intellectual property protections that are unevenly distributed worldwide. Meanwhile, the risks of misuse or accidents involving designed biological sequences could spread across borders.
Evo 2 arrives as governments in the United States and Europe are debating new rules for so‑called frontier models in AI. Most of those discussions have focused on general‑purpose language models and image generators. Systems like Evo 2, which operate directly on the genetic code of living things, now present lawmakers with a more concrete case.
The developers say they see the release as an experiment in responsible openness. Whether future biological foundation models follow the same path is unclear.
As more powerful systems are built—potentially combining genomic data with clinical records, proteomics, and other layers of biology—researchers and policymakers will likely revisit how much of their inner workings should be shared. For now, a model that can read and write the genome at million‑base scale is freely available, and the consequences of that choice will play out in labs, clinics, and policy debates for years to come.