Researchers from Stanford University and partner institutions report that roughly 35% of newly published websites were AI-generated or AI-assisted by mid-2025, marking a major transformation of the internet in just three years.
The study used Internet Archive Wayback Machine data spanning 33 months, from August 2022 to May 2025, and analyzed website snapshots with AI-detection software.
The research team, from Stanford, Imperial College London, and the Internet Archive, used Pangram v3 to analyze website content. The proportion of sites classified as AI-generated or AI-assisted rose from zero before ChatGPT’s late-2022 launch to around 35% by mid-2025.
“I find the sheer speed of the AI takeover of the web quite staggering,” Jonáš Doležal, an AI researcher at Stanford and co-author of the paper, said. “After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years. We’re witnessing, in my opinion, a major transformation of the digital landscape in a fraction of the time it took to build in the first place.”
The study, titled “The Impact of AI-Generated Text on the Internet,” tested six hypotheses about the effects of AI content on the web. Only two held up under scrutiny, according to the published findings. AI-generated sites showed pairwise semantic similarity scores 33% higher than human-written ones, meaning the same ideas keep getting expressed in nearly identical ways.
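To make the similarity figure concrete, here is a minimal sketch of how a mean pairwise similarity score and a relative increase between two groups could be computed. It assumes each page has already been turned into a numeric embedding vector; the article does not describe the embedding model or the exact metric the researchers used, so cosine similarity and the function names below are illustrative assumptions, not the study's method.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over every unordered pair of vectors.

    Hypothetical helper: the vectors stand in for page embeddings.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def relative_increase(score_a, score_b):
    """How much higher score_a is than score_b, as a fraction of score_b."""
    return (score_a - score_b) / score_b
```

Under these assumptions, a group of pages whose mean pairwise score is 0.40 compared with a group scoring 0.30 would give `relative_increase(0.40, 0.30)` of about 0.33, which is the kind of gap a "33% higher" figure describes. Those two scores are made-up inputs for illustration, not values from the paper.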
“The proliferation of AI-generated and AI-assisted text on the internet is feared to contribute to a degradation in semantic and stylistic diversity, factual accuracy, and other negative developments,” the researchers write in the paper. “We find that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT’s launch in late 2022.”
The researchers found that the web is becoming less semantically diverse as AI content proliferates. However, concerns about increased disinformation from AI hallucinations and a failure to cite sources did not materialize in the data. The transformation happened faster than the original buildout of the web, surprising even the researchers studying the phenomenon.
The team partnered with the Internet Archive to retrieve the oldest available archived snapshots via the Wayback Machine’s CDX Server API. The raw HTML of each snapshot was downloaded and stored locally for processing. The team tested several AI-detection tools and found that Pangram v3 had the highest detection rate for identifying AI-created websites.
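The retrieval step can be sketched roughly as follows. This is an illustration of how the public CDX Server API is commonly queried, not the researchers' actual pipeline: the helper names are hypothetical, the status-code filter is an assumption, and the code only builds URLs and parses a response body, leaving the HTTP requests to the caller.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def oldest_snapshot_query(site):
    """Build a CDX query URL asking for the earliest capture of a site.

    CDX results are ordered by timestamp ascending, so limit=1 returns
    the oldest capture. The status filter is an assumption for this sketch.
    """
    params = {
        "url": site,
        "output": "json",
        "limit": 1,
        "fl": "timestamp,original",
        "filter": "statuscode:200",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body):
    """Return (timestamp, original_url) from a CDX JSON body, or None.

    With output=json the first row is a header naming the fields.
    """
    rows = json.loads(body)
    if len(rows) < 2:
        return None  # no captures matched the query
    record = dict(zip(rows[0], rows[1]))
    return record["timestamp"], record["original"]


def raw_snapshot_url(timestamp, original):
    """URL of the capture as originally archived; the id_ suffix asks the
    Wayback Machine to skip its link rewriting and toolbar injection."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"
```

A caller would fetch the query URL, pass the response text to `parse_cdx_json`, then download the page at `raw_snapshot_url(...)` and write it to local storage for later classification.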

