The AI-First Web

10 Million Sites Scanned. Here's What the Web Actually Looks Like.

10,002,735 detections. WordPress 74.3%. Shopify 7.8%. Drupal 4.5%. Joomla 3.5%. Next.js 2.6%. 929 TLDs. 74 countries. The deeper you scan, the more legacy you find.

The Definitive Count

Total detections: 10,002,735
WordPress: 74.3% (7,427,780 of 10,002,735 detected)
Shopify: 7.8% (777,276 of 10,002,735 detected)
Drupal: 4.5% (444,706 of 10,002,735 detected)
Joomla: 3.5% (352,042 of 10,002,735 detected)
Next.js: 2.6% (263,000 of 10,002,735 detected), the #1 modern framework
Source: WebPulse Common Crawl scan, 10M+ detections. The largest independent framework detection survey published.

10,002,735 framework detections across 929 TLDs and 74 countries. This is the largest independent web framework survey ever published. The finding is unambiguous: the web is legacy infrastructure. WordPress alone accounts for 7.4 million of the 10 million sites we detected. Three out of four websites on the detectable web run on a PHP CMS released in 2003.
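For readers who want to check the headline percentages, here is a minimal sketch that recomputes each share from the published counts. The names `COUNTS`, `TOTAL_DETECTIONS`, and `share` are ours for illustration, not part of any WebPulse tooling, and rounding may differ slightly from the figures quoted above.

```python
# Detection counts as published in this article.
TOTAL_DETECTIONS = 10_002_735
COUNTS = {
    "WordPress": 7_427_780,
    "Shopify": 777_276,
    "Drupal": 444_706,
    "Joomla": 352_042,
    "Next.js": 263_000,
}


def share(count, total=TOTAL_DETECTIONS):
    """Return `count` as a percentage of `total`."""
    return 100.0 * count / total


# Print each framework's share to two decimal places.
for name, count in COUNTS.items():
    print(f"{name:<10} {share(count):6.2f}%")
```

Printing two decimals rather than one makes it easier to see how close each figure sits to a rounding boundary.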

The Long Tail Revealed

From 2M to 8M detections, WordPress held at 73%. We called it immovable. Then the long tail showed up. The last 2 million detections reached deeper: smaller sites, older sites, regional domains that barely get crawled. Those sites are even more WordPress-dominated. At 10M, WordPress gained share, rising to 74.3%. So did Joomla (2.8% to 3.5%). Modern frameworks lost share: Next.js dropped from 3.0% to 2.6%. Shopify corrected from 9.6% to 7.8%.
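The arithmetic behind "even more WordPress-dominated" can be sketched as follows. This is a back-of-the-envelope calculation, not the scan's own method: it assumes the earlier snapshot sat at exactly 8,000,000 detections with WordPress at 73.0%, and the function names are hypothetical. It also shows the shifts quoted above both as percentage points and as relative change, since the two are easy to confuse.

```python
def implied_share_of_new(total_before, pct_before, total_after, count_after):
    """Share (in %) of a framework among detections added between two snapshots."""
    count_before = total_before * pct_before / 100.0
    added_total = total_after - total_before
    added_count = count_after - count_before
    return 100.0 * added_count / added_total


def point_change(before_pct, after_pct):
    """Change in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Change relative to the starting share, in percent."""
    return 100.0 * (after_pct - before_pct) / before_pct


# Assumption: earlier snapshot = exactly 8,000,000 detections at 73.0% WordPress.
wordpress_tail = implied_share_of_new(8_000_000, 73.0, 10_002_735, 7_427_780)
print(f"Implied WordPress share of the newest detections: {wordpress_tail:.1f}%")

# Earlier and 10M shares as quoted in the paragraph above.
SHIFTS = {"Joomla": (2.8, 3.5), "Next.js": (3.0, 2.6), "Shopify": (9.6, 7.8)}
for name, (before, after) in SHIFTS.items():
    print(f"{name:<8} {point_change(before, after):+.1f} pts, "
          f"{relative_change(before, after):+.1f}% relative")
```

Under those assumptions, moving the overall share from 73% to 74.3% requires the newest two million detections to be close to 80% WordPress, which is the long-tail effect in a single number.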

What 10 Million Tells Us

Modern frameworks combined: ~5.0% of detected sites (down from 6.4% at 6.28M detections)
TLDs covered: 929
Countries with data: 74
HTMX: 11,482 detections (surpassed Gatsby and SvelteKit)
Source: WebPulse Common Crawl scan, 10M+ detections.

The deeper you scan, the more legacy you find. Every million additional detections pushes the modern share down, not up. The conference-circuit web (Next.js, Astro, SvelteKit, Remix) is real, but it accounts for approximately 5% of what actually exists. WordPress, Shopify, Drupal, and Joomla together account for roughly 90% of detections. This is not a trend waiting to reverse. This is the structural reality of the web at scale.
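As a quick check on how the published counts add up, the sketch below sums the four largest platforms and compares the result with the approximate modern-framework share. The variable names are illustrative, and the 5.0% modern figure is the article's rounded estimate, since no raw count is published for it.

```python
TOTAL_DETECTIONS = 10_002_735

# Published counts for the four largest platforms.
BIG_FOUR_COUNTS = {
    "WordPress": 7_427_780,
    "Shopify": 777_276,
    "Drupal": 444_706,
    "Joomla": 352_042,
}

# Assumption: approximate combined share of modern frameworks, as quoted above.
MODERN_COMBINED_PCT = 5.0
HTMX_DETECTIONS = 11_482

big_four_total = sum(BIG_FOUR_COUNTS.values())
big_four_pct = 100.0 * big_four_total / TOTAL_DETECTIONS

# Whatever is neither big-four nor modern: other platforms and frameworks.
remainder_pct = 100.0 - big_four_pct - MODERN_COMBINED_PCT
htmx_pct = 100.0 * HTMX_DETECTIONS / TOTAL_DETECTIONS

print(f"Big four combined:     {big_four_pct:.1f}%")
print(f"Modern frameworks:     ~{MODERN_COMBINED_PCT:.1f}%")
print(f"Everything else:       ~{remainder_pct:.1f}%")
print(f"HTMX share of total:   {htmx_pct:.2f}%")
```

Keeping the remainder as an explicit line makes it clear that the big four and the modern frameworks do not cover every detection between them.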
