June 26, 2014, 2:08 PM

How many e-commerce companies are there?

It’s a question Internet Retailer editors get asked all the time. By crunching a variety of data, RJMetrics comes up with an answer, at least for the English-speaking world.

There is no shortage of top-down research telling us that the e-commerce market is enormous, growing extremely fast, and showing no signs of slowing down. According to sources like eMarketer, e-commerce is the only trillion-dollar industry growing at a double-digit percentage each year. And with the U.S. Census Bureau estimating that only 7% of retail sales are done on the internet, e-commerce still has a lot of runway for growth.

Despite all this research, however, no one seems to be able to answer the key question: How many e-commerce companies are there? The few estimates that exist vary by orders of magnitude, from tens of thousands to nearly a million.

We set out to answer this question for ourselves.

How we did it

We have a secret ingredient that helped us build an estimate from the ground up: proprietary data. Here at RJMetrics, we work with hundreds of online retailers who generously allow us to anonymize high-level data points for analyses like these.

By combining our proprietary data with size and revenue information from third-party sources like the Internet Retailer Top 500 Guide, Alexa, and BuiltWith, we’ve conducted a comprehensive bottom-up analysis of the e-commerce industry.

Size matters

Obviously, the long tail is going to be very long here. Using BuiltWith to identify which web sites have e-commerce technologies installed, we found 180,000 live web sites with just the Magento shopping cart. When you extrapolate to include the full universe of competing e-commerce technologies, you can see how some estimates approach the 1 million mark. As you might have guessed, however, the majority of these sites are not generating revenue on any meaningful scale.

In order to separate the wheat from the chaff, we needed to come up with revenue-based exclusion criteria.

Tying Alexa rank to revenue

Alexa rank is an easily obtained proxy for traffic. Alexa ranks every web site in the world based on traffic volume. A global rank of 1 represents the website with the most traffic in the world (currently Google). Since e-commerce revenue is directly correlated with the number of visitors to a site, we theorized that Alexa rank could serve as a proxy for revenue. To test this, we needed revenue data for a set of e-commerce companies that spanned a broad spectrum of Alexa ranks.
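One way to run a test like this is to correlate rank and revenue on a log scale, since both span several orders of magnitude. The sketch below is only an illustration: the function names are hypothetical, and the sample values are made-up placeholders, not figures from our data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_log_correlation(ranks, revenues):
    """Correlate log10(rank) with log10(revenue).

    A value near -1 means revenue falls steadily as rank gets worse.
    """
    log_ranks = [math.log10(r) for r in ranks]
    log_revenues = [math.log10(v) for v in revenues]
    return pearson(log_ranks, log_revenues)


# Placeholder values for illustration only (NOT data from the analysis).
sample_ranks = [100, 1_000, 10_000, 100_000, 1_000_000]
sample_revenues = [5e9, 4e8, 6e7, 5e6, 8e5]

r = log_log_correlation(sample_ranks, sample_revenues)
print(f"log-log correlation: {r:.3f}")
```

A strongly negative coefficient on real data would support using rank as a stand-in for revenue. A weak one would mean the proxy is too noisy to rely on.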

To get revenue data, we turned to the data in the Internet Retailer Top 500 guide and augmented it with our own proprietary benchmarking data set. The IR 500 includes the heaviest hitters in e-commerce and our own data covered mid- and smaller-sized companies. Between these two data sets we had Alexa rank and revenue data on the full spectrum of e-commerce companies. Here’s what we saw:

Jackpot! There appears to be a pretty clear-cut link between revenue and Alexa rank. To be sure, let’s zoom in past the Walmarts and Amazons of the world and just look at the “long tail” of sites with Alexa ranks between 10,000 and 1,000,000:

Awesome. These combined data sets have given us visibility into the revenue of e-commerce companies throughout the Alexa top 1 million sites.

Meaningful scale

Note that, while the 500k-1M data point is quite low, it’s far from zero. The mean 2013 revenue for sites in that range is actually $1.5 million and the median is around $500,000. As the gap between the mean and the median suggests, a small number of larger sites pull the average up, and typical revenue drops meaningfully in this range.

For this reason, we’ve made an Alexa rank of 1,000,000 the cutoff for sites we include in our count.

While we are aware of many web sites ranked outside the Alexa top 1,000,000 that are generating well into six and even seven figures of revenue, we believe there would be far more false positives than false negatives if we included sites beyond this mark. We’re comfortable concluding that, with the threshold at the Alexa 1 million mark, the false positives on one side roughly offset the false negatives on the other.

Defining e-commerce

Now that we had a way of estimating which e-commerce companies are actually generating meaningful revenue, we simply needed some way of figuring out which sites in the Alexa Top One Million are actually e-commerce.

Using the BuiltWith API, we were able to profile every web site in the Alexa Top One Million by evaluating the technologies being used by those sites. BuiltWith can detect a whole universe of shopping carts, marketing tools, and other e-commerce-specific technology that makes a web site a dead giveaway as e-commerce.

But this wasn’t good enough: we were still getting a lot of false positives and false negatives. We decided to go a step further. We scraped the HTML of each site’s home page and looked for certain words: “shop,” “buy,” “sell.” We also detected defunct pages and sites that looked more like linkspam. We ended up building an entire set of rules to automatically evaluate whether or not a given site was e-commerce.
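To give a flavor of what a single rule of this kind could look like, here is a minimal sketch. It is not our actual rule set. The function names, the two-keyword threshold, and the way the technology signal is passed in are all assumptions made for illustration.

```python
import re

# The three words named in the text above.
KEYWORDS = ("shop", "buy", "sell")


def keyword_hits(html):
    """Return the set of keywords that appear as whole words on the page."""
    # Crude tag stripping; a real scraper would use a proper HTML parser.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    words = set(re.findall(r"[a-z]+", text))
    return {k for k in KEYWORDS if k in words}


def looks_like_ecommerce(html, has_cart_technology):
    """Hypothetical rule: a detected shopping cart, or two or more keywords.

    `has_cart_technology` stands in for whatever a technology profile
    reports about the site; it is an assumed input, not a real API field.
    """
    return has_cart_technology or len(keyword_hits(html)) >= 2


page = "<a href='/store'>Shop now</a> and buy today"
print(keyword_hits(page))
print(looks_like_ecommerce(page, has_cart_technology=False))
```

Matching whole words keeps "shop" from firing on unrelated strings, but it also misses variants such as "shopping", which is one reason a single rule is never enough on its own.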

And at every turn, we evaluated the rules against a set of web sites that we had evaluated by hand. Eventually, our algorithm was actually able to predict whether a site was e-commerce with 95% accuracy.
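Checking rules against a hand-labeled set comes down to comparing two lists of yes/no answers. The following sketch shows that comparison. The function names and the sample labels are hypothetical and are not taken from our evaluation.

```python
def accuracy(predicted, hand_labeled):
    """Share of sites where the rules agree with the hand label."""
    if len(predicted) != len(hand_labeled):
        raise ValueError("both lists must cover the same sites")
    if not hand_labeled:
        raise ValueError("need at least one labeled site")
    correct = sum(p == h for p, h in zip(predicted, hand_labeled))
    return correct / len(hand_labeled)


def error_counts(predicted, hand_labeled):
    """Return (false_positives, false_negatives)."""
    false_pos = sum(p and not h for p, h in zip(predicted, hand_labeled))
    false_neg = sum(h and not p for p, h in zip(predicted, hand_labeled))
    return false_pos, false_neg


# Illustrative labels only: True means "this site is e-commerce".
predicted = [True, False, True, True]
hand_labeled = [True, False, False, True]

print(accuracy(predicted, hand_labeled))
print(error_counts(predicted, hand_labeled))
```

Tracking false positives and false negatives separately, rather than accuracy alone, shows which direction a rule change pushes the errors.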

After we had fine-tuned the algorithm, we turned it loose on the Alexa Global Top One Million sites. Here’s what we found:

There are approximately 110,000 e-commerce web sites generating revenue of meaningful scale on the internet.

More than 12% of the 100,000 highest-traffic web sites are e-commerce, and that density declines to about 10% in the long tail. According to our data, e-commerce web sites make up approximately 10-12% of the top one million sites on the internet. And to our knowledge, we’re the first to actually attempt to count them.
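As a rough sanity check, the rounded figures above can be tied together with a few lines of arithmetic. The sketch assumes that "the long tail" means the remaining 900,000 sites in the top one million, and it treats "more than 12%" and "about 10%" as exactly 12% and 10%, so the result is only a ballpark.

```python
TOP_SITES = 1_000_000      # size of the Alexa list we scanned
HEAD_SITES = 100_000       # the highest-traffic segment
HEADLINE_COUNT = 110_000   # approximate count reported above

# Integer percentages avoid floating-point rounding noise.
head_estimate = HEAD_SITES * 12 // 100                 # 12% of the top 100,000
tail_estimate = (TOP_SITES - HEAD_SITES) * 10 // 100   # 10% of the other 900,000
implied_total = head_estimate + tail_estimate

overall_share = HEADLINE_COUNT / TOP_SITES

print(f"implied by rounded densities: {implied_total:,}")
print(f"headline share of top one million: {overall_share:.0%}")
```

The rounded densities imply a total just over 100,000, the same order as the headline count of roughly 110,000, which works out to about 11% of the top one million.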

We should point out that we include any online transactional business in our assessment. In addition to traditional online retail, this includes companies selling virtual goods, hosted software providers, marketplaces, travel sites, and even mobile apps with a commerce component. Basically, if you can spend money on their web site, it qualifies.

