Over at Creatuity, we just finished up our first eCommerce platform survey to determine the top eCommerce platforms in use on the web. First of all - huge thanks to Tom Robertshaw for starting the first eCommerce platform survey that I'm aware of a few years back - his most recent results are available on TomRobertshaw.net. I wanted to briefly talk about why we ran our own survey, why people run these surveys and how they work, especially after the recent survey that Aheadworks ran raised some questions on Twitter.
I decided to have the Creatuity team put together our own study/survey for a few reasons:
- Provide independent validation of Tom's survey.
- Implement improvements on the survey methodology being used by Tom and others.
- Because it provided an interesting challenge for us to tackle.
And, to be honest - everyone runs these studies because they make great material to share on social media and to encourage people to share and link back to your site. There's definitely a reason that everyone releases these studies in April - everyone wants to publish theirs before Magento Imagine, the annual eCommerce conference hosted by Magento, the platform that currently leads the web in usage.
As Tom mentions on his survey results, he co-owns an eCommerce development company that focuses on Magento, which could open him up to claims that he's biased. By developing our own study independent of Tom's (and in some cases providing differing results from Tom's study), I hope to provide validation that Tom's survey is accurate.
When reviewing the results from Tom's survey and the Aheadworks survey, I saw a number of ways we could improve upon their methodology. Both of them survey the top 1 million sites on the Internet as measured by Alexa. The problem with Alexa is that they measure the top 1 million sites via data they collect from a toolbar they ask users to install. This toolbar only runs on Internet Explorer, Chrome and Firefox - ruling out anyone who uses Safari or Opera. It's a limited dataset, and it has its own built-in biases due to the way the data is collected.
That's why our survey includes both the Alexa top 1 million and the Quantcast top 1 million. Quantcast takes the opposite approach from Alexa - site owners install tracking code on their site, similar to Google Analytics, so the traffic data measures visits from all visitors, no matter what browser they're using, but only on sites that have the code installed.
I also noticed that in both Tom's results from last year and the Aheadworks results, the most popular eCommerce solution for WordPress, WooCommerce, wasn't included. This made me dig further into their results, and I realized there were a number of eCommerce platforms that weren't included. Tom has since updated his survey to include WooCommerce, but Aheadworks still isn't including this platform, which makes up a significant portion of the eCommerce sites on the Internet.
So, we decided to put together our own study. Someone asked recently on Twitter if these surveys are based on downloads, sales, etc. - they're not - they're based on crawling sites directly, similar to how Google builds its search index. We take the list of sites to study (the Alexa and Quantcast top 1 million sites, which, once de-duplicated, contain approximately 1.7 million sites) and load them into a database. We then connect to each of those sites and scan it for a fingerprint. By a fingerprint, we mean a string of text that is unique to that specific platform - for example, there are certain strings containing the word 'Varien' that we use to detect various Magento versions. We maintain fingerprints for over 55 platforms at this time, and are constantly adding more. We also check the site's IP address and the ownership data of that IP address to determine if the site is running on a cloud-based service (e.g., Shopify, Volusion, Magento Go, BigCommerce, etc.). These results are all loaded into a database, which we then analyze to generate our reports.
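To make the fingerprint idea concrete, here is a minimal Python sketch. It is illustrative only and is not the scanner used for the survey: the function names are hypothetical, and the WooCommerce and Shopify strings are assumptions picked for the example. Only the 'Varien' string comes from the description above.

```python
from typing import Dict, Iterable, List

# Illustrative fingerprints only. 'Varien' is mentioned in the post;
# the other two strings are assumptions made for this example.
FINGERPRINTS: Dict[str, List[str]] = {
    "Magento": ["Varien"],
    "WooCommerce": ["wp-content/plugins/woocommerce"],
    "Shopify": ["cdn.shopify.com"],
}


def dedupe_sites(*site_lists: Iterable[str]) -> List[str]:
    """Merge several ranked site lists, keeping the first copy of each domain."""
    seen = set()
    merged = []
    for sites in site_lists:
        for site in sites:
            domain = site.strip().lower()
            if domain and domain not in seen:
                seen.add(domain)
                merged.append(domain)
    return merged


def detect_platforms(html: str) -> List[str]:
    """Return every platform whose fingerprint string appears in the page source."""
    return [
        platform
        for platform, needles in FINGERPRINTS.items()
        if any(needle in html for needle in needles)
    ]


if __name__ == "__main__":
    # Hypothetical inputs standing in for the Alexa and Quantcast lists.
    sites = dedupe_sites(["Example.com", "shop.example"], ["example.com", "store.example"])
    print(sites)
    print(detect_platforms('<script>var form = new VarienForm("search");</script>'))
```

A real scanner would fetch each page over HTTP before matching and would need more careful fingerprints than a single substring, since a string like this can also show up on pages that merely mention a platform.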
Of course, with 1.7 million sites, if you assume it takes 10 seconds to connect to and evaluate each site, running this study on a single server would take over 6 months! That's where Amazon's EC2 service comes into play. We load our list of sites into an Amazon SQS queue and then run a master EC2 node that spins up 16-32 EC2 instances that split the queue up and process everything in under a week.
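The back-of-the-envelope math behind those numbers can be sketched in a few lines of Python. The function name is hypothetical, and the model assumes every worker handles one site at a time at a flat 10 seconds per site, which is a simplification.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def scan_days(sites: int, seconds_per_site: float, workers: int = 1) -> float:
    """Estimate wall-clock days to scan every site, assuming each worker
    processes one site at a time and the work divides evenly."""
    return sites * seconds_per_site / (workers * SECONDS_PER_DAY)


if __name__ == "__main__":
    for workers in (1, 16, 32):
        print(workers, "worker(s):", round(scan_days(1_700_000, 10, workers), 1), "days")
```

Under these assumptions a single server needs roughly 197 days (about six and a half months), 16 workers need about 12.3 days, and 32 workers need about 6.1 days. Finishing in under a week is therefore consistent with the upper end of the instance range, or with each instance checking more than one site at a time.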
If anyone has any questions about our study, feel free to post them here or find me on Twitter at @JoshuaSWarren.