Interest in alexa.com hasn’t diminished, even with the holidays approaching. How Alexa works interests many of us, so we decided to look at it more closely. We found that on 1 December there were 964 unreachable websites with Alexa ranks between 104 and 99,993. The most striking finding, though, is that about 1% of the top 10,000 URLs have no IP address at all: they don’t resolve in DNS and don’t redirect to another address. To make sense of this we went to alexa.com, where it states:
“The traffic rank is based on three months of aggregated historical traffic data from millions of Alexa Toolbar users and data obtained from other, diverse traffic data sources, and is a combined measure of page views and users (reach). As a first step, Alexa computes the reach and number of page views for all sites on the Web on a daily basis. The main Alexa traffic rank is based on a value derived from these two quantities averaged over time (so that the rank of a site reflects both the number of users who visit that site as well as the number of pages on the site viewed by those users). The three-month change is determined by comparing the site’s current rank with its rank from three months ago.”
Should we take this on trust? To test it, we compared the current data with data from six months earlier for the same unresolved URLs. About 60% of the unresolved URLs appear in Alexa again, and about 40% even show an improved traffic rank. Does this resolve the paradox? Can a good Alexa score be fabricated using various techniques? Can a user fake the rank statistics by generating bogus data?
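The comparison above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tooling we actually used: the function name `reappearance_stats`, the `*.example` hostnames, and most of the rank values are hypothetical placeholders.

```python
def reappearance_stats(unresolved, earlier_ranks, current_ranks):
    """Return (share seen six months earlier, share whose rank improved).

    unresolved    -- iterable of hostnames that do not resolve today
    earlier_ranks -- dict: hostname -> rank six months ago
    current_ranks -- dict: hostname -> rank today
    A lower rank number means a better position.
    """
    unresolved = set(unresolved)
    if not unresolved:
        return 0.0, 0.0
    # Hosts that were already ranked six months ago.
    seen_before = {h for h in unresolved if h in earlier_ranks}
    # Of those, hosts whose rank number went down (i.e. improved).
    improved = {
        h for h in seen_before
        if h in current_ranks and current_ranks[h] < earlier_ranks[h]
    }
    total = len(unresolved)
    return len(seen_before) / total, len(improved) / total


# Hypothetical sample data, for illustration only.
unresolved = ["a.example", "b.example", "c.example", "d.example", "e.example"]
earlier = {"a.example": 7205, "b.example": 500, "c.example": 9000}
current = {"a.example": 1977, "b.example": 800, "c.example": 8000,
           "d.example": 100, "e.example": 200}

seen_share, improved_share = reappearance_stats(unresolved, earlier, current)
```

With this sample data, three of the five unresolved hosts were ranked before and two of them improved, which mirrors the 60% and 40% proportions described above.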
According to Alexa, a domain’s rank reflects not just the URL itself but also all of its subdomains:
“Traffic is computed for sites, which are typically defined at the domain level. For example, the Web hosts www.msn.com, carpoint.msn.com and slate.msn.com are all treated as part of the same site, because they all reside on the same domain, msn.com. An exception is blogs or personal home pages, which are treated separately if they can be automatically identified as such from the URLs in question. Also, sites which are found to be serving the same content (mirrors) are generally counted together as the same site”
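To make the domain-level grouping concrete, here is a rough Python sketch. It is not Alexa’s actual algorithm: the helper names are ours, the page-view counts are made up, and the "last two labels" rule is a simplifying assumption that breaks for suffixes such as co.uk and ignores the blog and mirror exceptions mentioned in the quote.

```python
from collections import defaultdict


def naive_site(host: str) -> str:
    """Map a hostname to its 'site' by keeping the last two labels.

    Simplifying assumption: real registrable-domain detection needs a
    public suffix list; this shortcut only works for simple TLDs.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aggregate_by_site(page_views: dict) -> dict:
    """Sum per-host page views into per-site totals."""
    totals = defaultdict(int)
    for host, views in page_views.items():
        totals[naive_site(host)] += views
    return dict(totals)


# Hypothetical page-view counts for the hosts named in the quote.
sample = {"www.msn.com": 5, "carpoint.msn.com": 2, "slate.msn.com": 3}
site_totals = aggregate_by_site(sample)
```

Here all three hosts collapse into the single site msn.com, so traffic to any subdomain counts towards the parent domain’s rank even if nobody ever requests the bare domain.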
Of course, in many cases requests go straight to a subdomain, but the key point is this: in many cases the URL that would work directly doesn’t exist. We discovered another group of URLs that work only indirectly.
An example of an unresolved address that still gets high ranks:
Current rank: 1977
November rank: 2115
Rank six months earlier: 7205
It turns out to belong to Rackspace Inc., with user-level images kept on rackcdn.com.
Hosted applications keep their images at URLs of the form xxxx.rackcdn.com.
That’s why, even though the address itself was in a “non-working” condition, it kept getting high ranks. Could this be the solution to the paradox?
So if you notice unresolved URLs, check them out. They may be a “child” of a parent domain.
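One way to run such a check is sketched below in Python, using the standard library’s `socket.getaddrinfo` for the DNS lookup. The function names and the idea of passing in a list of known subdomains are our own assumptions for illustration; the `lookup` parameter exists only so the logic can be exercised without network access.

```python
import socket


def resolves(host, lookup=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(lookup(host, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name does not resolve; UnicodeError: malformed name.
        return False


def classify(domain, subdomains, lookup=socket.getaddrinfo):
    """Describe a domain by whether it or its known subdomains resolve.

    `subdomains` is a caller-supplied list of hostnames under `domain`
    (hypothetical input; the sketch does not discover them itself).
    """
    if resolves(domain, lookup):
        return "resolves directly"
    working = [s for s in subdomains if resolves(s, lookup)]
    if working:
        return "works only through subdomains"
    return "unreachable"
```

Calling `classify("rackcdn.com", ["xxxx.rackcdn.com"])` with a real subdomain in place of the placeholder would perform live DNS lookups, so the answer depends on the network and on the state of DNS at the time.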