Indexation for SEO: Real Numbers in 5 Easy Steps

Posted by randfish

How many pages has Google indexed?

This question and the problems surrounding it run rampant through the SEO world. It usually arises when someone starts doing searches like this:

Indexation of SEOmoz According to Google

Google claims to have 93,800 pages indexed on the root domain, seomoz.org. That sounds pretty good, but when I ran that search query last week, the number was closer to 75,000, and when I ran it again from Google.co.uk 60 seconds later, the number changed even more dramatically:

Indexation of SEOmoz.org on Google.co.uk

How about if I hit refresh on my Google.com results again:

Indexation on Google.com 3 minutes later

Doh! Google just dropped 8,500 of my pages out of its index. That sucks, but not nearly as much as it does for the managers, marketing directors and CEOs who use these numbers as actual KPIs! Can you imagine using as a business metric a number that means nothing, fluctuates 300% between data centers, can change at a moment’s notice and provides no actionable insight?

And yet… It happens.

Fortunately, there’s an easy way to get much, much better data than what the search engines provide through "site:" queries, and this post walks you through that process step by step.

Step 1: Go to Traffic Sources in Your Analytics

Google Analytics Step 1

Click the "traffic sources" link in Google Analytics or Omniture (other analytics packages may call it "referring sources").

Step 2: Head to the Search Engines Section

Step 2 of the Indexation Process

We want to find out how many pages the search engines have indexed, so the obvious next step is to go to the "search engines" sub-section.

Step 3: Choose an Engine

Step 3: Choose an Engine 

Choose the engine you want indexation data on and click. If you have both paid and organic traffic from this engine, you’ll want to display organic only at this step, too.

Step 4: Filter by Landing Pages

Step 4: Filter by Landing Page

The "Landing Page" filter in the dropdown shows you the traffic each individual page on your site received from the engine you’ve selected. It also produces the magical "total" number of pages that have received traffic, described in the final step.

Step 5: Record the Number at the Bottom

Step 5: Indexation Count Arrives

That count tells you the unique number of pages that received at least one visit from searches performed on Google. It’s the Holy Grail of indexation – a number you can accurately track over time to see how the search engine is indexing your site. On its own, it isn’t particularly useful, but over time (I usually recommend recording monthly, but for some sites, every 2-3 months can make more sense), it gives you insight into whether your pages are doing better or worse at drawing in traffic from the engine.
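If you export the landing-page report rather than reading the total off the screen, the same count can be reproduced in a few lines. This is a hypothetical helper, not a feature of any analytics package: it assumes a CSV export with columns named "Landing Page" and "Visits" (rename them to match your actual export, and strip any extra header or summary rows your tool adds) and counts the distinct URLs with at least one visit.

```python
import csv
import io


def count_pages_with_visits(csv_text: str, page_col: str = "Landing Page",
                            visits_col: str = "Visits") -> int:
    """Count distinct landing pages that received at least one visit.

    The column names are assumptions; match them to your analytics export.
    """
    pages = set()  # a set deduplicates URLs, e.g. when merging several exports
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove thousands separators such as "1,234" before converting.
        visits = int(row[visits_col].replace(",", ""))
        if visits >= 1:
            pages.add(row[page_col].strip())
    return len(pages)


# Hypothetical export: two pages with traffic, one without.
export = """Landing Page,Visits
/blog/indexation,120
/tools/,45
/old-page,0
"""
print(count_pages_with_visits(export))  # 2
```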

Now, technically I’m being a bit cheeky here. This number doesn’t tell the full story: it doesn’t show the actual number of pages a search engine has crawled or indexed on your site, but it does tell you the unique number of URLs that received at least one visit from the engine. In my opinion, this data is far more accurate and far more actionable. The first adjective, accurate, is hard to argue with (particularly given the visual evidence atop this post), but the second requires a bit of explanation.

Why is Number of Pages Receiving ≥1 Visit Actionable?

Indexation numbers alone are useless. Businesses and websites use them as KPIs because they want to know if, over time, more of their pages are making their way into the engines’ indices. I’d argue that actually, you don’t care if your pages are in the indices – you care if your pages have the opportunity to EARN TRAFFIC!

Being a row in a search index means nothing if your page is:

  • too low in PageRank/link juice to appear in any results
  • displaying content the engines can’t properly parse
  • devoid of keywords or content that could send traffic
  • broken, misdirected or unavailable
  • a duplicate of other pages that the engine will rank instead

Thus, the metric you want to track over time isn’t (in most cases) the number of pages indexed; it’s the number of pages that earned traffic. That’s the number you want to rise, the number you want marketers to concentrate on and the KPI that’s meaningful. It tells you whether the engine is crawling, indexing AND listing your pages in results where someone has actually clicked on them.

If the number drops, you can investigate the specific pages that are no longer receiving traffic by exporting the data to Excel and comparing it side by side with the previous month. If the number rises, you can see which new pages are getting traffic. Those individual URLs tell a story: of pages that broke, that stopped being linked to, that fell too far down in paginated results or that lost their unique content. It’s so much better than the guessing game SEOs so often play when confronted with "lower indexation numbers" from the site: command.
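The month-over-month side-by-side boils down to two set differences. A minimal sketch, assuming you have already loaded each month's traffic-receiving URLs into a set (for example, from the exported landing-page reports); the function name and sample URLs are illustrative only:

```python
def compare_months(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Split URLs by how their traffic status changed between two months."""
    return {
        "lost": previous - current,  # had visits last month, none this month
        "new": current - previous,   # has visits this month, none last month
        "kept": previous & current,  # had visits in both months
    }


last_month = {"/blog/indexation", "/tools/", "/old-page"}
this_month = {"/blog/indexation", "/tools/", "/new-guide"}
diff = compare_months(last_month, this_month)
print(sorted(diff["lost"]))  # ['/old-page']
print(sorted(diff["new"]))   # ['/new-guide']
```

The "lost" set is where to start investigating broken pages, lost links or vanished unique content.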

Some Necessary Caveats

This methodology certainly isn’t perfect, and there are some important points to be aware of (thanks especially to some folks in the comments who brought these up):

  • Google Analytics (and many other analytics packages) use sampled data at times to make guesstimates. If you want to be sure you’re getting the best possible number, export to CSV and do the side-by-side comparison in Excel. You can even remove the pages that appear in both time periods to see only those that uniquely did or didn’t receive traffic. In many of these cases, you might also only care about pages that gained or lost 5, 10 or 20+ visits.
  • Shrinking the time period in your analytics can improve accuracy, but it also reduces the likelihood that a page receiving very long-tail query traffic once in a blue moon will be listed, so adjust accordingly and plan for imperfect data. This method isn’t foolproof, but it is (in my opinion) better than the random roulette wheel of site: queries.
  • This technique won’t help you catch other kinds of SEO issues, such as duplicate content (it can in some cases, but it’s not as good as Google Webmaster Tools reporting) or redirects (301s, 302s, etc.), which can require a crawling solution.
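To keep only the pages whose traffic moved meaningfully, as the first caveat suggests, you can compare per-page visit counts between two periods and keep those that changed by at least some threshold. This is a hypothetical sketch that assumes each period has already been loaded as a mapping from URL to visits; the default threshold of 5 is simply one of the example cut-offs mentioned above.

```python
def significant_changes(prev: dict[str, int], curr: dict[str, int],
                        threshold: int = 5) -> dict[str, int]:
    """Return pages whose visits changed by at least `threshold`, in either direction.

    A page missing from one period is treated as having 0 visits in that period.
    """
    changes = {}
    for page in prev.keys() | curr.keys():
        delta = curr.get(page, 0) - prev.get(page, 0)
        if abs(delta) >= threshold:
            changes[page] = delta
    return changes


prev = {"/blog/indexation": 120, "/tools/": 45, "/old-page": 7}
curr = {"/blog/indexation": 118, "/tools/": 60, "/new-guide": 12}
print(sorted(significant_changes(prev, curr).items()))
# [('/new-guide', 12), ('/old-page', -7), ('/tools/', 15)]
```

Raising `threshold` to 10 or 20 narrows the list to the largest movers.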

I’d, of course, love your feedback. I know many SEOs are addicted to (and supportive of) site: command numbers as a way to measure progress, so maybe there are things I’m not considering or situations where they make sense. I also know that many of you like the number reported in Google Webmaster Tools under the Sitemaps crawl data (I’m skeptical of this too, for the record), and I’d like to hear how you find value in that data as well.

p.s. Tomorrow we’ll be announcing two webinars (open to all) about using Open Site Explorer to get ACTIONABLE data. Be sure to leave either Wednesday the 27th at 2pm Pacific or Thursday the 28th at 10am Pacific free :-)

One Giant Leap for Link Data: Announcing Open Site Explorer + Page/Domain Authority Metrics

Posted by randfish

For the past 15 months, we’ve been working hard to improve Linkscape, our index of the WWW. Today, we’re releasing an entirely new platform for Linkscape’s index with more accessible data than ever before. And, for the next 48 hours, full functionality is available entirely for free:

Open Site Explorer

The new tool, Open Site Explorer, makes gathering, sorting and exporting link data easier than ever. It’s built with speed and accessibility at the forefront and provides a tremendous amount of information about the links to any page or site. Since there’s a lot to cover, let’s dive right into some of the features and functionality.

#1 – Fast Access to Top Level Metrics

OSE Metrics

At the top of every results page, you’ll find the key metrics we have on your page – the importance/ranking ability of that URL (Page Authority) and root domain (Domain Authority), the number of linking root domains and the total number of links.

#2 – See Up to 10,000 Links Alongside Anchor Text & Key Metrics

OSE Link List

You can browse through up to 10,000 links (this is restricted to 1,000 for non-PRO members normally, but will be completely free to everyone for the first 48 hours). We also offer CSV export functionality, but it won’t be available until the weekend (and then, only to PRO members – CSV takes up a LOT of bandwidth for 10K rows :-) ).

#3 – Filtering for the Links You Want to See

OSE Filtering Options

As you drill down in the list of links, you can exclude nofollowed links or see only the 301s that point to a page. You can also filter by where the links come from (internal vs. external) and by target: links that point to a given page, to all pages on a subdomain or to an entire root domain.

#4 – Display Root Domains that Contain Links

OSE Linking Domains

The second tab in Open Site Explorer (OSE for short) shows the linking root domains. We realized that a lot of people want a quick glance at the types of sites sending links to a given page or domain, and thus created this unique view. In the future (probably a couple of months away), you’ll also be able to click an individual domain and see a list of pages from that site that link to the target of your choice.

#5 – Review Anchor Text Term & Phrase Distribution

OSE Anchor Text Distribution

Anchor text is often the missing link in a "why does that guy rank there?" puzzle. We’re opening up the anchor text distribution so you can learn more about your own sites and pages and those of the competition. You can sort either by the number of root domains that contain a link with a particular anchor text term (single word) or phrase, or by the raw number of links containing that anchor text.
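The two sort orders differ because one link-heavy site can inflate raw counts without adding any new linking domains. The sketch below is not OSE's implementation; it is an illustrative aggregation over hypothetical (source URL, anchor text) pairs that counts whole phrases only (term-level counts would split each anchor on whitespace). It also approximates a root domain as the last two hostname labels, a simplification that mishandles suffixes like .co.uk, which really require the Public Suffix List.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def naive_root_domain(url: str) -> str:
    """Approximate the root domain as the last two hostname labels (simplified)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def anchor_distribution(links):
    """links: iterable of (source_url, anchor_text) pairs pointing at one target.

    Returns (anchor, linking_root_domains, raw_links) tuples, sorted by
    root-domain count first and raw link count second, both descending.
    """
    raw_counts = Counter()
    domains = defaultdict(set)
    for source_url, anchor in links:
        text = anchor.strip().lower()  # treat "SEO tools" and "seo tools" as one phrase
        raw_counts[text] += 1
        domains[text].add(naive_root_domain(source_url))
    return sorted(
        ((text, len(domains[text]), raw_counts[text]) for text in raw_counts),
        key=lambda row: (row[1], row[2]),
        reverse=True,
    )


links = [
    ("http://blog.example.com/a", "SEO tools"),
    ("http://www.example.com/b", "seo tools"),
    ("http://other.org/c", "SEO tools"),
    ("http://spammy.net/1", "cheap links"),
    ("http://spammy.net/2", "cheap links"),
    ("http://spammy.net/3", "cheap links"),
]
for text, root_domains, total_links in anchor_distribution(links):
    print(text, root_domains, total_links)
# seo tools 2 3
# cheap links 1 3
```

Both phrases have three raw links, but only "seo tools" comes from more than one root domain.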

#6 – Pie Chart Displays of Link Data 

OSE Data Pie Charts

Many SEOs worry that, particularly on small sites, they may be seeing large numbers of links from sources that aren’t ideal. In this view, we use pie charts to illustrate the percentage of links that come from internal vs. external pages and that are followed vs. nofollowed. This view is at the top of the "full metrics" tab.
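The pie charts reduce to simple shares of the total link count. As a rough sketch, assume hypothetical link records that already carry boolean `internal` and `nofollow` flags (how those flags are determined is outside this example):

```python
def link_shares(links):
    """Percentages for internal/external and followed/nofollowed links.

    `links` is a list of dicts with hypothetical boolean keys 'internal' and 'nofollow'.
    """
    total = len(links)
    if total == 0:
        return {}  # avoid dividing by zero when there are no links
    internal = sum(1 for link in links if link["internal"])
    nofollow = sum(1 for link in links if link["nofollow"])
    return {
        "internal %": 100 * internal / total,
        "external %": 100 * (total - internal) / total,
        "followed %": 100 * (total - nofollow) / total,
        "nofollowed %": 100 * nofollow / total,
    }


links = [
    {"internal": True, "nofollow": False},
    {"internal": True, "nofollow": False},
    {"internal": False, "nofollow": True},
    {"internal": False, "nofollow": False},
]
print(link_shares(links))
# {'internal %': 50.0, 'external %': 50.0, 'followed %': 75.0, 'nofollowed %': 25.0}
```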

#7 – Rejoice in Data Junkie Heaven

OSE Full Metrics

Additionally in the "full metrics" tab, you’ll find a list of all the Linkscape data we’ve got including mozRank (an algorithm similar to Google’s PageRank), mozTrust (akin to TrustRank) and many more. You can also see the more refined link counts and data for an individual URL, the subdomain it’s on and the hosting root domain.

#8 – Compare Pages/Sites Link Metrics to One Another

OSE Comparison

A frequently requested feature is the ability to compare one site/page against another. OSE makes this quick and easy with a comparison view drop-down. Clicking the "-" symbol returns you to the individual report view.

#9 – Graphical Views of Metric Comparisons

OSE Comparative Metrics

In the comparison view, we show nice visual charts that you can embed in a client report or send to your boss to help illustrate just how challenging it might be to take on a particular competitor. For example, you can see above that Fred Wilson has a long way to go to reach Guy Kawasaki’s stats on his blog (granted, Guy’s posts are designed for a much broader audience and he’s been blogging for longer).

#10 – Compare Links Side by Side

OSE Links Side by Side

At the bottom of this comparative view you’ll see links side-by-side. We noticed a lot of SEOs open two browser windows with lists of links to compare them against one another and thought "why not make that easier?!" With this feature, you can scroll through the links for two pages to get a fast sense for the quality and variety of sources that point to each.

New Metrics – Domain Authority & Page Authority

OSE Metrics for NinebyBlue

We’ve got much more information coming soon about these two metrics, but basically, we’re using our ranking models to build predictions about how well an individual page might perform in the search engines (Page Authority) or how well content on a root domain would do (Domain Authority). These aren’t like PageRank or mozRank at all – they’re much broader.

Authority scores take into account all the metrics we have about a page and hundreds of derivatives of those metrics. We’ve put the scores on a classic 0-100 scale that’s logarithmic (so moving from a 50 to a 60 is much harder than moving from a 10 to a 20). Over time, these metrics will change and evolve as we get better and better with our machine learning systems (and as the engines and the web itself change). Watch for this week’s Whiteboard Friday with much more detail on this subject. For now, Open Site Explorer is the only place to get Domain/Page Authority data, but we’ll be rolling it into the SEOmoz toolbar and other tools over the next few months.
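To see why a logarithmic scale makes higher scores harder to earn, consider a purely hypothetical mapping (this is not SEOmoz's formula, which isn't given here): assume an underlying raw signal that runs from 1 to 10^10, scored as 100 × log10(raw) / log10(10^10). Inverting the mapping shows how much raw signal each ten-point jump requires.

```python
import math

MAX_RAW = 10 ** 10  # hypothetical ceiling for the underlying raw signal


def score_from_raw(raw: float) -> float:
    """Map a raw signal in [1, MAX_RAW] onto a 0-100 logarithmic scale."""
    return 100 * math.log10(raw) / math.log10(MAX_RAW)


def raw_from_score(score: float) -> float:
    """Inverse mapping: the raw signal needed to reach a given score."""
    return MAX_RAW ** (score / 100)


low_jump = raw_from_score(20) - raw_from_score(10)   # extra signal for 10 -> 20
high_jump = raw_from_score(60) - raw_from_score(50)  # extra signal for 50 -> 60
print(round(low_jump), round(high_jump))  # 90 900000
```

Under these assumptions, the same ten-point gain costs 10,000 times more raw signal at the top of the scale than near the bottom; the real ratio depends on whatever scaling the actual model uses.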

Linkscape’s Index Update

Linkscape itself has also updated, growing to a whopping 64+ billion URLs with 45-day minimum freshness. As Nick’s previous post on the Trillion+ URLs Linkscape has seen shows, freshness is one of the most critical metrics for those who care about accurate link data, and we’re working hard to keep our index as up-to-date as possible. Linkscape recrawls every page in the index each month, so no "old data" is stored or served. Our current metrics for this index are:

  • Pages: 64,180,990,434 (64 billion)
    • 301s: 293 million
    • 302s: 672 million (Marshall Simmonds calls this "job security")
    • 404s: 360 million (we try to exclude known 404s from crawls, so this count likely understates their true share)
  • Subdomains: 259,977,972 (260 million)
  • Root Domains: 63,264,651 (63 million)
    • .com – 49.4%
    • .net – 6.4%
    • .de – 5.8%
    • .org – 5.2%
    • .ru – 2.5%
    • .cn – 2.5%
  • Links: 701,881,850,733 (701 billion)
    • Nofollows: 13 billion (1.85%)
    • Internal Nofollows: 9.06 billion vs. External Nofollows: 4.11 billion
    • Meta Refreshes: 40.9 million
    • Internal Links: 638 billion vs. External Links: 63 billion (people link to their own stuff a lot more than they do to others)
    • Feed Autodiscovery (i.e. RSS/Atom feeds): 2.261 billion
    • Rel=canonical: 100 million
    • Links passed through 301s: 8.61 billion (just over 1% of all links go through a 301)
  • mozRank Correlations to Google Toolbar PageRank
    • Individual page mR: 0.42 (avg. error +/- 0.56 from PR)
    • Subdomain mR: 0.45 (avg. error +/- 0.35 from PR)
    • Root domain mR: 0.45 (avg. error +/- 0.37 from PR)
  • File Extensions
    • html: 26.5%
    • php: 21.7%
    • htm: 10.6%
    • asp: 5.7%
    • aspx: 2.9%
    • cgi: 0.89%
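The percentages in the list follow directly from the raw counts. This short check divides the rounded link counts quoted above by the exact total; because those inputs are rounded, the results are approximate.

```python
TOTAL_LINKS = 701_881_850_733  # total links in the index, from the list above


def share_of_links(count: float) -> float:
    """Percentage of all indexed links represented by `count`, to two decimals."""
    return round(100 * count / TOTAL_LINKS, 2)


# Rounded counts as reported in the index metrics.
reported = {
    "nofollowed": 13e9,
    "passed through 301s": 8.61e9,
    "internal": 638e9,
    "external": 63e9,
}
for label, count in reported.items():
    print(f"{label}: {share_of_links(count)}% of all links")
```

Running it gives about 1.85% for nofollowed links and about 1.23% for links passed through 301s, matching the "just over 1%" figure, while internal links come to roughly 91% of the total.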

API Update

Finally, we’ve also updated the SEOmoz API – you can now get lists of links for any URL for FREE along with tons of other link data and metrics. Sarah & Nick have a blog post coming soon with more, but for now, check out the API page to get a developer key and the API Wiki for more details.

Answers to Common Questions About OSE

What’s the difference between OSE and Linkscape?

Open Site Explorer provides a fast, free, more basic view of link data, while Linkscape gives power users the ability to refine results with dozens of filters and to search within link anchor text, URLs and domains. Linkscape lets you dig into significantly more metrics and details on a per-link basis, such as mozRank passed, Domain mozTrust, juice per anchor text, links from particular TLDs, etc.

OSE is substantially faster than Linkscape, and not as metrics-heavy. It’s designed to give the "500 foot view" vs. the deep, in-the-weeds look you can get in Linkscape. Feel free to try both and use the one that suits you best.

Why is OSE on a separate domain?

Three big reasons, actually:

  1. We haven’t tried the microsite strategy in a long time (since the first launch of the Web 2.0 Awards), and we want to test its SEO and strategic/branding effects (we’ll have some cool data to report in the next few weeks/months)
  2. OSE is built entirely on the SEOmoz API platform – we wanted to show off just how much you can build using that service :-)
  3. SEOmoz engineers are very busy working on another exciting launch (scheduled for June) so we wanted to split resources without putting a load on folks focused on our site (PRO members may see some previews of that even earlier)

What will OSE continue to offer for free?

For the first 48 hours, registered members (anyone with a free SEOmoz account) will get the full PRO features (unlimited metrics, up to 10K links per report, full anchor text data, etc). After that, anyone can still get up to 1,000 links per search and a sampling of metrics. You can see a full breakdown in the bottom right-hand corner of the homepage.

Why Call it "Open" Site Explorer?

We’re aiming to give out more link data for free than anyone else on the web. Open Site Explorer gives out not only lots and lots of links (up to 1,000), but also metrics and link numbers, permanently for free. We also provide a free API that lets you use any of the data (including lists of links) in your applications, public or private. Our goal is to be transparent with this data: to show exactly how many pages/domains are in our index, to demonstrate accuracy through freshness, and to canonicalize and re-crawl like a search engine. We’re trying to make the web’s link graph as available as possible, using the revenue from PRO membership to accelerate growth in index freshness, quality and size.

Please Give Us Feedback!

We’d love to hear from you. If you have suggestions, bug reports (this is a first launch, after all) or ideas for future iterations, please leave them in the comments or send them via the Open Site Explorer feedback form. We’re of course very excited for the launch of OSE and would certainly appreciate you sharing and helping us spread it around. The free period ends at 8am Pacific on Friday, January 22nd, but PRO members will continue to be able to access all the features and unlimited reports (and free reports will still provide up to 1,000 links).

p.s. Two great posts with more information on this topic appeared in the last 24 hours and are worth sharing:

If you have more to share, feel free to link in the comments.

Do you like this post? Yes No

Posted by randfish

For the past 15 months, we’ve been working hard to improve Linkscape, our index of the WWW. Today, we’re releasing an entirely new platform for Linkscape’s index with more accessible data than ever before. And, for the next 48 hours, full functionality is available entirely for free:

Open Site Explorer

The new tool, Open Site Explorer, makes gathering, sorting and exporting link data easier than ever. It’s built with speed and accessibilty at the forefront and provides a tremendous amount of information about the links to any page or site. Since there’s a lot to cover, let’s dive right into some of the features and functionality.

#1 – Fast Access to Top Level Metrics

OSE Metrics

At the top of every results page, you’ll find the key metrics we have on your page – the importance/ranking ability of that URL (Page Authority) and root domain (Domain Authority), the number of linking root domains and the total number of links.

#2 – See Up to 10,000 Links Alongside Anchor Text & Key Metrics

OSE Link List

You can browse through up to 10,000 links (this is restricted to 1,000 for non-PRO members normally, but will be completely free to everyone for the first 48 hours). We also offer CSV export functionality, but it won’t be available until the weekend (and then, only to PRO members – CSV takes up a LOT of bandwidth for 10K rows :-) ).

#3 – Filtering for the Links You Want to See

OSE Filtering Options

As you drill down in the list of links, you can exclude nofollowed links or see only the 301s that point to a page. You also have the ability to sort by the location from which you want to see links – internal vs. external – and links that point to a given page, all pages on a subdomain or an entire root domain.

#4 – Display Root Domains that Contain Links

OSE Linking Domains

The second tab in Open Site Explorer (OSE for short) is the linking root domains. We realized that a lot of people want to get a quick glance of the types of sites that are sending links to a given page or domain, and thus created this unique view. In the future (probably a couple months away), you’ll also be able to click an individual domain and see a list of pages from that site that link to the target of your choice.

#5 – Review Anchor Text Term & Phrase Distribution

OSE Anchor Text Distribution

Anchor text is often the missing link in a "why does that guy rank there?" puzzle. We’re opening up the anchor text distribution so you can learn more about your own sites and pages and those of the competition. You can also sort by both the number of root domains that contain a link with a particular anchor text term (single word) or phrase and the raw number of links containing that anchor text.

#6 – Pie Chart Displays of Link Data 

OSE Data Pie Charts

Many SEOs worry that, particularly on small sites, they may be seeing lots of numbers of links, but the sources aren’t ideal. In this view, we try to illustrate through pie charts the percentage of links that come from internal vs. external pages and are followed vs. nofollowed. This view is at the top of the "full metrics" tab.

#7 - Rejoice in Data Junkie Heaven 

OSE Full Metrics

Additionally in the "full metrics" tab, you’ll find a list of all the Linkscape data we’ve got including mozRank (an algorithm similar to Google’s PageRank), mozTrust (akin to TrustRank) and many more. You can also see the more refined link counts and data for an individual URL, the subdomain it’s on and the hosting root domain.

#8 – Compare Pages/Sites Link Metrics to One Another

OSE Comparison

A frequently requested feature is the ability to compare one site/page against another. OSE makes this quick and easy with a comparison view drop-down. If you click the "-" symbol again, you can return to the individual report view.

#9 – Graphical Views of Metric Comparisons

OSE Comparative Metrics

In the comparison view, we show nice visual charts that you can embed in a client report or send to your boss to help illustrate just how challenging it might be to take on a particular competitor. For example, you can see above that Fred Wilson has a long way to go to reach Guy Kawasaki‘s stats on his blog (granted, Guy’s posts are designed for a much broader audience and he’s been blogging for longer).

#10 – Compare Links Side by Side

OSE Links Side by Side

At the bottom of this comparative view you’ll see links side-by-side. We noticed a lot of SEOs open two browser windows with lists of links to compare them against one another and thought "why not make that easier?!" With this feature, you can scroll through the links for two pages to get a fast sense for the quality and variety of sources that point to each.

New Metrics – Domain Authority & Page Authority

OSE Metrics for NinebyBlue

We’ve got much more information coming soon about these two metrics, but basically, we’re using our ranking models to build predictions about how well an individual page might perform in the search engines (Page Authority) or how well content on a root domain would do (Domain Authority). These aren’t like PageRank or mozRank at all – they’re much broader.

Authority scores take into account all the metrics we have about a page and hundreds of derivatives of those metrics. We’ve put the scores on a classic 0-100 scale that’s logarithmic (so moving from a 50 to a 60 is much harder than moving from a 10 to a 20). Over time, these metrics will change and evolve as we get better and better with our machine learning systems (and as the engines and the web itself changes). Watch for this week’s Whiteboard Friday with much more detail on this subject. For now Open Site Explorer is the only place to get Domain/Page Authority data, but we’ll be rolling it into the SEOmoz toolbar and other tools over the next few months.

Linkscape’s Index Update

Linkscape itself has also updated – growing to a whopping 65 billion URLs with 45 day minimum freshness. As Nick’s previous post on the Trillion+ URLs Linkscape has seen shows, freshness is one of the most critical metrics for those who care about accurate link data, and we’re working hard to keep our index as up-to-date as possible. Linkscape recrawls every page in the index each month, so no "old data" is stored or served. Our current metrics for this index are:

  • Pages: 64,180,990,434 (65 billion)
    • 301s: 293 million
    • 302s: 672 million (Marshall Simmonds calls this "job security")
    • 404s: 360 million (but we do try to exclude known 404s in crawls, so this may be low percentage wise)
  • Subdomains: 259,977,972 (260 million)
  • Root Domains: 63,264,651 (63 million)
    • .com – 49.4%
    • .net – 6.4%
    • .de – 5.8%
    • .org – 5.2%
    • .ru – 2.5%
    • .cn – 2.5%
  • Links: 701,881,850,733 (701 billion)
    • Nofollows: 13 billion (1.85%)
    • Internal Nofollows: 9.06 billion vs. External Nofollows: 4.11 billion
    • Meta Refreshes: 40.9 million
    • Internal Links: 638 billion vs. External Links: 63 billion (people link to their own stuff a lot more than they do to others)
    • Feed Autodiscovery (i.e. RSS/Atom feeds): 2.261 billion
    • Rel=canonical: 100 million
    • Links passed through 301s: 8.61 billion (just over 1% of all links go through a 301)
  • mozRank Correlations to Google Toolbar PageRank
    • Individual page mR: 0.42 (avg. error +/- 0.56 from PR)
    • Subdomain mR: 0.45 (avg. error +/- 0.35 from PR)
    • Root domain mR: 0.45 (avg. error +/- 0.37 from PR)
  • File Extensions
    • html: 26.5%
    • php: 21.7%
    • htm: 10.6%
    • asp: 5.7%
    • aspx: 2.9%
    • cgi: 0.89%
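The percentages in the list follow directly from the raw counts. The Python sketch below recomputes a few of them from the figures above (in billions of links); the constant and function names are illustrative only.

```python
# Figures copied from the Linkscape index stats above, in billions of links.
TOTAL_LINKS = 701.881850733
NOFOLLOWS = 13.0
INTERNAL_NOFOLLOWS = 9.06
EXTERNAL_NOFOLLOWS = 4.11
INTERNAL_LINKS = 638.0
EXTERNAL_LINKS = 63.0
VIA_301 = 8.61


def share(part: float, whole: float = TOTAL_LINKS) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"nofollow share:      {share(NOFOLLOWS):.2f}%")
    print(f"links through 301s:  {share(VIA_301):.2f}%")
    print(f"internal : external  {INTERNAL_LINKS / EXTERNAL_LINKS:.1f} : 1")
    # The internal/external splits should add back up to the rounded totals.
    print(f"nofollow split sum:  {INTERNAL_NOFOLLOWS + EXTERNAL_NOFOLLOWS:.2f} billion")
    print(f"link split sum:      {INTERNAL_LINKS + EXTERNAL_LINKS:.0f} billion")
```

Run as a script, it reproduces the 1.85% nofollow share and shows that roughly 1.2% of links pass through a 301, consistent with the "just over 1%" note.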

API Update

Finally, we’ve also updated the SEOmoz API – you can now get lists of links for any URL for FREE along with tons of other link data and metrics. Sarah & Nick have a blog post coming soon with more, but for now, check out the API page to get a developer key and the API Wiki for more details.

Answers to Common Questions About OSE

What’s the difference between OSE and Linkscape?

Open Site Explorer provides a fast, free, more basic view of link data, while Linkscape gives power users the ability to refine by dozens of filters and search within link anchor text, URLs and domains. Linkscape will let you dig into significantly more metrics and details on a per-link basis, such as mozRank passed, Domain mozTrust, juice per anchor text, links from particular TLDs, etc.

OSE is substantially faster than Linkscape and less metrics-heavy. It’s designed to give the "500 foot view" vs. the deep, in-the-weeds look you can get in Linkscape. Feel free to try both and use the one that suits you best.

Why is OSE on a separate domain?

Three big reasons, actually:

  1. We haven’t tried the microsite strategy in a long time (not since the first launch of the Web 2.0 Awards), and we want to test its SEO and strategic/branding impact (we’ll have some cool data to report in the next few weeks/months)
  2. OSE is built entirely on the SEOmoz API platform – we wanted to show off just how much you can build using that service :-)
  3. SEOmoz engineers are very busy working on another exciting launch (scheduled for June) so we wanted to split resources without putting a load on folks focused on our site (PRO members may see some previews of that even earlier)

What will OSE continue to offer for free?

For the first 48 hours, registered members (anyone with a free SEOmoz account) will get the full PRO features (unlimited metrics, up to 10K links per report, full anchor text data, etc). After that, anyone can still get up to 1,000 links per search and a sampling of metrics. You can see a full breakdown in the bottom right-hand corner of the homepage.

Why Call it "Open" Site Explorer?

We’re aiming to give out more link data for free than anyone else on the web. Open Site Explorer not only gives out lots and lots of links (up to 1,000), but also metrics and link numbers, for free (permanently). We also provide a free API that lets you use any of the data (including lists of links) in your applications, public or private. Our goal is to be transparent with this data – to show exactly how many pages/domains are in our index, to demonstrate accuracy through freshness, and to canonicalize and re-crawl like a search engine. We’re trying to take the web’s link graph, make it as available as possible, and use the revenue from PRO memberships to accelerate growth in index freshness, quality and size.

Please Give Us Feedback!

We’d love to hear from you. If you have suggestions, bug reports (this is a first launch, after all) or ideas for future iterations, please leave them in the comments or send them via the Open Site Explorer feedback form. We’re of course very excited for the launch of OSE and would certainly appreciate you sharing and helping us spread it around. The free period ends at 8am Pacific on Friday, January 22nd, but PRO members will continue to be able to access all the features and unlimited reports (and free reports will still provide up to 1,000 links).

p.s. Two great posts with more information on this topic appeared in the last 24 hours and are worth sharing:

If you have more to share, feel free to link in the comments.

