Indexation for SEO: Real Numbers in 5 Easy Steps

Posted by randfish

How many pages has Google indexed?

This question and the problems surrounding it run rampant through the SEO world. It usually arises when someone starts doing searches like this:

[Screenshot: Indexation of SEOmoz according to Google]

Google claims to have 93,800 pages indexed on the root domain, seomoz.org. That sounds pretty good, but when I ran that search query last week, the number was closer to 75,000, and when I run the same query from Google.co.uk 60 seconds later, the number changes even more dramatically:

[Screenshot: Indexation of SEOmoz.org on Google.co.uk]

How about if I hit refresh on my Google.com results again:

[Screenshot: Indexation on Google.com 3 minutes later]

Doh! Google just dropped 8,500 of my pages out of their index. That sucks – but not nearly as much as it does for the managers, marketing directors and CEOs who use these numbers as actual KPIs! Can you imagine? A number that means nothing, fluctuates 300% between data centers, can change at a moment’s notice and provides no actionable insight, being used as a business metric?

And yet… It happens.

Fortunately, there’s an easy way to get much, much better data than what the search engines provide through "site:" queries and this post is here to walk you through that process step-by-step.

Step 1: Go to Traffic Sources in Your Analytics

[Screenshot: Google Analytics, step 1]

Click the "traffic sources" link in Google Analytics or Omniture (other analytics packages may call it "referring sources").

Step 2: Head to the Search Engines Section

[Screenshot: Step 2 of the indexation process]

We want to find out how many pages the search engines have indexed, so the obvious next step is to go to the "search engines" sub-section.

Step 3: Choose an Engine

[Screenshot: Step 3, choosing an engine]

Click the engine you want indexation data for. If you receive both paid and organic traffic from this engine, you’ll want to display organic traffic only at this step, too.

Step 4: Filter by Landing Pages

[Screenshot: Step 4, filtering by landing page]

The "Landing Page" filter in the dropdown will show you the traffic each individual page on your site received from the engine you’ve selected. This also produces the magical "total" number of pages that have received traffic, described in the final step.

Step 5: Record the Number at the Bottom

[Screenshot: Step 5, the indexation count]

That count tells you the unique number of pages that received at least one visit from searches performed on Google. It’s the Holy Grail of indexation – a number you can accurately track over time to see how the search engine is indexing your site. On its own, it isn’t particularly useful, but over time (I usually recommend recording monthly, but for some sites, every 2-3 months can make more sense), it gives you insight into whether your pages are doing better or worse at drawing in traffic from the engine.
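If you'd rather compute the Step 5 number from an exported report than read it off the screen (handy when sampling is a concern, as the caveats below discuss), a few lines of code will do it. This is a minimal Python sketch, not a feature of any analytics package: it assumes a CSV export with one row per landing page and columns named "Landing Page" and "Visits". Those column names are assumptions, so rename them to match your own export.

```python
import csv
import io


def count_pages_with_visits(csv_text, page_col="Landing Page",
                            visits_col="Visits", min_visits=1):
    """Count unique landing pages that received at least `min_visits` visits.

    Assumes a hypothetical CSV export with one row per landing page; the
    column names are placeholders to adjust for your analytics package.
    """
    pages = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Exports often format large numbers with thousands separators.
        visits = int(row[visits_col].replace(",", ""))
        if visits >= min_visits:
            pages.add(row[page_col].strip())
    return len(pages)


sample = """Landing Page,Visits
/blog/post-a,"1,204"
/tools/,37
/blog/old-post,0
"""
print(count_pages_with_visits(sample))  # 2
```

Recording this figure each month gives you the same trend line described above, with the raw page list on hand if the number moves.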

Now, technically I’m being a bit cheeky here. This number doesn’t tell you the full story – it’s not showing the actual number of pages a search engine has crawled or indexed on your site, but it does tell you the unique number of URLs that received at least 1 visit from the engine. In my opinion this data is far more accurate and more actionable. The first adjective – accurate – is hard to argue (particularly given the visual evidence atop this post), but the second requires a bit of an explanation.

Why is Number of Pages Receiving ≥1 Visit Actionable?

Indexation numbers alone are useless. Businesses and websites use them as KPIs because they want to know if, over time, more of their pages are making their way into the engines’ indices. I’d argue that actually, you don’t care if your pages are in the indices – you care if your pages have the opportunity to EARN TRAFFIC!

Being a row in a search index means nothing if your page is:

  • too low in PageRank/link juice to appear in any results
  • displaying content the engines can’t properly parse
  • devoid of keywords or content that could send traffic
  • broken, misdirected or unavailable
  • a duplicate of other pages that the engine will rank instead

Thus, the metric you want to count over time isn’t (in most cases) the number of pages indexed; it’s the number of pages that earned traffic. That’s the number you want to rise over time, the number you want marketers to concentrate on and the KPI that’s meaningful. It tells you whether the engine is crawling, indexing AND listing your pages in the results where someone has actually clicked on them.

If the number drops, you can investigate the actual pages that are no longer receiving traffic by exporting the data to Excel and doing a side-by-side with the previous month. If the number rises, you can see the new pages getting traffic. Those individual URLs will tell a story – of pages that broke, that stopped being linked-to, that fell too far down in paginated results or lost their unique content. It’s so much better than playing the mystery game that SEOs so often confront in the face of "lower indexation numbers" from the site: command.
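The month-over-month side-by-side can also be done in code instead of Excel. A minimal Python sketch, assuming you have already loaded each month's export into a dictionary that maps landing-page URL to visits (the function name and data shape are illustrative, not part of any analytics API):

```python
from __future__ import annotations


def compare_months(previous: dict[str, int], current: dict[str, int]):
    """Return (lost, gained): pages that stopped or started receiving visits."""
    prev_pages = {page for page, visits in previous.items() if visits >= 1}
    curr_pages = {page for page, visits in current.items() if visits >= 1}
    lost = sorted(prev_pages - curr_pages)    # earned traffic last month, not this month
    gained = sorted(curr_pages - prev_pages)  # new pages earning traffic this month
    return lost, gained


lost, gained = compare_months(
    {"/a": 12, "/b": 3, "/c": 1},
    {"/a": 9, "/c": 2, "/d": 5},
)
print("lost:", lost)      # lost: ['/b']
print("gained:", gained)  # gained: ['/d']
```

The `lost` list is the starting point for the investigation: check each URL for breakage, lost internal links or changed content.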

Some Necessary Caveats

This methodology certainly isn’t perfect, and there are some important points to be aware of (thanks especially to some folks in the comments who brought these up):

  • Google Analytics (and many other analytics packages) sometimes use sampled data to make guesstimates. If you want to be sure you’re getting the best possible number, export to CSV and do the side-by-side in Excel. You can even remove the pages that appear in both time periods to see only those pages that uniquely did or didn’t receive traffic. In many of these cases, you might also care only about pages that gained or lost 5/10/20+ visits.
  • Greater accuracy can be found by shrinking the time period in the analytics, but that also reduces the likelihood that a page receiving very long-tail query traffic once in a blue moon will be listed, so adjust accordingly and plan for imperfect data. This method isn’t foolproof, but it is (in my opinion) better than the random roulette wheel of site: queries.
  • This technique isn’t going to help you catch other kinds of SEO issues, like duplicate content (it can in some cases, but it’s not as good as something like Google Webmaster Tools reporting) or 301s, 302s, etc., which can require a crawling solution.
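To focus on pages whose traffic moved by a meaningful amount (the 5/10/20+ visit thresholds in the first caveat), compare visit counts rather than just presence. Again, this is a sketch over hypothetical URL-to-visits dictionaries; a page missing from one month's export is treated as having zero visits that month.

```python
def significant_changes(previous, current, threshold=5):
    """Map each page whose visits changed by at least `threshold` to its change.

    Pages absent from a month's export are treated as having 0 visits there.
    """
    changes = {}
    for page in set(previous) | set(current):
        delta = current.get(page, 0) - previous.get(page, 0)
        if abs(delta) >= threshold:
            changes[page] = delta
    # Biggest losses first; ties broken alphabetically for a stable order.
    return dict(sorted(changes.items(), key=lambda item: (item[1], item[0])))


previous = {"/a": 10, "/b": 3, "/c": 8}
current = {"/a": 2, "/b": 4, "/d": 6}
print(significant_changes(previous, current))
# {'/a': -8, '/c': -8, '/d': 6}
```

Raising `threshold` to 10 or 20 trims the list down to the pages most worth a closer look.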

I’d, of course, love your feedback. I know many SEOs are addicted to and supportive of the site: command numbers as a way to measure progress, so maybe there are things I’m not considering or situations where it makes sense. I also know that many of you like the number reported in Google Webmaster Tools under the Sitemaps crawl data (I’m skeptical of this too, for the record), and I’d like to hear how you find value in that data as well.

p.s. Tomorrow we’ll be announcing two webinars (open to all) about using Open Site Explorer to get ACTIONABLE data. Be sure to leave either Wednesday the 27th at 2pm Pacific or Thursday the 28th at 10am Pacific free :-)

11 Conversion Rate Optimization Lessons Learned in 2009 (and annual moz traffic stats)

Posted by Sam Niccolls

"Don’t do viral marketing until your product doesn’t suck. If you do, more people will find out your product sucks." This pearl of wisdom from serial entrepreneur Dave McClure applies well not only to product development, but also to conversion rate optimization. The extension would be "don’t focus on getting more visitors until your site converts the visitors it already gets."

This is a sentiment we’ve taken to heart here at SEOmoz. So in this post we will share how we grew traffic and conversions in 2009, as well as some of the valuable lessons we’ve learned in the process, which we’re excited to execute on in 2010.

Traffic Statistics from 2009

In the past, SEOmoz has shared data about the traffic we receive (see past years – 2006, 2007). In 2008, we somehow skipped out, but this year, we’re bringing sexy back. Yes, it’s probably helpful to our competitors, but it’s also hugely valuable to our members (we hope) and part of our core value of transparency. So in the same vein as Rand’s blog posts about the venture funding process, we’re opening the kimono and sharing some analysis in hopes that others can benefit from our traffic and conversion rate learnings.

We’ll start with an overview of visitor and broad traffic data: 

[Chart: SEOmoz visits by month in 2009]

The early part of the year featured big growth, as the overall popularity of the site spiked, new traffic sources (like Twitter) started bringing in visitors and we had some big successes with email marketing. The latter part of the year saw relatively steady numbers, with a small, predictable fall in December for the holidays.

[Chart: SEOmoz return visits by month, 2009]

Return visits show a fairly similar trend, with a slight drop in Q4 (though, as you’ll see below, that still represents massive growth over 2008).

We’ve come a long way in 2009 – growing traffic to the site as a whole and to the blog. Revenue was also up over 250%, so it’s not just additional visits or visitors – conversions have also been improving.

[Chart: SEOmoz free signups, 2009]

All this raw data is interesting, but it’s even more valuable to dig in deep and identify the opportunities for improvement.

11 Conversion Rate Optimization Lessons We Learned in 2009

If marketers are captains of leaking ships, finding ways to remove more water faster might work, but plugging the holes and improving conversion rates is much more efficient. At SEOmoz we’re proud of the ship we’re sailing, but there’s also a laundry list of ways we can improve. So based on some of the things we learned in 2009, here are some of the holes we will look to plug to keep the Moz ship rising in 2010.

 
[Slides: 11 conversion rate optimization lessons, including "Long Tail Opportunities" and "Missing Calls to Action"]

A note on #10: there are several ways to implement form field tracking, including onclick events or Google Analytics event tracking. Additionally, ClickTale, though not part of Google Analytics, is a really useful tool for tracking abandonment. For more information on the subject, Distilled’s Duncan Morris has a detailed follow-up post on using jQuery and GA to track form abandonment.
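However you collect the field-level events (onclick handlers, Google Analytics event tracking or a tool like ClickTale), the analysis usually comes down to finding the last field people touched before giving up. A minimal Python sketch, assuming you have exported one record per form session as a list of field names in the order they were touched, plus a flag for whether the form was submitted. That data shape is hypothetical; none of the tools above necessarily export it in this form.

```python
from collections import Counter


def last_field_before_abandonment(sessions):
    """Count, for abandoned sessions, the last form field the visitor touched.

    `sessions` is an iterable of (fields_in_order, submitted) pairs.
    """
    counts = Counter()
    for fields, submitted in sessions:
        if not submitted and fields:  # skip completed forms and untouched forms
            counts[fields[-1]] += 1
    return counts


sessions = [
    (["email", "password"], True),
    (["email"], False),
    (["email", "password", "company"], False),
    (["email"], False),
    ([], False),
]
print(last_field_before_abandonment(sessions).most_common())
# [('email', 2), ('company', 1)]
```

A field that shows up often at the top of this list is a candidate for simplifying, moving or removing.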

The takeaway from these slides shouldn’t be "do exactly what we’re doing on your pages," but rather "find a process at your company to identify where your traffic is going and where you are losing customers, and make small conversion rate improvements." Depending on how you monetize your site, incremental conversion rate improvements could be the most efficient way to hit your revenue goals this year.

For us at SEOmoz, 2009 was an outstanding traffic year. We’re certainly proud of the fact that over 100,000 more visitors will visit the site this January than last January, but we are also well aware of the fact that more traffic does not equate to more revenue. So for this reason, we will continue to place our efforts on better converting the visitors we already have and better retaining our existing PRO members.

In closing, please note that this post is not meant to bash SEO, PPC, social media marketing, or any other traffic building tactics. Getting traffic to websites is what we do. It’s at the core of what we do at SEOmoz! Our goal is simply to be transparent about how we’re working to improve our business. Admittedly, however, when Rand says conversion rate optimization will be a major trend in 2010, it’s possible he’s projecting just a little. :-)

Charting ‘Unique Keyphrases’ Using Advanced Segments

Posted by RobOusbey

A useful indicator of SEO success is the number of unique keyphrases that send traffic to a website. An increase in this number reflects increased trust in the site by search engines.

Google Analytics can show you the total number of unique organic keyphrases at a glance, on the Traffic Sources ⇒ Keywords page. (Make sure you select ‘non-paid’ to exclude any CPC campaigns.)

This post will show you how to break that down to a more useful level of granularity and help you to create a table such as the following:

We’ll aim to categorise traffic into three buckets: ‘branded’, ‘head terms’ and ‘mid-long tail terms’. (In reality, we’ll actually calculate the first two, and the third one will be ‘everything that is left’.)

As we often can’t export enough keywords from Google Analytics to do the analysis offline, we will have to use ‘Advanced Segments’ to do this. This means that we can only group together ‘branded terms’ and ‘head terms’ in ways that we can explain through AND and OR statements.

The process for doing this goes like this:

  1. Plan to create advanced segments that define each group of keywords you want to track
  2. Define rules using ‘AND’ & ‘OR’ statements that describe which keywords should be in each group
  3. Apply these groups each month, one at a time, to the previous month’s data, in order to reveal the number of unique keywords.

Since this ‘rule defining’ will take place in Google Analytics’ Advanced Segments feature, we’ll be using ‘regular expressions’ – a clever but pretty technical method of defining which items in a set should be included in a particular subset. (More details about them at this site.)

The next sections may have particular appeal to the more ‘techie’ readers (or just those people feeling brave) – so do feel free to just skip down to the end to see screen-shots of these segments applied to the keywords report, if the nitty-gritty isn’t your cup of tea.

Creating the ‘Branded Terms’ Segment

If you’ve not really implemented Advanced Segments before, I suggest starting with Google Analytics’ help pages on the topic, but also having a play with the feature, to see how it works. (Really, do have a play. I’m going to assume you at least have understood what most of the main buttons do, and that’s a great way to find out.)

Planning the Segment

Let’s use a fictional company, TechNet, who make a product called the Vox9000. Their segment for ‘branded terms’ will include anything that mentions these terms.

Define the Rules, Create the Segment

To create the segment for branded terms, begin by clicking ‘Advanced Segments’ ⇒ ‘Create new custom segment’.

In the first ‘dimension or metric’ space, add a ‘Medium’ block (found under ‘Dimensions’) and set Condition to ‘Matches exactly’ and Value to ‘organic’. Then click ‘and’ to add another section. Place a ‘Keyword’ block here, with Condition set to ‘Matches regular expression’ and a value listing all your branded terms, separated by the pipe character: |

(NB: the pipe acts as an ‘OR’ in these regular expressions.)

For example, TechNet (often searched for with a space, as ‘Tech Net’) makes a product called ‘Vox9000’ (sometimes searched for as ‘Vox 9000’), so it would use the following string here: technet|tech net|vox9000|vox 9000

Give the segment a name, and save it.
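Before pasting the branded expression into Google Analytics, it can help to sanity-check it against a few real keyphrases. This Python sketch assumes that GA's ‘Matches regular expression’ condition matches anywhere within the keyword (hence `re.search` rather than `re.fullmatch`) and ignores case; both are assumptions worth confirming against your own reports.

```python
import re

# The branded-terms expression from the TechNet example; | acts as OR.
# Case-insensitive, match-anywhere behaviour is an assumption about GA.
BRANDED = re.compile(r"technet|tech net|vox9000|vox 9000", re.IGNORECASE)


def is_branded(keyword):
    """True if the keyphrase contains any branded term anywhere within it."""
    return BRANDED.search(keyword) is not None


for kw in ["tech net support", "vox 9000 review", "laptops in baltimore"]:
    print(kw, "->", is_branded(kw))
# tech net support -> True
# vox 9000 review -> True
# laptops in baltimore -> False
```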

Creating the ‘Head Terms’ Segment

Planning the Segment

The next segment – the head terms – is a bit more complicated, and you’ll see why it’s important for us to specify rules that will define the head keyphrases.

Let’s imagine that TechNet sells laptops and notebooks in Philadelphia and Baltimore. (Therefore head terms will be those such as ‘notebooks’ or ‘laptops in philadelphia’)

In this example, the rules to define head terms might be:

  • the phrase can’t mention any branded terms
  • it must mention one of their product groups (laptop, notebook)
  • it can have at most two words of 3+ characters (this allows for some short linking words, such as a, in, at, etcetera)
  • it can only have a maximum of four words in total.

Define the Rules, Create the Segment

The last two rules can be the trickiest to implement, so we’ll look at these first. Two insights help us solve these requirements:

Insight 1: Combining the two rules, and using S and L to indicate short words (1 or 2 characters) and long words (3+ characters), we see that there are only twenty possible structures for keyphrases: L, LS, SL, LL, LSS, SLS, SSL, LLS, LSL, SLL, LSSS, SLSS, SSLS, SSSL, LLSS, LSLS, LSSL, SLLS, SLSL, SSLL
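Insight 1 can be checked mechanically. This Python sketch enumerates every sequence of short (S) and long (L) words up to four words long and keeps those with one or two long words. It requires at least one long word because every head term must contain ‘laptop’ or ‘notebook’, both of which are long words. The function name is illustrative.

```python
from itertools import product


def keyphrase_structures(max_words=4, max_long=2):
    """List S/L word patterns with 1..max_words words and 1..max_long long words."""
    structures = []
    for length in range(1, max_words + 1):
        for combo in product("SL", repeat=length):
            if 1 <= combo.count("L") <= max_long:
                structures.append("".join(combo))
    return structures


structures = keyphrase_structures()
print(len(structures))  # 20
```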

Insight 2: The regular expression \b[^ ]{3,50}\b matches a word of between 3 and 50 characters. It’s also necessary to know that ^ matches at the beginning of the text being tested, and $ matches at the end. (Seriously, they do. Start by going through the examples at this site if you want to know why that’s the case.)

We’re now in a position to take the list of combinations from ‘Insight 1’ and replace ‘S’ with \b[^ ]{1,2}\b (matching words with 1 or 2 characters) and ‘L’ with \b[^ ]{3,50}\b, putting spaces in between, wrapping in parentheses, and anchoring at the beginning and end. Missed that? OK, here are examples of some of the resulting statements:

L becomes ^(\b[^ ]{3,50}\b)$
SL becomes ^(\b[^ ]{1,2}\b \b[^ ]{3,50}\b)$
LSL becomes ^(\b[^ ]{3,50}\b \b[^ ]{1,2}\b \b[^ ]{3,50}\b)$
etc.

You should join the twenty created expressions together using a pipe character, to create the resulting, massive, expression. To save space, I won’t post the whole expression here.

NB: There seems to be a limit to the number of parts an expression can have in Google Analytics, so I tend to break this up into two parts (say, those matching three or fewer words, and those matching four) and put them as ‘OR’ alternatives in one section. I’ve done that below to demonstrate.
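Typing out twenty expressions by hand is error-prone, so here is a Python sketch that builds them from the S/L patterns and splits them into the two parts described above (three or fewer words, and exactly four words). The helper names are hypothetical. Python's `re` module should treat these simple patterns (`\b`, `[^ ]`, `^`, `$`) the same way Google Analytics does, but that is an assumption worth spot-checking before you rely on it.

```python
import re
from itertools import product

SHORT = r"\b[^ ]{1,2}\b"  # a word of 1-2 characters
LONG = r"\b[^ ]{3,50}\b"  # a word of 3-50 characters


def structure_to_regex(structure):
    """Turn a pattern such as 'LSL' into an anchored expression."""
    words = [LONG if ch == "L" else SHORT for ch in structure]
    return "^(" + " ".join(words) + ")$"


def build_head_expressions(max_long=2):
    """Return (up_to_three_words, four_words) as two |-joined expressions."""
    up_to_three, four = [], []
    for length in range(1, 5):
        for combo in product("SL", repeat=length):
            if 1 <= combo.count("L") <= max_long:
                expr = structure_to_regex("".join(combo))
                (four if length == 4 else up_to_three).append(expr)
    return "|".join(up_to_three), "|".join(four)


up_to_three, four = build_head_expressions()
head_shape = re.compile(up_to_three + "|" + four)
print(structure_to_regex("LSL"))
# ^(\b[^ ]{3,50}\b \b[^ ]{1,2}\b \b[^ ]{3,50}\b)$
print(bool(head_shape.search("laptops in philadelphia")))         # True
print(bool(head_shape.search("cheap laptop deals philadelphia")))  # False
```

Each half goes into its own ‘Matches regular expression’ condition, joined with ‘OR’.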

The resultant segment rules for ‘Head Keyphrases’ are:

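    • Dimension: Medium, Condition: Matches exactly, Value: organic
  • AND
    • Dimension: Keyword, Condition: Does not match regular expression, Value: technet|tech net|vox9000|vox 9000
  • AND
    • Dimension: Keyword, Condition: Matches regular expression, Value: (the expression for phrases of three or fewer words)
    • OR Dimension: Keyword, Condition: Matches regular expression, Value: (the expression for four-word phrases)
  • AND
    • Dimension: Keyword, Condition: Matches regular expression, Value: laptop|notebook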
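Putting the three rules together, here is a Python sketch that sorts a list of organic keyphrases into the three buckets offline, which can be handy for checking that the segments behave as intended on a sample of keywords. Instead of the long expression it uses a plain word-count check. That is equivalent for ordinary space-separated phrases, although the \b anchors in the regex also reject words that start or end with punctuation, which this version does not. The names are illustrative, and it assumes case-insensitive, match-anywhere matching for the branded and product terms.

```python
import re
from collections import Counter

# TechNet example terms; case-insensitive, match-anywhere is an assumption.
BRANDED = re.compile(r"technet|tech net|vox9000|vox 9000", re.IGNORECASE)
PRODUCTS = re.compile(r"laptop|notebook", re.IGNORECASE)


def has_head_structure(keyword):
    """1-4 single-space-separated words, of which 1 or 2 have 3-50 characters."""
    words = keyword.split(" ")
    if not 1 <= len(words) <= 4:
        return False
    if any(len(word) == 0 or len(word) > 50 for word in words):
        return False  # empty words mean doubled, leading or trailing spaces
    long_words = sum(1 for word in words if len(word) >= 3)
    return 1 <= long_words <= 2


def bucket(keyword):
    """Classify one organic keyphrase as 'branded', 'head' or 'other'."""
    if BRANDED.search(keyword):
        return "branded"
    if PRODUCTS.search(keyword) and has_head_structure(keyword):
        return "head"
    return "other"


def count_buckets(keywords):
    """Count unique keyphrases per bucket."""
    return Counter(bucket(keyword) for keyword in set(keywords))


keywords = [
    "technet laptops", "notebooks", "laptops in philadelphia",
    "cheap laptop deals in baltimore", "how to clean a laptop keyboard",
    "vox 9000 review", "notebooks",
]
print(count_buckets(keywords))
```

Because branded phrases are checked first, the head bucket never overlaps the branded one, mirroring the ‘Does not match’ condition in the segment.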

Collecting the numbers

With our two Advanced Segments defined, we can head back to the ‘keywords’ page and set the date range to the last month.

We can apply each custom segment in turn, in order to collect the following numbers for September:

  • Total keyphrases: 64,278
  • Branded keyphrases: 393
  • Head keyphrases: 2,835
  • Other keyphrases: 61,050 (calculated from the previous three numbers)
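The ‘other’ figure is simple subtraction, and it only works because the head-terms segment explicitly excludes branded terms, so no keyphrase is counted twice. A tiny Python helper (the name is hypothetical) that also flags the obvious failure case where the two segments add up to more than the total:

```python
def other_keyphrases(total, branded, head):
    """Mid/long-tail count: every unique keyphrase not in the two defined segments."""
    other = total - branded - head
    if other < 0:
        # Only possible if the segments overlap or were applied to different data.
        raise ValueError("branded + head exceeds the total; check the segment rules")
    return other


print(other_keyphrases(64_278, 393, 2_835))  # 61050
```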

You can now put these numbers in a spreadsheet in order to chart the change in number of unique keyphrases as months go by.

You can use these basic techniques to create and report on even more finely defined segments of keyphrases (for example, you could group keyphrases by competitiveness, department, intent, etc.). If there are particular steps here that require more explanation, or you’re looking for more ideas about how to apply this to your SEO reporting structure, drop a comment below.

Do you like this post? Yes No

Posted by RobOusbey

A useful indicator of SEO success is the number of unique keyphrases that send traffic to a website. An increase in this number is a reflection of increased trust in the site by search-engines.

Google Analytics can show you the total number of unique organic keyphrases at a glance, on the Traffic Sources ⇒ Keywords page. (Make sure you select ‘non-paid’ to exclude any CPC campaigns.)

This post will show you how to break that down to a more useful level of granularity and help you to create a table such as the following:

We’ll aim to categorise traffic into three buckets: ‘branded’, ‘head terms’ and ‘mid-long tail terms’. (In reality, we’ll actually calculate the first two, and the third one will be ‘everything that is left’.)

As we often can’t export enough keywords from Google Analytics to do the analysis offline, we will have to use ‘Advanced Segments’ to do this. This means that we can only group together ‘branded terms’ and ‘head terms’ in ways that we can explain through AND and OR statements.

The process for doing this goes like this:

  1. Plan to create advanced segments that define each group of keywords you want to track
  2. Define rules using ‘AND’ & ‘OR’ statements that describe which keywords should be in each group
  3. Apply these groups each month, one at a time, to the previous month’s data, in order to reveal the number of unique keywords.

Since this ‘rule defining’ will take place in Google Analytics’ Advanced Segments feature, we’ll be using ‘regular expressions’ – a clever but pretty technical method of defining which items in a set should be included in a particular subset. (More details about them at this site.)

The next sections may have particular appeal to the more ‘techie’ readers (or just those people feeling brave) – so do feel free to just skip down to the end to see screen-shots of these segments applied to the keywords report, if the nitty-gritty isn’t your cup of tea.

Creating the ‘Branded Terms’ Segment

If you’ve not really implemented Advanced Segments before, I suggest starting with Google Analytics’ help pages on the topic, but also having a play with the feature, to see how it works. (Really, do have a play. I’m going to assume you at least have understood what most of the main buttons do, and that’s a great way to find out.)

Planning the Segment

Let’s use a fictional company, TechNet, who make a product called the Vox9000. Their segment for ‘branded terms’ will include any keyphrase that mentions either of these names.

Define the Rules, Create the Segment

To create the segment for branded terms, begin by clicking ‘Advanced Segments’ ⇒ ‘Create new custom segment’.

In the first ‘dimension or metric’ space, add a ‘Medium’ block (found under ‘Dimensions’) and set Condition to ‘Matches exactly’ and Value to ‘organic’. Then hit ‘and’ to add another section. Place a ‘Keywords’ block here, with Condition as ‘Matches regular expression’ and a value that is all your branded terms, separated by the pipe character: |

(NB: the pipe acts as an ‘OR’ in these regular expressions.)

For example, TechNet (which people often search for with a space, as ‘Tech Net’) makes a product called ‘Vox9000’ (sometimes searched for as ‘Vox 9000’), so it would use the following string here: technet|tech net|vox9000|vox 9000

Give the segment a name, and save it.
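If you'd like to sanity-check a branded-terms expression before saving the segment, a few lines of Python will do. This is a minimal sketch rather than how Google Analytics works internally: it assumes the ‘Matches regular expression’ condition looks for the pattern anywhere in the keyword (as Python's re.search does) and ignores case, and is_branded is a hypothetical helper name.

```python
import re

# Branded terms from the segment above; the pipe means "OR".
# Assumption: GA's condition behaves like an unanchored, case-insensitive search.
BRANDED = re.compile(r"technet|tech net|vox9000|vox 9000", re.IGNORECASE)


def is_branded(keyphrase: str) -> bool:
    """Hypothetical helper: True if any branded term appears in the keyphrase."""
    return BRANDED.search(keyphrase) is not None


for phrase in ["tech net support", "vox 9000 review", "laptops in baltimore"]:
    print(phrase, "->", is_branded(phrase))
# tech net support -> True
# vox 9000 review -> True
# laptops in baltimore -> False
```

Because the match is unanchored, ‘technet’ will also catch any longer word that happens to contain it, so it's worth checking the list against your own brand names.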

Creating the ‘Head Terms’ Segment

Planning the Segment

The next segment – the head terms – is a bit more complicated, and you’ll see why it’s important for us to specify rules that will define the head keyphrases.

Let’s imagine that TechNet sells laptops and notebooks in Philadelphia and Baltimore. (Head terms will therefore be phrases such as ‘notebooks’ or ‘laptops in philadelphia’.)

In this example, the rules to define head terms might be:

  • the phrase can’t mention any branded terms
  • it must mention one of their product groups (laptop, notebook)
  • it can have at most two words of 3+ characters (short linking words such as a, in and at don’t count towards this limit)
  • it can have at most four words in total.
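Before turning these rules into regular expressions, it can help to write them out in plain code and run them over a sample of keyphrases. A sketch, assuming words are separated by spaces and their length is counted in characters; is_head_term is a hypothetical name, and the brand and product lists are TechNet's fictional ones. Google Analytics can't run this, so it's only useful for checking your rules offline.

```python
import re

# TechNet's fictional brand and product terms from the example above.
BRANDED = re.compile(r"technet|tech net|vox9000|vox 9000", re.IGNORECASE)
PRODUCTS = re.compile(r"laptop|notebook", re.IGNORECASE)


def is_head_term(keyphrase: str) -> bool:
    """Hypothetical helper applying the four head-term rules directly."""
    words = keyphrase.split()
    long_words = [w for w in words if len(w) >= 3]
    return (
        BRANDED.search(keyphrase) is None            # no branded terms
        and PRODUCTS.search(keyphrase) is not None   # mentions a product group
        and len(long_words) <= 2                     # at most two words of 3+ characters
        and len(words) <= 4                          # at most four words in total
    )


for phrase in ["notebooks", "laptops in philadelphia",
               "cheap laptops in philadelphia", "technet laptop"]:
    print(phrase, "->", is_head_term(phrase))
# notebooks -> True
# laptops in philadelphia -> True
# cheap laptops in philadelphia -> False
# technet laptop -> False
```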

Define the Rules, Create the Segment

The last two rules can be the trickiest to implement, so we’ll look at these first. Two insights help us solve these requirements:

Insight 1: Combining the two rules, and using S and L to indicate short words (1 or 2 characters) and long words (3+ characters), we see that there are only twenty possible structures for a head keyphrase (each must contain at least one long word, since ‘laptop’ and ‘notebook’ are both long): L, LS, SL, LL, LSS, SLS, SSL, LLS, LSL, SLL, LSSS, SLSS, SSLS, SSSL, LLSS, LSLS, LSSL, SLLS, SLSL, SSLL
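You can double-check that list by brute force. This sketch (head_term_structures is a hypothetical helper name) generates every sequence of S and L up to four words long and keeps only those with one or two long words:

```python
from itertools import product


def head_term_structures(max_words: int = 4, max_long: int = 2) -> list:
    """Hypothetical helper: every S/L word pattern with 1..max_long long words."""
    structures = []
    for length in range(1, max_words + 1):
        for combo in product("LS", repeat=length):
            # At least one long word (the product name), at most max_long.
            if 1 <= combo.count("L") <= max_long:
                structures.append("".join(combo))
    return structures


structures = head_term_structures()
print(len(structures))  # 20
print(structures)
```

The count breaks down as 1 one-word, 3 two-word, 6 three-word and 10 four-word structures.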

Insight 2: The regular expression \b[^ ]{3,50}\b matches a word of between 3 and 50 characters (\b marks a word boundary, and [^ ] is any character other than a space). It’s also necessary to know that ^ anchors a match to the beginning of the keyphrase, and $ anchors it to the end. (Seriously, they do. Start by going through the examples at this site if you want to know why that’s the case.)

We’re now in a position to take the list of combinations from ‘Insight 1’ and replace ‘S’ with \b[^ ]{1,2}\b (matching words with 1 or 2 characters) and ‘L’ with \b[^ ]{3,50}\b, putting spaces in-between, wrapping in parentheses, and matching at beginning and end. Missed that? OK, here are examples of some of the resulting statements:

L becomes ^(\b[^ ]{3,50}\b)$
SL becomes ^(\b[^ ]{1,2}\b \b[^ ]{3,50}\b)$
LSL becomes ^(\b[^ ]{3,50}\b \b[^ ]{1,2}\b \b[^ ]{3,50}\b)$
etc.
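Here's the substitution written out as a small Python function. It's a sketch that assumes Python's re module treats \b, [^ ] and ^...$ the same way Google Analytics does for simple patterns like these; structure_to_regex is a hypothetical name.

```python
import re

SHORT = r"\b[^ ]{1,2}\b"   # S: a word of 1 or 2 characters
LONG = r"\b[^ ]{3,50}\b"   # L: a word of 3 to 50 characters


def structure_to_regex(structure: str) -> str:
    """Hypothetical helper: turn e.g. 'LSL' into an anchored regular expression."""
    parts = [LONG if symbol == "L" else SHORT for symbol in structure]
    # Single spaces between words, wrapped in parentheses, anchored at both ends.
    return "^(" + " ".join(parts) + ")$"


lsl = structure_to_regex("LSL")
print(lsl)  # ^(\b[^ ]{3,50}\b \b[^ ]{1,2}\b \b[^ ]{3,50}\b)$
print(bool(re.search(lsl, "laptops in philadelphia")))     # True
print(bool(re.search(lsl, "cheap laptops philadelphia")))  # False
```

Without the ^ and $, a pattern like L would also match just the first word of a longer phrase, which is why every statement is anchored.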

You should join the twenty resulting expressions together using the pipe character, to create one (massive) expression. To save space, I won’t post the whole expression here.

NB: There seems to be a limit to the number of parts in an expression that you can put into Google Analytics, so I tend to break this up into two parts (say, those matching phrases of three or fewer words, and those matching four) and put them as ‘OR’ alternatives in one section. I’ve done that below to demonstrate.
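A sketch of the join and the two-part split, under the same assumptions as before (Python's re standing in for Google Analytics' regex engine; all helper names are hypothetical). Because each alternative carries its own ^ and $, joining them with pipes keeps every structure anchored:

```python
import re
from itertools import product

SHORT = r"\b[^ ]{1,2}\b"   # S: a word of 1 or 2 characters
LONG = r"\b[^ ]{3,50}\b"   # L: a word of 3 to 50 characters


def structures(n_words: int) -> list:
    """Hypothetical helper: patterns of exactly n_words with one or two long words."""
    return ["".join(c) for c in product("LS", repeat=n_words)
            if 1 <= c.count("L") <= 2]


def to_regex(structure: str) -> str:
    """Hypothetical helper: anchored regex for one S/L structure."""
    return "^(" + " ".join(LONG if s == "L" else SHORT for s in structure) + ")$"


# Part one: phrases of one to three words (10 structures).
UP_TO_THREE = "|".join(to_regex(s) for n in (1, 2, 3) for s in structures(n))
# Part two: four-word phrases (10 structures).
FOUR_WORDS = "|".join(to_regex(s) for s in structures(4))


def matches_structure(keyphrase: str) -> bool:
    """Hypothetical helper: the two parts used as OR alternatives in one section."""
    return bool(re.search(UP_TO_THREE, keyphrase) or re.search(FOUR_WORDS, keyphrase))
```

Splitting by word count is just one convenient cut; any split that keeps all twenty alternatives across the parts gives the same result.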

The resultant segment rules for ‘Head Keyphrases’ look like this:

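    • Dimension: Medium, Condition: Matches exactly, Value: organic
  • AND
    • Dimension: Keyword, Condition: Does not match regular expression, Value: technet|tech net|vox9000|vox 9000
  • AND
    • Dimension: Keyword, Condition: Matches regular expression, Value: (the joined expression for phrases of three or fewer words)
    • OR Dimension: Keyword, Condition: Matches regular expression, Value: (the joined expression for four-word phrases)
  • AND
    • Dimension: Keyword, Condition: Matches regular expression, Value: laptop|notebook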
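To check which bucket a keyphrase would land in once both segments are applied, the conditions can be combined in a sketch like the one below. It assumes you have some (keyword, medium) pairs to test against; bucket and count_unique are hypothetical helpers, and the twenty structures are compiled as a single pattern here rather than split in two as Google Analytics requires. Google Analytics applies the segments for you, so this is only for offline checking.

```python
import re
from itertools import product
from typing import Iterable, Optional, Tuple

# TechNet's fictional brand and product terms from the example.
BRANDED = re.compile(r"technet|tech net|vox9000|vox 9000", re.IGNORECASE)
PRODUCTS = re.compile(r"laptop|notebook", re.IGNORECASE)
SHORT, LONG = r"\b[^ ]{1,2}\b", r"\b[^ ]{3,50}\b"

# All twenty anchored S/L structures joined with pipes.
STRUCTURE = re.compile("|".join(
    "^(" + " ".join(LONG if s == "L" else SHORT for s in combo) + ")$"
    for n in range(1, 5)
    for combo in product("LS", repeat=n)
    if 1 <= combo.count("L") <= 2
))


def bucket(keyword: str, medium: str) -> Optional[str]:
    """Hypothetical helper: 'branded', 'head' or 'other'; None if not organic."""
    if medium != "organic":                 # Medium matches exactly 'organic'
        return None
    if BRANDED.search(keyword):             # the Branded Terms segment
        return "branded"
    if PRODUCTS.search(keyword) and STRUCTURE.search(keyword):  # Head Terms segment
        return "head"
    return "other"                          # everything that is left


def count_unique(rows: Iterable[Tuple[str, str]]) -> dict:
    """Hypothetical helper: distinct keyphrases per bucket from (keyword, medium) pairs."""
    seen = {"branded": set(), "head": set(), "other": set()}
    for keyword, medium in rows:
        name = bucket(keyword, medium)
        if name is not None:
            seen[name].add(keyword)
    return {name: len(keywords) for name, keywords in seen.items()}
```

Checking for branded terms first mirrors the ‘Does not match regular expression’ condition in the head segment: a branded phrase can never be counted as a head term.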

Collecting the Numbers

With our two Advanced Segments defined, we can head back to the ‘keywords’ page and set the date range to the last month.

We can apply each custom segment in turn, in order to collect the following numbers for September:

  • Total keyphrases: 64,278
  • Branded keyphrases: 393
  • Head keyphrases: 2,835
  • Other keyphrases: 61,050 (calculated from the previous three numbers)
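The ‘other’ figure is simple subtraction. Here is a small sketch that also expresses each bucket as a share of the total (the percentages are just arithmetic on the September figures above; keyphrase_breakdown is a hypothetical name):

```python
def keyphrase_breakdown(total: int, branded: int, head: int) -> dict:
    """Hypothetical helper: (count, % of total) for each keyphrase bucket."""
    other = total - branded - head  # 'everything that is left'
    counts = {"branded": branded, "head": head, "other": other}
    return {name: (n, round(100 * n / total, 1)) for name, n in counts.items()}


print(keyphrase_breakdown(64_278, 393, 2_835))
# {'branded': (393, 0.6), 'head': (2835, 4.4), 'other': (61050, 95.0)}
```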

You can now put these numbers in a spreadsheet in order to chart the change in number of unique keyphrases as months go by.
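If you'd rather generate the spreadsheet rows than type them in, a minimal sketch using Python's csv module might look like the following. The column layout and the change_in_total column are assumptions rather than part of the original table, and monthly_report is a hypothetical name.

```python
import csv
import io


def monthly_report(rows) -> str:
    """Hypothetical helper: CSV text from (month, total, branded, head) tuples in date order."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "total", "branded", "head", "other", "change_in_total"])
    previous_total = None
    for month, total, branded, head in rows:
        other = total - branded - head
        # Month-over-month change in total unique keyphrases (blank for the first month).
        change = "" if previous_total is None else total - previous_total
        writer.writerow([month, total, branded, head, other, change])
        previous_total = total
    return out.getvalue()


print(monthly_report([("September", 64_278, 393, 2_835)]))
# month,total,branded,head,other,change_in_total
# September,64278,393,2835,61050,
```

Each month, add one tuple to the list and paste the output into your spreadsheet to chart the trend.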

You can use these basic techniques to create and report on even more well defined segments of keyphrases (for example: you could group keyphrases by competitiveness, department, intent, etc.) If there are particular steps here that require more explanation, or you’re looking for more ideas about how to apply this to your SEO reporting structure, drop a comment below.


Yahoo! Sponsored Search Ads for BOSS

Every day thousands of developers drive Yahoo! Search BOSS traffic, serving millions of queries a day. Many of these developers have requested the ability to access Yahoo! Sponsored Search ads to monetize their BOSS innovations. Starting today, in partnership with Domain Development Corp (DDC), our first approved Yahoo! Search BOSS syndication partner, developers can get global access to Yahoo! Sponsored Search results and benefit from revenue generated by their BOSS-powered products.

We invite you to apply to qualify for this program to have Yahoo! Sponsored Search ads appear on your sites. You will need to provide details about your product, including information about your traffic sources, UI framework, and implementation.  Once your product is approved, you will have access to and support for Yahoo! Sponsored Search ads, and you will earn a revenue share.  You can apply at our partner’s site: www.ddc.com.

Please note that in signing up for this Sponsored Search ad program, you will be entering into a contract with a third party, not Yahoo!, and the third party will be providing support and sending you your earnings.  Other Yahoo! Search BOSS Ad partners may be online in the future, so stay tuned for updates.

By now most of you are aware of the Yahoo!-Microsoft search deal, which is still undergoing regulatory review.  While no decisions about BOSS have been made at this point, we will let you know as soon as we figure out the details. These key services will continue to operate for a period of time after we complete migrating our services and technology.

We are proud of the rich community of developers and entrepreneurs who share our enthusiasm for opening search and who continue to amaze us with innovative BOSS implementations. Thank you for your support.

Ashim Chhabra

The Yahoo! Search BOSS Team
