Don’t Search Google for 123456

A poster in a Google Web Search Help thread reports that he was testing whether his boss’s computer network had connectivity, so he searched Google for [123456], and the top result was a porn video hosted on Google Video.

Here is a picture:

Google Porn 123456

The video on the right is complete adult pornographic material – so don’t click on it (unless you want porn, then click on it).

The person who spotted this was very disturbed. He said:

I was assisting our CEO’s secretary today, and to test whether she had connectivity, I did a quick search on google for “123456”.

She thought I was being disgusting, but I eventually convinced her it was a freak occurrence.

Can you imagine that? This woman thought this guy was playing a trick on her or something. How sad.

Forum discussion at Google Web Search Help.


Advanced Link Analysis Charts

Posted by willcritchlow

Bored of sorting massive lists of links in all kinds of different directions to understand the link profile of a new site?

Struggle to understand how to gather actual insights about link profiles from lists of thousands of links and persuade management of the actions needed?

Don’t panic. Help is at hand.

I’m going to share some data visualisation tips today that I reckon I could use to beat up on Rand in a presentation-off (umm, again). We have recently been doing some deep dives into clients’ and prospects’ link profiles which gave me an excuse to mash up some Linkscape API data in Excel. I’ve used Linkscape data, but you could use any link analysis tool you like as long as you can get some metric to sort the linking domain by (I have used domain mozTrust in most of the examples below). Equally, I’ve used Excel, but you can use any data analysis package you like. If you want to use Excel, you will need the Analysis ToolPak add-in (for the Histogram tool).

I’ll get into how to make the charts in a minute, but first I’m going to just show you some pretty pictures:

Impress the boss

This one is of questionable use (I think there are better ways of actually visualising the data) but it’s pretty, and bosses like pretty (allegedly). This is a surface chart of number of linking domains by domain mozTrust shown across 4 data points – all links, links to the homepage and links to the next two strongest pages:

Impress your boss with surface charts

The bit of insight this does give us at a glance is that the vast majority of the site’s very low DmT links go to the homepage and that the most trusted domains linking to the site (DmT >= 8) don’t link to the homepage or the next two strongest pages.

Here is the same chart showing just links to the homepage compared to all links, which shows the top end a little more clearly:

Smaller chart to impress your boss

Gathering insights

I think this data is actually easier to see as a line chart like this (locations A and B are the top two strongest pages on the site after the homepage):

More detail about links shown with a line graph

What we just about see here is some bumps up at the top end of the DmT scale in the light blue line which is the same bit of insight I mentioned above.

Drilling down

Diving into this data to show only the top end of the DmT scale, we get:

Drilling down into link data

And we see that although the homepage and these top two location pages are the most powerful pages on the site, they are not the ones with the links from the biggest / most trusted sites. This is an area for further examination that would be hard to discover by looking at endless lists of links.

This is just an example of the kind of insight you can gather. I’m showing off tools and techniques here rather than specific insights. I’ll leave you to do your own playing to discover interesting things about your clients and competitors. I didn’t know what I was going to find when I started diving into the data for this site. You likely won’t know either, but graphs are great discovery tools. Sometimes, of course, you find nothing of interest:

Comparing top pages

Comparing just the top two pages doesn’t give us any very meaningful insights except that the big links out at 6.5-7 DmT to location A probably explain why it’s more powerful than B. It might be more insightful at a lower granularity.

Equally, I haven’t yet learnt to understand the meaning that I am sure is buried in charts like this one:

Individual links chart

This is the number of links to a whole site by the mR of the linking page. Like the mythical guys who can understand network traffic by watching LEDs blink on routers, I’d love to be able to look at this kind of chart and really understand things. The closest I’ve got so far is that I think these charts should look roughly smooth in the absence of manipulation. If we assume that the difficulty of acquiring a link is roughly correlated to its strength and that we get links at a rate inversely proportional to their difficulty, then I think this chart should look roughly like a Poisson distribution:

Poisson distribution

Which this one does, so I’m happy.
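For anyone who wants to make that comparison outside Excel, here is a minimal Python sketch. The link counts are invented, and fitting the Poisson mean to the sample mean of integer mR buckets is my assumption for illustration; it is not Linkscape data or the method behind the chart above.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing k under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical counts of links by integer mR bucket (0..7); not real data.
link_counts = [40, 120, 180, 160, 95, 45, 18, 6]

total = sum(link_counts)
# Assumption: fit the Poisson mean to the sample mean of the bucketed mR values.
lam = sum(k * n for k, n in enumerate(link_counts)) / total

# Expected count per bucket if the profile followed that Poisson shape.
expected = [total * poisson_pmf(k, lam) for k in range(len(link_counts))]

for k, (seen, want) in enumerate(zip(link_counts, expected)):
    print(f"mR {k}: observed {seen:4d}, Poisson expects {want:7.1f}")
```

Big gaps between the observed and expected columns would be the "not smooth" signal worth a closer look.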

Persuading management / bosses

The next thing that some of these charts help with is making the case to management when you know something is true, but they need more persuading. This next example takes two different sites (neither of them is the site above) that are in different industries but have remarkably similar link characteristics at the macro level (don’t ask me how I found these sites – I am just that sad). The spider chart shows how similar they are:

Spider comparison - almost identical sites

However, if we dig in a little further, we find quite a difference behind the scenes:

Site comparison side by side

The red site seems to have loads more decent links (mR 4, 5, 6) than the blue site. So how does the blue site end up with similar domain metrics?

It’s all about the relatively small number of very powerful links the blue site has. Zooming in on mR 6 & 7 links:

Powerful links comparison

If you were just to look at this chart, you might imagine that the red site was getting more juice passed via these links than the blue site is. However, you’d be fooled by the logarithmic scale. In terms of total juice passed by just these mR 6 and 7 links, the actual story is:

Powerful links through logarithm scale

In other words, the blue site is competing almost purely on the basis of the big mR 7 links it has that the red site doesn’t. That’s kinda interesting in terms of strategy generation, isn’t it?
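To see why the log scale misleads, here is a rough Python sketch of the arithmetic. The post only says the scale is logarithmic; the base of 8 and the link counts below are assumptions picked purely for illustration, not the numbers behind the charts above.

```python
# Assumption: the real base of the mR scale is not given in this post.
ASSUMED_LOG_BASE = 8

def total_juice(links_by_mr, base=ASSUMED_LOG_BASE):
    """Sum the linear-scale strength of links, given a {mR: link count} mapping."""
    return sum(count * base ** mr for mr, count in links_by_mr.items())

# Hypothetical counts of mR 6 and mR 7 links for two sites.
red_site = {6: 30, 7: 0}
blue_site = {6: 10, 7: 12}

red, blue = total_juice(red_site), total_juice(blue_site)
print(f"red:  {sum(red_site.values())} links, juice {red:,}")
print(f"blue: {sum(blue_site.values())} links, juice {blue:,}")
print(f"blue/red juice ratio: {blue / red:.1f}")
```

With these made-up numbers the red site has more links in total, but the blue site passes several times more juice, because each step up the scale multiplies a link's strength by the base.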

How do you do this analysis?

Pretty much everything in this post was generated using the histogram function in Excel running over Linkscape API data. It’s pretty straightforward with the online help. The only gotchas I noticed that you might need to know about were:

  1. Align the ‘bins’ (which are the x-axis values on most of the charts above) either with mR / mT intervals (e.g. 1, 2, 3, 4, …) or go much more granular (e.g. 0.1, 0.2, 0.3, ….). Anything in between tends to generate artifacts
  2. The bin range has to be on the same sheet as the data – if you try to pull in a bin range from another sheet, it fails silently
  3. If you want to do the surface chart, you need to do some interpolation between your points. In the examples above, I just did a linear interpolation (i.e. drawing a straight line between the different page levels) – so if the homepage has 100 mR 2 links and the next page has 50 mR 2 links, I just created 10 imaginary pages with 55, 60, 65, 70… mR 2 links to spread the surface out far enough to see it. This may not be the best way of doing things. I’d love to hear from anyone who has a better method
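If you would rather script this than use Excel, the binning and interpolation steps above can be sketched in plain Python. The mR values, function names and number of padding rows are made up for illustration. The bin rule used here (a value is counted in the first bin whose upper edge is at or above it, with one extra slot for anything larger) is my choice for the sketch, so check it against whatever tool you are replacing.

```python
from bisect import bisect_left

def histogram(values, bin_edges):
    """Count values per bin; a value lands in the first bin whose upper edge is >= it.

    The final slot collects anything above the last edge.
    """
    counts = [0] * (len(bin_edges) + 1)
    for v in values:
        counts[bisect_left(bin_edges, v)] += 1
    return counts

def interpolate_rows(start, end, steps):
    """Return `steps` evenly spaced values strictly between start and end."""
    gap = (end - start) / (steps + 1)
    return [start + gap * i for i in range(1, steps + 1)]

# Hypothetical mR values for pages linking to a site.
mr_values = [0.4, 1.2, 1.9, 2.0, 2.3, 3.7, 4.1, 4.1, 5.6, 6.8]

# Bins aligned with whole mR intervals (gotcha 1).
print(histogram(mr_values, [1, 2, 3, 4, 5, 6, 7]))

# Padding rows for the surface chart (gotcha 3):
# one page with 50 mR 2 links next to one with 100.
print(interpolate_rows(50, 100, 9))
```

Nine padding rows give steps of exactly 5 (55, 60, … 95) between the two real pages; a different row count just changes the step size.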

Thanks to foliovision for the photo from the ProSEO seminar.

Accessing SearchMonkey Structured Objects via BOSS

SearchMonkey and the structured Web

We’ve just announced an all-new Yahoo! Search experience, with many new features powered by SearchMonkey data.  Since launching our open developer platform in May 2008, Yahoo! Search has enabled thousands of developers to shape the search experience for millions of Yahoo! users. If you are interested in building semantic applications similar to what we’ve come up with at Yahoo! Search, here are some details to get you started.

What structured objects are available?

All of the objects listed on the SearchMonkey homepage are available to you. With the new feature “object refiners,” users can now restrict the search results to specific object types. Site owners contribute data of these objects by marking up their pages with RDF or microformats, or by providing dataRSS feeds. If you’re interested in the actual data of these objects, use the Yahoo! Search BOSS API to request the SearchMonkey data as part of the search request.

How can I access these structured objects?

The SearchMonkey team has been encouraging developers to use our structured data to build semantic Web applications ever since we partnered with BOSS.  Using the BOSS API, you can access SearchMonkey structured objects.

To restrict the result set to pages with SearchMonkey objects, just add “searchmonkey:<objectType>” to your query. The result set from BOSS will only contain URLs that have objects of that type.

For example, the following query returns all of the documents in the Yahoo! Web index that have the words “Sunnyvale” and “pizza” – about 3 million pages.

http://boss.yahooapis.com/ysearch/web/v1/sunnyvale+pizza?appid=wX7OZ3zV34Fy2Y4W4in_vsjFmRhruQNgCxdxn6RUke2c2JVDZdw6bfc1rcEjVnw-&format=xml

But if you only want pages with local business objects on them, you can add “searchmonkey:local” to the query:

http://boss.yahooapis.com/ysearch/web/v1/sunnyvale+pizza+searchmonkey:local?appid=wX7OZ3zV34Fy2Y4W4in_vsjFmRhruQNgCxdxn6RUke2c2JVDZdw6bfc1rcEjVnw-&format=xml

This query returns about 25,000 pages.

Yes, we’ve just thrown out over 90 percent of the result set – but we are after the most relevant results, not simply the greatest number of results. Our new object refiners use SearchMonkey’s structured data to narrow your query from “pizza+Sunnyvale” to actual local business listings within those results. You can use BOSS to retrieve the same structured data and construct any presentation you like.

You can take it a step further and add any of these terms to the query:

  • searchmonkey:video – restricts the result set to videos.
  • searchmonkey:product – restricts the result set to products.
  • searchmonkey:local – restricts the result set to local businesses.
  • searchmonkey:event – restricts the result set to events.
  • searchmonkey:document – restricts the result set to presentations, spreadsheets, and similar document formats.
  • searchmonkey:discussion – restricts the result set to blogs and forums.
  • searchmonkey:game – restricts the result set to Flash games.
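As a rough sketch, the query URLs above could be assembled like this in Python. The helper name and the `YOUR_APP_ID` placeholder are hypothetical; only the endpoint, the `appid` and `format` parameters and the `searchmonkey:` terms come from the examples in this post. The sketch only builds the URL string and does not send a request.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example URLs in this post.
BOSS_WEB_ENDPOINT = "http://boss.yahooapis.com/ysearch/web/v1/"

def build_boss_url(terms, object_type=None, app_id="YOUR_APP_ID", fmt="xml"):
    """Build a BOSS web search URL, optionally restricted to one SearchMonkey object type.

    Hypothetical helper: `app_id` is a placeholder for your own BOSS application ID.
    """
    parts = list(terms)
    if object_type:
        parts.append(f"searchmonkey:{object_type}")
    # Join terms with '+' and leave ':' unescaped, matching the example URLs.
    query = "+".join(quote(part, safe=":") for part in parts)
    params = urlencode({"appid": app_id, "format": fmt})
    return f"{BOSS_WEB_ENDPOINT}{query}?{params}"

# Unrestricted query, then the same query limited to local business objects.
print(build_boss_url(["sunnyvale", "pizza"]))
print(build_boss_url(["sunnyvale", "pizza"], object_type="local"))
```

Swapping `"local"` for `"video"`, `"product"`, `"event"` or any other type in the list above changes the restriction without touching the rest of the query.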

What don’t I get?

Not all structured data we’ve collected is part of the BOSS API.  For example, some third parties who provide us with feeds have elected to keep that data outside of BOSS. Structured data annotations from technologies built by Yahoo! Research are also not available to third party developers via BOSS. However, we aim to include all data we find embedded in web pages that deploy microformats or RDFa.

Our goal is a successful semantic Web where we extract the semantics as we process Web content. Every page marked up with semantic data makes it that much easier for us to extract meaning from that page. And it’s not just us! Google Video Search has recently adopted the same video markup (RDFa and Facebook Share) that SearchMonkey supports.

What’s next?

We will make many more object types available to you soon. In the meantime, you can learn more about SearchMonkey and how we acquire structured data annotations from this post on the YDN Blog.

Kevin Haas

Senior engineering manager, Yahoo! SearchMonkey
