Wednesday, September 19, 2012

Google’s index of example.com



While writing several blog posts and documentation, I have often used example.com to stand in for any domain name. One of the Internet standards established by Internet engineers in 1999 (RFC 2606) set aside example.com, as well as example.org and example.net, for documentation purposes. So if you were to click on a link to http://www.example.com in my post, you wouldn’t see an actual web page. Click on this link to see for yourself.
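To make the reservation concrete, here is a minimal Python sketch that checks whether a URL points at one of the three documentation domains mentioned above, or at a subdomain of one. The function name `is_documentation_domain` is hypothetical, and the check covers only example.com, example.net and example.org; RFC 2606 also reserves some top-level names (such as .test and .example), which this sketch ignores.

```python
from urllib.parse import urlsplit

# Hypothetical list: only the three second-level names mentioned in the post.
RESERVED_DOCUMENTATION_DOMAINS = ("example.com", "example.net", "example.org")


def is_documentation_domain(url: str) -> bool:
    """Return True if the URL's host is a reserved example domain or a subdomain of one."""
    # urlsplit(...).hostname is already lowercased; strip a trailing root dot.
    host = (urlsplit(url).hostname or "").rstrip(".")
    return any(
        host == domain or host.endswith("." + domain)
        for domain in RESERVED_DOCUMENTATION_DOMAINS
    )


if __name__ == "__main__":
    for url in ("http://www.example.com", "http://example.org/blah/", "http://notexample.com"):
        print(url, is_documentation_domain(url))
```

The URL needs a scheme: a bare host such as `www.example.com` is parsed as a path, so the function returns False for it.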

I’d like to demonstrate a fun little trick you can use to amaze your friends.

The page you see when you go to http://www.example.com appears to be completely indexable by the search engines. There’s not a lot of content, but you would think that the engines would have indexed it exactly as your browser shows it to you. It turns out, however, that there is a robots.txt file that blocks all spiders from all content on www.example.com. (If you ever forget how to create a basic robots.txt file, you can use this one as a guide.) Alright, now for the punch line. Let’s see what the search engines have really indexed for http://www.example.com. Go to www.google.com and type “site:example.com” (without the quotes). What do you see? If you see only one result, click the link “repeat the search with the omitted results included.”
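A robots.txt that shuts out every crawler needs only two lines. The Python sketch below embeds such a file as a string (an assumption for illustration; it is not fetched from www.example.com) and uses the standard library’s `urllib.robotparser` to confirm that a crawler identifying itself as Googlebot may not fetch any of the paths mentioned in this post. The helper name `is_allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed "block everything" robots.txt: every user agent, every path disallowed.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as freshly loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/", "/concepts", "/blah/"):
        url = "http://www.example.com" + path
        print(url, is_allowed(BLOCK_ALL, "Googlebot", url))
```

Changing `Disallow: /` to an empty `Disallow:` line does the opposite and allows every path.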

I see 10,400 results now. There are pages like example.com/blah/ and www.example.com/concepts. Unfortunately, the Google search results page does not link to a cached version of any of these results, so we can’t see exactly what Google has indexed from these pages, but we can visit the pages ourselves. Well, I tried that, and every page I visited replied with “Not Found.” It’s logical to conclude that those pages never existed, but notice also that some of the results were crawled by Google within the past few hours. Impossible, no?

You can try this search on other search engines too. My guess is that this strange phenomenon comes either from Google’s own testing or from other people testing Google or somehow tricking it into adding these pages to its index. It may also be confined to certain data centers.

Whatever is causing this, I’m sure Google knows about it but doesn’t feel the need to do anything about it. This phenomenon may also get you thinking about how search engines are supposed to work.

