When you sit at your computer and do a Google search, you are presented with a list of results from all over the web, almost instantly.
So how does Google find all of those web pages and arrange them in the correct order for you?
Put simply, you can think of searching the web as looking something up in a very large book with a detailed index that tells you exactly where everything is located.
When you perform a Google search, Google’s programs check that index to determine the results most relevant to your query.
There are three key processes that enable Google to deliver its results:
Crawling – This is where Google’s computers discover and fetch the pages of your website. Google’s crawling program is called Googlebot; programs of this kind are also known as robots, bots, or spiders.
Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to retrieve from each site.
Google’s crawl begins with a list of URLs generated from previous crawls and Sitemap data provided by webmasters. Every time Google’s spiders crawl a web page they note new links, new content and dead links.
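To make the crawl loop more concrete, here is a minimal sketch in Python. It is not Googlebot's actual implementation: the tiny in-memory "web" (TOY_WEB), the example URLs, and the crawl function are all hypothetical, and no real network requests are made. It only illustrates the idea of starting from a list of known URLs, following newly discovered links, and noting dead ones.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A URL that is linked to but missing from this dict stands in for a dead link.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/", "a.example/old"],
    "b.example/": ["a.example/"],
}

def crawl(seed_urls, web, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns (crawled, dead_links)."""
    frontier = deque(seed_urls)   # URLs waiting to be crawled
    seen = set(seed_urls)         # URLs already queued, to avoid repeats
    crawled, dead_links = [], []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        if url not in web:
            dead_links.append(url)    # the page could not be retrieved
            continue
        crawled.append(url)
        for link in web[url]:
            if link not in seen:      # a newly discovered link
                seen.add(link)
                frontier.append(link)
    return crawled, dead_links

crawled, dead = crawl(["a.example/"], TOY_WEB)
```

In this toy run the crawler visits three pages and records a.example/old as a dead link. A real crawler would also have to decide how often to revisit each page, which this sketch leaves out.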
Indexing – Googlebot processes all of the pages it crawls and compiles a massive index of all of the words it sees, along with their positions on each page. It also gathers information from key content tags and attributes, such as title tags and ALT attributes. Google can process most types of content, but it cannot process some rich media files or dynamic pages.
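An index of words and their positions is often called an inverted index. The sketch below shows the general idea only, under simplifying assumptions: the page texts, the URLs, and the build_index helper are made up for illustration, and words are split with a simple regular expression rather than anything Google is known to use.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to {url: [positions where the word occurs on that page]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        # Lowercase the text and split it into simple alphanumeric words.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(url, []).append(position)
    return index

# Hypothetical page contents, keyed by URL.
PAGES = {
    "a.example/": "Fresh coffee beans and coffee grinders",
    "b.example/": "Tea and coffee",
}

index = build_index(PAGES)
coffee_hits = index["coffee"]   # every page containing "coffee", with positions
```

Looking up a word is then a single dictionary access, which is why a search can be answered without rereading every page.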
Serving Results – When you perform a Google search, Google’s computers search their index for matching pages and return the results in order of relevance.
Relevance is determined by over 200 factors, one of which is Google’s PageRank technology. In simple terms, each link to a page on your site from another site adds to your site’s PageRank.
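The idea that incoming links add to a page's score can be illustrated with a simplified, textbook-style PageRank calculation. This is a sketch under stated assumptions, not Google's production ranking: the four-page link graph is hypothetical, the damping value of 0.85 is the commonly quoted default rather than a confirmed setting, and the other ranking factors are ignored entirely.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}   # start with equal rank
    for _ in range(iterations):
        # Every page receives a small baseline amount of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: pages "a", "b" and "d" all link to "c".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = pagerank(LINKS)
top_page = max(ranks, key=ranks.get)
```

Here page "c", which has the most incoming links, ends up with the highest score, and page "a" benefits from being the only page that "c" links to. That second effect is why a link from a well-regarded page counts for more than a link from an obscure one.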
Google is working hard to improve the user experience and is investing heavily in spotting spam links and other practices that harm the quality of search results. The best incoming links are those given because of the quality of the content on your web page.