One of the most common questions we hear from librarians is "How does Google decide what result goes at the top of the list?" Here, from quality engineer Matt Cutts, is a quick primer on how we crawl and index the web and then rank search results. Matt also suggests exercises school librarians can do to help students.
Crawling and Indexing
A lot of things have to happen before you see a web page containing your Google search results. Our first step is to crawl and index the billions of pages of the World Wide Web. This job is performed by Googlebot, our "spider," which connects to web servers around the world to fetch documents. The crawling program doesn't really roam the web; it instead asks a web server to return a specified web page, then scans that web page for hyperlinks, which provide new documents that are fetched the same way. Our spider gives each retrieved page a number so it can refer to the pages it fetched.
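To make the fetch-and-scan loop concrete, here is a toy sketch in Python. It is not Googlebot's code, just an illustration of the idea: fetch a page, give it a number, scan it for hyperlinks, and queue those links to be fetched the same way. The starting URL, page limit, and timeout are made up for the example.

```python
# A toy illustration of the crawl loop described above -- not Googlebot's
# actual code, just the fetch/scan/queue idea using Python's standard library.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=10):
    """Fetch pages breadth-first, numbering each document as it is retrieved."""
    queue = deque([start_url])
    doc_ids = {}                      # URL -> document number
    while queue and len(doc_ids) < max_pages:
        url = queue.popleft()
        if url in doc_ids:
            continue                  # already fetched this page
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue                  # skip pages that fail to load
        doc_ids[url] = len(doc_ids)   # give the retrieved page a number
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))   # new documents, fetched the same way
    return doc_ids
```

A real crawler adds many refinements this sketch leaves out, such as respecting robots.txt, spreading requests out politely, and revisiting pages that change often.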
Our crawl produces an enormous set of documents, but these documents aren't searchable yet. Without an index, if you wanted to find a term like "civil war," our servers would have to read the complete text of every document every time you searched. So instead we build an index that records, for each word, which documents contain it, much like the index at the back of a book.
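Here is a toy sketch of that idea in Python. The three sample documents are invented for illustration; the point is that the index is built once, up front, so a query only has to look up its words rather than scan every document.

```python
# A toy inverted index: instead of scanning every document for "civil war"
# at query time, we record up front which document numbers contain each word.
from collections import defaultdict

documents = {
    0: "the american civil war began in 1861",
    1: "librarians help students research the war",
    2: "civil engineering is a different subject",
}

# Build the index once: word -> set of document numbers containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query):
    """Return the documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = index[words[0]].copy()
    for word in words[1:]:
        results &= index[word]        # keep only docs that contain all the words
    return results

print(search("civil war"))            # -> {0}
```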