Handling Robots
What exactly is a Robot?
In search engine optimization terminology, a robot is a search
engine software program that visits a page on a website, follows
the links from that page, and indexes some or all of the pages on
the site.
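The visit-and-follow behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the three-page site, the example.com URLs, and the names `LinkExtractor`, `crawl` and `SITE` are all hypothetical, and pages are read from an in-memory dictionary rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Visit start_url, then every page reachable from it by links.

    `fetch` is a caller-supplied function that maps a URL to its HTML,
    or returns None when the page does not exist.
    """
    indexed = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        indexed.append(url)  # "index" the page by recording its URL
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed


# Hypothetical three-page site held in memory (assumption for the demo).
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="/news.html">News</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/news.html": '<a href="/about.html">About</a>',
}

pages = crawl("http://example.com/", SITE.get)
print(pages)
```

Starting from the home page, the sketch reaches all three pages and visits each one only once, even though the pages link back to one another.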
Why do we need robots?
Every day, search engines receive
hundreds of new website submissions. It would be cumbersome and
time consuming for a human to review an entire website, judge
whether it meets the search engine's standards, and index it. This
is where our friend the robot comes into the picture. Robots are
software programs that crawl the entire website, checking the
relevance, consistency and significance of the site and indexing it
into the search engine database, which reduces the time spent per
site. In this way a robot can index many more sites per day.
Though handling robots is not the most critical aspect of search
engine optimization, it is advisable to plan for it.
Controlling a Robot
Normally, a robot visits the home
page of the site and follows the links present on the page, scanning
each link and page. Sometimes we would prefer that a robot not index
a particular page or pages on our site. For instance, you might want
a series of pages to be viewed in sequence and would like only page
one to be indexed. To achieve this, there is a special kind of Meta
tag known as the Robots Meta tag. The Robots Meta tag is similar to
other Meta tags and is placed in the head of the document. This
tag tells the robot whether the page should be indexed and whether
the links on it should be followed.
A typical Robots Meta tag looks
like this:
<meta name="robots" content="index, nofollow">
This tag tells the robot to index the page it is visiting but
not to follow the links on the page.
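To make the effect of the tag concrete, here is a minimal sketch of how a robot might read it. The class and function names (`RobotsMetaParser`, `robots_policy`) are hypothetical, and the sketch assumes that a page with no Robots Meta tag may be both indexed and followed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow) for a page.

    Assumption: both default to True unless the tag says otherwise.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = '<html><head><meta name="robots" content="index, nofollow"></head></html>'
print(robots_policy(page))  # (True, False)
```

For the tag shown above, the sketch reports that the page may be indexed but its links should not be followed.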
The other significant part of controlling
robots is the robots.txt file. The robots.txt file is used primarily
to keep robots out of particular areas of the website. Whenever a
well-behaved robot visits a site, it first checks the robots.txt
file.
The robots.txt file,
- is a plain text file created in Notepad or
any text editor
- should be placed in the top level directory
or root of the website or server space
- should have a file name in all lower case letters
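Because the file always sits at the root of the site, a robot can work out where to look from any page address. The sketch below shows one way to do that; the function name `robots_txt_url` and the example.com address are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/products/list.html?page=2"))
# http://www.example.com/robots.txt
```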
Through the robots.txt file we can,
- Exclude all robots from visiting the
server
- Allow complete access to all robots
- Exclude robots from accessing a portion
of the server
- Exclude a specific robot
- Exclude certain types of files from being accessed,
by specifying their file extensions (on robots that
support wildcard patterns)
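The first four cases in the list can be sketched with Python's standard `urllib.robotparser` module. The robot names (`AnyBot`, `BadBot`, `GoodBot`), the `/private/` directory and the example.com URLs are made up for illustration, and `allowed` is a hypothetical helper. The file-extension case is left out because it relies on wildcard patterns, which this sketch does not attempt to cover.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, url):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Exclude all robots from visiting the server.
EXCLUDE_ALL = """\
User-agent: *
Disallow: /
"""

# Allow complete access to all robots (an empty Disallow blocks nothing).
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Exclude robots from one portion of the server.
EXCLUDE_PORTION = """\
User-agent: *
Disallow: /private/
"""

# Exclude one specific robot and let everyone else in.
EXCLUDE_ONE_ROBOT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

print(allowed(EXCLUDE_ALL, "AnyBot", "http://example.com/index.html"))
print(allowed(EXCLUDE_PORTION, "AnyBot", "http://example.com/private/a.html"))
print(allowed(EXCLUDE_ONE_ROBOT, "GoodBot", "http://example.com/index.html"))
```

Each string is the complete content of a robots.txt file for that case. Note that the rules only work for robots that choose to read and obey the file.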
Finally, we can conclude that the robots.txt
file basically acts as a filter, providing considerable control
over the search engine robots that honor it.
While talking of robots I would like to
mention the revisit tag. It is an important tag in the realm
of search engine optimization. The revisit tag
tells the search engine how long to wait before it
visits your site again. If you change your site's contents frequently,
your revisit time should be, say, a week; otherwise it can be longer.
Your search engine rankings may dip if the search engine visits you
a second time and finds that the content has not been altered significantly.
Though not all search engines honor the revisit tag, it is advisable
to include it. If you are keen on top search engine rankings,
use the revisit tag judiciously.
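As a small illustration, the helper below builds a revisit tag for a given number of days. The text above does not spell out the tag's syntax; the sketch assumes the commonly seen form, a Meta tag named `revisit-after` whose content is a number of days. The function name `revisit_meta_tag` is hypothetical.

```python
def revisit_meta_tag(days):
    """Build a revisit tag asking robots to return after `days` days.

    Assumption: the tag uses the common `revisit-after` Meta name.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    unit = "day" if days == 1 else "days"
    return f'<meta name="revisit-after" content="{days} {unit}">'


print(revisit_meta_tag(7))
# <meta name="revisit-after" content="7 days">
```

Like the Robots Meta tag, the generated line would go in the head of the document.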