Spiderline

Site Search Knowledge Base

Excluding crawler from sections of pages.

This help topic describes how to prevent sections of a document from being indexed. To prevent an entire document from being indexed, see the topics above.

Spiderline supports the proprietary "robots" comment tag. This tag allows a web author to apply robots exclusion rules to arbitrary sections of a document. The tag has one attribute, content, with the following possible values:

  • noindex - the enclosed text is not saved in the index
  • nofollow - links are not extracted from the enclosed text
  • none - the enclosed text is neither indexed nor searched for links

The values "index", "follow", and "all" are also valid, but in practice they are ignored because they are the implicit defaults.

This feature is intended for cases where certain parts of a document, such as a navigational sidebar, should be kept out of the search results.

Example:

<HTML>
<BODY>

This text will be indexed.
    <A HREF="foo.html"> this link will be followed </A>

<!-- robots content="none" -->

This text will NOT be indexed.
        <A HREF="bar.html"> this link will NOT be followed </A>

<!-- /robots -->

<!-- robots content="noindex" -->

This text will NOT be indexed.
<A HREF="bar1.html"> this link WILL be followed </A>

<!-- /robots -->

<!-- robots content="nofollow" -->

This text WILL be indexed.
<A HREF="bar2.html"> this link will NOT be followed </A>

<!-- /robots -->

This text will be indexed.

</BODY>
</HTML>

For a navigational sidebar, the "noindex" value is usually the best choice: the sidebar text stays out of the index, but the crawler still follows its links to the rest of the site.
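
As a minimal sketch of the sidebar case, assume a page whose navigation sits in a DIV with the id "sidebar". The ids, file names, and link text below are hypothetical and only illustrate where the comment tags would go:

```html
<HTML>
<BODY>

<!-- Hypothetical sidebar: its text is kept out of the index, its links are still followed -->
<!-- robots content="noindex" -->
<DIV ID="sidebar">
    <A HREF="products.html"> Products </A>
    <A HREF="support.html"> Support </A>
</DIV>
<!-- /robots -->

<DIV ID="main">
    This article text will be indexed as usual.
</DIV>

</BODY>
</HTML>
```

With this markup the sidebar labels should not appear in search results, while the crawler can still discover the linked pages through them.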

This syntax was designed to match the robots META tag.

For documents that have both the "robots" META tag and the "robots" comment tag, the most restrictive interpretation is applied, always erring on the side of not indexing or not following.
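
To illustrate this rule, here is a sketch of a page that combines both tags. The file names are hypothetical, and the outcomes described in the text are what the most-restrictive rule would be expected to produce, not verified crawler output:

```html
<HTML>
<HEAD>
<!-- Page-wide rule: do not follow links anywhere in this document -->
<META NAME="robots" CONTENT="nofollow">
</HEAD>
<BODY>

This text is indexed; the link below is expected NOT to be followed (META nofollow).
<A HREF="page1.html"> hypothetical link </A>

<!-- robots content="noindex" -->

This text is expected NOT to be indexed (comment tag noindex),
and the link below NOT to be followed (META nofollow still applies).
<A HREF="page2.html"> hypothetical link </A>

<!-- /robots -->

</BODY>
</HTML>
```

In other words, a comment tag can tighten the page-wide rule for a section, but it should not be expected to relax it.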



