Spiderline

Site Search Knowledge Base


How do I exclude parts of my site from being crawled?

To exclude areas of your site from being indexed, enter patterns in the URLs section of the Crawl settings. These patterns tell the crawler what to index and what to avoid indexing. Type each pattern on its own line, followed by any desired index or follow options.

If you want everything on your website searchable, enter your domain name in the Patterns field. If you want only a few specific paths crawled, enter those paths in the Patterns field. Searchable documents should match a pattern that is followed by the INDEX option. To exclude a certain document or directory, either the matching pattern must specify NOINDEX, or the document must be linked to only from documents that carry a NOFOLLOW option.

Entries in the Patterns field are read from top to bottom, and later entries take precedence. If the Spiderline robot encounters a page that matches more than one entry, the options of the entry listed later override any conflicting options from earlier entries. This lets you start with a broad pattern and then add more specific entries for what you do and do not want crawled.
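
The "later entry wins" rule can be sketched in a few lines of Python. This is only an illustration, not Spiderline's actual code. It assumes that a pattern matches when it appears anywhere in the URL, that an entry with no options means INDEX FOLLOW, and that the last matching entry decides both options. The names Rule, parse_patterns, and resolve are made up for this sketch.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    pattern: str
    index: bool   # False when the entry says NOINDEX
    follow: bool  # False when the entry says NOFOLLOW


def parse_patterns(text: str) -> list[Rule]:
    """Turn the Patterns field (one entry per line) into a list of rules."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        options = {p.upper() for p in parts[1:]}
        # Assumption: options that are left out default to INDEX and FOLLOW.
        rules.append(Rule(parts[0], "NOINDEX" not in options, "NOFOLLOW" not in options))
    return rules


def resolve(url: str, rules: list[Rule]) -> Optional[Rule]:
    """Return the last rule whose pattern occurs in the URL, or None."""
    matched = None
    for rule in rules:           # read top to bottom
        if rule.pattern in url:  # assumed substring match
            matched = rule       # a later match replaces an earlier one
    return matched


rules = parse_patterns(
    "http://www.mydomain.com INDEX FOLLOW\n"
    "/private/ NOINDEX NOFOLLOW\n"
)
print(resolve("http://www.mydomain.com/private/a.html", rules))
print(resolve("http://www.mydomain.com/public/a.html", rules))
```

In this sketch a URL that matches no entry returns None, which corresponds to a page that is not crawled at all. The "/private/" path and file names are hypothetical.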

Examples:

  • I enter "http://www.mydomain.com INDEX FOLLOW".

    All documents and directories that contain this pattern will be crawled. Entering your domain name in this field ensures that links to websites outside your domain will not be followed or crawled.

  • I enter "/ INDEX FOLLOW". CAUTION!!!

    All documents have a "/" somewhere in their path, for example "http://www.anydomain.com/". All documents on your website and on websites you link to will be crawled. Spiderline does not necessarily crawl all documents beginning with your domain first. If you enter a "/", you could crawl every website on the internet.

  • I enter "/only/these/dirs/ INDEX FOLLOW".

    All documents and directories that contain this path will be crawled. A document found at "/only/here.html", "/only/these/here.html", or "/what/about/here.html" will not be crawled unless another pattern matching the document path is entered. Also, since the entry ends with a "/" ("dirs/"), the document "/only/these/dirs.html" will not be crawled. If the final "/" were omitted, dirs.html would be crawled.

  • I only enter "/dirs/ INDEX FOLLOW" and "/some/path/ INDEX FOLLOW".

    A page's URL must contain one pattern or the other. Pages that match neither pattern will not be crawled.

  • I enter "/not/here/ NOINDEX FOLLOW".

    Any document or directory that contains this path will not be crawled.

  • I enter ".cgi NOINDEX NOFOLLOW".

    All documents that have a ".cgi" extension will not be crawled and any links found will not be followed.

  • I enter on line 1 "/dir" and on line 2 "/tmp NOINDEX NOFOLLOW".

    All documents that contain "/dir" will be crawled, unless the document path also contains "/tmp". A document located at "/tmp/x.html", "/dir/tmp/here.html", or "/dir/tmp.html" will not be crawled, and links within those documents will not be followed.
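
Several of the examples above turn on whether a pattern is "contained" in a document's URL. The short Python check below illustrates that reading, assuming a plain substring test; the real crawler may match differently. The function name matches and the file name a.html are made up for the illustration.

```python
def matches(pattern: str, url: str) -> bool:
    """Assumed rule: a pattern matches when it appears anywhere in the URL."""
    return pattern in url


checks = [
    # The trailing "/" limits the match to documents inside the directory.
    ("/only/these/dirs/", "http://www.mydomain.com/only/these/dirs/a.html"),
    ("/only/these/dirs/", "http://www.mydomain.com/only/these/dirs.html"),
    # Without the trailing "/", dirs.html matches as well.
    ("/only/these/dirs", "http://www.mydomain.com/only/these/dirs.html"),
    # A bare "/" occurs in every URL, including other people's sites.
    ("/", "http://www.anydomain.com/"),
    # "/tmp" matches a file name as well as a directory name.
    ("/tmp", "http://www.mydomain.com/dir/tmp.html"),
]

for pattern, url in checks:
    print(f"{pattern!r:22} in {url!r}: {matches(pattern, url)}")
```

Under this assumption only the second check fails to match, which is why the trailing "/" in "/only/these/dirs/" keeps dirs.html out of the crawl.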

