Site Search Knowledge Base
Crawling and Indexing Questions
|
Robot Exclusion Guide
The robots.txt file and robot META tags are two methods for allowing or disallowing robots (web robots, spiders)
to crawl portions of your site. Website administrators and content providers can...
   
2005-01-20
Views: 7698
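As a rough illustration of how a robots.txt file is evaluated, the sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" user agent, and the example.com URLs are assumptions made up for this example; they are not Spiderline settings, and this is not necessarily how the Spiderline crawler evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a placeholder name, not a real crawler's user agent.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/report.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A well-behaved robot would run a check of this kind before requesting each URL and skip any URL for which the check fails.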
|
|
Learn How to Use NOINDEX & NOFOLLOW
"Crawling" is the process of finding content on your web site. Finding web
pages is similar to a web user browsing through a site and clicking on links.
Spiderline spiders also browse your...
   
2005-01-20
Views: 4314
|
|
Learn How to Configure URLs
Configuring URLs can be as simple or detailed as needed for your website. The Starting URL and
Pattern fields in combination with the INDEX and FOLLOW options allow you to control exactly...
   
2005-01-20
Views: 3595
|
|
Excluding the crawler from sections of pages
This help topic describes how to prevent sections of a document from being indexed. To prevent an entire document from being indexed, see the topics above.
Spiderline supports the proprietary...
2005-01-20
Views: 176567
|
|
How many pages are on my site?
You or your webmaster is the best judge of the size of your site. (Spiderline cannot know how many pages are on your site without physically crawling it.) You can purchase a lower plan than needed...
   
2005-04-20
Views: 7570
|
|
Server HTTP response (error) codes
Errors from an HTTP server that display in a browser are the same ones that the Spiderline Crawler receives.
Common codes are:
301 Moved Permanently
302 Found Redirect
303 See Other
307...
   
2005-07-12
Views: 7452
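To see what a given status code means, the following sketch looks codes up with Python's standard http.HTTPStatus enumeration and groups them into broad classes. The describe_status helper and its class labels are assumptions written for this example; the phrases come from Python's standard library and may be worded slightly differently from the list in the article.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase, and its broad class."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not one of the registered statuses known to Python.
        return f"{code} (unrecognized status code)"

    if 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    elif 200 <= code < 300:
        kind = "success"
    else:
        kind = "informational"
    return f"{code} {status.phrase}: {kind}"


if __name__ == "__main__":
    for code in (200, 301, 302, 303, 404, 500):
        print(describe_status(code))
```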
|
|
My account is not getting crawled!!
Reasons an account may not be crawled.
Log into your account, check the crawl log and last crawl date.
Is your account out of crawls?
Is your account expired?
Is your website up and...
2005-01-19
Views: 6026
|
|
Why do I get duplicate pages in my results?
If duplicate pages are identical in all respects except for their URLs, you should get a report of "document content matches previously processed document" next to the page in your crawl report....
2005-01-20
Views: 5487
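One common way to detect that two URLs return identical content is to compare a hash of each document body. The sketch below shows that general idea only; the article does not say how Spiderline compares documents, so the hashing approach, the find_duplicates helper, and the sample pages are all assumptions for illustration.

```python
import hashlib


def find_duplicates(pages: dict[str, str]) -> dict[str, str]:
    """Map each duplicate URL to the first URL seen with identical content."""
    first_seen: dict[str, str] = {}   # content digest -> first URL with that content
    duplicates: dict[str, str] = {}   # duplicate URL -> URL it duplicates
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates[url] = first_seen[digest]
        else:
            first_seen[digest] = url
    return duplicates


if __name__ == "__main__":
    # Hypothetical pages: "/" and "/index.html" serve the same document.
    sample = {
        "http://www.example.com/": "<html>Home</html>",
        "http://www.example.com/index.html": "<html>Home</html>",
        "http://www.example.com/about.html": "<html>About</html>",
    }
    print(find_duplicates(sample))
```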
|
|
Can Spiderline index ASP sites?
Yes, Spiderline can index ASP, PHP, JSP, CFM, MPE, and any other dynamically produced HTML.
2005-01-19
Views: 4884
|
|
Does Spiderline allow manual and scheduled indexing?
Yes. Spiderline provides all accounts (under any service plan) with the ability to request a recrawl/update of your website at any time. Our automated crawl scheduling feature will let you choose...
   
2005-01-19
Views: 4710
|
|
Why didn't Spiderline find all of my pages?
There are several reasons that all of your web site pages may not have been crawled.
Account Document Limit: The default Document Limit is 100 pages. In order for Spiderline to crawl over this...
2005-01-19
Views: 4591
|
|
Does NOINDEX keep documents from being counted?
Yes, if the page or directory is listed with NOINDEX in the URL patterns, because the crawler avoids the document.
No, if the pages are being omitted by robot META tags, this is because...
2005-06-27
Views: 3826
|
|
How do I exclude parts of my site from being crawled?
To exclude areas from being indexed, you will need to add commands to the URLs section of the Crawl settings. These Patterns will tell the crawler what to index and what to avoid indexing. Type...
2005-04-27
Views: 3800
|
|
How do robot meta tags work?
The Robots META tag is another method that may be used to indicate to visiting robots whether a page should
be indexed (crawled), or links on the page should be followed. It differs from the...
2005-04-27
Views: 3578
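As an illustration of how a robot might read the Robots META tag, the sketch below extracts its directives with Python's standard html.parser module. The RobotsMetaParser class, the robots_directives helper, and the sample page are assumptions for this example and do not describe the Spiderline implementation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values may be None when the attribute has no value.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set[str]:
    """Return the lowercased robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
    directives = robots_directives(page)
    print("index this page:", "noindex" not in directives)
    print("follow its links:", "nofollow" not in directives)
```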
|
|
Does Spiderline honor the robot exclusion protocol?
Yes, Spiderline does honor the robot exclusion protocol. Our spiders will not index directories or follow links that have been disallowed in the robots.txt configuration file located on your...
2005-01-20
Views: 3391
|
|
Robot META tags Tutorial.
The Robots META tag is another method that may be used to indicate to visiting robots whether a page should
be indexed (crawled), or links on the page should be followed. It differs from the...
   
2005-04-27
Views: 3320
|
|
JavaScript Navigation
Spiderline (like most site search engines) does not have the ability to
interpret Java or JavaScript code to harvest links. Java and JavaScript need to be executed to be read, and it is...
   
2005-03-30
Views: 3219
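The sketch below illustrates why links produced by script are invisible to a crawler that only reads markup. It uses Python's standard html.parser module to collect plain href links from a hypothetical page; the LinkExtractor class, the extract_links helper, and the sample page are assumptions for this example, not Spiderline code.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags without executing any script."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # A "javascript:" pseudo-URL is code, not a page address, so skip it.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(href)


def extract_links(html: str) -> list[str]:
    """Return the plain links a markup-only crawler could discover."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical page: one plain link and one link that only works via script.
PAGE = """
<a href="/products.html">Products</a>
<a href="javascript:goTo('contact')">Contact</a>
<script>function goTo(page) { window.location = "/" + page + ".html"; }</script>
"""

if __name__ == "__main__":
    print(extract_links(PAGE))
```

Only the plain link is found. The contact page is reachable in a browser, but a crawler working this way never learns its address unless a plain link to it also exists somewhere on the site.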
|
|
What changes to my account require re-crawling?
Changes to any of the following configurations will require re-crawling your website.
URL configuration
Exclude Word List
Any authentication
Robots handling
If you have an...
2005-01-20
Views: 3167
|
|
Using NOINDEX and NOFOLLOW patterns
"Crawling" is the process of finding content on your web site. Finding web
pages is similar to a web user browsing through a site and clicking on links.
Spiderline spiders also browse your...
2005-05-03
Views: 3145
|
|
My crawl never completed.
If your crawl does not return after an adequate time, where adequate means enough time for you to upload your full site on a 56K connection, time for PDF files and text documents to be opened and read...
2005-04-27
Views: 2822
|
|
What parts of a document does Spiderline crawl?
For HTML documents, the page title, Meta Keywords, Meta Description, and body text will be crawled. Image ALT tags and Robot comments are read but not indexed; you have the ability to use these...
2005-01-20
Views: 2693
|
|
No new crawls. Crawls stopped.
First, check your account to see whether you are out of crawls or your account has expired.
If your account is active, with crawls available, does the crawl status say "crawl in progress",...
2005-04-27
Views: 2438
|
|
|
|