Spiders

rASCAL

Disciple
I have a weird question.
I was thinking about spiders (Google and Yahoo).
Sometimes when we search on Google or Yahoo we get forum threads in the results, but on clicking through, the page asks you to log in.
I was just wondering how spiders get access to the threads. Does this mean there is a vulnerability in the websites?

I don't know if this is the correct place to post the question.
Mods, please do the needful.
Thanks.
 
The search engines sit on ISP backbones to gather information. The webpage might be transmitted unencrypted, so Google/Yahoo pick it up and index it. But when you request the page, the website obviously asks for a login.

Raghu.
 
In most forum applications there is an option (or an add-on) to treat bots/spiders as users. If you look at the online members list, you will sometimes see an entry for Googlebot or for Yahoo's crawler. So spiders can have access to a restricted forum if the admins want (they can also block the spiders using the same option).
 
I know about the entry on the web pages,
but then can't a person impersonate a spider and get the info without having to log in?
 
Spiders are programs which have a list of domains. They go to each domain and request pages. But they also check the site's robots.txt file for guidance on how to move around.
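
To make the "guidance" part concrete: the file a well-behaved spider reads before crawling a site is robots.txt. Here is a rough sketch using Python's standard urllib.robotparser. The robots.txt content, the ExampleBot name and the forum.example URLs are made up for illustration; a real spider would download the file from the site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
# A real spider would fetch it from the site before requesting pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def make_parser(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = make_parser(ROBOTS_TXT)
```

Note that robots.txt is only a request: it tells polite spiders where not to go, it does not stop anyone from asking for the page.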

You can code your own spider and follow links, but the amount of data amassed is tremendous. Through a spider you can access every page in a forum that a guest can access. Otherwise, if the site has special rules for spiders, they get access according to those rules.
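
A bare-bones sketch of the "follow links" idea, using only the Python standard library. The fetch argument is a stand-in I am assuming here: any function that takes a URL and returns the page's HTML (or None if the page can't be had). In a real spider it would make an HTTP request; the limit parameter is there because the amount of data grows very quickly.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def crawl(start_url, fetch, limit=100):
    """Visit pages breadth-first; return the URLs in the order requested."""
    visited = []
    seen = set()
    queue = deque([start_url])
    while queue and len(visited) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        html = fetch(url)  # hypothetical fetcher supplied by the caller
        if html is None:
            continue  # e.g. the page needs a login we don't have
        queue.extend(extract_links(url, html))
    return visited
```

For a quick try without touching the network, pass a dict's get method as fetch, e.g. crawl("http://f.example/", pages.get) where pages maps URLs to HTML strings.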
 
The answer is there if you care enough to read.

You can use mod_rewrite (.htaccess) to change the server's behavior based on rules. For spiders, the rules are written in the server's .htaccess file.
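
What such a rule boils down to can be sketched in Python rather than in mod_rewrite syntax. This is only an illustration under my own assumptions: the function names, the token list and the response strings are made up, and a real forum's check may look quite different. The point is that the server decides based on the User-Agent header, and that header is whatever the client chooses to send.

```python
# Substrings found in the User-Agent of Google's and Yahoo's crawlers
# (Googlebot, and Slurp for Yahoo). The list is illustrative only.
SPIDER_TOKENS = ("googlebot", "slurp")


def looks_like_spider(user_agent):
    """Guess from the User-Agent header alone whether this is a spider."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def respond(path, user_agent, logged_in=False):
    """Hypothetical forum rule: members and spiders get the thread."""
    if logged_in or looks_like_spider(user_agent):
        return "thread content for " + path
    return "please log in"
```

So a rule that looks only at the User-Agent will also let in anyone who sends a spider's User-Agent string, which is the impersonation asked about above.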
 