Re: Spiderspace




Ed Carp writes:
 > ... I was under the impression that the only documents that most web crawlers
 > will search are documents that are link-accessible.  Are you saying that this
 > isn't true?  Are you saying that Alta-Vista will search EVERYTHING that's
 > publicly accessible, whether by anonymous FTP or web?

Ah, but suppose the crawler hits a site whose top-level directory
*does* contain an "index" page, but whose server *doesn't* recognize
that page's name as the index.  Then a request for the directory
(probably) returns one of those server-generated indices.  Those
things generally list *everything* in the directory (except files
blocked by the server configuration, usually stuff like emacs temp
files), so every file in there becomes link-accessible, and there
you go...
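
To make that concrete, here's a rough sketch of what a crawler sees on
such a page.  The listing HTML, the file names, and the helper names
(LinkCollector, links_in_listing) are all made up for illustration;
real server-generated indices vary in markup from server to server,
and this is not how any particular crawler is known to work.  It only
shows that once the listing is served, every entry is an ordinary link.

```python
from html.parser import HTMLParser

# Hypothetical server-generated listing. Assumption: the author's intended
# front page is "home.html", a name this imaginary server does not treat
# as the directory index, so it serves an automatic listing instead.
SAMPLE_LISTING = """<html><head><title>Index of /</title></head>
<body><h1>Index of /</h1>
<ul>
<li><a href="/">Parent Directory</a></li>
<li><a href="home.html">home.html</a></li>
<li><a href="drafts/">drafts/</a></li>
<li><a href="private-notes.txt">private-notes.txt</a></li>
</ul></body></html>"""


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in_listing(html_text):
    """Return every link target found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Each entry is something a link-following crawler could fetch next,
    # whether or not any hand-written page ever linked to it.
    for target in links_in_listing(SAMPLE_LISTING):
        print(target)
```

Nothing here is specific to directory listings: the crawler just
follows links as usual, and the server has handed it a page that links
to everything.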

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Nobody's going to listen to you if you just | Mike McNally ([email protected]) |
| stand there and flap your arms like a fish. | Tivoli Systems, Austin TX    |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~