Since Wget is able to traverse the web, it counts as one of the Web robots. Thus Wget understands the Robots Exclusion Standard (RES)---the contents of `/robots.txt', used by server administrators to shield parts of their systems from the wanderings of Wget.
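For instance, an administrator who wants to keep all robots (Wget included) out of a couple of directories might serve a `/robots.txt' like the following; the directory names here are purely illustrative:

# A hypothetical /robots.txt.  The `*' rule applies to every robot.
User-agent: *
Disallow: /tmp/
Disallow: /cgi-bin/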
Norobots support is turned on only when retrieving recursively, and never for the first page. Thus, you may issue:
wget -r http://fly.cc.fer.hr/
First the index of fly.cc.fer.hr will be downloaded. If Wget finds anything worth downloading on the same host, only then will it load the robots file and decide whether or not to load the links after all.
`/robots.txt' is loaded only once per host. Wget does not support the robots META tag.
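For reference, the robots META tag is something a page author places in an HTML document's header, rather than in `/robots.txt'; a typical (purely illustrative) instance looks like the following, and Wget will retrieve and follow links from such a page regardless:

<head>
  <meta name="robots" content="noindex,nofollow">
</head>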
The description of the norobots standard was written, and is maintained, by Martijn Koster <m.koster@webcrawler.com>. With his permission, I contribute a (slightly modified) TeXified version of the RES.