- `-E'
- `--html-extension'
If a file of type `text/html' is downloaded and the URL does not
end with the regexp `\.[Hh][Tt][Mm][Ll]?', this option will cause
the suffix `.html' to be appended to the local filename. This is
useful, for instance, when you're mirroring a remote site that uses
`.asp' pages, but you want the mirrored pages to be viewable on
your stock Apache server. Another good use for this is when you're
downloading the output of CGIs. A URL like
`http://site.com/article.cgi?25' will be saved as
`article.cgi?25.html'.
Note that filenames changed in this way will be re-downloaded every time
you re-mirror a site, because Wget can't tell that the local
`X.html' file corresponds to remote URL `X' (since
it doesn't yet know that the URL produces output of type
`text/html'). To prevent this re-downloading, you must use
`-k' and `-K' so that the original version of the file will be
saved as `X.orig' (see section Recursive Retrieval Options).
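The naming rule can be sketched as a small POSIX shell function. This is an illustration, not Wget's own code: the function name `html_local_name' is hypothetical, and it assumes it is given the local filename Wget would otherwise use plus the Content-Type of the reply. The two glob patterns express the same test as the regexp `\.[Hh][Tt][Mm][Ll]?'.

```shell
#!/bin/sh
# Hypothetical helper illustrating the -E / --html-extension rule.
# It is not taken from Wget's sources.
#   $1: local filename Wget would otherwise use (e.g. article.cgi?25)
#   $2: Content-Type reported by the server (e.g. text/html)
html_local_name() {
    name=$1
    case $2 in
        text/html*)
            case $name in
                # Already ends in .htm or .html (any letter case): keep it.
                *.[Hh][Tt][Mm] | *.[Hh][Tt][Mm][Ll]) ;;
                # Otherwise append .html so a stock web server treats it as HTML.
                *) name=$name.html ;;
            esac
            ;;
    esac
    printf '%s\n' "$name"
}

html_local_name 'article.cgi?25' text/html   # prints article.cgi?25.html
html_local_name 'index.HTM' text/html        # prints index.HTM
html_local_name 'logo.png' image/png         # prints logo.png
```

In a real mirror, `-E' would normally be combined with `-k' and `-K' as described above, for example `wget -r -E -k -K http://site.com/'.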
- `--http-user=user'
- `--http-passwd=password'
Specify the username user and password password on an
HTTP server. According to the type of the challenge, Wget will
encode them using either the `basic' (insecure) or the `digest'
authentication scheme.
Another way to specify the username and password is in the URL itself
(see section URL Format). For more information about security issues with
Wget, see section Security Considerations.
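The `basic' scheme is called insecure because it only base64-encodes `user:password', so anyone who can read the request can decode it; `digest' sends a hash derived from the password instead. The sketch below builds a basic header by hand. `basic_auth_header' is a hypothetical name, and the sketch assumes the common `base64' utility (GNU coreutils or BSD) is installed.

```shell
#!/bin/sh
# Hypothetical helper: build the header a client sends for HTTP "basic"
# authentication. The credentials are only base64-encoded, not encrypted.
#   $1: user   $2: password
basic_auth_header() {
    # base64 may wrap long output at 76 columns; tr removes the line breaks.
    token=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
    printf 'Authorization: Basic %s\n' "$token"
}

# The example credentials from RFC 2617:
basic_auth_header Aladdin 'open sesame'
# prints Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The same encoding applies to `--proxy-user' and `--proxy-passwd', whose result is sent in a `Proxy-Authorization' header instead.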
- `-C on/off'
- `--cache=on/off'
When set to off, disable server-side caching. In this case, Wget will
send the remote server an appropriate directive (`Pragma: no-cache')
so that the file is fetched from the remote service rather than
returned from a cache. This is especially useful for retrieving
and flushing out-of-date documents on proxy servers.
Caching is allowed by default.
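On the wire, the directive is just one extra header line in the request. The sketch below hand-builds a minimal request with and without it; the layout is generic HTTP/1.0, not a capture of what Wget itself sends, and `build_request' is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not Wget's code): print a minimal HTTP/1.0 request,
# adding the "Pragma: no-cache" directive when caching is turned off.
# Lines end in CR LF, as HTTP requires.
#   $1: host   $2: path   $3: on|off, in the spirit of --cache
build_request() {
    printf 'GET %s HTTP/1.0\r\n' "$2"
    printf 'Host: %s\r\n' "$1"
    if [ "$3" = off ]; then
        printf 'Pragma: no-cache\r\n'    # asks caches to fetch a fresh copy
    fi
    printf '\r\n'                        # empty line ends the header block
}

build_request site.com '/article.cgi?25' off
```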
- `--cookies=on/off'
When set to off, disable the use of cookies. Cookies are a mechanism
for maintaining server-side state. The server sends the client a cookie
using the `Set-Cookie' header, and the client responds with the
same cookie upon further requests. Since cookies allow server
owners to keep track of visitors and allow sites to exchange this
information, some consider them a breach of privacy. The default is to
use cookies; however, storing cookies is not on by default.
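To make the exchange concrete, the sketch below shows the transformation a client applies to a `Set-Cookie' header before sending the cookie back. It is simplified (a real client also checks domain, path and expiry before sending anything), and `cookie_reply' is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not Wget's code): derive the Cookie header a client
# sends back from a Set-Cookie header it received. Only the leading
# name=value pair is returned; attributes such as path, domain and expires
# tell the client when to send the cookie and are not sent back.
cookie_reply() {
    printf '%s\n' "$1" | sed \
        -e 's/^[Ss]et-[Cc]ookie:[[:space:]]*//' \
        -e 's/[[:space:]]*;.*$//' \
        -e 's/^/Cookie: /'
}

cookie_reply 'Set-Cookie: session=abc123; path=/; domain=.site.com'
# prints Cookie: session=abc123
```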
- `--load-cookies file'
Load cookies from file before the first HTTP retrieval.
The file must be a text file in the format originally used by Netscape's
`cookies.txt' file.
You will typically use this option when mirroring sites that require
that you be logged in to access some or all of their content. The login
process typically works by the web server issuing an HTTP cookie
upon receiving and verifying your credentials. The cookie is then
resent by the browser when accessing that part of the site, and so
proves your identity.
Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site. This is achieved with
`--load-cookies': simply point Wget to the location of the
`cookies.txt' file, and it will send the same cookies your browser
would send in the same situation. Different browsers keep textual
cookie files in different locations:
- Netscape 4.x.
The cookies are in `~/.netscape/cookies.txt'.
- Mozilla and Netscape 6.x.
Mozilla's cookie file is also named `cookies.txt', located
somewhere under `~/.mozilla', in the directory of your profile.
The full path usually ends up looking somewhat like
`~/.mozilla/default/some-weird-string/cookies.txt'.
- Internet Explorer.
You can produce a cookie file Wget can use by using the File menu,
Import and Export, Export Cookies. This has been tested with Internet
Explorer 5; it is not guaranteed to work with earlier versions.
- Other browsers.
If you are using a different browser to create your cookies,
`--load-cookies' will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.
If you cannot use `--load-cookies', there might still be an
alternative. If your browser supports a "cookie manager", you can use
it to view the cookies used when accessing the site you're mirroring.
Write down the name and value of the cookie, and manually instruct Wget
to send those cookies, bypassing the "official" cookie support:
wget --cookies=off --header "Cookie: name=value"
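A Netscape-format `cookies.txt' holds one cookie per line in seven TAB-separated fields (domain, include-subdomains flag, path, secure flag, expiry in seconds since the epoch, name, value); lines starting with `#' are comments. The sketch below reads such a file and prints the `Cookie:' line for one host, which could then be handed to `--header' as in the command above. `cookie_header_for' is a hypothetical helper, and its domain matching is deliberately simplified.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wget): print the Cookie header that a
# Netscape-format cookies.txt holds for one host. Each non-comment line
# has seven TAB-separated fields:
#   domain  include-subdomains  path  secure  expiry  name  value
# Simplification: a line matches if its domain is the host itself or the
# host with a leading dot; path, secure and expiry are not checked.
#   $1: host   $2: cookie file
cookie_header_for() {
    awk -v host="$1" '
        BEGIN { FS = "\t" }
        /^#/ || NF != 7 { next }            # comments and malformed lines
        $1 == host || $1 == "." host {
            sep = (pairs == "") ? "" : "; "
            pairs = pairs sep $6 "=" $7
        }
        END { if (pairs != "") print "Cookie: " pairs }
    ' "$2"
}

jar=$(mktemp)
printf '# Netscape HTTP Cookie File\n' > "$jar"
printf '.site.com\tTRUE\t/\tFALSE\t2000000000\tuser\tjoe\n' >> "$jar"
printf '.site.com\tTRUE\t/\tFALSE\t2000000000\tlang\thr\n' >> "$jar"
printf 'other.org\tFALSE\t/\tFALSE\t2000000000\tid\t7\n' >> "$jar"
cookie_header_for site.com "$jar"    # prints Cookie: user=joe; lang=hr
rm -f "$jar"
```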
- `--save-cookies file'
Save cookies to file at the end of the session. Cookies whose
expiry time is not specified, or those that have already expired, are
not saved.
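The saving rule amounts to a filter on the expiry field of the Netscape format. The sketch below applies it to an existing file; it is not how Wget writes its file, `persistent_cookies' is a hypothetical name, and treating an expiry of 0 as "no expiry time specified" is an assumption made for the sketch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wget) illustrating the rule above:
# keep only the cookies in a Netscape-format file whose expiry time lies
# in the future. Assumption for this sketch: an expiry field of 0 marks
# a cookie for which no expiry time was specified.
#   $1: cookie file   $2: current time in seconds since the epoch
persistent_cookies() {
    awk -v now="$2" '
        BEGIN { FS = "\t" }
        /^#/ || NF != 7 { next }     # comments and malformed lines
        $5 + 0 > now + 0 { print }   # expiry 0 or in the past: dropped
    ' "$1"
}

jar=$(mktemp)
printf 'site.com\tFALSE\t/\tFALSE\t2000000000\tkeep\t1\n' > "$jar"
printf 'site.com\tFALSE\t/\tFALSE\t1000\texpired\t2\n' >> "$jar"
printf 'site.com\tFALSE\t/\tFALSE\t0\tsession\t3\n' >> "$jar"
persistent_cookies "$jar" 1500000000    # prints only the "keep" line
rm -f "$jar"
```

Where `date' supports `%s' (GNU and BSD versions do), the current time can be supplied as `$(date +%s)'.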
- `--ignore-length'
Unfortunately, some HTTP servers (CGI programs, to be more
precise) send out bogus `Content-Length' headers, which makes Wget
go wild, as it thinks not all of the document was retrieved. You can spot
this syndrome if Wget retries getting the same document again and again,
each time claiming that the (otherwise normal) connection has closed on
the very same byte.
With this option, Wget will ignore the `Content-Length' header, as
if it never existed.
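A simplified model of the check that goes wrong is sketched below: a body shorter than the announced length looks like an interrupted transfer, so the download is retried. This is not Wget's code; `length_verdict' and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not Wget's code): a simplified model of the length
# check. A body shorter than the announced Content-Length counts as an
# interrupted download; with ignore=yes the header is disregarded, which
# is the effect of --ignore-length.
#   $1: file with the received body   $2: announced Content-Length
#   $3: yes|no, whether to ignore the header
length_verdict() {
    got=$(wc -c < "$1" | tr -d ' \t')   # some wc versions pad with blanks
    if [ "$3" = yes ] || [ "$got" -ge "$2" ]; then
        echo complete
    else
        echo "incomplete: got $got of $2 bytes, would retry"
    fi
}

body=$(mktemp)
printf 'hello' > "$body"        # 5 bytes actually received
length_verdict "$body" 10 no    # prints incomplete: got 5 of 10 bytes, would retry
length_verdict "$body" 10 yes   # prints complete
rm -f "$body"
```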
- `--header=additional-header'
Define an additional-header to be passed to the HTTP servers.
Headers must contain a `:' preceded by one or more non-blank
characters, and must not contain newlines.
You may define more than one additional header by specifying
`--header' more than once.
wget --header='Accept-Charset: iso-8859-2' \
--header='Accept-Language: hr' \
http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all
previous user-defined headers.
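The syntax rule for additional headers can be checked before calling Wget. The sketch below tests a candidate string against it: there must be a `:' preceded by one or more non-blank characters, and no newline. `valid_header' is a hypothetical helper, not part of Wget.

```shell
#!/bin/sh
# Hypothetical helper applying the rule above to a candidate header:
# a ':' preceded by one or more non-blank characters, and no newline.
# Returns 0 (valid) or 1 (invalid).
valid_header() {
    h=$1
    nl='
'
    case $h in
        *"$nl"*) return 1 ;;        # embedded newline
        *:*) ;;                      # has a colon: check what precedes it
        *) return 1 ;;               # no colon at all
    esac
    name=${h%%:*}                    # text before the first ':'
    case $name in
        '' | *[[:space:]]*) return 1 ;;   # empty or contains blanks
    esac
    return 0
}

valid_header 'Accept-Language: hr' && echo ok         # prints ok
valid_header 'Accept Language: hr' || echo rejected   # prints rejected
```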
- `--proxy-user=user'
- `--proxy-passwd=password'
Specify the username user and password password for
authentication on a proxy server. Wget will encode them using the
`basic' authentication scheme.
- `--referer=url'
Include a `Referer: url' header in the HTTP request. This is useful for
retrieving documents with server-side processing that assume they are
always being retrieved by interactive web browsers and only come out
properly when Referer is set to one of the pages that point to them.
- `-s'
- `--save-headers'
Save the headers sent by the HTTP server to the file, preceding the
actual contents, with an empty line as the separator.
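Because the separator is the first empty line, a file saved this way can be split again with standard tools. The sketch below assumes a text document and header lines that may end in CR LF, as HTTP headers do; `saved_headers' and `saved_body' are hypothetical helpers, not part of Wget.

```shell
#!/bin/sh
# Hypothetical helpers for a file saved with -s / --save-headers: the
# HTTP headers come first, then an empty line, then the document.
# A trailing CR is ignored when looking for the separator. Suited to
# text documents only: awk is not byte-exact for binary data.
saved_headers() {
    awk '{ line = $0; sub(/\r$/, "", line) }
         line == "" { exit }
         { print line }' "$1"
}

saved_body() {
    awk 'in_body { print; next }
         { line = $0; sub(/\r$/, "", line) }
         line == "" { in_body = 1 }' "$1"
}

f=$(mktemp)
printf 'HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>\n' > "$f"
saved_headers "$f"    # prints the status line and the Content-Type line
saved_body "$f"       # prints <p>hi</p>
rm -f "$f"
```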
- `-U agent-string'
- `--user-agent=agent-string'
Identify as agent-string to the HTTP server.
The HTTP protocol allows the clients to identify themselves using a
`User-Agent' header field. This enables distinguishing the
WWW software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as
`Wget/version', version being the current version
number of Wget.
However, some sites have been known to impose the policy of tailoring
the output according to the `User-Agent'-supplied information.
While conceptually this is not such a bad idea, it has been abused by
servers denying information to clients other than Mozilla
or Microsoft Internet Explorer. This option allows you to change
the `User-Agent' line issued by Wget. Use of this option is
discouraged, unless you really know what you are doing.
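The kind of server-side tailoring described above usually amounts to a substring test on the `User-Agent' value. The sketch below is a hypothetical rule, not taken from any real server; `serve_page' is an invented name, and the version number in `Wget/1.8' merely stands in for `Wget/version'.

```shell
#!/bin/sh
# Hypothetical server-side rule of the kind described above: the page is
# served only to clients whose User-Agent mentions "Mozilla" (Internet
# Explorer's string also starts with "Mozilla/"). Not from a real server.
#   $1: User-Agent value received in the request
serve_page() {
    case $1 in
        *Mozilla*) echo 'full page' ;;
        *) echo 'access denied' ;;
    esac
}

serve_page 'Wget/1.8'                                        # prints access denied
serve_page 'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)'  # prints full page
```

A string passed with `-U' replaces Wget's default value, so it is what such a rule would see.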