I've got GNU Wget 1.10.2 for Windows and Linux, and the -k option behaves differently on the two.

-k, --convert-links make links in downloaded HTML point to local files.

On Windows it produces:

www.example.com/index.html
www.example.com/index.html@page=about
www.example.com/index.html@page=contact
www.example.com/index.html@page=sitemap

and on Linux it produces:

www.example.com/index.html
www.example.com/index.html?page=about
www.example.com/index.html?page=contact
www.example.com/index.html?page=sitemap

This is problematic on Linux because when I serve the mirror through Apache, it cannot distinguish between the four generated pages: everything after the question mark (?) is treated as the query string of the request rather than as part of the file name.
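
A small sketch of the problem, using Python's standard urllib.parse purely as a stand-in for how a server splits a request URL. It is an illustration, not what Apache actually executes, and the percent-encoded form at the end is only shown to contrast the two cases:

```python
from urllib.parse import quote, urlsplit

# A link that points at the file saved on Linux.
link = "http://www.example.com/index.html?page=about"
parts = urlsplit(link)
print(parts.path)   # the part used to locate a file
print(parts.query)  # the part handed over as the query string

# If the '?' were percent-encoded, the whole file name would stay in the path.
encoded = quote("index.html?page=about")
print(encoded)
print(urlsplit("http://www.example.com/" + encoded).path)
```

In the first case the path is only /index.html, so all four links resolve to the same file.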

Any ideas on how I can control this?

thanks

Accepted Answer

You can't use a question mark (?) in a filename on NTFS or FAT32. This is why wget uses the at symbol (@) instead.

On Linux, most filesystems forbid only the slash (/) and the NUL byte in filenames, so wget keeps the question mark (since it's part of the URI).

You can force either behaviour by using --restrict-file-names=unix or --restrict-file-names=windows.

From the wget documentation:

When mode is set to “unix”, Wget escapes the character ‘/’ and the control characters in the ranges 0–31 and 128–159. This is the default on Unix-like OS'es.

When mode is set to “windows”, Wget escapes the characters ‘\’, ‘|’, ‘/’, ‘:’, ‘?’, ‘"’, ‘*’, ‘<’, ‘>’, and the control characters in the ranges 0–31 and 128–159. In addition to this, Wget in Windows mode uses ‘+’ instead of ‘:’ to separate host and port in local file names, and uses ‘@’ instead of ‘?’ to separate the query portion of the file name from the rest. Therefore, a URL that would be saved as ‘www.xemacs.org:4300/search.pl?input=blah’ in Unix mode would be saved as ‘www.xemacs.org+4300/search.pl@input=blah’ in Windows mode. This mode is the default on Windows.
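
The two substitutions described in that passage can be sketched roughly as follows. This is a simplified illustration and not wget's actual code: `local_name` is a hypothetical helper, it expects the URL without its scheme, and it ignores the escaping of the other listed characters.

```python
def local_name(url_without_scheme: str, mode: str = "unix") -> str:
    """Return an approximate local file name for host[:port]/path[?query]."""
    if mode == "unix":
        # Unix mode leaves ':' and '?' as they are.
        return url_without_scheme
    if mode != "windows":
        raise ValueError("mode must be 'unix' or 'windows'")
    host, slash, rest = url_without_scheme.partition("/")
    # Windows mode: '+' separates host and port instead of ':'.
    host = host.replace(":", "+", 1)
    # Windows mode: '@' separates the query from the rest instead of '?'.
    path, qmark, query = rest.partition("?")
    if qmark:
        rest = path + "@" + query
    return host + slash + rest


url = "www.xemacs.org:4300/search.pl?input=blah"
print(local_name(url, "unix"))
print(local_name(url, "windows"))
```

With the documentation's own example URL, the second call prints the Windows-mode name quoted above.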

The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki