I've got GNU Wget 1.10.2 for Windows and Linux, and the -k option behaves differently on the two.
-k, --convert-links make links in downloaded HTML point to local files.
On Windows it produces:
www.example.com/index.html
www.example.com/index.html@page=about
www.example.com/index.html@page=contact
www.example.com/index.html@page=sitemap
and on Linux it produces:
www.example.com/index.html
www.example.com/index.html?page=about
www.example.com/index.html?page=contact
www.example.com/index.html?page=sitemap
This is a problem on Linux because when I serve the mirror through Apache, it cannot distinguish between the four generated pages: the part after the question mark (?) is treated as the query string of the request rather than as part of the file name.
Any ideas on how I can control this?
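To see why Apache collapses the four files, it helps to look at how a URL is split: everything after the first ? is the query, and only the path is mapped to a file on disk. A minimal Python sketch using the standard urllib.parse.urlsplit (the http://localhost/ prefix is an illustrative assumption for where the mirror is served):

```python
from urllib.parse import urlsplit

# Hypothetical URLs for two of the mirrored pages, served from a local Apache.
urls = [
    "http://localhost/www.example.com/index.html",
    "http://localhost/www.example.com/index.html?page=about",
]

for url in urls:
    parts = urlsplit(url)
    # The server resolves only parts.path to a file; parts.query is passed
    # along as the query string and does not select a different file.
    print(parts.path, "| query:", parts.query or "(none)")
```

Both requests resolve to the same path, /www.example.com/index.html, so the file literally named index.html?page=about is never selected unless the ? is percent-encoded (%3F) in the request.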
You can't use a question mark (?) in a filename on NTFS or FAT32. This is why wget uses the at symbol (@) instead.
On Linux, most filesystems forbid only the slash (/) and the null byte in file names, so wget keeps the question mark as it appears in the URI.
You can force either behaviour by passing --restrict-file-names=windows or --restrict-file-names=unix.
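If you drive wget from a script, the switch is just another argument. A minimal Python sketch that only assembles the command line; the function name wget_command and the restriction to the two modes discussed here are illustrative assumptions, while -k and --restrict-file-names are the wget options from this thread:

```python
import shlex
from typing import List

# Only the two modes discussed in this thread; this sketch rejects anything else.
MODES = ("unix", "windows")


def wget_command(url: str, mode: str = "windows") -> List[str]:
    """Build a wget argv with link conversion and a fixed file-name mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    # -k rewrites links in downloaded HTML to point at the local file names;
    # --restrict-file-names decides what those local names look like.
    return ["wget", "-k", f"--restrict-file-names={mode}", url]


cmd = wget_command("http://www.example.com/index.html")
print(shlex.join(cmd))
```

On a machine with wget installed, subprocess.run(cmd, check=True) would execute it. Using windows mode on Linux should give the @-style names from the Windows listing, which Apache then serves as distinct files.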
From the wget documentation:
When mode is set to “unix”, Wget escapes the character ‘/’ and the control characters in the ranges 0–31 and 128–159. This is the default on Unix-like OS'es.
When mode is set to “windows”, Wget escapes the characters ‘\’, ‘|’, ‘/’, ‘:’, ‘?’, ‘"’, ‘*’, ‘<’, ‘>’, and the control characters in the ranges 0–31 and 128–159. In addition to this, Wget in Windows mode uses ‘+’ instead of ‘:’ to separate host and port in local file names, and uses ‘@’ instead of ‘?’ to separate the query portion of the file name from the rest. Therefore, a URL that would be saved as ‘www.xemacs.org:4300/search.pl?input=blah’ in Unix mode would be saved as ‘www.xemacs.org+4300/search.pl@input=blah’ in Windows mode. This mode is the default on Windows.
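The Windows-mode rules quoted above can be sketched in Python to show how a URL becomes a local file name. This is a simplified illustration, not wget's actual code: it assumes an ASCII URL without the scheme (the form used in the docs), assumes escaped characters become %XX hex codes, and ignores details such as default file names for directory URLs. The function names escape_piece and windows_local_name are hypothetical.

```python
# Characters that, per the quoted docs, Windows mode escapes. '/' is included
# so a slash inside the query string is escaped instead of becoming a
# directory separator (assumption: '/' is listed for both modes).
WINDOWS_RESTRICTED = set('\\|/:?"*<>')


def escape_piece(text: str) -> str:
    """Replace restricted and control characters with %XX hex escapes."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in WINDOWS_RESTRICTED or code <= 31 or 128 <= code <= 159:
            out.append("%{:02X}".format(code))
        else:
            out.append(ch)
    return "".join(out)


def windows_local_name(url: str) -> str:
    """Map 'host[:port]/path[?query]' (no scheme) to a Windows-mode name."""
    host_port, _, rest = url.partition("/")
    path, has_query, query = rest.partition("?")
    # Windows mode: '+' instead of ':' between host and port.
    host = host_port.replace(":", "+")
    # Escape each path element separately; the '/' separators are kept.
    pieces = [escape_piece(p) for p in path.split("/")]
    name = host + "/" + "/".join(pieces)
    if has_query:
        # Windows mode: '@' instead of '?' before the query portion.
        name += "@" + escape_piece(query)
    return name


print(windows_local_name("www.xemacs.org:4300/search.pl?input=blah"))
print(windows_local_name("www.example.com/index.html?page=about"))
```

The first call reproduces the documentation's example, and the second matches the @page=about name from the Windows listing in the question.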
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki