Getting directories with GNU Wget
Sep. 5th, 2008 | 02:45 pm
Sometimes files are available from a Web server through Apache's auto index module (mod_autoindex), and you want to copy them to your machine. For a one-time transfer, retrieving them over HTTP is good enough, and you'd rather not set up another file transfer method like SSH, FTP or rsync.
I usually feel confident retrieving things over HTTP with GNU Wget, but its command-line arguments are hard to memorize. It took me a long time to put together, but the following will copy a directory on a Web server into a directory named nyc-2008 under your current directory.
$ wget -r -N -nH -nd -np -R "index.html*" -P nyc-2008 \
'http://localhost/~ashawley/photos/nyc-2008/'
The -R option makes Wget delete all the file listings -- index.html* -- created by Apache's autoindex module once it has read them. Wget needs these files to find the links it follows recursively, but that's it. There should probably be a dedicated option for this in Wget.
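If an earlier run without -R has already left listings behind, they can be removed afterwards. This is only a sketch: the function name clean_listings is made up for this post, and it assumes a find that supports the POSIX "-exec ... {} +" form.

```shell
# Hypothetical helper (not part of Wget): remove leftover autoindex
# listings under DIR, defaulting to the current directory.
# The pattern matches index.html itself and the sorted variants that
# Apache links to, which Wget saves as e.g. "index.html?C=N;O=D".
clean_listings() {
    find "${1:-.}" -type f -name 'index.html*' -exec rm -f {} +
}
```

$ clean_listings nyc-2008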
The long option alternatives of Wget are easier to read, but don't help me much in remembering them.
$ wget --recursive --timestamping --no-host-directories \
--no-directories --no-parent --reject "index.html*" \
--directory-prefix=nyc-2008 \
http://localhost/~ashawley/photos/nyc-2008/
Now this post will help me remember them.
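Another way to avoid memorizing the options is to wrap them in a shell function. What follows is a minimal sketch for a POSIX shell, not anything Wget ships: the name wget_dir and the WGET override variable are my own inventions, and the destination defaults to the last path component of the URL.

```shell
# Hypothetical helper (not part of Wget): copy one auto-indexed
# directory from a Web server into a local directory.
# usage: wget_dir URL [DEST]
# Note: url and dest are plain globals, since POSIX sh has no "local".
wget_dir() {
    url=$1
    dest=${2:-$(basename "$url")}         # e.g. .../nyc-2008/ -> nyc-2008
    set -- --recursive                    # follow the links in each listing
    set -- "$@" --timestamping            # skip files not newer than the local copy
    set -- "$@" --no-host-directories     # no localhost/ directory
    set -- "$@" --no-directories          # do not recreate the remote hierarchy
    set -- "$@" --no-parent               # never ascend above the starting URL
    set -- "$@" --reject 'index.html*'    # delete the listings after reading them
    set -- "$@" --directory-prefix="$dest"
    "${WGET:-wget}" "$@" "$url"
}
```

$ wget_dir http://localhost/~ashawley/photos/nyc-2008/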
In an idealized microkernel environment -- like GNU/Hurd -- you could have a translator that exposes HTTP as a file system, which can be accessed the same way as the other files on your machine. For copying, you would just use the command you're used to using for copying files.
$ cp -pr /http/localhost/~ashawley/photos/nyc-2008/ .
Or use your favorite, more complex Unix commands to get only the things you want.
$ find /http/localhost/~ashawley/photos/nyc-2008/ \
-type f -name '*.jpg' -size -1024k -print0 \
| cpio -0 -pd nyc-2008
Someday I'll have my pie in the sky.
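Until then, roughly the same filtering can be done on the local copy once Wget has finished. The sketch below assumes the files are already on disk; the function name small_jpgs is hypothetical, and it uses cp rather than cpio, so the matching files all land directly in the destination instead of keeping their subdirectories.

```shell
# Hypothetical helper: copy JPEG files smaller than 1 MiB from SRC
# into DEST, creating DEST if needed.
# The size is given in bytes ("c") because find rounds other units up
# to whole units, which makes a byte count the unambiguous choice.
small_jpgs() {
    src=$1
    dest=$2
    mkdir -p "$dest" &&
    find "$src" -type f -name '*.jpg' -size -1048576c \
        -exec cp -p {} "$dest" \;
}
```

$ small_jpgs nyc-2008 nyc-2008-small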
(no subject)
from: anonymous
date: Sep. 7th, 2008 09:26 am (UTC)
doesn't Plan 9 offer that?
- R/db
re: Plan 9
from: aaronhawley
date: Sep. 8th, 2008 01:52 pm (UTC)