Download the entire contents of a website with a wget script
Unless you’re a web developer or trying to do something mischievous, you probably won’t find yourself needing to download the entire contents of a website very often. I happen to be the former, and recently I offered to help a friend solve some CSS styling problems she was having with her page. Rather than having her email me the source or send it some other way, I figured it would be quicker and easier to get it straight from her site. I used the following wget command to do this:
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains website.org --no-parent website.org
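For reference, here is the same command split across lines with each option annotated. These are standard GNU wget options (on newer versions of wget, --html-extension has been renamed --adjust-extension, but the old name still works):

# --recursive                    follow links recursively to grab the whole site
# --no-clobber                   skip files that already exist locally
# --page-requisites              also fetch the CSS, images, and scripts each page needs
# --html-extension               save pages with an .html extension
# --convert-links                rewrite links so the local copy can be browsed offline
# --restrict-file-names=windows  escape characters that are illegal in Windows filenames
# --domains website.org          only follow links within this domain
# --no-parent                    never ascend above the starting directory
wget --recursive --no-clobber --page-requisites --html-extension \
     --convert-links --restrict-file-names=windows \
     --domains website.org --no-parent website.org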
If you need to do this very often, the above command might be a little cumbersome to type, as well as remember. To alleviate this, I wrote a little script to take care of this for you:
#!/bin/bash
# dl-site: mirror a website with wget
# Usage: dl-site <domain> <start URL>
if [ -n "$1" ] && [ -n "$2" ]
then
    echo "Downloading site from $2...";
    wget --recursive \
         --no-clobber \
         --page-requisites \
         --html-extension \
         --convert-links \
         --restrict-file-names=windows \
         --domains "$1" \
         --no-parent \
         "$2";
else
    echo "Missing an argument!";
    echo "Usage: dl-site www.somedomain.com www.somedomain.com/specificfolder/";
fi
Save the above code to a file, call it “dl-site”, and give it executable permissions.
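Something like the following should do it (the /usr/local/bin step is optional, and just assumes you want the script available from anywhere):

chmod +x dl-site
sudo mv dl-site /usr/local/bin/   # optional: put dl-site on your PATH

Using this script, you can issue a command like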
dl-site www.brandonfoltz.com www.brandonfoltz.com/somefolder/
to download an entire site directory, saving its contents to a directory created with the same name as the website you’re downloading from. The first parameter tells wget not to download anything outside of www.brandonfoltz.com; the second tells it which specific folder to download from (if you wish to do that). If you just want the entire site, this command is sufficient:
dl-site somesite.com somesite.com
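When the download finishes, wget drops everything into a directory named after the host, so the local copy is easy to find. The listing below is only an illustration of the typical layout, not actual output:

ls somesite.com/
# index.html  about.html  css/  images/  ...

Because of the --convert-links option, opening index.html from that directory in a browser should let you click around the site entirely offline.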
The commands and script mentioned in this article should work on any *nix system (Linux, Mac OS X). Windows users aren’t so fortunate, as these tools aren’t installed by default.