Tutorial: How do I mirror another website?

Status
Not open for further replies.

DCCS

Downloading a copy of an entire website ("mirroring") is a tricky business, because modern websites are so often built with PHP, ASP, CGI and other dynamic technologies that constantly update every page and often produce URLs that are only used once. That can make the site appear infinitely large to website mirroring software.

Also, mirroring the web pages your browser sees on a site doesn't mean you'll get the "dynamic" behavior. If a website displays the current weather in Cleveland, mirroring the current pages will only get you a frozen "snapshot" of the weather on that particular day.

That said, though, there are tools available that will help you mirror basic websites that don't have these problems. The best-known of these is GNU wget, a free, open-source tool that can easily fetch an entire website with a single command. wget is not the friendliest tool in the world, but boy does it work!

Mirroring a website on Windows
If you are running Windows, I recommend Tech Knight's [link hidden]. Tech Knight offers step-by-step instructions to download and use the wget software on Windows. Alternatively, [link hidden] and compile your own copy from source.
Once you have wget installed correctly, the command line to mirror a website is:

wget -m -k -K -E [URL of the site to mirror]

See man wget or wget --help | more for a detailed explanation of each option.
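
As a rough sketch, the same command can be spelled out with wget's long option names, which makes each switch easier to remember. The helper name mirror_site and the example.com address are placeholders of mine, not part of the original instructions. Note that releases of wget older than 1.12 call --adjust-extension by its earlier name, --html-extension.

```shell
#!/bin/sh
# mirror_site is a hypothetical helper name; pass the site's URL as $1.
mirror_site() {
    # --mirror            (-m) recursive download with timestamping, unlimited depth
    # --convert-links     (-k) rewrite links so the local copy can be browsed offline
    # --backup-converted  (-K) keep the unmodified original of each converted file as *.orig
    # --adjust-extension  (-E) save HTML pages with an .html suffix where needed
    wget --mirror --convert-links --backup-converted --adjust-extension "$1"
}

# Usage (placeholder address):
#   mirror_site https://example.com/
```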

If this command seems to run forever, there may be parts of the site that generate an infinite series of different URLs. You can combat this in many ways, the simplest being to use the -l option to specify how many links "away" from the home page wget should travel. For instance, -l 3 will refuse to download pages more than three clicks away from the home page. You'll have to experiment with different values for -l. Consult man wget for additional workarounds.
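
A minimal sketch of the depth limit described above, assuming a hypothetical helper called mirror_with_depth and a placeholder address. Because -m by itself asks for unlimited depth, -l is placed after it so that, as far as I know, the later setting takes effect.

```shell
#!/bin/sh
# mirror_with_depth is a hypothetical helper:
#   $1 = URL of the site, $2 = maximum link depth (defaults to 3)
mirror_with_depth() {
    url=$1
    depth=${2:-3}
    # -m implies unlimited recursion, so -l comes after it to cap the depth.
    wget -m -l "$depth" -k -K -E "$url"
}

# Usage (placeholder address): start shallow and raise the depth if pages are missing.
#   mirror_with_depth https://example.com/ 2
```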

Note: some web servers may be set up to "punish" users who download too much, too fast. If you're not careful, using tools like wget could get your IP address banned from the site. You can avoid this problem by using the -w option to specify a delay, in seconds, between page downloads. Usually, this will prevent the web server from viewing your behavior as unacceptable. But your mileage may vary!
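
Here is one way the delay could look in practice. The helper name mirror_politely, the two-second default and the example.com address are my own assumptions. The --random-wait switch is an addition not mentioned above: it varies the pause around the -w value so requests are less regular.

```shell
#!/bin/sh
# mirror_politely is a hypothetical helper:
#   $1 = URL of the site, $2 = delay in seconds between downloads (defaults to 2)
mirror_politely() {
    url=$1
    delay=${2:-2}
    # -w pauses between retrievals; --random-wait varies that pause from request to request.
    wget -m -k -K -E -w "$delay" --random-wait "$url"
}

# Usage (placeholder address): wait about five seconds between pages.
#   mirror_politely https://example.com/ 5
```

A longer delay makes the mirror take correspondingly longer, so it is worth combining with a depth limit on large sites.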
Mirroring a website on MacOS X
Like Linux, MacOS X is a Unix-like system. However, wget isn't standard equipment in all versions of MacOS X. If you receive an error message when you try the wget --help command at the MacOS X "Terminal" prompt, you can fetch wget from the [link hidden], which also offers "Simple wget," a user-friendly front end to wget. Most of the site is in Japanese, so some patience is necessary in picking your way through!
Of course, you can also install the developer tools from your MacOS X system CD (if you have not already done so) and then visit the [link hidden] to build and install wget from source code.

Once you have the command line version of wget for MacOS X installed, just follow my [link hidden] at the MacOS X Terminal prompt.

Offering Your Mirror To The World
Publicly mirroring someone else's website without their permission is a violation of copyright law. Don't do that.
If you have received their permission, it's easy to offer your mirror to the world. Just use the wget command to download it to a directory inside your own website's space. This is much easier if you have command line access to your own web server so that you can run wget there directly. But you can also upload the mirrored site to your server by dragging and dropping it into your usual file transfer program after wget is finished.
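
If you can run wget on the server itself, downloading into a chosen directory might look like the sketch below. The helper name mirror_into, the example.com address and the destination path are all hypothetical; substitute a directory inside your own web space.

```shell
#!/bin/sh
# mirror_into is a hypothetical helper:
#   $1 = URL of the site you have permission to mirror
#   $2 = destination directory inside your own website's space
mirror_into() {
    url=$1
    dest=$2
    # Create the destination if needed, then tell wget to save everything under it (-P).
    mkdir -p "$dest" && wget -m -k -K -E -P "$dest" "$url"
}

# Usage (placeholder address and path):
#   mirror_into https://example.com/ /path/to/your/webspace/mirror
```

By default wget creates a subdirectory named after the remote host under the destination, so the pages in this example would end up in a folder called example.com inside it.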

If you do offer a mirror of another site, make sure you link to the original and explain to users that this is a mirror and not the original. Also be sure to keep your mirror up to date. And once again, get the original site's permission first!

Example of a website that I have already mirrored:
[link hidden]
