wayback-machine-downloader
Download an entire website from the Internet Archive Wayback Machine
Description
wayback-machine-downloader is a command-line tool that downloads an entire website from the Internet Archive's Wayback Machine. It recreates the directory structure, auto-generates index.html pages for seamless use with Apache and Nginx, and downloads the original, unmodified files rather than the versions rewritten by the Wayback Machine. It supports filtering by timestamp range, by URL pattern (regular expressions), and by file type, and it can download files concurrently for faster retrieval.
Install
gem install wayback_machine_downloader
docker pull hartator/wayback-machine-downloader
AI Summary
Download complete websites from the Internet Archive Wayback Machine with directory structure reconstruction and filtering options.
Capabilities
- + Download an entire website from Wayback Machine archives
- + Recreate original directory structure with auto-generated index.html files
- + Download original files, not Wayback Machine rewritten versions
- + Filter by timestamp range to get snapshots from specific dates
- + Filter by URL patterns using regular expressions
- + Exclude specific file types or paths
- + Concurrent downloads for faster retrieval
- + JSON output listing available files without downloading
- + Exact URL mode for precise downloads
Use When
- → When you need to recover a website that is no longer online
- → When you want to download a historical snapshot of a website
- → When you need to archive a website from the Wayback Machine
- → When researching how a website looked at a specific point in time
Avoid When
- x When the website was never archived by the Wayback Machine
- x When you need to scrape a live website (use wget or httrack instead)
- x When you need only a single page rather than an entire site
Usage Patterns
Download an entire site
wayback_machine_downloader http://example.com
Downloads the latest version of every file from example.com on the Wayback Machine.
Download from a specific time period
wayback_machine_downloader http://example.com --from 20140101 --to 20141231
Downloads only snapshots from the year 2014.
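Wayback Machine timestamps take the form YYYYMMDDhhmmss, and the values in the command above are the date-only prefix. As a minimal sketch, a small shell function can produce the --from/--to pair for any calendar year. The name `year_range` is hypothetical and not part of the tool.

```shell
#!/bin/sh
# year_range: hypothetical helper that prints the --from/--to arguments
# covering 1 January through 31 December of the given year.
year_range() {
  printf '%s %s0101 %s %s1231\n' --from "$1" --to "$1"
}

# Print the full command for 2014 so it can be checked before running.
echo "wayback_machine_downloader http://example.com $(year_range 2014)"
```

The script only prints the command; copy the printed line into a terminal to start the download.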
Filter by file type
wayback_machine_downloader http://example.com --only '/\.pdf$/'
Downloads only PDF files from the archived site.
Concurrent downloads
wayback_machine_downloader http://example.com --concurrency 10
Uses 10 concurrent connections for faster downloading.
List files without downloading
wayback_machine_downloader http://example.com --list
Lists available files as JSON without downloading them.
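Exclusion and exact URL mode
The Capabilities list also mentions excluding file types or paths and an exact URL mode, which the patterns above do not show. The sketch below assumes the `--exclude` and `--exact-url` flags behave as the tool's help text describes. The `run` helper and the `DRY_RUN` variable are hypothetical additions, included so the commands can be reviewed before anything is downloaded.

```shell
#!/bin/sh
# run: hypothetical helper. Prints the command when DRY_RUN=1 (the default),
# otherwise executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Skip common image files anywhere on the archived site.
run wayback_machine_downloader http://example.com --exclude '/\.(gif|jpg|jpeg)$/i'

# Fetch only the given URL instead of the whole site.
run wayback_machine_downloader http://example.com/about.html --exact-url
```

Set DRY_RUN=0 in the environment to execute the commands instead of printing them.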