wayback-machine-downloader

Download an entire website from the Internet Archive Wayback Machine

Data & JSON | Cloud & Services | Linux, macOS, Windows | Ruby | MIT

Description

wayback-machine-downloader is a command-line tool that downloads an entire website from the Internet Archive's Wayback Machine. It recreates the site's directory structure, auto-generates index.html pages so the result can be served directly by Apache or Nginx, and downloads the original, unmodified files rather than the versions rewritten by the Wayback Machine. It supports filtering by timestamp range, by URL pattern (regular expressions), and by file type, and it can download files concurrently for faster retrieval.

When to use this tool

✓ Good fit when

You need to recover a website that is no longer online
You want to download a historical snapshot of a website
You need to archive a website from the Wayback Machine
You are researching how a website looked at a specific point in time

✕ Avoid when

The website was never archived by the Wayback Machine
You need to scrape a live website (use wget or httrack instead)
You need only a single page rather than an entire site

Install

gem: gem install wayback_machine_downloader

docker: docker pull hartator/wayback-machine-downloader

AI Summary

Download complete websites from the Internet Archive Wayback Machine with directory structure reconstruction and filtering options.

Capabilities

+ Download an entire website from Wayback Machine archives
+ Recreate original directory structure with auto-generated index.html files
+ Download original files, not Wayback Machine rewritten versions
+ Filter by timestamp range to get snapshots from specific dates
+ Filter by URL patterns using regular expressions
+ Exclude specific file types or paths
+ Concurrent downloads for faster retrieval
+ JSON output listing available files without downloading
+ Exact URL mode for precise downloads

Usage Patterns

Download an entire site

wayback_machine_downloader http://example.com

Downloads the latest version of every file from example.com on the Wayback Machine

Download from a specific time period

wayback_machine_downloader http://example.com --from 20140101 --to 20141231

Downloads only snapshots from the year 2014
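The --from and --to values are Wayback Machine timestamps written as digits (YYYYMMDD here). As a minimal sketch, the range for a whole calendar year could be generated rather than typed by hand. The function name year_range_args is hypothetical and not part of the tool.

```shell
# Hypothetical helper (not part of wayback_machine_downloader):
# prints --from/--to arguments covering one calendar year,
# using the YYYYMMDD timestamp form shown in the example above.
year_range_args() {
  local year="$1"
  printf -- '--from %s0101 --to %s1231\n' "$year" "$year"
}
```

For example, `wayback_machine_downloader http://example.com $(year_range_args 2014)` would expand to the same command as the one shown above.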

Filter by file type

wayback_machine_downloader http://example.com --only '/\.pdf$/'

Downloads only PDF files from the archived site
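When several file types are wanted, the filter can list them as alternatives in one regular expression. The sketch below builds such a filter from a list of extensions. The helper name ext_filter is hypothetical, and the trailing i (case-insensitive match) is an assumption about how you want the filter to behave.

```shell
# Hypothetical helper (not part of wayback_machine_downloader):
# joins the given extensions with "|" and wraps them in a
# /\.(ext1|ext2)$/i style regular expression filter.
ext_filter() {
  local IFS='|'            # "$*" joins arguments with the first IFS character
  printf '/\\.(%s)$/i\n' "$*"
}
```

The result can be passed to --only, for example `wayback_machine_downloader http://example.com --only "$(ext_filter gif jpg jpeg)"`, or to --exclude to skip those file types instead.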

Concurrent downloads

wayback_machine_downloader http://example.com --concurrency 10

Uses 10 concurrent connections for faster downloading

List files without downloading

wayback_machine_downloader http://example.com --list

Lists available files as JSON without downloading them
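Because --list only prints a listing, it can serve as a dry run to gauge how large a download would be. The sketch below counts entries in a saved listing using only standard shell tools. It assumes each entry in the JSON output carries a "file_url" key, which should be checked against the actual output before relying on it. The function name count_listed_files is hypothetical.

```shell
# Hypothetical helper (not part of wayback_machine_downloader):
# reads a JSON listing on stdin and counts the entries.
# Assumption: every listed file appears as an object with a "file_url" key.
count_listed_files() {
  grep -o '"file_url"' | wc -l | tr -d ' '
}
```

A possible use is `wayback_machine_downloader http://example.com --list > files.json` followed by `count_listed_files < files.json`.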
