ArchiveBox: Difference between revisions



== Installation ==
=== Docker Setup ===
<pre>
# Create data directory
mkdir -p ~/archivebox-data
cd ~/archivebox-data
# Initialize ArchiveBox
docker run -it -v "$PWD":/data archivebox/archivebox init
# Create admin user (prompts for input, so -it is required)
docker run -it -v "$PWD":/data archivebox/archivebox manage createsuperuser
# Start server
docker run -d \
  --name archivebox \
  -p 8000:8000 \
  -v "$PWD":/data \
  archivebox/archivebox server 0.0.0.0:8000
</pre>
=== Reverse Proxy Configuration ===
Add to Caddy configuration:
<pre>
snap.ejfox.com {
    reverse_proxy localhost:8000
}
</pre>
== API Usage ==
=== Authentication ===
First, obtain an API token from the admin interface at <code>https://snap.ejfox.com/admin/</code> under "Authentication and Authorization" → "Tokens".



Latest revision as of 17:29, 31 May 2025

🤖 AI-Generated Content
Claude generated the initial structure and examples for this documentation

Claude

ArchiveBox Installation

ArchiveBox is a self-hosted web archive service that preserves websites, articles, and online content for offline viewing and long-term storage. Our instance runs at snap.ejfox.com.

Overview

ArchiveBox creates multiple backup formats for each archived URL:

  • HTML snapshots
  • PDF captures
  • Screenshot images
  • Media downloads (video and audio)
  • Raw HTML/CSS/JS files
  • Archive.org submissions


Archive a URL

curl -X POST https://snap.ejfox.com/api/v1/core/snapshot/ \
  -H "Authorization: Token your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://ejfox.com", "tags": "personal,website"}'
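To archive several URLs in one pass, the request above can be wrapped in a small shell loop. The sketch below assumes the same endpoint and `Authorization: Token` header shown above, a token exported in a hypothetical `API_TOKEN` environment variable, and a hypothetical input file with one URL per line. The function names `json_escape`, `snapshot_payload`, and `archive_urls` are illustrative only and are not part of ArchiveBox.

```shell
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Assumption: values contain no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the JSON body for one snapshot request.
# $1: URL to archive, $2: comma-separated tags
snapshot_payload() {
  printf '{"url": "%s", "tags": "%s"}\n' "$(json_escape "$1")" "$(json_escape "$2")"
}

# POST every non-empty line of a file to the snapshot endpoint.
# $1: file with one URL per line, $2: comma-separated tags
archive_urls() {
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    curl -sf -X POST "https://snap.ejfox.com/api/v1/core/snapshot/" \
      -H "Authorization: Token $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d "$(snapshot_payload "$line" "$2")" > /dev/null \
      || echo "failed: $line" >&2
  done < "$1"
}
```

Usage might look like `archive_urls urls.txt "batch,import"`. Building the body in a function keeps quoting problems out of the `curl` call when a URL contains a double quote or backslash.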

Check if URL is Archived

# Method 1: API query
curl -s -G "https://snap.ejfox.com/api/v1/core/snapshot/" \
  --data-urlencode "url=https://ejfox.com" \
  -H "Authorization: Token your_api_token" | jq '.count > 0'

# Method 2: Simple HTTP check
curl -s -o /dev/null -w "%{http_code}" \
  "https://snap.ejfox.com/archive/https://ejfox.com"
# Returns: 200 if archived, 404 if not
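For use in scripts, the simple HTTP check can be turned into a function whose exit code reflects the result. This is a minimal sketch that assumes the 200/404 behavior described above; `archive_status` and `is_archived` are hypothetical helper names.

```shell
# Translate an HTTP status code into a message and an exit code.
# 0 = archived, 1 = not archived, 2 = anything else (e.g. server unreachable).
archive_status() {
  case "$1" in
    200) echo "archived"; return 0 ;;
    404) echo "not archived"; return 1 ;;
    *)   echo "unknown (HTTP $1)"; return 2 ;;
  esac
}

# Run the HTTP check for one URL and interpret the result.
# $1: the original URL to look up
is_archived() {
  code=$(curl -s -o /dev/null -w "%{http_code}" \
    "https://snap.ejfox.com/archive/$1")
  archive_status "$code"
}
```

A call such as `is_archived "https://ejfox.com" > /dev/null || echo "needs archiving"` then works in conditionals. Keeping the status mapping separate from the network call makes the third case explicit: `curl` reports `000` when it cannot connect, which should not be read as "not archived".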

Search Archives

# Search by URL
curl "https://snap.ejfox.com/api/v1/core/snapshot/?search=ejfox.com" \
  -H "Authorization: Token your_api_token"

# Search by tags
curl "https://snap.ejfox.com/api/v1/core/snapshot/?tag=dataviz" \
  -H "Authorization: Token your_api_token"

List Recent Archives

curl "https://snap.ejfox.com/api/v1/core/snapshot/?limit=10&ordering=-created" \
  -H "Authorization: Token your_api_token" | jq '.results[] | {url, created, tags}'

Browser Integration

Bookmarklet

Create a bookmark with this JavaScript for one-click archiving:

javascript:(function(){
  const url = window.location.href;
  const title = document.title;
  
  fetch('https://snap.ejfox.com/api/v1/core/snapshot/', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Token your_api_token_here'
    },
    body: JSON.stringify({
      url: url,
      tags: 'manual,bookmarklet',
      title: title
    })
  })
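  .then(resp => {
    // Treat HTTP errors (e.g. 401 for a bad token) as failures
    if (!resp.ok) throw new Error('HTTP ' + resp.status);
    return resp.json();
  })
  .then(data => {
    alert(`Archived! View at: ${data.archive_url}`);
  })
  .catch(err => alert('Archive failed: ' + err));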
})();

MediaWiki Integration

Archive Template

Create Template:Archive:

[https://snap.ejfox.com/search?q={{urlencode:{{{1}}}}} 📸 Search Archives] | [https://snap.ejfox.com/add?url={{urlencode:{{{1}}}}} ➕ Add to Archive]

Usage in articles:

Check out this article {{Archive|https://example.com}} about data visualization.

Maintenance

Health Check

# Check if service is running
curl -sf https://snap.ejfox.com/health/ || echo "ArchiveBox down!"

# Check recent archives count
curl -s "https://snap.ejfox.com/api/v1/core/snapshot/?limit=1" \
  -H "Authorization: Token $API_TOKEN" | jq '.count'
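A single failed request does not always mean the service is down, so a monitoring script may want to retry before alerting. The sketch below is one way to do that, assuming the `/health/` endpoint shown above; `retry` and `check_health` are hypothetical helper names.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# $1: max attempts, $2: seconds to sleep between attempts, rest: the command
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Succeeds only when the health endpoint answers with a 2xx/3xx status.
check_health() {
  curl -sf -o /dev/null "https://snap.ejfox.com/health/"
}
```

From cron, `retry 3 10 check_health || echo "ArchiveBox down!"` would alert only after three consecutive failures spaced ten seconds apart.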

Backup Archives

# Backup ArchiveBox data
tar -czf "archivebox-backup-$(date +%Y%m%d).tar.gz" -C ~ archivebox-data

# Sync to remote storage
rsync -av ~/archivebox-data/ user@backup-server:/backups/archivebox/
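The backup step can be wrapped in a function that also verifies the tarball before reporting success. This is a sketch under the assumption that the data directory layout matches the one above; `backup_archivebox` is a hypothetical name, and the destination directory is whatever location suits the host.

```shell
# Create a dated tarball of a data directory and verify it is readable.
# $1: data directory, $2: destination directory for the tarball
# Prints the path of the tarball on success.
backup_archivebox() {
  src=${1%/}
  dest=$2
  out="$dest/archivebox-backup-$(date +%Y%m%d).tar.gz"
  # -C stores paths relative to the parent of the data directory
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing the archive catches truncated or corrupt output
  tar -tzf "$out" > /dev/null || return 1
  printf '%s\n' "$out"
}
```

For example, `backup_archivebox ~/archivebox-data /tmp` writes `/tmp/archivebox-backup-YYYYMMDD.tar.gz`. Stopping the container first (`docker stop archivebox`) avoids copying the SQLite index while it is being written.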

Related