= Scrapbook-core =
'''A searchable archive of everything read, starred, and shared online - and why 9,485 scraps is both too much and not enough.'''
== The Problem ==
The frustrations that drove this project:
* Articles read six months ago become unfindable
* No memory of where "that thing about X" came from
* Bookmarks accumulate but never get searched
* Browser history becomes useless past a certain point
''If I star something, bookmark something, or post about it - it mattered to me. It should be as easy to search my own history as it is to Google something new.''
== What It Actually Does ==
=== Data Sources ===
Pulls from multiple platforms where content gets saved or shared (a minimal fetch sketch follows the list):
* '''GitHub stars''' - repositories that caught my attention
* '''Pinboard bookmarks''' - saved links and articles
* '''Mastodon posts''' - shared content and commentary
* '''Are.na saves''' - collected ideas and inspiration
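Each source boils down to an authenticated, paginated HTTP pull. Here is a minimal sketch for the GitHub side, against the real <code>/users/{user}/starred</code> endpoint - the username is an example, the token handling is simplified, and the actual collector may be structured differently:

<syntaxhighlight lang="python">
# Sketch: pull starred repos from the GitHub API, following pagination.
# The endpoint and Link-header pagination are real GitHub API behavior;
# the username and token handling are placeholders.
import os
import requests

def fetch_github_stars(user: str):
    """Yield every starred repository for a user."""
    url = f"https://api.github.com/users/{user}/starred?per_page=100"
    headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        yield from resp.json()
        url = resp.links.get("next", {}).get("url")  # follow the Link header

for repo in fetch_github_stars("example-user"):
    print(repo["full_name"], repo["html_url"])
</syntaxhighlight>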
=== Processing Pipeline ===
* AI summarizes each piece (OpenRouter, with fallback to OpenAI; sketch below)
* Generates embeddings for semantic search
* Makes everything searchable via Alfred (⌘Space, type "sc [query]", instant results)
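A sketch of the summarize-and-embed step, under two assumptions: OpenRouter exposes an OpenAI-compatible API (it does, so one SDK covers both providers), while the model names and prompt are illustrative rather than what scrapbook-core necessarily uses:

<syntaxhighlight lang="python">
# Sketch: summarize via OpenRouter, fall back to OpenAI, then embed.
# OpenRouter speaks the OpenAI-compatible API, so the same SDK works
# for both. Model names and the prompt are illustrative placeholders.
import os
from openai import OpenAI

openrouter = OpenAI(base_url="https://openrouter.ai/api/v1",
                    api_key=os.environ["OPENROUTER_API_KEY"])
fallback = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def summarize(text: str) -> str:
    """Try OpenRouter first; fall back to OpenAI if it fails."""
    prompt = f"Summarize this saved item in two sentences:\n\n{text}"
    for client, model in ((openrouter, "anthropic/claude-3.5-haiku"),
                          (fallback, "gpt-4o-mini")):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}])
            return resp.choices[0].message.content
        except Exception:
            continue  # provider errored or rate-limited; try the next one
    raise RuntimeError("all summarization providers failed")

def embed(text: str) -> list[float]:
    """Embedding vector used for semantic search."""
    resp = fallback.embeddings.create(model="text-embedding-3-small",
                                      input=text)
    return resp.data[0].embedding
</syntaxhighlight>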
== Architecture ==
=== Storage ===
* '''Supabase''' for cloud storage and sync
* '''SQLite mirror''' for instant local search (sync sketch below)
* '''Docker deployment''' with health checks
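How the SQLite mirror might stay in sync, assuming the official supabase-py client; the <code>scraps</code> table name and column set here are guesses at the real schema:

<syntaxhighlight lang="python">
# Sketch: mirror the cloud table into local SQLite so search never
# touches the network. Assumes supabase-py; the "scraps" table name
# and its columns are guesses at the real schema.
import json
import os
import sqlite3
from supabase import create_client

cloud = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
local = sqlite3.connect("scrapbook.db")
local.execute("""CREATE TABLE IF NOT EXISTS scraps
                 (id TEXT PRIMARY KEY, source TEXT, url TEXT,
                  summary TEXT, embedding TEXT, updated_at TEXT)""")

def sync(last_sync: str) -> None:
    """Pull rows changed since last_sync and upsert them locally."""
    rows = (cloud.table("scraps").select("*")
            .gt("updated_at", last_sync).execute().data)
    local.executemany(
        "INSERT OR REPLACE INTO scraps VALUES (?, ?, ?, ?, ?, ?)",
        [(r["id"], r["source"], r["url"], r["summary"],
          json.dumps(r.get("embedding")), r["updated_at"]) for r in rows])
    local.commit()
</syntaxhighlight>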
=== Rate Limiting ===
A smart six-level backoff system (sketched below):
* Respects API limits
* Degrades gracefully under load
* Has saved hundreds of dollars in API costs
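The six levels and their triggers aren't documented here, so this is only the shape of the idea - escalating sleep intervals before giving up - with illustrative delays:

<syntaxhighlight lang="python">
# Sketch: a six-level backoff wrapper. The actual levels, delays, and
# trigger conditions in scrapbook-core are not documented here; these
# values are illustrative.
import time

class RateLimitError(Exception):
    """Stand-in for whatever the real provider-specific error is."""

BACKOFF_LEVELS = [1, 5, 15, 60, 300, 900]  # seconds, levels 0-5

def with_backoff(call, *args, **kwargs):
    """Retry a rate-limited call, escalating through six delay levels."""
    for level, delay in enumerate(BACKOFF_LEVELS):
        try:
            return call(*args, **kwargs)
        except RateLimitError:
            print(f"rate limited; backing off at level {level} ({delay}s)")
            time.sleep(delay)
    raise RuntimeError("still rate limited after six backoff levels")
</syntaxhighlight>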
== The 16.7% Problem ==
The current status reveals an interesting tension:
* '''9,485 total scraps''' captured
* '''Only 1,584 have AI summaries''' (16.7%)
* A massive backlog, the result of early design decisions
''Still incredibly useful even incomplete. The backlog is itself interesting data - a record of what accumulated faster than it could be processed.''
== What I Learned ==
=== Design Lessons ===
* '''Perfect is the enemy of done''' - I should have processed items as they came in
* '''Local search > cloud search''' for daily use
* '''Alfred integration''' is the killer feature
* '''Smart rate limiting''' prevented runaway costs
=== Philosophical Lessons ===
Build tools to understand yourself, not to impress others. Information accumulation is pointless without retrieval. Your digital exhaust is valuable if you capture it right.
''Semantic search reveals patterns you didn't know existed.''
== Connection to Other Systems ==
Scrapbook-core is part of a larger [[Quantified Self]] approach:
* Capture automatically (no friction)
* Store long-term (years of data)
* Analyze occasionally (when curious)
* Reveal patterns (invisible in daily use)
Works alongside [[Personal APIs]] for unified access to personal data, and feeds context to AI systems.
== Future Directions ==
* Process the ~8,000-scrap backlog
* Build a "theme weaver" to group related scraps (speculative sketch below)
* Mobile interface for on-the-go exploration
* Integration with other personal systems
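One speculative shape for the theme weaver - nothing here is built yet - is clustering the existing embeddings so related scraps surface together, with scikit-learn's KMeans as a stand-in for whatever grouping the real feature would use:

<syntaxhighlight lang="python">
# Purely speculative sketch of a "theme weaver": cluster stored scrap
# embeddings so related items group together. KMeans is a stand-in for
# whatever grouping the real feature would use.
import numpy as np
from sklearn.cluster import KMeans

def weave_themes(embeddings: np.ndarray, titles: list[str], k: int = 20):
    """Group scraps into k rough themes by clustering their embeddings."""
    labels = KMeans(n_clusters=k, n_init="auto").fit_predict(embeddings)
    themes: dict[int, list[str]] = {}
    for label, title in zip(labels, titles):
        themes.setdefault(int(label), []).append(title)
    return themes
</syntaxhighlight>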
[[Category:Projects]]
[[Category:Personal Data]]
[[Category:Knowledge Management]]
{{Navbox Projects}}