paper_firehose.core.database¶
Database management for the three-database approach:

- all_feed_entries.db: all RSS entries, used for deduplication
- matched_entries_history.db: historical matches across all topics
- papers.db: current run processing data
Classes
DatabaseManager: Manages the three-database system for feed processing.
- class paper_firehose.core.database.DatabaseManager(config)[source]¶
Bases: object
Manages the three-database system for feed processing.
- backup_important_databases()[source]¶
Backup history and all_feeds databases with timestamped rotation.
Writes timestamped backups alongside the source DBs in the runtime data directory.
Keeps up to 3 most recent backups per database, pruning older ones.
Returns a dict mapping logical db keys to the created backup file paths.
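The timestamped-backup-with-pruning pattern described above can be sketched with the standard library alone (the file-naming scheme is an assumption; only the keep-3 rotation comes from the description):

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_with_rotation(db_path: Path, keep: int = 3) -> Path:
    """Copy db_path to a timestamped sibling file, keeping only the newest `keep` backups."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")  # microseconds keep names unique
    backup = db_path.with_name(f"{db_path.stem}.{stamp}.bak")
    shutil.copy2(db_path, backup)
    # Timestamps sort lexicographically, so a reverse sort by name is newest-first.
    backups = sorted(db_path.parent.glob(f"{db_path.stem}.*.bak"), reverse=True)
    for old in backups[keep:]:
        old.unlink()
    return backup
```

For a database that may be written to during the backup, sqlite3's Connection.backup() is the safer copy mechanism than a plain file copy.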
- clear_current_db()[source]¶
Clear the current run database.
Re-initialises the FTS index and triggers before deleting rows so that DELETE triggers can fire cleanly even if a previous migration or crash left the FTS table missing.
- close_all_connections()[source]¶
Close any open database connections (placeholder for connection pooling).
- get_connection(db_key='current', row_factory=True)[source]¶
Context manager for database connections with automatic commit/rollback.
- Parameters:
- Yields:
sqlite3.Connection – Database connection
Example
with db.get_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM entries")
    # Auto-commits on success, auto-closes always
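The automatic commit/rollback behaviour can be illustrated with a stand-alone context manager; this is a sketch of the pattern, not the class's actual implementation:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def get_connection(db_path, row_factory=True):
    """Yield a connection; commit on success, roll back on error, always close."""
    conn = sqlite3.connect(db_path)
    if row_factory:
        conn.row_factory = sqlite3.Row  # rows gain dict-like access by column name
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```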
- get_current_entries(topic=None, status=None)[source]¶
Get entries from papers.db with optional filtering.
- get_entries_by_criteria(topic=None, min_rank=None, status=None, has_doi=None, order_by='rank_score DESC')[source]¶
Flexible query builder for entries with various criteria.
- Parameters:
- Return type:
- Returns:
List of sqlite3.Row objects with dict-like access
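The criteria-to-SQL mapping behind such a query builder can be sketched as follows (parameter names mirror the signature above; the table and column names are assumptions):

```python
def build_query(topic=None, min_rank=None, status=None, has_doi=None,
                order_by="rank_score DESC"):
    """Assemble a parameterised SELECT from the optional criteria."""
    clauses, params = [], []
    if topic is not None:
        clauses.append("topic = ?")
        params.append(topic)
    if min_rank is not None:
        clauses.append("rank_score >= ?")
        params.append(min_rank)
    if status is not None:
        clauses.append("status = ?")
        params.append(status)
    if has_doi is not None:
        # boolean flag maps to a NULL check rather than a bound parameter
        clauses.append("doi IS NOT NULL" if has_doi else "doi IS NULL")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return f"SELECT * FROM entries{where} ORDER BY {order_by}", params
```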
- purge_old_entries(days)[source]¶
Remove entries published within the most recent N days (including today), based on publication date (YYYY-MM-DD).
- Parameters:
days (int)
- query_entries(db_key='current', topic=None, min_rank=None, status=None, has_doi=None, has_abstract=None, since=None, until=None, search=None, fuzzy=None, order_by='rank_score DESC', limit=20, offset=0)[source]¶
General-purpose query across any of the three databases.
- Parameters:
db_key (str) – 'current', 'history', or 'all_feeds'
topic (Optional[str]) – Topic filter (exact match for current, LIKE for history)
has_doi (Optional[bool]) – If True only entries with a DOI, if False only without
has_abstract (Optional[bool]) – If True only entries with an abstract
since (Optional[str]) – Published on or after this date (YYYY-MM-DD)
until (Optional[str]) – Published on or before this date (YYYY-MM-DD)
search (Optional[str]) – FTS5 keyword search on title + abstract/summary (supports phrases "...", prefix term*, boolean AND/OR/NOT)
fuzzy (Optional[str]) – Fuzzy text search via FTS5 trigram (min 3 chars, mutually exclusive with search)
order_by (str) – SQL ORDER BY clause
limit (int) – Max rows (0 = unlimited)
offset (int) – Skip first N rows
- Return type:
- Returns:
(rows, total_count), where rows is a list of dicts and total_count is the count before LIMIT/OFFSET.
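The (rows, total_count) contract, where the total is computed before LIMIT/OFFSET is applied, can be sketched against a plain sqlite3 connection (table and column names are assumptions):

```python
import sqlite3

def page_entries(conn, limit=20, offset=0):
    """Return (rows, total_count); total is counted before LIMIT/OFFSET."""
    where = "WHERE rank_score IS NOT NULL"
    total = conn.execute(f"SELECT COUNT(*) FROM entries {where}").fetchone()[0]
    sql = f"SELECT * FROM entries {where} ORDER BY rank_score DESC"
    if limit:  # limit=0 means unlimited, matching the signature above
        sql += " LIMIT ? OFFSET ?"
        rows = conn.execute(sql, (limit, offset)).fetchall()
    else:
        rows = conn.execute(sql).fetchall()
    return [dict(r) for r in rows], total
```

Counting first lets callers render pagination controls ("page 2 of 12") without fetching every row.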
- save_current_entry(entry, feed_name, topic, entry_id)[source]¶
Save an entry to papers.db for current run processing.
- save_feed_entry(entry, feed_name, entry_id)[source]¶
Save an entry to all_feed_entries.db with proper date formatting.
- save_matched_entry(entry, feed_name, topic, entry_id)[source]¶
Save a matched entry to matched_entries_history.db, merging topics if entry already exists.
- update_entry_rank(entry_id, topic, score, reasoning=None)[source]¶
Update rank_score (and optionally rank_reasoning) for a single entry.