paper_firehose

Functions

abstracts([topic, mailto, limit, rps, ...])

Fetch abstracts for ranked entries and write them to papers.db and the history database.

email([topic, mode, limit, recipients_file, ...])

Send an email digest generated from papers.db via SMTP.

export_recent([days, output_name, config_path])

Export recent entries from matched_entries_history.db to a smaller database.

filter([topic, config_path])

Run the filter step programmatically.

generate_html([topic, output_path, config_path])

Generate HTML for one or all topics directly from papers.db.

html([topic, output_path, config_path])

Generate HTML for one or all topics directly from papers.db.

paperqa_summary([topic, rps, limit, arxiv, ...])

Run the paper-qa pipeline to download PDFs and write grounded summaries.

pqa_summary([topic, rps, limit, arxiv, ...])

Run the paper-qa pipeline to download PDFs and write grounded summaries.

purge([days, all_data, config_path])

Purge entries from databases.

query(*[, history, all_feeds, topic, ...])

Query paper databases and print results.

rank([topic, config_path])

Compute and write rank scores into papers.db for the given topic (or all).

status([config_path])

Return configuration and environment status for programmatic use.

paper_firehose.abstracts(topic=None, *, mailto=None, limit=None, rps=None, config_path=None)

Fetch abstracts for ranked entries and write them to papers.db and the history database.

Parameters:
  • topic (Optional[str]) – Restrict to a single topic; if None, all topics are processed.

  • mailto (Optional[str]) – Contact email included in the Crossref User-Agent header.

  • limit (Optional[int]) – Maximum number of abstracts to fetch per topic.

  • rps (Optional[float]) – Requests-per-second throttle.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None
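The rps parameter is documented only as a requests-per-second throttle. As a rough illustration of what such a throttle does (not the library's actual implementation), the sketch below spaces successive calls at least 1/rps seconds apart. The Throttle class and its clock and sleep hooks are hypothetical names introduced for this example.

```python
import time


class Throttle:
    """Hypothetical helper: allow at most `rps` calls per second."""

    def __init__(self, rps, clock=time.monotonic, sleep=time.sleep):
        if rps <= 0:
            raise ValueError("rps must be positive")
        self.interval = 1.0 / rps   # minimum spacing between calls, in seconds
        self._clock = clock         # injectable so the class can be tested
        self._sleep = sleep
        self._next = None           # earliest time the next call may proceed

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next is not None and now < self._next:
            self._sleep(self._next - now)
            now = self._next
        self._next = now + self.interval
```

Calling wait() before each outgoing request keeps the request rate at or below rps; for example, rps=0.5 would allow one request every two seconds.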

paper_firehose.email(topic=None, *, mode='auto', limit=None, recipients_file=None, dry_run=False, config_path=None)

Send an email digest generated from papers.db via SMTP.

Parameters:
  • topic (Optional[str])

  • mode (str)

  • limit (Optional[int])

  • recipients_file (Optional[str])

  • dry_run (bool)

  • config_path (Optional[str])

Return type:

None

paper_firehose.export_recent(days=60, output_name=None, config_path=None)

Export recent entries from matched_entries_history.db to a smaller database.

Creates a filtered database containing only entries from the last N days for faster initial page loads in the history viewer HTML.

Parameters:
  • days (int) – Number of days to include (default: 60)

  • output_name (Optional[str]) – Optional output filename (default: matched_entries_history.recent.db)

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None

paper_firehose.filter(topic=None, config_path=None)

Run the filter step programmatically.

Parameters:
  • topic (Optional[str]) – Optional topic name to process; if None, process all topics.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None

paper_firehose.generate_html(topic=None, output_path=None, config_path=None)

Generate HTML for one or all topics directly from papers.db.

Parameters:
  • topic (Optional[str]) – Optional topic name. When omitted, HTML is produced for all topics defined in the configuration.

  • output_path (Optional[str]) – Optional output path. Only valid when topic is provided; when generating all topics the configured filenames are used.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None

paper_firehose.html(topic=None, output_path=None, config_path=None)

Generate HTML for one or all topics directly from papers.db.

Parameters:
  • topic (Optional[str]) – Optional topic name. When omitted, HTML is produced for all topics defined in the configuration.

  • output_path (Optional[str]) – Optional output path. Only valid when topic is provided; when generating all topics the configured filenames are used.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None
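The constraint on output_path (only valid when topic is provided) can be checked before calling html or generate_html. The following is a minimal sketch of such a pre-check; check_html_args is a hypothetical helper, and how the library itself reacts to an invalid combination is not stated here.

```python
from typing import Optional


def check_html_args(topic: Optional[str], output_path: Optional[str]) -> None:
    """Hypothetical pre-check mirroring the documented output_path rule."""
    # An explicit output path only makes sense for a single topic; when all
    # topics are generated, the filenames from the configuration are used.
    if output_path is not None and topic is None:
        raise ValueError("output_path requires a topic; omit it to use configured filenames")
```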

paper_firehose.paperqa_summary(topic=None, *, rps=None, limit=None, arxiv=None, entry_ids=None, use_history=False, history_date=None, history_feed_like=None, config_path=None)

Run the paper-qa pipeline to download PDFs and write grounded summaries.

Parameters:
  • topic (Optional[str]) – Optional topic name to target ranked entries; when omitted and no IDs are supplied, all configured topics are scanned.

  • rps (Optional[float]) – Optional requests-per-second override for arXiv lookups/downloads.

  • limit (Optional[int]) – Optional cap on number of ranked entries per topic.

  • arxiv (Optional[List[str]]) – Optional list of arXiv IDs/URLs to process directly (bypass ranking).

  • entry_ids (Optional[List[str]]) – Optional list of database entry IDs to summarize (history lookup).

  • use_history (bool) – When True, resolve entry_ids against the history database.

  • history_date (Optional[str]) – Optional YYYY-MM-DD filter when querying history records.

  • history_feed_like (Optional[str]) – Optional substring filter for history feed names.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None

paper_firehose.pqa_summary(topic=None, *, rps=None, limit=None, arxiv=None, entry_ids=None, use_history=False, history_date=None, history_feed_like=None, config_path=None)

Run the paper-qa pipeline to download PDFs and write grounded summaries.

Parameters:
  • topic (Optional[str]) – Optional topic name to target ranked entries; when omitted and no IDs are supplied, all configured topics are scanned.

  • rps (Optional[float]) – Optional requests-per-second override for arXiv lookups/downloads.

  • limit (Optional[int]) – Optional cap on number of ranked entries per topic.

  • arxiv (Optional[List[str]]) – Optional list of arXiv IDs/URLs to process directly (bypass ranking).

  • entry_ids (Optional[List[str]]) – Optional list of database entry IDs to summarize (history lookup).

  • use_history (bool) – When True, resolve entry_ids against the history database.

  • history_date (Optional[str]) – Optional YYYY-MM-DD filter when querying history records.

  • history_feed_like (Optional[str]) – Optional substring filter for history feed names.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None
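In the signature above, everything after topic is keyword-only. The sketch below uses a stand-in function with the same documented signature to show three ways of selecting entries (by topic, by arXiv ID, by history entry ID) and to confirm that passing rps positionally is rejected. pqa_summary_stub and check_call are hypothetical names, and the topic, arXiv ID, and entry ID values are placeholders rather than real records.

```python
import inspect
from typing import List, Optional


def pqa_summary_stub(
    topic: Optional[str] = None,
    *,
    rps: Optional[float] = None,
    limit: Optional[int] = None,
    arxiv: Optional[List[str]] = None,
    entry_ids: Optional[List[str]] = None,
    use_history: bool = False,
    history_date: Optional[str] = None,
    history_feed_like: Optional[str] = None,
    config_path: Optional[str] = None,
) -> None:
    """Stand-in with the documented signature; it does no work."""


SIG = inspect.signature(pqa_summary_stub)


def check_call(*args, **kwargs) -> bool:
    """Return True if the arguments fit the documented signature."""
    try:
        SIG.bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# Ranked entries of one topic, capped and throttled.
by_topic = check_call("my_topic", limit=5, rps=0.5)
# Explicit arXiv IDs, bypassing ranking.
by_arxiv = check_call(arxiv=["2401.00001"])
# Entry IDs resolved against the history database for one date.
by_history = check_call(entry_ids=["abc123"], use_history=True, history_date="2024-05-01")
# rps is keyword-only, so a second positional argument does not bind.
positional_rps = check_call("my_topic", 0.5)
```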

paper_firehose.purge(days=None, all_data=False, config_path=None)

Purge entries from databases.

Parameters:
  • days (Optional[int]) – When provided, removes entries whose published_date falls within the most recent N days (including today) across all databases.

  • all_data (bool) – If True, clears all databases and reinitializes schemas.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None
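To make the "most recent N days (including today)" wording concrete, here is a small sketch of the date window it describes, under the assumption that days=1 means today only. purge_window is a hypothetical helper; the exact boundary handling inside the library is not confirmed here.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def purge_window(days: int, today: Optional[date] = None) -> Tuple[date, date]:
    """Return the inclusive (start, end) dates covered by a purge of N days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    end = today or date.today()
    # Today counts as day 1, so the window reaches back days - 1 further days.
    start = end - timedelta(days=days - 1)
    return start, end
```

Under this reading, purge_window(3, date(2024, 5, 10)) covers 2024-05-08 through 2024-05-10.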

paper_firehose.query(*, history=False, all_feeds=False, topic=None, min_rank=None, since=None, until=None, search=None, status=None, has_doi=False, has_abstract=False, sort='rank', limit=20, offset=0, json=False, count=False, fields=None, config_path=None)

Query paper databases and print results.

Parameters:
  • history (bool) – Query matched_entries_history.db instead of papers.db.

  • all_feeds (bool) – Query all_feed_entries.db instead of papers.db.

  • topic (Optional[str]) – Filter by topic name.

  • min_rank (Optional[float]) – Minimum rank_score threshold.

  • since (Optional[str]) – Published on or after this date (YYYY-MM-DD).

  • until (Optional[str]) – Published on or before this date (YYYY-MM-DD).

  • search (Optional[str]) – Case-insensitive text search on title and abstract.

  • status (Optional[str]) – Filter by entry status (current DB only).

  • has_doi (bool) – Only entries with a DOI.

  • has_abstract (bool) – Only entries with an abstract.

  • sort (str) – Sort key: 'rank', 'date', or 'title'.

  • limit (int) – Max results (0 = unlimited).

  • offset (int) – Skip first N results.

  • json (bool) – Output as JSON.

  • count (bool) – Print count only.

  • fields (Optional[str]) – Comma-separated column names to include.

  • config_path (Optional[str]) – Path to main YAML config; defaults to repo config.

Return type:

None
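Because query prints its results and returns None, a caller that wants the data as Python objects has to capture the printed output. The sketch below shows one way to do that with json=True. It assumes the function writes a single JSON document to sys.stdout, which this page does not confirm; capture_json is a hypothetical helper that takes the query function as an argument.

```python
import io
import json
from contextlib import redirect_stdout


def capture_json(query_fn, **kwargs):
    """Call query_fn(json=True, **kwargs) and parse whatever it prints."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        # All query parameters are keyword-only, so they are passed by name.
        query_fn(json=True, **kwargs)
    return json.loads(buf.getvalue())
```

With the package installed, a call might look like capture_json(paper_firehose.query, topic="my_topic", min_rank=0.4, limit=10), where the topic name is a placeholder.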

paper_firehose.rank(topic=None, config_path=None)

Compute and write rank scores into papers.db for the given topic (or all).

Parameters:
  • topic (Optional[str])

  • config_path (Optional[str])

Return type:

None
paper_firehose.status(config_path=None)

Return configuration and environment status for programmatic use.

Parameters:
  • config_path (Optional[str])

Return type:

Dict[str, Any]

Modules

cli

Command-line entry point for Paper Firehose.

commands

core

processors