paper_firehose
Functions
- abstracts – Fetch abstracts for ranked entries and write to papers.db/history.
- email – Send an email digest generated from papers.db via SMTP.
- export_recent – Export recent entries from matched_entries_history.db to a smaller database.
- Run the filter step programmatically.
- generate_html – Generate HTML for one or all topics directly from papers.db.
- html – Generate HTML for one or all topics directly from papers.db.
- paperqa_summary – Run the paper-qa pipeline to download PDFs and write grounded summaries.
- pqa_summary – Run the paper-qa pipeline to download PDFs and write grounded summaries.
- purge – Purge entries from databases.
- query – Query paper databases and print results.
- rank – Compute and write rank scores into papers.db for the given topic (or all).
- status – Return configuration and environment status for programmatic use.
- paper_firehose.abstracts(topic=None, *, mailto=None, limit=None, rps=None, config_path=None)
Fetch abstracts for ranked entries and write to papers.db/history.
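The rps argument sets a requests-per-second budget for abstract lookups. The sketch below shows one common client-side way to enforce such a budget: a fixed minimum interval between calls. The RateLimiter class is hypothetical and does not come from paper_firehose, whose throttling is not documented here. The clock and sleep functions can be injected, so the behavior can be checked without any real waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Hypothetical fixed-interval throttle allowing at most `rps` calls per second."""

    def __init__(
        self,
        rps: Optional[float],
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # In this sketch, None or a non-positive rate disables throttling.
        self.interval = 1.0 / rps if rps is not None and rps > 0 else 0.0
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous permitted call

    def wait(self) -> None:
        """Block until `interval` seconds have passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            delay = self._last + self.interval - now
            if delay > 0:
                self._sleep(delay)
                now += delay
        self._last = now
```

A caller would invoke limiter.wait() immediately before each HTTP request. At rps=2.0, three back-to-back calls are spread over about one second.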
- paper_firehose.email(topic=None, *, mode='auto', limit=None, recipients_file=None, dry_run=False, config_path=None)
Send an email digest generated from papers.db via SMTP.
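Composing and sending a digest over SMTP needs nothing beyond the standard library. The sketch below is for illustration only. The build_digest and send_digest helpers, the subject line, and the (title, link) input shape are all hypothetical, and the sketch does not reproduce how paper_firehose.email picks its mode or reads recipients_file. The dry_run flag follows the documented parameter by skipping the network step entirely.

```python
import smtplib
from email.message import EmailMessage
from html import escape
from typing import List, Tuple


def build_digest(sender: str, recipients: List[str], topic: str,
                 entries: List[Tuple[str, str]]) -> EmailMessage:
    """Hypothetical helper: plain-text digest plus an HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = f"Paper digest: {topic}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    text = "\n".join(f"- {title}\n  {link}" for title, link in entries)
    msg.set_content(text or "No new entries.")
    # Escape titles and links so that markup in paper titles cannot break the HTML.
    items = "".join(
        f'<li><a href="{escape(link)}">{escape(title)}</a></li>'
        for title, link in entries
    )
    msg.add_alternative(f"<ul>{items}</ul>", subtype="html")
    return msg


def send_digest(msg: EmailMessage, host: str, port: int, user: str,
                password: str, dry_run: bool = False) -> bool:
    """Hypothetical helper: send via SMTP with STARTTLS; never connects when dry_run is set."""
    if dry_run:
        return False
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
    return True
```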
- paper_firehose.export_recent(days=60, output_name=None, config_path=None)
Export recent entries from matched_entries_history.db to a smaller database.
Creates a filtered database containing only entries from the last N days (set by days), which speeds up the initial page load of the history viewer HTML.
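Conceptually, the export copies rows past a date cutoff from the large history database into a small one. Below is a minimal sketch using the standard sqlite3 module. It assumes a hypothetical entries table with an ISO-8601 published_date column; the real schema of matched_entries_history.db is not shown here. Because ISO dates sort correctly as strings, a plain >= comparison works as the cutoff test.

```python
import sqlite3
from datetime import date, timedelta
from typing import Optional

# Hypothetical schema; the real history database may use other tables and columns.
SCHEMA = ("CREATE TABLE IF NOT EXISTS entries "
          "(id TEXT PRIMARY KEY, title TEXT, published_date TEXT)")


def export_recent_rows(src: sqlite3.Connection, dst: sqlite3.Connection,
                       days: int = 60, today: Optional[date] = None) -> int:
    """Copy entries published within the last `days` days from src into dst."""
    today = today or date.today()
    cutoff = (today - timedelta(days=days)).isoformat()  # ISO string such as "2024-01-01"
    rows = src.execute(
        "SELECT id, title, published_date FROM entries WHERE published_date >= ?",
        (cutoff,),
    ).fetchall()
    dst.execute(SCHEMA)
    dst.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)
```

In real use, dst would be a sqlite3.connect() on the smaller output file rather than an in-memory database.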
- paper_firehose.generate_html(topic=None, output_path=None, config_path=None)
Generate HTML for one or all topics directly from papers.db.
- Parameters:
  - topic (Optional[str]) – Optional topic name. When omitted, HTML is produced for all topics defined in the configuration.
  - output_path (Optional[str]) – Optional output path. Only valid when topic is provided; when generating all topics, the configured filenames are used.
  - config_path (Optional[str]) – Path to main YAML config; defaults to repo config.
- paper_firehose.html(topic=None, output_path=None, config_path=None)
Generate HTML for one or all topics directly from papers.db.
- Parameters:
  - topic (Optional[str]) – Optional topic name. When omitted, HTML is produced for all topics defined in the configuration.
  - output_path (Optional[str]) – Optional output path. Only valid when topic is provided; when generating all topics, the configured filenames are used.
  - config_path (Optional[str]) – Path to main YAML config; defaults to repo config.
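The documented rule for output_path, which is accepted only together with topic, can be expressed as a small validation step. In this sketch, the configured dict (topic name to configured filename), the resolve_outputs helper, and the decision to raise ValueError are all assumptions. The documentation does not say how the real function reports misuse.

```python
from typing import Dict, Optional


def resolve_outputs(configured: Dict[str, str], topic: Optional[str] = None,
                    output_path: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: map each topic to render onto its output HTML file.

    `configured` maps topic name -> configured output filename (assumed shape).
    """
    if topic is None:
        if output_path is not None:
            # Assumed error type; the docs only say output_path requires a topic.
            raise ValueError("output_path is only valid when topic is provided")
        return dict(configured)  # all topics, each using its configured filename
    if topic not in configured:
        raise KeyError(f"unknown topic: {topic!r}")
    return {topic: output_path if output_path is not None else configured[topic]}
```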
- paper_firehose.paperqa_summary(topic=None, *, rps=None, limit=None, arxiv=None, entry_ids=None, use_history=False, history_date=None, history_feed_like=None, config_path=None)
Run the paper-qa pipeline to download PDFs and write grounded summaries.
- Parameters:
  - topic (Optional[str]) – Optional topic name to target ranked entries; when omitted and no IDs are supplied, all configured topics are scanned.
  - rps (Optional[float]) – Optional requests-per-second override for arXiv lookups/downloads.
  - limit (Optional[int]) – Optional cap on the number of ranked entries per topic.
  - arxiv (Optional[List[str]]) – Optional list of arXiv IDs/URLs to process directly (bypassing ranking).
  - entry_ids (Optional[List[str]]) – Optional list of database entry IDs to summarize (history lookup).
  - use_history (bool) – When True, resolve entry_ids against the history database.
  - history_date (Optional[str]) – Optional YYYY-MM-DD filter when querying history records.
  - history_feed_like (Optional[str]) – Optional substring filter for history feed names.
  - config_path (Optional[str]) – Path to main YAML config; defaults to repo config.
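The arxiv argument accepts bare arXiv IDs or URLs. One way to reduce such mixed input to canonical IDs is a regular expression that covers both identifier formats, each with an optional version suffix: new-style (YYMM.NNNNN) and old-style (archive/YYMMNNN). normalize_arxiv is a hypothetical helper, not the parser paper_firehose actually uses.

```python
import re
from typing import List

# New-style IDs (e.g. 2401.12345, 0704.0001) or old-style IDs (e.g. hep-th/9901001,
# math.GT/0309136), each optionally followed by a version suffix such as v2.
_ARXIV_ID = re.compile(
    r"(\d{4}\.\d{4,5}(?:v\d+)?|[a-z\-]+(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?)"
)


def normalize_arxiv(refs: List[str]) -> List[str]:
    """Hypothetical helper: extract the arXiv identifier from each ID or URL."""
    ids = []
    for ref in refs:
        match = _ARXIV_ID.search(ref)
        if match is None:
            raise ValueError(f"not an arXiv ID or URL: {ref!r}")
        ids.append(match.group(1))
    return ids
```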
- paper_firehose.pqa_summary(topic=None, *, rps=None, limit=None, arxiv=None, entry_ids=None, use_history=False, history_date=None, history_feed_like=None, config_path=None)
Run the paper-qa pipeline to download PDFs and write grounded summaries.
- Parameters:
  - topic (Optional[str]) – Optional topic name to target ranked entries; when omitted and no IDs are supplied, all configured topics are scanned.
  - rps (Optional[float]) – Optional requests-per-second override for arXiv lookups/downloads.
  - limit (Optional[int]) – Optional cap on the number of ranked entries per topic.
  - arxiv (Optional[List[str]]) – Optional list of arXiv IDs/URLs to process directly (bypassing ranking).
  - entry_ids (Optional[List[str]]) – Optional list of database entry IDs to summarize (history lookup).
  - use_history (bool) – When True, resolve entry_ids against the history database.
  - history_date (Optional[str]) – Optional YYYY-MM-DD filter when querying history records.
  - history_feed_like (Optional[str]) – Optional substring filter for history feed names.
  - config_path (Optional[str]) – Path to main YAML config; defaults to repo config.
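With use_history, the entry_ids lookup can be narrowed by history_date and history_feed_like. The sketch below combines these filters in one SQLite query. It assumes a hypothetical history table (entry_id, feed_name, matched_date). It also assumes the substring filter is implemented with SQL LIKE, which the parameter name suggests but the documentation does not confirm. SQLite's LIKE is case-insensitive for ASCII letters. Wildcards in the user's text are escaped so it matches literally.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def find_history_entries(
    conn: sqlite3.Connection,
    entry_ids: Sequence[str],
    history_date: Optional[str] = None,
    history_feed_like: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: look up entry_ids in a `history` table with optional filters."""
    if not entry_ids:
        return []
    placeholders = ", ".join("?" for _ in entry_ids)
    sql = ("SELECT entry_id, feed_name, matched_date FROM history "
           f"WHERE entry_id IN ({placeholders})")
    params: List[str] = list(entry_ids)
    if history_date is not None:  # exact YYYY-MM-DD match
        sql += " AND matched_date = ?"
        params.append(history_date)
    if history_feed_like is not None:
        # Escape LIKE wildcards (% and _) so the text is treated as a literal substring.
        literal = (history_feed_like.replace("\\", "\\\\")
                   .replace("%", "\\%").replace("_", "\\_"))
        sql += " AND feed_name LIKE ? ESCAPE '\\'"
        params.append(f"%{literal}%")
    sql += " ORDER BY entry_id"
    return conn.execute(sql, params).fetchall()
```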
- paper_firehose.purge(days=None, all_data=False, config_path=None)
Purge entries from databases.
- Parameters:
  - days (Optional[int]) – When provided (as N), removes entries whose published_date falls within the most recent N days (including today) from all databases.
  - all_data (bool) – If True, clears all databases and reinitializes their schemas.
  - config_path (Optional[str]) – Path to main YAML config; defaults to repo config.
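Note that days marks the newest entries for removal, not the oldest. If "the most recent N days (including today)" is read as today plus the N-1 days before it, the removal window can be computed as shown below. Both that reading and the helper names are assumptions, not code from paper_firehose.

```python
from datetime import date, timedelta
from typing import Optional


def purge_window_start(days: int, today: Optional[date] = None) -> date:
    """Hypothetical helper: first day of the window 'most recent `days` days, including today'."""
    if days < 1:
        raise ValueError("days must be at least 1")
    today = today or date.today()
    # N days including today = today and the N-1 days before it.
    return today - timedelta(days=days - 1)


def would_be_purged(published: date, days: int, today: Optional[date] = None) -> bool:
    """Hypothetical helper: True if an entry published on `published` is inside the window."""
    return published >= purge_window_start(days, today)
```

For example, with today = 2024-03-10 and days=7, the window starts on 2024-03-04. Entries from 4 to 10 March are removed and older ones are kept.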
- paper_firehose.query(*, history=False, all_feeds=False, topic=None, min_rank=None, since=None, until=None, search=None, status=None, has_doi=False, has_abstract=False, sort='rank', limit=20, offset=0, json=False, count=False, fields=None, config_path=None)
Query paper databases and print results.
- Parameters:
  - history (bool) – Query matched_entries_history.db instead of papers.db.
  - all_feeds (bool) – Query all_feed_entries.db instead of papers.db.
  - since (Optional[str]) – Only entries published on or after this date (YYYY-MM-DD).
  - until (Optional[str]) – Only entries published on or before this date (YYYY-MM-DD).
  - search (Optional[str]) – Case-insensitive text search on title and abstract.
  - status (Optional[str]) – Filter by entry status (current DB only).
  - has_doi (bool) – Only entries that have a DOI.
  - has_abstract (bool) – Only entries that have an abstract.
  - sort (str) – Sort key: ‘rank’, ‘date’, or ‘title’.
  - limit (int) – Maximum number of results (0 = unlimited).
  - offset (int) – Number of leading results to skip.
  - json (bool) – Output as JSON.
  - count (bool) – Print only the number of matching entries.
  - fields (Optional[str]) – Comma-separated list of column names to include in the output.
  - config_path (Optional[str]) – Path to main YAML config; defaults to repo config.
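The filtering, sorting, and paging options of query are easier to reason about when applied in a fixed order. The in-memory sketch below filters first, then sorts, then applies offset and limit. It reproduces the documented semantics on plain dicts: inclusive since/until, case-insensitive search over title and abstract, and limit=0 meaning unlimited. Several details are assumptions rather than documented behavior: the record keys, the rank_score field, the descending order for ‘rank’ and ‘date’, and the filter-sort-page order itself.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]  # hypothetical record shape


def query_records(
    records: List[Record],
    *,
    since: Optional[str] = None,
    until: Optional[str] = None,
    search: Optional[str] = None,
    sort: str = "rank",
    limit: int = 20,
    offset: int = 0,
    fields: Optional[str] = None,
) -> List[Record]:
    """Sketch: filter, sort, then page records as the query parameters describe."""
    rows = list(records)
    # Compare only the YYYY-MM-DD prefix so timestamps and plain dates both work.
    if since is not None:
        rows = [r for r in rows if r["published_date"][:10] >= since]
    if until is not None:
        rows = [r for r in rows if r["published_date"][:10] <= until]
    if search is not None:
        needle = search.casefold()
        rows = [
            r for r in rows
            if needle in (r.get("title") or "").casefold()
            or needle in (r.get("abstract") or "").casefold()
        ]
    if sort == "rank":  # highest score first (assumed)
        rows.sort(key=lambda r: r.get("rank_score") or 0.0, reverse=True)
    elif sort == "date":  # newest first (assumed)
        rows.sort(key=lambda r: r["published_date"], reverse=True)
    elif sort == "title":
        rows.sort(key=lambda r: (r.get("title") or "").casefold())
    else:
        raise ValueError(f"unknown sort key: {sort!r}")
    rows = rows[offset:] if limit == 0 else rows[offset:offset + limit]
    if fields:
        columns = [c.strip() for c in fields.split(",") if c.strip()]
        rows = [{c: r.get(c) for c in columns} for r in rows]
    return rows
```

In this sketch, the json and count flags would amount to json.dumps(rows) and len(rows) on the result.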
- paper_firehose.rank(topic=None, config_path=None)
Compute rank scores and write them into papers.db for the given topic, or for all topics when topic is omitted.
- paper_firehose.status(config_path=None)
Return configuration and environment status for programmatic use.
Modules
- Command-line entry point for Paper Firehose.