paper_firehose.commands.abstracts
Fetch abstracts and populate both papers.db (entries.abstract) and matched_entries_history.db (matched_entries.abstract).
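The dual write into both databases can be illustrated with Python's standard-library sqlite3 module. This is a minimal sketch, not the module's actual implementation. Only the abstract columns (entries.abstract, matched_entries.abstract) come from this page. The key columns (entries.id, matched_entries.entry_id) and the write_abstract helper are hypothetical.

```python
import sqlite3


def write_abstract(papers: sqlite3.Connection, history: sqlite3.Connection,
                   entry_id: str, abstract: str) -> None:
    """Write one abstract to papers.db and matched_entries_history.db.

    Hypothetical helper: the key columns entries.id and
    matched_entries.entry_id are assumptions, not the real schema.
    """
    # papers.db -> entries.abstract; the with-block commits on success
    # and rolls back if the statement raises.
    with papers:
        papers.execute(
            "UPDATE entries SET abstract = ? WHERE id = ?", (abstract, entry_id)
        )
    # matched_entries_history.db -> matched_entries.abstract
    with history:
        history.execute(
            "UPDATE matched_entries SET abstract = ? WHERE entry_id = ?",
            (abstract, entry_id),
        )
```

In use, the two connections would come from sqlite3.connect("papers.db") and sqlite3.connect("matched_entries_history.db"). Keeping one transaction per database means a failure in the history write does not roll back an abstract already committed to papers.db.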
Rules
- First pass: fill arXiv/cond-mat abstracts from the entry summary (no threshold applied).
- Second pass: for rows with rank_score >= threshold, query Crossref (by DOI, then by title search), then fall back to the aggregators Semantic Scholar, OpenAlex, and PubMed.
- Only process topics whose topic YAML sets abstract_fetch.enabled: true.
- Use the per-topic abstract_fetch.rank_threshold if set; otherwise fall back to the global defaults.rank_threshold in config.yaml.
- Respect API rate limits: send a descriptive User-Agent that includes a contact email, and obey Retry-After on 429/503 responses. Default to about 1 request per second.
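The topic gating, threshold resolution, and source ordering above can be sketched in plain Python over already-loaded config dicts. This is a hypothetical illustration, not the package's code. Only the config keys abstract_fetch.enabled, abstract_fetch.rank_threshold, and defaults.rank_threshold, plus the rank_score field, come from this page. The function names, the is_arxiv and summary entry keys, and the fetcher callables are assumptions. For brevity, the sketch folds both passes into one per-entry function.

```python
from typing import Callable, Iterable, Optional

# A fetcher takes an entry dict and returns an abstract or None (hypothetical shape).
Fetcher = Callable[[dict], Optional[str]]


def topic_enabled(topic_cfg: dict) -> bool:
    """True only if the topic YAML sets abstract_fetch.enabled: true."""
    return bool((topic_cfg.get("abstract_fetch") or {}).get("enabled", False))


def resolve_threshold(topic_cfg: dict, main_cfg: dict) -> Optional[float]:
    """Per-topic abstract_fetch.rank_threshold, else defaults.rank_threshold."""
    per_topic = (topic_cfg.get("abstract_fetch") or {}).get("rank_threshold")
    if per_topic is not None:
        return float(per_topic)
    default = (main_cfg.get("defaults") or {}).get("rank_threshold")
    return float(default) if default is not None else None


def fetch_abstract(entry: dict, threshold: Optional[float],
                   fetchers: Iterable[Fetcher]) -> Optional[str]:
    # First pass (no threshold): arXiv/cond-mat entries reuse their summary.
    if entry.get("is_arxiv") and entry.get("summary"):
        return entry["summary"]
    # Second pass: only rows with rank_score >= threshold query external APIs.
    score = entry.get("rank_score")
    if threshold is None or score is None or score < threshold:
        return None
    # Try sources in order; the first non-empty abstract wins.
    for fetch in fetchers:
        abstract = fetch(entry)
        if abstract:
            return abstract
    return None
```

In this sketch, the fetchers argument would be ordered to match the rules: Crossref by DOI, Crossref by title, Semantic Scholar, OpenAlex, then PubMed. Those callables are hypothetical and are not shown here. Passing them as a list keeps the fallback order in one place and makes the chain easy to test with stubs.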
Functions
run – Fetch and write abstracts into papers.db for ranked entries.
- paper_firehose.commands.abstracts.run(config_path, topic=None, *, mailto=None, max_per_topic=None, rps=1.0, output_json=False)
Fetch and write abstracts into papers.db for ranked entries.
- Parameters:
  - config_path (str) – Path to the main configuration file.
  - topic (Optional[str]) – Optional single topic; otherwise process all topics.
  - mailto (Optional[str]) – Contact email for the Crossref User-Agent.
  - max_per_topic (Optional[int]) – Optional cap on the number of fetches per topic.
  - rps (float) – Requests-per-second throttle (default 1.0, about 1 req/s).
  - output_json (bool) – When True, suppress log noise and return a result dict.
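The rps and mailto parameters correspond to the rate-limiting rule. The sketch below shows a simple throttle, a Retry-After parser, and a User-Agent builder. It is a minimal sketch under stated assumptions. The Throttle class, the helper names, the 5-second fallback delay, and the "paper-firehose/0.1" product token are hypothetical and are not taken from the package. The parser handles both Retry-After forms defined by HTTP: delta-seconds and an HTTP-date.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Optional


class Throttle:
    """Space successive wait() calls at least 1/rps seconds apart (hypothetical)."""

    def __init__(self, rps: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # rps <= 0 disables throttling in this sketch.
        self.interval = 1.0 / rps if rps > 0 else 0.0
        self._clock = clock
        self._sleep = sleep
        self._next = 0.0  # earliest time the next request may start

    def wait(self) -> None:
        now = self._clock()
        if now < self._next:
            self._sleep(self._next - now)
            now = self._next
        self._next = now + self.interval


def retry_after_seconds(value: Optional[str], default: float = 5.0) -> float:
    """Seconds to wait for a Retry-After value (delta-seconds or HTTP-date)."""
    if not value:
        return default  # assumed fallback when the header is missing
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # TypeError on older Pythons, ValueError on 3.10+
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means no extra wait.
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def user_agent(mailto: Optional[str]) -> str:
    """Descriptive User-Agent; the product token here is hypothetical."""
    base = "paper-firehose/0.1"
    return f"{base} (mailto:{mailto})" if mailto else base
```

The clock and sleep callables are injectable, so the throttle can be tested without real delays. On a 429 or 503 response, a caller would sleep for retry_after_seconds(response_header) before retrying the same source.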
- Return type:
- Returns:
Result dict when output_json is True, otherwise None.