paper_firehose.core.apis

API client modules for fetching abstracts from various sources.

This package provides a unified interface for fetching abstracts from:
  • Crossref
  • Semantic Scholar
  • OpenAlex
  • PubMed
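Because every fetcher documented here takes a DOI and returns Optional[str], callers can try several sources in turn. The sketch below is an illustration, not part of the package: first_available_abstract and the Fetcher alias are hypothetical names, and treating an exception as "try the next source" is an assumption about how a caller might want to behave.

```python
from typing import Callable, Iterable, Optional

# Hypothetical alias: any callable that maps a DOI to an abstract or None.
Fetcher = Callable[[str], Optional[str]]


def first_available_abstract(doi: str, fetchers: Iterable[Fetcher]) -> Optional[str]:
    """Return the first non-empty abstract produced by the given fetchers."""
    for fetch in fetchers:
        try:
            abstract = fetch(doi)
        except Exception:
            # Assumption: a failing source should not stop the fallback chain.
            continue
        if abstract:
            return abstract
    return None
```

With the package installed, the fetchers could be built with functools.partial, for example partial(get_crossref_abstract, mailto="you@example.org"), followed by get_semantic_scholar_abstract and get_pubmed_abstract_by_doi. The order of sources is the caller's choice.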

paper_firehose.core.apis.get_crossref_abstract(doi, *, mailto, max_retries=3, session=None)

Return the plain-text abstract for DOI or None if not available.

Retries with exponential backoff on 429 and 5xx responses, and honors the Retry-After header when present. Also passes the mailto contact address to Crossref.

Parameters:
  • doi (str) – Digital Object Identifier to look up

  • mailto (str) – Contact email for Crossref User-Agent

  • max_retries (int) – Maximum number of retry attempts (default: 3)

  • session (Optional[Session]) – Optional requests.Session for backward compatibility

Return type:

Optional[str]

Returns:

Plain-text abstract or None if not available
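The retry behavior described for get_crossref_abstract can be sketched as a pure delay calculation. This is a minimal sketch and not the package's implementation: should_retry, retry_delay, the 1-second base, and the 60-second cap are all assumptions chosen for illustration.

```python
from typing import Optional


def should_retry(status_code: int) -> bool:
    """Retry on 429 (rate limited) and on any 5xx server error."""
    return status_code == 429 or 500 <= status_code < 600


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After header takes precedence; otherwise the delay
    doubles on each attempt. `base` and `cap` are illustrative defaults.
    """
    if retry_after is not None:
        try:
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            # Retry-After may also be an HTTP date; this sketch does not
            # parse it and falls back to exponential backoff instead.
            pass
    return min(base * (2 ** attempt), cap)
```

A caller would sleep for retry_delay(attempt, response.headers.get("Retry-After")) between attempts and stop once max_retries is reached.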

paper_firehose.core.apis.get_openalex_abstract(doi, *, mailto, session=None)

Fetch an abstract from OpenAlex by DOI, reconstructing the text when OpenAlex returns it as an inverted index.

Parameters:
  • doi (str) – Digital Object Identifier to look up

  • mailto (str) – Contact email for OpenAlex User-Agent

  • session (Optional[Session]) – Optional requests.Session for backward compatibility

Return type:

Optional[str]

Returns:

Plain-text abstract or None if not available
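OpenAlex serves abstracts in an abstract_inverted_index field, which maps each word to the list of positions where it occurs. The sketch below shows one way to rebuild the running text from that structure; rebuild_abstract is a hypothetical helper name and the package's own reconstruction may differ in detail.

```python
from typing import Dict, List, Optional


def rebuild_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> Optional[str]:
    """Turn a word -> positions mapping back into plain text."""
    if not inverted_index:
        return None
    # Flatten to (position, word) pairs, then order them by position.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    positioned.sort()
    return " ".join(word for _, word in positioned)
```

A word that appears several times carries several positions, so it is emitted once per position.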

paper_firehose.core.apis.get_pubmed_abstract_by_doi(doi, *, session=None)

Look up a DOI in PubMed and return the combined abstract text if available.

Uses ESearch to find PMID by DOI, then EFetch to retrieve the abstract XML.

Parameters:
  • doi (str) – Digital Object Identifier to look up

  • session (Optional[Session]) – Optional requests.Session for backward compatibility

Return type:

Optional[str]

Returns:

Plain-text abstract or None if not available
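The EFetch step returns XML in which a structured abstract is split across several AbstractText elements. The sketch below shows one way to combine them into a single string. combine_abstract_text is a hypothetical helper, and prefixing each part with its Label attribute is an assumption; the source only states that the text is combined.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def combine_abstract_text(efetch_xml: str) -> Optional[str]:
    """Join all AbstractText sections of an EFetch response into one string."""
    root = ET.fromstring(efetch_xml)
    parts = []
    for node in root.iter("AbstractText"):
        # itertext() keeps text nested inside inline markup such as <i>.
        text = " ".join("".join(node.itertext()).split())
        if not text:
            continue
        label = node.get("Label")
        # Assumption: keep section labels (e.g. BACKGROUND) as prefixes.
        parts.append(f"{label}: {text}" if label else text)
    return " ".join(parts) if parts else None
```

Articles without an Abstract element yield None, which matches the documented return type.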

paper_firehose.core.apis.get_semantic_scholar_abstract(doi, *, session=None)

Fetch an abstract from the Semantic Scholar Graph API by DOI (no API key needed).

Parameters:
  • doi (str) – Digital Object Identifier to look up

  • session (Optional[Session]) – Optional requests.Session for backward compatibility

Return type:

Optional[str]

Returns:

Plain-text abstract or None if not available
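A DOI lookup against the Semantic Scholar Graph API amounts to one GET on the paper endpoint with fields=abstract. The sketch below builds that URL and reads the abstract from the decoded JSON. s2_abstract_url and abstract_from_payload are hypothetical helpers, and the exact URL the package builds is not shown in this reference.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def s2_abstract_url(doi: str) -> str:
    """Build the Graph API URL for a DOI, requesting only the abstract."""
    # Keep "/" unescaped so the DOI stays readable in the path.
    return f"{S2_BASE}/DOI:{quote(doi.strip(), safe='/')}?fields=abstract"


def abstract_from_payload(payload: Dict[str, Any]) -> Optional[str]:
    """Return the abstract from a decoded response, or None if absent."""
    abstract = payload.get("abstract")
    if isinstance(abstract, str) and abstract.strip():
        return abstract.strip()
    return None
```

The API returns "abstract": null for papers it indexes without an abstract, which is why the payload check tests the type and not only the key.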

paper_firehose.core.apis.search_crossref_abstract_by_title(title, *, mailto, max_retries=2, session=None)

Best-effort abstract lookup by title when DOI is missing or returns no abstract.

Uses Crossref’s works search endpoint with a bibliographic query. Returns the first item’s abstract if available.

Parameters:
  • title (str) – Paper title to search for

  • mailto (str) – Contact email for Crossref User-Agent

  • max_retries (int) – Maximum number of retry attempts (default: 2)

  • session (Optional[Session]) – Optional requests.Session for backward compatibility

Return type:

Optional[str]

Returns:

Plain-text abstract or None if not available
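The title search can be sketched as a query.bibliographic request against the Crossref works endpoint, followed by reading the abstract of the first returned item. Crossref abstracts arrive as JATS XML fragments, so the sketch strips tags to get plain text. title_search_url and first_item_abstract are hypothetical helpers, rows=1 is an assumption, and the regex-based tag stripping is a deliberately crude stand-in for whatever cleanup the package performs.

```python
import html
import re
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def title_search_url(title: str, mailto: str) -> str:
    """Build a Crossref works search URL for a bibliographic title query."""
    params = {"query.bibliographic": title, "rows": 1, "mailto": mailto}
    return "https://api.crossref.org/works?" + urlencode(params)


def first_item_abstract(payload: Dict[str, Any]) -> Optional[str]:
    """Return the first search hit's abstract as plain text, or None."""
    items = payload.get("message", {}).get("items") or []
    if not items:
        return None
    raw = items[0].get("abstract")
    if not raw:
        return None
    # Crude JATS cleanup: drop tags, decode entities, collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", raw)
    text = " ".join(html.unescape(text).split())
    return text or None
```

Because only the first item is used, a loosely matching title can return the abstract of a different paper, so callers may want to compare the returned title before trusting the result.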

Modules

crossref_client

Crossref API client for fetching paper abstracts.

openalex_client

OpenAlex API client for fetching paper abstracts.

pubmed_client

PubMed API client for fetching paper abstracts.

semantic_scholar_client

Semantic Scholar API client for fetching paper abstracts.