paper_firehose.core.abstract_source

Abstract source interface using Python Protocol for structural subtyping.

Provides a unified interface for fetching abstracts from various sources (Crossref, Semantic Scholar, OpenAlex, PubMed) with fallback support.

Functions

get_biomedical_sources()

Return abstract sources optimized for biomedical papers.

get_default_sources()

Return default list of abstract sources in priority order.

Classes

AbstractSource(*args, **kwargs)

Protocol defining the interface for abstract fetching sources.

CrossrefSource([max_retries])

Crossref abstract source with DOI lookup and title search.

OpenAlexSource()

OpenAlex abstract source with inverted-index reconstruction.

PubMedSource()

PubMed abstract source (DOI-based lookup via ESearch + EFetch).

SemanticScholarSource()

Semantic Scholar abstract source (DOI-based lookup).

class paper_firehose.core.abstract_source.AbstractSource(*args, **kwargs)

Bases: Protocol

Protocol defining the interface for abstract fetching sources.

All abstract sources should implement a fetch_abstract() method that accepts a DOI, a title, and optional parameters, and returns the abstract text, or None if no abstract is found.

fetch_abstract(doi=None, title=None, mailto=None, session=None)

Fetch abstract from this source.

Parameters:
  • doi (Optional[str]) – Digital Object Identifier (optional)

  • title (Optional[str]) – Paper title (optional)

  • mailto (Optional[str]) – Contact email for polite API usage (optional)

  • session (Optional[Session]) – requests.Session for connection pooling (optional)

Return type:

Optional[str]

Returns:

Abstract text or None if not found
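
Because AbstractSource is a Protocol, a class satisfies it through its method signature alone, with no inheritance required. The sketch below illustrates this with a locally defined stand-in protocol so that it runs without the package installed; AbstractSourceLike and InMemorySource are hypothetical names for illustration and are not part of paper_firehose.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class AbstractSourceLike(Protocol):
    """Local stand-in mirroring the documented fetch_abstract() signature."""

    def fetch_abstract(
        self,
        doi: Optional[str] = None,
        title: Optional[str] = None,
        mailto: Optional[str] = None,
        session: Optional[Any] = None,
    ) -> Optional[str]: ...


class InMemorySource:
    """Hypothetical source backed by a dict. It does not inherit from the protocol."""

    def __init__(self, abstracts: dict[str, str]) -> None:
        self._abstracts = abstracts

    def fetch_abstract(self, doi=None, title=None, mailto=None, session=None):
        # Only DOI lookup is supported in this toy source.
        if doi is None:
            return None
        return self._abstracts.get(doi.lower())


source = InMemorySource({"10.1234/example": "A short abstract."})

# Structural check: passes because a fetch_abstract method is present.
print(isinstance(source, AbstractSourceLike))
print(source.fetch_abstract(doi="10.1234/EXAMPLE"))
```

Note that a runtime isinstance() check against a Protocol only verifies that the method exists; argument and return types are checked by a static type checker, not at runtime.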

class paper_firehose.core.abstract_source.CrossrefSource(max_retries=3)

Bases: object

Crossref abstract source with DOI lookup and title search.

Parameters:

max_retries (int)

fetch_abstract(doi=None, title=None, mailto=None, session=None)

Fetch abstract from Crossref by DOI or title.

Return type:

Optional[str]

Parameters:
  • doi (str | None)

  • title (str | None)

  • mailto (str | None)

  • session (Session | None)
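
To make the two lookup modes concrete, here is a minimal sketch of how a DOI lookup and a title search could be addressed against the public Crossref REST API. The helper crossref_request is hypothetical, and the choice of the query.bibliographic and rows parameters is an assumption; the documentation above does not state which parameters CrossrefSource actually sends.

```python
from typing import Any, Optional
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_request(
    doi: Optional[str] = None,
    title: Optional[str] = None,
    mailto: Optional[str] = None,
) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (url, params) for a Crossref request, preferring DOI over title."""
    params: dict[str, Any] = {}
    if mailto:
        # Crossref routes requests that identify a contact email to its "polite" pool.
        params["mailto"] = mailto
    if doi:
        # Direct lookup: the DOI is part of the path, so keep its slash unescaped.
        return f"{CROSSREF_WORKS}/{quote(doi, safe='/')}", params
    if title:
        # Fallback: bibliographic search, keeping only the top hit (assumed choice).
        params["query.bibliographic"] = title
        params["rows"] = 1
        return CROSSREF_WORKS, params
    return None


print(crossref_request(doi="10.1234/example", mailto="me@example.org"))
print(crossref_request(title="Deep learning"))
```

The returned pair can be passed to an HTTP client, for example session.get(url, params=params) on a requests.Session.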

class paper_firehose.core.abstract_source.OpenAlexSource

Bases: object

OpenAlex abstract source with inverted-index reconstruction.

fetch_abstract(doi=None, title=None, mailto=None, session=None)

Fetch abstract from OpenAlex by DOI.

Return type:

Optional[str]

Parameters:
  • doi (str | None)

  • title (str | None)

  • mailto (str | None)

  • session (Session | None)
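
OpenAlex does not return abstracts as plain text; it exposes an abstract_inverted_index field that maps each word to the positions where it occurs. The following is a minimal sketch of the reconstruction step, not the library's own code; the function name reconstruct_abstract is hypothetical.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a word -> positions mapping."""
    words_by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    # Emit the words in ascending position order, separated by single spaces.
    return " ".join(words_by_position[i] for i in sorted(words_by_position))


example_index = {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]}
print(reconstruct_abstract(example_index))  # the cat chased the mouse
```

Sorting the positions rather than assuming a dense 0..n range keeps the function tolerant of gaps in the index.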

class paper_firehose.core.abstract_source.PubMedSource

Bases: object

PubMed abstract source (DOI-based lookup via ESearch + EFetch).

fetch_abstract(doi=None, title=None, mailto=None, session=None)

Fetch abstract from PubMed by DOI.

Return type:

Optional[str]

Parameters:
  • doi (str | None)

  • title (str | None)

  • mailto (str | None)

  • session (Session | None)
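
The ESearch + EFetch flow is a two-step lookup: ESearch resolves the DOI to a PubMed ID, and EFetch returns the record for that ID. The sketch below shows one plausible shape for the two request URLs and for extracting the abstract from EFetch XML. The helper names, the [AID] field tag, and the chosen retmode values are assumptions; the documentation above does not specify what PubMedSource sends.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(doi: str, mailto: Optional[str] = None) -> str:
    """Step 1: search PubMed for the record whose article identifier is the DOI."""
    params = {"db": "pubmed", "term": f"{doi}[AID]", "retmode": "json"}
    if mailto:
        params["email"] = mailto  # E-utilities accept a contact email
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmid: str) -> str:
    """Step 2: fetch the full record for the PubMed ID found by ESearch."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def extract_abstract(efetch_xml: str) -> Optional[str]:
    """Join all AbstractText sections of an EFetch XML response."""
    root = ET.fromstring(efetch_xml)
    parts = ["".join(node.itertext()).strip() for node in root.iter("AbstractText")]
    parts = [part for part in parts if part]
    return " ".join(parts) if parts else None


# Hand-written sample for illustration, trimmed to the elements used here.
sample_xml = (
    "<PubmedArticleSet><PubmedArticle><MedlineCitation><Article><Abstract>"
    '<AbstractText Label="BACKGROUND">First part.</AbstractText>'
    '<AbstractText Label="RESULTS">Second part.</AbstractText>'
    "</Abstract></Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"
)
print(esearch_url("10.1234/example"))
print(efetch_url("12345"))
print(extract_abstract(sample_xml))
```

Structured abstracts arrive as several labeled AbstractText elements, which is why the sections are joined rather than only the first one being read.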

class paper_firehose.core.abstract_source.SemanticScholarSource

Bases: object

Semantic Scholar abstract source (DOI-based lookup).

fetch_abstract(doi=None, title=None, mailto=None, session=None)

Fetch abstract from Semantic Scholar by DOI.

Return type:

Optional[str]

Parameters:
  • doi (str | None)

  • title (str | None)

  • mailto (str | None)

  • session (Session | None)

paper_firehose.core.abstract_source.get_biomedical_sources()

Return abstract sources optimized for biomedical papers.

Order: PubMed (best for PNAS and other biomedical papers), Crossref, Semantic Scholar, OpenAlex.

Return type:

list[AbstractSource]

Returns:

List of AbstractSource instances

paper_firehose.core.abstract_source.get_default_sources()

Return default list of abstract sources in priority order.

Order: Crossref (most comprehensive), Semantic Scholar, OpenAlex, PubMed.

Return type:

list[AbstractSource]

Returns:

List of AbstractSource instances
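
The priority-ordered lists returned by these functions lend themselves to a simple fallback loop: try each source in turn and stop at the first non-empty abstract. The sketch below is one way a caller might do this, not the package's own fallback code. fetch_with_fallback and StubSource are hypothetical, and treating an exception from one source as a miss is an assumption.

```python
from typing import Any, Iterable, Optional


def fetch_with_fallback(
    sources: Iterable[Any],
    doi: Optional[str] = None,
    title: Optional[str] = None,
    mailto: Optional[str] = None,
    session: Optional[Any] = None,
) -> Optional[str]:
    """Return the first non-empty abstract from sources tried in order."""
    for source in sources:
        try:
            abstract = source.fetch_abstract(
                doi=doi, title=title, mailto=mailto, session=session
            )
        except Exception:
            # Assumed policy: a failing source should not block the ones after it.
            continue
        if abstract:
            return abstract
    return None


class StubSource:
    """Hypothetical stand-in for a real source, used to exercise the loop offline."""

    def __init__(self, result: Optional[str] = None, fail: bool = False) -> None:
        self.result = result
        self.fail = fail

    def fetch_abstract(self, doi=None, title=None, mailto=None, session=None):
        if self.fail:
            raise RuntimeError("simulated network failure")
        return self.result


stubs = [StubSource(fail=True), StubSource(None), StubSource("Found by the third source.")]
result = fetch_with_fallback(stubs, doi="10.1234/example")
print(result)
```

With the real package, the sources argument would be the list returned by get_default_sources() or get_biomedical_sources().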