paper_firehose.processors.feed_processor

RSS feed processing functionality. Fetches RSS feeds, applies regex filters, and manages entry storage.

Classes

FeedProcessor(db_manager, config_manager)

Processes RSS feeds with regex filtering and database storage.

class paper_firehose.processors.feed_processor.FeedProcessor(db_manager, config_manager)

Bases: object

Processes RSS feeds with regex filtering and database storage.

Parameters:
  • db_manager

  • config_manager

apply_filters(entries_per_feed, topic_name)

Apply regex filters to entries and return matched entries.

Parameters:
  • entries_per_feed (Dict[str, List[Dict[str, Any]]]) – Dict mapping feed names to entry lists

  • topic_name (str) – Name of the topic to filter for

Return type:

List[Dict[str, Any]]

Returns:

List of entries that match the topic’s regex filter
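The filtering step can be illustrated with a minimal, self-contained sketch. It does not reproduce the library's implementation: the helper name `filter_entries`, the `title` and `summary` entry keys, the added `feed_name` key, and the case-insensitive match are all assumptions made for illustration. It only shows how a dict of per-feed entry lists might be flattened into a single list of regex matches.

```python
import re
from typing import Any, Dict, List


def filter_entries(
    entries_per_feed: Dict[str, List[Dict[str, Any]]],
    pattern: str,
) -> List[Dict[str, Any]]:
    """Return every entry whose title or summary matches ``pattern``.

    Hypothetical helper, not part of paper_firehose.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matched: List[Dict[str, Any]] = []
    for feed_name, entries in entries_per_feed.items():
        for entry in entries:
            # Assumption: entries carry "title" and "summary" keys.
            text = f"{entry.get('title', '')} {entry.get('summary', '')}"
            if regex.search(text):
                # Record the originating feed without mutating the input.
                matched.append({**entry, "feed_name": feed_name})
    return matched


entries_per_feed = {
    "feed_a": [
        {"title": "Graphene superconductivity at the magic angle", "summary": ""},
        {"title": "A survey of bird migration", "summary": ""},
    ],
    "feed_b": [
        {"title": "Notes", "summary": "Twisted bilayer graphene devices"},
    ],
}
matches = filter_entries(entries_per_feed, r"graphene")
```

In this sketch the input dict is left untouched, so the same fetched entries could be filtered again for another pattern.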

fetch_feeds(topic_name)

Fetch RSS feeds for a topic and return new entries.

Parameters:

topic_name (str)

Return type:

Dict[str, List[Dict[str, Any]]]

Returns:

Dict mapping feed names to lists of new entries
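How "new" entries are determined is not specified here. As a rough sketch of one possible approach, the hypothetical helper below keeps only entries whose identifier has not been seen before, and returns the same feed-name-to-entries shape as `fetch_feeds`. The `id` key and the `seen_ids` set are assumptions, and no network access is involved.

```python
from typing import Any, Dict, List, Set


def select_new_entries(
    fetched: Dict[str, List[Dict[str, Any]]],
    seen_ids: Set[str],
) -> Dict[str, List[Dict[str, Any]]]:
    """Drop entries whose "id" is already known.

    Hypothetical helper, not part of paper_firehose.
    """
    # Work on a copy so the caller's set is not modified.
    seen = set(seen_ids)
    new_entries: Dict[str, List[Dict[str, Any]]] = {}
    for feed_name, entries in fetched.items():
        fresh: List[Dict[str, Any]] = []
        for entry in entries:
            entry_id = entry["id"]  # assumption: every entry has an "id"
            if entry_id not in seen:
                # Also guards against the same item appearing in two feeds.
                seen.add(entry_id)
                fresh.append(entry)
        new_entries[feed_name] = fresh
    return new_entries


fetched = {
    "feed_a": [{"id": "a1"}, {"id": "a2"}],
    "feed_b": [{"id": "a2"}, {"id": "b1"}],
}
new_entries = select_new_entries(fetched, seen_ids={"a1"})
```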

save_all_entries_to_dedup_db(all_entries_per_feed)

Save ALL processed entries to all_feed_entries.db for deduplication.

Parameters:

all_entries_per_feed (Dict[str, List[Dict[str, Any]]])
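The idea of a deduplication store can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the table name `entries`, its columns, the `id` and `title` entry keys, and the helper `save_entries` are hypothetical and do not describe the actual schema of all_feed_entries.db. An in-memory database is used so the example has no side effects.

```python
import sqlite3
from typing import Any, Dict, List


def save_entries(
    conn: sqlite3.Connection,
    all_entries_per_feed: Dict[str, List[Dict[str, Any]]],
) -> int:
    """Insert entries, skipping known IDs; return the number of rows added.

    Hypothetical helper with a hypothetical schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "entry_id TEXT PRIMARY KEY, feed_name TEXT NOT NULL, title TEXT)"
    )
    rows = [
        (entry["id"], feed_name, entry.get("title", ""))
        for feed_name, entries in all_entries_per_feed.items()
        for entry in entries
    ]
    count_sql = "SELECT COUNT(*) FROM entries"
    before = conn.execute(count_sql).fetchone()[0]
    with conn:  # commits on success, rolls back on error
        # The primary key plus OR IGNORE makes repeated saves idempotent.
        conn.executemany("INSERT OR IGNORE INTO entries VALUES (?, ?, ?)", rows)
    return conn.execute(count_sql).fetchone()[0] - before


conn = sqlite3.connect(":memory:")
batch = {
    "feed_a": [{"id": "a1", "title": "First"}, {"id": "a2", "title": "Second"}],
}
first_run = save_entries(conn, batch)
second_run = save_entries(conn, batch)  # same batch again: nothing new
```

Keying the table on the entry identifier means a later run can test membership cheaply, which is the property a deduplication database needs.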