Both rss-fetcher and story-indexer contain tests for non-news URLs based on the NON_NEWS_DOMAINS list from mcmetadata's urls.py.
rss-fetcher uses:
```python
# tasks.py
if s.domain in mcmetadata.urls.NON_NEWS_DOMAINS:
```
which only catches cases where the fully qualified domain name (FQDN) EXACTLY matches an entry in NON_NEWS_DOMAINS, while story-indexer has a utility function that also matches anything INSIDE the embargoed domains (i.e., any subdomain):
```python
from mcmetadata.urls import NON_NEWS_DOMAINS


def non_news_fqdn(fqdn: str) -> bool:
    """
    check if a FQDN (fully qualified domain name, i.e., DNS name)
    is (in) a domain embargoed as "non-news"
    maybe belongs in mcmetadata??
    """
    # could be written as "any" on a comprehension:
    # looks like that's 15% slower in Python 3.10,
    # and harder for me to... comprehend!
    fqdn = fqdn.lower()
    for nnd in NON_NEWS_DOMAINS:
        # match the domain itself, or any name inside it (e.g. "www." + nnd)
        if fqdn == nnd or fqdn.endswith("." + nnd):
            return True
    return False
```
I'd like to be able to use this function in rss-fetcher!
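To illustrate what rss-fetcher would gain, here is a self-contained sketch that puts the exact-membership test from tasks.py next to the story-indexer suffix test. The two-entry SAMPLE_NON_NEWS_DOMAINS list and the exact_match helper are hypothetical stand-ins (the real list lives in mcmetadata.urls), and the domain list is passed as a parameter only so the example runs without mcmetadata installed.

```python
# Hypothetical sample entries standing in for mcmetadata.urls.NON_NEWS_DOMAINS.
SAMPLE_NON_NEWS_DOMAINS = ["facebook.com", "youtube.com"]


def exact_match(fqdn: str, domains: list[str]) -> bool:
    """rss-fetcher style: the FQDN must equal a list entry exactly."""
    return fqdn in domains


def non_news_fqdn(fqdn: str, domains: list[str]) -> bool:
    """story-indexer style: the FQDN equals an entry or lies inside one."""
    fqdn = fqdn.lower()
    for nnd in domains:
        if fqdn == nnd or fqdn.endswith("." + nnd):
            return True
    return False


if __name__ == "__main__":
    for host in ["facebook.com", "m.facebook.com", "WWW.YouTube.com", "notfacebook.com"]:
        print(host,
              exact_match(host, SAMPLE_NON_NEWS_DOMAINS),
              non_news_fqdn(host, SAMPLE_NON_NEWS_DOMAINS))
```

The last host shows why the function prepends "." before calling endswith: without it, "notfacebook.com" would wrongly be treated as being inside "facebook.com".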
NOTE: this code assumes NON_NEWS_DOMAINS is all lower case, which is currently... the case, but that is not enforced or guaranteed, so maybe enforcing it could be added as well?!