Using Scrapy, Playwright, and trafilatura. Getting a KeyError: None on certain pages.
KeyError: None means the code is looking up a key in the HTML_TAG_MAPPING dictionary using a value that is None. The error is raised inside the trafilatura library, in the htmlprocessing.py file.
elem.get('rend') returns None when the element has no rend attribute. To fix this, either make sure that call never returns None, or handle that case explicitly when it does.
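Here is a step-by-step plan to address this issue:
Locate the code: identify where the elem.get('rend') call is made.
Handle None values: add a check for cases where elem.get('rend') returns None.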
Example Fix
In the trafilatura/htmlprocessing.py file, locate the following line:
"hi": lambda elem: HTML_TAG_MAPPING[elem.get('rend')]
Update it to handle None values:
"hi": lambda elem: HTML_TAG_MAPPING.get(elem.get('rend'), default_value)
Here, default_value is what the lookup returns when the key is missing, so it should be a valid value from the HTML_TAG_MAPPING dictionary (the fallback you want to use), not a key.
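The difference between the two lookups can be reproduced outside trafilatura. This is only a minimal sketch: the mapping entries, the DEFAULT_TAG fallback, and the strict_hi/safe_hi helper names are assumptions made up for illustration, and the standard library's ElementTree stands in for the element type trafilatura actually uses.

```python
from xml.etree.ElementTree import Element

# Stand-in for trafilatura's HTML_TAG_MAPPING (assumed entries, illustration only).
HTML_TAG_MAPPING = {"#b": "b", "#i": "i"}

# Hypothetical fallback, returned when 'rend' is missing or not in the mapping.
DEFAULT_TAG = "span"


def strict_hi(elem):
    # Original pattern: indexing raises KeyError when 'rend' is absent (key is None).
    return HTML_TAG_MAPPING[elem.get("rend")]


def safe_hi(elem):
    # Patched pattern: dict.get returns DEFAULT_TAG instead of raising.
    return HTML_TAG_MAPPING.get(elem.get("rend"), DEFAULT_TAG)


bold = Element("hi", {"rend": "#b"})
bare = Element("hi")  # no 'rend' attribute, so elem.get("rend") is None

try:
    strict_hi(bare)
    strict_raised = False
except KeyError:
    strict_raised = True

print(strict_raised)   # True
print(safe_hi(bold))   # b
print(safe_hi(bare))   # span
```

Whether a silent fallback is the right behavior depends on what the caller does with the result; skipping the element or logging the missing attribute may be a better fit in some pipelines.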
Summary
Locate the issue: find where elem.get('rend') is called.
Handle None values: use HTML_TAG_MAPPING.get(elem.get('rend'), default_value) so that a missing rend attribute falls back to default_value.
This change avoids the KeyError: None when elem.get('rend') returns None.