usapears: implement redundant-nutrient-value filtering #1294

Merged
merged 8 commits into from
Oct 17, 2024
28 changes: 28 additions & 0 deletions recipe_scrapers/usapears.py
@@ -1,4 +1,7 @@
+import re
+
 from ._abstract import AbstractScraper
+from ._exceptions import ElementNotFoundInHtml
 from ._utils import get_minutes, normalize_string


@@ -25,6 +28,31 @@ def ingredients(self):
             for paragraph in ingredient_elements
         ]

+    def nutrients(self):
+        container = self.soup.find("ul", {"itemprop": "nutrition"})
+        if not container:
+            raise ElementNotFoundInHtml("Could not find nutritional info container")
Collaborator Author

I think I'll merge this as-is, but a note about something I realized while self-reviewing my changes: we're a bit inconsistent throughout the codebase about usage of the arguments to the ElementNotFoundInHtml initializer, and what their purpose is. There might be an opportunity to improve the consistency/functionality there in future.


+        results = {}
+        redundant_pattern = r"<strong>(.+)[:] </strong>"
+        for item in container.find_all("li", {"itemprop": True}):
+            nutrient = item["itemprop"]
+            content = "".join(str(elem) for elem in item.children)
+            if re.match(redundant_pattern, content):
+                content = re.sub(redundant_pattern, "", content)
+            results[nutrient] = content
+
+        corrections = {
+            "carbohydrates": "carbohydrateContent",
+            "protein": "proteinContent",
+            "fat": "fatContent",
+        }
jayaddison marked this conversation as resolved.
+        for mistake, correction in corrections.items():
+            if mistake in results:
+                results[correction] = results.pop(mistake)
+
+        return results
+
     def ratings(self):
         try:
             ratings = self.schema.ratings()
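The filtering in `nutrients()` can be tried in isolation with a minimal sketch like the one below. It reuses the regular expression and the `corrections` mapping from the diff, but replaces the BeautifulSoup traversal with a plain dictionary. The `clean_nutrients` helper and the sample inner-HTML strings are hypothetical, modeled on the values in the updated test data rather than taken from the actual usapears.org markup.

```python
import re

# Same pattern as in the diff: a bold "<label>: " prefix inside each list item.
REDUNDANT_PATTERN = r"<strong>(.+)[:] </strong>"

# Key renames copied from the diff's `corrections` mapping.
CORRECTIONS = {
    "carbohydrates": "carbohydrateContent",
    "protein": "proteinContent",
    "fat": "fatContent",
}


def clean_nutrients(raw):
    """Strip redundant label prefixes and rename non-standard keys.

    `raw` maps an itemprop name to the inner HTML of its <li> element.
    This input shape is an assumption that stands in for the
    BeautifulSoup traversal used by the scraper.
    """
    results = {}
    for nutrient, content in raw.items():
        # Only rewrite values that start with the bold label prefix.
        if re.match(REDUNDANT_PATTERN, content):
            content = re.sub(REDUNDANT_PATTERN, "", content)
        results[nutrient] = content

    # Move values stored under non-standard keys to their schema.org names.
    for mistake, correction in CORRECTIONS.items():
        if mistake in results:
            results[correction] = results.pop(mistake)
    return results


# Hypothetical inner HTML, modeled on the updated test data.
sample = {
    "servingSize": "<strong>Serving Size: </strong>1 Pear",
    "calories": "<strong>Calories: </strong>230",
    "carbohydrates": "<strong>Carbohydrate: </strong>34g",
    "protein": "<strong>Protein: </strong>5g",
}
cleaned = clean_nutrients(sample)
print(cleaned)
```

Values without the bold prefix pass through unchanged, because `re.match` only succeeds when the pattern occurs at the start of the string.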
11 changes: 4 additions & 7 deletions tests/test_data/usapears.org/usapears.json
@@ -23,14 +23,11 @@
   "yields": "6 servings",
   "description": "This simple, tasty pear recipe was created by Chef Jamie Lauren of Absinthe Brasserie and Bar in San Francisco and cookbook author Mollie Katzen. If you can’t find Bosc pears at your local grocery store, red or green Anjou pears also work well.",
   "total_time": 25,
-  "ratings": null,
-  "ratings_count": 4,
   "nutrients": {
-    "servingSize": "Serving Size: 1 Pear",
-    "calories": "Calories: 230",
-    "carbohydrateContent": "Carbohydrate: 34g",
-    "proteinContent": "Protein: 5g",
-    "fiberContent": "Dietary Fiber: 7g"
+    "servingSize": "1 Pear",
+    "calories": "230",
+    "carbohydrateContent": "34g",
+    "proteinContent": "5g"
   },
   "image": "https://usapears.org/wp-content/uploads/2014/10/Sauteed-Bosc-Pears1.jpg"
 }