add americastestkitchen.com #1060

Merged
merged 10 commits into from
Apr 22, 2024

Conversation

smilerz
Contributor

@smilerz smilerz commented Apr 12, 2024

ATK is behind a paywall - so this only makes sense to add as part of v15

add tests for ATK scraper
@smilerz
Contributor Author

smilerz commented Apr 16, 2024

I have three questions/requests for feedback on the latest commit.
Is there a way to use HTMLTagStripperPlugin here?
headnote = f"Note: {normalize_string(re.sub(r'<.*?>', '', headnote))}\n"

Is there a way to avoid duplicating the ingredient formatting code between ingredients() and ingredient_groups()?

And finally - in ingredients, sometimes they preface the post_text with ", ...." and sometimes they don't. Is there an elegant way to preface the post text with a space if there isn't some sort of delimiter?

@smilerz
Contributor Author

smilerz commented Apr 16, 2024

One more question: ATK also has a format for instruction_groups(), but they aren't correlated to the ingredient_groups(). As an example, their recipe for lasagna has 3 groups of ingredients named 'meat sauce', 'bechamel', 'noodles and cheese'.

There are 5 instruction groups with no names assigned, but they can be eyeballed to include the above 3 assembly steps and 2 intermediate steps on the actual baking.

I'm not sure if it's worth updating the instruction_groups() or not.

@jayaddison
Collaborator

Is there a way to use HTMLTagStripperPlugin here?

I'm not too familiar with that feature to be honest - but I'll investigate that soon and then will let you know (unless someone else gets to that before me).
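For comparison while that is investigated, here is a minimal standalone sketch of two ways to strip tags from a headnote using only the Python standard library. It does not use HTMLTagStripperPlugin or normalize_string; the helper names (strip_tags, strip_tags_regex) and the sample headnote are hypothetical. The parser-based version also decodes HTML entities, which the regex from the commit leaves untouched.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(fragment):
    """Strip tags with the stdlib HTML parser; entities are decoded."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.text()


def strip_tags_regex(fragment):
    """The regex approach from the commit: removes tags, keeps entities as-is."""
    return re.sub(r"<.*?>", "", fragment)


# Made-up sample input, not taken from the ATK site
headnote = "<p>Use <em>whole</em> milk &amp; fresh basil.</p>"
print(strip_tags(headnote))
print(strip_tags_regex(headnote))
```

Whether the extra entity handling matters depends on what normalize_string already does with the result, which this sketch does not assume.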

Is there a way to avoid duplicating the ingredient formatting code between ingredients() and ingredient_groups()?

Yes, for this there is a helper utility - I'd recommend referring to the _grouping_utils.py documentation for that.

And finally - in ingredients, sometimes they preface the post_text with ", ...." and sometimes they don't. Is there an elegant way to preface the post text with a space if there isn't some sort of delimiter?

Hrm, that's annoying. Maybe we could trim the left of the postText until we find an alphanumeric character; there's no common functionality in the library to do that, though, as far as I'm aware.
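A rough sketch of both ideas, assuming the ingredient text and the post text arrive as plain strings. The function names, the set of delimiter characters, and the sample values are all hypothetical and not part of the library: one helper trims the left of the post text down to the first alphanumeric character, the other keeps an existing delimiter and only adds a space when none is present.

```python
import re

# Characters treated as "already a delimiter" (an assumption for this sketch)
_ATTACHED_DELIMITERS = ",;:."


def trim_to_alnum(post_text):
    """Drop everything before the first letter or digit."""
    return re.sub(r"^[\W_]+", "", post_text)


def join_post_text(base, post_text):
    """Append post_text to base, adding a space unless it starts with a delimiter."""
    post_text = post_text.strip()
    if not post_text:
        return base
    if post_text[0] in _ATTACHED_DELIMITERS:
        # e.g. ", chopped fine" attaches directly to the ingredient
        return f"{base}{post_text}"
    return f"{base} {post_text}"


print(join_post_text("1 cup onion", ", chopped fine"))
print(join_post_text("1 cup onion", "chopped fine"))
print(trim_to_alnum(", chopped fine"))
```

The second helper preserves the site's own punctuation, while the first normalizes it away; which is preferable depends on how the expected test output is meant to read.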

@smilerz
Contributor Author

smilerz commented Apr 16, 2024

I looked at _grouping_utils.py; it uses soup and HTML elements, but the grouping for ATK is already structured JSON data, so it is parsed differently. Or maybe I don't fully understand how to use the util?

@jayaddison
Collaborator

One more question: ATK also has a format for instruction_groups(), but they aren't correlated to the ingredient_groups(). As an example, their recipe for lasagna has 3 groups of ingredients named 'meat sauce', 'bechamel', 'noodles and cheese'.

There are 5 instruction groups with no names assigned, but they can be eyeballed to include the above 3 assembly steps and 2 intermediate steps on the actual baking.

I'm not sure if it's worth updating the instruction_groups() or not.

Interesting!

We don't support groups of instructions at the moment - only groups of ingredients. However, both do make sense in recipes, and it's good to have found an example of that -- and to note the possibility for them to be linked in some cases.

I'm thinking about what to do here, too.

@smilerz
Contributor Author

smilerz commented Apr 16, 2024

Oh, thanks for the clarification - I misread what instructions_list() was all about (and misremembered the name).

@jayaddison
Collaborator

That's OK, they are similar names! :)

@jayaddison
Collaborator

I looked at _grouping_utils.py; it uses soup and HTML elements, but the grouping for ATK is already structured JSON data, so it is parsed differently. Or maybe I don't fully understand how to use the util?

Of course; I forgot that we're using JSON input here. No, I think you've got it - the grouping utils are intended for use with HTML, and they use CSS-based selector queries.
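Since the HTML-oriented grouping utils don't apply to JSON input, one possible way to avoid the duplication is a single formatting helper that both methods call. The sketch below is only an illustration under assumptions: the JSON shape (title, items, qty, measurement, name), the Group dataclass, and the function names are hypothetical stand-ins, not the actual ATK payload or the library's own types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    """Stand-in for the library's ingredient group type (hypothetical)."""
    ingredients: List[str]
    purpose: Optional[str] = None


def format_ingredient(fields):
    """Single place where one JSON ingredient entry becomes a display string."""
    parts = (fields.get("qty"), fields.get("measurement"), fields.get("name"))
    return " ".join(p.strip() for p in parts if p and p.strip())


def ingredients(groups):
    """Flat list of formatted ingredients across all groups."""
    return [format_ingredient(item) for g in groups for item in g["items"]]


def ingredient_groups(groups):
    """Same formatting, but keeping the group titles."""
    return [
        Group(
            ingredients=[format_ingredient(item) for item in g["items"]],
            purpose=g.get("title") or None,
        )
        for g in groups
    ]


# Made-up sample data in the assumed shape
sample = [
    {"title": "Meat Sauce",
     "items": [{"qty": "1", "measurement": "pound", "name": "ground beef"}]},
    {"title": "Bechamel",
     "items": [{"qty": "4", "measurement": "cups", "name": "whole milk"},
               {"qty": "", "measurement": "", "name": "Salt"}]},
]
print(ingredients(sample))
print(ingredient_groups(sample))
```

With this layout, any later change to the formatting (for example the post-text spacing question) only has to be made in format_ingredient.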

I'll try experimenting with the scraper code here to explore other possibilities.

@smilerz
Contributor Author

smilerz commented Apr 16, 2024

Thanks - I appreciate the collaboration.

@jayaddison
Collaborator

You're welcome - thanks for the pull request. I'll provide some more review comments within the next day or two.

@smilerz
Contributor Author

smilerz commented Apr 19, 2024

I saw the tests failed - my mistake, I forgot to update expected values after stripping the leading zeros from commas.

Collaborator

@jayaddison jayaddison left a comment


This looks good to me - thank you @smilerz!

A note also that this scraper may be eligible to include ratings_count info (ref #1061) when that is available.

@jayaddison jayaddison merged commit ee36b05 into hhursev:v15 Apr 22, 2024
17 checks passed
@jayaddison
Collaborator

@smilerz this has been included and released in v14.56.0 (and also the pre-release v15.0.0-rc3) of recipe-scrapers on PyPI.

@smilerz smilerz deleted the 15.0.0-rc2 branch April 25, 2024 15:46
@smilerz
Contributor Author

smilerz commented Apr 25, 2024

@smilerz this has been included and released in v14.56.0 (and also the pre-release v15.0.0-rc3) of recipe-scrapers on PyPI.

It doesn't look like it's in 14.56.0 - which is what I expected since it's behind a paywall.

@jayaddison
Collaborator

Ah, oops - yep, my mistake (that notification was copy-pasted; I forgot about the different circumstance here :/ ). Thanks @smilerz.

@smilerz smilerz restored the 15.0.0-rc2 branch May 2, 2024 15:54
@smilerz smilerz deleted the 15.0.0-rc2 branch May 2, 2024 16:37
jayaddison added a commit that referenced this pull request Jul 28, 2024
Co-authored-by: James Addison <james@reciperadar.com>
@jayaddison jayaddison added the v15 label Jul 28, 2024
jayaddison added a commit that referenced this pull request Jul 31, 2024