Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add keukenliefde parser #877

Merged
merged 5 commits into from
Oct 3, 2023
Merged

Conversation

jaapio
Copy link
Contributor

@jaapio jaapio commented Sep 30, 2023

This patch adds support for keukenliefde.nl

@jaapio
Copy link
Contributor Author

jaapio commented Oct 2, 2023

@jayaddison i think it is better now, I merged the 2 loops into one large one.

@jayaddison
Copy link
Collaborator

Thanks! Looking pretty good - do you have an example recipe where the listitems (li tags) are used, to check the behaviour of those?

@jaapio
Copy link
Contributor Author

jaapio commented Oct 2, 2023

I was just testing a bit more, but found some recipe that didn't work because the html is different, can I add a second test somehow to cover the other situations?

@jayaddison
Copy link
Collaborator

I was just testing a bit more, but found some recipe that didn't work because the html is different, can I add a second test somehow to cover the other situations?

Yep, you certainly can - for that I'd recommend copying the approach taking by an existing multi-test scraper.

For example, two tests for The Clever Carrot (note the different test_file_name setting in particular):

@jaapio
Copy link
Contributor Author

jaapio commented Oct 2, 2023

Thanks, will have a look, I found out that some old recipes are different in html format, which will make it a challenge to make this work. But that's mostly my issue as this site seems to be a mess :-P

The site handles different formats depending on the age of the
recipe, so add a second test case that covers this behavior.
The oldest recipes do not have the nice classes to find elements,
so we do need text matching on the headers.
@jaapio
Copy link
Contributor Author

jaapio commented Oct 2, 2023

I added 2 more test cases which brings the total number at 3:

  1. normal case with paragraphs for the instructions
  2. normal case with list items for the instructions
  3. legacy format recipe without classes to match, which requires some guessing

I tried to extract methods when possible to reduce the number of duplicated lines in the code. And it should now not crash any more when something is a bit different from what we expected.

Please let me know what you think of this :-), should I extract the legacy format to another class or is it ok like this?

@jaapio jaapio requested a review from jayaddison October 2, 2023 21:57
@jayaddison
Copy link
Collaborator

should I extract the legacy format to another class or is it ok like this?

Hmm, good question. Roughly speaking: I think the current approach with multiple fallbacks in a single class is fine here.

Trying to find a reason why that is / general guidance: firstly it's often down to what makes the code easiest to manage, with a large amount of personal preference. And by personal preference, I mean the scraper author (you in this case :)). So if you want, experiment with the alternative, and if you find that you have a clear preference, we can go with that.

The other consideration would be how much of the structure of the HTML page is shared. Generally I'd say that if most of the page is the same, then re-using a single class probably makes more sense. If there are three completely different page structures, then three different classes might be more likely to make sense.

@jaapio
Copy link
Contributor Author

jaapio commented Oct 3, 2023

Think this is done for now, please let me know if other changes are required

@jayaddison
Copy link
Collaborator

Yep, looks good to me @jaapio - thank you for this contribution. If I had one minor nitpick it would be to add the same language coverage in all three test modules for consistency. That's optional though: I plan to merge this soon (within the next 30 mins or so) either way.

@jaapio
Copy link
Contributor Author

jaapio commented Oct 3, 2023

Here you go sir, thanks for this project and your assistance!

As I'm working through my bookmarked recipes you can expect more contributions from my side.

@jayaddison
Copy link
Collaborator

Excellent, thanks! And you're welcome. Please do, and any feedback on improving the development process here would be gratefully received too.

@jayaddison jayaddison merged commit a2abd42 into hhursev:main Oct 3, 2023
16 checks passed
@jaapio jaapio deleted the feat/keukenliefde branch October 3, 2023 07:20
strangetom pushed a commit to strangetom/recipe-scrapers that referenced this pull request Nov 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants