Add export feature for Articles #530

Merged
merged 6 commits into from
Jun 13, 2024
addie9800
Collaborator

This PR adds a to_json() function to the Article class and adds the option to save all articles to file.
Closes #529
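
As a rough illustration of what this PR describes, the sketch below uses a stand-in `Article` dataclass (not the project's real class) with a `to_json()` method, plus a hypothetical `save_articles` helper that writes every article to a single file. The field names, the helper, and the output layout are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class Article:
    # Stand-in for the real Article class; these fields are assumptions.
    title: str
    body: str

    def to_json(self) -> Dict[str, Any]:
        # Return a JSON-serializable dictionary of the article's fields.
        return asdict(self)


def save_articles(articles: List[Article], path: Path) -> None:
    # Hypothetical helper: dump all articles into one JSON array on disk.
    with path.open("w", encoding="utf-8") as file:
        json.dump([article.to_json() for article in articles], file, ensure_ascii=False, indent=4)
```

Writing one JSON array keeps the file loadable with a single `json.load` call; a line-per-article format would be a reasonable alternative for very large crawls.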

@addie9800 addie9800 requested a review from dobbersc June 5, 2024 11:40
@MaxDall
Collaborator

MaxDall commented Jun 10, 2024

@addie9800 Thanks for adding this :) I reworked some parts of the code. Can you give it a quick look and your feedback?

Collaborator Author

@addie9800 addie9800 left a comment

This looks really good, thanks for the additions. The only thing I'm wondering is whether or not to include the meta and ld in the saved data by default. That was actually what got me thinking about adding the option of filtering some attributes in the first place, because it's a lot of potentially unnecessary data. What do you think?
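
One way the attribute filtering mentioned here could work is sketched below. It is a minimal example on a stand-in `Article` dataclass, assuming `to_json()` accepts optional attribute names and falls back to all fields when none are given; the signature and the `title` field are assumptions, not necessarily what this PR implements.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class Article:
    # Stand-in class; `meta` and `ld` mirror the attributes discussed above.
    title: str
    meta: Dict[str, Any] = field(default_factory=dict)
    ld: Dict[str, Any] = field(default_factory=dict)

    def to_json(self, *attributes: str) -> Dict[str, Any]:
        # Assumed signature: serialize only the named attributes, or all if none are named.
        names = [f.name for f in fields(self)]
        unknown = set(attributes) - set(names)
        if unknown:
            raise AttributeError(f"Unknown attribute(s): {sorted(unknown)}")
        selected = attributes or names
        return {name: getattr(self, name) for name in selected}
```

With this shape, `article.to_json("title")` would leave out `meta` and `ld`, while `article.to_json()` keeps the include-everything default.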

@addie9800
Collaborator Author

This looks good. I can't approve it, though, since I am the one who opened the PR.

@@ -137,4 +138,10 @@ Should print this:
en
```

## Saving an Article

In case you want to save some or all of the articles (refer to the [`save_to_file` parameter](5_advanced_topics.md#saving-the-crawled-articles) in the next section for the latter), the `Article` class provides a `to_json()` function.
Collaborator

Just a minor thing: Tutorial 5 isn't the next section after this one.

@addie9800 addie9800 merged commit 70050be into master Jun 13, 2024
4 checks passed
@addie9800 addie9800 deleted the article-to-json branch June 13, 2024 11:32
@koswjjnd

Please publish new package

Successfully merging this pull request may close these issues.

[Question]: Best way to save crawled articles
3 participants