Overhaul priviblur-extractor #98

Merged
merged 38 commits into from
Oct 4, 2024


syeopite
Owner

@syeopite syeopite commented Sep 19, 2024

This PR cleans up the priviblur-extractor module significantly.

The internal logic remains mostly unchanged, but the file structure and method order were altered significantly to make the module easier to maintain and understand.

I'd eventually like to split this out into a separate installable Python package so that other projects can use it, and so that we can add tests specific to the extractor in a more organized way.

Use None for the default value of URL params

Remove unreachable logic in _get_json

Remove unused variable
Converts .process into a classmethod that initializes and
calls the .parse method of the corresponding parser when
a match is found
Removes the usage of the ELEMENT_PARSERS constant
Merged directly into logic of post parser's parse method
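
As a rough illustration of the `.process` classmethod pattern described in the commit above, here is a minimal sketch. The class name `TextBlockParser`, the `"type"` key, and the shape of the returned dict are hypothetical and are not taken from the priviblur-extractor source; only the idea (a classmethod that instantiates the parser and calls `.parse` when the input matches) comes from the commit message.

```python
from __future__ import annotations

from typing import Any, Optional


class TextBlockParser:
    """Hypothetical parser for one element type (illustrative only)."""

    def __init__(self, target: dict[str, Any]) -> None:
        self.target = target

    @classmethod
    def process(cls, target: dict[str, Any]) -> Optional[dict[str, Any]]:
        # Build a parser instance only when the input matches this parser;
        # otherwise signal "no match" so the caller can try another parser.
        if target.get("type") == "text":
            return cls(target).parse()
        return None

    def parse(self) -> dict[str, Any]:
        # Turn the raw element into a simple parsed representation.
        return {"kind": "text", "text": self.target.get("text", "")}
```

With this shape, a caller can try each parser's `process` in turn and does not need a separate lookup table that maps element types to parsers.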
BlogThemeParser can simply be a method of BlogInfoParser
BlogParser -> BlogTimelineParser
BlogInfoParser -> BlogParser

These new names more accurately reflect what the two parsers
actually produce.

The old BlogInfoParser parses the actual blog, which contains the name,
header, etc., while the old BlogParser parses a simple object that stores
the blog info (by calling BlogInfoParser), a list of posts, and a cursor.
A blog's theme attribute should always exist
It is more efficient to directly initialize the parser and call .parse()
than to rely on a parameter that essentially does the same
thing.
There is no need for the complicated error handling that we previously
had in the logic for parsing trail posts.

Tumblr's API is in general quite stable and there shouldn't be any missing
fields.
Blog model is used to represent the contents of a blog and not
the blog itself. As such it has been renamed.
Any changes to `fields[blog]` could potentially break parsing.
In addition, the attribute should match real Tumblr API requests
as much as possible.
True bool is converted to "True" and not "true"
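
The last commit refers to a general Python behavior: `str(True)` yields `"True"`, whereas query strings usually expect the lowercase `"true"`. A minimal sketch of one way to handle this is below. The helper `encode_params` and the parameter names are hypothetical and not taken from the extractor; skipping `None` values is likewise an assumption about how `None` defaults for URL params might be treated.

```python
from urllib.parse import urlencode


def encode_params(params):
    """Hypothetical helper: build a query string from a dict of params."""
    encoded = {}
    for key, value in params.items():
        if value is None:
            # Assumption: a None default means "do not send this param".
            continue
        if isinstance(value, bool):
            # str(True) would give "True"; emit lowercase instead.
            encoded[key] = "true" if value else "false"
        else:
            encoded[key] = str(value)
    return urlencode(encoded)
```

The `bool` check has to run before the generic `str()` conversion, because `str()` alone would produce the capitalized form.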
@syeopite syeopite marked this pull request as ready for review October 4, 2024 05:14
@syeopite syeopite merged commit 0c75acb into master Oct 4, 2024
@syeopite syeopite deleted the extractor-refactor branch October 4, 2024 05:14
syeopite added a commit that referenced this pull request Oct 4, 2024
The custom defined jinja test "a_post" pointed to an old
location for the Post model that did not reflect the changes
seen in #98.

This commit updates it to use the proper location

Closes #106