Overhaul priviblur-extractor #98

syeopite · 2024-09-19T04:55:58Z

This PR cleans up the priviblur-extractor module significantly.

The internal logic remains mostly unchanged, but the file structure and method order were altered to make the module easier to maintain and understand.

I'd eventually like to split this into a separately installable Python package so that other projects can use it, and so that we can have (well, add in this case) extractor-specific tests in a more organized way.

BlogParser -> BlogTimelineParser; BlogInfoParser -> BlogParser. The new names more accurately reflect what the two parsers actually produce. The old BlogInfoParser parses the blog itself (its name, header, etc.), while the old BlogParser parses a simple object that stores the blog info (by calling BlogInfoParser), a list of posts, and a cursor.
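To illustrate the distinction behind the rename, here is a minimal sketch of the two shapes described above. The field names (`name`, `title`, `posts`, `next_cursor`) are assumptions for illustration and are not taken from the actual priviblur-extractor models.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Blog:
    """The blog itself: name, header, and similar details (fields are hypothetical)."""
    name: str
    title: str = ""


@dataclass
class BlogTimeline:
    """A page of blog contents: the blog info, a list of posts, and a cursor."""
    blog: Blog
    posts: list = field(default_factory=list)
    next_cursor: Optional[str] = None  # hypothetical name for the pagination cursor


timeline = BlogTimeline(blog=Blog(name="example"))
```

Under this reading, a parser named after `Blog` returns only the first object, while the timeline parser returns the wrapper around it.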

The custom-defined Jinja test "a_post" pointed to an old location for the Post model that did not reflect the changes made in #98. This commit updates it to use the proper location. Closes #106

syeopite added 30 commits September 18, 2024 12:46

_get_json: clean up debris from code

d49e3ba

Use None for the default value of URL params; remove unreachable logic in _get_json; remove an unused variable.
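One common reason to default URL params to None is that unset values can then be dropped before the query string is built. The sketch below shows that pattern only; `build_query` and the parameter names are hypothetical and are not the extractor's actual `_get_json` code.

```python
from urllib.parse import urlencode


def build_query(**params):
    """Drop parameters left at their None default, then URL-encode the rest."""
    supplied = {key: value for key, value in params.items() if value is not None}
    return urlencode(supplied)


print(build_query(tag="art", before=None))  # tag=art
```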

Extractor: Refactor parsing.py to a package

c236de7

Refactor parser objects for better syntax

be56de2

Converts .process into a classmethod that initializes and calls the .parse method of the corresponding parser when a match is found
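A rough sketch of that shape, assuming a parser decides whether it matches by inspecting the raw element. The class name, the `objectType` check, and the returned dict are illustrative assumptions, not the real parser code.

```python
class ExampleParser:
    """Hypothetical parser; names and fields are illustrative only."""

    def __init__(self, target):
        self.target = target

    @classmethod
    def process(cls, initial_data):
        # Return None when the element is not something this parser handles.
        if initial_data.get("objectType") != "post":
            return None
        # On a match, initialize the parser and return its parsed result.
        return cls(initial_data).parse()

    def parse(self):
        return {"id": self.target["id"]}


result = ExampleParser.process({"objectType": "post", "id": "1"})
```

Because `process` is a classmethod that builds the instance through `cls`, callers never construct the parser themselves, and subclasses reuse the same entry point.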

parse_item: match parser based on given ones

57f1643

Removes the usage of the ELEMENT_PARSERS constant

Remove unused constants

f6153d0

Extractor: move parse_* functions to separate file

6cd148a

Use cls variable in .process classmethod

672bf2c

Extract collection parsers to separate file

4b6b5f4

Remove underscore prefix from parser names

51af698

Post parser: remove get_placeholder* functions

0fb6c2c

Merged directly into logic of post parser's parse method

Rename TimelineBlogParser to BlogInfoParser

a921e63

Merge BlogThemeParser into BlogInfoParser

e246f82

BlogThemeParser can simply be a method of BlogInfoParser

Rename TimelinePostParser to PostParser

9ab5428

Rename parsers.py to items.py

4ee57e4

Make parse_item default to using only PostParser

598346b

Move post list parse logic to collection_parsers

f96214b

Simplify parse_* functions

3ff74bd

Make CursorParser private

5b2b26d

Remove None return of parse_theme()

d8cbcae

A blog's theme attribute should always exist

BlogParser: remove force_parse param from .process

25e2838

It is more efficient to initialize the parser and call .parse() directly than to rely on a parameter that does essentially the same thing.

Simplify logic for parsing trail posts

fdfa1e2

There is no need for the complicated error handling that we previously had in the logic for parsing trail posts. Tumblr's API is in general quite stable and there shouldn't be any missing fields.

Rename Blog model to BlogTimeline

46667e9

The Blog model represents the contents of a blog rather than the blog itself, so it has been renamed.

Rename TimelineBlog to Blog

3b437c2

Rename TimelinePost* to Post*

761cca0

Extract BlogTimeline and Timeline to separate file

647cfcb

Refactor file structure of models module

ee1e37e

TumblrAPI: Remove ability to set fields[blog]

969658b

Any change to `fields[blog]` could break parsing. In addition, the attribute should match real Tumblr API requests as closely as possible.

TumblrAPI: Remove ability to set reblog_info

a9aa610

TumblrAPI: Use "true" instead of True

f07fcf2

A Python True bool is converted to the string "True", not "true".
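The behavior is easy to reproduce with the standard library, which stringifies values with `str()`. This assumes the query string is built in a comparable way; the PR does not say which encoder the client uses, and `reblog_info` appears here only as an example key.

```python
from urllib.parse import urlencode

# A Python bool is stringified with str(), which yields a capitalized "True".
as_bool = urlencode({"reblog_info": True})
# Passing the literal string keeps the lowercase form.
as_str = urlencode({"reblog_info": "true"})

print(as_bool)  # reblog_info=True
print(as_str)   # reblog_info=true
```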

syeopite added 8 commits September 29, 2024 12:14

Remove non-existent url param for /blog/name/posts

133701c

Minor docstring changes

80785db

TumblrAPI: Remove ability to edit limit parameter

85d6ddf

Shift tumblr error response logging to info lvl

45c6ed8

Add log for unknown internal error code

2f5c007

Remove parser found debug log for timeline parsers

ffc23e6

Bump priviblur model version number

1c4f519

Parser: remove redundant argument to parse_item

f7a8ab9

syeopite marked this pull request as ready for review October 4, 2024 05:14

syeopite merged commit 0c75acb into master Oct 4, 2024

syeopite deleted the extractor-refactor branch October 4, 2024 05:14

syeopite mentioned this pull request Oct 4, 2024

Fix AttributeError on all pages with posts #107

Merged
