-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add version parsing #661
base: images
Are you sure you want to change the base?
add version parsing #661
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, this is a really useful addition. I do have a couple comments and questions though :)
def __repr__(self) -> str: | ||
return f"{self.width}x{self.height or '...'}" | ||
|
||
def __lt__(self, other: "Dimension") -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be a bit cleaner to add type check for other
to cause an accurate exception to be raised.
src/fundus/parser/data.py
Outdated
min_width: Optional[int] = None | ||
size: Optional[Dimension] = None | ||
type: Optional[str] = None | ||
file_formats: ClassVar[List[str]] = ["png", "jpg", "jpeg", "webp"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this as a Class Variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue yes, cause ultimately it belongs solely to the class. As a compromise, I changed it to __FILE_FORMATS__
src/fundus/parser/utility.py
Outdated
return None | ||
|
||
|
||
_min_width_pattern = re.compile(r"min-width:\s*(?P<minwidth>[0-9]*)(?P<descriptor>[pxem]+)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to some css docs, possible units are also %
and ch
|
||
|
||
_min_width_pattern = re.compile(r"min-width:\s*(?P<minwidth>[0-9]*)(?P<descriptor>[pxem]+)") | ||
_width_x_height_pattern = re.compile(r"(?P<width>[0-9]+)x(?P<height>[0-9]+)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to also match max-width
, which is sometimes used instead of min-width
for smaller images.
e.g. https://www.zeit.de/gesundheit/2023-11/neues-alzheimer-medikament-zulassung-lecanemab-leqembi-demenz
src/fundus/parser/utility.py
Outdated
if descriptor != "px": | ||
logger.debug(f"Pixel calculation not implemented for {descriptor}") | ||
else: | ||
min_width = int(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like an iteration over all matches is not the best solution here. Granted, we would only expect one match, but in case there are more, overwriting all previous matches seems not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote the logic. We now assume that there is only one max-/min-width per query string (separated through ,
) and either a max- or min-width defined.
if descriptor is not None: | ||
if match := re.search(r"(?P<multiplier>[0-9.]+)x", descriptor): | ||
kwargs["dpr"] = float(match.group("multiplier")) | ||
elif match := re.search(r"(?P<width>[0-9]+)(px|w)", descriptor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other technically allowed units of measurement are %
and em
, also w
doesn't seem to be a valid unit (https://www.w3schools.com/cssref/css_units.php)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URLs coming are coming from the srcset
attribute. Allowed descriptors are x
, w
, I also found some px
src/fundus/utils/regex.py
Outdated
if value is not None: | ||
matches[key] = conversion(value) if conversion is not None else value | ||
elif keep_none: | ||
matches[key] = match_dict[key] or value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are match_dict[key]
and value
not equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that's a typo and should be match
This PR reworks the versioning system of images, adds extended serialization and parses size for image versions.