generalise "applicable" argument #25

lukavdplas · 2024-10-29T11:32:06Z

This implements the suggestion in #8. It changes the "applicable" argument of extractors to cover more use cases. See the issue for examples.

API changes

Currently, the value of applicable, if any, must be a function that takes the metadata as input. In this version, applicable should be an extractor.

Providing a function is still supported, but I've added a DeprecationWarning. Extractors that use applicable can be updated to use a Metadata extractor (possibly with transform to add a custom function).
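To illustrate the migration described above, here is a minimal, self-contained sketch. The `Extractor`, `Metadata` and `Constant` classes below are simplified stand-ins written for this example, not the library's actual code, and the `year` metadata key is hypothetical.

```python
import warnings


class Extractor:
    """Toy stand-in for an extractor base class (assumption, not library code)."""

    def __init__(self, applicable=None, transform=None):
        if callable(applicable) and not isinstance(applicable, Extractor):
            warnings.warn(
                "passing a function as 'applicable' is deprecated; "
                "pass an extractor instead",
                DeprecationWarning,
                stacklevel=2,
            )
        self.applicable = applicable
        self.transform = transform

    def apply(self, **kwargs):
        if not self._is_applicable(**kwargs):
            return None
        result = self._apply(**kwargs)
        return self.transform(result) if self.transform else result

    def _is_applicable(self, **kwargs):
        if self.applicable is None:
            return True
        if isinstance(self.applicable, Extractor):
            return bool(self.applicable.apply(**kwargs))
        # Deprecated path: a plain function that receives the metadata.
        return bool(self.applicable(kwargs.get("metadata")))

    def _apply(self, **kwargs):
        raise NotImplementedError


class Metadata(Extractor):
    """Stand-in that reads one key from the metadata dictionary."""

    def __init__(self, key, **kwargs):
        super().__init__(**kwargs)
        self.key = key

    def _apply(self, metadata=None, **kwargs):
        return (metadata or {}).get(self.key)


class Constant(Extractor):
    """Stand-in that always returns the same value."""

    def __init__(self, value, **kwargs):
        super().__init__(**kwargs)
        self.value = value

    def _apply(self, **kwargs):
        return self.value


# Old style: a function over the metadata (still works, but warns).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = Constant("speech", applicable=lambda metadata: metadata["year"] >= 1900)

# New style: a Metadata extractor, with transform holding the custom check.
new = Constant(
    "speech",
    applicable=Metadata("year", transform=lambda year: year >= 1900),
)
```

Both extractors behave the same for a given document; only the old style triggers the deprecation warning.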

BeritJanssen · 2024-11-06T10:53:00Z

ianalyzer_readers/extract.py

+        if self.applicable is None:
+            return True
+        if isinstance(self.applicable, Extractor):
+            return bool(self.applicable.apply(*nargs, **kwargs))


Won't this check here double the time needed for indexing a given field with applicable? In case we have an "expensive" extractor, such as XML tree parsing, this might be an issue. Perhaps good to warn about this in the documentation of the applicable argument.

Or is the only extractor allowed for the applicable argument the Metadata extractor? In this case, this is not a worry, but then this should be documented.

You mean because there is a processing cost in applying the extractor? If so, yes, that extra processing is added.

But I can't imagine an implementation where you can provide an expensive extractor without incurring its processing cost. How would that even work? Do we need to warn developers that if they provide expensive checks, those checks will actually be run?

I'm genuinely not sure how to formulate such a warning that isn't an obvious statement like "keep in mind that the code you provide needs to be executed as well".
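The cost question can be made concrete with a small counting sketch. `CountingExtractor` is a hypothetical class written for this example. It assumes, as in the snippet under review, that the applicable extractor is applied first and that the main extraction is skipped when the check is falsy.

```python
class CountingExtractor:
    """Hypothetical extractor that counts how often its extraction runs."""

    def __init__(self, value, applicable=None):
        self.value = value
        self.applicable = applicable
        self.work_done = 0  # number of times the (possibly expensive) work ran

    def apply(self, **kwargs):
        # Run the check first; skip our own work if it is not satisfied.
        if self.applicable is not None and not self.applicable.apply(**kwargs):
            return None
        self.work_done += 1
        return self.value


check_hit = CountingExtractor("foo")   # truthy result: field applies
check_miss = CountingExtractor(None)   # falsy result: field does not apply

field_a = CountingExtractor("bar", applicable=check_hit)
field_b = CountingExtractor("baz", applicable=check_miss)

for _ in range(3):  # three documents
    field_a.apply()
    field_b.apply()
```

In this sketch each check runs once per document. When the check passes, the field pays for both the check and its own extraction; when it fails, only the check is paid for.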

Hmm, I see your point. I guess the current setup actually makes the potential processing time more explicit, so perhaps that is in itself already better documentation than having a callable here, which might just as well be a wrapper around a time-consuming XML tree parse.

Just a brainfart: Could we make the purpose of what is now implemented as a combination of Backup/Combined and applicable even more explicit in a ConditionalExtractor class?

Something like this?

Condition(
    if=XML('foo'),
    then=XML('bar'),
    else=XML('baz'),
)

I can see that working. Or did you have something else in mind?

Something like that, indeed. Not necessary to overhaul this PR, but might be more readable in the long run.
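A rough sketch of how such a conditional extractor could look, purely as an illustration of the idea and not part of this PR. All classes here are simplified stand-ins, and the metadata keys are hypothetical. Since `if` and `else` are reserved words in Python, the sketch uses `condition` and `otherwise` as argument names.

```python
class Extractor:
    """Toy base class for this sketch (not the library's Extractor)."""

    def apply(self, **kwargs):
        raise NotImplementedError


class Constant(Extractor):
    def __init__(self, value):
        self.value = value

    def apply(self, **kwargs):
        return self.value


class Metadata(Extractor):
    def __init__(self, key):
        self.key = key

    def apply(self, metadata=None, **kwargs):
        return (metadata or {}).get(self.key)


class Condition(Extractor):
    """Apply `then` if `condition` extracts a truthy value, else `otherwise`."""

    def __init__(self, condition, then, otherwise=None):
        self.condition = condition
        self.then = then
        self.otherwise = otherwise

    def apply(self, **kwargs):
        if bool(self.condition.apply(**kwargs)):
            return self.then.apply(**kwargs)
        if self.otherwise is not None:
            return self.otherwise.apply(**kwargs)
        return None


title = Condition(
    condition=Metadata("has_title"),
    then=Metadata("title"),
    otherwise=Constant("untitled"),
)
```

Only one of the two branches is applied per document, which keeps the control flow visible in the field definition itself.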

BeritJanssen

Very good idea to generalize the applicable argument! I noticed the documentation of the applicable argument still needs to be updated. Also to clarify what kind of extractors can be used in the applicable argument: only the Metadata extractor? If this is meant to work with any kind of extractor, please warn about "expensive" extractors, which might, for instance, parse a large XML tree, and add unit tests for more extractors used in applicable.

lukavdplas · 2024-11-06T11:09:10Z

Also to clarify what kind of extractors can be used in the applicable argument: only the Metadata extractor?

Good point, I'll add that to the documentation. It's any extractor, as long as it's supported by the Reader.

generalise applicable argument

810316d

lukavdplas requested a review from BeritJanssen October 29, 2024 11:32

BeritJanssen reviewed Nov 6, 2024


BeritJanssen requested changes Nov 6, 2024


clarify docstring, move deprecation warning

fea5636

lukavdplas force-pushed the feature/generalise_applicable_arg branch from c8bb9e1 to fea5636 Compare November 14, 2024 13:53

lukavdplas merged commit 948bef3 into develop Nov 20, 2024
8 checks passed

lukavdplas deleted the feature/generalise_applicable_arg branch November 20, 2024 15:06


lukavdplas commented Oct 29, 2024

BeritJanssen Nov 6, 2024

BeritJanssen Nov 6, 2024

lukavdplas Nov 6, 2024

BeritJanssen Nov 13, 2024

lukavdplas Nov 13, 2024

BeritJanssen Nov 20, 2024

