-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generalise "applicable" argument #25
Conversation
if self.applicable is None: | ||
return True | ||
if isinstance(self.applicable, Extractor): | ||
return bool(self.applicable.apply(*nargs, **kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this check here double the time needed for indexing a given field with applicable
? In case we have an "expensive" extractor, such as XML tree parsing, this might be an issue. Perhaps good to warn about this in the documentation of the applicable
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is the only extractor allowed for the applicable
argument the Metadata
extractor? In this case, this is not a worry, but then this should be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean because there is a processing cost in applying the extractor? If so, yes, that extra processing is added.
But I can't imagine an implementation where you can provide an expensive extractor without incurring its processing cost. How would that even work? Do we need to warn developers that if they provide expensive checks, those checks will actually be run?
I'm genuinely not sure how to formulate such a warning that isn't an obvious statement like "keep in mind that the code you provide needs to be executed as well".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see your point. I guess the current setup actually makes the potential processing time more explicit, so perhaps that is in itself already a better documentation than having a callable here, which might be a wrapper around a time consuming XML tree parse, too.
Just a brainfart: Could we make the purpose of what is now implemented as a combination of Backup/Combined and applicable
even more explicit in a ConditionalExtractor
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this?
Condition(
if=XML('foo'),
then=XML('bar'),
else=XML('baz'),
)
I can see that working. Or did you have something else in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that, indeed. Not necessary to overhaul this PR, but might be more readable in the long run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good idea to generalize the applicable argument! I noticed the documentation of the applicable
argument still needs to be updated. Also to clarify what kind of extractors can be used in the applicable
argument: only the Metadata
extractor? In case this is meant to work with any kind of extractor, warn about "expensive" extractors which might, for instance parse a large XML tree, and add unit tests for more extractors to be used in applicable
.
Good point, I'll add that to the documentation. It's any extractor, as long as it's supported by the Reader. |
c8bb9e1
to
fea5636
Compare
This implements the suggestion in #8. It changes the "applicable" argument of extractors to cover more use cases. See the issue for examples.
API changes
Currently, the value of
applicable
, if any, must be a function that takes the metadata as input. In this version,applicable
should be an extractor.Providing a function is still supported, but I've added a
DeprecationWarning
. Extractors that useapplicable
can be updated to use aMetadata
extractor (possibly withtransform
to add a custom function).