Load/run only the preprocessors required by the requested features #5

Open
lefterav opened this issue Jan 29, 2013 · 3 comments

@lefterav
Collaborator

Currently all default preprocessors are loaded, even if some of them are not required by the requested features. This causes major delays (e.g. when parsing runs although no requested feature needs it). Some design changes would be required so that each feature class can specify which preprocessors it needs before it is executed.
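
To make the idea concrete, here is a minimal sketch of how feature classes might declare the preprocessors they depend on, so that the extractor can compute the set that is actually needed. All names here (`FeatureSketch`, `requiredResources`, `RequirementCollector`, and the two example features) are hypothetical and not taken from the existing code base.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface: each feature names the resources it needs.
interface FeatureSketch {
    Set<String> requiredResources();
}

// Example feature that needs the parser output.
class ParseDepthFeature implements FeatureSketch {
    public Set<String> requiredResources() {
        return new LinkedHashSet<>(Arrays.asList("bparser"));
    }
}

// Example feature that needs no preprocessor at all.
class TokenCountFeature implements FeatureSketch {
    public Set<String> requiredResources() {
        return new LinkedHashSet<>();
    }
}

class RequirementCollector {
    // Union of the resource names required by the requested features.
    static Set<String> collect(List<FeatureSketch> features) {
        Set<String> required = new LinkedHashSet<>();
        for (FeatureSketch feature : features) {
            required.addAll(feature.requiredResources());
        }
        return required;
    }
}
```

With something like this, requesting only `TokenCountFeature` would yield an empty set, so no parser would need to be loaded.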

@lefterav
Collaborator Author

We have a version of this in the resource-manager branch. We need to test it shortly, and then it will be ready to merge into master.

@lefterav
Collaborator Author

I just finished a first pass at redesigning the execution of ResourceProcessors.
ResourceProcessors are no longer initialized and executed directly from FeatureExtractor.java.
The initialization functions that were in FeatureExtractor.java have been moved to shef.mt.pipelines.DefaultResourcePipeline.
The superclass ResourcePipeline can now receive a list of the required resource names and fires only the ResourceProcessors that are defined with one of these names.
ResourceProcessors that want to be compatible with this should set the this.resourceName class variable to a resource name (e.g. "bparser").
The only ResourceProcessors that were executed by the existing FeatureExtractor were BParser and TopicModelling, so these are the only ones that have been modified to work with the pipeline system.
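
For readers who have not looked at the branch, the filtering described above could look roughly like the sketch below. It is not the code in shef.mt.pipelines; `SketchProcessor`, `SketchPipeline`, and the resource name "topicmodel" are stand-ins invented for illustration, and only the `resourceName` field and the "bparser" name come from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for ResourceProcessor; the real class has more members.
abstract class SketchProcessor {
    protected String resourceName;   // set by each subclass, e.g. "bparser"
    int calls = 0;                   // counts how often process() ran (for the demo only)

    String getResourceName() { return resourceName; }

    void process(String sentence) { calls++; }
}

class SketchParser extends SketchProcessor {
    SketchParser() { this.resourceName = "bparser"; }
}

class SketchTopicModel extends SketchProcessor {
    SketchTopicModel() { this.resourceName = "topicmodel"; } // assumed name
}

// Hypothetical stand-in for ResourcePipeline.
class SketchPipeline {
    private final List<SketchProcessor> processors = new ArrayList<>();

    void add(SketchProcessor processor) { processors.add(processor); }

    // Fires only the processors whose resourceName was requested
    // and returns the names of the processors that actually ran.
    List<String> run(String sentence, Set<String> requiredResourceNames) {
        List<String> fired = new ArrayList<>();
        for (SketchProcessor processor : processors) {
            if (requiredResourceNames.contains(processor.getResourceName())) {
                processor.process(sentence);
                fired.add(processor.getResourceName());
            }
        }
        return fired;
    }
}
```

Note that in this sketch both processors are still constructed before `run` is called, which is exactly the limitation described in the TODO.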

TODO: the above-mentioned solution only avoids RUNNING the ResourceProcessors. Ideally, ResourceProcessors that are not needed should not be initialized at all (i.e. grammars and tables should not be loaded). This requires either adding a separate "initialize" function to each of the ResourceProcessors, or implementing some kind of Python-style dynamic class loading, which is tricky in Java.
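
As a possible middle ground that avoids reflection, a registry of factories can defer construction until a resource is first requested. This is a swapped-in technique (lazy instantiation via `java.util.function.Supplier`, so it assumes Java 8 or later), not dynamic class loading as such, and every class name below is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical marker for anything that is expensive to construct.
interface LazyResource {
    String name();
}

class HeavyGrammar implements LazyResource {
    static int loads = 0;            // counts constructions (for the demo only)
    HeavyGrammar() { loads++; }      // stands in for loading a grammar from disk
    public String name() { return "bparser"; }
}

class HeavyTopicTable implements LazyResource {
    static int loads = 0;
    HeavyTopicTable() { loads++; }   // stands in for loading topic tables
    public String name() { return "topicmodel"; }
}

class LazyRegistry {
    private final Map<String, Supplier<LazyResource>> factories = new HashMap<>();
    private final Map<String, LazyResource> instances = new HashMap<>();

    // Registering a factory is cheap: nothing is constructed yet.
    void register(String resourceName, Supplier<LazyResource> factory) {
        factories.put(resourceName, factory);
    }

    // Constructs the resource on first request and caches it afterwards.
    LazyResource get(String resourceName) {
        Supplier<LazyResource> factory = factories.get(resourceName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown resource: " + resourceName);
        }
        return instances.computeIfAbsent(resourceName, key -> factory.get());
    }
}
```

A caller would register `HeavyGrammar::new` and `HeavyTopicTable::new` up front and call `get` only for the names the requested features need, so unused grammars and tables are never loaded.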

@lefterav
Collaborator Author

According to the proposed design, every tool that implements the ResourceProcessor interface will have one additional mandatory function called initialize. This function will take ONLY one parameter, the PropertiesManager, an object that holds all the parameters read from the user's customized .properties file.

Each resource processor will now be responsible, within its own class, for acquiring the parameters it needs for its initialization by asking the PropertiesManager for them directly.

This will solve the problem that the resource processors had to be initialized "hard-coded", one by one, in FeatureExtractor.java, since each of them had different initialization parameters.

This will also require that we modify the existing processors by moving their initialization code from the FeatureExtractor (or the Pipeline) back to the Processor class.
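
A rough sketch of what the proposed contract might look like is given below. `PropsStandIn` is a hypothetical stand-in for PropertiesManager (whose real API is not shown in this thread), and the `getString` method, the `parser.grammar` key, and the class names are assumptions made only for illustration.

```java
import java.util.Properties;

// Hypothetical stand-in for PropertiesManager; the real class may differ.
class PropsStandIn {
    private final Properties props;

    PropsStandIn(Properties props) { this.props = props; }

    // Assumed accessor: returns null when the key is absent.
    String getString(String key) { return props.getProperty(key); }
}

// Sketch of the proposed interface: one mandatory initialize method
// that takes only the properties object.
interface InitializableProcessor {
    void initialize(PropsStandIn properties);
    String getResourceName();
}

class ParserProcessorSketch implements InitializableProcessor {
    private String grammarPath;

    // The processor fetches its own parameters instead of having
    // FeatureExtractor pass them in one by one.
    public void initialize(PropsStandIn properties) {
        grammarPath = properties.getString("parser.grammar"); // hypothetical key
        if (grammarPath == null) {
            throw new IllegalStateException("Missing property: parser.grammar");
        }
    }

    public String getResourceName() { return "bparser"; }

    String getGrammarPath() { return grammarPath; }
}
```

Because every processor shares the same `initialize` signature, a pipeline could call it generically and only for the processors whose resource names were requested.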

I hope you approve of this change.


2 participants