Refresh BaseX DB at given intervals, but not during startups #45

Closed
michellutz opened this issue Sep 30, 2018 · 4 comments
Labels
EIP Improvement Proposal. Put up for discussion.

Comments


michellutz commented Sep 30, 2018

Background and Motivation:

Performance optimisation is required to reduce startup time and validation time. Reducing startup time will simplify horizontal scaling in cloud deployments, while reducing validation time will help when integrating ETF with the INSPIRE Geoportal or any other metadata-related workflow or pipeline.

One identified performance issue is that BaseX contains a cache of all validation results and of all tests. The DB is re-initialised each time Tomcat is started or restarted; the related data is persisted on the file system under /home/tomcat/.etf/ .

Proposed change

Refresh BaseX DB at given intervals, but not during startups.
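
As a rough illustration of the scheduling side of this proposal only, the sketch below runs a refresh task periodically, with the first run delayed by one full interval so that nothing is refreshed while the application starts. `DeferredRefreshScheduler` and the `refreshTask` parameter are hypothetical names and not part of the ETF code base; how the BaseX store would actually be refreshed is not covered here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of ETF: runs a refresh task at fixed
// intervals, but never during startup.
class DeferredRefreshScheduler implements AutoCloseable {

    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "store-refresh");
            t.setDaemon(true); // do not keep the JVM alive on shutdown
            return t;
        });

    DeferredRefreshScheduler(Runnable refreshTask, long interval, TimeUnit unit) {
        // The initial delay equals the interval, so the first refresh happens
        // one interval after startup. scheduleWithFixedDelay waits for a run
        // to finish before timing the next one, so slow refreshes never overlap.
        executor.scheduleWithFixedDelay(() -> {
            try {
                refreshTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs.
                System.err.println("Refresh failed: " + e.getMessage());
            }
        }, interval, interval, unit);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

A caller could create it once after the application has started, for example `new DeferredRefreshScheduler(task, 6, TimeUnit.HOURS)`, and close it on shutdown. The refresh would still run while users are working with the system, so the task itself would need to avoid blocking running validations.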

Alternatives

A parameter could be introduced to deactivate some time-consuming consistency checks during the startup.
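
A minimal sketch of what such a parameter could look like, assuming a JVM system property is an acceptable way to pass it. The property name `etf.startup.skipConsistencyChecks` and the `StartupOptions` class are invented for illustration; which checks could safely be skipped would have to be decided against the actual initialisation code.

```java
// Hypothetical sketch, not part of ETF: a startup switch that skips
// expensive consistency checks when a system property is set.
final class StartupOptions {

    // Hypothetical property name, chosen for this example only.
    static final String SKIP_CHECKS_PROPERTY = "etf.startup.skipConsistencyChecks";

    private StartupOptions() {
    }

    // Defaults to false, so the checks run unless explicitly disabled.
    static boolean skipConsistencyChecks() {
        return Boolean.parseBoolean(System.getProperty(SKIP_CHECKS_PROPERTY, "false"));
    }

    // Always runs the mandatory initialisation; runs the checks only if
    // they have not been switched off.
    static void initialise(Runnable mandatoryInit, Runnable consistencyChecks) {
        mandatoryInit.run();
        if (!skipConsistencyChecks()) {
            consistencyChecks.run();
        }
    }
}
```

With this shape, the checks would be disabled by starting the JVM with `-Detf.startup.skipConsistencyChecks=true`, and the default behaviour stays unchanged.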

Funding

JRC will be ready to fund within its current development contract.

Additional information

n/a

@michellutz michellutz added the EIP Improvement Proposal. Put up for discussion. label Sep 30, 2018
@michellutz
Author

Split off from #14 as agreed in the 4th SG meeting on 2018-09-04.

@carlospzurita

After conducting some research, we think that this change would require a major refactoring of the code. Many controllers in the webapp code (TestDriverController, TestResultController...) rely on an initialized DataStorageService, which contains a BaseX instance through the class BsxDataStorage. The application can't deploy these controllers, which are necessary to run the services, without starting this data storage. We also think that this operation makes more sense during startup than running it while users are working with the ETF.

Looking at the class BsxDataStorage in the etf-bsxds module, we can't see any refactoring that would improve the startup time; all the tasks executed on initialization seem necessary to us.

In our experience, the most time-consuming task during deployment is the download of ETS files from GitHub. We may run some more tests to assess startup times thoroughly.

@jonherrmann jonherrmann self-assigned this Nov 22, 2018
@carlospzurita

Based on some rough startup tests, the BaseX DB initialisation has not proved to be an excessively time-consuming part of the startup when considered in absolute terms.

The mean startup time for the ETF validator is 60 seconds, of which we estimate that about 30 seconds are consumed by the BaseX DB initialisation. Even if the BaseX initialisation accounts for roughly 50% of the total time, it is still a very small amount of time.

Thus, considering:

  • that, in general terms for a web application, the initialisation time of BaseX doesn't seem to be relevant
  • the amount of resources and the extent of the changes needed to initialise/refresh BaseX at given intervals
  • the fact that the refresh would take place at run time and could block processes

it has been agreed with JRC that this issue is no longer relevant for the performance enhancements of the ETF.
The performance improvements can be continued in issues Reusable Test Objects #26 and Improve schema validation #49.

@jonherrmann jonherrmann removed their assignment Dec 3, 2018
@michellutz
Author

Closed as agreed in the SG meeting on 2020-01-21
