Propose a solution to speed up the process of ingesting updated ontologies into BioPortal #37

Closed
alexskr opened this issue Nov 3, 2023 · 7 comments
Labels: XA2 Enhance usability, completeness, and reliability in domain knowledge for RADx and research data mana; XA2.3 In vocabulary management system (BioPortal/OntoPortal), establish ontological views...

Comments

@alexskr

alexskr commented Nov 3, 2023

BioPortal has a mechanism for downloading ontology files from remote URLs and triggering a new submission when a change is detected. This process currently runs on a once-a-day schedule. The delay can cause significant friction for users who need a new version of an ontology or vocabulary to show up in BioPortal sooner. We need to propose an enhancement to BioPortal that speeds up this process.
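
For illustration only: one simple way to detect that a remote file has changed is to compare a content hash against the hash recorded for the last submission. The thread does not say how BioPortal actually detects changes, so the approach and the names `content_digest` and `needs_new_submission` are assumptions.

```python
import hashlib
from typing import Optional


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the downloaded ontology file."""
    return hashlib.sha256(data).hexdigest()


def needs_new_submission(previous_digest: Optional[str], downloaded: bytes) -> bool:
    """True when the downloaded file differs from the last recorded one.

    previous_digest is None when no submission has been recorded yet
    (hypothetical bookkeeping, not BioPortal's actual data model).
    """
    return previous_digest != content_digest(downloaded)
```

With a check like this running once a day, a new release can wait up to a full day before it is noticed, which is the delay described above.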

alexskr added the labels "XA2 Enhance usability, completeness, and reliability in domain knowledge for RADx and research data mana" and "XA2.3 In vocabulary management system (BioPortal/OntoPortal), establish ontological views..." on Nov 3, 2023
alexskr self-assigned this on Nov 3, 2023
@matthewhorridge

I'd like to propose that we use GitHub webhooks. GitHub can trigger a webhook event when a release is created. I think this would be ideal.
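
A rough sketch of what the receiving side of such a webhook might look like. GitHub sends the event type in the `X-GitHub-Event` header and a JSON payload with an `action` field. Reacting only to `published` releases is an assumption, as are the `REPO_TO_ACRONYM` mapping and the function name `ontology_to_pull`.

```python
import json
from typing import Optional

# Hypothetical mapping from a GitHub repository to a BioPortal ontology acronym.
REPO_TO_ACRONYM = {"example-org/example-ontology": "EXAMPLE"}


def ontology_to_pull(event_name: str, body: bytes) -> Optional[str]:
    """Return the acronym to pull for a published release, else None.

    event_name is the value of the X-GitHub-Event header;
    body is the raw JSON payload of the delivery.
    """
    if event_name != "release":
        return None
    payload = json.loads(body)
    # Assumption: only act once a release is actually published.
    if payload.get("action") != "published":
        return None
    repo = payload.get("repository", {}).get("full_name")
    return REPO_TO_ACRONYM.get(repo)
```

Deliveries for other events, other release actions, or unknown repositories are simply ignored.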

@alexskr
Author

alexskr commented Nov 3, 2023

ncbo/ontologies_api#127

@marcosmro
Contributor

I agree with @matthewhorridge

@matthewhorridge

I think BioPortal will need to support GitHub webhook secrets to validate that incoming webhook requests came from GitHub. See: https://docs.github.com/en/webhooks/using-webhooks/validating-webhook-deliveries
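
A minimal sketch of that validation, assuming the scheme described in the linked GitHub docs: GitHub signs the raw request body with HMAC-SHA256 using the shared secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name `is_valid_signature` is hypothetical, and this is not BioPortal code.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison to avoid leaking information through timing.
    return hmac.compare_digest(expected, signature_header)
```

The check has to run on the raw bytes of the request, before any JSON parsing, because re-serializing the payload would change the digest.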

@matthewhorridge

Info you can provide to create a WebHook in GitHub:

image

@alexskr
Author

alexskr commented Nov 30, 2023

@matthewhorridge highlighted a recurring issue concerning BioPortal caches. When a new ontology submission is made in BioPortal, end users are unable to ascertain the submission's status. Parsing takes some time to complete, and even after it completes, the BioPortal UI does not immediately reflect the changes. It appears that BioPortal fails to invalidate all caches linked to the recently processed ontology.

see ncbo/bioportal-project#193

mdorf added a commit to ncbo/ncbo_cron that referenced this issue Dec 17, 2023
mdorf added a commit to ncbo/ncbo_cron that referenced this issue Dec 17, 2023
mdorf added a commit to ncbo/ontologies_api that referenced this issue Dec 17, 2023
syphax-bouazzouni added a commit to ontoportal-lirmm/ncbo_cron that referenced this issue Dec 27, 2023
…its, and the Most visited pages in the month (#17)

* remove forgot variables

* fix for #61

- create contact instance if it doesn't exist
- changed --from-api to --from-apikey
- minor linting

* Restore branch specifier to develop

* Optimization - remove repeated query

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* Gemfile had references to develop branch

* implemented #64 - ability to generate labels independently of RDF processing (and vice versa)

* Gemfile.lock update

* fixed a bug in #64

* Relocate docker-compose file and update default configs

* Add GH workflow for publishing docker images

* use ruby native method for listing files instead of a git function

Resolves warning messages when we exclude .git directory from docker image

* remove comment

* capitalize argument in order to be consistent with other scripts

* add arm/64 platform

* additional error handling for SPAM deletion script, #60

* additional error handling for SPAM deletion script, #60

* implemented #67 - improved corrupt data and error handling

* Gemfile.lock update

* exclude test/data/dictionary.txt from git commits

* update version of solr-ut

* Gemfile.lock update

* Restore branch specifier to master

* fixed configuration for the analytics module

* Gemfile.lock update

* implemented #69 - scheduled annotator dictionary file generation should be a configurable option instead of the default

* Gemfile.lock update

* gem update

* create new rake task for updating purls for all ontologies

moved from ontologies_api/fix_purls.rb

* initial implementation of #70 - Google Analytics v4 Update Compatibility Issue

* added the /data folder to ignore

* update gems

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* use patched version of agraph v7.3.1

* unpin faraday gem

* A change to reference Analytics Redis from LinkedData block

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* use assert_operator instead of assert

minitest style guide adherence.
encountered an intermittent unit test failure so assert_operator will provide
better failure feedback than assert

* use local solr to pass the tests

* fixed ncbo_ontology_archive_old_submissions error output

* Gemfile.lock update

* Gemfile.lock update

* Gemfile update

* Gemfile update

* fixes to the analytics script and a new script to generate UA analytics for documentation

* Gemfile.lock update

* Gemfile.lock update

* implemented the first pass at bmir-radx/radx-project#37

* implemented the first pass at bmir-radx/radx-project#37

* set bundler version to be compatible with ruby 2.7

+ AG v8

* refactor ontologies analytics job to handle the new google analytics migration

* add user analytics fetching the monthly user visits count

* add page visits analytics  fetching  last month most visited pages

* extract google analytics UA import code to a script to make current code clean of it

* add option to force submission archiving even if already archived

---------

Co-authored-by: Alex Skrenchuk <alexskr@stanford.edu>
Co-authored-by: mdorf <mdorf@stanford.edu>
Co-authored-by: Jennifer Vendetti <vendetti@stanford.edu>
syphax-bouazzouni added a commit to ontoportal-lirmm/ncbo_cron that referenced this issue Dec 28, 2023
syphax-bouazzouni added a commit to ontoportal-lirmm/ncbo_cron that referenced this issue Dec 28, 2023
syphax-bouazzouni added a commit to ontoportal/ncbo_cron that referenced this issue Jan 16, 2024
…onward (#2)

* add a script to eradicate (delete data+ files) submissions of an ontology

* Auto stash before merge of "development" and "master"

* omit logs link file

* update the eradicator to support the eradication of not archived submissions if wanted

* fix the delete submission files to not let behind empty directories

* do not remove the submission directory because it's already done by the submission.delete

* Update Gemfile.lock

* Reset branch specifier to develop

* extract do_ontology_pull function

* some simple code refactor in the ontology_pull

* simple code refactor of test_ontology_pull

* add a script to do a ontology pull on an ontology on demand

* set the name of the new script in $0

* extract new_file_exists? method from do_ontology_pull

* save the submission in the RemoteFileException

* some automatic code refactor/lint

* use the new do_ontology_pull in the old  do_remote_ontology_pull

* fixed an API call mentioned by @syphax-bouazzouni in ncbo/bioportal-project#254

* fixed an API call mentioned by @syphax-bouazzouni in ncbo/bioportal-project#254

* Gemfile.lock update

* bump up version of actions/checkout from v2->v3

* Gemfile.lock update

* Merge branch 'develop'

* remove forgot variables

* GH Actions unit test workflow refactor

- add ruby versioning via docker-compose.yml file
- bump up ruby v2.6 -> v2.7
- add AllegroGraph backend
- add code coverage

* Remove extra space

* fix for #61

- create contact instance if it doesn't exist
- changed --from-api to --from-apikey
- minor linting

* Restore branch specifier to develop

* Optimization - remove repeated query

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* Gemfile had references to develop branch

* implemented #64 - ability to generate labels independently of RDF processing (and vice versa)

* Gemfile.lock update

* fixed a bug in #64

* Relocate docker-compose file and update default configs

* Add GH workflow for publishing docker images

* use ruby native method for listing files instead of a git function

Resolves warning messages when we exclude .git directory from docker image

* remove comment

* capitalize argument in order to be consistent with other scripts

* add arm/64 platform

* additional error handling for SPAM deletion script, #60

* additional error handling for SPAM deletion script, #60

* implemented #67 - improved corrupt data and error handling

* Gemfile.lock update

* exclude test/data/dictionary.txt from git commits

* update version of solr-ut

* Gemfile.lock update

* Restore branch specifier to master

* fixed configuration for the analytics module

* Gemfile.lock update

* implemented #69 - scheduled annotator dictionary file generation should be a configurable option instead of the default

* Gemfile.lock update

* gem update

* create new rake task for updating purls for all ontologies

moved from ontologies_api/fix_purls.rb

* initial implementation of #70 - Google Analytics v4 Update Compatibility Issue

* added the /data folder to ignore

* update gems

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* use patched version of agraph v7.3.1

* unpin faraday gem

* A change to reference Analytics Redis from LinkedData block

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* use assert_operator instead of assert

minitest style guide adherence.
encountered an intermittent unit test failure so assert_operator will provide
better failure feedback than assert

* fixed ncbo_ontology_archive_old_submissions error output

* Gemfile.lock update

* Gemfile.lock update

* Gemfile update

* Gemfile update

* fixes to the analytics script and a new script to generate UA analytics for documentation

* Gemfile.lock update

* Gemfile.lock update

* implemented the first pass at bmir-radx/radx-project#37

* implemented the first pass at bmir-radx/radx-project#37

* set bundler version to be compatible with ruby 2.7

+ AG v8

* Gemfile.lock update

* Gemfile.lock update

---------

Co-authored-by: Jennifer Vendetti <vendetti@stanford.edu>
Co-authored-by: mdorf <mdorf@stanford.edu>
Co-authored-by: Alex Skrenchuk <alexskr@stanford.edu>
syphax-bouazzouni added a commit to ontoportal/ontologies_api that referenced this issue Jan 16, 2024
…onward (#4)

* fix get a submission metrics

* Auto stash before merge of "upstream" and "upstream/master"

* add the slice get endpoint

* add the slices create endpoint

* add the slices delete endpoint

* add the slices update endpoint

* Add caching for analytics for 24 hours.

* Fix for #97. Check for ontology existence before bringing attributes

* Handle edge case for submission downloads which do not have UploadFilePath set

Fixes #98

* Gemfile.lock update

* Add GH workflow for capistrano deployments

* Update Gemfile.lock

* fix ability to run deployment manually

* Fix for deprecation notice of Rack::Attack.throttled_response

Update configuration to closely match rack attack documentation in order
to address deprecation notice:
[DEPRECATION] Rack::Attack.throttled_response is deprecated. Please use Rack::Attack.throttled_responder instead

* Update version of actions/checkout to address deprecation notices

* Fix: Documentation rendering  (#107)

* Auto stash before merge of "upstream" and "upstream/master"

* fix haml gem version

* Update Gemfile

* Restore branch specifier to master

* Update capistrano sample config to include setting which branch to deploy from

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* Make sure ontology is present when accessing submissions.

Fixes an internal server error when accessing submission for non-existent/deleted ontology

* Remove owlapi_wrapper.jar file.  It is included elsewhere

* Add health checks for docker services and add more config file options

* Remove wait-for-it

* Add health check to AG service

* Add GH workflow for publishing docker images

* Add missing redis port number

* Restore branch specifier to develop

* Set mgrep port to 55556

* add arm64 platform

* bump up version of solr-ut

* Fix ncbo#116

- pinned redis-store to 1.9.1 until #358 gets resolved
- fixed deprecation notices "warning: calling URI.open via Kernel#open
  is deprecated, call URI.open directly or use URI#open"

* Restore branch specifier to master

* fix for wrongly replaced string

* Delete fix_purls.rb

fix_purls.rb moved to a rake task under ncbo_cron project

* Remove google analytics dependencies since those are needed only in ncbo_cron

* bump up major version of oj and faraday

* remove search_index.rb script

* lock gem rack-cache to 1.13.0

see ncbo#118

* remove dependency on redis-activesupport

rack-attack can work with redis directly so there is no need to use
redis-activesupport which is no longer being actively developed

* unpin redis-store

solves:
ncbo#105
ncbo#106

* use redis-store from forked repo containing redis 5 compat fixes

this should be reverted back to original after redis 5
compatibility issues are resolved

* use patched version of agraph v7.3.1

* unpin faraday gem

* Gemfile.lock update

* fixed an issue with the GA4 Analytics migration

* fixed an issue with the GA4 Analytics migration

* reduce request limit for resource intensive api calls (#121)

* Announce deployments in NewRelic  (#124)

* Record deployments to NewRelic

https://docs.newrelic.com/docs/apm/agents/ruby-agent/features/record-deployments-ruby-agent/

* add newrelic to deployment group

github actions deployment doesn't install default group so
capistrano fails to find newrelic recipes unless we add it
to the deployment group

* add rubocop

* Gemfile update, goo version including goo#138 and goo#139

* Gemfile.lock update

* Gemfile.lock update

* Gemfile.lock update

* update slice write operation to check if user is admin

* Gemfile.lock update

* fixed an accidental commit of docker compose file

* fixed Gemfile after merging from master

* Gemfile.lock update

* update Gemfile.lock

* Gemfile.lock update

* redis-store gem with redis 5 compatibility fix

* Gemfile.lock update

* make the check_access helper use filter_access if the object is a list

* add test for submissions access check with two ontologies private and public

* check access of ontologies in /ontologies/:acronym/submissions endpoint

* Set gem branch specifier to develop

* reset branch specifier to master

* add the slice get endpoint

* add the slices create endpoint

* add the slices delete endpoint

* add the slices update endpoint

* update slice write operation to check if user is admin

* add slices creation & deletion unit tests

* Gemfile.lock update

* Gemfile.lock update

* Merged #87 from master

* fixed Gemfile after merge

* Gemfile.lock update

* update Gemfile.lock

* Add configurable option for github org where code is deployed from

* Gemfile update

* Gemfile update

* check existence of acronym before fetching details

fixes #129

* extract slice tests helper to the parent class for reusability

* add a test for the creation of an admin user

* enforce the security of admin user creation

* enforce user deletion security to be admin only

* Gemfile.lock update

* update Gemfile.lock

* Gemfile update

* implemented the first pass at bmir-radx/radx-project#37

* set bundler version to be compatible with ruby 2.7

* Gemfile.lock update

* implemented #127 - Add API call to trigger ontology pull from remote location

* Gemfile.lock

* implemented a test for #127 - Add API call to trigger ontology pull from remote location

* Gemfile.lock update

* implemented a test for #127 - Add API call to trigger ontology pull from remote location

* implemented a test for #127 - Add API call to trigger ontology pull from remote location

* Gemfile.lock update

* use agraph v8.0.0

* Gemfile.lock update

* Gemfile.lock update

---------

Co-authored-by: Alex Skrenchuk <alexskr@stanford.edu>
Co-authored-by: mdorf <mdorf@stanford.edu>
Co-authored-by: Jennifer Vendetti <vendetti@stanford.edu>
alexskr changed the title from "Propose a solution to speed up the process of injecting updated ontologies into BioPortal" to "Propose a solution to speed up the process of ingesting updated ontologies into BioPortal" on Jan 17, 2024
@alexskr
Author

alexskr commented Jan 17, 2024

A new BioPortal API call to trigger a remote ontology pull has been deployed.
A sample GitHub Actions script for triggering the pull is here
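
For readers without access to the linked script, here is a hedged sketch of how a client might build such a trigger request. The endpoint URL and the use of `POST` are assumptions (the thread does not state them), and the `Authorization: apikey token=...` header follows BioPortal's documented API key convention as far as I know. The function only constructs the request and does not send it.

```python
import urllib.request


def build_pull_request(pull_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request asking BioPortal to pull an ontology.

    pull_url is whatever endpoint the deployed API exposes; it is passed in
    because the exact path is not given in this thread.
    """
    return urllib.request.Request(
        pull_url,
        data=b"",  # empty body; assumption that no payload is required
        method="POST",  # assumption: the trigger is a POST
        headers={"Authorization": f"apikey token={api_key}"},
    )
```

A GitHub Actions job that runs on release could then send this request with `urllib.request.urlopen(...)`, reading the API key from a repository secret rather than hard-coding it.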

@matthewhorridge, would you like me to add GH actions to the radx ontology?

mdorf closed this as completed on Feb 7, 2024