Docker and docker-compose #73

Open
wants to merge 8 commits into
base: master

lfoppiano
Collaborator

@lfoppiano lfoppiano commented Apr 26, 2022

This PR provides:

  • updated docker and docker-compose configuration files
  • updated documentation, including how to load the data in both cases
  • a fix for the hard-coded hosts (as discussed in Fix docker environment #34)

For the moment I've pushed one image to Docker Hub, lfoppiano/biblio-glutton-lookup:0.2, which works fine with a pre-existing LMDB database and Elasticsearch running on the host machine.

@lfoppiano
Collaborator Author

Testing the crossref dump loading via docker:

docker exec 29f22d257f4e java -jar lib/lookup-service-0.2-onejar.jar crossref --input /app/data/sources/crossref_public_data_file_2021_01 /app/lookup/config/glutton.yml

[...]


-- Counters --------------------------------------------------------------------
crossrefLookup_rejectedRecords
             count = 3997391

-- Meters ----------------------------------------------------------------------
crossrefLookup
             count = 108449979
         mean rate = 8397.29 events/second
     1-minute rate = 6769.17 events/second
     5-minute rate = 6807.32 events/second
    15-minute rate = 6933.23 events/second


INFO  [2022-04-28 06:53:57,257] com.scienceminer.lookup.command.LoadCrossrefCommand: Number of Crossref records processed: 108502026
INFO  [2022-04-28 06:53:57,309] com.scienceminer.lookup.command.LoadCrossrefCommand: Crossref lookup size {crossref_Jsondoc=108502027} records.
INFO  [2022-04-28 06:53:57,309] com.scienceminer.lookup.command.LoadCrossrefCommand: Crossref latest indexed date 2020-04-04T04:06:33.
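
For repeated loads, the command above can be wrapped in a small shell helper. This is only a sketch: the function name `load_crossref` and the `DRY_RUN` switch are hypothetical and not part of this PR, while the jar path, the `crossref --input` arguments and the default configuration path are copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): run, or with DRY_RUN=1 only print,
# the Crossref loading command inside an already running container.
load_crossref() {
  container="$1"                                  # container ID or name, e.g. from `docker ps`
  dump="$2"                                       # dump path as seen inside the container
  config="${3:-/app/lookup/config/glutton.yml}"   # config path used in the command above

  # Rebuild the positional parameters as the full command line.
  set -- docker exec "$container" \
    java -jar lib/lookup-service-0.2-onejar.jar \
    crossref --input "$dump" "$config"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"    # print the command without executing it
  else
    "$@"         # execute it
  fi
}
```

Setting `DRY_RUN=1` before calling `load_crossref 29f22d257f4e /app/data/sources/crossref_public_data_file_2021_01` prints the command so it can be checked before starting a load that processes on the order of 10^8 records.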

@lfoppiano lfoppiano marked this pull request as ready for review May 8, 2022 23:40
@lfoppiano
Collaborator Author

lfoppiano commented May 8, 2022

Using the single Docker image with an external Elasticsearch and GROBID, I successfully loaded the LMDB by launching the loading commands from within the running container.

I did not test the docker-compose setup, but the principle should be the same: mount the data directory and launch the command.
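
Under docker-compose, the same idea might look like the untested sketch below. The service name `biblio` is a placeholder (use whatever name the compose file defines), and the data directory is assumed to be mounted so that the dump appears under `/app/data` as in the `docker exec` test earlier in this thread. The script only prints the command, so it can be reviewed before running it by hand.

```shell
#!/bin/sh
# Assumption: "biblio" is a placeholder service name, not taken from the PR.
SERVICE="${SERVICE:-biblio}"

# Print the docker-compose equivalent of the docker exec loading command.
# $1: dump path as seen inside the container (under the mounted data directory)
compose_load_cmd() {
  printf '%s\n' "docker-compose exec $SERVICE java -jar lib/lookup-service-0.2-onejar.jar crossref --input $1 /app/lookup/config/glutton.yml"
}

compose_load_cmd /app/data/sources/crossref_public_data_file_2021_01
```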

steppo83 and others added 3 commits September 28, 2022 09:23
The argument --config core.autocrlf=input is needed for Windows users; otherwise they get strange errors during the docker-compose up phase.
README updated for Windows users
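
A minimal sketch of what the commit above describes. With `core.autocrlf=input`, Git does not convert line endings to CRLF on checkout, so files stored with LF in the repository keep LF in the working tree. The commit message does not say what the "strange errors" are; a plausible cause, stated here only as an assumption, is scripts with CRLF endings being copied into a Linux container. The helper names `clone_lf` and `has_crlf` are hypothetical, and the repository URL is left as a parameter.

```shell
#!/bin/sh
# Clone with the setting from the commit message above.
# $1: repository URL (placeholder, supply your own)
clone_lf() {
  git clone --config core.autocrlf=input "$1"
}

# Hypothetical check: succeeds if the file contains a carriage return,
# i.e. at least one line likely ends in CRLF.
has_crlf() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}
```

On an existing Windows checkout, `has_crlf path/to/script.sh && echo "CRLF found"` is a quick way to see whether a file that will be copied into the image was converted.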