Harvest
Harris Tzovanakis edited this page Jan 9, 2019 · 8 revisions
There is a celerybeat running every day at:
- EST timezone (Eastern Standard Time), UTC-5
- EDT timezone (Eastern Daylight Time), UTC-4
The easiest way to check the harvest is to open https://inspire-prod-grafana.web.cern.ch. There is also a Grafana alert that sends a message to Zulip in the ops/harvest topic.
We harvest many collections from arXiv, but a single paper can be harvested as well. The collections related to INSPIRE are the following:
- cs
- econ
- eess
- math
- physics
- physics:astro-ph
- physics:cond-mat
- physics:gr-qc
- physics:hep-ex
- physics:hep-lat
- physics:hep-ph
- physics:hep-th
- physics:math-ph
- physics:nlin
- physics:nucl-ex
- physics:nucl-th
- physics:physics
- physics:quant-ph
- q-bio
- q-fin
- stat
$ ssh inspire-prod-crawler1
$ inspirehep crawler schedule arXiv article --kwarg 'from_date=2018-12-06' --kwarg 'until_date=2018-12-07' --kwarg 'sets=cs,econ,eess,math,physics,physics:astro-ph,physics:cond-mat,physics:gr-qc,physics:hep-ex,physics:hep-lat,physics:hep-ph,physics:hep-th,physics:math-ph,physics:nlin,physics:nucl-ex,physics:nucl-th,physics:physics,physics:quant-ph,q-bio,q-fin,stat'
Note that from_date and until_date are very important.
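Because the set list is long and the two dates are easy to mistype, the command above could be assembled by a small shell helper. The following is only a sketch: the function name `build_harvest_cmd` and the `SETS` variable are hypothetical, and the helper just prints the command for review instead of running it. Only the `inspirehep crawler schedule` invocation and its kwargs come from this page.

```shell
# Hypothetical helper (not part of inspirehep): prints the harvest command
# for a given date window so it can be reviewed before being run.

# The arXiv sets listed on this page, separated by whitespace.
SETS="cs econ eess math physics
physics:astro-ph physics:cond-mat physics:gr-qc physics:hep-ex
physics:hep-lat physics:hep-ph physics:hep-th physics:math-ph
physics:nlin physics:nucl-ex physics:nucl-th physics:physics
physics:quant-ph q-bio q-fin stat"

build_harvest_cmd() {
  from_date="$1"
  until_date="$2"
  # Refuse to build a command when either date is missing.
  if [ -z "$from_date" ] || [ -z "$until_date" ]; then
    echo "usage: build_harvest_cmd FROM_DATE UNTIL_DATE (YYYY-MM-DD)" >&2
    return 1
  fi
  # Collapse the whitespace-separated list into a comma-separated one.
  sets=$(echo $SETS | tr ' ' ',')
  printf "inspirehep crawler schedule arXiv article --kwarg 'from_date=%s' --kwarg 'until_date=%s' --kwarg 'sets=%s'\n" \
    "$from_date" "$until_date" "$sets"
}

# Example: print the command for the window used on this page.
build_harvest_cmd 2018-12-06 2018-12-07
```

The printed line can then be pasted on the crawler node. Printing rather than executing is a deliberate choice in this sketch, since a wrong date window would schedule an unintended harvest.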
This command triggers a harvest. You can always check the tasks in the queue (RabbitMQ) with the following commands:
$ ssh inspire-prod-broker1
$ rabbitmqctl -p inspire list_queues | grep harvests
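To turn the queue listing above into a single number, its output could be piped through a small filter. This is a sketch under assumptions: it relies on the default `rabbitmqctl list_queues` layout of queue name followed by message count, the function name `pending_harvests` is hypothetical, and the sample queue names and counts below are made up for illustration.

```shell
# Hypothetical filter: reads `rabbitmqctl list_queues` output on stdin and
# prints the total number of messages in queues whose name contains "harvests".
pending_harvests() {
  # Only lines whose first field matches and whose last field is a number
  # are counted, so informational header lines are ignored.
  awk '$1 ~ /harvests/ && $NF ~ /^[0-9]+$/ { total += $NF } END { print total + 0 }'
}

# Demonstration with made-up output; on the broker one would instead run:
#   rabbitmqctl -p inspire list_queues | pending_harvests
printf 'Listing queues for vhost inspire ...\nharvests\t3\nmigrator\t12\n' | pending_harvests
```

A result of 0 suggests that no harvest tasks are waiting in the queue, although it says nothing about tasks a worker has already picked up.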
To harvest a single paper, pass its OAI identifier:
$ inspirehep crawler schedule arXiv_single article --kwarg 'identifier=oai:arXiv.org:1604.05726'
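The `identifier` kwarg in the command above takes the `oai:arXiv.org:` form, not a bare arXiv number. As a minimal sketch, a helper like the one below could normalise whatever form of the ID is at hand. The function name `arxiv_oai_identifier` is hypothetical, and the accepted input forms are assumptions chosen for illustration.

```shell
# Hypothetical helper: normalises an arXiv ID to the oai:arXiv.org:<id> form
# used by the identifier kwarg shown on this page.
arxiv_oai_identifier() {
  id="$1"
  # Drop an optional "arXiv:" prefix, e.g. "arXiv:1604.05726".
  id="${id#arXiv:}"
  case "$id" in
    "")              echo "usage: arxiv_oai_identifier ARXIV_ID" >&2; return 1 ;;
    oai:arXiv.org:*) printf '%s\n' "$id" ;;                # already in OAI form
    *)               printf 'oai:arXiv.org:%s\n' "$id" ;;  # bare ID
  esac
}

# Example: build the identifier for the paper used on this page.
arxiv_oai_identifier 1604.05726
```

The result can be substituted into the `--kwarg 'identifier=...'` argument of the single-paper command.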
You can check the logs by running:
$ inspirehep crawler job list
$ inspirehep crawler job logs <JOB_ID>