Troubleshooting biocache-service
- Intro
- Is solr running?
- Are the solr cores created?
- If you use biocache-store, is cassandra running?
- Other services that biocache-service needs
- Start order
- Increase the biocache log level
Intro
biocache-service does not start correctly when its dependencies are not running or not reachable. A typical error message looks like this:
{ "message": "Error creating bean with name 'qidCacheDao' defined in file [/var/lib/tomcat7/webapps-records-ws.l-a.site/ROOT/WEB-INF/classes/au/org/ala/biocache/dao/QidCacheDAOImpl.class]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [au.org.ala.biocache.dao.QidCacheDAOImpl]: Constructor threw exception; nested exception is java.lang.NoClassDefFoundError: Could not initialize class au.org.ala.biocache.Config$", "errorType": "Server error" }
This is the ALA equivalent of the blue screen of death :-)
This page walks through a series of checks that help you fix this startup error.
Is solr running?
In the example commands below, solr is installed on ala-install-test-2 and biocache-service on ala-install-test-1.
Check the solr service status; it should look like this:
root@ala-install-test-2:~# service solr status
* solr.service - LSB: Controls Apache Solr as a Service
Loaded: loaded (/etc/init.d/solr; generated)
Active: active (exited) since Thu 2021-12-02 10:52:59 UTC; 6h ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/solr.service
Dec 02 11:51:36 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:41 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:41 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:41 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:44 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:54:38 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:54:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
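The service status alone is not conclusive (above it reports active (exited) with Tasks: 0), and the repeated "Failed to reset devices.list: Operation not permitted" messages are usually harmless when running inside a container. So also check that the port is listening: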
root@ala-install-test-2:~# lsof -i:8983
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 29838 solr 137u IPv6 757352331 0t0 TCP ala-install-test-2:8983 (LISTEN)
If it is not listening, check the VM's memory and the solr logs to find out why solr is not running.
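The port check can also be scripted so the same test is reusable on every VM. This is a minimal sketch, assuming a netcat that supports -z (connect without sending data) and -w (timeout), as the OpenBSD nc used in these examples does; port_open is a hypothetical helper name, not part of ala-install:

```shell
# port_open HOST PORT
# Succeeds (status 0) if a TCP connection to HOST:PORT is accepted within
# 3 seconds. Returns 2 on bad arguments, and another non-zero status if
# the port is closed, unreachable, or nc is not installed.
# Assumes a netcat with -z (connect only) and -w (timeout in seconds).
port_open() {
  host=$1
  port=$2
  if [ -z "$host" ]; then
    echo "usage: port_open HOST PORT" >&2
    return 2
  fi
  case "$port" in
    ''|*[!0-9]*)
      echo "port_open: port must be a number, got '$port'" >&2
      return 2
      ;;
  esac
  nc -z -w 3 "$host" "$port" >/dev/null 2>&1
}
```

For example, on the solr VM:
root@ala-install-test-2:~# port_open localhost 8983 && echo "solr is listening"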
If it is running, check that it is reachable from the biocache-service VM (ala-install-test-1 in our example). First find the solr URL that biocache-service is configured to use:
root@ala-install-test-1:~# grep solr.home /data/biocache/config/biocache-config.properties
solr.home=http://index-es.l-a.site:8983/solr/biocache
Then try to connect to that host and port in a similar way:
root@ala-install-test-1:~# nc index-es.l-a.site 8983 -v
Connection to index-es.l-a.site 8983 port [tcp/*] succeeded!
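Reading solr.home and connecting to it can be combined into one step. This is a sketch, assuming solr.home holds a single URL of the form http(s)://host[:port]/... and falling back to 8983 (Solr's default port) when the URL has none; solr_endpoint is a hypothetical helper:

```shell
# solr_endpoint CONFIG_FILE
# Prints "HOST PORT" taken from the solr.home property of a biocache
# properties file, e.g. "index-es.l-a.site 8983". Commented-out lines are
# ignored, and the last definition wins (as in Java properties files).
# Returns 1 if solr.home is not set.
solr_endpoint() {
  url=$(sed -n 's/[[:space:]]*$//; s/^[[:space:]]*solr\.home[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
  [ -n "$url" ] || return 1
  hostport=${url#*://}        # drop the scheme
  hostport=${hostport%%/*}    # drop the path
  host=${hostport%%:*}        # text before the first ":"
  port=${hostport#"$host"}    # ":PORT" or empty
  port=${port#:}
  echo "$host ${port:-8983}"  # assumed default port when none is given
}
```

For example:
root@ala-install-test-1:~# nc -zv $(solr_endpoint /data/biocache/config/biocache-config.properties)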
If it is not reachable, check things like name resolution:
root@ala-install-test-1:~# ping -c 1 index-es.l-a.site
PING ala-install-test-2 (10.10.10.152) 56(84) bytes of data.
64 bytes from ala-install-test-2 (10.10.10.152): icmp_seq=1 ttl=64 time=0.027 ms
--- ala-install-test-2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.027/0.027/0.027/0.000 ms
and:
root@ala-install-test-1:~# getent ahosts index-es.l-a.site
10.10.10.152 STREAM ala-install-test-2
10.10.10.152 DGRAM
10.10.10.152 RAW
If there is a firewall between the VMs, allow traffic to solr (8983/tcp) and/or zookeeper (2181/tcp).
Are the solr cores created?
Verify that ala-install created the solr cores:
root@ala-install-test-2:~# ls -l /data/solr/data/
total 20
drwxr-xr-x 4 solr solr 4096 Dec 2 10:53 bie
drwxr-xr-x 4 solr solr 4096 Dec 2 10:53 bie-offline
drwxr-xr-x 4 solr solr 4096 Dec 2 10:53 biocache
-rw-r----- 1 solr solr 2180 Dec 2 10:51 solr.xml
-rw-r----- 1 solr solr 975 Dec 2 10:51 zoo.cfg
You can also check the cores in the solr admin interface, which should be protected from public access.
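The directory check can be scripted as well. This minimal sketch only tests that a directory exists for each expected core; the core names passed to it are taken from the listing above and may differ in your portal, and missing_cores is a hypothetical helper:

```shell
# missing_cores DATA_DIR CORE...
# Prints every CORE that has no directory under DATA_DIR.
# Returns 1 if at least one core is missing, 0 otherwise.
missing_cores() {
  dir=$1
  shift
  status=0
  for core in "$@"; do
    if [ ! -d "$dir/$core" ]; then
      echo "$core"
      status=1
    fi
  done
  return "$status"
}
```

For example:
root@ala-install-test-2:~# missing_cores /data/solr/data bie bie-offline biocache
Solr's CoreAdmin API gives the same view from the running server, e.g. curl 'http://localhost:8983/solr/admin/cores?action=STATUS&wt=json'.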
If you use biocache-store, is cassandra running?
Check the same things for cassandra; in our example it runs on ala-install-test-3:
root@ala-install-test-3:~# service cassandra status
* cassandra.service - LSB: distributed storage system for structured data
Loaded: loaded (/etc/init.d/cassandra; generated)
Active: active (running) since Thu 2021-12-02 10:50:26 UTC; 6h ago
Docs: man:systemd-sysv-generator(8)
Tasks: 58 (limit: 4915)
CGroup: /system.slice/cassandra.service
`-29012 /usr/bin/java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouc
Dec 02 11:28:44 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:54 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:54 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:57 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:57 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:57 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:59 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:59 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:59 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:30:02 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Check that port 9042 is listening:
root@ala-install-test-3:~# lsof -i:9042
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 29012 cassandra 85u IPv4 757303579 0t0 TCP *:9042 (LISTEN)
java 29012 cassandra 89u IPv4 759281357 0t0 TCP ala-install-test-3:9042->ala-install-test-1:59370 (ESTABLISHED)
java 29012 cassandra 90u IPv4 759281428 0t0 TCP ala-install-test-3:9042->ala-install-test-1:59380 (ESTABLISHED)
and that it is reachable from the biocache-service VM:
root@ala-install-test-1:~# grep cassandra.host /data/biocache/config/*
/data/biocache/config/biocache-config.properties:# cassandra hosts - this should be comma separated list in the case of a cluster
/data/biocache/config/biocache-config.properties:cassandra.hosts=ala-install-test-3
root@ala-install-test-1:~# nc ala-install-test-3 9042 -v
Connection to ala-install-test-3 9042 port [tcp/*] succeeded!
If it is not reachable, check DNS resolution again and make sure the firewall allows TCP traffic to port 9042.
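Since cassandra.hosts may be a comma-separated list for a cluster, the connection test can be run against every configured host. This is a sketch, assuming the property format shown above, port 9042 as in the example, and an nc that supports -z and -w; cassandra_hosts and check_cassandra are hypothetical helpers:

```shell
# cassandra_hosts CONFIG_FILE
# Prints one host per line from the cassandra.hosts property,
# ignoring commented-out lines and surrounding spaces.
cassandra_hosts() {
  sed -n 's/[[:space:]]*$//; s/^[[:space:]]*cassandra\.hosts[[:space:]]*=[[:space:]]*//p' "$1" |
    tail -n 1 |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//; /^$/d'
}

# check_cassandra CONFIG_FILE [PORT]
# Tries a TCP connection to each configured host on PORT (default 9042)
# and prints "OK host:port" or "FAIL host:port". Returns 1 on any failure.
check_cassandra() {
  port=${2:-9042}
  rc=0
  for h in $(cassandra_hosts "$1"); do
    if nc -z -w 3 "$h" "$port" >/dev/null 2>&1; then
      echo "OK $h:$port"
    else
      echo "FAIL $h:$port"
      rc=1
    fi
  done
  return "$rc"
}
```

For example:
root@ala-install-test-1:~# check_cassandra /data/biocache/config/biocache-config.properties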
Other services that biocache-service needs
Here is a quick way to check whether the other services, besides solr and cassandra, are up and reachable:
root@ala-install-test-1:~# for i in $(grep http /data/biocache/config/biocache-config.properties | cut -d "=" -f 2 | grep -v zip | sort | uniq) ; do echo; echo $i ----; curl --write-out '%{http_code}' --silent --output /dev/null $i; done
https://auth.l-a.site/userdetails ----
302
https://auth.ala.org.au/apikey/ws/check?apikey ----
200
https://collections.l-a.site/ws ----
200
https://collections.l-a.site/ws/citations ----
200
https://collections.l-a.site/ws/collection ----
200
https://dataquality.ala.org.au/ ----
000
https://doi.l-a.site ----
200
https://doi.l-a.site/api/ ----
302
https://doi.l-a.site/doi/ ----
200
https://doi.l-a.site/myDownloads ----
302
https://spatial.l-a.site/geoserver ----
302
https://spatial.l-a.site/ws ----
302
https://spatial.l-a.site/ws/fields ----
200
https://species-ws.l-a.site ----
200
https://images.l-a.site ----
200
https://lists.l-a.site ----
302
https://logger.l-a.site/service/logger/ ----
200
https://records.l-a.site ----
200
https://records.l-a.site/download/doi?doi ----
302
https://records-ws.l-a.site ----
200
https://records-ws.l-a.site/biocache-download ----
301
https://records-ws.l-a.site/biocache-media/ ----
404
If you see 5xx errors, unexpected 404s, or 000 (curl got no HTTP response at all, as for dataquality.ala.org.au above), check those services.
There is a script that does this better and more visually here.
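A slightly more robust variant of the one-liner above, as a sketch: it skips commented-out properties, keeps everything after the first "=" (the one-liner's cut -d "=" -f 2 turns check?apikey= into check?apikey), adds a request timeout, and labels each status code. Like the one-liner it drops URLs containing "zip". The helper names and verdict labels are hypothetical:

```shell
# service_urls CONFIG_FILE
# Prints the distinct http(s) URLs found in property values.
service_urls() {
  awk '
    /^[ \t]*#/ { next }                     # skip comment lines
    index($0, "=") == 0 { next }            # skip lines that are not key=value
    {
      val = substr($0, index($0, "=") + 1)  # value = text after the first "="
      sub(/^[ \t]+/, "", val)
      sub(/[ \t\r]+$/, "", val)
      if (val ~ /^https?:\/\// && val !~ /zip/) print val
    }' "$1" | sort -u
}

# classify_status HTTP_CODE
# Maps a curl %{http_code} value to a coarse verdict.
classify_status() {
  case "$1" in
    000)     echo UNREACHABLE ;;   # no HTTP response (DNS, TCP or TLS failure)
    2??|3??) echo OK ;;            # success or redirect
    404)     echo NOT_FOUND ;;
    4??)     echo CLIENT_ERROR ;;
    5??)     echo SERVER_ERROR ;;
    *)       echo UNKNOWN ;;
  esac
}

# check_services CONFIG_FILE
# Requests every URL and prints "VERDICT CODE URL", one per line.
check_services() {
  service_urls "$1" | while IFS= read -r url; do
    code=$(curl --silent --output /dev/null --max-time 10 \
                --write-out '%{http_code}' "$url")
    printf '%-12s %s %s\n' "$(classify_status "$code")" "$code" "$url"
  done
}
```

For example, to list only the problems:
root@ala-install-test-1:~# check_services /data/biocache/config/biocache-config.properties | grep -v '^OK'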
Start order
After a power outage, when your VMs restart, biocache-service sometimes starts before its dependencies are up and fails to start. This also happens when several services share the same VM and biocache-service starts before the others. In that case, restart only biocache-service. Touching its WAR file is usually enough, because Tomcat redeploys a WAR whose modification time changes (when autoDeploy is enabled):
root@ala-install-test-1:~# touch /var/lib/tomcat8/webapps-records-ws.l-a.site/ROOT.war
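To avoid touching the WAR too early after an outage, the touch can wait until the dependencies accept connections. This is a sketch, assuming Tomcat's autoDeploy is enabled and an nc with -z and -w; the helper names and the WAIT_TRIES/WAIT_DELAY variables are hypothetical:

```shell
# wait_for_port HOST PORT
# Polls HOST:PORT until a TCP connection succeeds. Attempts default to
# WAIT_TRIES=30 and the pause between them to WAIT_DELAY=10 seconds
# (both variable names are illustrative). Returns 1 if it never succeeds.
wait_for_port() {
  tries=${WAIT_TRIES:-30}
  delay=${WAIT_DELAY:-10}
  n=0
  while [ "$n" -lt "$tries" ]; do
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1 && return 0
    n=$((n + 1))
    [ "$n" -lt "$tries" ] && sleep "$delay"
  done
  return 1
}

# redeploy_when_ready WAR_FILE [HOST:PORT ...]
# Waits for every dependency, then touches WAR_FILE so that Tomcat
# redeploys it. Touches nothing if a dependency never comes up.
redeploy_when_ready() {
  war=$1
  shift
  for dep in "$@"; do
    if ! wait_for_port "${dep%:*}" "${dep##*:}"; then
      echo "dependency $dep is still down; not redeploying $war" >&2
      return 1
    fi
  done
  touch "$war"
}
```

For example, with the solr and cassandra endpoints used above:
root@ala-install-test-1:~# redeploy_when_ready /var/lib/tomcat8/webapps-records-ws.l-a.site/ROOT.war index-es.l-a.site:8983 ala-install-test-3:9042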
Increase the biocache log level
Edit /data/biocache/config/log4j.xml and raise the log levels (for example to DEBUG) to get more verbose logs.