Problem: AM1.16 MCPClient tries to refresh prometheus tmp database when prometheus is not enabled (and mcpclient crashed) #1711
Comments
As a workaround, I added Prometheus monitoring to the MCPClients on localhost:
I suspect we are missing a conditional in some places like this: We should check that monitoring is enabled before running metrics functions.
I think we do, but maybe we've missed some code path. Before applying this workaround, what were the values of those two settings in the MCPClient configuration?
Do we have a sense of whether this occurs any time there is a batch task?
Neither value was defined in the /etc/sysconfig/archivematica-mcp-client file beforehand, so the behaviour that README.md describes for an undefined port applies.
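To double-check whether a setting is really assigned in that file (and not just present in a commented-out line), a small helper along these lines could be used. This is only a sketch: the function name `setting_defined` is made up for illustration, and it assumes the file uses plain `NAME=value` lines.

```shell
#!/bin/bash
# Sketch (hypothetical helper): succeed if NAME is assigned on an
# uncommented "NAME=value" line of an environment-style file.
setting_defined() {
  local env_file="$1"   # e.g. /etc/sysconfig/archivematica-mcp-client
  local var_name="$2"   # name of the setting to look for
  # Anchor at line start so "# NAME=..." comment lines do not match
  grep -Eq "^[[:space:]]*${var_name}=" "${env_file}"
}
```

For example, `setting_defined /etc/sysconfig/archivematica-mcp-client ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_PROMETHEUS_BIND_PORT && echo defined || echo "not defined"`. The variable name in this call is an assumption and should be checked against the README.md of the installed version.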
I couldn't reproduce the issue in a similar environment: separate SS and dashboard VMs (Rocky 9, AM1.16), starting transfers with amclient. Transfer type: zipped bag. I ran the following loop with the script many times and never got the error. (With sleep=0 I got some failures, because the same tarball was in the sharedDir, but the Prometheus error was never printed.)
#!/bin/bash
# Use the AM virtualenv
source /usr/share/archivematica/virtualenvs/archivematica/bin/activate
# Variables
API_KEY=XXXXXXXXXXX
AM_USER=test
AM_URL=https://YYYYYYYYYYYY.archivematica.net
TRANSFER_SOURCE=ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
TRANSFER_NAME=tarredbag-from-amclient
TRANSFER_TYPE="zipped bag"
PROCESSING_CONFIG=automated
TRANSFER_DIR_OR_FILE=artefactual/archivematica-sampledata/SampleTransfers/BagExamples/TarGzBags/TarredBag.tar.gz
SLEEP_TIME=1
# Start 100 transfers, one per iteration
for i in {1..100}; do
  echo "Running a zipped bag transfer with amclient: $i"
  amclient create-package \
    --am-user-name "${AM_USER}" \
    --am-url "${AM_URL}" \
    --transfer-source "${TRANSFER_SOURCE}" \
    --transfer-name "${TRANSFER_NAME}" \
    --transfer-type "${TRANSFER_TYPE}" \
    --processing-config "${PROCESSING_CONFIG}" \
    "${API_KEY}" \
    "${TRANSFER_DIR_OR_FILE}"
  # Wait before cleaning up
  echo "Sleep ${SLEEP_TIME}s"
  sleep "${SLEEP_TIME}"
  # Clean the dashboard: close completed transfers and ingests
  echo "Cleaning dashboard"
  amclient close-completed-transfers \
    --am-user-name "${AM_USER}" \
    --am-url "${AM_URL}" \
    "${API_KEY}"
  amclient close-completed-ingests \
    --am-user-name "${AM_USER}" \
    --am-url "${AM_URL}" \
    "${API_KEY}"
done
We saw the error in mcp-client again, but this time:
This time the error looks like a gearmand error, but it is actually a problem with the Prometheus tmp dir. It happened with Prometheus disabled in MCPClient, after the service had been running for 1.5 months. By default, RHEL 9 deletes directories in /tmp that are older than 10 days (this doesn't happen on Ubuntu). So the fix applied in artefactual-labs/ansible-archivematica-src#416 for new AM1.17 deployments/upgrades can also be done from the CLI:
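One general way to protect a directory under /tmp from the periodic age-based cleanup is a systemd tmpfiles.d exclusion (an `x` line). The sketch below is an assumption and not necessarily what the linked change does: the helper name, the drop-in file name, and the example path are all hypothetical, and the real Prometheus tmp dir path has to be looked up on the affected host.

```shell
#!/bin/bash
# Sketch (hypothetical helper): write a tmpfiles.d drop-in that excludes
# a path from age-based cleaning by systemd-tmpfiles.
write_tmpfiles_exclusion() {
  local conf_dir="$1"    # /etc/tmpfiles.d on a real host
  local path_glob="$2"   # placeholder, e.g. /tmp/example-prometheus-dir*
  local conf_file="${conf_dir}/archivematica-prometheus-example.conf"
  # Type "x" tells systemd-tmpfiles to ignore the path during cleaning
  printf 'x %s\n' "${path_glob}" > "${conf_file}"
  # Print the file that was written so the caller can inspect it
  echo "${conf_file}"
}
```

Run as root, a call such as `write_tmpfiles_exclusion /etc/tmpfiles.d '/tmp/example-prometheus-dir*'` would create the drop-in; the glob here is a placeholder to be replaced with the actual directory.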
Expected behaviour
Current behaviour
Sometimes MCPClient tries to refresh the Prometheus database. I suppose this happens when a worker pool member is restarted after reaching its limit (the pool of workers is a new AM1.16 feature).
It happens at the start of a transfer, but not in every transfer.
MCPServer log error:
MCPClient log in syslog:
Steps to reproduce
It isn't easy to reproduce: it only happened once on each of the two pipelines. So I suspect it only happens when a worker is restarted.
Your environment (version of Archivematica, operating system, other relevant details)
Rocky 9, AM1.16
SS on a separate VM
2 pipelines
The issue happened in both pipelines (Identical config)