Skip to content

Problem: AM1.16 MCPClient tries to refresh prometheus tmp database when prometheus is not enabled (and mcpclient crashed) #1711

Open
@mamedin

Description

@mamedin

Expected behaviour

  • No crash in MCPCient
  • If prometheus disabled in MCPclient, we don't need to refresh tmp prometheus database (because it doesn't exist)

Current behaviour

Sometimes MCPclient tries to refresh the prometheus database (I suppose when worker pool member is restarted because reaching the limit (new AM1.16 feature: pool of workers))

It is happening at the transfer start. And it is not happening in all transfers.

MCPServer log error:

DEBUG     2024-10-16 13:00:07  archivematica.mcp.server.packages:packages:_copy_from_transfer_sources:243:  source: 6ff352d0-f495-4330-920a-dcbe20a817b5.tar.gz, destination: /var/archivematica/sharedDirectory/tmp/tmp1gkokc0b/6ff352d0-f495-4330-920a-dcbe20a817b5.tar.gz
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.packages:packages:_start_package_transfer_with_auto_approval:424:  Package cbf9eee3-b327-4bb9-a2a3-915438751536: moving package to processing directory
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.packages:packages:_start_package_transfer_with_auto_approval:429:  Package cbf9eee3-b327-4bb9-a2a3-915438751536: starting workflow processing
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.chain:chain:__init__:89:  Creating JobChain 167dc382-4ab1-4051-8e22-e7f1c1bf3e6f for package cbf9eee3-b327-4bb9-a2a3-915438751536 (initial link 3229e01f-adf3-4294-85f7-4acb01b3fbcf)
INFO      2024-10-16 13:00:11  archivematica.mcp.server.jobs.client:client:run:104:  Running Extract zipped bag transfer (package cbf9eee3-b327-4bb9-a2a3-915438751536)
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:submit:217:  Submitted gearman job d92fafcd-0b61-4ac1-93af-3e714db63a75 (extractzippedtransfer_v0.0)
ERROR     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:update_task_results:238:  Gearman task batch d92fafcd-0b61-4ac1-93af-3e714db63a75 failed to execute
INFO      2024-10-16 13:00:11  archivematica.mcp.server.jobs.client:client:run:104:  Running Email fail report (package cbf9eee3-b327-4bb9-a2a3-915438751536)
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:submit:217:  Submitted gearman job f19c544a-6b26-421c-bb3b-8552ec6c4cd6 (emailfailreport_v0.0)
ERROR     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:update_task_results:238:  Gearman task batch f19c544a-6b26-421c-bb3b-8552ec6c4cd6 failed to execute
INFO      2024-10-16 13:00:11  archivematica.mcp.server.jobs.client:client:run:104:  Running Move to the failed directory (package cbf9eee3-b327-4bb9-a2a3-915438751536)
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:submit:217:  Submitted gearman job 664f634a-9034-4a54-b85e-91438aae55d3 (movetransfer_v0.0)
ERROR     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:update_task_results:238:  Gearman task batch 664f634a-9034-4a54-b85e-91438aae55d3 failed to execute

MCPCLient log in syslog:

Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: DEBUG     2024-10-16 13:00:11,421  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 
d92fafcd-0b61-4ac1-93af-3e714db63a75 received for extractzippedtransfer_v0.0
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: ERROR     2024-10-16 13:00:11,424  archivematica.mcp.client.gearman:on_job_exception:111:  An unhandled 
exception occurred processing a Gearman job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: Traceback (most recent call last):
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_ex
ecute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 203, i
n labels
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metrics[labelvalues] = self.__class__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 600, i
n __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    super().__init__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 151, i
n __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metric_init()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 628, i
n _metric_init
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._sum = values.ValueClass(self._type, self._name, self._name + '_sum', self._labelnames, self._labelvalues, self._documen
tation)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 68, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self.__reset()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 82, in __reset
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    files[file_prefix] = MmapedDict(filename)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 64, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._f = open(filename, 'rb' if read_mode else 'a+b')
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/prometheus-stats7rrndq4_/histogram_1782.db'
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: DEBUG     2024-10-16 13:00:11,461  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: ERROR     2024-10-16 13:00:11,462  archivematica.mcp.client.gearman:on_job_exception:111:  An unhandled exception occurred processing a Gearman job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: Traceback (most recent call last):
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_execute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 203, in labels
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metrics[labelvalues] = self.__class__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 600, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    super().__init__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 151, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metric_init()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 628, in _metric_init
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._sum = values.ValueClass(self._type, self._name, self._name + '_sum', self._labelnames, self._labelvalues, self._documentation)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 68, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self.__reset()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 82, in __reset
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    files[file_prefix] = MmapedDict(filename)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 64, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._f = open(filename, 'rb' if read_mode else 'a+b')
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/prometheus-stats7rrndq4_/histogram_1782.db'
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: DEBUG     2024-10-16 13:00:11,489  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 664f634a-9034-4a54-b85e-91438aae55d3 received for movetransfer_v0.0
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: ERROR     2024-10-16 13:00:11,490  archivematica.mcp.client.gearman:on_job_exception:111:  An unhandled exception occurred processing a Gearman job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: Traceback (most recent call last):
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_execute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
6 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_execute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 203, in labels
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metrics[labelvalues] = self.__class__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 600, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    super().__init__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 151, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metric_init()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 628, in _metric_init
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._sum = values.ValueClass(self._type, self._name, self._name + '_sum', self._labelnames, self._labelvalues, self._documentation)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 68, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self.__reset()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 82, in __reset
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    files[file_prefix] = MmapedDict(filename)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 64, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._f = open(filename, 'rb' if read_mode else 'a+b')
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/prometheus-stats7rrndq4_/histogram_1782.db'

Steps to reproduce

It isn't easy to reproduce, it only happened 1 time on each 2 system pipelines. So I suspect it only happens when restarting a worker.

Your environment (version of Archivematica, operating system, other relevant details)

Rocky 9, AM1.16

SS separated VM
2 pipelines

The issue happened in both pipelines (Identical config)


For Artefactual use:

Before you close this issue, you must check off the following:

  • All pull requests related to this issue are properly linked
  • All pull requests related to this issue have been merged
  • A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
  • Documentation regarding this issue has been written and merged (if applicable)
  • Details about this issue have been added to the release notes (if applicable)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Status: reviewThe issue's code has been merged and is ready for testing/review.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions