Maintenance tasks increase when an asd is down (CleanupOsdNamespace) #761

Open
jeroenmaelbrancke opened this issue Jul 19, 2017 · 3 comments
@jeroenmaelbrancke

I know it is normal for the number of maintenance tasks to increase, but after 15 minutes the chunks on this asd will be recreated on other asds.
So if an asd is down for x hours, the maintenance agent should ignore these tasks.

In my example, osds 14 and 15 have been down for 3 days and the amount of work on the maintenance agent still increases, even though the auto repair timeout is 900 seconds.

Maintenance config = {
  "enable_auto_repair": true,
  "auto_repair_timeout_seconds": 900.0,
  "auto_repair_disabled_nodes": [],
  "enable_rebalance": true,
  "cache_eviction_prefix_preset_pairs": {},
  "redis_lru_cache_eviction": {
    "host": "172.17.16.22",
    "port": 6379,
    "key": "alba_lru_56f58646-419d-4236-a868-e3b79ac8784d"
  }
}
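
To make the mismatch concrete, here is a minimal Python sketch that compares the reported downtime with the configured timeout. The helper `past_auto_repair_timeout` is hypothetical and only illustrates the reporter's expectation; it is not how the maintenance agent decides anything, and the redis section of the config is left out for brevity.

```python
import json

# Maintenance config as posted in the issue (redis section omitted for brevity).
MAINTENANCE_CONFIG = json.loads("""
{
  "enable_auto_repair": true,
  "auto_repair_timeout_seconds": 900.0,
  "auto_repair_disabled_nodes": [],
  "enable_rebalance": true
}
""")


def past_auto_repair_timeout(down_seconds, config):
    """Hypothetical predicate: True once an osd has been down longer than the timeout."""
    return (config["enable_auto_repair"]
            and down_seconds > config["auto_repair_timeout_seconds"])


three_days = 3 * 24 * 60 * 60  # 259200 seconds
print(past_auto_repair_timeout(three_days, MAINTENANCE_CONFIG))        # True
print(three_days / MAINTENANCE_CONFIG["auto_repair_timeout_seconds"])  # 288.0
```

Three days is 288 times the 900 second timeout, which is why the reporter expected repair to have long since moved the data elsewhere.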

work items:

54158935 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004123L))"
54159096 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004127L))"
54159149 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004126L))"
54159150 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004126L))"
54159271 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004134L))"
54159324 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004132L))"
54159377 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004129L))"
54159430 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004135L))"
54159483 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004128L))"
54159536 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004131L))"
54159537 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004131L))"
54159589 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004138L))"
54159590 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004138L))"
54159642 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004136L))"
54159704 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004143L))"
54159757 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004145L))"
54159863 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004137L))"
54159916 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004142L))"
54159917 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004142L))"
54159969 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004144L))"
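
A listing like this is easier to read once grouped per osd. The following Python sketch parses lines in the format shown above and counts the items per osd id. The regular expression and the function names are assumptions made for this illustration, and `SAMPLE` holds only the first five lines of the listing.

```python
import re
from collections import Counter

# First five work items from the listing above.
SAMPLE = '''\
54158935 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004123L))"
54159096 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004127L))"
54159149 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004126L))"
54159150 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004126L))"
54159271 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004134L))"
'''

# Captures: work item id, osd id, namespace id.
PATTERN = re.compile(
    r'^(\d+) \| "\(Albamgr_protocol\.Protocol\.Work\.CleanupOsdNamespace '
    r'\((\d+)L, (\d+)L\)\)"$'
)


def parse_work_items(text):
    """Return a list of (work_id, osd_id, namespace_id) tuples."""
    items = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            items.append(tuple(int(group) for group in match.groups()))
    return items


def count_per_osd(items):
    """Count work items per osd id."""
    return Counter(osd_id for _, osd_id, _ in items)


print(count_per_osd(parse_work_items(SAMPLE)))  # Counter({15: 4, 14: 1})
```

Every item in the sample targets osd 14 or 15, the two osds that are down.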

Number of work items: [image]

@toolslive

54159757 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004145L))"

Delete everything that is left on osd 15L from namespace 1004145L.
The namespace was deleted, but the osd was down, so the work item is kept in the work queue in the abm (and is retried and tracked, without success, by the maintenance processes).

@wimpers

wimpers commented Nov 27, 2017

@toolslive

  • Would the items be removed from the queue if the OSD is removed through the model?
  • Would it make sense not to keep work items in the queue for a namespace delete that did not succeed, and instead to retry periodically based on the flag that the namespace is deleted?
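
The second suggestion could look roughly like the Python sketch below: instead of one queued work item per (osd, namespace) pair, a periodic pass derives the cleanups that are possible right now from the set of deleted namespaces. All names here (`pending_cleanups`, `osds_holding`, `reachable_osds`) are hypothetical, and the sketch assumes the system can tell which osds may still hold data for a deleted namespace; the thread does not confirm that.

```python
def pending_cleanups(deleted_namespaces, osds_holding, reachable_osds):
    """Return (osd_id, namespace_id) pairs that can be cleaned up in this pass.

    deleted_namespaces: set of namespace ids flagged as deleted
    osds_holding: dict mapping namespace id -> set of osd ids that may still hold data
    reachable_osds: set of osd ids that are currently up
    """
    return sorted(
        (osd_id, namespace_id)
        for namespace_id in deleted_namespaces
        for osd_id in osds_holding.get(namespace_id, ())
        if osd_id in reachable_osds
    )


deleted = {1004126, 1004131}
holding = {1004126: {14, 15}, 1004131: {14, 15}}

# While both osds are down, nothing is scheduled and nothing piles up.
print(pending_cleanups(deleted, holding, reachable_osds=set()))
# []

# Once osd 15 is back, its cleanups appear on the next periodic pass.
print(pending_cleanups(deleted, holding, reachable_osds={15}))
# [(15, 1004126), (15, 1004131)]
```

The trade-off is that the pending work is recomputed on each pass rather than stored, so the queue no longer grows while an osd is down.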

@toolslive

If the OSD was purged, the CleanupOsdNamespace items will complete without problems.
The maintenance agent that handles them will log

   "UnknownOsd(%Li) => no cleanup to be done anymore

at info level.
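
The behaviour described in this thread can be modelled with a small Python sketch: an item for an osd that is down stays in the queue, while an item for a purged (unknown) osd completes with an info log. This is a toy model, not the actual agent code; `OsdState`, `process_cleanup` and `run_pass` are names invented for the illustration, and the log text is copied from the comment above.

```python
import logging
from enum import Enum

logger = logging.getLogger("maintenance_sketch")


class OsdState(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"  # the osd was purged


def process_cleanup(osd_id, namespace_id, osd_states):
    """Return True when the work item can be removed from the queue."""
    # An osd that is absent from the state map is treated as purged.
    state = osd_states.get(osd_id, OsdState.UNKNOWN)
    if state is OsdState.UNKNOWN:
        logger.info("UnknownOsd(%i) => no cleanup to be done anymore", osd_id)
        return True
    if state is OsdState.DOWN:
        return False  # cannot reach the osd: keep the item and retry later
    # UP: a real agent would delete the leftover namespace data here.
    return True


def run_pass(queue, osd_states):
    """Process every item once and return the items that remain queued."""
    return [item for item in queue if not process_cleanup(*item, osd_states)]


queue = [(15, 1004145), (14, 1004126)]

# Both osds down: nothing completes, the queue keeps its items.
print(run_pass(queue, {14: OsdState.DOWN, 15: OsdState.DOWN}))
# [(15, 1004145), (14, 1004126)]

# osd 15 purged: its item completes, the item for osd 14 stays.
print(run_pass(queue, {14: OsdState.DOWN}))
# [(14, 1004126)]
```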
