-
If this helps, I'm getting this (multiple times) in the datanode container logs after trying to delete a file:
-
Deletes are asynchronous in Ozone. Once the file is deleted, Ozone Manager marks the file as deleted. This deletion is propagated to the SCM, and the SCM then sends a delete request to the Datanodes. So it takes some time for the relevant background services to run and reclaim the blocks. There is a background service for key deletion at each layer:
- Ozone Manager
- SCM
- DataNode

Typically, each of these services runs at a fixed default interval.
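If you want a small test cluster to reclaim space faster, you can shorten these intervals. Below is a minimal sketch for the docker-config file used by the compose deployment; the property name is written from memory for Ozone 1.4, so treat it as an assumption and verify it against the ozone-default.xml of your version (OM and SCM have analogous interval settings):

```bash
# Hedged sketch: shorten the datanode block-deleting interval on a test
# cluster by appending to the compose deployment's docker-config file.
# Property name assumed from memory -- verify against ozone-default.xml
# for your Ozone version before relying on it.
cat >> docker-config <<'EOF'
OZONE-SITE.XML_ozone.block.deleting.service.interval=10s
EOF
```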
-
+1 to Aswin's suggestion. Additionally, could you share which version of Ozone you are running? I think 1.4.0 should have most/all of the known deletion improvements; see this comment for a list. There were two issues, HDDS-11492 and HDDS-11491, identified recently, but those apply to dense filesystem hierarchies, which does not look relevant here. You may actually have to wait up to 10 minutes for the space to get reclaimed. This is because SCM will read the block deletes from RocksDB, and they will not be flushed down from Ratis until every 10 minutes, per the default value of the corresponding configuration key.
This means the datanodes do not have any block delete operations to process. The deletions could still be running through OM or SCM, or it could be the issue fixed in HDDS-7156 if you are using a version earlier than 1.4.0. On a large cluster constantly undergoing operations, space reclamation taking a few minutes usually works out fine. However, on a small test cluster it can definitely be confusing behavior. I think we will have a set of tasks to improve Ozone's deletion flow coming up soon, which will likely involve adding Grafana dashboards that can help track deletions end to end through the system.
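Until those dashboards exist, one low-tech way to watch deletions move through the system is to poll the JMX endpoints that OM, SCM, and the datanodes already expose over HTTP. A sketch, assuming the default web ports of a compose deployment (OM 9874, SCM 9876, datanode 9882) and grepping metric names generically instead of relying on exact counter names:

```bash
# Hedged sketch: dump deletion-related JMX metrics from each component.
# Hostnames/ports assume the default docker-compose setup; adjust as needed.
for endpoint in localhost:9874 localhost:9876 localhost:9882; do
  echo "== $endpoint =="
  curl -s "http://$endpoint/jmx" | grep -i delet | head -20
done
```

If the datanode-side counters never move while the OM/SCM ones do, that points at the SCM-to-datanode leg of the pipeline.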
-
Hi! First of all, thank you both very much for your answers.
Being honest, long enough, hahaha. I sometimes waited for half an hour before giving up, and in the case of the HA deployment it was left with the file deleted for a whole weekend and it didn't free up the space. (What I mean by "file deleted" in this case is the key; I just sent the delete-object command using the aws s3api.)
Not at all :(. In fact, I saw the OM message telling the SCM which key to delete and which blocks store the data for that key. But it doesn't go further (no communication between the SCM and the datanodes for the deletion).
I'm using version 1.4, deployed using docker compose. These days I kept doing tests. I'm now running a fresh deployment on my computer, also using docker compose, and here it seems to work, but not always. During my tests I noticed that restarting the containers usually helps.

I'll now explain my last test. I deployed the containers using docker compose (fresh install, no old data in the containers and no custom config in the docker-config file), created the bucket, uploaded a big file (the Ubuntu ISO, 5.8GB), waited in the logs until it marked the containers as closed, and then proceeded to delete the file. The delete arrived properly in the logs. This (as you can see in the logs) was done at 08:58:07. I waited a bit more than half an hour and nothing happened, so I decided to restart the containers, as that had helped me in other tests. This was done at 09:36. And it helped: 6 minutes later Ozone started deleting the file.

As I said in the first comment, we also have a deployment with high availability on 3 different machines. It was left for the whole weekend with the file deleted, and in this case the file is still on the machines. I'm sorry for not giving much useful information. Sometimes it works, sometimes it doesn't, and I'm still trying to figure out what's going on. I'll keep posting my test results here if I find out something.

I'll write here the steps to reproduce my last test in case anyone can check if it works for them. The test was made on an Ubuntu 24 desktop PC using the default Ozone image (version 1.4).
1- Download the docker-compose.yaml and docker-config:
Thank you very much in advance!
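To make the steps easier to follow, here is the same flow as a shell sketch. The download URLs, dummy credentials, and bucket/key names are assumptions on my part (the compose files come from the apache/ozone-docker repo, and the S3 gateway listens on port 9878 by default), so adjust them as needed:

```bash
# Hedged sketch of the reproduction steps above; URLs and names are
# assumptions, not verified against a specific Ozone release.
curl -O https://raw.githubusercontent.com/apache/ozone-docker/latest/docker-compose.yaml
curl -O https://raw.githubusercontent.com/apache/ozone-docker/latest/docker-config
docker-compose up -d --scale datanode=3

# Ozone's S3 gateway accepts any credentials when security is disabled.
export AWS_ACCESS_KEY_ID=testuser
export AWS_SECRET_ACCESS_KEY=testsecret

aws s3api create-bucket --bucket test --endpoint-url http://localhost:9878
aws s3api put-object --bucket test --key ubuntu.iso \
    --body ubuntu-24.04-desktop-amd64.iso --endpoint-url http://localhost:9878
aws s3api delete-object --bucket test --key ubuntu.iso \
    --endpoint-url http://localhost:9878
```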
-
Hi! After a break, I gave it a try again and I've made some progress. I ran some tests again in my local deployment (uploaded around 1.5GB of files and then deleted them). It wasn't freeing up space, no matter how long I waited. But then I discovered this command:
Then I made the same test in my deployment across 3 different machines with High Availability (which had 6GB of space that wasn't being freed up), and it was the same: the moment I used the command, it started deleting the blocks. That deployment had been left in this state for around 20 days and hadn't freed up the space until I used the command, so I'm guessing it isn't a problem of not giving it enough time. But I'm guessing this is not expected to be done manually, right? Thank you very much!
-
Hi all!
I was testing whether there are upload limits with Ozone, so I tried to upload the Ubuntu ISO (a 6GB file) to my Ozone deployment (version 1.4, deployed using docker compose).
I uploaded the file using the aws s3api:
And after it uploaded correctly I tried deleting it:
It seemed like it went well; the list-objects call didn't show the file anymore.
But after checking the Recon server, the 3 datanodes showed that Ozone was using 5.7GB of storage.
I did the same test again (uploading and deleting) and the Recon server reported 11GB of storage used by Ozone.
I checked the storage on my machine using
df -h
and there was indeed less free storage than expected. I have Ozone deployed using docker, so I tried removing one datanode container, deleting its files, and deploying it back, and it then showed 0GB in the Recon server.
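For reference, that reset looks roughly like this under docker compose (the service name and the assumption that the datanode's data lives inside the container come from the default compose file; adjust to yours):

```bash
# Hedged sketch: recreate one datanode with empty state.
docker-compose stop datanode
docker-compose rm -f datanode
# If your compose file mounts a volume or host dir for datanode data,
# wipe it here before restarting.
docker-compose up -d datanode
```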
I don't really know how deletions work in Ozone, but looking for information I saw that the SCM is expected to mark containers for removal and the garbage collector should remove them once they are closed. But checking the container list with
ozone admin container list
it lists 15 containers, all 15 of them closed, but none marked for deletion. I thought it could be some communication problem between the SCM and the datanodes/containers because we run with High Availability, but it also happened to us on another machine where we have a single-node deployment.
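A quick way to summarize those container states (a sketch; the exact JSON layout of the list output can differ between Ozone versions, so the grep pattern is an assumption):

```bash
# Hedged sketch: count containers per state from the admin CLI's JSON output.
ozone admin container list | grep -o '"state"[^,]*' | sort | uniq -c
```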
Does anyone know if there is any configuration property to add to my docker-config or my ozone-site.xml so that the delete works? I'm kinda lost here and couldn't find anything related to this in the documentation.
Thank you very much in advance!