Can't delete large file #1185
@tnatanael What you'd have to do to delete the file physically is run "compact-start", as described here: http://leo-project.net/leofs/docs/admin/system_operations/data/#how-to-operate-data-compaction. Please check the doc above for more details.
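A minimal sketch of that flow (the storage node name is an assumption; substitute the node names listed by leofs-adm status):

## Start data compaction on every object container of one storage node
## ('all' targets every container; a number limits how many are compacted)
$ leofs-adm compact-start storage_0@127.0.0.1 all

## Poll progress; compaction is finished when the state returns to 'idling'
$ leofs-adm compact-status storage_0@127.0.0.1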
Tried with leofs-adm compact-start; it says OK, but the file persists, even after waiting for the process to finish.
Let us know your LeoFS error log and the state of the large object:
How can I discover the object name? Is it the filename of the original file?
When I run compact, and when I try to delete the file using the S3 API, this error message pops up in the log:
Exactly: leofs-adm whereis
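For reference, a usage sketch: whereis takes the object path as the S3 client sees it, i.e. bucket plus key (the bucket and key here are made-up placeholders):

## Show which storage nodes hold the object's replicas, with checksums
$ leofs-adm whereis mybucket/videos/big.bin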
I tried with only the filename... and with bucket + filename. Both options say:
I am 100% sure that the file was corrupted due to disk failures, but it may need to be cleared, either by a manual delete or automatically by the cluster in some way.
I understand that your LeoFS RING (routing table) is broken, so let me know the current state of the system. Can you share the result of
OK... just to note, the cluster is still working; I am uploading and removing new files right now. Only this file is undeletable...
TO: @mocchira, your opinion will be much appreciated.
Hi guys! Can this ticket be labelled as a bug instead of a question?
Please do the following if you understand that we may NOT be able to restore your system completely. Procedure:
If the procedure succeeds, you can execute the
I'd like to share an example of the procedure for recovering LeoManager's RING below.

[Example] How To Recover LeoManager's RING

Before recovery:

$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
Item | Value
-----------------------------------+----------
Basic/Consistency level
-----------------------------------+----------
system version | 1.5.0
cluster Id | leofs_1
DC Id | dc_1
Total replicas | 2
number of successes of R | 1
number of successes of W | 1
number of successes of D | 1
number of rack-awareness replicas | 0
ring size | 2^128
-----------------------------------+----------
Multi DC replication settings
-----------------------------------+----------
[mdcr] max number of joinable DCs | 2
[mdcr] total replicas per a DC | 1
[mdcr] number of successes of R | 1
[mdcr] number of successes of W | 1
[mdcr] number of successes of D | 1
-----------------------------------+----------
Manager RING hash
-----------------------------------+----------
current ring-hash |
previous ring-hash |
-----------------------------------+----------
[State of Node(s)]
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
type | node | state | rack id | current ring | prev ring | updated at
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
S | storage_0@127.0.0.1 | running | | d5d667a6 | d5d667a6 | 2019-05-23 10:10:33 +0900
S | storage_1@127.0.0.1 | running | | d5d667a6 | d5d667a6 | 2019-05-23 10:10:33 +0900
S | storage_2@127.0.0.1 | running | | d5d667a6 | d5d667a6 | 2019-05-23 10:10:33 +0900
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------

The Procedure of Recovering LeoManager's RING

1. Stop all the nodes

$ ./package/leo_manager_0/bin/leo_manager stop
ok
$ ./package/leo_manager_1/bin/leo_manager stop
ok
$ ./package/leo_gateway_0/bin/leo_gateway stop
ok
$ ./package/leo_storage_0/bin/leo_storage stop
ok
$ ./package/leo_storage_1/bin/leo_storage stop
ok
$ ./package/leo_storage_2/bin/leo_storage stop
ok

2. Archive LeoManager's directories

$ tar czf leo_manager_0_backup.tar.gz ./package/leo_manager_0/
$ tar czf leo_manager_1_backup.tar.gz ./package/leo_manager_1/
$ ls -la | grep backup.tar.gz
-rw-r--r-- 1 yosukehara staff 15435040 May 23 10:12 leo_manager_0_backup.tar.gz
-rw-r--r-- 1 yosukehara staff 15429047 May 23 10:12 leo_manager_1_backup.tar.gz

3. Remove LeoManager's data directories

## manager_0:
$ rm -rf ./package/leo_manager_0/work/mnesia/*
## manager_1:
$ rm -rf ./package/leo_manager_1/work/mnesia/*
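As a small precaution (an editor's addition, not part of the original comment), it may be worth confirming both mnesia directories are actually empty before restarting:

## Both listings should show no remaining mnesia files
$ ls -la ./package/leo_manager_0/work/mnesia/
$ ls -la ./package/leo_manager_1/work/mnesia/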
4. Restart all the nodes except LeoGateway's node(s)

$ ./package/leo_manager_0/bin/leo_manager start
$ ./package/leo_manager_1/bin/leo_manager start
$ ./package/leo_storage_0/bin/leo_storage start
$ ./package/leo_storage_1/bin/leo_storage start
$ ./package/leo_storage_2/bin/leo_storage start
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
Item | Value
-----------------------------------+----------
Basic/Consistency level
-----------------------------------+----------
system version | 1.5.0
cluster Id | leofs_1
DC Id | dc_1
Total replicas | 2
number of successes of R | 1
number of successes of W | 1
number of successes of D | 1
number of rack-awareness replicas | 0
ring size | 2^128
-----------------------------------+----------
Multi DC replication settings
-----------------------------------+----------
[mdcr] max number of joinable DCs | 2
[mdcr] total replicas per a DC | 1
[mdcr] number of successes of R | 1
[mdcr] number of successes of W | 1
[mdcr] number of successes of D | 1
-----------------------------------+----------
Manager RING hash
-----------------------------------+----------
current ring-hash |
previous ring-hash |
-----------------------------------+----------
[State of Node(s)]
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
type | node | state | rack id | current ring | prev ring | updated at
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
S | storage_0@127.0.0.1 | attached | | | | 2019-05-23 10:14:00 +0900
S | storage_1@127.0.0.1 | attached | | | | 2019-05-23 10:14:03 +0900
S | storage_2@127.0.0.1 | attached | | | | 2019-05-23 10:14:05 +0900
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------

After restarting all the nodes, execute
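Whichever command follows, one way to verify the recovery took effect is to re-run status and confirm every storage node has left the attached state and shares the same ring hashes:

## Expect state 'running' and identical current/prev ring-hash values
$ leofs-adm status | grep storage_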
I'll try this tomorrow and report back, but I am wondering why this happens. Is it expected behaviour? Thanks for now!
Sorry for the delay... it worked. After that procedure I was able to delete the file...
@yosukehara I tried to follow your instructions but ran into a problem. After I restarted all the LeoFS services, all users and buckets had disappeared. So I restored the mnesia folder on leo_manager and everything was back, but the RING was broken again. Can you suggest how to fix this problem? Thanks
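One possible way to avoid losing the accounts when wiping mnesia (a suggestion, not from the maintainers above) is to dump the users and buckets first and re-create them once the new RING is up; the user id, bucket name, and access key below are made-up placeholders:

## Before wiping mnesia: record the existing users and buckets
$ leofs-adm get-users > users_backup.txt
$ leofs-adm get-buckets > buckets_backup.txt

## After the new RING is in place: re-create them by hand
$ leofs-adm create-user my_user
$ leofs-adm add-bucket my_bucket 05236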
Hi guys, I created a simple cluster with 2 storage nodes, and after uploading a 1 GB file and running the cluster for 1 week, I am not able to delete this file; the delete operation completes successfully but the file persists...
What I tried (a usage sketch follows the list):
recover-node
recover-disk
recover-consistency
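For reference, a sketch of how two related recovery commands are invoked; the node name and object path are assumptions, and the exact arguments of recover-disk and recover-consistency should be checked against leofs-adm's help:

## Re-check and repair all replicas stored on one node
$ leofs-adm recover-node storage_0@127.0.0.1

## Repair the replicas of a single object, addressed by bucket/key
$ leofs-adm recover-file mybucket/videos/big.bin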
I worry that once I put the cluster into the production environment, with many more files, this would be a very annoying bug, so please help me.