
Mount fails because "LTFS17285E Failed to search the final index in IP (1)" even when ltfs can try to search on DP. #479

amissael95 opened this issue Aug 23, 2024 · 8 comments · May be fixed by #480

@amissael95
Contributor

amissael95 commented Aug 23, 2024

Describe the bug

When a tape cartridge with a write permanent error is mounted and the MAM (cartridge memory) attribute of the Index Partition (IP) stores a generation number lower than the MAM attribute of the Data Partition (DP), the mount process fails with error LTFS17285E even though ltfs could still search for the index in the DP.

The following log shows that scenario:

LTFS11005I Mounting the volume.
LTFS30252I Logical block protection is disabled.
LTFS11333I A cartridge with write-perm error is detected on IP. Seek the newest index (IP: Gen = 26, VCR = 152) (DP: Gen = 27, VCR = 252) (VCR = 180).
LTFS17283I Detected unmatched VCR value between MAM and VCR (152, 180).
LTFS17284I Seaching the final index in IP.
LTFS17285E Failed to search the final index in IP (1).
LTFS14013E Cannot mount the volume.

To Reproduce

  1. Select a tape with a write permanent error to be mounted.
  2. Look for message LTFS11333I and confirm that the IP generation is lower than the DP generation:
LTFS11333I A cartridge with write-perm error is detected on %s. Seek the newest index (IP: Gen = %llu, VCR = %llu) (DP: Gen = %llu, VCR = %llu) (VCR = %llu)
  3. The mount process fails with LTFS14013E because the search for the final index in the IP fails (LTFS17285E).

Note: This is hard to reproduce since, as mentioned above, the tape cartridge needs to have a write permanent error.

Expected behavior

It seems the issue can be solved by making _ltfs_search_index_wp (in ltfs/src/libltfs/ltfs.c) continue searching on the DP even if the search on the IP fails (this can be done by setting can_skip_ip = true).

ltfs/src/libltfs/ltfs.c, lines 1464 to 1507 at commit 7271446:

static inline int _ltfs_search_index_wp(bool recover_symlink, bool can_skip_ip,
                                        struct tc_position *seekpos, struct ltfs_volume *vol)
{
    int ret = 0;
    tape_block_t end_pos, index_end_pos;
    bool fm_after, blocks_after;

    ltfsmsg(LTFS_INFO, 17284I, "IP");
    ret = ltfs_seek_index(vol->label->partid_ip, &end_pos, &index_end_pos, &fm_after,
                          &blocks_after, recover_symlink, vol);
    if (ret) {
        if (can_skip_ip) {
            ltfsmsg(LTFS_INFO, 17289I);
            vol->ip_coh.count = 0;
            vol->ip_coh.set_id = 0;
        } else {
            ltfsmsg(LTFS_ERR, 17285E, "IP", ret);
            return -LTFS_INDEX_INVALID;
        }
    }

    ltfsmsg(LTFS_INFO, 17284I, "DP");
    ret = ltfs_seek_index(vol->label->partid_dp, &end_pos, &index_end_pos, &fm_after,
                          &blocks_after, recover_symlink, vol);
    if (ret < 0) {
        ltfsmsg(LTFS_ERR, 17285E, "DP", ret);
        return -LTFS_INDEX_INVALID;
    }

    /* Use the latest index on the tape */
    ltfsmsg(LTFS_INFO, 17288I,
            (unsigned long long)vol->ip_coh.count, (unsigned long long)vol->ip_coh.set_id,
            (unsigned long long)vol->dp_coh.count, (unsigned long long)vol->dp_coh.set_id);
    if (vol->ip_coh.count > vol->dp_coh.count) {
        seekpos->partition = ltfs_part_id2num(vol->label->partid_ip, vol);
        seekpos->block = vol->ip_coh.set_id;
    } else {
        seekpos->partition = ltfs_part_id2num(vol->label->partid_dp, vol);
        seekpos->block = vol->dp_coh.set_id;
    }

    return 0;
}

Additional context

This makes me ask: was there any reason to avoid searching the index on the Data Partition?

The "can_skip_ip" flag was explicitaly set to false in the following commit 3287850, was there any special reason to do that?

@piste-jp
Member

It looks like a bug.

The blocks

ltfs/src/libltfs/ltfs.c, lines 1661 to 1663 at commit 7271446:

ret = _ltfs_search_index_wp(recover_symlink, false, &seekpos, vol);
if (ret < 0)
    goto out_unlock;

and

ltfs/src/libltfs/ltfs.c, lines 1690 to 1693 at commit 7271446:

/* Index of IP could be corrupted. So set skip flag */
ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);
if (ret < 0)
    goto out_unlock;

shall be swapped.

The upper code belongs to the logic that handles a WP that happens on the IP. So the index on the IP might be corrupted, and thus the skip flag shall be true.

But the lower code belongs to the logic that handles a WP that happens on the DP. The index shall be searched from the IP, so the skip flag shall be false.
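
In other words, after the swap the two call sites would look roughly like this (a sketch against commit 7271446; only the boolean argument changes):

/* WP happened on IP (around line 1661): the index on the IP might be
 * corrupted, so allow the search to skip the IP and fall back to the DP. */
ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);
if (ret < 0)
    goto out_unlock;

/* WP happened on DP (around line 1690): the index shall be searched
 * from the IP, so do not allow skipping it. */
ret = _ltfs_search_index_wp(recover_symlink, false, &seekpos, vol);
if (ret < 0)
    goto out_unlock;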

@amissael95
Contributor Author

Hello @piste-jp,

Thanks for the quick response. I am curious: could we just remove the "can_skip_ip" flag and let the _ltfs_search_index_wp function try to search the index on both the IP and the DP?

In the end, the logic consists of using the latest index on the tape, so it would not hurt to simply try to search the index on both partitions, zero out the coherency info of any partition whose search fails, and use the latest index, as in the sketch below.
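
To make the idea concrete, I mean something roughly like this (an untested sketch derived from the function quoted above; the name _ltfs_search_index_wp_both is hypothetical, and the ltfsmsg logging is omitted for brevity):

/* Search both partitions, zero out the coherency info of any partition
 * whose search fails, and give up only if both searches fail. */
static inline int _ltfs_search_index_wp_both(bool recover_symlink,
                                             struct tc_position *seekpos, struct ltfs_volume *vol)
{
    int ret_ip, ret_dp;
    tape_block_t end_pos, index_end_pos;
    bool fm_after, blocks_after;

    ret_ip = ltfs_seek_index(vol->label->partid_ip, &end_pos, &index_end_pos,
                             &fm_after, &blocks_after, recover_symlink, vol);
    if (ret_ip) {
        vol->ip_coh.count = 0;
        vol->ip_coh.set_id = 0;
    }

    ret_dp = ltfs_seek_index(vol->label->partid_dp, &end_pos, &index_end_pos,
                             &fm_after, &blocks_after, recover_symlink, vol);
    if (ret_dp) {
        vol->dp_coh.count = 0;
        vol->dp_coh.set_id = 0;
    }

    /* Fail only when neither partition yields an index */
    if (ret_ip && ret_dp)
        return -LTFS_INDEX_INVALID;

    /* Use the latest index on the tape */
    if (vol->ip_coh.count > vol->dp_coh.count) {
        seekpos->partition = ltfs_part_id2num(vol->label->partid_ip, vol);
        seekpos->block = vol->ip_coh.set_id;
    } else {
        seekpos->partition = ltfs_part_id2num(vol->label->partid_dp, vol);
        seekpos->block = vol->dp_coh.set_id;
    }
    return 0;
}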

Regards

@piste-jp
Member

Could we just remove the "can_skip_ip" flag and let the _ltfs_search_index_wp function try to search the index on both the IP and the DP?

I believe it's a little bit dangerous, because the block starting at L1680 means the tape says the IP has the latest index. So an index must exist on the IP, at least. Why would we provide a skip flag there, or obsolete the skip flag and always allow the skip?

ltfs/src/libltfs/ltfs.c, lines 1680 to 1695 at commit 7271446:

if (volume_change_ref != vol->ip_coh.volume_change_ref) {
    /*
     * Cannot trust the index info on MAM, search the last indexes
     * This would happen when the drive returns an error against acquiring the VCR
     * while write error handling.
     */
    ltfsmsg(LTFS_INFO, 17283I,
            (unsigned long long)vol->dp_coh.volume_change_ref,
            (unsigned long long)volume_change_ref);
    /* Index of IP could be corrupted. So set skip flag */
    ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);
    if (ret < 0)
        goto out_unlock;
} else {

Your proposal might relax the acceptable tape condition a little bit, but it just ignores unexpected behavior of the tape drive or of LTFS itself. I believe we need to understand why this happens, if it really happens, and fix it correctly. Your proposal would just hide that fact without any knowledge of the cause.

I believe it's not the time to do that yet.

amissael95 linked a pull request Aug 27, 2024 that will close this issue
@amissael95
Contributor Author

@piste-jp,

I have created PR #480 with the modifications that you pointed out.

Do you think we can ensure that the change will not break the tape, and that it is safe to implement? I am currently trying to replicate this scenario using itdt... I think the only problem would be if we wrote incorrect index data onto the tape.

In addition, it is worth emphasizing that this involves a "data loss" scenario, since the index found may not point to all the files on the tape.

Regards

piste-jp linked a pull request Aug 27, 2024 that will close this issue
piste-jp added the bug label Aug 27, 2024
@piste-jp
Member

piste-jp commented Aug 27, 2024

I have created PR #480 with the modifications that you pointed out.

Do you think we can ensure that the change will not break the tape, and that it is safe to implement? I am currently trying to replicate this scenario using itdt... I think the only problem would be if we wrote incorrect index data onto the tape.

For PR discussion, please use the comment thread on the PR. Let's use #480.

In addition, it is worth emphasizing that this involves a "data loss" scenario, since the index found may not point to all the files on the tape.

I cannot understand this ... Why?

@amissael95
Contributor Author

I cannot understand this ... Why?

What I meant is that, because of the write perm error, I am not sure we can trust the state of the indexes on the tape. According to the LTFS standard v2.4:

A volume that has been locked because of a permanent write error "shall be mounted as read-only using the highest generation index available on the tape in either partition".

Is it possible that the highest index found on the tape corresponds to a previous generation and therefore does not reference the latest files on the tape?

Could you confirm whether, after a write perm error, if the latest index is successfully found in either partition, that index will always point to the latest files on the tape?

Really appreciate your support

Regards

@piste-jp
Member

First of all, "data lost" or "data loss" is a really strong term for storage engineers. It must be used only when data that was once written on a medium disappears unexpectedly for some reason. So we should say this is a data loss only when it happens because of a bug in LTFS's logic.

In this case, it is clear that your scenario is not a data loss problem at all, because LTFS never writes (or overwrites) anything during a read-only mount.

Second, it looks like you identified this scenario by reading through only the mount-process logic. I believe that is not the correct approach; you need to understand the implementation of the write side as well.

Long story short, when LTFS gets a write perm from the drive, it writes an index to the other partition, writes the current index information to the MAM, and marks the tape as a single write-perm tape. So the latest index shall be readable by the tape drive at mount time, as outlined below.
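
Schematically (pseudo-C for illustration only; the helper names below are placeholders I'm using to summarize, not the real libltfs functions):

/* Placeholder declarations, not the real libltfs API. */
struct ltfs_volume;
void write_index_to_other_partition(struct ltfs_volume *vol);
void update_index_info_on_mam(struct ltfs_volume *vol);
void mark_as_single_write_perm(struct ltfs_volume *vol);

/* The write-side sequence described above, schematically. */
static void handle_write_perm(struct ltfs_volume *vol)
{
    /* 1. Write the current index to the other (still writable) partition. */
    write_index_to_other_partition(vol);

    /* 2. Record the current index information in the MAM so the next
     *    mount knows which index is the newest. */
    update_index_info_on_mam(vol);

    /* 3. Mark the cartridge as a single write-perm tape; from now on it
     *    mounts read-only. */
    mark_as_single_write_perm(vol);
}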

Is it possible that the highest index found on the tape corresponds to a previous generation and therefore does not reference the latest files on the tape?

The drive returned a GOOD response after writing the latest index on the tape, so LTFS marked it as a single write-perm tape. From a specification point of view, the drive must either find the latest index correctly or return a read perm error.

Could you confirm whether, after a write perm error, if the latest index is successfully found in either partition, that index will always point to the latest files on the tape?

I can review it if you provide such code. But honestly, I'm not sure I need to, because:

  1. The required information is already logged.
  2. The final result (mounting with the index found by this scan) may be the same.

I believe that reporting a mount error (and failing) when the generation of the index that was read matches the one on the MAM is of no benefit to users.

@perezle

perezle commented Sep 10, 2024

Nice talking to you, @piste-jp. Yes, we will take care of the pull request. Thanks!
