Failure to Remount Quantum LTO-7 HH Tape after Format/Write/Unmount/Eject #378

Closed
DizzyThermal opened this issue Dec 8, 2022 · 5 comments
@DizzyThermal
DizzyThermal commented Dec 8, 2022

Describe the bug
I am unable to remount a tape that was formatted with LTFS and had files copied to it. I am using ltfs-2.4-stable on RHEL 8.7 (Ootpa). My workflow is to format the Quantum LTO-7 HH tape, copy contents to it, unmount it, and eject it.

To Reproduce
Steps to reproduce the behavior:

Format Tape:
mkltfs -d /dev/sg4

Mount Tape:
ltfs -o devname=/dev/sg4 /LTFS

Copy Content:
cp files*.iso /LTFS

Unmount Tape:
umount /LTFS

Ejecting Tape:
mt -f /dev/st0 eject

** Physically removing and reinserting Tape **

To Remount Tape:
ltfs -o devname=/dev/sg4 /LTFS

Expected behavior
I expect the mount to succeed and the previously written files to be accessible at the mount point (/LTFS).

Additional context
I am in an air-gapped environment and cannot transfer logs out of it, but I can type out the log lines.

On mount, here are the errors (typed by hand):

[root@rhel8 ~]# ltfs -o devname=/dev/sg4 /LTFS
819e LTFS14000I LTFS starting, LTFS version 2.4.5.1 (Prelim), log level 2
819e LTFS14058I LTFS Format Specification version 2.4.0.
819e LTFS14104I Launched by "ltfs -o devname=/dev/sg4 /LTFS"
819e LTFS14105I This binary is built for Linux (x86_64).
819e LTFS14106I GCC version is 8.5.0 20210514 (Red Hat 8.5.0-10).
819e LTFS17087I Kernel version: Linux version 4.18.0-425.3.1.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC)) #1 SMP Fri Sep 30 11:45:06 EDT 2022 i386.
819e LTFS17089I Distribution: NAME="Red Hat Enterprise Linux".
819e LTFS17089I Distribution: Red Hat Enterprise Linux release 8.7 (Ootpa).
819e LTFS17089I Distribution: Red Hat Enterprise Linux release 8.7 (Ootpa).
819e LTFS14063I Sync type is "time", Sync time is 300 sec.
819e LTFS17085I Plugin: Loading "sg" tape backend.
819e LTFS17085I Plugin: Loading "unified" iosched backend.
819e LTFS14063I Set the tape device write-anywhere mode to avoid cartridge ejection.
819e LTFS30209I Opening a device through sg-ibmtape driver (/dev/sg4).
819e LTFS30250I Opened the SCSI tape device 1.0.1.0 (/dev/sg4).
819e LTFS30207I Vendor ID is QUANTUM .
819e LTFS30208I Product ID is ULTRIUM-HH7      .
819e LTFS30214I Firmware revision is P381.
819e LTFS30215I Drive serial is 109xxxxxxx.
819e LTFS30285I The reserved buffer size of /dev/sg4 is 1048576.
819e LTFS30294I Setting up timeout values from RSOC.
819e LTFS17160I Maximum device block size is 1048576.
819e LTFS11330I Loading cartridge.
819e LTFS11332I Load successful.
819e LTFS17157I Changing the drive setting to write-anywhere mode.
819e LTFS11005I Mounting the volume.
819e LTFS11175E Cannot read ANSI label: expected 80 bytes, but received 0.
819e LTFS11171E Cannot read volume: failed to read partition 1.
819e LTFS11009E Cannot mount the volume.
819e LTFS30252I Logical block protection is disabled.

Is there a way to rebuild or repair the index partition (that seems to be what it cannot read), or is there something else I can try?

I ran ltfsck /dev/sg4 --full-recovery and it returned:

[root@rhel8 ~]# ltfsck /dev/sg4 --full-recovery
LTFS16000I STarting ltfsck, LTFS version 2.4.5.1 (Prelim), log level 2.
LTFS16088I Launched "ltfsck --full-recovery /dev/sg4"
LTFS16089I This binary is built for Linux (x86_64).
LTFS16090I GCC version is 8.5.0 20210514 (Red Hat 8.5.0-10).
LTFS17087I Kernel version: Linux version 4.18.0-425.3.1.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC)) #1 SMP Fri Sep 30 11:45:06 EDT 2022 i386.
LTFS17089I Distribution: NAME="Red Hat Enterprise Linux".
LTFS17089I Distribution: Red Hat Enterprise Linux release 8.7 (Ootpa).
LTFS17089I Distribution: Red Hat Enterprise Linux release 8.7 (Ootpa).
LTFS17085I Plugin: Loading "sg" tape backend.
LTFS30209I Opening a device through sg-ibmtape driver (/dev/sg4).
LTFS30250I Opened the SCSI tape device 1.0.1.0 (/dev/sg4).
LTFS30207I Vendor ID is QUANTUM .
LTFS30208I Project ID is ULTRIUM-HH7      .
LTFS30214I Firmware revision is P381.
LTFS30215I Drive serial is 109xxxxxxx.
LTFS30285I The reserved buffer size of /dev/sg4 is 1048576.
LTFS17160I Setting up timeout values from RSOC.
LTFS11330I Loading cartridge.
LTFS11332I Load successful.
LTFS17157I Changing the drive setting to write-anywhere mode.
LTFS16014I Checking LTFS file system on '/dev/sg4'.
LTFS11175E Cannot read ANSI label: expected 80 bytes, but received 0.
LTFS11171E Failed to read label (-1012) from partition 1.
LTFS11009E Cannot read volume: failed to read partition labels.
LTFS16080E Cannot check volume (8).
LTFS30252I Logical block protection is disabled.

I've also updated my Quantum LTO-7 tape drive to the latest available firmware using ITDT (IBM Tape Diagnostic Tool). I used the same tool to check the tape health and whether the drive head was clean. Everything came back PASSED/OK.

Are there any suggestions to attempt to recover the content written to the tape? Maybe a way to rebuild the index partition or whatever it is failing to read from?

I can provide more logs if necessary (in a verbose mode), however, if there are sections more relevant I can provide instead of the entire log - that would be helpful (since I'm typing it out manually).

Thanks in advance!

@piste-jp
Member

piste-jp commented Dec 9, 2022

LTFS reports that it cannot read the ANSI label, which is located at position 0 and must be 80 bytes long. That is very strange behavior, but I cannot investigate it from the information you uploaded: if this is a real problem, it occurred while writing the data or unmounting the tape.

LTFS11175E Cannot read ANSI label: expected 80 bytes, but received 0.

Please upload the log from the unmount if you would like further investigation.

In my view, your test scenario is a little dangerous, because umount /LTFS only triggers the unmount: FUSE just closes the POSIX API interface. The ltfs process actually keeps running to write the index and unmount the tape cleanly. So this kind of problem can happen if a command arrives through another device file, such as /dev/st0, while ltfs is still closing the tape.

Could you run a test using my recommended scenario below?

  1. Format a tape: mkltfs -d /dev/sg4
  2. Mount the tape with automatic eject option at unmount: ltfs -o devname=/dev/sg4 -o eject /LTFS
  3. Copy files: cp files*.iso /LTFS
  4. Unmount the tape: umount /LTFS
  5. Wait until the ltfs process disappears (the tape may be ejected at the end of the ltfs process because of the -o eject option)
  6. Physically remove and reinsert the tape
  7. Remount the tape: ltfs -o devname=/dev/sg4 -o eject /LTFS
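
Step 5 can be scripted instead of watched by hand. The following is only a minimal sketch: wait_for_process_exit is a hypothetical helper (not part of LTFS), it assumes the process shows up under the exact name "ltfs" in pgrep, and the timeout values are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_process_exit is a hypothetical helper, not an LTFS tool.
# $1: exact process name to wait for (assumed to be "ltfs" for the LTFS daemon)
# $2: timeout in seconds (optional; the default of 600 is arbitrary)
wait_for_process_exit() {
    local name="$1"
    local timeout="${2:-600}"
    local waited=0
    # pgrep -x succeeds while a process with exactly this name exists
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $name to exit" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Intended use (commented out so that sourcing this file has no side effects):
#   umount /LTFS
#   wait_for_process_exit ltfs 1800 && echo "ltfs has exited"
```

Polling once per second is a deliberate simplification; the point is only that nothing touches the drive until the function returns 0.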

@piste-jp piste-jp self-assigned this Dec 9, 2022
@DizzyThermal
Author

Thanks @piste-jp-ibm - I didn't realize that the umount command was non-blocking. I like the eject option passed to ltfs at mount time, though. I'm going to give this method a try and see if it helps, especially since my eject may be acting directly on the /dev/st0 device while ltfs is still handling the tape.

Will reply back with my findings, thank you.

@DizzyThermal
Author

@piste-jp-ibm

I've followed your test scenario by using the -o eject option when mounting the LTFS volume. On unmount I wait for the tape to eject itself (I am no longer using mt to eject the tape directly).

Once I reinsert the tape I try to remount using the same command: ltfs -o devname=/dev/sg4 -o eject /LTFS and I noticed it did a 2-3 minute consistency check before failing. Here's the tail end of the log when trying to remount:

11b69 LTFS11005I Mounting the volume.
11b69 LTFS11026I Performing a full medium consistency check.
11b69 LTFS17037E XML parser: failed to read from XML stream.
11b69 LTFS11194W Cannot parse index direct from medium (-5000).
11b69 LTFS11220E Cannot read index: failed to read and parse XML data (-5000).
11b69 LTFS11220E Medium check failed: extra blocks detected. Run ltfsck.
11b69 LTFS11027E Cannot mount volume: medium consistency check failed.
11b69 LTFS14013E Cannot mount the volume.
11b69 LTFS30252I Logical block protection is disabled.

So I followed the log and ran ltfsck /dev/sg4

LTFS16000I Starting ltfsck, LTFS version 2.4.5.1 (Prelim), log level 2.
LTFS16088I Launched "ltfsck --full-recovery /dev/sg4"
LTFS16089I This binary is built for Linux (x86_64).
LTFS16090I GCC version is 8.5.0 20210514 (Red Hat 8.5.0-10).
LTFS17087I Kernel version: Linux version 4.18.0-425.3.1.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC)) #1 SMP Fri Sep 30 11:45:06 EDT 2022 i386.
LTFS17089I Distribution: NAME="Red Hat Enterprise Linux".
LTFS17089I Distribution: Red Hat Enterprise Linux release 8.7 (Ootpa).
LTFS17089I Distribution: Red Hat Enterprise Linux release 8.7 (Ootpa).
LTFS17085I Plugin: Loading "sg" tape backend.
LTFS30209I Opening a device through sg-ibmtape driver (/dev/sg4).
LTFS30250I Opened the SCSI tape device 1.0.1.0 (/dev/sg4).
LTFS30207I Vendor ID is QUANTUM .
LTFS30208I Project ID is ULTRIUM-HH7      .
LTFS30214I Firmware revision is P381.
LTFS30215I Drive serial is 109xxxxxxx.
LTFS30285I The reserved buffer size of /dev/sg4 is 1048576.
LTFS17160I Setting up timeout values from RSOC.
LTFS11330I Loading cartridge.
LTFS11332I Load successful.
LTFS17157I Changing the drive setting to write-anywhere mode.
LTFS16014I Checking LTFS file system on '/dev/sg4'.
LTFS16023I LTFS volume information:.
LTFS16024I Volser (bar code) :
LTFS16025I Volume UUID     : <UUID - snipped for privacy>
LTFS16026I Format time     : 2022-12-12 08:48:31.978920106 EST.
LTFS16027I Block size      : 524288.
LTFS16028I Compression     : Enabled.
LTFS16029I Index partition : ID = a, SCSI Partition = 0.
LTFS16030I Data partition  : ID = b, SCSI Partition = 1.

LTFS11005I Mounting the volume.
LTFS11026I Performing a full medium consistency check.
LTFS17037E XML parser: failed to read from XML stream.
LTFS11194W Cannot parse index direct from medium (-5000).
LTFS11220E Cannot read index: failed to read and parse XML data (-5000).
LTFS11337I Update index-dirty flag (1) - NO_BARCODE (0x0x55bdc4460650).
LTFS11227I Preserve existing unreffered data blocks in the data partition, put the latest index at 3894092.
LTFS11230I Writing index(es) to restore consistency.
LTFS17259I Recover an index on DP from (b, 3894089).
LTFS17235I Writing index of NO_BARCODE to b *Reason: Recovery, 22 files) 109xxxxxxx.
LTFS17236I Wrote index of NO_BARCODE (Gen = 30, Part = b, Pos = 3894093, 109xxxxxxx).
LTFS11337I Update index-dirty flag (0) - NO_BARCODE (0x0x55bdc4460650).
LTFS17259I Recover an index on IP from (b, 3894093).
LTFS17235I Writing index of NO_BARCODE to a (Reason: Recovery, 22 files) 109xxxxxxx.
LTFS17236I Wrote index of NO_BARCODE (Gen = 30, Part = a, Pos = 5, 109xxxxxxx).
LTFS17227I Tape attribute: Vendor = IBM     .
LTFS17227I Tape attribute: Application Name = LTFS                            .
LTFS17227I Tape attribute: Application Version = 2.4.5.1 .
LTFS17227I Tape attribute: Medium Label = .
LTFS17227I Tape attribute: Text Localization ID = 0x81.
LTFS17227I Tape attribute: Barcode =                                .
LTFS17227I Tape attribute: Application Format Version = 2.4.0          .
LTFS17227I Tape attribute: Volume Lock Status = 0x00.
LTFS17227I Tape attribute: Media Pool name = .
LTFS11031I Volume mounted successfully. NO_BARCODE : Gen = 30 / (a, 5) -> (b, 3894093) / 109xxxxxxx.
LTFS17265I Skip writing the index because of the volume is not dirty and current self pointer points IP.
LTFS11034I Volume unmounted successfully.
LTFS16022I Volume is consistent.
LTFS30252I Logical block protection is disabled.

After ltfsck finishes I am able to mount the volume, but the last file I wrote, md5sum.txt, is missing.

I run the cp files*.iso operation overnight, followed by an md5sum of each file on the tape. My first task the following morning is to compare the source files' md5sum.txt against the md5sums of the files written to the tape, and then I write the md5sum.txt file to the tape as well. I usually unmount immediately afterwards and, while the ltfs process is likely still running, check the tape status using: mt -f /dev/st0 status. Maybe this is the cause of my missing md5sum.txt file? Would running status on the st0 device interrupt the ltfs process? Would running ltfsck with --full-recovery recover it?

It would be nice if there were a way to completely rebuild the index partition from the data on the tape. I believe the files*.iso files written to the data partition have had plenty of time to finish writing (since it's an overnight operation). It is still puzzling that md5sum.txt is missing after running ltfsck.

I'm going to try not statusing the tape until the ltfs process finishes.
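
For reference, the checksum comparison described above could be sketched with md5sum -c. This is an illustration, not the exact commands used in this workflow: verify_against_manifest is a hypothetical helper, and it assumes the manifest was generated with relative file names (for example by running md5sum files*.iso inside the source directory).

```shell
#!/usr/bin/env bash
# Sketch only: verify_against_manifest is a hypothetical helper.
# $1: directory holding the copied files (e.g. the LTFS mount point)
# $2: checksum manifest in md5sum format, listing relative file names
verify_against_manifest() {
    local dir="$1"
    local manifest
    # Resolve the manifest path before changing directory
    manifest="$(realpath "$2")" || return 1
    # md5sum -c re-hashes every listed file and compares it with the manifest;
    # --quiet suppresses the per-file "OK" lines, so only failures are printed
    ( cd "$dir" && md5sum --quiet -c "$manifest" )
}

# Intended use (commented out):
#   verify_against_manifest /LTFS /path/to/source/md5sum.txt
```

The function returns non-zero if any listed file is missing or differs, so it can gate the step that writes md5sum.txt to the tape.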

@piste-jp
Member

I think something might have happened while you were writing to the tape. Could you send me a log captured while you are writing data to the tape?

And could you provide the output of ltfsck -l -m /dev/sg4?

@DizzyThermal
Author

Having removed all uses of mt from my workflow and relying solely on ltfs with the -o eject option, I have not run into any issues remounting the tapes. I believe this issue can be closed, as the recommended workflow does not produce unmountable or corrupt tapes.

Summary of this issue: do not use mt commands while using LTFS, even just to query the tape status.
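
If a script still needs to issue commands to the drive at other times, a guard can at least refuse to run them while ltfs is alive. The following is a sketch under assumptions: run_unless_active is a hypothetical wrapper (not an LTFS or mt feature), and it assumes the process is visible to pgrep under the exact name "ltfs".

```shell
#!/usr/bin/env bash
# Sketch only: run_unless_active is a hypothetical guard.
# $1: exact process name to look for; remaining arguments: the command to run
run_unless_active() {
    local name="$1"
    shift
    if pgrep -x "$name" >/dev/null 2>&1; then
        echo "refusing to run '$*': $name is still running" >&2
        return 1
    fi
    # No matching process found, so run the command unchanged
    "$@"
}

# Example (commented out):
#   run_unless_active ltfs mt -f /dev/st0 status
```

The check and the command are not atomic, so this only reduces the chance of a collision; avoiding mt entirely, as concluded above, remains the safer workflow.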
