Improve error position detection at WRITE PERM #357

Merged: 4 commits, Aug 10, 2022

Conversation

piste-jp
Member

@piste-jp piste-jp commented Aug 8, 2022

Summary of changes

This pull request includes the following changes or fixes.

Description

  • Changes the backend interface for handling write perm
    • Previously, it returned the number of records in the drive buffer. It now returns the position of the first record not yet transferred to tape (in other words, written but still not on tape), and this position is used as the error position
  • Treats the final index on the partition as the error position even if a very small value is returned (a safeguard against removing too many extents)

Fixes #356

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have confirmed my fix is effective or that my feature works

Test Log

On Linux
3135 LTFS11031I Volume mounted successfully. NO_BARCODE : Gen = 2 / (a, 5) -> (b, 8008) / 00013B0119.
3135 LTFS14111I Initial setup completed successfully.
3135 LTFS14112I Invoke 'mount' command to check the result of final setup.
3135 LTFS14113I Specified mount point is listed if succeeded.
314f LTFS14029I Ready to receive file system requests.
3151 LTFS11337I Update index-dirty flag (1) - NO_BARCODE (0x0x1fb51f0).
3152 LTFS30205I WRITE (0x0a) returns -20301.
3152 LTFS30263I WRITE returns Cartridge Fault (-20301) /dev/sg119.
3152 LTFS30261I Taking drive dump in buffer.
3152 LTFS30253I Saving drive dump to /tmp/ltfs_00013B0119_2022_0808_143413.dmp.
3152 LTFS30262I Forcing drive dump.
3152 LTFS30253I Saving drive dump to /tmp/ltfs_00013B0119_2022_0808_143413_f.dmp.
3152 LTFS12045E Cannot write block: backend call failed (-20301). Dropping to read-only mode.
3152 LTFS11072E Cannot write blocks: failed to write to the medium (-20301).
3152 LTFS11077E Cannot write: failed to write blocks to the medium (-20301).
3152 LTFS13014W Data partition writer: failed to write data to the tape (-20301).
3152 LTFS13024I Clean up extents and append index at index partition (-20301).
3152 LTFS17292I Current position is (1, 8715), Error position is (1, 8011).
3152 LTFS13025I Truncate extents larger than position (1, 8011), block size = 524288.
3152 LTFS11334I Remove extent : errfile1 (8011, 369098752).
3152 LTFS11343I Try to write an index on the IP on NO_BARCODE because of a permanent write error on the DP..
3152 LTFS17235I Writing index of NO_BARCODE to a (Reason: Write perm, 4 files) 00013B0119.
3152 LTFS17236I Wrote index of NO_BARCODE (Gen = 3, Part = a, Pos = 8, 00013B0119).
3152 LTFS11337I Update index-dirty flag (0) - NO_BARCODE (0x0x1fb51f0).

On Mac
103 LTFS11031I Volume mounted successfully. NO_BARCODE : Gen = 2 / (a, 5) -> (b, 8008) / 00013B0119.
103 LTFS14111I Initial setup completed successfully.
103 LTFS14112I Invoke 'mount' command to check the result of final setup.
103 LTFS14113I Specified mount point is listed if succeeded.
2203 LTFS14029I Ready to receive file system requests.
3203 LTFS11337I Update index-dirty flag (1) - NO_BARCODE (0x0x7ff446705100).
1803 LTFS30808I WRITE (0x0a) returns -20301.
1803 LTFS30865I WRITE returns Cartridge Fault (-20301) 00013B0119.
1803 LTFS30863I Taking drive dump in buffer.
1803 LTFS30855I Saving drive dump to /tmp/ltfs_00013B0119_2022_0808_145738.dmp.
1803 LTFS30864I Forcing drive dump.
1803 LTFS30855I Saving drive dump to /tmp/ltfs_00013B0119_2022_0808_145738_f.dmp.
1803 LTFS12045E Cannot write block: backend call failed (-20301). Dropping to read-only mode.
1803 LTFS11072E Cannot write blocks: failed to write to the medium (-20301).
1803 LTFS11077E Cannot write: failed to write blocks to the medium (-20301).
1803 LTFS13014W Data partition writer: failed to write data to the tape (-20301).
1803 LTFS13024I Clean up extents and append index at index partition (-20301).
3103 LTFS30803I Failed to execute CDB, The opcode = 00 (-536870187).
3103 LTFS30808I TEST_UNIT_READY (0x00) returns -21700.
3103 LTFS12029E Device is not ready (-21700).
1803 LTFS17292I Current position is (1, 9289), Error position is (1, 8020).
1803 LTFS13025I Truncate extents larger than position (1, 8020), block size = 524288.
1803 LTFS11334I Remove extent : errfile1 (8011, 670040064).
1803 LTFS11343I Try to write an index on the IP on NO_BARCODE because of a permanent write error on the DP..
1903 LTFS30803I Failed to execute CDB, The opcode = 00 (-536870187).
1903 LTFS30808I TEST_UNIT_READY (0x00) returns -21700.
1903 LTFS12029E Device is not ready (-21700).
1803 LTFS17235I Writing index of NO_BARCODE to a (Reason: Write perm, 4 files) 00013B0119.
1803 LTFS17236I Wrote index of NO_BARCODE (Gen = 3, Part = a, Pos = 8, 00013B0119).
1803 LTFS11337I Update index-dirty flag (0) - NO_BARCODE (0x0x7ff446705100).

@piste-jp piste-jp changed the title from "Small wp fix" to "Improve error position detection at WRITE PERM" on Aug 8, 2022
@piste-jp piste-jp self-assigned this Aug 10, 2022
@piste-jp
Member Author

Ready for review

@piste-jp piste-jp marked this pull request as ready for review August 10, 2022 01:27
@piste-jp piste-jp merged commit 486ce60 into LinearTapeFileSystem:v2.4-stable Aug 10, 2022
@piste-jp piste-jp deleted the small-wp-fix branch August 10, 2022 01:28
@piste-jp
Member Author

Cherry picked to the master.

piste-jp pushed a commit that referenced this pull request Aug 10, 2022
* Correct miscalculation of last block on tape
* Consider the final index on the partition as error position even if very small number is returned
* Never adjust the force_writeperm threshold for better debug
* Stop checking the I/F of the ltfs-backends repository
@piste-jp piste-jp mentioned this pull request Aug 15, 2022

@MayraPD MayraPD left a comment

The code looks good to me with the additional validations. There are a couple I do not fully understand, so I have left questions and comments for Abe-san, who knows the code (I am still learning it).

if (last_index_pos > err_pos.block) {
    ltfsmsg(LTFS_INFO, 13027I, (int)err_pos.partition,
            (unsigned long long)err_pos.block, (unsigned long long)last_index_pos);
    err_pos.block = last_index_pos + 1;

@piste-jp-ibm So if last_index_pos is "the end of the partition", you are moving err_pos to the end of the partition?

Member Author

It is the first block of the index. That is enough, because no extent can contain that block.

So LTFS sets the last position on tape to last_index_pos + 1 and cleans up the extents. As a result, all extents after the last index will be removed.
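
A minimal sketch of this safeguard, for illustration only. `struct pos_sketch` and `clamp_error_position` are hypothetical names, not LTFS identifiers; only the comparison and the `last_index_pos + 1` assignment are taken from the hunk above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a tape position (partition, block). */
struct pos_sketch {
    uint32_t partition;
    uint64_t block;
};

/*
 * If the reported error block lies before the first block of the last
 * index, move it to the block right after that first index block, so that
 * the extents located after the last index are the ones cleaned up.
 */
struct pos_sketch clamp_error_position(struct pos_sketch err_pos,
                                       uint64_t last_index_pos)
{
    if (last_index_pos > err_pos.block)
        err_pos.block = last_index_pos + 1;
    return err_pos;
}
```

With a last index at block 8008, a reported error block of 5 becomes 8009, while a reported block of 8011 is left unchanged.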

memcpy(pos, &dev->position, sizeof(struct tc_position));

ltfsmsg(LTFS_DEBUG, 11335D, (int)pos->block, block);
pos->block -= block;

@piste-jp-ibm At first glance, I would have taken this line to be the culprit for overwriting the previous block.

Member Author

@piste-jp piste-jp Aug 16, 2022

Yes, this line is one of the root causes. The logic itself is correct, though; the problem is the idea of how to fetch the error position.

We thought the only way to get the error position was to subtract the number of records in the buffer from the current position. But this does not work when a WRITE PERM happens just after a locate (for append), because the drive's buffer would still hold records that were read by the locate.

Actually, we realized that the error position itself is provided by the READ_POSITION command.
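
The difference between the two approaches can be sketched with made-up numbers. Everything here is hypothetical (the struct, the function names, and the values); it does not reflect the actual backend interface or the layout of READ_POSITION data.

```c
#include <stdint.h>

/* Hypothetical snapshot of what the drive reports after a failed write. */
struct drive_report_sketch {
    uint64_t current_block;             /* logical position seen by the host */
    uint64_t records_in_buffer;         /* all records held in the drive buffer */
    uint64_t first_untransferred_block; /* first record not yet on tape */
};

/* Old idea: current position minus the number of buffered records. */
uint64_t error_block_by_subtraction(const struct drive_report_sketch *r)
{
    if (r->records_in_buffer > r->current_block)
        return 0; /* avoid unsigned underflow in this sketch */
    return r->current_block - r->records_in_buffer;
}

/* New idea: use the position the drive reports directly. */
uint64_t error_block_reported(const struct drive_report_sketch *r)
{
    return r->first_untransferred_block;
}
```

Suppose the host is at block 1000 with 10 unwritten records pending, and the buffer also still holds 3 records read during the preceding locate. The buffer count is then 13, so subtraction yields 987, three blocks before the first untransferred record at 990.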

if (ext->start.block && ext->bytecount) {
    extent_last.partition = ltfs_part_id2num(ext->start.partition, vol);
    /* Calculate the last block of this extent */
    extent_last.block = ext->start.block + (ext->bytecount / blocksize);

@MayraPD MayraPD Aug 15, 2022

@piste-jp-ibm So LTFS always uses fixed-length blocks, never variable-length, right?

Member Author

No, an extent consists of one or more 512 KB fixed-size blocks and zero or one variable-length block.
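
As a rough illustration of that layout, the number of blocks an extent occupies can be computed as below. This is a sketch, not the LTFS code: the function names are hypothetical, and it assumes the extent data starts at the beginning of its first block (any byte offset into the first block is ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Full fixed-size blocks plus one short trailing block if bytes remain. */
uint64_t extent_block_count(uint64_t bytecount, uint64_t blocksize)
{
    assert(blocksize > 0);
    uint64_t full = bytecount / blocksize;
    uint64_t tail = (bytecount % blocksize) ? 1 : 0;
    return full + tail;
}

/* Last block occupied by the extent; an empty extent yields start_block. */
uint64_t extent_last_block(uint64_t start_block, uint64_t bytecount,
                           uint64_t blocksize)
{
    uint64_t n = extent_block_count(bytecount, blocksize);
    return n ? start_block + n - 1 : start_block;
}
```

For the extent `errfile1 (8011, 369098752)` in the Linux log, with a block size of 524288, this gives 704 blocks (8011 through 8714), which is consistent with the logged current position of (1, 8715).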

Successfully merging this pull request may close these issues.

There is a small chance to mis-detect the position when a permanent write error happens