Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(services/sftp): support copy and rename for sftp #2263

Merged
merged 5 commits into from
May 17, 2023
Merged

feat(services/sftp): support copy and rename for sftp #2263

merged 5 commits into from
May 17, 2023

Conversation

silver-ymz
Copy link
Member

No description provided.

Signed-off-by: silver-ymz <yinmingzhuo@gmail.com>
core/src/services/sftp/backend.rs Outdated Show resolved Hide resolved
core/src/services/sftp/backend.rs Outdated Show resolved Hide resolved
Signed-off-by: silver-ymz <yinmingzhuo@gmail.com>
@suyanhanx
Copy link
Member

Seems that it is possible to rename to a different path if it is a POSIX_RENAME operation.

Is there any way to detect its capability? 🤔

@silver-ymz
Copy link
Member Author

silver-ymz commented May 15, 2023

Sorry for misunderstanding the document. We can rename to a different path as long as ensuring the destionation dir exists. The unsupported operation is rename to the dir of different mount point. We can pass the rename tests now.

But copy operation only supports when the remote server has copy-data extension. Should we send a test request when building sftp backend to detect the capability? This needs to block an async operation in build function.

@suyanhanx
Copy link
Member

Should we send a test request when building sftp backend to detect the capability? This needs to block an async operation in build function.

I think it isn't a big problem. cc @Xuanwo

@silver-ymz
Copy link
Member Author

There is a new bug about range_read.

In https://github.com/apache/incubator-opendal/actions/runs/4976970726/jobs/8905529556, test write_test_fuzz_range_reader failed.

thread 'services_sftp::write_test_fuzz_range_reader' panicked at 'assertion failed: `(left == right)`
  left: `546602`,
 right: `543843`: check seek failed: output pos is different with expected pos, actions: [Read(84819), Seek(End(-519436)), Seek(Start(346471)), Next, Read(27351), Seek(Start(349302)), Next, Seek(End(-339634)), Next, Read(180661), Seek(End(-561960)), Next, Next, Seek(End(-768061)), Next, Next, Seek(End(-634305)), Next, Next, Read(1337), Seek(End(-328798))]', 

I'm debuging it now.

@silver-ymz
Copy link
Member Author

After some research, I think we may need to give up the sftp copy feature. There are 3 reasons:

  1. Sftp copy-file extension is supported by few SFTP servers only. For example by ProFTPD mod_sftp and Bitvise WinSSHD. The most widespread SFTP server, the OpenSSH supports related copy-data only in very recent version 9.0.
  2. There are many ways to copy files except for sftp, such as direct cp in ssh, scp. There are very few users who use sftp to copy files.
  3. Detecting the capability in build function costs expensively. We need to spawn a new process to initiate a complete connection, and find a file which has read-write permission to test copying. Moreover, all of these things will use block_on function, which makes the interaction with tokio becomes more complex.

@suyanhanx
Copy link
Member

There is a new bug about range_read.

In https://github.com/apache/incubator-opendal/actions/runs/4976970726/jobs/8905529556, test write_test_fuzz_range_reader failed.

thread 'services_sftp::write_test_fuzz_range_reader' panicked at 'assertion failed: `(left == right)`
  left: `546602`,
 right: `543843`: check seek failed: output pos is different with expected pos, actions: [Read(84819), Seek(End(-519436)), Seek(Start(346471)), Next, Read(27351), Seek(Start(349302)), Next, Seek(End(-339634)), Next, Read(180661), Seek(End(-561960)), Next, Next, Seek(End(-768061)), Next, Next, Seek(End(-634305)), Next, Next, Read(1337), Seek(End(-328798))]', 

I'm debuging it now.

Is the reason you removed the content-length check?

Signed-off-by: silver-ymz <yinmingzhuo@gmail.com>
@silver-ymz
Copy link
Member Author

silver-ymz commented May 15, 2023

Is the reason you removed the content-length check?

I think it's because I set read_with_range to true, and this feature has some problems.
It can't reproduce stably, and all tests including read_with_range will fail randomly.
So far, I can only reproduce it on atmos/sftp container.
And according to my tests, it always failed after seek operation.

@silver-ymz
Copy link
Member Author

😓At this time, it passed all tests. Maybe we can open an issue first, and fix it in another PR.

@silver-ymz
Copy link
Member Author

silver-ymz commented May 16, 2023

The problem exists in TokioCompatFile of openssh-sftp-client. But I havn't found specific reason.

The buffer's length in TokioCompatFile of openssh-sftp-client is 4096 as default. If we seek the position far than Current(4096), it will only seek to Current(4096). Should we change the impl of AsyncSeek in SftpReaderInner?

@suyanhanx
Copy link
Member

I find the reason. The buffer's length in TokioCompatFile of openssh-sftp-client is 4096 as default. If we seek the position far than Current(4096), it will only seek to Current(4096). Should we change the impl of AsyncSeek in SftpReaderInner?

We can open a new issue. You may revert some relevate code and we could do this content-length job in another PR.

@Xuanwo
Copy link
Member

Xuanwo commented May 16, 2023

I think it isn't a big problem. cc @Xuanwo

No, we can't call async functions in build. And the ability of sftp server should be told by users. I prefer to return not supported instead.

core/src/services/sftp/backend.rs Outdated Show resolved Hide resolved
core/src/services/sftp/backend.rs Outdated Show resolved Hide resolved
core/src/services/sftp/backend.rs Outdated Show resolved Hide resolved
Signed-off-by: silver-ymz <yinmingzhuo@gmail.com>
Signed-off-by: silver-ymz <yinmingzhuo@gmail.com>
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@NobodyXu
Copy link
Contributor

NobodyXu commented May 16, 2023

Detecting the capability in build function costs expensively. We need to spawn a new process to initiate a complete connection, and find a file which has read-write permission to test copying. Moreover, all of these things will use block_on function, which makes the interaction with tokio becomes more complex.

@silver-ymz Auxiliary contains all supported extensions, so you could expose a new function in openssh-sftp-client easily.

@silver-ymz
Copy link
Member Author

silver-ymz commented May 16, 2023

Auxiliary contains all supported extensios, so you could expose a new function in openssh-sftp-client easily.

I'll try it in openssh-sftp-client. But we can't still detect it in build function. build function is a sync function, we haven't build the connection at this time. We had better not to block async operation in it.

@Xuanwo
Copy link
Member

Xuanwo commented May 16, 2023

But we can't still detect it in build function.

May I ask why we need to detect this? Can we return NotSuppported error while copy failed for server doesn't support it?

@Xuanwo
Copy link
Member

Xuanwo commented May 16, 2023

After some research, I think we may need to give up the sftp copy feature.

They are different things: To make copy work, we need both client and server supports this operation. So OpenDAL should:

  • Support copy if server supports.
  • Return a NotSupported errro if server doesn't.

@silver-ymz
Copy link
Member Author

silver-ymz commented May 16, 2023

May I ask why we need to detect this? Can we return NotSuppported error while copy failed for server doesn't support it?

This is the way to deal with it in the current implementation. So should we add copy to Capability in info function? If so, the tests will fail, because the sftp server in CI doesn't support copy.

@Xuanwo
Copy link
Member

Xuanwo commented May 16, 2023

If so, the tests will fail, because the sftp server in CI doesn't support copy.

We can add a new config called enable_copy: users should decide this.

@Xuanwo
Copy link
Member

Xuanwo commented May 17, 2023

So should we add copy to Capability in info function?

Hi, @silver-ymz, would like to add this in next PR? This PR is good enough to get merged.

@silver-ymz
Copy link
Member Author

would like to add this in next PR? This PR is good enough to get merged.

Ok. I think it's fine.

@Xuanwo Xuanwo merged commit 9cce4d3 into apache:main May 17, 2023
@silver-ymz silver-ymz deleted the feat/sftp-copy-rename branch May 17, 2023 08:04
@Xuanwo Xuanwo mentioned this pull request May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants