Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First attempt to implement reading via SSH/SCP #44

Closed
wants to merge 5 commits into from

Conversation

ziky90
Copy link
Contributor

@ziky90 ziky90 commented Dec 11, 2015

Goal of this PR is to add the functionality of reading files via SSH/SCP from remote machines part of what was discussed in the issue #36 .

PLEASE DO NOT MERGE YET:
TODO:

  • implement tests
  • update README.rst

What should be discussed:

  • authentication (should we care of authentication or can we estimate that everybody has copied SSH keys to the machine before using smart_open at least for the beginning)

@ziky90
Copy link
Contributor Author

ziky90 commented Dec 16, 2015

There seems to be some bug in travis, based on the log there is problem with packages installation via pip.

Second question is that if we can assume for reading from ssh that everybody has his/her public key in authorized_keys?
For my purposes it's sufficient for now, but I can imagine that someone might need some config file with username and password or setting these in some comparable way to s3 credentials.

@piskvorky
Copy link
Owner

I'd say, implement what you need, cleanly. If someone needs more functionality, they can open a new PR to justify that extra complexity.

I don't understand the question with authorized_keys.

It may also make sense to pass additional options to SSH, so that it doesn't trigger interactive prompts ("known host checking" etc). But it's a slippery slope, I wouldn't want smart_open to end up taking responsibility for SSH setup and management.

CC @gojomo opinion?

@piskvorky
Copy link
Owner

By the way @ziky90 , @tmylk is working on a new smart_open release this week -- if you want this functionality in there, now is the best time.

@ziky90
Copy link
Contributor Author

ziky90 commented Dec 16, 2015

I've added that smart_open will not check for known hosts.

With authorized_keys I mean that this PR currently does not support any possibility to attach keys via -i parameter.

And again output from unit tests on Travis is kind of weird. Is there some problem with travis?


def __iter__(self):
ssh = subprocess.Popen("ssh -o StrictHostKeyChecking=no " + self.parsed_uri.netloc + " 'cat " + self.parsed_uri.uri_path + "'",
stdout=subprocess.PIPE, shell=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choices on this line introduce risks that, while perhaps justified in many/most usage situations, should be clearly documented and perhaps require a user's explicit opt-in. (1) Users should know (perhaps by choosing a non-default option) that there's no host-key-checking; (2) composing a locally- and remotely- shelled execution string from the URI parts could introduce arbitrary execution risks at both endpoints, so extra care must be taken where such paths originate.

@piskvorky
Copy link
Owner

@tmylk is working on fixing the travis tests.

Regarding SSH options -- let's make that a configurable list, via an object attribute (e.g. obj.ssh_opts = ['-o', '"StrictHostKey=no"']), with some sensible defaults.

Regarding the documentation, I agree with @gojomo that we should feature prominently the fact that smart_open is only meant for running on known, controllable inputs, not on unsanitized URIs.

Also I'm not sure the cat escaping is right -- did you test this on remote filenames with spaces or special characters in them?

self.parsed_uri = parsed_uri

def __iter__(self):
ssh = subprocess.Popen("ssh -o StrictHostKeyChecking=no " + self.parsed_uri.netloc + " 'cat " + self.parsed_uri.uri_path + "'",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better (more robust) to pass a list of arguments to Popen ctr, rather than a single hand-assembled string.

@val314159
Copy link

Here's a pure-python (no shell) solution using Parimiko (the low level lib used by fabric)

It also has the side-benefit of maintaining connections so you only have to suffer through setup time once per connection :)

#58

@ziky90
Copy link
Contributor Author

ziky90 commented Feb 1, 2016

@val314159 Solution using Parimiko looks better.
So I'll close this PR.

@ziky90 ziky90 closed this Feb 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants