Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to get AWS credentials from the Default Credential Provider #865

Closed
wants to merge 2 commits into from

Conversation

bbreton
Copy link

@bbreton bbreton commented Nov 11, 2020

This adds the ability to get AWS credentials for accessing s3 from any of the supported methods on the AWS credentials chain.

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-default

This makes it so Cognito no longer needs to be setup should the organization using AWS not wish to and/or for minimal uses Cognito is overkill. It also will allow for the instance profile credentials (e.g. you start up an EC2 instance and that instance has the credentials baked into it) to be used. Overall it increases flexibility for uses that are not organizations that need cognito and/or want to run IGV with s3 access on AWS EC2 instances.

This implementation is a bit of a hack using the OAUTH code/path even though it isn't OAUTH.

My Open Questions:

Are you interested in adding this functionality?

What changes would you like to see to merge this and/or is the current method acceptable?

How long should I sign the S3 urls for? Right now it is set for the max (7 days). This can be set to the default of 15 minutes or any other time including adding a config parameter for the user to set it.

What text should I put for the settings checkbox and is the current location good? I don't love my current text there, but trying to convey exactly what it is without getting long is difficult.

Thank You

@bbreton bbreton marked this pull request as draft November 11, 2020 17:39
@jrobinso
Copy link
Contributor

@bbreton Thanks for the PR. Yes I am interested, but it will be a few weeks before I can turn my attention to this. I also have some outstanding S3 issues to contend with, such as #856, which may or may not impact this PR.

I would be interested to get input from @brainstorm

@brainstorm
Copy link
Contributor

brainstorm commented Nov 12, 2020

Thanks @jrobinso for the heads up, appreciate that.

I feared that this type of PR would come along sooner or later since I got this type of feedback from some users and here it is... so I think it's worth clarifying and re-iterating the need for proper security and good practices in IGV desktop when working in conjunction with AWS services.

I get what @bbreton is trying to do here and it sounds good in principle, but it's bad (dangerous) in practice, let me explain why:

  1. Cognito is a free (for less than 50.000 users) AWS service that provides temporary, time-gated (STS) credentials for non-IAM users (uses Roles instead). One might think it's overkill but its use is fairly common practice and provides pretty good security guarantees. OTOH, users are known to heavily misuse and underestimate the security risks behind IAM credentials, which brings me to the next point.
  2. IAM users should not be used like that. Please carefully read the Stage 5 of AWS Security Maturity Roadmap by Scott Piper (@0xdabbad00).
  3. Statements like "How long should I sign the S3 urls for? Right now it is set for the max (7 days)" defeat the purpose of an adequate time-gated, secure access to sensitive data, widening the potential damage if a data breach occurs. Also, setting 7 days as the maximum ignores the fact that only IAM users can provide that. Other mechanisms like "assume role"-like creds are limited to 12h or less, so that hardcoded max limit is already misleading (possibly broken?) for other secure use cases.
  4. "Overall it increases flexibility for uses that are not organizations that need cognito and/or want to run IGV with s3 access on AWS EC2 instances." ... it does increase flexibility, but who's willing to support that flexibility while reasonably guaranteeing its security on most third party deployments? Certainly it will not be me if this gets merged as it is now.

Please don't get me wrong, I understand that supporting Instance Profiles and the Credential Provider Chain is a very convenient way to have IGV desktop running on an EC2 instance, avoiding the need to install clients on user endpoints (a pain I'm well aware of myself)... OTOH, it would defeat the purpose of having a nimble pay-for-what-you-use S3-only setup, as it is now... what is your usecase here? Copying data to an EC2 instance, launching IGV there and connecting to that instance? Depending on the implementation, this could result in quite poor UX as a result, not to mention the administrative overhead of properly controlling those instances, but I'm happy to be convinced otherwise.

Zooming out from my security concerns and getting some perspective on this topic, @bbreton's proposal and potential use cases involve running EC2 instances (if I understood correctly). That would require users to connect via SSH-X11-forwarding/RDP/VDI/NICE-DCV... At this point I'd really step back and think: isn't this approach overkill when one just wants to use IGV DESKTOP... specially when there's igv.js which is 0-heavy server side?

Full disclosure, I'm not claiming our (@umccr) implementation is perfect by any stretch of the imagination. For instance, I'm still uneasy with the current JWT validation scheme implemented in IGV's desktop codebase, which I raised before on the original PR (which would need a proper and professional security audit). I don't want to come across as the typical IT security alarmist, but please be aware that IGV is frequently used to access clinical genomics (patient) data so it should deserve higher scrutiny than most (rushed, security-as-an-afterthought) scientific software, IMO.

Please, please, @bbreton, don't take this as personal criticism/grilling. I'm genuinely flattered that some code we wrote "out there on the internet" gets used by great people, so don't be discouraged by this analysis.

@bbreton
Copy link
Author

bbreton commented Nov 12, 2020

@brainstorm

First off would you prefer I pull this request while the security parts are considered? I can't really put the genie back in the bottle, but deleting this probably avoid it hitting google and becoming too widespread. I added my email to github so if you want to contact me directly feel free.

The use case I specifically have is:

I have scripted IGV to take a large amount of IGV snapshots that I distribute out using aws batch to snapshot 1000+ locations at once. These snapshots can allow the rapid review of many locations such as is needed when trying to build a true positive database for whatever you are trying to find.

If anyone else has any ideas that would let a user review a diverse set of genomic locations on bams stored on s3 with less than a second load time per location I would be open to it. My current approach is kind of a pain, but for user time spent waiting for things to load I haven't found a better way.

I ended up doing a docker container that runs IGV combined with proxying my s3 bucket to do this on AWS batch, but that was a lot of effort and not very nice overall which is why I was looking for a way to give the EC2 instance/container permissions.

One option that would at least support that specific use case would be to limit this feature to InstanceProfileCredentialsProvider and ContainerCredentialsProvider.

I agree with your concerns about the signed url. My setting it to 7 days initially was somewhat for my environment/situation where it would be difficult to impossible for a signed url to leak. I wasn't certain what the best trade-off for functionality vs. security would be which is why I pointed it out as a question I had.

IAM users should not be used like that. Please carefully read the Stage 5 of AWS Security Maturity Roadmap by Scott Piper (@0xdabbad00).
I believe that they would consider AWS SSO with assume role and using it to generate a temporary id/key as compatible with that roadmap. So yeah this pull request enables insecure things, but it also opens up other options that for a given organization may be easier to secure depending on existing infrastructure.

For the more general concerns about security some of this gets into more security philosophy which I suspect my thoughts are somewhat driven by my personal experiences on small teams/companies with minimal IT organizational support and all the users being relatively sophisticated. I am fully aware that IGV is frequently used for clinical genomics data and have dealt with PHI compliance including HIPAA.

My thoughts are:

The AWS credential chain can be used securely with proper setup. The exact combination that best suits an org's security, existing infrastructure, regulatory and other requirements will vary. That said an org can always do things like proxy s3 like I did which would allow them to use whatever auth they wanted so IGV supporting other methods isn't absolutely necessary.

I agree Cognito is probably the best way that a user can't easily drastically mess up security, but as above it isn't the only method to secure access.

If someone who doesn't understand why using the basic AWS IAM credentials at the very least need to be carefully secured and given minimal permissions and ideally transition to something with SSO or cognito especially if dealing with PHI has the ability to generate/give them out I feel that environment is already insecure and I'm not sure IGV not supporting one way of being insecure is going to help.

The simplest alternative that the user currently has is to make the files or entire bucket public which is really bad for PHI though at least it wouldn't potentially compromise the whole account like overly permissive IAM or root credentials would if leaked.

@jrobinso
Copy link
Contributor

@bbreton This is off topic, nothing to do with security, but have you checked out igv-reports (github.com/igvteam/igv-reports)? It creates standalone html pages for variant review with data from the bam, vcf, and whatever else embedded right in the page and viewed with igv.js. It is the ultimate in security because nothing is on the network, its literally a single html file. I think Brian Haas uses this with 1000s of sites, IIRC 10s of thousands, I don't really know what the practical upper limit is but there are workarounds when we meet it. My vision for that was to replace the static screenshot use case, which is a common one. I haven't had time to promote it really, but please check it out. I would be happy to setup a video meeting to demonstrate it.

RE security and IGV in general, users and organizations do all sorts of things with IGV and many other programs, we aren't the secops police. My goal would be to give organizations the tools to make IGV secure but its up to them to implement what is secure for their organization. The most common scheme for accessing S3 resources with IGV, by far, is using signed URLs generated external to IGV. Many organization with their own secops team feel this is secure. We on the IGV team are not requiring incognito, recommending icognito, nor any other particular scheme, we simply support it through UMCCR's contribution.

@brainstorm s critique is more about best practices, as he sees them, that's not a debate I have expertise on or will engage in. If there is a security hole in this PR (I haven't even looked at it yet) that would need to be addressed, but if it merely enables an insecure practice if misused that is another matter. We also support basic auth, which in general is a very bad authentication scheme but good enough for some organizations and usages. For example locally here labs have used basic auth just to keep their data hidden from other labs until they publish.

I don't know where this PR falls and won't have time to study it until probably post Thanksgiving, but no you don't need to pull it.

@bbreton
Copy link
Author

bbreton commented Nov 12, 2020

@bbreton This is off topic, nothing to do with security, but have you checked out igv-reports (github.com/igvteam/igv-reports)? It creates standalone html pages for variant review with data from the bam, vcf, and whatever else embedded right in the page and viewed with igv.js. It is the ultimate in security because nothing is on the network, its literally a single html file. I think Brian Haas uses this with 1000s of sites, IIRC 10s of thousands, I don't really know what the practical upper limit is but there are workarounds when we meet it. My vision for that was to replace the static screenshot use case, which is a common one. I haven't had time to promote it really, but please check it out. I would be happy to setup a video meeting to demonstrate it.

Thanks. I will take a look at it. It certainly sounds like it will meet the requirements and would be easier than the scripting/snapshotting.

That said I have gotten some feedback (including my own impressions) that many prefer the desktop app view, but much of that is just what users are used to and also may not apply to bulk review vs. careful examination of a few sites which are very different use cases.

@jrobinso
Copy link
Contributor

@bbreton To be clear I was not suggesting igv-web / igv.js could replace the desktop app, at least not now, I was suggesting it as a possible replacement for browsing static screenshots.

@bbreton
Copy link
Author

bbreton commented Nov 18, 2020

My current future plans for this pull are:

  1. Add the capability to use AWS assume-role along with the ability to enter/use a MFA token if the assume-role requires it.
  2. Add the capability to specify a credentials profile when using the aws credentials.
  3. Figure out a better way to do the settings given I'm up to 4 settings needed. Perhaps a checkbox similar to the google settings with a separate settings config dialog makes sense here.
  4. Give my code a more overall refactor to make it cleaner and a bit more separated from the cognito/OAUTH code.

1 and 2 are useful to support better AWS security practices and not having them could make people default to the worst security practices (using a single permanent AWS key/secret with too many permissions).

Overall AWS credentials have a lot of options. I can't support all possible forms of credentials and configurations. Even figuring out what all of them are would be difficult and the amount of code and testing needed would get crazy especially given some of the methods are rarely used. If someone requests something not supported this infrastructure should make it easy to add, but I think this along with the existing OAUTH code will cover all the common uses.

@brainstorm
Copy link
Contributor

For snapshotting, I'd either use https://github.com/brentp/jigv or https://github.com/igvteam/igv-reports, TBH... the former seems to be a bit more in line with what you were looking for, @bbreton.

@jrobinso
Copy link
Contributor

For static snaphsots the former, if you want to interact with the data the latter.

@markotitel
Copy link

does it make sense if IGV has access key / secret access key fields?
User can get the keys from AWS admin and put it in the app.
The keys can have read only access to the bucket.
And voilla.
Java sdk should have getObject() on S3.

@jrobinso
Copy link
Contributor

@markotitel I don't think so, @brainstrom no doubt would have an opinion. @bbreton I'm going to close this simply because its stale and couldn't be merged now in any event.

@jrobinso jrobinso closed this Jan 21, 2022
@brainstorm
Copy link
Contributor

@markotitel Please use the already officially supported AWS S3 via Cognito as mentioned in the README. Happy to assist as I did recently for a third party user: https://www.biostars.org/p/9506536

I understand that this PR seems "simple" but from a systems/security perspective it can become a nightmare fast, I'm glad it's finally closed and I hope it doesn't resurface as I noted in #1084.

@jrobinso
Copy link
Contributor

To be clear to everyone the IGV team is not taking a position on the discussion above between @bbreton and @brainstorm , the issue here is closed due to staleness.

@jrobinso
Copy link
Contributor

jrobinso commented Apr 27, 2022

@bbreton I've implemented a solution similar to this PR in #1139 and will be merging it in the next release shortly. Apologies this took so long. Although implemented independently, it is close to this PR. The main differences I see is I do not use IGV preferences, if Cognito is not supported the default credentials chain is tried. Also, the signing timeout is not set, but this should not affect anyone as the signed URLs are not stored in sessions or otherwise exposed, they are fetched as needed from the s3 url.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants