Add functionality to get AWS credentials from the Default Credential Provider #865
Conversation
@bbreton Thanks for the PR. Yes, I am interested, but it will be a few weeks before I can turn my attention to this. I also have some outstanding S3 issues to contend with, such as #856, which may or may not impact this PR. I would be interested to get input from @brainstorm.
Thanks @jrobinso for the heads up, appreciate that. I feared that this type of PR would come along sooner or later, since I got this type of feedback from some users, and here it is... so I think it's worth clarifying and reiterating the need for proper security and good practices in IGV desktop when working in conjunction with AWS services. I get what @bbreton is trying to do here, and it sounds good in principle, but it's bad (dangerous) in practice. Let me explain why:
Please don't get me wrong, I understand that supporting Instance Profiles and the Credential Provider Chain is a very convenient way to have IGV desktop running on an EC2 instance, avoiding the need to install clients on user endpoints (a pain I'm well aware of myself)... OTOH, it would defeat the purpose of having a nimble, pay-for-what-you-use, S3-only setup, as it is now. What is your use case here? Copying data to an EC2 instance, launching IGV there, and connecting to that instance? Depending on the implementation, this could result in quite poor UX, not to mention the administrative overhead of properly controlling those instances, but I'm happy to be convinced otherwise.

Zooming out from my security concerns and getting some perspective on this topic, @bbreton's proposal and potential use cases involve running EC2 instances (if I understood correctly). That would require users to connect via SSH X11 forwarding, RDP, VDI, or NICE DCV... At this point I'd really step back and think: isn't this approach overkill when one just wants to use IGV DESKTOP, especially when there's igv.js, which requires essentially nothing server-side?

Full disclosure, I'm not claiming our (@umccr) implementation is perfect by any stretch of the imagination. For instance, I'm still uneasy with the current JWT validation scheme implemented in IGV's desktop codebase, which I raised before on the original PR (it would need a proper and professional security audit). I don't want to come across as the typical IT security alarmist, but please be aware that IGV is frequently used to access clinical genomics (patient) data, so it deserves higher scrutiny than most (rushed, security-as-an-afterthought) scientific software, IMO.

Please, please, @bbreton, don't take this as personal criticism/grilling. I'm genuinely flattered that some code we wrote "out there on the internet" gets used by great people, so don't be discouraged by this analysis.
First off, would you prefer I pull this request while the security aspects are considered? I can't really put the genie back in the bottle, but deleting this probably avoids it hitting Google and becoming too widespread. I added my email to GitHub, so if you want to contact me directly, feel free.

The use case I specifically have: I have scripted IGV to take a large number of snapshots, distributed out using AWS Batch, to snapshot 1000+ locations at once. These snapshots allow rapid review of many locations, such as is needed when building a true-positive database for whatever you are trying to find. If anyone has other ideas that would let a user review a diverse set of genomic locations on BAMs stored on S3 with less than a second of load time per location, I would be open to them. My current approach is kind of a pain, but for user time spent waiting for things to load, I haven't found a better way. I ended up building a Docker container that runs IGV combined with a proxy for my S3 bucket to do this on AWS Batch, but that took a lot of effort and isn't very nice overall, which is why I was looking for a way to give the EC2 instance/container permissions directly. One option that would at least support that specific use case would be to limit this feature to InstanceProfileCredentialsProvider and ContainerCredentialsProvider.

I agree with your concerns about the signed URL. My setting it to 7 days initially was partly for my environment/situation, where it would be difficult to impossible for a signed URL to leak. I wasn't certain what the best trade-off between functionality and security would be, which is why I raised it as an open question.
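The bulk-snapshot workflow described above is typically driven by IGV's batch-script interface (a text file of commands passed to IGV at startup). A minimal sketch of generating such a script, assuming placeholder values for the genome, BAM URL, and loci (any real pipeline would shard the locus list across batch workers):

```python
# Sketch: generate an IGV batch script that snapshots many loci in one run.
# The genome id, BAM URL, and loci below are illustrative placeholders.

def make_igv_batch(bam_url, loci, genome="hg38", snapshot_dir="snapshots"):
    """Return the text of an IGV batch script covering all loci."""
    lines = [
        "new",
        f"genome {genome}",
        f"load {bam_url}",
        f"snapshotDirectory {snapshot_dir}",
    ]
    for locus in loci:
        lines.append(f"goto {locus}")
        # Derive a file name from the locus, e.g. chr1_10000-10500.png
        lines.append(f"snapshot {locus.replace(':', '_')}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"

script = make_igv_batch("s3://my-bucket/sample.bam",
                        ["chr1:10000-10500", "chr2:555-900"])
print(script)
```

Splitting the locus list into one script per batch worker keeps each container's IGV run short, which fits the 1000+-location fan-out described above.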
As for the more general security concerns, some of this gets into security philosophy, and I suspect my thoughts are shaped by my experience on small teams/companies with minimal IT organizational support and relatively sophisticated users. I am fully aware that IGV is frequently used for clinical genomics data and have dealt with PHI compliance, including HIPAA. My thoughts are:

The AWS credential chain can be used securely with proper setup. The exact combination that best suits an org's security, existing infrastructure, regulatory, and other requirements will vary. That said, an org can always proxy S3 as I did, which would let it use whatever auth it wanted, so IGV supporting other methods isn't absolutely necessary. I agree Cognito is probably the method a user can least easily mess up, but as above, it isn't the only way to secure access.

If someone who doesn't understand that basic AWS IAM credentials need, at the very least, to be carefully secured and given minimal permissions (and ideally replaced with SSO or Cognito, especially when dealing with PHI) has the ability to generate and hand them out, then that environment is already insecure, and I'm not sure IGV declining to support one particular way of being insecure is going to help. The simplest alternative the user currently has is to make the files or the entire bucket public, which is really bad for PHI, though at least it wouldn't potentially compromise the whole account the way overly permissive IAM or root credentials would if leaked.
@bbreton This is off topic, nothing to do with security, but have you checked out igv-reports (github.com/igvteam/igv-reports)? It creates standalone HTML pages for variant review, with data from the BAM, VCF, and whatever else embedded right in the page and viewed with igv.js. It is the ultimate in security because nothing is on the network; it's literally a single HTML file. I think Brian Haas uses this with 1000s of sites, IIRC 10s of thousands; I don't really know what the practical upper limit is, but there are workarounds when we meet it. My vision for it was to replace the static-screenshot use case, which is a common one. I haven't had time to promote it, really, but please check it out. I would be happy to set up a video meeting to demonstrate it.

RE security and IGV in general: users and organizations do all sorts of things with IGV and many other programs; we aren't the secops police. My goal is to give organizations the tools to make IGV secure, but it's up to them to implement what is secure for their organization. The most common scheme for accessing S3 resources with IGV, by far, is using signed URLs generated external to IGV. Many organizations with their own secops team feel this is secure. We on the IGV team are not requiring Cognito, recommending Cognito, or endorsing any other particular scheme; we simply support it through UMCCR's contribution. @brainstorm's critique is more about best practices, as he sees them; that's not a debate I have expertise on or will engage in. If there is a security hole in this PR (I haven't even looked at it yet), that would need to be addressed, but if it merely enables an insecure practice when misused, that is another matter. We also support basic auth, which in general is a very weak authentication scheme but good enough for some organizations and usages. For example, locally here, labs have used basic auth just to keep their data hidden from other labs until they publish.
I don't know where this PR falls and won't have time to study it until probably post-Thanksgiving, but no, you don't need to pull it.
Thanks. I will take a look at it. It certainly sounds like it will meet the requirements and would be easier than the scripting/snapshotting. That said, I have gotten some feedback (matching my own impressions) that many prefer the desktop app view, but much of that is just what users are used to, and it may not apply to bulk review vs. careful examination of a few sites, which are very different use cases.
@bbreton To be clear I was not suggesting igv-web / igv.js could replace the desktop app, at least not now, I was suggesting it as a possible replacement for browsing static screenshots. |
My current plans for this pull request are:
Items 1 and 2 are useful to support better AWS security practices; not having them could push people toward the worst practice (using a single permanent AWS key/secret with too many permissions). Overall, AWS credentials have a lot of options. I can't support all possible forms of credentials and configurations; even enumerating them would be difficult, and the amount of code and testing needed would get out of hand, especially given that some of the methods are rarely used. If someone requests something not supported, this infrastructure should make it easy to add, but I think this, along with the existing OAuth code, will cover all the common uses.
For snapshotting, I'd either use https://github.com/brentp/jigv or https://github.com/igvteam/igv-reports, TBH... the former seems to be a bit more in line with what you were looking for, @bbreton.
For static snapshots, the former; if you want to interact with the data, the latter.
does it make sense if IGV has |
@markotitel I don't think so; @brainstorm no doubt would have an opinion. @bbreton I'm going to close this simply because it's stale and couldn't be merged now in any event.
@markotitel Please use the already officially supported AWS S3 via Cognito, as mentioned in the README. Happy to assist, as I did recently for a third-party user: https://www.biostars.org/p/9506536. I understand that this PR seems "simple", but from a systems/security perspective it can become a nightmare fast. I'm glad it's finally closed, and I hope it doesn't resurface, as I noted in #1084.
To be clear to everyone: the IGV team is not taking a position on the discussion above between @bbreton and @brainstorm; the issue here is closed due to staleness.
@bbreton I've implemented a solution similar to this PR in #1139 and will be merging it in the next release shortly. Apologies this took so long. Although implemented independently, it is close to this PR. The main differences I see are that I do not use IGV preferences and that if Cognito is not supported, the default credentials chain is tried. Also, the signing timeout is not set, but this should not affect anyone, as the signed URLs are not stored in sessions or otherwise exposed; they are fetched as needed from the S3 URL.
This adds the ability to obtain AWS credentials for accessing S3 from any of the methods supported by the AWS default credential provider chain:
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-default
This means Cognito no longer needs to be set up if the organization using AWS does not wish to, or when Cognito is overkill for minimal uses. It also allows instance profile credentials (e.g., you start an EC2 instance and that instance has the credentials baked into it) to be used. Overall, it increases flexibility for users who are not organizations needing Cognito and/or who want to run IGV with S3 access on AWS EC2 instances.
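The default credential provider chain linked above resolves credentials by trying a fixed sequence of sources (environment variables, the shared config file, container or instance metadata) and returning the first that succeeds. A stdlib-only Python sketch of that lookup order, with the individual providers reduced to simplified stand-ins (the real implementation is the SDK's `DefaultAWSCredentialsProviderChain`):

```python
import os

# Sketch of the lookup order used by the AWS default credential provider
# chain: try each source in turn and return the first that yields
# credentials. These providers are illustrative stand-ins, not SDK classes.

def env_provider():
    """Read credentials from the standard AWS environment variables."""
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"access_key": key, "secret_key": secret}
    return None

def profile_provider():
    # The real chain parses ~/.aws/credentials; elided in this sketch.
    return None

def instance_profile_provider():
    # The real chain queries the EC2 instance metadata endpoint; elided.
    return None

CHAIN = [env_provider, profile_provider, instance_profile_provider]

def resolve_credentials(chain=CHAIN):
    """Return credentials from the first provider that supplies them."""
    for provider in chain:
        creds = provider()
        if creds is not None:
            return creds
    raise RuntimeError("No AWS credentials found in any provider")
```

Restricting the feature to specific providers, as suggested earlier in the thread, would simply mean passing a shorter `chain` list.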
This implementation is a bit of a hack, reusing the OAuth code path even though it isn't OAuth.
My Open Questions:
- Are you interested in adding this functionality?
- What changes would you like to see in order to merge this, and/or is the current method acceptable?
- How long should I sign the S3 URLs for? Right now it is set to the maximum (7 days). This could be the default of 15 minutes or any other duration, including a config parameter for the user to set.
- What text should I put on the settings checkbox, and is the current location good? I don't love my current text there, but conveying exactly what it does without getting long is difficult.
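On the signing-duration question above: presigned S3 URLs generated with Signature Version 4 are capped at 7 days (604,800 seconds), and 15 minutes is the conservative default mentioned in this PR. A small sketch of clamping a user-configurable expiry to that range before signing; the function and constant names here are hypothetical, not part of IGV:

```python
# Sketch: bound a configurable presigned-URL lifetime to the SigV4 range.
# Names are hypothetical; the 7-day cap and 15-minute default come from
# the discussion above.

MAX_EXPIRY_SECONDS = 7 * 24 * 3600   # 604800, the SigV4 upper bound
DEFAULT_EXPIRY_SECONDS = 15 * 60     # 900, a conservative default

def resolve_expiry(configured=None):
    """Return a signing lifetime in seconds, clamped to the allowed range."""
    if configured is None:
        return DEFAULT_EXPIRY_SECONDS
    if configured < 1:
        raise ValueError("expiry must be at least 1 second")
    return min(configured, MAX_EXPIRY_SECONDS)
```

If a preference were added for this, validating it once at signing time like this keeps an out-of-range value from silently producing URLs that fail.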
Thank You