-
Notifications
You must be signed in to change notification settings - Fork 496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include Signed-urls in the External Tools Framework #7715
Comments
|
Could potentially be approached as ~3 sub-tasks:
Relevant code snippets: UrlSignerUtil.java has sign/validate methods, #6957 is an old PR where Dataverse calls javascript to do a POST in response to a button push, and changes in #7568 (which add a workflow invocationId as a short-lived, workflow-specific alternative to an APIKey) highlight one option for where you could validate the URLs (i.e. you'd fail to findAuthenticatedUser(), which is used across the API, if the URL was invalid). |
Condensed security notes from @joshua-oss: Since this is authorizing access to resources, on a time-bound basis, it would be good to examine: The authorization token generation process
How the system identifies resources
How the system defines time to ensure that none of these can be spoofed or otherwise controverted
|
Desired functionality:
Use case from the DPcreator perspective:
{
"signedUrls":[
{
"type":"fileDownload",
"signedUrl":"https://some-signed-url-343234-234324",
"expirationDate":"2021-05-12T13:40:06Z"
},
{
"type":"metdataRetrievalJSONLD",
"signedUrl":"https://some-signed-url-343234-234324",
"expirationDate":"2021-05-12T13:40:06Z"
}
]
} |
Here's a description of how the signedURLs I proposed work, which I think addresses the questions that have been asked (if I missed some, let me know). Unless I made a typo, here's an example of a signedURL that should be validate-able against the apikey
Given that these are all visible, the receiver (OpenDP) can inspect them and, for example, verify that the url is still valid and/or get the user id. (The validation mechanism, as discussed below, doesn't require keeping the date/time or user id secret and having them visible in the URL doesn't help anyone trying to create a malicious URL.) When the URL is used, the server (Dataverse in the OpenDP case) looks at the user= param, gets the apikey for that user and then creates a sha512 hash of the string where the last param is the concatenation of the never-shared secret Dataverse key and the api key of the listed user (also not shared with OpenDP via the proposed external tool api changes, but potentially available from other sources). If, and only if, that sha512 hash matches the original token= param sent on the URL (as it should here if I manually created the URL correctly - you can try it at https://emn178.github.io/online-tools/sha512.html for example), the server (Dataverse) can be assured that the URL has not been changed since it was signed. If that's true, Dataverse can then check:
Dataverse can then use the user= param (which has to match with the api key used to originally sign because that user id was also used by Dataverse to choose the right key for validation) to assign the user's privileges to this api call. Any change to the URL results in the hash failing and Dataverse would then reject the call as not authorized. If the user's permissions in Dataverse have been revoked since the signedURL was created, the call would still fail. (There isn't a way from the URL to know if that has happened, but one could potentially use a Dataverse admin api (with a credential supplied out-of-band) to query Dataverse about that.) To create a valid signedURL outside Dataverse would require knowing the appropriate user's APIkey (possible available from, for example, the user's browser unless we change Dataverse to not ever show/distribute API keys once signed URLs are implemented) AND the never-shared secret Dataverse key. The latter (which doesn't exist yet since URLSigning has so far only been used for Dataverse requesting data from other services) could be stored on disk with access limited to the account running Dataverse, or, if there's concern that someone could still read it, this key could be stored in a service such as Vault (not recommending this - just an example if storing the key with restricted access on the Dataverse server is not secure enough. Given that a user who could get to a key stored that way could probably do other harmful things to Dataverse, just protecting the Dataverse server as well as possible might be the best option.) To prevent someone from request many URLs to derive the secret Dataverse key, it would also make sense to regenerate that key periodically. That could be manual, or automated. (One aspect of this model with a time-limited signed URL, that wouldn't be true with for example, a one-time use URL, is that there is no database record in Dataverse related to the signed URL (there could be logging of the creation of the signedURL, but validation of the URL doesn't require looking up any per-signed-URL database entry. This helps with scalability and makes things simpler as Dataverse doesn't need an extra table or to worry about removing entries from that table if the URLs are never used, etc.) |
@qqmyers, thank you for the detailed explanation ^ |
|
sprint
|
|
We met today to talk about the big picture of integrating DP Creator with dataverse. An important thing that came out of that discussion was that the initial integration is only at a demo level. Bob explained today why the integration of the support for signed URLs into the External Tools framework in dataverse is not a small task. When we talked further with Raman it became clear that what is required in the short term is only for an MVP that will not be ingesting truly secure data from dataverse. The MVP demonstrates the user experience, operation of the DP Creator libraries, and interaction with dataverse for retrieving data and publishing reaults. It is not going into production at this stage. Raman and Bob and some others will get together next Thursday to figure out a way forward in the short term to satisfy this intial MVP. |
From @raprasad |
Summary from discussions of the proposed addition to add suport for sending signed URLs within the external tools framework: Add support for a new json object in the external tools registration mechanism (as described in the guides at https://guides.dataverse.org/en/latest/admin/external-tools.html and https://guides.dataverse.org/en/latest/api/external-tools.html#creating-an-external-tool-manifest) that would list the set of signed Urls for calls in the Dataverse API that the tool should receive. For each signed Url, the entry should include:
Url Templates would be relative to the base site url and take advantage of the existing reserved words (defined at https://guides.dataverse.org/en/latest/api/external-tools.html#reserved-words) to indicate where Dataverse should replace a token with the value for the specific dataset/file for which the tool is being launched. For example:
would be signed with the specific dataset id and specific site, e.g.
The ExternalTool class would be extended with a new text column which would store the JSON, serialized as a string, for this new registration parameter. The parameter and column will have a name that indicates that these are requested/allowed uris/api calls. In various PRs there are utility classes to read/write Json to/from a string. (Nominally postgres can store Json directly but we have not done this anywhere yet (we have several other columns that store json)). For backward compatibility, if this new param does not exist, Dataverse can continue to send the apikey as it does now. The ExternalToolHandler class has an existing method to retrieve/format the values for each of the registered words (see dataverse/src/main/java/edu/harvard/iq/dataverse/externaltools/ExternalToolHandler.java Lines 133 to 185 in 4947992
Another task to complete this would be to add a flyway sql script that would add the newly defined column to existing databases, e.g. |
W.r.t. #7715 (comment) - it sounds like task
Nominally 3 could be chanced by completing 2, but it should also be possible to a) use the URLSignerUtil manually to sign URLs to test or b) to add a (superuser-only?) createSignedUrl API call that would allow dynamic generation of signed URLs. The latter is probably useful in general and would be a good way to create an integration test (create a signed URL and then try to use it and verify the signed URL API call worked.). |
Caught up with Bob today. He explained in a way that I could understand the concepts around how things work now with the token exchange and how they are going to work. We agreed that it would be good to get the results in front of the other folks involved to makes sure we all stay on the same page in terms of how this demo implementation is going to work. I laid out the flow as I understand it in Miro here
Next: Touch base again and go a level deeper.
|
Sprint:
|
Bob and Jim agreed to a quick quick meeting today. There were some questions raised. If we need a meeting we can do one, but it sounds like we may be close enough that updating the doc is good. As Jim pointed out once we get a bit deeper into the code, we can iterate around anything further. |
At today's meeting
|
feat: make API signing secret a JvmSetting IQSS#7715
In a March 16th meeting with the Dataverse team, improvements to the external tools framework were discussed. Specifically, the team described an external tools improvement to provide signed urls instead of passing a Dataverse general API token. In transitioning from Dataverse to an external tool, Dataverse would:
These signed urls would have the following characteristics:
(a) Limited in scope and linked to a particular user
(b) Limited in time
(c) Encoded with a Dataverse signature
(d) Involve Usage tracking
Each time the signed url is used, the Dataverse would log usage information including:
This was not specifically addressed in the meeting but in addition to the signed urls, the external tools framework should still include defining and passing along the data currently specified as “queryParameters” and described by the Reserved Words in the external tools documentation. Note: The exception is the “apiGeneralToken”--this would be replaced by the specialized urls.
The text was updated successfully, but these errors were encountered: