Document best-practices for IP address stability and accessing remote datasets #63
Comments
Aah, each node has its own external IP address, and they rotate fairly frequently. Can you create a new service principal that can be used to access blob storage instead? Even if we did have hub IP stability, folks on other hubs (since they all share a cluster) could access your blob storage!
@yuvipanda this seems like a useful use-case to think about for documentation purposes. I think it is more "power-user" than most people in 2i2c.org/pilot, but it also seems like it'd be pretty common. Where do we imagine the Q/A for this kinda question would go? Maybe this is the kind of thing that would be useful to have a Discourse forum for?
This sounds like a good suggestion, I just need to figure out how to 'create a new service principal that can be used to access blob storage'. Any suggestions for that?
For Azure, I'm guessing we'll need to create a service principal that has access to blob storage. https://docs.microsoft.com/en-us/azure/storage/common/storage-auth seems to be the article that has the overview. Can you tell us how you're allowing access to a particular IP? Maybe we can dig in from there.
Within the container, I'm just granting specific IPs access in the networking blade. This was the easiest thing for me to manage with a high level of certainty and without having to deal with a virtual network or Azure Active Directory. But happy to change it around.
Yeah, unfortunately you might need to do the Azure Active Directory thing now. I guess longer term this won't be as much of a problem since we can run this on Azure.
I've started playing around with this and Microsoft seems to recommend granting users access to the blob using AAD rather than creating a broader service principal with access. In our case, this would look like creating guest user AAD accounts in Azure and then using this (https://azuresdkdocs.blob.core.windows.net/$web/python/azure-storage-blob/12.8.0/index.html) to provide users access. What I've tried so far is:
But, I still get an authentication error: I have also tried to set the
As of right now, I'm just going to store a copy of the data in the shared folder on the hub and wait till we move to Azure itself to figure out how to integrate using the blob storage.
@JILPulvino ah, I see re: confidentiality. How big is the data? Managing it in the hub in the medium run is probably the best option, and we should figure out how to do that securely. In the meantime, putting each in an Azure blob and granting access is a great way to go. Can you give me access to your Azure cloud project so I can try to debug per-user permissions?
Nope, not a problem at all.
@yuvipanda I've added you as an admin to our Azure portal and you should be receiving an invitation.
Hey all - I'm not sure if there's more to work on here or not. Let's scope this issue to resolving @JILPulvino's immediate need, and I've opened up 2i2c-org/infrastructure#372 to track us documenting best-practices for object data storage access in general. @JILPulvino - what's left to do here? Is this actionable on 2i2c's end?
In our use case, we have a number of different confidential datasets, and our hub users should have differing access to them. Because of this, we can't just store all of them in the current shared folder system on the hub, as then all users would have access to all data. Ideally we could either (1) isolate data on the hub to specific users or (2) store the data on our Azure services in isolated containers and then give users access to specific containers. Prior to the hub, I had been using (2) and granting access based on an individual's IP address. If users are on the hub, that option no longer works and I need to provision access via Azure Active Directory instead. I think that is where we landed as the best solution: I just need to set up Azure Active Directory rather than using IP address access. As for what 2i2c can do, I think it'd be just laying out what you think best practices are.
We've ultimately switched to authenticating to our Azure containers and storage accounts using AAD, so this is no longer an issue.
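For anyone landing here later, a minimal sketch of what AAD-based access from a hub notebook could look like, assuming the `azure-identity` and `azure-storage-blob` packages are installed and the signed-in identity has been granted a role such as Storage Blob Data Reader on the container. This is not necessarily the setup used above; the helper names and the `EXAMPLE_*` environment variables are hypothetical.

```python
import os
from typing import List


def account_url(account_name: str) -> str:
    """Build the public blob endpoint URL for a storage account."""
    return f"https://{account_name}.blob.core.windows.net"


def list_blob_names(account_name: str, container_name: str) -> List[str]:
    """List blob names in one container, authenticating through AAD."""
    # Imported here so the URL helper above works without the SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import ContainerClient

    # DefaultAzureCredential tries several AAD sources in turn, including the
    # AZURE_TENANT_ID / AZURE_CLIENT_ID / AZURE_CLIENT_SECRET environment
    # variables (a service principal) and an existing `az login` session.
    credential = DefaultAzureCredential()
    container = ContainerClient(
        account_url(account_name), container_name, credential=credential
    )
    return [blob.name for blob in container.list_blobs()]


if __name__ == "__main__":
    # Hypothetical environment variable names, for illustration only.
    names = list_blob_names(
        os.environ["EXAMPLE_STORAGE_ACCOUNT"],
        os.environ["EXAMPLE_CONTAINER"],
    )
    print(names)
```

Because access is tied to an identity rather than a source address, this keeps working when the hub nodes' external IPs rotate, and role assignments can be scoped per container so different users see different datasets.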
We use Azure blob storage and allow specific IP addresses to access it. Is there an IP range that we should approve? I've approved the IP address I see for the hub when I log in, but it appears that other users have a different IP address?