DataProtection: IDataProtector.UnProtect can force refresh the key ring if it fails to unprotect #3975
We don't refresh like this because it's easy to send a fake payload with a new key ID in every request, which would then block all incoming requests until the refresh finishes. It's too easy a DoS vector. Your workaround really is the only solution. You could, in startup, just manually spin up a protector and call Protect so you're not waiting for the first inbound call to start creating a key ring.
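A minimal sketch of that warm-up, assuming the default AddDataProtection() registration; the purpose string "KeyRingWarmup" is an arbitrary illustrative value, not anything the framework requires:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.DataProtection;

public class Startup
{
    public void Configure(IApplicationBuilder app, IDataProtectionProvider dataProtectionProvider)
    {
        // Any Protect call forces the provider to load (or create) the key ring now,
        // rather than on the first inbound request.
        var protector = dataProtectionProvider.CreateProtector("KeyRingWarmup");
        protector.Protect(new byte[] { 0x01 });

        // ...rest of the pipeline configuration...
    }
}
```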
The DataProtectionStartupFilter.Config function already calls IKeyRingProvider.GetCurrentKeyRing to preload the ring, so I don't need to add anything extra in Startup. The problem is that IIS does not start the process until it receives a request. IIS 8 or later has the preloadEnabled attribute that I can try. My current workaround works only because the protect/unprotect flow is one way: App1 always protects the cookie and App2 and App3 unprotect it. If the data protection between the shared apps went in both directions, the problem would remain. Is there a better solution to make all shared apps have the same key ring?
We're seeing issues related to this pop up and are going to investigate possible solutions. I've assigned myself and put it in a 3.0 milestone so this is tracked and reviewed in the coming weeks.
I just came across another problem during a few days of performance testing. As said in the above thread, we have App1 create a cookie, and App2 and App3 use the same data protection key to read the cookie. App2 and App3 are started and stopped per user session. During the performance run, each day there were a few hours where we could see a performance hit. I found out that when some App2 or App3 instances started, they had a transient problem getting the data protection keys from Couchbase and IXmlRepository.GetAllElements failed. Therefore a new key was created, StoreElement was called, and the new key was saved to Couchbase. We have 3 nodes behind the load balancer. After around 18-24 hrs (KeyRingRefreshPeriod), App1 on one of the nodes makes its GetAllElements call and now has that new key set as its default key. The cookie created by App1 on that node cannot be decrypted by any App2 or App3, while App1 on the other two nodes is still fine. Eventually App1 on the other two nodes makes its GetAllElements call, one node after the other, and switches to the new key. Then App2 and App3 make their GetAllElements calls and can use the new key to decrypt. So from the first App1 on a node picking up the new key until all App2 and App3 instances on all nodes have it, we see performance degradation. One workaround for now is setting DisableAutomaticKeyGeneration on App2 and App3 so only App1 can generate new keys. But after the 90-day key expiration, App1 on one node will still generate a new key, and we still need to make sure App1 on the other nodes and all App2 and App3 instances pick up that new key. What is the best practice to achieve this key synchronization among different apps and across different nodes?
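For reference, a minimal sketch of the DisableAutomaticKeyGeneration workaround described above, assuming all apps already share the same key repository; the application name "SharedCookieApp" is an illustrative placeholder:

```csharp
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;

public static class ReaderAppDataProtection
{
    // Called from App2 and App3 only; App1 keeps automatic key generation enabled.
    public static void Configure(IServiceCollection services)
    {
        services.AddDataProtection()
            .SetApplicationName("SharedCookieApp")   // must match App1's application name
            .DisableAutomaticKeyGeneration();        // never create keys in the reader apps
    }
}
```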
@HaoK - is this something you can take a look at while @natemcmaster is on leave?
Sure, I will take a look.
Discussed with @blowdart; the initial plan for the fix will be the following: when an unknown key id is encountered during a short (configurable?) window of time after data protection first starts up, we will automatically refresh the key ring (potentially multiple times if multiple unknown keys are encountered during this window) to mitigate the startup race condition. If the refresh doesn't discover the key, we will fail the request. This will be done in a blocking way, but since it's only during a window around the app starting up, this seems like an acceptable tradeoff.
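A conceptual sketch of that plan, not the actual framework change; every type and name below is an illustrative stand-in rather than a Data Protection API, and the five-minute window is a hypothetical value:

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a key cache that allows a blocking refresh for an unknown
// key id, but only within a short window after startup, so forged key ids
// cannot be used as a permanent DoS vector.
public class StartupWindowKeyCache
{
    private readonly TimeSpan _refreshWindow = TimeSpan.FromMinutes(5); // hypothetical, would be configurable
    private readonly DateTimeOffset _startedAt = DateTimeOffset.UtcNow;
    private readonly Func<IDictionary<Guid, byte[]>> _loadKeysFromRepository;
    private IDictionary<Guid, byte[]> _cachedKeys;

    public StartupWindowKeyCache(Func<IDictionary<Guid, byte[]>> loadKeysFromRepository)
    {
        _loadKeysFromRepository = loadKeysFromRepository;
        _cachedKeys = loadKeysFromRepository();
    }

    public byte[] ResolveKey(Guid keyId)
    {
        if (_cachedKeys.TryGetValue(keyId, out var key))
            return key;

        // Unknown key id: refresh (blocking) only while still inside the startup window.
        if (DateTimeOffset.UtcNow - _startedAt < _refreshWindow)
        {
            _cachedKeys = _loadKeysFromRepository();
            if (_cachedKeys.TryGetValue(keyId, out key))
                return key;
        }

        // Outside the window, or the refresh still doesn't know the key: fail the request.
        throw new InvalidOperationException($"Key {keyId} was not found in the key ring.");
    }
}
```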
Moving to preview 8 since I'm not familiar enough with this code to get this in immediately, which is what it would take for preview 7.
The tradeoff seems appropriate as a short-term solution. From an internal email thread:

The best fix would probably be a fix at the storage layer itself, since the storage layer is intended to be quasi-transactional with respect to updates. If this issue manifests itself primarily with the file system storage provider, one potential solution would be to use a monotonically incrementing counter instead of a random GUID for the filename. That is, if the file "key-00000000.xml" doesn't exist then the application would attempt to write that file; if it does, it would try "key-00000001.xml", and so on. This means that if two applications try to write "key-xxxxxxxx.xml" at the same time, one of them will succeed and one of them will see the conflict and can perform a refresh. This restores the quasi-transactional properties of the file system store. Note that there's no relationship between the filename and the contents of the XML file. The name of the file can be completely divorced from the key id contained within. Updating the file system store to use a predictable filename as in the proposal here can be done without introducing a breaking change to the DataProtection layer, as all .xml files are read by default anyway, so all existing keys will still be honored regardless of filename.
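A minimal sketch of that filename-counter idea, illustrative only and not the behavior of the built-in FileSystemXmlRepository; the loser of the race returns false so the caller knows to refresh its key ring before deciding whether it still needs a new key:

```csharp
using System.IO;

public static class SequentialKeyFileWriter
{
    // Tries to persist a key file under the next free "key-XXXXXXXX.xml" name.
    // Returns false if another writer grabbed that name first.
    public static bool TryStoreKeyFile(string keyDirectory, string keyXml)
    {
        for (int counter = 0; ; counter++)
        {
            string path = Path.Combine(keyDirectory, $"key-{counter:D8}.xml");
            if (File.Exists(path))
                continue; // name already taken, probe the next counter value

            try
            {
                // CreateNew throws if the file appeared between the Exists check and here.
                using (var stream = new FileStream(path, FileMode.CreateNew, FileAccess.Write))
                using (var writer = new StreamWriter(stream))
                {
                    writer.Write(keyXml);
                }
                return true; // we won the race for this filename
            }
            catch (IOException)
            {
                return false; // lost the race: refresh the key ring instead
            }
        }
    }
}
```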
@HaoK Your suggestion handles the startup case. What about the key renewal case? The key will be renewed after NewKeyLifetime (90 days by default), the KeyRingRefreshPeriod is hard-coded to 24 hours, and GetRefreshPeriodWithJitter will fudge the refresh period down by up to 20%. If App1 renews first, there is a 4-5 hour period where App2 and App3 are still using old keys. @GrabYourPitchforks You talk about the file system only. What about those of us who implement IXmlRepository and save the keys in a DB?
@kevinlo Edit: I see now that the work item is talking about a specific DB repository, not the file system repository (as has been the culprit in other work items which have been opened). In your particular case, the 20ms delay in the database itself violates a fundamental assumption of the DataProtection layer: all machines must have a consistent view of the key ring.
@GrabYourPitchforks I prefer to use the file system for the data protection keys, but my architect chose a DB. Do you have any documentation I can show my architect saying that IXmlRepository needs to be quasi-transactional? Back to your file system case, could you please explain in more detail how it works? When you say the file doesn't exist, do you mean it does not exist in the application's memory or in the file system? E.g. when both App1 and App2 are started, they both try to read the keys from the file system. If the file system is empty, they both try to create a new key and write "key-00000000.xml". Say App1 successfully writes it and App2 fails. Does App2 retry by refreshing the key ring first before it tries to write the key as "key-00000001.xml"? If App2 does not refresh and writes it as "key-00000001.xml", how do App1 and App2 refresh and get both keys? As you said, the key name has nothing to do with the content, so it is possible that a key file with a larger number contains an older key than one with a smaller number, right?
I don't know if the docs say this explicitly. When we wrote the docs we were primarily concerned with things like correct usage of the APIs and describing configurable behaviors. We do try to write down assumptions made by the APIs, but on occasion we forget to list something because it never occurs to us that somebody might violate a behavior that we treat as a core assumption. In this case, the assumption is that changes made to the repository by one client are immediately visible to all other clients. And I can say as the original author of the DataProtection stack that this was definitely an assumption. :) Whether this is a good assumption is a different matter, and perhaps that's a debate worth having, but then this moves from "bug" to "feature request". And even if the feature request is approved, there's no guarantee that
That's correct. If there's a conflict at write time, the keyring would need to refresh so that it would see any keys written by the client that "won" the race. You may even be able to perform this trick inside your database layer. I assume that there's some kind of consistency mechanism in the database so that two clients don't try to write the same entry within a single 200 millisecond window. If the DB can somehow detect "Hey - you're asking me to write 00000 but somebody has already beaten you to it. Wait a few milliseconds and try querying to see the value." then you might be able to fulfill the necessary assumptions within your repository. Hope this helps!
@GrabYourPitchforks I think that assumption makes sense, but now we have a Couchbase consistency issue between data and index. Even with the quasi-transactional file system, do you think it still has the key renewal case that I asked @HaoK about?
My question is how to refresh the key ring from my app. At first I submitted this issue expecting ASP.NET Core would do it, but blowdart says it can cause a DoS. The Key management extensibility doc says IKeyManager can get all keys from storage, but it also has these warnings:
Do you have sample code showing how to force the app to refresh the keys?
Fixed by #11987
I have an App1 that will create the cookie, and 20 instances of App2 and 20 instances of App3 that will read the cookie. The data protection key is stored in Couchbase.
When these applications are first deployed without the key in Couchbase, all 20 instances of App2 and all 20 instances of App3 start up and each calls CreateNewKey. Because the Couchbase indexes are updated cyclically, every 20ms, the GetAllKeys call may not immediately see the key added by CreateNewKey. I eventually see 62 keys created.
I think because of that 20ms lag, the GetAllKeys call in some instances of App2 and/or App3 returns more keys than in others. App1 is started last, so it gets all the keys, and its default key is newer than the one in some instances of App2 and App3. Therefore, a cookie protected with the newest default key in App1 cannot be unprotected by some instances of App2 and App3.
Currently, KeyRingBasedDataProtector.UnprotectCore calls _keyRingProvider.GetCurrentKeyRing to get the CacheableKeyRing. CacheableKeyRing.IsValid just checks that the keys in the key ring are not revoked or expired. In the case above, the keys are all valid, so the cached CacheableKeyRing is used even though it does not contain the latest key from the DB.
Should IKeyRingProvider.GetCurrentKeyRing have a forceRefresh parameter, defaulting to false, so that if KeyRingBasedDataProtector.UnprotectCore fails it can call GetCurrentKeyRing(true) to force the key ring to refresh and then try the unprotect again? Or provide another way to tell the IKeyRingProvider to discard its cache. At the very least, being able to discard the cache and fetch a fresh key ring would let the next unprotect call succeed.
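A sketch of the shape of the change being proposed; IKeyRingProvider and IKeyRing are internal Data Protection types, so the interfaces below are only illustrative stand-ins and not the real definitions:

```csharp
using System;

// Placeholder for the internal IKeyRing (the cached key ring with a default key).
public interface IKeyRingSketch
{
    Guid DefaultKeyId { get; }
}

// Placeholder for the internal IKeyRingProvider with the proposed overload.
public interface IKeyRingProviderSketch
{
    // Existing behavior: return the cached key ring while it is still valid.
    IKeyRingSketch GetCurrentKeyRing();

    // Proposed: discard the cached CacheableKeyRing and re-read the repository,
    // so KeyRingBasedDataProtector.UnprotectCore can retry once after a failure.
    IKeyRingSketch GetCurrentKeyRing(bool forceRefresh);
}
```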
The current workaround I can think of is making App1 start up first so it generates the key. When App2 and App3 are started, they can read that key, and App2 and App3 will always have the same or a newer key than App1.
I think that application start-up dependency is a possible solution, but it would be nice if ASP.NET Core Data Protection could handle this situation magically so I don't need to care about the application dependency.