Add Security to Riak #355
Comments
Concerning user groups, I'd vote yes. If a group of permissions could be bundled (aka roles), then users could be assigned to a group/role rather than granted/revoked permissions individually. This could prove helpful, not only in implementing RBAC, but also in reducing the complexity of defining multiple users with similar complex roles. |
+1 |
This should include a configurable Certificate Revocation List; otherwise untrusted clients can't be removed without basically starting the CA from scratch. |
+1 Groups are important, especially in the LDAP/ActiveDirectory world. |
There are fundamentally three modes for TLS' authentication model:
I recommend either 1.) fully authenticated or 2.) server-authenticated TLS channels. This mandates the use of a certificate store on each client. While you're going to that trouble, it might make sense to generate and store client certificates as well. Indeed, the proposal requires the secure storage of client keys and certificates, which means you'll need a key distribution scheme for clients. Given the presence of client secure storage for client keys and certs, you might as well encode the user credentials in the certificate directly. This removes the need for passwords and their secure storage on the server, which reduces the attack surface. It also removes the need for a separate username/password auth channel in the Riak protocol. That'd make it simpler for client maintainers to add auth support to their clients, since they can rely on the TLS protocol to do the work for them. Clients only have to store/configure [key, cert], instead of [key, cert, username, password]. User access can be cancelled via the usual CRL techniques. |
Canola (a PAM authenticator module) at https://github.com/basho/canola should be included in the Work In Progress list. |
@bkerley Yes, CRL support is something I forgot to cover, will update the text. @aphyr Yes, I plan to support both 1 and 2 (but not 3). If you want to use the certificate authentication mode (with no additional password), we require you to handshake via method 1. This is already sort of implemented for the PBC protocol and the riak_test shows its use: https://github.com/basho/riak_test/blob/adt-security/tests/pb_security.erl#L94 |
Will the ACL-style permissions preclude (or make it difficult to use) a more capability-driven security model (OAuth scopes) in a future phase? Has OAuth been discussed? |
How would this work with MDC? |
It would be useful to also have authorization for list_keys, list_buckets and secondary index queries as well as for the ability to run mapreduce queries. |
@cdahlqvist From my reading of the proposal, all operations will be authorized and audited. |
@aggress MDC doesn't deal with client authentication at all; it already has support for TLS-based authentication, where both sides verify the other's certificate. @cdahlqvist Yes, EVERY operation (or almost all of them) will need a permission associated; I just don't know what some of them will look like yet. The only exception to the rule may be the stats and ping endpoints: you may be able to hit them simply by being authenticated, not sure yet. |
@Vagabond I was thinking more along the lines of: will users/roles created in cluster a) be replicated over to cluster b), or will they need to be set up individually? And how might things work with features like cascading writes? |
@aggress Yes, that is a good question, will add it to the open questions section. My initial feeling is to NOT replicate that information, but we'll see. |
How might commit hooks be handled? stopping user |
Oh yeah, that reminds me: Erlang mapreduce is basically a wide-open door for arbitrary code execution, including, I suppose, modifying the ACLs themselves, so it should only be accessible to the highest privilege levels. |
re: replication between clusters, that isn't something we planned to support in cluster metadata (where this info will be stored) -- at least initially. Either way, I agree w/ @Vagabond's initial feeling. Even if the cluster is accessed by the same logical users I would assume they are typically accessed by different hardware (external LB at the least, probably different application servers). |
Something Riak might consider is a capability-based security model for granting access to buckets. I think capability-based security could fit extremely well with Riak's key/value storage model and have done a bit of work in this space. Under this model, authentication could be handled using whatever mechanism is desired (e.g. mutual TLS), but to authorize access to a particular bucket, the client would need to present a bucket-specific token, which could actually be a combination of cryptographic keys (known as a crypto-capability model). I've implemented a generic key/value store encryption system which works with Riak among other key/value stores here, if you're interested in seeing a real-world example of what I'm describing. My scheme encrypts both keys and values, allows data to be accessed using only encrypted keys, and allows clients to decrypt the key names if so desired: https://github.com/cryptosphere/keyspace The best part of this approach is that it has minimal impact on Riak. In fact, encryption is orthogonal, and something only clients would have to support. The only thing that would have to be added to Riak itself is a digital signature check (along with a timestamp check to prevent replay attacks) to ensure values being written are authentic. |
+1 capabilities. It's generally easier to understand and more secure. Managing ACLs becomes cumbersome very quickly from my experience. |
So, after a bunch of reading and internal discussion, I think we're going to stick with ACLs, for the following reasons:
However, I think I will add postgres style roles, as a way to implement 'groups'. |
Sad to hear that :( I buy the familiarity argument, but that's really the only thing ACLs have going for them over capabilities. Since capabilities solve the AuthZ problem, you can still use MTLS to solve AuthN and revoke access that way. Audit logging can be used to spot abuse of capabilities. Relevant: Zed Shaw - The ACL Is Dead |
I am watching that talk, but I'm struggling to extract much of relevance from it. It is sort of like a tech-related street performance that occasionally touches on ACLs en route to bagels, smoked meat, corporate greed, the incompetence of MBAs, etc. The three key points he seems to make are:
Maybe I'm dense, but I don't understand why those are even a problem. I understand his point about ridiculous business rule requirements about time- and situation-dependent ACLs, but Riak does not really have that problem. Right now my takeaway is this: ACLs for people != ACLs for applications. Applications rarely need time-dependent or situation-dependent access to data; they have their data and they want to access it whenever they need to, and these access rules change rarely. Riak is not a document management solution, it is a database. It is used by applications, not people. I'm happy to have a discussion about this, but providing references to things like the 'authz problem' that are not a 1:10 stream-of-consciousness rant about all sorts of unrelated things would help your case a lot more. It is fairly telling that none of the questions at the end were even about ACLs at all, beyond one question about making what Zed did into a product. |
Haha, sorry about that. But I hope it drives home that ACLs are in an uncanny valley between a capability based system and Turing-complete code for providing AuthZ. Waterken Web describes some of the tradeoffs of capabilities vs ACLs: http://waterken.sourceforge.net/ You might also take a look at how Tahoe-LAFS implements "writecaps" and "readcaps" for its mutable files. You wouldn't need anything so elaborate, just a digital signature: http://eprint.iacr.org/2012/524.pdf Tahoe ends up providing something that looks an awful lot like an encrypted version of Riak, sans many of the features that make Riak compelling as a database (read repair, vector clocks, 2I, etc) |
Maybe we can narrow the conversation here. When would the confused deputy problem occur for Riak, along the lines of the compiler example here: |
I guess my biggest sources of mystification are the following:
|
@Vagabond that's not a question I can answer until you have defined a threat model. Only then can you enumerate potential attacks and choose defenses. I can perhaps illustrate why ACLs don't work in practice with an example threat model:

Threat: We want to give Alice, but not Mallory, AuthZ to X, even though Alice and Mallory can both AuthN to the service providing X, and Alice and Mallory are conspirators.

In the end the result is the same, with some caveats. In the capability scenario, we see Mallory accessing the resource illicitly, but don't learn that Alice is a conspirator. In the ACL scenario, we don't learn about Mallory's involvement at all, as it appears that Alice accessed the resource.

In the ACL scenario, Alice's behavior in the audit logs looks "normal", because Alice is authorized to access X. In the capability scenario, we can cross-check the audit logs with our records of who should be able to access what, and determine that Mallory accessed X illicitly.

Thus, while capabilities are shareable, it's probably in Mallory's best interest to act as if they weren't and obtain X through a conspirator, lest his actions show up in the audit logs. In other words, the shareability of capabilities looks like a disadvantage, but an attacker who exploits it exposes himself to detection. A sophisticated attacker will want to piggyback their attack on normal-looking behavior, as this will make it harder to detect.
This is a fairly open-ended question, as there are many ways that capabilities can be implemented. I can roughly detail what you could do with the sort of crypto-capabilities model implemented by Tahoe (although in this case I'm only describing how you'd ensure authenticity of data, not confidentiality; Tahoe provides both).

In general, capability tokens are considered necessary and sufficient in and of themselves for accessing a particular resource. This doesn't preclude adding an additional mutual TLS layer or what have you to AuthN to the service. Ideally every part of the system has an associated set of capabilities. All data is individually, uniquely, and securely identifiable.

So for starters: every bucket would have separate write/authenticate capabilities, if not every key. At the time you create a bucket, a public and private digital signature key would be generated. The server would store the public key and use it to authenticate writes. The private key would allow new data to be written. The server would mandate that all writes be digitally signed (hopefully with a timestamp to prevent replay attacks).

Requests to write would include some type of request parameter containing a digital signature produced client side by the holder of a private key for a particular bucket or bucket:key combination. The server would authenticate digital signatures before accepting the write. |
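A rough sketch of the signed-write check described above, with one loudly stated substitution: the comment proposes a public/private signature key pair per bucket, but Python's standard library has no public-key signatures, so this sketch swaps in an HMAC over a shared per-bucket secret. That changes the trust model (the server would hold the same secret as the writer), so treat it only as an illustration of the "sign the write plus a timestamp, verify before accepting" flow. The function names, message layout, and the 300-second window are all hypothetical.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed acceptance window; not from the discussion


def sign_write(bucket_secret, bucket, key, value, timestamp=None):
    """Client side: return (timestamp, tag) authenticating one write."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Bind the tag to the bucket, key, timestamp, and value together.
    message = b"\x00".join(
        [bucket.encode("utf-8"), key.encode("utf-8"), str(ts).encode("ascii"), value]
    )
    tag = hmac.new(bucket_secret, message, hashlib.sha256).hexdigest()
    return ts, tag


def verify_write(bucket_secret, bucket, key, value, ts, tag, now=None):
    """Server side: reject stale timestamps, then check the tag."""
    now = time.time() if now is None else now
    if abs(now - ts) > MAX_SKEW_SECONDS:
        return False  # outside the window, so treat as a possible replay
    _, expected = sign_write(bucket_secret, bucket, key, value, ts)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)
```

A timestamp window only narrows the replay opportunity; a real design would likely also need to track recently seen writes or use per-write nonces.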
+1 and thanks! |
I am looking into Riak for a project requiring a secure distributed database. I need to make sure that if one node is compromised, i.e. the server has been taken over, it will not be possible to break the entire cluster, for example by preventing the altering of permissions or the changing of commit hooks. |
@glagnar: I doubt you'll satisfy that property in any major distributed database without end-to-end cryptographic verification of writes by all servers and all clients. As an example, take a look at what's required to build http://www.pmg.csail.mit.edu/bft/castro99correctness-abstract.html |
@glagnar This is a different sort of security altogether. If a box itself is compromised, the user can simply give themselves any permissions they want via |
@coderoshi Thanks, I know. That was my exact source of worry. In a situation where the server is compromised, could an 'admin password' not solve this issue? I.e. without password authentication, it should not be allowed to change, for example, permissions within the cluster of nodes? |
@glagnar if you really want a "trust no one" system where the compromise of a single node has zero impact on the rest of the grid, you might look at Tahoe-LAFS. It satisfies those properties (namely end-to-end cryptographic confidentiality and integrity of all content as @aphyr described): http://tahoe-lafs.org |
@aphyr well yes, but ideally you separate the Tahoe nodes which provide storage service from the clients which are accessing the content, in which case only the clients see the capabilities/secrets, and the storage nodes are otherwise completely oblivious and see only ciphertexts. In such a deployment, the servers could be compromised without worry |
Is it possible to set up Riak in a unidirectional replication manner? I.e. A is master, and B & C are slaves. This means that it does not matter if B or C are compromised. Then 3 clusters could be set up, where in turn A, B or C is master. A client would then be able to detect if one master had been compromised, by looking at the difference between the three clusters. |
No, this is not possible. All Riak nodes are equivalent.
|
I tried the security extensions with user/CIDR authentication. It seems to work fine, but I can't find how to remove sources. Could anyone tell me when most of the functions will be ready to try? |
See #434 and related PRs, not all of the security code has landed yet. |
Thanks, @Vagabond |
Closing this as most of what was described here landed in 2.0 pre builds. |
This is a tracking meta-issue for the cross-repo task of adding security to Riak.
Rationale
Riak (not Riak CS, which has its own application layer security model) currently has no authentication or authorization, nor does it provide encryption for the Protocol Buffers API (HTTPS is optional for the REST interface). In most deployments, Riak is deployed on a trusted network and unauthorized access is restricted by firewall/routing rules. This is usually fine, but if unauthorized access is obtained, Riak offers no protection to the data that it stores.
Thus, I propose we add authentication/authorization/TLS and auditing to Riak, to make Riak more resilient to unauthorized access. In general, I took the design cues from PostgreSQL. Another goal was to make this applicable to riak_core, so any reliance on KV primitives or features is intentionally avoided.
Authentication
All authentication should be done over TLS for two reasons: to avoid MITM attacks and to prevent eavesdroppers from sniffing credentials. Self-signed certificates are acceptable if the client checks the server's certificate against a local copy of the CA certificate (and thus we can avoid the complicated 'web of trust' used by regular HTTPS). CRL checks should be performed when a CRL is configured.
Once TLS has been negotiated and verified, the client supplies username/password credentials. The password is transmitted in the clear (that is, unhashed inside the TLS session) to facilitate writing pluggable security backends. This is not a major problem because at this point the connection should be proof against eavesdropping.
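As a client-side illustration of the paragraph above, here is a minimal Python sketch of a TLS context that trusts only a locally stored CA certificate and, optionally, checks a CRL. It uses only the standard `ssl` module; the function name and the file arguments are hypothetical and say nothing about how any Riak client actually configures TLS.

```python
import ssl


def make_client_context(ca_file=None, crl_file=None):
    """Build a TLS client context that verifies the server certificate."""
    # PROTOCOL_TLS_CLIENT turns on certificate and hostname checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file is not None:
        # Trust only the local copy of the CA certificate (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if crl_file is not None:
        # CRLs are loaded through the same call; the flag enables checking
        # the peer certificate against them during the handshake.
        ctx.load_verify_locations(cafile=crl_file)
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx
```

A client would then wrap its socket with `ctx.wrap_socket(sock, server_hostname=...)` before sending any credentials.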
The pluggable security backends we propose to implement are the following:
Postgres auth methods
Authentication information is split into two pieces, users and sources. A user cannot authenticate without a corresponding source that matches username/peer address.
Postgres authentication source configuration
To add a user named andrew and to trust all connections from localhost, you'd do:
To add a user that you wanted to authenticate against the local password table and be allowed to connect from anywhere:
The password provided at user-creation time is hashed via PBKDF2 and stored.
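For readers unfamiliar with PBKDF2, a minimal sketch of the hash-and-store step using Python's standard library follows. The salt length, hash function, and iteration count are assumptions for illustration; the proposal does not state which parameters Riak uses.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; not specified in the proposal


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, iterations, derived_key), the values one would store."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def verify_password(password, salt, iterations, expected_key):
    """Re-derive the key from the supplied password and compare."""
    _, _, key = hash_password(password, salt, iterations)
    # Constant-time comparison avoids leaking where the mismatch occurs.
    return hmac.compare_digest(key, expected_key)
```

Storing the salt and iteration count next to the derived key lets the work factor be raised later without invalidating existing entries.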
To trust users on the LAN but to force everyone else to authenticate against PAM:
The service=riak option tells PAM to submit any provided credentials to that particular PAM service configuration. Sources are compared most to least specific, both by the user match and the CIDR match (specific usernames sort before 'all' and a /24 sorts before a /0). Only the first matching source is tested; if that fails, the authentication fails.
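The matching rule above can be sketched in Python as follows. This is an illustration only: the `Source` record, its field names, and the exact tie-breaking order (username specificity first, then CIDR prefix length) are assumptions, not Riak's implementation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    user: str    # a specific username, or "all"
    cidr: str    # e.g. "127.0.0.1/32"
    method: str  # e.g. "trust", "password", "pam"


def specificity(source):
    # Specific usernames sort before "all"; longer prefixes (/24) sort
    # before shorter ones (/0).
    network = ipaddress.ip_network(source.cidr)
    return (source.user == "all", -network.prefixlen)


def match_source(sources, username, peer_ip):
    """Return the most specific source matching the user and peer, or None."""
    addr = ipaddress.ip_address(peer_ip)
    for source in sorted(sources, key=specificity):
        user_ok = source.user in (username, "all")
        if user_ok and addr in ipaddress.ip_network(source.cidr):
            # Only this first match is tested; there is no fallback
            # to a less specific source if its method later fails.
            return source
    return None
```

With a LAN-wide trust source and a catch-all PAM source, a LAN peer resolves to the trust source while any other peer falls through to PAM.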
Authorization
Riak currently has a completely permissive approach to data access: if you can connect, you can get/put/delete anything you want. Providing authentication, as in the above section, raises the bar to that kind of access, but it still leaves your data vulnerable to a compromised client, especially if you have something like a lower-security reporting application (or even a remotely hosted one with a hole punched in the firewall). This also makes anything like multi-tenancy impossible (think hosting multiple phpBB instances on a single MySQL server).
Thus, in addition to authentication, we also need authorization. This is a major change to Riak's semantics, especially when it comes to creating buckets, which in Riak now is as simple as writing a key to the bucket you want to create. For applications that want to dynamically create buckets, we need to provide some way to give them authorization to do so, without compromising the ability to provide security.
To that end, I propose that authorization be checked on a per-bucket basis. Users are granted granular permissions (registered by the individual riak_core applications):
The permissions are namespaced by the registering application, so the above permissions become riak_kv.get, riak_kv.put, etc. These permissions convey no meaning to riak_core; the application is in charge of indicating what permissions are required for each operation.
Examples of granting permissions:
To preserve the ability to dynamically create buckets whose name is not known beforehand (think buckets per-username or something), I propose the ability to GRANT based on a bucket prefix:
Thus, the application connecting with the 'andrew' credential can create unlimited buckets that begin with 'myapp_', but has no access to buckets outside that prefix space.
Additionally, if you want to give a user access to everything, Riak could support the ALL permission and the ANY target:
This would effectively provide the old unlimited access that Riak currently has, but still provide some security.
It may also be interesting to wildcard permissions by application, e.g. 'riak_kv.*'.
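To make the grant semantics discussed so far concrete (exact bucket, bucket prefix, the ANY target, the ALL permission, and per-application wildcards), here is a small in-memory sketch in Python. The `Grants` class, its method names, and the "bucket"/"prefix"/"any" target kinds are hypothetical illustrations, not the proposed command syntax or storage format.

```python
from fnmatch import fnmatchcase


class Grants:
    """In-memory table of (permission, target kind, target) per user."""

    def __init__(self):
        self._grants = {}

    def grant(self, user, permission, kind, target=""):
        # kind is "bucket" (exact name), "prefix", or "any".
        self._grants.setdefault(user, set()).add((permission, kind, target))

    def revoke(self, user, permission, kind, target=""):
        self._grants.get(user, set()).discard((permission, kind, target))

    def allowed(self, user, permission, bucket):
        for perm, kind, target in self._grants.get(user, ()):
            # "ALL" covers every permission; "riak_kv.*" covers one application.
            perm_ok = perm == "ALL" or fnmatchcase(permission, perm)
            if kind == "any":
                target_ok = True
            elif kind == "prefix":
                target_ok = bucket.startswith(target)
            else:
                target_ok = bucket == target
            if perm_ok and target_ok:
                return True
        return False  # default deny: no matching grant
```

For example, granting `riak_kv.put` on the prefix `myapp_` would allow writes to `myapp_users` but not to `other`.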
As the superuser giveth, he may also taketh away:
Grants and revokes are currently stored separately. The goal is to make users/sources/grants strongly consistent and revokes eventually consistent. That way, during an outage (possibly caused by a malicious/co-opted user account), you can revoke without requiring complete cluster availability, but you avoid problems with partial grants, etc.
Auditing
Since every operation will now be tied back to a user account, we should be able to audit what user did what and when. To that end I plan to extend lager to support alternate event streams (with a separate gen_event) and use that as an audit logging facility. Pairing that with the syslog backend, you'd be able to ship the logs off the machine and so make them harder to tamper with. This is a stretch goal for this development cycle.
Migration
When this work drops in the next major release, existing deployments will have to migrate. At least for existing deployments, the security stuff should default to off. When the user is ready to turn it on, they'll need to have upgraded their client libraries to support it, as well as deployed SSL certificates to all the nodes, signed by the same CA. Until that switch gets flipped, clients will work exactly as they do now.
Open Questions
Risks
Example client sessions
HTTP: https://gist.github.com/Vagabond/05b7dc8ae6d3ca4af6c2
PBC: https://gist.github.com/Vagabond/6222793a1d352f1ccdd2
Work in Progress
Partial implementations of all of this may be found in the 'adt-security' branch of the following repos: