
Commit 99e728c

sbernauer and lfrancke authored

feat!: Support only sending a subset of the fields to OPA (#49)

* WIP: First implementation
* refactor: Use dynamic dispatch (something something Java :))
* bump to HDFS 3.4.0, as it's the LTS now
* Update deps
* changelog and docs
* fix: Set path to `/` when the operation `contentSummary` is called on `/`
* changelog
* changelog
* Add benchmark shell
* linter
* refactor: Making random stuff final
* Update README.md
* docs: Document reduced API call
* markdown linter
* Try silencing markdown linter
* Update src/main/java/tech/stackable/hadoop/StackableAccessControlEnforcer.java
* Update README.md
* Update benchmark to use nested folder

Co-authored-by: Lars Francke <git@lars-francke.de>

1 parent 8fb154d commit 99e728c

File tree

9 files changed: +138 -16 lines changed

.markdownlint.yaml

Lines changed: 5 additions & 0 deletions

```diff
@@ -18,3 +18,8 @@ MD013:
 MD024:
   # Only check sibling headings
   siblings_only: true
+
+# MD033/no-inline-html Inline HTML [Element: summary]
+MD033:
+  # Allow expandable boxes
+  allowed_elements: [details, summary]
```

CHANGELOG.md

Lines changed: 11 additions & 2 deletions

```diff
@@ -6,11 +6,20 @@ All notable changes to this project will be documented in this file.
 
 ### Changed
 
-- Bump okio to 1.17.6 to get rid of CVE-2023-3635 ([#46])
+- BREAKING: Only send a subset of the fields sufficient for most use-cases to OPA for performance reasons.
+  The old behavior of sending all fields can be restored by setting `hadoop.security.authorization.opa.extended-requests`
+  to `true` ([#49]).
 - Performance fixes ([#50])
-- Updates various dependencies and does a full spotless run. This will now require JDK 17 or later to build (required by later error-prone versions), the build target is still Java 11 [#51]
+- Updates various dependencies and does a full spotless run. This will now require JDK 17 or later to build
+  (required by later error-prone versions), the build target is still Java 11 [#51]
+- Bump okio to 1.17.6 to get rid of CVE-2023-3635 ([#46])
+
+### Fixed
+
+- Set path to `/` when the operation `contentSummary` is called on `/`. Previously path was set to `null` ([#49]).
 
 [#46]: https://github.com/stackabletech/hdfs-utils/pull/46
+[#49]: https://github.com/stackabletech/hdfs-utils/pull/49
 [#50]: https://github.com/stackabletech/hdfs-utils/pull/50
 [#51]: https://github.com/stackabletech/hdfs-utils/pull/51
```

README.md

Lines changed: 40 additions & 2 deletions

````diff
@@ -26,13 +26,51 @@ The Stackable HDFS already takes care of this, you don't need to do anything in
 
 - Set `dfs.namenode.inode.attributes.provider.class` in `hdfs-site.xml` to `tech.stackable.hadoop.StackableAuthorizer`
 - Set `hadoop.security.authorization.opa.policy.url` in `core-site.xml` to the HTTP endpoint of your OPA rego rule, e.g. `http://opa.default.svc.cluster.local:8081/v1/data/hdfs/allow`
+- The property `hadoop.security.authorization.opa.extended-requests` (defaults to `false`) controls whether all fields (`true`) should be sent to OPA or only a subset.
+  Sending all fields degrades performance, but allows for more advanced authorization.
 
 ### API
 
-For every action a request similar to the following is sent to OPA:
+By default, for every HDFS action a request similar to the following is sent to OPA:
+
+```json
+{
+  "input": {
+    "fsOwner": "nn",
+    "supergroup": "supergroup",
+    "callerUgi": {
+      "realUser": null,
+      "userName": "alice/test-hdfs-permissions.default.svc.cluster.local@CLUSTER.LOCAL",
+      "shortUserName": "alice",
+      "primaryGroup": "developers",
+      "groups": [
+        "developers"
+      ],
+      "authenticationMethod": "KERBEROS",
+      "realAuthenticationMethod": "KERBEROS"
+    },
+    "snapshotId": 2147483646,
+    "path": "/developers-ro/hosts._COPYING_",
+    "ancestorIndex": 1,
+    "doCheckOwner": false,
+    "ignoreEmptyDir": false,
+    "operationName": "getfileinfo",
+    "callerContext": {
+      "context": "CLI",
+      "signature": null
+    }
+  }
+}
+```
+
+The contained details should be sufficient for most use-cases.
+However, if you need access to all of the information provided by the `INodeAttributeProvider.AccessControlEnforcer` interface, you can instruct hdfs-utils to send all fields by setting `hadoop.security.authorization.opa.extended-requests` to `true`.
+Please note that this results in very big JSON objects being sent from HDFS to OPA, so keep an eye on performance.
+
+The following example shows an extended request with all available fields:
 
 <details>
-<summary>Example request</summary>
+<summary>Example extended request</summary>
 
 ```json
 {
````
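For experimenting with a policy outside of HDFS, a request with this shape can be replayed against OPA directly. Below is a minimal, hypothetical sketch using only the JDK's built-in `java.net.http` client; the endpoint URL is the example value from the README above, and the trimmed body reuses a few fields from the reduced request:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OpaRequestDemo {
  public static void main(String[] args) throws Exception {
    // Example endpoint from the README; adjust to your OPA deployment.
    URI opaUri = URI.create("http://opa.default.svc.cluster.local:8081/v1/data/hdfs/allow");

    // A reduced request body, trimmed to a few of the fields shown above.
    String body =
        "{\"input\": {\"fsOwner\": \"nn\", \"supergroup\": \"supergroup\","
            + " \"path\": \"/developers-ro/hosts._COPYING_\","
            + " \"operationName\": \"getfileinfo\"}}";

    HttpRequest request =
        HttpRequest.newBuilder()
            .uri(opaUri)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    // OPA answers with a JSON document such as {"result": true}.
    System.out.println(response.body());
  }
}
```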

src/main/java/tech/stackable/hadoop/OpaException.java

Lines changed: 1 addition & 2 deletions

```diff
@@ -2,7 +2,6 @@
 
 import static tech.stackable.hadoop.StackableGroupMapper.OPA_MAPPING_URL_PROP;
 
-import java.net.URI;
 import java.net.http.HttpResponse;
 
 public abstract class OpaException extends RuntimeException {
@@ -22,7 +21,7 @@ public UriMissing(String configuration) {
   }
 
   public static final class UriInvalid extends OpaException {
-    public UriInvalid(URI uri, Throwable cause) {
+    public UriInvalid(String uri, Throwable cause) {
      super(
          "Open Policy Agent URI is invalid (see configuration property \""
              + OPA_MAPPING_URL_PROP
```
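The signature change matters because `UriInvalid` is thrown exactly when `URI.create` fails, i.e. before any `URI` object exists; the callers changed below previously passed their still-unassigned `opaUri` field, so the error message contained `null` instead of the offending value. A small sketch of the failure mode (the invalid URL value is hypothetical):

```java
import java.net.URI;

public class UriInvalidDemo {
  public static void main(String[] args) {
    String opaPolicyUrl = "http://opa with spaces/"; // hypothetical invalid value
    URI opaUri = null;
    try {
      opaUri = URI.create(opaPolicyUrl); // throws IllegalArgumentException
    } catch (Exception e) {
      // Before this commit the exception message was built from opaUri, which
      // is still null at this point. Passing the raw string instead keeps the
      // offending value visible in the error message.
      System.out.println("Invalid OPA URI: " + opaUri + " vs. " + opaPolicyUrl);
    }
  }
}
```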
src/main/java/tech/stackable/hadoop/OpaReducedAllowQuery.java

Lines changed: 43 additions & 0 deletions

```diff
@@ -0,0 +1,43 @@
+package tech.stackable.hadoop;
+
+import org.apache.hadoop.hdfs.server.namenode.INodeAttributeProvider;
+
+public class OpaReducedAllowQuery {
+  public final OpaReducedAllowQueryInput input;
+
+  public OpaReducedAllowQuery(OpaReducedAllowQueryInput input) {
+    this.input = input;
+  }
+
+  /**
+   * Similar to {@link OpaAllowQuery.OpaAllowQueryInput}, but this class only contains a subset of
+   * fields, which should be sufficient for most use-cases while offering much better performance.
+   * See <a href="https://github.com/stackabletech/hdfs-utils/issues/48">this issue</a> for details.
+   */
+  public static class OpaReducedAllowQueryInput {
+    public String fsOwner;
+    public String supergroup;
+    // Wrapping the Hadoop UserGroupInformation of the caller
+    public OpaQueryUgi callerUgi;
+    public int snapshotId;
+    public String path;
+    public int ancestorIndex;
+    public boolean doCheckOwner;
+    public boolean ignoreEmptyDir;
+    public String operationName;
+    public org.apache.hadoop.ipc.CallerContext callerContext;
+
+    public OpaReducedAllowQueryInput(INodeAttributeProvider.AuthorizationContext context) {
+      this.fsOwner = context.getFsOwner();
+      this.supergroup = context.getSupergroup();
+      this.callerUgi = new OpaQueryUgi(context.getCallerUgi());
+      this.snapshotId = context.getSnapshotId();
+      this.path = context.getPath();
+      this.ancestorIndex = context.getAncestorIndex();
+      this.doCheckOwner = context.isDoCheckOwner();
+      this.ignoreEmptyDir = context.isIgnoreEmptyDir();
+      this.operationName = context.getOperationName();
+      this.callerContext = context.getCallerContext();
+    }
+  }
+}
```

src/main/java/tech/stackable/hadoop/StackableAccessControlEnforcer.java

Lines changed: 30 additions & 4 deletions

```diff
@@ -52,11 +52,14 @@ public class StackableAccessControlEnforcer
   private static final Logger LOG = LoggerFactory.getLogger(StackableAccessControlEnforcer.class);
 
   public static final String OPA_POLICY_URL_PROP = "hadoop.security.authorization.opa.policy.url";
+  public static final String EXTENDED_REQUESTS_PROP =
+      "hadoop.security.authorization.opa.extended-requests";
 
   private static final HttpClient HTTP_CLIENT =
       HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(30)).build();
   private final ObjectMapper json;
-  private URI opaUri;
+  private final URI opaUri;
+  private final boolean extendedRequests;
 
   public StackableAccessControlEnforcer() {
     LOG.debug("Starting StackableAccessControlEnforcer");
@@ -72,9 +75,11 @@ public StackableAccessControlEnforcer() {
     try {
       opaUri = URI.create(opaPolicyUrl);
     } catch (Exception e) {
-      throw new OpaException.UriInvalid(opaUri, e);
+      throw new OpaException.UriInvalid(opaPolicyUrl, e);
     }
 
+    extendedRequests = configuration.getBoolean(EXTENDED_REQUESTS_PROP, false);
+
     json =
         new ObjectMapper()
            // OPA server can send other fields, such as `decision_id` when enabling decision logs
@@ -98,7 +103,14 @@ public StackableAccessControlEnforcer() {
            // recursion (StackOverflowError)
            .addMixIn(DatanodeDescriptor.class, DatanodeDescriptorMixin.class);
 
-    LOG.debug("Started HdfsOpaAccessControlEnforcer");
+    StringBuilder logStartupStatement = new StringBuilder("Started StackableAccessControlEnforcer");
+    if (this.extendedRequests) {
+      logStartupStatement.append(" sending extended requests");
+    } else {
+      logStartupStatement.append(" sending reduced requests");
+    }
+    logStartupStatement.append(" to OPA url ").append(this.opaUri);
+    LOG.debug(logStartupStatement.toString());
   }
 
   private static class OpaQueryResult {
@@ -158,7 +170,21 @@ public void checkPermission(
   @Override
   public void checkPermissionWithContext(INodeAttributeProvider.AuthorizationContext authzContext)
       throws AccessControlException {
-    OpaAllowQuery query = new OpaAllowQuery(new OpaAllowQuery.OpaAllowQueryInput(authzContext));
+    // When executing "hdfs dfs -du /" the path is set to null. This does not worsen security, as
+    // "/" is the highest level of access that a user can have.
+    if (authzContext.getOperationName().equals("contentSummary")
+        && authzContext.getPath() == null) {
+      authzContext.setPath("/");
+    }
+
+    Object query;
+    if (this.extendedRequests) {
+      query = new OpaAllowQuery(new OpaAllowQuery.OpaAllowQueryInput(authzContext));
+    } else {
+      query =
+          new OpaReducedAllowQuery(
+              new OpaReducedAllowQuery.OpaReducedAllowQueryInput(authzContext));
+    }
 
     String body;
     try {
```
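The hunk ends just before the chosen query object is serialized. Both query classes expose the same top-level `input` field (matching the two request shapes documented in the README), so the OPA policy does not need to know which variant was sent. A hedged sketch of what plausibly follows, extracted into a standalone helper; the real method's error handling and response parsing may differ:

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpRequest;

public class QuerySerializationSketch {
  // Hedged sketch (not part of this diff): either query variant is serialized
  // with Jackson and POSTed as JSON to the configured OPA URI.
  static HttpRequest buildOpaRequest(ObjectMapper json, URI opaUri, Object query) {
    String body;
    try {
      // Produces the same {"input": ...} shape for both variants.
      body = json.writeValueAsString(query);
    } catch (JsonProcessingException e) {
      throw new RuntimeException("Failed to serialize OPA query", e);
    }
    return HttpRequest.newBuilder()
        .uri(opaUri)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }
}
```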

src/main/java/tech/stackable/hadoop/StackableGroupMapper.java

Lines changed: 2 additions & 2 deletions

```diff
@@ -23,7 +23,7 @@ public class StackableGroupMapper implements GroupMappingServiceProvider {
   private static final HttpClient HTTP_CLIENT =
       HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(30)).build();
   private final ObjectMapper json;
-  private URI opaUri;
+  private final URI opaUri;
 
   public StackableGroupMapper() {
     // Guaranteed to be only called once (Effective Java: Item 3)
@@ -37,7 +37,7 @@ public StackableGroupMapper() {
     try {
       this.opaUri = URI.create(opaMappingUrl);
     } catch (Exception e) {
-      throw new OpaException.UriInvalid(opaUri, e);
+      throw new OpaException.UriInvalid(opaMappingUrl, e);
     }
 
     LOG.debug("OPA mapping URL: {}", opaMappingUrl);
```

test/stack/20-hdfs.yaml

Lines changed: 3 additions & 0 deletions

```diff
@@ -51,6 +51,9 @@ spec:
             level: INFO
           tech.stackable.hadoop:
             level: DEBUG
+    # configOverrides:
+    #   hdfs-site.xml:
+    #     hadoop.security.authorization.opa.extended-requests: "true"
     roleGroups:
       default:
         replicas: 2
```

test/stack/31-benchmark-shell.yaml

Lines changed: 3 additions & 4 deletions

```diff
@@ -27,14 +27,13 @@ spec:
 
           log_in admin
 
-          bin/hdfs dfs -mkdir -p /bench
-          bin/hdfs dfs -ls /bench
+          bin/hdfs dfs -mkdir -p /bench/deep/path/with/many/sub/sub/dirs/
 
-          # for i in $(seq 0 10); do echo "Creating $i" && bin/hdfs dfs -put -f /etc/hosts /bench/$i; done
+          # for i in $(seq 0 100); do echo "Creating $i" && bin/hdfs dfs -put -f /etc/hosts /bench/deep/path/with/many/sub/sub/dirs/$i; done
 
           # Watch out for the exact command you are using! (e.g. don't use "du -h /"). Check the NameNode logs to
           # make sure you actually produce enough OPA calls.
-          # time bin/hdfs dfs -du -h /bench
+          # time bin/hdfs dfs -du -h /bench/deep/path/with/many/sub/sub/dirs
 
           # So that you can run the benchmark manually
           sleep infinity
```
