[#2234] feat(core): Add the support of User entity #2481

Merged
merged 62 commits into from
Mar 29, 2024

@qqqttt123 qqqttt123 commented Mar 8, 2024

What changes were proposed in this pull request?

Add the UserEntity. Gravitino doesn't manage users itself; it only sets up the relationship between the metalake and the user, so the User entity carries only a few fields. Further user information should be managed by an external user system.

Why are the changes needed?

Fix: #2234

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add UT.

@qqqttt123 qqqttt123 self-assigned this Mar 8, 2024
@Override
public Namespace namespace() {
  return namespace;
}
Contributor

Why do we need namespace for User?

Contributor Author

User is under a metalake. So we should have a namespace.

Contributor

It's odd to use namespace here; we don't have a namespace concept for "User".

You can use metalake directly, or remove this interface, since I don't see any usage of namespace in the current code.

uint64 id = 1;
string name = 2;
map<string, string> properties = 3;
AuditInfo audit_info = 4;
Contributor

Do you need to add some more basic information for user?

Contributor Author

@qqqttt123 qqqttt123 Mar 11, 2024

I hesitate to add them. If some properties turn out to be very important, we can promote them to fields later. To begin with, we can put them into the properties.
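The trade-off discussed in this thread (few fixed fields, everything else in a properties map) can be sketched as follows. This is only an illustration under assumptions, not the PR's UserEntity: the class name `SketchUser`, the accessor `property`, and the `email` key are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: a user with few fixed fields and an open-ended properties map. */
final class SketchUser {
  private final long id;
  private final String name;
  private final Map<String, String> properties;

  SketchUser(long id, String name, Map<String, String> properties) {
    this.id = id;
    this.name = name;
    // Defensive copy so later changes to the caller's map do not leak in.
    this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
  }

  long id() {
    return id;
  }

  String name() {
    return name;
  }

  Map<String, String> properties() {
    return properties;
  }

  /** Looks up an optional attribute that has not been promoted to a field. */
  Optional<String> property(String key) {
    return Optional.ofNullable(properties.get(key));
  }
}
```

If an attribute such as an email address later proves important, it could become a dedicated field while callers of `property("email")` are migrated gradually.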


/** A class representing a user metadata entity in Gravitino. */
@ToString
public class MetalakeUser implements User, Entity, Auditable, HasIdentifier {
Contributor Author

I chose not to name this CombinedUser because the underlying systems may have different users, and we can't map them one to one.
I chose not to name this BaseUser because BaseUser looks like an abstract class, although we already have a class BaseMetalake.

Contributor

I don't like this name. A user is just a user; MetalakeUser makes me think that this user is only for the metalake.

Contributor Author

How about ConcreteUser, UserImpl, or BasicUser?

Contributor Author

How about ManagedUser?

@qqqttt123 qqqttt123 changed the title [#2234] core: Add the UserEntity [#2234] core: Add the UserEntity and UserOperation Mar 14, 2024
@qqqttt123 qqqttt123 changed the title [#2234] core: Add the UserEntity and UserOperation [#2234][#2238] core: Add the UserEntity and UserOperation Mar 14, 2024
@qqqttt123 qqqttt123 changed the title [#2234][#2238] core: Add the UserEntity and UserOperation [#2234][#2238] core: Add the support of User Mar 14, 2024
@qqqttt123 qqqttt123 closed this Mar 14, 2024
@qqqttt123 qqqttt123 reopened this Mar 14, 2024
@qqqttt123 qqqttt123 changed the title [#2234][#2238] core: Add the support of User [#2234][#2238] feat(core): Add the support of User Mar 15, 2024
@qqqttt123 qqqttt123 force-pushed the ISSUE-2234 branch 2 times, most recently from 1e79fd7 to 2808f2d Compare March 15, 2024 03:39
@qqqttt123 qqqttt123 changed the title [#2234][#2238] feat(core): Add the support of User [#2234] feat(core): Add the support of User entity Mar 15, 2024
public static final Field NAME =
    Field.required("name", String.class, "The name of the user entity.");
public static final Field PROPERTIES =
    Field.optional("properties", Map.class, "The properties of the user entity.");
Contributor

@jerryshao jerryshao Mar 15, 2024

You define properties here, but the User interface has no corresponding method. How would users access them?

Contributor Author

I have defined the method properties() at line 24 of User.java.

@@ -390,6 +407,7 @@ public boolean delete(NameIdentifier ident, EntityType entityType, boolean casca
.build());
}

deleteUserEntitiesIfNecessary(ident, entityType);
return transactionalKvBackend.delete(dataKey);
});
}
Contributor

@yuqi1129 please help to check this part. I don't think the change is enough to support User.

Contributor

@yuqi1129 yuqi1129 Mar 18, 2024

@qqqttt123

  1. If a catalog and a user share the same name, Gravitino will hit an error, so we need to take action to prevent this from happening.
  2. You may also need to implement this for the relational entity store.

Contributor Author

  1. How can we prevent it?
  2. I'd prefer to do that in another PR.

Contributor

How to prevent it?

We need to consider it anyway. Let me think about it too.

Contributor

@yuqi1129 yuqi1129 Mar 22, 2024

@qqqttt123
We'd better add some comments about this. If a user and a catalog have the same name, they will share the same ID in the id-name mapping. If we drop either one (the user or the catalog), its ID may also be removed (though currently we do not discard the ID of a name when dropping an entity), so I suggest adding a comment about it.

You can add the following comments here:

// Users and catalogs with the same name share the same ID, so we must be careful NOT to drop
// the ID when removing a user or a catalog. For more details, please refer to `KvNameMappingService`.

https://github.com/datastrato/gravitino/pull/2481/files#:~:text=USER%2C-,new%20String,-%5B%5D%20%7BUSER.

Contributor Author

In my view,

  1. Metalake is a tenant concept for us. We have users, groups, and roles under the metalake, and they are isolated between metalakes.

Contributor

What I want is for the semantics to be correct across the different operations, no matter what underlying mapping you use.

Besides, I don't think the storage hierarchy has to map directly onto the semantic layer. A user, for example, is a concept under the metalake, but we can store users in a system catalog/table; we don't have to store them directly under the metalake.

Contributor

Another potential issue is that renaming may not go as planned. For instance, suppose catalog 'A' and user 'A' share the same ID 1. When we rename catalog A to B, the mapping becomes B -> 1, and user A can no longer be found.

This problem also exists for other entities and should be fixed. For example, after renaming table A to B, fileset A cannot be found either.
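The rename problem described above can be reproduced with a toy model. This is a sketch under assumptions, not Gravitino's `KvNameMappingService`: the class `SharedNameMappingSketch`, its methods, and the `"<type>/<id>"` key layout are all hypothetical and exist only to show why one name-to-id mapping shared by several entity types is fragile.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model (not Gravitino code) of one name-to-id mapping shared by several entity types. */
final class SharedNameMappingSketch {
  // A single mapping from name to id, shared by catalogs and users alike.
  private final Map<String, Long> nameToId = new HashMap<>();
  // Entities are stored under the hypothetical key "<type>/<id>".
  private final Map<String, String> entities = new HashMap<>();
  private long nextId = 1;

  void create(String type, String name) {
    // An existing name reuses its id, so catalog "A" and user "A" get the same id.
    long id = nameToId.computeIfAbsent(name, n -> nextId++);
    entities.put(type + "/" + id, name);
  }

  Optional<String> lookup(String type, String name) {
    Long id = nameToId.get(name);
    if (id == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(entities.get(type + "/" + id));
  }

  /** Renames by rebinding the shared name entry, which silently affects every other type. */
  void rename(String type, String oldName, String newName) {
    Long id = nameToId.remove(oldName);
    if (id == null) {
      throw new IllegalArgumentException("Unknown name: " + oldName);
    }
    nameToId.put(newName, id);
    entities.put(type + "/" + id, newName);
  }
}
```

In this model, creating catalog "A" and user "A" and then renaming the catalog to "B" makes `lookup("user", "A")` return empty, even though the user entity was never touched.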

Contributor

You should never assume that there's only one entity type under the specific namespace.

Contributor

Do we need to wait for @yuqi1129's fix, or can we merge this beforehand?

import org.slf4j.LoggerFactory;

/* AccessControlManager is used for manage users, roles, grant information, this class is
* an important class for tenant management. */
Contributor

This does not appear to be standard Javadoc format.

@apache apache deleted a comment from jerqi Mar 19, 2024

qqqttt123 commented Mar 20, 2024

In Snowflake, the user model is as follows.

CREATE [ OR REPLACE ] USER [ IF NOT EXISTS ] <name>
  [ objectProperties ]
  [ objectParams ]
  [ sessionParams ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]

objectProperties ::=
  PASSWORD = '<string>'
  LOGIN_NAME = <string>
  DISPLAY_NAME = <string>
  FIRST_NAME = <string>
  MIDDLE_NAME = <string>
  LAST_NAME = <string>
  EMAIL = <string>
  MUST_CHANGE_PASSWORD = TRUE | FALSE
  DISABLED = TRUE | FALSE
  DAYS_TO_EXPIRY = <integer>
  MINS_TO_UNLOCK = <integer>
  DEFAULT_WAREHOUSE = <string>
  DEFAULT_NAMESPACE = <string>
  DEFAULT_ROLE = <string>
  DEFAULT_SECONDARY_ROLES = ( 'ALL' )
  MINS_TO_BYPASS_MFA = <integer>
  RSA_PUBLIC_KEY = <string>
  RSA_PUBLIC_KEY_2 = <string>
  COMMENT = '<string_literal>'


sessionParams ::=
  ABORT_DETACHED_QUERY = TRUE | FALSE
  AUTOCOMMIT = TRUE | FALSE
  BINARY_INPUT_FORMAT = <string>
  BINARY_OUTPUT_FORMAT = <string>
  DATE_INPUT_FORMAT = <string>
  DATE_OUTPUT_FORMAT = <string>
  ERROR_ON_NONDETERMINISTIC_MERGE = TRUE | FALSE
  ERROR_ON_NONDETERMINISTIC_UPDATE = TRUE | FALSE
  JSON_INDENT = <num>
  LOCK_TIMEOUT = <num>
  QUERY_TAG = <string>
  ROWS_PER_RESULTSET = <num>
  SIMULATED_DATA_SHARING_CONSUMER = <string>
  STATEMENT_TIMEOUT_IN_SECONDS = <num>
  STRICT_JSON_OUTPUT = TRUE | FALSE
  TIMESTAMP_DAY_IS_ALWAYS_24H = TRUE | FALSE
  TIMESTAMP_INPUT_FORMAT = <string>
  TIMESTAMP_LTZ_OUTPUT_FORMAT = <string>
  TIMESTAMP_NTZ_OUTPUT_FORMAT = <string>
  TIMESTAMP_OUTPUT_FORMAT = <string>
  TIMESTAMP_TYPE_MAPPING = <string>
  TIMESTAMP_TZ_OUTPUT_FORMAT = <string>
  TIMEZONE = <string>
  TIME_INPUT_FORMAT = <string>
  TIME_OUTPUT_FORMAT = <string>
  TRANSACTION_DEFAULT_ISOLATION_LEVEL = <string>
  TWO_DIGIT_CENTURY_START = <num>
  UNSUPPORTED_DDL_ACTION = <string>
  USE_CACHED_RESULT = TRUE | FALSE
  WEEK_OF_YEAR_POLICY = <num>
  WEEK_START = <num>

We can ignore the session parameters, since Gravitino has no concept of a session. Some of the object properties may be useful, such as:

displayName
email
defaultRole (if a user has multiple roles with the privilege to create an item, the item is created using the default role)
comment


qqqttt123 commented Mar 20, 2024

From Databricks (https://docs.databricks.com/api/workspace/users/create), the user model is:

{
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:User"
  ],
  "id": "string",
  "userName": "user@example.com",
  "emails": [
    {
      "$ref": "string",
      "value": "string",
      "display": "string",
      "primary": true,
      "type": "string"
    }
  ],
  "name": {
    "givenName": "string",
    "familyName": "string"
  },
  "displayName": "string",
  "groups": [
    {
      "$ref": "string",
      "value": "string",
      "display": "string",
      "primary": true,
      "type": "string"
    }
  ],
  "roles": [
    {
      "$ref": "string",
      "value": "string",
      "display": "string",
      "primary": true,
      "type": "string"
    }
  ],
  "entitlements": [
    {
      "$ref": "string",
      "value": "string",
      "display": "string",
      "primary": true,
      "type": "string"
    }
  ],
  "externalId": "string",
  "active": true
}

active seems to be an important property; Snowflake has a similar property, DISABLED.
Some of the properties resemble Snowflake's. We can add these properties, too.
The schemas property is worth noting. It is related to SCIM, but I think it may be an advanced property that we can consider later.

@qqqttt123
Contributor Author

From Ranger, https://ranger.apache.org/apidocs/resource_XUserREST.html#resource_XUserREST_createXUser_POST

{
  "name" : "...",
  "firstName" : "...",
  "lastName" : "...",
  "emailAddress" : "...",
  "password" : "...",
  "description" : "...",
  "credStoreId" : 12345,
  "groupIdList" : [ 12345, 12345 ],
  "myClassType" : 12345,
  "status" : 12345,
  "isVisible" : 12345,
  "userSource" : 12345,
  "userRoleList" : [ "...", "..." ],
  "groupNameList" : [ "...", "..." ],
  "otherAttributes" : "...",
  "syncSource" : "...",
  "id" : 12345,
  "createDate" : 12345,
  "updateDate" : 12345,
  "owner" : "...",
  "updatedBy" : "..."
}

Some properties are similar to Snowflake's. For example, description is similar to comment.


qqqttt123 commented Mar 20, 2024

Based on the surveys above, I think we can add the following properties:

firstName
lastName
displayName
emailAddress
comment
active
defaultRole
groups
roles

roles and groups may have consistency issues when we drop a user because we lack transactions, but that should be acceptable.
For example, we first remove the user from a group and then delete the user.
If the service goes down after the user is removed from the group, the user is not deleted, but the group no longer contains the user.
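The failure scenario above can be sketched with a toy model. This is an illustration under assumptions, not Gravitino code: the class `NonAtomicUserRemovalSketch` and the `failAfterFirstStep` flag are hypothetical, and two in-memory sets stand in for the stored users and group memberships.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (not Gravitino code) of a two-step user removal without a transaction. */
final class NonAtomicUserRemovalSketch {
  final Set<String> users = new HashSet<>();
  final Set<String> groupMembers = new HashSet<>();

  /**
   * Removes the user from the group, then removes the user itself. The
   * failAfterFirstStep flag simulates the service going down between the two steps.
   */
  void removeUser(String name, boolean failAfterFirstStep) {
    groupMembers.remove(name); // Step 1: drop the group membership.
    if (failAfterFirstStep) {
      throw new IllegalStateException("Simulated crash between the two steps");
    }
    users.remove(name); // Step 2: drop the user.
  }
}
```

After a simulated crash the user still exists but is no longer a group member. Because both steps are idempotent in this model, simply retrying the removal brings the state back to consistency, which is one reason such an inconsistency can be tolerable.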

@qqqttt123 qqqttt123 closed this Mar 26, 2024
@qqqttt123 qqqttt123 reopened this Mar 26, 2024
@qqqttt123
Contributor Author

CI isn't stable. It's not caused by this pull request.

import com.google.errorprone.annotations.FormatMethod;
import com.google.errorprone.annotations.FormatString;

/** An exception thrown when a resource already exists. */
Contributor

/** An exception thrown when a user already exists. */

Contributor Author

Done.

* @throws UserAlreadyExistsException If a User with the same identifier already exists.
* @throws RuntimeException If adding the User encounters storage issues.
*/
public User addUser(String metalake, String name) throws UserAlreadyExistsException {
Contributor

I think we can have a UserManager under this AccessControlManager, so that we don't have to put everything in the AccessControlManager.

Contributor Author

Done.
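The delegation suggested in this thread could look roughly like the sketch below. It is a minimal illustration under assumptions, not the merged code: the class names carry a `Sketch` suffix to mark them as hypothetical, an in-memory map stands in for the entity store, and `UserExistsSketchException` stands in for the PR's `UserAlreadyExistsException`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the PR's UserAlreadyExistsException. */
final class UserExistsSketchException extends RuntimeException {
  UserExistsSketchException(String message) {
    super(message);
  }
}

/** Owns all user operations; an in-memory map stands in for the entity store. */
final class UserManagerSketch {
  private final Map<String, String> usersByKey = new HashMap<>();

  String addUser(String metalake, String name) {
    String key = metalake + "/" + name;
    // putIfAbsent returns the previous value, so non-null means the user already exists.
    if (usersByKey.putIfAbsent(key, name) != null) {
      throw new UserExistsSketchException(
          "User " + name + " already exists in metalake " + metalake);
    }
    return name;
  }
}

/** Facade that only delegates, so user logic stays out of the access-control class. */
final class AccessControlManagerSketch {
  private final UserManagerSketch userManager = new UserManagerSketch();

  String addUser(String metalake, String name) {
    return userManager.addUser(metalake, name);
  }
}
```

With this split, later managers for roles or groups can be added next to the user manager without growing the facade's own logic.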

Comment on lines 63 to 67
.withNamespace(
    Namespace.of(
        metalake,
        AuthorizationConstants.SYSTEM_CATALOG_RESERVED_NAME,
        AuthorizationConstants.USER_SCHEMA_NAME))
Contributor

It would be better to define this reserved Namespace in the entity (the user entity), not here in the auth constants.

Contributor Author

Done.

Contributor Author

I put SYSTEM_CATALOG_RESERVED_NAME into CatalogEntity.
I put USER_SCHEMA_NAME into UserEntity.

try {
  return store.get(ofUser(metalake, user), Entity.EntityType.USER, UserEntity.class);
} catch (NoSuchEntityException e) {
  LOG.warn("user {} does not exist in the metalake {}", user, metalake, e);
Contributor

Capitalize the first letter.

Contributor Author

Done.

@@ -268,6 +269,10 @@ public Catalog createCatalog(
Map<String, String> properties)
throws NoSuchMetalakeException, CatalogAlreadyExistsException {

if (AuthorizationConstants.SYSTEM_CATALOG_RESERVED_NAME.equals(ident.name())) {
Contributor

I think this should also be added to the catalog's common rules? @mchades
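A reserved-name guard of the kind shown in the hunk above could be factored into a shared rule roughly as follows. This is a sketch under assumptions: the class `ReservedCatalogNameSketch`, the method `checkCatalogName`, and the value `"system"` are hypothetical, since the thread does not show the actual value of the reserved-name constant.

```java
/** Hypothetical sketch of rejecting a reserved catalog name before creation. */
final class ReservedCatalogNameSketch {
  // Assumed value for illustration only; the real constant's value is not shown in this thread.
  static final String SYSTEM_CATALOG_RESERVED_NAME = "system";

  /** Throws if the requested catalog name collides with the reserved system catalog name. */
  static void checkCatalogName(String name) {
    if (SYSTEM_CATALOG_RESERVED_NAME.equals(name)) {
      throw new IllegalArgumentException(
          "Can't create a catalog with the reserved name '" + name + "'");
    }
  }
}
```

Keeping the check in one place means create and rename paths can both call it, instead of each repeating the comparison.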

.startInclusive(true)
.end(Bytes.increment(Bytes.wrap(prefix)).get())
.build());
}
Contributor

What is the purpose of this method? I don't follow it.

Contributor Author

If we delete a metalake, we should remove the users under the metalake.
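The range delete in the hunk above (start at the prefix, end at the incremented prefix) can be illustrated with a sorted in-memory map. This is a toy model under assumptions, not Gravitino's KV backend: the class `PrefixDeleteSketch`, its methods, and the string key layout are hypothetical, and a `TreeMap` stands in for the key-value store.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not Gravitino code) of deleting every key that starts with a given prefix. */
final class PrefixDeleteSketch {
  /** Returns a string that sorts after every string starting with the prefix. */
  static String exclusiveEnd(String prefix) {
    if (prefix.isEmpty()) {
      throw new IllegalArgumentException("Prefix must not be empty");
    }
    int last = prefix.length() - 1;
    // Assumes the last character is not Character.MAX_VALUE, which would overflow.
    return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
  }

  /** Removes all entries in [prefix, exclusiveEnd(prefix)) and returns how many were removed. */
  static int deleteByPrefix(TreeMap<String, String> store, String prefix) {
    Map<String, String> range = store.subMap(prefix, true, exclusiveEnd(prefix), false);
    int removed = range.size();
    range.clear(); // The sub-map is a view, so clearing it removes the entries from the store.
    return removed;
  }
}
```

Including the separator in the prefix matters: deleting with the prefix "metalake1/" leaves keys under "metalake10/" untouched, whereas the prefix "metalake1" would remove both.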

@qqqttt123
Copy link
Contributor Author

I resolved the conflict with the main branch.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** UserManager is used for manage users */
Contributor

This is meaningless; please add a more meaningful description.

Contributor Author

Added.

private final IdGenerator idGenerator;

/**
* Constructs a User instance.
Contributor

What is "User instance"?

Contributor Author

Done. Removed the comment.

*
* @param store The EntityStore to use for managing access control.
* @param idGenerator The IdGenerator to use for generating identifiers.
*/
Contributor

I think we can remove some of the javadoc here, it's not so useful.

Contributor Author

Done. Removed the comment.

@jerryshao jerryshao merged commit 6641d70 into apache:main Mar 29, 2024
14 checks passed
Successfully merging this pull request may close these issues.

[Subtask] Add a new entity for user
3 participants