Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use intermediate table to compute user permissions #9787

Closed
wants to merge 40 commits into from

Conversation

guillep2k
Copy link
Member

@guillep2k guillep2k commented Jan 15, 2020

This PR will implement a new table (user_repo_unit) for easy check of permissions on listing and actions as described in #9613. This table would make the access table obsolete.

The main goal is to simplify and speed up permission checking, which is performed significantly more often than changing any permissions.

I did my best in order to document via comments for easier analysis and future maintenance. I've also tested most of the statements in all the supported databases.

Work in progress

  • Create and populate permission tables from different perspectives (user, repo, team, etc.)
  • Modify all places where permissions are updated to ensure user_repo_unit remains in sync.
  • Create/modify tests as required.
  • Modify most of access.go functions to use user_repo_unit as the main source of information.
  • Gradually modify the site queries (e.g. Home, Search, mail notifications, etc.) to use user_repo_unit (some of the most used in this PR, others can be gradually modified in future PRs).
  • Make a proper migration step (currently it's more a testing/checking operation).

Will close #9613.

models/user_repo_unit.go Outdated Show resolved Hide resolved
// This list may be updated as needed. Requisites are:
// - New units need to abide by the same rules as the others
// - Units from all repos must be rebuilt after this change
supportedUnits = []UnitType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this list for? Why only these units?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list is for choosing what kinds of units we will actually be checking using this table (e.g. in listings, link generation, and such). There are some units that don't make much sense caching, like UnitTypeExternalTracker, perhaps? Or surely UnitTypeReleases, that can't be disabled in a user by user basis. Since I didn't get to the part where I'm actually using this information, I still don't know which units makes sense to include, but I'd like to avoid processing more units than necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with this change you would still keep the Access table?
If you included all units you can use it in getUserRepoPermission, you don't think it is worth it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with this change you would still keep the Access table?
If you included all units you can use it in getUserRepoPermission, you don't think it is worth it?

Yes, this will end as a replacement of the access table. And yes, all units we expect to use with getUserRepoPermission should go in here. The thing is (AFAIU) that some units exists for repositories to have, not for users. From the back of my head I believe we can't deny UnitTypeReleases to a particular user or group, so that unit I'd leave out. But including one unit is only a matter of adding it to the list. If we were to include all units, we'd just leave userRepoUnitSupported empty.

I'm just trying to optimize storage and SQL operations, but if we think that the performance hit won't be that big of a deal, we can certainly enable them all. Those are easy changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least team setting page allow specifying access to Release unit.

log.Trace("RebuildUserUnits: rebuilding permissions for repository %d on batch %d", repo.ID, batchID)

// Make sure we start from scratch; we intend to recreate all pairs
_, err = e.Delete(&UserRepoUnit{RepoID: repo.ID})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a concurrency issue. It should only be done in the end of the batch while copying from the work table in same session?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about the concurrency problem. Moving the DELETE down until the last moment will certainly help. However, I'm thinking of implementing some locking mechanism as well. Since we're not enforcing TRANSACTION MODE = SERIALIZED (or whatever name this mode has in every other database), source information can change in the middle of our calculations (this is true for other processes in Gitea as well). I'm still evaluating options for that (a queue, an auxiliary table, etc.).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use one db session and commit in the end, would that suffice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not according to my tests. e.g. in PostgreSQL:

Initial state of table a

id value
1 'hello'
2 'bye bye'
  • Sesion 1: BEGIN TRANSACTION;
  • Sesion 1: SELECT value FROM a WHERE ID = 1; gets 'hello'.
    • Sesion 2: BEGIN TRANSACTION;
    • Sesion 2: UPDATE a SET value = 'good bye' WHERE id = 2;
    • Sesion 2: COMMIT;
  • Sesion 1: SELECT value FROM a WHERE ID = 2; gets 'good bye', expected 'bye bye'.

I mean, the problem is that there will potentially be certainly be several sessions updating data on the instance (a rare case, but two admins might be creating teams for the same organization at the same time).

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 15, 2020
@guillep2k
Copy link
Member Author

Updated so it would (hopefully) passed the tests.

@lunny
Copy link
Member

lunny commented Jan 16, 2020

Since all permissions check are now from getUserRepoPermission, isn't it enough to just change this function?

And I would like we could add a guest user which userid = 0 to store this table so that we could public some units on private repositories.

@guillep2k
Copy link
Member Author

Since all permissions check are now from getUserRepoPermission, isn't it enough to just change this function?

@lunny this change is most useful for listing pages (e.g. Home, search issues, explore, cross-repository references, cross-reposistory dependencies, etc.) where we need to constraint the query to repositories the user has access in a sure and efficient way. This table allows to use a simpler join for those queries.

And I would like we could add a guest user which userid = 0 to store this table so that we could public some units on private repositories.

Yes, this does that, but I've set two different cases apart:

  • The guest user is represented with -2 in the table (can change it to 0, though).
  • Generic users (authenticated, but with no explicit permissions) are represented with -1 in the table. This is to avoid a full product users x repos that may yield too many records.

Restricted users (recently merged) are supported as well.

@guillep2k guillep2k mentioned this pull request Jan 17, 2020
7 tasks
@stale
Copy link

stale bot commented Mar 17, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.

@stale stale bot added the issue/stale label Mar 17, 2020
@stale
Copy link

stale bot commented May 17, 2020

This pull request has been automatically closed because of inactivity. You can re-open it if needed.

@stale stale bot closed this May 17, 2020
@6543
Copy link
Member

6543 commented May 17, 2020

should we realy close this?

@guillep2k
Copy link
Member Author

guillep2k commented May 19, 2020

Reopening to bring the discussion back for 1.13, but this PR is still WIP and very much outdated.

@guillep2k guillep2k reopened this May 19, 2020
@stale stale bot removed issue/stale labels May 19, 2020
@zeripath
Copy link
Contributor

we may not actually completely need an intermediate table. A table that is effectively a superset of:

  • access
  • org_user
  • team_user

may simply be enough. That is the columns would be:

  • user_id
  • repo_id
  • owner_id - Equivalent of org_user.org_id, `team_user.
  • team_id

We then rationalise collaborator permissions into a collaborator team etc.

The locking situation is no worse. It's still a bit of a mess to work with it's better than we currently have.

@guillep2k
Copy link
Member Author

@zeripath the proposed table kind of is a superset of those tables. My goal was to get a table that allows for 100% permissions matching between user, repository and unit (type of access). See this function to get what I mean:

gitea/models/repo_list.go

Lines 401 to 446 in 3bcfbc5

// accessibleRepositoryConditionUnit takes a user and returns a condition for checking
// if a repository is accessible for a given unit type
func accessibleRepositoryConditionUnit(user *User, unitType UnitType) builder.Cond {
if user == nil || user.ID == 0 {
// Anonymous users (i.e. not logged in) can access whatever is public.
// Systems with restricted users must disable anonymous access.
// FIXME: GAP: Remove log.Trace()
log.Trace("accessibleRepositoryConditionUnit(UserRepoUnitAnyUser, %d)", unitType)
return builder.In("`repository`.id", builder.
Select("repo_id").
From("user_repo_unit").
Where(builder.And(
builder.Eq{"user_id": UserRepoUnitAnyUser},
builder.Eq{"unit_type": unitType})))
}
if user.IsRestricted {
// Restricted users can only access whatever they've been explicitly allowed to.
// FIXME: GAP: Remove log.Trace()
log.Trace("accessibleRepositoryCondition(restricted user:%s(%d))", user.LowerName, user.ID)
return builder.In("`repository`.id", builder.
Select("repo_id").
From("user_repo_unit").
Where(builder.And(
builder.Eq{"user_id": user.ID},
builder.Eq{"unit_type": unitType})))
}
if user.IsAdmin {
// UserRepoUnitLoggedInUser is not required for site administrators
// FIXME: GAP: Remove log.Trace()
log.Trace("accessibleRepositoryConditionUnit(admin user:%s(%d), %d)", user.LowerName, user.ID, unitType)
return builder.In("`repository`.id", builder.Select("repo_id").From("user_repo_unit").
Where(builder.And(
builder.In("user_id", []int64{user.ID, UserRepoUnitAdminUser}),
builder.Eq{"unit_type": unitType})))
}
// Know users can access whatever is public or have been explicitly allowed to.
// FIXME: GAP: Remove log.Trace()
log.Trace("accessibleRepositoryConditionUnit(user:%s(%d), %d)", user.LowerName, user.ID, unitType)
return builder.In("`repository`.id", builder.Select("repo_id").From("user_repo_unit").
Where(builder.And(
builder.In("user_id", []int64{user.ID, UserRepoUnitLoggedInUser}),
builder.Eq{"unit_type": unitType})))
}

The goal is to eliminate a lot of guessing in the code about what should and shouldn't be allowed so we can maintain a single file for calculating what can and can't be done by any user and thus prevent possible bugs regarding permissions.

Although these changes allow for an important improvement on listing queries, it doesn't strictly translates to all possible listing optimizations (e.g. one may want to list repositories that are private or not private, or semi-public, or from org private but public, etc.); those other attributes I believe belong to a different set of data (e.g. an additional column on the repository table, as mentioned in Discord). IMHO we should maintaining permission optimization separated from query optimization in order to avoid painting ourselves into a corner. 😄

@stale
Copy link

stale bot commented Jul 19, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.

@stale stale bot added the issue/stale label Jul 19, 2020
@lunny lunny added the issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented label Jul 19, 2020
@stale stale bot removed the issue/stale label Jul 19, 2020
@yardenshoham
Copy link
Member

This is in progress for a while now, I'm closing it. Please reopen when it's ready for review.

@go-gitea go-gitea locked as resolved and limited conversation to collaborators Aug 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: User <-> repo unit cross table for easier permission checking.
7 participants