Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] ldap用户同步的code目前使用的是full_name #714

Open
wklken opened this issue Oct 11, 2022 · 13 comments
Open

[bug] ldap用户同步的code目前使用的是full_name #714

wklken opened this issue Oct 11, 2022 · 13 comments
Assignees
Labels
Priority: High Sign: help wanted Extra attention is needed Type: bug Something isn't working

Comments

@wklken
Copy link
Collaborator

wklken commented Oct 11, 2022

https://github.com/TencentBlueKing/bk-user/blob/v2.3.4-beta.28/src/api/bkuser_core/categories/plugins/ldap/helper.py#L82

image

企业微信截图_143035ea-ccf4-4c5b-b060-fef4da941e66

正常情况下, 不会有问题, 因为每个部门的full_name是唯一的

带来的问题: 部门入库后的字段code=full_name, 而code又是唯一的

但是

  1. 同一个ldap源存在 full_name相同的(小概率)
  2. 两个目录, 配置同一个ldap(大概率)
  3. 两个目录, 配置不同ldap, 两个ldap存在相同的部门full_name

此时, code冲突, 同步失败

@wklken wklken added Type: bug Something isn't working Priority: High labels Oct 11, 2022
@wklken wklken self-assigned this Oct 11, 2022
@wklken
Copy link
Collaborator Author

wklken commented Oct 11, 2022

注意: 不能直接修复, 因为存量环境是通过code确定唯一的! 如果修复成sha256字符串, 会导致存量环境再次同步出现双份数据

@wklken
Copy link
Collaborator Author

wklken commented Oct 12, 2022

企业微信截图_3f7be633-8dc3-48ea-814d-7d0d22467e05

这段代码的目的就是: 拿full_name获取, 拿到了code不一样就更新code; 拿不到就save, code=key_field

@wklken
Copy link
Collaborator Author

wklken commented Oct 12, 2022

如果没有找到方式可以兼容存量环境以及新装环境, 就不要动现有的逻辑

@wklken
Copy link
Collaborator Author

wklken commented Oct 19, 2022

plugins/ldap/syncer.py

    def fetch_departments(self, restrict_types: List[str]):
        """获取 department 对象列表"""
        groups, departments, _ = self._load()
        results = []
        for is_group, dept_meta in chain.from_iterable(iter([product([False], departments), product([True], groups)])):
            if not dept_meta.get("dn"):
                logger.warning("no dn field, skipping for %s:%s", ("group" if is_group else "department"), dept_meta)
                continue
            results.append(
                department_adapter(
                    code=self._get_code(dept_meta),
                    dept_meta=dept_meta,
                    is_group=is_group,
                    restrict_types=restrict_types,
                )
            )
        return results

plugins/ldap/adaptor.py

def department_adapter(code: str, dept_meta: Dict, is_group: bool, restrict_types: List[str]) -> LdapDepartment:
    dn = dept_meta["dn"]
    dn_values = parse_dn_value_list(dn, restrict_types=restrict_types)

    parent_dept: Optional[LdapDepartment] = None
    for dept_name in reversed(dn_values):
        parent_dept = LdapDepartment(
            name=dept_name,
            parent=parent_dept,
            is_group=is_group,
        )

    assert parent_dept is not None, "未从 dn 中提取到任何部门信息"
    parent_dept.code = code
    return parent_dept

这里相当于递归存了一条链路

总公司 - 部门 1 - 部门 2 - 部门 3[只有这个带code]

@wklken
Copy link
Collaborator Author

wklken commented Oct 19, 2022

分析的问题已全部注释标出来了, 需要考虑如何处理

def _handle_department(self, dept_info: LdapDepartment) -> Optional[Department]:
"""将 DepartmentProfile 转换成 Department, 并递归处理其父节点
如果父节点存在, 则递归处理父节点, 并绑定部门上下级关系, 再将部门对象(Department)插入缓存层
如果父节点不存在, 则直接将部门对象(Department)插入缓存层
"""
# NOTE: 注意, dept_info.parent 不带code字段, syncer->adaptor拼接的时候没放(除非能拿到每个节点的dn)
if dept_info.parent:
parent_dept = self._handle_department(dept_info.parent)
else:
parent_dept = None
# FIXME: https://github.com/TencentBlueKing/bk-user/issues/714
# BUG: here, the code is full_name, not the real code
defaults = {
# 当 self._handle_department(dept_info.parent) 逻辑执行到这里的时候, 没有code, 所以用的full_name
"code": dept_info.key_field,
"category_id": self.category.pk,
"name": dept_info.name,
"enabled": True,
"parent_id": getattr(parent_dept, "pk", None),
"extras": {
"type": self.config_loader["user_group_class"]
if dept_info.is_group
else self.config_loader["organization_class"],
},
**self._MPTT_INIT_PARAMS,
}
dept = self._insert_dept(dept_info=dept_info, defaults=defaults)
return dept
def _insert_dept(self, dept_info: LdapDepartment, defaults: Dict) -> Department:
# BUG: here, use dept_info.key_field to get dept -> it's full_name, not the code
# the defaults["code"] is the not the `code`, it's the full_name
dept: Department = self.db_sync_manager.magic_get(dept_info.key_field, LdapDepartmentMeta)
if dept:
# Question: here update the code to real code, without magic_add, will not saved into db?
if dept_info.code and dept.code != dept_info.code:
dept.code = dept_info.code
return dept
# NOTE: 如果full_name在 db 里面已经有了, 这里是判断不出来的, 因为self.db_departments是当前category的集合, 不是全局的
# 但是code字段是 db unique字段
if dept_info.key_field in self.db_departments:
dept = self.db_departments[dept_info.key_field]
for key, value in defaults.items():
setattr(dept, key, value)
self.db_sync_manager.magic_add(dept, SyncOperation.UPDATE.value)
else:
defaults["pk"] = self.db_sync_manager.register_id(LdapDepartmentMeta)
# BUG: here, defaults["code"] is the full_name, not the code, saved into db
# Question: key_field能否加上 category_id, 保证唯一性?
# Question: 如何鉴权存量数据?
dept = Department(**defaults)
self.db_sync_manager.magic_add(dept, SyncOperation.ADD.value)
self.context.add_record(step=SyncStep.DEPARTMENTS, success=True, department=dept_info.key_field)
return dept

1, 全部用code, 那么需要所有部门链路都带
2. 沿用key_field, 需要考虑加入category_id, 以便全局唯一?

最困难的问题: 升级后如何保证存量数据正确

@wklken
Copy link
Collaborator Author

wklken commented Oct 25, 2022

department full_name超长

例如一级部门就超过 64 个字符, 那么所有都写不进来, 被截断了

所以, 无论如何是应该要改成hash的, 并且按目录区分

@wklken wklken added this to the Y2022M44 milestone Oct 31, 2022
@wklken
Copy link
Collaborator Author

wklken commented Oct 31, 2022

先看下有没有什么兼容方案, 确保同类的问题不再出现

@wklken
Copy link
Collaborator Author

wklken commented Oct 31, 2022

ldap metas.py

class LdapDepartmentMeta(DepartmentMeta):
    update_exclude_fields = ["category_id", "code"]
    unique_key_field = "code"
    use_bulk = True

使用的code 去做Department.get(code=xxxx)判定是否存在

@wklken
Copy link
Collaborator Author

wklken commented Oct 31, 2022


─────────────────────────────────────────────────────────────────────────────────────────┐
• src/api/bkuser_core/categories/plugins/ldap/helper.py:101: class DepartmentSyncHelper: │
─────────────────────────────────────────────────────────────────────────────────────────┘
 101⋮ 101│    def _insert_dept(self, dept_info: LdapDepartment, defaults: Dict) -> Department:
 102⋮ 102│        # BUG: here, use dept_info.key_field to get dept -> it's full_name, not the code
 103⋮ 103│        # the defaults["code"] is the not the `code`, it's the full_name
    ⋮ 104│        import hashlib
    ⋮ 105│
    ⋮ 106│        from django.utils.encoding import force_bytes
    ⋮ 107│
    ⋮ 108│        code_hash = hashlib.sha256(force_bytes(f"{self.category.pk}:{dept_info.key_field}")).hexdigest()
    ⋮ 109│
    ⋮ 110│        # 如果发现老的, 直接更新其 code=code_hash
 104⋮ 111│        dept: Department = self.db_sync_manager.magic_get(dept_info.key_field, LdapDepartmentMeta)
 105⋮ 112│        if dept:
 106⋮ 113│            # Question: here update the code to real code, without magic_add, will not saved into db?
 107⋮    │            if dept_info.code and dept.code != dept_info.code:
 108⋮    │                dept.code = dept_info.code
    ⋮ 114│            # if dept_info.code and dept.code != dept_info.code:
    ⋮ 115│            #     dept.code = dept_info.code
    ⋮ 116│            if dept_info.code and dept_info.code != code_hash:
    ⋮ 117│                dept.code = code_hash
 109⋮ 118│            return dept
 110⋮ 119│
    ⋮ 120│        # 判定 code_hash 是否存在, 是的话直接返回
    ⋮ 121│        dept2: Department = self.db_sync_manager.magic_get(code_hash, LdapDepartmentMeta)
    ⋮ 122│        if dept2:
    ⋮ 123│            return dept2
    ⋮ 124│
 111⋮ 125│        # NOTE: 如果full_name在 db 里面已经有了, 这里是判断不出来的, 因为self.db_departments是当前category的集合, 不是全局的
 112⋮ 126│        # 但是code字段是 db unique字段
 113⋮ 127│        if dept_info.key_field in self.db_departments:
 114⋮ 128│            dept = self.db_departments[dept_info.key_field]
    ⋮ 129│            # 老的, 更新code
    ⋮ 130│            defaults["code"] = code_hash
 115⋮ 131│            for key, value in defaults.items():
 116⋮ 132│                setattr(dept, key, value)
 117⋮ 133│            self.db_sync_manager.magic_add(dept, SyncOperation.UPDATE.value)

─────────────────────────────────────────────────────────────────────────────────────────┐
• src/api/bkuser_core/categories/plugins/ldap/helper.py:136: class DepartmentSyncHelper: │
─────────────────────────────────────────────────────────────────────────────────────────┘
 120⋮ 136│            # BUG: here, defaults["code"] is the full_name, not the code, saved into db
 121⋮ 137│            # Question: key_field能否加上 category_id, 保证唯一性?
 122⋮ 138│            # Question: 如何鉴权存量数据?
    ⋮ 139│            # 新插入的, 使用code_hash
    ⋮ 140│            defaults["code"] = code_hash
 123⋮ 141│            dept = Department(**defaults)
 124⋮ 142│            self.db_sync_manager.magic_add(dept, SyncOperation.ADD.value)
 125⋮ 143│
------------------------------------------------------------

image

没想明白原来的 line 114是为什么这么处理的

@nannan00
Copy link
Collaborator

nannan00 commented Nov 7, 2022

plugins/ldap/syncer.py

    def fetch_departments(self, restrict_types: List[str]):
        """获取 department 对象列表"""
        groups, departments, _ = self._load()
        results = []
        for is_group, dept_meta in chain.from_iterable(iter([product([False], departments), product([True], groups)])):
            if not dept_meta.get("dn"):
                logger.warning("no dn field, skipping for %s:%s", ("group" if is_group else "department"), dept_meta)
                continue
            results.append(
                department_adapter(
                    code=self._get_code(dept_meta),
                    dept_meta=dept_meta,
                    is_group=is_group,
                    restrict_types=restrict_types,
                )
            )
        return results

plugins/ldap/adaptor.py

def department_adapter(code: str, dept_meta: Dict, is_group: bool, restrict_types: List[str]) -> LdapDepartment:
    dn = dept_meta["dn"]
    dn_values = parse_dn_value_list(dn, restrict_types=restrict_types)

    parent_dept: Optional[LdapDepartment] = None
    for dept_name in reversed(dn_values):
        parent_dept = LdapDepartment(
            name=dept_name,
            parent=parent_dept,
            is_group=is_group,
        )

    assert parent_dept is not None, "未从 dn 中提取到任何部门信息"
    parent_dept.code = code
    return parent_dept

这里相当于每个部门都递归存了一条链路

总公司 - 部门 1 - 部门 2 - 部门 3[只有这个带code]

def load_to_memory(self):
for dept in handle_with_progress_info(
self.target_obj_list, progress_title="handle department"
): # type: LdapDepartment
self._handle_department(dept)
def _handle_department(self, dept_info: LdapDepartment) -> Optional[Department]:
"""将 DepartmentProfile 转换成 Department, 并递归处理其父节点
如果父节点存在, 则递归处理父节点, 并绑定部门上下级关系, 再将部门对象(Department)插入缓存层
如果父节点不存在, 则直接将部门对象(Department)插入缓存层
"""
# NOTE: 注意, dept_info.parent 不带code字段, syncer->adaptor拼接的时候没放(除非能拿到每个节点的dn)
if dept_info.parent:
parent_dept = self._handle_department(dept_info.parent)
else:
parent_dept = None
# FIXME: https://github.com/TencentBlueKing/bk-user/issues/714
# BUG: here, the code is full_name, not the real code
defaults = {
# 当 self._handle_department(dept_info.parent) 逻辑执行到这里的时候, 没有code, 所以用的full_name
"code": dept_info.key_field,
"category_id": self.category.pk,
"name": dept_info.name,
"enabled": True,
"parent_id": getattr(parent_dept, "pk", None),
"extras": {
"type": self.config_loader["user_group_class"]
if dept_info.is_group
else self.config_loader["organization_class"],
},
**self._MPTT_INIT_PARAMS,
}
dept = self._insert_dept(dept_info=dept_info, defaults=defaults)
return dept
def _insert_dept(self, dept_info: LdapDepartment, defaults: Dict) -> Department:
# BUG: here, use dept_info.key_field to get dept -> it's full_name, not the code
# the defaults["code"] is the not the `code`, it's the full_name
dept: Department = self.db_sync_manager.magic_get(dept_info.key_field, LdapDepartmentMeta)
if dept:
# Question: here update the code to real code, without magic_add, will not saved into db?
if dept_info.code and dept.code != dept_info.code:
dept.code = dept_info.code
return dept
# NOTE: 如果full_name在 db 里面已经有了, 这里是判断不出来的, 因为self.db_departments是当前category的集合, 不是全局的
# 但是code字段是 db unique字段
if dept_info.key_field in self.db_departments:
dept = self.db_departments[dept_info.key_field]
for key, value in defaults.items():
setattr(dept, key, value)
self.db_sync_manager.magic_add(dept, SyncOperation.UPDATE.value)
else:
defaults["pk"] = self.db_sync_manager.register_id(LdapDepartmentMeta)
# BUG: here, defaults["code"] is the full_name, not the code, saved into db
# Question: key_field能否加上 category_id, 保证唯一性?
# Question: 如何鉴权存量数据?
dept = Department(**defaults)
self.db_sync_manager.magic_add(dept, SyncOperation.ADD.value)
self.context.add_record(step=SyncStep.DEPARTMENTS, success=True, department=dept_info.key_field)
return dept

输入:

LdapDepartments  = List[
    总部门(with code)
    总部门(without code) -> 部门1(with code)
    总部门(without code) -> 部门1(without code)-> 部门2(with code)
    总部门(without code) -> 部门1(without code)-> 部门2(without code)-> 部门3(with code)
]

其中code = sha256(category_id +  department ldap DN) 

  1. 查询DB部门数据,以key=full_name value=department db object存储于db_department里
  2. 对于需要新增或更新的数据,暂时缓存于magic_dict里,对于magic_dict的get/add操作说明如下
    • Add magic_dict时,以LdapDepartment.code为key进行添加
    • Get magic_dict时,以full_name进行查询
  3. 每个部门处理逻辑,先递归其上级部门,进行_insert_dept处理,而_insert_dept处理如下:
    • 先查询magic_dict是否存在
    • 如果magic_dict存在,则判断LdapDepartment.code非空且与查询出的数据不一样,若不一样则替换(由于每个部门可能被多次进行_insert_dept处理,其中只有一次是带code的,可参考输入样例的总部门、部门1和部门2)
    • 如果不存在,则查询db_department是否存在
    • 若db_department里存在,则使用defaults进行字段更新内存对象,并更新到magic_dict里,其中defaults["code"]=full_name
    • 若db_department里不存在,则使用defaults进行字段创建内存对象,并添加到magic_dict里,其中defaults["code"]=full_name
  4. 最终所有部门都处理完成后,使用magic_dict里的数据进行批量创建或更新DB数据

  • 问题1:DB Department里的code可能是 full_name 或 sha256(category_id + department ldap DN)
    原因:部门被多次处理,处理顺序将影响code在DB的存储

    • 先处理with code的,则最终DB里code存储为full_name,
    • 先处理without code,则最终DB里code存储的是sha256(category_id + department ldap DN)
  • 问题2:full_name 是随部门层级的变化的,变化后可能导致同一个部门再次新增

  • bug1: code = sha256(category_id + department ldap DN) 发生 DB冲突
    原因:

    • 第一次只同步两级,但是由于无论同步多少级,对于ldap返回的dn拥有是最全的,即ldap的所有层级,这时候存到DB后重新读取出来的full_name就是两级
    • 第二次同步调整为3级后,由于对比是以full_name为key的,这会导致同一个部门由于这次同步的层级变化而导致full_name变化
    • 这时候会执行db新增,而code的唯一约束又不会变化的,所以出现DB冲突
  • bug2: code = full_name 发生DB 冲突
    原因:

    • 不同category可能存在相同的组织架构,这时候full_name就会冲突

@nannan00 nannan00 mentioned this issue Nov 10, 2022
34 tasks
@nannan00
Copy link
Collaborator

image

code 无法被更新,是否支持在update_exclude_fields里去除呢?

@nannan00 nannan00 modified the milestones: Y2022M45, Y2022M46 Nov 14, 2022
@nannan00 nannan00 modified the milestones: Y2022M46, Y2022M47 Nov 22, 2022
@wklken
Copy link
Collaborator Author

wklken commented Jan 11, 2023

临时解决方案:

1406, Data too long for column 'code'

alter table departments_department modify `code` varchar(256) DEFAULT NULL

@wklken
Copy link
Collaborator Author

wklken commented Jan 12, 2023

需要决策: 尽早在某个版本把department的code改成varchar 256, 可以减少大量的用户咨询

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: High Sign: help wanted Extra attention is needed Type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants