Skip to content

Refactor parent/child relationship #255

@jamesharr

Description

@jamesharr

Environment

  • DiffSync version: 1.9.0

Proposed Functionality

Currently in DiffSync, the tree hierarchy has shown a few limitations, most notably that objects can not form a tree using the same type. For example, a user cannot form a hierarchy of Locations (Site -> Building -> Floor -> Room). Additionally, this parent/child relationship must be declared when creating the Models, which might not be entirely necessary.

Adapting to a new tree implementation may resolve several outstanding feature requests, such as #239 and #225, as well as allow for the implementation of #122 and a get_parent() method which does not exist today.

This change proposes:

  • Declare parent/child object relationships when an object is added to the backend instead of declared during type declaration.
  • (Implementation detail) Store parent/child relationships in the back-end only instead of in DiffSyncModel
  • Add a get_parent method
  • Eliminate the need for a call to obj1.add_child(obj2)
  • Support arbitrary object hierarchies such as:
    • Self-referencing models, such as Location trees (Location in a Location in a Location)
    • Objects with an optional parent

Use Case

# Note that MyModelA & MyModelB declarations would not include _children

be1 = MyBackEnd()
a1 = MyModelA(...)
a2 = MyModelA(...)
b3 = MyModelB(...)
b4 = MyModelB(...)

be1.add(a1, parent=None)  # Note that parent=None is the default
be1.add(a2, parent=a1)  # Specify a parent object
be1.add(b3, parent=a2)  # Specify a parent
be1.add(b4, parent=None)  # Another object of the same type, but without a parent

# Getter methods
be1.get_parent(a2)  # Will return a1
be1.get_parent(a1)  # Will return None; though it could also raise an Exception
be1.get_children(a2)  # Will return something that iterable (that happens to only have b3 in it)
be1.get_children(b4)  # Will return an empty iterator/list

# Alternate signatures to support Model + ids style signatures
# IE when you don't have the original object.
be1.add(a2, parent_model=MyModelA, parent_ids={"k": "v", ...})
be1.get_parent(model=MyModelA, ids={"k": "v", ...})
be1.get_children(model=MyModelA, ids={"k": "v", ...})

One thing I haven't figured out entirely is if it makes sense to support moving an object between parents without delete/recreate

  • What would the API look like?
  • What would diffs look like? How would this be represented?
  • How would this change with multiple parents like in Allow childrens to have multiple parents #210?
    • get_parent turns into get_parents, but do we keep the original and how does it behave?
    • Most of this feature request assumes only a single parent
  • Does it make sense to add this support in the same feature PR?
  • Does it need to be implemented with this feature or just designed in a way that doesn't inhibit this feature?

Implementation Thoughts

I've been thinking about the best place to store the parent/child relationship information. I think it's actually better to store them purely in the backend (in a dedicated tree structure) vs having the DiffSyncModel be aware of the parent/child relationships. Using this method would keep DiffSyncModel focused on storing data instead of getting involved in part of the tree.

@dataclass
class TreeNode:
    """Internal data structure to represent an object in a tree"""
    object: DiffSyncModel
    parent: None | DiffSyncModel = None
    children: None | Dict[str, Dict[str, DiffSyncModel]] = None  # children[model_name][identifier] = Object

    def add_child(self: Node, child: DiffSyncModel) -> TreeNode:
	"""Add a tree to a node"""
        n = TreeNode(object=child, parent=self)
	self.children = parent.children or {}  # or use defaultdict()
	self.children.setdefault(child._modelname, {})  # or use defaultdict()
	self.children[child._modelname][child.get_identifier()] = n
        return n

# How TreeNode could be represented in a backend
class DiffSync:
    top_level_nodes: Dict[str, Dict[str, TreeNode]]   # top_level_nodes[model][identifier] = TreeNode(...)
    all_nodes: Dict[str, Dict[str, TreeNode]]  # all_nodes[model][identifier] = TreeNode(...)

    # alternative - use a synthetic root node to reduce the number of special-cases for top-level objects
    root_node: TreeNode(object=None)
    all_nodes: Dict[str, Dict[str, TreeNode]]  # all_nodes[model][identifier] = TreeNode(...)

    # The root_node / top_level_nodes would establish only the top-level nodes
    # The all_nodes would allow for get_parent / get_children method to quickly locate an object in the tree,
    # as well as allow DiffSync to enforce key-uniqueness.

I'm going a bit off-topic, but instead of TreeNode storing a reference to a DiffSyncModel, it could instead store a reference to just the object's identifier; This might make it easier for Python's GC to do its job at the expense of having to look up an object by its key.

@dataclass
class TreeNode:
    """Internal data structure to represent an object in a tree"""
    model: str  # Model name
    id: str  # object key
    parent_model: None | str = None  # Parent model
    parent_id: None | str = None  # Parent key
    children: None | Dict[str, Set[str]] = None  # children[model_name] = Set(ids)  # Set of IDs

class DiffSync:
    # Key lookup of all objects in the backend
    all_objects: Dict[str, Dict[str, DiffSyncModel]] # all_objects[model][identifier] = DiffSyncModel()

    # Tree structure
    top_level_nodes: Dict[str, Dict[str, TreeNode]]   # top_level_nodes[model][identifier] = TreeNode(...)
    all_nodes: Dict[str, Dict[str, TreeNode]]  # all_nodes[model][identifier] = TreeNode(...)

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions