Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance for creating crdt.TreeNode #930

Closed
blurfx opened this issue Jul 16, 2024 · 5 comments · Fixed by yorkie-team/yorkie-js-sdk#875
Closed

Improve performance for creating crdt.TreeNode #930

blurfx opened this issue Jul 16, 2024 · 5 comments · Fixed by yorkie-team/yorkie-js-sdk#875
Assignees
Labels
enhancement 🌟 New feature or request sdk ⚒️

Comments

@blurfx
Copy link
Member

blurfx commented Jul 16, 2024

Description:

Currently, the time complexity of creating crdt.TreeNode is O(N^2). While this may not be a significant issue currently, there is a risk that as the number of tree nodes in the protobuf increases, operations will scale quadratically, potentially causing performance bottlenecks.

It would be beneficial to optimize the time complexity to prevent potential future performance issues.

func FromTreeNodes(pbNodes []*api.TreeNode) (*crdt.TreeNode, error) {
if len(pbNodes) == 0 {
return nil, nil
}
nodes := make([]*crdt.TreeNode, len(pbNodes))
for i, pbNode := range pbNodes {
node, err := fromTreeNode(pbNode)
if err != nil {
return nil, err
}
nodes[i] = node
}
root := nodes[len(nodes)-1]
for i := len(nodes) - 2; i >= 0; i-- {
var parent *crdt.TreeNode
for j := i + 1; j < len(nodes); j++ {
if pbNodes[i].Depth-1 == pbNodes[j].Depth {
parent = nodes[j]
break
}
}
if err := parent.Prepend(nodes[i]); err != nil {
return nil, err
}
}
root.Index.UpdateDescendantsSize()
// build crdt.Tree from root to construct the links between nodes.
return crdt.NewTree(root, nil).Root(), nil
}

Why:

To prevent performance issues that could happen in the future.

@blurfx blurfx added the enhancement 🌟 New feature or request label Jul 16, 2024
@blurfx blurfx moved this to Backlog in Yorkie Project - 2024 Jul 16, 2024
@blurfx blurfx changed the title Improve time complexity for creating crdt.TreeNode Improve performance for creating crdt.TreeNode Jul 17, 2024
@raararaara
Copy link
Contributor

raararaara commented Jul 17, 2024

I tried implementing it in the following way.

func FromTreeNodes(pbNodes []*api.TreeNode) (*crdt.TreeNode, error) {    
	if len(pbNodes) == 0 {    
		return nil, nil    
	}    
    
	nodes := make([]*crdt.TreeNode, len(pbNodes))    
	for i, pbNode := range pbNodes {    
		node, err := fromTreeNode(pbNode)    
		if err != nil {    
			return nil, err    
		}    
		nodes[i] = node    
	}    
    
	root := nodes[len(nodes)-1]    
	depthTable := make(map[int32]*crdt.TreeNode)    
	depthTable[pbNodes[len(nodes)-1].Depth] = nodes[len(nodes)-1]    
	for i := len(nodes) - 2; i >= 0; i-- {    
		var parent *crdt.TreeNode = depthTable[pbNodes[i].Depth-1]    
    
		if err := parent.Prepend(nodes[i]); err != nil {    
			return nil, err    
		}    
		depthTable[pbNodes[i].Depth] = nodes[i]    
	}
    
	// build crdt.Tree from root to construct the links between nodes.    
	return crdt.NewTree(root, nil).Root(), nil    
}  

When benchmarking against trees of the form <p>aaaa...aa<p>, there was a performance improvement of about 20%.

Here's an explanation of how it works:

  • Each node in the PB tree contains information from root to leaf nodes in reverse order.
  • Post-traversal guarantees that traversal of all children nodes is completed before traversing the self node.
    • Therefore, if the depth of the current node is denoted as 'd', the first node with a depth of 'd-1' among the nodes to be traversed later will become the parent of the current node.
    • Viewing in reverse order, the node with the most recent access at depth 'd-1' is identical.
    • For better understanding, the visit order when post-traversing a given tree is roughly labeled as follows.
      • image
  • Let's try using a hash table of the form <k, v> = <depth, treeNode> to optimize traversal costs in trees with many siblings.
  • Update the <depth, node> hash table for completed access suffix nodes to keep it up-to-date.

@m4ushold
Copy link
Contributor

m4ushold commented Jul 22, 2024

I tested raararaara's code by simply creating 10,000 child vertices

Test Code

func TestCreateNode(t *testing.T) {
	var chd []json.TreeNode
	for i := 0; i < 10000; i++ {
		chd = append(chd, json.TreeNode{Type: "p", Children: []json.TreeNode{{Type: "text", Value: "a"}}})
	}

	root := helper.BuildTreeNode(&json.TreeNode{
		Type:     "r",
		Children: chd,
	})

	pbNodes := converter.ToTreeNodes(root)

	for i := 1; i <= 10; i++ {
		t.Run(fmt.Sprintf("test %d", i), func(t *testing.T) {
			converter.FromTreeNodes(pbNodes)
		})
	}
}

Current O(n^2) code result

Running tool: /usr/local/go/bin/go test -timeout 30s -run ^TestCreateNode$ github.com/yorkie-team/yorkie/api/converter

=== RUN TestCreateNode
=== RUN TestCreateNode/test_1
--- PASS: TestCreateNode/test_1 (0.38s)
=== RUN TestCreateNode/test_2
--- PASS: TestCreateNode/test_2 (0.32s)
=== RUN TestCreateNode/test_3
--- PASS: TestCreateNode/test_3 (0.31s)
=== RUN TestCreateNode/test_4
--- PASS: TestCreateNode/test_4 (0.31s)
=== RUN TestCreateNode/test_5
--- PASS: TestCreateNode/test_5 (0.31s)
=== RUN TestCreateNode/test_6
--- PASS: TestCreateNode/test_6 (0.35s)
=== RUN TestCreateNode/test_7
--- PASS: TestCreateNode/test_7 (0.32s)
=== RUN TestCreateNode/test_8
--- PASS: TestCreateNode/test_8 (0.31s)
=== RUN TestCreateNode/test_9
--- PASS: TestCreateNode/test_9 (0.33s)
=== RUN TestCreateNode/test_10
--- PASS: TestCreateNode/test_10 (0.31s)
--- PASS: TestCreateNode (3.31s)
PASS
ok github.com/yorkie-team/yorkie/api/converter 3.321s

rararara's code result

Running tool: /usr/local/go/bin/go test -timeout 30s -run ^TestCreateNode$ github.com/yorkie-team/yorkie/api/converter

=== RUN TestCreateNode
=== RUN TestCreateNode/test_1
--- PASS: TestCreateNode/test_1 (0.17s)
=== RUN TestCreateNode/test_2
--- PASS: TestCreateNode/test_2 (0.19s)
=== RUN TestCreateNode/test_3
--- PASS: TestCreateNode/test_3 (0.16s)
=== RUN TestCreateNode/test_4
--- PASS: TestCreateNode/test_4 (0.15s)
=== RUN TestCreateNode/test_5
--- PASS: TestCreateNode/test_5 (0.15s)
=== RUN TestCreateNode/test_6
--- PASS: TestCreateNode/test_6 (0.15s)
=== RUN TestCreateNode/test_7
--- PASS: TestCreateNode/test_7 (0.15s)
=== RUN TestCreateNode/test_8
--- PASS: TestCreateNode/test_8 (0.15s)
=== RUN TestCreateNode/test_9
--- PASS: TestCreateNode/test_9 (0.15s)
=== RUN TestCreateNode/test_10
--- PASS: TestCreateNode/test_10 (0.16s)
--- PASS: TestCreateNode (1.63s)
PASS
ok github.com/yorkie-team/yorkie/api/converter 1.637s

The time complexity is optimized from O(n^2) to O(n).
Additionally, I added the missing root.Index.UpdateDescendantsSize().

func FromTreeNodes(pbNodes []*api.TreeNode) (*crdt.TreeNode, error) {
	if len(pbNodes) == 0 {
		return nil, nil
	}

	nodes := make([]*crdt.TreeNode, len(pbNodes))
	for i, pbNode := range pbNodes {
		node, err := fromTreeNode(pbNode)
		if err != nil {
			return nil, err
		}
		nodes[i] = node
	}

	root := nodes[len(nodes)-1]
	depthTable := make(map[int32]*crdt.TreeNode)
	depthTable[pbNodes[len(nodes)-1].Depth] = nodes[len(nodes)-1]
	for i := len(nodes) - 2; i >= 0; i-- {
		var parent *crdt.TreeNode = depthTable[pbNodes[i].Depth-1]

		if err := parent.Prepend(nodes[i]); err != nil {
			return nil, err
		}
		depthTable[pbNodes[i].Depth] = nodes[i]
	}

	root.Index.UpdateDescendantsSize()

	// build crdt.Tree from root to construct the links between nodes.
	return crdt.NewTree(root, nil).Root(), nil
}

Can I pr this code?

@sejongk
Copy link
Contributor

sejongk commented Jul 22, 2024

@m4ushold Sure. I believe we can continue the discussion after you create the PR, if necessary.

CC) @blurfx

@hackerwins
Copy link
Member

Before optimization:
Screenshot 2024-07-24 at 10 39 47 AM

After optimization:
Screenshot 2024-07-24 at 10 40 58 AM

@hackerwins
Copy link
Member

With the introduction of depthTable, we get fromTreeNodes with O(N) time complexity. It would be good if it applied on the JS SDK as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🌟 New feature or request sdk ⚒️
Projects
Status: Done
Status: Todo
Development

Successfully merging a pull request may close this issue.

5 participants