Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added tree interface #1043

Merged
merged 9 commits into from
Dec 13, 2020
Merged

added tree interface #1043

merged 9 commits into from
Dec 13, 2020

Conversation

tzahij
Copy link
Contributor

@tzahij tzahij commented Dec 10, 2020

Just one small file
Yoni - please see if it answers your needs.
Others - please see it match our conventions and golang


This change is Reviewable

Copy link
Contributor

@johnnyaug johnnyaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is very much what I was expecting, it will be exteremly helpful.

I left some comments.

The thing I'm missing the most is documentation. What is the functionality provided by each interface and each method. I know it may be clear to you, but it will help others allow.

PartName sstable.ID `json:"part_name"`
MaxKey gr.Key `json:"max_path"`
}
type TreeType struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the word "type", it's confusing.
Also from the treePartType

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

type TreeWriter interface {
HasOpenWriter() bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really clear to me how the writer works.
Are HasOpenWriter, ForceCloseCurrentPart and IsSplitKey part of the interface, or are they implementation details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasOpenWriter and ForceClose are interfaces needed by Apply.
I am not sure they will be needed by others. If Merge is implemented without apply, It will probably need it.
IsSplitKey determines if a key should terminate a part. is probably redundant. So unless someone really wants it - it will be removed

return t.partManger
}

type TreeRepo interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about GetTree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss it. a Tree is an internal structure of the package. I am not sure it is needed.
There is another one needed, which returns two reduced trees for diff. let's discuss it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how can one pass a TreeSlice to use the interface?

forest/tree/tree.go Outdated Show resolved Hide resolved
treeSlice []treePartType
}

type treesRepo struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation. Did you mean to create an interface from it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be moved

partManger sstable.Manager
}

const (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these? add documentation if they are part of the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be moved to the cache part

forest/tree/tree.go Outdated Show resolved Hide resolved
NewTreeWriter(splitFactor int, closeAsync sstable.BatchWriterCloser) TreeWriter
NewScannerFromID(treeID gr.TreeID, start gr.Key) (gr.ValueIterator, error)
NewScannerFromTreeParts(treeSlice TreeType, start gr.Key) (gr.ValueIterator, error)
GetPartManger() sstable.Manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the tree interface must expose the inner part manager? It's the sort of complication we really need to avoid if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently process initialization, and where application parts get trees and sstables is not finalized AFAIK. Once it is - I will happily remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is initialisation part of the interface?
Isn't it a one time operation during setup?

Co-authored-by: itaiad200 <itaiad200@gmail.com>
Copy link
Collaborator

@ozkatz ozkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this interface requires a do-over - it expoes some of the implementation details and is pretty hard to understand without seeing how it is being used, so kind of defeating the purpose of having a interface in the first place.

type TreeWriter interface {
HasOpenWriter() bool
WriteEntry(record gr.ValueRecord) error
ForceCloseCurrentPart()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This very much looks like implementation details, not an interface. What is a current part? it is not mentioned anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, removed from interface

HasOpenWriter() bool
WriteEntry(record gr.ValueRecord) error
ForceCloseCurrentPart()
IsSplitKey(key gr.Key, rowNum int) bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this part of a tree writer? Why not decouple these and have a KeySplitter interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it from interfaces. currently no need to expose it

ForceCloseCurrentPart()
IsSplitKey(key gr.Key, rowNum int) bool
FlushIterToTree(iter gr.ValueIterator) error
SaveTree(reuseTree TreeType) (gr.TreeID, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a reuseTree? the interface doesn't really explain itself, so at a minimum it requires extensive documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, added explanation


import (
cache "github.com/treeverse/lakefs/forest/cache_map"
gr "github.com/treeverse/lakefs/graveler"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not alias our own packages, it makes the rest of the implementation harder to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

MaxKey gr.Key `json:"max_path"`
}
type TreeSlice struct {
treeSlice []TreePart
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, if it's a struct including a treeSlice, then the struct type should be called Tree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after commit

treeSlice []TreePart
}

type treesRepo struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change "trees-repo" to tree-repo (singular) in all relevant places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after commit

@codecov-io
Copy link

codecov-io commented Dec 13, 2020

Codecov Report

Merging #1043 (10f577b) into master (a8dc344) will increase coverage by 0.05%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1043      +/-   ##
==========================================
+ Coverage   46.17%   46.23%   +0.05%     
==========================================
  Files         152      152              
  Lines       12316    12318       +2     
==========================================
+ Hits         5687     5695       +8     
+ Misses       5903     5897       -6     
  Partials      726      726              
Impacted Files Coverage Δ
catalog/mvcc/cataloger_list_entries.go 85.34% <0.00%> (+0.12%) ⬆️
config/logger.go 75.00% <0.00%> (+10.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a8dc344...10f577b. Read the comment docs.

…ange once integrated with rest of tree package
…ange once integrated with rest of tree package
}

type TreeWriter interface {
WriteEntry(record graveler.ValueRecord) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming suggestions:
WriteEntry >> WriteValue

FlushIterToTree >> WriteIterator

SaveTree: split to 1. Save() which will save and return tree ID.
and 2. Merge(tree) which will create a new tree on top of an existing one and return a tree id as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good ideas. will do after merge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

package tree

import (
// cache "github.com/treeverse/lakefs/forest/cache_map"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove.

"github.com/treeverse/lakefs/graveler/committed/sstable"
)

type TreePart struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need json tags?


type TreePart struct {
PartName sstable.ID `json:"part_name"`
MaxKey graveler.Key `json:"max_path"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MaxKey graveler.Key `json:"max_path"`
MaxKey graveler.Key `json:"max_key"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

PartName sstable.ID `json:"part_name"`
MaxKey graveler.Key `json:"max_path"`
}
type TreeSlice struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need it? Also - name is confusing as it's parts slice.


type TreeRepo interface {
// NewTreeWriter returns a writer that uses the part manager to create a new tree
NewTreeWriter(splitFactor int, // average number of keys that we want to stored in a part
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use go standard of documentation?
i.e. Everything is about the declaration, not embedded in the args across the lines.

NewTreeWriter(splitFactor int, closeAsync sstable.BatchWriterCloser) TreeWriter
NewScannerFromID(treeID gr.TreeID, start gr.Key) (gr.ValueIterator, error)
NewScannerFromTreeParts(treeSlice TreeType, start gr.Key) (gr.ValueIterator, error)
GetPartManger() sstable.Manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is initialisation part of the interface?
Isn't it a one time operation during setup?

}

type TreeWriter interface {
WriteEntry(record graveler.ValueRecord) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wait?

type TreeRepo interface {
// NewTreeWriter returns a writer that uses the part manager to create a new tree
NewTreeWriter(splitFactor int, // average number of keys that we want to stored in a part
// for more detail, look at "IsSplitKey"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is 'IsSplitKey' ?


type TreeRepo interface {
// NewTreeWriter returns a writer that uses the part manager to create a new tree
NewTreeWriter(splitFactor int, // average number of keys that we want to stored in a part
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is splitFactor changeable between trees? IMO should be a const.

// NewTreeWriter returns a writer that uses the part manager to create a new tree
NewTreeWriter(splitFactor int, // average number of keys that we want to stored in a part
// for more detail, look at "IsSplitKey"
closeAsync sstable.BatchWriterCloser, // component used to close part asynchronously, and wait for all part
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're braking the abstraction layers, why does the caller of the TreeManager need to pass a sstable writer feature?

@tzahij
Copy link
Contributor Author

tzahij commented Dec 13, 2020

Initial commit to enable other team members to work against this interface/ will be changed later according to needs

@tzahij tzahij merged commit 8aada34 into master Dec 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants