Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Backend interface for ent #4

Merged
merged 14 commits into from
Jun 5, 2024

Conversation

jhoward-lm
Copy link
Contributor

Add the Store and Retrieve methods in order to implement the Backend interface.

Copy link
Member

@puerco puerco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this, some initial comments

@jhoward-lm jhoward-lm force-pushed the store-retrieve branch 2 times, most recently from 5bffbca to 74e5efc Compare April 26, 2024 19:35
@jhoward-lm
Copy link
Contributor Author

jhoward-lm commented May 5, 2024

@puerco (and anyone else who might be interested)

I'd like to leverage generic types for the Backend interface. In this example I called it StoreRetriever (sort of like how ValueScanner is a type that implements Scan and Value methods)

Interface example
type (
	ProtobomType interface {
		sbom.Document | sbom.DocumentType | sbom.Edge | sbom.ExternalReference |
			sbom.Metadata | sbom.Node | sbom.NodeList | sbom.Person | sbom.Purpose | sbom.Tool |
			map[sbom.HashAlgorithm]string | map[sbom.SoftwareIdentifierType]string
	}

	StoreRetriever[T ProtobomType] interface {
		Store(*T, *options.StoreOptions) error
		Retrieve(string, *options.RetrieveOptions) (*T, error)
	}
)

This would add some flexibility by allowing ent and other backend types to implement a backend for any of the protobom types instead of just Document. For example:

Sample implementation
type ToolBackend Backend

var _ StoreRetriever[sbom.Tool] = (*ToolBackend)(nil)

func (backend *ToolBackend) Store(tool *sbom.Tool, opts *options.StoreOptions) error {
	// store logic

	return nil
}

func (backend *ToolBackend) Retrieve(id string, opts *options.RetrieveOptions) (*sbom.Tool, error) {
	// retrieve logic

	return nil, nil
}

So, a few questions:

  • What are your thoughts on this? Would you be opposed?
  • Should the new/updated interface be scoped to:
    • just backends/ent/ (at least for now)? or
    • model/v1/storage/backend.go?
      • in addition to the existing interface? or
      • as a replacement for the existing Backend interface? The impact to the existing FileSystem backend should be minimal

EDIT: for context, the reason for this is because the logic for inserting into the database using the ent API follows a very similar pattern for each protobom type, including Document. To me it seems like the perfect use case for reusing the interface for different protobom types.

Also, it would enable useful functionality for projects like bomctl. Doing a retrieve and full graph traversal to reconstruct an entire document just to get a list of nodes or whatever seems like a lot of extra processing that can be optimized with this change

@jhoward-lm
Copy link
Contributor Author

@puerco

For retrieve, the Backend interface Retrieve() method takes an ID string. Some clarification questions:

  • Is that explicitly meant to be the document ID?
  • Is that document ID tied to the metadata ID field?
    • for SPDX SBOMs, the Document.Metadata.Id field seems to always be the non-unique identifier string DOCUMENT. In this PR, I've changed the unique identifier for a document from just metadata ID to a unique index consisting of metadata ID, name, and version
  • For retrieving data by ID, would it make sense to either:
    • Associate everything that gets stored in the database with that document ID? i.e. create a mixin schema so that every protobom type schema gets a document_id foreign key field. This way for example every Node, NodeList, Tool, etc. that gets added would have a reference back to its originating Document. Users can then easily write code that says for example "give me all the Nodes that came from this document ID",

      OR

    • Only implement Retrieve() for the types that have an explicit string ID (basically just Document and Node)?

@eddiezane
Copy link
Member

@jhoward-lm I haven't used ent before but I am going to assume the files with the spdx license header are auto generated? Is there a way to edit the header template to include a note about which files are generated and not to be edited manually (or do you edit them with ent)? I like how this is very explicit with protos.

https://github.com/protobom/protobom/blob/5a59787986af712a86bb822081bf7c5977f1fbd7/pkg/sbom/sbom.pb.go#L1

@jhoward-lm
Copy link
Contributor Author

jhoward-lm commented May 21, 2024

@eddiezane this is already done with a go template (example), which has the nice side effect of being automatically ignored by golangci-lint. So if that first line isn’t present, it isn’t auto generated. The template is in internal/backends/ent/template/header.tmpl

Inside internal/backends/ent, everything is auto generated except:

  • entc.go
  • generate.go
  • Makefile
  • schema/*.go
  • template/header.tmpl

@puerco
Copy link
Member

puerco commented May 23, 2024

Updating this for those not present at the community meeting: we fixed the generation of the document IDs when reading data from SPDX.

Answering @jhoward-lm 's question above: Yes, the idea is that storage backends retrieve the IDs using the document identifier. The doc identifier is supposed to be globally unique (when used properly).

At some point we can think of another way to key them. We may develop a serialization method and content-address them in the database, but for now I expect the protobom structures to be still in flux so it could be hard to keep up to do it now.

@jhoward-lm
Copy link
Contributor Author

@puerco can you cut a new protobom release with the document ID fix?

@jhoward-lm jhoward-lm marked this pull request as ready for review May 28, 2024 21:51
chore:
  - Update protobom dependency to v0.4.2

Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
@jhoward-lm
Copy link
Contributor Author

@puerco @eddiezane @houdini91 ready for review

Do we want godoc examples as tests, stretchr/testify tests, or both?

@jhoward-lm jhoward-lm marked this pull request as draft June 1, 2024 21:15
@jhoward-lm
Copy link
Contributor Author

Moved back to draft while addressing a unique constraint issue with node edge types.

Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
@jhoward-lm
Copy link
Contributor Author

@puerco @eddiezane @houdini91 Created a separate PR #9 containing schema updates and generated ent code to minimize the number of changes in this PR. The other PR should be merged before this one.

Also added a godoc example in example_test.go. PR is ready for review

@jhoward-lm jhoward-lm marked this pull request as ready for review June 2, 2024 20:37
@jhoward-lm jhoward-lm requested a review from puerco June 3, 2024 13:01
@jhoward-lm jhoward-lm marked this pull request as draft June 3, 2024 20:06
@jhoward-lm
Copy link
Contributor Author

Need to address a small issue with retrieving node list edges

Copy link
Member

@puerco puerco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good, but I have a few of comments on some patterns in the code

chore:
  - address PR comments
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
@jhoward-lm
Copy link
Contributor Author

jhoward-lm commented Jun 4, 2024

@puerco summary of changes from most recent commit:

  • expose new methods Backend.InitClient() and Backend.CloseClient(), leave database client connection management to calling context
    • Backend.InitClient() is pretty much the same as what was previously just Backend.init(), just returns error instead of panicking
  • if client is not initialized when a Store or Retrieve method is called, error out instead of repeatedly initializing it
  • for database backreference foreign keys (e.g. Node.NodeListID), use context.WithValue() instead of BackendOptions struct fields
  • for retrieving the protobom Edges, there is currently no easy way to reconstruct the edges back to their original form, so the edges are being consolidated for now

edit: I wanted to just initialize the client within NewBackend() without the calling code needing to also call InitClient(), but if panic isn't an option, that would require returning an error from a constructor function, which seemed strange to me. What are your thoughts?

@jhoward-lm jhoward-lm marked this pull request as ready for review June 4, 2024 16:28
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
Signed-off-by: Jonathan Howard <jonathan.w.howard@lmco.com>
@puerco
Copy link
Member

puerco commented Jun 5, 2024

Nice, LGTM. Let's merge it. Two thoughts:

that would require returning an error from a constructor function, which seemed strange to me

Returning an error in the constructor doesn't feel strange to me, but I get it :) If you want to keep the same signature, defer the error by storing it as a property (eg backend.clientInitError ), delete the failed client and return the stored error when you check for the client in Store(), etc

there is currently no easy way to reconstruct the edges back to their original form, so the edges are being consolidated for now

OK, it is concerning because this will fail once we have conformance tests. But TBH, it feels that a document that has two edges of the same type, eg:

graph LR;
    SRC-->DEPENDS_ON;
    DEPENDS_ON-->NODE1;
    SRC-->D2[DEPENDS_ON];
    D2-->NODE2;
Loading

... should be equivalent to another one that has just one edge pointing to the same two destinations

graph LR;
    SRC-->DEPENDS_ON;
    DEPENDS_ON-->NODE1;
    DEPENDS_ON-->NODE2;
Loading

I think we don't have support to compare these two as equivalents, do you want to open an issue on protobom/protobom?

Copy link
Member

@puerco puerco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@puerco puerco merged commit 2c2f51f into protobom:main Jun 5, 2024
1 check passed
@jhoward-lm jhoward-lm deleted the store-retrieve branch June 5, 2024 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants