Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible fs.ChecksumCalculator #167

Closed

Conversation

andymoe
Copy link
Member

@andymoe andymoe commented Apr 30, 2021

Summary

Made the following additions to the API in order to allow for adding additional data to the hash after a set of files or directories have been processes. The new API is similar to the Hash interface however other function names may be more clear.

calculator := fs.NewChecksumCalculator()

calculator.WritePaths("/my/file", "/my/folder")

// currently Write mirrors Hash.Write signature but we probably don't need the byte length returned
calculator.Write([]bytes{"some string"})
calculator.Write([]bytes{"another string"})

sum := calculator.SumAsHexString()

The old API is still around for backwards compatibility:

calculator := fs.NewChecksumCalculator()
sum := calculator.Sum("/my/file", "/my/folder")

Use Case

func ShouldRun() (bool, string, error) {
  npm := pexec.NewExecutable("npm")

  buffer := bytes.NewBuffer(nil)
  npm.Execute(pexec.Execution{
      Args:   []string{"config", "get", "node-version"},
      Dir:    "",
      Stdout: buffer,
      Stderr: buffer,
  }))

  calculator := fs.NewChecksumCalculator()
  err := calculator.WritePaths("node_modules")
  if err != nil {
    return false, "", fmt.Errorf("unable to get sha of node_modules for layer caching: %w", err)
  }
  _, err = calculator.Write(buffer.Bytes())
    if err != nil {
    return false, "", fmt.Errorf("unable to determine node version for layer caching: %w", err)
  }

  sum := calculator.SumAsHexString()
  
  cacheSha, ok := metadata["cache_sha].(string)
    if !ok || sum != cacheSha {
    return true, sum, nil
  }
  
  return false, "", nil
}

With no API changes (still viable option I guess)

func ShouldRun() (bool, string, error) {
  npm := pexec.NewExecutable("npm")
  
  buffer := bytes.NewBuffer(nil)
  npm.Execute(pexec.Execution{
      Args:   []string{"config", "get", "node-version"},
      Dir:    "",
      Stdout: buffer,
      Stderr: buffer,
  }))

  calculator := fs.NewChecksumCalculator()
  sum, err := calculator.sum("node_modules")
  if err != nil {
     return false, "", fmt.Errorf("unable to get sha of node_modules for layer caching: %w", err)
  }

  hash := sha256.New()
  _, err = hash.Write(buffer.Bytes())
  if err != nil {
     return false, "", fmt.Errorf("unable to get sha of node_modules for layer caching: %w", err)
  }

  _, err = hash.Write([]byte(sum))
    if err != nil {
    return false, "", fmt.Errorf("unable to get sha of node_modules for layer caching: %w", err)
  }

  sum = hex.EncodeToString(hash.sum(nil))

  cacheSha, ok := metadata["cache_sha].(string)
  if !ok || sum != cacheSha {
    return true, sum, nil
  }
  
  return false, "", nil
}

Potential alternative method names

calculator := fs.NewChecksumCalculator()

calculator.AddPaths("/my/file", "/my/folder")
calculator.Add([]bytes{"some string"})
calculator.Add([]bytes{"another string"})

sum := calculator.ResultAsHexString()

Copy link
Member

@ryanmoran ryanmoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to understand the intention of an API like this. How would it be used? What use-cases is it solving for?

fs/checksum_calculator.go Outdated Show resolved Hide resolved
fs/checksum_calculator.go Outdated Show resolved Hide resolved
@andymoe andymoe force-pushed the enhance-checksum-calculator branch from 028a7ba to e80baa4 Compare May 3, 2021 23:33
@andymoe andymoe changed the title WIP: More flexible ChecksumCalculator More flexible fs.ChecksumCalculator May 3, 2021
Comment on lines +50 to +54
n, err := c.hash.Write(p)
if err != nil {
return n, err
}
return n, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
n, err := c.hash.Write(p)
if err != nil {
return n, err
}
return n, err
return c.hash.Write(p)

Unless you intend to do something different in the error case, these are logically equivalent.

Copy link
Member Author

@andymoe andymoe May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm debating if it makes sense to hide n entirely from the return is all.

Introduces the following methods:

  - WriteFiles(files ...string) (error)
  - Write(p []byte) (int, error)
  - SumAsHexString()

Sum(files ...string) (string, error) remains for backward compatability.
@andymoe andymoe force-pushed the enhance-checksum-calculator branch from e80baa4 to 50f4366 Compare May 4, 2021 00:13
@andymoe andymoe marked this pull request as ready for review May 4, 2021 00:16
@andymoe andymoe requested a review from a team as a code owner May 4, 2021 00:16
@andymoe
Copy link
Member Author

andymoe commented May 4, 2021

If or when we decide something like this API is OK I should backfill tests for the public methods. Right now the new additions are tested indirectly because Sum uses them all.

@ryanmoran
Copy link
Member

I think it could make sense to expose some of these lower-level APIs to the buildpack author, but I think it may make more sense as a new type that is distinct from the fs.ChecksumCalculator. Specifically, I think the proposed changes break an implicit contract which is that the fs.ChecksumCalculator is stateless. The new API moves state into the calculator and thus makes it difficult to reuse or use in cases of parallel operation. I'd like to propose a different API that would create a new stateful type that can be reused by the fs.ChecksumCalculator and therefore keep this implicit stateless contract in-place.

type Checksum struct {
  hash hash.Hash
}

func (c Checksum) WritePaths(paths ...string) error {
  // recursively hashes a filepath, keeping state in c.hash
}

func (c Checksum) Write(b []byte) (int, error) {
  // writes b into c.hash
}

func (c Checksum) EncodeToString() string {
  // returns a hex string encoding of the hash value
}

@andymoe
Copy link
Member Author

andymoe commented Jun 9, 2021 via email

@andymoe
Copy link
Member Author

andymoe commented Jun 9, 2021 via email

@ForestEckhardt
Copy link
Contributor

@andymoe feels like six of one, half a dozen of the other to me. If you feel that a new PR is necessary I would not be offended.

@ForestEckhardt
Copy link
Contributor

@andymoe bump

@sophiewigmore
Copy link
Member

Drafting for now due to inactivity

@sophiewigmore sophiewigmore marked this pull request as draft August 20, 2021 14:50
@fg-j
Copy link

fg-j commented Dec 1, 2021

Closing this for now since it's been inactive for a few months. Feel free to reopen if you still want to contribute this work @andymoe

@fg-j fg-j closed this Dec 1, 2021
@andymoe
Copy link
Member Author

andymoe commented Dec 1, 2021 via email

@ForestEckhardt
Copy link
Contributor

My goal was to review and pick this up next. I'll try and do that later this week!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants