conveyor is a lightweight multithreaded file processing library.

fgehrlicher/conveyor

Conveyor

Conveyor is a lightweight multithreaded file processing library.
Think of it as a simple way to apply a function or method to every line of one or more files.

A few good example use cases for this library are:

Installation

go get github.com/fgehrlicher/conveyor

Example Usage

Redact all occurrences of a given email:

package main

import (
	"log"
	"os"
	"strings"

	"github.com/fgehrlicher/conveyor"
)

func main() {
	// Create the output file.
	resultFile, err := os.Create("redacted_data.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer resultFile.Close()

	// Instantiate a new ConcurrentWriter which wraps the resultFile handle.
	// The ConcurrentWriter type is a small thread-safe wrapper for
	// io.Writer which is able to keep the chunk output in order.
	w := conveyor.NewConcurrentWriter(resultFile, true)

	// Split the input file into chunks of 512 bytes with
	// the concurrent writer as output ChunkWriter.
	chunks, err := conveyor.GetChunksFromFile("data.txt", 512, w)
	if err != nil {
		log.Fatal(err)
	}

	// Create and execute a Queue with 4 workers and the Redact function as LineProcessor.
	result := conveyor.NewQueue(chunks, 4, conveyor.LineProcessorFunc(Redact)).Work()

	// Print the number of processed lines.
	log.Printf("processed %d lines", result.Lines)
}

// Text that should be redacted
const emailToRedact = "testmail@test.com"

// Redact replaces every occurrence of emailToRedact with a run of "x"
// characters of the same length, so the line length stays unchanged.
func Redact(line []byte, metadata conveyor.LineMetadata) ([]byte, error) {
	result := strings.ReplaceAll(
		string(line),
		emailToRedact,
		strings.Repeat("x", len(emailToRedact)),
	)

	return []byte(result), nil
}
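
The comments above describe the ConcurrentWriter as a thread-safe io.Writer wrapper that keeps chunk output in order. The following is a minimal standard-library sketch of how such ordering could work. It is not conveyor's implementation: `orderedWriter`, `WriteChunk`, and `writeOutOfOrder` are hypothetical names introduced here for illustration only.

```go
package main

import (
	"bytes"
	"io"
	"sync"
)

// orderedWriter is a hypothetical stand-in, NOT conveyor's ConcurrentWriter.
// It accepts chunk results in any order and forwards them to the wrapped
// io.Writer strictly by ascending chunk index.
type orderedWriter struct {
	mu      sync.Mutex
	out     io.Writer
	next    int            // index of the next chunk to flush
	pending map[int][]byte // finished chunks waiting for earlier ones
}

func newOrderedWriter(out io.Writer) *orderedWriter {
	return &orderedWriter{out: out, pending: make(map[int][]byte)}
}

// WriteChunk buffers data under its chunk index, then flushes every
// buffered chunk that is contiguous with what has already been written.
func (w *orderedWriter) WriteChunk(index int, data []byte) error {
	w.mu.Lock()
	defer w.mu.Unlock()

	w.pending[index] = data
	for {
		buf, ok := w.pending[w.next]
		if !ok {
			return nil // the next chunk in sequence has not arrived yet
		}
		if _, err := w.out.Write(buf); err != nil {
			return err
		}
		delete(w.pending, w.next)
		w.next++
	}
}

// writeOutOfOrder simulates workers that finish in arbitrary order and
// returns the combined output once all of them are done.
func writeOutOfOrder(chunks []string) (string, error) {
	var sink bytes.Buffer
	w := newOrderedWriter(&sink)

	var wg sync.WaitGroup
	errs := make(chan error, len(chunks))
	for i := len(chunks) - 1; i >= 0; i-- {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if err := w.WriteChunk(i, []byte(chunks[i])); err != nil {
				errs <- err
			}
		}(i)
	}
	wg.Wait()
	close(errs)

	// Receiving from a closed, empty channel yields nil.
	if err := <-errs; err != nil {
		return "", err
	}
	return sink.String(), nil
}
```

Buffering finished chunks in a map keeps workers from blocking on each other, at the cost of holding out-of-order results in memory until the earlier chunks arrive.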

Additional Examples:

  • Rune Counter counts and prints the number of occurrences of certain runes.
  • Animal Sorter sorts .csv entries by field and divides them into separate files.
  • Split Lines replaces all occurrences of spaces with line breaks.
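
To give a feel for what such processors might look like, here is a minimal standard-library sketch of the Split Lines and Rune Counter ideas. These are not the repository's example programs: `splitLine`, `runeCounter`, and their methods are hypothetical names. To plug them into conveyor they would presumably need the `(line []byte, metadata conveyor.LineMetadata) ([]byte, error)` signature used by Redact above.

```go
package main

import (
	"bytes"
	"sync"
)

// splitLine replaces every space in line with a line break.
// Hypothetical sketch; the metadata parameter is omitted here so the
// example compiles without the conveyor package.
func splitLine(line []byte) ([]byte, error) {
	return bytes.ReplaceAll(line, []byte(" "), []byte("\n")), nil
}

// runeCounter tallies a fixed set of runes. Because several workers may
// process lines at the same time, all access to counts is guarded by mu.
type runeCounter struct {
	mu     sync.Mutex
	counts map[rune]int
}

// newRuneCounter creates a counter that tracks only the given runes.
func newRuneCounter(targets ...rune) *runeCounter {
	c := &runeCounter{counts: make(map[rune]int, len(targets))}
	for _, r := range targets {
		c.counts[r] = 0
	}
	return c
}

// Process counts the tracked runes in line and returns the line unchanged.
func (c *runeCounter) Process(line []byte) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	for _, r := range string(line) {
		if _, tracked := c.counts[r]; tracked {
			c.counts[r]++
		}
	}
	return line, nil
}

// Count returns the current tally for r, or 0 if r is not tracked.
func (c *runeCounter) Count(r rune) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[r]
}
```

A stateless processor such as `splitLine` needs no synchronization, while anything that accumulates state across lines, like `runeCounter`, has to be safe for concurrent use when more than one worker is running.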

Limitations

TODO

Logging

TODO

Performance

TODO
