
Blazing fast parallel text data pipeline for large files


DanielMcSheehy/parallel-pipeline


Parallel Pipeline

A blazing fast Go library for running text data pipelines in parallel. It can traverse and transform extremely large text files (100 GB or more) in seconds.

Usage

import "github.com/DanielMcSheehy/parallel-pipeline/pipeline"

Add some text transformations

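// Example text transformation: strips every "😀" from the input.
// Requires the standard library "strings" package in addition to the pipeline import.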
func RemoveAllSmileyFaces() *pipeline.Transformer {
	return &pipeline.Transformer{
		Transform: func(input string) string {
			return strings.ReplaceAll(input, "😀", "")
		},
	}
}
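
Several transformations can be written in the same style and applied one after another. The sketch below is self-contained and does not import the library: `Transformer` is a local stand-in that only mirrors the struct shape shown above, and `TrimWhitespace`, `Lowercase`, and `ApplyAll` are hypothetical helpers added for illustration. Applying transformers in registration order is an assumption, not confirmed behavior of the library.

```go
package main

import (
	"fmt"
	"strings"
)

// Transformer is a local stand-in mirroring the shape used in this README.
// It is NOT the library's type.
type Transformer struct {
	Transform func(input string) string
}

// RemoveAllSmileyFaces strips every "😀" from the input.
func RemoveAllSmileyFaces() *Transformer {
	return &Transformer{
		Transform: func(input string) string {
			return strings.ReplaceAll(input, "😀", "")
		},
	}
}

// TrimWhitespace (hypothetical) removes leading and trailing whitespace.
func TrimWhitespace() *Transformer {
	return &Transformer{Transform: strings.TrimSpace}
}

// Lowercase (hypothetical) converts the input to lower case.
func Lowercase() *Transformer {
	return &Transformer{Transform: strings.ToLower}
}

// ApplyAll (hypothetical) runs each transformer in the order given.
func ApplyAll(input string, transformers ...*Transformer) string {
	for _, t := range transformers {
		input = t.Transform(input)
	}
	return input
}

func main() {
	result := ApplyAll("  Hello World 😀 ",
		RemoveAllSmileyFaces(),
		TrimWhitespace(),
		Lowercase(),
	)
	fmt.Printf("%q\n", result)
}
```

Keeping each transformation a pure `string -> string` function makes it safe to run from multiple goroutines, since no state is shared between calls.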

Start the data pipeline

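func main() {
	// Placeholder values for illustration; adjust them for your setup.
	workerCount := 4              // number of parallel workers
	directory := "./input"        // input directory
	outputDirectory := "./output" // output directory

	mainPipeline := pipeline.New(workerCount)
	mainPipeline.RegisterTransformers(
		RemoveAllSmileyFaces(),
	)
	mainPipeline.Execute(directory, outputDirectory)
}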
