Tecton Parallel Retrieval (Experimental)

Tecton Parallel Retrieval is an experimental feature that retrieves feature values in parallel across multiple Databricks Spark clusters. It is currently in alpha and subject to change.

The package is published on PyPI: https://pypi.org/project/tecton-parallel-retrieval/
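It can be installed directly from PyPI:

pip install tecton-parallel-retrieval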

How to Run (Spark)

import tecton
import tecton_parallel_retrieval as retrieval

ws = tecton.Workspace('prod')
feature_service = ws.get_feature_service('my_feature_service')

# Spine of events to retrieve features for; assumes an active Spark session
spine = spark.read.parquet('s3://...')

df = feature_service.get_features_for_events(spine)

# Split the retrieval into 5 parallel dataset-generation jobs
multi_job = retrieval.start_dataset_jobs_in_parallel(
    df,
    dataset_name="tecton-parallel-retrieval-test",
    num_splits=5,
    compute_mode='spark',
    staging_path='s3://bucket/staging',  # Materialization job role should have read access
    tecton_materialization_runtime="1.0.10",
    cluster_config=tecton.EMRClusterConfig(
        instance_type='m5.8xlarge',
        spark_config={'spark.sql.shuffle.partitions': '5000'},
    ),
)

# Block until every split finishes
multi_job.wait_for_all_jobs()

# Combine the splits into a single Spark DataFrame
multi_job.to_spark()
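Assuming to_spark() returns a standard Spark DataFrame (not stated explicitly in this README), the combined result can be persisted with the usual DataFrame writer; the output path here is hypothetical:

result = multi_job.to_spark()
result.write.mode('overwrite').parquet('s3://bucket/output/')  # hypothetical output path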

How to Run (Rift)

import pandas
import tecton
import tecton_parallel_retrieval as retrieval

ws = tecton.Workspace('prod')
feature_service = ws.get_feature_service('my_feature_service')

# Spine of events to retrieve features for
spine = pandas.read_parquet('...')

df = feature_service.get_features_for_events(spine)

# Split the retrieval into 5 parallel dataset-generation jobs on Rift
multi_job = retrieval.start_dataset_jobs_in_parallel(
    df,
    dataset_name="tecton-parallel-retrieval-test",
    num_splits=5,
    compute_mode='rift',
    environment='rift-core-1.0',
    cluster_config=tecton.RiftBatchConfig(instance_type='m5.8xlarge'),
)

# Block until every split finishes
multi_job.wait_for_all_jobs()

# Combine the splits into a single pandas DataFrame
multi_job.to_pandas()
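Likewise, assuming to_pandas() returns an ordinary pandas DataFrame, a quick sanity check of the retrieved features might look like:

result = multi_job.to_pandas()
print(result.shape)   # rows x feature columns
print(result.head())  # first few retrieved rows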

How to Retrieve an Existing Parallel Dataset

import tecton
import tecton_parallel_retrieval as retrieval

ws = tecton.Workspace('prod')

# Look up a previously generated parallel dataset by name
multi_ds = retrieval.retrieve_dataset(ws, "tecton-parallel-retrieval-test")
multi_ds.to_pandas()
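Note that the name passed to retrieve_dataset must match the dataset_name used when the jobs were started ("tecton-parallel-retrieval-test" in the examples above).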

How to Build and Upload

To generate a .whl file, run the following in your terminal:

pip install --upgrade build
python -m build
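By default, python -m build writes both a source distribution and a wheel to the dist/ directory.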

The following commands upload the package to PyPI. Before uploading, make sure you update the version in setup.py, as well as the dependency versions in setup.py and the hardcoded Databricks job dependencies in main.py:

pip install --upgrade twine
twine upload --repository pypi dist/*
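twine prompts for PyPI credentials unless they are configured in ~/.pypirc; an API token can be supplied with the username __token__.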
