Create Dataset, upload data to it and use it in Workflow
This tutorial walks through preparing data by creating a dataset, then creating a workflow in Texera to analyze the data residing in that dataset.
More specifically, we are going to create a dataset named Sales Dataset that contains one file, CountrySalesData.csv, holding sales data for different types of merchandise across several countries. The workflow will calculate the average sales per item type across the countries in Europe. Make sure the downloaded file has the .csv extension. The sales data was downloaded from eforexcel.com and has 100 rows.
We will first create a dataset and upload the sales data to it. Then we will create a workflow in the Texera Web UI to
- read the data from the file;
- filter the relevant data based on keywords;
- perform an aggregation.
1. Upload data by creating a Dataset
- Go to the Dataset tab and click the dataset creation icon to start creating the dataset.
- Name the dataset Sales Dataset, then drag and drop CountrySalesData.csv into the file upload area.
- Click Create. The dataset we just created is shown, along with a preview of CountrySalesData.csv.
2. Read data in Workflow
- On the left panel, go to the environment tab and click Add Dataset to add Sales Dataset to the current workflow. CountrySalesData.csv then becomes available to preview and load into the workflow.
- Drag and drop a CSV File Scan operator. On the right panel, enter the file name CountrySalesData.csv and select the path from the drop-down menu.
- Run the workflow. You should be able to see the loaded sales data.
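To make the scan step concrete, here is a minimal Python sketch of what reading a CSV file with a header row amounts to. It is not Texera's implementation. The helper name `scan_csv`, the sample rows, and the column names (`Region`, `Country`, `Item Type`, `Units Sold`) are assumptions made for illustration. Check the preview of CountrySalesData.csv for the real headers.

```python
import csv
import io

# Hypothetical sample rows. The real CountrySalesData.csv has 100 rows, and
# the column names used here are assumptions, not taken from the tutorial.
SAMPLE_CSV = """Region,Country,Item Type,Units Sold
Europe,France,Cosmetics,100
Europe,Germany,Cosmetics,300
Asia,Japan,Cosmetics,500
Europe,Spain,Snacks,50
"""


def scan_csv(text):
    """Parse CSV text with a header row into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


rows = scan_csv(SAMPLE_CSV)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Each row becomes a mapping from column name to a string value, which is roughly the tabular shape you see in the result panel after running the workflow.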
3. Add operators to analyze data
- Drag and drop a Filter operator to keep only the sales data for Europe.
- Drag and drop an Aggregate operator to compute the average units sold, grouped by Item Type.
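The logic of these two operators can be sketched in plain Python as a filter followed by a grouped average. This only illustrates the computation, not how Texera executes it. The helpers `filter_rows` and `average_by`, the sample rows, and the column names `Region` and `Units Sold` are assumptions.

```python
from collections import defaultdict

# Hypothetical rows standing in for the scanned sales data. The column
# names "Region" and "Units Sold" are assumptions about the CSV headers.
rows = [
    {"Region": "Europe", "Item Type": "Cosmetics", "Units Sold": "100"},
    {"Region": "Europe", "Item Type": "Cosmetics", "Units Sold": "300"},
    {"Region": "Asia", "Item Type": "Cosmetics", "Units Sold": "500"},
    {"Region": "Europe", "Item Type": "Snacks", "Units Sold": "50"},
]


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value` (the Filter step)."""
    return [r for r in rows if r[column] == value]


def average_by(rows, group_col, value_col):
    """Average `value_col` within each `group_col` group (the Aggregate step)."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for r in rows:
        acc = totals[r[group_col]]
        acc[0] += float(r[value_col])
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


europe = filter_rows(rows, "Region", "Europe")
averages = average_by(europe, "Item Type", "Units Sold")
print(averages)
```

Filtering before aggregating matters here: the Asia row is dropped first, so it does not affect the Cosmetics average.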
Copyright © 2025 The Apache Software Foundation.