Guide for Developers
Install Java JDK 11 (Java Development Kit) (recommend: [adoptopenjdk](https://adoptium.net/installation/)). To verify the installation, run:
java -version
Next, set JAVA_HOME. On macOS you can run:
export JAVA_HOME=$(/usr/libexec/java_home -v 11)
On Windows, add a system environment variable called JAVA_HOME that points to the JDK directory.
Install Python 3.12 from the official site or your preferred package manager.
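The Java and Python requirements above can be checked with a small script. This is a minimal sketch, not part of the Texera repository: the helper names `java_major_version` and `python_is_supported` are hypothetical, and the parser assumes the usual `version "…"` string printed by `java -version`.

```python
import os
import re
import subprocess
import sys
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as "1.8", "1.7", ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def python_is_supported(version_info=sys.version_info, required=(3, 12)) -> bool:
    """True when the interpreter's major.minor pair equals the required pair."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    try:
        # `java -version` writes to stderr, so both streams are inspected.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        print("Java major version:", java_major_version(result.stderr + result.stdout))
    except FileNotFoundError:
        print("java was not found on PATH")
    print("JAVA_HOME:", os.environ.get("JAVA_HOME", "<not set>"))
    print("Python 3.12:", python_is_supported())
```

A Java major version other than 11, or an unset JAVA_HOME, suggests revisiting the steps above before continuing.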
Install Git. On Windows, install it from https://gitforwindows.org/. Git Bash is available after installing Git.
On Mac and Linux, see https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
Verify the installation by:
git --version
Install sbt for building the project. Please refer to the "Installing sbt" page of the sbt Reference Manual. We recommend using sdkman to install sbt.
Verify the installation by:
sbt --version
If the above command fails on Windows after installation, it is recommended to restart your computer.
Install an LTS version of Node.js (not the latest release). Currently, we require an LTS version > 18.x.
On Windows, install from https://nodejs.org/en/.
On Mac and Linux, use NVM to install NodeJS as it avoids permission issues.
Verify the installation by:
node -v
Install the Angular 16 CLI globally:
npm install -g @angular/cli@16
Verify the installation by:
ng version
In the terminal, clone the Texera repo:
git clone git@github.com:Texera/texera.git
Make the following changes to the configuration files:
- Edit common/config/src/main/resources/storage.conf to use your Postgres credentials.
jdbc {
- username = "postgres"
+ username = <Postgres username you have>
username = ${?STORAGE_JDBC_USERNAME}
- password = "postgres"
+ password = <Postgres password you have>
password = ${?STORAGE_JDBC_PASSWORD}
}
- Edit common/config/src/main/resources/udf.conf to use the correct Python executable path (can be obtained with the command which python or where python):
python {
- path =
+ path = "/the/executable/path/of/python"
}
Texera uses PostgreSQL to manage the user data and system metadata. To install and configure it:
Install Postgres. If you are using Mac, simply execute:
brew install postgresql
Install PGroonga to enable full-text search. If you are using Mac, simply execute:
brew install pgroonga
Execute sql/texera_ddl.sql to create the texera_db database, which stores user data and system metadata.
Execute sql/iceberg_postgres_catalog.sql to create the database for storing Iceberg catalogs.
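The guide does not say how to execute the two SQL scripts. One option is the `psql` command-line client, sketched below. The assumptions are that `psql` is on your PATH, that the script is run from the repository root, and that the credentials follow the same rule as storage.conf (the STORAGE_JDBC_USERNAME and STORAGE_JDBC_PASSWORD environment variables win when set, otherwise "postgres"). The helper names are hypothetical.

```python
import os
import subprocess
from typing import List, Mapping, Tuple

SQL_SCRIPTS = ["sql/texera_ddl.sql", "sql/iceberg_postgres_catalog.sql"]


def jdbc_credentials(env: Mapping[str, str]) -> Tuple[str, str]:
    """Mirror storage.conf: the environment variable wins when it is set."""
    username = env.get("STORAGE_JDBC_USERNAME", "postgres")
    password = env.get("STORAGE_JDBC_PASSWORD", "postgres")
    return username, password


def psql_command(script: str, username: str) -> List[str]:
    """Build a psql call that runs one script and stops on the first error."""
    return ["psql", "-U", username, "-v", "ON_ERROR_STOP=1", "-f", script]


if __name__ == "__main__":
    user, password = jdbc_credentials(os.environ)
    # psql reads PGPASSWORD, so the password stays off the command line.
    child_env = {**os.environ, "PGPASSWORD": password}
    for script in SQL_SCRIPTS:
        subprocess.run(psql_command(script, user), env=child_env, check=True)
```

If your Postgres server uses a non-default host or port, psql's `-h` and `-p` options would need to be added to the command.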
Texera requires LakeFS and S3 (Minio is one of the implementations) as the dataset storage. Both storage services must be set up locally for Texera's dataset feature to work.
Install Docker Desktop, which contains both Docker Engine and Docker Compose. Make sure you launch Docker after installing it.
In the terminal, enter the directory containing the docker-compose file:
cd file-service/src/main/resources
Edit docker-compose.yml: search for volumes in the file and follow the instructions in the comment. This step is required; otherwise your data will be lost if the containers are deleted.
Execute the following command to start LakeFS and Minio:
docker compose up
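Once the containers are up, a quick TCP probe can confirm that both services accept connections. This is a rough sketch: the ports 8000 (LakeFS) and 9000 (Minio) are the upstream defaults and are an assumption here, so confirm them against docker-compose.yml.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed default ports; check docker-compose.yml for the real mapping.
ASSUMED_PORTS = {"LakeFS": 8000, "Minio": 9000}

if __name__ == "__main__":
    for name, port in ASSUMED_PORTS.items():
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

An open port only shows that something is listening; it does not prove the service finished initializing.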
Before you import the project, you need to have the "Scala" and "SBT Executor" plugins installed in IntelliJ.

- In IntelliJ, open File -> New -> Project From Existing Source, then choose the texera folder.
- In the next window, select Import Project from external model, then select sbt.
- In the next window, make sure Project JDK is set. Click OK.
- IntelliJ should import and build this Scala project. In the terminal under texera, run:
sbt clean protocGenerate
This generates the code specified by the proto files, after which IntelliJ indexing should start. Wait until indexing and importing are complete. You can then open the sbt tab on the right and check the loaded texera project and its sub-projects.
- When IntelliJ prompts "Scalafmt configuration detected in this project" in the bottom right corner, select "Enable". If you missed the IntelliJ prompt, you can check the Event Log on the bottom right.
- In addition to the microservices, you need to run the JooqCodeGenerator located at common/dao/src/main/scala/edu/uci/ics/texera/dao/JooqCodeGenerator.scala before starting the microservices for the first time, or each time you make changes to the database.
The easiest way to run the backend services is in IntelliJ. Currently we have several microservices for different purposes. If a microservice fails after starting, it may depend on another microservice, so wait for the others to start. Also make sure the LakeFS docker compose setup is running:
| Component | File Path | Purpose / Functionality |
|---|---|---|
| ConfigService | config-service/src/main/scala/edu/uci/ics/texera/service/ConfigService.scala | Hosts the system configurations to allow the frontend to retrieve configuration data. |
| TexeraWebApplication | amber/src/main/scala/edu/uci/ics/texera/web/TexeraWebApplication.scala | Provides user login, community resource read/write operations, and loads metadata for available operators. |
| FileService | file-service/src/main/scala/edu/uci/ics/texera/service/FileService.scala | Provides dataset-related endpoints including dataset management, access control, and read/write operations across datasets. |
| WorkflowCompilingService | workflow-compiling-service/src/main/scala/edu/uci/ics/texera/service/WorkflowCompilingService.scala | Propagates schema and checks for static errors during workflow construction. |
| ComputingUnitMaster | amber/src/main/scala/edu/uci/ics/texera/web/ComputingUnitMaster.scala | Manages workflow execution and acts as the master node of the computing cluster. Must start before ComputingUnitWorker. |
| ComputingUnitWorker | amber/src/main/scala/edu/uci/ics/texera/web/ComputingUnitWorker.scala | A worker node in the computing cluster (not a web server). |
| ComputingUnitManagingService | computing-unit-managing-service/src/main/scala/edu/uci/ics/texera/service/ComputingUnitManagingService.scala | Manages the lifecycle of different types of computing units and their connections to users’ frontends. |
| AccessControlService | access-control-service/src/main/scala/org/apache/texera/service/AccessControlService.scala | Authorizes requests sent to the computing unit. It does not need to run for local development; it is only used in the Kubernetes setup. |
To run each of the above web services, go to the corresponding Scala file (e.g., for TexeraWebApplication, find TexeraWebApplication.scala), then run the main function by pressing the green run button and wait for the process to start up.
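The only ordering constraint stated in the table is that ComputingUnitMaster must start before ComputingUnitWorker. The sketch below records that constraint and derives one valid start order with Python's standard `graphlib` module. The other services are listed without prerequisites because the table documents none, which is an assumption, and AccessControlService is left out because it is not needed locally.

```python
from graphlib import TopologicalSorter

# Each key maps to the services that must already be running before it starts.
# Only the master -> worker edge comes from the table; the rest are assumed
# to have no prerequisites.
PREREQUISITES = {
    "ConfigService": set(),
    "TexeraWebApplication": set(),
    "FileService": set(),
    "WorkflowCompilingService": set(),
    "ComputingUnitMaster": set(),
    "ComputingUnitWorker": {"ComputingUnitMaster"},
    "ComputingUnitManagingService": set(),
}


def start_order(prerequisites):
    """Return one valid start order that respects every prerequisite."""
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for position, service in enumerate(start_order(PREREQUISITES), start=1):
        print(f"{position}. {service}")
```

If you discover further dependencies while starting the services, adding them to `PREREQUISITES` keeps the order consistent.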
For TexeraWebApplication, the following message indicates that it is successfully running:
[main] [akka.remote.Remoting] Remoting now listens on addresses:
org.eclipse.jetty.server.Server: Started
- If IntelliJ displays "CreateProcess error=206, the filename or extension is too long", add -Didea.dynamic.classpath=true in Help | Edit Custom VM Options and restart the IDE.
For ComputingUnitMaster, the following prompt indicates that it is successfully running:
---------Now we have 1 node in the cluster---------
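If you save console output to a file, the startup messages quoted above can be detected automatically. A minimal sketch follows; the marker strings are copied from the messages above, while the function name and the idea of scanning saved logs are assumptions rather than anything Texera ships.

```python
from typing import Iterable

# Substrings taken from the startup messages quoted above.
READY_MARKERS = {
    "TexeraWebApplication": (
        "Remoting now listens on addresses",
        "org.eclipse.jetty.server.Server: Started",
    ),
    "ComputingUnitMaster": ("Now we have 1 node in the cluster",),
}


def is_ready(service: str, log_lines: Iterable[str]) -> bool:
    """True when every marker for the service appears somewhere in the log."""
    lines = list(log_lines)
    return all(
        any(marker in line for line in lines)
        for marker in READY_MARKERS[service]
    )


if __name__ == "__main__":
    sample = ["---------Now we have 1 node in the cluster---------"]
    print("ComputingUnitMaster ready:", is_ready("ComputingUnitMaster", sample))
```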
Texera has many Python-based operators, such as visualization operators and UDF operators. You also need R installed on your system. To enable these operators, install the Python dependencies by executing:
cd texera
pip install -r amber/requirements.txt -r amber/operator-requirements.txt -r amber/r-requirements.txt
This is for developers that work on the frontend part of the project. This step is NOT needed if you develop the backend only.
Before you start, make sure the backend services are all running.
cd frontend
yarn install
Ignore any warnings (they are usually marked in yellow or start with WARN).
- Click on the green Run button next to start in frontend/package.json.
- Wait for some time and the server will get started. Open a browser and access http://localhost:4200. You should see the Texera UI with a canvas.
Every time you save the changes to the frontend code, the browser will automatically refresh to show the latest UI.
Run the following command
yarn run build
This command will optimize the frontend code to make it run faster. This step will take a while. After that, start the backend engine in IntelliJ and use your browser to access http://localhost:8080.
- Set smtp in config/src/main/resources/user-system.conf. You need an App password if the account has 2FA.
- Log in to Texera with an admin account.
- Open the Gmail dashboard under the admin tab.
- Send a test email.
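If the test email fails, it can help to check the Gmail account and App password outside Texera first. The sketch below uses only Python's standard `smtplib` and `email` modules. It is independent of Texera's own mail code, and the function names, subject line, and the use of smtp.gmail.com on port 465 are assumptions for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_test_email(sender: str, recipient: str) -> EmailMessage:
    """Create a small plain-text message for checking SMTP credentials."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Texera SMTP test"
    message.set_content("If you can read this, the SMTP credentials work.")
    return message


def send_via_gmail(message: EmailMessage, username: str, app_password: str) -> None:
    """Send the message through Gmail's implicit-TLS SMTP endpoint."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, app_password)
        server.send_message(message)
```

Calling `send_via_gmail(build_test_email(me, me), me, app_password)` with your own address should deliver a message to yourself; an authentication error at this point indicates a credentials problem rather than a Texera problem.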
This part is optional; you only need to do this if you are working on a specific task.
- Create the needed new table in PostgreSQL and update sql/texera_ddl.sql to include the new table.
- Run common/dao/src/main/scala/edu/uci/ics/texera/dao/JooqCodeGenerator.scala to generate the classes for the new table.
Note: jOOQ creates DAOs for simple operations. If the requested SQL query is complex, the developer can use the generated Table classes to implement the operation.
Edit config/src/main/resources/gui.conf, change local-login to false.
Edit config/src/main/resources/user-system.conf, change invite-only to true.
There are two types of permissions for the backend endpoints:
- @RolesAllowed(Array("Role"))
- @PermitAll
Please don't leave the permission setting blank. If the permission is missing for an endpoint, it will be @PermitAll by default.
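Because a missing annotation silently falls back to @PermitAll, a simple scan of the Scala sources can flag endpoints that have no explicit permission. The following is a heuristic sketch, not an existing Texera tool. It assumes one annotation per line directly above each `def`, recognizes endpoints by the JAX-RS verb annotations (@GET, @POST, @PUT, @DELETE, @PATCH), and ignores class-level annotations.

```python
import re
from typing import List

HTTP_VERBS = re.compile(r"@(GET|POST|PUT|DELETE|PATCH)\b")
PERMISSION = re.compile(r"@(RolesAllowed|PermitAll)\b")
METHOD = re.compile(r"\bdef\s+(\w+)")


def endpoints_missing_permission(scala_source: str) -> List[str]:
    """Names of endpoint methods whose annotation block lacks a permission."""
    missing: List[str] = []
    annotations: List[str] = []
    for raw_line in scala_source.splitlines():
        line = raw_line.strip()
        if line.startswith("@"):
            annotations.append(line)
            continue
        method = METHOD.search(line)
        if method:
            block = " ".join(annotations)
            if HTTP_VERBS.search(block) and not PERMISSION.search(block):
                missing.append(method.group(1))
        if line:
            # Annotations only apply to the declaration that follows them.
            annotations = []
    return missing
```

Running the function over the text of each resource file and printing any returned names gives a quick review list; treat the results as hints, since the heuristic can miss unusual formatting.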
Some workflows create deep directories (e.g., when writing metadata.json via Python/ICEBERG). On Windows, this can exceed the legacy MAX_PATH (~260 chars) and cause failures like:
[WinError 3] The system cannot find the path specified.
Enable long paths support (per machine) by running PowerShell as Administrator:
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -Force
Verify the setting (expected value: 1):
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled"
If you cannot change this policy (e.g., on managed devices), keep your workspace path short (e.g., C:\src\texera) to reduce overall path length.
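To see whether a workspace is close to the limit, you can list the files whose full path is already long. A minimal sketch, assuming the legacy limit of about 260 characters mentioned above; the function name is hypothetical.

```python
import os
from typing import List

LEGACY_MAX_PATH = 260  # approximate legacy Windows limit mentioned above


def paths_over_limit(root: str, limit: int = LEGACY_MAX_PATH) -> List[str]:
    """Return file paths under `root` whose length is at least `limit`."""
    too_long = []
    for directory, _subdirs, files in os.walk(os.path.abspath(root)):
        for name in files:
            full_path = os.path.join(directory, name)
            if len(full_path) >= limit:
                too_long.append(full_path)
    return sorted(too_long)


if __name__ == "__main__":
    for path in paths_over_limit("."):
        print(len(path), path)
```

Passing a lower `limit` (for example 200) shows which paths are approaching the limit before they fail.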
On Windows, if you encounter the following error when executing a workflow:
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset
here are the steps to solve this issue:
Steps
- Obtain a winutils.exe matching your Hadoop line (Texera currently uses Hadoop 3.3.x).
  - Suggested source (use any equivalent source approved for your environment): https://github.com/cdarlint/winutils/tree/master/hadoop-3.3.5/bin
- Create the directory and place the binary: C:\hadoop\bin\winutils.exe
- In IntelliJ, add this VM option to the FileService run configuration: -Dhadoop.home.dir="C:\hadoop"
- (Optional) Also set a system environment variable and restart the IDE/terminal: HADOOP_HOME=C:\hadoop
Notes
- This issue may happen only on Windows; macOS/Linux do not need winutils.exe.
- Ensure the winutils.exe you use matches your Hadoop major/minor (e.g., 3.3.x).
- After configuring, the prior read/write and “unset” errors should disappear.
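A quick way to confirm that the binary is where the steps above put it is sketched below. It only looks at the HADOOP_HOME environment variable, falling back to C:\hadoop. It cannot see the -Dhadoop.home.dir VM option, which lives in the IntelliJ run configuration. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def find_winutils(env: Mapping[str, str], default_home: str = r"C:\hadoop") -> Optional[str]:
    """Return the path to bin/winutils.exe under HADOOP_HOME, if the file exists."""
    hadoop_home = env.get("HADOOP_HOME", default_home)
    candidate = os.path.join(hadoop_home, "bin", "winutils.exe")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    location = find_winutils(os.environ)
    print("winutils.exe:", location if location else "not found")
```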
Copyright © 2025 The Apache Software Foundation.