salesforce · aadyotb · Nov 8, 2022 · Oct 25, 2022 · Oct 26, 2022 · Oct 27, 2022
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -42,15 +42,15 @@ jobs:
         timeout_minutes: 40
         command: |
           # Get a comma-separated list of the directories of all python source files
-          source_files=$(for f in $(find merlion -iname "*.py"); do echo -n ",$f"; done)
-          script="import os; print(','.join({os.path.dirname(f) for f in '$source_files'.split(',') if f}))"
+          files=$(for f in $(find merlion -iname "*.py"); do echo -n ",$f"; done)
+          script="import os; print(','.join({os.path.dirname(f) for f in '$files'.split(',') if f and 'dashboard' not in f}))"
           source_modules=$(python -c "$script")
 
           # Run tests & obtain code coverage from coverage report.
           coverage run --source=${source_modules} -L -m pytest -v -s
           coverage report && coverage xml -o .github/badges/coverage.xml
           COVERAGE=`coverage report | grep "TOTAL" | grep -Eo "[0-9\.]+%"`
-          echo "##[set-output name=coverage;]${COVERAGE}"
+          echo "coverage=${COVERAGE}" >> "$GITHUB_OUTPUT"
 
           # Choose a color based on code coverage
           COVERAGE=${COVERAGE/\%/}
@@ -65,7 +65,7 @@ jobs:
           else
             COLOR=red
           fi
-          echo "##[set-output name=color;]${COLOR}"
+          echo "color=${COLOR}" >> "$GITHUB_OUTPUT"
 
     - name: Create coverage badge
       if: ${{ github.ref == 'refs/heads/main' && matrix.python-version == '3.10' }}
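
Note on the `script=` change above: the inline one-liner collects the parent directory of every Python file under `merlion` and now skips any path containing `dashboard`, so the dashboard package is left out of the coverage sources. A minimal standalone sketch of the same filter follows. The helper name `source_modules` and the sample paths are hypothetical, and the result is sorted here for a deterministic order, whereas the workflow joins an unordered set.

```python
import os


def source_modules(files, exclude="dashboard"):
    """Return the sorted, de-duplicated parent directories of `files`,
    skipping empty entries and any path that contains `exclude`."""
    dirs = {os.path.dirname(f) for f in files if f and exclude not in f}
    return sorted(dirs)


if __name__ == "__main__":
    # Hypothetical file list standing in for the output of `find merlion -iname "*.py"`.
    files = [
        "merlion/utils/misc.py",
        "merlion/models/base.py",
        "merlion/models/factory.py",
        "merlion/dashboard/server.py",
    ]
    # Comma-joined, in the form the workflow passes to `coverage run --source=...`.
    print(",".join(source_modules(files)))
```

Because the check is a plain substring match on the full path, any file whose path merely contains the word `dashboard` is excluded as well.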

diff --git a/README.md b/README.md
@@ -77,8 +77,9 @@ time series as ``pandas.DataFrame`` s with accompanying metadata.
 You can install `merlion` from PyPI by calling ``pip install salesforce-merlion``. You may install from source by
 cloning this repoand calling ``pip install Merlion/``, or ``pip install -e Merlion/`` to install in editable mode.
 You may install additional dependencies via ``pip install salesforce-merlion[all]``,  or by calling
-``pip install "Merlion/[all]"`` if installing from source. Individually, the optional dependencies include ``plot``
-for interactive plots and ``deep-learning`` for all deep learning models.
+``pip install "Merlion/[all]"`` if installing from source. 
+Individually, the optional dependencies include ``dashboard`` for a GUI dashboard,
+``spark`` for a distributed computation backend with PySpark, and ``deep-learning`` for all deep learning models.
 
 To install the data loading package `ts_datasets`, clone this repo and call ``pip install -e Merlion/ts_datasets/``.
 This package must be installed in editable mode (i.e. with the ``-e`` flag) if you don't want to manually specify the
@@ -107,10 +108,23 @@ and presents experimental results on time series anomaly detection & forecasting
 time series.
 
 ## Getting Started
-Here, we provide some minimal examples using Merlion default models, 
-to help you get started with both anomaly detection and forecasting.
+The easiest way to get started is to use the web-based GUI
+[dashboard](https://opensource.salesforce.com/Merlion/merlion.dashboard.html),
+which lets you quickly experiment with many models on your own custom datasets.
+To use it, install Merlion with the optional ``dashboard`` dependency (i.e.
+``pip install salesforce-merlion[dashboard]``), and call ``python -m merlion.dashboard`` from the command line.
+You can view the dashboard at http://localhost:8050.
+Below, we show some screenshots of the dashboard for both anomaly detection and forecasting.
+
+![anomaly dashboard](https://github.com/salesforce/Merlion/raw/main/docs/source/_static/dashboard_anomaly.png)
+
+![forecast dashboard](https://github.com/salesforce/Merlion/raw/main/docs/source/_static/dashboard_forecast.png)
+
+To help you get started with using Merlion in your own code, the minimal examples below use Merlion's
+default models for both anomaly detection and forecasting.
 
 ### Anomaly Detection
+Here, we show the code to replicate the results from the anomaly detection dashboard above.
 We begin by importing Merlion’s `TimeSeries` class and the data loader for the Numenta Anomaly Benchmark `NAB`.
 We can then divide a specific time series from this dataset into training and testing splits.
 
@@ -164,6 +178,7 @@ Precision: 0.6667, Recall: 0.6667, F1: 0.6667
 Mean Time To Detect: 1 days 10:30:00
 ```
 ### Forecasting
+Here, we show the code to replicate the results from the forecasting dashboard above.
 We begin by importing Merlion’s `TimeSeries` class and the data loader for the `M4` dataset. We can then divide a
 specific time series from this dataset into training and testing splits.
 
@@ -215,7 +230,7 @@ msis = ForecastMetric.MSIS.value(ground_truth=test_data, predict=test_pred,
 print(f"sMAPE: {smape:.4f}, MSIS: {msis:.4f}")
 ```
 ```
-sMAPE: 6.2855, MSIS: 19.1584
+sMAPE: 4.1944, MSIS: 18.9331
 ```
 
 ## Evaluation and Benchmarking

diff --git a/docker/dashboard/Dockerfile b/docker/dashboard/Dockerfile
@@ -0,0 +1,15 @@
+FROM python:3.9-slim
+WORKDIR /opt/Merlion
+# Install Java
+RUN rm -rf /var/lib/apt/lists/* && \
+    apt-get clean && \
+    apt-get update && \
+    apt-get upgrade -y && \
+    apt-get install -y --no-install-recommends openjdk-11-jre-headless && \
+    rm -rf /var/lib/apt/lists/*
+# Install Merlion from source & set up a gunicorn server
+COPY *.md ./
+COPY setup.py ./
+COPY merlion merlion
+RUN pip install gunicorn "./[dashboard]"
+CMD ["gunicorn", "-b", "0.0.0.0:80", "merlion.dashboard.server:server"]
diff --git a/Dockerfile b/docker/spark-on-k8s/Dockerfile
@@ -15,5 +15,3 @@ RUN pip install pyarrow "./"
 COPY apps /opt/spark/apps
 RUN chmod g+w /opt/spark/apps
 USER ${spark_uid}
-COPY emissions.csv emissions.csv
-COPY emissions.json emissions.json
diff --git a/docs/source/_static/dashboard_anomaly.png b/docs/source/_static/dashboard_anomaly.png
diff --git a/docs/source/_static/dashboard_file.png b/docs/source/_static/dashboard_file.png
diff --git a/docs/source/_static/dashboard_forecast.png b/docs/source/_static/dashboard_forecast.png
diff --git a/docs/source/examples b/docs/source/examples
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -30,8 +30,8 @@ You can install ``merlion`` from PyPI by calling ``pip install salesforce-merlio
 cloning the Merlion `repo <https://github.com/salesforce/Merlion>`__ and calling ``pip install Merlion/``, or
 ``pip install -e Merlion/`` to install in editable mode. You may install additional optional dependencies via
 ``pip install salesforce-merlion[all]``,  or by calling ``pip install "Merlion/[all]"`` if installing from source.
-Individually, the optional dependencies include ``plot`` for interactive plots
-and ``deep-learning`` for all deep learning models.
+Individually, the optional dependencies include ``dashboard`` for a GUI dashboard,
+``spark`` for a distributed computation backend with PySpark, and ``deep-learning`` for all deep learning models.
 
 To install the data loading package ``ts_datasets``, clone the Merlion
 `repo <https://github.com/salesforce/Merlion>`__ and call ``pip install -e Merlion/ts_datasets/``. This package must be
@@ -59,7 +59,13 @@ Note the following external dependencies:
 
 Getting Started
 ---------------
-To get started, we recommend the linked tutorials on `anomaly detection <tutorials/anomaly/0_AnomalyIntro>`
+The easiest way to get started is to use the web-based GUI `dashboard <merlion.dashboard>`,
+which lets you quickly experiment with many models on your own custom datasets.
+To use it, install Merlion with the optional ``dashboard`` dependency (i.e.
+``pip install salesforce-merlion[dashboard]``), and call ``python -m merlion.dashboard`` from the command line.
+You can view the dashboard at http://localhost:8050.
+
+For code resources, we recommend the linked tutorials on `anomaly detection <tutorials/anomaly/0_AnomalyIntro>`
 and `forecasting <tutorials/forecast/0_ForecastIntro>`. After that, you should read in more detail about Merlion's
 main data structure for representing time series `here <tutorials/TimeSeries>`.
 

diff --git a/docs/source/merlion.dashboard.rst b/docs/source/merlion.dashboard.rst
@@ -0,0 +1,78 @@
+merlion.dashboard package
+=========================
+
+This package includes a GUI dashboard app for Merlion, providing a convenient way to train
+and test a time series forecasting or anomaly detection model supported in Merlion. To launch
+the dashboard app, type the following command: ``python -m merlion.dashboard``.
+
+This launches a Dash app at http://localhost:8050/ by default. When you open the link, the app
+creates a folder ``merlion`` in your home directory. This folder holds the datasets you want to
+analyze or train a model with (in the ``data`` subfolder), and the trained models for time series
+forecasting or anomaly detection (in the ``models`` subfolder).
+
+The app has three tabs. The first, "file manager", lets you upload your datasets
+(which are stored in ``~/merlion/data``), check basic statistics of the datasets, visualize
+the time series data, or download a particular trained model:
+
+.. image:: _static/dashboard_file.png
+
+You can click "Drag & Drop" to upload a file to the ``merlion`` folder (the app is designed to support
+Docker deployment, so it does not open local files directly). If you run the app on a local
+machine, you can also copy the data to ``~/merlion/data`` directly. Data files must be in
+CSV format, where the first column contains either integer Unix timestamps in milliseconds or datetimes in a
+string format (e.g., "1970-01-01 00:00:00"). The other columns are the features/variables.
+
+Clicking the load button loads the dataset and shows the time series figure on the right-hand side,
+along with some basic statistics, e.g., the time series length and the mean/std of each variable.
+If you have already trained a model using the dashboard, you can select the model you want to download
+and click the download button. The model and its configuration file will be compressed into a zip file.
+
+The second tab is used to train a time series anomaly detection model:
+
+.. image:: _static/dashboard_anomaly.png
+
+The app provides full support for these models: you can choose different algorithms and set their parameters
+according to your needs. To train a model, you need to:
+
+- **Select the dataset**: If you have no separate test dataset, select a single training dataset and choose
+  a train/test split fraction for splitting it into training and test datasets for evaluation.
+  If you have a separate test dataset, choose "Separate train/test files" and select the test dataset;
+  the model will then be trained on the training dataset and evaluated on the separate test dataset.
+  The screenshot above uses a single data file, where we use the first 15% for training and the last 85% for testing.
+- **Set the feature columns**: Merlion supports both univariate and multivariate time series anomaly detection,
+  so you can choose one or more features on which to train an anomaly detection model.
+- **Set the label column**: If the dataset has a label column, you can set it for evaluation. Otherwise,
+  ignore this setting.
+- **Select an anomaly detection algorithm**: You need to choose an anomaly detection algorithm such as
+  IsolationForest. You may modify the model's hyperparameters if the default values do not work well.
+- **Set threshold parameters**: You can also test different settings for the detection threshold to
+  determine which value is better for your specific application. Note that updating the threshold will
+  not re-train the entire model; it will simply change the post-processing applied by the trained model.
+
+Training begins when you click the train button, and the trained model is saved in the
+folder ``~/merlion/models/algorithm_name``. The figure on the right-hand side shows the detection results
+on the test dataset, and the tables show the training and testing performance metrics if you set the
+label column.
+
+The third tab is used to train a time series forecasting model supported in Merlion:
+
+.. image:: _static/dashboard_forecast.png
+
+The app provides full support for these models: you can choose different algorithms and set their parameters
+according to your needs. To train a model, you need to:
+
+- **Select the dataset**: If you have no separate test dataset, select a single training dataset and choose
+  a train/test split fraction for splitting it into training and test datasets for evaluation.
+  If you have a separate test dataset, choose "Separate train/test files" and select the test dataset;
+  the model will then be trained on the training dataset and evaluated on the separate test dataset.
+  The screenshot above uses separate train/test files.
+- **Set the target column**: You need to set the target column whose value you wish to forecast (required),
+  any additional features to use for `multivariate forecasting <tutorials/forecast/2_ForecastMultivariate>` (optional),
+  and the `exogenous variables <tutorials/forecast/3_ForecastExogenous>` whose values are known a priori (optional).
+- **Select a forecasting algorithm**: Finally, you need to choose a forecasting algorithm such as
+  Arima or AutoETS. You may modify the model's hyperparameters if the default values do not work well.
+
+Training begins when you click the train button, and it may take some time to finish.
+Once the model is trained, the model files are saved in the folder ``~/merlion/models/algorithm_name``.
+The figure on the right-hand side shows the forecasting results on the test dataset, and the tables
+show the training and testing performance metrics.
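
Note on the data format described in `merlion.dashboard.rst` above: the first CSV column holds timestamps and the remaining columns hold features. A minimal sketch of producing such a file with the standard library follows. The helper name `write_dashboard_csv`, the header name `timestamp`, the feature names, and the file name are all hypothetical, and a temporary directory stands in for `~/merlion/data`. The sketch assumes every feature column has the same length.

```python
import csv
import os
import tempfile
from datetime import datetime, timedelta


def write_dashboard_csv(path, start, step, columns):
    """Write a CSV whose first column holds datetime strings such as
    "1970-01-01 00:00:00" and whose other columns hold one feature each.
    Assumes all feature columns have the same length."""
    names = list(columns)
    n = len(columns[names[0]])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # The header name "timestamp" is an assumption, not taken from the docs.
        writer.writerow(["timestamp"] + names)
        for i in range(n):
            ts = (start + i * step).strftime("%Y-%m-%d %H:%M:%S")
            writer.writerow([ts] + [columns[name][i] for name in names])
    return path


if __name__ == "__main__":
    # A temporary directory stands in for ~/merlion/data in this sketch.
    out_dir = tempfile.mkdtemp()
    path = write_dashboard_csv(
        os.path.join(out_dir, "example.csv"),
        start=datetime(1970, 1, 1),
        step=timedelta(hours=1),
        columns={"temperature": [20.5, 21.0, 19.8], "humidity": [0.41, 0.43, 0.40]},
    )
    with open(path) as f:
        print(f.read())
```

The datetime-string form is used here because it is easy to inspect by eye. Integer Unix timestamps in milliseconds are the other first-column format the documentation lists.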
diff --git a/docs/source/merlion.rst b/docs/source/merlion.rst
@@ -17,6 +17,9 @@ each associated with its own sub-package:
         detection and forecasting.
     -   :py:mod:`merlion.models.automl`: AutoML layers for various models
 
+-   :py:mod:`merlion.dashboard`: A GUI dashboard app for Merlion, which can be started with
+    ``python -m merlion.dashboard``. This dashboard provides a good way to quickly experiment with many models
+    on a new time series.
 -   :py:mod:`merlion.spark`: APIs to integrate Merlion with PySpark for using distributed computing to run training
     and inference on multiple time series in parallel.
 -   :py:mod:`merlion.transform`: Data pre-processing layer which implements many standard data transformations used in
@@ -55,6 +58,11 @@ Subpackages
    :maxdepth: 4
 
    merlion.models
+
+.. toctree::
+   :maxdepth: 2
+
+   merlion.dashboard
    merlion.spark
    merlion.transform
    merlion.post_process