Merge pull request #108 from petrobras/documentation_improvements
Set 3W Project, 3W Dataset and 3W Toolkit as proper entities/names
Showing 17 changed files with 120 additions and 120 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
The 3W dataset consists of multiple CSV files saved in the [dataset](dataset) directory and structured as follows. | ||
The 3W Dataset consists of multiple CSV files saved in the [dataset](dataset) directory and structured as follows. | ||
|
||
There are two types of subdirectory: | ||
|
||
* The [folds](dataset/folds) subdirectory holds all 3W dataset configuration files. For each specific project released in the 3W project there will be a file that will specify how and which data must be loaded for training and testing in multiple folds of experimentation. This scheme allows implementation of cross validation and hyperparameter optimization by the 3W toolkit users. In addition, this scheme allows the user to choose some specific characteristics to the desired experiment. For example: whether or not simulated and/or hand-drawn intances should be considered in the training set. It is important to clarify that specifying which instances make up which folds will always be random but fixed in each configuration file. This is considered necessary so that results obtained for the same problem with different approaches can be compared; | ||
* The other subdirectories holds all 3W dataset data files. The subdirectory names are the instances' labels. Each file represents one instance. The filename reveals its source. All files are standardized as follow. There are one observation per line and one series per column. Columns are separated by commas and decimals are separated by periods. The first column contains timestamps, the last one reveals the observations' labels, and the other columns are the Multivariate Time Series (MTS) (i.e. the instance itself). | ||
* The [folds](dataset/folds) subdirectory holds all 3W Dataset configuration files. For each specific project released in the 3W Project there will be a file that will specify how and which data must be loaded for training and testing in multiple folds of experimentation. This scheme allows implementation of cross validation and hyperparameter optimization by the 3W Toolkit users. In addition, this scheme allows the user to choose some specific characteristics to the desired experiment. For example: whether or not simulated and/or hand-drawn intances should be considered in the training set. It is important to clarify that specifying which instances make up which folds will always be random but fixed in each configuration file. This is considered necessary so that results obtained for the same problem with different approaches can be compared; | ||
* The other subdirectories holds all 3W Dataset data files. The subdirectory names are the instances' labels. Each file represents one instance. The filename reveals its source. All files are standardized as follow. There are one observation per line and one series per column. Columns are separated by commas and decimals are separated by periods. The first column contains timestamps, the last one reveals the observations' labels, and the other columns are the Multivariate Time Series (MTS) (i.e. the instance itself). |
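
The data file layout described above (one observation per line, timestamps in the first column, observation labels in the last, and the series in between) can be sketched with the Python standard library. This is only an illustrative reading of that description, not the 3W Toolkit's own loader: the function name `load_instance`, the presence of a header row, the column names, and the sample values are all assumptions made for the example.

```python
import csv
import io


def load_instance(csv_file):
    """Split one instance file into timestamps, series columns, and labels.

    Assumes (hypothetically) that the first line is a header row.
    """
    reader = csv.reader(csv_file)  # columns are separated by commas
    header = next(reader)
    series_names = header[1:-1]  # everything between first and last column
    timestamps, labels = [], []
    series = {name: [] for name in series_names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        timestamps.append(row[0])  # first column: timestamp
        labels.append(row[-1])  # last column: observation label
        for name, value in zip(series_names, row[1:-1]):
            # decimals use periods, so float() parses them directly;
            # empty cells are kept as None
            series[name].append(float(value) if value else None)
    return timestamps, series, labels


# Hypothetical two-observation instance, made up for illustration only.
sample = io.StringIO(
    "timestamp,sensor_a,sensor_b,class\n"
    "2024-01-01 00:00:00,1.5,20.25,0\n"
    "2024-01-01 00:00:01,1.6,,0\n"
)
timestamps, series, labels = load_instance(sample)
```

Here `series` maps each hypothetical column name to its list of values, so the multivariate time series stays separate from the timestamps and the per-observation labels.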
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,20 +1,20 @@ | ||
The list of priority improvements for the 3W project that we intend to develop collaboratively with the community is detailed below. | ||
The list of priority improvements for the 3W Project that we intend to develop collaboratively with the community is detailed below. | ||
|
||
* Extend the 3W dataset with more instances of new event types; | ||
* Finalize incorporation of MAIS into the 3W toolkit; | ||
* Extend the 3W Dataset with more instances of new event types; | ||
* Finalize incorporation of MAIS into the 3W Toolkit; | ||
* Evaluate and if appropriate start using [Git LFS](https://git-lfs.com/); | ||
* Configure other GitHub resources that may be useful for our development. What resources exactly? | ||
* Incorporate and provide in this repository documentation automatically generated from docstrings. How exactly? | ||
* Review strategy for generating `folds_clf_XX.csv`; | ||
* Review strategy for virtual environment specification (`environment.yml`); | ||
* Develop a `setup.py`. Is this module interesting for our project? | ||
* Develop tool to generate `diff` between versions of the 3W dataset | ||
* Improve presentation of the [3W dataset citation list](LIST_OF_CITATIONS.md); | ||
* Develop tool to generate `diff` between versions of the 3W Dataset | ||
* Improve presentation of the [3W Dataset citation list](LIST_OF_CITATIONS.md); | ||
* Develop unit tests for the main methods and functions; | ||
* Set up action for automatic execution of unit tests after creating PRs; | ||
* Establish coding guidelines. Which one? | ||
* Reevaluate the use of the [rolling_window.py](toolkit/rolling_window.py). Is there a better option or a newer version? | ||
* Evaluate inclusion of specific features for hyperparameter optimization; | ||
* Assess feasibility and benefits of using [Sklearn Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html); | ||
* Evaluate the use of [Docker](https://www.docker.com/) to facilitate the use of the 3W toolkit and the approval of contributions; | ||
* Evaluate the use of [Docker](https://www.docker.com/) to facilitate the use of the 3W Toolkit and the approval of contributions; | ||
* Establish one or more time-related metrics for anomaly detection and classification. |