This is the first Petrobras repository on GitHub. It supports the 3W project and promotes experimentation with Machine Learning-based approaches and algorithms for specific problems related to undesirable events that occur in offshore oil wells.
The 3W project is based on the 3W dataset, a database described in this paper, and on the 3W toolkit, a software package that promotes experimentation with the 3W dataset for specific problems. The name 3W was chosen because this dataset is composed of instances from 3 different sources, all of which contain undesirable events that occur in oil Wells.
Timely detection of undesirable events in oil wells can help prevent production losses, reduce maintenance costs, and avoid environmental accidents and human casualties. Losses related to these types of events can reach 5% of production in certain scenarios, especially in areas such as Flow Assurance and Artificial Lifting Methods. In terms of maintenance, the cost of an offshore rig, required to perform various types of operations, can exceed US$500,000 per day.
Creating a dataset and making it public for open experimentation can greatly foster the development of tools that can:
- Improve the process of identifying undesirable events in offshore well production;
- Increase the efficiency of monitoring the integrity of wells and subsea systems, whose related problems can cause incalculable losses for people, the environment, and the company's image.
The 3W is the pilot of a Petrobras program called Conexões para Inovação - Módulo Open Lab. This pilot is an open project composed of two major resources:
- The 3W dataset, which will be evolved and supplemented with more instances from time to time;
- The 3W toolkit, which will also evolve in many ways to cover an increasing number of undesirable events as it develops.
Therefore, our strategy is to make these resources publicly available so that we can develop the 3W project collaboratively with a global community.
With this project, Petrobras intends to develop (fix, improve, supplement, etc.):
- The 3W dataset itself;
- The 3W toolkit itself;
- Approaches and algorithms that can be incorporated into systems dedicated to monitoring undesirable events in offshore oil wells during their respective production phases;
- Other tools that can help us achieve this goal.
We expect to receive various types of contributions from individuals, research institutions, startups, companies and partner oil operators.
Before you can contribute to this project, you need to read and agree to the following documents:
It is also very important to know about, participate in, and follow the discussions. See the discussions section.
All the code of this project is licensed under the Apache 2.0 License and all 3W dataset data files (CSV files in the subdirectories of the dataset directory) are licensed under the Creative Commons Attribution 4.0 International License.
See the discussions section. If you don't find the clarification you need there, please open a discussion to ask your questions so we can answer them.
To the best of its authors' knowledge, this is the first realistic and public dataset with rare undesirable real events in oil wells that can be readily used as a benchmark dataset for the development of machine learning techniques related to the inherent difficulties of actual data. For more information about the theory behind this dataset, refer to the paper A realistic and public dataset with rare undesirable real events in oil wells published in the Journal of Petroleum Science and Engineering (link here).
The 3W dataset consists of all CSV files in the subdirectories of the dataset directory and is structured as detailed here.
A general presentation of the 3W dataset, with some quantities and statistics, is available in this Jupyter Notebook.
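Because the dataset is just a set of CSV files in the subdirectories of the dataset directory, simple quantities like those in the overview notebook can be computed with the Python standard library alone. The sketch below counts CSV files per subdirectory. It assumes a local copy at a hypothetical path `dataset/` and treats each subdirectory name as a group label, which is our assumption rather than something stated here; `count_instances` is a hypothetical helper, not part of the 3W toolkit.

```python
from collections import Counter
from pathlib import Path


def count_instances(dataset_dir: str) -> Counter:
    """Count CSV files in each immediate subdirectory of dataset_dir.

    Hypothetical helper: assumes every CSV file is one instance and uses
    the name of the subdirectory that holds it as its group label.
    """
    counts: Counter = Counter()
    root = Path(dataset_dir)
    for subdir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also finds CSV files in deeper nested folders, if any
        counts[subdir.name] = sum(1 for _ in subdir.rglob("*.csv"))
    return counts


if __name__ == "__main__":
    root = Path("dataset")  # hypothetical location of a local copy
    if root.is_dir():
        for label, n in sorted(count_instances(str(root)).items()):
            print(f"{label}: {n} CSV files")
```

Files that are not CSV (for example, documentation kept next to the data) are ignored by the count.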
The 3W toolkit is a software package written in Python 3 that contains resources that make the following easier:
- 3W dataset overview generation;
- Experimentation and comparative analysis of Machine Learning-based approaches and algorithms for specific problems related to undesirable events that occur in offshore oil wells during their respective production phases;
- Standardization of key points of the Machine Learning-based algorithm development pipeline.
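One kind of pipeline step that benefits from standardization is how a time series is cut into windows and summarized before a classifier sees it: if every approach uses the same windowing, their results are easier to compare. The sketch below is illustrative only. The window size, step, and the mean/standard deviation/min/max feature set are our assumptions, and `window_features` is a hypothetical function, not the 3W toolkit's API.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def window_features(
    series: Sequence[float], size: int, step: int
) -> List[Dict[str, float]]:
    """Split one variable's time series into fixed-size windows and
    summarize each window with simple statistics.

    Hypothetical helper: size, step and the feature set are illustrative
    choices, not values prescribed by the 3W toolkit.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    features = []
    # Only full windows are kept; a trailing partial window is dropped
    for start in range(0, len(series) - size + 1, step):
        window = series[start:start + size]
        features.append({
            "start": start,
            "mean": mean(window),
            "std": pstdev(window),  # population standard deviation
            "min": min(window),
            "max": max(window),
        })
    return features


if __name__ == "__main__":
    # Toy values standing in for one sensor variable (not real 3W data)
    pressure = [10.0, 10.2, 10.1, 9.8, 9.5, 9.1, 8.7, 8.6]
    for row in window_features(pressure, size=4, step=2):
        print(row)
```

Fixing choices such as these once, in a shared place, is what keeps comparisons between different approaches and algorithms fair.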
It is important to note that there are arbitrary choices in this toolkit, but they have been carefully made to allow adequate comparative analysis without compromising the ability to experiment with different approaches and algorithms.
The 3W toolkit is implemented in sub-modules as described here.
Specific problems will be incorporated into this project gradually. At this point, we can work on:
The full specification is detailed in the CONTRIBUTING GUIDE.
The list below of examples of how to use the 3W toolkit will be expanded throughout its development.
For your contribution to be listed here, follow the instructions detailed in the CONTRIBUTING GUIDE.
For all results generated by the 3W toolkit to be consistent, we recommend that you create and use a virtual environment with the package versions specified in environment.yml, which was generated with conda. First, install Anaconda. Then open an Anaconda Prompt, make sure the current directory is the one containing your local copy of 3W, and run the following commands as needed:
- To create a virtual environment from our environment.yml:
$ conda env create -f environment.yml
- To activate the created virtual environment:
$ conda activate 3w
- To use the 3W toolkit resources interactively:
$ python
- To initialize a local Jupyter Notebook server:
$ jupyter notebook
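Since consistent results depend on running inside the environment created from environment.yml, it can help to confirm that the `3w` environment is actually active before using the toolkit from `python` or a notebook. The sketch below relies on the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the active environment's name (or to its path, if it was activated by path, in which case this check returns False). The helper name `in_3w_env` is hypothetical.

```python
import os
import sys
from typing import Mapping, Optional


def in_3w_env(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the active conda environment is named '3w'.

    Hypothetical helper: relies on CONDA_DEFAULT_ENV, which
    `conda activate 3w` sets to '3w'.
    """
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") == "3w"


if __name__ == "__main__":
    if in_3w_env():
        print("Running inside the 3w environment:", sys.executable)
    else:
        print("Warning: the 3w environment does not seem to be active.")
```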