Modin Vs. Pandas Performance Sample

The Modin Vs. Pandas Performance sample illustrates how to use Modin* as a drop-in replacement for the Pandas API. It compares the performance of Modin and Pandas on specific dataframe operations.

Category: Concepts and Functionality
What you will learn: How to accelerate the Pandas API using Modin.
Time to complete: Less than 10 minutes

Purpose

Modin accelerates Pandas operations by distributing work across CPU cores with the Ray or Dask execution engine. It is designed to be compatible with existing Pandas code. The sample code performs some basic dataframe operations with both Pandas and Modin so that you can compare their performance. You can run the sample locally or in Google Colaboratory (Colab).
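A minimal sketch of using Modin in place of Pandas, assuming Modin is installed in the active environment: only the import line differs from plain Pandas code. The fallback to stock Pandas and the helper name `summarize` are illustrative additions, not part of the sample. Modin chooses its engine (Ray or Dask) from its configuration, for example the `MODIN_ENGINE` environment variable.

```python
# Sketch: Modin as a drop-in replacement for Pandas.
# Assumption: fall back to stock Pandas when Modin is not installed, so the
# same code still runs (without Modin's parallel execution).
try:
    import modin.pandas as pd  # Modin exposes the Pandas API here
except ImportError:
    import pandas as pd

import numpy as np


def summarize(df):
    """Hypothetical helper: a couple of common dataframe operations."""
    means = df.mean()                        # column-wise mean
    grouped = df.groupby(df["a"] % 3).sum()  # group rows by a derived key
    return means, grouped


# Small random integer dataset; the sizes here are arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(1_000, 4)), columns=["a", "b", "c", "d"])
means, grouped = summarize(df)
print(means)
print(grouped)
```

Because the dataframe code is identical for both libraries, the same cells can be timed once with Pandas and once with Modin.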

Prerequisites

OS: Ubuntu* 20.04 (or newer)
Hardware: Intel® Core™ Gen10 Processor, Intel® Xeon® Scalable Performance processors
Software: Intel® Distribution of Modin*

Note: AI and Analytics samples are validated on the AI Tools Offline Installer. For the full list of validated platforms, refer to Platform Validation.

Key Implementation Details

This code sample is implemented for CPU using the Python programming language. It requires the NumPy, Pandas, and Modin libraries, as well as Python's built-in time module.
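A sketch of how the time module can be used to compare two implementations of the same operation. The helper name `time_it` and its `repeats` parameter are hypothetical and not taken from the notebook, which may time its cells differently. Reporting the best of several runs reduces noise from other processes on the machine.

```python
import time


def time_it(fn, repeats=3):
    """Hypothetical helper: call fn() `repeats` times and return
    (last_result, best_wall_seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn()
        best = min(best, time.perf_counter() - start)
    return result, best


if __name__ == "__main__":
    # Stand-in workload; in the sample this would be a Pandas or Modin call,
    # for example: lambda: df.groupby("a").sum()
    result, seconds = time_it(lambda: sum(range(1_000_000)))
    print(f"result={result}, best of 3: {seconds:.4f} s")
```

Wrapping each operation in a zero-argument lambda lets the same helper time the Pandas and the Modin version of a cell.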

Environment Setup

You will need to download and install the following toolkits, tools, and components to use the sample.

1. Get AI Tools

Required AI Tools: Modin

If you have not already, select and install these tools via the AI Tools Selector. AI and Analytics samples are validated on the AI Tools Offline Installer, so selecting the Offline Installer option in the AI Tools Selector is recommended.

Note: If the Docker option is chosen in the AI Tools Selector, refer to Working with Preset Containers to learn how to run the Docker containers and samples.

2. (Offline Installer) Activate the AI Tools bundle base environment

If the default path was used during the installation of AI Tools:

source $HOME/intel/oneapi/intelpython/bin/activate

If a non-default path was used:

source <custom_path>/bin/activate

3. (Offline Installer) Activate the relevant Conda environment

conda activate modin

4. Clone the GitHub repository

git clone https://github.com/oneapi-src/oneAPI-samples.git
cd oneAPI-samples/AI-and-Analytics/Getting-Started-Samples/Modin_Vs_Pandas

5. Install dependencies

Note: Before running the following commands, make sure your Conda/Python environment with AI Tools installed is activated.

pip install -r requirements.txt
pip install notebook

For Jupyter Notebook, refer to Installing Jupyter for detailed installation instructions.
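After installing, a quick way to confirm that the activated environment actually provides the packages the sample needs is to query the installed distribution versions. This sketch uses the standard-library `importlib.metadata` (Python 3.8+); the package list is an assumption based on the libraries named in Key Implementation Details, and `requirements.txt` may pin others.

```python
from importlib.metadata import PackageNotFoundError, version


def check_packages(names):
    """Return a dict mapping each distribution name to its installed
    version string, or None if it is not installed in this environment."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_packages(["numpy", "pandas", "modin", "notebook"]).items():
        print(f"{name:10s} {ver or 'NOT INSTALLED'}")
```

Run it with the environment's own Python so the result reflects the environment that Jupyter will use.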

Run the Sample

Note: Before running the sample, make sure the steps in Environment Setup are complete.

Go to the section that corresponds to the installation method you chose in the AI Tools Selector:

AI Tools Offline Installer (Validated)

1. Register the Conda environment as a Jupyter Notebook kernel

If the default path was used during the installation of AI Tools:

$HOME/intel/oneapi/intelpython/envs/modin/bin/python -m ipykernel install --user --name=modin

If a non-default path was used:

<custom_path>/envs/modin/bin/python -m ipykernel install --user --name=modin

2. Launch Jupyter Notebook

jupyter notebook --ip=0.0.0.0

3. Follow the instructions to open the URL with the token in your browser

4. Select the Notebook

Modin_Vs_Pandas.ipynb

5. Change the kernel to modin

6. Run every cell in the Notebook in sequence

Conda/PIP

Note: Before running the instructions below, make sure your Conda/Python environment with AI Tools installed is activated.

1. Register the Conda/Python environment as a Jupyter Notebook kernel

For Conda:

<CONDA_PATH_TO_ENV>/bin/python -m ipykernel install --user --name=<your-env-name>

To find <CONDA_PATH_TO_ENV>, run conda env list and locate the path of your Conda environment.

For PIP:

python -m ipykernel install --user --name=<your-env-name>

2. Launch Jupyter Notebook

jupyter notebook --ip=0.0.0.0

3. Follow the instructions to open the URL with the token in your browser

4. Select the Notebook

Modin_Vs_Pandas.ipynb

5. Change the kernel to <your-env-name>

6. Run every cell in the Notebook in sequence
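For step 1 of the Conda/PIP instructions, Python itself can also report the interpreter path from inside the activated environment. In an activated Conda environment, `sys.prefix` is normally the environment directory, so `sys.executable` is typically `<CONDA_PATH_TO_ENV>/bin/python`. A small sketch that prints both:

```python
import sys
from pathlib import Path

# Directory of the running environment (the Conda env when it is activated).
env_path = Path(sys.prefix)
# Interpreter to use in front of `-m ipykernel install`.
python_path = Path(sys.executable)

print(f"<CONDA_PATH_TO_ENV> = {env_path}")
print(f"{python_path} -m ipykernel install --user --name=<your-env-name>")
```

Run it with `python` after `conda activate <your-env-name>`; running it from a different interpreter reports that interpreter's environment instead.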

Docker

AI Tools Docker images already have Get Started samples pre-installed. Refer to Working with Preset Containers to learn how to run the Docker containers and samples.

Example Output

Note: Your output may differ between notebook runs because the dataset is randomly generated. On the first run, Modin may take longer than Pandas for certain operations because it performs some initialization on first use.

CPU times: user 8.47 s, sys: 132 ms, total: 8.6 s
Wall time: 8.57 s

Example expected cell output is included in Modin_Vs_Pandas.ipynb.
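Because of that first-use initialization, a fairer comparison separates the first call from later ones. A minimal sketch, assuming an arbitrary `df.sum()` workload and a hypothetical `timed` helper; it falls back to stock Pandas when Modin is not installed, in which case there is no engine start-up to observe.

```python
import time

import numpy as np

try:
    import modin.pandas as pd  # assumption: Modin installed in the active env
except ImportError:
    import pandas as pd  # fallback so the sketch still runs


def timed(fn):
    """Hypothetical helper: wall-clock seconds for a single call of fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((100_000, 4)), columns=list("abcd"))

# Depending on when Modin starts its engine, some one-off start-up cost
# may land in this first call.
first_call = timed(lambda: df.sum())
# Subsequent calls are closer to steady-state performance.
later_calls = [timed(lambda: df.sum()) for _ in range(3)]

print(f"first call: {first_call:.4f} s")
print("later calls:", ", ".join(f"{t:.4f} s" for t in later_calls))
```

Comparing Pandas and Modin on the later calls, rather than the first one, avoids attributing one-time setup cost to the operation itself.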

Related Samples

License

Code samples are licensed under the MIT license. See License.txt for details.

Third-party program licenses can be found in third-party-programs.txt.

*Other names and brands may be claimed as the property of others. Trademarks