Resources for ICDE 2024 Submission
DeepMapping is packaged as a Python library. Run the following commands to install it:

```bash
cd DeepMapping
pip install -e ./
```
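If the installation succeeded, the package should be importable. A minimal smoke test (the module name `DeepMapping` is an assumption based on the directory name and may differ):

```python
# Hypothetical smoke test: the module name is assumed from the
# repository's directory name.
import DeepMapping

# For an editable install, this path should point into the repository.
print(DeepMapping.__file__)
```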
We wrapped the feature extraction up as a C function for better performance. Run the following command to compile it into a shared library:

```bash
cc -fPIC -shared -o shared_utils.so shared_utils.c
```
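The compiled library can then be loaded from Python, e.g. via `ctypes`. A minimal sketch; the exported function names and signatures are defined in `shared_utils.c`, so the commented declaration below is purely illustrative:

```python
import ctypes

# Load the compiled shared library from the repository root.
shared_utils = ctypes.CDLL("./shared_utils.so")

# Hypothetical example of declaring an exported function before use;
# consult shared_utils.c for the real names and signatures.
# shared_utils.extract_features.restype = ctypes.c_int
```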
Our experiments cover synthetic datasets (low/high correlation, at 100MB, 1GB, and 10GB scales) and the TPC-H and TPC-DS benchmark datasets with scale factors of 1 and 10. We removed all string/continuous columns and uploaded our pre-generated datasets to HERE. After downloading, unzip the archive into the root folder of this GitHub repository; you will then see a `dataset` folder there.
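To sanity-check the download, you can load one table and inspect its columns. A sketch only: the file name, path, and format below are hypothetical and should be adjusted to the actual layout of the `dataset` folder:

```python
import pandas as pd

# Hypothetical path and format; adjust to the downloaded layout.
df = pd.read_csv("dataset/tpch_s1/customer.csv")

# All remaining columns should be integer-typed, since string and
# continuous columns were removed from the published datasets.
print(df.dtypes)
```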
List of datasets:

- TPC-H (S1/S10): customer, lineitem, orders, part, supplier.
- TPC-DS (S1/S10): catalog_page, catalog_returns, catalog_sales, customer_address, customer_demographics, customer, item, store_returns, web_returns.
- Synthetic Dataset (100MB, 1GB, 10GB): single_value_column_low_correlation, single_value_column_high_correlation, multiple_value_column_low_correlation, multiple_value_column_high_correlation.
To search and train a model:

1. Run `python run_search_model.py` to perform a neural architecture search (NAS) with a given dataset. You can configure the NAS by editing `run_search_model.py` accordingly; the searched result will be printed out.
2. Modify `SEARCH_MODEL_STRUCTURE` in `run_train_searched_model.py` with the output from step 1 (a hypothetical example is sketched below), then run `python run_train_searched_model.py` to train a model.
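Step 2 amounts to pasting the printed search result into the training script. A hypothetical illustration; the actual format of the structure is whatever `run_search_model.py` prints:

```python
# In run_train_searched_model.py: replace the placeholder with the
# structure printed by step 1. The value below is purely illustrative.
SEARCH_MODEL_STRUCTURE = [256, 256, 128]
```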
We provide some demo models for the following 2 tasks; please go HERE to download them. After downloading, unzip the archive into the root folder of this GitHub repository; you will then see a `models` folder there.
Note: to optimize performance for each method, including the baselines and DeepMapping, it is recommended to tune the hyperparameters in your local environment and use the tuned values. Run `run_benchmark_tune.py` to perform a grid search.
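Conceptually, the grid search exhaustively evaluates every combination of candidate hyperparameter values and keeps the best one. A generic sketch; the parameter names, ranges, and the `run_benchmark` helper below are illustrative, not the actual tunables in `run_benchmark_tune.py`:

```python
import itertools

def run_benchmark(cfg):
    """Hypothetical stand-in: run the method under test with `cfg`
    and return its measured latency in seconds."""
    raise NotImplementedError

# Illustrative hyperparameter grid.
grid = {
    "batch_size": [1024, 2048, 4096],
    "partition_size": [1 << 14, 1 << 16],
}

best_cfg, best_latency = None, float("inf")
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid, values))
    latency = run_benchmark(cfg)
    if latency < best_latency:
        best_cfg, best_latency = cfg, latency
print(best_cfg, best_latency)
```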
These experiments measure overall storage overhead and end-to-end query latency on the benchmark datasets, i.e. TPC-H and TPC-DS. Run `python run_benchmark_data_query.py` to benchmark. To benchmark a different dataset, modify the file accordingly, following the instructions provided in the Python file.
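If you add your own latency measurements around the benchmarked queries, `time.perf_counter` is the standard wall-clock timer in Python. A minimal, self-contained helper sketch (not part of the benchmark scripts):

```python
import time

def timed(fn, *args, repeats=5):
    """Best-of-N wall-clock time of fn(*args), to smooth out noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```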
These experiments measure overall storage overhead and end-to-end query latency on the synthetic datasets with data manipulation, i.e. INSERT/UPDATE/DELETE. Run `python run_benchmark_data_manipulation.py` to benchmark. To benchmark a different dataset, modify the file accordingly, following the instructions provided in the Python file.