Excel data source adapter #418

Merged
merged 23 commits into from
Feb 18, 2023

Conversation

zxie86
Contributor

@zxie86 zxie86 commented Sep 7, 2022

Final Report: GSoC '22

Projects

This PR was created as part of Google Summer of Code for the "Data Source Adapter for Excel Sheets" project.
I have created an adapter for querying Excel sheets. It allows Polypheny to interact with Excel sheets and query the mapped data.

The adapter handles both the XLS (Excel ’97) and XLSX (Excel 2007) spreadsheet formats. It uses Apache POI to extract data from the sheets.
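
For readers unfamiliar with the two formats, here is a minimal sketch of how they can be told apart by their leading bytes. It is not the adapter's code: the class and method names are hypothetical, and in practice Apache POI's `WorkbookFactory` does its own detection when it opens a file. The sketch relies only on two well-known signatures: XLS files are OLE2 compound documents, and XLSX files are ZIP archives. A match is therefore a hint, not proof, that the file is a spreadsheet.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Polypheny or Apache POI.
final class SpreadsheetFormatSketch {

    enum Format { XLS, XLSX, UNKNOWN }

    // OLE2 compound document signature (container used by XLS).
    private static final byte[] OLE2_MAGIC = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4" (container used by XLSX).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a file by the first bytes read from it.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.XLS;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.XLSX;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Passing the first eight bytes of a file to `detect` is enough; anything that matches neither signature is reported as `UNKNOWN`.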


Features

  • Able to perform the following basic operations:
  1. TableScan
  2. Project
  3. Filter
  4. Aggregate
  5. Sort
  • Able to recognize the following date formats:
  1. yyyy-mm-dd
  2. dd.mm.yyyy
  • Able to import Excel formats:
  1. XLS
  2. XLSX
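
To illustrate the date recognition listed above, here is a minimal sketch using only `java.time`. It is an assumption about how such recognition could look, not the code in this PR; the class name `ExcelDateSketch` and the method `parseDate` are hypothetical. Note that the formats are written above as `yyyy-mm-dd` and `dd.mm.yyyy`, while Java patterns need upper-case `MM` for the month (lower-case `mm` means minutes).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not taken from the adapter.
final class ExcelDateSketch {

    // The two supported layouts; STRICT rejects impossible dates such as 31.02.2022.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT),
            DateTimeFormatter.ofPattern("dd.MM.uuuu").withResolverStyle(ResolverStyle.STRICT));

    // Returns the parsed date, or empty if the cell text matches neither layout.
    static Optional<LocalDate> parseDate(String cellText) {
        if (cellText == null) {
            return Optional.empty();
        }
        String trimmed = cellText.trim();
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(trimmed, format));
            } catch (DateTimeParseException e) {
                // Not this layout; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, `parseDate("2022-09-07")` and `parseDate("07.09.2022")` yield the same `LocalDate`, and any other text yields an empty `Optional`, so the caller can fall back to treating the cell as a string.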

Constraints and Future Improvement

The adapter can currently add only one sheet at a time. An "add all sheets from an Excel file" function would be a useful improvement.


Thank you

Thank you, Polypheny and Google Summer of Code, for this amazing opportunity. I learned a lot and applied my coding skills during this event. Special thanks to my mentors Isabel Geissmann and Marc Hennemann for helping me when I ran into issues, and to Marc Vogt for helping with my proposal.

@hennlo hennlo added A-db Area: DB C-enhancement Category: An issue proposing an enhancement S-in-progress Still working on this pull request labels Sep 7, 2022
@hennlo hennlo requested review from isabelge and hennlo September 9, 2022 11:21
@hennlo
Member

hennlo commented Sep 16, 2022

@zxie86 Thanks for the PR and your contribution.
We're just going through your PR and have a few remarks and adjustments that you should take a look at.

1. A source can be created without a source file
Although you make sure that the name is specified and unique, you don't check whether a source file has been selected or whether it even exists.
This results in sources that have no content attached.

  • Check that the directory field has an input.
  • Check that the path given in the directory field actually exists.
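
The two checks could be sketched roughly as follows. This is only an illustration with hypothetical names (`SourcePathCheckSketch`, `validate`, and the `directorySetting` parameter are assumptions, not the adapter's actual settings API); it uses nothing beyond `java.nio.file`.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical validation helper, not the adapter's real code.
final class SourcePathCheckSketch {

    // Returns an error message if the setting is unusable, or empty if it is fine.
    static Optional<String> validate(String directorySetting) {
        // Check 1: the directory field has an input.
        if (directorySetting == null || directorySetting.isBlank()) {
            return Optional.of("No source file or directory specified.");
        }
        Path path;
        try {
            path = Paths.get(directorySetting.trim());
        } catch (InvalidPathException e) {
            return Optional.of("Not a valid path: " + directorySetting);
        }
        // Check 2: the given path actually exists.
        if (!Files.exists(path)) {
            return Optional.of("Path does not exist: " + path);
        }
        return Optional.empty();
    }
}
```

Running such a check before the source is registered would let the UI reject the request with the returned message instead of creating an empty source.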

2. The unique sourceName is not used within the catalog or the schema view; the filename of the XLSX file is used instead.
This is difficult to digest if your source file has multiple sheets and you want to add all of them. They will look something like this:
public.excel_adapter_test_data
public.excel_adapter_test_data0
public.excel_adapter_test_data1

  • Don't use the filename as the table object name. Instead, maybe add another text field to the adapter that states what the objectName should be.
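
One possible naming scheme, sketched under assumptions: the user supplies an objectName in the extra text field, and the sheet name is appended so that several sheets from one file stay distinguishable. The class `TableNameSketch`, the method `tableName`, and the normalization rules are hypothetical and only meant to show the idea.

```java
import java.util.Locale;

// Hypothetical naming helper, not part of the adapter.
final class TableNameSketch {

    // Builds a table name from the user-supplied object name and the sheet name.
    static String tableName(String objectName, String sheetName) {
        String base = sanitize(objectName);
        String sheet = sanitize(sheetName);
        if (base.isEmpty()) {
            throw new IllegalArgumentException("objectName must contain letters or digits");
        }
        return sheet.isEmpty() ? base : base + "_" + sheet;
    }

    // Lower-cases the text, replaces runs of other characters with "_",
    // and trims leading and trailing underscores.
    private static String sanitize(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_")
                .replaceAll("^_+|_+$", "");
    }
}
```

For example, an objectName of `sales` with sheets named `Q1 Report` and `Sheet1` would give `sales_q1_report` and `sales_sheet1` rather than a filename with a running number.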

@datomo datomo self-requested a review February 9, 2023 10:41
Member

@datomo datomo left a comment

Sadly, most of Marc's suggestions concern behavior that is handled the same way for all sources, so they will be addressed for all of them in a single solution. For me, this PR is good to go.

Member

@vogti vogti left a comment

Thx, @zxie86, for this PR!

@vogti vogti changed the title Excel adapter Excel data source adapter Feb 18, 2023
@vogti vogti merged commit 5e3257e into polypheny:master Feb 18, 2023
@NigamAnanya NigamAnanya left a comment

PRs are good to go

Labels
A-db Area: DB C-enhancement Category: An issue proposing an enhancement S-in-progress Still working on this pull request
5 participants