removes date datatype support from the tap code (#95)
* Bump version 3.0.0
* Removes date support from the tap code
* Fixes unit tests
* Fixes datatype integration test
* Update null time unittest


---------

Co-authored-by: RushiT0122 <rtodkar@stitchdata-talend.com>
rdeshmukh15 and RushiT0122 authored Mar 12, 2024
1 parent a958361 commit b5c50a6
Showing 7 changed files with 14 additions and 36 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog

+ ## 3.0.0
+ * Remove support for date datatype [#95](https://github.com/singer-io/tap-google-sheets/pull/95)
+
## 2.1.0
* Updates to run on python 3.11.7 [#94](https://github.com/singer-io/tap-google-sheets/pull/94)

8 changes: 4 additions & 4 deletions README.md
@@ -48,21 +48,21 @@ This tap:
- Invalid types: formulaValue, errorValue
- Then check:
- [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType)
- - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC
+ - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, TIME, DATE_TIME, SCIENTIFIC
- Determine JSON schema column data type based on the value and the above cell metadata settings.
- - If DATE, DATE_TIME, or TIME, set JSON schema format accordingly
+ - If DATE_TIME, or TIME, set JSON schema format accordingly

[**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get)
- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS
- - This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption)
+ - This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption)
- Primary keys: _sdc_row
- Replication strategy: Full (GET file audit data for spreadsheet_id in config)
- Process/Transformations:
- Loop through sheets (compared to catalog selection)
- Send metadata for sheet
- Loop through ALL columns for columns having a column header
- Loop through ranges of rows for ALL rows in sheet available area max row (from sheet metadata)
- - Transform values, if necessary (dates, date-times, times, boolean).
+ - Transform values, if necessary (date-times, times, boolean).
- Date/time serial numbers converted to date, date-time, and time strings. Google Sheets uses Lotus 1-2-3 [Serial Number](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) format for date/times. These are converted to normal UTC date-time strings.
- Process/send records to target
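The serial-number conversion described in the README can be illustrated with a short Python sketch. It relies only on the epoch stated in Google's DateTimeRenderOption documentation (serial 0 is 1899-12-30 and the fractional part is the fraction of a day, so 2.5 is noon on 1900-01-01) and assumes the `%Y-%m-%dT%H:%M:%S.%fZ` output format used by the datatype test in this commit. `serial_to_utc_str` is a hypothetical name, not the tap's own `excel_to_dttm_str`.

```python
from datetime import datetime, timedelta, timezone

# Serial number 0 is 1899-12-30 (Lotus 1-2-3 convention used by Google Sheets).
SHEETS_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)


def serial_to_utc_str(serial: float) -> str:
    """Convert a Sheets serial number to a UTC date-time string.

    The whole part counts days since the epoch; the fractional part is the
    fraction of a day (0.5 == noon).
    """
    moment = SHEETS_EPOCH + timedelta(days=serial)
    # Emit microseconds and a literal "Z" suffix for UTC.
    return moment.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


print(serial_to_utc_str(2.5))    # 1900-01-01T12:00:00.000000Z
print(serial_to_utc_str(45363))  # 2024-03-12T00:00:00.000000Z
```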

2 changes: 1 addition & 1 deletion setup.py
@@ -3,7 +3,7 @@
from setuptools import setup, find_packages

setup(name='tap-google-sheets',
- version='2.1.0',
+ version='3.0.0',
description='Singer.io tap for extracting data from the Google Sheets v4 API',
author='jeff.huth@bytecode.io',
classifiers=['Programming Language :: Python :: 3 :: Only'],
14 changes: 4 additions & 10 deletions tap_google_sheets/schema.py
@@ -123,7 +123,7 @@ def get_sheet_schema_columns(sheet):
# INVALID: errorType, formulaType
# https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue
#
- # column_number_format_type = UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE,
+ # column_number_format_type = UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY,
# TIME, DATE_TIME, SCIENTIFIC
# https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType
#
@@ -136,18 +136,12 @@ def get_sheet_schema_columns(sheet):
col_properties = {'type': ['null', 'boolean', 'string']}
column_gs_type = 'boolValue'
elif column_effective_value_type == 'numberValue':
- if column_number_format_type == 'DATE_TIME':
+ if column_number_format_type in ['DATE_TIME', 'DATE']:
col_properties = {
'type': ['null', 'string'],
'format': 'date-time'
}
column_gs_type = 'numberType.DATE_TIME'
- elif column_number_format_type == 'DATE':
- col_properties = {
- 'type': ['null', 'string'],
- 'format': 'date'
- }
- column_gs_type = 'numberType.DATE'
elif column_number_format_type == 'TIME':
col_properties = {
'type': ['null', 'string'],
@@ -215,11 +209,11 @@ def get_sheet_schema_columns(sheet):
}
columns.append(column)

- if column_gs_type in {'numberType.DATE_TIME', 'numberType.DATE', 'numberType.TIME', 'numberType'}:
+ if column_gs_type in {'numberType.DATE_TIME', 'numberType.TIME', 'numberType'}:
col_properties = {
'anyOf': [
col_properties,
- {'type': ['null', 'string']} # all the date, time has string types in schema
+ {'type': ['null', 'string']} # all the time has string types in schema
]
}
# add the column properties in the `properties` in json schema for the respective column name
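After this commit, `DATE` and `DATE_TIME` number formats share one branch in `get_sheet_schema_columns`, and the `anyOf` string fallback no longer lists `numberType.DATE`. The sketch below restates those two hunks as standalone functions. `date_column_schema`, `with_string_fallback`, and `STRING_FALLBACK_GS_TYPES` are hypothetical names, and the other branches (TIME, plain numbers, booleans) are deliberately left out because their full bodies are not shown in the diff.

```python
from typing import Dict, Tuple

# gs types that get an extra plain-string alternative (numberType.DATE is gone)
STRING_FALLBACK_GS_TYPES = {"numberType.DATE_TIME", "numberType.TIME", "numberType"}


def date_column_schema(number_format_type: str) -> Tuple[Dict, str]:
    """Schema and gs type for a numberValue cell formatted as DATE or DATE_TIME."""
    if number_format_type not in ("DATE_TIME", "DATE"):
        raise ValueError(f"not a date-like number format: {number_format_type}")
    # Both formats now produce a date-time string column
    col_properties = {"type": ["null", "string"], "format": "date-time"}
    return col_properties, "numberType.DATE_TIME"


def with_string_fallback(col_properties: Dict, column_gs_type: str) -> Dict:
    """Wrap number-backed schemas in anyOf so a plain string also validates."""
    if column_gs_type in STRING_FALLBACK_GS_TYPES:
        return {"anyOf": [col_properties, {"type": ["null", "string"]}]}
    return col_properties


props, gs_type = date_column_schema("DATE")
print(gs_type)  # numberType.DATE_TIME
print(with_string_fallback(props, gs_type))
```

Because a `DATE` cell now yields `numberType.DATE_TIME`, it also picks up the string alternative, so a cell that fails conversion (the transform returns `str(value)` in that case) can presumably still validate.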
17 changes: 0 additions & 17 deletions tap_google_sheets/transform.py
@@ -80,19 +80,6 @@ def transform_sheet_datetime_data(value, unformatted_value, sheet_title, col_nam
sheet_title, col_name, col_letter, row_num, col_type))
return str(value)

- # transform date values in the sheet
- def transform_sheet_date_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type):
- if isinstance(unformatted_value, (int, float)):
- # passing both the formatted as well as the unformatted value, so we can use the string value in
- # case of any errors while date transform
- date_str, is_error = excel_to_dttm_str(value, unformatted_value)
- return_str = date_str if is_error else date_str[:10]
- return return_str
- else:
- LOGGER.info('WARNING: POSSIBLE DATA TYPE ERROR; SHEET: {}, COL: {}, CELL: {}{}, TYPE: {}'.format(
- sheet_title, col_name, col_letter, row_num, col_type))
- return str(value)
-
# transform time values in the sheet
def transform_sheet_time_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type):
if isinstance(unformatted_value, (int, float)):
@@ -231,10 +218,6 @@ def get_column_value(value, unformatted_value, sheet_title, col_name, col_letter
elif col_type == 'numberType.DATE_TIME':
return transform_sheet_datetime_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type)

- # DATE
- elif col_type == 'numberType.DATE':
- return transform_sheet_date_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type)
-
# TIME ONLY (NO DATE)
elif col_type == 'numberType.TIME':
return transform_sheet_time_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type)
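With `transform_sheet_date_data` and its `numberType.DATE` dispatch branch removed, date-formatted cells go through the date-time path instead, so the emitted value keeps its time component rather than being cut to its first 10 characters. The comparison below is a simplified sketch: `to_datetime_str` is a hypothetical stand-in for `excel_to_dttm_str` (whose real signature also takes the formatted value and returns an error flag), the warning log is omitted, and the output format is assumed from the datatype test.

```python
from datetime import datetime, timedelta, timezone

SHEETS_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)


def to_datetime_str(serial: float) -> str:
    """Hypothetical stand-in for excel_to_dttm_str: serial number -> UTC string."""
    return (SHEETS_EPOCH + timedelta(days=serial)).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def old_date_transform(value, unformatted_value) -> str:
    """Pre-3.0.0 DATE handling (removed above): keep only the YYYY-MM-DD prefix."""
    if isinstance(unformatted_value, (int, float)):
        return to_datetime_str(unformatted_value)[:10]
    return str(value)  # non-numeric cell: fall back to the formatted text


def new_date_transform(value, unformatted_value) -> str:
    """3.0.0 behaviour: DATE cells take the date-time path and keep the full string."""
    if isinstance(unformatted_value, (int, float)):
        return to_datetime_str(unformatted_value)
    return str(value)


serial = 45363  # 2024-03-12 as a Sheets serial number
print(old_date_transform("3/12/2024", serial))  # 2024-03-12
print(new_date_transform("3/12/2024", serial))  # 2024-03-12T00:00:00.000000Z
```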
4 changes: 1 addition & 3 deletions tests/test_google_sheets_datatypes.py
@@ -164,8 +164,7 @@ def test_run(self):
}
string_column_formats = {
"Datetime": "%Y-%m-%dT%H:%M:%S.%fZ",
- "Time": "%H:%M:%S",
- "Date": "%Y-%m-%d",
+ "Time": "%H:%M:%S"
}

for record in record_data:
@@ -207,7 +206,6 @@ def test_run(self):
"Currency": "stringValue",
"Datetime": "numberType.DATE_TIME",
"Time": "numberType.TIME",
- "Date": "numberType.DATE",
"String": "stringValue",
"Number": "numberType",
"Boolean": "boolValue",
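The datatype test now checks string formats only for `Datetime` and `Time`. The loop body that consumes `string_column_formats` is cut off in the diff, so the sketch below assumes it parses each value with `datetime.strptime`; `matches_formats` is a hypothetical helper. It also shows that a bare `YYYY-MM-DD` value would not satisfy the date-time format.

```python
from datetime import datetime

# Expected string formats after this commit ("Date" removed), copied from the hunk above
string_column_formats = {
    "Datetime": "%Y-%m-%dT%H:%M:%S.%fZ",
    "Time": "%H:%M:%S",
}


def matches_formats(record: dict) -> bool:
    """Return True if every non-null formatted column parses with its expected format."""
    for column, fmt in string_column_formats.items():
        value = record.get(column)
        if value is None:
            continue  # null cells are allowed by the ["null", "string"] schema
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return False
    return True


print(matches_formats({"Datetime": "2024-03-12T00:00:00.000000Z", "Time": "13:45:00"}))  # True
print(matches_formats({"Datetime": "2024-03-12", "Time": "13:45:00"}))  # False
```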
2 changes: 1 addition & 1 deletion tests/unittests/test_null_cell_format.py
@@ -74,7 +74,7 @@ def test_null_date_effectiveFormat(self):
"null",
"string"
],
- "format": "date"
+ "format": "date-time"
}

sheet_json_schema, columns = schema.get_sheet_schema_columns(sheet)