Nullable int columns being pulled into float columns: Allow formatting all columns on read_csv #158
Labels: bug (something isn't working), enhancement (new feature or request), major release (will be addressed in the next major release)
When I import data from Athena, the data contains nulls, and which columns contain them changes from file to file. Nullable int columns in the CSV get converted to float64 in the dataframe returned by pandas.read_sql_athena, which is unexpected behavior.

I suggest that aws-data-wrangler add an option to Session.pandas.read_csv that reads all columns as strings, such as df = pd.read_csv('/path/to/file.csv', dtype=str).

Oddly, pandas does not have an easy way to convert an already existing float64 column to str and control the format at the same time. The suggested way is to set the format for float64 columns when the dataframe is loaded (see the suggested workaround and, for printing, the inability to format when converting to a string).
What is happening:
Data
col1,col2,col3
19,3,1
20,,1
,5,4
Becomes:
col1,col2,col3
19.0,3.0,1
20.0,,1
,5.0,4
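The conversion shown above can be reproduced with plain pandas, independent of Athena. The snippet below is a minimal sketch, assuming the same three-column sample and reading it from an in-memory string rather than a real file; the variable names are illustrative only. It contrasts the default type inference with the dtype=str option requested in this issue.

```python
import io

import pandas as pd

# Same sample data as above, held in memory instead of a file.
CSV_TEXT = "col1,col2,col3\n19,3,1\n20,,1\n,5,4\n"

# Default inference: a column with any missing value cannot stay int64,
# so pandas falls back to float64 and uses NaN for the gaps.
df_default = pd.read_csv(io.StringIO(CSV_TEXT))

# dtype=str keeps the original text of every non-empty cell.
df_str = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

print(df_default.dtypes)
print(df_str)
```

With dtype=str the empty cells are still read as missing values, but "19" stays "19" instead of becoming 19.0.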
Using aws data wrangler like:
This forces me to write the following function and run it after the read_csv call above:
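The function itself is not included in the text of this issue. As a rough sketch of what such a post-processing step could look like, the hypothetical helper below (restore_nullable_ints is an invented name, not part of aws-data-wrangler or pandas) converts float64 columns that contain only whole numbers to the pandas nullable Int64 dtype. It assumes a pandas version that supports that dtype.

```python
import io

import pandas as pd


def restore_nullable_ints(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn whole-number float64 columns into nullable Int64."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if col.dtype != "float64":
            continue  # leave int, string, and other columns untouched
        non_null = col.dropna()
        # Convert only when every present value is a whole number,
        # so genuine float columns such as 1.5 are left alone.
        if (non_null == non_null.round()).all():
            out[name] = col.astype("Int64")
    return out


# Usage with the sample data from this issue.
raw = pd.read_csv(io.StringIO("col1,col2,col3\n19,3,1\n20,,1\n,5,4\n"))
fixed = restore_nullable_ints(raw)
print(fixed.dtypes)
```

This has to run after every load and guesses intent from the values, which is why an option on read_csv would be preferable.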
Why can't I just specify the column(s) I want? Because this is data I receive externally, and I don't know when a column will contain nulls and therefore when an int column will be implicitly converted to float64. When I then run code against it, such as
if row['col'] == 19
the data keeps changing between the CSV source and the application.