dataframe headers don't make it into the qri dataset #113

Open
chriswhong opened this issue Sep 23, 2021 · 1 comment

chriswhong commented Sep 23, 2021

Given the following code, we expect the resulting Qri dataset body to have a column named firstname. Instead, the value from the first row appears as the column name.

# CSV Download Code Sample
# This really works! Click 'Dry Run' to try it ↗

# import dependencies
load("http.star", "http") # `http` lets us talk to the internets
load("dataframe.star", "dataframe") # `dataframe` gives us powerful dataset manipulation capabilities

# with dependencies loaded, download a CSV
# this fetches a "popular baby names" dataset from the NYC Open Data Portal
csvDownloadUrl = "https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv?accessType=DOWNLOAD"
rawCSV = http.get(csvDownloadUrl).body()

# parse the CSV (string) into a qri DataFrame
theData = dataframe.parse_csv(rawCSV)

# we can do filtering of the DataFrame and assign it back to its original variable
# filter for first names that start with 'V'
theData = theData[[x.startswith('V') for x in theData["Child's First Name"]]]

# each column in the DataFrame is a Series
# make a new `Series` with only the unique values
uniqueSeries = theData["Child's First Name"].unique()

# iterate over the Series and convert each string to lowercase
for idx, val in enumerate(uniqueSeries):
    uniqueSeries[idx] = val.lower()

# sort the Series alphabetically
uniqueSeries = sorted(uniqueSeries)

# make an empty DataFrame, assign our Series to be a column named 'firstname'
# this will become the next version of our dataset's body
newBody = dataframe.DataFrame()
newBody['firstname'] = uniqueSeries

# get the previous version of this dataset
workingDataset = dataset.latest()
# set the body of the dataset to be our new body
workingDataset.body = newBody

# finally, commit the changes
# the last step of every transform is always `dataset.commit(Dataset)`
dataset.commit(workingDataset)



dustmop self-assigned this Sep 27, 2021

dustmop commented Oct 11, 2021

Figured out the root cause of this bug. The line workingDataset.body = newBody does not correctly copy the columns from newBody to the workingDataset object. The fix should be fairly straightforward.
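
To illustrate the kind of failure described here, below is a minimal sketch in plain Python. It is not Qri's implementation: Frame, Dataset, set_body_buggy, set_body_fixed, and read_header are hypothetical names, the sample values are made up, and storing the body as CSV text is an assumption made only for the demonstration. It shows how a body assignment that copies rows but drops column names can leave a reader treating the first data row as the header.

```python
import csv
import io


class Frame:
    """Hypothetical stand-in for a DataFrame: column names plus rows."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]


class Dataset:
    """Hypothetical stand-in for a dataset whose body is kept as CSV text."""

    def __init__(self):
        self.body_csv = ""

    def set_body_buggy(self, frame):
        # Copies only the rows; the column names are silently dropped.
        self.body_csv = _to_csv(frame.rows)

    def set_body_fixed(self, frame):
        # Writes the column names as a header row ahead of the data rows.
        self.body_csv = _to_csv([frame.columns] + frame.rows)


def _to_csv(rows):
    # Serialize a list of rows to CSV text with "\n" line endings.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def read_header(body_csv):
    # A reader that assumes the first line of the body is the header row.
    return next(csv.reader(io.StringIO(body_csv)))


frame = Frame(["firstname"], [["valentina"], ["vanessa"]])

buggy = Dataset()
buggy.set_body_buggy(frame)

fixed = Dataset()
fixed.set_body_fixed(frame)

print(read_header(buggy.body_csv))  # first data row is mistaken for the header
print(read_header(fixed.body_csv))  # the real column name survives
```

In this model the buggy path reports valentina as the column name, which matches the symptom in the report, while the fixed path reports firstname. Whether the real fix takes this shape depends on how the body and its schema are actually stored.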
