Fix a bug in the passing around of the csv delimiter #83
Conversation
Apparently, the Pandas DataFrame type is special in that it only supports the keyword argument 'sep' for specifying the delimiter used in a CSV file, and not the keyword argument 'delimiter'. (Note that the builtin 'csv' package only supports the keyword argument 'delimiter' and not 'sep', so we cannot normalize on a single keyword.)

Furthermore, somewhere deep in the DataFrame code it checks whether the type of that argument is 'str', which will succeed in Python 3 but not in Python 2, because we are using Python 3-compatible strings, which have the type 'unicode', and were overriding the builtin 'str' with a Python 3-compatible one that returns values of type 'newstr'.

All of this means that in order to pass a single-character argument to the method, we had to distinguish all of the calls where a Python 3-compatible str was expected from this one call that requires an argument of type 'str'. That distinction is made by changing 'from builtins import str' to 'from builtins import str as newstr'.
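The keyword mismatch described above can be seen directly: `DataFrame.to_csv` spells the keyword `sep`, while the builtin `csv` module spells it `delimiter`. A minimal demonstration:

```python
import csv
import io

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# pandas DataFrame.to_csv takes 'sep' for the delimiter;
# passing 'delimiter' instead is rejected as an unexpected keyword.
buf = io.StringIO()
df.to_csv(buf, sep='|', header=False, index=False)
print(buf.getvalue())  # 1|3 then 2|4, pipe-separated

# The builtin csv module takes 'delimiter' and has no 'sep' keyword,
# so no single keyword works for both APIs.
buf2 = io.StringIO()
writer = csv.writer(buf2, delimiter='|')
writer.writerows(df.values.tolist())
print(buf2.getvalue())
```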
@@ -176,4 +176,4 @@ def sample_to(self, count, skip_header_rows, strategy, target):
       datalab.utils.gcs_copy_file(f.name, target)
     else:
       with open(target, 'w') as f:
-        df.to_csv(f, header=False, index=False, delimiter=self._delimiter)
+        df.to_csv(f, header=False, index=False, sep=str(','))
Should this be sep=self._delimiter?
Yes, it should. I changed things up when I was debugging why simply casting to str wasn't working (because an import was overriding str), and I forgot to change it back. Thanks for catching that.
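Putting the two fixes together, the corrected line would use `sep` with the instance's delimiter cast to the native str. A runnable sketch (the `Sampler` class and its constructor are stand-ins invented here; only the `to_csv` call reflects the diff):

```python
import io

import pandas as pd


class Sampler:
    """Hypothetical stand-in for the class containing sample_to above."""

    def __init__(self, delimiter=','):
        self._delimiter = delimiter

    def sample_to(self, df, target):
        # Use 'sep' (not 'delimiter'), and cast to the native str in case
        # self._delimiter is a future/newstr value on Python 2.
        df.to_csv(target, header=False, index=False, sep=str(self._delimiter))


df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
buf = io.StringIO()
Sampler(';').sample_to(df, buf)
print(buf.getvalue())  # 1;3 then 2;4, semicolon-separated
```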
Thanks for fixing this!
I'm running into this issue now. Do you mind describing (preferably in a more public forum like the README or the official Datalab docs) how changes like this make their way into the Docker container? And assuming I don't want to wait for another Datalab version release, what's the best way to get the code from
Take a look at the Dockerfile for datalab's base image here. The pydatalab repo is cloned (unless you point to a local copy of it; read below) every time the datalab image is built, which is how these changes get there. I agree this needs to be cleared up in the documentation, perhaps in the install script.

If what you need is to get your own copy of this repo (pydatalab) into your locally-built datalab image, then it's easier to do it while building the datalab image. To do this you can pass the path to your local pydatalab to the build script at containers/datalab/build.sh, for example:

The install instructions on this repo are meant for building Jupyter nbextensions, which can then be used from any Jupyter instance.
Thank you. What version of Python should I be running Datalab and this extension with, or does it not matter? Also, what I'm going to try to do is install this locally into site-packages and load it in the kernel; is that a bad idea? Or do you recommend rebuilding the entire image in that case?
I often use this approach because I find it faster than rebuilding the image, although it may not be officially supported. Keep in mind that it may not always work (things may break due to version conflicts). I use Python 2.7. Here are the steps:
I hope this helps.