Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let --data-path be specified when running download.py scripts #642

Merged
merged 6 commits into from
Nov 23, 2016

Conversation

ExplodingCabbage
Copy link
Contributor

@ExplodingCabbage ExplodingCabbage commented Nov 20, 2016

Description

Adds a --data-path (or -d) argument to the download.py scripts that lets the path to download to be specified.

Motivation and Context

Makes downloading to a custom location easier; resolves #637

How Has This Been Tested?

Manually. After pip install -e ., I did a git clean -xdf to clean my local repo, then, in sequence:

  • Checked that

    import spacy
    spacy.load('en')('Hello, world')[0].pos_
    

    gives the empty string

  • Ran python3 -m spacy.en.download all -d ~/some_dir

  • Checked that spacy.load('en')('Hello, world')[0].pos_ still gave the empty string but that spacy.util.set_data_path('/home/mark/some_dir'); spacy.load('en')('Hello, world')[0].pos_ didn't.

  • Ran python3 -m spacy.en.download all

  • Checked that spacy.load('en')('Hello, world')[0].pos_ no longer gave the empty string.

  • Tried running python3 -m spacy.en.download all and python3 -m spacy.en.download all -d ~/some_dir again and confirmed that both exited early saying that the model was already installed.

Types of changes

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality to spaCy)
  • Breaking change (fix or feature causing change to spaCy's existing functionality)
  • Documentation (Addition to documentation of spaCy)

Checklist:

  • My code follows spaCy's code style. (*as far as I can tell - there's no actual style guide in CONTRIBUTING.md yet, so I just tried to imitate what was nearby)
  • My change requires a change to spaCy's documentation. (ideally, anyway)
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes. (no tests for the downloader currently exist; any that we'd want to add would probably want to be CI-only, since they'll be extremely slow for anyone on a slow internet connection)
  • All new and existing tests passed.

@ExplodingCabbage
Copy link
Contributor Author

I'm gonna quickly add in docs, then this should be ready to merge.

There was an explicitly-declared `path` keyword argument, so 'path'
would never be present in `**overrides`. This line just overwrote
any manually-specified value the user might've passed to the `path`
parameter.
@ExplodingCabbage
Copy link
Contributor Author

ExplodingCabbage commented Nov 20, 2016

Docs added, and looked at (both on the site on the local server and in the README on GitHub) to make sure the formatting came out right. Still not quite ready for merge - in the process of documenting ways to specify the path at load time, I realised that passing a path to the Language constructor is broken (fixed by b0a07c2) and I want to:

  • Make sure it's properly fixed, and
  • Make sure the tests still pass

Once that's done, I'll hand this over to y'all.

@ExplodingCabbage
Copy link
Contributor Author

This is ready for review, I think.

Copy link
Member

@honnibal honnibal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great — especially appreciate the docs!

@ines
Copy link
Member

ines commented Nov 23, 2016

Agree – looks great and thanks for adding the sections to the docs and readme! 👍
Feel free to add yourself to the CONTRIBUTORS.md, btw.

@ines ines merged commit a7b5fba into explosion:master Nov 23, 2016
@ExplodingCabbage ExplodingCabbage deleted the specify-data-path branch November 23, 2016 12:08
@ExplodingCabbage
Copy link
Contributor Author

Feel free to add yourself to the CONTRIBUTORS.md, btw.

I'll leave it to you to do so or not as you see fit - I feel like I probably shouldn't be making the decision about whether I get added or not. :)

@ines
Copy link
Member

ines commented Nov 24, 2016

@ExplodingCabbage Sure, we'd be happy to add you! 😃

Just saw that your GitHub profile only shows your user name – which name would you like me to use for the CONTRIBUTORS.md? We usually list contributors as "Full name, username", but an alias instead of the full name would be fine, too.

@ExplodingCabbage
Copy link
Contributor Author

ExplodingCabbage commented Nov 24, 2016

@ines My real name is "Mark Amery" (and I've just updated my profile to show it). :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow spacy.en.download to take a file path
3 participants