Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows Compatibility #199

Merged
merged 1 commit into from
Aug 13, 2015
Merged

Windows Compatibility #199

merged 1 commit into from
Aug 13, 2015

Conversation

crohkohl
Copy link

@crohkohl crohkohl commented Aug 7, 2015

On my machine the image serving, e.g. of the mean.jpg does not work.
The browser (tested IE and Chrome) cannot interpret the image probably due to the missing content type. The send_file function takes care of that all.

@gheinrich gheinrich closed this Aug 7, 2015
@gheinrich gheinrich reopened this Aug 7, 2015
@gheinrich
Copy link
Contributor

@crohkohl: Travis tests failed because docs were not updated. If you wish to update docs and pass the test you may do:

pip install git+https://github.com/lukeyeager/flask-autodoc.git
/scripts/generate_docs.py

@crohkohl
Copy link
Author

crohkohl commented Aug 7, 2015

@gheinrich

I am all new to using github and python.
I am trying to port DIGITS to windows - so the majority of my changes will be windows related.

Thanks for the doc-tip!

@crohkohl crohkohl changed the title Fixed broken image serving in development server Windows Compatibility Aug 7, 2015
@lukeyeager
Copy link
Member

Hi @crohkohl,

I would be happy to merge some simple changes which make it easier for you to use DIGITS on Windows. Can you separate the changes for fixing file serving out from the Windows compatibility changes? That will make this easier to review. And it would be great if you could squash your Windows compatibility fixes into a single commit. These instructions may be helpful:
https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request#squash-your-changes

And I'll need you to send us a signed copy of the CLA before we can merge any changes.
https://github.com/NVIDIA/DIGITS/blob/master/CONTRIBUTING.md#pull-requests

@crohkohl
Copy link
Author

crohkohl commented Aug 7, 2015

Hi @lukeyeager ,

thanks for that tip, that squashing is good to know. I separated the image serving fix into #202 and just squashed the windows-related changes.

Once a pre-compiled _caffe.pyd is available for windows (took some time, still working on perfecting it) running DIGITS on windows should hopefully be no big deal.

The CLA is sent.

@lukeyeager
Copy link
Member

@crohkohl great, thanks!

Can you remove the file serving commit from this branch and rebase on top of the master branch so that TravisCI will run again?

And sorry about all the documentation tests failing. I'm seriously considering giving up on all that nifty autodoc stuff.

@lukeyeager
Copy link
Member

I think you need to git fetch --all to get the latest changes on the master branch before rebasing.

@crohkohl
Copy link
Author

crohkohl commented Aug 7, 2015

okay, I fetched all, merged everything and removed that commit. We should be all set now.

@crohkohl
Copy link
Author

crohkohl commented Aug 7, 2015

only in case of binary files. On linux there is no difference. But in windows writing a text file is different from a binary file.

@lukeyeager
Copy link
Member

I pulled your LevelDB-related change out into a separate pull request: #203.

@crohkohl
Copy link
Author

crohkohl commented Aug 7, 2015

that is a good idea

@crohkohl
Copy link
Author

crohkohl commented Aug 9, 2015

Okay, I think I managed to adapt most of the code. With the latest commit I managed to perform:

  • database import
  • training of classifier
  • apply of classifier

Here some remarks:

  • The async file reading was a real pain and took me almost all day to get it working.
  • The devices class is very difficult to adapt in its current form. On windows there can be an arbitrary number of frameworks installed and the naming is not just cudart.dll but e.g. cudart64_75.dll. Also the nvsmi dll is most likely not in the search path. Would it be possible to use the devices output from pycaffe? It should always use the correctly linked cudart dll.

@lukeyeager
Copy link
Member

Ok, it looks like you have a lot of different things going on in this PR.

Safe changes

  • Look for the Caffe executable at caffe.exe instead of caffe in config.caffe_option
  • Read and write binary files explicitly with the 'b' flag throughout the code
  • Convert filesystem paths to URL paths in documentation
    • Would using the posixpath module make sense here? (documentation)
>>> os.path.abspath('.')
'/home/lyeager'
>>> ntpath.abspath('.')
'\\home\\lyeager'
>>> posixpath.abspath('.')
'/home/lyeager'

Needs some more discussion

  • Convert filesystem paths to URL paths in task.py
    • If you're doing it in Task, why aren't you in Job? (here)
    • Are you sure you want to do this? It seems like you're going to need the windows-style paths internally. Does the test suite pass for you?
  • Set different library names in device_query
    • libnvidia-ml -> nvml is fine
    • libcudart -> cudart64_75 seems unnecessarily tied to the platform and CUDA version. Is there any way to load the lib in a generic way?
  • Changes to nonblocking_readlines()
    • I don't understand what's going on here

Merging/rebasing

Now that I've merged #203, you should be able to rebase on the master branch and remove your LevelDB-related changes. You seem to be having a little trouble with git still. These are the basic commands that I think you'll need:

# checkout your development branch 
#    assumes master is tracking crohkol/master
git checkout master
# get the latest changes from NVIDIA/DIGITS
git fetch --all
# move your changes on top of the master branch of NVIDIA/DIGITS
#    assumes nvidia is your name for the NVIDIA/DIGITS fork
# you'll probably have to deal with some merge conflicts - follow the advice git gives you
git rebase -i nvidia/master
# push your changes back to this branch
#    assumes crohkohl is your name for the crohkohl/DIGITS fork
git push crohkohl/master

This is a helpful comment about rebasing:

In future work please make use of rebase instead of merge, as our policy is to only have merge commits for PRs. Thanks.
BVLC/caffe#2782 (comment)

@crohkohl crohkohl force-pushed the master branch 2 times, most recently from 7aad31c to 7bef237 Compare August 10, 2015 23:11
@crohkohl
Copy link
Author

@lukeyeager I never used git so extensively ... but I think I am understanding the rebase strategy now - it is much cleaner!

Safe Changes

Convert filesystem paths to URL paths in documentation
Would using the posixpath module make sense here?

Those modules return all the same:

>>> os.path.abspath('.')
'E:\\deep-learning\\DIGITS\\digits\\utils'
>>> ntpath.abspath('.')
'E:\\deep-learning\\DIGITS\\digits\\utils'
>>> posixpath.abspath('.')
'E:\\deep-learning\\DIGITS\\digits\\utils'

Needs some more discussion

Convert filesystem paths to URL paths in task.py

you're doing it in Task, why aren't you in Job?
I only fixed it were it lead to problems due to its usage. Sometimes the functions called with the result can handle all kinds of paths and sometimes not.

Are you sure you want to do this? It seems like you're going to need the windows-style paths internally. Does the test suite pass for you?

You are right, I would propose to only change them were necessary. The testing was a good idea! I had to change a few more things. I added them to the commit. Here is the test output. A few tests fail due to permission issues in nosetests with the image upload. However, the upload works outside of the tests.

cudart localization

libcudart -> cudart64_75 seems unnecessarily tied to the platform and CUDA version. Is there any way to load the lib in a generic way?

I implemented a more generic version that loops through the version numbers from top to bottom

Changes to nonblocking_readlines()

I don't understand what's going on here

fcntl is used on unix to prevent blocking the thread when waiting for a response from the subprocess.
This is not available on windows. But without it, the website hangs when a job is running and only continues after the job is finished.
By using gevents this problem can be overcome as it provides a function for reading from a separate thread on windows to simulate the fcntl behavior.

@lukeyeager
Copy link
Member

Thanks for the fixes, this is getting close to merge-able.

Still an issue

Failing tests

Those all look like they could be related to the tempfilemodule. I don't have a Windows machine to debug this on, sorry.

Path substitution in task.py

You haven't removed the path substitution from Task or added it to Job. Are you waiting to get the tests fixed first?

cudart localization

Ok, I guess that works. Can you can clean it up a little bit by only checking for those things if platform.system() == 'Windows' and by querying the architecture with platform.architecture()[0]? Thanks.

Resolved

posixpath not working

Weird. Whatever, the \\ substitution works fine for the doc generation.

Changes to nonblocking_readlines()

Cool, thanks for the explanation!

@crohkohl
Copy link
Author

Failing tests

With the latest changes all tests pass. Here is the output for your reference.

I had to modify some of the test scripts due to problems with tempfile.NamedTemporaryFile() file permissions on windows , see e.g. here or here for reference. There is no easy way around this. So I just create the files myself if necessary and try to do a delete afterwards. However, some python functions in the test code do not close file handles properly which in turn again leads to exceptions I need to handle because they are not test-relevant.

Path substitution in task.py

Sorry, I overlooked this. I added it in Job.pyaswell.

cudart localization

Modified it according to your suggestions

@@ -187,6 +189,9 @@ def get_version(executable):
elif platform.system() == 'Darwin':
# XXX: guess and let the user figure out errors later
return (0,11,0)
elif platform.system() == 'Windows':
# XXX: guess and let the user figure out errors later
return (0,12,0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we guess v0.11 for Windows too?

I'm working on a patch for Caffe that exposes the version number as caffe.__version__, but until then I think we should just assume the worst.

@crohkohl
Copy link
Author

okay, I addressed all of your comments in the last commit. I rebased and squashed everything into a single commit again.

@lukeyeager
Copy link
Member

Awesome, just two more things:

  1. I count 11 files that include that annoying little whitespace change on the first line. Can we get rid of those?
  2. It's a bummer that NamedTemporaryFile doesn't work on Windows - I really like the cleanliness of the with statement. But it's worse to put platform-specific code throughout.
    1. Use tempfile.mkstemp() for all platforms.
    2. Use tempfile.mkstemp(), not tempfile.mktemp() - it's deprecated.
    3. Make sure to close the file in every case - looks like you're missing it in utils/test_image.py.

Sorry for all the nitpicking! It takes a while to review things when you put several different changes in a single PR. This is also taking longer because I can't actually test for any of these cases you're trying to fix.

@crohkohl crohkohl force-pushed the master branch 2 times, most recently from 20acce0 to 2ef3a3c Compare August 13, 2015 07:32
- binary files need to be opened with 'rb' or 'wb' flag
- leveldb package is not available for windows - display warning instead of crashing
- adapted find_executable with fallback for windows ".exe" extension and let the version check pass
- normalization of path separators to linux style
- turn off fcntl on windows
- Making tests pass and support of file upload
- portable cudart localization
- Make tests pass + refactoring of existing modifications
@crohkohl
Copy link
Author

That's no problem at all. I also want to do it right.

  1. I fixed those line endings - I always thought git could manage that. I actually enforce LF in my commits but somehow it just does not work even though git is always saying it replaces CLRF with LF. I just did it manually now.
  2. I also do not like how that plays out - but there does not seem to be a better solution.
    1. done
    2. done
    3. I just reproduced the original code which also did not delete the created file or did I miss something? I remove the file now after the processing.

On windows all tests are passing.

@lukeyeager
Copy link
Member

LGTM, thanks!

lukeyeager added a commit that referenced this pull request Aug 13, 2015
@lukeyeager lukeyeager merged commit df401d9 into NVIDIA:master Aug 13, 2015
@crohkohl
Copy link
Author

That's great, thanks!

@d4nst
Copy link

d4nst commented Aug 14, 2015

Thanks for this guys!

One last thing, it would be nice that the digits-server could be used from windows as well (right now is a bash command).

@lukeyeager
Copy link
Member

@danst18, Windows support isn't really a priority for us. But if you find a way to rewrite the digits-server script it in a cross-platform way, I'd be happy to review and merge a pull request.

@crohkohl
Copy link
Author

@danst18 The major problem right now is that gunicorn does not run on windows. But they have been working on it, see e.g. benoitc/gunicorn#524.
I do not know if there are any results yet.

crohkohl pushed a commit to crohkohl/DIGITS that referenced this pull request Aug 16, 2015
@lunzueta
Copy link
Contributor

lunzueta commented Sep 9, 2015

Hi all.

I've been able to compile Nvidia's Caffe fork in Windows (see for details: willyd/caffe-builder#6), and now I'm doing some tests with Digits using it. I've been able to create the MNIST dataset, but when training the LeNet model I get the following error message:

Traceback (most recent call last):
  File "(MY_PATH)\DIGITS\digits\scheduler.py", line 454, in run_task
    task.run(resources)
  File "(MY_PATH)\DIGITS\digits\task.py", line 197, in run
    close_fds=False if platform.system() == 'Windows' else True,
  File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 5] Access denied

I'm not a Python expert so I don't know what to do next. Why could I be getting this error?

@deanhough
Copy link

The last comment I see was dated Sept9, 2015. Possibly dumb question... is NVidia's DIGITS available for Windows 7?

@lukeyeager
Copy link
Member

It is not supported. But if you can get NVIDIA/caffe built on Windows, then you shouldn't have any problems getting DIGITS to run.

Please note that Ubuntu 14.04 is the only officially supported OS at this time, although DIGITS has been successfully used on other Linux variants as well as on OSX. If you want to use DIGITS on an alternative OS, your main obstacle will be building Caffe. Please refer to BVLC's installation docs, user group and/or GitHub issues.
https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/BuildDigits.md#building-digits

@lukeyeager lukeyeager mentioned this pull request Mar 25, 2016
@lunzueta
Copy link
Contributor

lunzueta commented Apr 3, 2016

@houghdsla I finally was able to run Digits on Windows 10 (check the last comment here: willyd/caffe-builder#6)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants