Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update for Julia v0.6 #248

Merged
merged 17 commits into from
Jun 1, 2017
Merged

update for Julia v0.6 #248

merged 17 commits into from
Jun 1, 2017

Conversation

ExpandingMan
Copy link
Contributor

I have updated for 0.6, including fixes for all deprecation warnings.

The problem people were most frequently experiencing, and, as far as I can tell, the only problem that was causing an exception originated in @chain. In 0.6, the AST was changed so that a => b is Expr(:call, :(=>), :a, :b) rather than Expr(:(=>), :a, :b). I've changed the macro code to reflect this.

Base.take was being overloaded. Rather awkwardly, take is still in Base but deprecated, so it still allows the method to be extended but throws a deprecation warning if one does this. take was defined in some boilerplate code-generation, so I added a special clause telling it not to check for take in Base. This line should be removed once take is removed from Base (I added a comment to this effect).

slice is no longer in Base, so obviously slice no longer extends this function. One must now use mx.slice similarly to how one uses other functions in this module.

Otherwise, there were various other changes to type declaration syntax and to broadcasting operators (dotted operators are now accessed through broadcast(::typeof(op), ...)).

@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented May 23, 2017

Note that the failing tests are on 0.5. I'm adding compat and a few more type declaration fixes now.

@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented May 23, 2017

I've added compat for 0.5. Whether or not the Compat module is actually up to date is a different question.

Update: Indeed it seems that Compat is out of date on these platforms as it's throwing a "wrong number of arguments" error, that it should not throw.

@ExpandingMan ExpandingMan changed the title 0p6update update for Julia v0.6 May 23, 2017
@vchuravy
Copy link
Collaborator

Thank you for this!

You might have to update the Compat version in

Compat 0.9.5

@vchuravy vchuravy self-requested a review May 23, 2017 21:32
@vchuravy vchuravy self-assigned this May 23, 2017
@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented May 23, 2017

Hm, yeah I'm a little confused as to exactly what the situation is right now with Compat. It definitely seems that all the errors being thrown on appveyor and travis originate from it. I'll look into this tomorrow.

Update: Hmm.... Compat is throwing the same error even though it is on 0.25.2. When I use it on 0.6 I'm getting no errors. I'll look into this a little more and if I can't figure out what's going on I'll ask people on discourse.

@codecov-io
Copy link

codecov-io commented May 24, 2017

Codecov Report

Merging #248 into master will decrease coverage by 0.49%.
The diff coverage is 78.78%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #248     +/-   ##
=========================================
- Coverage    68.9%   68.41%   -0.5%     
=========================================
  Files          24       25      +1     
  Lines        1859     1852      -7     
=========================================
- Hits         1281     1267     -14     
- Misses        578      585      +7
Impacted Files Coverage Δ
src/callback.jl 100% <ø> (ø) ⬆️
src/io.jl 79.89% <ø> (-0.11%) ⬇️
src/name.jl 92.85% <ø> (ø) ⬆️
src/optimizer.jl 69.56% <ø> (ø) ⬆️
src/base.jl 26.19% <ø> (+2.38%) ⬆️
src/initializer.jl 51.61% <ø> (ø) ⬆️
src/model.jl 70.27% <ø> (-0.78%) ⬇️
src/metric.jl 29.1% <ø> (ø) ⬆️
src/compat.jl 0% <0%> (ø)
src/optimizers/rmsprop.jl 100% <100%> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 172af60...322f7da. Read the comment docs.

@ExpandingMan
Copy link
Contributor Author

Ok, the Julia devs have told me that Compat won't work for struct or mutable struct, presumably because they don't throw deprecation warnings. I suppose I'll have to revert those, am I correct in assuming that 0.5 compatibility is a priority? Since Julia is pre-1.0 my philosophy is usually that everything should be kept up-to-date, but I know not everyone sees it the same way.

@pluskid
Copy link
Member

pluskid commented May 24, 2017

@ExpandingMan Thanks! I believe our policy is to make the latest release No.1 priority and latest features are always welcome as long as there can be a workaround with Compat.

@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented May 24, 2017

Ok, in that case I'll revert to type and immutable. It's really unfortunate that one can't use current syntax and still be compatible with 0.5, but it will take very little effort to change this one particular thing in 0.7. I'll make another commit soon.

@ExpandingMan
Copy link
Contributor Author

This is getting ugly. Apparently 0.5 wants to use the wrong dotted operators (i.e. it uses .+ instead of broadcast(::typeof(+),...). The only way around this is to conditionally define those methods like I did for @chain.

@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented May 24, 2017

Ok, this now works on both 0.5 and 0.6 though there is now some slightly ugly compatibility code.

Because I hate ugly compatibility code mucking up my regular code, I put as much of it as I could in the new "compat.jl" file. It contains the following:

  • The @chain definition for 0.5.
  • Imports of broadcasted operators which are deprecated in 0.6.
  • The @compatdot macro which redefines broadcasted operators in 0.5 to use .+, .- et cetera, as opposed to broadcast(::typeof(op), ...).

I have added MacroTools.jl as a requirement (I use it in @compatdot) and I'd like to try to persuade you guys to go along with this. I find that MacroTools is invaluable for writing macro code (look how short my code is) and as I find it highly likely that more macro code will be added to MXNet at some point I think it's a really good thing to have around. It is used by tons of packages, so it's highly likely that most if not all users already have this (it really, really should be in Base). (Ironically it's not super-useful for @chain because of the recursive nature of @chain, though there probably is a nicer way of writing it using MacroTools.)

Appveyor and travis now run on both 0.5 and 0.6.

@ExpandingMan
Copy link
Contributor Author

It occurred to me earlier that I can circumvent the need to write separate @chain macros for 0.5 and 0.6 by rewriting it using MacroTools. I'll do that tomorrow. Please let me know if there's any strong opposition to requiring it. It'll seem more useful once I rewrite @chain.

@vchuravy
Copy link
Collaborator

Having MacroTools should be fine as a dependency.

@Evizero
Copy link

Evizero commented May 25, 2017

Just FYI, the appveyor.yml URLs just recently changed and a wide array of automated PRs were opened to update these. see JuliaImages/ImageInTerminal.jl#14 for a diff

@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented May 25, 2017

For some reason Travis is timing out on 0.6. I have no idea why as I am having no problems.

@ExpandingMan
Copy link
Contributor Author

I have rewritten @chain so that it is independent of the AST. It should therefore work on both 0.5 and 0.6. I have also added a very basic test of @chain.

As of this morning, appveryor didn't seem to running at all until I made the update suggested by @Evizero. Now, however, it seems to fail to build libmxnet.so. I have pretty much no idea why this is happening as all this Windows and powershell stuff is inscrutable to me. The lines I change don't appear to be specific to mxnet. Any ideas on this? Perhaps somebody who uses windows can chime in?

Last night travis seemed to time out on 0.6 only. I have absolutely no idea why this happened, everything is working fine for me on 0.6 on linux locally. We'll see if it does this again.

@ExpandingMan
Copy link
Contributor Author

For some reason appveyor decided to stop throwing errors all of the sudden.

Travis is still timing out on 0.6. This happens when building mxnet. As this is a build error, I'm not quite sure how to fix this, and it wouldn't seem to have anything to do with this particular PR. Any advice on this? Has this been encountered before?

Otherwise everything looks ok now.

@vchuravy
Copy link
Collaborator

Thanks, I will be looking into the timeouts.

There is one legit failure on travis:

MNIST Test: Error During Test
  Test threw an exception of type MethodError
  Expression: mnist_fit_and_predict(mx.AdaDelta(), mx.NormalInitializer(), 2) > 90
  MethodError: no method matching *(::MXNet.mx.NDArray, ::MXNet.mx.NDArray)
  Closest candidates are:
    *(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:424
    *(::MXNet.mx.NDArray, !Matched::Real) at /Users/travis/.julia/v0.6/MXNet/src/ndarray.jl:642
    *(!Matched::Real, ::MXNet.mx.NDArray) at /Users/travis/.julia/v0.6/MXNet/src/ndarray.jl:646
  Stacktrace:
   [1] (::MXNet.mx.##8142#8143)(::MXNet.mx.NDArray, ::MXNet.mx.NDArray, ::MXNet.mx.NDArray) at ./<missing>:0
   [2] broadcast_c(::Function, ::Type{Any}, ::MXNet.mx.NDArray, ::MXNet.mx.NDArray, ::Vararg{MXNet.mx.NDArray,N} where N) at ./broadcast.jl:335
   [3] broadcast(::Function, ::MXNet.mx.NDArray, ::MXNet.mx.NDArray, ::MXNet.mx.NDArray, ::Vararg{MXNet.mx.NDArray,N} where N) at ./broadcast.jl:434
   [4] update(::MXNet.mx.AdaDelta, ::Int64, ::MXNet.mx.NDArray, ::MXNet.mx.NDArray, ::MXNet.mx.AdaDeltaState) at /Users/travis/.julia/v0.6/MXNet/src/optimizers/adadelta.jl:84
   [5] (::MXNet.mx.#updater#8125{MXNet.mx.AdaDelta,Dict{Int64,Any}})(::Int64, ::MXNet.mx.NDArray, ::MXNet.mx.NDArray) at /Users/travis/.julia/v0.6/MXNet/src/optimizer.jl:235
   [6] #fit#8263(::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.AdaDelta, ::MXNet.mx.MXDataProvider) at /Users/travis/.julia/v0.6/MXNet/src/model.jl:514
   [7] (::MXNet.mx.#kw##fit)(::Array{Any,1}, ::MXNet.mx.#fit, ::MXNet.mx.FeedForward, ::MXNet.mx.AdaDelta, ::MXNet.mx.MXDataProvider) at ./<missing>:0
   [8] mnist_fit_and_predict(::MXNet.mx.AdaDelta, ::MXNet.mx.NormalInitializer, ::Int64) at /Users/travis/.julia/v0.6/MXNet/examples/mnist/mlp-test.jl:34
   [9] test_mnist_mlp() at /Users/travis/.julia/v0.6/MXNet/examples/mnist/mlp-test.jl:79

@pluskid
Copy link
Member

pluskid commented May 26, 2017

Thanks a lot! I'm also quite OK with adding MacroTools as dependency.

@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented May 26, 2017

Hmm... it would seem that * isn't getting imported from Base properly. I'm quite confused as to why this would happen on Mac but not Linux.

I'll be a bit busy this weekend but I'll take a more careful look at it soon.

@ExpandingMan
Copy link
Contributor Author

I have identified the problem that is causing the crash and it is rather strange, see here. It's not actually clear to me how to fix this, so I'll wait on a response. Previously I was not running this test as I did not have ENV["CONTINUOUS_INTEGRATION"] set.

@ExpandingMan
Copy link
Contributor Author

Ok, I have fixed the method error that was being thrown, but there is a bit of a caveat.

The emerging semantics of Julia seem to be that .* necessarily implies that some sort of loop over * is taking place. Several optimizations are implemented to this end, including one for x .* x, that is, cases where a symbol is repeated. See the discussion here. Calls to x .* x in MXNet were causing the original Base broadcasting method to be called.

I don't know what the ultimate resolution of this will be, but at the moment it seems to me that the proper thing to do would be to use a different symbol such as for Hadamard products in cases where no loop over * is implied, as is the case here.

That would be a non-trivial change, and I'm still not 100% sure that it will turn out to be the right one in the long run. For now, I have changed x .* x to broadcast(*, x, x) in the few cases where there was an ambiguity (these were all in the optimizers).

Appveyor seems to be failing to build libmxnet.so. Help with it would be appreciated.

@vchuravy
Copy link
Collaborator

Appveyore failed to download the prebuilt binaries. I will look into making that a bit more stable today.

@ExpandingMan
Copy link
Contributor Author

Silly me, I forgot that I hadn't overloaded broadcast for 0.5. There is now yet another ugly compatibility macro. Again, some of this can be changed once this is resolved.

By the way, I have seen one test that fails intermittently but very rarely (I've only seen it fail once):

INFO: NDArray::assign::dims = (6, 4, 7, 4, 3, 7)                   
NDArray Test: Test Failed                            
  Expression: reldiff(zeros(Float16, size(tensor)) + scalar, copy(array3)) < thresh
   Evaluated: 0.001333792834973383 < 0.001                       
Stacktrace:                                                      
 [1] test_assign() at /home/expandingman/.julia/v0.6/MXNet/test/unittest/ndarray.jl:65
 [2] macro expansion at /home/expandingman/.julia/v0.6/MXNet/test/unittest/ndarray.jl:401 [inlined]
 [3] macro expansion at ./test.jl:856 [inlined]                         
 [4] anonymous at ./<missing>:?    

I'm reasonably sure that this has nothing to do with my changes.

@pluskid
Copy link
Member

pluskid commented May 31, 2017

I think the issue with the Float16 looks fine.

@ExpandingMan
Copy link
Contributor Author

Travis is having a bit of a tantrum on Mac. Appveyor is having the same problem getting the binary it was having before.

Finally 0.5 and 0.6 both pass with no errors on Linux. It's funny how this works: Get working on 0.6, very little effort; Get working on both 0.5 and 0.6, big pain.

Copy link
Collaborator

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the hard work! This looks good to me.

@pluskid I would be happy to break compatibility with 0.5 soonish and maintain 0.5 on a separate stable branch until 0.7/1.0.

@pluskid pluskid merged commit 2b26bc2 into dmlc:master Jun 1, 2017
@pluskid
Copy link
Member

pluskid commented Jun 1, 2017

@ExpandingMan Thanks a lot for the great efforts!

@vchuravy I am looking at the Julia milestone: https://github.com/JuliaLang/julia/milestones, since v0.6 is going to be released 3 months ago, I guess it would be OK to start to plan to move ahead for us, too.

Let us aim to make another (potentially minor) release for our final v0.5 package, and then the master can start introducing breaking changes for v0.5.

BTW: @vchuravy I saw there was a new release that you made ~two days ago but it seems you forgot to update our NEWS.md. Can you add that? It is good to have a file to keep track of all the versions. Thanks a lot!

@ExpandingMan
Copy link
Contributor Author

Glad to be of help. I'm going to post an issue about the residual problem with the overloading of .* as there is certainly the potential for users to run into an error. Presumably we are waiting a bit on a consensus in the Julia community about appropriate uses of dotted operators to decide on the best way of fixing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants