Proposal for better module system #8014
Comments
This is a very good writeup, thanks. However this is almost identical to what I'm also skeptical of extending all functions accessed via |
Also see #8000 regarding reducing the number of keywords (import/using/importall). |
Thanks. There are similarities, of course, but there are some very important differences.
If someone writes a function definition unaware that Foo contains a function with the same name, it won't matter b/c their version will just take precedence over the prior version, so as long as they have no need of the other it doesn't make any difference. Also the compiler can produce warnings (given a flag to do so) to let us know when that happens, so we have an easy way to check. Note, it is possible that Julia should adhere to a philosophy that all packages have only a single module point of entry. That seems to be the conventional way of doing things at this point. For that, this proposal would still work with one change, basically my last point above would be disregarded --load would only be allowed to take a single package/file name. However, doing that would require large projects to divide their code up into multiple small packages in order to follow good modular practice. That might be a good thing, but for it to work it would need to be dead simple to create packages and it should be possible to make multiple packages for a single project. |
As @timholy has said a couple of times, you can't get rid of
There has yet to be any compelling argument made for open modules in Julia. There are two main things that open classes are used for in Ruby:
Both of these can be accomplished cleanly and safely with multiple dispatch without any need for open modules. As an escape hatch, you can already eval code inside of a module to alter it after it's been closed, but that's discouraged. More significantly, I can't think of a single package in the Julia ecosystem that has actually had to do this. If there were some compelling use cases for open modules, I would certainly consider it, but so far I'm not seeing any.
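As a hedged illustration of that point (a sketch in current Julia 1.x syntax; `Shapes`, `MyShapes`, `Circle`, `Square`, `area` and `describe` are made-up names, not anything from this thread), a second module can add both a new type and a new method to another module's generic function without reopening that module:

```julia
module Shapes
export area, describe

abstract type Shape end

struct Circle <: Shape
    r::Float64
end

# One method of the generic function `area`.
area(c::Circle) = pi * c.r^2

# Written only against the abstract type; it calls whatever `area`
# method matches the concrete shape it is given.
describe(s::Shape) = "shape with area $(round(area(s), digits=2))"
end

module MyShapes
import ..Shapes

# A new type defined outside of Shapes...
struct Square <: Shapes.Shape
    side::Float64
end

# ...and a new method added to Shapes' generic function `area`,
# without touching the Shapes module itself.
Shapes.area(s::Square) = s.side^2
end

println(Shapes.describe(Shapes.Circle(1.0)))
println(Shapes.describe(MyShapes.Square(2.0)))
```

`Shapes.describe` was written before `Square` existed, yet it works on it, which is the kind of extension that would otherwise call for reopening a class or module.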
Your This convention has been extremely successful: it cuts down on pointless repetition in programs and enforces consistent, predictable naming of the entry point of a package – I immediately know from the name of a package where loading it starts. When loading and import are separated, they can have different names, which means I have no idea from the name of a package, what the entry point is. If anything, we need more conventions that make the structure of packages predictable, not fewer. #4600 takes a nice step in that direction by encouraging (though not forcing) the directory structure of packages to mirror their submodule structure.
This means that you need to know two things to use a package (or other code): the path to the entry point and the name of the module that it defines. It also means that it is possible for those to be out of sync: different capitalization, underscores to separate words or not, an entry point whose name is unrelated to the provided module name. In the best case, separating these forces the programmer to write the same name twice – once to load something and again to use it. In the worst case, this forces the programmer to use two different names: one name to load code and another to use it. Having a one-to-one correspondence between packages, package entry points, and modules is great. If I load a package called |
I will second that the discussion in #4600 will be helpful for moving packages from Regarding extending functions: a definition with the same name cannot just "take precedence", because when you add a method in julia the generic function is modified. If Foo defines The alternative is to use "layered modularity", which looks like this:
Now my module has its own One could argue that this form of extension should be the default (I believe that was considered at one point). However the difference between this and actually modifying |
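The code from this comment is not preserved above, so what follows is only a sketch of the distinction being described, in current Julia 1.x syntax with hypothetical names (`Foo`, `Extender`, `Layered`, `f`, `callf`). It contrasts adding a method to `Foo.f`, which changes the generic function for every caller, with defining a separate `f` that falls back to `Foo.f`:

```julia
module Foo
export f
f(x::Int) = "Foo.f(Int)"
# Code inside Foo that calls the generic function f.
callf(x) = f(x)
end

module Extender
import ..Foo
# Adds a method to the existing generic function Foo.f.
# Every caller of Foo.f, including Foo.callf, now sees it.
Foo.f(x::String) = "method added by Extender"
end

module Layered
import ..Foo
# A brand new function that merely shares the name f.
f(x::String) = "Layered.f(String)"
# Anything not handled here is forwarded to Foo.f.
f(x) = Foo.f(x)
end

println(Foo.callf("hi"))
println(Layered.f("hi"))
println(Layered.f(1))
```

In the `Extender` case nothing "takes precedence"; there is still one function, now with two methods. In the `Layered` case `Foo.f` is untouched and `Layered.f` is a different function object.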
@StefanKarpinski Again the problem with include is that it allows non-modular composition of code. This is very bad practice and tends to invert code hierarchies. Instead of
You see:
So the code in "bar.jl" is not modular but just free-floating code. In addition it violates SOC, because code defined in the top-level of the loading file is visible to the included file. That is analogous to defining functions that can see variables outside their definition. I could go on, but I have already done so in the mailing list. It's basic modular design principles, and from that is every reason not to have
It's not something you have to do within a package b/c obviously one can always put everything in one file --of course I see a lot of examples of code in the wild instead using include to mimic open modules, and sadly not using modular design. That in itself is a good reason for it. But it can be a life saver for modifications between packages. Think of the case when a package has lost its maintainer and a bug has been found in the package. What to do? Someone could fork it, but then they have to rename and release it (and that means modifying your code to match), plus all the work of managing the fork. But then the maintainer turns back up or a new maintainer comes along and the fork was all a waste of time. It would have been a lot easier just to be able to "patch" the code in your code and avoid all the trouble. The basic problem with not having open modules is that it all but forces you to use include (creating non-modular code dependencies) b/c otherwise all of a module's code would have to be in a single file.
If that's Julia's package philosophy, that's fine. But if so, why all the pseudo-separation between package, module and file? Just totally embrace it. Require every package to have a Foo.jl file named after the package name, and just ditch Also, with that package philosophy, it needs to be dead simple to create packages, even multiple packages within a single project. That will be really useful to larger projects that could have multiple, but related reusable components. [edit formatting: @StefanKarpinski] |
@JeffBezanson By "take precedence" I was meaning "overwrite". But then, I may be confused by the meaning of "extend" here, because I was also thinking in terms of the function being redefined within another module that it was imported into. I thought that's what you meant. Partially aside, are you saying that it is possible to do this:
And thus change the meaning of |
@trans can you send us a link to some Julia code you've written in the style you are proposing so we can better understand your perspective? Obviously the keywords will be a bit different but you can do most of it with the current machinery. I ask because my gut feeling is "rejection" of your proposal (I like |
Let's go through the arguments for this proposal again. Eliminates Open modules are good. We still don't have any concrete examples of useful applications of open modules in Julia. The one hypothetical example is a situation where a package maintainer has gone AWOL and we need to use open modules to monkey patch their code. This is a truly terrible way to deal with this problem. Forking the repo and taking up maintenance is a much better solution. Doing so wouldn't require renaming the package – it's a simple matter of changing one URL file in METADATA.jl. The package manager handles this with zero fuss – most people using the package would not even notice the transfer of ownership. If you do desperately need to monkey patch a module, you can already do so using the module's eval:
julia> module Foo
       bar = 1
       end
julia> Foo.bar
1
julia> Foo.baz
ERROR: baz not defined
julia> Foo.eval(:(baz = 2))
2
julia> Foo.baz
2
For reasons I've already given, multiple dispatch makes it exceedingly rare to need to do such a thing. There's absolutely no compelling reason to make this any easier to do. It might even be a good idea to disallow it altogether instead of making it easier. It separates loading from import. We had this before and still have the ability to do this with It's more modular. The only specific complaint about modularity of the existing system is the use of In general, I think it would be a more productive use of everybody's time if you used Julia's existing system for a while before deciding to redesign it. If the documentation is unclear or you can't understand how or why something works the way it does, please ask, but please hold off on the grand redesigns until you have a bit more context and experience with the current state of affairs. |
Yes, yes, include isn't modular. We get it already. For crying out loud. |
@IainNZ It's not possible to give a working example, not even a similar style, primarily b/c there is no support for open modules. But I can give you an example of what it basically would look like. I modified my Corpus program so you could see. Check it out at https://github.com/openbohemians/corpus-julia/tree/master. This program is not finished. I got as far as getting the Ngrams working b/c that's the part I had to have for another project, but you can see where I am headed with the Words and Letters modules. Eventually they would probably have shared Utils module too. For comparison you can look at tag 0.1 (https://github.com/openbohemians/corpus-julia/tree/0.1) which is the working version (using include). |
In that code, how would you want to use open modules? |
Do you honestly think I would advocate deprecating
Hmm... I've been programming for 30 years. It's not too hard to see that Julia's current system is a bit of a "stinker". I know you love include. But there's an old saying that sometimes "you must kill your darlings" (William Faulkner). Sorry, but
It's already using them by allowing the submodules to be clearly parked inside the main module within their files. That way it is easy to tell how the hierarchy fits together in every file. BTW, I actually find the #8015 proposal a bit more compelling --albeit it's not quite as well fleshed out. But it's more interesting for its pure simplicity. Even so, I thought it important to vet both of these approaches (in separate issues). |
It's been said about a million times now that
It's tempting to counter your candid characterization of Julia's module system cum Faulkner quote with that timeless piece of folk wisdom, "he who smelt it dealt it." But that may be taking the level of discourse even further down, so I'll restrain myself. Instead, I'll ask this: Of those 30 years, how many were spent programming in a multiple dispatch language? Unless you're a Dylan or Common Lisp programmer, the answer is likely none. This absence of experience with Julia's central paradigm shows: there's no indication in any of this exchange of any comprehension of how generic functions work, let alone their design impact on a language and its module system. I'm not saying that Julia's current modules system is perfect by any means – hence the various issues opened, mostly by me, with specific, practical plans to improve it. It might help, before trying to design a module system for a multiple dispatch language, to, I dunno, do some programming in a multiple dispatch language.
Yes, I see. It seems that the only thing that's gained in this example by having open modules is the ability to write these additional lines in each submodule:
module Corpus
# exactly what goes in the file under #4600
end
You'll forgive me if I don't fall all over myself breaking all the packages in the Julia ecosystem so that we can write two extra lines in every file while also making code harder to refactor.
Fair enough. I think we're pretty much done here. |
| It's tempting to counter your candid characterization of Julia's module system cum Faulkner LOL! Though I point out you didn't exactly restrain yourself ;-) | Instead, I'll ask this: Of those 30 years, how many were spent programming in a multiple dispatch You mean like VB and Java? [Ok. Overloading is not exactly the same as multiple-dispatch, but in use it serves essentially the same purpose.] | Yes, I see. It seems that the only thing that's gained in this example by having open modules is It's not about writing them. It's about knowing them. It's like having page (or section) numbers in a book. Who wants a book without those? I fully take your point on backwards compatibility. That's always a tough ride, yet sometimes what it takes. Anyway, enjoy those includey aromas! ;-) |
@StefanKarpinski It just occurred to me, you keep telling me things are different b/c of multiple-dispatch, and that I don't have that experience to know how to do it right. If that is so, I would like to understand. So could you show me how I am supposed to organize my Corpus project using multiple-dispatch? That would be great. |
Let's do try to keep this positive (I really like your last post, @trans). @trans, I think the frustration that Stefan feels comes from the fact that, to anyone who knows how these things work internally, your proposal to eliminate Said my piece. Sincere best wishes in organizing Corpus to your satisfaction. |
Not even close. If you think that's true, then you also believe that static overloading is the same thing as virtual single dispatch, which means you don't really understand single-dispatch object orientation either. This whole interaction is like someone who hasn't bothered to understand Hindley-Milner and doesn't know what lazy evaluation is showing up on the Haskell forums and trying to redesign parts of the language. Generic functions already provide so much openness and extensibility that we are not in a position where we need more of it – if anything, we need ways to constrain ourselves. If you're genuinely interested in learning Julia rather than playing armchair language designer, then multiple dispatch is the one thing you absolutely have to comprehend. Julia is not a faster Ruby or Python with some type annotations. Nor is it a more dynamic C++ or Java. Until you really get the paradigm and why it's so powerful, you have no idea what the language is about.
On my Kindle, I turn off all location markers because I don't want any distractions on the page while I'm reading. If I do suddenly forget what book I'm reading or decide I really need to know what chapter I'm in, I can tap the screen and see that information without it always cluttering my file–I mean screen. |
This proposal is not even really a redesign of the module system. It
The third point is very well taken, but I don't believe it requires a reshuffling of keywords or the couple other random things in here. The proposal in #4600 is an extremely small change that appears able to remove most uses of I don't understand why, in trying to maximize modularity, you would add open modules. This means that any file loaded at any time can "dump" more things into an existing namespace. Say what you will about |
@StefanKarpinski I am happy to be wrong. But I really need you to show me so I can understand what you are talking about. How would you fix my Corpus project using "powerful" multiple dispatch? I can't really understand it until I can see it in action. Btw, I once again tried to make what seems like a simple enough transition to includes from my pseudo open modules --as you said, it was nothing more than "additional lines in each submodule". So I took those lines out and put in the includes (https://github.com/openbohemians/corpus-julia/blob/master/src/Corpus.jl) but I get an error I can't seem to fix:
P.S. And this is exactly the problem that began all of this discussion months ago. And back then you told me the same thing: multiple-dispatch. So I really want to understand this. |
You need See: http://docs.julialang.org/en/latest/manual/modules/#relative-and-absolute-module-paths |
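The exact statement suggested in this comment is not preserved above, so the following is only a sketch of what relative module paths look like, in current Julia 1.x syntax. The module names `Corpus`, `Ngrams` and `Words` come from the thread; the function bodies (`bigrams`, `count_bigrams`) are invented for illustration:

```julia
module Corpus

module Ngrams
export bigrams
# Hypothetical helper: adjacent pairs of a word list.
bigrams(words) = [(words[i], words[i+1]) for i in 1:length(words)-1]
end

module Words
# `..Ngrams` means "Ngrams in my parent module", i.e. Corpus.Ngrams.
# A sibling submodule is not visible here until it is imported.
import ..Ngrams
count_bigrams(text) = length(Ngrams.bigrams(split(text)))
end

# From the parent itself, a single leading dot refers to a child module.
using .Ngrams

end

println(Corpus.Words.count_bigrams("the quick brown fox"))
```

Without the `import ..Ngrams` line, the reference to `Ngrams` inside `Words` would fail with an undefined-name error, which may be the kind of error reported in the previous comment.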
Overloading ≠ multiple dispatch: http://stackoverflow.com/a/483649/659248. A more significant difference is that generic functions reify collections of related methods as a first-class object. This bridges the gap between object orientation and functional programming, in particular, solving the expression problem elegantly and intuitively. Perhaps this notebook or this talk will help clarify further. Your Ngrams problem has nothing to do with open modules or multiple dispatch, you just need to import Ngrams before you can use it. |
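A minimal sketch of that difference, in current Julia 1.x syntax with made-up names (`Pet`, `Dog`, `Cat`, `meets`, `encounter`): the method is chosen from the runtime types of all arguments, even when the calling code only knows about the abstract type.

```julia
abstract type Pet end
struct Dog <: Pet end
struct Cat <: Pet end

# Three methods of one generic function.
meets(a::Pet, b::Pet) = "ignores"
meets(a::Dog, b::Cat) = "chases"
meets(a::Cat, b::Dog) = "hisses at"

# This function only knows it receives two Pets. With static overloading
# (as in Java or C++), the inner call would be resolved against the
# declared types (Pet, Pet). Julia instead dispatches on the actual
# runtime types of both arguments.
encounter(a::Pet, b::Pet) = meets(a, b)

println(encounter(Dog(), Cat()))
println(encounter(Cat(), Dog()))
println(encounter(Dog(), Dog()))
```

The same `encounter` definition picks up any `meets` method added later, from any module, for any new pair of `Pet` subtypes.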
@JeffBezanson Just to clarify, I wasn't suggesting adding The main reason I like open modules is because it allows you to "describe" the hierarchy in each file and glue it all together from the toplevel. As things are, you can't see how things fit together without finding where in the code a file gets pulled in. The ability to do a patch is an added benefit (though some think it otherwise). But it's still modular, b/c in this design the files aren't the modules. |
@StefanKarpinski Oh, so now it has nothing to do with it? |
You never described an actual problem before. |
Let me clarify, our position is not " |
This is one of two proposals for a better module system for Julia. The goal is to promote good modular design. This proposal, as opposed to my other proposal, supports multiple modules per file.
The main concept is that files are always loaded into a program at the toplevel. Once loaded, any modules within those files are available for use, either through their absolute paths or by way of importing.
I will use the function `load` in the examples, but obviously another term can be used. When a file is loaded, its modules are made available, as are its exported functions, via absolute paths. e.g.

Any toplevel code in a file, i.e. outside the scope of a module or function, is processed only once on first load at compile time. This can be used to create computed resolutions, such as dynamic loads. e.g.
With parentheses `load` works like a function. So in the above, `something` is a variable. Without parentheses `load` is more like a keyword and treats the following text as if it were a function argument in quotes, i.e. `load foo` -> `load("foo")`. It also assumes an extension of `.jl`. Multiple terms separated by `.` are converted to `/`, hence directory separations. So `load Foo.Bar` looks for `Foo/Bar.jl`. (Whether letter case matters can be decided later. It makes no difference to this proposal.)

All loads first look to the local directory of the loading file for a match. If not found it then looks to installed packages. Name clashes between local files and packages are generally easy enough to avoid by choosing non-conflicting names. If a name conflict is unavoidable, a "package marker" can be used to distinguish the package from the file. e.g. a `|` between the package name and file name (the exact marker to use is an open question).

Of course in most cases that will be `load Foo|Foo` which sucks for redundancy, but maybe someone else can think of a good way to denote that without the duplication. Also, local relative paths can be specified by prefixing `./` and `../` if necessary, but generally these should not be needed.

As an alternative to the use of the `|` it could require a "from" term, e.g.

That reads a little better, albeit not as concise.
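The lookup rules described above can be sketched in present-day Julia. This is only an illustration of the proposal: `load` does not exist in Julia, and `load_relpath`, `resolve_load` and their arguments are hypothetical names. The package marker and "from" variants are left out.

```julia
# Hypothetical helper: turn the dotted name used by the proposed
# `load Foo.Bar` into a relative file path such as "Foo/Bar.jl".
function load_relpath(name::AbstractString)
    return joinpath(split(name, '.')...) * ".jl"
end

# Hypothetical resolver: look next to the loading file first, then in
# each package directory, and return the first file that exists.
function resolve_load(name::AbstractString, local_dir::AbstractString,
                      package_dirs::Vector{String})
    rel = load_relpath(name)
    for dir in (local_dir, package_dirs...)
        candidate = joinpath(dir, rel)
        isfile(candidate) && return candidate
    end
    return nothing  # no match anywhere
end

println(load_relpath("Foo.Bar"))
```

Because the local directory is always tried first, a local `Foo/Bar.jl` shadows a package file of the same name, which is the clash the package marker is meant to resolve.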
I will use the term `import` to serve as the name of the function that brings module functions into the scope of other modules. I think this is a good term b/c it contrasts well with `export`.

By importing `BarA` and `BarB`, their exported functions are made visible within Foo without absolute paths. There doesn't need to be a separate `using`, as `import` allows the methods to both be found and to be extended. If there is a name clash between imported modules the later wins out. Specific methods can be imported by using the colon notation.

This would import only `ga` and no other functions. Functions must be exported to be accessed. The only way to access non-exported functions would be to force them, by reopening the module and modifying it. Which brings me to the last part of this proposal.

If a module is "reopened" then it can be modified. This happens at compile time, so it is safe.
So `Foo.f()` can work b/c we modified Bar to allow it. This of course should be done with a clear understanding of what one is doing, especially if it is being done to a module from another package, b/c the author probably didn't export the function for a reason. But it is important to be able to have this option b/c it makes it possible to fix overlooked limitations and bugs, and improves potential code reuse.

The advantages of this design are essentially all the advantages of modular programming since that is what it is designed to provide. One nice thing that stands out is that all loads can go at the top of a file, as order of loads is not significant. That makes it much easier to see what a file requires. It also means that each file can have its own loads even if they are the same as those of another file that loads it. They are only ever loaded once.
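The "only ever loaded once" behaviour can be approximated on top of today's `include`. A minimal sketch, assuming a hypothetical `load_once` helper and `LOADED` registry that are not part of Julia:

```julia
# Hypothetical registry of files that have already been loaded.
const LOADED = Set{String}()

# Hypothetical helper: evaluate a file at most once per session.
# Returns true if the file was evaluated now, false if it was skipped.
function load_once(path::AbstractString)
    key = abspath(path)          # normalise so the same file matches itself
    key in LOADED && return false
    push!(LOADED, key)           # record first, so a circular load stops here
    include(key)                 # evaluated at the toplevel of the enclosing module
    return true
end
```

With something like this, two files can each ask for the same dependency and it is evaluated on the first request only, so the order of the requests stops mattering. Julia's own `using` and `import` already behave this way for packages; plain `include` re-evaluates the file on every call.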