RFC: marking private and public APIs #48772

Roger-luo · 2023-02-23T21:08:52Z

This is based on #42117 but is a bit different proposal that might be breaking for Julia 1.x based on different considerations thus I decided to open a separate issue to make reading a bit easier.

In short, I'd like a similar macro/keyword that marks an object:

public, accessible via <module>.<name> or <name> after explicitly imported by import/using
unmarked names are considered private, which means one cannot access them at all from outside

the syntax may vary due to compatibility concerns, e.g

Keep things non-breaking

because we currently do not hide things by default, we will need a marker for things we want to make private

@public <something>
@private <something>

Breaking but simpler

if we choose something more breaking, it could be a public/export keyword combined with #39235, where export/public could just mean "public" accessible APIs, and things are not accessible from the outside module directly.

On the other hand, using XXX and import XXX needs to be private by default, unless marked with export/public so we can prevent someone using a long module path to access some deep dependencies from a package (see the point 2 in Why).

Additionally

I'd like a macro that marks certain API's stability status, this is something I find quite nice from the rust community, where they have things like

#[cfg_attr(not(test), rustc_diagnostic_item = "IpAddr")]
#[stable(feature = "ip_addr", since = "1.7.0")]
#[derive(Copy, Clone, Eq, PartialEq, Hash, PartialOrd, Ord)]
pub enum IpAddr {
    /// An IPv4 address.
    #[stable(feature = "ip_addr", since = "1.7.0")]
    V4(#[stable(feature = "ip_addr", since = "1.7.0")] Ipv4Addr),
    /// An IPv6 address.
    #[stable(feature = "ip_addr", since = "1.7.0")]
    V6(#[stable(feature = "ip_addr", since = "1.7.0")] Ipv6Addr),
}

It marks certain things' stability at the same place where it's defined programmatically so that a linter can warn users based on their current toolchain version.

In Julia, we only have a poor docstring saying "use at your own risk", which is something I think could be improved by this. Having this shouldn't break, it could be one macro marks struct fields and function arguments about their availability and stability. This could make the experimental feature easier to provide and play with in downstream.

Why?

#42117 has overlapped with this proposal thus the reasons @DilumAluthge listed also apply here, I'd like to provide a few other motivations tho

potential reduction on pkgimage/cache/sysimg size, because many functions in a package are used by the package rather than the downstream user, the methods corresponding to these functions that are not used by downstream can be deleted in the downstream package cache/sysimg in principle, but we currently cannot do it because users are allowed access them by a deep chain of module path (e.g A.B.C.<a private function>. I think quite a few AOT languages have a similar mechanism to mark things private so they can be tree-shake away. I'm not an expert on package cache or system images, but I think this might be one of the low-hanging fruit that can be improved by changing the semantics a bit, so please correct me if this is not what can be improved in Julia's case.

A demonstration of this can be resolving the issue that https://expronicon.rogerluo.dev/intro/bootstrap and https://github.com/thautwarm/DevOnly.jl trying to solve automatically:

MLStyle is an extreme example of this, what @match generates only depends on Base, but the only reason why downstream packages will still load MLStyle is only that users are allowed to access MLStyle's @match via AAA.BBB.CCC.@match if CCC contains using MLStyle: @match, and one can get a rough estimation in this extreme case on loading time improvements

julia> @time using MLStyle
  0.041998 seconds (148.07 k allocations: 10.553 MiB)

julia> @time using ExproniconLite
  0.020682 seconds (53.83 k allocations: 3.991 MiB)

here you can see even Expronicon depends on MLStyle, by removing MLStyle from loading entirely we can get twice faster loading time than depending on MLStyle.

prevent downstream users hacking unstable things, this is more or less an effect of having 1, but I want to argue it has certain advantages, one example is the non-public APIs from Base, we currently have a very vague way of distinguishing them by whether the function has a docstring or not, IMO even functions only made for developers deserves a docstring in the dev docs. If we have a mark to distinguish such APIs, then the reference page can be generated automatically for manual and dev docs of the corresponding functions. And APIs like Base.print_matrix etc. can be more clear to people that whether they should use and whether this is maybe broken in future versions.

But I'd like to mention one real-world example which is the usage of DiffEq, most of the time one only uses one ODE solver from that giant package, thus in principle downstream package should not be loading the whole thing, but that ODE solver code only. But because currently you are allowed to access other solvers by something like MyPackage.DiffEq.OrdinaryDiffEq.Vern8 we will have to load the whole thing which is super slow.

Ideally, the compiler should cache the corresponding solver code into the downstream package image and only load that piece when using MyPackage without explicit using DiffEq. But I think this is not allowed because users can technically do MyPackage.DiffEq to access anything inside DiffEq

The text was updated successfully, but these errors were encountered:

Seelengrab · 2023-02-23T22:39:35Z

I don't think preventing people from accessing a given function entirely via some form of @private is a good idea, but I do think that we need some means of explicitly marking an unexported thing as part of the SemVer API guarantees.

But I'd like to mention one real-world example which is the usage of DiffEq, most of the time one only uses one ODE solver from that giant package, thus in principle downstream package should not be loading the whole thing, but that ODE solver code only.

I also don't think such an @private macro would help with the woes of all of DiffEq needing to be loaded, because there's no definite line that can be drawn for where that boundary of accessible/non-accessible needs to be drawn in general (I've hacked around too many too restrictive private modifiers in java for my share of problems..). As far as I'm aware, the trend for these has also been going more towards allowing more extension from the outside, rather than disallowing more (such as favoring protected over private). Do you have an example of a language that uses the access modifiers for tree-shaking purposes?

In addition to this, I think having a way to mark objects public/private will also resolve the issue

In regards to smaller package images - in an AOT setting, where a static binary without compilation is the goal, you don't need an additional layer of @private to delete things. Just delete things that aren't reachable from a given (list of) static entry point(s) (which you need either way, else you can't start the binary/access the library). Currently, I think that would be exported names, but extending that with an @public list of unexported names should be sufficient, since there's no new code that may be generated at runtime that may access new names.

I don't see how @private helps with that in a non-AOT setting, since the presence of eval (or any macro, really) may mean that new functions accessing internals can exist, without necessarily being visible to static analysis (which is required for removing unused things from a package image).

I really like having an @public (or @api?) marker for marking an unexported thing explicitly as SemVer API. This should be queryable in code, to allow doc systems like the help-mode in the REPL or Documenter.jl to automatically pick it up & display the warning (possibly with a version of when it was added, which can generate a fitting admonition in the docstring automatically).

As a point of reference (but please don't feel the need to post), there's also this lengthy discourse thread about this very topic, with lots of good arguments for & against various versions of this (including my quite strong stance about why I personally think a private modifier is a bad idea).

Roger-luo · 2023-02-23T23:24:05Z

@Seelengrab I probably should emphasize on the improvement of code loading times more, for the API definitions part I think it should be in the scope of #42117, I think the comments on private totally make sense in the discourse post, I wasn't being clear that the above-proposed syntax is only made for compatibility reasons, and there could be a better way of making things inaccessible. However, I don't think there has been anyone discussed the impact of loading in discourse posts or issues before (unless I searched the wrong keyword).

In regards to smaller package images - in an AOT setting, where a static binary without compilation is the goal, you don't need an additional layer of @Private to delete things. Just delete things that aren't reachable from a given (list of) static entry point(s) (which you need either way, else you can't start the binary/access the library).

Yes, that's exactly what I'm trying to propose here actually, a more "static" marker, the proposed public/private marker is mainly for compatibility, if we break the semantics of export we can totally do what you described here.

As far as I'm aware, the trend for these has also been going more towards allowing more extension from the outside, rather than disallowing more (such as favoring protected over private). Do you have an example of a language that uses access modifiers for tree-shaking purposes?

I don't have an example right now, since I thought this was obvious. But I could be wrong on this. But let me explain two things here:

the proposed private marker is not necessary if we let things not exported inaccessible by default, I propose that mainly for compatibility reasons
for a Julia package A that depends on B.foo, and A does not re-export B, e.g

module A
# package A

using B: foo
end # module A

B.foo will be an entry for package B, and if B is not accessible from outside A, it will be safe to delete everything unrelated to foo and only save the definition of foo in the compiled cache of A? Because no one can access anything else from B except A.foo.

But eval is a good point, I didn't think about it, but I think in this case A.eval(:(foo)) will still work even if other function definitions in B are not loaded? and to make sure something is accessible you can always do

A.eval(quote
   using B: goo
   goo
end)

and by calling the loader at runtime throw eval will invalidate A's cache and load the whole cache of B in this case.

Basically, here we are treating exported or public APIs as the static entry point, and removing things not reachable, which is why we need to distinguish what is accessible and what is not. The simplest solution is treat exported APIs as static entry points, and do not treat other APIs as an entry points, but that would be breaking because all other APIs can be accessed via the dot operator.

This is basically why I'm saying if we want this in 1.x, we will need something like a private, but

we potentially combine this with 2.0: just one import keyword #39235 and break the convention of export, no extra keyword is needed,
or if we break the convention that everything is accessible from a module using a dot, then only a public marker is sufficient.

I don't see how @Private helps with that in a non-AOT setting, since the presence of eval (or any macro, really) may mean that new functions accessing internals can exist, without necessarily being visible to static analysis (which is required for removing unused things from a package image).

I don't think you have access to all the internals even for the current module loading system? if the only module you have access to above is A? You cannot access to A.B.goo without importing B to the scope like using B: B.

And in my proposal, if the loading statement is written as public using B: B, goo then you will have access to B and we should load B entirely as well, otherwise, because this is not an entry point for A anymore, we can just delete them.

@Seelengrab also maybe you didn't see the example I provided for the MLStyle case above, which is an extreme case because it doesn't have any dependencies, if we make MLStyle inaccessible from outside, we can safely not load MLStyle which will have an improvement on loading (the ExproniconLite package, which I created using a hacky way). This is a bit hard to demonstrate using a macro, so I'm not able to provide something more complicated, but I hope this demonstrates the idea.

Roger-luo · 2023-02-23T23:36:22Z

As far as I'm aware, the trend for these has also been going more towards allowing more extension from the outside, rather than disallowing more (such as favoring protected over private).

I guess extensions are not exactly the same as this - extensions are lazily loaded, but here it's about reducing downstream package loading time by reducing unnecessary cache. The effect is similar - much faster loading time, but both should be used in practice to reduce loading time further.

On the other hand, I'd doubt if converting say every ODE solver into an extension makes sense or practical... mark things inaccessible in downstream packages could be easier to adapt.

Seelengrab · 2023-02-24T07:19:05Z

Because no one can access anything else from B except A.foo.

I don't think you have access to all the internals even for the current module loading system? if the only module you have access to above is A? You cannot access to A.B.goo without importing B to the scope like using B: B.

That is not true:

julia> module B
           foo() = "hi"
           bar() = "bar"
       end
Main.B

julia> module A
       # package A

       using ..B: foo
       end # module A
Main.A

julia> (A.foo |> parentmodule).bar()
"bar"

So unless we break the notion of accessibility of unexported names, you can currently get everything in a dependency via reflection on anything that module gave us. If you want to get this into a 1.x release, breaking here does not look like an option because there's no safe state to fall back to in general here - we don't actually track the API surface of anything in a programmatic way right now, and "unexported" is not an accurate proxy variable here. To me that means the only option left is to actually make it queryable what is API, by placing explicit @api markers (and export, if the name needs to be exported).

Basically, here we are treating exported or public APIs as the static entry point, and removing things not reachable, which is why we need to distinguish what is accessible and what is not.

Yes, but that is not sufficient. What is accessible vs. what is API are orthogonal concepts, and the current (unspoken & unqueryable) convention is that unexported names are (usually) not API, and documented names are (usually) API. It's this fuzziness around what exactly constitutes the API of a package that is the problem, not whether you can access the given name at all. Regardless of solving that, using any sort of API marker as a boundary for tree shaking is bound to lead to problems because that's not the job of an API marker - their job is to inform a developer/user about what is considered safe/stable to access across versions, and accessing anything else does not have any guarantees associated with it.

I guess extensions are not exactly the same as this

On the other hand, I'd doubt if converting say every ODE solver into an extension makes sense or practical

To be very clear here - with "extension" I was not referring to the weak dependencies/package extensions we now have on master, but referring to the practice of extending the interface of a library in java with a new type/object. Unless that library has all of its boundary marked as protected or public, new code implementing e.g. an abstract class cannot change the behavior of some things (which there are very legitimate reasons to want to do).

Roger-luo · 2023-02-24T16:19:09Z

That is not true:

you are mixing up modules ported to Main and modules defined in Base.__toplevel__ module, they are not exactly the same: for packages, they are evaluated in a hidden Base.__toplevel__ which is why if B is a package, and using B: foo will not port B into the scope.

I didn't see the parentmodule part, yeah, I think you are right then, if we want to enable tree-shaking a package, we need to break the convention of what is accessible from a package then. Perhaps by making packages more special.

To me that means the only option left is to actually make it queryable what is API, by placing explicit @api markers (and export, if the name needs to be exported).

Yeah, or that, I'm open to anything that marks an API, a publicly accessible name, or whatever we name it. But I'm just saying this can be helpful in reducing loading time if we make things inaccessible.

Regardless of solving that, using any sort of API marker as a boundary for tree shaking is bound to lead to problems because that's not the job of an API marker - their job is to inform a developer/user about what is considered safe/stable to access across versions, and accessing anything else does not have any guarantees associated with it.

OK, you get me confused now, are you proposing we treat the @api marked object as a static entry, and delete unreachable? Or what you are trying to propose in terms of tree-shaking unused definition from a "package" (not a module inside Main)?

BioTurboNick · 2023-02-27T13:33:07Z

prevent downstream users hacking unstable things, one example is the non-public APIs from Base, we currently have a very vague way of distinguishing them by whether the function has a docstring or not, IMO even functions only made for developers deserves a docstring in the dev docs. If we have a mark to distinguish such APIs, then the reference page can be generated automatically for manual and dev docs of the corresponding functions. And APIs like Base.print_matrix etc. can be more clear to people that whether they should use and whether this is maybe broken in future versions.

I'm against hard barriers to access. Hacking "private" methods is honestly a very useful feature of Julia. If you want there to a little extra friction for access to "private" methods, that would be a nice addition: make the consuming code self-document that it's using a "private" method. There would have to be a very compelling reason to strip it entirely, IMO. Which leaves...

potential reduction on pkgimage/cache/sysimg size[...]

Which I don't know enough about to comment on the merits.

Roger-luo · 2023-02-27T14:57:26Z

If you want there to a little extra friction for access to "private" methods, that would be a nice addition: make the consuming code self-document that it's using a "private" method. There would have to be a very compelling reason to strip it entirely, IMO. Which leaves...

I kinda regret I put this as my first point now... so basically, if we want to delete these methods later, we need a way to prevent downstream users from accessing them, otherwise, we cannot delete them. I think this makes a lot of sense for an AOT language, but it's tricky for a JIT language... as people's expectation is different

I kinda understand why you are against hard barriers to access as a developer, I think if we can trade being able to hack private methods with faster loading time I'd choose later, but let me change my post a bit to be more clear on this...

Seelengrab · 2023-02-27T15:50:09Z

To be clear, I think the claim that a public/private or accessible/inaccessible distinction can improve load times is dubious - at best, you can delete code that isn't API and isn't used by the package internally, at which point, why is that code in the package in the first place? The load times are not a result of functions that noone calls after all, but of all those that are called (one way or another).

inkydragon added the design Design of APIs or of the language itself label Feb 24, 2023

JuliaLang locked and limited conversation to collaborators Feb 27, 2023

vtjnash converted this issue into discussion #48819 Feb 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

RFC: marking private and public APIs #48772

RFC: marking private and public APIs #48772

Roger-luo commented Feb 23, 2023 •

edited

Loading

Seelengrab commented Feb 23, 2023 •

edited

Loading

Roger-luo commented Feb 23, 2023 •

edited

Loading

Roger-luo commented Feb 23, 2023

Seelengrab commented Feb 24, 2023

Roger-luo commented Feb 24, 2023 •

edited

Loading

BioTurboNick commented Feb 27, 2023

Roger-luo commented Feb 27, 2023 •

edited

Loading

Seelengrab commented Feb 27, 2023 •

edited

Loading

This issue was moved to a discussion.

This issue was moved to a discussion.

RFC: marking private and public APIs #48772

RFC: marking private and public APIs #48772

Comments

Roger-luo commented Feb 23, 2023 • edited Loading

Keep things non-breaking

Breaking but simpler

Additionally

Why?

Seelengrab commented Feb 23, 2023 • edited Loading

Roger-luo commented Feb 23, 2023 • edited Loading

Roger-luo commented Feb 23, 2023

Seelengrab commented Feb 24, 2023

Roger-luo commented Feb 24, 2023 • edited Loading

BioTurboNick commented Feb 27, 2023

Roger-luo commented Feb 27, 2023 • edited Loading

Seelengrab commented Feb 27, 2023 • edited Loading

This issue was moved to a discussion.

Roger-luo commented Feb 23, 2023 •

edited

Loading

Seelengrab commented Feb 23, 2023 •

edited

Loading

Roger-luo commented Feb 23, 2023 •

edited

Loading

Roger-luo commented Feb 24, 2023 •

edited

Loading

Roger-luo commented Feb 27, 2023 •

edited

Loading

Seelengrab commented Feb 27, 2023 •

edited

Loading