-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use mmap to read files when possible #106
Comments
@JeffBezanson: does this make any sense? |
Yeah, we can make Arrays with any data pointer, so you can make strings or numeric arrays backed by binary files. Seems to make the most sense when the entire file corresponds to a single object, especially a random-access object. We could define a kind of i/o stream that wraps an |
Cool. This is very 2.0, of course, but I think it could make it possible to do operations on files both efficiently and easily. |
This could be very cool if it could be done. I presume that this is done in R. Perhaps matlab only focusses on windows. |
I'm fairly certain this is not done in R. I don't know of any language that does this. For some reason, people seem to be allergic to mmap. I don't understand why because IMO it's one of the best syscalls every invented. |
In commit d65e897, we have it. |
Can we close this one? |
Good question. Stefan has some interesting ideas above that go above the current mmap implementation, so I'll let him comment. |
Let's leave it open as an idea bookmark, although I'm not sure this fits too well with our current I/O model. Would need a lot of deep surgery to make it work, I think. |
I think we can close this issue. We now have fairly accessible |
REPL also uses sigatomic, but I don't expect to be interpreting REPL anytime soon.
Run all Base users of `sigatomic` in compiled mode. Fixes #106
…uliaLang#106) Otherwise we may just observe `gc_n_threads = 0` (`jl_gc_collect` sets it to 0 in the very end of its body) and this function becomes a no-op.
This commit makes it possible for `names` to return `using`-ed names as well: ```julia julia> using Base: @assume_effects julia> Symbol("@assume_effects") in names(@__MODULE__; usings=true) true ``` Currently, to find all names available in a module `A`, the following steps are needed: 1. Use `names(A; all=true, imported=true)` to get the names defined by `A` and the names explicitly `import`ed by `A`. 2. Use `jl_module_usings(A)` to get the list of modules `A` has `using`-ed and then use `names()` to get the names `export`ed by those modules. This method is implemented in e.g. REPL completions, but it has a problem: it could not get the names explicitly `using`-ed by `using B: ...` (#36529, #40356, JuliaDebug/Infiltrator.jl/#106, etc.). This commit adds a new keyword argument `usings::Bool=false` to `names(A; ...)`, which, when `usings=true` is specified, returns all names introduced by `using` in `A`. In other words, `usings=true` not only returns explicitly `using`-ed names but also incorporates step 2 above into the implementation of `names`. By using this new option, we can now use `names(A; all=true, imported=true, usings=true)` to know all names available in `A`, without implementing the two-fold steps on application side. As example application, this new feature will be used to simplify and enhance the implementation of REPL completions. - fixes #36529 Co-authored-by: Nathan Daly <NHDaly@gmail.com> Co-authored-by: Sebastian Pfitzner <pfitzseb@gmail.com>
This commit makes it possible for `names` to return `using`-ed names as well: ```julia julia> using Base: @assume_effects julia> Symbol("@assume_effects") in names(@__MODULE__; usings=true) true ``` Currently, to find all names available in a module `A`, the following steps are needed: 1. Use `names(A; all=true, imported=true)` to get the names defined by `A` and the names explicitly `import`ed by `A`. 2. Use `jl_module_usings(A)` to get the list of modules `A` has `using`-ed and then use `names()` to get the names `export`ed by those modules. This method is implemented in e.g. REPL completions, but it has a problem: it could not get the names explicitly `using`-ed by `using B: ...` (#36529, #40356, JuliaDebug/Infiltrator.jl/#106, etc.). This commit adds a new keyword argument `usings::Bool=false` to `names(A; ...)`, which, when `usings=true` is specified, returns all names introduced by `using` in `A`. In other words, `usings=true` not only returns explicitly `using`-ed names but also incorporates step 2 above into the implementation of `names`. By using this new option, we can now use `names(A; all=true, imported=true, usings=true)` to know all names available in `A`, without implementing the two-fold steps on application side. As example application, this new feature will be used to simplify and enhance the implementation of REPL completions. - fixes #36529 Co-authored-by: Nathan Daly <NHDaly@gmail.com> Co-authored-by: Sebastian Pfitzner <pfitzseb@gmail.com>
This commit makes it possible for `names` to return `using`-ed names as well: ```julia julia> using Base: @assume_effects julia> Symbol("@assume_effects") in names(@__MODULE__; usings=true) true ``` Currently, to find all names available in a module `A`, the following steps are needed: 1. Use `names(A; all=true, imported=true)` to get the names defined by `A` and the names explicitly `import`ed by `A`. 2. Use `jl_module_usings(A)` to get the list of modules `A` has `using`-ed and then use `names()` to get the names `export`ed by those modules. This method is implemented in e.g. REPL completions, but it has a problem: it could not get the names explicitly `using`-ed by `using B: ...` (#36529, #40356, JuliaDebug/Infiltrator.jl/#106, etc.). This commit adds a new keyword argument `usings::Bool=false` to `names(A; ...)`, which, when `usings=true` is specified, returns all names introduced by `using` in `A`. In other words, `usings=true` not only returns explicitly `using`-ed names but also incorporates step 2 above into the implementation of `names`. By using this new option, we can now use `names(A; all=true, imported=true, usings=true)` to know all names available in `A`, without implementing the two-fold steps on application side. As example application, this new feature will be used to simplify and enhance the implementation of REPL completions. - fixes #36529 Co-authored-by: Nathan Daly <NHDaly@gmail.com> Co-authored-by: Sebastian Pfitzner <pfitzseb@gmail.com>
This commit makes it possible for `names` to return `using`-ed names as well: ```julia julia> using Base: @assume_effects julia> Symbol("@assume_effects") in names(@__MODULE__; usings=true) true ``` Currently, to find all names available in a module `A`, the following steps are needed: 1. Use `names(A; all=true, imported=true)` to get the names defined by `A` and the names explicitly `import`ed by `A`. 2. Use `jl_module_usings(A)` to get the list of modules `A` has `using`-ed and then use `names()` to get the names `export`ed by those modules. This method is implemented in e.g. REPL completions, but it has a problem: it could not get the names explicitly `using`-ed by `using B: ...` (#36529, #40356, JuliaDebug/Infiltrator.jl/#106, etc.). This commit adds a new keyword argument `usings::Bool=false` to `names(A; ...)`, which, when `usings=true` is specified, returns all names introduced by `using` in `A`. In other words, `usings=true` not only returns explicitly `using`-ed names but also incorporates step 2 above into the implementation of `names`. By using this new option, we can now use `names(A; all=true, imported=true, usings=true)` to know all names available in `A`, without implementing the two-fold steps on application side. As example application, this new feature will be used to simplify and enhance the implementation of REPL completions. - fixes #36529 Co-authored-by: Nathan Daly <NHDaly@gmail.com> Co-authored-by: Sebastian Pfitzner <pfitzseb@gmail.com>
This commit makes it possible for `names` to return `using`-ed names as well: ```julia julia> using Base: @assume_effects julia> Symbol("@assume_effects") in names(@__MODULE__; usings=true) true ``` Currently, to find all names available in a module `A`, the following steps are needed: 1. Use `names(A; all=true, imported=true)` to get the names defined by `A` and the names explicitly `import`ed by `A`. 2. Use `jl_module_usings(A)` to get the list of modules `A` has `using`-ed and then use `names()` to get the names `export`ed by those modules. This method is implemented in e.g. REPL completions, but it has a problem: it could not get the names explicitly `using`-ed by `using B: ...` (#36529, #40356, JuliaDebug/Infiltrator.jl/#106, etc.). This commit adds a new keyword argument `usings::Bool=false` to `names(A; ...)`, which, when `usings=true` is specified, returns all names introduced by `using` in `A`. In other words, `usings=true` not only returns explicitly `using`-ed names but also incorporates step 2 above into the implementation of `names`. By using this new option, we can now use `names(A; all=true, imported=true, usings=true)` to know all names available in `A`, without implementing the two-fold steps on application side. As example application, this new feature will be used to simplify and enhance the implementation of REPL completions. - fixes #36529 Co-authored-by: Nathan Daly <NHDaly@gmail.com> Co-authored-by: Sebastian Pfitzner <pfitzseb@gmail.com>
This commit makes it possible for `names` to return `using`-ed names as well: ```julia julia> using Base: @assume_effects julia> Symbol("@assume_effects") in names(@__MODULE__; usings=true) true ``` Currently, to find all names available in a module `A`, the following steps are needed: 1. Use `names(A; all=true, imported=true)` to get the names defined by `A` and the names explicitly `import`ed by `A`. 2. Use `jl_module_usings(A)` to get the list of modules `A` has `using`-ed and then use `names()` to get the names `export`ed by those modules. This method is implemented in e.g. REPL completions, but it has a problem: it could not get the names explicitly `using`-ed by `using B: ...` (#36529, #40356, JuliaDebug/Infiltrator.jl/#106, etc.). This commit adds a new keyword argument `usings::Bool=false` to `names(A; ...)`, which, when `usings=true` is specified, returns all names introduced by `using` in `A`. In other words, `usings=true` not only returns explicitly `using`-ed names but also incorporates step 2 above into the implementation of `names`. By using this new option, we can now use `names(A; all=true, imported=true, usings=true)` to know all names available in `A`, without implementing the two-fold steps on application side. As example application, this new feature will be used to simplify and enhance the implementation of REPL completions. - fixes #36529 Co-authored-by: Nathan Daly <NHDaly@gmail.com> Co-authored-by: Sebastian Pfitzner <pfitzseb@gmail.com>
Stdlib: Distributed URL: https://github.com/JuliaLang/Distributed.jl Stdlib branch: master Julia branch: master Old commit: 6c7cdb5 New commit: c613685 Julia version: 1.12.0-DEV Distributed version: 1.11.0(Does not match) Bump invoked by: @DilumAluthge Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: JuliaLang/Distributed.jl@6c7cdb5...c613685 ``` $ git log --oneline 6c7cdb5..c613685 c613685 Merge pull request #116 from JuliaLang/ci-caching 20e2ce7 Use julia-actions/cache in CI 9c5d73a Merge pull request #112 from JuliaLang/dependabot/github_actions/codecov/codecov-action-5 ed12496 Merge pull request #107 from JamesWrigley/remotechannel-empty 010828a Update .github/workflows/ci.yml 11451a8 Bump codecov/codecov-action from 4 to 5 8b5983b Merge branch 'master' into remotechannel-empty 729ba6a Fix docstring of `@everywhere` (#110) af89e6c Adding better docs to exeflags kwarg (#108) 8537424 Implement Base.isempty(::RemoteChannel) 6a0383b Add a wait(::[Abstract]WorkerPool) (#106) 1cd2677 Bump codecov/codecov-action from 1 to 4 (#96) cde4078 Bump actions/cache from 1 to 4 (#98) 6c8245a Bump julia-actions/setup-julia from 1 to 2 (#97) 1ffaac8 Bump actions/checkout from 2 to 4 (#99) 8e3f849 Fix RemoteChannel iterator interface (#100) f4aaf1b Fix markdown errors in README.md (#95) 2017da9 Merge pull request #103 from JuliaLang/sf/sigquit_instead 07389dd Use `SIGQUIT` instead of `SIGTERM` ``` Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Stdlib: Distributed URL: https://github.com/JuliaLang/Distributed.jl Stdlib branch: master Julia branch: master Old commit: 6c7cdb5 New commit: c613685 Julia version: 1.12.0-DEV Distributed version: 1.11.0(Does not match) Bump invoked by: @DilumAluthge Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: JuliaLang/Distributed.jl@6c7cdb5...c613685 ``` $ git log --oneline 6c7cdb5..c613685 c613685 Merge pull request #116 from JuliaLang/ci-caching 20e2ce7 Use julia-actions/cache in CI 9c5d73a Merge pull request #112 from JuliaLang/dependabot/github_actions/codecov/codecov-action-5 ed12496 Merge pull request #107 from JamesWrigley/remotechannel-empty 010828a Update .github/workflows/ci.yml 11451a8 Bump codecov/codecov-action from 4 to 5 8b5983b Merge branch 'master' into remotechannel-empty 729ba6a Fix docstring of `@everywhere` (#110) af89e6c Adding better docs to exeflags kwarg (#108) 8537424 Implement Base.isempty(::RemoteChannel) 6a0383b Add a wait(::[Abstract]WorkerPool) (#106) 1cd2677 Bump codecov/codecov-action from 1 to 4 (#96) cde4078 Bump actions/cache from 1 to 4 (#98) 6c8245a Bump julia-actions/setup-julia from 1 to 2 (#97) 1ffaac8 Bump actions/checkout from 2 to 4 (#99) 8e3f849 Fix RemoteChannel iterator interface (#100) f4aaf1b Fix markdown errors in README.md (#95) 2017da9 Merge pull request #103 from JuliaLang/sf/sigquit_instead 07389dd Use `SIGQUIT` instead of `SIGTERM` ``` Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
It occurs to me that we could potentially mmap files that are mmappable when reading them, potentially improving performance quite drastically. We'd have to test it out to make sure it didn't have major issues, but it could be a big win.
One really great example of where this could be an enormous performance improvement would be when using a regex to scan a file. The normal approach is to real in a chunk at a time and then scan that with the regex. However, that has issues if you can't be sure whether the regex has to match within the chunk. For example, when reading line-by-line, matching a regex that can only match within a line, this is fine — as long as the lines don't get prohibitively long. If the scan is through the entire file, e.g. in the case where the regex is being used to split the file into chunks for consumption, then this can get tricky. Using the mmap approach, you can just map the file and let the regex engine go to work — the kernel will handle getting the data when the regex engine is ready for it!
It would be nice to be able to apply the same trick to streamed data, which cannot be mmapped. However, I think a related trick might work: memory protect the page after the last bit of valid, read data from the file and trap accesses to that memory. Then if the regex engine (or whatever is doing the reading) reads past the valid buffer, you can go ahead and read more data on demand. This allows a very similar trick to be done even with streamed input.
The text was updated successfully, but these errors were encountered: