Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor out more crates? #1194

Closed
NobodyXu opened this issue Jul 7, 2023 · 16 comments
Closed

Refactor out more crates? #1194

NobodyXu opened this issue Jul 7, 2023 · 16 comments

Comments

@NobodyXu
Copy link
Member

NobodyXu commented Jul 7, 2023

Right now a few modules are complicated, can be used relatively independent from other and have their own *Error types:

  • helpers::git::Repository: Completely independent of others
  • load_manifest_from_workspace: This module is quite complicated and is useful on its own since cargo_toml does not provide such functionality
  • drivers: Provide default crates.io registry, sparse and git registry querying, could be useful as an independent crate

These modules listed above can be useful on their own, so it looks like splitting them out as separate crates makes sense?

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 9, 2023

Byron recommends to use tame-index since:

  • it provides caching that is compatible with cargo
  • it provides fine-grand features for selecting backend (curl, reqwest, rustls, etc).

So we might just replace part of drivers with tame-index.

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 9, 2023

tame_index::index::RemoteGitIndex provides a remote git index that is better than our current implementation:

  • It support caching that is compatible with cargo
  • It supports updating cached repository
  • It requires feature tame_index/git, which only pulls in gix

I decided that pulling in tame_index::index::AsyncRemoteSparseIndex is not worth its cost right now:

  • It requires feature tame_index/sparse, which will pull in rayon and crossbeam-channel.
    While gix indeed pulls them in right now, they might not be pulled in future
  • http is generally fast enough that no caching is required.

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 9, 2023

I've tried this, but it seems that doing this ourselves might also be feasible:

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 9, 2023

I used cargo-llvm-lines to profile on what contributes to the slow codegen of binstalk, turns out that even fs contributes a lot to it, cargo_toml_workspace, git and drivers indeed worth splitting out.

I think I might:

    1989 (0.2%, 39.5%)      1 (0.0%, 15.7%)  binstalk::helpers::cargo_toml_workspace::load_manifest_from_workspace_inner
     516 (0.1%, 70.0%)      1 (0.0%, 39.6%)  binstalk::helpers::cargo_toml_workspace::Pattern::glob_dirs
     183 (0.0%, 85.3%)      1 (0.0%, 55.1%)  <binstalk::helpers::cargo_toml_workspace::LoadManifestFromWSErrorInner as core::fmt::Display>::fmt
     148 (0.0%, 87.5%)      1 (0.0%, 58.5%)  binstalk::helpers::cargo_toml_workspace::Pattern::new
      96 (0.0%, 90.8%)      1 (0.0%, 63.0%)  <binstalk::helpers::cargo_toml_workspace::LoadManifestFromWSError as core::fmt::Display>::fmt
      82 (0.0%, 91.6%)      3 (0.0%, 63.8%)  binstalk::helpers::cargo_toml_workspace::load_manifest_from_workspace_inner::{{closure}}
      71 (0.0%, 92.6%)      1 (0.0%, 65.1%)  binstalk::helpers::cargo_toml_workspace::Pattern::matches_with_trailing

     933 (0.1%, 56.9%)      1 (0.0%, 27.8%)  <binstalk::helpers::git::progress_tracing::TracingProgress as prodash::traits::Progress>::set
     892 (0.1%, 58.2%)      1 (0.0%, 28.5%)  <binstalk::helpers::git::progress_tracing::TracingProgress as prodash::traits::Progress>::message
     577 (0.1%, 67.9%)      1 (0.0%, 38.1%)  binstalk::helpers::git::Repository::shallow_clone
     497 (0.1%, 70.8%)      1 (0.0%, 40.2%)  binstalk::helpers::git::Repository::shallow_clone_bare
     331 (0.0%, 77.5%)      1 (0.0%, 46.3%)  binstalk::helpers::git::Repository::get_head_commit_entry_data_by_path::inner
     226 (0.0%, 82.8%)      2 (0.0%, 51.9%)  <binstalk::helpers::git::progress_tracing::TracingProgress as prodash::traits::Progress>::add_child_with_id
     203 (0.0%, 84.1%)      1 (0.0%, 53.3%)  <binstalk::helpers::git::GitError as core::fmt::Display>::fmt
     144 (0.0%, 87.7%)      4 (0.0%, 58.7%)  <binstalk::helpers::git::progress_tracing::TracingProgress as prodash::traits::Progress>::set_name::{{closure}}
     134 (0.0%, 88.2%)      2 (0.0%, 59.2%)  <binstalk::helpers::git::progress_tracing::TracingProgress as prodash::traits::Progress>::set_name
     115 (0.0%, 89.5%)      1 (0.0%, 60.7%)  binstalk::helpers::git::Repository::prepare_fetch
      89 (0.0%, 91.2%)      1 (0.0%, 63.4%)  binstalk::helpers::git::progress_tracing::TracingProgress::new::{{closure}}
      77 (0.0%, 92.0%)      1 (0.0%, 64.3%)  binstalk::helpers::git::progress_tracing::TracingProgress::new

    1484 (0.2%, 47.0%)      1 (0.0%, 20.1%)  binstalk::fs::copy_to_tempfile
    1023 (0.1%, 55.4%)      1 (0.0%, 26.9%)  binstalk::fs::atomic_install
     727 (0.1%, 63.1%)      1 (0.0%, 32.4%)  binstalk::fs::atomic_symlink_file
     704 (0.1%, 63.5%)      1 (0.0%, 32.6%)  binstalk::fs::atomic_install_noclobber
     332 (0.0%, 77.5%)      1 (0.0%, 46.3%)  binstalk::fs::persist

     492 (0.1%, 71.2%)      1 (0.0%, 41.7%)  binstalk::drivers::registry::common::crate_prefix_components
     442 (0.1%, 73.3%)      1 (0.0%, 42.5%)  binstalk::drivers::registry::git_registry::GitIndex::new
     411 (0.0%, 74.0%)      1 (0.0%, 43.0%)  binstalk::drivers::registry::visitor::ManifestVisitor::load_manifest
     328 (0.0%, 77.7%)      1 (0.0%, 46.3%)  binstalk::drivers::registry::common::MatchedVersion::find
     315 (0.0%, 78.4%)      1 (0.0%, 46.8%)  binstalk::drivers::registry::git_registry::GitRegistry::find_crate_matched_ver
     296 (0.0%, 79.5%)      1 (0.0%, 48.1%)  binstalk::drivers::registry::common::render_dl_template
     231 (0.0%, 82.5%)      1 (0.0%, 51.8%)  <binstalk::drivers::registry::common::render_dl_template::Context as leon::values::Values>::get_value
     219 (0.0%, 83.3%)      1 (0.0%, 52.2%)  binstalk::drivers::registry::Registry::from_str_inner
     126 (0.0%, 88.7%)      1 (0.0%, 60.0%)  <binstalk::drivers::registry::RegistryError as core::fmt::Display>::fmt
     117 (0.0%, 89.3%)      1 (0.0%, 60.6%)  <binstalk::drivers::registry::InvalidRegistryErrorInner as core::fmt::Display>::fmt
      79 (0.0%, 91.8%)      1 (0.0%, 64.0%)  binstalk::drivers::registry::vfs::Vfs::add_path
      66 (0.0%, 93.1%)      1 (0.0%, 65.7%)  <binstalk::drivers::registry::RegistryError as miette::protocol::Diagnostic>::help
      85 (0.0%, 91.5%)      1 (0.0%, 63.6%)  <binstalk::drivers::registry::vfs::Vfs as cargo_toml::afs::AbstractFilesystem>::file_names_in

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 11, 2023

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 11, 2023

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 12, 2023

P.S. After the PRs for new crates are merged, remember to:

  • publish cargo-toml-workspace v0.0.0 version locally and invite others in the org
  • publish atomic-file-install v0.0.0 version locally and invite others in the org
  • publish binstalk-registry v0.0.0 version locally and invite others in the org, this requires binstalk-downloader to be released first.
  • update the secret CARGO_REGISTRY_TOKEN: Create a new token that allows publish to these new crates
  • publish cargo-toml-workspace and atomic-file-install as v1.0.0 since both of them is unlikely to have API breakage

@NobodyXu
Copy link
Member Author

@passcod I think we can also extract binstalk::fetchers as a separate crate binstalk-fetchers since it is mostly self contained and would only need to define a new error type containing binstalk_downloader::{remote::Error, download::Error} and leon::{ParseError, RenderError}.

IMO it will actually make the error message a bit clearer on where the error comes from and it would also enable more easier reuse, especially for self-update binaries, asides from compilation time improvement.

I also think putting binstalk::BinFile and resolve::collect_bin_files into its own crate might be necessary if binstalk-fetchers is extracted since it could be useful for self-update binaries to utilize BinFile to find and install binaries, they are mostly self-contained so it's not hard to do this.

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 17, 2023

@passcod I'm thinking if the git code should be in its own crate simple-git.

On one hand, binstalk-downloader contains stuff about http(s) before the git code is moved into it and now it becomes http and git.

While git indeed uses http stuff, which is why I decided to put it into binstalk-downloader, it is more than just downloading since it is stateful (can be cached locally and updated) where as http is stateless.

On the other hand, binstalk-downloader's codegen time now increases dramatically and it also creates extra dependencies for binstalk-fetchers, delaying its execution.

The git code also don't use anything from binstalk-downloader at all, makes it perfect to be an independent crate.

@passcod
Copy link
Member

passcod commented Aug 18, 2023

There's not a lot to that git code, I'm thinking of:

  • it seems it's used almost exclusively by binstalk-registry, we could move it there?
  • are the shallow_clone (and bare variant) method and cancellation token somethings that could be interesting to upstream to gix?

@NobodyXu
Copy link
Member Author

There's not a lot to that git code, I'm thinking of:

Yeah but the gix really pulls in a tons of crates

  • it seems it's used almost exclusively by binstalk-registry, we could move it there?

It's also used by binstalk for supporting "--git".

  • are the shallow_clone (and bare variant) method and cancellation token somethings that could be interesting to upstream to gix?

IMHO shallow_clone and its bare variant might be interesting to
@Byron as a convenient function in gix, but not sure whether they want CancellationTomen

@Byron
Copy link
Contributor

Byron commented Aug 18, 2023

Yeah but the gix really pulls in a tons of crates

It's my believe that there is no dependency to non-gix- crates that isn't needed, but I also think that one day there should be a review to make sure this is still the case.
As for the gix crates themselves, it's planned to make non-core functionality selectable with cargo features, but I see that happening only once the integration in cargo heads towards stabilization.

With that said, if you have found a low-hanging fruit that would help you, please let me know or submit a PR against gitoxide.

@Byron as a convenient function in gix, but not sure whether they want CancellationTomen

If the function would be upstreamed (without having looked at it), I think it should be usable here right away, which doesn't seem to be the case without such a cancellation token. Generally I wouldn't worry about it, and think that such customizations are exactly what gix and its plumbing crates should support.

@NobodyXu
Copy link
Member Author

As for the gix crates themselves, it's planned to make non-core functionality selectable with cargo features, but I see that happening only once the integration in cargo heads towards stabilization.

That's good to hear!
I would love for gix to pull in less crates.

Also, IMHO putting gitoxide in its own organisation name is better.

With that said, if you have found a low-hanging fruit that would help you, please let me know or submit a PR against gitoxide.

I could have a look at gix and tries to add a few features there, if you'd like.

Generally I wouldn't worry about it, and think that such customizations are exactly what gix and its plumbing crates should support.

Great, I would open a PR to gix instead of creating yet another crate.

@Byron
Copy link
Contributor

Byron commented Aug 18, 2023

I could have a look at gix and tries to add a few features there, if you'd like.

Thanks you, but I have to decline at this time. Each feature toggle adds complexity and I previously removed some as dealing with them slowed me down noticeably. What I meant is that if you know there is this one feature toggle that could significantly reduce compile times right now, then this is something that should be put in as the added complexity then seems worth it. That magical one feature toggle probably doesn't exist though. My feeling is that it's possible to divide gix into components and then express their dependencies with feature toggle dependencies. Doing so will definitely take some work and it's not yet time for that.

Also, IMHO putting gitoxide in its own organisation name is better.

I agree, but for now there isn't any benefit in doing that. This topic seems unrelated as well.

Great, I would open a PR to gix instead of creating yet another crate.

Actually, I tried to suggest the opposite: Do create the abstractions you need as this is exactly what gix is supposed to support. Please do CC me so I can take a look at them and once I have seen such abstractions often enough they can be brought upstream - we wouldn't want people to keep writing the same abstractions.
At this point it's hard to know for me if it's a convenience just for this crate or if it these abstractions represent a general need.
With that said, if you are convinced this should be upstreamed, let's see that PR :).

@NobodyXu
Copy link
Member Author

NobodyXu commented Aug 18, 2023

Thanks you, but I have to decline at this time. Each feature toggle adds complexity and I previously removed some as dealing with them slowed me down noticeably.

Got it.

What I meant is that if you know there is this one feature toggle that could significantly reduce compile times right now, then this is something that should be put in as the added complexity then seems worth it. That magical one feature toggle probably doesn't exist though.

Since gitoxide contains a lot of small crate, there's definitely no magic option for that.

Actually, I tried to suggest the opposite: Do create the abstractions you need as this is exactly what gix is supposed to support. Please do CC me so I can take a look at them and once I have seen such abstractions often enough they can be brought upstream - we wouldn't want people to keep writing the same abstractions.

Here's the link to the code.

Edit:

It contains:

@NobodyXu
Copy link
Member Author

Closed since all tasks are done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants