Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative registry references are inconsistent between the name and index URL #4880

Closed
sfackler opened this issue Dec 31, 2017 · 18 comments · Fixed by #4957
Closed

Alternative registry references are inconsistent between the name and index URL #4880

sfackler opened this issue Dec 31, 2017 · 18 comments · Fixed by #4957

Comments

@sfackler
Copy link
Member

Dependency information in Cargo.toml refers to alternative registries by their name, while the dependency information in the index JSON refers to alternative registries by URL. We should consistently use one or the other.

@withoutboats

@sfackler
Copy link
Member Author

sfackler commented Jan 17, 2018

The two options are:

All URLs

  • The index URL is the canonical identifier of a registry. The registry names configured in a .cargo/config are just conveniences to avoid having to stick long URLs everywhere. They don't have any semantic meaning, and don't need to match between different users.
  • When rewriting the Cargo.toml to the "canonical" version when publishing, change registry names to index URLs.
  • Add support to identify registries by URL directly rather than just by name in Cargo.tomls.
  • The alternative approach would be to add a .cargo/config into the package tarball with the users's registry configuration.

All names

  • The name is the canonical identifier of a registry. All users of the registry need to use that name for things to work.
  • The allowed-registries field in the index config.json turns into a list of names.
  • The registry field in dependency information in the index now stores a name rather than an index URL.
  • Add an accessor to SourceId::name and store that rather than the SourceId in publish logic.

The "all names" approach seems a bit cleaner, and I think matches better with what @withoutboats's idea of registry identification is.

@alexcrichton
Copy link
Member

@sfackler can you remind me the motivation for rewriting manifests as publish? Is that required today for correctness? Or is that just required when you run cargo vendor and then build again?

@sfackler
Copy link
Member Author

Right yeah. In current Cargo, we rewrite manifests to support older Cargos, I think specifically with older TOML parsers. For example, the object shorthand foo = { bar = true} is rewritten to [foo] bar = true`. But as far as I can remember, there aren't any semantic changes to the contents currently. This would change in the all URLs approach since we'd be modifying the data itself.

@sfackler
Copy link
Member Author

Oh actually that's wrong. It already strips path keys from dependency objects. It was implemented in #4030.

@alexcrichton
Copy link
Member

Oh so we don't actually do it to support older cargos, the original motivation was to support the use case of cargo vendor and then you add a [patch] pointing directly at the vendored source. That required some postprocessing like removing path dependencies (the literal key path) and removing [workspace] annotations. It's just a current side effect of toml serialization that formatting isn't preserved.

I suppose I should answer my own question though. So let's say we don't do anything other than what we do today. That way when you run cargo vendor you'll create a tarball which may have a bunch of other-registry crates. The names there, however, aren't serialized into the tarball, so it could change at build time.

Since vendoring requires a .cargo/config anyway though I wonder if cargo vendor could calculate the section necessary to configure all custom registries and display that?

...

hm...

So is the desire to be more consistent here one that fixes a bug? Or is it just 'we should be consistent'?

@sfackler
Copy link
Member Author

I think this is an instance of a more core issue that we should decide: what does a registry name represent?

In one way of thinking about it, a registry name is just a local alias that you as a developer use to not have to stick a big long git URL in every dependency you're pulling from a custom registry. The name itself doesn't have any meaning, and I can choose a different name than you do. In this world, we need to make sure the packages you publish either rewrite the name you chose to the index URL it represents, or to include the name to index mapping information in the package.

In the other way of thinking about it, a registry has a single canonical name. We never refer to the URL at when publishing, either in the metadata or package tarball itself. @withoutboats prefers this world since it allows you to move a registry from one URL to another without needing to rewrite every index that references its packages, but instead just require users to update their local mappings.

We currently have the worst of both worlds, where there's a canonical URL and a canonical name that everyone needs to agree on.

@withoutboats
Copy link
Contributor

As @sfackler has suggested, my preference is to use all names and require end users to map those names to index URLs. These are the reasons:

  • If the registry goes down, every end user who relies on it can migrate to a mirror, instead of requiring all the registries that have references to it to take action.
  • The same for if the registry intentionally moves addresses.
  • If users want to mirror a registry, they can use other registries that depend on it and those will still download from their mirror.

@alexcrichton
Copy link
Member

Hm ok, but so hang on I'd just like to confirm one thing first. We don't actually have a technical problem today, right? If I have a registry at registry.alexcrichton.com, you can call it foo, publish a crate to it, and then I can call it bar, and I can use your crate, right?


Assuming that works (which I think it does), here's my thoughts! While it's true that two developers can give a different name to the same registry, I sort of doubt this will ever happen in practice. That, for example, means that I can't work on any of your packages without extra configuration (or vice versa). There's sort of a carrot to name everything the same because that way you don't have to maintain a list of all registry names for one URL. So I think yes, effectively in practice, if we have names then we have a unique (ish) name per registry.

Now one thing I don't like about custom names though is that it's possible to have two disjoint communities that both name their registry foo. This in turn can cause confusion and is a symptom of the fact that there's no one arbitrarting handing out names of registries. With URLs, however, we can rely on DNS where it's someone else's problem to arbitrate that. In that sense we sort of get unique names for free (sorta). This of course is though is at odds with my previous conclusion of "effectively in practice we have unique names".

I'd still sort of personally lean on the status quo, with names in manifests that you write down in ~/.cargo/config when you "set up", but the source of truth is still a URL. This assumes of course that there's no technical issues with the current setup.

For things like mirrors and whatnot I think we'll still be covered. If we use a global name we could redirect the url for that name, but if we have a global url I think it'd work the same way internally in Cargo. For example the crates.io registry url isn't changing any time soon but we still want mirrors for that!

So overall I think I'm on the side of:

  • We need a canonical and unique identifier for all registries, and I'm on the side of urls for now
  • I believe that the mirroring question is orthogonal to the choice of name/url (if we choose name you can change the url, and if we choose url we can tell cargo "when you see that url use this instead")

@sfackler
Copy link
Member Author

sfackler commented Jan 17, 2018

If I have a registry at registry.alexcrichton.com, you can call it foo, publish a crate to it, and then I can call it bar, and I can use your crate, right?

Not if my crate depends on any other crates in that registry, because the Cargo.toml in the package I published would have registry = "foo" in it. That's why we need to rewrite the names to URLs when publishing in the "all URLs" option.

@alexcrichton
Copy link
Member

@sfackler did this break in practice for you? I'm under the impression that when Cargo loads dependency information from registries it completely ignores the [dependencies] section of Cargo.toml and exclusively uses the index for that information. (and in this case the index would be "correct" while your local system may not).

@sfackler
Copy link
Member Author

@sfackler did this break in practice for you?

Yes

$ git clone git@github.com:sfackler/busted-crate
Cloning into 'busted-crate'...
remote: Counting objects: 8, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 8 (delta 0), reused 8 (delta 0), pack-reused 0
Receiving objects: 100% (8/8), done.
$ cd busted-crate
$ cargo +nightly build
    Updating registry `https://github.com/sfackler/busted-registry`
error: unable to get packages from source

Caused by:
  failed to parse manifest at `/Users/sfackler/.cargo/registry/src/github.com-a114d8433bf4fc42/my-crate2-0.1.0/Cargo.toml`

Caused by:
  No index found for registry: `foo`

@alexcrichton
Copy link
Member

Hm ok, good to know and thanks for the example! If it's ok I'd like to dig into what's going on there tomorrow, that's not behaving like I might expect.

@sfackler
Copy link
Member Author

Sure.

@carols10cents
Copy link
Member

carols10cents commented Jan 18, 2018

Looks like it's this part of the RFC that's not working quite right:

If a dependency's registry is not specified, Cargo will assume the dependency can be located in the current registry.

@sfackler's registry currently has:

{
  "name": "my-crate2",
  "vers": "0.1.0",
  "deps": [
    {
      "name": "my-crate",
      "req": "^0.1",
      "features": [
        
      ],
      "optional": false,
      "default_features": true,
      "target": null,
      "kind": "normal"
    }
  ],
  "cksum": "4c2ca7b247a9865c0d2239e963f196b872e234a143619581c07c157ab7afcea7",
  "features": {
    
  },
  "yanked": false
}

Because my-crate doesn't have a registry specified, cargo should assume it can find my-crate in the same registry that my-crate2 is.

It might actually be doing that, but this error:

Caused by:
  failed to parse manifest at `/Users/sfackler/.cargo/registry/src/github.com-a114d8433bf4fc42/my-crate2-0.1.0/Cargo.toml`

Caused by:
  No index found for registry: `foo`

is happening way before then, and requiring that every registry name in every dependency's Cargo.toml have a corresponding index in a .cargo/config somewhere, even if it's not needed? That's my current understanding of the problem...

@sfackler
Copy link
Member Author

Cargo's dependency resolution logic does handle this situation correctly - cargo generate-lockfile succeeds. cargo build doesn't, and interestingly doesn't even produce a lockfile.

I don't think it's possible for Cargo to totally ignore the packaged Cargo.toml - the [lib] section is needed at the very least. If the intention is to parse the Cargo.toml but ignore the dependency section in favor of the information in the index/lockfile, then we may need to adjust Cargo.toml parsing to lazily resolve registry names to SourceIds.

@alexcrichton
Copy link
Member

Ok I see what's going on here, and I believe this is a bug! What's happening is that Cargo during cargo build is actually fetching crates from the registry and parsing the manifest (as @sfackler mentioned, we use the manifest for target information like [lib] and such). This doesn't happen in generate-lockfile AFAIK which is why that succeeds.

Parsing the manifest is failing here because Cargo's trying to construct a Dependency for all dependencies listed. One of the dependencies in my-crate2/Cargo.toml looks like:

[dependencies]
my-crate = { version = "0.1", registry = "foo" }

When parsing this we hit this case which notably calls SourceId::alt_registry which will then attempt to look up an index for foo, but that fails because one isn't configured.

Now this failure isn't actually useful here because the Dependency that we're loading is basically entirely ignored for later parts of cargo's build. The problem is that when parsing the manifest Cargo doesn't currently understand that it's contextually loading a crate from a particular registry and has the explicit dependency information available to it.

So for this specifically (without changing much else) there's two possible fixes:

  • We could add logic to the toml parsing to understand that it's contextually parsing a crate from a registry, and instead of looking up the index for foo we know that the dependency on my-crate has an index url already (from the custom registry index) so we just construct the SourceId directly from that.
  • We could also elaborate this part of the Cargo.toml, rewriting the dependency from registry = 'foo' to something like registry-index = 'https://...'.

That's to say at least that I don't think this is a slam dunk in any direction of how to fix this and what we should canonicalize on.

@sfackler
Copy link
Member Author

Elaborating to registry-index seems a bit cleaner, and also helps maintain the aim of #4030 to keep published crates buildable directly. I'll make a PR and we can discuss it there.

@alexcrichton
Copy link
Member

@sfackler I agree!

sfackler added a commit to sfackler/cargo that referenced this issue Jan 24, 2018
This avoids introducing a dependency on the publisher's name for the
registry.

Closes rust-lang#4880
bors added a commit that referenced this issue Jan 25, 2018
Elaborate registry names to index URLs when publishing

This avoids introducing a dependency on the publisher's name for the
registry.

Closes #4880
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants