Skip to content

Split krate and version functionality into submodules #1102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Oct 18, 2017

Conversation

jtgeibel
Copy link
Member

@jtgeibel jtgeibel commented Oct 3, 2017

This series of commits splits the api endpoints in the krate and version modules into submodules as an initial step of the refactoring discussed in #912.

Module structure

krate::search::search

Shared by cargo and the frontend, used for searching crates.

krate::publish::publish

Used by cargo publish. This endpoint updates the index, uploads to S3, and caches crate metadata in the database.

{krate,version}/metadata.rs

Endpoints in these files provide read-only access to data that could largely be recreated by replaying all crate uploads. The only exception I've seen to this so far is that some responses include download counts.

{krate,version}/downloads.rs

Provide crate and version level download stats (updated in version::cargo::download).

krate/owners.rs

All endpoints for the used by cargo and the frontend for maintaining the list of crate owners.

krate/follow.rs

Read/write access to a user populated list of followed crates.

version::deprecated

The version::deprecated::{index,show} routes appear to be unused. We should confirm this and discuss a plan for potential removal.

version::yank

Yank and unyank functionality.

Code that remains in mod.rs

The code that remains in the mod.rs files consists primarily of structs for querying, inserting, and serializing. I'm thinking that these structs could be moved to modules under a src/models directory along with: src/category.rs, src/dependency.rs, src/download.rs, src/owner.rs. (The keyword, token and user modules also have model logic which can be extracted.)

Remaining work

In order to simplify review of this PR, I've only moved code around and haven't done any refactoring of logic. There are probably bits of code that we can move from the endpoints to the model logic, especially if it returns a QueryResult. Feel free to let me know if you see any low-hanging fruit there, otherwise we can address that in a future PR. (Some of the logic still in mod.rs returns CargoResult which covers all error types. We will probably need to put some thought into if the model represents just the database or also includes the index and S3 state. I think further exploration of this is best tracked under #912.)

/cc @vignesh-sankaran @kureuil

@kureuil
Copy link
Contributor

kureuil commented Oct 3, 2017

Thanks for getting the ball rolling on this issue @jtgeibel :)

So, I'm still not convinced that having cargo submodules is the best way forward when refactoring the existing codebase. IMO, it just postpones the problem: what will happen when the cargo module will be considered too bloated ? Splitting the cargo module into more submodules doesn't seem right to me and will lead to a more complicated than needed file & folder hierarchy.

I think a better solution would be to write loud and clear in the doc-comments of endpoints what are the constraints of each endpoints. Which means that all the endpoints used by cargo will have this written in their doc-comment, since what version of cargo it is used, since when version it is no more used by cargo (if needed), and what return code cargo expects. Using the doc-comments for this allow us to store much more information right next to the code without having to isolate unrelated functions in a catchall file.

version::depreciated

We probably want to rename this to version::deprecated instead.
Also, just as with the cargo module, I don't think having a module named deprecated hosting all the deprecated endpoints is a good solution, we should instead document the function as deprecated and communicate on it being deprecated. Isolating it doesn't seem useful to me and it makes finding its content harder, as the rest of the hosting module is properly separated & grouped by functionality. Also, what should happen if a deprecated endpoint ends up begin used by cargo ? Should it be moved to the deprecated module or to the cargo module ? This seems to be a pretty big limitation to me, we should use files as a mean to "tag" function, instead we should consider files more like "categories", where each function can only belong to one category (version) but have multiple tags (cargo, crates.io, deprecated, etc.).

The version::depreciated::{index,show} routes appear to be unused. We should confirm this and discuss a plan for potential removal.

I'm against any removal here, IMO we can't deprecate part of an API and remove it "silently" while the rest stays. The whole idea of versioning an API means that when someone use a version, it is implied that this version will not break, just like the Rust language doesn't break versions during a patch version we shouldn't remove endpoints of a published API version. This being a breaking change, it would imply a major version bump and I don't think anyone wants this for now.

I think it is less important to show this grouping in the router definition, but maybe it is better to be consistent.

I think that consistency would be better here.

{krate,version}/cache.rs

Not sure about the name of the file/module here, it doesn't seem to have anything to do with a cache of any sort. Moreover, if we ever introduce the notion of cache in these modules, well the name will be already taken which doesn't seem right to me.

{krate,version}/downloads.rs

Why didn't you put these endpoints with the other metadata routes in the cache module ? I don't see how they need to be in their own file...

The code that remains in the mod.rs files consists primarily of structs for querying, inserting, and serializing. I'm thinking that these structs could be moved to modules under a src/models directory along with: src/category.rs, src/dependency.rs, src/download.rs, src/owner.rs. (The keyword, token and user modules also have model logic which can be extracted.)

I'm not a big fan of having a central models module, I'd much rather have multiple self-contained modules such as krate, versions, keywords, categories, etc. with clearly defined boundaries. (Continue in #912 ?)

On another note, maybe making this PR only about the krate module and having another about the version module would make it easier to review ? Nothing sure here, just wondering :)

@jtgeibel
Copy link
Member Author

jtgeibel commented Oct 4, 2017

Thanks for the feedback @kureuil

what will happen when the cargo module will be considered too bloated ? Splitting the cargo module into more submodules doesn't seem right to me and will lead to a more complicated than needed file & folder hierarchy.

Splitting further into cargo/*.rs files, if needed, is exactly what I had in mind. I don't think this would necessarily make things more difficult to find, especially since the module path in the api router would direct developers where to look. I agree that we want to beef up doc comments, but when reviewing a PR the doc comments aren't necessarily shown in the diff whereas the full file path is always shown. However, I'm certainly open to changing my mind on this and will explore an alternative grouping below.

We probably want to rename this to version::deprecated instead.

Yes, that is much better.

Also, what should happen if a deprecated endpoint ends up begin used by cargo ? Should it be moved to the deprecated module or to the cargo module ?

If an existing cargo endpoint was deprecated, then I would propose it move from cargo/*.rs to cargo/deprecated.rs. Although this is something I hope we never have the need for.

I'm against any removal here, IMO we can't deprecate part of an API and remove it "silently" while the rest stays.

I agree, I'm not advocating for removal at this time. I do think there is value to pulling out these specific endpoints to discourage people from accidentally adding functionality here when it should be added to another route instead. I'll open an issue for broader discussion of our stability guarantees and these endpoints should help provide concrete examples for that discussion.

{krate,version}/cache.rs

Not sure about the name of the file/module here, it doesn't seem to have anything to do with a cache of any sort.

Yeah, I went back and forth a lot trying to name this. I liked your original suggestion of read.rs to reflect the read-only nature of these routes. Then I was also thinking a bit along the lines of "crates.io as a service". For a lot of the data we expose we are just keeping a cache of metadata that could be obtained by downloading and extracting the crate. In that sense, I considered: meta_cache, index_cache, registry_cache, registry, toml_cache, etc.

On one end I see the index as just the git repo. It includes dependency and feature metadata. In the middle is metadata extracted from the TOML file such as authors, license, and homepage. This is buried in the *.crate file and crates.io provides a service by making this easily accessible. (Although, we do only keep 1 copy at the krate level and there have been issues where publishing backported fixes overwrites with stale metadata.) Finally, on the other end we have the registry which covers all our state, including mutable data such as ownership which cannot be obtained anywhere else.

It may help if we had a clear term for this middle ground. For now I'll probably rename to read.rs.

Okay, so finally, here are my proposed changes to this PR:

  • Rename the cache.rs files to read.rs unless we think of something better
  • Rename depreciated.rs to deprecated.rs
  • Split krate/cargo.rs
    • Move index to a new search.rs file and rename to search
    • Move new (and supporting functions) to a new publish.rs file and rename to publish
    • Consolidate all ownership functionality under the existing owners.rs
  • Split version/cargo.rs
    • Move download into downloads.rs - Since this endpoint mutates the database I don't think it belongs in read.rs and at that point we might as well group it with the route that does read from the database.
    • Move yank and unyank into a new file yank.rs - These modify the database and the index.

Do you think these changes are sufficient?

I'll follow up in #912 regarding some observations and suggestions on the model related code.

@kureuil
Copy link
Contributor

kureuil commented Oct 9, 2017

I agree that we want to beef up doc comments, but when reviewing a PR the doc comments aren't necessarily shown in the diff whereas the full file path is always shown.

True, though on GitHub's web diff viewer you can easily expand collapsed file region and access the modified's function documentation comments. Moreover, I think checking that we are not breaking backwards-compatibility should be done in integration tests so that it is not possible to merge a PR breaking compatibility with Cargo.

Yeah, I went back and forth a lot trying to name this. I liked your original suggestion of read.rs to reflect the read-only nature of these routes. Then I was also thinking a bit along the lines of "crates.io as a service". For a lot of the data we expose we are just keeping a cache of metadata that could be obtained by downloading and extracting the crate. In that sense, I considered: meta_cache, index_cache, registry_cache, registry, toml_cache, etc.

How about metadata.rs ? Seems both more explicit & more precise about the content of the file.

Okay, so finally, here are my proposed changes to this PR:

I like a lot the changes you proposed :)

Move new (and supporting functions) to a new publish.rs file and rename to publish

👍

Move download into downloads.rs - Since this endpoint mutates the database I don't think it belongs in read.rs and at that point we might as well group it with the route that does read from the database.

Maybe it would make sense to put it in the metadata.rs module if we decide to choose this name ?

@carols10cents
Copy link
Member

I'm generally in favor of the direction this is going, let's try it! It looks like there were some changes you wanted to make @jtgeibel, so I'm putting this back in your court :)

@jtgeibel
Copy link
Member Author

Yeah, this is back in my court. I'm working on it today but may not finish tonight. I'll probably redo this PR to get a clean history and avoid any merge conflicts.

@jtgeibel jtgeibel force-pushed the split-krate-and-version branch from e7c133b to 18cd661 Compare October 12, 2017 00:19
@jtgeibel
Copy link
Member Author

Okay, I've redone this PR and pushed a new set of commits. The download related functionality is split between krate and version but I've added doc comments to cross-reference these two files. Other than that everything is split as discussed above.

@neunenak
Copy link

Is there anything blocking this commit from being approved/denied? The changes necessary to implement #491 (which I'm working on) will necessarily merge-conflict with this, so I'd like to know that this commit is officially approved or not before formally submitting them.

@carols10cents
Copy link
Member

Is there anything blocking this commit from being approved/denied?

I just need more hours in the day 😅 Looking into this now!

@carols10cents
Copy link
Member

Looks good! Much more organized, hopefully this will help people find the functionality they're looking to work on!!

bors: r+

bors-voyager bot added a commit that referenced this pull request Oct 18, 2017
1102: Split krate and version functionality into submodules r=carols10cents

This series of commits splits the api endpoints in the `krate` and `version` modules into submodules as an initial step of the refactoring discussed in #912.

## Module structure

### `krate::search::search`

Shared by cargo and the frontend, used for searching crates.

### `krate::publish::publish`

Used by `cargo publish`.  This endpoint updates the index, uploads to S3, and caches crate metadata in the database.

### `{krate,version}/metadata.rs`

Endpoints in these files provide read-only access to data that could largely be recreated by replaying all crate uploads.  The only exception I've seen to this so far is that some responses include download counts.

### `{krate,version}/downloads.rs`

Provide crate and version level download stats (updated in `version::cargo::download`).

### `krate/owners.rs`

All endpoints for the used by cargo and the frontend for maintaining the list of crate owners.

### `krate/follow.rs`

Read/write access to a user populated list of followed crates.

### `version::deprecated`

The `version::deprecated::{index,show}` routes appear to be unused.  We should confirm this and discuss a plan for potential removal.

### `version::yank`

Yank and unyank functionality.

## Code that remains in `mod.rs`

The code that remains in the `mod.rs` files consists primarily of structs for querying, inserting, and serializing.  I'm thinking that these structs could be moved to modules under a `src/models` directory along with: `src/category.rs`, `src/dependency.rs`, `src/download.rs`, `src/owner.rs`.  (The `keyword`, `token` and `user` modules also have model logic which can be extracted.)

## Remaining work

In order to simplify review of this PR, I've only moved code around and haven't done any refactoring of logic.  There are probably bits of code that we can move from the endpoints to the model logic, especially if it returns a QueryResult.  Feel free to let me know if you see any low-hanging fruit there, otherwise we can address that in a future PR.  (Some of the logic still in `mod.rs` returns CargoResult which covers all error types.  We will probably need to put some thought into if the model represents just the database or also includes the index and S3 state.  I think further exploration of this is best tracked under #912.)

/cc @vignesh-sankaran @kureuil
@bors-voyager
Copy link
Contributor

bors-voyager bot commented Oct 18, 2017

Build succeeded

@bors-voyager bors-voyager bot merged commit 18cd661 into rust-lang:master Oct 18, 2017
@jtgeibel jtgeibel deleted the split-krate-and-version branch October 18, 2017 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants