-
Notifications
You must be signed in to change notification settings - Fork 643
Restructuring backend as MVC #912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I am into this! I would prefer to do this incrementally, if possible, so that there isn't one big PR that would need to be reviewed and kept up to date. Even though the intermediate state of being half reorganized might be more confusing, it would be MUCH easier for me to have smaller independent-ish PRs to review and merge :) |
I think a good place to start would be to move the API endpoints under // used by both `cargo search` and the frontend
api_router.get("/crates", C(krate::index));
// Cargo only routes - comments below reference src/crates-io/lib.rs in cargo's repository
api_router.put("/crates/new", C(krate::cargo::new)); // publish()
api_router.get("/crates/:crate_id/:version/download", C(krate::cargo::download)); // src/cargo/sources/registry/remote.rs:download()
api_router.get("/crates/:crate_id/owners", C(krate::cargo::owners)); // list_owners()
api_router.put("/crates/:crate_id/owners", C(krate::cargo::add_owners)); // add_owners()
api_router.delete("/crates/:crate_id/owners", C(krate::cargo::remove_owners)); // remove_owners()
api_router.delete("/crates/:crate_id/:version/yank", C(version::cargo::yank)); // yank()
api_router.put("/crates/:crate_id/:version/unyank", C(version::cargo::unyank)); // unyank()
// Frontend routes
api_router.get("/crates/:crate_id", C(krate::frontend::show));
api_router.get("/crates/:crate_id/:version", C(version::frontend::show));
api_router.get("/crates/:crate_id/:version/dependencies", C(version::frontend::dependencies));
api_router.get("/crates/:crate_id/:version/downloads", C(version::frontend::downloads)); The search API is the only route I've found that is currently used by both. I've attached my list of all backend routes and where they are used. I hope to get this cleaned up and submitted as documentation soon. |
I have several commits where I've started working on this and I'm looking for some feedback on the general approach. For now I've started with @sgrif's completed Diesel port since that removes a lot of code and there is less to move around. Once that lands I can rebase and clean this up for a PR. I moved the functionality in commits one route at a time, except for routes that share a lot of code like yank and unyank. My main question is, does this module structure look good? I moved cargo functionality and associated private functions under I see no reason to split any of the other API modules ( |
This looks like a good start, the separation of I'd like to set up a tracking issue, maybe we could set up a meeting over IRC or Skype @jtgeibel? |
Ok so I've had a look through the codebase, here's some suggestions of how we could move forward with the refactor of crates.io. @jtgeibel recently create separate modules for An idea I've had is that we could put all the structs in the For As for Regarding all middleware files, we could have them in a I'm thinking we could have every file in a module of some sort, so we don't have any stray files, and feel it could be clearer to tell what a file / module does without having to look inside of it. For some files e.g. I'd also like to refactor the tests so that all tests related to specific functionality get put into the files and modules they are testing for, rather than having them in the |
Hey everyone, just giving my 2 cents this issue :)
So first of all, I don't really see the gain of tagging endpoints with either a I think a far more sensible approach to refactoring the backend of crates.io would be to first split modules (that are getting out of hand) by functionality. Taking the
IMO, this separation makes it far more easier to find what you might search in crates.io codebase, but it also makes it easier to split up the module because you don't have to think about Does cargo uses this endpoint ? or anything like that. Then, the second step to refactoring the backend of crates.io would be to extract all conduit endpoints into a dedicated
Big 👍 from me on this one, though the example might not be the best. I think the
Well, technically even "stray files" are modules for rust, I don't see the benefit of creating a folder whose sole purpose is to contain a single file named
I'm not really sure moving these utils-kind of modules deeper into the project's hierarchy is the best course of action. For example, the
This would be good but it would imply being able to test the code without spinning up a server for each test which means decoupling the code from the HTTP server. Anyway, 👍 on this the |
Thanks for your input @kureuil, my responses are below.
So I think the reasoning for splitting modules into It's a bit off topic, but it could be possible to set up GraphQL for crates.io via the juniper crate for Iron, but that'd require porting everything over to Iron, and if Rocket or another framework becomes the de facto web standard for Rust in a couple of years, then we'd have to migrate to that as well. If we're refactoring, I feel a discussion about Iron at the very least could be had as well, though it'd be a monumental amount of work, possibly even verging on re engineering crates.io...
So in this example, we'd deprecate the endpoint and keep it in the
This sounds like a good plan to start the refactor of the Ideally, in this refactoring, it'd be nice to have no file be greater than ~300 lines or roughly avoid scrolling more than a monitor length within a file.
So the modules
Fair enough, we could keep the rest of the modules as is then. I'm going to create a separate issue for the test refactoring, feel it'll help make things clearer regarding our current state of test coverage and subsequently work to improve that. |
I don't see why crates.io wouldn't be a public API service, as it seems that no-one is against having public documentation for the API endpoints (See #731 and #741). Limiting ourselves to only support the crates.io frontend and cargo seems like shooting ourselves in the foot (feet?) as the API is already public and people are already using it, so telling them "The API might not be stable because you're not as big as cargo" doesn't seem fair to me. We should ensure that the API has no breaking changes no matter where it is used or by whom. This is the whole purpose of namespacing the API endpoints with the
Let's not rush ourselves here, the landscape might change once async I/O is used more widely and Rocket might even stable by then. About refactoring, if we decouple crates.io from conduit, it would help greatly if we are to change framework and go for something like Iron/Nickel/whatever. Though that's not the issue here.
Really not a fan of this. Moreover, if during front-end work the Ember application happens to use an endpoint that was only used by cargo before, we would have to move this endpoint into a
I don't like these kind of "hard" rules about file length. Limiting ourselves to a certain number of lines might promote functionalities which logic is split in multiple files because "limits" making it harder for someone to contribute to this particular feature. Also this is probably one of the best way to end up with file names like
Maybe they should be submodules of the |
My main concern here is that the crates.io backend isn't designed to support this use case in mind. An example of a service that is is GitHub, they created a GraphQL API because third party vendors were having difficulty querying specific data out of it. We simply don't have that knowledge ahead of time of third party users of the crates.io API to know what data is specifically most desired at least with fixed endpoints ala GitHub's v3 API, that's why I feel the priority of any API changes has to be ember and cargo since that's all we know. I feel if we want to support external users in the same manner cargo and ember are, we need to know where the most interest lies in crates.io data, and make more general endpoints for
👍 regarding async I/O stabilisation , I'm not sure how we could decouple the backend from conduit to ensure an easier migration to another framework. Interested to hear your thoughts on this.
Hmm, fair enough. If we decide to split on cargo and ember lines, we could add in tests to ensure that's kept up to date.
Sorry, I should have been clearer regarding this. I wanted to avoid having a file with a line count of So this discussion's starting to get a bit long, I'm thinking that we could narrow this discussion down to a few key question so we can make set up a task list and make this an E-Mentor issue:
|
The thing is that even if it wasn't designed with this use case in mind, people are building tools against crates.io no matter what, and we probably should encourage a rich tool ecosystem imho. I think maybe another route to explore for a
I'll quote myself on this one and give more details about what I'm talking about.
For example, taking the // Might not compile at all, no warranty
// src/lib.rs
api_router.get(
"/crates/:crate_id/:version/dependencies",
C(version::endpoints::dependencies),
);
// version/mod.rs
pub fn dependencies(conn: &PgConnection, crate_name: &str, crate_version: &semver::Version) -> CargoResult<Vec<EncodableCrate>> {
let krate = Crate::by_name(crate_name).first::<Crate>(&*conn)?;
let version = Version::belonging_to(&krate)
.filter(versions::num.eq(semver.to_string()))
.first(&*conn)
.map_err(|_| {
human(&format_args!(
"crate `{}` does not have a version `{}`",
crate_name,
semver
))
})?;
let deps = version.dependencies(&*conn)?;
let deps = deps.into_iter()
.map(|(dep, crate_name)| dep.encodable(&crate_name, None));
.collect();
}
// version/endpoints.rs
pub fn dependencies(req: &Request) -> CargoResult<Response> {
let conn = req.db_conn()?;
let crate_name = req.params()["crate_id"];
let version = match semver::Version::parse(req.params()["version"]) {
Ok(semver) => semver,
Err(_) => return Err(human(&format_args!("invalid semver: {}", semver)));
};
let deps = super::dependencies(conn, crate_name, &version)?
#[derive(Serialize)]
struct R {
dependencies: Vec<EncodableDependency>,
}
Ok(req.json(&R { dependencies: deps }))
} This is just a quick example I made without compiling anything but it should give you the big picture: The
👍 Here are my answers :D
I don't think so :)
I think so, lots of work on this side, esp. documentation.
Probably, no time scale as we probably will want for async I/O to stablize. cf. a couple paragraphs above.
¯\_(ツ)_/¯ |
Thanks a lot for the feedback @vignesh-sankaran and @kureuil.
I agree, although so far I've been using that as a bit of a shorthand. I think it makes sense to group cargo related functionality together and agree that When it comes to cargo related endpoints, I think there are a few special things to keep in mind. First, while we don't want to break any existing endpoints, we should be extra careful with the ones hit by cargo. For example, cargo expects us to return a Second, cargo does a lot of read access directly against the index. Cargo does need to go through crates.io to write to the index and to upload to S3. Non-cargo routes don't interface with these subsystems or the associated error types. For these reasons I think there is value in placing the cargo endpoints into To summarize, I think there are several things to keep in mind in this restructuring:
Here is my proposed outline for the
|
1102: Split krate and version functionality into submodules r=carols10cents This series of commits splits the api endpoints in the `krate` and `version` modules into submodules as an initial step of the refactoring discussed in #912. ## Module structure ### `krate::search::search` Shared by cargo and the frontend, used for searching crates. ### `krate::publish::publish` Used by `cargo publish`. This endpoint updates the index, uploads to S3, and caches crate metadata in the database. ### `{krate,version}/metadata.rs` Endpoints in these files provide read-only access to data that could largely be recreated by replaying all crate uploads. The only exception I've seen to this so far is that some responses include download counts. ### `{krate,version}/downloads.rs` Provide crate and version level download stats (updated in `version::cargo::download`). ### `krate/owners.rs` All endpoints for the used by cargo and the frontend for maintaining the list of crate owners. ### `krate/follow.rs` Read/write access to a user populated list of followed crates. ### `version::deprecated` The `version::deprecated::{index,show}` routes appear to be unused. We should confirm this and discuss a plan for potential removal. ### `version::yank` Yank and unyank functionality. ## Code that remains in `mod.rs` The code that remains in the `mod.rs` files consists primarily of structs for querying, inserting, and serializing. I'm thinking that these structs could be moved to modules under a `src/models` directory along with: `src/category.rs`, `src/dependency.rs`, `src/download.rs`, `src/owner.rs`. (The `keyword`, `token` and `user` modules also have model logic which can be extracted.) ## Remaining work In order to simplify review of this PR, I've only moved code around and haven't done any refactoring of logic. There are probably bits of code that we can move from the endpoints to the model logic, especially if it returns a QueryResult. Feel free to let me know if you see any low-hanging fruit there, otherwise we can address that in a future PR. (Some of the logic still in `mod.rs` returns CargoResult which covers all error types. We will probably need to put some thought into if the model represents just the database or also includes the index and S3 state. I think further exploration of this is best tracked under #912.) /cc @vignesh-sankaran @kureuil
@vignesh-sankaran and @kureuil, I've posted a proposed MVC style module layout over in PR #1155. |
The above-mentioned restructure PR has been merged. Is there any remaining work? Could we close this and open a issue issue for said remaining work if it's the case? |
Relaying the thoughts of @carols10cents from an email sent to me. We could look into setting up modules with the following structure:
This would also require a restructuring of unit and integration tests, so this would definitely not be a small task.
Thoughts @carols10cents?
The text was updated successfully, but these errors were encountered: