Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cookie jar implementation #14

Closed
pfernie opened this issue Nov 16, 2016 · 64 comments
Closed

cookie jar implementation #14

pfernie opened this issue Nov 16, 2016 · 64 comments
Labels
E-pr-welcome The feature is welcome to be added, instruction should be found in the issue.

Comments

@pfernie
Copy link
Contributor

pfernie commented Nov 16, 2016

About a year ago I wrote a user_agent crate (not published to crates.io) for an internal project, which provides the concept of a Session wrapping a Client and does cookie handling per RFC6265. At the time, the only client I targeted was hyper::Client. I'd be interested in updating/adapting the code to reqwest, but wanted to touch base on any existing plans/work here.

First off, the library suffers some warts:

  • Perhaps not the cleanest rust, one of the first libraries I developed when first learning rust.
  • It implements its own Cookie class, which wraps/extends the existing cookie crate (providing a From implementation to handle conversion). This was to introduce some more domain objects (CookiePath, CookieDomain, etc.) to enforce/aid the implementation of some RFC6265 guidelines.
  • At the time, hyper was on a different (0.1?) version of cookie-rs, whereas I was developing against a more recent version (0.2+?), meaning the library goes through some gyrations to "convert" between the two versions by stringifying-then-parsing cookies.
  • While I wrote a fair set of tests, usage and testing was primarly geared towards my particular use case at the time, so implementation re: RFC6265 may not be complete or necessarily fully correct.

Some of these issues can be cleaned up over time, but in terms of bigger picture I have some questions/concerns:

  • The re-implementation/extension of Cookie seems un-necessary; submitting PRs against cookie-rs to bring them in line would make sense...
  • cookie-rs already provides a CookieJar, implementing hierarchical jars and jar signing. In my crate, user_agent::CookieStore is more concerned with storing/expiring/updating cookies per RFC6265, and handles storing cookies and setting outgoing cookie headers based on URLs visited during the Session, as well as serializing the current session (to .json currently, but really to anything serde will serialize to) so they can be saved/restored. So, they are in some sense orthogonal currently, but feels as though they could be merged as well.
  • As mentioned, the crate currently tries to generalize the concept of Session, although the only concrete impl is HyperSession. This was done with the vain hope of making a library for "all the webs" (curl clients, etc.), but the realities of needing to get my project done means this was (IMO) sub-optimally implemented. I'm not sure if it makes more sense to ditch the currently not very ergonomic abstraction and focus on implementing specifically towards reqwest, or to fix/maintain the abstraction and publish the crate separately as user_agent and just implement a ReqwestSession here. At a high level, the Session implementation is mainly concerned with defining how to get/set cookies into/out of Request/Response objects.

Obviously some questions would be more easily answered with some code to review; I just wanted to get a feel for the lay of the land before updating code to be suitable for a PR.

@seanmonstar
Copy link
Owner

seanmonstar commented Nov 16, 2016

Thanks for writing this up, it's awesome!

I have actually wanted to write a new cookie crate for hyper for a while now, since I imagine some of my opinions at this point would be too divisive to get into cookie-rs. For example, I would like to make the Cookie struct not contain so many Strings, perhaps only owning a single String and then holding indices, similar to how rust-url does Url. I've also felt the secure feature of cookie-rs doesn't belong, and would want to move signing of cookies completely out of crate. Another crate could provide that, there are so many different ways someone might want to sign and/or encrypt cookies. I registered the cookies crate to do this, and then found I had way too many other things to do before that.

Does doing that plus the Cookie Jar / Session stuff you've done seem like a good idea?

@Michael-F-Bryan
Copy link
Contributor

Say you were to make a new cookie manager from scratch, how difficult would it be?

I recently came across your blog post and thought I'd try it out. So far I'm really liking the user friendly API you've built up, but the thing that's preventing me from writing any non-trivial web client stuff (as opposed to, say, Python's Requests) is the lack of cookie support.

I'd be interested in sending a pull request or two, but I'm not sure how you'd want to implement cookies and integrate it with the rest of the project. What have you done towards cookies so far?

@pfernie
Copy link
Contributor Author

pfernie commented Nov 18, 2016

I was hoping to push on this a bit this weekend; I'm hoping my existing code (which implements the linked rfc) can quickly be adapted for something at least functional that can be worked on for efficiency later w/o breaking back compat.

@Michael-F-Bryan
Copy link
Contributor

After looking at how hyper has done cookies, @seanmonstar's idea of storing a single string and then getting slices of that using some accessor function when you retrieve a particular field sounds like the way to go. It's not really adding that much complexity and makes things a lot nicer.

How did your cookie jar work? Isn't it essentially just a trie/hashmap under the hood?

@pfernie
Copy link
Contributor Author

pfernie commented Dec 14, 2016

Apologies for radio silence on this; I only recently had some time to revisit. It's certainly a WIP, but in the interest of getting something rolling I've made user_agent available (but not on crates.io), updated to use reqwest as a client. Revisiting the code, I think it makes sense for this to exist as its own crate user_agent, and in the "batteries included" spirit, reqwest may introduce a dependency and implement Session as seen here.

I'd be interested in feedback in terms of API, ergonomics, and as mentioned much in here may not be particularly rust-y as it was originally written a while ago. In particular, the set of traits exposed for clients to be able to be used in user_agent (Session, CarriesCookies, HasSetCookies) were tailored to my specific use case (& hyper), and may not be generally useful... Error handling is a little scatter-brained; nowadays I tend to use error-chain in my projects, although this doesn't seem to be the practice in either hyper or reqwest. I've recently started using slog for logging, although perhaps in this case sticking to the vanilla log makes more sense?

In terms of writing a new/reqwest/hyper-specific Cookie:
The current "cookie-version-mismatch" issue shows up here. There are aspects of the current user_agent implementation that really make more sense as part of a Cookie implementation (CookieDomain, CookiePath, etc.), and paring down user_agent to be concerned with storage, serialization, etc. (e.g. basically a hashmap as @Michael-F-Bryan mentions) and providing the Session and related traits for decorating clients. I'd be interested in implementing that; I started out implementing the "single serialization" a-la-url style here, although after doing this basic work I had some design questions/thoughts. Rather than pollute this thread, I've opened this issue to outline some of them.

@seanmonstar
Copy link
Owner

I do agree that this could be a separate crate, I have no problem with reqwest depending on it.

Looking at user_agent: I like a lot of what I see, and I see that you've thought through a lot of the concepts of storing and sending cookies, fantastic! I'd probably handle the Session and related traits differently, myself. Here's how I'd imagine such a crate working, and how I'd expect to use such a crate in reqwest:

  • I would likely make the reqwest::Client contain a CookieStore/CookieJar/BunchOfCookes, and let each Client essentially be its own session. The store probably wouldn't be exposed publicly, but perhaps Client could take some configuration if needed.
  • I wouldn't expect the Session to do the actual requests/responses, or really even know about them. I'd probably want to ask the store for a list of Cookies to send in a header, according to a given URL. Something along these lines:
if let Some(cookies) = self.jar.get(&url) {
    headers.set(Cookie(cookies));
}
  • I'd afterwards want to store the cookies back in the jar when I get a response:
let res = do_request(args)?;
if let Some(cookies) = res.headers.get::<SetCookie>() {
    self.jar.put(res.url, cookies);
}
  • I'd expect the jar to check for expiration, sub domains, paths, secure, etc, when getting and putting cookies.
  • I don't necessarily expect the jar to know how to convert to/from hyper's Cookie/SetCookie headers, and would probably do the conversion myself in reqwest.
  • I have only ever glanced over slog, and while it seems nice, I'd probably make libraries that interact with the more common logging library (which I assume is log).
  • I'd probably make most of the fields of your types private, and use getter/setters, ot reduce the need to bump major versions when you decide to structure something differently.

I'll hop over to the cooky issue to talk about that specifically.

Exciting!

@pfernie
Copy link
Contributor Author

pfernie commented Jan 12, 2017

Thanks for the comments. I agree that the current Session scheme feels backwards/unnecessary; it was a product of my use case at the time. I'd like to update the current implementation to take these comments into account (remove the Session, CarriesCookies, HasSetCookies traits, etc.), and publish a 0.1 to crates.io to make it available as a dependency for reqwest.

In regards to how you'd like to see functionality implemented within reqwest: You mention not exposing the store publicly, and that effectively each Client instantiation is its own session. But for the use case of wanting to be able to maintain sessions over time, I was thinking it makes more sense to have a reqwest::Client have a field cookie_jar: Option<CookieStore>. and a setter method set_jar(..) and a get_jar() to retrieve it once done. This would allow users to handle serialization and the like as they see fit and pass in the actual CookieStore instance. Calls to get(url) and similar would then utilize the jar if present. Alternatively, to make usage more explicit, do not introduce a new field to Client, but rather extend the API for Client to include session_get(url, &mut jar) and similar. Having a separate/parallel API for jar usage doesn't really feel as ergonomic to me, and I'm not sure what the added value would be (wanting to use a jar for some queries and not others...?), so I'd be inclined to the former.

Either way, it makes sense to move the conversion logic around hyper's headers out of the CookieStore and into reqwest, as you say; that goes along naturally with getting rid of the various CarriesCookies traits and whatnot.

@pfernie
Copy link
Contributor Author

pfernie commented Jan 19, 2017

@seanmonstar not sure if you follow the cookie-rs crate directly, but there is an interesting PR open which may make that crate more attractive for use as hyper/reqwest Cookie impl as opposed to rolling some independent cooky crate. Most of the ideas I tossed around in cooky are implemented here (minimizing allocations, builder pattern, etc.).

@SergioBenitez
Copy link

SergioBenitez commented Jan 19, 2017

@pfernie Saw your post from the PR. Is there something I missed in the PR that you proposed as an idea earlier (I couldn't find where you shared your ideas)? Would love to hear it if so!

@pfernie
Copy link
Contributor Author

pfernie commented Jan 19, 2017

@SergioBenitez No, I actually think your PR is pretty in line with most of the ideas I had been entertaining! I haven't had a chance to dig in deeply, though. I have an un-published (on crates.io) implementation of a cookie jar which handles path/domain matching, etc. here. That crate does introduce some more specific types (CookieDomain, CookiePath, etc.) to handle/enforce that logic, but I am not sure that is appropriate for pushing into the "lighter" Cookie implementation in cookie-rs. The validation, etc. involved is really only of interest to jar implementations, so is overhead that probably doesn't make sense for just Cookie.

As you can tell from the age of this issue, I've been letting it get a little stagnant, but have been hoping to get back to it soon to implement ideas discussed here, add support for public suffix checking, etc. When I do so, I'll have a better idea if there are any enhancements that would make sense to push into the cookie-rs Cookie implementation directly.

@seanmonstar
Copy link
Owner

seanmonstar commented Jan 19, 2017

@pfernie it looks really promising! I'd also want 1 other change, which is to rip out the signing secure feature, so that someone enabling it cannot cause a conflict with the included native-tls crate in reqwest.

@SergioBenitez
Copy link

@seanmonstar I'll be looking into just that in the near future.

@SergioBenitez
Copy link

@seanmonstar I've opened a PR that overhauls CookieJar: rwf2/cookie-rs#76. In particular, it removes the openssl dependency in favor of ring. The secure feature is disabled by default.

@erickt
Copy link

erickt commented May 23, 2017

To update this ticket, @pfernie and I have been getting https://github.com/pfernie/user_agent back to compiling on the latest cookie and reqwest. This should help make it easier to get it feature complete, or merge with another project.

@margolisj
Copy link

Howdy @pfernie and @erickt still working on this at all?

@pfernie
Copy link
Contributor Author

pfernie commented Jul 7, 2017

@margolisj unfortunately I haven't had time to actively work on this. Things are at least in a building state, but it really needs some thought on where to go forward, especially in terms of how it should work w/ reqwest. In particular, I saw #155 recently; event hooks might be a sensible way for something like user_agent to hook into the pipeline to handle setting and acquiring cookies. PRs, ideas, or even just issues are welcome!

@margolisj
Copy link

@seanmonstar Do you see this as something that can be pretty closely ported from the python requests library? I guess if it was possible, I'd really appreciate a short synthesis of how you see this fitting / if anything has changed form your previous opinions posted here.

@seanmonstar
Copy link
Owner

@margolisj I haven't really read through that part of requests, so I couldn't comment there. I don't imagine the exposed API would be the same (I wouldn't want someone to have to use reqwest::Session instead of reqwest::Client).

My thoughts otherwise have stayed the same. If you'd like, I can find the cookie spec and list out the key points about safely storing and sending cookies to the correct urls.

@margolisj
Copy link

margolisj commented Jul 7, 2017

I don't imagine the exposed API would be the same (I wouldn't want someone to have to use reqwest::Session instead of reqwest::Client).

Ah very important. requests does provide another object that wraps around a Cleitn Any preference for triggering "session" mode then? Just add it as a param to the builder?

The cookie spec and/or work seems to be pretty far along in their user_agent and I'd mostly like to piggy back off that.

@seanmonstar
Copy link
Owner

Yes, I'd just have a cookie_jar or whatever option on the ClientBuilder.

@pfernie
Copy link
Contributor Author

pfernie commented Jul 7, 2017

The existing user_agent does do the Session-wrapping-Client thing, but purely because I baked it up for an internal project and that made sense at the time. Just saying I'm totally open to aggressive PRs that change the API to be client owns the cookie_jar and mediates the process. Or if simply lifting out the tests and logic relevant to the cookies RFC to build something new is more expedient, I would have no issue with that either. I do think it would be Nice™ if it were reusable with other crates, but I do not have a requirement that it be so (I use this solely w/ hyper/reqwest for my original use case).

It is alluded to in this issue, but as a NB: I threw in public suffix support via the publicsuffix crate per the spec, but this has not been tested for correctness.

@margolisj
Copy link

@seanmonstar on second thought what I was was short sighted; for sure if you'd like to look through the cookie spec and list any key points I'm sure that would help towards getting it merged to your liking.

@pfernie you meant the cookie jar part as something talk could be easily used outside of reqwest, correct?

@pfernie
Copy link
Contributor Author

pfernie commented Jul 10, 2017

@margolisj correct; I mean if user_agent changed so it could be used for cookie management with other http crates. A design where user_agent needn't concern itself with the particulars of reqwest::{Request,Response} and the like, which off the cuff doesn't seem like a particularly high hurdle... in fact it seems more natural: if the client (reqwest or some other crates') owns the jar, it just hands cookies from whatever responses to store and urls to retrieve cookies to include in whatever requests, so the user_agent API only deals in structs/enums from common crates.

@seanmonstar seanmonstar added the B-rfc Blocked: Request for comments. More discussion would help move this along. label Aug 19, 2017
@shssoichiro
Copy link

shssoichiro commented Nov 8, 2017

I'd like to add my two cents in favor of this, with a scenario I ran into the other night. I was making a post request to a website using reqwest, and the response contained a SetCookie header that I needed in order to authenticate future requests. However, the response then redirected, and the next response in the chain did not have the SetCookie header I needed. My workaround, once I discovered the issue, was to set this request to not redirect automatically, and then grab the cookie from the header, and manually make the next request, but this issue was very hard to pinpoint the root cause. If reqwest had built-in cookie jar handling, it could have automatically put the cookies from the first response into the jar, and everything would have been happy.

My opinion on the API I'd like to see is that a Client should automatically instantiate a CookieJar (which can be customized during the ClientBuilder), and that CookieJar can be accessed via a method on the Client. Then during each request, the Client will automatically build a Cookie header from the CookieJar, and during each response, the Client will automatically put any headers from the SetCookie header into the CookieJar. But of course there may be factors I have not taken into account that may make this a suboptimal approach.

@xurtis
Copy link

xurtis commented Jul 14, 2018

@SergioBenitez the existence of a PrivateJar and SignedJar greatly concern me as they suggest that either storing cookies on the server side is useful or that trusting a client to authenticate themselves is in anyway a level of useful security. There is nothing that these two types provide beyond memory safety, in which case these are the wrong tools to use for providing that.

Any signing performed with cookies should be done on the server side on the values stored in the cookies, not on the cookies themselves and certainly not within the cookie jar on the client.

Secondly, as it stands, the cookies in the cookie crate do not encode a concept of origin, making them useless to any client library unless they are extended.

As far as I can tell, there is much of the specification of cookies as far as an HTTP client is concerned missing from the cookie crate that makes it barely useful to an HTTP client. Only its Cookie and CookieBuilder types seem useful and only then do they appear useful to a server.

Overall, the design of the cookie crate suggests a large misunderstanding of the protocol as well as the security implications related to it.

@SergioBenitez
Copy link

SergioBenitez commented Jul 14, 2018

@SergioBenitez the existence of a PrivateJar and SignedJar greatly concern me as they suggest that either storing cookies on the server side is useful or that trusting a client to authenticate themselves is in anyway a level of useful security. There is nothing that these two types provide beyond memory safety, in which case these are the wrong tools to use for providing that.

It sounds like you've either gravely misunderstood what those types do or are pitting them against a scenario where their utility is dubious. The idea is that, on the server-side, you hold a secret key and can use it to sign/encrypt cookies; the resulting ciphertext is sent to the client. The client sends these cookies back at some point in the future. You can then cryptographically assert the confidentiality, integrity, and/or authenticity of the cookie's values. These properties are significantly stronger than memory safety.

Perhaps you believe that cookie somehow advocates for the use of these types client-side? This is stated nowhere in the documentation, and I can't imagine how that would make sense, so it's odd to me that you would jump to this conclusion and or base your opinions on this non-fact.

As far as I can tell, there is much of the specification of cookies as far as an HTTP client is concerned missing from the cookie crate that makes it barely useful to an HTTP client.

Great! The cookie crate was certainly developed with more server-side oriented applications in mind. Let's improve the crate to work well with client-side uses as well!

Only its Cookie and CookieBuilder types seem useful and only then do they appear useful to a server.

I'm not sure why they wouldn't be useful for building/reading/parsing cookies on the client-side? What's more, I've used the CookieJar type as the storage component of a client-side application, and its delta functionality was very handy.

Overall, the design of the cookie crate suggests a large misunderstanding of the protocol as well as the security implications related to it.

This feels baseless. Nothing that you've said demonstrates this. I'm also not sure which protocol you're referring to; cookies aren't a protocol.

@pfernie
Copy link
Contributor Author

pfernie commented Jul 15, 2018

Glad to see this issue getting some attention; I regret that I never had the time to push on this myself. @SergioBenitez , I believe by "protocol" other users in this thread are referring to the behavior of cookie handling as described by RFC6265, which my user_agent code followed to provide an API. As such, I think the "security implications" mentioned was referring to the fact that without this RFC implementation, there is a temptation in client code to collect and send all cookies on all requests for a session; doing anything more sophisticated is effectively implementing the RFC...

@xurtis, I have not had a chance to review your crate, but I will say a reason I never pushed on this was I was unclear if the additional RFC functionality was appropriate for the cookie-rs project, or should be something external. Having a single definitive crate seems desirable, but IIRC some aspects of the work @SergioBenitez did in cookie-rs was focused on performance and avoiding allocations. I believe supporting RFC6265 fully involves some bookkeeping/allocations etc. such as for domain canonicalization, which some use cases may not want baked (pun intended...) into their CookieJar.

I never dug deep, but I considered that perhaps adding functionality to cookie-rs via implementing some new container (CookieRouter or CookieSession?) that owned a CookieJar and implemented the RFC6265 might be a direction to consider; this would be to allow lightweight usage for cookies in scenarios where RFC6265 is not useful (server-side), and clients that needed this functionality could opt-in. A similar approach might also make sense to separate out the concept of signing as @seanmonstar sort of suggests, and allow users to layer the concepts as needed.

All that being said, I will re-link the repo with my old implementation. The RFC is not that complicated, so perhaps it is simpler to reimplement, but it does contain implementations of most (all?) of the RFC, as well as a set of tests. And being both code from when I was first learning rust, and some niggles of the ecosystem back then (mismatched cookie-rs versions, etc.) any retained code would definitely benefit from some polish. But it may at least be useful as a reference.

Obviously, I have not had time to make any useful progress on developing my old code into something suitable as a crate, but I continue to keep an eye on this issue, and would be interested to contribute to whatever effort is born of the discussion.

@gzsombor
Copy link

@pfernie : I see you worked on your user_agent crate, but it's not on crates.io, so I guess, you don't consider ready for public usage, however there are some reqwest integration already there. Based on a cursory code reading, it seems pretty complete. Do you plan to finis and publish your work? It would be nice to have a working, cookie store solution for client side.

@pfernie
Copy link
Contributor Author

pfernie commented Jan 11, 2019

@gzsombor user_agent is now published on crates.io. I tidied up some of my bigger objections with the crate; I split out the actual RFC6265 implementation into a separate cookie_store crate.

I did not publish as 1.0 since I think there is still design work to be done to make this fit more nicely in the ecosystem. Specifically:

  • reqwest exposes its own Request(/RequestBuilder) and Response types, is there a longer term plan to migrate to http::{Request,Response}?
  • cookie_store still implements its own notion of Cookie (which wraps cookie-rs), as the path and domain values maintained in that crate do not conform to the specs of RFC6265.
  • I had intended to split out a separate user_agent_reqwest crate with the specific reqwest implementation there, but hit ye olde orphan rules and didn't want to jump through newtype hoops.

@seanmonstar
Copy link
Owner

@pfernie that's fantastic!

I'd probably as a next step try to use cookie_store inside reqwest, as part of the async client, and probably even expose wrappers around the Cookie types, to protect against any breaking changes (we can upgrade internally without users needing to worry about breakage).

@pfernie
Copy link
Contributor Author

pfernie commented Feb 21, 2019

Was going to open a fresh issue, but I see e.g. #396 refers back to here. Just wanted to link related issue raised on user_agent.

@seanmonstar seanmonstar added E-pr-welcome The feature is welcome to be added, instruction should be found in the issue. and removed B-rfc Blocked: Request for comments. More discussion would help move this along. labels Mar 21, 2019
@theduke theduke mentioned this issue Mar 22, 2019
@theduke
Copy link
Contributor

theduke commented Apr 9, 2019

Since #480 has been merged, a little update:

A cookie store is now supported (in master) and can be enabled with ClientBuilder::cookie_store(true).

For now, the cookie store is private though and can not be accessed for either populating it manually or storing/saving it somewhere.

This is of course desired and should be added, it just requires fleshing out good method signatures that won't need a breaking change.

@eutampieri
Copy link

@theduke what does

The rudimentary cookie support means that the cookies need to be manually configured for every single request. In other words, there's no cookie jar support as of now. The tracking issue for this feature is available on GitHub.

mean in the 0.9.x documentation? Is it outdated? Shouldn't be enough to just do the thing below to have the cookie stored inside the client?

let client = reqwest::Client::builder()
    .cookie_store(true)
    .build()
    .unwrap();
client.get(url.as_str()).send()

@theduke
Copy link
Contributor

theduke commented Oct 16, 2019

Yes that is outdated, where in the code is it?

@eutampieri
Copy link

eutampieri commented Oct 16, 2019 via email

repi pushed a commit to EmbarkStudios/reqwest that referenced this issue Dec 20, 2019
@gnzlbg
Copy link

gnzlbg commented Jan 11, 2020

What else is required to close this issue ?

@crusty-dave
Copy link

crusty-dave commented Jun 13, 2020

I just stumbled across this. I would like to see an explicit reqwest::Session that works like Python requests.Session(). The Python code handles TLS correctly in that it doesn't create a new TLS session for each request, while reqwest is creating a separate TLS session per request. I haven't found anyway to avoid that. I am running 0.10.4 because our mirror hasn't been updated in a while, so if there is capability in 0.10.6, I cannot currently test it due to mismatched dependencies.

I should add that in this case, my client is running on Windows and I am not specifying any certificates, so it is presumably using native-tls.

I just tested on Ubuntu using openssl and a root CA and it has the same issue - not surprisingly.

@theonlypwner
Copy link

theonlypwner commented Dec 5, 2020

@crusty-dave, there should also be some way to reset the cookies in the session. Cookies can be cleared with requests.Session in Python, according to https://stackoverflow.com/a/23816320

s = requests.session()
# make a request
s.cookies.clear()
# make another request, with cookies cleared

Currently, in 0.10.9, the ::cookie::CookieStore cannot be reset, so to clear the cookies, you'd have to build another Client with ClientBuilder.

@jlpetz
Copy link

jlpetz commented Dec 8, 2020

So I get you can start a client with cookie_store enabled. But is there an way to pre-load(before sending the first request) cookies or access the cookie jar/store?

For example if I fire up a client like

    let client = reqwest::Client::builder()
        .user_agent("GodZilla/2.0 (Atari 2600) petz/1 - petz@blah.com - rust/reqwest")
        .cookie_store(true)
        .build()?;

How can I then access the cookie jar and load cookies I have in a flat file. I'm guessing this feature only enables processing for cookies from responses? So I would need to build the a cookie header manually then use 'default_headers' when building the client? Like this

    let client = reqwest::Client::builder()
        .user_agent("GodZilla/2.0 (Atari 2600) petz/1 - petz@blah.com - rust/reqwest")
        .cookie_store(true)
        .default_headers(headers)
        .build()?;

If that's the case, I'd love to get better support for preloading cookies. As in my use case I have a cookie flat file in this format https://curl.se/docs/http-cookies.html. While I can parse the file format easy enough, it would be great to get functions in reqwest/cookies to be able to insert the seven(correction eight, I forgot #Httponly, per https://stackoverflow.com/questions/11261069/curl-cookiejar-line-commented-out-with-httponly) attributes for cookie(s) easily. Currently unless I'm mistaken the the insert function for HeaderMap only takes KV, meaning I'd have to do a bit of wrangling which could be made much easier.

@pfernie
Copy link
Contributor Author

pfernie commented Dec 8, 2020

@jlpetz I wouldn't strongly recommend it, but I will note that there is the user_agent crate, which was the source of my original implementation for CookieStore functionality in hyper/reqwest (the cookie_store crate was split from this). While I still use it for an internal project, it's effectively unmaintained, hence the lackluster recommendation, but it may suit your needs in a pinch. It implements the concept of a session by wrapping a reqwest::Client, as opposed to the eventual direction of reqwest::Client owning the jar. It is also specifically built using the blocking client in reqwest. And it would have issues with re-directs (missed cookies) mentioned previously in this discussion. However, if these aren't show stoppers and you need something functional, you might give it a shot. It provides the ability to serialize/deserialize the store (save/save_json and load/load_json). The reqwest::Client is passed in, so can be configured as usual, although you would disable the cookie store on the Client itself since the Session handles the store itself.

That aside, after lurk-watching this issue for quite a while, I'd be interested in re-visiting the design here and perhaps bring some of the functionality asked for in this issue. Generally, my recollection is that a goal was to avoid or minimize introducing further external dependencies in the public API. I believe I explored a similar idea previously, but nevertheless I put together a rough idea of perhaps decreasing some of reqwest dependency on specific cookie handling. The functionality reqwest needs can be distilled down to a minimal trait. cookie_store can then implement this trait. This implementation opts to implement the CookieStore within reqwest as a Box<dyn reqwest::cookie::CookieStorage, to avoid making Client generic over the cookie storage type. Client could then provide with_cookie_store_as{,_mut}, which would allow &mut access to the CookieStorage implementer (via std::any::Any::downcast_{mut,ref}).

In regards to the current API, this is a breaking change, as ClientBuilder::cookie_store goes from taking a bool to an Option<impl cookie::CookieStorage>.

This API facilitates usage such as:

use cookie_store::CookieStore;
use eyre::Result;

#[tokio::main]
async fn main() -> Result<()> {
    let cookie_store = CookieStore::default(); // or use `load` to deserialize an existing session off disk

    let reqwest = reqwest::Client::builder()
        .cookie_store(Some(cookie_store))
        .build()?;

    reqwest.with_cookie_store_as(|cs: Option<&CookieStore>| {
        dbg!(cs);
    });

    // let's get us some cookies
    reqwest.get("https://google.com").send().await?;

    reqwest.with_cookie_store_as(|cs: Option<&CookieStore>| {
        dbg!(cs);
    });

    reqwest.with_cookie_store_as_mut(|cs: Option<&mut CookieStore>| {
        if let Some(cs) = cs { // Only `Some` if a cookie store was provided, and it can downcast to the requested type
            cs.clear();
        }
    });

    reqwest.with_cookie_store_as(|cs: Option<&CookieStore>| {
        dbg!(cs);
        // the CookieStore could be serialized back to disk here
    });

    Ok(())
}

Producing:

[src/main.rs:13] cs = Some(
    CookieStore {
        cookies: {},
        public_suffix_list: None,
    },
)
[src/main.rs:19] cs = Some(
    CookieStore {
        cookies: {
            "google.com": {
                "/": {
                    "1P_JAR": Cookie {
                        raw_cookie: Cookie {
                            cookie_string: None,
                            name: Concrete(
                                "1P_JAR",
                            ),
<...SNIP.../>
        public_suffix_list: None,
    },
)
[src/main.rs:29] cs = Some(
    CookieStore {
        cookies: {},
        public_suffix_list: None,
    },
)
  • This does not remove the reqwest::cookie::Cookie type, as that is used in the Response API.
  • This example retains the reqwest::cookie::CookieStore newtype, although it could most likely be removed.
  • When receiving cookie header strings from CookieStorage::get_cookie_headers(..) impls, there is no validation that the returned string is valid; but this perhaps should not be the purview of reqwest anyway.
  • Currently, with_cookie_store_as{,_mut} will return None in both the case that no cookie_store was set, as well as asking for an invalid downcast. The signature could instead be something more like Result<Option<..>, ..>, where the Err variant would reflect an invalid downcast, and an Ok(None) that no cookie_store had been set.

@theonlypwner
Copy link

@jlpetz, wouldn't that fail if the server unsets cookies, and they are still included in the default headers?

More important than sessions would be read/write access to the cookie store (either in Client or in a later-added Session), since that would allow saving and restoring cookies.

@jlpetz
Copy link

jlpetz commented Dec 9, 2020

@theonlypwner

wouldn't that fail if the server unsets cookies, and they are still included in the default headers?

Probably, but I'm expecting this not to happen in my scenario. These are auth cookies I'm loading that are needed on every request I'm making from the client to the servers I will be interacting with. But it's painful as I'll need to handle expiry and such myself. Luckily I won't be hitting sites that don;t require these cookies, or I'd have to do the addition host/path checking etc. I haven't tested it yet but I'll be interested to see if the default_header cookie I initially provide gets merged correctly with any cookies the client tries to add to the cookie_store from server set_cookie responses, or if this confuses things(ie. cookie_store is out of sync and overwrites it).

More important than sessions would be read/write access to the cookie store (either in Client or in a later-added Session), since that would allow saving and restoring cookies.

Exactly, from the reply from pfernie it sounds like this isn't quite possible yet. But yes read/write access to the cookie store for saving/restoring cookies is what I'm trying to accomplish(mostly restoring, as I'm getting the cookies from an external U2F process).

@pfernie, Thanks for the detailed response. Given I'm new to rust I think I'll take your advice and avoid the 'lackluster recommendation,', since that seems like a quite complex path for my skill level. I hoping that potentially hacking default headers might be enough for my use case (though likely less ideal/robust). On this comment though

in regards to the current API, this is a breaking change, as ClientBuilder::cookie_store goes from taking a bool to an Option.

Would we need to break the API to do this? I don't mind not being able to provide a preloaded cookie store and keeping the bool. But after creating the client could the cookie_store not be exposed by the client for read/write access? ie. the cookie store has insert, get, list, delete cookie functions. Surely these would already exist and are just not exposed, given a response/request can already do most of these things. Going a step further to restore/save functions would allow re-use of cookies between multiple clients and persistence to a file(like the one I'm loading from).

Even if you wanted to be able to create the Client with a preloaded cookie_store, could we not provide that as a separate option to the currently available cookie_store bool, to avoid a breaking API change. ie. loaded_cookie_store or cookie_store_object. In this scenario, not providing the cookie_store_object would work as it does today. Then if you provide it, it would operate as below.

  1. cookie_store true + cookie_store_object = Gather cookies during client lifetime and start with preloaded cookie_store_object
  2. cookie_store false + cookie_store_object = Don't gather cookies during client lifetime, but use the cookies initially provided

or

  1. cookie_store true + cookie_store_object = Gather cookies during client lifetime and start with preloaded cookie_store_object
  2. cookie_store false + cookie_store_object = By providing a cookie_store_object your implicitly making cookie_store true

a goal was to avoid or minimize introducing further external dependencies in the public API. I believe I explored a similar idea previously, but nevertheless I put together a rough idea of perhaps decreasing some of reqwest dependency on specific cookie handling. The functionality reqwest needs can be distilled down to a minimal trait

As I mentioned above, I'm new to rust, so more than half of what you said is likely over my head. Perhaps I'm suggesting something not workable given the goal(above) and rusts ownership (apologies if I am). I just thought I would ask here in case I was missing a way to do this which already exists and if it doesn't to hopefully get it clarified as a feature request. Sounds like it would be feature request from what you wrote. Thanks again for the fast and very detailed response.

@pfernie
Copy link
Contributor Author

pfernie commented Dec 9, 2020

@jlpetz Apologies, I should have been more clear in my addressing of my comments. The first section about user_agent was specifically addressed to you, but the proposed reqwest & cookie_store changes were meant to be more general.

But, as to your comments: yes, it could be done without an API break. I was noting it more as a "I happened to do this for this implementation"; I wanted feedback on the internal approach before submitting any actual PR.

Your comments sort of mention in passing the idea of a "read-only CookieStore" (don't gather cookies during client lifetime, but use the cookies initially provided). That isn't a usage I'd consider/would need; it's a bit tangential to the discussion, but would that be desirable functionality?

@seanmonstar
Copy link
Owner

seanmonstar commented Dec 9, 2020

"read-only CookieStore"

I'd already been thinking that we could perhaps just define a trait CookieStore, with a default implementation, and then people could provide their own to satisfy their own needs. Doing a trait like that would allow a read-only version to just do nothing in its fn set_cookies() method.


On a meta note, I wonder if perhaps we should close this specific issue, since there is now cookie store support in reqwest, and open a new issue around the new functionality.

@pfernie
Copy link
Contributor Author

pfernie commented Dec 9, 2020

Closing this issue in favor of its continuation in #1104

@pfernie pfernie closed this as completed Dec 9, 2020
@jlpetz
Copy link

jlpetz commented Dec 10, 2020

@pfernie

Your comments sort of mention in passing the idea of a "read-only CookieStore" (don't gather cookies during client lifetime, but use the cookies initially provided). That isn't a usage I'd consider/would need; it's a bit tangential to the discussion, but would that be desirable functionality?

No, I don't really need a READ-ONLY cookie_store for my current usage. I was more mentioning this in case that makes doing a non breaking change easier or re-using existing codebase possible. ie. If the cookie_store_object was provided and the existing cookie_store bool was kept, what would the behaviour be? For me either A or B (below) are fine as I would always be using '1' and not '2'.

1.    cookie_store true + cookie_store_object = Gather cookies during client lifetime and start with preloaded cookie_store_object
2.    cookie_store false + cookie_store_object = Don't gather cookies during client lifetime, but use the cookies initially provided
or B
1.    cookie_store true + cookie_store_object = Gather cookies during client lifetime and start with preloaded cookie_store_object
2.    cookie_store false + cookie_store_object = By providing a cookie_store_object your implicitly making cookie_store true

Having said that, with private browser modes, privacy cookie blocking and all that such being the rage. Perhaps someone would have a use case for wanting to provide cookies they choose but not accept (or selectively accept) cookies from responses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-pr-welcome The feature is welcome to be added, instruction should be found in the issue.
Projects
None yet
Development

No branches or pull requests