Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Urgent] High Impact Regression : v Query Parameter #51

Closed
darylteo opened this issue Aug 14, 2017 · 11 comments
Closed

[Urgent] High Impact Regression : v Query Parameter #51

darylteo opened this issue Aug 14, 2017 · 11 comments

Comments

@darylteo
Copy link

darylteo commented Aug 14, 2017

Hi

First off, let me say I have full appreciation for the work done in this project and the unpkg CDN in general.

To serious business: you may have just broken a number of websites relying on your service.

  • appending the v= parameter in query string now returns a 40x and breaks
  • the v= parameter is added by many other tools (yes, jQuery) by default for cache busting
  • and in general is also added for cache busting reasons in most bespoke algorithms, or by other tools like server-side frameworks, or front end packing tools
  • this also affects things like css font loading from your service as well

Example of issue in the wild:

iview/iview#1603

Unfortunately, this means that with your recent changes completely unbeknownst to us, you have broken our website (and potentially many others). As Linus says: "You do not break user space!"

My suggestion is to NOT do a 403, and simply ignore other unknown variables. Or you can treat 'v' as another accepted query parameter.

Again, thank you very much for your work. If you believe that this is intended behaviour, I respect your judgement but in the interest of time we will have to revert back to self hosting or hack out our custom cache busting.

@darylteo darylteo changed the title v Query Parameter [Urgent] High Impact Regression : v Query Parameter Aug 14, 2017
@fmakdemir
Copy link

One of the libraries I use have a similar problem with another parameter (namely t). It seems like this query parameter check was added recently and causing some problems with existing libraries that have cache invalidation parameters.

@lukasender
Copy link

@darylteo @mjackson FYI: the same is true for fontawesome, see https://unpkg.com/font-awesome@4.7.0/css/font-awesome.min.css which issues requests like https://unpkg.com/font-awesome@4.7.0/fonts/fontawesome-webfont.woff2?v=4.7.0 and returns a 403 now.

Nevertheless, thanks for the awesome project and your great work! Keep it up 👍 :)

@mlucool
Copy link

mlucool commented Aug 14, 2017

This effects all ipywidgets projects as well

@darylteo
Copy link
Author

@Lumannnn @mlucool @fmakdemir

Just FYI, code was specifically introduced to break cache busting, according to the comments.

// Do not allow unrecognized query parameters because
// some people use them to bust the cache.

99c1f70#diff-39c57a99c47dd1a43ee1b1b18c64cf98R24

@fmakdemir
Copy link

Yes, I saw that as well and in this case those some people are big libraries like fontawesome 😄
It seems like bootstrap use # for busting cache.

@EricSimons
Copy link

Oops, looks like I just made a dupe of this: #52

Gotta say, this is... insane. Querystring parameters are INCREDIBLY common and I'm blown away that the end solution here was to fail a valid request with a 403. There are just so many other ways of handling this that make more sense, I don't understand. Why not just send back a 301/302 to the actual resource? Why not resolve the resource, just like all other CDN's do (and any standard webserver for that matter)?

@mjackson as I've said previously, I'm a huge fan of unpkg and all of your work into it. That said, this issue + #38 really have me wondering: did you intend for Unpkg to be used in production use cases? Or should folks be treating this like a CDN only suitable for dev environments?

@mjackson
Copy link
Member

Thanks for the report @darylteo, and for the level-headed conversation. It sucks when stuff breaks, and people often lose their cool when it does. In this case I inadvertently changed the behavior of the code (I thought we were already checking for invalid query parameters, but apparently not) so I did not intend to break anyone's sites.

The good news is I already added back support for the v query parameter in f5be48f and cleared the cache so this should be fixed.

The larger issue here is people using query parameters as a way to bypass the cache. It's a common technique when working with browser caches, but it makes the CDN pointless. For example, many people are using a query string like ?_=timestamp where every single page load will trigger requests that bypass the cache (see #50) and hit the origin server. This is the brute-force method of dealing with cache invalidation problems that "fixes" things by opting out of the cache entirely.

When I designed the unpkg URLs, I put the package version number right in the URL. All unpkg URLs that do not contain a fully-resolved version number (e.g. those that contain a semver string) redirect to a URL with a real version number in the URL. So version numbers in the query string are not only harmful on unpkg, they're not actually necessary either.

My plan going forward is to add official support for the v query parameter as a way for library authors who want to use cache busting. It's my hope that they will put a real version number or equivalent (e.g. a commit SHA) as the value instead of just using a timestamp so that they can get the cache busting behavior they desire and we can get good caching.

@mjackson
Copy link
Member

mjackson commented Aug 15, 2017

Just looking into this a little more, this breakage was not actually introduced by a change to the code in this repo. You can see in the commit log that the code was already checking for invalid queries previous to the deploy made a few days ago.

This breakage was actually introduced when I changed a setting on Cloudflare (invisible to all of us here, unfortunately 😕) that started caching files by the pathname + the query string instead of just the pathname. So Cloudflare started sending all the query strings through to the origin server which issued 403s. This sort of thing is difficult to test in isolation because it involves more than just the code in this repo; it involves the Cloudflare config as well.

Anyway, I've reverted the Cloudflare setting so everything should work as it did before the change and I apologize for any inconvenience this may have caused.

The outcome to all of this is simply that unpkg won't ever actually be able to have meaningful query parameters (e.g. /index.js?meta) on individual files, but that's a constraint I can live with.

did you intend for Unpkg to be used in production use cases? Or should folks be treating this like a CDN only suitable for dev environments?

That's a good question, @EricSimons. Thanks for bringing this up.

My intention is that unpkg be reliable and useful in production use cases. Many people currently use it in production (we're serving literally hundreds of millions of requests every day) without any problems.

That said, occasionally I do something dumb and break some use case that I didn't anticipate. This issue is a great example of that. To quote from the "About" page:

The goal of unpkg is to provide a hassle-free CDN for npm package authors. It's also a great resource for people creating demos and instructional material. However, if you rely on it to serve files that are crucial to your business, you should probably pay for a host with well-supported infrastructure and uptime guarantees.

I do my best to be as responsible and transparent as I can with unpkg, but sometimes stuff goes wrong. When it does, I work as quickly as I can to do the right thing and fix the problem. WRT #38, that problem is now fixed and I haven't seen any of those kinds of errors since I published the fix last week.

I may offer an SLA soon for people who need uptime guarantees. More and more people are using unpkg in production, and it's getting to the point that when something breaks, even temporarily, it's just not acceptable any more.

@EricSimons
Copy link

EricSimons commented Aug 16, 2017

@mjackson thanks for this response — it's good to hear that you have plans to ensure folks can rely on unpkg in prod use cases.

FWIW, I can't see myself ever paying for an SLA because there are rock solid alternatives out there that are completely free. These other CDN's are fighting hard to steal away Unpkg's traffic and they have only one mantra: we're 100% reliable & prod-ready. That really hits home with me, because this past week alone our users have reported two separate Unpkg reliability issues and for months #38 was in a broken state. I get that you're a busy dude btw and I obv don't expect bugfixes to land immediately. What I'm pointing out is that most people's expectations of a prod-ready CDN exceed what Unpkg is currently providing, and there's now competition out there trying to eat your lunch.

I'm not sure how this applies to your project & aspirations, but as an outsider it seems like a very large existential threat that should be dealt with head-on. Perhaps being the CDN for NPM packages isn't super important to Unpkg, in which case it's perfectly suitable for dev environments (hence why I asked whether you intended its use for dev or prod).

Btw, I say this all as a guy who doesn't have a dog in the fight. I certainly hope & want to see Unpkg succeed. But if Uber is providing me a lower fare for the same commute, then I'm prob not going to use Lyft — especially if their service has been unreliable recently.

@mjackson
Copy link
Member

@EricSimons Thanks for the feedback. Again, I apologize for any inconvenience this issue may have caused. I think this could also be a case of bad timing for you in particular because you launched a product at the same time as I was doing a lot of major refactoring on this end.

I remain dedicated to this project and to seeing it succeed, and I'll always to my best to respond to and fix issues quickly and with as much transparency as possible. It's not easy to get 100% uptime (it's impossible AFAICT; even the largest sites have bad days) which is probably why I'm a bit hesitant to say it, but we are as production-ready as anyone. I have a lot of systems experience and I've been running this site for over 2 years now with a lot of good days with zero downtime or issues. I should probably launch a status page to remind people of how good the service is most of the time so that when stuff goes wrong they don't blow it out of proportion. I think that may be one of the main reasons for status pages ;)

Anyway, I've got some ideas about how to bring even greater transparency to the process of developing unpkg and tightening the feedback loop in the near future. Hopefully this will help to improve communication so that you feel comfortable using unpkg to power stuff you build.

@EricSimons
Copy link

@mjackson this all sounds awesome to me :) Thanks again for all of your work on unpkg, and I'm looking forward to the additional transparency stuff you mentioned!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants