
allow custom compression for custom Accept-Encoding Fixes #59 #62

Open · wants to merge 13 commits into master

Conversation

@nickdesaulniers

review? @dougwilson

@nickdesaulniers (Author)

eh, the test that failed was the nvm install 2.0 step: https://travis-ci.org/expressjs/compression/jobs/87722737

@nickdesaulniers (Author)

oh, I should add a note to the readme

@nickdesaulniers (Author)

oh this looks similar to #27 and #32

index.js Outdated

// If we support a custom encoding and a custom encoding was requested
if (opts.compressor) {
method = accept.encoding(Object.keys(opts.compressor));
Contributor

How would a user both define a custom encoding (like br) and still provide gzip encoding for clients that do not accept br? It looks like, with this PR, users will now need to provide their own gzip implementation to achieve this.

Author

heh, yep, just found this out the hard way, will patch

Contributor

Unfortunately it looks like there is still a bug here. The header Accept-Encoding: br;q=0.001, gzip, deflate will incorrectly use br over gzip just because the user wanted to provide an implementation for br.
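To illustrate the bug, here is a minimal sketch of Accept-Encoding q-value negotiation. This is a simplified illustration, not the real implementation (the module delegates to the `accepts`/`negotiator` packages); the function name `preferredEncoding` is hypothetical. It shows why `br;q=0.001, gzip, deflate` must pick gzip over br regardless of which encodings the server can produce:

```javascript
// Hypothetical sketch of Accept-Encoding q-value parsing. A client-supplied
// quality value (q) weights each encoding; absent q defaults to 1, and q=0
// forbids the encoding. The server must pick the highest-weighted encoding
// it can offer, so br;q=0.001 loses to gzip even if the server "prefers" br.
function preferredEncoding(header, offered) {
  var weights = {};
  header.split(',').forEach(function (part) {
    var pieces = part.trim().split(';');
    var name = pieces[0].trim();
    var q = 1; // default quality per RFC 7231
    pieces.slice(1).forEach(function (param) {
      var m = /^\s*q=([0-9.]+)\s*$/.exec(param);
      if (m) q = parseFloat(m[1]);
    });
    weights[name] = q;
  });
  return offered
    .filter(function (enc) { return (weights[enc] || 0) > 0; })
    .sort(function (a, b) { return weights[b] - weights[a]; })[0];
}
```

With this, `preferredEncoding('br;q=0.001, gzip, deflate', ['br', 'gzip'])` yields `'gzip'`, which is the behavior the PR was missing.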

Author

Ah, you're correct. I think I'll need to implement turning off gzip or deflate then. Maybe:

var opts = {
  compressor: {
    br: ...,
    gzip: null,
    deflate: null,
  }
};

@nickdesaulniers (Author)

k, fixed and added a test

@dougwilson (Contributor)

These tests only confirm we set the Content-Encoding header; they don't actually confirm we ever used the custom compressor, as far as I can tell. Can you add at least one test that confirms the custom compressor was used?

@dougwilson (Contributor)

Also, we need to think about how res.flush will work with a custom compressor.

@nickdesaulniers (Author)

Also, we need to think about how res.flush will work with a custom compressor.

I don't understand what res.flush is for. Could we modify:

diff --git a/index.js b/index.js
index a669cbb..315e7b9 100644
--- a/index.js
+++ b/index.js
@@ -66,7 +66,7 @@ function compression(options) {

     // flush
     res.flush = function flush() {
-      if (stream) {
+      if (stream && typeof stream.flush === 'function') {
         stream.flush()
       }
     }

@dougwilson (Contributor)

res.flush is required for things like SSE to work with compression (there is an example in the README).

@nickdesaulniers (Author)

just added a [failing] test for q= priorities, will work on getting the test passing

@nickdesaulniers (Author)

ready for review? @dougwilson
I'll probably also send along patches to explicitly disable gzip/deflate in another patch

@dougwilson (Contributor)

Looks pretty good :) Still need to do something about flush. Should we reject streams that do not provide a flush method? Or should we allow streams to optionally implement flush, and if so, how should we document this? Should we skip adding res.flush in that case?

If flush is not provided by the stream, should we require that every single .write() call compress only that exact chunk and emit it? If not, how can someone possibly stream with this module in the middle (SSE is the best example)?

In addition, now that I look harder at your PR, I don't see how it can ever work on a server, because it looks like it can only handle one response.

@nickdesaulniers (Author)

In addition, now that I look harder at your PR, I don't see how it can ever work on a server, because it looks like it can only handle one response.

Yep, looks like we'll need compressor functions to return new instances of the compressor streams. Will fix up.

I'm not sure I fully understand what flush is and all of the implications of its use. It sounds like it causes execution to block until the stream is fully done writing? I don't see anything related to a flush method in the streams docs: _flush yes, flush no. Why does SSE need flush? It looks like the only streams that implement flush are the zlib streams: https://nodejs.org/api/zlib.html#zlib_zlib_flush_kind_callback. I also can't find any info on implementation details of a flush method in the streams handbook.

From the readme: This module adds a res.flush() method to force the partially-compressed response to be flushed to the client.

Is it even correct to send the client a partially encoded message?

Is it that zlib objects buffer their content until a threshold is reached?

The example in the readme doesn't even work in Gecko (it prompts for a file download). Is SSE not cross-browser yet? Hmm, looks like it is supported: http://caniuse.com/#feat=eventsource. I'll have to file some bugs...

If not, how can someone possibly stream with this module in the middle

I honestly can't see how flush relates to this.

After reading https://nodejs.org/api/stream.html#stream_transform_flush_callback a couple of times, it sounds to me like this is something zlib needs. If other streams don't have the same requirements, then I don't think we need to call flush on the custom stream.

@dougwilson (Contributor)

I understand if you don't see why flush is necessary, or how the framing in gzip streams makes something like flush work. But if you don't implement it, I cannot accept the PR. I'll try to find some time in the upcoming week to show you how to use SSE in web sites, how flush works with it, and exactly why it is so important, if you are unwilling to accept this at face value.

You're also welcome to look through the issue history here and contact all the people who need flush in order for their stuff to work and ask them exactly why that is. You can also look at our unit tests for flush to see how flush works with gzip as another source.

Modules like https://github.com/marko-js/marko rely on a working flush, and you can also ask the authors there, as yet another source, why it is so important.

@dougwilson (Contributor)

Is it even correct to send the client a partially encoded message?

To understand exactly what it does, you'll need to have a general understanding of how zlib works and how the framing works. To do any type of compression, you need to buffer up some content (typically called a "window") so the algorithm can look at the content for duplicated things. Without flush, the content is never sent to the client unless the server fills that entire window or the content ends. Flush tells the algorithm "hey, the bytes you have right now, that's the size of the window, so compress them up, frame them, and send them on". Then the next bytes will start a new window.

This means in SSE, you can send a command "new_message" to the client, but since that is not yet about 16KB long, it'll wait for more bytes before compressing and sending to the client. Unfortunately you need to send the message to the client now, but thankfully, you can call res.flush() to tell gzip to compress "new_message" as a whole and send it. Then when your message is a 1KB-long JSON string, which still does not fill 16KB, you can call res.flush() again to send that JSON string compressed in its own frame.

Clients decode the blocks as they receive them. Once they get a complete block, even if it's not the complete message, they can inflate it. Because res.flush() causes the block to complete, the client can inflate that block as soon as it arrives, even when you have more content to send.

This is a general overview, only typed from memory and may have some mistakes, I will try to come up with a full document in the upcoming weeks, though, to help understand better and provide proper references to the implementations and how it all comes together for things like SSE.

@dougwilson (Contributor)

Also, @nickdesaulniers, if you want an example that does not require knowledge of how SSE works to see how important res.flush is, try running the following application and opening http://127.0.0.1:4000/ in Firefox, both with flush commented out and without, to see the drastic difference and how important it is for applications like SSE and partial HTML rendering:

var compression = require('compression');
var express = require('express');

express()
.use(compression())
.get('/', streamTime)
.listen(4000);

function streamTime(req, res) {
  res.type('txt');

  setInterval(function () {
    res.write(new Date() + '\n');
    res.flush(); // try commenting this out
  }, 1000);
}

@nickdesaulniers (Author)

Thank you for the explanation and for the code example. 🎊

Without flush, the content is never sent to the client unless the server fills that entire window or the content ends.

I had forgotten about the usage of windows, though now I recall reading about them at some point. I was under the assumption that when content "ends" then an explicit call to flush was not needed.

Now I see that SSE relies on not calling end on a response object, so the compression engine needs a way of knowing not to wait for more input, and to write the fully compressed content to the response stream.

If you don't implement it, I cannot accept the PR.

Respectfully, I'm curious whether that's an issue for this library or not. If the user provides a compression engine that does not implement flush, but they're not using SSE, is that a bad thing? If the user provides a compression engine that does not implement flush and they try to use SSE, should this library simply crash ("undefined is not a function", since there's no stream.flush method)? Should we print a warning to the console when we're passed a compression engine missing a flush method? Should we warn about flush+SSE in the readme?
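One of the options floated here could look like the following sketch. This is a hypothetical helper (`attachFlush` is not part of the module) illustrating the "warn instead of crash" approach when a custom compressor lacks flush:

```javascript
// Hypothetical sketch: only wire up a real res.flush when the chosen
// compressor stream supports it; otherwise install a no-op that warns,
// so SSE users find out early instead of hitting
// "undefined is not a function" at runtime.
function attachFlush(res, stream) {
  if (typeof stream.flush === 'function') {
    res.flush = function flush() {
      stream.flush();
    };
  } else {
    res.flush = function flush() {
      console.warn('compression: custom compressor has no flush(); ' +
        'partial responses (e.g. SSE) will stay buffered');
    };
  }
}
```

Rejecting flush-less streams outright, as suggested earlier in the thread, would be the stricter alternative to this warning approach.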

Let me know your thoughts, and I'm happy to implement and codify them in this PR. Thanks. 🎆

@KrisSiegel

This PR is getting kinda old and I was looking for this capability so I could use brotli. Is this PR dead?

3 participants