Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added transform option #195

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 12 additions & 7 deletions HISTORY.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
Unreleased
===================

* Add `transform` option

0.17.1 / 2019-05-10
===================

Expand Down Expand Up @@ -455,37 +460,37 @@

* update range-parser and fresh

0.1.4 / 2013-08-11
0.1.4 / 2013-08-11
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my editor removed the trailing white space. This seems like a good thing, but I can make a separate commit if you think it worth it.

==================

* update fresh

0.1.3 / 2013-07-08
0.1.3 / 2013-07-08
==================

* Revert "Fix fd leak"

0.1.2 / 2013-07-03
0.1.2 / 2013-07-03
==================

* Fix fd leak

0.1.0 / 2012-08-25
0.1.0 / 2012-08-25
==================

* add options parameter to send() that is passed to fs.createReadStream() [kanongil]

0.0.4 / 2012-08-16
0.0.4 / 2012-08-16
==================

* allow custom "Accept-Ranges" definition

0.0.3 / 2012-07-16
0.0.3 / 2012-07-16
==================

* fix normalization of the root directory. Closes #3

0.0.2 / 2012-07-09
0.0.2 / 2012-07-09
==================

* add passing of req explicitly for now (YUCK)
Expand Down
53 changes: 53 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,8 @@ the 4th byte in the stream.

Enable or disable etag generation, defaults to true.

This will default to `false` if the [`transform` option is passed](#transform).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is misleading; if the transform option is passed, but as transform: undefined, I assume this will not default to false, but the wording says it will as long as the option is passed (without regard to what value is passed).


##### extensions

If a given file doesn't exist, try appending one of the given extensions,
Expand All @@ -104,6 +106,8 @@ in preferred order.
Enable or disable `Last-Modified` header, defaults to true. Uses the file
system's last modified value.

This will default to `false` if the [`transform` option is passed](#transform).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is misleading; if the transform option is passed, but as transform: undefined, I assume this will not default to false, but the wording says it will as long as the option is passed (without regard to what value is passed).


##### maxAge

Provide a max-age in milliseconds for http caching, defaults to 0.
Expand All @@ -119,6 +123,55 @@ Serve files relative to `path`.
Byte offset at which the stream starts, defaults to 0. The start is inclusive,
meaning `start: 2` will include the 3rd byte in the stream.

##### transform

A function which returns a stream which can be used to transform the data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to point to what, exactly, "a stream" means. I assume it means it needs to be a Node.js ReadableStream ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it is addressed below, which seems like duplicate wording. I guess that it is explained down below, but maybe it doesn't need to be listed twice, idk on this comment, but left it for posterity.

If this option is passed it will change the defaults of `etag` and `lastModified`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is misleading; if the transform option is passed, but as transform: undefined, I assume this will not default those to false, but the wording says it will as long as the option is passed (without regard to what value is passed).

to `false`. If the transform stream changes the content length, it is also
responsible for updating the `Content-Length` header or changing to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely to be a large foot gun. Is there some way to just make this work without adding such a potential issue? For example, for the intended use cases of this, is it more likely than not that the content length will be different from before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought hard on this. I don't think there is a good solution which doesn't rely on the user setting it if they are modifying the content. So in my case, I need to buffer the entire file to parse the json and then transform.

In this case, I can calculate the new length, but if you are doing something like escaping content in a CSV you might be able to do it mid stream and in that way you would not know the content length in advance. In this case, you might consider moving to chunked encoding to avoid it. One of the reference implementations just does that for you, but that is also very presumptuous.

TBQH, I feel that this is an advanced feature, requiring advanced understanding. That justifies, to me, documenting it and letting users "do the right thing" for their use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this feature is being implemented on the "wrong side" of this module's operations. As in, maybe the transformation should happen prior to the stream entering into this module, perhaps solving the virtual fs stuff folks want can also solve this as well, including keeping all the features of this module working an intact, like ranges, etags, etc.

I'm thinking like in a way, the transform is a kind of virtual fs in itself, where the contents, at least, are virtual.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, well if we allowed you to pass in the reference to fs then yeah we could do that. That said, I am strongly opposed to virtual fs implementations. There is so much complexity in that, it is just a really hard trade off compared to such a simple feature such as this. Have you ever dug into things like vinyl-fs? Maybe there are better more modern ones, but either way it is a big dependency to take on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to expand on what I mean by a virtual fs is not those enormous things. More like providing an interface that is much simpler to do the little but that this module needs like "hey, give me a readable stream for the file of this name" or some such. In that way, the returned stream in this case could have all the transforms added at that point, which would be below where things like ranging and similar are done. If the "virtual" thing didn't understand (or able to support) ranges, it would just return the full stream, for example. I am just thinking out loud here, to be honest. I just think having the transform part between this module and the fs seems like a better fit to get all the features working, but happy to hear other solutions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, ok then yeah I think I can get on board with that. So we would need fs.stat and fs.createReadStream. I think it would be pretty simple to do that. So in the case of doing a transform you would have to read the metadata and manage your own stat data based on the transform you are going to do. So in my case, I would have to modify the file size in the stat call by reading the whole file and applying the transform, then throw it out. I guess maybe I could cache it in memory, anyway it would push more of the complexity to the end user, but would allow you to be more technically correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. And to also be clear: I don't think we even need to copy those names / those specific interfaces. We can "dumb it down" to what is actually needed for this module to do it's operations and let the "default impl" map to those fs.* calls, but when users want to do something different, they get an API that is easier and only what they need to work with. For example, fs.stat returns a lot of things we don't need -- so no need to make the user implement a fs.stat interface. The same for fs.createReadStream especially, as that take a whole slew of options that this module is just never going to need to send, and hammering it down to a simplified interface also means that it is clear to the user what, exactly, they need to implement to get it working well.

I know we need to fix the TOCTOU issue, so perhaps we can roll those things together. But even if not, we would do well to make this interface not prevent us from resolving that issue, at least :) To be specific, the underlying changes that were being held is to (1) open a fd to the fs path, (2) stat the fd and then (3) read from the fd. I suppose this simplified interface could provide the same mechanical interface or something.

`Transfer-Encoding: chunked`.

The transform function is given the readable stream from `fs.createReadStream`,
the `response` object and, the `options` passed to `send()`. It must return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of passing in the options to this transform function? Remember that there is those APIs like sendStream.etag(false) that changes the setting, but is not reflected in that options object; a transform implementer may not realize that, so if those are not important, we should strip them out. Or if they are important, they should get added in from the root source of the setting values.

a `ReadableStream`, you can even just return the `stream` passed to you.

If you expect the transform to always return the same results, you can re-enable
`etag` and `lastModified` and to use the underlying file as it normally would.

*Note on range requests:* When a range request is sent, the read stream will
only read part of the file. If your transform is preforming replacements, or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would violate the actual http spec, as the range is on the actual recieved output. This will be a large issue for example if the transform doubled the output (maybe encode as hex); the web browser will send a range way past the end of the actual underlying file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think defaulting acceptRanges: false when transform is passed is a good compromise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If transform starts removing pretty much all the features of this module, at what point is the feature actually just a separate functionality?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, default is not removal. In my use case I actually would force set etag and lastModified to true and acceptRanges to false anyway. To me that is not removing, just normal options setting for a specific use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, to clarify i mean the removal as in what happens by default when the transform option is enabled. Yes, folks can then understand the foot guns and re enable certain things, but many will use the defaults.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think the defaults are good mostly with the transform option, especially if we turn off range requests by default. FWIW, range requests are so uncommon in my experience. The only time I have ever used them was building a video player, but that is a very uncommon use case to optimize for.

If you are using the transform option, the default should be no range requests because it means you need to be very careful your transform can handle range requests. The content-length is a little less clear, but it really depends on your transform, so it makes sense to not set it if you use transform. Lastly for last-modified, this also depends on what your transform does so it makes sense to leave it up to the user.

The more I think about this, the more I think this is the right approach to a transform option default as long as it is well documented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree on some of this; I think we can make a transform work with all the features if we implement it instead along the lines of a virtual fs, as I poked at in another comment, and not like this implementation. This implementation, ultimately, is just a simple solution that ends up having a lot of pitfalls I think don't need to be there if done differently. Range requests are not uncommon in practice; the module implements specifically what is not uncommon around the web. A very common point (besides audio and video files) that use range requests are pdf files too. That is how the web browser can display the first page of a 50mb pdf without waiting around forever. The browsers make use of range requests for quite a variety of file types under the hood.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also generally think it is bad API design for the default setting of one option to change based on what another, different option is set to, as it just adds confusion as to what exactly is going to happen with a set of options, as the defaults are now dynamic to other factors.

otherwise relying on always having complete content it might not work. You
can set `acceptRanges: false` to disallow range requests or you will have
to ensure that your transform logic can handle partial data. For an example
of a transform which can work on range requests, see the test labled
`should transform range requests`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the test, it is honestly hard to tell what exactly is going on there. Is there a better real-world transform like the tobi one that can be used instead? This is mainly because the docs are pointing folks to look there.


Example:

```javascript
var http = require('http')
var Readable = require('stream').Readable
var concatStream = require('concat-stream')
var send = require('send')

http.createServer(function (req, res) {
send(req, req.url, {
transform: function (stream, res, opts) {
var readStream = new Readable({
read: function () { }
})
stream.pipe(concatStream(function (d) {
var _d = Buffer.from(d.toString('utf8').replace('tobi', 'not tobi'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that this example does not work with range requests; just above this it says if it doesn't work with them, be sure to turn them off. So the example should probably head the notice from the docs above as a good example :)

res.setHeader('Content-Length', _d.length)
readStream.push(_d)
readStream.push(null)
}))
return readStream
}
}).pipe(res)
})
```

#### Events

The `SendStream` is an event emitter and will emit the following events:
Expand Down
16 changes: 14 additions & 2 deletions index.js
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,10 @@ function SendStream (req, path, options) {
this.path = path
this.req = req

this._transform = opts.transform !== undefined
? opts.transform
: undefined

this._acceptRanges = opts.acceptRanges !== undefined
? Boolean(opts.acceptRanges)
: true
Expand All @@ -112,7 +116,7 @@ function SendStream (req, path, options) {

this._etag = opts.etag !== undefined
? Boolean(opts.etag)
: true
: (this._transform === undefined)

this._dotfiles = opts.dotfiles !== undefined
? opts.dotfiles
Expand Down Expand Up @@ -147,7 +151,7 @@ function SendStream (req, path, options) {

this._lastModified = opts.lastModified !== undefined
? Boolean(opts.lastModified)
: true
: (this._transform === undefined)

this._maxage = opts.maxAge || opts.maxage
this._maxage = typeof this._maxage === 'string'
Expand Down Expand Up @@ -794,6 +798,14 @@ SendStream.prototype.stream = function stream (path, options) {

// pipe
var stream = fs.createReadStream(path, options)
if (this._transform) {
// When using a transform stream, the implementor is
// responsible for setting content length or any other
// headers for their stream type.
res.removeHeader('Content-Length')

stream = this._transform(stream, res, options)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So since the original stream is overwritten and now lost; does this open open the floor to fd leaks again? Does using a transform stream mean users now need to be careful not to cause them with the fs steam encounters an error, or similar?

}
this.emit('stream', stream)
stream.pipe(res)

Expand Down
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
},
"devDependencies": {
"after": "0.8.2",
"concat-stream": "1.6.2",
"eslint": "5.16.0",
"eslint-config-standard": "12.0.0",
"eslint-plugin-import": "2.17.2",
Expand Down
55 changes: 55 additions & 0 deletions test/send.js
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,9 @@ var assert = require('assert')
var fs = require('fs')
var http = require('http')
var path = require('path')
var Readable = require('stream').Readable
var request = require('supertest')
var concatStream = require('concat-stream')
var send = require('..')

// test server
Expand Down Expand Up @@ -916,6 +918,59 @@ describe('send(file).pipe(res)', function () {
.expect(200, 'tobi', done)
})
})

describe('transforming file stream', function () {
it('should transform a file', function (done) {
var app = http.createServer(function (req, res) {
send(req, req.url, {
root: fixtures,
transform: function (stream, res, opts) {
var readStream = new Readable({
read: function () { }
})
stream.pipe(concatStream(function (d) {
var _d = typeof Buffer.from === 'function' ?
Buffer.from(d.toString('utf8').replace('tobi', 'not tobi'))
: new Buffer(d.toString('utf8').replace('tobi', 'not tobi'), 'utf8')
res.setHeader('Content-Length', _d.length)
readStream.push(_d)
readStream.push(null)
}))
return readStream
}
}).pipe(res)
})
request(app)
.get('/name.txt')
.expect(shouldNotHaveHeader('Last-Modified'))
.expect(shouldNotHaveHeader('ETag'))
.expect(200, 'not tobi', done)
})

it('should transform range requests', function (done) {
request(createServer({
root: fixtures,
transform: function (stream, res, opts) {
var readStream = new Readable({
read: function () { }
})
stream
.on('data', function (d) {
readStream.push(d.toString('utf8').split('').map(function (i) {
return ['a', 'b', 'c', 'd', 'e'][parseInt(i, 10) - 1]
}).join(''))
})
.on('end', function (d) {
readStream.push(null)
})
return readStream
}
}))
.get('/nums.txt')
.set('Range', 'bytes=0-4')
.expect(206, 'abcde', done)
})
})
})

describe('send(file, options)', function () {
Expand Down