[BUG] cache and registry in combination makes fetch unstable #23
That's really weird that specifying both would cause trouble, because the registry gets a default value anyway. What's the registry you're using?
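(For context, npm-registry-fetch falls back to the public registry when none is configured. A minimal sketch, using a hypothetical package spec not taken from this thread, of two calls that should therefore hit the same endpoint:)

```js
// Sketch: omitting the registry vs. passing the public default explicitly.
// '/some-package' is a placeholder spec, not from the thread.
const fetch = require('npm-registry-fetch')

async function compare () {
  // no registry given: the default (https://registry.npmjs.org/) is used
  const implicit = await fetch.json('/some-package', { cache: '/tmp/new-cache' })
  // the same default passed explicitly; should behave identically
  const explicit = await fetch.json('/some-package', {
    cache: '/tmp/new-cache',
    registry: 'https://registry.npmjs.org/',
  })
  return [implicit, explicit]
}
```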
It's a Nexus repository (https://www.sonatype.com/product-nexus-repository). Its configuration seems to be quite standard; this is at a largish company and no one has noticed anything weird with it. But since you mentioned the default registry, I did more testing: specifying the registry explicitly does produce errors, while leaving it out does not. So the registry is the source of the issue. It still seems really strange that it bugs out so completely in this scenario but works well otherwise. I mean, my npm cache seems to be working perfectly when I use the npm client at work, but I don't know enough about the npm cache to say in what way the registry could be buggy. Is there a sensible way to debug this?
Can you please update your test to show the raw output? Ie, instead of this:

```js
const json = (await npmRegistryFetch.json(dependencyName, conf))
```

Do this:

```js
const response = await npmRegistryFetch(dependencyName, conf)
const body = await response.text()
let json
try {
  json = JSON.parse(body)
} catch (er) {
  console.error('JSON failed', response.status, response.headers, body)
  throw er
}
```

And then share the output.
response.headers was an object, and the body was too large to be logged to the console. So the exact code I ran was like this:
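(The exact snippet was elided from this thread. A plausible reconstruction, assuming the only changes to the suggested code were flattening the headers and truncating the body before logging, might look like this:)

```js
// Hypothetical reconstruction of the elided test (not the verbatim code):
// the suggested snippet, but with the headers serialized and the body
// truncated so they can be printed to the console.
const npmRegistryFetch = require('npm-registry-fetch')

const conf = { cache: '/tmp/new-cache', registry: 'my-company-internal-registry' }

async function test (dependencyName) {
  const response = await npmRegistryFetch(dependencyName, conf)
  const body = await response.text()
  try {
    return JSON.parse(body)
  } catch (er) {
    console.error('JSON failed', response.status,
      JSON.stringify(response.headers.raw()), body.slice(0, 500))
    throw er
  }
}
```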
All the responses I got were similar to this:
It's really strange; it is as if they are cut off at the beginning. I guess that explains why the JSON can't be parsed. I wrote some responses to a file and inspected them, and they all looked like this. Here is a pastebin of an entire response: https://pastebin.com/E7456cXz
Ah, can you run it again, but don't …
Cutting down the body is fine. And yeah, it's definitely being cut off for some reason.
Maybe I'm missing something, but if I leave the headers as is, then only …
Update on this! I was finally able to find a way to reproduce this issue, and track it down to this bit of code in make-fetch-happen:

```js
const tee = new Minipass()
const cacheStream = cacache.put.stream(
  cachePath,
  ckey,
  cacheOpts
)
// XXX problem is right here:
// if the cacheStream is slow, and there is more than one chunk
// queued for consumption, then the first chunk is lost. witaf?
tee.pipe(cacheStream)
// this works fine, though:
// tee.on('data', d => cacheStream.write(d))
// tee.on('end', () => cacheStream.end())
// this fails in the same way (ie, not a cacache issue, just a minipass tee issue)
// const cacheStream = new Minipass()
// setTimeout(() => cacheStream.resume())
cacheStream.promise().then(cacheWriteResolve, cacheWriteReject)
newBody.unshift(tee)
```

I think this might be a bug in how Minipass handles multiple pipe destinations, so I'm going to dig into that. If so, it's a bug lurking for later anyway. If I can prove that minipass is behaving properly, then it means it's somewhere further up the stack, maybe in minipass-pipeline or minipass-flush (though those modules are super simple and just rely on the Minipass semantics, so I'm skeptical that they're to blame here).
For posterity: the repro case is, in the npm v7 branch that uses arborist, doing … Going to try to reduce it down to just a set of minipass streams in a synthetic environment today or tomorrow. In the meantime, I can float the "no backpressure pipe" patch on make-fetch-happen, since it's really not too much of a hazard to let the cache writes queue up in memory a bit anyway.
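(A minimal synthetic sketch of the scenario being described, assuming minipass's then-current default-export API; this is hypothetical, not the actual reduced test case:)

```js
// Hypothetical reduction: one Minipass tee with two consumers, one of
// which starts flowing a tick late, so more than one chunk is queued.
const Minipass = require('minipass')  // older versions export the class directly

const tee = new Minipass()

// consumer 1: a piped destination that does not flow immediately
// (mirrors the setTimeout(() => cacheStream.resume()) variant above)
const slowDest = new Minipass()
tee.pipe(slowDest)
setTimeout(() => slowDest.resume())

// consumer 2: collect everything and check integrity
tee.concat().then(body => {
  // with the bug described above, the first chunk can be dropped or
  // reordered, so this round-trip comparison would fail
  console.log(body.toString() === '{"a":1,"b":2}')
})

// queue multiple chunks before anything is flowing
tee.write('{"a":1,')
tee.write('"b":2}')
tee.end()
```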
Oh, wild! It's not just dropping the first chunk; it's getting the full data, but out of order!
Something weird is going on here. This SHOULD be fine as a simple pipe(), but for some reason, backpressure from the cache stream can cause the pipeline to drop the first chunk of data, resulting in invalid JSON. Until that is fixed, just write into the cache without any backpressure.

The only hazard is that, if the fs is truly very slow, and the rest of the consumption pipeline is very fast, then we'll back up into memory and use more than we ought to, rather than pushing back on the incoming stream. However, this isn't likely to ever be a problem due to how npm does HTTP. Either it's fetching a JSON response, or a tarball (which is also either unpacking to disk, or streaming directly to a tarball file on disk). So, if the disk is slow, and it's a tarball request, we're likely to get backpressure from the main pipeline anyway. It can only become a problem if the JSON response is large enough to span multiple chunks, and also the fs is loaded enough to start slowing down. In the JSON response case, we're going to load the whole thing in memory anyway, so nothing is made particularly *worse* by this lack of backpressure.

It is possible that the root cause of this bug exists either in cacache, minipass-pipeline, or minipass itself. But since we don't do a multi-pipe tee stream anywhere else in npm's stack, this is the only spot where it can make itself known.

Re: npm/npm-registry-fetch#23
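(The patch matches the commented-out alternative already shown in the make-fetch-happen snippet earlier in the thread: forward chunks by hand instead of piping, so the cache stream's backpressure never reaches the tee. A sketch, reusing those same variable names:)

```js
// Sketch of the "no backpressure" write, per the commented-out lines in
// the snippet above. tee.pipe(cacheStream) propagates backpressure;
// these event handlers do not, at the cost of buffering in memory.
tee.on('data', d => cacheStream.write(d))
tee.on('end', () => cacheStream.end())
```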
Ok, update this to npm-registry-fetch 8.0.2, and it should not hit the problem any more. I'm leaving this issue open, though, as the root cause still has to be determined and fixed properly, either in cacache, minipass-pipeline, or minipass.
Alright! Removed the kludge in make-fetch-happen, and fixed this properly at the root. That was a beast to track down.
If the pipeline as a whole is not flowing, then it should return `false` from any write operation. Since the Pipeline listens to the tail stream's `data` event, the streams in the pipeline are always in flowing mode. However, the Pipeline itself may not be, so it would return `true` from writes inappropriately, allowing data to be buffered up in the Pipeline excessively. This would not cause any significant issues in most cases, except excess memory usage.

Discovered while debugging npm/npm-registry-fetch#23
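(A rough, hypothetical illustration of the behavior that commit message describes, not the actual minipass-pipeline source: the Pipeline reports its own flowing state from write() rather than the head stream's, which is always flowing.)

```js
// Hypothetical illustration of the fix: a Pipeline's write() must report
// the Pipeline's own backpressure, not the head stream's, because the
// internal streams are kept flowing by the tail's 'data' listener.
const Minipass = require('minipass')  // older default-export API assumed

class Pipeline extends Minipass {
  constructor (head, tail) {
    super()
    this.head = head
    this.tail = tail
    // surface the tail's output as the Pipeline's own, which keeps the
    // internal streams in flowing mode at all times
    tail.on('data', c => super.write(c))
    tail.on('end', () => super.end())
  }

  write (chunk, enc, cb) {
    this.head.write(chunk, enc, cb)
    // buggy version: return the head's result, which is ~always true
    // fixed version: report whether *this* stream is actually flowing,
    // so a paused Pipeline exerts backpressure on its writer
    return this.flowing
  }

  end (chunk) {
    if (chunk) this.head.write(chunk)
    this.head.end()
    return this
  }
}
```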
What / Why
Whenever I have a cache and registry specified in the conf object, some fetches fail. It's not 100%, but somewhere around 50% of the fetches fail. It only happens if I have both a cache and a registry specified. I have removed all other configuration.
How
My conf object:

```js
{cache: '/tmp/new-cache', registry: 'my-company-internal-registry'}
```
My usage of npm-registry-fetch:

```js
const json = (await npmRegistryFetch.json(dependencyName, conf))
```
Current Behavior
Sometimes I receive this response:

```json
{"code":"FETCH_ERROR","errno":"FETCH_ERROR","type":"invalid-json"}
```
It never happens unless both registry and cache have been specified. In my testing I have done thousands of fetches, and the failure rate is very consistent.
I'd be happy to supply more information if needed, or perform testing if it can help.