
Bug: Comments not loading #244

Closed
dellagustin opened this issue Mar 16, 2023 · 14 comments · Fixed by #247
@dellagustin
Contributor

This is likely a regression of #236.

VM32:1 Uncaught (in promise) SyntaxError: Unexpected non-whitespace character after JSON at position 1574
at JSON.parse ()
at n (bundle.js:6:867129)

[screenshot: the JSON.parse error in the browser console]

@dellagustin
Contributor Author

@daveajones you can assign this one to me.

@dellagustin
Contributor Author

dellagustin commented Mar 16, 2023

So, the current implementation assumes that each res.write on the backend will correspond to exactly one response.body.getReader().read() promise resolution on the front end. I could not find anything in the documentation, for either the backend or the front end, that guarantees this with chunked transfer encoding.
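
In other words, the pattern is roughly this (a simplified sketch of the assumption, not the actual repo code; mergeIntoCommentTree is a made-up placeholder):

// Backend: one JSON document per res.write(), e.g.
//   res.write(JSON.stringify(node));  // repeated per comment node, then res.end()

// Front end: assumes each read() resolves with exactly one of those writes
declare function mergeIntoCommentTree(partial: unknown): void;

async function loadComments(episodeId: string): Promise<void> {
  const response = await fetch(`/api/comments/byepisodeid?id=${episodeId}`);
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Breaks as soon as a single read() delivers more (or less) than one JSON document,
    // which is exactly the "Unexpected non-whitespace character after JSON" error above
    mergeIntoCommentTree(JSON.parse(decoder.decode(value)));
  }
}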

From https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams:

Now you've got your reader attached, you can read data chunks out of the stream using the ReadableStreamDefaultReader.read() method. This reads one chunk out of the stream, which you can then do anything you like with.

So, yeah, it should be one chunk, so I suspected something else was at play here and did some verbose curls against production and my local development setup for comparison.

Production:

❯ curl https://podcastindex.org/api/comments/byepisodeid?id=13886063569 --verbose
*   Trying 2606:4700:20::ac43:4b8f...
* TCP_NODELAY set
* Connected to podcastindex.org (2606:4700:20::ac43:4b8f) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=sni.cloudflaressl.com
*  start date: Jun  6 00:00:00 2022 GMT
*  expire date: Jun  5 23:59:59 2023 GMT
*  subjectAltName: host "podcastindex.org" matched cert's "podcastindex.org"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x27948d0)
> GET /api/comments/byepisodeid?id=13886063569 HTTP/2
> Host: podcastindex.org
> User-Agent: curl/7.63.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 200
< date: Thu, 16 Mar 2023 22:43:44 GMT
< x-powered-by: Express
< strict-transport-security: max-age=31536000; includeSubDomains
"https://podcastindex.social/users/ChadF"},"commentAsof":"2023-03-16T22:43:43.529Z","replies":[],"repliesAsof":"2023-03-16T22:43:43.529Z"}},"commenters":{"https://podcastindex.social/users/ChadF":{"icon":{"url":"https://cdn.masto.host/podcastindexsocial/accounts/avatars/106/557/219/416/316/803/original/c650f8421a9d4807.png","mediaType":"image/png"},"name":"Sir Chad Farrow :pci: 🎧✂️","url":"https://podcastindex.social/@ChadF","fqUsername":"@ChadF@podcastindex.social","asof":"2023-03-16T22:43:43.529Z"}}}* Connection #0 to host podcastindex.org left intact

Local:

❯ curl http://localhost:5001/api/comments/byepisodeid?id=13886063569 --verbose
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 5001 (#0)
> GET /api/comments/byepisodeid?id=13886063569 HTTP/1.1
> Host: localhost:5001
> User-Agent: curl/7.63.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Date: Thu, 16 Mar 2023 22:44:49 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
{"roots":["https://noagendasocial.com/users/adam/statuses/109968324221823701"],"nodes":{"https://noagendasocial.com/users/adam/statuses/109968324221823701":{"comment":{"url":"https://noagendasocial.com/@adam/109968324221823701","published":"2023-03-05T02:31:02Z","attachments":[{"mediaType":"image/jpeg","width":816,"height":771,"url":"https://static.noagendasocial.com/media_attachments/files/109/968/324/169/040/998/original/b56f6f27418b0dc3.jpeg"}],"content":{"en":"<p>Podcasting 2.0 Episode 124: &quot;Undeplatformable&quot; -- Adam &amp; Dave discuss the week&#39;s developments on podcastindex.org - A day later due to power and bandwidth, but with the same spunk as always! <a href=\"http://l.curry.com/fDx\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><span class=\"invisible\">http://</span><span class=\"\">l.curry.com/fDx</span><span class=\"invisible\"></span></a></p>"},"attributedTo":"https://noagendasocial.com/users/adam"},"commentAsof":"2023-03-16T22:44:49.295Z","replies":["https://podcastindex.social/users/ChadF/statuses/110015844766813195"],"repliesAsof":"2023-03-16T22:44:49.295Z"}},"commenters":{"https://noagendasocial.com/users/adam":{"icon":{"url":"https://static.noagendasocial.com/accounts/avatars/000/000/001/original/2433237d38d194bc.jpg","mediaType":"image/jpeg"},"name":"Adam Curry [Texas]","url":"https://noagendasocial.com/@adam","fqUsername":"@adam@noagendasocial.com","asof":"2023-03-16T22:44:49.295Z"}}}{"roots":[],"nodes":{"https://podcastindex.social/users/ChadF/statuses/110015844766813195":{"comment":{"url":"https://podcastindex.social/@ChadF/110015844766813195","published":"2023-03-13T11:56:08Z","attachments":[],"content":{"en":"<p><span class=\"h-card\"><a href=\"https://noagendasocial.com/@adam\" class=\"u-url mention\">@<span>adam</span></a></span> </p><p>Value 4 Value Report EP 124</p><p>LIT Boost = 206,778 sats</p><p>EP 123 Boost = 810,430 sats</p><p>One time Donations = $0</p><p>Monthly Donations = $276.55</p>"},"attributedTo":"https://podcastindex.social/users/ChadF"},"commentAsof":"2023-03-16T22:44:49.295Z","replies":[],"repliesAsof":"2023-03-16T22:44:49.295Z"}},"commenters":{"https://podcastindex.social/users/ChadF":{"icon":{"url":"https://cdn.masto.host/podcastindexsocial/accounts/avatars/106/557/219/416/316/803/original/c650f8421a9d4807.png","mediaType":"image/png"},"name":"Sir Chad Farrow :pci: 🎧✂️","url":"https://podcastindex.social/@ChadF","fqUsername":"@ChadF@podcastindex.social","asof":"2023-03-16T22:44:49.295Z"}}}* Connection #0 to host localhost left intact

Notice something?
Production is using HTTP/2, which does not handle chunked data the way HTTP/1.1 does (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding#directives):

Note: HTTP/2 disallows all uses of the Transfer-Encoding header other than the HTTP/2 specific: "trailers". HTTP 2 provides its own more efficient mechanisms for data streaming than chunked transfer and forbids the use of the header. Usage of the header in HTTP/2 may likely result in a specific protocol error as HTTP/2 Protocol prohibits the use.

I'm assuming that Cloudflare is doing http2 by default and converting it at its reverse proxy.
Further investigation is needed here.

@daveajones, any details from the prod setup that you think might be influencing this, besides the observations I made above?

@ericpp
Contributor

ericpp commented Mar 16, 2023

I noticed this too while poking around on my dev copy. My workaround was to add a newline to each chunk (e.g. ndjson) and then split on the newlines on the client side.
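
Roughly like this (a simplified sketch, not the exact change in my dev copy):

// Backend: terminate every chunk with a newline so the client has an explicit delimiter
//   res.write(JSON.stringify(node) + '\n');

// Client: split the decoded text on newlines instead of trusting chunk boundaries
function parseChunk(text: string): unknown[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}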

@ericpp
Contributor

ericpp commented Mar 17, 2023

@dellagustin I was able to recreate it by removing e.part === 'replies' from this statement:

if (e.kind === 'node-processed' && e.part === 'replies') {

Here is my workaround for the issue: ericpp@eaa8c87

@dellagustin
Contributor Author

dellagustin commented Mar 17, 2023

Hello @ericpp , thanks for jumping in.

I'm curious to understand the reason you removed e.part === 'replies'.

The solution I was considering is similar to yours, but we cannot assume that each part of a received data frame (using the HTTP/2 term that is roughly equivalent to a chunk, from what I read) will contain a full JSON document. Take a look at the following testing screenshot:

[screenshot of the logged data frames]

You can see here 3 complete chunks and the last one that is incomplete.

We need to store the last incomplete part and wait for the next data frame.
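
Something along these lines (just a sketch, assuming we also put a newline delimiter between chunks like you did; mergeIntoCommentTree is a made-up placeholder):

declare function mergeIntoCommentTree(partial: unknown): void;

async function readDelimitedJson(body: ReadableStream<Uint8Array>): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let tail = ''; // incomplete part left over from the previous data frame
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    tail += decoder.decode(value, { stream: true });
    const pieces = tail.split('\n');
    tail = pieces.pop() ?? ''; // the last piece may be cut off; keep it for the next frame
    for (const piece of pieces) {
      if (piece.trim().length > 0) {
        mergeIntoCommentTree(JSON.parse(piece));
      }
    }
  }
  if (tail.trim().length > 0) {
    mergeIntoCommentTree(JSON.parse(tail)); // flush whatever is left once the stream ends
  }
}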

Nevertheless, I'm reasonably certain that this is impairing the intended performance improvement.
I'm guessing, based on the test results, that Cloudflare's reverse proxy buffers the chunks into packages of a certain length and sends the buffered data as a data frame once that size is reached, truncating at that size. The string in the screenshot above has 3978 characters; I'm guessing the package size is the round number of 4096 bytes.

We can work around that, but if we could somehow ensure a 1:1 chunk-to-data-frame ratio, that would likely give us the best performance.

I'll try to fix it this weekend. Testing in prod-like conditions is one of the challenges.

@daveajones
Contributor

Production is client to CF (http/2) then CF to origin (nginx reverse proxy - http/2). I can disable http/2 in nginx and see if it’s the origin buffering that is the problem?

@dellagustin
Contributor Author

Hi @daveajones, when I mentioned Cloudflare I was not aware there was nginx in the mix as another "moving part".
That test could help.
Also, could you share your nginx configuration so that I can check if I can reproduce it locally by putting an nginx instance in front of the app?

@ericpp
Contributor

ericpp commented Mar 17, 2023

@dellagustin I think I removed the replies part just to see how the code was working and to figure out more ways to speed up the comments loading.

@daveajones
Contributor

Ok, here is the nginx config:

server {
    if ($host = www.podcastindex.org) {
        return 302 https://podcastindex.org$request_uri;
    }

    root /var/www/html;
    index index.html;
    server_name www.podcastindex.org podcastindex.org;
    location / {
        proxy_set_header   X-Forwarded-For $remote_addr;
        proxy_set_header   Host $http_host;
        proxy_pass         http://127.0.0.1:8000;       # First attempt to serve request as file, then
    }
    listen 80 default_server;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}

@dellagustin
Contributor Author

Thank you @daveajones. Seems quite straightforward. I guess the HTTP/2 behavior we're seeing is the default then.
I'll try a minimal nginx setup locally and see if it happens there as well. If I can reproduce it locally, I can send a PR that solves it from the back end, even if that is not optimal. It will make the back end more robust, and we can look at further optimization after that, if needed.

@ericpp
Contributor

ericpp commented Mar 26, 2023

@dellagustin I'm able to recreate the problem through Nginx/Node in Docker: ericpp@314519a

It seems like Nginx is buffering and sending the response in roughly 4096 byte chunks: chunks.txt

@ericpp
Contributor

ericpp commented Mar 26, 2023

@dellagustin Looks like passing the X-Accel-Buffering: no header from Node to Nginx might turn off buffering: https://stackoverflow.com/a/25017347
https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/#x-accel-buffering
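
On the Express side that would be something like this (a sketch; turning off proxy_buffering for that location in nginx should have a similar effect):

// in the Express route that streams the comments:
app.get('/api/comments/byepisodeid', async (req, res) => {
  res.setHeader('Content-Type', 'application/json');
  res.setHeader('X-Accel-Buffering', 'no'); // nginx-specific: do not buffer this response
  // ... then res.write() the comment nodes as before and res.end()
});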

dellagustin added a commit to podStation/podcastindex-web-ui that referenced this issue Mar 26, 2023
Comments for an episode are sent back to the front end using
chunked Transfer Encoding, but that is not supported
on HTTP/2, which is used by the reverse proxy (nginx).

At the moment nginx buffers the chunks before sending them
to the browser, which breaks the way the chunks are processed.

With this commit, we use the header X-Accel-Buffering to
disable this buffering, which worked in local testing.

We should still add a clearer chunk delimiter so that
any other buffering or unknown edge case will not
break the comments function (increase robustness).

Co-authored-by: ericpp
@dellagustin
Contributor Author

I tested @ericpp's solution and it works; I have created a PR for it.
We should still add a clear chunk delimiter to avoid other unknown edge cases.

daveajones added a commit that referenced this issue Apr 1, 2023
…or-comments

Fix #244: Disable buffer for comments
dellagustin added a commit to podStation/podcastindex-web-ui that referenced this issue Apr 13, 2023
This commit refactors the Comments function to improve
robustness when loading partial responses from the comments API.

Once we introduced partial responses (i.e. chunked encoding),
it worked in dev, but failed in production due to buffering in
the reverse proxy (nginx).

This was already solved with
Podcastindex-org#247, but only if
the reverse proxy remains unchanged.

With this refactoring, the implementation would continue to work
even if buffering took place, making it more robust.

For testing, the change introduced in the PR mentioned above was
reverted (temporarily).

For additional testing instructions, see
Podcastindex-org#247 (comment)
daveajones added a commit that referenced this issue Apr 14, 2023
…ing-robustness

Refactoring #244 - Comments robustness