Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure on Certain Inputs #46

Open
boyleconnor opened this issue Mar 8, 2021 · 2 comments
Open

Failure on Certain Inputs #46

boyleconnor opened this issue Mar 8, 2021 · 2 comments
Assignees
Labels
Bug Something isn't working Priority: Critical

Comments

@boyleconnor
Copy link
Collaborator

boyleconnor commented Mar 8, 2021

Problem description

I am running the client stack with docker-compose up --build. When I input a certain large/long note (e.g. 119-01.txt, 117-01.txt from the I2B2 dataset), it results in API Error! as the output. I checked the stdout of the docker-compose stack, and it said the following:

phi-deidentifier              | invalid request block size: 4706 (max 4096)...skip
phi-deidentifier              | [uwsgi-http key: localhost client_addr: 172.21.0.7 client_port: 59064] hr_instance_read(): Connection reset by peer [plugins/http/http.c line 647]

Weirdly, though, when I run the phi-deidentifier stack, and input this note through the UI at /api/v1/ui, it seems to run perfectly fine.

Steps to reproduce the issue

  1. Run the phi-deidentifier-app stack
  2. Copy in the contents of one of the specific notes I mentioned above (119-01.txt, 117-01.txt, 116-01.txt, 116-02.txt, 116-03.txt from the I2B2 dataset)
  3. Press "Deidentify Note"

Expected behavior

  • The note should get de-identified
@boyleconnor boyleconnor added Bug Something isn't working Priority: Critical labels Mar 8, 2021
@boyleconnor boyleconnor self-assigned this Mar 8, 2021
boyleconnor pushed a commit to boyleconnor/nlp-sandbox-deidentifier that referenced this issue Mar 10, 2021
The NginX reverse proxy sometimes increases the request block sizes to
more than the default maximum for this server. This addresses issue
nlpsandbox/phi-deidentifier-app#46.
@boyleconnor
Copy link
Collaborator Author

boyleconnor commented Mar 11, 2021

This is a very counter-intuitive bug. An interesting example is that inputting the contents of 116-01.txt or 117-01.txt causes failure. On the other hand, if I input the contents of 116-01.txt followed by 117-01.txt (i.e. the contents of both of the notes) does not cause any error! Clearly, the problem is not as simple as the length of the input or a single special character.

I also noticed that the container that was giving the error was not the nginx container itself, but the phi-deidentifier container. I tried running the React client on its own, pointing it directly at the phi-deidentifier container, instead of going through the NginX reverse proxy:

  1. Add the following two lines to the phi-deidentifier-app section in docker-compose.yml:
ports:
  - "8080:8080"

underneath the phi-deidentifier service definition, as well as adding the line:

- nlp-sandbox

to the networks: list. These would expose the phi-deidentifier server directly at localhost:8080.

  1. In a separate terminal, run the client in development mode, with the server port specially set to 8080, with the following command, run from the client/ directory: SERVER_PORT=8080 npm run start.
  2. Open up a browser with CORS disabled. On Ubuntu, you can do this with the following command: google-chrome --disable-web-security --user-data-dir=TEMP/.
  3. In the CORS-disabled browser, navigate to http://localhost:3000. On this instance of the React client, which is specially configured to connect directly to the phi-deidentifier backend via localhost:8080, try inputting and deidentifying either note 116-01.txt or 117-01.txt. They should get deidentified just fine. You can look at the console output for docker-compose up to double check that this client is using the dockerized phi-deidentifier backend (just not going through the proxy).
  4. In another tab or browser, open up http://localhost:80. This is a React client instance that is configured to still go through the proxy. First, try inputting some short text and then pressing "deidentify note". This should work fine. Now, input the contents of either 116-01.txt or 117-01.txt; this should fail.
  5. Look at the console output from docker-compose up. It should be something like this:
phi-deidentifier              | invalid request block size: 5034 (max 4096)...skip
phi-deidentifier              | [uwsgi-http key: localhost client_addr: 172.22.0.3 client_port: 37523] hr_instance_read(): Connection reset by peer [plugins/http/http.c line 647]

notice that although the only difference between the two deidentify requests we made was that the failing one is going through the NginX proxy on its way to the backend. Also notice that the nginx container did not output any error message; the request seems to be going through the proxy just fine, but when it arrives at the backend, it is somehow too big.

The proxy seems to be changing the size of the request blocks in some way that I don't totally understand and unfortunately can't predict. I looked at the NginX documentation for some setting that would allow me to fix this; specifically, I wanted to find some way to restrict the proxy from passing on request blocks to the phi-deidentifier back end that are beyond a certain size. Unfortunately, I don't see any settings that would allow us to do that. The proxy_buffer_* settings are all concerned with buffering the response from the backend server to the proxy server. The client_buffer_* settings (mentioned in the article that @tschaffter shared with me) are concerned with buffering the request from the client to the proxy server. Neither of these seem to accomplish what we need. Or maybe I am missing something!

Therefore I fixed this problem by changing the buffer size of the WSGI server on the backend server. This is not a very satisfying fix, since I still don't understand why the proxy is outputting larger block sizes than the client is sending to it. I can't even guarantee that it won't send a request block that is even bigger than whatever we set the WSGI server's buffer-size to. However, I have tried this setting out experimentally, and it seems to solve the problem for every note that I have tried.

@tschaffter
Copy link
Member

On the other hand, if I input the contents of 116-01.txt followed by 117-01.txt (i.e. the contents of both of the notes) does not cause any error! Clearly, the problem is not as simple as the length of the input or a single special character.

I agree, it's very confusing that inputing each note individually lead to an error (what are the specific error?) but then you observe no error when combining the two notes together (what did you the idea of doing that? If you were testing the length, you could just have duplicated one note). Did you try to reduce the content to try to narrow the issue?

Other information that could be useful is to find the block size of each request (2 individual and the 1 combined). If the logs don't show that by default, you could configure buffer-size to a small value so that virtually any request you make will trigger the error so that you can get the block size from the log.

Therefore I fixed this problem by changing the buffer size of the WSGI server on the backend server.

What is the current buffer size (is it 4096 as shown in the logs)? What is the new value that you recommend using? Why?

tschaffter pushed a commit to nlpsandbox/phi-deidentifier-example that referenced this issue Apr 6, 2021
The NginX reverse proxy sometimes increases the request block sizes to
more than the default maximum for this server. This addresses issue
nlpsandbox/phi-deidentifier-app#46.
tschaffter added a commit to nlpsandbox/phi-deidentifier-example that referenced this issue Jun 4, 2021
* Increase Buffer Size

The NginX reverse proxy sometimes increases the request block sizes to
more than the default maximum for this server. This addresses issue
nlpsandbox/phi-deidentifier-app#46.

* Add Flask Testing

* Update to Schema 1.1.1

* Update nlpsandboxclient & Annotator Versions

* Switch Unit Tests to Using Properly-formatted dicts

* Update Listed API Version to 1.1.1

* Fix Flake8 Issues

* Fix Flake8 Issues

* Update nlpsandbox-client version in test-requirements.txt

Co-authored-by: Thomas Schaffter <thomas.schaffter@gmail.com>
boyleconnor added a commit to nlpsandbox/phi-deidentifier-example that referenced this issue Jun 4, 2021
* Increase Buffer Size

The NginX reverse proxy sometimes increases the request block sizes to
more than the default maximum for this server. This addresses issue
nlpsandbox/phi-deidentifier-app#46.

* Add Flask Testing

* Update to Schema 1.1.1

* Update nlpsandboxclient & Annotator Versions

* Switch Unit Tests to Using Properly-formatted dicts

* Update Listed API Version to 1.1.1

* Fix Flake8 Issues

* Fix Flake8 Issues

* Add Contact and ID to config.py

* Add ID & Contact Annotators to docker-compose.yml Stack

* Add ID & Contact to deidentified_notes Controller Code

* Fix Annotator Versions in docker-compose.yml

* Add ID & Contact Annotator Tool Dependencies

* Fix ID & Contact annotation keys in De-ID Controller

* Update Unit Tests to Include ID & Contact

* Fix Formatting for Flake8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Priority: Critical
Projects
None yet
Development

No branches or pull requests

2 participants