feat(SIP-39): Websocket sidecar app #11498

Merged: 27 commits, merged on Apr 8, 2021
Commits
- e476ca5 WIP node.js websocket app (robdiciuccio, Apr 6, 2020)
- 881e18d Load testing (robdiciuccio, Apr 10, 2020)
- 01e9f1f Multi-stream publish with blocking reads (robdiciuccio, Apr 26, 2020)
- a1954a3 Use JWT for auth and channel ID (robdiciuccio, Oct 20, 2020)
- 3823bba Update ws jwt cookie name (robdiciuccio, Oct 21, 2020)
- c42cbc2 Typescript (robdiciuccio, Jan 22, 2021)
- 08531f5 Frontend WebSocket transport support (robdiciuccio, Feb 2, 2021)
- 1ecf689 ws server ping/pong and GC logic (robdiciuccio, Feb 2, 2021)
- dc968bc ws server unit tests (robdiciuccio, Mar 10, 2021)
- 63c765f GC interval config, debug logging (robdiciuccio, Mar 15, 2021)
- 5eab894 Cleanup JWT cookie logic (robdiciuccio, Mar 15, 2021)
- 04e8d85 Refactor asyncEvents.ts to support non-Redux use cases (robdiciuccio, Mar 18, 2021)
- bd273ff Update tests for refactored asyncEvents (robdiciuccio, Mar 19, 2021)
- bab9e4b Merge branch master (robdiciuccio, Mar 24, 2021)
- 09506f9 Add eslint, write READMEs, reorg files (robdiciuccio, Mar 25, 2021)
- c2c35e5 CI workflow (robdiciuccio, Mar 25, 2021)
- 4976608 Moar Apache license headers (robdiciuccio, Mar 25, 2021)
- dd31aab pylint found something (robdiciuccio, Mar 26, 2021)
- f5087ed adjust GH actions workflow (robdiciuccio, Mar 26, 2021)
- 8eadc3a Improve documentation & comments (robdiciuccio, Mar 31, 2021)
- fedf0f3 Prettier (robdiciuccio, Apr 1, 2021)
- d44742d Add configurable logging via Winston (robdiciuccio, Apr 2, 2021)
- 23d17bb Merge branch 'master' into rd/async-websocket-app (robdiciuccio, Apr 2, 2021)
- 863b503 Add SSL support for Redis connections (robdiciuccio, Apr 2, 2021)
- 20b563a Fix incompatible logger statements (robdiciuccio, Apr 2, 2021)
- 3be73e4 Apply suggestions from code review (Apr 6, 2021)
- 3af092f rename streamPrefix config (robdiciuccio, Apr 6, 2021)
33 changes: 33 additions & 0 deletions .github/workflows/superset-websocket.yml
@@ -0,0 +1,33 @@
name: WebSocket server
on:
  push:
    paths:
      - "superset-websocket/**"
  pull_request:
    paths:
      - "superset-websocket/**"

jobs:
  app-checks:
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-20.04
    steps:
      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
        uses: actions/checkout@v2
        with:
          persist-credentials: false
      - name: Install dependencies
        working-directory: ./superset-websocket
        run: npm install
      - name: lint
        working-directory: ./superset-websocket
        run: npm run lint
      - name: prettier
        working-directory: ./superset-websocket
        run: npm run prettier-check
      - name: unit tests
        working-directory: ./superset-websocket
        run: npm run test
      - name: build
        working-directory: ./superset-websocket
        run: npm run build
20 changes: 20 additions & 0 deletions superset-websocket/.eslintignore
@@ -0,0 +1,20 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
*.min.js
node_modules
dist
coverage
38 changes: 38 additions & 0 deletions superset-websocket/.eslintrc.js
@@ -0,0 +1,38 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
module.exports = {
  root: true,
  parser: '@typescript-eslint/parser',
  env: {
    node: true,
    browser: true,
  },
  plugins: [
    '@typescript-eslint',
  ],
  extends: [
    'eslint:recommended',
    'plugin:@typescript-eslint/recommended',
    'prettier',
  ],
  rules: {
    "@typescript-eslint/explicit-module-boundary-types": 0,
    "@typescript-eslint/no-var-requires": 0,
  },
};
20 changes: 20 additions & 0 deletions superset-websocket/.gitignore
@@ -0,0 +1,20 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
config.json
dist
node_modules
*.log

Review comment (Contributor): Should this ignore node_modules?

Reply (Member Author): node_modules is ignored in the top-level .gitignore, but I'll make it explicit.
1 change: 1 addition & 0 deletions superset-websocket/.nvmrc
@@ -0,0 +1 @@
v14.15.5
24 changes: 24 additions & 0 deletions superset-websocket/.prettierignore
@@ -0,0 +1,24 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
*.min.js
node_modules
dist
coverage
.eslintrc.js
.prettierrc.json
*.md
*.json
5 changes: 5 additions & 0 deletions superset-websocket/.prettierrc.json
@@ -0,0 +1,5 @@
{
  "trailingComma": "all",
  "singleQuote": true,
  "arrowParens": "avoid"
}
106 changes: 106 additions & 0 deletions superset-websocket/README.md
@@ -0,0 +1,106 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Superset WebSocket Server

A Node.js WebSocket server for sending async event data to the Superset web application frontend.

## Requirements

- Node.js 12+ (not tested with older versions)
- Redis 5+

To use this feature, Superset needs to be configured to enable global async queries and to use WebSockets as the transport (see below).

## Architecture

This implementation is based on the architecture defined in [SIP-39](https://github.com/apache/superset/issues/9190).

### Streams

Async events are pushed to [Redis Streams](https://redis.io/topics/streams-intro) from the [Superset Flask app](https://github.com/preset-io/superset/blob/master/superset/utils/async_query_manager.py). Each event for a particular user is published to two streams: 1) the global event stream, which includes events for all users, and 2) a channel-specific (session-specific) stream containing only that user's events. This approach balances performance (reading from a single global stream) with fault tolerance (dropped connections can "catch up" by reading from the channel-specific stream).
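The following TypeScript sketch makes the dual-publish flow concrete. It models Redis Streams with an in-memory `Map` instead of a Redis client. The helper names (`xadd`, `publishEvent`, `StreamStore`) and the `full` suffix for the global stream key are assumptions for illustration. In the real system, publishing happens in the Flask app, not in this server.

```typescript
// In-memory stand-in for Redis Streams: stream key -> entries in ID order.
type StreamEntry = { id: string; fields: Record<string, string> };
type StreamStore = Map<string, StreamEntry[]>;

// Mimics `XADD <key> * ...`: appends an entry under a strictly increasing
// "<ms>-<seq>" ID and returns that ID.
function xadd(
  store: StreamStore,
  key: string,
  fields: Record<string, string>,
  nowMs: number,
): string {
  const entries = store.get(key) ?? [];
  const last = entries[entries.length - 1];
  let ms = nowMs;
  let seq = 0;
  if (last) {
    const [lastMs, lastSeq] = last.id.split('-').map(Number);
    if (lastMs >= ms) {
      // Same (or earlier) millisecond: keep the last ms and bump the sequence.
      ms = lastMs;
      seq = lastSeq + 1;
    }
  }
  const id = `${ms}-${seq}`;
  entries.push({ id, fields });
  store.set(key, entries);
  return id;
}

// Publishes one event to both the global stream and the user's channel stream.
// Assumption (hypothetical naming): the global stream key is `${prefix}full`.
function publishEvent(
  store: StreamStore,
  prefix: string,
  channelId: string,
  event: Record<string, string>,
  nowMs: number,
): { globalEntryId: string; channelEntryId: string } {
  const globalEntryId = xadd(store, `${prefix}full`, event, nowMs);
  const channelEntryId = xadd(store, `${prefix}${channelId}`, event, nowMs);
  return { globalEntryId, channelEntryId };
}

// Two events for session "chan-a", one for session "chan-b".
const store: StreamStore = new Map();
publishEvent(store, 'async-events-', 'chan-a', { status: 'done' }, 1000);
publishEvent(store, 'async-events-', 'chan-a', { status: 'done' }, 1000);
publishEvent(store, 'async-events-', 'chan-b', { status: 'error' }, 1001);
```

The global stream ends up with all three events. Each channel stream holds only its own session's events, which is what makes per-session catch-up cheap.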

Note that Redis Stream [consumer groups](https://redis.io/topics/streams-intro#consumer-groups) are not used here. Within a consumer group, each consumer receives only a subset of a stream's entries. WebSocket clients, however, hold persistent connections to a specific server instance, so every instance needs access to all of the stream's data. Horizontal scaling of the WebSocket app therefore means running multiple WebSocket servers, each reading the full Redis Stream independently.
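The sketch below shows the alternative to consumer groups. Each server instance keeps its own read cursor over the entire global stream, so every instance sees every event; consumers in a group would instead split the entries between them. It uses a hypothetical `ServerInstance` class, with in-memory arrays standing in for Redis `XREAD` calls.

```typescript
type StreamItem = { id: string; data: string };

// Orders Redis stream IDs ("<ms>-<seq>") numerically rather than as strings.
function compareIds(a: string, b: string): number {
  const [aMs, aSeq] = a.split('-').map(Number);
  const [bMs, bSeq] = b.split('-').map(Number);
  return aMs !== bMs ? aMs - bMs : aSeq - bSeq;
}

// Each server instance keeps its own cursor over the *entire* global stream,
// as a plain XREAD loop would, instead of sharing entries via a consumer group.
class ServerInstance {
  private lastId = '0-0';
  readonly received: string[] = [];

  poll(stream: StreamItem[]): void {
    for (const item of stream) {
      if (compareIds(item.id, this.lastId) > 0) {
        this.received.push(item.data);
        this.lastId = item.id;
      }
    }
  }
}

const globalStream: StreamItem[] = [
  { id: '1000-0', data: 'e1' },
  { id: '1000-1', data: 'e2' },
];
const serverA = new ServerInstance();
const serverB = new ServerInstance();
serverA.poll(globalStream);
serverB.poll(globalStream);
globalStream.push({ id: '1001-0', data: 'e3' });
serverA.poll(globalStream);
serverB.poll(globalStream);
// Both instances have now seen every event exactly once.
```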

### Connection

A browser's initial connection to the WebSocket server is an HTTP request that carries the JWT authentication cookie set by the Flask app. _Note that because authentication is cookie-based, the WebSocket server must run on the same host as the web application._ The server validates the JWT using the shared secret (config: `jwtSecret`) and, if it is valid, upgrades the connection to a WebSocket. The JWT also contains the user's session-based "channel" ID, which determines the connected socket(s) that receive each event.
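This is a minimal sketch of the pre-upgrade check. It assumes the channel ID is carried in a `channel` claim. JWT signature verification is hidden behind a hypothetical `verify` function; a real server would use a JWT library together with `jwtSecret`. `readCookie` and `authenticateUpgrade` are illustrative names, not the server's API.

```typescript
// Hypothetical verifier signature. A real server would check the token's
// signature against `jwtSecret` with a JWT library; that part is not shown.
type VerifyFn = (token: string, secret: string) => Record<string, unknown>;

// Returns the value of one cookie from a raw `Cookie` request header.
function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const part of header.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) {
      return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return undefined;
}

// Runs before the HTTP -> WebSocket upgrade. Returns the channel ID for the
// socket, or null if the upgrade should be rejected.
// Assumption: the channel ID is carried in a `channel` claim.
function authenticateUpgrade(
  cookieHeader: string | undefined,
  cookieName: string,
  jwtSecret: string,
  verify: VerifyFn,
): string | null {
  const token = readCookie(cookieHeader, cookieName);
  if (!token) return null;
  try {
    const channel = verify(token, jwtSecret).channel;
    return typeof channel === 'string' ? channel : null;
  } catch {
    // Bad signature, expired token, malformed JWT, etc.
    return null;
  }
}
```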

A user may have multiple WebSocket connections under a single channel (session) ID, for example when the user has multiple browser tabs open. In this scenario, **all events received for a specific channel are sent to all connected sockets**, leaving it to the consumer to decide which events are relevant to the current application context.
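The fan-out rule can be sketched as a map from channel ID to a set of sockets. `SocketLike`, `registerSocket`, and `sendToChannel` are hypothetical names. A real implementation would hold actual WebSocket objects and handle send errors.

```typescript
// Stand-in for a connected WebSocket (e.g. one browser tab).
interface SocketLike {
  id: string;
  send(message: string): void;
}

// channel (session) ID -> sockets connected under it
const channels = new Map<string, Set<SocketLike>>();

function registerSocket(channelId: string, socket: SocketLike): void {
  let sockets = channels.get(channelId);
  if (!sockets) {
    sockets = new Set<SocketLike>();
    channels.set(channelId, sockets);
  }
  sockets.add(socket);
}

// Sends an event to every socket in the channel and returns how many sockets
// it was sent to. Deciding relevance is left to the frontend.
function sendToChannel(channelId: string, event: object): number {
  const sockets = channels.get(channelId);
  if (!sockets) return 0;
  const message = JSON.stringify(event);
  sockets.forEach(socket => socket.send(message));
  return sockets.size;
}
```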

### Reconnection

A user's WebSocket connection may be dropped or interrupted by fluctuating network conditions. The Superset frontend keeps track of the ID of the last async event it received and, when reconnecting to the WebSocket server, passes it as a `last_id` query parameter in the initial HTTP request. If the request includes a valid `last_id` value, any newer events in the channel-specific Redis Stream (events the server may have received but failed to deliver) are read and re-sent over the new WebSocket connection. The global event stream flow then resumes delivering subsequent events to the connected socket(s).
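Choosing which events to replay reduces to selecting the channel-stream entries whose ID is greater than `last_id`. Redis returns the same set for `XREAD ... STREAMS <key> <last_id>`. The in-memory sketch below assumes the channel stream is already loaded into an array. `eventsToReplay` is a hypothetical name.

```typescript
type StreamEntry = { id: string; data: string };

// Redis stream IDs look like "<milliseconds>-<sequence>".
const STREAM_ID_PATTERN = /^\d+-\d+$/;

function compareIds(a: string, b: string): number {
  const [aMs, aSeq] = a.split('-').map(Number);
  const [bMs, bSeq] = b.split('-').map(Number);
  return aMs !== bMs ? aMs - bMs : aSeq - bSeq;
}

// Returns channel-stream entries newer than the client's `last_id`, i.e. the
// events it may have missed while disconnected. A missing or malformed
// `last_id` yields nothing to replay.
function eventsToReplay(channelStream: StreamEntry[], lastId: string | null): StreamEntry[] {
  if (!lastId || !STREAM_ID_PATTERN.test(lastId)) return [];
  const since = lastId; // narrowed to string for use inside the callback
  return channelStream.filter(entry => compareIds(entry.id, since) > 0);
}
```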

### Connection Management

The server utilizes the standard WebSocket [ping/pong functionality](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers#pings_and_pongs_the_heartbeat_of_websockets) to determine if active WebSocket connections are still alive. Active sockets are sent a _ping_ regularly (config: `pingSocketsIntervalMs`), and the internal _sockets_ registry is updated with a timestamp when a _pong_ response is received. If a _pong_ response has not been received before the timeout period (config: `socketResponseTimeoutMs`), the socket is terminated and removed from the internal registry.
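Here is a sketch of the heartbeat bookkeeping. The socket is reduced to a hypothetical `PingableSocket` interface, modeled on the `ping()` and `terminate()` methods of a Node.js `ws` socket. Timers are left out: the caller is assumed to invoke `checkSockets` every `pingSocketsIntervalMs` and `onPong` from each socket's pong handler.

```typescript
// Minimal stand-in for a WebSocket that supports ping/pong.
interface PingableSocket {
  ping(): void;
  terminate(): void;
}

interface SocketRecord {
  socket: PingableSocket;
  pongTs: number; // time of the most recent pong (or of connection)
}

// Internal "sockets" registry: socket ID -> record
const sockets = new Map<string, SocketRecord>();

function trackSocket(id: string, socket: PingableSocket, nowMs: number): void {
  sockets.set(id, { socket, pongTs: nowMs });
}

// Wire this to each socket's pong handler.
function onPong(id: string, nowMs: number): void {
  const record = sockets.get(id);
  if (record) record.pongTs = nowMs;
}

// Intended to run every `pingSocketsIntervalMs`: terminate sockets that have
// not answered within `socketResponseTimeoutMs`, and ping the rest.
function checkSockets(nowMs: number, socketResponseTimeoutMs: number): string[] {
  const terminated: string[] = [];
  sockets.forEach((record, id) => {
    if (nowMs - record.pongTs > socketResponseTimeoutMs) {
      record.socket.terminate();
      sockets.delete(id);
      terminated.push(id);
    } else {
      record.socket.ping();
    }
  });
  return terminated;
}
```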

In addition to periodic socket connection cleanup, the internal _channels_ registry is regularly "cleaned" (config: `gcChannelsIntervalMs`) to remove stale references and prevent excessive memory consumption over time.
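Channel cleanup can be sketched the same way. The function drops socket references that are no longer live, then drops any channel left with no sockets. The registries here are simplified to IDs, and `gcChannels` is assumed to run every `gcChannelsIntervalMs`. All names are illustrative.

```typescript
// Internal "channels" registry: channel ID -> IDs of sockets registered under it.
const channels = new Map<string, string[]>();
// IDs of sockets still in the sockets registry (kept current by ping/pong checks).
const liveSockets = new Set<string>();

// Prunes dead socket IDs from each channel and removes channels with no live
// sockets. Returns the number of channels removed.
function gcChannels(): number {
  let removed = 0;
  channels.forEach((socketIds, channelId) => {
    const alive = socketIds.filter(id => liveSockets.has(id));
    if (alive.length === 0) {
      channels.delete(channelId);
      removed += 1;
    } else {
      channels.set(channelId, alive);
    }
  });
  return removed;
}
```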

## Install

Install dependencies:
```
npm install
```

## WebSocket Server Configuration

Copy `config.example.json` to `config.json` and adjust the values for your environment.
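The sketch below shows one way `config.json` might be typed and merged with defaults. The default values come from `config.example.json`. The three interval values (`pingSocketsIntervalMs`, `socketResponseTimeoutMs`, `gcChannelsIntervalMs`) are placeholders chosen for the example. `buildConfig` is a hypothetical helper, not the server's actual loader.

```typescript
interface RedisConfig {
  port: number;
  host: string;
  password: string;
  db: number;
  ssl: boolean;
}

interface ServerConfig {
  port: number;
  logLevel: string;
  logToFile: boolean;
  logFilename: string;
  redis: RedisConfig;
  streamPrefix: string;
  jwtSecret: string;
  jwtCookieName: string;
  pingSocketsIntervalMs: number;
  socketResponseTimeoutMs: number;
  gcChannelsIntervalMs: number;
}

// Values copied from config.example.json, except the three *Ms values,
// which are illustrative placeholders.
const defaultConfig: ServerConfig = {
  port: 8080,
  logLevel: 'info',
  logToFile: false,
  logFilename: 'app.log',
  redis: { port: 6379, host: '127.0.0.1', password: '', db: 0, ssl: false },
  streamPrefix: 'async-events-',
  jwtSecret: 'CHANGE-ME',
  jwtCookieName: 'async-token',
  pingSocketsIntervalMs: 20000,
  socketResponseTimeoutMs: 60000,
  gcChannelsIntervalMs: 120000,
};

type ConfigFile = Partial<Omit<ServerConfig, 'redis'>> & { redis?: Partial<RedisConfig> };

// Overlays values parsed from config.json on the defaults; `redis` is merged
// one level deep so a file may override only, say, the host.
function buildConfig(fromFile: ConfigFile): ServerConfig {
  return {
    ...defaultConfig,
    ...fromFile,
    redis: { ...defaultConfig.redis, ...fromFile.redis },
  };
}
```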

## Superset Configuration

Configure the Superset Flask app to enable global async queries (in `superset_config.py`):

Enable the `GLOBAL_ASYNC_QUERIES` feature flag:
```python
FEATURE_FLAGS = {
    "GLOBAL_ASYNC_QUERIES": True,
}
```

Configure the following Superset values:
```
GLOBAL_ASYNC_QUERIES_TRANSPORT = "ws"
GLOBAL_ASYNC_QUERIES_WEBSOCKET_URL = "ws://<host>:<port>/"
```

Note that the WebSocket server must be run on the same hostname (different port) for cookies to be shared between the Flask app and the WebSocket server.

Review comment (Contributor): Is the JWT the only thing needed from the cookies? If so, have we considered passing that value explicitly? Seems like if we could do that, we could remove this assumption/constraint.

Reply (Member Author): We could pass it explicitly, though we'd have to make the cookie readable by JS (currently it's httponly), so there is a bit of a tradeoff here.

Reply (Member): Keeping the cookie httponly is a good move!

The following settings must have the same values in both the Flask app config and `config.json`:
```
GLOBAL_ASYNC_QUERIES_REDIS_CONFIG
GLOBAL_ASYNC_QUERIES_REDIS_STREAM_PREFIX
GLOBAL_ASYNC_QUERIES_JWT_COOKIE_NAME
GLOBAL_ASYNC_QUERIES_JWT_SECRET
```

Review comment (Member): idea, non-blocking: If these are not configured correctly, can we have the service detect it and provide hints to help devs/admins get it working?
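Following the review suggestion above, a startup check could compare the shared settings and report hints. This is a hypothetical sketch. It assumes the Superset values are somehow made available to the server as a plain object. It also assumes the settings map to the `config.json` keys `redis`, `streamPrefix`, `jwtCookieName`, and `jwtSecret`, as `config.example.json` suggests.

```typescript
type Settings = Record<string, unknown>;

// Assumed correspondence: Superset setting name -> config.json key.
const SHARED_SETTINGS: Record<string, string> = {
  GLOBAL_ASYNC_QUERIES_REDIS_CONFIG: 'redis',
  GLOBAL_ASYNC_QUERIES_REDIS_STREAM_PREFIX: 'streamPrefix',
  GLOBAL_ASYNC_QUERIES_JWT_COOKIE_NAME: 'jwtCookieName',
  GLOBAL_ASYNC_QUERIES_JWT_SECRET: 'jwtSecret',
};

// Equality for primitives and for one-level objects such as the Redis settings.
function sameValue(a: unknown, b: unknown): boolean {
  if (typeof a === 'object' && a !== null && typeof b === 'object' && b !== null) {
    const left = a as Record<string, unknown>;
    const right = b as Record<string, unknown>;
    return Object.keys(left)
      .concat(Object.keys(right))
      .every(key => left[key] === right[key]);
  }
  return a === b;
}

// Returns human-readable hints for shared settings that differ.
function configMismatchHints(supersetConfig: Settings, wsConfig: Settings): string[] {
  const hints: string[] = [];
  Object.keys(SHARED_SETTINGS).forEach(supersetKey => {
    const wsKey = SHARED_SETTINGS[supersetKey];
    if (!sameValue(supersetConfig[supersetKey], wsConfig[wsKey])) {
      hints.push(`${supersetKey} (Superset) does not match "${wsKey}" (config.json)`);
    }
  });
  if (wsConfig.jwtSecret === 'CHANGE-ME') {
    hints.push('"jwtSecret" still has the placeholder value from config.example.json');
  }
  return hints;
}
```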

More info on Superset configuration values for async queries: https://github.com/apache/superset/blob/master/CONTRIBUTING.md#async-chart-queries

## Running

Running locally via dev server:
```
npm run dev-server
```

Running in production:
```
npm run build && npm start
```

*TODO: containerization*
16 changes: 16 additions & 0 deletions superset-websocket/config.example.json
@@ -0,0 +1,16 @@
{
  "port": 8080,
  "logLevel": "info",
  "logToFile": false,
  "logFilename": "app.log",
  "redis": {
    "port": 6379,
    "host": "127.0.0.1",
    "password": "",
    "db": 0,
    "ssl": false
  },
  "streamPrefix": "async-events-",
  "jwtSecret": "CHANGE-ME",
  "jwtCookieName": "async-token"
}
12 changes: 12 additions & 0 deletions superset-websocket/config.test.json
@@ -0,0 +1,12 @@
{
  "redis": {
    "port": 6379,
    "host": "127.0.0.1",
    "password": "",
    "db": 10,
    "ssl": false
  },
  "streamPrefix": "test-async-events-",
  "jwtSecret": "test123-test123-test123-test123-test123-test123-test123",
  "jwtCookieName": "test-async-token"
}

Review comment (Member), on "streamPrefix": change request: Can we make this more specific? Not clear what a streamPrefix is used for.
22 changes: 22 additions & 0 deletions superset-websocket/jest.config.js
@@ -0,0 +1,22 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
module.exports = {
  preset: 'ts-jest',
  testEnvironment: 'node',
};