Prod Release 17/01/24 #513

Merged: morgsmccauley merged 37 commits into stable on Jan 24, 2024
morgsmccauley
Collaborator

snyk-bot and others added 30 commits October 21, 2023 00:49
Snyk has created this PR to upgrade eslint-config-next from 13.1.6 to 13.5.3.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/gabehamilton/project/f1490843-1830-4eb0-a957-99816aa5edcc?utm_source=github&utm_medium=referral&page=upgrade-pr
Logs are causing the machine to run out of space, leading to crashes. I've
removed some logs which are no longer particularly helpful for debugging
problems. These logs also tend to be written on nearly every
iteration, bloating logs both on the machine and in GCP.
As part of implementing a new control plane to oversee QueryApi
resources, Runner needs endpoints through which the control plane can send
commands. This PR adds code to create a new gRPC server with endpoints
to start, stop, or list Runner executors. This gives the control
plane the ability to fully control Runner, and removes the need for
bilateral decision making, which was a problem with the previous design.

These changes will be refined further as needs become more concrete,
closer to integration/release.
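
For illustration, the start/stop/list control surface described above could be modelled in memory roughly as follows. This is only a sketch: `ExecutorController`, `ExecutorInfo`, and the method names are hypothetical, and the real service is defined by Runner's proto file and served over gRPC.

```typescript
// Hypothetical shape of an executor entry; not taken from the Runner proto.
interface ExecutorInfo {
  executorId: string;
  redisStream: string;
}

class ExecutorController {
  private readonly executors = new Map<string, ExecutorInfo>();

  // Start an executor for a stream; starting the same ID twice is rejected.
  startExecutor(executorId: string, redisStream: string): ExecutorInfo {
    if (this.executors.has(executorId)) {
      throw new Error(`Executor ${executorId} already exists`);
    }
    const info: ExecutorInfo = { executorId, redisStream };
    this.executors.set(executorId, info);
    return info;
  }

  // Stop and forget an executor; unknown IDs are reported to the caller.
  stopExecutor(executorId: string): void {
    if (!this.executors.delete(executorId)) {
      throw new Error(`Executor ${executorId} not found`);
    }
  }

  // Return every executor currently tracked.
  listExecutors(): ExecutorInfo[] {
    return Array.from(this.executors.values());
  }
}
```

Because the control plane is the only caller that starts or stops executors in this model, Runner itself makes no scheduling decisions.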
…28f516705c

[Snyk] Upgrade eslint-config-next from 13.1.6 to 13.5.3
# Conflicts:
#	frontend/yarn.lock
This PR adds the `created_at_block_height` and `updated_at_block_height`
fields to `IndexerConfig` within the registry contract. The motive
behind this is to provide Coordinator V2 with a way to compare the
actual and desired states of the system, i.e. if there is a mismatch
between the registry and the system, action should be taken. Without
versions, there is no way of making this comparison.

## Compilation Errors
~~I ran in to several issues trying to compile the `wasm32` binary, and
have outlined all these issues in
near/near-sdk-rs#1125, as well as in the
`README.md` so that the fixes are documented. These fixes are a bit
janky, but I've tested the deployed contract and all seems to be ok.~~

These have been resolved, see:
#458 (comment)

## Account Roles Migration
I've also included `account_roles` in this migration as we have some
incorrect accounts as `Owner`s (`pavelnear.near`). All owners will be
wiped and re-written from the contract default state. All `User`s will
remain.
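
The migration rule above (wipe all owners, rewrite them from the default state, keep all users) can be sketched as below. The contract itself is written in Rust; this TypeScript version is only an illustration, and `AccountRole`, `migrateAccountRoles`, and the `defaultOwners` parameter are hypothetical names.

```typescript
type Role = "Owner" | "User";

interface AccountRole {
  accountId: string;
  role: Role;
}

// Drops every existing Owner, re-creates owners from the supplied defaults,
// and carries every existing User over unchanged.
function migrateAccountRoles(
  existing: AccountRole[],
  defaultOwners: string[],
): AccountRole[] {
  const users = existing.filter((entry) => entry.role === "User");
  const owners = defaultOwners.map(
    (accountId): AccountRole => ({ accountId, role: "Owner" }),
  );
  return [...owners, ...users];
}
```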

## Coordinator V1
Coordinator V1 has been tested to ensure that it can still parse the
registry after these new fields have been applied.
…nges on the indexerDetails object.

- Show the schema even if it fails the validations
…api into fix/load-schema-from-registry

# Conflicts:
#	frontend/src/components/Editor/Editor.jsx
… validation when user is changing the code
This PR creates a Rust-based gRPC client for Runner, for use within
Coordinator V2. For now, this exists as its own crate within the
top-level directory. The Coordinator PR was becoming quite large so I
decided to separate this out.

Additionally, I've made a couple of changes to the Runner proto:
- Rename the package: `spec` -> `runner`
- Remove `executorId` from `StartExecutorRequest` - it should be
deterministic and assigned internally
- Add `version` to the executor - this will be used to determine whether
an executor should be restarted
morgsmccauley and others added 5 commits January 11, 2024 15:02
This PR adds the initial Coordinator V2 service, which acts as the main
driver for the Control Plane.

## Coordinator Overview
Coordinator V2 will exist as a new standalone service, with its primary
goal being to ensure the current registry configuration is mapped to the
system. Its core logic is just an infinite loop which reads the
registry, and sends the necessary requests to the Block Streamer and Runner
services to synchronise that config.
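
One iteration of that loop can be pictured as a pure planning step: compare what the registry wants with what is running, and emit the requests to send. Coordinator V2 itself is a Rust service, so the TypeScript below is only a sketch; `planSync`, `SyncAction`, and the decision to stop executors missing from the registry are assumptions, not the actual implementation.

```typescript
interface IndexerConfig {
  name: string;
  version: number;
}

interface RunningExecutor {
  name: string;
  version: number;
}

type SyncAction =
  | { kind: "start"; name: string; version: number }
  | { kind: "restart"; name: string; version: number }
  | { kind: "stop"; name: string };

function planSync(
  registry: IndexerConfig[],
  running: RunningExecutor[],
): SyncAction[] {
  const runningByName = new Map<string, RunningExecutor>();
  running.forEach((executor) => runningByName.set(executor.name, executor));

  const actions: SyncAction[] = [];
  for (const config of registry) {
    const current = runningByName.get(config.name);
    if (current === undefined) {
      actions.push({ kind: "start", name: config.name, version: config.version });
    } else if (current.version !== config.version) {
      actions.push({ kind: "restart", name: config.name, version: config.version });
    }
    runningByName.delete(config.name);
  }

  // Assumption: anything still running but absent from the registry is stopped.
  runningByName.forEach((_executor, name) => actions.push({ kind: "stop", name }));
  return actions;
}
```

Keeping the planning step free of network calls makes it easy to test separately from the Block Streamer and Runner clients.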

## Block Streamer Changes
Some changes have been made to Block Streamer to enable the above:
1. `version` and `redis_stream` have been added to the proto so that
Coordinator can configure them.
2. Support for `ActionFunctionCall` has been added - Initially I thought
only `ActionAny` was allowed, but the current registry has
`ActionFunctionCall` and therefore it needs to be supported.
3. `last_published_block` is now written to Redis to enable "Start from
interruption"
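
As a rough sketch of how "Start from interruption" could use the stored value: read `last_published_block` back, and resume just after it if present. Block Streamer is a Rust service, so this TypeScript is illustrative only; the function names and the choice to resume at the stored height plus one are assumptions.

```typescript
// Redis returns strings (or nothing), so the stored height is parsed defensively.
function parseStoredHeight(raw: string | null): number | null {
  if (raw === null || raw.trim() === "") {
    return null;
  }
  const height = Number(raw);
  return Number.isInteger(height) && height >= 0 ? height : null;
}

// Assumption: resume from the block after the last published one,
// falling back to the configured start block when nothing was stored.
function resolveStartBlock(
  lastPublishedBlock: number | null,
  configuredStartBlock: number,
): number {
  return lastPublishedBlock === null
    ? configuredStartBlock
    : lastPublishedBlock + 1;
}
```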

## What's not been done
I wanted to limit the scope of this PR as it was starting to get big.
I'll address these tasks in follow-up PRs:
- Provisioning - Coordinator should check the status of provisioning,
and act when the state isn't as expected. This can be done after
#426 is implemented.
- Retry recoverable errors - Any error is currently propagated, causing the
service to exit; this includes connection errors to the Block Streamer and
Runner services. As these are very likely to occur (across restarts) we
should retry these errors.
- Environment configuration - There are many hard coded values
(endpoints, registry contract, etc.) which should be configurable via
the environment.
- Logging and Metrics
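
For the "retry recoverable errors" follow-up, one possible shape is a retry wrapper with capped exponential backoff. This is a sketch only (and in TypeScript rather than Coordinator's Rust): `retryRecoverable`, `backoffDelayMs`, the caller-supplied `isRecoverable` predicate, and the default delays are all hypothetical.

```typescript
// Delay before the next attempt: doubles each time, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retries `operation` while the caller-supplied predicate marks the error
// as recoverable (e.g. a connection error); anything else is rethrown.
async function retryRecoverable<T>(
  operation: () => Promise<T>,
  isRecoverable: (error: unknown) => boolean,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (!isRecoverable(error) || attempt + 1 >= maxAttempts) {
        throw error;
      }
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```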
More info about this can be found on this
[issue](#483)

Changes in this PR:
- Validate code & schema before registering the indexer
Logic: If formatting either the code or the schema fails, the `Publish`
button is disabled. The button is enabled only when no errors are
detected, or when the only errors are type generation errors. More about
this in this [discussion](#480 (comment))
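
The `Publish` rule above boils down to a small predicate. A minimal sketch, assuming the editor collects one error type per failed check; `EditorErrorType` and `isPublishEnabled` are hypothetical names, not the ones used in the frontend:

```typescript
type EditorErrorType = "FORMATTING" | "TYPE_GENERATION";

// Formatting failures (in code or schema) block publishing;
// type generation errors alone do not.
function isPublishEnabled(errors: EditorErrorType[]): boolean {
  return errors.every((type) => type !== "FORMATTING");
}
```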


https://github.com/near/queryapi/assets/15988846/68f89cb4-f561-4e8a-9fa7-81c1a38e548c

Additionally:
- Created a reusable Modal with a global context to manage it, so we can
trigger it from any component to display some info
- Updated schema with granular error types for improved clarity
- Created a custom error class to filter by type Error. We can add more
fields on it if needed
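
A custom error class that carries a type, so callers can filter on it, might look roughly like this. The class name `ValidationError`, its `type` field, and the helper `errorsOfType` are assumptions for illustration; the names in the PR may differ.

```typescript
class ValidationError extends Error {
  readonly type: string;

  constructor(message: string, type: string) {
    super(message);
    this.name = "ValidationError";
    this.type = type;
    // Keeps `instanceof` working when the code is compiled down to ES5.
    Object.setPrototypeOf(this, ValidationError.prototype);
  }
}

// Picks out only the custom errors of the requested type.
function errorsOfType(errors: Error[], type: string): ValidationError[] {
  return errors.filter(
    (error): error is ValidationError =>
      error instanceof ValidationError && error.type === type,
  );
}
```

More fields (for example a location in the schema) could be added to the class without changing how callers filter.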

---------

Co-authored-by: Juan Luis Santana <juanluis@near.org>
Co-authored-by: Roshaan Siddiqui <siddiqui.roshaan@gmail.com>
Currently, the `shard`/`chunk` count is hard-coded to 4 so that we can
fetch the block header and shards in parallel. This PR removes the
hard-coded value by using the chunk count specified in the block header
to fetch the relevant shards.
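
A minimal sketch of deriving the shard IDs from the header instead of assuming 4. The `BlockHeaderSummary` type and its `chunkCount` field are hypothetical stand-ins for whatever the real block header exposes:

```typescript
// Hypothetical, trimmed-down view of a block header.
interface BlockHeaderSummary {
  chunkCount: number;
}

// Produces [0, 1, ..., chunkCount - 1] rather than a fixed [0, 1, 2, 3].
function shardIdsFor(header: BlockHeaderSummary): number[] {
  return Array.from({ length: header.chunkCount }, (_unused, shardId) => shardId);
}
```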
`fetchShardsPromises` is used within `Promise.all` to fetch all shards
simultaneously. The `async` modifier means this function returns a
`Promise`. `Promise.all` complains since it's passed a `Promise` rather
than an `Array`.

This PR removes the unnecessary `async` modifier so it works with
`Promise.all`
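
To make the fix concrete: without `async`, the function returns an array of promises, which is what `Promise.all` expects. The sketch below keeps the `fetchShardsPromises` name from this PR, but its signature, the `ShardStub` type, and the `fetchShard` callback are assumptions.

```typescript
// Hypothetical placeholder for a fetched shard.
interface ShardStub {
  shardId: number;
}

// Deliberately NOT `async`: an async function would wrap this array in a
// single Promise, which Promise.all cannot iterate.
function fetchShardsPromises(
  shardCount: number,
  fetchShard: (shardId: number) => Promise<ShardStub>,
): Array<Promise<ShardStub>> {
  return Array.from({ length: shardCount }, (_unused, shardId) =>
    fetchShard(shardId),
  );
}

// All shard requests are started first, then awaited together.
async function fetchAllShards(
  shardCount: number,
  fetchShard: (shardId: number) => Promise<ShardStub>,
): Promise<ShardStub[]> {
  return Promise.all(fetchShardsPromises(shardCount, fetchShard));
}
```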
)

- refactor: Configure `coordinator` via environment
- refactor: Configure `block-streamer` via environment
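
Configuring a service via the environment typically means reading required values once at startup and failing fast when one is missing. A sketch under stated assumptions: the variable names (`BLOCK_STREAMER_URL`, `RUNNER_URL`, `REGISTRY_CONTRACT_ID`) and the `CoordinatorConfig` shape are hypothetical, and the real services are written in Rust.

```typescript
interface CoordinatorConfig {
  blockStreamerUrl: string;
  runnerUrl: string;
  registryContractId: string;
}

type Env = Record<string, string | undefined>;

// Fails fast with a clear message instead of falling back to a hard-coded value.
function requireEnv(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

function loadCoordinatorConfig(env: Env): CoordinatorConfig {
  return {
    blockStreamerUrl: requireEnv(env, "BLOCK_STREAMER_URL"),
    runnerUrl: requireEnv(env, "RUNNER_URL"),
    registryContractId: requireEnv(env, "REGISTRY_CONTRACT_ID"),
  };
}
```

Passing the environment in as an argument (rather than reading a global) keeps the loader easy to test.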
@morgsmccauley morgsmccauley requested a review from a team as a code owner January 17, 2024 01:25
@morgsmccauley morgsmccauley changed the title Prod Release Prod Release 17/01 Jan 17, 2024
@morgsmccauley morgsmccauley changed the title Prod Release 17/01 Prod Release 17/01/24 Jan 17, 2024
Runner will need the ability to manage both V1 indexers (polling Redis to
start streams) and V2 indexers (starting streams upon receiving calls to
the gRPC server). The rollout plan for V2 involves a rolling migration that happens automatically.

In order to do this, the main thread needs to have a shared map of executors. The server should always be available, the list API should return both V1 and V2 indexers, and Stop should be able to terminate any indexer. To prevent stopped V1 indexers from being restarted, the stream must be removed from the streams set before Runner performs the stop.

Finally, I addressed some remaining TODOs: mainly returning
more information, tracking the status of executors, and returning all
information available when listing executors.
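
The ordering requirement for stopping V1 indexers can be shown with a small in-memory model. Everything here is a sketch: `SharedExecutors` and its members are hypothetical, and a plain `Set` stands in for the Redis streams set that the V1 polling loop reads.

```typescript
type IndexerVersion = "V1" | "V2";

interface ManagedExecutor {
  executorId: string;
  redisStream: string;
  version: IndexerVersion;
}

class SharedExecutors {
  private readonly executors = new Map<string, ManagedExecutor>();
  // Stand-in for the Redis streams set polled by the V1 path.
  readonly v1Streams = new Set<string>();

  // V2 path: invoked when the gRPC server receives a start call.
  startV2(executorId: string, redisStream: string): void {
    this.executors.set(executorId, { executorId, redisStream, version: "V2" });
  }

  // V1 path: one polling pass that starts an executor for each unserved stream.
  pollV1(): void {
    this.v1Streams.forEach((redisStream) => {
      if (!this.executors.has(redisStream)) {
        this.executors.set(redisStream, {
          executorId: redisStream,
          redisStream,
          version: "V1",
        });
      }
    });
  }

  // Lists V1 and V2 executors together.
  list(): ManagedExecutor[] {
    return Array.from(this.executors.values());
  }

  stop(executorId: string): void {
    const executor = this.executors.get(executorId);
    if (executor === undefined) {
      throw new Error(`Executor ${executorId} not found`);
    }
    if (executor.version === "V1") {
      // Remove the stream first so the next polling pass does not restart it.
      this.v1Streams.delete(executor.redisStream);
    }
    this.executors.delete(executorId);
  }
}
```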
Build fails in Dev due to the Proto file not being found. Copying the file
explicitly in the Dockerfile resolves the issue.

The issue was reproducible using docker compose to build and run the image
locally.
@darunrs
Collaborator

darunrs commented Jan 22, 2024

The two commits that I authored that got added here would require terraform changes in prod. We could either leave them out of this release, or deploy the terraform changes in prod (Added the env variable). I'd probably lean towards leaving them out? Otherwise, the changes look fine to me. I haven't managed to get to the bottom of the failure in the UI for showing logs though. But if it works for you, then it's definitely not a code issue.

@darunrs
Collaborator

darunrs commented Jan 22, 2024

Did we add start and created block heights to prod contract as well?

@morgsmccauley
Collaborator Author

> Did we add start and created block heights to prod contract as well?

Yes, it has been deployed to prod.

@morgsmccauley
Collaborator Author

> The two commits that I authored that got added here would require terraform changes in prod. We could either leave them out of this release, or deploy the terraform changes in prod (Added the env variable). I'd probably lean towards leaving them out? Otherwise, the changes look fine to me. I haven't managed to get to the bottom of the failure in the UI for showing logs though. But if it works for you, then it's definitely not a code issue.

It's probably easier to apply the terraform changes than to revert those PRs and then revert the reverts.

What failure in the UI are you referring to?

@darunrs
Collaborator

darunrs commented Jan 22, 2024

> > The two commits that I authored that got added here would require terraform changes in prod. We could either leave them out of this release, or deploy the terraform changes in prod (Added the env variable). I'd probably lean towards leaving them out? Otherwise, the changes look fine to me. I haven't managed to get to the bottom of the failure in the UI for showing logs though. But if it works for you, then it's definitely not a code issue.
>
> It's probably easier to apply the terraform changes than to revert those PRs and then revert the reverts.
>
> What failure in the UI are you referring to?

When I click Show Logs, on any indexer in dev, I get an error. In the console, it shows this error:

[image: screenshot of the console error]

I'm not entirely sure if this is a local issue or a code issue, quite yet. Chances are it's just me, but I wanted to verify with someone else.

@morgsmccauley
Collaborator Author

@darunrs yeah that's specific to the dev cert, I've raised an issue with SRE. I'll go ahead and merge this release.

@morgsmccauley morgsmccauley merged commit 2872008 into stable Jan 24, 2024
16 checks passed