feat: Toggle Runner Version #488

Merged
merged 14 commits into from
Jan 17, 2024

Conversation

darunrs
Collaborator

@darunrs darunrs commented Jan 4, 2024

Runner will need the ability to start using V1 (Polling Redis to start streams) or V2 (Starting streams upon receiving calls to gRPC server) depending on an environment variable we set before deployment. This way, we can make the migration from V1 to V2 and roll that back without too much difficulty.

To do this, the main thread needs a separate map of stream handlers owned by the gRPC server. The server should be spun up only in V2 mode, and Redis polling should happen only in V1 mode. In addition, the worker thread should poll Redis stream storage for indexer config only in V1 mode. To achieve this, only V2 passes the config into the worker on executor init. This also provides version control over indexer config.

Finally, I addressed some leftover TODOs: mainly returning more information, tracking the status of executors, and returning all available information when listing executors.

For now, I've defaulted the version to V1, if the env variable is not set.
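
A minimal sketch of the toggle described above, assuming the variable is `RUNNER_VERSION` (the name used in the manual tests in this conversation). The helper name `resolveRunnerVersion`, the treatment of an empty string as unset, and the fail-fast behavior for unknown values are illustrative assumptions, not the code in this PR.

```typescript
type RunnerVersion = 'V1' | 'V2';

// Hypothetical helper: picks the Runner mode from an env-like object.
// Falls back to V1 when RUNNER_VERSION is unset, as the PR description states.
function resolveRunnerVersion (env: Record<string, string | undefined>): RunnerVersion {
  const raw = env.RUNNER_VERSION;
  // Assumption: an empty string is treated the same as an unset variable.
  if (raw === undefined || raw === '') {
    return 'V1';
  }
  if (raw === 'V1' || raw === 'V2') {
    return raw;
  }
  // Assumption: rejecting unknown values is safer than silently picking a mode.
  throw new Error(`Unsupported RUNNER_VERSION: ${raw}`);
}
```

In a Node process this would be called as `resolveRunnerVersion(process.env)`. Taking the environment as a parameter keeps the function easy to test.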

@darunrs
Collaborator Author

darunrs commented Jan 4, 2024

Some tests I ran manually:

Run unset RUNNER_VERSION and verify Runner starts in V1.
Run export RUNNER_VERSION=V1 and verify V1 started. Tried calling server and received connection refused.
Run export RUNNER_VERSION=V2 and verify V2 started. Then, make the following calls:

  • list executors and verify empty list returned
  • start one executorA which runs
  • list executors and verify it shows up with running status
  • stop the executorA
  • list again and verify no executors show up
  • start executorA again
  • start another executorB with a hard-coded flag that makes the worker crash if the account matches this indexer
  • list executors immediately after starting executorB and verify both executors show up with running status
  • verify executorA continues to run while executorB stops running
  • list executors again and verify A has running status while B has stopped status
  • stop executorB
  • list executors and verify executorA is the only one in the list
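
The status transitions exercised by the V2 steps above can be sketched with an in-memory stand-in. `ExecutorRegistry` and its method names are hypothetical and only model the observable behavior described in the list (a crashed executor stays listed as stopped, a stopped-by-request executor disappears); the real server manages worker threads over gRPC.

```typescript
type ExecutorStatus = 'RUNNING' | 'STOPPED';

interface ExecutorInfo {
  executorId: string;
  status: ExecutorStatus;
}

// Hypothetical in-memory stand-in for the server's map of stream handlers.
class ExecutorRegistry {
  private readonly executors = new Map<string, ExecutorStatus>();

  start (executorId: string): void {
    this.executors.set(executorId, 'RUNNING');
  }

  // Models a worker crash: the entry stays listed, but as STOPPED.
  markCrashed (executorId: string): void {
    if (this.executors.has(executorId)) {
      this.executors.set(executorId, 'STOPPED');
    }
  }

  // An explicit stop removes the executor, so list() no longer returns it.
  stop (executorId: string): boolean {
    return this.executors.delete(executorId);
  }

  list (): ExecutorInfo[] {
    const result: ExecutorInfo[] = [];
    this.executors.forEach((status, executorId) => {
      result.push({ executorId, status });
    });
    return result;
  }
}
```

Replaying the manual steps against this model (start A, start B, crash B, stop B) leaves only executorA in the list with `RUNNING` status.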

@darunrs darunrs marked this pull request as ready for review January 4, 2024 19:25
@darunrs darunrs requested a review from a team as a code owner January 4, 2024 19:25
@darunrs darunrs requested a review from morgsmccauley January 4, 2024 19:25
@darunrs darunrs changed the base branch from rustClient to main January 6, 2024 00:00
@darunrs darunrs force-pushed the toggleRunnerVersion branch 2 times, most recently from 8da0e9b to 23dd326 Compare January 8, 2024 22:47
@darunrs darunrs linked an issue Jan 10, 2024 that may be closed by this pull request
@darunrs
Collaborator Author

darunrs commented Jan 10, 2024

We have decided to pivot from a full release to a rolling release of V2. As a result, this PR now focuses on an approach where V1 and V2 share the same stream handlers. That lets Coordinator modify the Redis streams set on its own and turn off any executor, V1 or not. Later, we can introduce an allow list to transition indexers to V2, and only Coordinator will need visibility into that allow list.

@darunrs
Collaborator Author

darunrs commented Jan 12, 2024

New use cases tested manually:

  • Redis-set executor stopped by server
  • Stopped executor restarted by server
  • Stopped executor restarted by Redis
  • List shows both Redis and non-Redis streams

I believe that returning a version of -1 for V1 executors can keep the change in Coordinator V2 simple (I might be wrong about this):
If we add an if statement for the allow list to the executor synchronization code, then whenever an indexer is added to the allow list, the code will see an existing executor, see a mismatched version, and stop and restart it. We only need to remove the stream from Redis before restarting, which could be gated on a check for version -1 or something similar. Just an idea.
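
A rough sketch of that idea, under stated assumptions: `RunnerControl`, `AllowListEntry`, and `syncAllowList` are hypothetical names, the calls are synchronous for brevity (real calls would be asynchronous gRPC and Redis requests), and removing the stream before stopping is one possible ordering chosen so that Redis polling cannot restart the executor in between.

```typescript
interface ListedExecutor {
  executorId: string;
  accountId: string;
  functionName: string;
  version: number;
}

interface AllowListEntry {
  accountId: string;
  functionName: string;
  version: number;
}

// Hypothetical surface for the calls Coordinator would make; not the real API.
interface RunnerControl {
  removeRedisStream: (streamKey: string) => void;
  stop: (executorId: string) => void;
  start: (entry: AllowListEntry) => void;
}

// Sentinel version reported for V1 (Redis-driven) executors, per the comment above.
const V1_VERSION = -1;

function syncAllowList (
  allowList: AllowListEntry[],
  executors: ListedExecutor[],
  runner: RunnerControl
): void {
  for (const entry of allowList) {
    const existing = executors.find(
      (e) => e.accountId === entry.accountId && e.functionName === entry.functionName
    );
    if (existing === undefined) {
      runner.start(entry);
      continue;
    }
    if (existing.version === entry.version) {
      continue; // already running the desired version
    }
    if (existing.version === V1_VERSION) {
      // For V1 executors the executorId is the Redis stream key. Removing it
      // first keeps Redis polling from restarting the executor.
      runner.removeRedisStream(existing.executorId);
    }
    runner.stop(existing.executorId);
    runner.start(entry);
  }
}
```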

I've also added an env variable lock on the start and stop endpoints so we can deploy this to any environment and leave V2 functionality disabled until we are ready to enable it explicitly.
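
One way such a lock could look, as a sketch only: the flag name `RUNNER_ENABLE_START_STOP` and the wrapper `withEndpointLock` are made up for illustration, since the comment above does not name the variable or show how the check is wired in.

```typescript
type Handler<Req, Res> = (request: Req) => Res;

// Hypothetical flag name; the PR comment does not say which variable is used.
const LOCK_FLAG = 'RUNNER_ENABLE_START_STOP';

// Wraps an endpoint handler so it rejects calls unless the flag is 'true'.
// The flag is read on every call, so the lock reflects the current env object.
function withEndpointLock<Req, Res> (
  env: Record<string, string | undefined>,
  endpointName: string,
  handler: Handler<Req, Res>
): Handler<Req, Res> {
  return (request: Req): Res => {
    if (env[LOCK_FLAG] !== 'true') {
      throw new Error(`${endpointName} is disabled until ${LOCK_FLAG}=true is set`);
    }
    return handler(request);
  };
}
```

Only the start and stop handlers would be wrapped, so listing executors stays available while the lock is on.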

Collaborator

@morgsmccauley morgsmccauley left a comment

Great job! Just a few comments :)

throw new Error(`Stream handler ${executorId} has no/invalid indexer config.`);
executors.forEach((handler, executorId) => {
let config = handler.getIndexerConfig();
if (config === undefined) {
Collaborator

Under what conditions is this undefined?

Collaborator

Oh because the config is read internally 🤔 So we'll get no information for all V1 indexers?

Collaborator Author

Correct. The config is passed in for V2 executors and used instead of pulling config from Redis. So, aside from my hardcoded, obviously incorrect version number, the empty config should be another indicator that it's a V1 indexer.

Collaborator

We'll need the account ID and function name so we know which executor to stop. We can probably extract them from the Redis stream key, which in this case would be the executorId?

Collaborator Author

@darunrs darunrs Jan 16, 2024

Yep, that's correct. The Redis stream key is the executorId for V1 indexers. In fact, the list functions API returns the Redis stream key as the executorId for V1 indexers.

The executorId is guaranteed to be correct since it's populated from the loop over the map of stream handlers itself.

Collaborator

So in that case, can we extract the account_id and function_name from the Redis stream key and populate that information here?

Collaborator Author

I hadn't populated it before since I thought it wasn't needed. What's the information used for, given that the executorId is all that's necessary to drop an indexer?

Collaborator

The executorId is all that is needed to stop the executor, but we need to know which indexer that ID corresponds to in order to stop the right one.

We could do the executorId/Redis stream parsing on the Coordinator side, but I want to avoid making that assumption since it won't be true in V2.

Collaborator Author

Oh I see. Ok, sorry I understand what you're getting at now. Yeah, let me go ahead and do that parsing.
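
For reference, the parsing discussed in this thread could look roughly like the sketch below. It assumes V1 stream keys have the shape `<accountId>/<functionName>:<suffix>`, as in `flatirons.near/social_blockheight:real_time:stream` from the example listing in this conversation, and that function names contain no `:`. `parseStreamKey` is a hypothetical name, not the function added in this PR.

```typescript
interface IndexerIdentity {
  accountId: string;
  functionName: string;
}

// Assumed key shape: "<accountId>/<functionName>:<suffix...>".
// Returns undefined for keys that do not match, such as V2 executor IDs
// that are opaque hashes rather than stream keys.
function parseStreamKey (streamKey: string): IndexerIdentity | undefined {
  const slash = streamKey.indexOf('/');
  if (slash <= 0) {
    return undefined; // no account part before a slash
  }
  const accountId = streamKey.slice(0, slash);
  const rest = streamKey.slice(slash + 1);
  const colon = rest.indexOf(':');
  const functionName = colon === -1 ? rest : rest.slice(0, colon);
  if (functionName === '') {
    return undefined;
  }
  return { accountId, functionName };
}
```

Doing this on the Runner side keeps the key format an internal detail, which matches the concern raised above about Coordinator not relying on it.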

@darunrs darunrs force-pushed the toggleRunnerVersion branch from 2e04c8b to 341e5e6 Compare January 16, 2024 18:50
@darunrs
Collaborator Author

darunrs commented Jan 17, 2024

Example listing of V1 and V2 indexers together. I manually populated streams with two entries and then added one more using the API.

start:  {
  executorId: '5fa3c3ee92d875791598e775738d1fd98a889a10979b600c154e40f9965892c8'
}
list:  {
  executors: [
    {
      executorId: 'flatirons.near/social_blockheight:real_time:stream',
      accountId: 'flatirons.near',
      functionName: 'social_blockheight',
      status: 'RUNNING',
      version: [Long]
    },
    {
      executorId: 'flatirons.near/sweat_blockheight:real_time:stream',
      accountId: 'flatirons.near',
      functionName: 'sweat_blockheight',
      status: 'RUNNING',
      version: [Long]
    },
    {
      executorId: '5fa3c3ee92d875791598e775738d1fd98a889a10979b600c154e40f9965892c8',
      accountId: 'darunrs.near',
      functionName: 'test_sweat_blockheight',
      status: 'RUNNING',
      version: [Long]
    }
  ]
}

Collaborator

@morgsmccauley morgsmccauley left a comment

Great work :)

@darunrs darunrs merged commit d403006 into main Jan 17, 2024
3 checks passed
@darunrs darunrs deleted the toggleRunnerVersion branch January 17, 2024 17:56
Development

Successfully merging this pull request may close these issues.

Enable Toggle between Endpoint and Redis Control of Executors
2 participants