This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

InstanceId string format should never be parsed or assumed #1376

Closed
billonahill opened this issue Sep 12, 2016 · 0 comments

billonahill commented Sep 12, 2016

A Heron Instance id (i.e., instanceId) is generated using the following format:

String.format("%d:%s:%d:%d", containerIdx, componentName, instanceIdx, componentIdx)

The id provides uniqueness and contains helpful identifying information, which is fine, but its contents and format are assumed by parts of the system, which is bad. For example:

  • If the instanceIdx part of the id is not globally unique, tuples can be mis-routed when using shuffle grouping. This is because the heron-executor parses taskId from instanceId and passes it to the started instance. That value becomes the globally unique taskId on the PhysicalPlan, which is used extensively.
  • The packing package passes maps of instanceIds around and parses out the componentName. Instead it should work with InstancePlan objects and call instancePlan.getComponentName().
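
To make the fragility concrete, here is a minimal sketch of what token-based parsing looks like and how it can go wrong. The class and the parseThirdToken helper are hypothetical and are not the actual heron-executor or packing code; the component name containing ":" is an assumption chosen only to show the failure mode (whether Heron permits such names is not stated here).

```java
// Hypothetical demo class, not part of Heron.
final class InstanceIdParsingDemo {
  // Builds an id using the format quoted in this issue.
  static String makeInstanceId(int containerIdx, String componentName,
                               int instanceIdx, int componentIdx) {
    return String.format("%d:%s:%d:%d",
        containerIdx, componentName, instanceIdx, componentIdx);
  }

  // Hypothetical consumer that assumes the third ":"-separated token
  // is always the instanceIdx.
  static String parseThirdToken(String instanceId) {
    return instanceId.split(":")[2];
  }

  public static void main(String[] args) {
    String plain = makeInstanceId(1, "word", 3, 0);
    String withDelimiter = makeInstanceId(1, "word:count", 3, 0);

    System.out.println(parseThirdToken(plain));          // prints "3"
    System.out.println(parseThirdToken(withDelimiter));  // prints "count"
  }
}
```

Any change to the delimiter, token order, or allowed characters silently changes what such a parser returns, which is the kind of hidden coupling described above.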

There should be no hidden meaning or expectations of that id beyond it being human-readable. The id format should be safe to change without breaking underlying systems. If a concept like instanceIdx is required by the system, it should be added to InstancePlan as a typed concept that can be accessed without parsing tokens from the id.
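
One possible shape for this, as a rough sketch only: the class below is hypothetical (named InstancePlanSketch to avoid implying it is Heron's real InstancePlan), and the field and accessor names other than getComponentName() are assumptions. The point is that callers read typed fields, while the string id is produced purely for display.

```java
import java.util.Objects;

// Hypothetical sketch, not Heron's actual InstancePlan class.
final class InstancePlanSketch {
  private final String componentName;
  private final int instanceIdx;   // assumed name for the typed index
  private final int componentIdx;  // assumed name for the typed index

  InstancePlanSketch(String componentName, int instanceIdx, int componentIdx) {
    this.componentName = Objects.requireNonNull(componentName, "componentName");
    this.instanceIdx = instanceIdx;
    this.componentIdx = componentIdx;
  }

  String getComponentName() {
    return componentName;
  }

  int getInstanceIdx() {
    return instanceIdx;
  }

  int getComponentIdx() {
    return componentIdx;
  }

  // Human-readable label only. Callers should never parse this string;
  // its format can change without affecting the typed accessors above.
  String toDisplayId(int containerIdx) {
    return String.format("%d:%s:%d:%d",
        containerIdx, componentName, instanceIdx, componentIdx);
  }
}
```

With this arrangement, code that needs the component name or index calls the accessor directly, so a component name such as "word:count" or a reordered id format would not affect routing or packing logic.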
