Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto generate node ID #162

Merged
merged 11 commits into from
Dec 29, 2022
Merged

Auto generate node ID #162

merged 11 commits into from
Dec 29, 2022

Conversation

honnix
Copy link
Member

@honnix honnix commented Dec 20, 2022

TL;DR

Auto generate node ID

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

In most cases, there is no need to specify a node ID that must be unique across one workflow. This PR makes it possible to auto-generate one if unspecified.

If node ID is not specified, it will be generated as w<random>-n0, w<random>-n1, etc. where random is a random integer generated per workflow build, and w<random> can be specified by a user; and to make a node readable, node name is inferred from class name.

Tracking Issue

Closes flyteorg/flyte#3185

Follow-up issue

NA

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
@@ -54,13 +54,6 @@ public SdkBranchNode apply(
List<String> upstreamNodeIds,
@Nullable SdkNodeMetadata metadata,
Map<String, SdkBindingData> inputs) {
if (metadata != null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unnecessarily strict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to delete these because of calling SdkTransform withNameOverride(String name, boolean failOnDuplicate)

@@ -53,11 +53,6 @@ public String getType() {
return "raw-container";
}

/** Specifies task name. */
public String getName() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to SdkTransform.

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
@@ -58,6 +58,6 @@ public SdkNode apply(String id, SdkTransform transform) {
List<String> upstreamNodeIds =
getOutputs().isEmpty() ? Collections.singletonList(getNodeId()) : Collections.emptyList();

return builder.applyInternal(id, transform, upstreamNodeIds, /*metadata=*/ null, getOutputs());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strictly a breaking change, but I'm not sure why we had applyInternal as protected, and here metadata is always null.

@@ -114,6 +126,11 @@ public SdkTransform withTimeoutOverride(Duration timeout) {
return new SdkPartialTransform(transform, fixedInputs, extraUpstreamNodeIds, mergedMetadata);
}

@Override
public String getName() {
return transform.getName();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delegate to transform to get real name.

return applyInternal(/*nodeId=*/ null, transform, emptyList(), inputs);
}

SdkNode applyInternal(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed visibility to package only. Note that this is strictly a breaking change.


private static String toNodeName(String name) {
String lastPart = name.substring(name.lastIndexOf('.') + 1);
return PATTERN.matcher(lastPart).replaceAll("$1-$2").toLowerCase().replaceAll("\\$", "-");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically camel to kebab, also placing $ for nested class.

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

public SdkWorkflowBuilder() {
this("w" + ThreadLocalRandom.current().nextInt(1000, 10000) + "-");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure we could support subworkflow later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move from numbers to characters? numbers can be confusing, as they might mean something, having a hash of characters might seem more natural

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for consistency because we use number for node already n0, n1` etc. This is a also bit simpler than generating random bytes. But I'm ok either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I revised it to use alphabet as prefix. This will also decrease the chance of conflicting.

return withNameOverride(name, true);
}

SdkTransform withNameOverride(String name, boolean failOnDuplicate) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally not opening this method to avoid exposing unnecessary methods to users.

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
Comment on lines 74 to 76
public SdkTransform withNameOverride(String name) {
return withNameOverride(name, true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is exposed publicly? So now we have 2 ways to set the name? One though the withNameOverride and the other through the buillder.apply(name,...)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buillder.apply(nodeId,...) is for setting node ID, not node name.

import javax.annotation.Nullable;
import org.flyte.api.v1.LiteralType;
import org.flyte.api.v1.SimpleType;
import org.flyte.api.v1.WorkflowTemplate;

public class SdkWorkflowBuilder {

private static final Pattern PATTERN = Pattern.compile("([a-z])([A-Z]+)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we allow for capitals? i thought that only lower case was allowed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to convert FooBar to foo-bar for example.

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
@honnix honnix marked this pull request as ready for review December 23, 2022 10:13
Copy link
Contributor

@ckiosidis ckiosidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me 👍

Copy link
Collaborator

@RRap0so RRap0so left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In flytekit-python, it throws exception if not set and there's a method to convert to dns-compliant.

See:
https://github.com/flyteorg/flytekit/blob/master/flytekit/core/node.py#L28
https://github.com/flyteorg/flytekit/blob/master/flytekit/core/utils.py#L13

Should we align with the Python way? Fine either way for me just want to be sure if we're supposed to be as much aligned as possible or if we can have our own flavor.

@honnix
Copy link
Member Author

honnix commented Dec 23, 2022

Where should we throw? In case of an invalid node id? That is reasonable but seems to be better in a different PR.

Comment on lines 253 to 254
.ints('a', 'z' + 1)
.limit(4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.ints('a', 'z' + 1)
.limit(4)
.ints(4, 'a', 'z' + 1)

ints has an overloaded version that take stream size as parameter, so no need for limit after

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add a constant RND_PREFIX_SIZE so it is easy to spot what that 4 means.
We could also add a comment to the method saying like

// Returns random prefix in the following format "wqjoz-"


private static String toNodeName(String name) {
String lastPart = name.substring(name.lastIndexOf('.') + 1);
return PATTERN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PATTERN is such a generic name specially in this class is about Workflow Builder.

I would suggest to either keeping the name as it is but move all the name and id related methods to a new auxiliary class, or try to come up with a more specific name for this constant. the challenge is that this pattern is a small piece in the whole (match a lowercase char followed by an uppercase one).

In any case,

Suggested change
return PATTERN
return UPPER_AFTER_LOWER_PATTERN

at least describe what the pattern does, but still I feel that there is no much cohesion between these methods and constant with the rest of the class

@@ -201,4 +236,23 @@ public void output(String name, SdkBindingData value, String help) {
public WorkflowTemplate toIdlTemplate() {
return WorkflowTemplateIdl.ofBuilder(this);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Translates "FooBar" to "foo-bar"

To make it easier to follow

if (failOnDuplicate) {
throw e;
}
return this;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising for me, if a name is already set and check is false then we don't override the name (this method becomes no op).

My suggestion would be to create a method called withNameOverrideIfNotSet (or similar) and then write the method like this:

  public SdkTransform withNameOverrideIfNotSet(String name) {
    if (metadata.name() != null) {
      return this;
    }
    return withNameOverride(name);
  }

Leaving the withNameOverride as it was original

@honnix
Copy link
Member Author

honnix commented Dec 27, 2022

Please feel free to take over and make necessary changes. I will not have time to work on this any time soon. Thank you.

Signed-off-by: Nelson Arapé <nelson@spotify.com>
Signed-off-by: Nelson Arapé <nelson@spotify.com>
Signed-off-by: Nelson Arapé <nelson@spotify.com>
Signed-off-by: Nelson Arapé <nelson@spotify.com>
Copy link
Contributor

@ckiosidis ckiosidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@narape narape merged commit 993ce77 into master Dec 29, 2022
@narape narape deleted the generate-node-id branch December 29, 2022 09:18
andresgomezfrr pushed a commit that referenced this pull request Jan 24, 2023
* Auto generate node ID

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Do not expose

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* OK

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Use remote name

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Use node id prefix

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Give it a proper prefix

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Use alphabet for prefix

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Node id related methods in own class

Signed-off-by: Nelson Arapé <nelson@spotify.com>

* withNameOverride(String, boolean) -> withNameOverrideIfNotSet

Signed-off-by: Nelson Arapé <nelson@spotify.com>

* Rename NamePolicy and add docs

Signed-off-by: Nelson Arapé <nelson@spotify.com>

* Minor refactor test

Signed-off-by: Nelson Arapé <nelson@spotify.com>

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
Signed-off-by: Nelson Arapé <nelson@spotify.com>
Co-authored-by: Nelson Arapé <nelson@spotify.com>
Signed-off-by: Andres Gomez Ferrer <andresg@spotify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core feature] flytekit-java does not support generating node ID
4 participants