Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata tag whitespace proposed solution #2538

Merged

Conversation

JoshuaAlter
Copy link
Contributor

@JoshuaAlter JoshuaAlter commented Mar 8, 2022

According to the issue referenced above there are problems with whitespace only and empty tags. In fact, upon looking into this issue there are some other concerns that we should look to address. The proposed solution in the schema will touch upon a couple of items:

  1. Whitespace at the beginning of the line. A regex was added following the allowed pattern options in a json schema to stop any number of whitespace characters after the beginning of a tag - "^[^ \t]+" - this can be read as "after the beginning of line, only allow non-whitespace characters for any length." This stop only whitespace as well because a first character whitespace will not be a non-whitespace character.
  2. Whitespace at the end of the line. Very similar to the above, "[^ \t]+$" was added to stop whitespace after a tag. This can be read as "Only allow non-whitespace characters of any length before ending the tag". Great! This will stop that extra space after a tag being allowed - something we do not want.
  3. Minimum length tags Following the linked item, we added an item parameter to stop 0 length tags. This is aimed to stop the empty tag situation.
  4. Unique items While we're here, we should think. This seems to make sense logically, why should there be duplicated tags? Well, we stop this with a uniqueness boolean, but it is something to be discussed. I can imagine a situation where somebody is scripting tag creation and they are tagging something multiple times with "passed" or "correct" or "complete" or one of many other items indication some kind of finish. Then, they are just checking the count of the tags. Similarly, perhaps they are adding the tag "1" to on each round of run and are using that as a counter for their application. Would this not be valid. Should this be stopped?

Anyway, below are examples of many use cases and their current outputs with the added code and it covers more than the expected behavior (linked in the issue and referenced below)!

UI - The editor should discard/flag whitespace only tags.
CLI - Metadata service should validate whitespace/empty string tags. 
  1. expected usage - allowed
    elyra-test-expected
    There is no effect on a traditional tag without whitespace

  2. whitespace in the middle of the tags - allowed
    elyra-test-whitepace-in-the-middle
    There is no effect to tags that contain whitespace within the tags themselves

  3. example used in issue description - not allowed
    elyra-example-from-issue
    The example tests with both an empty string tag '' and an only whitespace tag ' '. Neither of these are allowed. And thankfully not when they're together either. We can look at the individual tags submitted below.

  4. empty string tag
    elyra-empty-tag
    The tag stops the code snippet creation on validation due to the tag being too short. Great!

  5. whitespace only tags - not allowed
    elyra-just-space
    The tag stops the code snippet creation on validation due to a failure on the first pattern match. Great!

  6. whitespace before the tag - not allowed
    elyra-test-begin-space
    The tag stops the code snippet creation on validation due to a failure on the first pattern match. Great!

  7. whitespace after the tag - not allowed
    elyra-test-end-space
    The tag stops the code snippet creation on validation due to a failure on the second pattern match. Great!

  8. whitespace before and after the tag - not allowed
    elyra-test-space-both-ends
    The tag stops the code snippet creation on validation due to a failure on the first pattern match. Great!

  9. duplicated tags - to discuss
    elyra-test-dup
    The tag stops the code snippet creation on validation due to a failure on the uniqueness! Great! Or is it?

Fixes: #2019

more information can be found here: elyra-ai#2019
@elyra-bot
Copy link

elyra-bot bot commented Mar 8, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@JoshuaAlter JoshuaAlter marked this pull request as ready for review March 8, 2022 17:35
@kevin-bates
Copy link
Member

Hi @JoshuaAlter - this is great - thank you! I agree with your 4 bullet points for attacking whitespace in tags and love the use of allOf there.

Regarding uniqueness, I suppose your use cases make sense in some cases, and this particular attribute (uniqueItems: true), I believe, would apply to the schemas we provide. Since folks can bring their own schemas for their own functionality, then they could choose to omit (or set to false) that attribute in their schemas. There's nothing "built-in" that requires that attribute be present, so I think we're good. If others believe any of our existing schemas require duplicate tags, then we should discuss, but I think starting with them being unique is the reasonable default behavior.

Here are some other items to address:

  • There are several other schemas that reference a tags property, so we'll want to update those as well.
  • We should also add this allOf stanza under an allOf_test attribute to the metadata_test schema and add a test to ensure allOf references are properly behaved. (Note that the metadata tests operate exclusively off the metadata-test schemas and not the application schemas. I'm not sure we need front-end/UI tests to test tag behaviors in code-snippets, runtimes, etc., although those tests should already have something like that and perhaps they can be extended to introduce illegal whitespace.)

@JoshuaAlter JoshuaAlter marked this pull request as ready for review May 4, 2022 21:55
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @JoshuaAlter. I really like (and appreciate) you being able to address this issue strictly via JSON schema, and the use of parameters in your test addition is excellent - thank you!

@ptitzler ptitzler added this to the 3.9.0 milestone May 4, 2022
@ptitzler
Copy link
Member

ptitzler commented May 4, 2022

I'm not sure items 1 and 2 of the proposed behavior are user friendly and align with the behavior in other parts of the Elyra code.

Where applicable (e.g. environment variable names, mount points, pvc names in node properties, pipeline names, ...), we consider leading and trailing whitespaces in user-provided text input as irrelevant and strip them off and don't return an error. This way the input is standardized prior to processing and the user doesn't need to manually resolve minor issues that the application can easily handle.

@kevin-bates kevin-bates requested a review from ptitzler May 5, 2022 14:19
@kevin-bates
Copy link
Member

@ptitzler

I'm not sure items 1 and 2 of the proposed behavior are user friendly and align with the behavior in other parts of the Elyra code.

You're correct. It seems we've only recently had a focus on that aspect of things and the bulk of this PR was done well before then. That said, I agree that we should have a consistent UX. The issue here is that this kind of implicit handling for specific properties is not well-suited to a schema-driven model inherent in our metadata-based instances. Because the front-end and CLI tooling are also schema-driven, I suspect the correct approach would be to introduce a metadata class (via the metadata_class_name property in the schema, which applies these kinds of changes.

I think introducing classes specific to each of the "system-owned" schemas would be best (at least those that define tags although we need to decide if we want this in general) and each of these classes would either derive from or use a mixin class (multiple inheritance) that is responsible for trimming explicitly enumerated string-valued properties (including lists of strings). The idea is that this "trim" class defines an empty list attribute (e.g., trim_properties: List) that is inherited by the derived class and the derived class initializes that list to its set of properties that should be trimmed. The trim class would then implement pre_save() to trim the values of whichever properties are in the self.trim_properties list.

If we feel that all string-valued properties (and lists) should be trimmed unconditionally, then we could just build this into the base metadata manager and forgo the additional instance class overrides. I think that might be a tough ask for our cyrstal ball since there may be some special case somewhere that then breaks this approach.

As far as this PR goes, I suggest we go ahead and merge this as a backend catch all for untrimmed tags values.

Thoughts and suggestions are welcome...

Copy link
Member

@akchinSTC akchinSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im good with this going in as is for now since its scoped to just tags as a catch all until we can figure out the bigger picture with whitespace handling in properties

@akchinSTC akchinSTC merged commit 8b9d93b into elyra-ai:master May 10, 2022
@JoshuaAlter JoshuaAlter deleted the backend-metadata-whitespace-2019 branch May 10, 2022 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metadata Editor and CLI should block whitespace only tags
4 participants