Metadata tag whitespace proposed solution #2538

JoshuaAlter · 2022-03-08T16:56:41Z

According to the referenced issue, there are problems with whitespace-only and empty tags. Looking into it surfaced some other concerns that we should address as well. The proposed schema change touches on four items:

Whitespace at the beginning of the line. A regex was added, following the pattern options allowed in a JSON schema, to reject any whitespace at the start of a tag - "^[^ \t]+" - which can be read as "the tag must begin with one or more non-whitespace characters." This also rejects whitespace-only tags, because their first character is whitespace.
Whitespace at the end of the line. Very similar to the above, "[^ \t]+$" was added to reject whitespace at the end of a tag. This can be read as "the tag must end with one or more non-whitespace characters". Great! This stops a trailing space after a tag from being allowed - something we do not want.
Minimum length tags. Following the linked item, we added an item parameter to reject zero-length tags. This is aimed at stopping the empty tag situation.
Unique items. While we're here, we should think about duplicates. Logically this seems to make sense: why should there be duplicated tags? We stop this with a uniqueness boolean, but it is something to be discussed. I can imagine a situation where somebody is scripting tag creation and tagging something multiple times with "passed" or "correct" or "complete" or one of many other items indicating some kind of finish, and then just checking the count of the tags. Similarly, perhaps they add the tag "1" on each round of a run and use that as a counter for their application. Would this not be valid? Should this be stopped?
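The four items above can be sketched with nothing but Python's re module. This is a rough illustration, not the code in this PR: the TAGS_SCHEMA layout (including the minLength keyword) is an assumption reconstructed from the description, and check_tags is a hypothetical helper that only mimics what a JSON schema validator would report.

```python
import re

# Assumed shape of the "tags" property; the real schema layout may differ.
# Note: "\t" here is a literal tab character inside the character class.
TAGS_SCHEMA = {
    "type": "array",
    "uniqueItems": True,
    "items": {
        "type": "string",
        "minLength": 1,
        "allOf": [{"pattern": "^[^ \t]+"}, {"pattern": "[^ \t]+$"}],
    },
}


def check_tags(tags):
    """Hypothetical helper: return (tag, failed_keyword) pairs, empty if valid."""
    items = TAGS_SCHEMA["items"]
    problems = []
    for tag in tags:
        if len(tag) < items["minLength"]:
            problems.append((tag, "minLength"))
        for index, sub in enumerate(items["allOf"]):
            # JSON schema patterns are unanchored, so re.search mirrors them.
            if re.search(sub["pattern"], tag) is None:
                problems.append((tag, f"allOf[{index}]"))
    if TAGS_SCHEMA["uniqueItems"] and len(set(tags)) != len(tags):
        problems.append((None, "uniqueItems"))
    return problems


print(check_tags(["tag", "two words"]))  # interior whitespace is allowed
print(check_tags([" tag", "tag ", " ", ""]))
```

In this sketch a whitespace-only tag fails both patterns, and an empty tag fails the length check as well as both patterns, since there is no non-whitespace character to match.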

Anyway, below are examples of many use cases and their current outputs with the added code. They cover more than the expected behavior (linked in the issue and referenced below)!

UI - The editor should discard/flag whitespace only tags.
CLI - Metadata service should validate whitespace/empty string tags.

expected usage - allowed

There is no effect on a traditional tag without whitespace.

whitespace in the middle of the tags - allowed

There is no effect on tags that contain whitespace within the tags themselves.

example used in issue description - not allowed

The example tests with both an empty string tag '' and a whitespace-only tag ' '. Neither of these is allowed, and thankfully not when they're together either. We can look at the individual tags submitted below.

empty string tag - not allowed

Validation stops the code snippet from being created because the tag is too short. Great!

whitespace only tags - not allowed

Validation stops the code snippet from being created because the first pattern match fails. Great!

whitespace before the tag - not allowed

Validation stops the code snippet from being created because the first pattern match fails. Great!

whitespace after the tag - not allowed

Validation stops the code snippet from being created because the second pattern match fails. Great!

whitespace before and after the tag - not allowed

Validation stops the code snippet from being created because the first pattern match fails. Great!

duplicated tags - to discuss

Validation stops the code snippet from being created because the uniqueness check fails! Great! Or is it?

Fixes: #2019

more information can be found here: elyra-ai#2019

elyra-bot · 2022-03-08T16:56:47Z

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link:

kevin-bates · 2022-03-08T18:12:30Z

Hi @JoshuaAlter - this is great - thank you! I agree with your 4 bullet points for attacking whitespace in tags and love the use of allOf there.

Regarding uniqueness, I suppose your use cases make sense in some cases, and this particular attribute (uniqueItems: true), I believe, would apply to the schemas we provide. Since folks can bring their own schemas for their own functionality, then they could choose to omit (or set to false) that attribute in their schemas. There's nothing "built-in" that requires that attribute be present, so I think we're good. If others believe any of our existing schemas require duplicate tags, then we should discuss, but I think starting with them being unique is the reasonable default behavior.

Here are some other items to address:

There are several other schemas that reference a tags property, so we'll want to update those as well.
We should also add this allOf stanza under an allOf_test attribute to the metadata_test schema and add a test to ensure allOf references are properly behaved. (Note that the metadata tests operate exclusively off the metadata-test schemas and not the application schemas. I'm not sure we need front-end/UI tests to test tag behaviors in code-snippets, runtimes, etc., although those tests should already have something like that and perhaps they can be extended to introduce illegal whitespace.)

kevin-bates

This looks great @JoshuaAlter. I really like (and appreciate) you being able to address this issue strictly via JSON schema, and the use of parameters in your test addition is excellent - thank you!

ptitzler · 2022-05-04T23:25:36Z

I'm not sure items 1 and 2 of the proposed behavior are user friendly and align with the behavior in other parts of the Elyra code.

Where applicable (e.g. environment variable names, mount points, pvc names in node properties, pipeline names, ...), we consider leading and trailing whitespaces in user-provided text input as irrelevant and strip them off and don't return an error. This way the input is standardized prior to processing and the user doesn't need to manually resolve minor issues that the application can easily handle.
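As a rough illustration of the strip-instead-of-reject behavior described above, here is a minimal sketch. The function name normalize_tags is hypothetical, and dropping tags that are empty after stripping is an assumption, not something stated in this thread.

```python
from typing import Iterable, List


def normalize_tags(tags: Iterable[str]) -> List[str]:
    """Strip leading/trailing whitespace from each tag instead of raising an error."""
    normalized = []
    for tag in tags:
        stripped = tag.strip()
        # Assumption: tags that are empty after stripping are silently dropped.
        if stripped:
            normalized.append(stripped)
    return normalized


print(normalize_tags([" draft", "reviewed ", "  ", "two words"]))
```

Whitespace inside a tag is left untouched; only the edges are standardized before any further processing.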

kevin-bates · 2022-05-05T17:57:43Z

@ptitzler

I'm not sure items 1 and 2 of the proposed behavior are user friendly and align with the behavior in other parts of the Elyra code.

You're correct. It seems we've only recently had a focus on that aspect of things and the bulk of this PR was done well before then. That said, I agree that we should have a consistent UX. The issue here is that this kind of implicit handling for specific properties is not well-suited to the schema-driven model inherent in our metadata-based instances. Because the front-end and CLI tooling are also schema-driven, I suspect the correct approach would be to introduce a metadata class (via the metadata_class_name property in the schema), which applies these kinds of changes.

I think introducing classes specific to each of the "system-owned" schemas would be best (at least those that define tags although we need to decide if we want this in general) and each of these classes would either derive from or use a mixin class (multiple inheritance) that is responsible for trimming explicitly enumerated string-valued properties (including lists of strings). The idea is that this "trim" class defines an empty list attribute (e.g., trim_properties: List) that is inherited by the derived class and the derived class initializes that list to its set of properties that should be trimmed. The trim class would then implement pre_save() to trim the values of whichever properties are in the self.trim_properties list.
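The mixin idea could look roughly like the sketch below. Everything here is illustrative: MetadataStandIn is a hypothetical stand-in so the example runs on its own (it is not Elyra's actual metadata class), and TrimMixin and CodeSnippetStandIn are invented names. Only trim_properties and pre_save() come from the description above.

```python
from typing import Any, Dict, List


class MetadataStandIn:
    """Hypothetical stand-in for a metadata instance base class."""

    def __init__(self, metadata: Dict[str, Any]) -> None:
        self.metadata = metadata

    def pre_save(self, **kwargs: Any) -> None:
        pass  # the base class does nothing before saving in this sketch


class TrimMixin:
    """Trims the explicitly enumerated string-valued properties before save."""

    trim_properties: List[str] = []  # derived classes list what to trim

    def pre_save(self, **kwargs: Any) -> None:
        super().pre_save(**kwargs)
        for name in self.trim_properties:
            value = self.metadata.get(name)
            if isinstance(value, str):
                self.metadata[name] = value.strip()
            elif isinstance(value, list):
                # Trim string entries and leave any other entries unchanged.
                self.metadata[name] = [
                    v.strip() if isinstance(v, str) else v for v in value
                ]


class CodeSnippetStandIn(TrimMixin, MetadataStandIn):
    trim_properties = ["tags", "description"]


snippet = CodeSnippetStandIn(
    {"tags": [" a ", "b"], "description": "  hi ", "code": " x "}
)
snippet.pre_save()
print(snippet.metadata)
```

Properties that are not listed in trim_properties (here, code) are saved exactly as provided, which keeps the trimming opt-in per schema.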

If we feel that all string-valued properties (and lists) should be trimmed unconditionally, then we could just build this into the base metadata manager and forgo the additional instance class overrides. I think that might be a tough ask for our crystal ball since there may be some special case somewhere that then breaks this approach.

As far as this PR goes, I suggest we go ahead and merge this as a backend catch all for untrimmed tags values.

Thoughts and suggestions are welcome...

akchinSTC

I'm good with this going in as is for now, since it's scoped to just tags as a catch-all until we can figure out the bigger picture with whitespace handling in properties.

Metadata tag whitespace proposed solution

4f76e87

more information can be found here: elyra-ai#2019

JoshuaAlter marked this pull request as ready for review March 8, 2022 17:35

propagate proposed solution to other tag instances

967a5b5

akchinSTC requested review from kevin-bates and akchinSTC March 8, 2022 22:32

JoshuaAlter marked this pull request as draft March 9, 2022 14:17

akchinSTC added the status:Waiting for Author label Mar 16, 2022

joshuaalter added 2 commits May 4, 2022 17:01

test: added tests for new metadata schema

7960f61

refactor: fix linting

2f1f087

kevin-bates added component:metadata metadata runtime area:back-end and removed status:Waiting for Author labels May 4, 2022

JoshuaAlter marked this pull request as ready for review May 4, 2022 21:55

kevin-bates approved these changes May 4, 2022

ptitzler added this to the 3.9.0 milestone May 4, 2022

kevin-bates requested a review from ptitzler May 5, 2022 14:19

akchinSTC approved these changes May 6, 2022

akchinSTC merged commit 8b9d93b into elyra-ai:master May 10, 2022

JoshuaAlter deleted the backend-metadata-whitespace-2019 branch May 10, 2022 13:20
