Some Operators Should Support New Attribute Types #1954

Closed
aahei opened this issue Jun 3, 2023 · 1 comment
@aahei
Contributor

aahei commented Jun 3, 2023

During the review of PR #1924, we found that the following operators lack support for some attribute types for their input attributes, which should be added or considered in the future.

ScatterPlot

Currently supports integer and double.

Should also support long, and maybe timestamp.

Aggregate

For the aggregation functions sum, average, min, and max, it currently supports integer, long, double and timestamp.

Should consider supporting boolean.

Sort Partition

Currently supports integer, long, and double.

Should also support timestamp.

Filter

Currently supports string, boolean, long, integer, double, timestamp, and any.

Should also support other types such as binary, for example to allow null checks.
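
To illustrate why a null check does not need to know the attribute type, here is a minimal sketch. It is not the Filter operator's actual code: the `is_null` helper and the dictionary-based rows are hypothetical stand-ins used only for illustration.

```python
# Hypothetical sketch: rows are plain dicts, not the project's tuple class.
def is_null(row, attribute):
    """True when the attribute is missing or None, regardless of its type."""
    return row.get(attribute) is None


rows = [
    {"id": 1, "payload": b"\x00\x01"},  # binary value present
    {"id": 2, "payload": None},         # binary value is null
]

# Keep only the rows whose binary attribute is not null.
kept = [row["id"] for row in rows if not is_null(row, "payload")]
```

The predicate never inspects the value itself, so the same check would work for binary or any other attribute type.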

@sadeemsaleh
Collaborator

@aahei @Yicong-Huang The reason the scatterplot does not support long and timestamp is given in a comment in the code itself. Please check there for why we should not support types other than Integer and Double, and please do not change it.

aahei added a commit that referenced this issue Jun 24, 2023
…titions, Type Casting (#2005)

This is the second PR for the attribute type checking feature. The first
one is #1924.

## Description of Attribute Type Rules
### Interval Join
`leftAttributeName` (and `rightAttributeName`) must be `integer`,
`long`, `double`, or `timestamp`.

In addition, the `leftAttributeName` attribute must have the same type
as the `rightAttributeName` attribute.

```JSON
{
  "attributeTypeRules": {
    "leftAttributeName": {
      "enum": ["integer", "long", "double", "timestamp"]
    },
    "rightAttributeName": {
      "const": {
        "$data": "leftAttributeName"
      }
    }
  }
}
```

Note: We intentionally put the `enum` test before the `const` test so
that the attribute type is validated first. If the `const` test (the
`rightAttributeName` rule) came first and `leftAttributeName` had an
invalid type such as `string`, the user would be told that
`rightAttributeName` should have the same attribute type as
`leftAttributeName` (`string`), which is incorrect because neither
attribute may be a `string`.
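
The effect of rule order can be sketched with a small stand-in checker. This is only an illustration under assumptions: `first_violation` and its messages are hypothetical, and the real validation is done by the project's schema validator, not by this code.

```python
# Hypothetical, simplified rule checker; not the project's actual validator.
RULES = {
    "leftAttributeName": {"enum": ["integer", "long", "double", "timestamp"]},
    "rightAttributeName": {"const": {"$data": "leftAttributeName"}},
}


def first_violation(rules, attribute_types):
    """Return a message for the first rule that fails, in declaration order.

    `attribute_types` maps each property name to the type of the attribute
    the user selected for it.
    """
    for prop, rule in rules.items():  # dicts preserve insertion order
        actual = attribute_types[prop]
        if "enum" in rule and actual not in rule["enum"]:
            allowed = ", ".join(rule["enum"])
            return f"{prop} must be one of: {allowed} (got {actual})"
        if "const" in rule:
            other = rule["const"]["$data"]
            expected = attribute_types[other]
            if actual != expected:
                return f"{prop} must have the same type as {other}: {expected}"
    return None


selected = {"leftAttributeName": "string", "rightAttributeName": "integer"}

# enum rule first: the message points at the real problem.
enum_first = first_violation(RULES, selected)

# const rule first: the message asks for `string`, which is itself invalid.
reversed_rules = dict(reversed(list(RULES.items())))
const_first = first_violation(reversed_rules, selected)
```

With the `enum` rule first, the reported problem is the invalid `string` type on `leftAttributeName`; with the order reversed, the message misleadingly asks `rightAttributeName` to become a `string`.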

### Scatter Plot

`xColumn` and `yColumn` attributes must be of `integer` or `double`
type.

```JSON
{
  "attributeTypeRules": {
    "xColumn":{
      "enum": ["integer", "double"]
    },
    "yColumn":{
      "enum": ["integer", "double"]
    }
  }
}
```

Note: It may support `long` in the future. See
#1954.

### Sort Partitions

`sortAttributeName` attribute type must be `integer`, `long`, or
`double`.

```JSON
{
  "attributeTypeRules": {
    "sortAttributeName":{
      "enum": ["integer", "long", "double"]
    }
  }
}
```

Note: It may support `timestamp` in the future. See
#1954.
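
As a rough illustration of how a plain `enum` rule like the one above could be checked, here is a minimal sketch. The `check_enum_rules` function and its error format are hypothetical and only handle the `enum` keyword; they are not the project's validator.

```python
import json

# The Sort Partitions rule from above, parsed as ordinary JSON.
SCHEMA = json.loads("""
{
  "attributeTypeRules": {
    "sortAttributeName": {
      "enum": ["integer", "long", "double"]
    }
  }
}
""")


def check_enum_rules(schema, attribute_types):
    """Collect the properties whose selected attribute type is not allowed.

    Hypothetical helper: `attribute_types` maps a property name to the type
    of the attribute chosen for it. Properties with no selection are skipped.
    """
    errors = {}
    for prop, rule in schema["attributeTypeRules"].items():
        allowed = rule.get("enum")
        actual = attribute_types.get(prop)
        if allowed is not None and actual is not None and actual not in allowed:
            errors[prop] = f"expected one of {allowed}, got {actual!r}"
    return errors


ok = check_enum_rules(SCHEMA, {"sortAttributeName": "long"})
bad = check_enum_rules(SCHEMA, {"sortAttributeName": "timestamp"})
```

Here `ok` is empty, while `bad` contains an entry for `sortAttributeName` because `timestamp` is not yet in the allowed list.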

### Type Casting

For example, to convert an attribute to `integer`, the attribute must
have type `string`, `long`, `double`, or `boolean`. A type cannot be
cast to itself. See the schema for details.

```JSON
{
  "attributeTypeRules": {
    "attribute": {
      "allOf": [
        {
          "if": {
            "resultType": {
              "valEnum": ["integer"]
            }
          },
          "then": {
            "enum": ["string", "long", "double", "boolean"]
          }
        },
        {
          "if": {
            "resultType": {
              "valEnum": ["double"]
            }
          },
          "then": {
            "enum": ["string", "integer", "long", "boolean"]
          }
        },
        {
          "if": {
            "resultType": {
              "valEnum": ["boolean"]
            }
          },
          "then": {
            "enum": ["string", "integer", "long", "double"]
          }
        },
        {
          "if": {
            "resultType": {
              "valEnum": ["long"]
            }
          },
          "then": {
            "enum": ["string", "integer", "double", "boolean", "timestamp"]
          }
        },
        {
          "if": {
            "resultType": {
              "valEnum": ["timestamp"]
            }
          },
          "then": {
            "enum": ["string", "long"]
          }
        }
      ]
    }
  }
}
```

Note: The type constraint is enforced in
`core/amber/src/main/scala/edu/uci/ics/texera/workflow/common/tuple/schema/AttributeTypeUtils.scala`.
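
The `allOf` / `if` / `then` rule can be read as: for every clause whose `if` matches the chosen `resultType`, the attribute's type must be in that clause's `then` list. A minimal sketch of that reading follows. The `build_all_of` and `is_cast_allowed` helpers are hypothetical and only mirror the JSON above; the actual enforcement lives in the Scala file named in the note.

```python
# Allowed source types per target type, copied from the rules above.
ALLOWED_SOURCES = {
    "integer": ["string", "long", "double", "boolean"],
    "double": ["string", "integer", "long", "boolean"],
    "boolean": ["string", "integer", "long", "double"],
    "long": ["string", "integer", "double", "boolean", "timestamp"],
    "timestamp": ["string", "long"],
}


def build_all_of(allowed_sources):
    """Rebuild the list of if/then clauses in the same shape as the JSON."""
    return [
        {"if": {"resultType": {"valEnum": [target]}}, "then": {"enum": sources}}
        for target, sources in allowed_sources.items()
    ]


def is_cast_allowed(all_of, source_type, result_type):
    """Every clause whose `if` matches must have its `then` satisfied."""
    for clause in all_of:
        matches = result_type in clause["if"]["resultType"]["valEnum"]
        if matches and source_type not in clause["then"]["enum"]:
            return False
    return True


ALL_OF = build_all_of(ALLOWED_SOURCES)

timestamp_to_long = is_cast_allowed(ALL_OF, "timestamp", "long")        # allowed
timestamp_to_integer = is_cast_allowed(ALL_OF, "timestamp", "integer")  # not allowed
integer_to_integer = is_cast_allowed(ALL_OF, "integer", "integer")      # no self-cast
```

Because no `then` list contains its own target type, casting a type to itself is always rejected under this reading.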

---------

Co-authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>