Skip to content

Conversation

@libbey-observable
Copy link
Contributor

@libbey-observable libbey-observable commented Jan 25, 2023

Resolves https://github.com/observablehq/observablehq/issues/9862 and https://github.com/observablehq/observablehq/issues/9673

In this inference approach, we take a sample (currently the first 100 rows), and for each column, count how many times we encounter each possible data type. Then, if > 90% of the sampled values conform to that type, we take the most frequently encountered type as the column's type. If not, we use type "other."

These changes include some support for upcoming column type assertions, but for the purposes of this PR, I'm testing against main.

inference_and_coercion.mov

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! Just a few more things, and we’re done, I swear. 😅

src/table.js Outdated
case "boolean":
if (typeof value === "string") {
const trimValue = value.trim();
return trimValue === "true" ? true : trimValue === "false" ? false : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to note that I was surprised that our boolean coercion returned false for a column of string values, but true for other types, such as a column of empty objects. Maybe the default return value for strings should be Boolean(value), but then that ruins detecting the strings "true" and "false".
Screen Shot 2023-02-02 at 11 45 14 AM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returned null for string values (rather than false). I'm not sure what the right response is when a Date, object, or other types are coerced to boolean. TRUE seems surprising to me, but if that's what we want to do, then it does seem strange to have type string alone not conform to that. Any thoughts, @mbostock?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to do this because string is special: it’s the “untyped” format we use to serialize other types (namely in CSV).

Co-authored-by: Mike Bostock <mbostock@gmail.com>
Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted a few trivial suggestions which you can take or leave, but here is what I think (hope!) is the last logical issue! 🙏

@libbey-observable libbey-observable merged commit fa1b356 into main Feb 3, 2023
@libbey-observable libbey-observable deleted the libbey/infer-and-coerce branch February 3, 2023 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants