Draft 2020 style tuples #157

Open
Zemnmez opened this issue Dec 3, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


Zemnmez commented Dec 3, 2024

I have been trying to feed a JSON Schema generated by zod-to-json-schema into Go via https://github.com/a-h/generate. When I run the generator, I get this error:

the JSON type 'array' cannot be converted into the Go 'Schema' type on struct 'Schema', field 'Properties.Properties.Properties.Items'. See input file bazel-out/darwin_arm64-fastbuild/bin/ts/twitter/json_schema.json line 82, character 1

The relevant lines generated:

            "coordinates": {
              "type": "array",
              "minItems": 2,
              "maxItems": 2,
              "items": [
                {
                  "type": "string"
                },
                {
                  "type": "string"
                }
              ]
            },

The corresponding zod type is:

/**
 * Schema for geographic coordinates.
 */
export const coordinatesSchema = z.strictObject({
  /**
   * An array of coordinates in [longitude, latitude] format.
   */
  coordinates: z.tuple([z.string(), z.string()]),
  /**
   * The type of geographic location, e.g., 'Point'.
   */
  type: z.literal("Point"),
});

As you can see, the 2-tuple of strings has been turned into an array whose "items" keyword is itself an array of two schemas. The note in the tuple validation section of the JSON Schema docs describes this array form of "items", as used by zod-to-json-schema, as applying only to draft versions of the spec (specifically draft 4).
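For reference, here is a minimal sketch of the two tuple shapes in question, written as plain TypeScript objects. The `Schema` type and the `usesArrayFormItems` helper are hypothetical names introduced for illustration only; they are not part of zod-to-json-schema or a-h/generate.

```typescript
// Minimal JSON Schema shape for this illustration only (not a full typing).
type Schema = { [keyword: string]: unknown };

// Tuple as emitted above: older drafts allow "items" to be an array of schemas.
const legacyTuple: Schema = {
  type: "array",
  minItems: 2,
  maxItems: 2,
  items: [{ type: "string" }, { type: "string" }],
};

// The same tuple in draft 2020-12 style, where positional schemas live in "prefixItems".
const draft2020Tuple: Schema = {
  type: "array",
  minItems: 2,
  maxItems: 2,
  prefixItems: [{ type: "string" }, { type: "string" }],
};

// Hypothetical helper: true when an array schema uses the array form of "items".
function usesArrayFormItems(schema: Schema): boolean {
  return schema.type === "array" && Array.isArray(schema.items);
}
```

A consumer that only accepts a single schema object under "items" would presumably reject the first shape, which matches the error above.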

I've put a jq script below that fixes the issue for me.

Full output JSON schema:

{
  "type": "object",
  "properties": {
    "tweet": {
      "type": "object",
      "properties": {
        "edit_info": {
          "type": "object",
          "properties": {
            "initial": {
              "type": "object",
              "properties": {
                "editTweetIds": {
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                },
                "editableUntil": {
                  "type": "string",
                  "format": "date-time"
                },
                "editsRemaining": {
                  "type": "string"
                },
                "isEditEligible": {
                  "type": "boolean"
                }
              },
              "required": [
                "editTweetIds",
                "editableUntil",
                "editsRemaining",
                "isEditEligible"
              ],
              "additionalProperties": false
            }
          },
          "required": [
            "initial"
          ],
          "additionalProperties": false
        },
        "created_at": {
          "type": "string"
        },
        "id": {
          "type": "string"
        },
        "id_str": {
          "type": "string"
        },
        "full_text": {
          "type": "string"
        },
        "favorited": {
          "type": "boolean"
        },
        "favorite_count": {
          "type": "string"
        },
        "withheld_copyright": {
          "type": "boolean"
        },
        "withheld_in_countries": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "possibly_sensitive": {
          "type": "boolean"
        },
        "geo": {
          "type": "object",
          "properties": {
            "coordinates": {
              "type": "array",
              "minItems": 2,
              "maxItems": 2,
              "items": [
                {
                  "type": "string"
                },
                {
                  "type": "string"
                }
              ]
            },
            "type": {
              "type": "string",
              "const": "Point"
            }
          },
          "required": [
            "coordinates",
            "type"
          ],
          "additionalProperties": false
        },
        "retweeted": {
          "type": "boolean"
        },
        "lang": {
          "type": "string"
        },
        "extended_entities": {
          "type": "object",
          "properties": {
            "hashtags": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "text": {
                    "type": "string"
                  },
                  "indices": {
                    "type": "array",
                    "minItems": 2,
                    "maxItems": 2,
                    "items": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "string"
                      }
                    ]
                  }
                },
                "required": [
                  "text"
                ],
                "additionalProperties": false
              }
            },
            "symbols": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "text": {
                    "type": "string"
                  },
                  "indices": {
                    "type": "array",
                    "minItems": 2,
                    "maxItems": 2,
                    "items": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "string"
                      }
                    ]
                  }
                },
                "required": [
                  "text",
                  "indices"
                ],
                "additionalProperties": false
              }
            },
            "user_mentions": {
              "type": "array"
            },
            "urls": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "url": {
                    "type": "string"
                  },
                  "expanded_url": {
                    "type": "string"
                  },
                  "display_url": {
                    "type": "string"
                  },
                  "indices": {
                    "type": "array",
                    "minItems": 2,
                    "maxItems": 2,
                    "items": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "string"
                      }
                    ]
                  }
                },
                "required": [
                  "url",
                  "expanded_url",
                  "display_url",
                  "indices"
                ],
                "additionalProperties": false
              }
            },
            "media": {
              "type": "array"
            },
            "polls": {
              "type": "array"
            }
          },
          "additionalProperties": false
        },
        "display_text_range": {
          "type": "array",
          "minItems": 2,
          "maxItems": 2,
          "items": [
            {
              "type": "string"
            },
            {
              "type": "string"
            }
          ]
        },
        "retweet_count": {
          "type": "string"
        },
        "source": {
          "type": "string"
        },
        "truncated": {
          "type": "boolean"
        },
        "in_reply_to_status_id": {
          "type": "string"
        },
        "in_reply_to_status_id_str": {
          "type": "string"
        },
        "in_reply_to_user_id": {
          "type": "string"
        },
        "in_reply_to_user_id_str": {
          "type": "string"
        },
        "in_reply_to_screen_name": {
          "type": "string"
        },
        "user": {
          "type": "object",
          "properties": {
            "id": {
              "type": "number"
            },
            "id_str": {
              "type": "string"
            },
            "name": {
              "type": "string"
            },
            "screen_name": {
              "type": "string"
            },
            "location": {
              "type": "string"
            },
            "url": {
              "type": "string"
            },
            "description": {
              "type": "string"
            },
            "verified": {
              "type": "boolean"
            },
            "followers_count": {
              "type": "number"
            },
            "friends_count": {
              "type": "number"
            },
            "listed_count": {
              "type": "number"
            },
            "favourites_count": {
              "type": "number"
            },
            "statuses_count": {
              "type": "number"
            },
            "created_at": {
              "type": "string"
            },
            "utc_offset": {
              "type": "number"
            },
            "time_zone": {
              "type": "string"
            },
            "geo_enabled": {
              "type": "boolean"
            },
            "lang": {
              "type": "string"
            },
            "contributors_enabled": {
              "type": "boolean"
            },
            "is_translator": {
              "type": "boolean"
            },
            "profile_image_url": {
              "type": "string"
            },
            "profile_image_url_https": {
              "type": "string"
            },
            "profile_banner_url": {
              "type": "string"
            },
            "default_profile": {
              "type": "boolean"
            },
            "default_profile_image": {
              "type": "boolean"
            },
            "following": {
              "type": "boolean"
            },
            "follow_request_sent": {
              "type": "boolean"
            },
            "notifications": {
              "type": "boolean"
            }
          },
          "required": [
            "id",
            "id_str",
            "name",
            "screen_name",
            "verified",
            "followers_count",
            "friends_count",
            "listed_count",
            "favourites_count",
            "statuses_count",
            "created_at",
            "geo_enabled",
            "contributors_enabled",
            "is_translator",
            "profile_image_url_https",
            "default_profile",
            "default_profile_image"
          ],
          "additionalProperties": false
        },
        "coordinates": {
          "$ref": "#/properties/tweet/properties/geo"
        },
        "place": {
          "type": "object",
          "properties": {
            "attributes": {
              "type": "object",
              "additionalProperties": {}
            },
            "bounding_box": {
              "type": "object",
              "properties": {
                "coordinates": {
                  "type": "array",
                  "items": {
                    "type": "array",
                    "items": {
                      "type": "array",
                      "minItems": 2,
                      "maxItems": 2,
                      "items": [
                        {
                          "type": "number"
                        },
                        {
                          "type": "number"
                        }
                      ]
                    }
                  }
                },
                "type": {
                  "type": "string"
                }
              },
              "required": [
                "coordinates",
                "type"
              ],
              "additionalProperties": false
            },
            "country": {
              "type": "string"
            },
            "country_code": {
              "type": "string"
            },
            "full_name": {
              "type": "string"
            },
            "id": {
              "type": "string"
            },
            "name": {
              "type": "string"
            },
            "place_type": {
              "type": "string"
            },
            "url": {
              "type": "string"
            }
          },
          "required": [
            "attributes",
            "bounding_box",
            "country",
            "country_code",
            "full_name",
            "id",
            "name",
            "place_type",
            "url"
          ],
          "additionalProperties": false
        },
        "entities": {
          "$ref": "#/properties/tweet/properties/extended_entities"
        }
      },
      "required": [
        "edit_info",
        "created_at",
        "id",
        "id_str",
        "full_text",
        "favorited",
        "source",
        "truncated",
        "entities"
      ],
      "additionalProperties": false
    }
  },
  "required": [
    "tweet"
  ],
  "additionalProperties": false,
  "$schema": "http://json-schema.org/draft-07/schema#"
}

Full input Zod schema:

import { z } from "zod";

export const tweetId = z.string();
export const date = z.string().datetime();

/**
 * Schema for geographic coordinates.
 */
export const coordinatesSchema = z.strictObject({
  /**
   * An array of coordinates in [longitude, latitude] format.
   */
  coordinates: z.tuple([z.string(), z.string()]),
  /**
   * The type of geographic location, e.g., 'Point'.
   */
  type: z.literal("Point"),
});

/**
 * Schema for place information associated with a post.
 */
export const placeSchema = z.strictObject({
  /**
   * A record of additional attributes about the place.
   */
  attributes: z.record(z.any()),
  /**
   * The bounding box for the place, defining its coordinates.
   */
  bounding_box: z.strictObject({
    /**
     * An array of arrays defining the bounding box. Each sub-array contains [longitude, latitude].
     */
    coordinates: z.array(z.array(z.tuple([z.number(), z.number()]))),
    /**
     * Type of bounding box geometry, e.g., 'Polygon'.
     */
    type: z.string(),
  }),
  /**
   * The country name.
   */
  country: z.string(),
  /**
   * The ISO country code.
   */
  country_code: z.string(),
  /**
   * The full name of the place.
   */
  full_name: z.string(),
  /**
   * The unique identifier for the place.
   */
  id: z.string(),
  /**
   * The short name of the place.
   */
  name: z.string(),
  /**
   * The type of place, e.g., 'city'.
   */
  place_type: z.string(),
  /**
   * A URL providing more details about the place.
   */
  url: z.string(),
});

/**
 * Schema for user information.
 */
export const userSchema = z.strictObject({
  /**
   * The unique identifier for the user.
   */
  id: z.number(),
  /**
   * The string representation of the user ID.
   */
  id_str: z.string(),
  /**
   * The name of the user.
   */
  name: z.string(),
  /**
   * The screen name (handle) of the user.
   */
  screen_name: z.string(),
  /**
   * The user's location, if provided.
   */
  location: z.string().optional(),
  /**
   * The URL associated with the user, if provided.
   */
  url: z.string().optional(),
  /**
   * A short bio or description of the user.
   */
  description: z.string().optional(),
  /**
   * Indicates whether the user is verified.
   */
  verified: z.boolean(),
  /**
   * The number of followers the user has.
   */
  followers_count: z.number(),
  /**
   * The number of accounts the user is following.
   */
  friends_count: z.number(),
  /**
   * The number of public lists that include the user.
   */
  listed_count: z.number(),
  /**
   * The number of posts the user has liked.
   */
  favourites_count: z.number(),
  /**
   * The total number of posts the user has made.
   */
  statuses_count: z.number(),
  /**
   * The date and time when the user account was created.
   */
  created_at: z.string(),
  /**
   * The user's UTC offset, if available.
   */
  utc_offset: z.number().optional(),
  /**
   * The user's time zone, if available.
   */
  time_zone: z.string().optional(),
  /**
   * Indicates whether the user has enabled geotagging.
   */
  geo_enabled: z.boolean(),
  /**
   * The language of the user's interface, if specified (BCP 47 format).
   */
  lang: z.string().optional(),
  /**
   * Indicates if the account supports contributors.
   */
  contributors_enabled: z.boolean(),
  /**
   * Indicates whether the user is marked as a translator.
   */
  is_translator: z.boolean(),
  /**
   * The URL of the user's profile image.
   */
  profile_image_url: z.string().optional(),
  /**
   * The HTTPS URL of the user's profile image.
   */
  profile_image_url_https: z.string(),
  /**
   * The URL of the user's profile banner image, if available.
   */
  profile_banner_url: z.string().optional(),
  /**
   * Indicates whether the user uses the default profile.
   */
  default_profile: z.boolean(),
  /**
   * Indicates whether the user uses the default profile image.
   */
  default_profile_image: z.boolean(),
  /**
   * Indicates whether the authenticating user follows this user.
   */
  following: z.boolean().optional(),
  /**
   * Indicates whether a follow request has been sent to this user.
   */
  follow_request_sent: z.boolean().optional(),
  /**
   * Indicates whether the authenticating user has notifications enabled for this user.
   */
  notifications: z.boolean().optional(),
});

/**
 * Schema for entities extracted from post text.
 */
export const entitiesSchema = z.strictObject({
  /**
   * Array of hashtags included in the post.
   */
  hashtags: z.strictObject({
    text: z.string(),
    indices: z.tuple([
      z.string(),
      z.string()
    ]).optional(),
  }).array().optional(),
  /**
   * Array of symbols mentioned in the post.
   */
  symbols: z.strictObject({
    text: z.string(),
    indices: z.tuple([
      z.string(),
      z.string()
    ])
  }).array().optional(),
  /**
   * Array of user mentions in the post.
   */
  user_mentions: z.array(z.any()).optional(),
  /**
   * Array of URLs included in the post.
   */
  urls: z.array(
    z.strictObject({
      /**
       * The URL as it appears in the post.
       */
      url: z.string(),
      /**
       * The fully resolved URL.
       */
      expanded_url: z.string(),
      /**
       * The shortened display version of the URL.
       */
      display_url: z.string(),
      /**
       * Start and end indices of the URL in the post text.
       */
      indices: z.tuple([z.string(), z.string()]),
    })
  ).optional(),
  /**
   * Array of media objects included in the post, if any.
   */
  media: z.array(z.any()).optional(),
  /**
   * Array of polls included in the post, if any.
   */
  polls: z.array(z.any()).optional(),
});

/**
 * Schema for a post object.
 */
export const postSchema = z.strictObject({
  edit_info: z.strictObject({
    initial: z.strictObject({
      editTweetIds: z.string().array(),
      editableUntil: date,
      editsRemaining: z.string(),
      isEditEligible: z.boolean()
    })
  }),
  /**
   * The UTC time when the post was created.
   */
  created_at: z.string(),
  /**
   * The unique identifier for the post (integer format).
   */
  id: z.string(),
  /**
   * The unique identifier for the post (string format).
   */
  id_str: z.string(),
  /**
   * The actual text content of the post.
   */
  full_text: z.string(),
  favorited: z.boolean(),
  favorite_count: z.string().optional(),
  /**
   * Whether the tweet was withheld in some
   * countries for copyright reasons.
   *
   * See withheld_in_countries.
   */
  withheld_copyright: z.boolean().optional(),
  withheld_in_countries: z.string().array().optional(),
  possibly_sensitive: z.boolean().optional(),
  geo: coordinatesSchema.optional(),
  retweeted: z.boolean().optional(),
  lang: z.string().optional(),
  extended_entities: entitiesSchema.optional(),
  display_text_range: z.tuple([
    z.string(),
    z.string()
  ]).optional(),
  retweet_count: z.string().optional(),
  /**
   * The source utility used to post the content.
   */
  source: z.string(),
  /**
   * Indicates if the text was truncated due to length limits.
   */
  truncated: z.boolean(),
  /**
   * The ID of the original post this post is replying to (integer format).
   */
  in_reply_to_status_id: z.string().optional(),
  /**
   * The ID of the original post this post is replying to (string format).
   */
  in_reply_to_status_id_str: z.string().optional(),
  /**
   * The ID of the user this post is replying to (integer format).
   */
  in_reply_to_user_id: z.string().optional(),
  /**
   * The ID of the user this post is replying to (string format).
   */
  in_reply_to_user_id_str: z.string().optional(),
  /**
   * The screen name of the user this post is replying to.
   */
  in_reply_to_screen_name: z.string().optional(),
  /**
   * The user who posted this content.
   */
  user: userSchema.optional(),
  /**
   * Geographic location of the post, if available.
   */
  coordinates: coordinatesSchema.optional(),
  /**
   * Place information associated with the post.
   */
  place: placeSchema.optional(),
  /**
   * Entities parsed from the post text.
   */
  entities: entitiesSchema,
});

export const archivedTweetSchema = z.strictObject({
  tweet: postSchema
});

jq script to fix:

# Define a recursive function to process JSON schemas.
# This function transforms schemas with `items` arrays (used for tuple validation in older drafts)
# into schemas using `prefixItems`, which is the modern approach for tuple validation.

def process_schema:
  # Dispatch on the JSON type of the current value
  if type == "object" then
      # Iterate over all keys in the current schema
      reduce keys[] as $key (
        .;
        # Recursively process each value in the schema
        .[$key] = (.[$key] | process_schema)
      )
      # Check if the current schema is an array type
      # and if `items` is defined as an array (indicating tuple validation in older drafts)
      | if .type == "array" and (.items | type == "array") then
          # Replace `items` with `prefixItems` for compatibility with modern drafts
          .prefixItems = .items
          # Remove the legacy `items` field
          | del(.items)
        else
          # Leave the schema unchanged if no transformation is needed
          .
        end
    elif type == "array" then
      # If the current schema is an array of schemas, recursively process each element
      map(process_schema)
    else
      # For all other types, return the schema unchanged
      .
    end;

# Apply the transformation function to the root schema
process_schema
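
For anyone who would rather run this step from Node than from jq, the same transformation can be sketched in TypeScript. This is a rough port under the same assumptions as the jq script, not something provided by zod-to-json-schema; `Json` and `processSchema` are names made up for this example.

```typescript
// Any JSON value; kept deliberately loose for this sketch.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively rename array-form "items" to "prefixItems", mirroring the jq script.
function processSchema(value: Json): Json {
  if (Array.isArray(value)) {
    // An array of schemas (or other values): process each element.
    return value.map(processSchema);
  }
  if (value !== null && typeof value === "object") {
    // Process every nested value first, then fix up this object itself.
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = processSchema(child);
    }
    if (out.type === "array" && Array.isArray(out.items)) {
      out.prefixItems = out.items;
      delete out.items;
    }
    return out;
  }
  // Strings, numbers, booleans, and null pass through unchanged.
  return value;
}
```

Like the jq version, this leaves "$schema" untouched, so the output above would still declare draft-07 unless that is changed separately.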
Owner

StefanTerdell commented Dec 3, 2024

Hey @Zemnmez, thanks for opening an issue.

I believe you may have misread the spec - the array form of "items" is used like this from draft 4 up to and including draft 2019-09, and 2019-09 is the latest target supported by this package.

Obviously you should have gone with Rust instead and this wouldn't have been an issue

Just kidding of course

...mostly :D

I will leave the issue open as a feature request for a draft 2020 target.
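
As a rough illustration of what such a target could mean for tuples, here is a sketch that picks the keyword by draft. The `Target` values and the `tupleSchema` function are hypothetical and do not reflect this package's actual options or internals.

```typescript
// Hypothetical target names for this sketch; not the package's actual option values.
type Target = "draft-07" | "draft-2019-09" | "draft-2020-12";

// Minimal JSON Schema shape for this illustration only.
type Schema = { [keyword: string]: unknown };

// Build a fixed-length tuple schema, choosing the positional keyword by target draft.
function tupleSchema(itemSchemas: Schema[], target: Target): Schema {
  const base: Schema = {
    type: "array",
    minItems: itemSchemas.length,
    maxItems: itemSchemas.length,
  };
  // Draft 2020-12 uses "prefixItems"; earlier drafts use the array form of "items".
  return target === "draft-2020-12"
    ? { ...base, prefixItems: itemSchemas }
    : { ...base, items: itemSchemas };
}
```

With this sketch, `tupleSchema([{ type: "string" }, { type: "string" }], "draft-2020-12")` yields the "prefixItems" shape, while the other two targets yield the "items" array shape shown in the issue.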

StefanTerdell added the enhancement (New feature or request) label on Dec 3, 2024
StefanTerdell changed the title from "Tuples Generate Incorrect "items" key." to "Draft 2020 style tuples" on Dec 3, 2024