Normalize (variable) data types #2324

Open
1 of 7 tasks
sergei-maertens opened this issue Nov 8, 2022 · 8 comments
@sergei-maertens
Member

sergei-maertens commented Nov 8, 2022

See also #2251
See also #2305 (using local type information instead of requiring formio definitions would be very useful)

In short: all (input) data must have the correct type for processing in Python. When calculations are done with json-logic, we need to serialize back to JSON types, and all data needs to be properly normalized.

We have essentially the following flow of information:


                                             +------------------+
+------------+      +----------------+       | Logic evaluation |____\ - output data (JSON)
| Input data |____\ | backend        |____\  | (JSON)           |    / - updated variable values (python/JSON)
| (JSON)     |    / | (python types) |    /  +------------------+
+------------+      +----------------+        ^
                                              |
                    +----------------+        |
                    | logic rules    |________+
                    | (JSON)         |
                    +----------------+

The 'problem' with JSON is that it only has a small number of primitives, which have to stand in for richer Python types:

  • string <-> date
  • string <-> datetime
  • string <-> duration
  • string <-> time
  • number <-> int
  • number <-> float
  • number <-> decimal
  • ...

This is further complicated with the formio component types and the notion of single/multiple values (array vs. primitive).
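To make the mapping concrete, here is a minimal sketch of converting a JSON primitive into a richer Python type, given a declared data type. The `CONVERTERS` table, the data type names and the `to_python` helper are hypothetical names for illustration, not existing project code. The `multiple` flag mirrors the single/multiple notion. Durations are left out because the standard library has no ISO 8601 duration parser.

```python
from datetime import date, datetime, time
from decimal import Decimal

# Hypothetical lookup table: declared data type -> converter from a JSON primitive.
CONVERTERS = {
    "string": str,
    "date": date.fromisoformat,
    "datetime": datetime.fromisoformat,
    "time": time.fromisoformat,
    "int": int,
    "float": float,
    # Go via str() so a JSON float does not leak binary rounding noise.
    "decimal": lambda value: Decimal(str(value)),
}


def to_python(value, data_type, multiple=False):
    """Convert a JSON primitive (or a list of them) to a Python value."""
    if value is None:
        return None
    if multiple:
        # Multiple-value components carry a list of primitives.
        return [to_python(item, data_type) for item in value]
    return CONVERTERS[data_type](value)
```

For example, `to_python("2022-11-08", "date")` gives a `date` object, while `to_python(["1", "2"], "int", multiple=True)` gives a list of integers.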

Using the python datatypes

Using only JSON types (complex & primitives) is not sufficient because we cannot do smart operations on them. We must support the following operations (non-exhaustive list):

  • formatting for display using Django's template language filters (e.g. |date, |time, |floatformat )
  • calculations, e.g. timestamp + timedelta
  • private API: introspecting/invoking methods - if we have strategy objects encapsulating data (dataclasses!) we want to use those to the full extent
  • deep data extraction/assignment operations (e.g. using glom or jq to read/write certain data)
  • submission data exports (tablib) also benefits from python types for serialization

Identified boundaries

We can identify "our own code" as the system boundary. This implies:

  • data enters this boundary (via the API endpoints) as JSON; once it crosses the boundary, we transform it into the relevant Python types. For input variables, we can use the FormVariable type information (see SubmissionValueVariable.to_python)
  • data enters this boundary (from the database adapter) as JSON (JSON logic expressions/rules). We cannot infer a more detailed type here: a string may be a date, a datetime or simply literal text.
  • data exits the boundary once we pass it to third party libraries (json-logic)
  • data exits the boundary once we serialize back for API responses
  • data exits the boundary once we store variable values in the database (as JSON)

note: 'as JSON' implies the result of json.loads(...) here, so we have python dicts/lists/strings/ints/floats/NoneType.
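As a small illustration of that note, this is what arrives at the boundary after `json.loads`. The payload keys are made up for the example.

```python
import json

# Hypothetical payload; the keys are illustrative only.
raw = '{"birthdate": "2022-11-08", "amount": 12.5, "count": 3, "tags": ["a"], "note": null}'
data = json.loads(raw)

# Every value is a plain Python primitive or container; the date is still a str.
types = {key: type(value).__name__ for key, value in data.items()}
print(types)
# {'birthdate': 'str', 'amount': 'float', 'count': 'int', 'tags': 'list', 'note': 'NoneType'}
```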

handling JSON logic

The JSON logic library essentially operates on JSON primitives (or complex objects) and we should deal with that. This is particularly challenging when comparing datetimes (or dates and datetimes), for example:

2022-11-08T14:12:00+01:00 is equal to 2022-11-08T13:12:00+00:00 and 2022-11-08T13:12:00Z - but simple string operations will not give the same result.
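The difference can be checked with the standard library. This is a sketch: `parse_datetime` is a hypothetical helper, and the `Z` handling is there because `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onwards.

```python
from datetime import datetime


def parse_datetime(value: str) -> datetime:
    # Map a trailing "Z" to an explicit UTC offset for older Python versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


literals = [
    "2022-11-08T14:12:00+01:00",
    "2022-11-08T13:12:00+00:00",
    "2022-11-08T13:12:00Z",
]
parsed = [parse_datetime(literal) for literal in literals]

# Comparing the raw strings treats the three literals as different values...
strings_equal = all(literal == literals[0] for literal in literals)  # False
# ...while the timezone-aware datetimes all denote the same instant.
instants_equal = all(item == parsed[0] for item in parsed)  # True
```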

We need to normalize JSON logic expressions with the available type information at save time so that runtime is as simple as possible:

  • Investigate normalizing date/datetime literals to be in the same timezone (settings.TIME_ZONE)
  • Investigate date/datetime + timedelta operations
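One possible shape for the save-time normalization is rewriting every datetime literal into a single target timezone, so that equal instants also become equal strings. The sketch below uses UTC as a stand-in for the project's configured timezone, which is an assumption. `normalize_datetime_literal` is a hypothetical helper, and it only applies to strings that are already known to be datetimes.

```python
from datetime import datetime, timedelta, timezone

# Assumption: UTC stands in for the project's configured timezone.
TARGET_TZ = timezone.utc


def normalize_datetime_literal(value: str, tz=TARGET_TZ) -> str:
    """Rewrite an ISO 8601 datetime literal so it is expressed in `tz`."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # A naive literal has no offset, so there is nothing to convert from.
        raise ValueError(f"naive datetime literal cannot be normalized: {value!r}")
    return parsed.astimezone(tz).isoformat()


normalized = {
    normalize_datetime_literal(literal)
    for literal in (
        "2022-11-08T14:12:00+01:00",
        "2022-11-08T13:12:00+00:00",
        "2022-11-08T13:12:00Z",
    )
}
# All three literals collapse to one string: {'2022-11-08T13:12:00+00:00'}
```

After this step, plain string equality works for these literals. Ordering comparisons and date/datetime + timedelta arithmetic still need real datetime objects (or numbers).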

A conclusion may be that we need to pass python-objects (datetimes) down to json logic rather than just serialized versions.

Tasks taken from refinement

  • Prototype the static analysis #2959
  • Prototype the conversion to Python type (step 2)
  • Prototype the preprocessing (step 3)
  • Clean up (track down and burn them) existing workarounds for dealing with datatypes
  • Migrate JSON-logic rules
@sergei-maertens sergei-maertens self-assigned this Nov 8, 2022
@SilviaAmAm
Contributor

Additional thoughts:
At the moment we have:

Before my PR, we were using python data for the json logic. If the PR is merged we would have:

  • JSON data for the logic
  • Python data for injection into the dynamic configuration
  • String representations for the rendering

@SilviaAmAm
Contributor

Additional example: dealing with currency / number component

  • For the calculations we need to have the currency as a float
  • For displaying in the form we need to have the value formatted with the right number of decimals as configured in the component
  • For injecting into the configuration (content component, labels, help texts) we need to have the currency symbol and the decimals, e.g. € 123,00.
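The three representations of the same currency value could look roughly like this. It is a sketch: `format_number` and `format_currency` are hypothetical helpers, the decimal comma is hard-coded and thousands separators are ignored.

```python
def format_number(value: float, decimals: int) -> str:
    # Fixed number of decimals with a decimal comma, no thousands separator.
    return f"{value:.{decimals}f}".replace(".", ",")


def format_currency(value: float, decimals: int, symbol: str = "€") -> str:
    return f"{symbol} {format_number(value, decimals)}"


amount = 123.0                             # calculations: a plain float
doubled = amount * 2                       # 246.0
displayed = format_number(amount, 2)       # form display: "123,00"
injected = format_currency(amount, 2)      # labels/help texts: "€ 123,00"
```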

@sergei-maertens
Member Author

blocked until we get the go-ahead for this

@SilviaAmAm
Contributor

Another issue related to types: #2707

@joeribekker
Contributor

Work on this must not block releasing intermediate versions, so it needs to be done outside of master OR behind a feature flag.

@sergei-maertens
Member Author

Chris mentioned that we can essentially do some type inference and "rewrite"/compile/transpile JSON logic expressions into equivalents that can be evaluated on both backend and frontend.

E.g. a string datetime + relative delta -> convert to unix timestamp (number) + delta (number) & the result can then be compared in terms of primitives.
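A rough sketch of that rewrite, using only primitives once the conversion has happened. `to_timestamp` is a hypothetical helper, and the delta here is a fixed-length `timedelta`. Calendar-based deltas such as "one month" do not map to a fixed number of seconds and would need separate handling.

```python
from datetime import datetime, timedelta


def to_timestamp(value: str) -> float:
    """Convert an ISO 8601 datetime string (with offset) to a Unix timestamp."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).timestamp()


start = to_timestamp("2022-11-08T14:12:00+01:00")   # number
delta = timedelta(days=1).total_seconds()            # number: 86400.0
deadline = to_timestamp("2022-11-09T13:12:00Z")      # number

# Both sides are plain numbers now, so any JSON logic engine can compare them.
is_same_instant = (start + delta) == deadline        # True
```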

@joeribekker
Contributor

joeribekker commented Mar 15, 2023

  • Static analysis according to Chris and Sergei means: infer / record type information of variables and expressions at compile time, i.e. when the form is saved/edited (not when the submitter does things with the form)
  • Silvia asks: If you have a JSON logic rule, where you have a date and a delta, would you create a variable of type integer?
  • This solution takes into account eventually moving logic to the frontend
  • Joeri still wants datatypes to be explicitly set (e.g. to override that a dropdown has integers and not strings). Sergei/Chris are playing with the idea of inferring this.
  • Goal (according to Sergei) is to have a PoC that can make our Python domain clean with a subset of JSON-logic operators and Python/formio types.
  • Example to showcase (functionally): Dropdown with numbers (which are currently strings and cannot be used in calculations)
  • Example to showcase (better code): Remove patch of JsonLogic to work with date/datetime (today static var)
  • Example to showcase (validation): Show error if logic rule is applied to wrong types (solves Setting variables to wrong type can cause TypeError #2707)
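The validation example can be sketched as a small type inference pass over a rule at save time. Everything here is hypothetical (`infer_type`, `LogicTypeError`, the type names). Only the `var` and `+` operators are covered, to keep the sketch short.

```python
NUMERIC = {"int", "float"}


class LogicTypeError(Exception):
    """Raised when a rule applies an operator to incompatible types."""


def infer_type(expr, variable_types):
    """Infer the data type of a JSON logic expression without evaluating it."""
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(expr, bool):
        return "boolean"
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, float):
        return "float"
    if isinstance(expr, str):
        return "string"
    if isinstance(expr, dict) and len(expr) == 1:
        ((operator, args),) = expr.items()
        if operator == "var":
            name = args[0] if isinstance(args, list) else args
            return variable_types[name]
        if operator == "+":
            arg_types = [infer_type(arg, variable_types) for arg in args]
            if any(arg_type not in NUMERIC for arg_type in arg_types):
                raise LogicTypeError(f"'+' expects numbers, got {arg_types}")
            return "float" if "float" in arg_types else "int"
    raise LogicTypeError(f"unsupported expression: {expr!r}")


rule = {"+": [{"var": "dropdown"}, 1]}
```

With `{"dropdown": "int"}` as variable types, `infer_type(rule, ...)` returns `"int"`. With `{"dropdown": "string"}` it raises `LogicTypeError`, which is the kind of error that could be shown to the form designer instead of surfacing as a `TypeError` at runtime.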

To apply logic:

data (1) -> convert (2) -> preprocess (3) -> jsonlogic (4) -> convert (5) -> result (6)

To load data from the database / load data from the user submission:

data (1) -> convert (2) -> result (6)
  1. JSON
  2. Convert JSON to Python data, using datatype metadata
  3. Convert Python to JSON suitable for JsonLogic
  4. Execute JSON-logic with logic rules
  5. Convert JSON result from JSON-logic to Python data, using datatype metadata
  6. Use Python data in templates, component labels, API serializers

Example:

  • 1) "2023-03-15", 2) Date(2023, 3, 15), 3) 12345 (Unix timestamp), 4) Apply rule, like add timedelta, 5) Date(2023, 3, 16), 6) {{ date|format }}
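The six steps can be sketched end to end for the date example. All function names are hypothetical, and `evaluate` is a deliberately tiny stand-in for the JSON-logic library that only understands `{"+": [{"var": name}, number]}`. Dates are anchored at midnight UTC, which is an assumption made to keep the round trip exact.

```python
from datetime import date, datetime, timedelta, timezone


def convert_in(value: str) -> date:
    """Step 2: JSON string -> Python date, using the known data type."""
    return date.fromisoformat(value)


def preprocess(value: date) -> int:
    """Step 3: Python date -> Unix timestamp (midnight UTC) for the logic engine."""
    midnight = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def evaluate(rule: dict, data: dict) -> int:
    """Step 4: stand-in evaluator, NOT the real JSON-logic library."""
    var_ref, number = rule["+"]
    return data[var_ref["var"]] + number


def convert_out(value: int) -> date:
    """Step 5: numeric result -> Python date, using the known data type."""
    return datetime.fromtimestamp(value, tz=timezone.utc).date()


raw = {"start": "2023-03-15"}                                          # step 1
python_data = {key: convert_in(value) for key, value in raw.items()}
logic_data = {key: preprocess(value) for key, value in python_data.items()}
one_day = int(timedelta(days=1).total_seconds())
rule = {"+": [{"var": "start"}, one_day]}
result = convert_out(evaluate(rule, logic_data))                       # date(2023, 3, 16)
label = result.strftime("%d-%m-%Y")                                    # step 6: "16-03-2023"
```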

@joeribekker joeribekker added the epic Large theme and/or meta issue label Aug 14, 2023
@joeribekker
Contributor

Refinement: We should create concrete things to do (= issues), with this ticket as the main epic.

Projects
Status: Todo

4 participants