[SIP-99C] Proposal for model and business validation #25828

john-bodley · 2023-11-01T20:25:32Z

[SIP-99C] Proposal for model and business validation

This SIP is part of the [SIP-99] Proposal for correctly handling business logic series. Specifically, it proposes formalizing where and how model and business validation should occur.

Validation

There are typically four places where validation can occur:

Database: via CHECK, FOREIGN KEY, or UNIQUE constraints, triggers, etc.
SQLAlchemy ORM: via the validates() decorator
DAOs: via the various validate_*() methods
Commands: via the BaseCommand.validate() class method

These are ranked and thus preference should be given—per the "shift left" approach—for validation to occur at the lowest level possible to ensure consistency, adherence of the DRY principle, etc.

It's worth noting that validation at the database level is nuanced given how different databases handle NULL values from a distinct uniqueness perspective. A great example of this is ensuring that a dataset is unique—in terms of the (schema, table_name) tuple—where an explicit UNIQUE constraint is missing, given that (as mentioned),

The reason it does not physically exist is MySQL, PostgreSQL, etc. have a different interpretation of uniqueness when it comes to NULL which is problematic given the schema is optional.

i.e., in MySQL the (NULL, foo) and (NULL, foo) tuples are perceived as being different, yet from a dataset perspective we would perceive these as being the same object.

The main challenges with our current approach are:

We tend to specify/enforce non-business validation logic at the DAO or Command level.
We tend to "ask for permission" as opposed to "ask for forgiveness" which is both inefficient (due to the redundancy of first checking and then actioning) and—when combined with (1)—does not protect against a potential race condition.
The validate() method tends to augment state—likely to address the inefficiency of (2)—which is an anti-pattern.

Examples

The following examples illustrate where the validation occurs at the wrong place:

The FavStar model has no uniqueness constraint on the (user_id, class_name, obj_id) tuple meaning that the uniqueness validation check is wrongfully handled by the ChartDAO.favorited_ids() and DashboardDAO.favorited_ids() methods when invoking ChartDAO.add_favorite(), DashboardDAO.add_favorite(), etc.
The Annotation model has no uniqueness constraint on the short_descr column meaning that the uniqueness validation check is wrongfully handled in the AnnotationDAO.validate_update_uniqueness() method.
The TagDAO.validate_tag_name() method checks for invalid characters yet this can be handled by either a CHECK constraint or SQLAlchemy validates() decorator.
The DeleteChartCommand.validate() method unnecessarily checks to see that there are no associated alerts or reports whereas it should rely solely on the ReportSchedule.chart_id FOREIGN KEY constraint.
There exists pairs of methods, i.e., DashboardDAO.validate_slug_uniqueness() and DashboardDAO.validate_update_slug_uniqueness(), for validating uniqueness used for creating and updating an entity respectively where the later (rightfully) includes an additional filter to exclude itself. This differentiation (and thus added complexity) is not required if there existed a corresponding UNIQUE constraint combined with the “ask for forgiveness” approach.

The following examples illustrate where the validation checks could be violated:

The ChartDAO.add_favorite() method first checks whether the user has favorited said chart and if not creates the favorite as opposed to simply trying to favorite the chart and relying on a UNIQUE constraint to block (at the database level) the creation if it already exists.

The following examples illustrate where validation augments state:

The UpdateDatabaseCommand.validate() method assigns the self._model attribute after fetching the model so it can be leveraged during the run phase.
The CreateChartCommand.validate() method assigns the self._model attribute and augments the self._properities attribute.
The ChartWarmUpCacheCommand.validate() method reassigns the self._chart_or_id attribute if the variable is an ID. Post validation the attribute name is misleading given it only pertains to a Chart object—hence this confusing statement.

The following examples illustrate inconsistencies between where validation occurs:

The ChartDataCommand.run() method does not invoke the sibling ChartDataCommand.validate() method, yet the CreateChartCommand.run() method does invoke the sibling CreateChartCommand.validate() method, i.e., due to inconsistencies, it is not apparent whether the Command run() method will invoke the corresponding validate() method.

Proposed Change

Validation should occur at the lowest possible (left most) level, given not all database operations originate at the DAO or Command level.
The DAO and Command should primarily be used to validate business logic, e.g., ownership, access controls, etc.
We should adopt a "ask for forgiveness" as opposed to "ask for permission" type approach by leveraging try/except blocks.
The BaseCommand.validate() method should be removed in favor of in-place validation, i.e., leveraging the “validate as you go” approach during the run() phase.

Examples

The following code illustrates how the model and DAO should be constructed, where it is evident that the model validation is handled at the database level and the DAO uses the “ask for forgiveness” approach*.

class FavStar(Model):
    __tablename__ = "favstar"
    __table_args__ = (UniqueConstraint("user_id", "class_name", "obj_id"),)

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("ab_user.id"))
    class_name = Column(String(50))
    obj_id = Column(Integer)

class ChartDAO(BaseDAO[Slice]):
    def favorite(chart: Slice) -> None:
        try:
            db.session.begin_nested():
                db.session.add(
                    FavStar(
                        class_name=FavStarClassName.CHART,
                        obj_id=chart.id,
                        user_id=get_user_id(),
                    )
                )
        except IntegrityError:
            pass  # Already favorited.

The try/except approach is based on the fact that the code block will mostly succeed (which is the case in this example given one can really only favorite an object which is unfavorited). Additionally this approach removes the redundant check of first checking whether said item has already been favorited by the user.

The following code illustrates the use of a CHECK constraint in conjunction with the “ask for forgiveness” approach.

class Tag(Model):
    __tablename__ = "tag"
    __table_args__ = (CheckConstraint(func.regexp_like(name, "[^:,]")),)

    id = Column(Integer, primary_key=True)
    name = Column(String(250), unique=True)
    type = Column(Enum(TagType))
    description = Column(Text)

TagDAO(BaseDAO[Tag]):
    def create(object_type: ObjectType, object_id: int, name: str) -> None:
        try:
            with db.session.begin_nested():
                db.session.add(
                    TaggedObject(
                        object_id=object_id,
                        object_type=object_type,
                        tag=TagDAO.get_by_name(name, TagType.custom)
                    )
                )
        except IntegrityError as ex:
            raise DAOCreateFailedError() from ex

New or Changed Public Interfaces

None.

New Dependencies

None.

Migration Plan and Compatibility

A wiki page will be created to specify the rules which should be adhered to.
A series of PRs to address existing rule violations.
New PRs (where possible) should adhere to the rules.

Rejected Alternatives

None.

The text was updated successfully, but these errors were encountered:

rusackas · 2023-12-06T17:10:55Z

[VOTE] approved! 🎉

john-bodley mentioned this issue Nov 1, 2023

[SIP-99] Proposal for correctly handling business logic #25048

Closed

john-bodley changed the title ~~SIP-99C] Proposal for model and business validation~~ [SIP-99C] Proposal for model and business validation Nov 2, 2023

john-bodley added the sip Superset Improvement Proposal label Nov 2, 2023

john-bodley mentioned this issue Nov 17, 2023

fix: change the validation logic for python_date_format #25510

Merged

9 tasks

rusackas added this to SIPs (Superset Improvement Proposals) Dec 6, 2023

rusackas moved this to [RESULT][VOTE] Approved in SIPs (Superset Improvement Proposals) Dec 6, 2023

rusackas closed this as completed Dec 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SIP-99C] Proposal for model and business validation #25828

[SIP-99C] Proposal for model and business validation #25828

john-bodley commented Nov 1, 2023 •

edited

Loading

rusackas commented Dec 6, 2023

[SIP-99C] Proposal for model and business validation #25828

[SIP-99C] Proposal for model and business validation #25828

Comments

john-bodley commented Nov 1, 2023 • edited Loading