
refactor(catalog): Rework how CatalogOps update the DatabaseSchema #25642

Merged: 1 commit into main, Dec 11, 2024

Conversation

jacksonrnewhouse (Contributor):

In adding additional CatalogOps for Processing Engine Plugins and Triggers, I had to work through new_if_updated_batch(). The intention is reasonable: process the CatalogOps and, if they do update the DatabaseSchema, return a new copy that can then be inserted. However, this is done by mutating a bunch of state inside the method as the ops are iterated, which ends up relying on brittle assumptions, mainly that a single CatalogBatch won't contain many different operations, for instance multiple deletes, or a delete followed by a table creation.

I pulled the logic out into a couple of traits, one for DB-level changes and another for table updates. Among other things, this will make #25639 cleaner.
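
For a rough idea of the shape this takes: each op applies itself to a Cow<'a, DatabaseSchema>, and the caller only gets a new copy back if some op actually changed something. The sketch below is illustrative rather than the PR's actual code; the simplified DatabaseSchema struct, the SoftDeleteDatabase op, the new_if_updated helper, and the String error type are all stand-ins, and only the Cow-based signature mirrors the diff excerpts quoted later in the review.

use std::borrow::Cow;

// Stand-in for the real DatabaseSchema; only what the sketch needs.
#[derive(Clone, Debug)]
struct DatabaseSchema {
    name: String,
    deleted: bool,
}

// Hypothetical DB-level trait: each CatalogOp variant that touches the
// database as a whole applies itself to a Cow of the schema. Returning
// Cow::Borrowed means "nothing changed"; Cow::Owned means "a new copy exists".
trait UpdateDatabaseSchema {
    fn update_schema<'a>(
        &self,
        schema: Cow<'a, DatabaseSchema>,
    ) -> Result<Cow<'a, DatabaseSchema>, String>;
}

// Example op: soft-delete the database by marking it and renaming it.
struct SoftDeleteDatabase;

impl UpdateDatabaseSchema for SoftDeleteDatabase {
    fn update_schema<'a>(
        &self,
        mut schema: Cow<'a, DatabaseSchema>,
    ) -> Result<Cow<'a, DatabaseSchema>, String> {
        // to_mut() clones the borrowed schema on the first mutation only.
        let s = schema.to_mut();
        s.deleted = true;
        s.name = format!("{}-deleted", s.name);
        Ok(schema)
    }
}

// Apply every op in a batch; return Some(new schema) only if an op changed it.
fn new_if_updated(
    db_schema: &DatabaseSchema,
    ops: &[Box<dyn UpdateDatabaseSchema>],
) -> Result<Option<DatabaseSchema>, String> {
    let mut schema = Cow::Borrowed(db_schema);
    for op in ops {
        schema = op.update_schema(schema)?;
    }
    Ok(match schema {
        Cow::Owned(updated) => Some(updated),
        Cow::Borrowed(_) => None,
    })
}

The second trait, for table-level updates, presumably plays the same game over individual TableDefinition entries rather than the whole schema.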

@praveen-influx (Contributor) left a comment:

Really nice refactor; the method is easier to follow now. There is a lint error to fix, but I'll approve it anyway.

let mut deleted_table_defn = None;
let mut schema_name = Arc::clone(&db_schema.name);

let mut schema = Cow::Borrowed(db_schema);
Contributor:

This is a neat idea!
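
For readers less familiar with Cow, the trick in isolation: start from a cheap borrow, clone only when a mutation is actually requested, and check at the end whether anything was copied. A tiny standalone example, using String in place of DatabaseSchema:

use std::borrow::Cow;

fn main() {
    let original = String::from("db_schema");

    // Start with a cheap borrow; nothing has been cloned yet.
    let mut schema: Cow<'_, String> = Cow::Borrowed(&original);

    let needs_change = false;
    if needs_change {
        // Only here would the underlying String be cloned, exactly once.
        schema.to_mut().push_str("-updated");
    }

    // Still Cow::Borrowed, so the caller knows no new copy needs inserting.
    assert!(matches!(schema, Cow::Borrowed(_)));
}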

&self,
mut schema: Cow<'a, DatabaseSchema>,
) -> Result<Cow<'a, DatabaseSchema>> {
// TODO: check if we want to re-delete an already deleted DB. That is current behavior.
Contributor:

I think it's possible at the moment if you know the new db name (the name changes when we delete it).

Member:

Yeah, when we delete a DB, its name gets changed. That way you can immediately create a new DB with the same name and it will pass uniqueness validation. If you then issue another delete, you're deleting the new DB, not the old one.
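
For reference, the rename-on-delete scheme being described could look roughly like the sketch below. The helper name, the chrono dependency, and the assumption that deletion_time is a nanosecond timestamp are all illustrative; only the "<name>-YYYYMMDDTHHMMSS" suffix shape is taken from the test expectation quoted further down ("boo-19700101T000000").

use chrono::{TimeZone, Utc};

// Illustrative helper: build the tombstone name for a soft-deleted database
// or table so that the original name immediately becomes available again.
fn deleted_name(name: &str, deletion_time_ns: i64) -> String {
    let ts = Utc.timestamp_nanos(deletion_time_ns);
    format!("{}-{}", name, ts.format("%Y%m%dT%H%M%S"))
}

fn main() {
    // A deletion_time of 0 (the Unix epoch) matches the expectation in the
    // removed test quoted below: "boo" becomes "boo-19700101T000000".
    assert_eq!(deleted_name("boo", 0), "boo-19700101T000000");
}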

@hiltontj (Contributor) left a comment:

Agreed with Praveen, this is a great refactor - much cleaner. I had one concern though, noted in the comment below.

Comment on lines 1624 to 1669
#[test]
fn test_check_and_mark_table_as_deleted() {
    let db_id = DbId::new();
    let deleted_table_id = TableId::new();
    let table_name = Arc::from("boo");
    let deleted_table_defn = DeleteTableDefinition {
        database_id: db_id,
        database_name: Arc::from("foo"),
        table_id: deleted_table_id,
        table_name: Arc::clone(&table_name),
        deletion_time: 0,
    };
    let mut map = IndexMap::new();
    let table_defn = Arc::new(
        TableDefinition::new(
            deleted_table_id,
            Arc::clone(&table_name),
            vec![
                (ColumnId::from(0), "tag_1".into(), InfluxColumnType::Tag),
                (ColumnId::from(1), "tag_2".into(), InfluxColumnType::Tag),
                (ColumnId::from(2), "tag_3".into(), InfluxColumnType::Tag),
                (
                    ColumnId::from(3),
                    "time".into(),
                    InfluxColumnType::Timestamp,
                ),
                (
                    ColumnId::from(4),
                    "field".into(),
                    InfluxColumnType::Field(InfluxFieldType::String),
                ),
            ],
            vec![ColumnId::from(0), ColumnId::from(1), ColumnId::from(2)],
        )
        .unwrap(),
    );
    map.insert(deleted_table_id, table_defn);
    let mut updated_or_new_tables = SerdeVecMap::from(map);

    check_and_mark_table_as_deleted(Some(&deleted_table_defn), &mut updated_or_new_tables);

    let deleted_table = updated_or_new_tables.get(&deleted_table_id).unwrap();
    assert_eq!(&*deleted_table.table_name, "boo-19700101T000000");
    assert!(deleted_table.deleted);
    assert!(!deleted_table.series_key.is_empty());
}
Contributor:

There should be a new test added in this one's place, unless you had a justification for removing it, which isn't clear to me.

I think you can replicate this with the new APIs you've added. That is my only recommendation for this PR.

jacksonrnewhouse (author):

I initially deleted it because it was testing a function I'd removed, check_and_mark_table_as_deleted. I'll add an equivalent test that validates the CatalogOp does the correct thing.

jacksonrnewhouse (author):

Updated the commit

Contributor:

Looks good!

@jacksonrnewhouse changed the title from "Rework how CatalogOps update the DatabaseSchema" to "refactor(catalog): Rework how CatalogOps update the DatabaseSchema" on Dec 11, 2024
@jacksonrnewhouse force-pushed the db_updates branch 2 times, most recently from 31a7edf to 465885b on December 11, 2024 at 18:06
@hiltontj (Contributor):

Ah - looks like a test in CI is failing @jacksonrnewhouse - probably not hard to fix?

@jacksonrnewhouse merged commit 9f541b7 into main on Dec 11, 2024
12 of 13 checks passed