
epic: Cortex Updater can migrate data structure changes #1184


Closed · 1 task
dan-menlo opened this issue Sep 11, 2024 · 12 comments
@dan-menlo
Contributor

dan-menlo commented Sep 11, 2024

Goal

Scope

Discussion

  • Adapter for DB layer
  • Down migration - additional considerations?
    • Recovery (restore column vs restore data)

Success Criteria:

  • Successful upgrade and db migration from 1.0.2 to 1.0.X (when /model/sources is merged)

Tasklist

  • ?
@dan-menlo dan-menlo added this to Menlo Sep 11, 2024
@dan-menlo dan-menlo converted this from a draft issue Sep 11, 2024
@dan-menlo dan-menlo changed the title epic: Cortex Migrations epic: Cortex Migrations for Data Structures Sep 11, 2024
@dan-menlo dan-menlo changed the title epic: Cortex Migrations for Data Structures epic: Cortex Updater can migrate data structure changes Sep 29, 2024
@dan-menlo dan-menlo moved this to Investigating in Menlo Oct 13, 2024
@vansangpfiev
Contributor

vansangpfiev commented Nov 12, 2024

Some thoughts on data migration:

  1. Database Migration Strategy
  • Versioning: Create a table within the SQLite database to track the current schema version. This table will have a single row containing the version number:
CREATE TABLE IF NOT EXISTS schema_version (
    version INTEGER NOT NULL
);
  • Migration Scripts:
    • We can create SQL scripts for each migration version. These scripts can be executed after running cortex update. Not sure if we can run SQL scripts from a post-install script cc: @hiento09
    • Pros: directly writing SQL scripts for migrations is straightforward and gives us full control over the SQL commands executed during migrations.
    • Cons: we must manually organize, track, and apply scripts, which can become cumbersome as the number of migrations grows.
  • Migration Framework:
    • Develop a C++ component that reads the current schema version from the database and applies the necessary migration scripts to reach the target version (see the sketch after this list).
    • Pros: provides a consistent approach to managing migrations across different environments.
    • Cons: not easy to implement.
  • Data Preservation Techniques
    • Data Migration: Before dropping any columns or tables, migrate the existing data to a safe location (e.g., a backup table or external storage) so it can be restored if needed.
    • Column Preservation: Instead of dropping columns, consider marking them as deprecated or renaming them.
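
A minimal sketch of what such a C++ migration component could look like, using the schema_version table above and the SQLite C API. The per-version SQL statements and table/column names in kMigrations are placeholders, not the actual Cortex schema:

#include <sqlite3.h>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical per-version migration SQL; entry v upgrades the schema from version v to v + 1.
static const std::vector<std::string> kMigrations = {
    /* 0 -> 1 */ "ALTER TABLE models ADD COLUMN source TEXT;",
    /* 1 -> 2 */ "CREATE TABLE IF NOT EXISTS engines (name TEXT PRIMARY KEY);",
};

// Read the single row of schema_version; treat a missing row as version 0.
int GetSchemaVersion(sqlite3* db) {
    int version = 0;
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, "SELECT version FROM schema_version", -1, &stmt, nullptr) == SQLITE_OK &&
        sqlite3_step(stmt) == SQLITE_ROW) {
        version = sqlite3_column_int(stmt, 0);
    }
    sqlite3_finalize(stmt);
    return version;
}

// Apply every migration between the stored version and the target version.
bool MigrateTo(sqlite3* db, int target) {
    for (int v = GetSchemaVersion(db); v < target; ++v) {
        if (sqlite3_exec(db, kMigrations[v].c_str(), nullptr, nullptr, nullptr) != SQLITE_OK) {
            std::cerr << "Migration to schema v" << v + 1 << " failed" << std::endl;
            return false;
        }
        // Assumes the schema_version row was inserted when the table was created.
        std::string bump = "UPDATE schema_version SET version = " + std::to_string(v + 1);
        sqlite3_exec(db, bump.c_str(), nullptr, nullptr, nullptr);
    }
    return true;
}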

Question: should we use migration scripts or implement a C++ database migration component?
  2. Cortex Data Structure
Similar to the database migration, we can use a script to rearrange data into the new data structure.
Note: We store the YAML file paths in the database, so if the model structure changes, we also need to update the models database.

What do you think? @dan-homebrew @louis-jan @janhq/cortex

@dan-menlo
Contributor Author

dan-menlo commented Nov 13, 2024

@vansangpfiev Overall, I think we should do what allows us to ship something quickly this Friday.

Up/Down Migrations

/migrations
    1.0.2.cpp
    1.0.3.cpp
// 1.0.2.cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

void up() {
    // SQL for this version (placeholder; the real statements would live here or in a
    // bundled script). Cortex's database is SQLite, so in practice this would be run
    // via the sqlite3 CLI rather than psql.
    std::string sqlScript = "...";

    // Run the SQL script
    std::string runSqlCmd = "psql -c \"" + sqlScript + "\"";
    int runSqlResult = system(runSqlCmd.c_str());

    if (runSqlResult == 0) {
        std::cout << "SQL script executed successfully." << std::endl;
    } else {
        std::cerr << "Error: Failed to execute the SQL script." << std::endl;
    }

    // Define the directories
    std::string rootDir = "/";
    std::string targetDir = "/path/to/target/directory";

    // Define the file to be moved
    std::string fileToMove = rootDir + ".cortexrc";

    // Check if the file exists
    std::ifstream file(fileToMove);
    if (!file) {
        std::cerr << "Error: The file .cortexrc does not exist in the root directory." << std::endl;
        return;
    }

    // Move the file to the target directory
    std::string moveCmd = "mv " + fileToMove + " " + targetDir;
    int moveResult = system(moveCmd.c_str());

    if (moveResult == 0) {
        std::cout << "The .cortexrc file has been moved to the target directory." << std::endl;
    } else {
        std::cerr << "Error: Failed to move the .cortexrc file." << std::endl;
    }
}

void down() {
    ...
}

Please push back on this @janhq/cortex - as you can tell from the RoR reference, my approach may be very outdated.

Database migration strategy

  1. I think SQL migration scripts make the most sense
  2. Data Preservation: I think marking columns as deprecated is ok (for now)

Cortex Data Structure

  1. See up/down methods above

@vansangpfiev vansangpfiev moved this from Investigating to In Progress in Menlo Nov 13, 2024
@louis-menlo
Contributor

We likely need to design for both local execution and cloud deployment. Constraining migrations to code would block running them from the deployment layer, which would likely have to wait for an instance to spin up and pass a health check, which is inefficient. There could also be a scenario where we want to share the database instance across services like cortex.py and cortex.cpp.

However, one advantage is that they can embed cortex.cpp as a standalone binary without having to worry about other services, which is particularly useful when embedding into Jan.

I’d prefer built-in migrations as suggested above for cortex.cpp to ensure embedding in Jan won’t cause issues.

@vansangpfiev
Contributor

vansangpfiev commented Nov 13, 2024

We will use SQL scripts for up/down migrations and try to minimize changes between versions to ensure smooth transitions. For complex down migrations, we will create a PR to address any issues.

  • Script Execution: Migrations will run when the server starts for the first time, to support the Jan use case (see the sketch after the folder structure below)
  • Folder Structure Changes: Ideally managed via scripts, but may require code adjustments.
  • Version Management: We will only guarantee up/down migrations between versions.

Structure

cortexcpp
    |__ models/
    |__ engines/
    |__ migrations/
           |__ v1/
           |    |__ v1.sql
           |    |__ data_structure_v1.sh
           |__ v2/
                |__ v2.sql
                |__ data_structure_v2.sh
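
As a rough sketch of the script-execution step at server start, assuming the folder layout above, that scripts are applied with the sqlite3 CLI and a shell, and hypothetical paths and helper names:

#include <algorithm>
#include <cstdlib>
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Run every migration folder newer than current_version, in ascending order.
// Assumes directories named v1, v2, ... each containing vN.sql and data_structure_vN.sh.
void RunMigrations(const fs::path& migrations_dir, const fs::path& db_path, int current_version) {
    std::vector<int> versions;
    for (const auto& entry : fs::directory_iterator(migrations_dir)) {
        if (entry.is_directory())
            versions.push_back(std::stoi(entry.path().filename().string().substr(1)));  // strip 'v'
    }
    std::sort(versions.begin(), versions.end());

    for (int v : versions) {
        if (v <= current_version) continue;
        fs::path dir = migrations_dir / ("v" + std::to_string(v));
        // Schema changes first, then data/folder-structure rearrangement.
        std::string sql_cmd = "sqlite3 " + db_path.string() + " < " +
                              (dir / ("v" + std::to_string(v) + ".sql")).string();
        std::string sh_cmd = "sh " + (dir / ("data_structure_v" + std::to_string(v) + ".sh")).string();
        if (std::system(sql_cmd.c_str()) != 0 || std::system(sh_cmd.c_str()) != 0) {
            std::cerr << "Migration v" << v << " failed, stopping." << std::endl;
            return;
        }
    }
}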
          

Question: How do we ship the migration scripts?

  • Option 1: host them on a static server and have cortex pull the scripts
  • Option 2: pack the migration scripts into the installer
  • Option 3: option 1 + option 2; if users only download the cortex-server binary, they can fall back to option 1

cc: @dan-homebrew @hiento09 @janhq/cortex

@dan-menlo
Contributor Author

@vansangpfiev Our discussion:

  • We will have a pure C++ updater
    • Remote .sql and .sh scripts may represent a security/backdoor risk in the future
    • We will trade off flexibility (e.g. hotfixing) for verifiability
  • We will align the data structure version with the Cortex version
    • Simplifies things
    • Most migrations will be empty (i.e. empty up/down); a sketch follows the tree below
cortexcpp
    |__ models/
    |__ engines/
    |__ migrations/
           |__ v1.0.2.cpp
           |__ v1.0.3.cpp
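
A rough sketch of how such compiled-in migrations could be organized, assuming hypothetical up/down function pairs registered per version (v1.0.2.cpp and v1.0.3.cpp would each contribute one entry); names and bodies are placeholders, not the agreed implementation:

#include <functional>
#include <iostream>
#include <map>
#include <string>

struct Migration {
    std::function<void()> up;
    std::function<void()> down;
};

// One entry per Cortex version; empty lambdas mean "no data change in this release".
static const std::map<std::string, Migration> kMigrations = {
    {"1.0.2", {[] { std::cout << "1.0.2 up\n"; }, [] { std::cout << "1.0.2 down\n"; }}},
    {"1.0.3", {[] { /* empty up */ }, [] { /* empty down */ }}},
};

// Apply every up() between the installed data version and the target Cortex version.
// (Comparison relies on lexicographic ordering, which holds for these version strings.)
void Upgrade(const std::string& from, const std::string& to) {
    for (const auto& [version, m] : kMigrations) {
        if (version > from && version <= to)
            m.up();
    }
}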

@louis-menlo
Contributor

Oops. This wouldn’t work for nightly migration support or switching between commits.

@dan-menlo
Contributor Author

dan-menlo commented Nov 14, 2024

Hmm, can we solve this via running migrations manually for Nightly?

Nightly

  • Otherwise, I think this is an edge case and we should trigger migrations manually for nightly
    • For nightly, we trigger migrations manually
    • e.g. cortex migrations up 1.0.3, cortex migrations down 1.0.3
  • A more complicated approach is to run down and up in migrator between nightly versions
    • e.g. Updater for nightly runs down and then up
    • I don't like this, too complicated

Switching between commits

  • Same as Nightly

@vansangpfiev
Contributor

vansangpfiev commented Nov 18, 2024

@dan-homebrew IMO, it is necessary to distinguish between the data version and the software version. Since all migration actions occur within the software, the application may not know its current version during modifications (it can be stable, beta, nightly, etc.).

To ensure clarity and consistency, we can increment the data version each time a change is made. In the event of a conflict, we will revert to the previous version of the data. This separation allows for more effective management of both software and data changes, enhancing stability and reliability throughout the migration process.
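
To make that separation concrete, a small sketch of what it could look like, with a data (schema) version embedded in the source and stored in the database, bumped independently of the app version string; the constant and function names are hypothetical:

#include <iostream>

// Bumped only when the data layout changes, independent of the Cortex release
// version (which may be a stable, beta, or nightly build string).
constexpr int kSchemaVersion = 3;  // placeholder value

// stored_version would be read from the schema_version table at startup.
void CheckDataVersion(int stored_version) {
    if (stored_version < kSchemaVersion) {
        std::cout << "Upgrading data from v" << stored_version
                  << " to v" << kSchemaVersion << std::endl;
        // run up migrations here
    } else if (stored_version > kSchemaVersion) {
        // Data was written by a newer build; revert the data to the previous
        // version (or refuse to start) rather than guessing.
        std::cout << "Data version is newer than this binary supports" << std::endl;
    }
}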

@vansangpfiev vansangpfiev mentioned this issue Nov 18, 2024
3 tasks
@luke-nguyen990
Contributor

luke-nguyen990 commented Nov 18, 2024

@dan-homebrew IMO, It is necessary to distinguish between the data version and the software version. Since all migration actions occur within the software, the application may not know its current version during modifications (can be stable, beta, nightly, etc).

To ensure clarity and consistency, we can increment the data version each time a change is made. In the event of a conflict, we will revert to the previous version of the data. This separation allows for more effective management of both software and data changes, enhancing stability and reliability throughout the migration process.

  • I agree with you @vansangpfiev; software/release/app versioning is a separate aspect, more related to business/product management than engineering
  • Then, we have source code versioning, a.k.a. Git commits. In today's practices, the relationship between source code versioning and data/migration versioning is one-to-one, as migration files or entity/model files are part of the code.
  • If we separate migration/data versioning from source code versioning (Git commits), we still need a mechanism to map them back. The migration/data for a specific Git commit should be deterministic—meaning there should be one and only one possibility. While it's possible for multiple commits to share the same migration/data version, the reverse should not be true.

TL;DR: If we roll back to a specific commit abc123, we must know exactly how the data schema looks in a deterministic way.

@vansangpfiev vansangpfiev moved this from In Review to Review + QA in Menlo Nov 27, 2024
@gabrielle-ong gabrielle-ong added this to the v1.0.4 milestone Nov 27, 2024
@gabrielle-ong gabrielle-ong modified the milestones: v1.0.4, v1.0.5 Dec 6, 2024
@vansangpfiev
Contributor

Expected result:

  • Migration happens at server start
  • No change to the model structure for v1.0.5
  • Cortex creates new tables if they do not exist: models, engines, hardware (sketched below)
  • All data is migrated to the new database schema version.
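
For reference, the idempotent table creation at server start could look roughly like this; the column lists are placeholders for illustration, not the actual Cortex schema:

#include <sqlite3.h>
#include <iostream>

// Create the tables if they do not exist yet (column definitions are hypothetical).
void EnsureTables(sqlite3* db) {
    const char* ddl =
        "CREATE TABLE IF NOT EXISTS models   (model_id TEXT PRIMARY KEY, path_to_model_yaml TEXT);"
        "CREATE TABLE IF NOT EXISTS engines  (engine_name TEXT PRIMARY KEY, version TEXT);"
        "CREATE TABLE IF NOT EXISTS hardware (uuid TEXT PRIMARY KEY, type TEXT);";
    char* err = nullptr;
    if (sqlite3_exec(db, ddl, nullptr, nullptr, &err) != SQLITE_OK) {
        std::cerr << "Table creation failed: " << (err ? err : "") << std::endl;
        sqlite3_free(err);
    }
}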

@TC117

TC117 commented Dec 17, 2024

  • Migration runs on server start (screenshots attached)
  • The hardwares table is removed on update

@TC117 TC117 moved this from QA to Completed in Menlo Dec 26, 2024
@dan-menlo
Contributor Author

dan-menlo commented Dec 29, 2024

@TC117 Please close this ticket once QA has signed off and it is completed

@github-project-automation github-project-automation bot moved this from Completed to QA in Menlo Dec 29, 2024
@TC117 TC117 moved this from QA to Completed in Menlo Dec 30, 2024