Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented SQL query plan explainer, added necessary indexes #728

Open
wants to merge 30 commits into
base: next
Choose a base branch
from

Conversation

polydez
Copy link
Contributor

@polydez polydez commented Mar 6, 2025

Resolves #712 and #57

In this PR we implemented SQL query plan explainer which writes to log something like a query plan below for each running query:

>> EXPLAIN QUERY PLAN -- Selects new notes matching the tags and account IDs search criteria.
SELECT
    block_num,
    batch_index,
    note_index,
    note_id,
    note_type,
    sender,
    tag,
    aux,
    execution_hint,
    merkle_path
FROM
    notes
WHERE
    -- find the next block which contains at least one note with a matching tag or sender
    block_num = (
        SELECT
            block_num
        FROM
            notes
        WHERE
            (tag IN rarray(NULL) OR sender IN rarray(NULL)) AND
            block_num > 0
        ORDER BY
            block_num ASC
    LIMIT 1) AND
    -- filter the block's notes and return only the ones matching the requested tags or senders
    (tag IN rarray(NULL) OR sender IN rarray(NULL))


QUERY PLAN
├── SEARCH notes USING INDEX sqlite_autoindex_notes_1 (block_num=?)
├── SCALAR SUBQUERY 3
│   ├── MULTI-INDEX OR
│   │   ├── INDEX 1
│   │   │   ├── LIST SUBQUERY 1
│   │   │   │   └── SCAN rarray VIRTUAL TABLE INDEX 1:
│   │   │   └── SEARCH notes USING INDEX idx_notes_tag (tag=? AND block_num>?)
│   │   └── INDEX 2
│   │       ├── LIST SUBQUERY 2
│   │       │   └── SCAN rarray VIRTUAL TABLE INDEX 1:
│   │       └── SEARCH notes USING INDEX idx_notes_sender (sender=? AND block_num>?)
│   └── USE TEMP B-TREE FOR ORDER BY
├── LIST SUBQUERY 4
│   └── SCAN rarray VIRTUAL TABLE INDEX 1:
└── LIST SUBQUERY 5
    └── SCAN rarray VIRTUAL TABLE INDEX 1:

This works for both tests and running node, if feature explain-query-plans is enabled.

In order to achieve this, we implemented our own SQLite connection pool which constructs special wrapper for each connection, returning special wrapper for each cached prepared statement. The latter is able to explain a query once it is being run. The same was done for transactions as well. This solution also helped us to get rid of deadpool-sqlite (all necessary functionality from it we have now). And configuring of connection now done in SQLite connection pool instead of hooking connection creation, it looks more elegant to me.

Also implemented some simple checking for problems in query plans during test run (test will fail, if any problem found in any query plan for queries run in tests).

We also added all necessary indices and foreign keys.

@polydez polydez marked this pull request as ready for review March 6, 2025 14:36
polydez added 2 commits March 10, 2025 22:01
# Conflicts:
#	Cargo.lock
#	crates/store/src/db/migrations.rs
#	crates/store/src/db/mod.rs
#	crates/store/src/db/settings.rs
#	crates/store/src/db/sql/mod.rs
#	crates/store/src/db/sql/utils.rs
#	crates/store/src/db/tests.rs
#	crates/store/src/server/api.rs
Comment on lines 44 to 64
#[cfg(not(feature = "explain-query-plans"))]
impl Connection {
#[inline]
pub fn prepare_cached(&self, sql: &str) -> rusqlite::Result<rusqlite::CachedStatement<'_>> {
self.inner.prepare_cached(sql)
}

#[inline]
pub fn execute<P: rusqlite::Params>(&self, sql: &str, params: P) -> rusqlite::Result<usize> {
self.inner.execute(sql, params)
}

#[inline]
pub fn query_row<T, P, F>(&self, sql: &str, params: P, f: F) -> rusqlite::Result<T>
where
P: rusqlite::Params,
F: FnOnce(&rusqlite::Row<'_>) -> rusqlite::Result<T>,
{
self.inner.query_row(sql, params, f)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need these and can't just use a transaction?

iiuc these are really just convenience features which automatically create and commit a transaction under the hood. I don't think we should be hiding transactions like this - its already happened that we used two connections in the same query instead of using an atomic transaction for both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to keep the same interface, but your point makes sense, I will use only transactions, thank you!

Comment on lines 32 to 36
#[ctor::ctor]
fn initialize() {
miden_node_utils::logging::setup_tracing(miden_node_utils::logging::OpenTelemetry::Disabled)
.expect("Failed to setup logging for tests");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used?

I think we also have a proc-macro for this already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for initializing logging in tests. If it was implemented somewhere else, please let me know, I haven't found it by myself.

Copy link
Contributor

@Mirko-von-Leipzig Mirko-von-Leipzig Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our test-macro crate I think. But you would need to annotate with

#[miden_node_test_macro::enable_logging]

I'm not sure how much of a fan I am of the macro tbh so maybe yours is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I compared both solutions and must admit that even though mine is simpler and more convenient to use, it doesn't create span for each test as #[miden_node_test_macro::enable_logging] does. And our enable_logging ruins syntax coloring in my IDE which I would try to avoid. We could use https://github.com/d-e-s-o/test-log instead, but it's also not perfect.

self.statement.execute(params)
}

fn explain<P: Params>(&mut self, params: P) -> rusqlite::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what this is doing?

Could we instead just do a fn Transaction::query_plan_contains_scan() -> bool function, and then in prepare_cached etc we use it conditionally and do an assert?

pub fn prepare_cached(&self, sql: &str) -> rusqlite::Result<rusqlite::Statement<'_>> {
    // Ensure our sql statements use indices and don't perform a row scan.
    #[cfg(test)
    assert!(self.query_plan_contains_scan().not());

    self.inner().prepare_cached(sql)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method generates query plan, formats it as tree and writes it to log along with an SQL query. We could simplify this by analyzing just rows of query plan returned by EXPLAIN QUERY PLAN ... request, without formatting, but this would make logs less human-friendly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a before/after example?

Copy link
Contributor Author

@polydez polydez Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on formatting. Current implementation generates tree like:

QUERY PLAN
├── SEARCH notes USING INDEX sqlite_autoindex_notes_1 (block_num=?)
├── SCALAR SUBQUERY 3
│   ├── MULTI-INDEX OR
│   │   ├── INDEX 1
│   │   │   ├── LIST SUBQUERY 1
│   │   │   │   └── SCAN rarray VIRTUAL TABLE INDEX 1:
│   │   │   └── SEARCH notes USING INDEX idx_notes_tag (tag=? AND block_num>?)
│   │   └── INDEX 2
│   │       ├── LIST SUBQUERY 2
│   │       │   └── SCAN rarray VIRTUAL TABLE INDEX 1:
│   │       └── SEARCH notes USING INDEX idx_notes_sender (sender=? AND block_num>?)
│   └── USE TEMP B-TREE FOR ORDER BY
├── LIST SUBQUERY 4
│   └── SCAN rarray VIRTUAL TABLE INDEX 1:
└── LIST SUBQUERY 5
    └── SCAN rarray VIRTUAL TABLE INDEX 1:

If we get rid of formatting, it would look like:

┌────┬────────┬─────────┬─────────────────────────────────────────────────────────────────┐
│ id ┆ parent ┆ notused ┆ detail                                                          │
╞════╪════════╪═════════╪═════════════════════════════════════════════════════════════════╡
│ 3  ┆ 0      ┆ 0       ┆ SEARCH notes USING INDEX sqlite_autoindex_notes_1 (block_num=?) │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 6  ┆ 0      ┆ 0       ┆ SCALAR SUBQUERY 3                                               │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 16 ┆ 6      ┆ 0       ┆ SEARCH notes USING INDEX sqlite_autoindex_notes_1 (block_num>?) │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 25 ┆ 6      ┆ 0       ┆ LIST SUBQUERY 1                                                 │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 27 ┆ 25     ┆ 0       ┆ SCAN rarray VIRTUAL TABLE INDEX 1:                              │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 45 ┆ 6      ┆ 0       ┆ LIST SUBQUERY 2                                                 │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 47 ┆ 45     ┆ 0       ┆ SCAN rarray VIRTUAL TABLE INDEX 1:                              │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 73 ┆ 0      ┆ 0       ┆ LIST SUBQUERY 4                                                 │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 75 ┆ 73     ┆ 0       ┆ SCAN rarray VIRTUAL TABLE INDEX 1:                              │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 93 ┆ 0      ┆ 0       ┆ LIST SUBQUERY 5                                                 │
├╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 95 ┆ 93     ┆ 0       ┆ SCAN rarray VIRTUAL TABLE INDEX 1:                              │
└────┴────────┴─────────┴─────────────────────────────────────────────────────────────────┘

To be clear, the latter example still uses formatting by external crate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see okay so its mimicking the format of sqlite3 by building a query tree and formatting it.

Its a bit confusing because the Tree type is recursive and we're assuming that id is strictly incrementing I think? Would be good to either comment this extensively, or find a simpler abstraction.

I know its not optimal (but this doesn't need to be), but what about something like:

// Only required so we can implement `Display` for it.
// Also a bad name but meh.
struct Element(u64, String)

struct QueryTree(Tree<Element>)

impl QueryTree {
    fn new() -> Self { Self(Tree::new(Element(0, "QUERY PLAN")) }

    fn insert(&mut self, parent: u64, element: Element) {
        // Recursively search through all elements in tree already
        // and insert new element as a child there.
    }
}

Copy link
Contributor Author

@polydez polydez Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it's not that bad to rely on the current row ordering (we will explain this in comments), since it's not a critical feature, but needed only for debugging, and such implementation is simple and fast. And the official SQLIite CLI relies on this ordering as well, it's natural in terms of query plan generation and unlikely to be changed in future.

Executor::Transaction(transaction) => transaction.prepare(&explain_sql)?,
};

let mut rows = explain_stmt.query(params)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this require actual params? With sqlite3 you just do EXPLAIN QUERY PLAN <statment> without any params.

Would be nice to not need params at all.

Copy link
Contributor Author

@polydez polydez Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, this is not supported by ordinary query (we have to fill all params or get "invalid parameter count" error otherwise), but it might work for raw_query, I will check it, thanks.

Comment on lines 46 to 55
enum Executor<'conn> {
Connection(&'conn rusqlite::Connection),
Transaction(&'conn rusqlite::Transaction<'conn>),
}

pub struct CachedStatementWithQueryPlan<'conn> {
executor: Executor<'conn>,
sql: Box<str>,
statement: CachedStatement<'conn>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to get rid of this if we only support Transaction right?

I'm thinking it just becomes Transaction::check_query_plan(&self, sql: &str) which is selectively enabled by #[cfg(test)].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants