Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Enhancement for information_schema."tables" #134

Closed
Alain-Barrette opened this issue Feb 17, 2023 · 10 comments · Fixed by #147
Closed

Performance Enhancement for information_schema."tables" #134

Alain-Barrette opened this issue Feb 17, 2023 · 10 comments · Fixed by #147
Assignees
Labels
enhancement New feature or request

Comments

@Alain-Barrette
Copy link

When we run the code to implement our dremio objects, the following code get run

  select (case when position('.' in table_schema) > 0
          then substring(table_schema, 1, position('.' in table_schema) - 1)
          else table_schema
      end) as table_catalog
      ,table_name
      ,(case when position('.' in table_schema) > 0
          then substring(table_schema, position('.' in table_schema) + 1)
          else 'no_schema'
      end) as table_schema
      ,lower(table_type) as table_type
  from information_schema."tables"
  **where ilike(table_schema, 'Preparation.B2BV.Curated')**
  and table_type <> 'system_table'

The issue is with the where ilike(table_schema, 'Preparation.B2BV.Curated'). By applying a fonction (iLike) on the column, the performance is really bad. As you can see in the image,
image

Replacing this ilike with an equality would resolve this issue

where table_schema = 'Preparation.B2BV.Curated'

This would cause dremio to be lightning fast as it should.

Thanks

@ArgusLi
Copy link
Contributor

ArgusLi commented Feb 17, 2023

Thanks for raising the issue! We will be looking into it and get back to you if we have any questions. Thanks once again!

@fabrice-etanchaud
Copy link

Hi @Alain-Barrette, @ArgusLi,
I totally agree with you, it's far from efficient. But I ended up writing that query that way because there are tricky things concerning character case in dremio.

dremio is case insensitive with metadata (table, column names), but case sensitive with data (equality, like...). And on top of that dbt could issue warnings like : "When searching for a relation, dbt found an approximate match. Instead of guessing which relation to use, dbt will move on. Please delete database.schema.model, or rename it to be less ambiguous.".

So my two cents advice, be very careful with catalog macros. I will be very pleased to help, if needed.

Best regards,
Fabrice

@ArgusLi ArgusLi self-assigned this Feb 28, 2023
@ArgusLi ArgusLi added the enhancement New feature or request label Feb 28, 2023
@ArgusLi
Copy link
Contributor

ArgusLi commented Feb 28, 2023

@Alain-Barrette I've started work on this new change. If you look at the associated branch, I have created a new variable dremio:exact_search_enabled which defaults to false. However, if you add the following to your project's dbt_project.yml file, it will enable the variable.

vars:
    dremio:exact_search_enabled: True

Enabling the variable then changes the ilike to a basic equality for the dremio__list_relations_without_caching macro when reflections are disabled (which is the default). I've done some basic testing but have not done a full test suite or investigated the consequences of enabling this variable.

Please feel free to try it out and let me know if there are any bugs or anything else that needs to be changed.

Note: Using this branch requires dbt-core 1.4.1.

@Alain-Barrette
Copy link
Author

Thank you @ArgusLi , I have created a ticket on my side to have the team validate this. We will keep you informed of the result.

@alexcotecbq
Copy link

@ArgusLi : not working when installing dbt-core 1.4.1 with dbt-dremio 1.3.2 on the CLI with Python 3.10. Could you update PyPi?

@ArgusLi
Copy link
Contributor

ArgusLi commented Mar 1, 2023

@alexcotecbq In order to test this new change, you'll have to clone the repo, switch to the branch 134-performance-enhancement-for-information_schematables and install the local dbt-dremio. Doing dbt --version should then show that dbt-dremio and dbt-core are both at 1.4.1.

We only want to update pypi for each release, and this change will most likely be put into the 1.4 release, which should happen pretty soon, most likely within the month. There still needs to be more thorough testing before this change can be merged into the 1.4 release though.

@alexcotecbq
Copy link

alexcotecbq commented Mar 1, 2023

@ArgusLi : Okay this is working.

@ArgusLi
Copy link
Contributor

ArgusLi commented Mar 1, 2023

@fabrice-etanchaud Could you please give more information or examples on the situations where ilike should be used? That way we can come up with tests and solutions that could make the connector more optimised.

@alexcotecbq
Copy link

alexcotecbq commented Mar 1, 2023

@Alain-Barrette @ArgusLi

Performance without the change (149 seconds)

(dags) cotea@POR771:~/code/dbt-dags$ dbt compile --target qa
20:25:15  Running with dbt=1.3.2
20:25:15  Unable to do partial parsing because config vars, config profile, or config target have changed
20:25:25  Found 969 models, 20 tests, 0 snapshots, 0 analyses, 331 macros, 0 operations, 0 seed files, 331 sources, 0 exposures, 0 metrics
20:25:25  
20:27:33  Concurrency: 100 threads (target='qa')
20:27:33  
20:27:44  Done.

Performance with the new change (27 seconds):

(dags) cotea@POR771:~/code/dbt-dags$ dbt compile --target qa
18:56:27  Running with dbt=1.4.1
18:56:29  Found 969 models, 20 tests, 0 snapshots, 0 analyses, 334 macros, 0 operations, 0 seed files, 331 sources, 0 exposures, 0 metrics
18:56:29  
18:56:42  Concurrency: 100 threads (target='qa')
18:56:42  
18:56:54  Done.

Another stat from a colleague: Total Runtime of one sql query from 30s. to 222.30μs. 

@Alain-Barrette
Copy link
Author

@ArgusLi , This is much much faster now.
image

@alexcotecbq , Thanks a lot for your implication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging a pull request may close this issue.

4 participants