Exposure Parsing (#38)

* Add .idea to gitignore
* merge changes from read-from-artifacts fork integrating with latest version of dbt-metabase having improved yaml parsing. large formatting update to conform to black.
* fix grammatical error and update readme to say semantic type
* fix error in field lookup key setting and continue to support either special or semantic until ready to deprecate
* internally use semantic but support meta refs to special which will be properly used if found in metabase api response.
* safe importing of dependent modules in addition to bugfixes, logical synchronization, and formatting
* added more verbose comments tracking intent as well as ensuring nodes without depends on/test_metadata dont throw
* setting fields to PK type is worthy of info logging
* handle cross schema foreign keys with explicit metabase.fk_ref with expected input format to be defined in readme. otherwise automatic resolution of target field using relation test will prepend target run schema which should be fine in 95% of use cases. cases outside that can use manifest.json parsing or set fk_ref in yml.
* added debug log for validating parsed schema/fields for fk targets and ensured support for schema agnostic fk targets (schema resolved from manifest.json)
* all necessary args added for cli usage. some basic assertions added. sync will now only fail hard if timeout is explicit, otherwise default behaviour if --sync is true is to attempt sync for 30 seconds and proceed with aligning what can be aligned successfully. more formatting and a few comments for clarity of intent. also added option to pass custom cert bundle to verify.
* corrected typo in exclude var and added support for verbosity flag. translated some args to store action based.
* updates to handle aliases when ran via dbt_path (yml parser) ubiquitously with seamless function alongside primary artifact parser (manifest.json).
* support for aliased fk refs parsed via parse_ref. backoff of parse ref regex to permit catching last arg of either ref or source always being the target table. if pointing to an alias, we are collecting aliases during yml parsing to be passed to metabased client and translated to metabase table names as needed. this functionality should be unnoticed by the user but provide more resiliency as well as more user friendly outcome whilst still being very specific in our logging.
* a blank dict attribute is okay here since we know our refs are clean the getter calls will just return none
* correct referencing of semantic type and not special
* arg var renamed back to --database to ensure seamless compatibility/use with prev version.
* improve typing ensuring python 3.6+ compatibility, expanduser called on path strings allowing relative paths for --dbt_path or --dbt_manifest_path simplified
* explicit Any type hint for consistency
* Use more generic and compatible typing operators. parse ref now guarantees us `schema.table` format. This allows us to guarantee the incorrectly formatted ref (which should be `schema.table`) is originating from yml. Log the warning and infer correct schema for our users using target schema which covers the 90% use case.
* use mapping for column and express last bit of typing for bool args
* re added clarification on semantic types being formerly known as special types
* Following best practices, declare default args for lists as none setting as empty list in function call
* docstrings to reflect default arg is None
* updates to ensure use of warning instead of warn on logger, added __init__ to nonemptystr class, cleaned some logging calls to use lazy interpolation
* simplified typing and ensure type tests pass
* more typing updates
* added mutablemapping types
* explicit type for reader as being either manifest or yml
* update tests to run on updated structure
* point to right module for manifest reader
* use same keyword arg as other reader for uniformity
* format and fix req dev pointer
* tests include assertions and dummy data from jaffle shop dbt classic example project
* include a compiled manifest.json for test
* ensure schema cast to upper for uniformity.
* added last upper() call in folder parser.
* move tests to fixtures folder and remove dummy CI tests
* tests point to correct sample proj path
* maintain expected default for sync and https, normalize help str text to be uniform (no ending periods)
* make verify typing more specific
* union typing with commas, not pipe
* remove double import
* let timeout type check as optional setting val if None in func
* remove unnecessary os path join
* remove unnecessary "" defaults for gets compared to literals
* use f strings for readability
* removed two end of line periods for uniformity in console help msg
* fixed unintentional changes to readme
* move ternary to separate statement for cleanliness
* update dev req txt to name referred to in setup
* fix misnamed test requirements to one specified in CI pipeline
* update integrated str checks to use class model attributes
* fix ref to requirements-test
* use class attribute to populate variable so we can set to None or str for metabase api
* added some nice debug logging messages to any model passed over in manifest reader for visibility
* include sources propagation as provided in yml parser with the benefit of implicit schema handling provided by manifest.
* since we share so many args, make primary export function agnostic differentiating executed client method with getattr for the most DRY/clean approach
* initial fully functional auto exposure export func built out
* added some model fields to support differentiating between source / ref and another for constructing a dbt ref during model parsing for use downstream
* build dbt ref/source jinja during model parsing as well as logging model source
* placeholder for dataset query to be base64 encoded in exploratory link generation with compiled sql pulled from manifest parser
* add assertions for commands, uniform dbt docs url arg
* add recursions for extract card capable of extracting saved questions used as tables for sources or joins robustly.
* uniformity in argument dbt_docs -> dbt_docs_url
* much more detailed documentation provided for metabase analytics.
* added new model_key and ref to unit tests
* some typing updates
* schema arg is unused for now
* preparing to update the docs
* add missing argvars for exposure parsing, default output args in working dir, fixed ref to dashboard created at key
* remove default schema arg so we propagate down into manifest reader schema agnostically if unspecified
* remove alpha sorting so we preserve our constructed format
* give folder reader a default schema since it should have a separate invocation per schema
* properly allow schema agnostic parsing if schema is not passed since we correctly resolve all schemas in manifest parsing
* typing fix for schema arg
* remove unused type
* populating mock api for current and future testing
* pre ran jaffle shop with artifacts
* moved mock api to api sub dir to play very nicely with mock client api method and added baseline yml for testing
* exposure extractor will return its output dict
* test suite utils for dbt-metabase for recreating any test files
* unit tests for exposure parser complete
* fixed test to conform to freshly ran manifest on fresh db, old manifest was created via compile in fixture dir
* ensure arg to the fresh db name used when compiling artifacts
* leave the artifact for user introspection
* added lookup artifacts generated from api calls
* util func to rebuild lookup artifacts
* added unit test for metadata lookups
* remove func call at bottom
* updates to include exposure parsing documentation usage
* mock api clean up of user field
* simplified personal collection check to use personal id key
* removed dbt docs param from exposure parse, and small grammatical fixes
* removed unnecessary mb prefix
* overhaul to dbt-metabase cli interface and programmatic interface using config objects
* removed extra arg, fixed missing t in command, and tested CLI in production
* use get instead of key because root does not have personal id in keys
* use dbt_models as arg to preserve models as main func name synonymously with cli
* dropped empty init to lint properly, we have an init in parent and access these via namespace
* pass bool to personal collection arg default, re-add init
* add default to exclude arg
* relative imports make more sense here namespace wise and pass linter
* show defaults for command specific optional args
* last bits of updates to readme to match updated programmatic interface
* fix type in our
* more accurate help string
* rename args prefixing dbt_ and add schema arg back to cli
* name iterable in such a way not conflict with var name in same scope
* use DbtConfig as class name, docs_url when in dbt only context, and use context for open files in tests
* all config args now match cli dest args exactly
* updated readme to reflect args in programmatic section
* update args to configs to not have prefix, unneeded
* added const resource version to exposure func, made output path default "."
* simplify nested if with a force cast so we can call startswith regardless of int/str
* updated dumper to indent sequences as dbt documents show
* refactor for readability improvements and easier flow tracing
* final polishing improvements, documentation, and large readability improvements

Co-authored-by: falador_wiz1 <alex@source.co>
Co-authored-by: Mike Gouline <1960272+gouline@users.noreply.github.com>
gouline · Aug 1, 2021 · a1fa5a2
1 parent a14c6c5
commit a1fa5a2

Showing 109 changed files with 2,434 additions and 262 deletions.
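
The README change in this commit documents the shape of an extracted exposure entry (name, description with a metadata footer, URL, owner, and `depends_on` refs). As a rough illustration of how such an entry could be assembled, here is a minimal sketch. It is not the code from this commit: the function name `build_exposure`, the card dictionary keys, and the fixed `type` and `maturity` values are assumptions chosen to reproduce the README sample.

```python
from typing import Any, Dict, List


def build_exposure(card: Dict[str, Any], base_url: str, models: List[str]) -> Dict[str, Any]:
    """Assemble one dbt exposure entry from a card-like dict.

    Hypothetical helper for illustration; the keys read from `card`
    are assumptions, not a documented Metabase or dbt-metabase schema.
    """
    # Description layout mirrors the sample in the README diff:
    # visualization heading, free-text description, then a metadata footer.
    description = (
        f"### Visualization: {card['display'].title()}\n\n"
        f"{card.get('description', '')}\n\n"
        "#### Metadata\n\n"
        f"Metabase Id: __{card['id']}__\n\n"
        f"Created On: __{card['created_at']}__"
    )
    return {
        # Spaces become underscores so the name matches the README sample
        "name": card["name"].replace(" ", "_"),
        "description": description,
        "type": "analysis",
        "url": f"{base_url}/card/{card['id']}",
        "maturity": "medium",
        "owner": {
            "name": card["creator"]["name"],
            "email": card["creator"]["email"],
        },
        # Each referenced model is rendered as a dbt ref() string
        "depends_on": [f"ref('{model}')" for model in models],
    }


# Sample input shaped after the values shown in the README example
sample_card = {
    "id": 8,
    "name": "Number of orders over time",
    "description": "A line chart depicting how order volume changes over time",
    "display": "line",
    "created_at": "2021-07-21T08:01:38.016244Z",
    "creator": {"name": "Indiana Jones", "email": "user@example.com"},
}

exposure = build_exposure(sample_card, "http://your.metabase.com", ["orders"])
```

Serializing a list of such dictionaries under a top-level `exposures` key with a YAML library would give a file of the kind the README describes; the actual extraction logic lives in the changed source files, not in this sketch.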
diff --git a/README.rst b/README.rst
@@ -43,6 +43,8 @@ The main features provided by dbt-metabase are:
 * Propagating columns description to Metabase
 * Propagating columns semantic types and visibility types to Metabase through the use of dbt meta fields
 * Propagating table relationships represented as dbt ``relationships`` column tests
+* Extracting dbt model exposures from Metabase and generating YAML files to be included and versioned with your dbt deployment
+
 
 Usage
 =====
@@ -110,6 +112,63 @@ Check your Metabase instance by going into Settings > Admin > Data Model, you
 will notice that ``ID`` in ``STG_USERS`` is now marked as "Entity Key" and
 ``GROUP_ID`` is marked as "Foreign Key" pointing to ``ID`` in ``STG_GROUPS``.
 
+Exposure Extraction
+-------------------
+
+dbt-metabase can also extract exposures from Metabase. The invocation is almost identical to
+the model export, with additional arguments for the output name and location. `dbt exposures`_ describe
+how dbt models are consumed in BI, which closes the loop between ELT, modelling, and consumption.
+
+
+.. _`dbt exposures`: https://docs.getdbt.com/docs/building-a-dbt-project/exposures
+
+
+.. code-block:: shell
+
+    dbt-metabase exposures \
+        --dbt_manifest_path ./target/manifest.json \
+        --dbt_database business \
+        --metabase_host metabase.example.com \
+        --metabase_user user@example.com \
+        --metabase_password Password123 \
+        --metabase_database business \
+        --output_path ./models/ \
+        --output_name metabase_exposures
+
+Once execution completes, the output file ``metabase_exposures.yml`` lists
+every Metabase exposure along with its description, creator name and email,
+a link to the exposure, and any native SQL propagated from Metabase.
+
+.. code-block:: yaml
+
+    exposures:
+      - name: Number_of_orders_over_time
+        description: '
+          ### Visualization: Line
+      
+          A line chart depicting how order volume changes over time
+      
+          #### Metadata
+      
+          Metabase Id: __8__
+      
+          Created On: __2021-07-21T08:01:38.016244Z__'
+        type: analysis
+        url: http://your.metabase.com/card/8
+        maturity: medium
+        owner:
+          name: Indiana Jones
+          email: user@example.com
+        depends_on:
+          - ref('orders')
+
+Questions that are native queries have their SQL propagated to a code block in the exposure's
+description for full visibility. This YAML, like the rest of your dbt project, can be committed to source
+control to track how exposures change over time. In a production environment, you can trigger
+``dbt docs generate`` after ``dbt-metabase exposures`` (or run the exposure extraction job on a
+regular schedule) to keep a dbt docs site synchronized with BI. This makes ``dbt docs`` a useful
+tool for inspecting the data model from source to consumption with no repeated manual input.
+
 Reading your dbt project
 ------------------------
 
@@ -260,26 +319,51 @@ line. But if you prefer to call it from your code, here's how to do it:
 
     import dbtmetabase
 
-    dbtmetabase.export(
-      dbt_database=dbt_database,
-      dbt_manifest_path=dbt_manifest_path,
-      dbt_path=dbt_path,
+    # Collect args then build configs #
+    ###################################
+
+    metabase_config = MetabaseConfig(
+        host=metabase_host,
+        user=metabase_user,
+        password=metabase_password,
+        use_http=metabase_use_http,
+        verify=metabase_verify,
+        database=metabase_database,
+        sync_skip=metabase_sync_skip,
+        sync_timeout=metabase_sync_timeout,
+    )
+
+    dbt_config = DbtConfig(
+        path=dbt_path,
+        manifest_path=dbt_manifest_path,
+        database=dbt_database,
+        schema=dbt_schema,
+        schema_excludes=dbt_schema_excludes,
+        includes=dbt_includes,
+        excludes=dbt_excludes,
+    )
+
+    # Propagate models to Metabase #
+    ################################
+
+    dbtmetabase.models(
+      metabase_config=metabase_config,
+      dbt_config=dbt_config,
       dbt_docs_url=dbt_docs,
-      metabase_database=metabase_database,
-      metabase_host=metabase_host,
-      metabase_user=metabase_user,
-      metabase_password=metabase_password,
-      metabase_use_http=metabase_use_http,
-      metabase_verify=metabase_verify,
-      metabase_sync_skip=metabase_sync_skip,
-      metabase_sync_timeout=metabase_sync_timeout,
-      schema=schema,
-      schema_excludes=schema_excludes,
-      includes=includes,
-      excludes=excludes,
-      include_tags=include_tags,
+      dbt_include_tags=include_tags,
+    )
+
+    # Parse exposures from Metabase into dbt yml #
+    ##############################################
+
+    dbtmetabase.exposures(
+      metabase_config=metabase_config,
+      dbt_config=dbt_config,
+      output_path=output_path,
+      output_name=output_name,
     )
 
+
 Code of Conduct
 ===============