PowerBI #28

Open
gruckion opened this issue Apr 25, 2019 · 26 comments
gruckion commented Apr 25, 2019

Notes from Getting Started with Power BI

Power Query Documentation
Power Query M Reference

Introduction

Power BI is a cloud-based service for combining a variety of data sources into a data model that answers analytical questions. You can then build dashboards to see information at a glance, or create interactive reports. Power BI also supports natural language queries and Quick Insights so you can quickly start exploring your data, and it has apps for iOS and Android.

Using Datasets and Data Models

You can create a Data Model, which can be stored in a *.PBIX or *.XLSX file. The other option is to take raw data, which can be a *.XLSX, *.XLSM or *.CSV file. Uploading the data into Power BI produces a Dataset.

A report can only include data from a single Dataset; if you need to combine data you should create a Data Model to store the various pieces. If you upload a *.XLSX or *.CSV into Power BI, you cannot later change its structure.

Uploading Datasets

Import or connect to data, for example by selecting a Local File. When you upload a Dataset or Data Model into Power BI you can use the Quick Insights feature to begin exploring the data. This produces up to eight insights.

Using the navigation menu, select Datasets > ... > Quick Insights:

  1. Majority
  2. Category Outliers
  3. Time Series Outliers
  4. Trends
  5. Seasonality
  6. Steady Share
  7. Correlation
  8. Change Points

Saving Quick Insights

You can pin an insight to a dashboard using the Pin visual button.

Working with reports

Reports can have multiple tabs, with charts that can be filtered by a slicer and tables that can be ordered by a specific column. You can drill down into subsets of data within a chart.

You can include maps that represent positional data, and your charts and tables can be filtered by selecting a data point on the map. Views within a report can be pinned to the dashboard, and you can also pin live pages. A live page displays the default view of the report, so ensure you save your edits for them to appear on the live page.

Dashboard

A dashboard consists of multiple tiles; these can be reports, web content, images, an SSRS report, a text box or a video. You can also query a dashboard using natural language. Reports are restricted to items from a single data model, while dashboards can include items from multiple reports.


Getting data into Power BI

  • Databases
  • Files / OneDrive
  • Web or OData
  • Azure
  • Online services (SaaS)

Database import

You can always import data from databases; this copies your data into Power BI.

Direct Query

With Direct Query you can access data directly from the database, but this is limited to specific database types: SQL Server, Oracle, Teradata and SAP HANA. To use it you will need the Enterprise Gateway installed.

All model tables must come from the same database, so you cannot mix and match data from various databases when using Direct Query mode. You also cannot use calculated columns or tables in the model, and you cannot change data types.

This is still a good solution when datasets are large or you want real-time data access.

Connect Live

This is unique to Analysis Services: you can connect to data directly when working with tabular models, without making any changes to the model, but you cannot combine the tabular data with other sources. This also requires an Enterprise Gateway to be installed.

You can explore multidimensional data in the Power BI Desktop application, but these reports cannot be published.

Access and Excel Connection Errors

Power BI uses the provider from the redistributable Microsoft Access database engine, which will need to be installed; the link can be found here.

Loading data into Excel

Data can be loaded from the Data tab: select the file to import and then select Load To. From here you can select Only Create Connection and Add this data to the Data Model.


Loading data in Power BI desktop

Select Get Data > More and select the type of source to connect to. Open the file and click Load or Edit. Edit brings up the Query Editor, which allows you to make changes to your data. Click Close and Apply once finished.


gruckion commented Apr 26, 2019

Web and Other Sources

These can be files from a URL, a SharePoint list, a Hadoop file, Active Directory, an OData feed, ODBC for databases that don't have their own connector, a Microsoft Exchange server or even R scripts.

Web Url

Select get data from web and paste in the url https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_place_rel_10.txt

Azure data

Sources such as SQL Database or SQL Data Warehouse can be used with Direct Query or import. There is support for Blob storage, Table storage, Document DB, Data Lake, Azure Marketplace, HDInsight and Spark, as well as SaaS sources including SharePoint Online, Dynamics CRM, Google Analytics, Facebook, Salesforce objects and reports, and Exchange Online.


Manual entry

In Power BI Desktop click the Enter Data button, type the values manually, then click Load. In Excel, ensure you format the data as a table, then click Data > From Table. Confirm and name the table and load the data into the data model.


Power BI guides

https://docs.microsoft.com/en-us/power-bi/

Shaping data

The steps you specify (for example, rename a table, transform a data type, or delete columns) are recorded by Power Query Editor. Those steps are then carried out each time the query connects to the data source, so that the data is always shaped the way you specify.
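As a sketch of what these recorded steps look like in the Advanced Editor (the file path and column names here are hypothetical), each step is a named expression built on the previous one:

```
let
    Source = Csv.Document(File.Contents("C:\data\sales.csv"), [Delimiter = ",", Encoding = 65001]),
    // Each applied step takes the previous step's output as its input
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"Amount", type number}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type", {"InternalId"})
in
    #"Removed Columns"
```

These steps re-run every time the query refreshes against the source.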


gruckion commented Apr 29, 2019

Power BI and Harvest

Harvest API documentation
Guide to connecting Harvest to Power BI

Why invest in a data warehouse

A data warehouse is a database optimized for non-technical users. It has a focus on fewer joins and should never be a production database. Having a central data warehouse is better than having multiple copies of the data (v1, v1.1, v1.2, etc.).

Using a data warehouse reduces multiple sources of truth and copy-paste errors. It is also easy to develop QA suites that check that the data stays accurate.

Data warehouses are optimized for analytical and metric-focused workloads. They separate analytical and operational tasks, which limits the risk of locking out other users as updates occur.

Data warehouse for Power BI / Harvest syncing

Using Azure SQL Data Warehouse with Harvest

Currently it appears you will need a script that pulls from the Harvest API and runs an ETL workflow. The resulting data warehouse can then easily be connected to Power BI. To support incremental updates, audit information should be stored in the data warehouse so the script can update from the last run date.

Stitch

Existing solutions allow you to connect the Harvest API to a data warehouse, which can then be made available to Power BI, although this has a large cost attached. The point here is that this is someone's marketed solution for connecting web APIs to Power BI.


Azure Application Insights and Power BI

It is possible to connect the Azure Application Insights web API to Power BI; this is discussed here.

gruckion commented Apr 29, 2019

PowerBI requirements Mindmeister

  • KPI > Key performance indicator
  • WP > Work packages
  • CFS >
  • PMO >
  • YTD > Year to date


TODO


Json nested data

When importing complex JSON data with multiple nested layers, the import may render a row as List. A discussion can be found here and here.


Solutions are here, here and here.

We need to expand the lists and records; this can be done through the UI.
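A rough sketch of what those UI steps generate (the file path and field names are assumptions, not taken from the actual data):

```
let
    Source = Json.Document(File.Contents("C:\data\response.json")),
    // Wrap the nested list so each record appears as a single table cell
    AsTable = Table.FromList(Source[time_entries], Splitter.SplitByNothing(), {"Entry"}),
    // A cell showing "List" is expanded with Table.ExpandListColumn;
    // a cell showing "Record" with Table.ExpandRecordColumn
    Expanded = Table.ExpandRecordColumn(AsTable, "Entry", {"id", "spent_date", "hours"})
in
    Expanded
```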

Json data directly from API end point

To load the data from Harvest we need to specify the Authorization, Harvest-Account-Id and User-Agent headers.

curl "https://api.harvestapp.com/v2/time_entries" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Harvest-Account-Id: $ACCOUNT_ID" \
  -H "User-Agent: MyApp (yourname@example.com)"

To achieve this in Microsoft Power Query M we use the following

let
    Source = Json.Document(
        Web.Contents(
            "https://api.harvestapp.com/v2/time_entries",
            [
                Headers=[
                    Authorization="Bearer ACCESS_TOKEN",
                    // Hyphenated header names need quoted identifiers in M
                    #"Harvest-Account-Id"="ACCOUNT_ID",
                    #"User-Agent"="New Orbit Power BI Time Entries"
                ]
            ]
        )
    )
in
    Source
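The Json.Document call returns a record; assuming Harvest's response shape (the entries sitting under a time_entries field), the query could continue along these lines to produce a table:

```
let
    Source = Json.Document(
        Web.Contents(
            "https://api.harvestapp.com/v2/time_entries",
            [
                Headers=[
                    Authorization="Bearer ACCESS_TOKEN",
                    #"Harvest-Account-Id"="ACCOUNT_ID",
                    #"User-Agent"="New Orbit Power BI Time Entries"
                ]
            ]
        )
    ),
    // The response wraps the entries in a record; drill into the list
    Entries = Source[time_entries],
    // Each entry is a record, so the list can become a table directly
    AsTable = Table.FromRecords(Entries)
in
    AsTable
```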


DAX Power BI

Data Analysis Expressions (DAX) is a library of functions and operators that can be combined to build formulas and expressions in Power BI Desktop, Azure Analysis Services, SQL Server Analysis Services, and Power Pivot in Excel.


gruckion commented May 1, 2019

Figuring out how to programmatically create columns for each day in the year

Adding all the days in the year as columns in Power BI manually is madness; it is better to automate this or have a rolling date range, especially as we add more data year on year. We do not want to generate a massive report with an endless list of columns. We would prefer a parameter to specify the year you wish to fetch data for. This will then use Absence Manager data to calculate the hours available for each employee.

Things to figure out

  • How to use parameters and access them within a Power Query
  • How to generate a list of dates in DDD dd/mm/yyyy format
  • How to programmatically generate columns from a list
  • Ensure the API endpoint is called for the date range specified by the parameters
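For the date list, a minimal sketch using List.Dates and Date.ToText (the format string is my assumption of what DDD dd/mm/yyyy maps to):

```
let
    StartDate = #date(2019, 1, 1),
    // One entry per day of 2019
    Dates = List.Dates(StartDate, 365, #duration(1, 0, 0, 0)),
    // "ddd dd/MM/yyyy" produces labels like "Tue 01/01/2019"
    Labels = List.Transform(Dates, each Date.ToText(_, "ddd dd/MM/yyyy"))
in
    Labels
```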

Power BI do while during web crawl

The YouTube video shows how to use List.Generate with a lambda function along with try otherwise.

Example of a do-while loop which calls a user-defined function GetData(int):

= List.Generate( () => 
    [Result = try GetData(1) otherwise null, Page = 1],
    each [Result] <> null,
    each [Result = try GetData([Page]+1) otherwise null, Page = [Page]+1],
    each [Result])


gruckion commented May 1, 2019

Power BI and Power Query Parameters and Functions

This YouTube video shows how to convert a query into a function. The applied steps transforming the data can then be applied to other files listed in a folder, where the file (binary) is the input parameter. This is done by adding a new custom column that applies the newly created function to each file's binary contents.
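A hand-written sketch of the same idea (the folder path and shaping steps are hypothetical): define a function that takes a binary, then apply it to each file's Content in a custom column:

```
let
    // Hypothetical function holding the shared shaping steps
    TransformFile = (fileContents as binary) as table =>
        let
            Parsed = Csv.Document(fileContents, [Delimiter = ",", Encoding = 65001]),
            Promoted = Table.PromoteHeaders(Parsed, [PromoteAllScalars = true])
        in
            Promoted,
    Files = Folder.Files("C:\data\reports"),
    // Invoke the function on each file's binary contents
    WithTables = Table.AddColumn(Files, "Data", each TransformFile([Content]))
in
    WithTables
```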


gruckion commented May 1, 2019

Retrieve, Transform, and Combine Data Using Power Query

Following a course on Pluralsight.

Excel Business Intelligence Tools

  1. Power Query discovers data within / outside of the organisation to bring into Excel.
  2. Power Pivot is a modelling technology that helps us build relationships and produce new columns. It works with large amounts of data (billions of rows) in an Excel data model.
  3. Power View is for visualising and displaying the data in Power Pivot; views can be interactive.

Power BI

Power BI uses Power Query, as Excel does, for querying data.

Power Query

  1. Data retrieval from the usual local and online data sets. Power Query allows us to reliably connect to and read this data.
  2. Query editor allows us to process the data: filter out data, sort, and modify. These steps are applied on top of the retrieved data.
  3. Combine: we may want to then combine data or use existing data to make new columns. We can also combine data from existing Power Queries to produce new data.
  4. Share: using the data catalog we can share this data. If we need to change a Power Query, say because a public data structure changes, then we only need to update the Power Query, not each Excel sheet that uses the query.


gruckion commented May 1, 2019

Working with Parameters and Functions in Power Query / Excel and Power BI

  • Parameters are used in the Query Editor
  • Parameters are values that can be used by multiple queries
  • They support multiple data types
  • Two ways to create them: click Manage Parameters, or create a query that returns a single value and then right click > Create Parameter

Parameters are a special type of query that returns a single value; normal queries fetch data and then return a data table.

Uses of parameters

  • As global variables to make development easier.
  • Filter data in tables by the value of a parameter
  • Change the data source for your query, for example changing between dev/test/production
  • Supply values for M calculations defined in Custom Columns

Query parameters can be backed by a list or by a query. For a list you manually specify the values; if you want the possible values to be supplied automatically, create a query that returns a column of the possible values and convert that table into a list.

Changing the value of a parameter that is used within a query causes the data to be reloaded. This is unlike a normal filter or slicer, which just modifies the already loaded data.
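A sketch of turning a query's column into such a list (the table and column names are assumptions; in Power BI Desktop the source would differ):

```
let
    // Any query that ends in a list can back a parameter's suggested values
    Source = Excel.CurrentWorkbook(){[Name = "Years"]}[Content],
    // Drill down to one column and de-duplicate it
    YearsList = List.Distinct(Table.Column(Source, "Year"))
in
    YearsList
```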

Creating parameters within a query

When loading data, during the Source step you can specify the file path to load, or you can optionally create a new parameter from the drop-down menu via New Parameter. If you have not yet made any parameters you will not see this option; you will need to ensure that Always allow is enabled in the View menu.

Creating a parameter template file

In a template file you will first be prompted to set the parameters. It might be a good idea to have a KPI template saved where the user specifies the year when creating a report from that template. However, this of course means the data will not roll over year on year.

Generating functions

  • Traditionally, functions were only available within the M language; you could only use functions when writing code.
  • Now you can turn queries that contain parameters into functions.
  • A function is a data type in the M language, hence you can have a query that returns a function.
  • Functions can be used to share the same logic across multiple transformations.

Create a new query from a web source and paste in the URL to some XML data. This URL can be changed to a parameter of type text, and the parameterised query can then be converted into a function: right click and select Create Function. The function and query are then linked; if you modify the generated function it will break this link, but if you modify your query it will automatically update your function.

All queries that make use of a function use the current version of that function. Hence if you need to make a small change across many reports, it is best to extract that part as a function: you then only need to make one change, and it will be reflected in all queries that use the function.

Publishing with parameters

It is not possible to let the end user change the parameters once the report has been published to Power BI online. One alternative when creating a content pack is to create a template content pack: if the published dataset contains parameters, then when you consume and connect to the content pack you can change the parameters there. But it does require this extra step involving template content packs.

Limitations with published parameters

  • You can only have text parameters
  • You can only enter values
  • Can't use multiple data sources
  • Cascading parameters, as in SSRS, kind of work but not really; the problem is that some parameters do not refresh. A slicer would be a better option in this situation.


gruckion commented May 1, 2019

Add a Filter to a report in Editing view

There are four types of filters:

  • page
  • visualization
  • report
  • drill through

Filters persist when navigating away from a report, allowing you to pick up from where you left off. To change the selected filters in a report, go into Edit report mode; you can then change the filters in the Visualizations pane under Filters.

You can filter on fields that are not included in the visual. From the Fields pane, select the field you want to add as a new visual-level filter and drag it into Visual level filters.

In addition to adding filters to specific visuals you can also filter at the page level. This is again done within Edit report: simply place your filters into the Page Level Filters section.


gruckion commented May 2, 2019

Deep Dive into the M Language

Even deeper talk on the key topics

Custom connectors

These are like a set of saved queries that pre-treat data before providing it to Power Query / Power BI. To create custom connectors you use Visual Studio with the Power Query SDK; you build and deploy your custom connector, which can then be accessed in Power BI.

  • This allows you to create business-analyst-friendly views for REST APIs.
  • Implement custom OAuth v2 authentication flows.
  • Use as a building block for a Power BI App / Content Pack.
  • Enable Direct Query for an ODBC-based data source by wrapping it in an M wrapper.

When creating a custom connector you can add additional metadata to create a branded connector.

Design Tenets for "M"

The target audience is information workers and data analysts:

  • Specifically, the intermediate/advanced Excel users
  • Litmus test: "User benefits from the Excel formula bar"

For that audience, the language should be:

  • Familiar, easy to remember
  • Easy to read and write; limited syntax and use of non-standard symbols
  • A superset of the Excel formula language
  • Close to concepts in DAX, where appropriate
  • Powerful capabilities for the advanced user

Language Flow

Evaluation flows from one step to the next, where each step references the output of the previous step. You are not limited to this flow: the output of one line can be used multiple times. If a line is created but not consumed by any later step then it will not be evaluated at run time; M is a lazily evaluated language.

If you have a line of M that is not consumed but is used for logging, it will therefore be stripped out. This can be a source of errors.

let
   Source = Web.Page(Web.Contents("url")), 
   WebTable = Source{0}[Data],
   RenamedColumns = Table.RenameColumns(WebTable,{{"Column1", "Rank"}, {"Column2", "2013"}})
in
   RenamedColumns

Type System

At the primitive level there are no int8, int16, byte, etc. types.

A small set of built-in types:

  • any, none
  • null, logical, number, text, binary
  • time, date, datetime, datetimezone, duration

Complex types:

  • list, record, table, function

You can set types on function parameters using the as keyword:

Sql.Database = (server as text, database as text, optional options as nullable record) as table
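Typed parameters work the same way in your own functions; a small sketch (the function name and rounding choice are mine):

```
let
    // Both parameters and the return value carry type assertions via "as"
    AddDays = (start as date, days as number) as date =>
        Date.AddDays(start, Number.RoundDown(days)),
    Result = AddDays(#date(2019, 1, 1), 10)  // #date(2019, 1, 11)
in
    Result
```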

Ascribed Types

A value's ascribed type is the type to which the value is declared to conform. When a value is ascribed a type, only a limited conformance check occurs. M does not perform conformance checking beyond nullable primitive types; authors who ascribe values with type definitions more complex than nullable primitive types must ensure that such values conform to those types.

This essentially works as a format: from the M engine's point of view a number is a number, but if the user ascribes the type Int8.Type then Power BI Desktop will store the data differently based on that type. When the user enters a value in the UI that is outside the ascribed type definition, they will get a warning. This is handy as it allows you to further restrict the range of values coming from the user.

Commonly used ascribed types

  • Byte.Type, Int8.Type, Int16.Type, Int32.Type, Int64.Type
  • Single.Type, Double.Type, Decimal.Type, Currency.Type, Percentage.Type
  • Character.Type
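In the query editor these ascribed types are usually applied with Table.TransformColumnTypes; a minimal sketch (the column name is hypothetical):

```
let
    Source = #table({"Qty"}, {{1}, {250}}),
    // To the M engine Qty is still just "number", but Power BI Desktop
    // stores and validates the column as a 64-bit integer
    Typed = Table.TransformColumnTypes(Source, {{"Qty", Int64.Type}})
in
    Typed
```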

Simple Values in M

Simple values and their literals:

  • Null: null
  • Logical: true, false
  • Number: 1, 1.2, 1.2e-3, #infinity, #nan
  • Text: "hello"
  • Date: #date(2013, 3, 8)
  • Time: #time(15, 10, 0)
  • DateTime: #datetime(2013, 3, 8, 15, 10, 0)
  • DateTimeZone: #datetimezone(2013, 3, 8, 15, 10, 0, -8, 0)
  • Duration: #duration(1, 13, 59, 12.34)

Symbolic Operators

You can only concatenate values of the same type; you will need to manually convert a datetime to text in order to concatenate it with another string.

Conditional access with ? ensures you get back null instead of an error, i.e. if the record or field does not exist.
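A small sketch of both points:

```
let
    When = #datetime(2019, 5, 1, 9, 30, 0),
    // & only concatenates like types, so convert the datetime to text first
    Message = "Run started at " & DateTime.ToText(When),
    // [City]? returns null instead of raising an error for a missing field
    Missing = [Name = "Bob"][City]?  // null
in
    Message
```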

Operators and their literals:

  • Arithmetic (numbers, durations, and date-time values): +x, -x, x + y, x - y, x * y, x / y
  • Equality, inequality: x = y, x <> y
  • Comparison: x < y, x <= y, x > y, x >= y
  • Concatenation (text, list, record, table, date and time): x & y
  • Field access (or null): r[f], r[f]?
  • Projection (fill with null): r[[f1], [f2]], r[[f1], [f2]]?
  • Item lookup (or null): l{i}, l{i}?
  • Row lookup by key (or null): t{[k1 = v1, k2 = v2]}, t{[k1 = v1, k2 = v2]}?
  • Not-implemented error: ...
  • Self recursive reference: @n

Complex Values - List, records, tables, and other

Lists

are defined with curly brackets; note this is the opposite of JSON.

= { 1, 2, 3, "hello" }

Records

Records, which can be thought of as objects, are again the opposite of JSON and are defined with square brackets. Records can contain other types within them.

= [Name = "Bob", Children = {1, 2, 3}, Happy = true ]

Records can be named

let
    Source =
        [
            OrderId = 1,
            CustomerID = 2,
            Item = "Fishing Rod",
            Price = 100.0
        ]
in
    Source

To then access the value of a record you use the [] brackets, e.g. Source[Item], which will equal "Fishing Rod".

Tables

are defined with #table(): you pass in a list of the column names, then a list of lists, one inner list per row. Tables are sets of values organized into named columns and rows. The column types can be implicit or explicit. To retrieve a row from a table use the {} brackets; for example, in the implicit-type example below, Source{1} returns the second row as a record: [OrderID = 2, CustomerID = 1, Item = "1 lb. worms", Price = 5.00].

Example of a table with implicit types

let  
  Source = #table(   
    {"OrderID", "CustomerID", "Item", "Price"},   
      {   
          {1, 1, "Fishing rod", 100.00},   
          {2, 1, "1 lb. worms", 5.00}   
      })  
in  
    Source

Example with explicit types; here Source{1} returns the row at index 1, i.e. the second row.

let  
    Source = #table(  
    type table [OrderID = number, CustomerID = number, Item = text, Price = number],   
        {   
                {1, 1, "Fishing rod", 100.00},   
                {2, 1, "1 lb. worms", 5.00}   
        }  
    )  
in  
    Source{1}

Records can be nested within values of other records.

Functions

Functions are both a type and a value; more information at Expressions, values and let expressions. So you can pass a function as a parameter to another function.

Types

You can declare new types, e.g. type table [n = number, #"n^2" = number], when typing a table or ascribing types to a table.

Recall {} is for lists and [] is for records.

List types: Phrases = type { text }
This defines a new type called Phrases that is a list of text values.

Record types: Person = type [ Name = text, Age = number ]
This defines a Person record type that only supports the two fields Name and Age, of type text and number respectively.

Table types: Persons = type table Person
This defines a table type whose rows must conform to the Person record type above.

Function types: MakePerson = type function (name as text, optional age as number) as Person
This defines a function type for creating a Person record: you specify the name and (optionally) the age, and the result is a Person record.

Binary

The binary library allows you to parse binary files: reading headers from a PNG, parsing ZIP files, etc.

Deep Dive into the M Language slides.pptx

Types in the Power Query M formula language.pdf

Power Query M Formula Language Specification (February 2019).pdf

Further reading
https://aka.ms/DataConnectors
https://aka.ms/PowerQuerySDK
https://github.com/mattmasson/PowerQuery
https://channel9.msdn.com/Events/lgnite/Australia-2017/DA326


gruckion commented May 2, 2019

Conditional logic and errors

If expressions

For example, if 1 < 2 then "Happy" else "Sad". If expressions are simple and support the operators shown in Symbolic Operators above.

if 2 > 1 then  
    2 + 2   
else   
    1 + 1

Let expressions

This is the general structure of how you start and end M expressions. This example defines x as the arctangent of 3 and then returns the square of that value.

let x = Number.Atan(3) in x * x

Error expressions and more on errors

If you want to throw an error you can use the following:

error "Something went wrong"

Errors can also be given as a record with Reason, Message and Detail fields, where here Detail references a variable t.

error [Reason = "Expression.Error", Message = "You cannot use 't' in your expression", Detail = t]

Try expressions

M will attempt to evaluate the expression and, if it fails, return the alternative result.

try aFunction() otherwise 42
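A bare try (without otherwise) wraps the outcome in a record, which lets you inspect the error instead of just replacing it:

```
let
    Attempt = try Number.FromText("abc"),
    // Attempt[HasError] is true here; Attempt[Error] is a record
    // with Reason, Message and Detail fields
    Result = if Attempt[HasError] then Attempt[Error][Message] else Attempt[Value]
in
    Result
```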


gruckion commented May 2, 2019

Folding

Power Query will fold M queries into SQL where it can, or into whatever query language the data source speaks. Folding supports column filters, row filters, joins, group by, pivot and unpivot, and also applies to numeric calculations and aggregations. When the data source is just a simple text file, Power Query processes the file locally in memory.

Privacy settings

are considered when folding. If a query can or will be folded, you will receive a prompt asking you to indicate the privacy of the data, i.e. if you join two tables you will be required to state whether the data is public or private. This protects you from exposing private data when folding against publicly accessible data; otherwise Power Query would send your private data along with the public data to the public data web server. With the right privacy settings in place, Power Query will do the operations locally instead.


gruckion commented May 2, 2019

Lists

To create a new list for values 1 through to 10 the .. operator can be used.

= {1 .. 10}

This then produces a list of values from 1 to 10.

You can also create a list of records and they do not need to be homogeneous. The last item in the below example is a different record object type to the first two elements.

= {[x=1,y=2],[x=3,y=4],[y=3]}

A list can then be converted to a table using Table.FromList(); this can also be done with Table.FromRecords(), which takes in a list of records. In that case the function does expect the records to be homogeneous.
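A sketch of both conversions:

```
let
    // Table.FromList needs a splitter; SplitByNothing keeps each item whole
    FromList = Table.FromList({"a", "b", "c"}, Splitter.SplitByNothing(), {"Letter"}),
    // Table.FromRecords expects homogeneous records: the same fields in each
    FromRecords = Table.FromRecords({
        [x = 1, y = 2],
        [x = 3, y = 4]
    })
in
    FromRecords
```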

Unary Functions

These functions take in a single parameter. Here the SelectRows function takes in a table and, for each row, the function checks whether the Manager value equals the Buddy value.

Table.SelectRows( table, (row) => row[Manager] = row[Buddy])

A shorthand for the above is to use the each keyword; here _ acts as the row variable above. This can be simplified further.

Table.SelectRows( table, each _[Manager] = _[Buddy])

Since we are not passing the _ variable into any further functions, we can omit it altogether.

Table.SelectRows( table, each [Manager] = [Buddy])

All three of the above samples are equivalent. The third expression is closest to the DAX syntax.


gruckion commented May 2, 2019

Create columns for all days in year 2019

let
    totalColumns = 365,
    nowAsDate = DateTime.Date(#date(2019, 1, 1)),
    columnHeadings = List.Transform(
        {1..totalColumns},
        each if 
            _ is number 
        then 
            DateWithWeekName(Date.AddDays(nowAsDate,_-1))
        else
            Number.ToText(_)
    ),
    Source = #table(columnHeadings,{ {1..totalColumns} })
in
    Source

Merge Names table with 365 days in the year table

This is a work in progress: it does not merge as expected, and assigning values for each name on each of the days is difficult at this stage.

let 
    Source = #"Names",
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Name"}),
    totalColumns = 365,
    nowAsDate = DateTime.Date(#date(2019, 1, 1)),
    columnHeadings = List.Transform(
        {1..totalColumns},
        each if 
            _ is number 
        then 
            DateWithWeekName(Date.AddDays(nowAsDate,_-1))
        else
            Number.ToText(_)
    ),
    daysTable = #table(columnHeadings,{ {1..totalColumns} }),
    fullTable = #"Removed Other Columns" & daysTable
in 
    fullTable

Attempt 2; the first attempt was doing too much on the M side of things.

let 
    Source = #"Employee Availability",
    nowAsDate = DateTime.Date(#date(2019, 1, 1)),
    isPublicHoliday = List.Contains(#"Public Holidays 2019"[Date], nowAsDate),
    hoursAvailable = if isPublicHoliday then "Public Holiday" else 7.5,
    
    #"Added Custom" = Table.AddColumn(Source, DateWithWeekName(nowAsDate),
        each if (_[Public Holiday] <> "IND") then hoursAvailable else 7.5),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Team", "Valid from", "Valid until", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Public Holiday"})
in
    #"Removed Columns"


gruckion commented May 8, 2019

Getting Started With Dax Formulas in Power BI

DAX is an expression language for distilling your data into bite-sized insights. It looks similar to Excel but is much more powerful. DAX is column-focused, rather than focusing on each cell as is the case with Excel.

DAX is used in Power BI, Power Pivot, SSAS tabular mode and Azure Analysis Services.

What is DAX good at?

  • Aggregations: DAX is optimized for turning millions of values into one result, such as total annual sales.
  • Filtering: DAX is optimized to slice and dice your data based on filters.

When creating a report

  • Easy way of defining key metrics, so you can add new KPIs later
  • Ability to slice and dice, producing new views
  • Support historical analysis

The Power of DAX

To get the total sales: SUM ( [Amount] ). If we instead want the sales for the previous year we use CALCULATE, which allows us to shift the filters. The second parameter is the filter, where we can use SAMEPERIODLASTYEAR, which takes in the DateKey of our Date table.

Prev Year Total Sales := CALCULATE( SUM ( [Amount] ), SAMEPERIODLASTYEAR ( 'Date'[DateKey] ) )

What is DAX Not Good at?

  • Operational reporting: this is the opposite of analytical reporting. DAX is not good for listing out line items on a sales order or for providing a list of timesheet entries.
  • Wide tables: a table with 50 columns will reduce DAX's compression and performance.
  • Many-to-many relationships: DAX does not support many-to-many; there are ways around it, but it is not easy to work with.

The problem with learning Dax

At first DAX appears to be similar to a combination of Excel and SQL formulas, which makes it feel simple. Despite DAX looking similar to Excel, even sharing the same function names, the logic is very different, since the unit of work is the column, not rows or cells.

When you start working with calculated columns, measures, filtering and evaluation contexts, there are new concepts you need to learn. Calculated columns and measures look very similar but have different use cases. After this it gets easier again with time intelligence and nesting evaluation contexts.

image

DAX involves manipulating filters, it allows you to take current date ranges and shift them by a year. This provides a simple way of getting year on year analysis with little work.

Data Governance: managing your data at an organisational level.

Data Governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise.

Data Governance

  • Organizational BI vs Self-Service BI
  • On-Prem vs Cloud

Self-service business intelligence can be summarized with a litmus test: if an accountant needs to speak to IT to get the data needed for a report, then the organisation is not self-service.

| Self-Service BI                       | Organizational BI            |
| ------------------------------------- | ---------------------------- |
| Report authors closer to the business | Single source of truth       |
| (Theoretically) faster turnaround     | Less report bloat            |
| (Potentially) more scalable           | Not all users are developers |

What is the cloud?

  • The cloud was originally a placeholder ("I don't know what's there") on networking diagrams
  • Now it means computing as a commodity
  • Paying for managed infrastructure

Thinking in Columns, Not Rows

  • OLTP, Online Transactional Processing. Applications designed for day-to-day use and frequent updates.
  • OLAP, Online Analytical Processing. Traditional data warehouses, optimized for reads rather than writes.

Things to optimize for.

  • Focused on single column aggregations, one column with some filters applied. This is normally how your KPIs are structured.
  • Large number of rows, you want to be able to go back through years' worth of data.
  • Dealing with repeated values. With transactional processing you do not want to repeat values but rather join data together. But with analytical processing you want to avoid joining tables, which means you will flatten the data into one table, which results in repeated values.
  • Need to quickly apply filters to produce many views

Columnar databases are highly optimized for analytical reporting, while a row-wise database is like a Swiss army knife: it's pretty good at many things.

  • Analytical databases are usually modeled as a star schema, which is better suited to being stored as columns.
  • VertiPaq is the engine Microsoft uses for storing the data as columns.

Adding Business Logic with Calculated Columns and Measures

Gross sales is price * quantity, but the additional questions here are: was there a discount on the product, and was the discount valid when the product was sold? There is always additional complexity or an edge case that is unique to the business. This complexity needs to be encapsulated in the data model, within the business intelligence semantic layer. Data without the business knowledge is meaningless, and DAX allows us to encode that business logic into the data to create meaning.

Adding meaning

  • Calculated columns: new columns computed from formulas or expressions, restricted by row context.
  • Measures (aggregates): your own aggregations and date filters that condense columns into single values.

Calculated columns are computed at the time of refresh, so when you reload your data all these calculated columns will be computed. Row context restricts your expression to only seeing values for each specific row.

Example of Calculating Gross Sales

Gross Sales = [Price] * [Quantity] * (1 - [Discount])
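
The same business logic can be expressed either as a calculated column or as a measure; a rough sketch of both (table and column names assumed, following the formula above):

```dax
-- Calculated column: evaluated once per row at refresh time, stored in memory
Gross Sales = Sales[Price] * Sales[Quantity] * ( 1 - Sales[Discount] )

-- Measure: evaluated at query time over the current filter context
Total Gross Sales := SUMX (
    Sales,
    Sales[Price] * Sales[Quantity] * ( 1 - Sales[Discount] )
)
```

The column version trades RAM for CPU (computed at refresh), while the measure version trades CPU for RAM (computed per query).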

Adding Column from another table as one to many

In order to add a new column from another table we need to define a relationship, because the expression is restricted to the row context. For this we use RELATED(): to pull the Color column from the Products table we use Color = RELATED(Products[Color]). For this to work there needs to be a relationship between the two tables; if you do not see the table you are interested in, then you need to create a relationship.

Here this is a one to many relationship. 1 -> *.

image

Adding a Column for many to one relationships

If we want to follow the relationship in the opposite direction, * -> 1, then we use the RELATEDTABLE() iterator, which pulls data from the many side. Say we have a Products table and a Sales table: if we want the total sales per product, we need to take the many sales records for that specific product, sum the result and then append that as a new column.

We have regular aggregate functions and iterators. Aggregate functions take in a single column and evaluate based on that one column. An iterator takes in two parameters: a table and an expression to evaluate. SUMX() is an example of an iterator; it will go through each row in the table and run the evaluation. Using SUMX(RELATEDTABLE(Sales)) will go through each product, with RELATEDTABLE grabbing all the related sales for that specific product row. The second parameter needed in SUMX is the column in the Sales table that you wish to sum. To get a sum of sales per product this would be TotalSales = SUMX(RELATEDTABLE(Sales), Sales[Total]).

To change the format of a column use Modeling -> Formatting.

If you do not create a relationship, or cannot due to a circular reference, you can still add a column that takes data from another table with the LOOKUPVALUE function. With RELATED we use Color = RELATED(Products[Color]); the same with a lookup is achieved with Color2 = LOOKUPVALUE(Products[Color], Products[ProductID], Sales[ProductID]). Here we specify the column we want and then the search parameters that need to match to successfully retrieve the lookup. Unlike Excel, if there are duplicate ProductIDs we will get an error; Excel returns the first value. If there is no match then we get back BLANK().

Measures

Calculated columns are limited; measures are harder to grasp but allow us to manipulate the filters applied by the user. This allows us to look at all of the data one column at a time rather than a single row. A measure is an expression that summarizes all of the data and is evaluated at run time, so it uses more CPU but less RAM. Measures are limited by a filter context, but we can manipulate it.

An example of a measure is Total Gross Sales = SUM(Sales[Gross Sales]). Here we are using an aggregate function to combine the data together, in this case for Gross Sales in the Sales table.

Calculated columns extend the table horizontally, while measures extend the table vertically. Measures are loosely associated with the table and can see all data within the filter context.

image

Creating a measure in DAX

Measures allow you to manipulate filters, such as date information, as well as overriding filters the user has applied. They can also be used to form more complex aggregates when SUM, MIN or MAX just isn't enough.

From the Report view, right click the table under Fields and select New measure; this is also accessible from Home > Calculations > New Measure. Measures have access to all the data in any table; it is possible to create a measure about Products within a Sales table. This allows us to create a dummy table and add measures to that table. The dummy table then acts as a way to group the measures together.

Creating a measure Minimum Price = MIN(Sales[UnitPrice]), we can add this as yet another column to a table or matrix. This new column is also filtered based on the different row values. This is implicit filtering, and is an example of how measures are evaluated within the filter context.

Limitations of measures

If we want to create a slicer to filter by colour this works fine; if we want to use our measure, it won't work. You cannot select a measure as a filter. You must use a calculated column, which makes sense, as the measure Minimum Price by itself would be a single value.
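
A sketch of the usual workaround (column name and band thresholds are assumed, not from the course): precompute the value as a calculated column, which is stored per row and can therefore back a slicer.

```dax
-- Calculated column: stored per row, so it can be used in a slicer,
-- unlike a measure, which only evaluates to a single value
Price Band = IF ( Sales[UnitPrice] < 10, "Budget", "Premium" )
```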

Filtering data using Calculate

This allows us to get data with filters such as previous year-to-date and comparisons. There are two types of filters, implicit and explicit; visuals and pivot tables apply implicit filters behind the scenes.

These filters can be modified in DAX using the CALCULATE function. For more advanced comparisons we can use the FILTER function. To undo filters we use the ALL function.

CALCULATE can be used to override filters that have been applied by the user. Implicit filtering is applied before a DAX expression is evaluated; this could be from a user making a selection in a slicer visual. Explicit filters are coded into the DAX expression, and explicit filters override implicit filters from the user.
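
A sketch of an explicit filter (table and column names assumed): the second argument of CALCULATE replaces any user selection on that column.

```dax
-- Explicit filter: overrides any slicer selection on Products[Color]
Red Sales := CALCULATE ( SUM ( Sales[Total] ), Products[Color] = "Red" )
```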


CALCULATE can be used to filter; it is fast but limited, as its simple filter form only lets you compare a single column to a fixed value. FILTER does a lot more.
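
Where a comparison against a fixed value is not enough, FILTER takes a table and an arbitrary row-level condition; a sketch (names and threshold assumed):

```dax
-- FILTER allows an arbitrary row-by-row condition, not just column = value
Large Sales := CALCULATE (
    SUM ( Sales[Total] ),
    FILTER ( Sales, Sales[Total] > 1000 )
)
```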

You can remove filters using ALL
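
A common use of ALL is computing a share of the unfiltered total; a sketch with assumed table and column names:

```dax
-- ALL(Sales) strips the filters on Sales, giving the grand total
-- as the denominator regardless of slicer selections
% of All Sales := DIVIDE (
    SUM ( Sales[Total] ),
    CALCULATE ( SUM ( Sales[Total] ), ALL ( Sales ) )
)
```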
