Følgende indlæg fra Kapacity beskriver en trin-for-trin metode til at navigere i XML-data fra en tabular model, til at bygge en genanvendelig dokumentationsramme i Power Query.
Have you ever had a client come up with the absurd requirement of documentation? No? Well, lucky you. For the rest of us, however, nothing is more tedious than spending hours upon hours of working through every little nook and cranny of our BI projects, to come up with a listing of all the exposed dimensions, attributes, measures, calculations, etc. Will anybody ever read this documentation? Most likely not, but that is not the point. If your client’s notion of “documentation” is simply a selection of metadata formatted as a table inside a word-document, this article might save you a lot of repetitive copy-paste labour.
In Microsofts BI world, everything is XML. Got some complex logic in an SSIS Data Flow Task? It’s XML. Working with SSAS Multidimensional or Tabular cubes? They are XML¹ . Even SSRS reports are completely defined from the bottom up using XML. To see this, simply open the .dtsx, .cube, .dim, .bim or .rdl files in your favourite text editor. Browsing through the XML, you might be able to find the metadata that you are looking for, but how do you get rid of all the other stuff in the file, and put the interesting metadata in a nicely formatted table?
Figure 1 – xml definition of the Adventure Works 2014 Tabular model.bim-file. Interesting metadata such as dimension and attribute names (Product Id, Product Subcategory Id), etc. are hidden inside a jungle of internal properties and IDs.
If you happen to be a guru of xpath, xquery, xlst or other xml-based technologies, you can probably come up with some kind of script in C#, PowerShell or something else, to extract the relevant metadata from the XML in the BI object files. For the rest of us though, documenting code by writing even more code seems a little over the top compared to opening the objects in Visual Studio one by one, and documenting manually from the top down.
Enter Power Query (or “Get & Transform” for Excel 2016 users²)
Without writing any code, Power Query might simply be the best tool for extracting useful information from large amounts of XML data. Once you get used to working with XML data in Power Query, you will be able to document your BI solutions in minutes! The techniques described here, can of course be used with any kind of XML data, not just objects in Microsoft BI projects. To start, open Excel and create a new query from XML:
Now choose the file you want to extract metadata from (in this example, we use the Model.bim file of the Adventure Works 2014 Internet Sales Tabular Model. Note that you don’t need a deployed tabular model to follow the example, just the .bim-file). If you have multiple files that you want to extract the same metadata from (for example, multiple SSIS packages or SSRS reports), use the “From Folder” option.
Power Query will load a preview of the chosen file. Unfortunately, even though the file contains actual XML data, it might not be immediately obvious to Power Query, and the editor might look something like this:
This indicates that Power Query is treating the file as a Text File with a single column of data, which is the individual rows of the file treated as plain text. To change this, click the small cog icon in the upper right corner of the “Applied Steps” box (marked with a blue arrow on the image above).
In the dialog that appears, change the setting for “Open File As” to “Xml Tables”. If you are opening an XML file from a Microsoft BI tool, you should also change the File Origin setting to “65001: Unicode (UTF-8)”, to ensure that special characters show up properly.
Once you press OK, the Power Query editor should now look something like this:
Now we are getting somewhere! If you look closely on Figure 1 and 5, you will see that Power Query has turned the outermost tags of the XML document into individual columns. XML nodes that contain other nodes, will show up as expandable columns with the content “Table”. We can navigate down into the XML structure by either expanding these columns or clicking on the “Table”-text.
Clicking on a “Table” cell will take us to that particular node in the XML document, showing everything it contains as a new dataset, which we can then dive further into. Expanding a column, on the other hand, combines the data of the column we expand with the data at the current level, denormalizing (flattening) the data.
For now, let us dive down through the following objects, to see if we can get a list of dimensions somewhere in this Model.bim file:
After removing all columns but the “Name” and “Attributes” columns, the editor should now look something like this, and thus we have a list of dimensions in the model:
Note that Power Query, clever as it is, inserts some additional steps for changing types of some of the columns, while we navigate down through the XML. While this does not do any harm, it creates unnecessary many intermediate steps. In fact, if you delete all of the “Changed Type” steps inserted by Power Query, the XML drill-down steps will be automatically collapsed into a single “Navigation” step, as seen in Figure 7, which looks a lot neater.
Say now, that we want to know which attributes are available in each dimension, whether they are visible, and what the source column of each attribute is. Instead of creating individual queries for each dimension, let us simply expand the “Attributes” column, which in turn contains a table of “Attribute” nodes that we will also expand, keeping only the “Name”, “KeyColumns” and “AttributeHierarchyVisible” columns:
Hint: If you uncheck “Use original column name as prefix”, the column names after expanding will be shorter and easier to work with.
If your tabular model uses Display Folders, you can document them as well, by checking the “AttributeHierarchyDisplayFolder” columns when expanding the “Attribute” nodes. Unfortunately, the Adventure Works tabular model does not use Display Folders, so we cannot illustrate this here.
The editor should now look something like this (we’ve renamed some columns to better suit our needs, and replaced any “Null” values in the AttributeHierarchyVisible column to “TRUE”, indicating that a particular attribute is visible):
The “KeyColumns” column should contain information about the source of the attribute (for example whether the data comes from a column in the data source, or if the attribute is in fact a calculated column). So expand it along this path:
To work around this, temporarily apply a filter on the Attribute Name column, so you only show a dimension attribute which you happen to know is a physical attribute, such as the “Title” attribute on the “Customer” dimension. After applying this filter, you should be able to see all the XML nodes when expanding the “Source” column:
Expand the ColumnID column as mentioned above, and then remove the step that applies the filter.
The reason we had to apply this trick, is that Power Query only uses the first available record when determining which columns we can choose when expanding down. In a Tabular model, the first attribute on any dimension is the internal “RowNumber” attribute, which unfortunately does not have the ColumnID XML-node, even though other attributes does have it. To make this node visible for Power Query, we therefore apply a filter before drilling down.
The same technique can be used to navigate to the Expression-node, which contains the DAX formula defining any calculated columns. This is also an XML-node which is available under the “Source”-column, so if we want to show both the “ColumnID” as mentioned above, and the “Expression”, we must first duplicate the “Source” column before we expand. Then, we must again apply a filter on a column known to be calculated, and remove the filter after expanding. This is left as an exercise to the reader.
After cleaning up the data set by renaming columns further, replacing NULL values with empty strings, etc. we can get the final result to look something like this:
Click “Close and load”, sit back, and enjoy the finished documentation of all dimension attributes in your tabular model.
Granted, navigating an unknown XML document structure can be a little tricky – even when using Power Query. A helpful tip is to open the XML document in your favorite text-editor while working with Power Query, since a text-editor allows you to quickly search the entire document for a particular XML node. Say you wanted to list all the measures of your Tabular Model and their definitions. You would simply have to search for the name of a measure in your model (or alternatively search for “CREATE MEASURE”), to find out, that they are located at the following path in the XML-file:
ObjectDefinition > Database > Cubes > Cube > MdxScripts > MdxScript > Commands
Each “Command” node under the “Commands” node holds a “Text” node, which contains the MDX script (which is in fact a piece of DAX code), that defines the measure. Using the Power Query “Split column by delimiter” transformation, it is possible to retrieve the name of the measure and the DAX code defining the measure from this script. Information about visibility, display folders, formats, etc. can then be retrieved in a separate query from the following path:
ObjectDefinition > Database > Cubes > Cube > MdxScripts > MdxScript > CalculationProperties
Use standard Power Query features to merge the two queries into one, for a complete listing of measures with their DAX definition, properties, etc.
Power Query is an excellent tool for traversing and transforming data in XML documents, which makes it invaluable when documenting Microsoft BI-solutions, since metadata is mostly stored as XML. But the possibilities reach much further. Whenever you want to extract useful information from metadata, whether it is stored as XML, JSON, or on a relational database, Power Query can definitely help you.
Once you have created your queries, you simply save the Excel sheet and the next time you need updated documentation, you simply refresh the queries in Excel. Furthermore, for some solutions you might even be able to reuse your queries across solutions.
In general, Power Query is intended as a replacement for manually copy-pasting data around, and once again, we’ve seen how it absolutely shines at that task. So go ahead and start documenting your solutions with Power Query.
¹ Well, as of publishing this article, it turns out that not everything will stay XML. In SQL Server 2016, the Analysis Services Tabular Model will switch to a JSON-based file format for storing the model metadata. Luckily, since Power Query supports JSON just as well as XML, the techniques described here still apply.
² Microsoft changed the branding of Power Query with the release of Excel 2016. If you’re using Excel 2016, you will find all the same functionality in the ”New query”-button on the ”Data” tab in the ”Get & Transform”-section.