How many keywords do you need to search for? Can you say “Needle in a haystack”????
So, you are analyzing social media posts and you need to flag any posts that contain certain keywords. The problem is your list contains hundreds of keywords! Today, I will show you a Power Query function which will easily perform the keyword search.
Basic Steps
Create your keyword list – as a List. Once you get your keyword list as a table, use the Convert Table to List function on the Transform ribbon to convert to data type List.
Create the function
(String) =>
let
//check if values in MyKeywords is in String
MatchFound = List.Transform(List.Buffer(MyKeywords), each Text.Contains(String, _, Comparer.OrdinalIgnoreCase)),
//index position of match found
Position = List.PositionOf(MatchFound, true),
//return null if Position is negative
Return = if Position < 0 then null else MyKeywords{Position}
in
Return
Add a custom column to the query with the text to be searched and invoke the function.
And there you have it! Use this technique to search any text across any list of words.
Active Items, Open Tickets, Outstanding Items – All names for basically the same calculation. In this video, we will discuss how to use DAX to enable the count of any “Active” item. In the sample data, there is only a stage created date and no stage end date, so we first create a stage expire date as a calculated column, then leverage the new column in our Active Ticket measure. Finally, we will display our Active Ticket measure in a gantt chart visualization created using a Power BI matrix.
Note: If your data has both a create and end (expire) date per line, then you can skip to the Active Ticket measure at timestamp 9:40 in the video.
Just the Code:
Setup: In my sample data, each row contains a Ticket, Stage and Create Date – but no stage end or Expire Date. There is a Closed date but only for the last stage of the Ticket, not the intermediate stages.
Create calculated column LowIsNew – ranking all stages within a ticket from newest to oldest
Create second calculated ColumnRowExpirationwhich calculated the Created Date for the next event and sets it as the EXPIRE var (varaiable). The IF in the Return also checks if the status of the row is Closed, in which case the EXPIRE is replaced by Closed date.
In your data model, ensure you have a date table, mine is Dates, that has a date field with contiguous values for all of the facts in the Ticket table. The key item here is DO NOT CREATE A RELATIONSHIP BETWEEN Dates AND Tickets.
Finally, create Measure Active Tickets which will count the Active Tickets for each day. This measure is explained in detail starting at 9:40 in the video.
James Philips’ keynote to kick off the 2017 Data Insight Summit in Seattle was nothing less than AMAZING!!
Coming soon to Power BI will be the ability to embed PowerApps within a PowerBI canvas, creating bookmarks which are aware of the filters on which they were set, and the ability to drill to other report while respecting filters. In addition, they noted that Power BI Premium is now generally available.
There were SO many upcoming items… I am giddy with anticipation… A-MAZ-ING!!
Force zeros to align between the two axes on Combo charts
Horizontal Image Slicer
Updates which will allow more Accessible Reports
Well done. Looking forward to working with the June release and getting my hands on the new “toys” coming in the next quarter.
Well, I am off to get ready to present “What does Raspberry Pi have to do with BBQ” on day 2 of summit, will talk more about that tomorrow. Bye for now..
Just wanted to drop a quick post about the Microsoft Data Insight Summit held in Seattle this last week. This was an inaugural event which focused on all things Power BI, with both Power BI Desktop and Excel. It was really fun to meet up with members of the PowerBI community from all over the world. I got to see Alberto Ferrari deliver his Advance DAX class, and learn to be a Data Scientist from Danielle Dean. 🙂
During the Opening Keynote, they shared some incredible upcoming features! My Favorite is being able to set Row Level Security on a Data Modelin the service! WOW! This will truly be a game changer for most of my clients! A few other great features they mentioned coming “this Spring” are
As you can imagine, I am really enjoying playing with Power Query and Power Pivot. PowerPivot models have been around a bit longer so the software is pretty stable. Power Query is still pretty new, so occasionally we run into some “gotchas” with the software, this post is intended to help you avoid a few of these “gotchas”.
Members of my team were happily creating PowerPivot models with the data sourced in Power Query, and out of the blue, mysterious errors kept occurring:
Exception from HRESULT: 0x800A03EC
Could not refresh table ‘Survey’ from connection ‘Power Query – Survey’. Here’s the message we got:
Out of line object ‘<xxx>’ referring to ID(s) ‘<xxxGUID>’ has been specified but has not been used. The ‘<xxx>’ column does not exist in the rowset.
Upon further research and happily landing on this thread (and the brilliant Miguel Llopis) led us understand that there are some common “gotchas” which cause your Power Query to be marked Read Only (bad), and thus not refreshable. Once you are in this state, the only way out is to remove the query from the Data Model and put it back in (see steps below for renaming a PowerPivot table).
To help keep you out of trouble, in a PowerPivot model where the data is sourced from Power Query do NOT perform any of these operations DIRECTLY in the PowerPivot model:
(If you are using the native Power Pivot data connections for ALL tables in the model, these do not apply)
Don’t Edit Table Properties – it seems like you could remove a column from the model by de-selecting it from Table Properties, but don’t do it!
Go back into the Power Query source query for the table and use a Remove Column function to remove the column, then reload to Power Pivot.
Do not Delete, Rename or change the Data Type of a Column in a Power Pivot table.
Go back into the Power Query source query for the table and use a Remove Column, Rename Column, or Change Data Type function to make the changes, then reload to Power Pivot.
Don’t Delete a Table directly in the PowerPivot window
Go back into the Power Query source query for the table uncheck the box for “Load to Data Model”. If you REALLY don’t need the query anymore, you can delete the query and it will also remove it from the PowerPivot data model.
Don’t Rename a PowerPivot table (and underlying Power Query)
For this one, you will have to break your PowerPivot data model. In Power Query,
Go back into the Power Query source query for the table uncheck the box for “Load to Data Model” Apply and Close to refresh.
Edit the query and Change the name of the query, Apply and Close to refresh (but don’t re-check the box just yet).
Edit the query and re-check the box for “Load to Data Model” Apply and Close to refresh – this will also load the table back to the Data Model with the new name. Note: You will have to re-apply your relationships and also add back any calculated columns which had previously been on the PowerPivot table. This is why it is a good idea to think through your naming conventions early.
Import additional PowerPivot tables using PowerPivot Import Wizard. Even if it is a really simple query, use Power Query.
So if one PowerPivot table is Power Query, they all must be Power Query
Don’t try to upgrade an existing Excel 2010 Data Model built using Power Query to an Excel 2013 Data Model.
This one actually makes a ton of sense to me. As Excel 2010 cannot load data DIRECTLY to the Data Model, you have to use Load to Worksheet then create a linked table to get the data into the Data Model. Recreating this model directly in Excel 2013 would result in a much more streamlined model.
One way to tell if you are creating trouble for yourself is to check your Power Query data connections to see if they are still OK.
On the Data Ribbon in Excel, select Data Connections
Open each Power Query data connection, and view Properties
On the Definitions tab of the Properties window, look at the connection string. If the Connection String box is white and looks like you could edit it, then you are OK. However if the Connection String box is greyed-out, then “Houston, you have a Problem”!
At this point, you will need to disconnect the Power Query from the Data Model.
Right click the Power Query and select Edit.
De-select both the Load to Worksheet and Load to Data Model check boxes.
Apply and Close the Power Query.
Right click the Power Query again and select Edit.
Re-select Load to Data Model check box (and Load to Worksheet if needed).
You will need to put back any updates to the table in the PowerPivot model, such as relationships, calculated columns, and measures.
Power Query Version at the time of posting: (2.12.3660.142)
(Due to new bits for Power Query being published essentially monthly, I will note the Power Query version # when the post was created)
Distance calculations! Over the past couple of years, I have had several clients who are using PowerPivot who want to measure the distance between two points on a map: Client to distribution center or building to building.
For just a few calculations, this is very straight-forward using plain old Excel. It has all the needed trig functions to calculate the distance in miles between two points using Longitude and Latitude where they are expressed in decimal form.
The first step is expressing each Latitude/Longitude of both of the coordinates in radians:
lat1 = (lat1Decimal / 180) * PI (3.14159265359)
lon1 = (lon1Decimal / 180) * PI (3.14159265359) – repeat for 2
Finally, use this formula in Excel to derive the distance in Miles. If you prefer km, then change the ending constant to 6371.
You can download a file which has the formulas: The table contains the coordinates, and the radian calculations are in columns H:K, and the final distance calculation is in column M.
The problem we encountered was when we wanted to calculate a million records which was too many formulas for Excel to handle gracefully.
We would have liked to use PowerPivot to solve this issue, but alas, there isn’t a Sin() or Cos()function in DAX (the PowerPivot formula language), so this was not possible.
Power Query to the rescue!!
Power Query is built on a language called “M” – kind of sounds like something out of a 007 movie, huh?
Let’s create the calculation!
I downloaded the From/To coordinates into Excel and formatted the area as a Range (Home Ribbon > Format as Table > select any format)
On the Power Query ribbon, you will see that there are many different data sources we can use to get data. In this case, we are going to select From Table.
This will bring up the Power Query Window. Since I named the table “Distance” in Excel, the Name Property in Query Settings has already named the query “Distance”.
The formula bar displays the syntax for sourcing data from an Excel data Range with a range named “Distance”
Just like in Excel, we could put the entire formula, including the calculation of the radians, in one gigantic formula, but it is much more difficult to read. So we will do it like Excel, and perform it in steps.
The next step is to add a column to calculate each of the 4 coordinates to Radians. To do this, you will select Insert Custom Column from the Insert Ribbon.
Enter the formula in the formula bar:
Lat1_Rad = ([Latitude_1] / 180) * Number.PI
Click OK to close the editor. Repeat this process 3 more times for the remaining coordinates:
Lon1_Rad = ([Longitude_1] / 180) * Number.PI
Lat2_Rad = ([Latitude_2] / 180) * Number.PI
Lon2_Rad = ([Longitude_2] / 180) * Number.PI
After adding all 4, your query editor should look like this. You can see the column names that you entered, and on the right Applied Steps pane, there is an entry for each action we have taken. Currently the last item is highlighted, and the corresponding formula is visible in the formula bar.
PQ has performed an AddColum function on the Table, and for each item, has performed the requisite calculation. Scroll thru each step to see the syntax for each. Also, you will note, as you select each item, it displays what the table looks like up to that point. This is very evident when you click on Source, as the four calculated columns disappear. Click on InsertedColumn3, and they re-appear. This is a great tool for debugging more complex M scripts.
Finally, we add the column for distance as “Distance_Mi”. Be sure to follow the capitalization structure from the Power Query reference EXACTLY as it is currently VERY case sensitive. I hope they ease this a bit in future releases, as it messes with my brain a bit to have to remember when to cap and when to ensure functions are lower case.
Now, in our final model, would probably want to keep the Latitude and Longitude columns in order to play with the cool mapping functions in Power Map and Power View (which we will do in a subsequent post), but you no longer need the “helper calculations” for the radians.
Highlight the four radian columns, and from the Home Ribbon, select the dropdown on Remove Columns and select Remove Columns. Note the other choice here is to Remove OTHER Columns which comes in really handy when you have a list with a ton of columns but you only need a couple of them.
Cool, huh? Since the previous steps are still in the list, the calculations for the removed columns are still in memory, so the distance calculation still works!!
On the bottom right, you have selections for where to load the data. If you don’t need it in Excel, you can go directly to the PowerPivot model by selecting Load to Data Model. For our purposes, we are just going to Load to Worksheet, so ensure that is clicked and select Apply and Close from the Home ribbon.
Note: In Excel 2010 the only selection you can make is Load to Worksheet. The Load to Data Model is available in Excel 2013 and O365.
You now have a finished distance calculation! Test it by comparing the Excel formulas from the original table!
Welcome to GirlsWithPowerTools.com! If you found this blog expecting to find scantily clad ladies with buzz saws, boy have you come to the wrong place!
If you would like to learn more about the great suite of tools that Microsoft Power BI has to offer, then you are in the right place. The Power BI tools encompass Microsoft Excel, PowerPivot, Power Query and Power Map, all showcased in BI Sites on SharePoint. On the server side, we will also dive into SQL Server Analysis Services Tabular models (SSAS Tabular, or just Tabular)
Who am I? I am Barbara Raney, a PowerBI Architect with Slalom Consulting. I have over 20 years of practical experience with Microsoft Excel, and 10 years consulting and training my clients to more effectively use Excel.
Why start my own blog? In all of my years working with Microsoft Excel, I love to figure out all of the cool things you can do with spreadsheets and have been the “go to girl” whenever questions come up. I absolutely love it when people say that “Excel can’t do that” – only to show them – “Yes it can!!”
As I mentioned, I don’t work for Microsoft, but I do a ton of work WITH them, which has really taken my love of all things Excel to a whole new level. In this blog, we are going to go on a journey learning cool new tips, tricks and techniques using all of the “Power Tools”. And “the Girls” mentioned in the website name? They are a group of wonderful geek girls who will be helping me along the way – and hopefully guest-blogging!
Many of you already know about PowerPivot, which is a great data modeling tool, and now the new kid on the block is POWER QUERY! This wonderful little add-in will work in Excel 2010, 2013 and Office 365.
Inventory all of the files on a drive
In our first adventure, we are going to use Power Query to take inventory of all of our files and then find where there are duplicates! This is a great way to free up some space on your disk. 🙂
If you use Excel 2013, make sure you have the latest version at Windows Update.
Open a blank worksheet and check out the Power Query ribbon:
You can use the Get External Data chunk to load data from a ton of data sources:
The last item in the first list, “From Folder”, is what we are going to use.
One the Get External Data chunk, in the From File dropdown, select From Folder
Select the drive (or folder) that you would like to inventory and click OK
Voila!! Now you see a list of all of the files in all of the subfolders in the path provided.
The inventory is missing one key ingredient: file size. For a file to be a true duplicate, the Name, Date Modified and Size all need to be the same.
Expand Attributes Column and select Size, then click OK.
Next we are going to filter out any non-Excel files by applying a filter to the Extension column
Take a look at the Query Settings pane on the right. In the Applied Steps area, you will see the three steps we performed on our data. If you click back on the Source step, you can preview what the data looked like when you started, as you select each step, its action is applied. This is VERY handy when you are doing a particularly long transformation and you need to backtrack what you did without losing your work.
Rename your query to Inventory, check the Load to Worksheet box, and uncheck the Load to Data Model box, and select Apply and Close from the Home ribbon.
Note: If you have Excel 2010, your only option is to Load to Worksheet.
Your beautiful list is now in Excel! The right pane is the Workbook Queries pane, which can be toggled on and off from the Manage Queries chunk of the Power Query ribbon.
Find the Duplicates
We could use simple Excel formulas or conditional formatting to find our duplicates, but this is a post about Power Query, so we are going to use it to find our dups.
We need a second query which starts out just like the first, but then goes just a bit further. Even though it is short, I am going to show you a short cut to create a copy.
In Excel, right click on your query and select Duplicate. Yes, it is just that easy!
Right click it again, and select Edit, as we need to take this one just a bit further.
We only need to keep in the query the columns that define an exact match. Highlight Name, Date Modified and Attribute.Size, and in the Remove Columns dropdown, select Remove Other Columns
Next, highlight all three columns again, and select Group By from the Transform chunk.
Ensure that all three columns are in the Group By section and that the Operation is set to Count Rows, then click OK.
Now you have a count of how many times each file appears in the folder. In my case, I filtered the list to remove the ones and I have 4 files with 4 version each!
Now it would be nice to not have to use VLookup or something like that to find out where they live in the folder structure, so I am going to Merge this list with my full inventory to create a second list of just duplicates!
From the Combine chunk, select Merge Queries, then select holding your shift key down, select Name, Date Modified and Attribute.Size from both tables and check the box to Only include matching rows and click OK. I don’t need the whole inventory again, just the records with the duplicates.
Expand the NewColumn column, and select Folder Path and click OK.
Double click on the NewColumn.Folder Path column and rename to Path. This is the finished query! Again, you can see each of the applied steps. If you want to run this query for a different path, simply right click on the Source step in both queries and change the path to the new one!
Go ahead and Apply and Close to load your dups into Excel, and clean up that drive!!