How I learned PowerPivot Blog Series – Data Prep

This post is the second in a series about learning the built-in Business Intelligence features baked into Excel 2013.  If this is your first time accessing this blog series, I encourage you to read the first blog post in order to get the proper context:

Blog One: How I learned PowerPivot from taking football stats

The first post in the series described how I used an iPad and Excel 2013 to capture stats for my son’s youth football team.  While somewhat kludgy, it did the job reasonably well.   But it was lacking the ability to look deeper into the stats to find nuggets of insight (aka Analytics).  So…PowerPivot to the rescue!

As noted in the previous blog, I began to recognize that this mass of stats was actually a rich data set that was ripe for deeper analysis.  In order to do deep analysis, I needed to ‘pop the hood’ (or ‘bonnet’ for my international audience) and prepare the data to be used in PowerPivot.  What is PowerPivot, you ask?  I could write an entire blog on this, but Rob Collie already did in his blog entry called What is PowerPivot?.  For now, I’ll simply say that PowerPivot is the engine that powers analytics within Excel.

  1. Before I go further – it’s important to note that the company I work for (Magenic) provides access to Microsoft Office Professional Plus, which includes Excel 2013.  This version allows me to use PowerPivot (and Power View, which will be discussed in a future blog). In other words, if you need these features, make sure you have the proper version of Excel 2013.

Step 1: Enable PowerPivot Add-In (this article describes the process better than I can)

Step 2: Convert the Data Range to a Table

It’s a good practice to convert the data range that will be analyzed into a Table, with the added benefits of easier sorting, filtering and formatting data.  Note: this is different that simply renaming the Tab.  To convert the range into a table:

  • Click in the data range
  • Choose Insert from the ribbon, then click Table:

Insert Table

  • Confirm the data range, and make sure that the “My table has headers” checkbox is checked.

Create Table Dialog

  • Change the table name from the default name ‘Table1′. It’s not only best practice; it saves a step when the table gets added to the Data Model as shown below:

Rename Table Name

Step 3: Add to Data Model

  • Once the Table was created, add it to the PowerPivot Data Model.  To do that, choose PowerPivot from the ribbon, then click Add to Data Model:

Add to Data Model

  • After clicking the Manage button within the PowerPivot tab of the Excel ribbon (see the above image), here’s the view of the table within PowerPivot:

Data Model - First Table

Repeto (Latin for Repeat)

I repeated the process for a Roster, Plays, and Teams table.  When I originally set up the Excel spreadsheet with separate tabs to capture the data, I didn’t have the foresight to know that it was step one of creating a dimensional model.  At the time, simply capturing data and putting related data together in a separate tab made little sense to me.  So, in a way, I was lucky.  But the lesson I can pass on to those reading this blog who may not be data analysts is to organize your data so it is readily ‘consumable’ by PowerPivot.  I’m certainly not a data analyst, but here are some tips I can pass along:

  • Think ahead on how your data will be used.  In my case, I knew that the coach wanted to see Rushing Yards by Player, Tackles by Player, Rushing and Passing Yards by Play, Points per Game, etc.  There is a pattern: X by Y.  In data modeling terms, this is referred to as Fact by Dimension.  In my case, the Play_Data table became my Fact table, and the Roster, Plays, and Teams tables became my Dimension tables.  I’ve seen my share of SWS (Super Wide Spreadsheets), where every imaginable piece of data is stuffed into a massive number of columns.  Not only does it make it difficult to print, this approach makes it difficult to use with PowerPivot.  Keep what is being measured in one table, and what it is being measured by in separate tables.
  • Clean data is good data.  While Excel accommodates a bit of sloppiness when it comes to data, PowerPivot and tools like Pivot Tables and Power View do not.  For example, “Offsides” and “Off Sides” may be the same penalty, but not to PowerPivot.  “12 yrds” and “12” may convey the same information, but one is a text entry and the other is a number.  In my case, I inputted my fair share of phantom player numbers and play numbers as well as a large number of misspelled penalties.
  • Renaming is Key!  The names that you assign to columns and tables will be used extensively in PowerPivot, and it’s important to know up front that PowerPivot doesn’t “auto-fix” formulas when you later rename a column or table.  Save yourself time and headaches in the future by thinking it through (even a little bit helps).  I tip my hat to Rob Collie for that tip, which is found in his book DAX Formulas for PowerPivot – The Excel Pro’s Guide to Mastering DAX.  A great read for those who want to take their PowerPivot skills to the next level.
Here’s a useful tip: once the data range is converted to a table, take a few moments to use the Column Filters to ensure you have valid and consistent data.  I use Filters frequently, and I encourage you to become familiar with using this valuable function.  Lastly, while you are reviewing your data, take the extra effort to highlight the columns and set the type of data within the column.  Excel typically categorizes everything as General. Not only is it good practice to properly categorize the data, but PowerPivot will also pick up the data categorization when the table is added to the data model.  Here’s a screen shot:

Data Categorization

Bonus Tip

There is one more tip I’ll share that I’ve found to be pretty handy: the Sort by Column feature.  In the screenshots above, did you notice the “10 – Newport” data within the Game column?  Previous to the version of PowerPivot that comes with Excel 2013, I had to add a number to the text so that the data would sort properly.  This was terribly annoying, especially for everyday things like sorting by Month or Day.  So, without further ado, here is the solution:

  • I removed the appended game numbers in my Play_Data table, and in my Teams table in Excel (not PowerPivot) I added a GameNumber column, as seen below:

Game Number Column

  • I updated PowerPivot to refresh the Play_Data data and to bring in the GameNumber column and data into the data model:

Update Data Model

  • I opened the PowerPivot window:

Manage PowerPivot

  • I selected the Team column and clicked on the Sort by Column button on the ribbon.  In the dialog box, I set the Team column to be sorted by the GameNumber column and clicked OK.
  • Now, when I create a Pivot Table or Power View graph, the games display in the order played:

Game Sorted Properly  Tackles by Game

Blog Preview

In the next blog of this series I’ll show you how to create relationships within the Data Model.

How I learned PowerPivot from taking football stats

It all started out with a simple request from my son’s football coach.  What I didn’t know was how saying yes would help me evolve from a guy that was fairly handy with Excel to a BI analyst.  So, at the urging of Steve Hughes, who is the BI Practice Lead at Magenic, I’m writing a blog series about this process.  My goal: show the steps I took in transforming these stats into a full-fledged BI solution within Excel 2013.  In this series I’ll show how I went from this:

Solution - Before

To this:

Football Power View Image

First, the request: at the beginning of my son’s 6th grade football season, his coach asked me if I could take stats for the team.  He was looking for the very basic: rushing yards and tackles.  I had done it once the previous season, filling in for a parent that couldn’t be at the game.  Simple stuff, really.  Since having this information helps the coaches better plan for the next game, I said yes to the request.

Being the tech savvy over-achieving business analyst that I am, I thought about the easiest way to capture the data.  Writing it down? Not a chance.  I avoid picking up a pen or pencil whenever possible.  Our family has an iPad, so why not use that?  I did a bit of research, and settled on using Numbers to capture the data during the game, export the data, email it to myself, and import it into an Excel spreadsheet.  But to use all this capability to just capture rushing yards and tackles?  Kinda like driving a semi to pick up groceries.  It should have been no surprise to those who know me when I added more stats to the spreadsheet – I mean, why not?  Soon, I had added the play being run, passing yards, kickoff and punt yardage, penalties, touchdowns, etc.

Soon enough, I began amassing a fair amount of data. Using the iPad was fairly easy to capture the stats in real-time (thanks to parent helpers that were my spotters), and I had the added benefit of being an unofficial part of the team staff on the sideline.  The evolution was already underway, as my Excel formulas were becoming more complex in summarizing stats by play and player.  In other words, a bunch of SUMIFS and COUNTIFS-based formulas that did the job, but were cumbersome and cludgy (not to mention error prone).  That said, the coaches couldn’t get enough of it and were actually using the data to tweak the game plan on a weekly basis.

Somewhere during mid-season I discovered is that the football stats were a rich data set that was readily consumable by Power Pivot, a feature available to me in Excel 2010 sitting on my desktop.  The evolution of displaying stats takes an interesting turn, which I’ll talk about in the next blog.  In a word: Slicers!

(Note for those who may be curious: the images shown in these blog posts are real stats, but the names and numbers of the players have been changed to mask their identity.  For those of you who know me and are trying to get a glimpse of how your kid did last year on my son’s team – good luck.)