Data Wrangling Power BI Covid-19 Data Layout Changes

In the last few weeks CDO Advisors created 2 blog posts to show you how to connect to GitHub and access the Covid-19 time series data.  Since the first blog post was created the GitHub data has been modified at least 2 times.  While frustrating this shows a great challenge that data scientist and analyst have when sourcing data from systems they do not control.  Often you will have to change your Power BI reports to keep up with these data sources.  In this post, I show you how to modify your existing report to use the new data by data wrangling Power BI.

The GitHub main repository can be found here – https://github.com/CSSEGISandData/COVID-19

CDO Advisors Microsoft Power BI Solutions:

BI as a Service – Low Monthly Fees, Pay Per Month Per User

Executive Dashboards – Get Dashboards in 4 Weeks

Healthcare Dashboard Proof of Concept – SEE your data visualized

Microsoft Power BI Demo – Compare your current reporting to Power BI

Power BI Quick Start Solutions – Ready to start or enhance your Power BI?

Train Your Team – Remote Power BI Developer and End User Training

Use Power BI for Finance Teams – Learn how Power BI can improve operations

Virtual Chief Data Officer – Confused by all your data?  We can help

CDO Advisors YouTube – Subscribe and Watch Our YouTube Channel

Newsletter Subscribe - blog

Data Wrangling Power BI Covid-19 Data – Setup

If you previously complete these blog posts:

How to Model Power BI COVID 19 Data

or

Build Power BI Covid Reports from GitHub Data

This blog shows you how to update your existing Power BI report to use the updated time series data.

When I tried to refresh the data, I now receive this error in Power BI Desktop.

Data Wrangling Covid-19 in Power BI Error

Power BI Desktop Data Refresh Error

Power BI tries to connect to the previously working URL and it cannot find the file.  The way to investigate this is to review the current GitHub project and see what changes were made to the files.  As of 4/11/2020, the data was split into 2 files.  First time series data with confirmed global cases stored as a CSV.  The US data was moved to a new file of just US confirmed Covid cases.  The columns have changed from the original source file and those will need to be updated as well.  In this example, I will use the US data file to update my Power BI file and update my data.

Update Power BI Data Connection

To update the report connection right click on the table name and select Edit Query.  This opens the query in Power Query.

Power BI Edit Query Menu

Data Wrangling 101 – Understanding Applied Steps in Power Query

When you open Power Query you can see the steps being applied to the data in the Applied Step section for data wrangling.  Each step has its own row in this section.  By clicking on each step starting at the top you can see the changes being made as well as find any errors in the steps.  For example, if you make a change that results in an error at step 5, you can click on step 4 and if there is no error the changes will be applied successfully.  When working on projects from other people, this can help you determine issues in the Power Query steps.  In addition, each step is outlined in the Advanced Editor making it another method for you to see the changes being applied to the data.

Power BI Applied Steps Example

Updating Data with Power Query Advanced Editor

Next click on the Advanced Editor to see all of the steps that are applied to the the Power Query from the previous blog post.  Take some time and read through the steps being taken to better understand how Power Query applies the changes from the visual editor into code.  Understanding the Advanced Editor is key to future data wrangling projects.

Power BI Desktop Advanced Editor
Advanced Editor GitHub String
Power BI Advanced Editor GitHub String Updated

Change Power BI Data Connection in Advanced Editor

The blue highlight on the top picture shows the old Covid-19 GitHub data file name and location.  You can edit the file name directly in the editor.  Delete the old file name and then replace it with the new file name.  Make sure to compare it to the current GitHub file name location.  For reference, the current file name as of 04/12/2020 is:

https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv

The bottom picture shows the file name after I changed it directly in Advanced Editor.  Once you have completed the changes hit Done to apply the new file name.

Power BI Refresh Data Error

Once you have updated your data connection, you will get an error.  In this example, the Province/State column has been removed from the GitHub data.  In addition, new columns were added from the old file format.  To correct this error, you need to click on Go To Error and delete the steps starting from the bottom of the Applied Steps 1 by 1 until the file imports correctly.  In this case,  you will have 3 applied steps:

  • Source
  • Change Type
  • Promoted Headers
Power BI Refresh Error
Power BI Update to Applied Steps

Data Wrangling New Columns

Now follow the same process from our blog How to Model Power BI Covid 19 Data that shows you how to Unpivot the columns to get the dates from columns to rows.  Then do a few clean up items:

  • Rename Admin2 to City
  • Rename Attribute to Date and change the data type to Date
  • Rename Value to Cases and change the data type to whole number

Once completed, click Close and Apply

You should now have completed your data wrangling required to convert your old Power BI Desktop report from the prior GitHub data file to the new format.  As they continue to modify the files.  You may need to keep making changes to your Power Query.  Now that the data is split between US and Global.  If you wanted to model both files, you could import each file into the same Power BI Desktop and then merge the data to get a single dataset.

Data Wrangling Covid-19 to Add New Columns in Power Query

Ideally the Covid-19 data would be setup to add new rows to the data for each new day like a database does.  In this case, the Covid-19 data is setup to add a new column for each day.  This provides real world experience for data wrangling Covid data.  This requires you to update your Power Query to account for the extra columns each day.  There are 2 changes that need to be made to account for the extra columns.  The count of columns and the data layout for each column.

First, update the number of columns.  In this example, my original file had 64 columns.  Now the new file has 91 columns.  A simple edit from 64 to 91 sets up the number of columns to be read.  Next add a new column definition for each new column you need to add highlighted in the blue rectangle.  In my example, I needed to add “{“Column65”},{“Column66”},{“Column67″},” to Column 91 “{“Column91″},”.  Take care when adding a the updated text to ensure your brackets and commas are correct.

If you do this daily, you will just need to update the number of columns and then add the text definition for the new day.

Click Close and Apply and you should now have your new data.

Power BI Advanced Editor Number of Columns Updated

CDO Advisors Microsoft Power BI Solutions:

BI as a Service – Low Monthly Fees, Pay Per Month Per User

Executive Dashboards – Get Dashboards in 4 Weeks

Healthcare Dashboard Proof of Concept – SEE your data visualized

Microsoft Power BI Demo – Compare your current reporting to Power BI

Power BI Quick Start Solutions – Ready to start or enhance your Power BI?

Train Your Team – Remote Power BI Developer and End User Training

Use Power BI for Finance Teams – Learn how Power BI can improve operations

Virtual Chief Data Officer – Confused by all your data?  We can help