• MarkDataGuy

Visualizing Covid-19 with PowerBI and Publish to Web

Updated: Apr 4

It looks like lots of data people are using Covid-19 data sets to build their own visualizations and run their own analysis of the virus. Using PowerBI and Publish to Web I have built my own. The results are here, but let review how this was approached.

NB - Covid-19 and the pubic health impact is very serious. I am not trying to replace any of the existing public reporting tools out there. Please, wash your hands, stay safe and follow the instructions of your national health authority.

1) Find a dataset. There are quite a few out there, but I wanted one that was clean, complete, updated and official. A little bit of google-fu and I settled on the following source. Johns Hopkins CSSE is one of the official groups that are tracking Covid-19 and they make all of the data available via GitHub.

Working with GitHub is interested. In this case, they have a lot of downloadable CSV's of data, however, with GitHub, you can use the raw.githubusercontent.com link to get to the raw data. In my case, the following link https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

opens the raw data that I need for Power Bi Query

2) Examine the source data. Once you get to data you want to see what you are dealing with. In this case, the source is CSV, and of particular interest is that date is stored on columns. So each new day adds a new column to the data set. I also note the presence of Latitude and Longitude values as well as normal country/region data. Finally, the data is not additive, so the current total is reported every day.

3) Load to Power BI. This was straightforward, if you do a Get Data -> Web and paste in that URL from point 1 above, Power BI will do a great job in bringing in the data. However, now I want to get it into a shape that suits me. So I want to transform it from its current columnar layout to a row by row approach. (Reporting loves thin and long tables and not fat and short)

PowerQuery makes this really easy and you can do it all via the GUI. You want to

  • Make The First Row the Header

  • Unpivot the Date Columns

  • Rename, change the type and remove rows with 0 cases

In PowerQuery M (Advanced Editor) you end up with;

    Source = Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Columns=NumColumns, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Promoted Headers", {"Province/State", "Country/Region", "Lat", "Long"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Columns",{{"Attribute", "Date"}, {"Value", "Confirmed Cases"}}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type date}, {"Confirmed Cases", Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type2", each ([Confirmed Cases] <> null and [Confirmed Cases] <> 0))
    #"Filtered Rows"

Note the Columns=NumColumns parameter. This required a little bit of work as you have to tell PowerQuery how many columns you want to read. In this case, we know that there will be one new column per day, so I created a separate query that would calculate a date difference from the current date to my first load date so that the number of columns would increase by 1 per day.

4) Create your data model and any additional calculations. The model is easy in this case as it is just one table. However, you have to allow for the fact that the data is presented as aggregated total. We don't get the number of new cases per day, but the total number of cases to date. If you select a range of dates, you don't want to end up over-counting. What you want is to present the latest value in that range. This requires a little bit of DAX as follows;

CALCULATE(SUM('Covid Growth'[Confirmed Cases]), LASTDATE ('Covid Growth'[Date]))

LASTDATE will get the value for the latest date in the selected date range.

5) Build your report, deploy, schedule a refresh and you are done.

Thanks for reading and stay safe.


Original on Transparent.png

 Based in Cork, Ireland

Tel. +353 86 2247617