Power BI and NLP against COVID-19

Italy has been experiencing a COVID-19 pandemic emergency since February. Despite being a small province (about 300,000 inhabitants), Piacenza was one of the epicenters of the epidemic in Italy; Codogno, home of the first patient diagnosed with COVID-19 in Italy, is just a few kilometers from the city. In March, the local hospital quickly turned into a “COVID-19 hospital”: 80% of its beds were reserved for COVID-19 patients, and its Intensive Care Unit had to triple its number of beds. This put the healthcare system under considerable stress.

To avoid collapse, the healthcare organization invested in “door to door” visits to diagnose COVID-19 and treat patients at home whenever possible. A task force of specialized teams of clinicians, known as USCA (in Italian: Unità Speciali Continuità Assistenziali, Special Units for Continuity of Care), was created. The main challenge is to diagnose symptomatic patients and identify asymptomatic carriers quickly, then decide, depending on the severity of the disease, whether the patient can be treated at home or requires hospitalization.

The USCA team assesses a patient’s symptoms and measures body temperature, oxygen saturation level, heart rate, and blood pressure. Subsequently, they perform clinical and ultrasound examinations. Portable ultrasound devices have been identified as a tool to help clinicians in these domestic settings. The outcome of the evaluation is written in a report. The healthcare organization needs to analyze the collected data in order to monitor the pandemic trend and identify outbreaks. As often happens, the reports were not designed to facilitate analysis.

Since Piacenza is my hometown, I offered to help them pro bono. I used NLP, and especially “regular expressions”, to extract the information from the reports. The extracted data are both numerical:

  • Blood Pressure
  • Heart Rate
  • Oxygen Saturation
  • Temperature

and textual:

  • symptoms: fever, cough, loss of taste and smell, dyspnea…
  • observations from the examination with the stethoscope (e.g. murmurs and wheezes)
  • observations from the ultrasound examination: B-lines, thickening of the pleura, white lung, lung thickening, pleural effusion…
  • therapy adopted: cortisone, heparin, hydroxychloroquine, azithromycin, oxygen…
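For the textual fields, the simplest approach is a dictionary of patterns, one per item to look for. The following is only a minimal sketch of that idea: the real reports are written in Italian, so the English keywords, the SYMPTOM_PATTERNS dictionary and the flag_symptoms function are hypothetical placeholders, not the patterns used in the project.

```python
import re

# Hypothetical English keyword patterns; the real reports are in Italian,
# so these are placeholders that only illustrate the technique.
SYMPTOM_PATTERNS = {
    "fever": re.compile(r"\bfever\b", re.IGNORECASE),
    "cough": re.compile(r"\bcough(?:ing)?\b", re.IGNORECASE),
    "dyspnea": re.compile(r"\bdyspn(?:o)?ea\b", re.IGNORECASE),
    "loss_of_taste_or_smell": re.compile(
        r"\b(?:ageusia|anosmia|loss of (?:taste|smell))\b", re.IGNORECASE
    ),
}


def flag_symptoms(report_text):
    """Return a dict mapping each symptom name to True if its pattern occurs."""
    return {
        name: bool(pattern.search(report_text))
        for name, pattern in SYMPTOM_PATTERNS.items()
    }


if __name__ == "__main__":
    print(flag_symptoms("Fever for three days, persistent cough."))
```

A sketch like this does not handle negations ("no dyspnea" would still be flagged), which is one of the reasons the cleaning step takes so long.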

Writing regular expressions (regex) is not an easy task. This is how I extracted the Oxygen Saturation value from the reports (for instance, depending on the doctor, 96% can be written as sat96%, sat.96%, satO2 96%, sat O2 96%, sat O2: 96%, etc.):

Regex = (?:\bsat\s?o?2?)\.?:?\s*(?<Sat>\d*)
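For readers who want to try the pattern on their own, here is a minimal Python sketch. It is an adaptation, not necessarily the exact code used for the project: Python's re module spells named groups as (?P<Sat>...), case-insensitive matching is assumed so that both "o2" and "O2" are accepted, and \d+ replaces \d* so that a match always carries at least one digit. The function name extract_saturation is hypothetical.

```python
import re

# Adapted from the pattern above. Assumptions: Python named-group syntax,
# case-insensitive matching, and at least one digit required (\d+).
SAT_PATTERN = re.compile(
    r"(?:\bsat\s?o?2?)\.?:?\s*(?P<Sat>\d+)",
    re.IGNORECASE,
)


def extract_saturation(report_text):
    """Return the first oxygen saturation value found, or None if absent."""
    match = SAT_PATTERN.search(report_text)
    return int(match.group("Sat")) if match else None


if __name__ == "__main__":
    variants = ["sat96%", "sat.96%", "satO2 96%", "sat O2 96%", "sat O2: 96%"]
    for text in variants:
        print(text, "->", extract_saturation(text))
```

Note that a variant typed with a zero instead of the letter O (sat02) would not be handled by this pattern: the digits "02" would be captured instead of the saturation value.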

The data were then imported and analyzed in Power BI. Power BI Desktop is a free tool offered by Microsoft. It can import data from many different sources: CSV and Excel files, databases, and even online services. It offers many powerful visualizations, and you can even write Python or R in it. You can download it here: https://powerbi.microsoft.com/en-us/desktop/. You can also publish your reports and dashboards on the web, but that requires a Power BI Service license.

I used it to develop some interesting graphics.

Here you see the trend. The peak of home visits was reached around week 17 (April), and visits went down in June (week 25). The average patient’s age was above 60. In August the second wave started, but this time the average age (ETA, from the Italian “età”) was below 40. (The data shown here only go through September. Now it is getting worse again.)

Thanks to Power BI, we can also see the percentage of home-treated people over the total population on a map. The USCA effort enabled doctors to treat people living in rural areas, far from the main city of Piacenza, at home.

Power BI embeds Machine Learning and Artificial Intelligence algorithms too. For instance, the graphic below displays a “detector” of key influencers. Here you can see that the key influencer of death is age (I recently lost my father, who was 90); the probability of dying increases about ten-fold (9.57x) if you are over 80, about four-fold (3.74x) if you live in an RSA (the Italian equivalent of a Residential Care Facility for the Elderly, or RCFE) and nearly doubles (1.72x) if you are male.

Here you can find more information about Key Influencer in Power BI:


It uses ML.NET, an open source and cross-platform machine learning framework, to run a logistic regression.
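To make multipliers such as 9.57x more concrete, here is a minimal sketch of how they can be read: as the ratio between the outcome rate inside a group and the rate among everyone else. The function name rate_ratio and the counts are invented for illustration; they are not the Piacenza data, and this is not the logistic regression that Power BI runs internally.

```python
def rate_ratio(events_in_group, group_size, events_elsewhere, size_elsewhere):
    """Ratio between the outcome rate in a group and the rate outside it."""
    rate_in_group = events_in_group / group_size
    rate_elsewhere = events_elsewhere / size_elsewhere
    return rate_in_group / rate_elsewhere


if __name__ == "__main__":
    # Invented toy counts: 30 outcomes among 100 people in the group,
    # 15 outcomes among the 500 people outside it.
    ratio = rate_ratio(30, 100, 15, 500)
    print(f"The outcome is about {ratio:.2f}x more likely inside the group")
```

With these toy counts the rate is 30% inside the group and 3% outside it, so the outcome is about ten times more likely inside the group.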

I spent most of the time extracting and cleaning the important information from the reports, Skyping at night with a doctor who, during the day, was working with one of these USCA teams. Importing the data into Power BI and creating a few reports like these took only a couple of hours. Power BI is really user friendly, and its data visualization capabilities are awesome: you can take boring flat data and bring it to life through visualization!

If you are reading this, I suggest you download it now from https://powerbi.microsoft.com/en-us/desktop/ and try it yourself!
