In my previous posts, I documented my findings from studying my blood pressure (BP) data collected from a consumer blood pressure cuff and correlating it to data from other activities, including travel and exercise. Part One covered the purpose and procedure. Part Two presented the results and conclusions. In this entry, I’ll describe the tools I used to perform the analysis and highlight some lessons learned.
Marshaling the Data
The data for this project came from a variety of sources:
Blood pressure data from the Withings iPhone app
Exercise data from an iPhone app called MapMyRun (note: walk/run and BP data are also available through Apple’s Health app)
Miscellaneous data recorded in spreadsheets or on paper
Just aggregating the data into a single format proved to be a challenge. MapMyRun does not support data export in the free version, and Apple’s app exports data in an inconvenient XML format. I ended up entering exercise data by hand into a spreadsheet, which was manageable for the 100+ days of data I had available.
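For readers who would rather script this step than type the numbers in, here is a minimal sketch of pulling daily walk/run totals out of an XML export with Python's standard library. It rests on assumptions: that the export consists of `Record` elements carrying `type`, `startDate` and `value` attributes, and that the type string shown is the one used for walking and running distance. The sample XML is made up for illustration, so check a real export file before relying on any of these names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for an export file. The element and attribute names are
# assumptions about the export layout, and the values are invented.
SAMPLE_XML = """<HealthData>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" unit="km"
          startDate="2016-01-04 07:10:00 -0500" value="3.2"/>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" unit="km"
          startDate="2016-01-04 18:30:00 -0500" value="2.1"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2016-01-04 07:10:00 -0500" value="4100"/>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" unit="km"
          startDate="2016-01-05 06:55:00 -0500" value="5.0"/>
</HealthData>"""


def daily_totals(xml_text, record_type):
    """Sum the values of one record type per calendar date (YYYY-MM-DD)."""
    totals = defaultdict(float)
    for record in ET.fromstring(xml_text).iter("Record"):
        if record.get("type") == record_type:
            day = record.get("startDate")[:10]  # keep only the date part
            totals[day] += float(record.get("value"))
    return dict(totals)


distance_by_day = daily_totals(
    SAMPLE_XML, "HKQuantityTypeIdentifierDistanceWalkingRunning"
)
print(distance_by_day)
```

The result is a dictionary keyed by date, which lines up with a spreadsheet that has one row per day.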
The existing spreadsheet data was easy to use, but the data recorded on paper needed to be manually added to the spreadsheet. Only the Withings app made the export process easy.
Conveniently, all of the data was keyed by date, enabling me to create a spreadsheet that had one row for each date, followed by columns for each data source. The BP data was a minor exception because it had multiple readings per day. I condensed those into a single line by averaging the values.
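The averaging and the one-row-per-date layout could also be sketched in a few lines of Python. This is an illustration rather than what I actually did, since I worked in a spreadsheet. The readings, the `ran_5k` lookup and the column names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings as (date, systolic) pairs; values are invented.
readings = [
    ("2016-01-04", 120), ("2016-01-04", 126), ("2016-01-04", 123),
    ("2016-01-05", 118), ("2016-01-05", 122),
]
# Hypothetical per-day data from another source, keyed by the same date string.
ran_5k = {"2016-01-04": True}


def average_by_date(rows):
    """Collapse multiple readings per day into one average per date."""
    grouped = defaultdict(list)
    for day, value in rows:
        grouped[day].append(value)
    return {day: mean(values) for day, values in grouped.items()}


daily_bp = average_by_date(readings)

# One row per date: the date key, then one column per data source.
table = [
    {"date": day, "systolic_avg": daily_bp[day], "ran_5k": ran_5k.get(day, False)}
    for day in sorted(daily_bp)
]
for row in table:
    print(row)
```

Dates missing from a source simply get a default value (here `False`), which keeps the table rectangular.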
In the end, I had one spreadsheet containing all the data. A “rectangular” data set makes it easy to use tools like Excel and Orange, rather than having to write a program to manage a more complicated data format.
Rectangular, largely numeric data is right at home in Excel, so that’s a natural starting point for any analysis.
Much of the data required minor clean-up, such as splitting a date/time field into separate date and time fields and averaging multiple readings taken on the same day. Pivot tables and graphs were useful as a way to quickly explore the data set to see if anything stood out. Some of Excel’s graphs are limited to 256 data points, and I had 333 for this analysis, so I couldn’t make full use of those features.
There are some tricks and best practices that dramatically increase Excel’s value. Spend an hour with Joel Spolsky (former program manager for Excel) to find out how to maximize Excel’s functionality.
R is the natural choice for any sort of data analysis, but for this exercise, I was interested in exploring Orange, a graphical tool backed by Python for analyzing data.
Orange makes it really easy to browse a data table, visualize the data, and even apply machine learning techniques to the data set. I used Orange to generate all of the graphs in my previous blog post.
Some of the features require an understanding of data science, but a lot of the statistical functions allow you to easily compare subsets of data to find patterns that would be difficult to discover in Excel.
This exercise in real-world data analytics taught several lessons.
Do a test run first! It’s worth planning out your analysis and running it on a small set of data before scaling to the full data set. Trial and error on 100 data points is far less painful than if you have 1000.
Tool knowledge is essential. Mastery of your tool set leads to better analysis and quicker results. Having a real problem to work on is also an opportunity to learn more about the tools, because you run into obstacles and have to figure out how to get past them.
Take a lot of notes. Exporting, aggregating and analyzing data requires numerous manual steps. In a formal environment, you’d use technology to automate the steps and make them repeatable. It’s not worthwhile to do that for a small process you’re only going to repeat a few times. But without an automated process, it’s essential to write things down: the steps taken, solutions created and any “gotchas” you encountered. Especially the gotchas.
They say “all science is becoming data science.” Perhaps in the next decade, the fundamental techniques required to do data science will be incorporated into the tools that scientists use. In the meantime, there’s a disparate set of increasingly powerful and free (or cheap) tools to help you wrangle data and a growing universe of data sets available for learning.
Sensors, health and wellness apps, and data analysis tools are now maturing to the point where clinicians, researchers, and even healthcare consumers can analyze and correlate interesting measurements. As I described in last week’s post, I recently used some data science techniques to analyze factors that impact my blood pressure (BP).
I am not a clinician. I’m looking for patterns in data without much information about the underlying physiology.
I’m not a statistician, either. I’m not trying to prove that any of the observed results are statistically significant.
I’m not completely comfortable with publicly sharing my health information. I have removed specific values from the charts and graphs, so that I’m not publishing my actual health data, just the general trends.
Here are some of the results.
Overall average blood pressure
I’ll start with a snapshot of my average blood pressure readings. The following graph represents the average systolic blood pressure measured in 333 readings from December 25, 2015 through April 15, 2016.
Reading the graph
The horizontal axis represents systolic blood pressure, with high values on the right and low values on the left. The vertical axis represents how frequently each measurement appears in the data. The more frequently a specific reading is observed, the higher it will peak. A low point on the curve represents a relatively uncommon reading.
In this graph, there are separate lines for readings before 9:00 a.m. (red), after 6:00 p.m. (blue), and between 9:00 a.m. - 6:00 p.m. (green). The morning measurements more commonly have lower readings. Midday readings are generally higher (further to the right) and more variable (lower peaks). Evening readings are, on average, lower than midday but not as low as the morning.
The Withings blood pressure cuff records systolic, diastolic and heart rate data. I’m simplifying it by showing just one reading.
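The three time-of-day bands in the graph could be assigned with a small helper like the one below. This is a sketch, not the method used to build the graph. How readings taken at exactly 9:00 a.m. or 6:00 p.m. are treated is my assumption (both fall into the midday band), and the function name is hypothetical.

```python
from datetime import time


def time_of_day_bucket(t):
    """Label a reading as morning (before 09:00), midday (09:00-18:00),
    or evening (after 18:00). Boundary handling is an assumption."""
    if t < time(9, 0):
        return "morning"
    if t <= time(18, 0):
        return "midday"
    return "evening"


for sample in (time(7, 30), time(12, 15), time(21, 5)):
    print(sample, time_of_day_bucket(sample))
```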
The BP cuff captures the date and time of each reading. By converting that into a day of the week in Excel (and cross-referencing my calendar), I can tell which days were workdays and which were weekends, holidays or vacation days. It’s no big surprise that being at work shifts my blood pressure slightly higher (3.5 points on average).
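The same workday test can be expressed outside Excel. The sketch below assumes a Monday-to-Friday work week and a hand-maintained set of days off. The `DAYS_OFF` dates are placeholders for illustration, not my actual calendar.

```python
from datetime import date

# Hypothetical days off taken from a personal calendar (holidays, vacation).
DAYS_OFF = {date(2016, 1, 1), date(2016, 1, 18)}


def is_workday(d, days_off=DAYS_OFF):
    """Monday-Friday counts as a workday unless the date is a listed day off."""
    return d.weekday() < 5 and d not in days_off


for d in (date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 4)):
    print(d, d.strftime("%A"), "workday" if is_workday(d) else "non-workday")
```

Cross-referencing a calendar still has to be done by hand. The code only makes the rule explicit and repeatable.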
The benefits of cardiovascular exercise are apparent from a comparison of days that included running at least 5 km: 4.5 points lower on average.
Compared to cardio, it appears that resistance training offered no particular benefit. Days that included weight lifting had a mean and median BP reading that was nearly identical to days without.
The stress of traveling seems to increase BP by a couple of points.
Cardio: -4.5 systolic BP points vs. not running
Beet juice: -2 points vs. non-beet-juice days
Resistance training: no effect
Workday: +3.5 BP points vs. non-workday
Travel: +4.5 BP points vs. not traveling
These findings are consistent with expectations, but the observed effects are quite small and might be smaller than the margin of error. Even so, as a patient I find it interesting to see these correlations, and it reinforces the need for exercise. From my doctor’s point of view, the daily BP readings are the more valuable product of the analysis, because they show a more complete picture than the measurements he takes in his office once a year.
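Each comparison above boils down to a difference between two group averages: days with a factor versus days without it. A minimal sketch of that calculation follows. The rows and column names are invented for illustration and are not my readings, and the result says nothing about statistical significance.

```python
from statistics import mean


def mean_difference(rows, flag, value="systolic_avg"):
    """Mean of `value` on days where `flag` is true, minus the mean on
    days where it is false."""
    with_flag = [r[value] for r in rows if r[flag]]
    without_flag = [r[value] for r in rows if not r[flag]]
    return mean(with_flag) - mean(without_flag)


# Made-up daily rows purely for illustration.
rows = [
    {"systolic_avg": 118, "ran_5k": True},
    {"systolic_avg": 120, "ran_5k": True},
    {"systolic_avg": 124, "ran_5k": False},
    {"systolic_avg": 122, "ran_5k": False},
]
effect = mean_difference(rows, "ran_5k")
print(effect)
```

A negative number means the factor coincided with lower readings. With samples this small, a difference of a few points can easily be noise.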
Another result of this exercise is that I was able to demonstrate it’s possible to collect, combine and analyze data using low-cost equipment and tools. In the next installment, I will dive into the details of how I carried out the analysis and discuss the tools and techniques I used. I’ll demonstrate how it’s now possible to conduct this type of study without a science lab.
There is still a long way to go to make this a simple, automatic process.
In an earlier post, I talked about devices, wearables, and the Internet of Things as inevitable developments in connected healthcare. Today, Apple’s Health app typifies the state of the art for consumer-device data collection. It aggregates and displays data collected from various sensors and apps, but simple chronological displays of readings are rarely helpful when studying causes and effects. Furthermore, not every kind of data can be aggregated within the app.
Can consumer-grade devices be used to learn about health and wellness causes and effects? I decided to run an experiment to find out. In this entry, I’ll explore the purpose of the experiment and the procedure I used. Next week, I will reveal the results and conclusions, followed by a review of the tools and techniques.
A blood pressure (BP) reading is one of the fundamental measures of a patient’s health, yet it varies constantly in response to numerous factors – including the subject’s presence in a clinical setting, which may artificially increase BP readings.
This is extremely inconvenient because the doctor’s office is one place where an accurate BP reading is essential for gauging the patient’s health. If a patient’s only BP readings are gathered at his annual physical exam, the data will be too sparse and too skewed to be useful.
That’s why I decided to invest in a home BP monitor and measure my blood pressure three times a day. The Withings BP monitor automatically measures blood pressure and transmits the result via Bluetooth to an app running on an iOS device. My experiment was to collect about three months’ worth of readings and combine them with data I was already collecting for other purposes.
Here are the data streams I was able to aggregate:
Travel dates and destinations. This is manually tracked in an Excel spreadsheet.
Cardio and resistance exercises, which are tracked in three places: automatically by an app called MapMyRun, in the Apple Health app, and on paper. The Apple Health app also tracks my daily steps.
Alcoholic beverage intake, which is recorded for reasons that are too complex to explore in this blog post. This is tracked in an Apple Numbers spreadsheet on my phone.
Plus, there are several other data points that are available “automatically” as a result of the data collection process:
Time of day
Day of week
Workday vs. non-workday
Finally, I read a report that beet juice can lower blood pressure, so I decided to manually track beet juice intake to see if it actually had any effect.
For the most part, I didn’t make any attempt to control the variables; I simply went about my normal activities (with a little extra record keeping). This was not a carefully controlled experiment. It was an analysis of real-world data.
My plan was to aggregate the data in Excel, and my hope was that most of the different data sources could be easily imported into a single spreadsheet. Since some of the data was already in Excel, those sources were no problem. The Withings app made it fairly straightforward to export BP data.
MapMyRun, however, only allows data exports if you’ve paid for the Pro version. I tried working around this by exporting “walking + running” data from the Apple Health app, but the output from Apple Health was in an inconvenient XML format that I didn’t care to parse. In the end I hand-entered the fitness data into the spreadsheet, which was a drag.
Bottom line: before embarking on an experiment like this, know your data sources and how to access them. It’s worth doing a practice run on a small set of data, just to make sure everything is going to come together.
In the next installment, I’ll cover the tools I used to analyze the data and the findings.