Data Visualization

March Running Check-in

In this post, I’m focusing on three visualizations: 1) Heart Rate during Run, 2) Heart Rate Recovery after Run, and 3) Steps per Minute during Run. With these, I hope to show my progress toward my goals.

So let’s get started.

import datetime as dt
import pytz

import numpy as np
import pandas as pd
import json
import requests

from lib.python_fitbit import fitbit
from lib.python_fitbit import gather_keys_oauth2 as Oauth2

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly import tools
import cufflinks as cf

init_notebook_mode(connected=True)
cf.set_config_file(world_readable=True, offline=True)

Exercise Routine

Before we dive into the visualizations, I’ll explain how I’ve been structuring my workouts so far. Usually, I wake up around 6 am to get to the gym by 6:15 am. Before hopping on the treadmill, I will stretch for about 15 minutes, employing a mix of active and passive stretching.

For my runs, I usually split them into 4 intervals. Lately, I have been starting at 6 mph and increasing my speed by 0.5 mph each interval. In between, I walk at a brisk 3.5 mph. Earlier in the year, however, I started at slower speeds.

To aid in improving my form, I set the treadmill to a 5 percent grade. This makes stepping from the balls of my feet more natural. Once stepping from the balls of my feet becomes a habit, I will likely decrease the incline to better simulate running outdoors.

Now that we’ve got the background information out of the way, we can get into the charts!

Using my Fitbit to Keep Track

I purchased a Fitbit as a New Year’s gift to myself and it has been nothing short of addicting for a data nerd like myself. Not only does it help you track things like weight loss, caloric intake, and sleep, but it also keeps detailed records of your workout activities.

Let’s begin by authenticating with the API. I used this fantastic tutorial to get started.

with open('keys.json', 'r') as f:
    keys = json.loads(f.read())
server = Oauth2.OAuth2Server(keys['fitbit_client_id'], keys['fitbit_client_secret'])
server.browser_authorize()
ACCESS_TOKEN = str(server.fitbit.client.session.token['access_token'])
REFRESH_TOKEN = str(server.fitbit.client.session.token['refresh_token'])
fitbit_client = fitbit.Fitbit(keys['fitbit_client_id'], keys['fitbit_client_secret'], oauth2=True, access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN)
[29/Mar/2019:15:16:14] ENGINE Listening for SIGTERM.
[29/Mar/2019:15:16:14] ENGINE Listening for SIGHUP.
[29/Mar/2019:15:16:14] ENGINE Listening for SIGUSR1.
[29/Mar/2019:15:16:14] ENGINE Bus STARTING
CherryPy Checker:
The Application mounted at '' has an empty config.

[29/Mar/2019:15:16:14] ENGINE Started monitor thread 'Autoreloader'.
[29/Mar/2019:15:16:14] ENGINE Serving on http://127.0.0.1:8080
[29/Mar/2019:15:16:14] ENGINE Bus STARTED


127.0.0.1 - - [29/Mar/2019:15:16:16] "GET /?code=6ee9ea06feb4b86e9b2133346b1f276eee122204&state=nBevMeAMybTgKe0EOCs5s1OZdUonvy HTTP/1.1" 200 122 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/73.0.3683.75 Chrome/73.0.3683.75 Safari/537.36"


[29/Mar/2019:15:16:17] ENGINE Bus STOPPING
[29/Mar/2019:15:16:22] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('127.0.0.1', 8080)) shut down
[29/Mar/2019:15:16:22] ENGINE Stopped thread 'Autoreloader'.
[29/Mar/2019:15:16:22] ENGINE Bus STOPPED
[29/Mar/2019:15:16:22] ENGINE Bus EXITING
[29/Mar/2019:15:16:22] ENGINE Bus EXITED
[29/Mar/2019:15:16:22] ENGINE Waiting for child threads to terminate...

Exercise Activities Data

We can use the Get Activity Logs List endpoint to get all of my runs.

Unfortunately, the Python library that I was using has not added support for this endpoint. With a little gum and a paper clip, I was able to create the request manually.

all_activities = []

# create url manually because not supported by sdk
base_url = "{0}/{1}/user/{2}/activities/list.json?afterDate={3}&sort=asc&limit=20&offset=0"
url = base_url.format(*fitbit_client._get_common_args(None), '2019-01-01')
activities_list = fitbit_client.make_request(url, method='GET')

all_activities.extend(activities_list['activities'])

# api returns <=20 activities, check for more
while activities_list['pagination']['next']:
    # get next list of activities
    next_url = activities_list['pagination']['next']
    activities_list = fitbit_client.make_request(next_url, method='GET')
    
    if len(activities_list['activities']) > 0:
        all_activities.extend(activities_list['activities'])
    
activities = pd.DataFrame(all_activities)
fitbit_runs = activities[activities['activityName'].str.contains("Run")].reset_index(drop=True)
# startTime strings carry a UTC offset, so parse them straight to UTC
fitbit_runs['startTime'] = pd.to_datetime(fitbit_runs['startTime'], utc=True)
fitbit_runs.head()
activeDuration activityLevel activityName activityTypeId averageHeartRate calories caloriesLink distance distanceUnit duration ... logType manualValuesSpecified originalDuration originalStartTime pace source speed startTime steps tcxLink
0 1792000 [{'minutes': 0, 'name': 'sedentary'}, {'minute... Run 90009 145.0 403 https://api.fitbit.com/1/user/-/activities/cal... NaN NaN 1792000 ... auto_detected {'calories': False, 'distance': False, 'steps'... 1792000 2019-01-04T07:35:02.000-05:00 NaN NaN NaN 2019-01-04 12:35:02+00:00 3741.0 https://api.fitbit.com/1/user/-/activities/190...
1 1231000 [{'minutes': 0, 'name': 'sedentary'}, {'minute... Run 90009 145.0 278 https://api.fitbit.com/1/user/-/activities/cal... NaN NaN 1231000 ... auto_detected {'calories': False, 'distance': False, 'steps'... 1231000 2019-01-07T08:10:32.000-05:00 NaN NaN NaN 2019-01-07 13:10:32+00:00 2648.0 https://api.fitbit.com/1/user/-/activities/190...
2 1126000 [{'minutes': 0, 'name': 'sedentary'}, {'minute... Run 90009 147.0 253 https://api.fitbit.com/1/user/-/activities/cal... NaN NaN 1126000 ... auto_detected {'calories': False, 'distance': False, 'steps'... 1126000 2019-01-09T07:14:22.000-05:00 NaN NaN NaN 2019-01-09 12:14:22+00:00 2387.0 https://api.fitbit.com/1/user/-/activities/191...
3 1536000 [{'minutes': 0, 'name': 'sedentary'}, {'minute... Run 90009 138.0 318 https://api.fitbit.com/1/user/-/activities/cal... NaN NaN 1536000 ... auto_detected {'calories': False, 'distance': False, 'steps'... 1536000 2019-01-11T07:36:38.000-05:00 NaN NaN NaN 2019-01-11 12:36:38+00:00 3341.0 https://api.fitbit.com/1/user/-/activities/191...
4 1331000 [{'minutes': 0, 'name': 'sedentary'}, {'minute... Run 90009 158.0 315 https://api.fitbit.com/1/user/-/activities/cal... NaN NaN 1331000 ... auto_detected {'calories': False, 'distance': False, 'steps'... 1331000 2019-01-13T09:50:41.000-05:00 NaN NaN NaN 2019-01-13 14:50:41+00:00 3118.0 https://api.fitbit.com/1/user/-/activities/191...

5 rows × 25 columns
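
The paging loop above can also be written as a small generator. This is only a sketch: `iter_paginated` and the `FAKE_PAGES` stand-in are hypothetical names, and it assumes, as the loop above does, that the final page reports an empty `pagination['next']`.

```python
from typing import Callable, Dict, Iterator, List, Optional


def iter_paginated(fetch: Callable[[str], Dict], first_url: str) -> Iterator[Dict]:
    """Yield every activity, following pagination['next'] until it is empty."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get('activities', [])
        # Assumption: the last page carries an empty string (or no key) as 'next'.
        url = page.get('pagination', {}).get('next')


# Hypothetical stand-in for the API: two pages keyed by made-up URLs.
FAKE_PAGES = {
    'page-1': {'activities': [{'logId': 1}, {'logId': 2}],
               'pagination': {'next': 'page-2'}},
    'page-2': {'activities': [{'logId': 3}],
               'pagination': {'next': ''}},
}

all_logs: List[Dict] = list(iter_paginated(FAKE_PAGES.__getitem__, 'page-1'))
print([log['logId'] for log in all_logs])
```

With the real client, `fetch` could be something like `lambda url: fitbit_client.make_request(url, method='GET')`, which is the same call the loop above makes.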

Fitness Graphs

Fitbit’s API also provides a lot of fine-grained data that I can use to get a more in-depth view of my runs.

Let’s grab my heart rate and step data for each run.

hr_for_run = []
steps_for_run = []
for index, run in fitbit_runs.iterrows():
    # add buffer to heart rate api url
    hr_link = run['heartRateLink']
    hr_link_split = hr_link.split('/')
    
    start_time_str = hr_link_split[-2]
    end_time_str = hr_link_split[-1].split('.')[0]
    
    start_time = dt.datetime.strptime(start_time_str, '%H:%M:%S')
    start_time = start_time - dt.timedelta(minutes=10)
    
    end_time = dt.datetime.strptime(end_time_str, '%H:%M:%S')
    end_time = end_time + dt.timedelta(minutes=45)
    
    hr_link = '/'.join(hr_link_split[:-2]) + '/' + start_time.strftime('%H:%M:%S') + '/' + end_time.strftime('%H:%M:%S') + '.json'
    
    # get heart rate data
    heart_info = fitbit_client.make_request(hr_link, method='GET')
    
    # heart rate data to df
    hr_df = pd.DataFrame(heart_info['activities-heart-intraday']['dataset'])
    hr_df.time = pd.to_datetime(hr_df.time)
    hr_df['time_from_start'] = hr_df.time - hr_df.time.iloc[0]
    
    hr_for_run.append(hr_df)
    
    # get steps for that day
    start_time = run['startTime'].to_pydatetime()
    steps_info = fitbit_client.intraday_time_series('activities/steps', base_date=start_time.strftime('%Y-%m-%d'), detail_level='1min')
    
    # step data to df
    steps_df = pd.DataFrame(steps_info['activities-steps-intraday']['dataset'])
    steps_df.time = start_time.strftime('%Y-%m-%d') + ' ' + steps_df.time
    steps_df.time = pd.to_datetime(steps_df.time).dt.tz_localize('EST')
    
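    # get step data from 10 minutes before run start to 30 minutes after run start
    in_window = (steps_df.time > run.startTime - dt.timedelta(minutes=10)) & (steps_df.time < run.startTime + dt.timedelta(minutes=30))
    # copy the slice so adding a column does not trigger SettingWithCopyWarning
    steps_df = steps_df[in_window].copy()
    # row 10 of the window sits within about a minute of the recorded run start; use it as time zero
    steps_df['time_from_start'] = steps_df.time - steps_df.time.iloc[10]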
    
    steps_for_run.append(steps_df)
# label each run with its month
fitbit_runs['month'] = fitbit_runs['startTime'].dt.month
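
The link-padding step inside the loop can be pulled into a helper, which also makes it easy to check on its own. The sketch below is an illustration, not part of the original notebook: `widen_time_window` is a hypothetical name, the example link is made up, and the clamping to a single day is an added assumption (the loop above does not clamp).

```python
import datetime as dt


def widen_time_window(link: str, before_min: int, after_min: int) -> str:
    """Pad the HH:MM:SS start/end at the tail of an intraday link."""
    parts = link.split('/')
    start = dt.datetime.strptime(parts[-2], '%H:%M:%S')
    end = dt.datetime.strptime(parts[-1].split('.')[0], '%H:%M:%S')

    # strptime puts both times on 1900-01-01; clamp so the padding never
    # wraps past midnight into another day.
    day_start = start.replace(hour=0, minute=0, second=0)
    day_end = start.replace(hour=23, minute=59, second=59)
    start = max(day_start, start - dt.timedelta(minutes=before_min))
    end = min(day_end, end + dt.timedelta(minutes=after_min))

    return '/'.join(parts[:-2] + [start.strftime('%H:%M:%S'),
                                  end.strftime('%H:%M:%S') + '.json'])


# Illustrative link only; the real value comes from run['heartRateLink'].
example = ('https://example.invalid/heart/date/2019-01-04/2019-01-04'
           '/1sec/time/07:35:02/08:04:54.json')
print(widen_time_window(example, before_min=10, after_min=45))
```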

Heart Rate During Run

First, we’ll check out my heart rate during my runs. From this, we should be able to see what parts of my runs are most strenuous. My guess is that increases in my heart rate will correlate with the intervals during my run.

To smooth out the monthly averages, I will apply a 2-minute moving average.

data = []
for month in sorted(fitbit_runs.month.value_counts().keys()):
    # get that month's runs idx
    run_idx = fitbit_runs[fitbit_runs.month == month].index
    
    # find average hr for that month, in 5-second bins over 40 minutes
    seconds_from_start = range(0, 60*40, 5)
    hrs = {key: [] for key in seconds_from_start}
    
    for i in run_idx:
        # skip short runs for better averages
        if fitbit_runs.iloc[i].duration < 600000: continue
        
        # beginning of run
        recovery_start = hr_for_run[i].iloc[0].time_from_start
        
        # query hr
        for seconds in seconds_from_start:
            # find 5 second interval 
            start_delta = recovery_start + dt.timedelta(seconds=seconds)
            end_delta = start_delta + dt.timedelta(seconds=5)

            val = hr_for_run[i][(hr_for_run[i].time_from_start >= start_delta) & (hr_for_run[i].time_from_start <= end_delta)].value
            
            hrs[seconds].extend(val.values)
    
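    avg_hrs = {}
    for seconds, readings in hrs.items():
        if readings:
            avg_hrs[seconds] = np.mean(readings)
        else:
            # no hr recorded for interval, use previous interval (NaN if there is none yet)
            avg_hrs[seconds] = avg_hrs.get(seconds-5, np.nan)
            
    # hr data starts 10 minutes before the run, so the 40-minute axis runs from -10 to 30
    xvals = np.array(range(60*-10, 60*30, 5))/60
    yvals = list(avg_hrs.values())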
    
    data.append(go.Scatter(
        x=xvals,
        y=yvals,
        name=dt.date(1900, month, 1).strftime('%B'),
        visible="legendonly"
    ))
    
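    # drop the first 23 points, where the 2-minute window is not yet full,
    # from both axes so each average is plotted at the end of its window
    data.append(go.Scatter(
        x=xvals[24-1:],
        y=pd.Series(yvals).rolling(window=24).mean().iloc[24-1:].values,
        name=dt.date(1900, month, 1).strftime('%B') + ' 2min SMA',
    ))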
layout = go.Layout(
    title='Heart Rate during Run',
    xaxis=dict(
        title='Minutes from Fitbit-Recorded Start of Run'
    ),
    yaxis=dict(
        title='Heart Rate',
        range=[60, 175]
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='hr-by-run')

I would say that my guess was partially right. As we can see, there are two clear spikes at the beginning of the run that just about correspond with the intervals. It’s clear that I’ve been steadily increasing my exercise intensity from month to month.

However, toward the end of the run the picture is less clear. I believe this is because I occasionally do shorter runs (under 15 minutes), which pull the averages down. This is evident in the March moving average: after around 15 minutes there are still spikes, but the averages are much lower.
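
For reference, the 2-minute window used above works out to 24 samples because the averages are in 5-second bins. Here is a minimal sketch of that arithmetic on made-up heart rates (the sine curve is invented purely for illustration). It uses `dropna()` so that each smoothed value stays paired with the time at which its window ends.

```python
import numpy as np
import pandas as pd

BIN_SECONDS = 5                       # width of each averaged bin, as above
window = (2 * 60) // BIN_SECONDS      # 2-minute window -> 24 bins

# Made-up heart rates: one value per 5-second bin over 5 minutes.
minutes = np.arange(0, 5 * 60, BIN_SECONDS) / 60
heart_rate = pd.Series(120 + 10 * np.sin(minutes), index=minutes)

sma = heart_rate.rolling(window=window).mean()

# The first window-1 entries are NaN because the window is not full yet.
# dropna() removes them from the values and the index together, so the
# x and y arrays handed to a plot stay the same length.
sma = sma.dropna()
print(len(heart_rate), len(sma))
```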

Average Heart Rate Recovery by Month

Heart rate recovery can be a strong indicator of fitness: the faster your heart recovers from exercise, the fitter you generally are.

In this graph, I want to compare how quickly my heart rate returns to “normal” levels. Each run’s recovery starts 3 minutes before my heart rate dropped below 160 for the last time. I figured this would be the easiest way to line up the ends of my runs, regardless of how long I ran. To make sure the averages are not affected by any subsequent exercise I do at the gym, I only look at about the first 5 minutes after the end of the run (the code uses a 7-minute window from the start of recovery).

You may be wondering why I have used almost identical code to make this graph. Because of how I’m aligning the heart rate records, I couldn’t use the averages from the graph above because those are aligned at the beginning of the run. I’m sure there exists a more elegant solution but this will do for this check-in.

data = []
for month in sorted(fitbit_runs.month.value_counts().keys()):
    # get that month's runs idx
    run_idx = fitbit_runs[fitbit_runs.month == month].index
    
    # find average hr recovery for that month
    seconds_from_start_of_recovery = range(0, 60*7, 5)
    hrs = {key: [] for key in seconds_from_start_of_recovery}
    
    for i in run_idx:
        if hr_for_run[i][hr_for_run[i].value > 160].empty: continue
        
        # find time 3 minutes before heart rate went below 160
        recovery_start = hr_for_run[i][hr_for_run[i].value > 160].iloc[-1].time_from_start - dt.timedelta(seconds=180)
        
        # query hr through 7 minutes after start of recovery
        for seconds in seconds_from_start_of_recovery:
            # find 5 second interval 
            start_delta = recovery_start + dt.timedelta(seconds=seconds)
            end_delta = start_delta + dt.timedelta(seconds=5)

            val = hr_for_run[i][(hr_for_run[i].time_from_start >= start_delta) & (hr_for_run[i].time_from_start <= end_delta)].value
            
            hrs[seconds].extend(val.values)
    
    avg_hrs = {}
    for seconds, readings in hrs.items():
        if readings:
            avg_hrs[seconds] = np.mean(readings)
        else:
            # no hr recorded for interval, use previous interval (NaN if there is none yet)
            avg_hrs[seconds] = avg_hrs.get(seconds-5, np.nan)
            
    xvals = np.array(list(avg_hrs.keys()))/60
    yvals = list(avg_hrs.values())
    
    data.append(go.Scatter(
        x=xvals,
        y=yvals,
        name=dt.date(1900, month, 1).strftime('%B'),
        visible="legendonly"
    ))
    
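    # drop the first 5 points, where the 30-second window is not yet full,
    # from both axes so each average is plotted at the end of its window
    data.append(go.Scatter(
        x=xvals[6-1:],
        y=pd.Series(yvals).rolling(window=6).mean().iloc[6-1:].values,
        name=dt.date(1900, month, 1).strftime('%B') + ' 30sec SMA',
    ))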
layout = go.Layout(
    title='Heart Rate Recovery by Month',
    xaxis=dict(
        title='Minutes from Start of Recovery (3 Minutes before Heart Rate went below 160)'
    ),
    yaxis=dict(
        title='Heart Rate',
        range=[100, 175]
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='hr-recovery-by-month')

Now, this is cool! Although subtle, there is some real improvement visible here. Over the past three months, my heart has been recovering from aerobic exercise more quickly. Although the recovery rates during January and February are very similar, the improvement from February to March is substantial.

This is going to be a really interesting analysis to continue checking in on for the next few months.
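
On the "more elegant solution" point: one option is a single binning helper that takes the alignment point as an argument, so the same code serves both heart rate graphs. The sketch below is an assumption-laden illustration, not the code that produced the charts. `binned_hr` and `recovery_origin` are hypothetical names, the run data is made up, and the bins are half-open, so a reading on a bin boundary is counted once rather than twice as in the loops above.

```python
import numpy as np
import pandas as pd


def binned_hr(hr_df: pd.DataFrame, origin: pd.Timedelta,
              total_seconds: int, bin_seconds: int = 5) -> pd.Series:
    """Mean heart rate per bin, with bins counted from `origin`."""
    offset = (hr_df['time_from_start'] - origin).dt.total_seconds()
    in_range = (offset >= 0) & (offset < total_seconds)
    bins = (offset[in_range] // bin_seconds).astype(int) * bin_seconds
    means = hr_df.loc[in_range, 'value'].groupby(bins).mean()
    # Reindex so bins with no reading appear as NaN, then carry the last
    # known value forward (the same fallback the loops above use).
    return means.reindex(range(0, total_seconds, bin_seconds)).ffill()


def recovery_origin(hr_df: pd.DataFrame, threshold: int = 160,
                    lead_seconds: int = 180) -> pd.Timedelta:
    """Time 3 minutes before the last reading above the threshold.

    Raises IndexError if no reading exceeds the threshold; the loop
    above skips such runs instead.
    """
    above = hr_df.loc[hr_df['value'] > threshold, 'time_from_start']
    return above.iloc[-1] - pd.Timedelta(seconds=lead_seconds)


# Made-up run: one reading every 5 s, 165 bpm for 5 minutes, then 130 bpm.
times = pd.to_timedelta(np.arange(0, 600, 5), unit='s')
values = [165 if t < pd.Timedelta(minutes=5) else 130 for t in times]
run = pd.DataFrame({'time_from_start': times, 'value': values})

curve = binned_hr(run, recovery_origin(run), total_seconds=7 * 60)
print(curve.head())
```

Passing `hr_df['time_from_start'].iloc[0]` as `origin` with `total_seconds=40 * 60` would give the start-aligned version used for the first graph.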

Steps per Minute

Below, I aggregate the step data from all of my runs so far. As mentioned above, we should see spikes that correspond to the intervals during my run.

# get min amount of data for run
step_len = np.min([len(steps) for steps in steps_for_run])

# assemble runs into matrix
steps_for_run_mat = np.ones((len(steps_for_run), step_len))
for i in range(len(steps_for_run)):
    steps_for_run_mat[i][:] = steps_for_run[i].value[:step_len]
data = []
for month in sorted(fitbit_runs.month.value_counts().keys()):
    # get that month's runs idx
    run_idx = fitbit_runs[fitbit_runs.month == month].index
    month_runs = steps_for_run_mat[run_idx]
    
    # get runs that recorded valid amount of steps
    month_runs = month_runs[np.argwhere(month_runs.mean(axis=1) > 50).reshape(-1)]

    data.append(go.Scatter(
            # trim x to step_len so it matches the truncated step matrix
            x=steps_for_run[0].time_from_start.dt.total_seconds().values[:step_len] / 60,
            y=month_runs.mean(axis=0),
            name=dt.date(1900, month, 1).strftime('%B')
        ))
layout = go.Layout(
    title='Steps per Minute during Run',
    xaxis=dict(
        title='Minutes from Fitbit-Recorded Start of Run'
    ),
    yaxis=dict(
        title='Steps'
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='steps-by-run')

As I explained in the Exercise Routine section, my starting speed has recently been 6 mph, but earlier in the year I started at a slower speed. This visual shows that change well.

For most of March, I’ve been pretty comfortable starting at 6 mph but I might start to increase my speeds on subsequent intervals. In the next check-in, we’ll see if I stick to that goal.
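
The aggregation behind this chart comes down to three steps: truncate every run to the shortest one, stack the runs into a matrix, and average only the runs that recorded a plausible number of steps. A small sketch on made-up numbers follows. The three traces are invented, and the guard for a month with no valid runs is an addition that the code above does not have.

```python
import numpy as np

# Made-up steps-per-minute traces of unequal length for three runs.
runs = [
    np.array([0, 20, 150, 160, 155, 30]),
    np.array([0, 10, 140, 150, 145]),
    np.array([0, 0, 5, 0, 0, 0, 0]),      # tracker barely registered steps
]

# Truncate every trace to the shortest one so they stack into a matrix.
n = min(len(r) for r in runs)
matrix = np.vstack([r[:n] for r in runs])

# Keep runs whose mean steps per minute clears the threshold (50, as above).
valid = matrix[matrix.mean(axis=1) > 50]

# Guard against a month with no valid runs: the mean of an empty array is NaN
# with a warning, so return an explicit NaN row instead.
monthly_mean = valid.mean(axis=0) if len(valid) else np.full(n, np.nan)
print(monthly_mean)
```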

Further Analysis

Location Data

The Fitbit API has an endpoint that outputs the TCX data for a run. Using this, we can conduct a similar analysis to the one we did with RunKeeper’s GPX data.

However, there are currently two problems: 1) the endpoint is still in beta and could change, and 2) the Python SDK currently does not support non-JSON requests. Hopefully, I will be able to submit a pull request to accommodate this endpoint.
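
Once the raw TCX text is available, reading it should not need much more than the standard library. The sketch below parses a tiny hand-written fragment, not real Fitbit output. It assumes the file follows Garmin's Training Center Database v2 schema, with `Trackpoint`, `Time`, `Position` and `HeartRateBpm` elements; whether Fitbit's beta endpoint returns exactly these fields is an assumption. `parse_trackpoints` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of Garmin's Training Center Database (TCX) v2 schema.
NS = {'tcx': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}

# Tiny hand-written fragment for illustration only.
SAMPLE_TCX = """
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities><Activity Sport="Running"><Lap><Track>
    <Trackpoint>
      <Time>2019-03-01T11:20:00Z</Time>
      <Position>
        <LatitudeDegrees>40.0</LatitudeDegrees>
        <LongitudeDegrees>-75.0</LongitudeDegrees>
      </Position>
      <HeartRateBpm><Value>142</Value></HeartRateBpm>
    </Trackpoint>
    <Trackpoint>
      <Time>2019-03-01T11:20:05Z</Time>
      <HeartRateBpm><Value>145</Value></HeartRateBpm>
    </Trackpoint>
  </Track></Lap></Activity></Activities>
</TrainingCenterDatabase>
"""


def _to_float(text):
    """Convert element text to float, passing missing values through as None."""
    return float(text) if text is not None else None


def parse_trackpoints(tcx_text):
    """Return one dict per Trackpoint; missing fields become None."""
    root = ET.fromstring(tcx_text.strip())
    points = []
    for tp in root.iterfind('.//tcx:Trackpoint', NS):
        points.append({
            'time': tp.findtext('tcx:Time', namespaces=NS),
            'lat': _to_float(tp.findtext('tcx:Position/tcx:LatitudeDegrees', namespaces=NS)),
            'lon': _to_float(tp.findtext('tcx:Position/tcx:LongitudeDegrees', namespaces=NS)),
            'hr': _to_float(tp.findtext('tcx:HeartRateBpm/tcx:Value', namespaces=NS)),
        })
    return points


points = parse_trackpoints(SAMPLE_TCX)
print(points)
```

Treadmill runs would presumably have no `Position` entries at all, which is why the sketch returns `None` for a missing field instead of failing.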

Conclusion

From the visuals presented here, I think it is safe to say that I’m making progress. What’s not visible is that I’m no longer feeling the pain that I felt after running regularly in the summer. In many ways, this is the most important part for me. I was almost scared that I wouldn’t be able to run anymore because it was too damaging to my legs. The fact that I’m making tangible progress in my fitness and I’m not experiencing any pain indicates that I might be in the clear.

To make sure I’m continuing to improve, I will likely do another check-in at the end of this semester. At that point, I can gauge the effectiveness of my workout routine at school and determine the best way to incorporate workouts into my summer schedule.

I hope this post, along with a few others on my blog, shows the utility of using data to track your progress toward your goals. Having a concrete, measurable metric for progress is key to completing goals, and in this age of technology we have countless options for finding such a metric.

As always, corrections, suggestions, or comments are welcome!