Quick Exploration of Asana Data

If you have never heard of it, Asana is a pretty neat tool for keeping track of your personal tasks or managing the Agile workflow of a team. Here’s what their landing pages says:

Asana is the easiest way for teams to track their work—and get results.

Since I’ve been using it for about three years now, I would say that statement is pretty accurate. Recently, it dawned on me the amount of potential insights data from Asana could provide about my productivity. Sure enough, there is an “Export to CSV” but in the actions menu (click the down arrow next to the project name in the center of the screen) for every project!

Let’s see what lies in my (un)productivity data…

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly import tools
import cufflinks as cf
from wordcloud import WordCloud
import nltk
import re
import string

nltk.download('stopwords')
cachedStopWords = nltk.corpus.stopwords.words("english")
punct = set(string.punctuation)

init_notebook_mode(connected=True)
cf.set_config_file(world_readable=True, offline=True)

[nltk_data] Downloading package stopwords to /home/sean/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

For my own sanity, I split my tasks into two projects: School for school related tasks and, Tasks for everything else. My Personal Tasks is the default personal tasks project and is what I used before I created the Tasks project.

Let’s import the CSVs and concatenate them. I use the verify_integrity to make sure there aren’t any duplicates from shifting tasks between projects.

df1 = pd.read_csv('School.csv', parse_dates=[1, 2, 3, 7, 8])
df2 = pd.read_csv('Tasks.csv', parse_dates=[1, 2, 3, 7, 8])
df3 = pd.read_csv('My_Personal_Tasks.csv', parse_dates=[1, 2, 3, 7, 8])
frames = [df1, df2, df3]
df = pd.concat(frames, verify_integrity=True, ignore_index=True)
df.head()

	Task ID	Created At	Completed At	Last Modified	Name	Assignee	Assignee Email	Start Date	Due Date	Tags	Notes	Projects	Parent Task
0	148623786710031	2018-04-15	2018-04-15	2018-04-15	More debt	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN
1	148623786710030	2018-04-06	2018-04-07	2018-04-07	Send one line email to erik to add you to the ...	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN
2	610060357624798	2018-03-27	2018-03-27	2018-03-27	withdraw from study abroad	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN
3	588106896688257	2018-03-09	2018-03-26	2018-03-26	hold	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN
4	570162249229318	2018-02-22	2018-02-23	2018-02-23	find joydeep	NaN	NaN	NaT	2018-02-23	NaN	NaN	School	NaN

df['Projects'].value_counts(dropna=False)

School          766
Tasks           321
NaN             160
Hampstead WX     17
Name: Projects, dtype: int64

I found interesting that the My Personal Tasks pulled in some tasks from an archived project called Hampstead WX. That project was a site I created a while ago.

Here I plot a histogram of the day of the week the tasks were created on.

dayOfWeek={0:'Mon', 1:'Tue', 2:'Wed', 3:'Thu', 4:'Fri', 5:'Sat', 6:'Sun'}
df['Created At DOW'] = df['Created At'].dt.dayofweek.map(dayOfWeek)

x = []
y = []
for key, day in dayOfWeek.items():
    x.append(day)
    y.append(df['Created At DOW'].value_counts()[day])
bar_tracer = [go.Bar(x=x, y=y)]
iplot(bar_tracer)

I can’t say I didn’t expect these results. The bulk of my tasks are created at the start of the week and they slowly fade off until Saturday.

Next, let’s look at the duration it took for me to complete each task. Because I used the parse_dates parameter when importing the CSVs, using the minus operator will return timedelta objects. Since Asana only provided dates without time, tasks with a duration of 0 days are ones that were created and completed on the same day.

df['Duration'] = (df['Completed At'] - df['Created At'])

x = df['Duration'].value_counts().keys().days
y = list(df['Duration'].value_counts().values)
bar_tracer = [go.Bar(x=x, y=y)]
iplot(bar_tracer)

Wow! Seems like we have some outliers. Let’s see what some of them are.

df[df['Duration'].astype('timedelta64[D]') > 75]

	Task ID	Created At	Completed At	Last Modified	Name	Assignee	Assignee Email	Start Date	Due Date	Tags	Notes	Projects	Parent Task	Created At DOW	Duration
151	425397822367084	2017-09-08	2017-12-19	2017-12-19	Comp Sci 220	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Fri	102 days
166	425397822367086	2017-09-08	2017-12-21	2017-12-21	Comp Sci 240	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Fri	104 days
184	425397822367087	2017-09-08	2017-12-19	2017-12-19	Physics 151	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Fri	102 days
296	425397822367083	2017-09-08	2017-12-19	2017-12-19	Stats 515	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Fri	102 days
344	177397329054805	2016-09-06	2017-07-26	2017-07-26	Misc:	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Tue	323 days
362	148623786709961	2017-01-26	2017-07-24	2017-07-24	al dimeola - egyptian danza LISTEN AND LEARN	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Thu	179 days
419	200785803637408	2016-10-23	2017-07-26	2017-07-26	watch shit	NaN	NaN	NaT	NaT	NaN	howls moving castle\nnausicaa\nyeh jawaani hai...	School	NaN	Sun	276 days
512	177397329054770	2016-09-06	2017-05-18	2017-05-18	NaN	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Tue	254 days
540	177397329054778	2016-09-06	2017-05-18	2017-05-18	NaN	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Tue	254 days
638	177397329054780	2016-09-06	2017-05-18	2017-05-18	NaN	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Tue	254 days
668	179665031917793	2016-09-12	2017-05-18	2017-05-18	NaN	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Mon	248 days
692	177397329054776	2016-09-06	2017-05-18	2017-05-18	NaN	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Tue	254 days
724	207879281133328	2016-11-06	2017-02-15	2017-02-15	figure out piano chords from HATE verses	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Sun	101 days
733	201648184747252	2016-10-24	2017-01-23	2017-01-23	transcribe good voice recordings	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Mon	91 days
734	201648184747254	2016-10-24	2017-02-15	2017-02-15	transcribe dank part from Skin Deep	NaN	NaN	NaT	NaT	NaN	NaN	School	NaN	Mon	114 days
1058	432400137331715	2017-09-18	2017-12-07	2017-12-07	hit up Ben for the db seeds	NaN	NaN	NaT	NaT	NaN	NaN	Tasks	NaN	Mon	80 days
1238	95110234477969	2016-02-27	2016-07-05	2016-07-05	Switch to google charts	Sean Kelley	seangtkelley@gmail.com	NaT	NaT	NaN	NaN	Hampstead WX	NaN	Sat	129 days

From what I can tell, about half of these are not actually tasks but rather sections. Asana interestingly still allows you to mark a section as complete which is why my sections for classes like Comp Sci 220 and Stats 515 show up.

The others are a mixed bag. A lot are music projects I was working that obviously took a while. Another task was a global task I had with a running list of all the movies I need to watch.

Let’s go deeper into the histogram. Here, I split the bars by their respective project and only look at the tasks that took less than 30 days to complete.

trace1 = go.Bar(
    x=df[(df['Duration'].astype('timedelta64[D]') < 30) & (df['Projects'] == 'School')]['Duration'].value_counts().keys().days,
    y=df[(df['Duration'].astype('timedelta64[D]') < 30) & (df['Projects'] == 'School')]['Duration'].value_counts().values,
    name='School'
)
trace2 = go.Bar(
    x=df[(df['Duration'].astype('timedelta64[D]') < 30) & (df['Projects'] == 'Tasks')]['Duration'].value_counts().keys().days,
    y=df[(df['Duration'].astype('timedelta64[D]') < 30) & (df['Projects'] == 'Tasks')]['Duration'].value_counts().values,
    name='Tasks'
)
data = [trace1, trace2]
layout = go.Layout(
    barmode='group'
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')

It seems between both projects, the distribution of the tasks is very similar. Most tasks are completed within the two weeks after creation. This result definitely highlights the intended purpose of Asana (or at least how I use it). Personally, any deadline that is more than a month away I usually put in Google Calendar, rather than create a task in Asana.

Next, let’s see if we can figure out what type of tasks usually take longer to complete. I will once again use the fantastic word_cloud library by amueller.

# concatenate all name fields from tasks separated by duration of 3 days
less_text = ' '.join(list(df[df['Duration'].astype('timedelta64[D]') < 3]['Name'].dropna()))
grtr_text = ' '.join(list(df[df['Duration'].astype('timedelta64[D]') >= 3]['Name'].dropna()))

# remove any punctuation
less_text = ''.join(ch for ch in less_text if ch not in punct)
grtr_text = ''.join(ch for ch in grtr_text if ch not in punct)

# remove stopwords
less_text = ' '.join([word for word in less_text.split() if word not in cachedStopWords])
grtr_text = ' '.join([word for word in grtr_text.split() if word not in cachedStopWords])

# create wordclouds
less_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate(less_text)
grtr_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate(grtr_text)

# display wordclouds using matplotlib
f, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
f.set_size_inches(18, 5)
ax1.imshow(less_wordcloud, interpolation="bilinear")
ax1.set_title('<3 days', fontsize=36)
ax1.axis("off")
ax2.imshow(grtr_wordcloud, interpolation="bilinear")
ax2.set_title('>=3 days', fontsize=36)
ax2.axis("off")

(-0.5, 399.5, 199.5, -0.5)

png

This is by far my favorite visualization. Right away we can see the difference between long and short term tasks.

Some of the top words for the less than three days category are email, note, add, read, and print. Each of these tasks are relatively simple and usually require at most a few actions.

Some of the top words for the greater than or equal to three days category are project, quiz, note, create, make, and research. In contrast, each one of these tasks evidently require more effort and take a lot longer.

It’s interesting that note was a top word in both wordclouds. I can only assume that notes are not high priority for me so they are equally likely to be done immediately or postponed.

Next, let’s take a look at overdue tasks. I haven’t started putting due dates on tasks until recently so there isn’t as much good data but we’ll work with what we’ve got.

Like the Duration field, I again use the timedelta functionality for the datetime objects. In this case, positive deltas will represent tasks completed after the due date and negative deltas tasks completed before the due date.

df['Overdue'] = df['Completed At'] - df['Due Date']

trace1 = go.Bar(
    x=df[(df['Projects'] == 'School')]['Overdue'].value_counts().keys().days,
    y=df[(df['Projects'] == 'School')]['Overdue'].value_counts().values,
    name='School'
)
trace2 = go.Bar(
    x=df[(df['Projects'] == 'Tasks')]['Overdue'].value_counts().keys().days,
    y=df[(df['Projects'] == 'Tasks')]['Overdue'].value_counts().values,
    name='Tasks'
)
data = [trace1, trace2]
layout = go.Layout(
    barmode='group'
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')

I was happy to see that for the most part, I complete tasks before their set due dates.

Let’s do a similar wordcloud analysis of the overdue tasks to see what might be causing the delay in completion. Here, I’m going to split the tasks into three sections: completed more than one day before due date, completed on same day as due date, and completed after due date.

# concatenate all name fields from overdue tasks
before_text = ' '.join(list(df[df['Overdue'].astype('timedelta64[D]') < 0]['Name'].dropna()))
sameday_text = ' '.join(list(df[df['Overdue'].astype('timedelta64[D]') == 0]['Name'].dropna()))
overdue_text = ' '.join(list(df[df['Overdue'].astype('timedelta64[D]') > 0]['Name'].dropna()))

# remove any punctuation
before_text = ''.join(ch for ch in before_text if ch not in punct)
sameday_text = ''.join(ch for ch in sameday_text if ch not in punct)
overdue_text = ''.join(ch for ch in overdue_text if ch not in punct)

# remove stopwords
before_text = ' '.join([word for word in before_text.split() if word not in cachedStopWords])
sameday_text = ' '.join([word for word in sameday_text.split() if word not in cachedStopWords])
overdue_text = ' '.join([word for word in overdue_text.split() if word not in cachedStopWords])

# create wordclouds
before_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate(before_text)
sameday_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate(sameday_text)
overdue_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate(overdue_text)

# display wordclouds using matplotlib
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex=True)
f.set_size_inches(18, 10)
ax1.imshow(before_wordcloud, interpolation="bilinear")
ax1.set_title('Completed Before', fontsize=36)
ax1.axis("off")
ax2.imshow(sameday_wordcloud, interpolation="bilinear")
ax2.set_title('Completed Same Day', fontsize=36)
ax2.axis("off")
ax3.imshow(overdue_wordcloud, interpolation="bilinear")
ax3.set_title('Overdue', fontsize=36)
ax3.axis("off")
ax4.axis("off")

(-0.5, 399.5, 0.0, 1.0)

png

As expected, we some similar words to those in the Duration analysis but there are also some new ones.

In the completed before wordcloud, the words tarea and Voces come up which are actually Spanish for homework and Voices respectively. I study Spanish at University and for some reason, I always write tarea instead of homework just for homework in my Spanish classes. And Voces was the name of a latin american poetry textbook we had one semester in which we often had reading assignments.

Probably the most revealing worldcloud is the overdue one. It’s clear that I have a problem with starting tasks.

Conclusion

Hopefully this post shows just a glimpse of the insights you can gleen from diving into your own metadata. Many productivity applications have options for downloading your data in bulk and I would implore you to give it a shot. Perhaps you can use your discoveries to improve your work habits to become or lifestyle!

By Sean Kelley
On June 05, 2018