Asana Data Revisited: Fall 2018 Semester In Review
I previously did a quick analysis of the data from Asana, the website I use to keep track of all my tasks.
Since it’s been about 6 months, I thought it would be interesting to take another look to see which trends remained and which changed.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly import tools
import cufflinks as cf
from wordcloud import WordCloud
import datetime
import collections
from lib import custom_utils
init_notebook_mode(connected=True)
cf.set_config_file(world_readable=True, offline=True)
[nltk_data] Downloading package stopwords to /home/sean/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
Notice I import a module called `custom_utils`. I recently started a project to keep track of my personal health and development, and I found myself reusing the same functions often, so I collected them in one module. The file can be found here.
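Since `custom_utils` isn't shown in this post, here's a minimal sketch of what `prep_text_for_wordcloud` might look like, assuming (based on the nltk stopwords download logged above) that it lowercases the text and strips punctuation and stopwords:
import string
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')  # the source of the [nltk_data] log lines above

def prep_text_for_wordcloud(text):
    # hypothetical reimplementation: lowercase, drop punctuation, remove stopwords
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    stops = set(stopwords.words('english'))
    return ' '.join(word for word in text.split() if word not in stops)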
Data
Unlike last time, I'll only be pulling the tasks related to school. Just before the start of last semester (Fall 2018), I created a new project using the Boards layout, which lets me organize my tasks by class.
First, let’s load that data.
f18_df = pd.read_csv('asana-umass-f18.csv', parse_dates=[1, 2, 3, 8, 9])
f18_df.head()
|   | Task ID | Created At | Completed At | Last Modified | Name | Column | Assignee | Assignee Email | Start Date | Due Date | Tags | Notes | Projects | Parent Task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 949672124046393 | 2018-12-17 | 2018-12-19 | 2018-12-19 | training script to take only outputs for annot... | Research | NaN | NaN | NaT | NaT | NaN | NaN | UMass | NaN |
| 1 | 949607828976735 | 2018-12-17 | 2018-12-18 | 2018-12-18 | make generate preds output .npy file with pred... | Research | NaN | NaN | NaT | NaT | NaN | NaN | UMass | NaN |
| 2 | 949607828976733 | 2018-12-17 | 2018-12-18 | 2018-12-18 | option for generate preds script for not rotat... | Research | NaN | NaN | NaT | NaT | NaN | NaN | UMass | NaN |
| 3 | 949607828976731 | 2018-12-17 | 2018-12-18 | 2018-12-18 | make generate preds scripts output rboxes inst... | Research | NaN | NaN | NaT | NaT | NaN | NaN | UMass | NaN |
| 4 | 949607828976727 | 2018-12-17 | 2018-12-18 | 2018-12-18 | nms | Research | NaN | NaN | NaT | NaT | NaN | NaN | UMass | NaN |
Next, we can load the previous data to get some sweet comparison visualizations.
old_df = pd.read_csv('School.csv', parse_dates=[1, 2, 3, 7, 8])
old_df.tail()
|   | Task ID | Created At | Completed At | Last Modified | Name | Assignee | Assignee Email | Start Date | Due Date | Tags | Notes | Projects | Parent Task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 804 | 351764138393835 | 2017-05-29 | 2017-05-30 | 2017-05-30 | brand new congress questions | NaN | NaN | NaT | NaT | NaN | NaN | School | NaN |
| 805 | 351764138393836 | 2017-05-29 | 2017-05-29 | 2017-05-29 | sign up for learnupon brand new congress | NaN | NaN | NaT | NaT | NaN | NaN | School | NaN |
| 806 | 351764138393838 | 2017-05-29 | 2017-06-28 | 2017-06-28 | create base for Hodler app | NaN | NaN | NaT | NaT | NaN | NaN | School | NaN |
| 807 | 351779687262554 | 2017-05-29 | 2017-06-28 | 2017-06-28 | battery life on lappy ubuntu | NaN | NaN | NaT | NaT | NaN | NaN | School | NaN |
| 808 | 356635261007682 | 2017-06-05 | 2017-06-28 | 2017-06-28 | bnc volunteer training | NaN | NaN | NaT | NaT | NaN | NaN | School | NaN |
all_df = pd.concat([old_df, f18_df], verify_integrity=True, ignore_index=True, sort=True)
all_df.head()
|   | Assignee | Assignee Email | Column | Completed At | Created At | Due Date | Last Modified | Name | Notes | Parent Task | Projects | Start Date | Tags | Task ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | 2018-04-15 | 2018-04-15 | NaT | 2018-04-15 | More debt | NaN | NaN | School | NaT | NaN | 148623786710031 |
| 1 | NaN | NaN | NaN | 2018-04-07 | 2018-04-06 | NaT | 2018-04-07 | Send one line email to erik to add you to the ... | NaN | NaN | School | NaT | NaN | 148623786710030 |
| 2 | NaN | NaN | NaN | 2018-03-27 | 2018-03-27 | NaT | 2018-03-27 | withdraw from study abroad | NaN | NaN | School | NaT | NaN | 610060357624798 |
| 3 | NaN | NaN | NaN | 2018-03-26 | 2018-03-09 | NaT | 2018-03-26 | hold | NaN | NaN | School | NaT | NaN | 588106896688257 |
| 4 | NaN | NaN | NaN | 2018-02-23 | 2018-02-22 | 2018-02-23 | 2018-02-23 | find joydeep | NaN | NaN | School | NaT | NaN | 570162249229318 |
To help retain our sanity, let’s define colors for each semester.
all_color = 'rgba(219, 64, 82, 0.7)'
old_color = 'rgba(63, 81, 191, 1.0)'
f18_color = 'rgba(33, 150, 255, 1.0)'
Task Creation Day of Week Comparison
Let’s see if tasks were still created with the same daily frequencies. Since there are many more tasks in the old data, we can normalize the value counts for a fair comparison. For the sake of keeping the code clean and the x-axis in order, I decided to keep the days of the week as numbers. For reference, 0 is Monday.
old_df['Created At DOW'] = old_df['Created At'].dt.dayofweek
f18_df['Created At DOW'] = f18_df['Created At'].dt.dayofweek
trace1 = go.Bar(
x=old_df['Created At DOW'].value_counts(normalize=True).keys(),
y=old_df['Created At DOW'].value_counts(normalize=True).values,
name='Old Data',
marker={
'color': old_color
}
)
trace2 = go.Bar(
x=f18_df['Created At DOW'].value_counts(normalize=True).keys(),
y=f18_df['Created At DOW'].value_counts(normalize=True).values,
name='Fall 18',
marker={
'color': f18_color
}
)
data = [trace1, trace2]
layout = go.Layout(
barmode='group'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='DOW Comparison')
This was quite a surprise. The days on which I created tasks seem to have changed somewhat this semester.
Let’s check if the overall trend has remained the same.
all_df['Created At DOW'] = all_df['Created At'].dt.dayofweek
trace1 = go.Bar(
x=all_df['Created At DOW'].value_counts(normalize=True).keys(),
y=all_df['Created At DOW'].value_counts(normalize=True).values,
name='All Data',
marker={
'color': all_color
}
)
data = [trace1]
layout = go.Layout(
barmode='group'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='DOW Comparison')
There are definitely some small changes. Thursday caught up to Wednesday, and Monday is catching up to Tuesday. However, the overall trend remains strong: I create the majority of my tasks at the beginning of the week.
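As an aside, if you'd rather see weekday names than numbers on the x-axis, one option (not used here) is Python's built-in calendar module:
import calendar

# map 0-6 to Monday-Sunday before counting
dow_names = all_df['Created At DOW'].map(lambda d: calendar.day_name[d])
print(dow_names.value_counts(normalize=True))
The trade-off, as mentioned earlier, is that value_counts then orders the result by frequency instead of by day, so you'd have to reorder the axis yourself.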
Completion Time
Next, let’s look at how long it took me to complete each task. Because I used the `parse_dates` parameter when importing the CSVs, subtracting one date column from another returns timedelta objects. Since Asana only provides dates without times, tasks with a duration of 0 days are ones that were created and completed on the same day.
Having already found outliers in the last analysis, let’s consider only tasks that took less than 30 days to complete. Again, we normalize for a fair comparison.
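Before filtering, it's worth eyeballing what that cutoff throws away. A quick sketch, where `.dt.days` is equivalent to the `.astype('timedelta64[D]')` casts used below:
# sanity check: which old tasks sat around for 30+ days?
durations = old_df['Completed At'] - old_df['Created At']
print(old_df.loc[durations.dt.days >= 30, ['Name', 'Created At', 'Completed At']])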
old_df['Duration'] = (old_df['Completed At'] - old_df['Created At'])
f18_df['Duration'] = (f18_df['Completed At'] - f18_df['Created At'])
trace1 = go.Bar(
x=old_df[(old_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).keys().days,
y=old_df[(old_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).values,
name='Old Data',
marker={
'color': old_color
}
)
trace2 = go.Bar(
x=f18_df[(f18_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).keys().days,
y=f18_df[(f18_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).values,
name='Fall 18',
marker={
'color': f18_color
}
)
data = [trace1, trace2]
layout = go.Layout(
barmode='group'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')
Now this is interesting! For the most part, the time it takes me to complete tasks has remained about the same. However, this semester I created more tasks that took around a week to complete, and fewer tasks that were completed on the same day.
Next, like last time, let’s see if we can figure out what types of tasks usually take longer to complete. I will once again use the fantastic word_cloud library by amueller.
# concatenate all name fields from tasks separated by duration of 3 days
old_less_text = ' '.join(list(old_df[old_df['Duration'].astype('timedelta64[D]') < 3]['Name'].dropna()))
old_grtr_text = ' '.join(list(old_df[old_df['Duration'].astype('timedelta64[D]') >= 3]['Name'].dropna()))
f18_less_text = ' '.join(list(f18_df[f18_df['Duration'].astype('timedelta64[D]') < 3]['Name'].dropna()))
f18_grtr_text = ' '.join(list(f18_df[f18_df['Duration'].astype('timedelta64[D]') >= 3]['Name'].dropna()))
# prep text
old_less_text = custom_utils.prep_text_for_wordcloud(old_less_text)
old_grtr_text = custom_utils.prep_text_for_wordcloud(old_grtr_text)
f18_less_text = custom_utils.prep_text_for_wordcloud(f18_less_text)
f18_grtr_text = custom_utils.prep_text_for_wordcloud(f18_grtr_text)
# get word frequencies
old_less_counts = dict(collections.Counter(old_less_text.split()))
old_grtr_counts = dict(collections.Counter(old_grtr_text.split()))
f18_less_counts = dict(collections.Counter(f18_less_text.split()))
f18_grtr_counts = dict(collections.Counter(f18_grtr_text.split()))
# create wordclouds
old_less_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_less_counts)
old_grtr_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_grtr_counts)
f18_less_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_less_counts)
f18_grtr_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_grtr_counts)
# display wordclouds using matplotlib
f, axes = plt.subplots(2, 2, sharex=True)
f.set_size_inches(18, 10)
axes[0, 0].imshow(old_less_wordcloud, interpolation="bilinear")
axes[0, 0].set_title('Old <3 days', fontsize=36)
axes[0, 0].axis("off")
axes[0, 1].imshow(old_grtr_wordcloud, interpolation="bilinear")
axes[0, 1].set_title('Old >=3 days', fontsize=36)
axes[0, 1].axis("off")
axes[1, 0].imshow(f18_less_wordcloud, interpolation="bilinear")
axes[1, 0].set_title('F18 <3 days', fontsize=36)
axes[1, 0].axis("off")
axes[1, 1].imshow(f18_grtr_wordcloud, interpolation="bilinear")
axes[1, 1].set_title('F18 >=3 days', fontsize=36)
axes[1, 1].axis("off")
A few things changed this semester. The research project I was on was pretty demanding, so a lot of its vocabulary shows up, like `image`, `preds`, or `synthtext`. Also, since none of my classes had projects this semester, the word "project" doesn't appear.
However, some things remained the same. Homework usually takes more than 3 days and lecture notes are either done quickly or they get put off because other tasks are more important.
Overdue Tasks
Next, let’s take a look at overdue tasks. Subtracting the due date from the completion date gives a negative number of days for tasks finished early, zero for tasks finished on their due date, and a positive number for overdue tasks.
old_df['Overdue'] = old_df['Completed At'] - old_df['Due Date']
f18_df['Overdue'] = f18_df['Completed At'] - f18_df['Due Date']
trace1 = go.Bar(
x=old_df['Overdue'].value_counts(normalize=True).keys().days,
y=old_df['Overdue'].value_counts(normalize=True).values,
name='Old Data',
marker={
'color': old_color
}
)
trace2 = go.Bar(
x=f18_df['Overdue'].value_counts(normalize=True).keys().days,
y=f18_df['Overdue'].value_counts(normalize=True).values,
name='Fall 18',
marker={
'color': f18_color
}
)
data = [trace1, trace2]
layout = go.Layout(
barmode='group'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')
Seems like I did alright staying on top of things this semester.
Again, let’s use wordclouds to check out what might be causing me to miss due dates.
# concatenate all name fields from overdue tasks
old_before_text = ' '.join(list(old_df[old_df['Overdue'].astype('timedelta64[D]') < 0]['Name'].dropna()))
old_sameday_text = ' '.join(list(old_df[old_df['Overdue'].astype('timedelta64[D]') == 0]['Name'].dropna()))
old_overdue_text = ' '.join(list(old_df[old_df['Overdue'].astype('timedelta64[D]') > 0]['Name'].dropna()))
f18_before_text = ' '.join(list(f18_df[f18_df['Overdue'].astype('timedelta64[D]') < 0]['Name'].dropna()))
f18_sameday_text = ' '.join(list(f18_df[f18_df['Overdue'].astype('timedelta64[D]') == 0]['Name'].dropna()))
f18_overdue_text = ' '.join(list(f18_df[f18_df['Overdue'].astype('timedelta64[D]') > 0]['Name'].dropna()))
# prep text
old_before_text = custom_utils.prep_text_for_wordcloud(old_before_text)
old_sameday_text = custom_utils.prep_text_for_wordcloud(old_sameday_text)
old_overdue_text = custom_utils.prep_text_for_wordcloud(old_overdue_text)
f18_before_text = custom_utils.prep_text_for_wordcloud(f18_before_text)
f18_sameday_text = custom_utils.prep_text_for_wordcloud(f18_sameday_text)
f18_overdue_text = custom_utils.prep_text_for_wordcloud(f18_overdue_text)
# get word frequencies
old_before_counts = dict(collections.Counter(old_before_text.split()))
old_sameday_counts = dict(collections.Counter(old_sameday_text.split()))
old_overdue_counts = dict(collections.Counter(old_overdue_text.split()))
f18_before_counts = dict(collections.Counter(f18_before_text.split()))
f18_sameday_counts = dict(collections.Counter(f18_sameday_text.split()))
f18_overdue_counts = dict(collections.Counter(f18_overdue_text.split()))
# create wordclouds
old_before_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_before_counts)
old_sameday_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_sameday_counts)
old_overdue_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_overdue_counts)
f18_before_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_before_counts)
f18_sameday_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_sameday_counts)
f18_overdue_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_overdue_counts)
# display wordclouds using matplotlib
f, axes = plt.subplots(4, 2, sharex=True)
f.set_size_inches(18, 20)
axes[0, 0].imshow(old_before_wordcloud, interpolation="bilinear")
axes[0, 0].set_title('Old Completed Before', fontsize=36)
axes[0, 0].axis("off")
axes[0, 1].imshow(old_sameday_wordcloud, interpolation="bilinear")
axes[0, 1].set_title('Old Completed Same Day', fontsize=36)
axes[0, 1].axis("off")
axes[1, 0].imshow(old_overdue_wordcloud, interpolation="bilinear")
axes[1, 0].set_title('Old Overdue', fontsize=36)
axes[1, 0].axis("off")
axes[1, 1].axis("off")
axes[2, 0].imshow(f18_before_wordcloud, interpolation="bilinear")
axes[2, 0].set_title('F18 Completed Before', fontsize=36)
axes[2, 0].axis("off")
axes[2, 1].imshow(f18_sameday_wordcloud, interpolation="bilinear")
axes[2, 1].set_title('F18 Completed Same Day', fontsize=36)
axes[2, 1].axis("off")
axes[3, 0].imshow(f18_overdue_wordcloud, interpolation="bilinear")
axes[3, 0].set_title('F18 Overdue', fontsize=36)
axes[3, 0].axis("off")
axes[3, 1].axis("off")
Busiest Class this Semester
Since the Boards layout tracks the class for each task in the Column field, we can check out which class was my busiest this semester.
# https://community.plot.ly/t/setting-up-pie-charts-subplots-with-an-appropriate-size-and-spacing/5066
domain1={'x': [0, 1], 'y': [0, 1]}#cell (1,1)
fig = {
"data": [
{
"values": f18_df['Column'].value_counts().values,
"labels": f18_df['Column'].value_counts().keys(),
'domain': domain1,
"name": "Fall 18",
"hoverinfo":"label+percent+name",
"hole": .4,
"type": "pie"
}],
"layout": {
"annotations": [
{
"font": {
"size": 15
},
"showarrow": False,
"text": "Fall 2018",
"x": 0.5,
"y": 0.5
}
]
}
}
iplot(fig, filename='donut')
Due Date Frequency this Semester
We can also see when my tasks were due.
trace1 = go.Bar(
x=f18_df['Due Date'].dropna().value_counts().keys(),
y=f18_df['Due Date'].dropna().value_counts().values,
name='Fall 18',
marker={
'color': f18_color
}
)
data = [trace1]
iplot(data, filename='due date freq')
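Note that value_counts orders the dates by frequency rather than chronologically; plotly generally still places the bars correctly on a date axis, but a variant (not used for the chart above) that sorts the counts by date first looks like this:
due_counts = f18_df['Due Date'].dropna().value_counts().sort_index()
trace = go.Bar(
x=due_counts.index,
y=due_counts.values,
name='Fall 18',
marker={
'color': f18_color
}
)
iplot([trace], filename='due-date-freq-sorted')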
Conclusion
Hopefully this post shows the motivation for, and potential benefit of, revisiting a previous analysis to look for significant changes. In the context of personal development, doing so can help you track your progress and achieve your goals. I still definitely need to put more effort into taking notes on time!