Data Analysis

Asana Data Revisited: Fall 2018 Semester In Review

I previously did a quick analysis of the data from Asana, the website I use to keep track of all my tasks.

Since it’s been about 6 months, I thought it would be interesting to take another look to see which trends remained and which changed.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly import tools
import cufflinks as cf
from wordcloud import WordCloud
import datetime
import collections
from lib import custom_utils

init_notebook_mode(connected=True)
cf.set_config_file(world_readable=True, offline=True)
[nltk_data] Downloading package stopwords to /home/sean/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

Notice that I import a module called custom_utils. I recently started a project to keep track of my personal health and development, and I found myself reusing the same functions often, so I collected them into a shared module. The file can be found here.
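The only helper used in this post is prep_text_for_wordcloud. I won't reproduce the real file here, but judging by the stopwords download above, it leans on NLTK. A minimal sketch of what such a helper might look like (the actual implementation may differ):

import re
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # no-op if the corpus is already present
STOPWORDS = set(stopwords.words('english'))

def prep_text_for_wordcloud(text):
    # lowercase, replace punctuation with spaces, then drop stopwords
    words = re.sub(r'[^a-z0-9\s]', ' ', text.lower()).split()
    return ' '.join(w for w in words if w not in STOPWORDS)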

Data

Unlike last time, I’ll only be pulling the tasks related to school. Just before the start of last semester (Fall 2018), I created a new project using the Boards layout, which lets me better organize my tasks by class.

First, let’s load that data.

f18_df = pd.read_csv('asana-umass-f18.csv', parse_dates=[1, 2, 3, 8, 9])
f18_df.head()
Task ID Created At Completed At Last Modified Name Column Assignee Assignee Email Start Date Due Date Tags Notes Projects Parent Task
0 949672124046393 2018-12-17 2018-12-19 2018-12-19 training script to take only outputs for annot... Research NaN NaN NaT NaT NaN NaN UMass NaN
1 949607828976735 2018-12-17 2018-12-18 2018-12-18 make generate preds output .npy file with pred... Research NaN NaN NaT NaT NaN NaN UMass NaN
2 949607828976733 2018-12-17 2018-12-18 2018-12-18 option for generate preds script for not rotat... Research NaN NaN NaT NaT NaN NaN UMass NaN
3 949607828976731 2018-12-17 2018-12-18 2018-12-18 make generate preds scripts output rboxes inst... Research NaN NaN NaT NaT NaN NaN UMass NaN
4 949607828976727 2018-12-17 2018-12-18 2018-12-18 nms Research NaN NaN NaT NaT NaN NaN UMass NaN

Next, we can load the previous data to get some sweet comparison visualizations.

old_df = pd.read_csv('School.csv', parse_dates=[1, 2, 3, 7, 8])
old_df.tail()
Task ID Created At Completed At Last Modified Name Assignee Assignee Email Start Date Due Date Tags Notes Projects Parent Task
804 351764138393835 2017-05-29 2017-05-30 2017-05-30 brand new congress questions NaN NaN NaT NaT NaN NaN School NaN
805 351764138393836 2017-05-29 2017-05-29 2017-05-29 sign up for learnupon brand new congress NaN NaN NaT NaT NaN NaN School NaN
806 351764138393838 2017-05-29 2017-06-28 2017-06-28 create base for Hodler app NaN NaN NaT NaT NaN NaN School NaN
807 351779687262554 2017-05-29 2017-06-28 2017-06-28 battery life on lappy ubuntu NaN NaN NaT NaT NaN NaN School NaN
808 356635261007682 2017-06-05 2017-06-28 2017-06-28 bnc volunteer training NaN NaN NaT NaT NaN NaN School NaN
# stack the two exports into one frame; sort=True aligns their differing column orders
all_df = pd.concat([old_df, f18_df], verify_integrity=True, ignore_index=True, sort=True)
all_df.head()
Assignee Assignee Email Column Completed At Created At Due Date Last Modified Name Notes Parent Task Projects Start Date Tags Task ID
0 NaN NaN NaN 2018-04-15 2018-04-15 NaT 2018-04-15 More debt NaN NaN School NaT NaN 148623786710031
1 NaN NaN NaN 2018-04-07 2018-04-06 NaT 2018-04-07 Send one line email to erik to add you to the ... NaN NaN School NaT NaN 148623786710030
2 NaN NaN NaN 2018-03-27 2018-03-27 NaT 2018-03-27 withdraw from study abroad NaN NaN School NaT NaN 610060357624798
3 NaN NaN NaN 2018-03-26 2018-03-09 NaT 2018-03-26 hold NaN NaN School NaT NaN 588106896688257
4 NaN NaN NaN 2018-02-23 2018-02-22 2018-02-23 2018-02-23 find joydeep NaN NaN School NaT NaN 570162249229318

To help retain our sanity, let’s define colors for each semester.

all_color = 'rgba(219, 64, 82, 0.7)'
old_color = 'rgba(63, 81, 191, 1.0)'
f18_color = 'rgba(33, 150, 255, 1.0)'

Task Creation Day of Week Comparison

Let’s see if tasks were still created with the same daily frequencies. Since there are many more tasks in the old data, we can normalize the value counts for a fair comparison. For the sake of keeping the code clean and the x-axis in order, I decided to keep the days of the week as numbers. For reference, 0 is Monday.

old_df['Created At DOW'] = old_df['Created At'].dt.dayofweek
f18_df['Created At DOW'] = f18_df['Created At'].dt.dayofweek
trace1 = go.Bar(
    x=old_df['Created At DOW'].value_counts(normalize=True).keys(),
    y=old_df['Created At DOW'].value_counts(normalize=True).values,
    name='Old Data',
    marker={
        'color': old_color
    }
)
trace2 = go.Bar(
    x=f18_df['Created At DOW'].value_counts(normalize=True).keys(),
    y=f18_df['Created At DOW'].value_counts(normalize=True).values,
    name='Fall 18',
    marker={
        'color': f18_color
    }
)

data = [trace1, trace2]
layout = go.Layout(
    barmode='group'
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='DOW Comparison')

This was quite a surprise. The days on which I created tasks seem to have shifted somewhat this semester.
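As an aside, if you would rather see day names than numbers on the x-axis, Plotly can relabel the ticks without touching the underlying data. A small sketch (not something I ran in the original notebook):

layout = go.Layout(
    barmode='group',
    xaxis={
        'tickmode': 'array',
        'tickvals': list(range(7)),
        'ticktext': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    }
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='DOW Comparison Named')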

Let’s check if the overall trend has remained the same.

all_df['Created At DOW'] = all_df['Created At'].dt.dayofweek

trace1 = go.Bar(
    x=all_df['Created At DOW'].value_counts(normalize=True).keys(),
    y=all_df['Created At DOW'].value_counts(normalize=True).values,
    name='All Data',
    marker={
        'color': all_color
    }
)

data = [trace1]
layout = go.Layout(
    barmode='group'
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='DOW Comparison')

There are definitely some small changes. Thursday caught up to Wednesday and Monday is catching up to Tuesday. However, the overall trend that I create the majority of my tasks at the beginning of the week remains strong.

Completion Time

Next, let’s look at how long it took me to complete each task. Because I used the parse_dates parameter when importing the CSVs, subtracting one date column from another returns Timedelta objects. Since Asana only provides dates without times, tasks with a duration of 0 days are ones that were created and completed on the same day.
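As a quick illustration of the mechanics:

# subtracting two parsed dates yields a Timedelta
pd.Timestamp('2018-12-19') - pd.Timestamp('2018-12-17')  # Timedelta('2 days 00:00:00')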

Having already found outliers in the last analysis, let’s consider only tasks that took less than 30 days to complete. Again, we normalize for a fair comparison.

old_df['Duration'] = (old_df['Completed At'] - old_df['Created At'])
f18_df['Duration'] = (f18_df['Completed At'] - f18_df['Created At'])
trace1 = go.Bar(
    x=old_df[(old_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).keys().days,
    y=old_df[(old_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).values,
    name='Old Data',
    marker={
        'color': old_color
    }
)
trace2 = go.Bar(
    x=f18_df[(f18_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).keys().days,
    y=f18_df[(f18_df['Duration'].astype('timedelta64[D]') < 30)]['Duration'].value_counts(normalize=True).values,
    name='Fall 18',
    marker={
        'color': f18_color
    }
)
data = [trace1, trace2]
layout = go.Layout(
    barmode='group'
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')

Now this is interesting! For the most part, the time it takes me to complete tasks has remained about the same. However, this semester I created more tasks that took around a week to complete, and fewer tasks that were completed on the same day.

Next, like last time, let’s see if we can figure out what types of tasks usually take longer to complete. I will once again use the fantastic word_cloud library by amueller.

# concatenate all task names, split into two groups at a 3-day duration threshold
old_less_text = ' '.join(list(old_df[old_df['Duration'].astype('timedelta64[D]') < 3]['Name'].dropna()))
old_grtr_text = ' '.join(list(old_df[old_df['Duration'].astype('timedelta64[D]') >= 3]['Name'].dropna()))

f18_less_text = ' '.join(list(f18_df[f18_df['Duration'].astype('timedelta64[D]') < 3]['Name'].dropna()))
f18_grtr_text = ' '.join(list(f18_df[f18_df['Duration'].astype('timedelta64[D]') >= 3]['Name'].dropna()))

# prep text
old_less_text = custom_utils.prep_text_for_wordcloud(old_less_text)
old_grtr_text = custom_utils.prep_text_for_wordcloud(old_grtr_text)

f18_less_text = custom_utils.prep_text_for_wordcloud(f18_less_text)
f18_grtr_text = custom_utils.prep_text_for_wordcloud(f18_grtr_text)

# get word frequencies
old_less_counts = dict(collections.Counter(old_less_text.split()))
old_grtr_counts = dict(collections.Counter(old_grtr_text.split()))

f18_less_counts = dict(collections.Counter(f18_less_text.split()))
f18_grtr_counts = dict(collections.Counter(f18_grtr_text.split()))

# create wordclouds
old_less_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_less_counts)
old_grtr_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_grtr_counts)

f18_less_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_less_counts)
f18_grtr_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_grtr_counts)

# display wordclouds using matplotlib
f, axes = plt.subplots(2, 2, sharex=True)
f.set_size_inches(18, 10)
axes[0, 0].imshow(old_less_wordcloud, interpolation="bilinear")
axes[0, 0].set_title('Old <3 days', fontsize=36)
axes[0, 0].axis("off")
axes[0, 1].imshow(old_grtr_wordcloud, interpolation="bilinear")
axes[0, 1].set_title('Old >=3 days', fontsize=36)
axes[0, 1].axis("off")

axes[1, 0].imshow(f18_less_wordcloud, interpolation="bilinear")
axes[1, 0].set_title('F18 <3 days', fontsize=36)
axes[1, 0].axis("off")
axes[1, 1].imshow(f18_grtr_wordcloud, interpolation="bilinear")
axes[1, 1].set_title('F18 >=3 days', fontsize=36)
axes[1, 1].axis("off")
plt.show()  # prevents the last call's axis-limits tuple from echoing as cell output
[Figure: word clouds for Old <3 days, Old >=3 days, F18 <3 days, and F18 >=3 days]

A few things changed this semester. The research project I worked on was pretty demanding, so many of its tasks show up, with words like image, preds, or synthtext. Also, since none of my classes had a project this semester, the project-related terms from last time no longer appear.

However, some things remained the same. Homework usually takes more than 3 days and lecture notes are either done quickly or they get put off because other tasks are more important.

Overdue Tasks

Next, let’s take a look at overdue tasks. Subtracting the due date from the completion date gives a negative value for tasks finished early, zero for tasks finished on the day they were due, and a positive value for overdue tasks.

old_df['Overdue'] = old_df['Completed At'] - old_df['Due Date']
f18_df['Overdue'] = f18_df['Completed At'] - f18_df['Due Date']
trace1 = go.Bar(
    x=old_df['Overdue'].value_counts(normalize=True).keys().days,
    y=old_df['Overdue'].value_counts(normalize=True).values,
    name='Old Data',
    marker={
        'color': old_color
    }
)
trace2 = go.Bar(
    x=f18_df['Overdue'].value_counts(normalize=True).keys().days,
    y=f18_df['Overdue'].value_counts(normalize=True).values,
    name='Fall 18',
    marker={
        'color': f18_color
    }
)
data = [trace1, trace2]
layout = go.Layout(
    barmode='group'
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')

Seems like I did alright staying on top of things this semester.


Again, let’s use wordclouds to check out what might be causing me to miss due dates.

# concatenate task names, split by whether they were completed early, on time, or late
old_before_text = ' '.join(list(old_df[old_df['Overdue'].astype('timedelta64[D]') < 0]['Name'].dropna()))
old_sameday_text = ' '.join(list(old_df[old_df['Overdue'].astype('timedelta64[D]') == 0]['Name'].dropna()))
old_overdue_text = ' '.join(list(old_df[old_df['Overdue'].astype('timedelta64[D]') > 0]['Name'].dropna()))

f18_before_text = ' '.join(list(f18_df[f18_df['Overdue'].astype('timedelta64[D]') < 0]['Name'].dropna()))
f18_sameday_text = ' '.join(list(f18_df[f18_df['Overdue'].astype('timedelta64[D]') == 0]['Name'].dropna()))
f18_overdue_text = ' '.join(list(f18_df[f18_df['Overdue'].astype('timedelta64[D]') > 0]['Name'].dropna()))

# prep text
old_before_text = custom_utils.prep_text_for_wordcloud(old_before_text)
old_sameday_text = custom_utils.prep_text_for_wordcloud(old_sameday_text)
old_overdue_text = custom_utils.prep_text_for_wordcloud(old_overdue_text)

f18_before_text = custom_utils.prep_text_for_wordcloud(f18_before_text)
f18_sameday_text = custom_utils.prep_text_for_wordcloud(f18_sameday_text)
f18_overdue_text = custom_utils.prep_text_for_wordcloud(f18_overdue_text)

# get word frequencies
old_before_counts = dict(collections.Counter(old_before_text.split()))
old_sameday_counts = dict(collections.Counter(old_sameday_text.split()))
old_overdue_counts = dict(collections.Counter(old_overdue_text.split()))

f18_before_counts = dict(collections.Counter(f18_before_text.split()))
f18_sameday_counts = dict(collections.Counter(f18_sameday_text.split()))
f18_overdue_counts = dict(collections.Counter(f18_overdue_text.split()))

# create wordclouds
old_before_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_before_counts)
old_sameday_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_sameday_counts)
old_overdue_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(old_overdue_counts)

f18_before_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_before_counts)
f18_sameday_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_sameday_counts)
f18_overdue_wordcloud = WordCloud(background_color="white", max_words=1000, margin=10,random_state=1).generate_from_frequencies(f18_overdue_counts)

# display wordclouds using matplotlib
f, axes = plt.subplots(4, 2, sharex=True)
f.set_size_inches(18, 20)
axes[0, 0].imshow(old_before_wordcloud, interpolation="bilinear")
axes[0, 0].set_title('Old Completed Before', fontsize=36)
axes[0, 0].axis("off")
axes[0, 1].imshow(old_sameday_wordcloud, interpolation="bilinear")
axes[0, 1].set_title('Old Completed Same Day', fontsize=36)
axes[0, 1].axis("off")
axes[1, 0].imshow(old_overdue_wordcloud, interpolation="bilinear")
axes[1, 0].set_title('Old Overdue', fontsize=36)
axes[1, 0].axis("off")
axes[1, 1].axis("off")

axes[2, 0].imshow(f18_before_wordcloud, interpolation="bilinear")
axes[2, 0].set_title('F18 Completed Before', fontsize=36)
axes[2, 0].axis("off")
axes[2, 1].imshow(f18_sameday_wordcloud, interpolation="bilinear")
axes[2, 1].set_title('F18 Completed Same Day', fontsize=36)
axes[2, 1].axis("off")
axes[3, 0].imshow(f18_overdue_wordcloud, interpolation="bilinear")
axes[3, 0].set_title('F18 Overdue', fontsize=36)
axes[3, 0].axis("off")
axes[3, 1].axis("off")
plt.show()
[Figure: word clouds for Old and F18 tasks completed before, on, and after their due dates]

Busiest Class this Semester

Since I began tracking the class for each task (the Column field from the Boards layout), we can check out which class was my busiest this semester.

# https://community.plot.ly/t/setting-up-pie-charts-subplots-with-an-appropriate-size-and-spacing/5066
domain1 = {'x': [0, 1], 'y': [0, 1]}  # cell (1, 1)

fig = {
  "data": [
    {
      "values": f18_df['Column'].value_counts().values,
      "labels": f18_df['Column'].value_counts().keys(),
      'domain': domain1,
      "name": "Fall 18",
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    }],
  "layout": {
        "annotations": [
            {
                "font": {
                    "size": 15
                },
                "showarrow": False,
                "text": "Fall 2018",
                "x": 0.5,
                "y": 0.5
            }
        ]
    }
}

iplot(fig, filename='donut')

Due Date Frequency this Semester

We can also see when my tasks were due.

trace1 = go.Bar(
    x=f18_df['Due Date'].dropna().value_counts().keys(),
    y=f18_df['Due Date'].dropna().value_counts().values,
    name='Fall 18',
    marker={
        'color': f18_color
    }
)

data = [trace1]
iplot(data, filename='due date freq')
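Daily counts like these can be a bit noisy. A variation I didn’t run here would bucket due dates by week to make crunch periods easier to spot, along these lines:

# count due dates per calendar week (NaT rows dropped first)
weekly = f18_df.dropna(subset=['Due Date']).set_index('Due Date').resample('W').size()
trace1 = go.Bar(
    x=weekly.index,
    y=weekly.values,
    name='Fall 18 (weekly)',
    marker={
        'color': f18_color
    }
)
iplot([trace1], filename='due date freq weekly')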

Conclusion

Hopefully this post shows the motivation for, and potential benefit of, revisiting a previous analysis to look for significant changes. In the context of personal development, doing so can help you track your progress and achieve your goals. I still definitely need to put more effort into taking notes on time!