Session 3: Exploring Data - Module A: Mean, Median & Mode

1. Overview

What is central tendency?
Computing mean, median, and mode.
Visual representation using a program.

2. What is Central Tendency?

Let's start with an example. Say your Geography class has a pop quiz. As your teacher hands back the papers, he stops at your desk and hands you your quiz. Written at the top is 3/5. How do you react to this score? Are you happy or disappointed? You might calculate that you got 60%, or you might ask your neighbors. If your neighbor got 1/5, you'd feel pretty good. But what if they got 5/5? To judge how well you did, you need to know what a typical score looks like. This is where central tendency comes into play: it describes the typical, or central, value of a set of numbers. There are three primary values that we talk about when we discuss central tendency: mean, median, and mode.

Let's say we have five numbers. We will use these five numbers throughout the rest of our examples:

10, 0, 1, 3, 1

3. Mean

Our first value is the mean. We calculate the mean by adding up all of the numbers and then dividing the sum by how many numbers are in the list. The mean is often referred to as the average.

10, 0, 1, 3, 1
What is the mean of the numbers above?
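
Once you've worked out the answer by hand, you can check it with a few lines of Python. This is only a sketch of one way to do it; the function name mean and the variable name numbers are our own choices for illustration, not something taken from the program used later in this session.

```python
# The five example numbers from this lesson.
numbers = [10, 0, 1, 3, 1]

def mean(values):
    """Add up the values, then divide by how many values there are."""
    total = sum(values)
    count = len(values)
    # float() keeps the division from rounding down in older Pythons.
    return total / float(count)

print(mean(numbers))
```

Run it and compare the printed value with the one you calculated.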

4. Median

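The median is the middle value in the list, so we first need to put the list in sorted order. If the list has an even number of items, there is no single middle value, so the median is the mean of the two middle values. Let's sort our numbers from the example above: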

0, 1, 1, 3, 10
What is the median of these numbers?
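
Here is a sketch of how the same steps might look in Python: sort the list, then pick out the middle. It also handles lists with an even number of items by averaging the two middle values, which is the usual convention. The function name median is our own choice for illustration.

```python
def median(values):
    """Return the middle value of the sorted list."""
    ordered = sorted(values)      # sorted() leaves the original list unchanged
    count = len(ordered)
    middle = count // 2           # index of the middle (or upper-middle) item
    if count % 2 == 1:
        # Odd number of items: there is exactly one middle value.
        return ordered[middle]
    # Even number of items: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2.0

numbers = [10, 0, 1, 3, 1]
print(median(numbers))
```

Notice that we pass in the unsorted list; the function does the sorting for us.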

5. Mode

The mode is simply the number that appears most frequently. If two or more numbers tie for most frequent, the list has more than one mode. Since our list is already sorted, the mode will be easy to spot.

0, 1, 1, 3, 10
Which number appears most frequently?
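
Counting by eye gets harder as lists grow, so here is a sketch of one way to count in Python. It uses Counter from Python's standard collections module to tally how often each number appears. The function name modes is our own choice, and it returns a list because more than one number can tie for most frequent.

```python
from collections import Counter

def modes(values):
    """Return every value that appears the greatest number of times."""
    counts = Counter(values)          # maps each number to how often it appears
    highest = max(counts.values())    # the largest tally
    # Keep every number whose tally matches the largest one.
    return sorted(v for v, c in counts.items() if c == highest)

numbers = [10, 0, 1, 3, 1]
print(modes(numbers))
```

If the returned list has a single item, that item is the mode.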

6. Additional Examples

Find the mean, median, and mode of each of the following lists:

0, 0, 0, 2, 6, 10
10, 1, 8, 3, 5, 15, 0, 12, 2, 940, 16
0, 5, 1, 2, 6, 1, 2, 4
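
After working through these lists by hand, you could check your answers with Python's built-in statistics module. This sketch assumes Python 3.8 or newer, since that is when statistics.multimode was added; the variable name examples is our own.

```python
import statistics

# The three lists from this section.
examples = [
    [0, 0, 0, 2, 6, 10],
    [10, 1, 8, 3, 5, 15, 0, 12, 2, 940, 16],
    [0, 5, 1, 2, 6, 1, 2, 4],
]

for numbers in examples:
    print(numbers)
    print("  mean:  ", statistics.mean(numbers))
    print("  median:", statistics.median(numbers))
    # multimode returns a list, because several numbers can tie.
    print("  mode:  ", statistics.multimode(numbers))
```

Pay attention to the second list: compare its mean with its median and think about which number is responsible for the difference.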

7. Visualizing Central Tendency with a Program

Calculating the mean, median, and mode can be a bit tedious, especially if your list has many numbers in it. Let's make things a little more interesting by writing a program to calculate and plot these values for us!

First we need to get a starting program and some data to use:

  1. Download the program cTendency. Save it in your programs directory as cTendency.py.
  2. Download this dataset into your programs directory. Save it as textwomen.
  3. Download this dataset into your programs directory. Save it as textmen.

The datasets contain lists of numbers indicating how many text messages several women and several men, respectively, sent in one month. We're going to write a program to compare these datasets. Let's start by taking a look at cTendency.py. Open a terminal, then cd to your programs directory and open the file in gedit:

cd programs/
gedit cTendency.py&

Our program has definitions for several functions (to review Python function definitions, go here). There's a function to calculate the mean, one to calculate the median, and one to calculate the mode. Can you find the function which calculates the mean?

These functions aren't quite finished. Each of the functions is missing two lines. Everywhere something is missing, there's a comment saying # FILL ME IN. We'll now work together to replace these comments with working code!

Once you've finished all three functions, open the two datasets using gedit and examine the values within. Make some guesses about what the mean, median, and mode of each dataset are, and then record your guesses in your program by replacing the appropriate instance of YOURGUESS near the bottom of the file.

You can now run the program to see how the datasets compare. Go back to your terminal and start IPython, then run your program:

ipython
run cTendency.py

This will create a line plot that shows us the distribution of the data. How close were your estimates?

Save your plot for your website. In addition to the questions above, please answer the following on your website:

  1. Were you surprised by the results you got when you ran your code on the mean?
  2. Would you use the mean to explain this dataset? Why or why not?
  3. Using the median and the mode, explain why you got the results that you got for the mean.