What is central tendency?
Computing mean, median, and mode.
Visual representation using a program.
Let's start with an example. Let's say your Geography class gives you a pop quiz. As your teacher
is handing back the papers, he stops at your desk and hands you back your quiz. Written at the top is
3/5. How do you react to this score? Are you happy or are you disappointed? You might calculate that
you got a 60% or you might ask your neighbors. Let's say your neighbor got a 1/5; you'd feel pretty
good. But what if they got a 5/5? How can you accurately judge how well you did? This is where central
tendency comes into play. There are 3 primary values that we talk about when we discuss central
tendency: mean, median, and mode.
Let's say we have five numbers. We will use these five numbers throughout the rest of our examples:
10, 0, 1, 3, 1
Our first value is the mean. We can calculate the mean by adding up all of the numbers and then dividing
the sum by the number of items in the list. The mean is sometimes referred to as the average.
10, 0, 1, 3, 1
What is the mean of the numbers above?
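If you'd like to check your answer, here is a minimal Python sketch of the steps just described: add up the numbers, then divide by how many there are. The names `mean` and `numbers` are our own choices for illustration, and the sketch assumes Python 3 and a list that is not empty.

```python
def mean(values):
    """Return the mean: the sum of the values divided by how many there are."""
    total = sum(values)    # add up all of the numbers
    count = len(values)    # number of items in the list
    # Assumes Python 3, where / is true division, and a non-empty list.
    return total / count


numbers = [10, 0, 1, 3, 1]
print(mean(numbers))
```

Run it and compare the printed value with the one you worked out by hand.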
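The median is the middle value in the list once the list is sorted, so we first need to put the list into sorted order. (If a list has an even number of items, there are two middle values, and the median is the mean of those two.)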
Let's sort our numbers from the example above:
0, 1, 1, 3, 10
What is the median of these numbers?
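The same idea can be sketched in Python: sort the list, then pick out the middle value. The function name `median` is our own, and the handling of lists with an even number of items (taking the mean of the two middle values) is a common convention that we are assuming here. The sketch also assumes Python 3 and a non-empty list.

```python
def median(values):
    """Return the middle value of the list after sorting it."""
    ordered = sorted(values)    # sorted copy; the original list is unchanged
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of items: there is exactly one middle value.
        return ordered[middle]
    # Even number of items (assumed convention): mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


numbers = [10, 0, 1, 3, 1]
print(median(numbers))
```

Because the function sorts a copy, it does not matter whether you pass in the sorted or the unsorted version of the list.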
The mode is simply the number that appears most frequently. Since our list is already sorted, equal values sit next to each other, so the mode will be easy to spot.
0, 1, 1, 3, 10
Which number appears most frequently?
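Counting how often each number appears is also easy to sketch in Python. The function name `modes` is our own choice. It returns a list, because more than one number can be tied for most frequent; that tie handling is an assumption on our part, not something defined above. The sketch assumes a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the value(s) that appear most often."""
    counts = Counter(values)           # maps each value to how often it appears
    highest = max(counts.values())     # the largest count (list must not be empty)
    # Keep every value whose count ties for the highest.
    return sorted(v for v, c in counts.items() if c == highest)


numbers = [0, 1, 1, 3, 10]
print(modes(numbers))
```

If every number in a list appears exactly once, this sketch returns all of them, since they are all tied.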
0, 0, 0, 2, 6, 10
10, 1, 8, 3, 5, 15, 0, 12, 2, 940, 16
0, 5, 1, 2, 6, 1, 2, 4
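Once you have worked through the three lists above by hand, one way to check yourself is Python's standard `statistics` module. This is only a sketch: the names `practice_lists` and `summarize` are hypothetical, and `statistics.multimode` requires Python 3.8 or newer.

```python
import statistics

# The three lists shown above.
practice_lists = [
    [0, 0, 0, 2, 6, 10],
    [10, 1, 8, 3, 5, 15, 0, 12, 2, 940, 16],
    [0, 5, 1, 2, 6, 1, 2, 4],
]


def summarize(values):
    """Return the mean, median, and mode(s) of a list as a dictionary."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every value tied for most frequent.
        "modes": statistics.multimode(values),
    }


for values in practice_lists:
    print(values, summarize(values))
```

When you compare the results, look at how the single very large value in the second list affects the mean compared with the median.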
Calculating the mean, median, and mode can be a bit tedious, especially if your list has many numbers in it. Let's make things a little more interesting by writing a program to calculate and plot these values for us!
First we need to get a starting program and some data to use:
The datasets contain lists of numbers indicating how many text messages several women and several men, respectively, sent in one month. We're going to write a program to compare these datasets. Let's start by taking a look at cTendency.py. Open a terminal, then cd to your programs directory and open the file in gedit:
Our program has definitions for several functions (review Python function definitions first if you need a refresher). There's a function to calculate the mean, one to calculate the median, and one to calculate the mode. Can you find the function which calculates the mean?
These functions aren't quite finished. Each of the functions is missing two lines. Everywhere something is missing, there's a comment saying # FILL ME IN. We'll now work together to replace these comments with working code!
Once you've finished all three functions, open the two datasets using gedit and examine the values within. Make some guesses about what the mean, median, and mode of each dataset are, and then record your guesses in your program by replacing the appropriate instance of YOURGUESS near the bottom of the file.
You can now run the program to see how the datasets compare. Go back to your terminal and start IPython, then run your program:
This will create a line plot that shows us the distribution of the data. How close were your estimates?
Save your plot for your website. In addition to these questions, please answer the following on your website: