Session 3: Exploring Data - Module B: Trends & Seasonality

1. Overview

In this module you will plot four years of daily temperature data and look for trends and seasonal patterns.

2. Setting Up


  1. Download the Python program weatherData.py to your programs directory.
  2. Download the dataset meanDailyTemp to your programs directory.
  3. Open weatherData.py in gedit to view and edit the program.

3. Writing the Program

Now you can write a program that will plot the data from a dataset. The dataset contains a long sequence of data points recorded over time (in this case, every day for four years). We call this a time series. In this particular dataset, the data recorded is the average temperature for that day.

Before you can plot the data, however, you need to load it into memory. You can use the loadData() function from our plotting library to do this, as you did in a couple of previous modules.

Once the data is in the computer's memory, we pass it to the function drawPoints() to draw the points on a plot, just like when we plotted arithmetic sequences and when we made scatter plots.
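The load-then-plot idea can be sketched without the plotting library. This is only an illustration: load_values() and to_points() are hypothetical stand-ins (not the library's loadData() and drawPoints()), the one-number-per-line file layout is an assumption, and the temperatures are made up.

```python
import os
import tempfile


def load_values(path):
    """Hypothetical stand-in for loadData(): read one number per line."""
    values = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                values.append(float(line))
    return values


def to_points(values):
    """Pair each value with its day index, giving (day, temperature) points."""
    return list(enumerate(values))


# Made-up sample temperatures, written to a temporary file so the
# sketch runs on its own (the real dataset's format may differ).
sample = [41.0, 43.5, 39.2, 45.1]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(str(v) for v in sample))
    sample_path = tmp.name

temps = load_values(sample_path)   # step 1: load the data into memory
points = to_points(temps)          # step 2: build the points to be drawn
os.remove(sample_path)

print(points)
```

Each point pairs a day number (the position in the time series) with that day's temperature, which is the kind of list a plotting function can draw.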

4. Examine the Output


  1. Open a terminal window.
  2. Change the directory to your programs directory:

    cd programs/

  3. Start IPython:

    ipython

  4. From IPython, you can run your program by typing:

    run weatherData.py

What do you notice about the plot? What are the differences from the plots that you made earlier?

Based on what you learned, is this an arithmetic sequence, geometric sequence, or neither?
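As a reminder of how to test a sequence, here is a minimal sketch that checks whether consecutive differences (arithmetic) or consecutive ratios (geometric) stay constant. The helper names and the example numbers are made up for illustration; try the same checks on your own data.

```python
def differences(seq):
    """Differences between consecutive terms."""
    return [b - a for a, b in zip(seq, seq[1:])]


def ratios(seq):
    """Ratios between consecutive terms (assumes no term is zero)."""
    return [b / a for a, b in zip(seq, seq[1:])]


def is_constant(values):
    """True if every value in the list is the same."""
    return len(set(values)) == 1


arithmetic = [2, 5, 8, 11]    # made-up example: add 3 each time
geometric = [3, 6, 12, 24]    # made-up example: multiply by 2 each time

print(differences(arithmetic), is_constant(differences(arithmetic)))
print(ratios(geometric), is_constant(ratios(geometric)))
```

A sequence is arithmetic when its differences are constant and geometric when its ratios are constant.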

Right now, the program plots only one point for each week's worth of data. Try changing the program so that it plots one data point for every day. (Hint: examine the code: how many days are in a week?)
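The idea of keeping one point out of every few days can be sketched as follows. This is not the code in weatherData.py, which may be written differently; sample_every() is a hypothetical helper and the temperatures are made up.

```python
def sample_every(values, step):
    """Return (day, value) pairs, keeping one value out of every `step`."""
    return [(day, values[day]) for day in range(0, len(values), step)]


# Two weeks of made-up daily temperatures.
temps = [30, 32, 31, 35, 36, 34, 33, 38, 40, 39, 41, 42, 40, 43]

weekly = sample_every(temps, 7)   # one point per week
daily = sample_every(temps, 1)    # one point per day

print(weekly)      # [(0, 30), (7, 38)]
print(len(daily))  # 14
```

The step size controls how many days are skipped between plotted points, so a smaller step gives a more detailed plot.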

How would you change the program to plot only one point for each month? Does a single point accurately represent the temperatures of an entire month? What about all the other points that were left out? Can you think of a way to make the plots more accurate?
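One possible way to use the points that would otherwise be left out is to average each block of days instead of picking a single day. The sketch below is an assumption-laden illustration, not the program's method: block_means() is a hypothetical helper, the numbers are made up, and a real month would be roughly 30 days rather than the 3-day blocks used here to keep the example short.

```python
def block_means(values, block_size):
    """Average each consecutive block of `block_size` values.

    The last block may be shorter if the data does not divide evenly.
    """
    means = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        means.append(sum(block) / len(block))
    return means


temps = [10, 20, 30, 40, 50, 60, 70]   # made-up values

print(temps[::3])             # one sample per block: [10, 40, 70]
print(block_means(temps, 3))  # average per block: [20.0, 50.0, 70.0]
```

Compare the two printed lists: the sampled values depend entirely on which day happens to be picked, while the averages take every day in the block into account.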

Don't forget to save the plot and answer these questions on your webpage!

5. Bonus: Other Datasets

Take a look at the datasets on the datasets page under Time-series and download a new one which looks interesting. Change your program to plot this new dataset. Run your program and observe the plot.

Save a copy of the plot. In addition to the questions at this link, please answer these questions on your webpage:

The datasets page includes a description of what the values in each of the datasets represent; this could help you figure out what might cause any trends you see.