What is a scatter plot?
What is a line of fit?
How do we fit a line to a plot?
Try fitting a curve to some scatter plots yourself. We have some scatter plots printed out for you up front. (They're also linked on the Datasets page.) Now go ahead and sketch a curve of fit onto each plot. Try to have about as many points above the curve as below it, and keep each individual point as close to your curve as you can manage. Not all of the plots will have a straight line as the best-fitting curve.
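If you want to check a hand-drawn straight line with numbers, here is a minimal sketch. The data points and the helper name check_line are made up for illustration; they are not part of lineFit.py or any of the course datasets. It counts how many points fall above and below a guessed line and reports the average vertical distance from the points to the line.

```python
# Hypothetical data points, invented for this example only.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]

def check_line(points, slope, intercept):
    """Count points above/below y = slope*x + intercept
    and return the average vertical distance to the line."""
    above = 0
    below = 0
    total_distance = 0.0
    for x, y in points:
        gap = y - (slope * x + intercept)  # positive means the point is above the line
        if gap > 0:
            above += 1
        elif gap < 0:
            below += 1
        total_distance += abs(gap)
    return above, below, total_distance / len(points)

# Try the guess y = 2x + 0 and see how balanced it is.
above, below, avg_gap = check_line(points, slope=2.0, intercept=0.0)
print("above:", above, "below:", below, "average distance:", round(avg_gap, 2))
```

A good guess has roughly equal counts above and below and a small average distance. Try changing the slope and intercept to see how the numbers move.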
Make a guess as to what sort of functions might match the curves you have drawn! (You can use the plotting function from the previous module to check your answers.)
Suppose you want to open up an animal shelter and take care of cats. You've surveyed a bunch of other animal shelters to find out how many cats they currently have, and how much money they spend on cat food per week. It might be nice to be able to see if there's any relationship between these two things. Let's make a scatter plot!
cd programs/
gedit lineFit.py&
ipython
run lineFit.py
Take a look at the scatter plot this produces. Can you tell what sort of function the line of fit might be?
Now let's fit a line to the plot:
#drawPoints(xvals, yvals, connect=False)
#drawFitLine(xvals, yvals, 1)
These lines plot a scatter plot using the points stored in xvals and yvals, and then fit a line to the data. The third parameter (currently a '1') tells the program what degree of function to try to fit to the plot: 1 means a linear function, 2 means quadratic, etc. In addition to drawing a line, the program tells you the function for the line it came up with. Can you use the function it gives to predict about how much it might cost per week for 11 cats?
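To see what fitting a degree-1 function involves, here is a rough sketch of a least-squares line computed by hand. This is an assumption about the general technique, not the actual code inside drawFitLine, and the cat counts and costs below are invented for illustration rather than taken from the survey data. The helper name fit_line is also hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up example data: number of cats and weekly food cost in dollars.
cats = [2, 4, 6, 8, 10]
cost = [9.0, 17.0, 25.0, 33.0, 41.0]

slope, intercept = fit_line(cats, cost)
print("cost =", slope, "* cats +", intercept)

# Use the fitted function to predict the cost for 11 cats.
predicted = slope * 11 + intercept
print("predicted weekly cost for 11 cats:", predicted)
```

Once you have the slope and intercept, a prediction is just plugging a new x value into the function, which is the same thing you do with the function lineFit.py prints.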
Choose one of the other scatter plot datasets from here (any of the datasets under the heading Functions) and save it to your programs directory. Try running lineFit.py using it. If the line doesn't seem to fit very well, try changing the degree and running the program again.
Be sure to save one of the plots for your website, and answer these questions.
How well does the line of fit seem to fit the scatter plot?
Outliers are points which are very far away from most of the other points in a scatterplot. Look at the plot again; can you find some outliers? How do you think the outliers might impact the line of fit?
Compare this plot to the one with outliers. This dataset is the same as the previous dataset, with the outliers taken out. Can you tell how it changed the line of fit?
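The effect of an outlier can also be seen in numbers. The sketch below fits a least-squares line to a small made-up dataset twice, once with a single outlier and once without it. The data and the helper name fit_line are hypothetical and are not the course datasets or part of lineFit.py.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up points that lie exactly on y = 2x.
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 6, 8, 10]

# The same points plus one outlier far above the rest.
outlier_x = clean_x + [6]
outlier_y = clean_y + [30]

clean_slope, clean_intercept = fit_line(clean_x, clean_y)
outlier_slope, outlier_intercept = fit_line(outlier_x, outlier_y)

print("without outlier: slope", clean_slope, "intercept", clean_intercept)
print("with outlier:    slope", round(outlier_slope, 2), "intercept", round(outlier_intercept, 2))
```

In this example a single far-away point more than doubles the slope, because the fit tries to reduce the large distance to that one point at the expense of all the others.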
Save one of the plots for your website. In addition to the questions here, please answer these questions on your website: