Just how silly is a 45-day weather forecast? And while we’re at it, just how good is a 2-day forecast?



I’ve got bad news for everyone here in the Fort Collins area: there’s a chance of showers on Labor Day (September 2). It’s a shame because the following Monday (September 9) shows a mostly sunny day, with a high of 77 degrees. (I am writing this on August 8.)

Seriously, AccuWeather? Yes, AccuWeather is releasing a 45-day weather “forecast”, although it’s actually just more of a trendline (it will be colder in October than it is now in August, probably a safe bet). It will predict the highs, lows, and chance of precipitation up to 45 days in advance. Presently, I cannot quantify how terrible these forecasts will be, but they will be terrible.

So that’s the silly portion of this blog post. To me, the more interesting questions are as follows.

Guiding Questions

  • Is a 2-day forecast error prone? What about a 7-day? At what point do weather forecasts become statistically useless?
  • Is there a regional difference in weather forecast error?
  • Which weather forecast models are more accurate for your region?

Suggested Activities

  • Solicit predictions. How accurate do you think weather forecasts are? Do you think a 2-day forecast is appreciably more accurate than a 5-day forecast? Do you think it depends on the region you’re checking (i.e. does, say, a dry climate have more error than a wet climate? do the coasts have more forecast error than the plains?)? Can we quantify any of this?
  • Start collecting data. Keep track of the 2-, 3-, 5-, 7-, and shoot, the ridiculous AccuWeather 45-day forecasts. Compare them to the recorded data (you could stick a max/min recording thermometer out your window, or check archived data). Grab the highs and lows, and possibly recorded rainfall. Yes, the 45-day forecasts won’t start paying dividends for a couple of months, but that’ll be some pretty rich data (“rich” has a couple different meanings in this case) to have collected. And thankfully, if you’re teaching Algebra 1, a lot of curriculum maps out there don’t have the “Statistics” unit until late in the year.

How to collect the data? Well, Google Forms makes it nice and easy to record data. See, look, I made a Google Form for you and your students to use. The only tricky part will be some spreadsheet column manipulation, correlating the forecasts with the appropriate weather data collection dates.

  • Develop a method to calculate the error of the forecast. Depending on the grade level, your class could certainly develop an error model based on whatever sophistication you feel appropriate. Maybe an elementary class will just take the differences between the forecasted and collected highs and lows. Maybe a middle or high school class will develop something more sophisticated. Something like Mean Square Error should do nicely.



  • Compare model to model; compare region to region. Which weather model was the most accurate for your region? What if you just took the average of all the forecast models? Would that result in a better prediction? Some forecast sites to check:
  • Toward the end of the year, have students begin making their own Highs, Lows, and Precipitation prediction. By the way, these competitions happen all the time in the atmospheric science community. Maybe after some year-long analysis, your last couple weeks are spent gaming around students making their own predictions. They can look at all the forecasts and make a judgment. There are several ways to “score” it, some are more intense (scroll down to “scoring”) than others. Like collecting the data, Google Forms could be helpful.
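
If your class lands on Mean Square Error, the bookkeeping might look something like the sketch below. The records are made-up numbers (not real forecasts), and the function name and the (lead time, forecast, observed) layout are just assumptions about how your spreadsheet export could be arranged.

```python
from collections import defaultdict

# Hypothetical records: (lead time in days, forecasted high, observed high), in °F.
# These numbers are made up purely to show the bookkeeping.
records = [
    (2, 78, 77),
    (2, 81, 83),
    (5, 75, 79),
    (5, 84, 80),
    (7, 70, 77),
    (7, 88, 81),
]

def mse_by_lead_time(rows):
    """Return {lead_days: mean squared error} for (lead, forecast, observed) rows."""
    squared_errors = defaultdict(list)
    for lead_days, forecast, observed in rows:
        squared_errors[lead_days].append((forecast - observed) ** 2)
    return {lead: sum(errs) / len(errs) for lead, errs in squared_errors.items()}

errors = mse_by_lead_time(records)
for lead in sorted(errors):
    print(f"{lead}-day forecast: MSE = {errors[lead]:.1f}")
```

Swapping the squared difference for an absolute difference gives a mean absolute error instead, which may be a gentler starting point for younger students.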

If you’re looking for a long-term project that all students can access and be involved in (low barrier of entry, high ceiling), you could do worse than tracking weather forecast accuracy.

Teacher’s Edition

I’ve got some news that will probably shock you: 2- and 3-day forecasts are actually pretty good. And they’re getting better.

But this is me telling you that things are getting better and just citing some peer-reviewed research. That’s not necessarily what we do here. This is where you and your students could potentially come in.

For more on weather and weather forecasting, NCAR/UCAR has some excellent resources.

Update 8/9/2013:

Huge thanks to Frank (@fjvitale) for finding some graphs that better show improvement in TEMPERATURE forecasting. Here’s one that shows the error of MAX temperature forecasting for several different day forecasts since 1972.

You can find more forecasting verification graphs here.

(More of my posts on weather and climate.)

How does one provide the complex data of global warming to students?

Update (3/12/2013): An atmospheric scientist friend of mine, Katie, suggested a few edits to this post, primarily to clear up a few of the tools listed here. The edits are in bold.

My initial thesis on this post was originally going to be “why don’t teachers let students investigate global warming very often?” While this may not answer it, here’s a terrifying Google search for any teacher who is interested in having their students do some independent research on climate change. Google: “global warming raw data”.


So the first result is a good one. A legit one. There are lots of links to reputable sites maintained by reputable scientists. Then the second result is a Yahoo! Answers post. Then the third (third!) Google result for a simple query on raw data turns up World Net Daily, a website for conspiracy theorists and people who think they’re going to be put in FEMA camps any day now. That is not a reputable site. They provide the opposite of “raw data”.

This is not a post about the messy politics and confusion-campaigns around climate change. But this does point to a particular difficulty that you’d hope would be much simpler: where can we find raw temperature data that we can actually use? For the record, a google search of “raw temperature data” yields much more acceptable initial results. But still, many of those results can be extremely difficult for a secondary math or science teacher to pick up and use, let alone students. For one, climate data is often presented in a file format that requires heavy coding knowledge or special programs to process (such as NetCDF). Second, it’s hard to know where to start with temperature data. Do you start by geographic location? Do you take the annual mean across the globe? How would one do that, exactly?

So this is the problem, and maybe a fundamental problem of teaching science: data are messy. We have to rely on others to package it for us. Scientists are interested in providing the raw data because they want people to have access to true observations, but that raw data is so vast and difficult to process (but not that difficult to interpret!) you have to get at least a Master’s degree before you can even start to decipher it. And often, scientists aren’t interested in culling the data to make it more digestible for the public. They’d prefer to show you the graph. This is great for communication, but not great for independent research. And worse, they’re now fighting on the same plane as disingenuous charlatans who are paid to be as such. So let’s provide students of science the raw data in a way that anyone with Microsoft Excel and a genuine curiosity can begin to explore the very real phenomenon of climate change.

My favorite site that does that is NASA’s GISS Surface Temperature Analysis. In terms of accurate, raw, commentary-free, accessible, customizable, and processable data, I haven’t found a better place to start. Bookmark that site. Tell your students to go to that site. Start locally.

To find specific historic local weather stations, Katie recommends using the map rather than the search function. The map appears to have better functionality. So click on your favorite vacation spot and go find that precious, precious raw data.



Once you have the ASCII data (shown here), it’s simply a matter of copying and pasting it into Excel, or if you’re incredibly ambitious (or teaching a Stats class perhaps), having students import it into R, one of the industry standards.

For the uninitiated, let me translate a few things: 

D-J-F= December-January-February average

M-A-M=March-April-May average

J-J-A, S-O-N = I think you get the idea….

The last column, metANN = annual mean temperature. This actually might be the best first place to start. 

Berkeley also has a nice data set organized by country. However, the accessible-to-the-layperson data is a bit more hidden.


If you’re not careful, you’ll end up downloading intense, non-accessible-to-the-layperson, NetCDF data. Which, again, is fantastic data, but difficult to work with yourself.

But now we’ve got two sites with data that can be tossed into Excel, R, or even those statistics packages designed for secondary students. Now that we have that data, we can do a lot with it.

Suggested Activities

  • Have students investigate the temperature trend in their area.
  • Create a linear model that predicts temperature as a function of year locally.
  • Assign each group or student a different region of the world to investigate and develop a linear model for.
  • Or what about this: develop a sinusoidal equation that describes monthly temperature. Get some trig in there.
  • Ask the question: is our town/state/country/planet heating up or not? Or is it too uncertain to tell?
  • Can you find local stations that DON’T show a warming trend? Katie suggests looking at weather stations closer to the poles to consider the potential impact of polar temperature trends. This might be a bit science-y, but it’s something I’d happily let students explore in a math class.

Once you have actual data, you can start to test it to assess that last, fundamental question (which then spurs thousands of other questions, like “should I have children?”). Is β>0 under the general linear model? Once we have that answer, even if it’s just locally, we can start to talk about the implications.
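
Here is a minimal sketch of what checking the slope could look like once the annual means are in hand. The temperatures are invented stand-ins for a station’s metANN column (not real station data), and the function name is mine. It computes the least-squares slope, its standard error, and the t statistic you would compare against a t table with n − 2 degrees of freedom.

```python
import math

def slope_t_statistic(xs, ys):
    """Least-squares slope, its standard error, and t = slope / SE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, se_slope, slope / se_slope

# Made-up annual means (°C) standing in for a station's metANN column.
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008]
met_ann = [9.8, 10.1, 9.9, 10.4, 10.2, 10.6, 10.5, 10.9]

slope, se, t = slope_t_statistic(years, met_ann)
print(f"slope = {slope:.3f} °C/year, SE = {se:.3f}, t = {t:.2f}")
```

A large positive t is evidence for β>0; a t near zero is the “too uncertain to tell” case from the activity list.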

Who doesn’t want to relive the 2000 election? (Stats problem)

We’ll take a slight detour from my college readiness manifesto (that hasn’t even really started yet) to bring you the following election-related problem. Then again, this problem was lifted directly from a graduate level Statistics class, so this might give some insight into what college readiness could potentially look like. Hadn’t thought of that. Enjoy!


Here’s a (non-abridged) problem I received in my graduate level stats class last week (due tomorrow! hope it’s ok that I’m posting it!). I think it’s a great problem and one that’s certainly prevalent around this time:

(from The Statistical Sleuth, Ramsey & Schafer, 2nd ed.)

1. (SS#8.25) Presidential Election of 2000 

The US presidential election of November 7, 2000, was one of the closest in history. As returns were counted on election night it became clear that the outcome in the state of Florida would determine the next president. At one point in the evening, television networks projected that the state was carried by the Democratic nominee, Al Gore, but a retraction of the projection followed a few hours later. Then, early in the morning of November 8, the networks projected that the Republican nominee, George W. Bush, had carried Florida and won the presidency. Gore called Bush to concede. On the way to his concession speech, Gore then called Bush to retract that concession. When the roughly 6 million Florida votes had been counted, Bush was shown to be leading by only 1,738, and the narrow margin triggered an automatic recount. The recount, completed in the evening of November 9, showed Bush’s lead to be less than 400.

Meanwhile, angry Democratic voters in Palm Beach County complained that a confusing “butterfly” ballot in their county caused them to accidentally vote for the Reform Party candidate Pat Buchanan instead of Gore. See the ballot below. You might understand how one could accidentally vote for Buchanan instead of Gore because Gore’s name is the second listed on the left side, but his “bubble” is the third one. Two pieces of evidence supported the claim of voter confusion. First, Buchanan had an unusually high percentage of the vote in that county. Second, there were also an unusually large number of ballots discarded during counting because voters had marked two circles (possibly by inadvertently voting for Buchanan and then trying to correct the mistake by then voting for Gore).

Make a scatterplot of the data, with X = # of votes for Bush and Y = # of votes for Buchanan. What evidence is there that Buchanan received more votes than expected in Palm Beach County? Analyze the data without Palm Beach County to obtain an appropriate regression model fit. Obtain a 95% prediction interval for the number of Buchanan votes in Palm Beach County from this fitted model (assuming that the relationship between X and Y is the same in this county as the others). If it is assumed that Buchanan’s actual count contains a number of votes intended for Gore, what can be said about the likely size of this number from the prediction interval?

Why couldn’t a similar problem be asked in a HS Stats class? Maybe modified, but seriously, why not? And especially why not now, in a year divisible by four (Summer Olympic and presidential election years)? The problem’s a bit wordy, though. Let’s try this:

Artifact, reworked:

The US presidential election of November 7, 2000, was one of the closest in history. As returns were counted on election night it became clear that the outcome in the state of Florida would determine the next president. When the roughly 6 million Florida votes had been counted, Bush was shown to be leading by only 1,738, and the narrow margin triggered an automatic recount. The recount, completed in the evening of November 9, showed Bush’s lead to be less than 400.

Meanwhile, angry Democratic voters in Palm Beach County complained that a confusing “butterfly” ballot in their county caused them to accidentally vote for the Reform Party candidate Pat Buchanan instead of Gore. See the ballot below.

Guiding Questions

  • How could we use statistics to determine whether or not the “butterfly” ballot confused voters?
  • How big of an outlier was Palm Beach county?
  • Had the ballot been more traditional, can we predict the outcome of the Florida electoral votes (and presumably, the 2000 election)?
  • Is there a model of sorts we could employ to detect such anomalies in the future?
  • While we’re at it, what’s up with Dade County over there?

Suggested activities

  • Make a scatterplot and a linear fit and be, like, DUH, something was whack in Palm Beach county. (data of Bush votes and Buchanan votes by county are at the bottom of this post)
  • Socratic discussion on outliers, not to be confused with Outliers by Malcolm Gladwell.
  • Workshops on confidence intervals, standard deviation and the like.
  • What does the line of best fit look like with and without Palm Beach? And what might that tell us about the voting discrepancies in Palm Beach?
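
For the “how big of an outlier” question, one possible approach is to fit the line without the suspect county and then measure how far that county sits from the line in units of residual standard deviation. The sketch below uses placeholder numbers and made-up county names, NOT the real election2000 data, so it only demonstrates the mechanics. With the real counts, deciding whether a transformation gives a more appropriate fit would be part of the exercise.

```python
import math

def fit_line(points):
    """Ordinary least-squares fit; returns (intercept, slope, residual_sd)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    return intercept, slope, math.sqrt(sse / (n - 2))

# Placeholder (Bush votes, Buchanan votes) pairs, NOT the real county data.
counties = {
    "County A": (10_000, 60),
    "County B": (25_000, 130),
    "County C": (40_000, 190),
    "County D": (60_000, 310),
    "County E": (80_000, 390),
    "Suspect County": (70_000, 1_500),
}

# Fit the line using every county except the one under suspicion.
others = [v for name, v in counties.items() if name != "Suspect County"]
intercept, slope, resid_sd = fit_line(others)

x, y = counties["Suspect County"]
expected = intercept + slope * x
print(f"expected about {expected:.0f} Buchanan votes, observed {y}")
print(f"that is {(y - expected) / resid_sd:.1f} residual SDs above the line")
```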

Attempted solution

I’m not going to post my response to the problem prompt, because it may violate academic honesty or something. But I’ll post a scatterplot of Bush/Buchanan votes by county and leave it at that.

Data: election2000

Why doesn’t Nike+ use math to encourage me to run?


Here’s the Nike+ app, which at the end of my run the other day looked like this:

(editor’s note: yes, I’m slow. Thank you for noticing. Also, along with some encouragement in data format, I had Tim Tebow give me words of encouragement for bettering my pace.)

Now, there are a lot of numbers here, but I’m primarily interested in that last piece.

“You ran 0.10 mi more and 0’56″/mi faster than the average of your past 7 runs”

Why did Nike+ choose my past 7 runs? Was there some sort of algorithm to maximize how good I feel about myself?

Sadly, I think not. Witness my previous run:

So it looks like it just takes your past 7 runs and compares your mean distance and pace. That’s not very good motivation, now is it Nike+? Can we improve (at least, in my opinion it would be an improvement) Nike+’s distance and pace comparison to help the runner feel better about his or her progress?

Guiding Questions

  • Would a different measure of central tendency lead to a different, and perhaps more encouraging, data capture?
  • Would averaging a different number of past runs lead to a different, and perhaps more encouraging data capture?
  • Could we write an IF…THEN or other type of algorithm to encourage the runner?

Suggested Activities

  • Give some runner data (either fabricated or authentically generated; shoot, you can use my data if you want) and ask students to describe after each run “what should the app say in order to give the runner a sense of accomplishment?”
  • Once students have done that with individual data points, have students sketch out an algorithm or decision tree.
  • Test that algorithm or decision tree against a new set of runner data.

  • Compare decision trees and algorithms to see whose is the “positivest”(?).
  • Turn into algebraic expressions if you want, presumably to help out the coders.
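
To make the IF…THEN idea concrete, here is one possible “positivest” algorithm, sketched under my own assumptions: try a few window sizes and both the mean and the median, then report whichever comparison flatters the runner most. The function name, the window sizes, and the pace data are all invented for illustration, and this is certainly not how Nike+ actually works.

```python
from statistics import mean, median

def encouraging_message(past_paces, latest_pace, windows=(3, 5, 7)):
    """Pick the comparison that flatters the runner most.

    Paces are in seconds per mile, so lower is better. For each window size
    and each measure of center, compute how much faster the latest run was,
    then report the largest improvement (if any).
    """
    best = None  # (seconds_faster, label)
    for n in windows:
        recent = past_paces[-n:]
        if len(recent) < n:
            continue  # not enough history for this window
        for name, center in (("average", mean), ("median", median)):
            gain = center(recent) - latest_pace
            if best is None or gain > best[0]:
                best = (gain, f"the {name} of your past {n} runs")
    if best is not None and best[0] > 0:
        return f"You ran {best[0]:.0f} sec/mi faster than {best[1]}!"
    return "Another run in the books. Nice work getting out there!"

# Made-up paces (seconds per mile), oldest first.
history = [600, 605, 610, 700, 690, 680, 670]
print(encouraging_message(history, latest_pace=660))
```

With this invented history, the latest run is slower than the 7-run average but faster than the average of the last 3 runs, so the message picks the happier comparison.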

There are few things more discouraging than seeing that I’m actually running slower than my seven previous runs averaged out. At least package the data so I don’t feel like I’m out of shape.

Attention Math Teachers: Slate has graciously discovered your next project

Slate.com’s always entertaining “The Explainer” segment runs an always-even-more entertaining year-end segment on the Unanswered Questions of the Year in which readers are prompted to vote on the question to be answered (aside: say, that’s a pretty awesome activity for a classroom. Students in the middle of a Problem vote on the question the teacher answers.)

This year’s edition contains burning questions such as “When you urinate in a toilet and there’s splashback, is that urine or toilet water?” and “Why do dogs like having their bellies rubbed?”

The one that I’m interested in is the following.


13. When parking in a nearly full parking lot, is it quicker to a) park in the first open space you see and walk, or b) drive a few laps around the lot and grab the closest possible spot? In my experience the two ways are about even, since the extra time spent driving for “b)” means a quicker exit when you leave. Please settle this using statistics as my wife has refused to argue anymore regarding this issue.

Tell me this isn’t a dilemma you play out in your head on a near daily basis.

I suppose you could use statistics, but this would make a super modeling project as well. Moreover, I have no idea what “level” of math a problem such as this requires and that’s a good thing. This could be a middle school project or an Algebra 2 project. This could be attempted by your high-flying AP Calculus-bound students as well as your remediated Algebra 1B students who probably hate the subject you’re teaching them.

So how do we pose this problem to students? A video of you driving around looking for a parking spot with your friend/spouse imploring you to “just park and we’ll walk!” might work. A few simple diagrams might work:

Does your school have parking issues? Particularly in the afternoon? Grab some footage. Or shoot, even the simple question posed in The Explainer may be interesting enough as is.

Guiding Questions

  • Does the size of the parking lot have anything to do with the decision to Park and Walk or to Keep Searching?
  • Do the number of parking spaces have anything to do with our decision?
  • How fast do people walk through parking lots?
  • How often are cars vacating their parking spots?

This is a good example of a problem that A) could potentially be immediately interesting to students (or at least could be posed in an interesting way), B) doesn’t necessarily have a correct solution, C) offers multiple routes through the problem, and D) can be accessed regardless of prior mathematics expertise.
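
One possible first pass at a model (and only one of many routes) is to ask how long you can circle before parking and walking would have been quicker. The sketch below assumes a round-trip walk, a walking speed of about 3 mph (4.4 ft/s), and made-up distances; every number and the function name are assumptions for students to argue about, not measurements.

```python
def break_even_search_seconds(far_walk_ft, near_walk_ft, walk_speed_ft_per_s=4.4):
    """Extra seconds of circling at which the two strategies tie.

    Assumes you walk to the door and back, so the closer spot saves
    twice the difference in walking distance.
    """
    return 2 * (far_walk_ft - near_walk_ft) / walk_speed_ft_per_s

# Made-up lot: first open space is 300 ft from the door, the best spot is 50 ft.
limit = break_even_search_seconds(far_walk_ft=300, near_walk_ft=50)
print(f"Keep searching only if you expect to find the close spot within {limit:.0f} s")
```

Students could then poke at each assumption: measure real walking speeds, time how often spaces open up, or account for the quicker exit the original question mentions.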

What are some potential “next steps” students could take to engage in the problem? What are some potential mathematical routes to a solution? This is something to think about over the break, and as you’re vigorously searching for a parking space at the mall to buy that last-minute gift you’ve been putting off.

Do NFL teams actually use that draft pick chart when trading draft picks?

The 2011 National Football League draft of players is tomorrow, and I was inspired by a really interesting post over on Mr. Honner’s blog regarding the NFL Draft Pick Trade Value chart and exponential decay. The gist is this: somewhere, sometime, someone came up with a chart that assigned a numerical value to every pick in the NFL Draft. I’ll let Mr. Honner describe the chart:

It turns out that football analysts have created a trade value chart that essentially standardizes the value of picks. For example, a team holding the 7th overall pick, valued at 1,500 points, might expect to receive the 21st pick (800 points) and the 26th pick (700 points) in return, were they to trade their pick. This conventional valuation helps establish fair prices for trades, as it would in any commodities market.

I couldn’t have said it better.

Anyway, Mr. Honner established that the values of the draft picks decay exponentially (for some reason).

Courtesy of mrhonner.com

My questions were these: do NFL teams actually adhere to this chart? And how can we tell? And is there anything this data aggregation can tell us about horse-trading NFL draft picks?

Here’s a link to every draft day trade involving pick-swapping since 1992.

So let’s start putting this together for use in a classroom.


NFL Trade Value Chart

Picks trade history

Guiding questions.

  • Do NFL teams follow the chart as accepted value for draft picks?
  • Does this chart relate at all to what happens on the field?
  • Why is there such a quick exponential drop in draft pick value?
  • Do teams that trade for higher picks tend to get better value or teams that trade for lower picks?

Suggested activities

  • First off, finding the exponential decay function that fits the chart could be interesting. Mr. Honner’s thankfully done that for us.
  • Class discussion: how can we tell if teams are following that chart?
  • Lotta data aggregation and manipulation. Excel to the rescue?
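
As a rough sketch of the first activity, here is a log-linear least-squares fit of an exponential decay using only the three chart values quoted above (picks 7, 21, and 26). Three points is far too few for a trustworthy fit and this is not Mr. Honner’s function; it is just the mechanics students could repeat with the full chart. The function name is my own.

```python
import math

# The three chart values quoted above: pick number -> points.
chart_points = {7: 1500, 21: 800, 26: 700}

def fit_exponential(points):
    """Fit value = a * exp(-k * pick) by least squares on log(value)."""
    xs = list(points)
    logs = [math.log(points[x]) for x in xs]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    slope = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = math.exp(l_bar - slope * x_bar)
    return a, -slope  # k is positive for decay

a, k = fit_exponential(chart_points)
print(f"value(pick) is roughly {a:.0f} * exp(-{k:.4f} * pick)")

# Checking the example trade with the chart's own numbers:
# the 7th pick versus the 21st and 26th picks combined.
print(chart_points[7], "vs", chart_points[21] + chart_points[26])
```

Summing chart values for each side of a trade, as in that last line, is the same aggregation the spreadsheet below performs for every trade.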

Potential Solution

A couple caveats before we begin:

  • The trading data do not show any additional players that were involved in the picks swap. For instance, teams often trade a draft pick for picks + a player. That is not reflected in the data.
  • Oftentimes teams trade future years’ picks. It’s tough to assign a value to that since that particular pick is in flux. For instance, if I trade a 2011 pick for a 2012 first-round pick, I don’t know immediately where that 2012 first-round pick will land. So I threw out all trades involving future picks.

Now onto some solution.

Here’s a terrible-looking spreadsheet I threw together attempting to aggregate the total pick value of both sides of a trade. The team that traded the higher pick is on the left. I called them the “Higher pick trader picks“. I’m not very creative. Conversely, the info on the right is the “Lower pick trader picks.” Each row represents a different trade.

The total value of the picks belonging to the team that traded away the higher pick is listed in the green column. The total value of the other team’s picks is listed in the purple column. (Note: you’ll have to check it out in full-screen or download the document if you really want to see the spreadsheet in its full glory.)

If we make a scatterplot of all the trades we can visually see if NFL teams basically follow this trade chart value. If the linear slope is one, then that’s a perfect trading match. That’s what I’ve done here.

Surprisingly, that’s a pretty good fit, meaning for the most part teams roughly adhere to the trade value chart. The slope is nearly one (0.972). A couple of things jump out at me.

Overall, a slope of 0.972, slightly less than one, suggests that the team that trades away the higher pick is getting the better value (this might be a good place to stop and let students ponder why this is), presumably in the form of multiple picks.

That said, for the extremely high-value pick trades (where the value of the traded picks exceeds 2000), according to the draft chart, the teams that trade away their valuable, high picks are getting less value. Of the 5 trades that involve more than 2000 points, 4 of them gave the advantage to the team trading away their lower picks. This is due to the extreme top-heaviness of the trade chart. To wit, according to the chart, the #1 overall pick is worth three times the #16 overall pick.

We could also plot the best-fit line as the horizontal axis as a really good visual of the alleged benefit of trading away higher picks in favor of more lower picks.

If we accept the trade chart as true values, then according to history, we have two suggestions for NFL General Managers:

1) In general trade away your higher picks for multiple lower picks.

2) However, DO trade up for extremely high picks (top 5).

You’re welcome, NFL GMs. Make the check out to Emergent Math. And be sure to give Mr. Honner a cut too.

The Dallas Mavericks are 2-16 in playoff games officiated by Danny Crawford. Is this statistically significant?


This shocked me.

The Mavs have a 2-16 record in playoff games officiated by Crawford, including 16 losses in the last 17 games. Dallas is 48-41 in the rest of their playoff games during the ownership tenure of Mark Cuban, who has been fined millions of dollars in the last 11 years for publicly complaining about officiating.

First of all, is that right? That A) the Dallas Mavericks perform so poorly in Crawford-officiated games, and B) Crawford is still being allowed to referee them? Really? Wow.

And there’s this, which might be even more damning: The Mavs are 4-14 against the Vegas spread. ESPN provides a nice chart of the individual games.

I specifically remember those 2006 Finals games against Miami. By many accounts, those were two of the worst officiated games in NBA history, in which Heat guard Dwyane Wade got what seemed to be every favorable foul call. It pretty much ushered in the era of NBA ref scrutiny.

This has to be tested for statistical significance.

Guiding Questions

  • Really?
  • Is this just coincidence or is there something else going on here?
  • Does Crawford have a vendetta against the Mavericks for some reason?
  • Does Crawford have a suspicious record with any other team?
  • Is there a potential way, other than referee malfeasance, that we could explain away this alleged disparity?
  • Maybe the Mavericks are just playoff chokers?

Suggested activities

  • Obviously if this were a statistics course you could look at statistical significance, which we’ll do in a minute.
  • If students are really up for it, they could delve into the games themselves and look for disparities in “referee stuff” like fouls, technicals, travelling, etc. We’re not going to do that in a minute.
  • Homework: students watch tonight’s Mavs-Trailblazers game closely and look for anything fishy from Crawford (although, this might serve as its own lesson in confirmation bias).

Potential Solution

Let’s start with our null hypothesis:

H(o): Danny Crawford is NOT biased against the Mavericks. The Mavericks’ playoff woes in games he’s officiated are due to random chance.

I suppose first we have to figure out what the Mavericks’ Crawford-officiated games “should be.” The Mavericks are 48-41 in playoff games not officiated by Crawford, good for a winning percentage of 54%. Although, if you’re like me, you believe more in random chance for sporting events, and the “true percentage” is probably pretty much 50% over the course of a decade. But that could be a fun debate point in your class.

We also need to decide on a significance/confidence level α, usually 0.05 or 0.01.

So what is the probability of a team that “should” win 50% of its games (debatable) ending up winning just 2 of 18 games at random? Or rather, that this team should lose 16 (or more) of 18 games by random chance?

A P-Test could look like this:

Probability of 16 losses + Probability of 17 losses + Probability of 18 losses = [C(18,16) + C(18,17) + C(18,18)] × (0.5)^18 = (153 + 18 + 1)/262,144 = 172/262,144 ≈ 0.00066

So no matter what confidence level we choose, this is, again, pretty damning. If we assign a 50% chance of the Mavericks winning (less than their winning percentage in their other playoff games), there is only about a six-hundredths of a percent chance of this being total flukiness.

Before we go nuts, though, let’s look back at that chart. Now, if you’re not familiar with Vegas lines, the negative sign in front of the “DAL Line” column indicates the Mavericks were favored that game. You’ll note that Dallas was only favored/expected to win 8 of those 18 games, and Vegas is usually pretty dead-on about these sorts of things. If we use that as a “true winning percentage” the Mavericks would only be expected to win a mere 44% (a losing percentage of 56%) of their games, not 50%. Let’s recalculate.

Slightly less suspicious, but still grievously suspicious. It’s well below our 5% or even 1% significance level.
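
For anyone who wants to check both calculations, here is a short sketch that adds up the binomial probabilities of 16, 17, and 18 losses in 18 games. It treats each game as an independent coin flip with a fixed chance of losing, which is itself an assumption worth debating in class. The function name is mine; the loss rates (50%, and 10/18 from the Vegas lines) come from the discussion above.

```python
from math import comb

def prob_at_least(losses, games, p_loss):
    """P(X >= losses) for X ~ Binomial(games, p_loss)."""
    return sum(
        comb(games, k) * p_loss**k * (1 - p_loss) ** (games - k)
        for k in range(losses, games + 1)
    )

coin_flip = prob_at_least(16, 18, 0.5)   # Mavericks treated as a 50/50 team
vegas = prob_at_least(16, 18, 10 / 18)   # favored in only 8 of 18 games

print(f"loss rate 0.50:  P(16+ losses) = {coin_flip:.5f}")
print(f"loss rate 10/18: P(16+ losses) = {vegas:.5f}")
```

Students can plug in other “true” loss rates (say, 41/89 from the non-Crawford games) to see how sensitive the conclusion is to that choice.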

Still, before I would get Ralph Nader involved, I would ask students, investigators, etc. to look for specific patterns of evidence within the games themselves (as 82games.com has done for one specific game). The original ESPN article that led to this investigation suggested Dallas had more fouls and fewer free throws in Crawford-officiated games than others. A next step would be to look at the beneficiaries of the suspect officiating, i.e. Dallas’ opponents for these games. Did they get an inordinate number of free throws? Did they tend to overperform, just as Dallas underperformed in these games? Now that we have the statistical basis to be suspicious, we can start the investigation in full.

Couldn’t this work as a real nice Project Based Learning Unit for statistics? The Entry Event could be the ESPN article, or Game 5 of the 2006 NBA Finals. The summative event could be that students could present their findings, host a panel or debate, or write a letter to their congressperson. Or Ralph Nader.