March Madness: Mastering Your Bracket with Search-Driven Analytics

The stage is set. Four teams are left contending for a shot at winning the 2017 NCAA “March Madness” men’s basketball tournament. While most fans have already filled out their brackets and placed bets on their favorite teams, some people are still trying to figure out how to get started.

For those who are new to the world of NCAA basketball, understanding how these teams stack up against each other is difficult. Why are there four #1 seeds? How can I compare the Big East against the Pac-12? And what are these Cinderella Stories people keep talking about?

Let’s take a step back and start at the beginning. Each year 347 men’s basketball teams from colleges across the US play each other during a regular season. In March the top 68 are invited to participate in a tournament to determine who will be crowned best collegiate team of the year. The teams play seven rounds, proceeding through a bracket of single-game elimination. One reason the tournament is called March Madness is that, despite the rankings, lower-seeded teams quite often take down those at the top. For fans around the globe, the fun is in guessing which teams will advance and who will be the ultimate winner.

Official 2017 bracket from the NCAA

So if the teams are ranked, why are the outcomes so hard to predict? Though teams are ranked based on their performance in the regular season, some teams faced tougher competition than others. Also, in the tournament, teams often face opponents they never played during the regular season. So how do you make sense of all this information and pick the winners?

Method in the Madness

Working at a data-driven organization, I feel the best way to explain the complexities and nuances of March Madness is to look at the data. As mentioned before, the NCAA ranks teams based on their regular-season performance and uses that ranking to determine each team’s regional seed number. So we took historical tournament results from 1985-2016, along with current-year win-loss records and regular-season strength of schedule (SoS) stats, and loaded these data sets into ThoughtSpot.

Below is a chart of the SoS, win-loss record, and seed number for each team that made it to the “Sweet Sixteen” (or 3rd round) this year.

As you can see, teams with #1 seeds (Gonzaga, North Carolina, and Kansas) had varying combinations of a difficult schedule and a high win-loss record. For example, though North Carolina had fewer wins than Gonzaga, they also played a much harder regular-season schedule. So it’s fair to assume they could make it farther in the tournament.

But consider another example: did you notice that Villanova is the only #1 seed that didn’t make it into the Sweet Sixteen round? They were upset by #8 seed Wisconsin in the 2nd round. If you looked at the stats, you’d think the outcome should have been the other way around. Villanova had a better win-loss record against a stronger regular-season schedule, and entered the tournament with a better seed placement. But that’s the madness of it all: sometimes a team has a bad day.

Who are these Cinderellas?

Now that you know a team’s seed isn’t everything, you should also know that it’s still rare for a lower-seeded team to advance very far within the tournament. This is what people mean when they talk about “Cinderella Stories”: low-seeded teams that break through the ranks to defeat top teams.

But talk is cheap, and everyone loves playing up the drama in these games. The media plays a big part in this because all fans, seasoned and new alike, love watching the underdog win. So to get a better understanding of what actually happens, let’s look at historical data on national championships won by seed number.


In the 32 tournaments from 1985 through 2016, 28 champions have been seeded #3 or better. Upsets, like this year’s #1 seed Villanova losing to #8 seed Wisconsin, do happen, but over the years we consistently see #1 seeds making their way to the final rounds. In fact, since 1985, nearly 90% of tournament winners (28 of 32) were seeded #1, #2, or #3! And the other 4 winners were seeded no lower than #8.
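For anyone who wants to check the arithmetic, here is a minimal Python sketch. It uses only the two counts quoted in the paragraph above (28 champions seeded #3 or better, plus 4 others); the variable names are illustrative and not part of any ThoughtSpot feature. Adding the two counts gives 32 champions, which matches the 1985-2016 span of the data set.

```python
# Counts taken from the paragraph above; variable names are illustrative.
top3_champions = 28   # champions seeded #1, #2, or #3
other_champions = 4   # champions seeded #4 through #8

total = top3_champions + other_champions
share_top3 = top3_champions / total

print(f"{top3_champions} of {total} champions ({share_top3:.1%}) were seeded #3 or better")
```

That works out to 87.5%, a little under the roughly 90% figure.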

This year three of four #1 seeds, two of four #2 seeds, and three of four #3 seeds made it to the Sweet Sixteen round, which is exactly half of the teams facing off. If you count the #4 seeds (all of which were still in the tournament), then 12 of the 16 teams in the Sweet Sixteen round had a #4 seed or better. Sadly, our only Cinderella candidate didn’t make it.
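The tally above can be reproduced with a short Python sketch. The per-seed counts come straight from the paragraph; the dictionary and variable names are assumptions made for illustration.

```python
# Sweet Sixteen teams per seed, as counted in the paragraph above.
# The dictionary name and layout are illustrative assumptions.
sweet_sixteen_by_seed = {1: 3, 2: 2, 3: 3, 4: 4}
field_size = 16

# Teams seeded #3 or better, then #4 or better.
top3 = sum(n for seed, n in sweet_sixteen_by_seed.items() if seed <= 3)
top4 = sum(n for seed, n in sweet_sixteen_by_seed.items() if seed <= 4)

print(f"Seeded #3 or better: {top3} of {field_size}")
print(f"Seeded #4 or better: {top4} of {field_size}")
```

The same dictionary could be extended with the remaining seeds to see how the other four spots were filled.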

One of These is Not Like the Others

Now that the Elite Eight round has played out, we see this year is following the pattern of many past years: three of the Final Four teams (Gonzaga, North Carolina, and Oregon) are seeded #3 or better. As you can see in the chart below, teams with better seeds are typically the ones that advance through the Sweet Sixteen round.

However, this being March Madness, there’s still fun to be had. Since 1985, this is the first time South Carolina has advanced beyond the Sweet Sixteen round, and the first time Gonzaga and Oregon have advanced beyond the Elite Eight round.

So who should you be looking at to win it all this year? Here’s another look at the Final Four teams’ stats. Though South Carolina is seeded #7, they had a much harder regular season than Gonzaga, so this could be an interesting match!

My prediction? Well, I had Villanova winning… but now I think Gonzaga and Oregon are going to make it to the final round, with the winner being… Oregon.