Feature Article from William Spaniel

What Have I Learned

William Spaniel

11/14/2012 10:00:00 AM


The last month has been an all-out Nate Silver-fest. For the unaware, Silver is the architect of the FiveThirtyEight blog at the New York Times. In past lives, he was a sabermetrics guru and a successful online poker player. When federal regulations pushed casual poker players out of the game, thus driving out all the easy money, he switched to election forecasts. In 2008, he correctly predicted how 50 of the 51 contests (the 50 states plus Washington D.C.) would cast their electoral votes; this year, he was perfect.

Silver does not do anything magical. Indeed, all he really did was claim that math (math!) and statistics were better indicators of election results than punditry. Polls are noisy because of pure statistical randomness, and the news tends to over-report the extreme ones precisely because of how strange they are. Silver's claim is that we are better off pooling all polls into a larger, more informative sample and relying on fundamentals (the economy, unemployment, and such) that remain relatively static over the course of the election cycle. Thus, while individual polls would fluctuate wildly in response to particular events (the Democratic and Republican National Conventions, the latest “gotcha” moment, and so forth), Silver's forecast remained comparatively consistent throughout. And it was also spot-on correct this time around.
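To make the sample-size point concrete, here is a minimal sketch of why pooling polls cuts down the noise. This is not Silver's model, which does far more than average. The poll numbers are hypothetical, the helper names are my own, and the margin-of-error formula assumes each poll is a simple random sample.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(share * (1 - share) / sample_size)

def pooled_share(polls):
    """Sample-size-weighted average share and total respondents for (share, n) pairs."""
    total = sum(n for _, n in polls)
    weighted = sum(share * n for share, n in polls) / total
    return weighted, total

# Hypothetical polls: (candidate's share, number of respondents).
polls = [(0.52, 800), (0.47, 600), (0.50, 1000), (0.49, 1200), (0.51, 400)]

share, total = pooled_share(polls)
print(f"pooled share: {share:.4f} from {total} respondents")
print(f"one 800-person poll: +/- {margin_of_error(800):.3f}")
print(f"pooled sample:       +/- {margin_of_error(total):.3f}")
```

The individual polls in this made-up example range from 47% to 52%, but the pooled estimate carries less than half the margin of error of the 800-person poll, because the margin shrinks with the square root of the sample size.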

The whole FiveThirtyEight revolution received tons of criticism from talking heads on television. Political pundits are poorly informed and have an incentive to sensationalize anything they can find. When Mitt Romney's poll ratings surged after the Republican National Convention, in their telling it had to mean that Romney was sitting pretty for the November election. These pundits would chastise Silver's model for shifting Romney's chances by only a marginal amount.

Of course, Silver was right. It is a well-established effect that candidates receive a post-convention bounce from all of the publicity surrounding the events, but that bounce disappears within a few weeks. The pundits were wrong but seemingly could not care less. Conceding that Silver's methodology was superior threatened their own relevance, so many pundits opted to lash out instead. Ultimately, the November 6 electoral results provide all the information we need to declare a winner and a loser in this war of words.

What does this have to do with Magic? In 2009, I got the idea to collect large datasets of Magic tournaments to analyze how the metagame fluctuates. The results were the Power Rankings and Metagame Trends series, and Jean-Philippe Theriault and Joshua Justice soon joined me as coauthors. Our results often questioned the status quo, and they certainly made people angry—a lot of people angry. But we were fine with it. After all, we had the numbers to back up our claims.

Power Rankings and Metagame Trends ended about a year ago. I can only blame myself; I am enrolled in a PhD program, and coursework killed off the spare time I had to go around collecting and analyzing the data. However, the recent attention Nate Silver received made me think back to the days when we were still running the numbers for Magic. We never properly wrapped up the series. Today, I want to resolve that by asking myself a simple question:

In two years of data analysis, what did you learn about Magic metagames?

The actual analysis obviously taught us many things about how specific metagames worked. However, whether Jund or Caw-Blade was successful years ago tells us virtually nothing about Standard today. Instead, I want to focus on overall lessons that will likely be forever applicable. Below are the three that I find most useful:

1) Metagames Don't Wildly Fluctuate
Here is a common story you will hear Magic writers tell. Deck X is the status-quo “best deck,” and it wins the first major tournament of the season. At the next tournament, the triumphant strategy is deck Y, which has been custom-crafted to succeed in a field dominated by deck X. By the time the third tournament rolls around, the metagame has shifted yet again. People are no longer interested in deck X and instead have flocked to deck Y. But that leaves deck Z in perfect position to run the table. This process repeats for a few weeks until deck X is the best once again.

It is a nice tale. Too bad it is basically all false. Metagames evolve, but they do so at a glacial pace. Occasionally, a player finds a brand new deck that injects new life into the format, but as long as deck X, deck Y, and deck Z were all known ahead of time, the metagame on the Monday after a tournament will not look discernibly different from the metagame on the Thursday before it. The evolution of the metagame provides a nice narrative for the story of the format, but it is just that: a story.

While Magic authors frequently get this wrong, I think the Magic community as a whole has a better grip on reality. Your average player barely responds to the results from a previous tournament, especially who won that tournament. And why should he? The decks that appear in the top eight are usually only one match win better than a large group of other decks, and sometimes the only difference is in the opponents' match win percentage tiebreaker. Put simply, the top eight, top sixteen, or top thirty-two all provide very little information on what the format really looks like. There is a hint of a signal in there, but it is mostly just noise. Players respond in the rational manner: they do not overreact to a single tournament result and instead infer information over the long term.
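A quick binomial calculation gives a feel for how noisy a single tournament is. The sketch below treats every match as an independent coin flip with a fixed win probability, which ignores Swiss pairings, draws, and tiebreakers. The nine-round event, the seven-win threshold, and the two win rates are assumptions for illustration, not figures from our data.

```python
from math import comb

def prob_at_least(wins, rounds, p):
    """Probability of at least `wins` match wins in `rounds` independent matches."""
    return sum(
        comb(rounds, k) * p**k * (1 - p) ** (rounds - k)
        for k in range(wins, rounds + 1)
    )

rounds = 9      # hypothetical event length
threshold = 7   # hypothetical record needed to contend for the top eight

for p in (0.50, 0.55):
    chance = prob_at_least(threshold, rounds, p)
    print(f"match win rate {p:.2f}: P(at least {threshold} wins) = {chance:.4f}")
```

At a 50% win rate, 46 of the 512 equally likely win-loss sequences reach seven wins. In a large field where most pilots are on perfectly average decks, that alone puts plenty of average decks near the top tables, which is why one top eight says so little about which deck is actually best.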

Again, it is worth noting that the exception here is when the tournament provides new information on what we might think of as “unknown unknowns,” specifically viable archetypes that were a complete mystery beforehand. These shocks will cause structural breaks in the metagame, since players will immediately netdeck those strategies into the mainstream.

2) Money Matters
One might point out that even if tournaments are extremely noisy, they still provide the best information we have and so we should follow whatever little signal appears. The first inference is correct but the ultimate conclusion overlooks two important factors. To start, optimal deck selection has a bit of path-dependency to it. Suppose for whatever reason you choose to play deck A at the start of the season. Over the first few weeks, you learn a lot about how to play deck A. Then the first major tournament results start flowing in, and they tell you that deck B might be better. While you could switch over to deck B, you'd be starting all over again learning how to optimally play it. If the difference in quality of deck B over deck A is small, you might as well stick to deck A and avoid making bad plays.

Second, and perhaps more important, is that money matters. It is inescapable and sad, since it matters more than anyone is willing to accept. Deck B might improve your expected chances of winning by a few percentage points, and you might be willing to take the time to learn how to play it, but are you going to liquidate your cards from deck A and spend an additional $200 to construct deck B? A few percentage points are nice, but there is a real tradeoff between money and likelihood of winning.

Unfortunately, this is where things turn ugly. Because of this tradeoff, all other things being equal, the metagame dictates that more expensive decks will win more frequently than less expensive decks. The reason is simple and was outlined above: something has to compensate players for the extra cost of the deck, and for serious tournament players that thing must be a greater winning percentage. This gives wealthier players a distinct advantage, since it is less painful for them to toss $500 on a deck than it is for a poor player. In other words, people really can “buy” wins. I formalized this idea in the article “To the Rich Deck Go the Spoils”; I think it is the best article I ever wrote, so I suggest taking a look at it.
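The tradeoff can be put in rough numbers. The sketch below is only a back-of-the-envelope illustration, not the formalization from “To the Rich Deck Go the Spoils.” The $200 cost comes from the example above; the three-point win rate gain, the nine-round event, and the dollar value of a match win are hypothetical, as is the function name.

```python
def break_even_events(extra_cost, win_rate_gain, rounds, value_per_match_win):
    """Events needed before the extra expected winnings cover the extra deck cost."""
    extra_wins_per_event = rounds * win_rate_gain
    extra_value_per_event = extra_wins_per_event * value_per_match_win
    return extra_cost / extra_value_per_event

events_needed = break_even_events(
    extra_cost=200.0,          # extra dollars to build deck B, from the example above
    win_rate_gain=0.03,        # hypothetical: "a few percentage points"
    rounds=9,                  # hypothetical rounds per event
    value_per_match_win=10.0,  # hypothetical dollars of expected prize per match win
)
print(f"events to break even: {events_needed:.1f}")
```

Under these assumptions the upgrade takes dozens of events to pay for itself, so only a player who barely notices the $200 would switch on such a small edge. Turn that around and you get the claim above: for cash-constrained players to pay up, the expensive deck has to win noticeably more.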

Money also affects when players decide to enter tournaments. At the beginning of a season, tournament attendance is low. There are very few cards from the new set on the market, and the prices are relatively high due to the low supply. Players will sit out the first few weeks as a result, opting instead to wait until prices go down before making their investment.

Waiting also gives those players better knowledge of which decks are good and which decks are bad. Thus, a large portion of the players “shifting” the metagame are really those arriving late to the party.

We created a large number of images for the Power Rankings and Metagame Trends series. (Our editors once referred to Metagame Trends specifically as “death by images.”) Of all of them, the one below is my clear favorite. It looks at the evolution of Caw-Blade from the end of May to the middle of June in 2011. As you might recall, this was on the eve of the banning of Jace, the Mind Sculptor and Stoneforge Mystic. Caw-Blade was absurdly good at the time, distorting the format and inducing players to quit Standard for the season. Here is how the metagame “evolved” during that period:

Over time, the number of non-Caw-Blade players remained remarkably static. Let me emphasize that these are numbers of players, not proportions. It is as if most people who were playing decks without Stoneforge Mystic and Jace, the Mind Sculptor kept playing them throughout the season. But as the season progressed and the tournament sizes increased, it is as if all of those additional players were playing Caw-Blade! While certainly some of those non-Caw-Blade players switched over to Caw-Blade and a few poor souls eventually entered the Standard scene with a losing deck, it remains remarkable how the entire proportional shift in the metagame can be explained by the injection of numbers to just one side of the equation.
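The difference between counts and proportions is easy to see with a toy example. The attendance figures below are hypothetical and are not the data behind our chart; they only mimic the pattern described above, where the non-Caw-Blade headcount stays flat and every new entrant sleeves up Caw-Blade.

```python
def caw_blade_share(caw_blade_pilots, other_pilots):
    """Caw-Blade's proportion of the field."""
    return caw_blade_pilots / (caw_blade_pilots + other_pilots)

# Hypothetical fields: (label, Caw-Blade pilots, pilots of everything else).
weeks = [("week 1", 40, 120), ("week 2", 70, 120), ("week 3", 100, 120)]

shares = [caw_blade_share(caw, other) for _, caw, other in weeks]
for (label, caw, other), share in zip(weeks, shares):
    print(f"{label}: {caw} Caw-Blade, {other} other, share = {share:.1%}")
```

Nobody in this example changes decks, yet Caw-Blade's share climbs every week. Reading only the percentages, you would tell a story about players abandoning their old decks; reading the counts, you see new arrivals.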

What have we learned about money? Metagames do not get stale because of access to Magic Online daily event decklists, or TCGplayer MaxPoint Series events, or SCG Opens, or large datasets published online. They grow stale because money prevents people from fluidly transitioning from deck to deck. During the height of Caw-Blade's dominance, you would think that you would be better off playing an anti-Caw-Blade deck, but you would be wrong. Even as it approached 40% of the metagame, players were still better off opting for Caw-Blade because such a large number of players were trapped playing their current decks. (This is another blow to the evolving metagame argument.) No proper foil to Caw-Blade could ever evolve, and no positive metagame shifts could occur. Players were locked in to their win percentages, stratified by how much their deck cost to build.

3) No Method Is Perfect
Power Rankings and Metagame Trends were not perfect methodologies and certainly not capable of answering a lot of questions. Is the metagame ripe for a combo deck to come along? Who knows—we can only observe what is actually being played. What about this new decklist I just saw online? Again, same problem. We were very good at understanding the known-knowns and poor at understanding just about everything else.

What frustrated me the most, however, was the absurd standard many critics held us to. Our decklists were not standardized—that was a problem. There was variation in the skill-level of players—that was a problem. There was variation in the experience of the decks' pilots—that was a problem. Looking at statistics gives you no sense of how to strategize in-game—that was a problem.

If there is a single takeaway point from this article, it is this: we should not hold anyone to the standard of perfection. Rather, we should ask ourselves a simple question: is alternative x better than alternative y? If so, we should be using alternative x. If not, then y is perfectly acceptable. This is intuitive from a deck selection standpoint. Unless the format is broken, any archetype you pick up will have problems. The question is whether those problems are better or worse than any other archetype you could play.

Yet people fail to apply this intuitive criterion to other realms of their life. We can all agree that we need to make inferences and predictions about the metagame; going into a tournament completely naïve is certainly worse than having some rough sketch of what will happen. So, given that we must be in the business of prediction, the question is how do we go about doing it? The critics tell us our decklists were not standardized. That was true, but what is the alternative? Should a couple of friends sit down together and play 50 games between deck A and deck B to see which one is better for that matchup? Beyond being unreasonably time-consuming, it also forces the playtesters to make assumptions about what the “standard” version of deck B is rather than

The critics told us that there was variation in the skill level of players. This was true. But, again, what was the alternative? Once more, you are stuck playtesting forever. You also have the countervailing problem of your playtest partner potentially not being competent with the deck he is supposed to be piloting, which taints the results. I have no doubt that bad players disproportionately drag down the observed win percentages of more methodical decks. However, there is no reason we cannot account for that in the interpretation of the statistics.

So, yes, our methods were not perfect. On the other hand, they provided efficient insight for the reader. We are physically incapable of playtesting thousands of matches on our own. Looking at the statistics gave us a portrait of the metagame which players could then use to informatively guide their playtesting. If you only have ten hours a week to practice, you really could not avoid using Power Rankings and Metagame Trends as a shortcut.

In sum, there is nothing wrong with seeking perfection. But absent perfection, we need to learn to accept the best alternative out of the set of imperfect alternatives in front of us.

So what did Power Rankings and Metagame Trends teach me? Statistics and mathematics help clarify theories and invalidate existing hypotheses. Back in the day, we could directly translate these results into optimal tournament suggestions. In retrospect, we learned a lot about the structure of metagames in general, since these were the first articles to look at consistent data over a very long term. If you remember me from those days, I thank you for your support. And if you are new to the fold, I encourage you to look through the archives and see how statistics can be useful for making metagame inferences.

William Spaniel