Thursday, December 03, 2009

Elims v. Prelims. And the winner is?

A lot of discussion has continued over the elim/prelim math, on TVFT, in my Sailors meeting, and here in both postings and comments. I wonder if we can establish some sort of best practice.

The goal of any tournament with elimination rounds is to advance an appropriate portion of the field from the prelims. I personally have always tended to think in percentages when called upon to make that sort of decision. I roughly settle in at about advancing 25-30% of a varsity field, and maybe 30-35% of a novice field. (There seems to be some satisfaction in letting more novices break, on an intuitive assumption that it will tend to retain them in the activity if they get off to a rewarding start.) My thinking has changed due to recent discussions. Percentage is okay, but it’s probably not the best way to evaluate the break point. Rounds won/lost is cleaner and more precise, and probably more meaningful to the participants.

All tournaments with elims that I’m talking about have a minimum of 5 rounds. Some have 6, and some have 7. There may or may not be limits to the numbers of rounds that can transpire. But the first determination we need to make is, how many rounds are warranted by what numbers. The warrant point, as has been suggested, is a record of down 2; that is, all down-2s ought to break. Why? Again, it’s intuitive, but I think we believe that certainly you ought to be able to lose one round for real and one dubious round and still break. Makes sense.

In our present systems, it’s usually speaker points that make the difference in which debaters will or won’t break who are down 2. The assumption is that high speaks denote a better debater than lower speaks (and we juggle the points, dropping the high and low, to eliminate the extremes in either direction). But speaker points are a notoriously complicated and personal measure that no amount of normalizing has ever normalized. They are not the best tool for establishing the break point.

So, we stick with the down 2. We want all the down-2s to break. So first, we have to find a point, given the number of debaters in the field, where a reasonable number of down-2s come naturally (i.e., 5, 6 or 7 prelim rounds).

69 naturally breaks all 3-2s with 5 prelims
99 naturally breaks all 4-2s with 6 prelims
149 naturally breaks all 5-2s with 7 prelims

From those starting points,

69 to octos, 10 5-2s break out of 11
99 to doubles, 20 4-2s break out of 23
149 to doubles, 22 5-2s break out of 25
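
As a rough cross-check on this kind of arithmetic, here is a minimal Python sketch. It assumes every round is a coin flip, so records follow a binomial distribution; that is an approximation, not how a pairing tool arrives at its numbers, and the function name `down2_break` is made up for illustration.

```python
from math import comb


def down2_break(entries, prelims, break_size):
    """Expected (down-2s that break, total down-2s) under a coin-flip model."""
    scale = entries / 2 ** prelims
    # Debaters with zero or one loss are seeded ahead of every down-2.
    ahead = scale * (comb(prelims, 0) + comb(prelims, 1))
    down2 = scale * comb(prelims, 2)
    # Slots left over for down-2s once everyone ahead of them is in.
    slots = max(0.0, break_size - ahead)
    return min(slots, down2), down2


for entries, prelims, break_size in [(99, 6, 32), (149, 7, 32)]:
    breaking, total = down2_break(entries, prelims, break_size)
    print(f"{entries} entries, {prelims} prelims, break to {break_size}: "
          f"about {breaking:.1f} of {total:.1f} down-2s break")
```

For the 99 and 149 fields, the coin-flip estimate lands within a debater or two of the figures listed above, which suggests the simple model is a usable first pass before consulting a real pairing program.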

You probably don’t want to break the 69 to doubles, and 6 prelims for 69 seems pretty over-the-top, so one solution is a partial double, breaking all the 5-2s, with run-offs at the bubble. 99 works almost perfectly at doubles (which is quite generous, 32 out of 99); do you really want a run-off of 3 rounds? Perhaps. 149 to doubles after 7 is the same as 99 after 6, a small run off.

So our solutions here are 69/5 with a partial double, 99/6 doubles (possible run off), 149/7 doubles (possible run off). 69 requires 5 + 5, 99 requires 6 + 5 (+1?), 149 requires 7 + 5 (+1).

Can we use triples as an alternative?

Trips makes no sense with the 69, but let’s look at the 99 and the 149.

99, 5 prelims to triples, advances all 3-2s and half the 2-3s.
149, 6 prelims to triples, advances all the 4-2s and 12 of 46 3-3s

The 99 is dicey with a full triple, but a partial triple excluding the 2-3s means 99/5 triples = 5 + 6. 149/6 means 6+6 (and here, you could exclude the 3-3s). As you can see, the use of fewer prelims and more elims seems, at least with these numbers (all from iDebate, btw), to actually lead to fewer rounds but better breakage. It eliminates the need for run-offs.

More break rounds, fewer prelim rounds. Anyone still reading this? Thoughts?

5 comments:

Jim Menick said...

By the way, triples = 64 debaters = 32 rounds = 96 judges single-flighted or 48 double-flighted. So, you need a pretty decent judge pool to consider it, whereas with an extra round, you concentrate good, albeit single, judges on the bubbles.
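
The judge arithmetic above can be sketched in a few lines of Python. The three-judge panel is an assumption inferred from 96 judges covering 32 rounds, and `judges_needed` is a hypothetical helper name, not anything from a tab program.

```python
from math import ceil


def judges_needed(debaters, panel_size=3, flights=1):
    """Judges required for one elim round.

    panel_size=3 is an assumption (96 judges / 32 rounds).
    flights=2 means each judge hears two debates in the same time slot.
    """
    rounds = debaters // 2  # two debaters per debate
    return ceil(rounds * panel_size / flights)


print(judges_needed(64))             # 96 (single-flighted triples)
print(judges_needed(64, flights=2))  # 48 (double-flighted triples)
```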

richmindseed said...

I'm kind of surprised that this is still an open question - there's an easy formula that solves this.

We begin with a simple note - setting logistic concerns aside, we ought to have precisely the amount of prelims that allows for the best sorting of the field. Having one preliminary debate is obviously insufficient, because the break at that point is arbitrary, since people haven't been sorted out. Having 10 prelims, on the other hand, is almost certainly overkill - the last few rounds aren't really accomplishing much sorting, so we might as well save the trouble and get into elim debates.

The question now arises - how are records typically distributed? It turns out that record distributions are usually well approximated by a binomial distribution. This makes sense - assuming two presets and powermatching from then on, we'd have precisely a binomial expectation if every round is a coinflip. Since powermatching serves the purpose of making rounds more "evenly matched," it is unsurprising that the binomial model is a good approximation.

From this we can immediately conclude that a binomial model is an effective starting point for tournament analysis. Our problem is thus the following - given a population of size X that is distributed binomially, how many iterations are necessary to distribute this effectively? Pascal's triangle answers this immediately - the sum across row n (call it SUM(n)) is the number of discrete entities that are distributed by the nth iteration. Thus, if X = SUM(n), n rounds are required for a proper distribution.
Cool fact about Pascal's triangle - for all n, SUM(n) = 2^n (where the apex is row 0).
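
A quick way to see the row-sum fact is to build the rows directly. This is only an illustrative sketch; `pascal_row` is a name chosen here, and the entries are the standard binomial coefficients.

```python
from math import comb


def pascal_row(n):
    """Row n of Pascal's triangle (apex is row 0)."""
    return [comb(n, k) for k in range(n + 1)]


# Entry k of row n counts the win/loss sequences that end with k wins after n rounds.
print(pascal_row(6))  # [1, 6, 15, 20, 15, 6, 1]

# The row sum is 2^n for every n checked here.
for n in range(10):
    assert sum(pascal_row(n)) == 2 ** n
```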

Thus, we have the formula desired - given X entries to the tournament, you should have n rounds, where n is the natural number that minimizes |2^n - X|. With a fairly small number of calculations, therefore, you immediately know the relevant n, and your job is easy.
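
The rule just stated is easy to mechanize. The sketch below is one reading of it, with assumptions flagged: the search is capped at a hypothetical `max_rounds`, and an exact tie between two values of n goes to the smaller one, which the formula itself does not specify.

```python
def recommended_prelims(entries, max_rounds=12):
    """Natural number n minimizing |2^n - entries|.

    max_rounds is an arbitrary cap for the search (an assumption).
    On an exact tie, min() keeps the first candidate, i.e. the smaller n.
    """
    return min(range(1, max_rounds + 1), key=lambda n: abs(2 ** n - entries))


for entries in (69, 99, 149):
    print(entries, "entries ->", recommended_prelims(entries), "prelims")
```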

Looking at the specific examples in this post - 69 entries => 6 rounds, 99 => 7 rounds, 149 => 7 rounds.

I haven't told you how many people to break from these pools, of course, and I haven't told you whether or not to have runoff rounds. I think the arguments made in this post regarding why you should clear some down-2s are pretty good; I'm more interested in figuring out how to simplify the decision regarding the number to clear...

Final note - in a world where you don't do "a few random rounds, then powermatching all the way," the argument above does not apply. I've yet to hear a good justification for whatever your variation is, though...and I leave as an exercise for the reader the fact that if all rounds are coinflips, the number of random rounds is in fact irrelevant, so 1-vs-2-vs-3 presets is an argument worth ignoring. The condition in the preceding sentence is of course unlikely to hold, but that doesn't make the argument any less interesting...

Ryan Miller said...

richmindseed: I'm too lazy to do the math at the moment, but let me help you try to formalize an intuition about numbers to break, and maybe you'll do it for me. First, as Jim says, there's the psychological criterion--in debate, we give tin to those who break, and human psychology seems to want to see something like 20-33% winners, or 10-50% at the outside. Second though, we could do this in terms of statistics. If you have a strong intuition that the top 20% of debaters should be in outrounds, then you have two factors to worry about. A: if you don't break all down-2s, what about people who hit >1 person also in the top 20%? Obviously power match helps with this, but even at many well-run tournaments the overalls show substantial skew in "opp wins"--so not as much as you'd hope. B: since there's only one judge in prelims, even though we put A judges in "bubble" rounds, what about the round the debater lost that put him/her on the bubble? Say we have at least a 20% chance the judge just made the wrong decision? C: a lot of debaters think they should be able to "mess up" in a round and still break. I'm not counting this as a factor though, because while with time and judge constraints like the ADA this is feasible, in HS it just seems like the answer is "next week, kiddo." Given a number of prelims by the binomial, you can then figure out what % you need to break to include all of the top 20% at p=0.95 or something.

Jim: I have to wonder if the answer for tournaments like Princeton isn't to add a JV division. For those at the top it will be a sufficiently competitive experience that, like Harvard, they will still be learning a lot. For those at the bottom, in late varsity rounds, down 5 or whatever, they'd tend to have opponents and judges who are terrible and don't care anyway, so there isn't much missed there. The major idea is that if you think of JV as people who would be down rounds in varsity, you can shamelessly give them B judges, and thus get your preset, bubble, and first outround judge utilization ratios up for all divisions.

More fundamentally though, I don't think a handful of down-2s not breaking is such a big deal--luck (and speaker points) are part of the game. I worry more about tournaments that just don't have nearly enough total rounds, and thus are clearing half or less of their down-2s.

richmindseed said...

Ryan - first note is that this is a mathematical argument, not a statistical one. No reason to worry about p-values with this model.

More relevantly, the reason I didn't provide the kind of analysis you're talking about is that it's highly dependent on the value of n that you use. For instance, after 6 rounds, 34% of the field is expected to be down-2 or better. After 7, that percentage shifts to 23%. If you say that you always want to break the top 30%, therefore, you end up clearing some 4-3s and not all 4-2s (This, incidentally, is why I originally argued that too many prelims fails to effectively sort the field). Changing the number of prelims doesn't answer this...I've ranted enough about why the algorithmic process described earlier is good, see that above.
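
Those two percentages can be reproduced with a short calculation, assuming the coin-flip binomial model under discussion. The function name is hypothetical.

```python
from math import comb


def share_down2_or_better(prelims):
    """Expected fraction of the field with at most two losses (coin-flip model)."""
    return sum(comb(prelims, losses) for losses in range(3)) / 2 ** prelims


for prelims in (6, 7):
    print(f"{prelims} prelims: {share_down2_or_better(prelims):.0%}")
# 6 prelims: 34%
# 7 prelims: 23%
```

The underlying fractions are 22/64 and 29/128, so a fixed percentage target and a fixed record target pull apart as soon as the round count changes.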

As a result, I think you can either be unwilling to compromise on a percentage threshold or on a record threshold, not both. This is perhaps an obvious consequence of the fact that the more rounds there are, the smaller the percentage of people we would expect to be down-2 or better, which is precisely why we want tournaments with a larger entry to have more prelims...

Put more succinctly - yeah, there are definitely smarter and dumber ways to figure out how many people should clear, and you're defending one of the smarter ones. I think that a world where the bracket is properly sorted is a world that fits in better with the smarter solution, so I see the question of how many prelims as a more interesting and logically prior one to the question of how many outrounds/people to clear. Finally, I note that the psychological criteria you provide (% clearing, record threshold) are likely to conflict, so people should be really clear on which one they care more about in order to ensure that these decisions make sense. In college parli, for instance, it is not unheard of for 3-3s to clear...

pjwexler said...

For me, I agree with Ryan that there is a difference between a substantial number of down 2s failing to clear and only a few. Speaker points are not exact of course, but there is usually a clear distinction between the top of that pool and the bottom.

If someone is the 33rd seed instead of 32nd that is too bad, and if it is because of bad luck in judging school some round, that is even more too bad, but at the end of the day that is why we have multiple rounds. (I have always liked judging variance myself, though I recognize that there are many difficulties with using THAT as a tiebreaker)

Besides, especially among the types of students who go to schools with debate programs, the idea is ingrained that failure in life is automatically associated with lack of effort or talent or both, and that hard work and talent will always be recognized. I would suggest (not that it is a particularly controversial suggestion) that sometimes the most talented and hard working people are not recognized as the most talented and hard-working people, often enough that there should not be the expectation that one automatically follows the other.

Might breed a little empathy.

In general, I would prefer more prelims and fewer elims, simply because that gives more debaters the bang for their work. Within reason, of course; 10 prelims would be absurd in most cases. I do agree that a substantial majority of the down 2s should be breaking, at least at tournaments with large enough fields to justify the practice.