Re: AIS Awards


A few months back, I wrote a short essay on the subject of changing
the AIS awards system. Alas, it didn't get posted to the list and I
deleted it! I decided I would rewrite and expand it, with a certain amount
of care, and post it in response to the recent question. I would also
like Scott to forward it to Terry Aitken to consider for use in the
Bulletin.

*******************************************************************

A SUGGESTION FOR IMPROVING THE AIS AWARDS SYSTEM

Our current awards system receives criticism for the following
shortcomings:

1. At the highest level, TBs are so popular that non-TBs hardly ever
have a chance to receive the Dykes Medal, even those of exceptional
quality.

2. At the HM level, most votes are cast for the irises of well-known,
popular hybridizers. Excellent irises by newcomers or hybridizers
living where there are few judges are overlooked.

3. The different classes are not equally competitive. It is much
more difficult to win a TB award of merit, for example, than an MDB
award of merit.

4. Awards are given to irises that do not perform well in different
regions of the country: if an iris is popular in a region that has a
high concentration of judges, that is often sufficient to earn it an
award.

Interestingly, these problems have a single, common cause and a single
solution.

Currently, voting is simply a matter of picking the irises you prefer
from the list of eligible cultivars. This would be reasonable if each
judge were familiar with each iris on the ballot. But we know this is
not the case. It can never be the case, not only because growing every
iris on the HM eligibility list would require a vast estate and a vast
fortune, but also because some classes of irises are just not suited
to some regions of the country. We can't expect judges in southern
Arizona to grow many Japanese irises, or judges in upstate New York
to grow pure aril hybrids.

Under the current system, a no-vote on a judge's ballot can
mean either "I judged this iris in the garden, and it is not worthy
of an award" or "I've never seen it". Likewise, a vote can mean
either "I've judged this to be an iris of excellent quality that is
deserving of an award" or "It's not great, but I don't see anything on
the list that I know to be superior".

The false equation of "haven't seen" = "not worthy" is single-handedly
responsible for most of the problems with the current system. TBs win the
Dykes because virtually all judges grow TBs. Not all judges grow Siberians,
so a Siberian effectively receives a "not worthy" judgment from every
judge who's not familiar with it. This is a horrific hurdle for a non-TB
to cross on its way to the Dykes Medal.

Similarly, lesser-known hybridizers are overlooked at the HM level because
of the same "haven't seen" = "not worthy" fallacy. People often blame
this phenomenon on lapses of judging ethics, suggesting that judges give
awards to the irises of popular hybridizers whom they like. One need
not reach for so cynical an explanation, however. The answer is simply
that the irises of popular hybridizers are purchased more, grown more, and
seen more. There's no way to change that. As long as "haven't seen" =
"not worthy", those that are seen are those that will win.

Regional performance is also lost under this system. Judges have no way
to express that an iris is unworthy of awards except by silence, which
makes their protest simply blend into the vast background of unseen and
unevaluated cultivars. If a judge has never seen a particular variety,
a no-vote on that judge's ballot says nothing about the iris's quality,
and carries little meaning. If, however, a judge has seen the iris die
in garden after garden for five years, the no-vote has an entirely
different significance, and ought to have a large impact on the awards
process. The "haven't seen" = "not worthy" approach treats these two
different no-votes in exactly the same way!

Finally, the difference in competitiveness between the different classes
results from a similar problem, this time with the votes themselves,
rather than the no-votes. A vote does not say enough about the _absolute_
quality of the iris; it's really only about its _relative_ quality in the
field of eligible cultivars. Judges are allowed the option of abstaining
if there are no cultivars on the ballot deemed worthy enough. In practice,
though, it is natural to vote for what you feel is the best among the
contenders, even if this means you vote an award to an iris that would
not have "made the cut" if it were competing in a tougher class.

It should now be clear why I say these problems have a single, common
cause. What of the solution?

Instead of assigning awards based on _total votes_, we should assign
awards based on _average ratings_.

What does that mean? Each judge would rate each iris according to the
principles of garden judging. The full-blown point system, with its scale
of 0 to 100, would be an unnecessary burden in this context, but a 10-scale
(or even a 5-scale of bad, poor, average, good, and excellent) would work.
Judges would enter the rating for _each iris they had evaluated_ on their
ballots. An unseen (or unjudged) iris would not receive a rating;
its spot on the ballot would be left blank. The crucial distinction between
"haven't seen" and "not worthy" is made at this point.

Now, if we _average_ the ratings of all judges who have actually
evaluated an iris, we get a measure of the iris's _quality_, not its
popularity or breadth of distribution. This measure of quality can be
used as the criterion for awards. Only a true average will work. Simply
totalling the ratings will not accomplish anything, since it would again
be telling us about how _many_ judges have seen the iris, not _how
good_ they judge it to be.
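
To make the arithmetic concrete, here is a minimal sketch in Python of
the difference between totalling and averaging. It is only an
illustration, not an existing AIS procedure: the judge names, iris
names, and ratings are invented, and the function names are
hypothetical. A blank on the ballot is modelled by simply leaving the
iris out of that judge's entry.

```python
from statistics import mean

# Invented ballots on a 10-scale: judge -> {iris: rating}.
# An iris the judge has not evaluated is absent (left blank).
ballots = {
    "judge_a": {"Widely Grown TB": 7, "Rare Siberian": 9},
    "judge_b": {"Widely Grown TB": 7},
    "judge_c": {"Widely Grown TB": 6},
    "judge_d": {"Widely Grown TB": 7, "Rare Siberian": 10},
}


def collect_ratings(ballots):
    """Group ratings by iris; blanks contribute nothing."""
    ratings = {}
    for ballot in ballots.values():
        for iris, rating in ballot.items():
            ratings.setdefault(iris, []).append(rating)
    return ratings


def total_ratings(ballots):
    """Sum of ratings per iris: grows with the number of judges."""
    return {iris: sum(r) for iris, r in collect_ratings(ballots).items()}


def average_ratings(ballots):
    """Mean rating per iris among the judges who evaluated it."""
    return {iris: mean(r) for iris, r in collect_ratings(ballots).items()}


print(total_ratings(ballots))
print(average_ratings(ballots))
```

With these invented numbers, the widely grown iris has the larger total
(27 against 19) only because more judges have seen it, while the
less-distributed iris has the higher average (9.5 against 6.75).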

The criteria for awards would be defined in terms of average rating,
not number of votes cast. For example, if the rating were done on a
10-scale, an HM might require a rating of 8, an AM an 8.5, and a medal
a 9. It might take a year or two using the system before it would be
clear just where the cutoff should be for each level of award. It would
also be prudent to require a minimum number of judges rating each iris
as a prerequisite to receiving each award. An average rating of 9 would
not be significant if it were based on the opinions of only 5 judges,
say.
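
The rule just described could be sketched in Python roughly as follows.
The rating cutoffs are the example figures above; the minimum numbers
of judges are pure placeholders chosen for illustration, and the names
REQUIREMENTS and qualifies are hypothetical.

```python
from statistics import mean

# award -> (minimum average rating, minimum number of judges rating).
# Cutoffs are the example figures from the text; the judge counts
# are assumed placeholders, not proposed values.
REQUIREMENTS = {
    "HM": (8.0, 10),
    "AM": (8.5, 20),
    "Medal": (9.0, 30),
}


def qualifies(ratings, award):
    """True if enough judges rated the iris and the average meets the cutoff."""
    cutoff, min_judges = REQUIREMENTS[award]
    if len(ratings) < min_judges:
        return False  # too few opinions for the average to be significant
    return mean(ratings) >= cutoff


# An average of 9 from only 5 judges does not qualify for anything.
print(qualifies([9] * 5, "HM"))
# Ten judges averaging 8.5 qualify for an HM but not yet for an AM.
print(qualifies([8, 9] * 5, "HM"), qualifies([8, 9] * 5, "AM"))
```

Keeping the cutoffs and judge counts in one table would make it easy to
adjust them once a year or two of real ratings showed where they belong.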

One could modify the system to require a certain minimum average
rating from each of several geographical regions, but I doubt that
it would be necessary. The low ratings from regions where an iris does
poorly, now separated out from "haven't seen"s, would probably be
sufficient to lower the average and filter out regional performers.
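
For comparison, here is a rough Python sketch of both approaches. It
assumes one possible reading of the regional rule: the iris must be
rated in at least a given number of regions, and every region that
rated it must meet the minimum average. The region names, ratings, and
function names are invented for illustration.

```python
from statistics import mean


def overall_average(ratings_by_region):
    """Plain average over every rating, ignoring where it came from."""
    all_ratings = [r for ratings in ratings_by_region.values() for r in ratings]
    return mean(all_ratings)


def passes_regional_check(ratings_by_region, min_average, min_regions):
    """Assumed rule: rated in enough regions, and each such region
    averages at least min_average."""
    rated = {reg: r for reg, r in ratings_by_region.items() if r}
    if len(rated) < min_regions:
        return False
    return all(mean(r) >= min_average for r in rated.values())


# Invented ratings for an iris that thrives in one region only.
regional_performer = {"Southwest": [9, 9, 10], "Northeast": [4, 5]}

print(overall_average(regional_performer))
print(passes_regional_check(regional_performer, min_average=7, min_regions=2))
```

In this invented case the low Northeast ratings pull the overall
average down to 7.4, below the example HM cutoff of 8, so the plain
average already filters the iris out without the extra regional rule.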

What of cost? This system is more complicated than the simple yes/no
voting currently in practice. Would it be manageable? Would it be worth
the effort?

There would be a small additional burden on judges, who would now need
to make specific judgments on the eligible irises they had seen, rather
than just placing a mark by the names of their favorites. I say this is
a "small" additional burden, because a good judge ought to already be
scrutinizing the eligible cultivars before deciding how to vote. There
would be more marks on the ballot, but they would be just a record of
work the judge should actually be doing now.

The biggest cost would come in tabulating the ballots. RVPs would have
more items to record from each ballot, and the items would be ratings,
not simple yes/no votes. Computing the averages could be done by computer;
it might even be done nationally from all the raw data. I don't think
that is an obstacle, but the actual tabulation might be.

Would it be worth it? Absolutely! I think we simply must work out a
system that measures the quality, not the distribution and popularity,
of our irises. As long as there are many eligible irises that are not
seen by many judges, no system of simply totalling votes will ever
escape the problems enumerated at the beginning of this article; the
problems are simply built into the system. No amount of tweaking the
requirements will fix it, nor will lecturing judges on ethics. It is
a mathematical consequence of the difference between totalling and
averaging.

Although the proposed solution may seem unfamiliar and complicated,
I hope it will be given some serious consideration. At the bare
minimum, we need _three_ ways to mark a ballot: "yes", "no",
and "haven't seen".


===============================================================

Tom Tadfor Little         tlittle@lanl.gov  -or-  telp@Rt66.com
technical writer/editor   Los Alamos National Laboratory
---------------------------------------------------------------
Telperion Productions     http://www.rt66.com/~telp/
===============================================================





