I have the following players; each value corresponds to the percentage of right answers in a given game.

$players = array
(
    'A' => array(0, 0, 0, 0),
    'B' => array(50, 50, 0, 0),
    'C' => array(50, 50, 50, 50),
    'D' => array(75, 90, 100, 25),
    'E' => array(50, 50, 50, 50),
    'F' => array(100, 100, 0, 0),
    'G' => array(100, 100, 100, 100),
);

I want to be able to pick out the best players, but I also want to take into account how reliable each player is (less entropy = more reliable). So far I've come up with the following formula:

average - standard_deviation / 2

However, I'm not sure this is an optimal formula and I would like to hear your thoughts on it. I've been thinking some more about this problem and have come up with a slightly different formula; here is the revised version:

average - standard_deviation / # of bets

This result would then be used to weight each player's next bet, so, for instance, a new bet from player C would only count as half a bet.
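For illustration, here is a minimal PHP sketch of the revised formula applied to the $players array above; the helper names mean_of(), stddev_of() and score_player() are made up for this example, and it uses the population standard deviation:

function mean_of(array $values) {
    return array_sum($values) / count($values);
}

function stddev_of(array $values) {
    $mean = mean_of($values);
    $sumSquares = 0;
    foreach ($values as $v) {
        $sumSquares += ($v - $mean) * ($v - $mean);
    }
    return sqrt($sumSquares / count($values)); // population standard deviation
}

// average - standard_deviation / number of bets
function score_player(array $bets) {
    return mean_of($bets) - stddev_of($bets) / count($bets);
}

$scores = array();
foreach ($players as $name => $bets) {
    $scores[$name] = score_player($bets);
}
arsort($scores); // highest score first
print_r($scores);

// Using the score as a weight for the next bet: player C scores 50,
// so a new bet from C would count as 0.5 of a bet.
$weight = $scores['C'] / 100;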

I can't go into specifics here, but this is a project related to the Wisdom of Crowds theory and the Delphi method, and my goal is to predict the next results as accurately as possible by weighting past bets from several players.

I appreciate all input, thanks.

Comments

trying to pick the best fantasy football team? :)

Written by Kip

@Kip: Not quite, but close. =)

Written by Alix Axel

Re your (bolded) additional idea. Congratulations, you've almost reinvented the standard error of the mean! If you used average - 2*stdev/sqrt(numBets), you'd have the lower bound on the 95% confidence interval surrounding the mean. That value is a not entirely unreasonable way to select the best predictors.

Written by Harlan

@Harlan: Thanks! I wish you had replied with an answer instead of a comment. Using the formula you provided I get some different results for B, D and F: B goes from 18.75 to 0, D from 62.29 to 43.67, and F from 37.5 to 0. I think this new formula might be too "drastic" for what I'm trying to get, but maybe you'd care to explain it a little better; I might get a new idea after reading what you have to say, who knows...

Written by Alix Axel

The mean or median is a measure of central tendency. The lower bound of the 95% confidence interval is just that, a lower bound. If what you care most about is "I want to select predictors that are unlikely to have a low central tendency", then using that lower bound is a good idea. If you care more about the actual best central tendency, then use a robust statistic like the median. That help?

Written by Harlan
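To make Harlan's suggestion concrete, here is a rough PHP sketch of the lower bound he describes, reusing the hypothetical mean_of() and stddev_of() helpers from the earlier sketch:

// Lower bound of an approximate 95% confidence interval around the mean:
// average - 2 * standard_deviation / sqrt(number of bets).
function confidence_lower_bound(array $bets) {
    return mean_of($bets) - 2 * stddev_of($bets) / sqrt(count($bets));
}

// For B = (50, 50, 0, 0): mean 25, stddev 25, so 25 - 2 * 25 / 2 = 0,
// which is why B drops to 0 under this formula.
foreach ($players as $name => $bets) {
    printf("%s: %.2f\n", $name, confidence_lower_bound($bets));
}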

Just to be sure I understand what you want, (30, 30, 30, 30) would be better than (32, 31, 33, 30), but of course (58, 72, 63, 89) would be better than both?

Written by Tim Post

Yeah, but (32, 31, 33, 30) should be better than (30, 30, 30, 30).

Written by Alix Axel

Accepted Answer

First off, I would not use Standard Deviation if your data arrays have only a few entries. Use more robust statistical measures like the Median Absolute Deviation (MAD); likewise, you might want to try using the Median instead of the Average.

This is due to the fact that, if your "knowledge" of players' bets is limited to only a few samples, your data is going to be dominated by outliers, i.e. the player being lucky/unlucky. Statistical means may be entirely inappropriate under those circumstances and you may want to use some form of heuristic approach.
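As a rough illustration of those robust measures, here is a PHP sketch of the Median and MAD; the helper names median_of() and mad_of() are made up for this example:

function median_of(array $values) {
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);
    return ($n % 2) ? $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Median Absolute Deviation: the median of the absolute deviations from the median.
function mad_of(array $values) {
    $median = median_of($values);
    $deviations = array();
    foreach ($values as $v) {
        $deviations[] = abs($v - $median);
    }
    return median_of($deviations);
}

// Robust counterpart of the original formula, e.g. median - MAD / number of bets.
foreach ($players as $name => $bets) {
    printf("%s: %.2f\n", $name, median_of($bets) - mad_of($bets) / count($bets));
}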

I also assume from your links that you do not in fact intend to pick the best player, but rather, based on the players' next set of answers "A", want to predict the correct set of answers "C" by weighting "A" according to the players' previous track record.

Of course, if there were a good solution to this problem, you could make a killing on the stock market ;-) (The fact that no one does should be an indication as to the existence of such a solution.)

But getting back to ranking the players. Your main problem is that you (have to?) take the percentage of right answers as evenly distributed from 0–100%. If the test contains multiple questions this is certainly not the case. I would look at what a completely random player "R" scores on the test and build up a relative confidence number based on how much better/worse than "R" a given real player is.

Say, for each round of the game generate a million random players and look at the distribution of scores. Use the distribution as a weight for the players' real scores. Then combine the weighted scores using MAD and calculate the Median - MAD / some number, like you already suggested.
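A very rough sketch of that random-baseline idea follows. The actual game is not described in the question, so this assumes, purely for illustration, 10 true/false questions per round, a score equal to the percentage of right answers, and 100000 simulated random players (the answer suggests a million); it reuses the hypothetical median_of() and mad_of() helpers from the sketch above.

$questions   = 10;
$simulations = 100000;

// Build the score distribution of a completely random player "R".
$randomScores = array();
for ($i = 0; $i < $simulations; $i++) {
    $right = 0;
    for ($q = 0; $q < $questions; $q++) {
        $right += mt_rand(0, 1); // random guess on each question
    }
    $randomScores[] = 100 * $right / $questions;
}

// Weight: fraction of random players a given real score beats.
function baseline_weight($score, array $randomScores) {
    $beaten = 0;
    foreach ($randomScores as $r) {
        if ($score > $r) {
            $beaten++;
        }
    }
    return $beaten / count($randomScores);
}

// Combine the weighted scores with the robust Median - MAD / some number formula.
foreach ($players as $name => $bets) {
    $weighted = array();
    foreach ($bets as $score) {
        $weighted[] = $score * baseline_weight($score, $randomScores);
    }
    printf("%s: %.2f\n", $name, median_of($weighted) - mad_of($weighted) / count($weighted));
}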

Written by Timo