I have a proposal for a new algorithm to select the POTM.
Why? At the moment, the selection, although random, only selects recent images, and the same ones keep coming back pretty frequently. I would like to see older images have a chance as well, and reduce the frequency of repeating the same ones.
The basic idea? Select an image from all images above a certain minimum score, but such that the probability of selection decreases depending on how old the image is. For practical reasons, the algorithm uses the object_id, not the time of submission.
Here it is.
Determine the object_id of the latest posted image.
Let's call this number U. It's an integer, and this number is steadily approaching the 1 000 000 mark.
Repeat
Select a random number R between 1 and U (see below).
Until object R refers to an image and has a score that exceeds some predetermined threshold value.
The key question is how to select that random number R between 1 and U, such that the probability of R decreases somehow for lower values. There are actually many ways to do that, but after a bit of trial and error I got one that I believe works quite well.
Let X be a random number from the interval [0,1> (0 included, 1 excluded) taken from a uniform distribution. Most programming environments have an easy way to get such a number. In MS Excel, which I used for my trial and error, it's the RAND() function.
Now, calculate R with the formula
U + 1 - floor(exp(X^0.7 * ln(U + 1)))
For the record, the floor function rounds down to the nearest integer.
The exponent 0.7 in the formula is more or less arbitrary. I tried a few numbers, and this one worked satisfactorily. To increase the probability of selecting older images, simply use a somewhat lower value.
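For anyone who prefers code to formulas, here is a minimal Python sketch of the formula and the repeat/until loop above. The function names, the is_eligible callback (standing in for "object R is an image with a high enough score"), the max_tries limit and the clamp against floating-point rounding are my own assumptions for illustration, not part of the site's code.

```python
import math
import random


def pick_object_id(u, exponent=0.7, rng=random):
    """Return R in 1..u, with values close to u (recent objects) more likely."""
    x = rng.random()  # uniform in [0, 1)
    r = u + 1 - math.floor(math.exp(x ** exponent * math.log(u + 1)))
    # Assumption: clamp to 1..u in case floating-point rounding pushes
    # the result just outside the intended range.
    return max(1, min(u, r))


def select_potm(u, is_eligible, exponent=0.7, max_tries=100_000, rng=random):
    """Repeat the draw until is_eligible(r) is true; None if nothing is found.

    is_eligible is a hypothetical callback that should return True when
    object r is an image whose score exceeds the threshold.
    """
    for _ in range(max_tries):
        r = pick_object_id(u, exponent, rng)
        if is_eligible(r):
            return r
    return None
```

The max_tries limit is only there so the loop cannot run forever if no object passes the test; the proposal itself simply repeats until it finds one.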
Now how does this work out in practice?
With the highest object id currently around 835 000, and before applying the litmus test to see if it is an image with a high enough score:
- 25% of the numbers are from the 174 most recent objects
- 25% from the next 4244
- 25% from the next 65056
- 25% from the oldest 766091
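These shares can be checked without simulation. R falls among the k most recent objects exactly when floor((U + 1)^(X^0.7)) is at most k, which happens when X < (ln(k + 1) / ln(U + 1))^(1/0.7); since X is uniform, that bound is the probability itself. A small sketch, assuming U = 835565 (the sum of the four group sizes listed above) and a function name of my own choosing:

```python
import math


def share_of_most_recent(k, u, exponent=0.7):
    """Fraction of draws R that land among the k most recent object ids."""
    return (math.log(k + 1) / math.log(u + 1)) ** (1 / exponent)


u = 835565  # assumption: sum of the four group sizes quoted above
for k in (174, 174 + 4244, 174 + 4244 + 65056, u):
    print(k, round(share_of_most_recent(k, u), 4))
```

The printed shares should come out close to one quarter, one half, three quarters and one.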
A different way to look at it is to calculate the probabilities for a given object to be selected (again, before applying the test).
- The 10th most recent object has a probability of 0.5%
- The 100th one has a probability of 0.06%
- The 1 000th one has a probability of 0.008%
- The 10 000th one has a probability of 0.0009%
- The 100 000th one has a probability of 0.0001%
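The per-object probabilities follow from the same formula: the n-th most recent object is picked when floor((U + 1)^(X^0.7)) equals n, so its probability is the difference between the cumulative shares at n + 1 and at n. A rough sketch, again assuming U = 835565 and a hypothetical function name:

```python
import math


def selection_probability(n, u, exponent=0.7):
    """Probability that one draw picks the n-th most recent object (n = 1 is the newest)."""
    log_u1 = math.log(u + 1)
    upper = (math.log(n + 1) / log_u1) ** (1 / exponent)
    lower = (math.log(n) / log_u1) ** (1 / exponent)
    return upper - lower


u = 835565  # assumption: highest object id, as in the breakdown above
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7}: {100 * selection_probability(n, u):.5f}%")
```

Changing the exponent argument shows how quickly the balance shifts between recent and older objects.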
As for what threshold value to use, that's a matter of personal opinion as well as trial and error to see what works well. I think anything between 75% and 80% will work. With a low threshold we get to see more images that do not have a really high score yet, regardless of whether they have been overlooked or are simply not good enough for a higher score. With a high threshold we only get to see images that have already gathered a lot of (high) votes. It's just a matter of deciding what kind of images we want as POTM. I prefer a somewhat lower threshold, so good images that somehow have been overlooked by many can still pass the test. But we could start somewhere in the middle and see how it goes.
I expect that this would be really easy to implement. So, the question is, do we want this?