
Topic: Ratings computation bug
Replies: 7   Views: 67,584   Pages: 1   Last Post: Apr 27, 2019, 2:38 PM by: haijinx

rainwolf

Posts: 766
Registered: Apr 12, 2008
From: Singapore
Age: 44
Home page
Ratings computation bug
Posted: Apr 21, 2019, 8:35 AM

After a little investigation with help from pente_gon, it seems there's quite a discrepancy between the description in the FAQ and the actual code.

- The code uses a k-value of 64, whereas the FAQ describes a k of 32, and,
- The cutoff for provisional players is 20 wins/losses (draws don't count), and the formula rescales k for provisional players by the fraction of games played relative to that cutoff of 20. The problem is that the code computes this fraction from wins + losses + draws rather than wins + losses, yet still divides by 20. So when you are still provisional with over 20 games played (draws included), the factor becomes larger than 1 and the rescaling blows up k instead of fractionally reducing it.
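
To make the second point concrete, here is a minimal sketch of the rescaling factor as described above. It is not the site's actual code: the function names and the example player are hypothetical, and it assumes the factor is simply games played divided by the cutoff of 20.

```python
PROVISIONAL_CUTOFF = 20  # wins + losses needed to leave provisional status (per the post)


def is_provisional(wins, losses):
    """A player stays provisional until wins + losses reach the cutoff; draws don't count."""
    return wins + losses < PROVISIONAL_CUTOFF


def rescale_factor(wins, losses, draws, include_draws):
    """Fraction of the cutoff already played, used to scale k for provisional players."""
    games = wins + losses + (draws if include_draws else 0)
    return games / PROVISIONAL_CUTOFF


# Hypothetical player: 12 decided sets and 14 drawn sets, so still provisional.
wins, losses, draws = 8, 4, 14
buggy = rescale_factor(wins, losses, draws, include_draws=True)   # 26 / 20 = 1.3
fixed = rescale_factor(wins, losses, draws, include_draws=False)  # 12 / 20 = 0.6
```

With k = 64, the first factor would give an effective k of about 83 instead of about 38, which is the blow-up described above.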

I'm proposing the following changes in the code to align it with the FAQ:
- I'll set k to 32, and,
- either include draws in the provisional cutoff, or exclude draws when calculating the rescaling factor.

I'm not quite sure which is better for the latter; I'm leaning towards including draws in the cutoff. Any Elo experts care to weigh in?


watsu

Posts: 1,468
Registered: Dec 16, 2001
Home page
Re: Ratings computation bug
Posted: Apr 21, 2019, 12:25 PM

I'm not an Elo expert, but can't resist adding my two cents. I'd advocate against aligning the code to the FAQ, for the following reason: the FAQ was written based on code created at a time when Pente ratings were calculated from the result of an individual game, hence draws were not considered. When rating calculations based on the result of a set were implemented, there was discussion on whether K should be left at 32 or raised to 64, since winning a set means winning two back-to-back individual games. The thought behind raising K to 64 was that lower-rated players should be substantially rewarded in rating points for putting together back-to-back wins over a higher-rated player, including winning against the P1 advantage (which was why ratings by sets were implemented).
As haijinx will no doubt point out, rating sets and not changing ratings when sets are drawn has led to most games played here being unrated, since there is no rating change in either direction when a set is drawn.
Personally, I would do the following:
1. Leave K at 64, because a set win is two back-to-back wins by a player.
2. Temporarily adjust the provisional code to account for draws - I would say exclude them for now, since that is what is happening in established ratings.
3. Give notice to players on the site that the rating formulae are going to be adjusted in the near future in order to rate drawn sets.
4. Change the rating system in the code for both provisional and established players in order to adjust ratings for the drawn set result.
5. Update the FAQ to reflect the changes to the code (this could also be done right after 2.).

ETA - one other thing: I'd recommend adding a rating floor (which haijinx says they have in chess) so that a player's rating can't fall more than 200 points below the highest hundred-point level they have reached, e.g. a player whose highest rating was 27xx would drop no lower than 2500, while their opponents would continue to gain points for wins and draws against them.
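
One way the floor could be computed, as a rough sketch only: the function names are hypothetical, and it assumes the "100 level" means the peak rating rounded down to the nearest hundred.

```python
def rating_floor(peak_rating):
    """Floor sits 200 points below the hundred-point level of the player's peak rating."""
    return (peak_rating // 100) * 100 - 200


def apply_floor(new_rating, peak_rating):
    """Clamp a freshly computed rating so it never drops below the player's floor."""
    return max(new_rating, rating_floor(peak_rating))


# A player who peaked at 2745 (a "27xx" rating) has a floor of 2500.
floor_for_27xx = rating_floor(2745)
clamped = apply_floor(2450, 2745)    # would have fallen to 2450, held at the floor
unchanged = apply_floor(2610, 2745)  # still above the floor, so left alone
```

Only the floored player's own rating is clamped here; the opponent's gain would still be computed from the normal formula.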

Retired from TB Pente, but still playing live games & exploring variants like D, poof and boat
haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48
Home page
Re: Ratings computation bug
Posted: Apr 21, 2019, 1:17 PM

Good morning!

Well, also no expert, just an enthusiast...I think that:

a) if n = 2 (number of games being rated), then k should be 64. The FAQ is currently in error, not the formula being used. Changing the FAQ is best.

b) draws should be included in all ratings calculations, including provisional. By not rating games, the system is not nearly as accurate as it could be. The ratings inflation is multifactorial, but failing to rate split sets is a big part of it.

c) Warning people that this will change in the near future seems wise. Just a week or something.

d) If there's a way to rerun the year-to-date ratings (or even just the last four weeks) with that modification, we can compare those results to the current ratings. A quick review of the active players will show us whether rating split sets is worthwhile. I assume it is, but we can check if someone wants to put in that time.

e) updating FAQ seems important once the decisions are made

rainwolf

Posts: 766
Registered: Apr 12, 2008
From: Singapore
Age: 44
Home page
Re: Ratings computation bug
Posted: Apr 24, 2019, 3:46 PM

I'll make the change to not include draws for the k-factor rescaling shortly.

So, rating adjustments for draws then?
In this case, the provisional cutoff should include drawn sets as well, correct?

- What formula needs to be used to adjust ratings for a drawn set? Can you provide a source so I can verify it?
- In the case of Go there are no sets played, only single games with no draws possible. For these it seems I need to use k=32?

watsu

Posts: 1,468
Registered: Dec 16, 2001
Home page
Re: Ratings computation bug
Posted: Apr 24, 2019, 5:18 PM

"In this case, the provisional cutoff should include drawn sets as well, correct?" Yes

"In the case of Go there are no sets played, only single games with no draws possible. For these it seems I need to use a k=32?" Yes

I'll defer to haijinx on best source(s) for adjusting the rating formula to handle draws, but I can probably dig up something for you if needed.

Retired from TB Pente, but still playing live games & exploring variants like D, poof and boat
haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48
Home page
Re: Ratings computation bug
Posted: Apr 24, 2019, 5:42 PM
IMG_4819.jpg (48.5 K)
IMG_4820.jpg (17.2 K)
IMG_4821.jpg (12.7 K)

In the system used here, you'd set w to .5 for a drawn set.

I'll attach some pics for source...image 4820 shows that in the continuous formula, w is set to "1/2" for draws.
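
For reference, a minimal sketch of what w = 0.5 does in an Elo-style update. The attached images aren't reproduced here, so this assumes the standard Elo expected-score formula rather than the site's exact code; the function names are hypothetical, and k defaults to the 64 discussed above for sets (32 would apply to single games such as Go).

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for the player rated `rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def updated_rating(rating, opponent_rating, w, k=64):
    """w is 1 for a won set, 0 for a lost set, and 0.5 for a drawn set."""
    return rating + k * (w - expected_score(rating, opponent_rating))


# Drawn set between equally rated players: nobody moves.
even_draw = updated_rating(1600, 1600, w=0.5)

# Drawn set between a 1500 and a 1700: the lower-rated player gains
# exactly what the higher-rated player loses.
underdog_gain = updated_rating(1500, 1700, w=0.5) - 1500
favorite_loss = 1700 - updated_rating(1700, 1500, w=0.5)
```

In this sketch the 1500 player gains roughly 16.6 points from the drawn set, so a split against a stronger opponent is rewarded instead of going unrated.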

rainwolf

Posts: 766
Registered: Apr 12, 2008
From: Singapore
Age: 44
Home page
Re: Ratings computation bug
Posted: Apr 27, 2019, 9:43 AM

> In the system used here, you'd set w to .5 for a
> drawn set.

But for provisional players, it seems w should be 0, correct?

haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48
Home page
Re: Ratings computation bug
Posted: Apr 27, 2019, 2:38 PM

Sorry, I didn't look at the provisional formula...yes, with what you have here, draws for provisional players would have w set to 0.
