Pente.org » Forums


Topic: Color next to our player name
Replies: 47 Views: 364,542 Pages: 4 Last Post: Apr 12, 2019, 12:25 AM by: watsu


haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48

Re: Color next to our player name
Posted: Apr 9, 2019, 5:40 PM

watsu wrote:

Obviously (since this ground was covered in detail before and after ratings were changed to sets) we could go around and around on the question of whether or not a split set should benefit the lower rated player and penalize the higher rated player.

We could, but the guy who created the system we all use, Elo, is pretty clear that draws should be rated to avoid inflation. Since the higher-rated player puts more into the pot, they would lose points on draws.

While it's true that Pente has no draws, we've created them artificially in sets and they should be handled, not discarded. Discarding data is just intrinsically wrong if we're trying to approximate playing strength.

The other argument, that P1 and P2 ratings diverge, is also true in chess and is perfectly manageable there. I routinely compiled my effective rating as white, as black, in certain openings, etc., because it's useful data and one rating is not necessarily the best measure in all cases.
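For anyone who wants to compile the same kind of split for Pente, here is a minimal Python sketch. It assumes a hypothetical list of (seat, opponent_rating, won) tuples and uses the common linear "algorithm of 400" approximation for a performance rating (average opponent rating plus 400 times wins minus losses, divided by games played). Neither the data format nor the approximation is something the site provides.

```python
from collections import defaultdict

def performance_by_seat(games):
    """Estimate a separate performance rating for each seat.

    games: iterable of (seat, opponent_rating, won) tuples, where seat is
    "P1" or "P2" and won is True/False (hypothetical data format).
    Uses the linear approximation: avg opponent + 400 * (W - L) / N.
    """
    by_seat = defaultdict(list)
    for seat, opp_rating, won in games:
        by_seat[seat].append((opp_rating, won))

    perf = {}
    for seat, rows in by_seat.items():
        n = len(rows)
        avg_opp = sum(r for r, _ in rows) / n
        wins = sum(1 for _, won in rows if won)
        losses = n - wins
        perf[seat] = avg_opp + 400 * (wins - losses) / n
    return perf


games = [("P1", 1800, True), ("P1", 1900, True),
         ("P2", 1800, False), ("P2", 1900, True)]
print(performance_by_seat(games))  # {'P1': 2250.0, 'P2': 1850.0}
```

The linear formula tops out at 400 points above the average opponent for a perfect score, so it is only a rough guide with small samples.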

I made these arguments as a 1600 player and was told I'd feel differently once my rating was higher. I do not feel any different.

The rating system is broken due to inflation...and even the database struggles to give real results because yesteryear's 1800 player can beat today's 2300 player.

I understand the reluctance of highly rated players to "lose" rating points. I get it. They just aren't real rating points if they do not accurately reflect playing strength.

Apologies to watsu for repeating myself, as we've discussed this privately before. Just wanted my position clearer here.

haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48

Re: Color next to our player name
Posted: Apr 9, 2019, 6:36 PM

re: ratings protection as endemic in gomoku-based game culture

I don't know enough to speak. This is my only site, and I've only been here 2.5 months.

There are ways to tweak the rating system, though, that do not involve throwing away over half the data.

For example, P2 wins could be worth more. The contents of the pot (who puts up what share of the rating points) could depend on whether P1 or P2 wins.

IOW, two different pots with different contributions: one awarded for a P1 win, one for a P2 win. That might be a way to address the P1/P2 disparity better.
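One way to get "two pots" out of ordinary Elo math is to compute P1's expected score as if P1 were rated some number of points higher. A P1 win then moves fewer points and a P2 win moves more. The sketch below is only an illustration of that idea: the seat bonus of 200 is a placeholder, not a measured value, and the function names are hypothetical.

```python
K = 32            # per-game pot, as in the basic Elo description
SEAT_BONUS = 200  # hypothetical P1 advantage in rating points (placeholder)

def expected(r_a, r_b):
    """Logistic Elo expected score for a player rated r_a against r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def seat_pots(r_p1, r_p2, bonus=SEAT_BONUS, k=K):
    """Return (points moved on a P1 win, points moved on a P2 win).

    P1's expected score is computed as if P1 were rated `bonus` points higher,
    so a P1 win moves fewer points and a P2 win moves more.
    """
    e_p1 = expected(r_p1 + bonus, r_p2)
    p1_win_pot = k * (1 - e_p1)  # P1 gains this, P2 loses it
    p2_win_pot = k * e_p1        # P2 gains this, P1 loses it
    return p1_win_pot, p2_win_pot

p1_pot, p2_pot = seat_pots(1800, 1800)
print(round(p1_pot, 1), round(p2_pot, 1))  # 7.7 24.3
```

With `bonus=0` this collapses back to the ordinary single pot, so the seat bonus is the only new parameter that would need tuning.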

If the P2s got a fair shake for their victories in the face of adversity, that should reduce, not increase, the problems of ratings-protection culture. I'd hope...

watsu

Posts: 1,500
Registered: Dec 16, 2001

Re: Color next to our player name
Posted: Apr 9, 2019, 6:40 PM

Apologies for editing my last post after haijinx had replied. One more thought: TB ratings should NOT be compared with live game ratings, which is what most of the older ratings were, since TB is a relatively recent popular option here. Before live games were rated as sets, the ratings glass ceiling was pretty clearly set at 2100, as Sjustice's and Virag's ratings show. After sets were implemented for live games, the ratings ceiling increased by a few hundred points, as Nosovs' 2352 rating established close to a decade ago. Trying to compare a TB rating to those live-game ratings just doesn't make sense, due to DB access, position analysis time, etc. If international masters in chess all suddenly played postal games at 2 days per move, one would expect to see a jump of a few to several hundred points at the top level. It's apples and oranges, despite the fact that many live games by past masters are worth studying the way one would study another's TB games.

Message was edited by: watsu at Apr 9, 2019 7:19 PM

Retired from TB Pente, but still playing live games & exploring variants like D, poof and boat

karlw

Posts: 973
Registered: Mar 7, 2006
From: Eugene, Oregon
Age: 36

Re: Color next to our player name
Posted: Apr 9, 2019, 10:16 PM

You can count me in the camp that believes a drawn set should affect ratings, although of course only slightly. If we can quantify, at the master level, how many rating points the P1 advantage is worth (let's say it's 200), then a 2100 beating a 2350's black is an upset, and the ratings should be adjusted to reflect that.
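The 200 here is explicitly a what-if. To connect it to data, one could invert the logistic expected-score curve that the site's rating formula uses: if P1 scores a fraction p of games between equally rated masters, the implied P1 edge is 400 * log10(p / (1 - p)) rating points. A minimal sketch, assuming enough master-level games exist to estimate p (the function names are hypothetical):

```python
import math

def win_rate_from_offset(d):
    """Expected score for the side with a d-point edge on the logistic Elo curve."""
    return 1 / (1 + 10 ** (-d / 400))

def offset_from_win_rate(p):
    """Rating-point edge implied by scoring p (0 < p < 1) against equals."""
    return 400 * math.log10(p / (1 - p))

print(round(win_rate_from_offset(200), 3))   # 0.76
print(round(offset_from_win_rate(0.76)))     # 200
```

So a 200-point P1 advantage would mean P1 scores about 76% between equally rated players.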

100% willing to admit that this is just my personal opinion, however.

watsu

Posts: 1,500
Registered: Dec 16, 2001

Re: Color next to our player name
Posted: Apr 9, 2019, 11:02 PM

I don't consider myself 100% opposed to the idea that a draw should affect ratings (slightly). However, I do feel that there will be a price to pay for that in ratings protection. Is it worth paying that price? Possibly. There is a greater ratings spread on the site now than there was back in the bad old days of live single-game ratings, and there likely always will be. I'm certainly willing to see how it would play out: perhaps dual ratings for a transitional period until it seems like the split-set differential is accurate?
I'm talking at the moment only about Turn Based Pente ratings. Considering the relatively small number of players and rating spreads in variants and live games, I'd currently lean towards opposing this in the live and variant settings, since I know that I, at least, would play fewer rated sets of variants if splits favored the lower-rated player.


haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48

Re: Color next to our player name
Posted: Apr 10, 2019, 5:12 AM

I'm just going to put this out here in the hopes it helps discussion. My understanding of a basic Elo system runs like this:

There's a pot of rating points up for grabs in every game. The total number of points is constant, let's say 32. The points (ante) each player puts in depend on the rating difference between the players.

Rating difference ==> ante (lower-rated/higher-rated)
even ==> 16/16
25 ==> 15/17
50 ==> 14/18
100 ==> 12/20
200 ==> 8/24
300 ==> 4/28
375+ ==> 1/31

The winner takes all the points; the loser forfeits their ante.

When there is a tie, they split the pot between them, 16 points each. If the players are equally rated and they tie, nothing happens.

If a player ties someone rated 100 points higher, they would gain four points and that person would lose 4 points.

(100 difference yields a 12/20 division, then they split the pot with 16 each, so the net is +/- 4).

As this is one game, not two like our sets, this is half the rating change we'd see without tweaking. IOW,

game 1, higher rated wins, they go up by 12, the other down by 12.
game 2, lower rated wins, they go up by 20, the other down by 20

yields a difference of 8 because it's two games, not one.
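The table above appears to follow a simple linear rule: the lower-rated player antes 16 points minus one point per 25 rating points of difference, never less than 1, out of a fixed 32-point pot. The sketch below assumes that rule and reproduces the tie and split-set arithmetic. It models the simplified system described in this post, not the formula the site actually uses, and the function names are hypothetical.

```python
POT = 32  # total points at stake per game in this simplified model

def antes(diff):
    """(lower-rated ante, higher-rated ante) for a rating difference.

    Assumes the linear rule the table appears to follow: the lower-rated
    player puts in 16 minus one point per 25 rating points, never below 1.
    """
    lower = max(1, round(POT / 2 - diff / 25))
    return lower, POT - lower

def lower_rated_change(diff, results):
    """Net change for the lower-rated player over a list of games.

    results: that player's score in each game (1 = win, 0 = loss, 0.5 = tie).
    Each game pays score * POT minus the player's ante; the higher-rated
    player's change is the exact negative (zero-sum).
    """
    lower, _ = antes(diff)
    return sum(score * POT - lower for score in results)

print(antes(100))                          # (12, 20)
print(lower_rated_change(100, [0.5]))      # 4.0  (one tied game)
print(lower_rated_change(100, [0, 1]))     # 8    (split set)
print(lower_rated_change(300, [0, 1]))     # 24   (split set, 300 apart)
```

Under this model, a split set between players 300 points apart moves 24 points from the higher-rated to the lower-rated player.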

I do not know the differences between a simplified Elo system and whatever we're using here now. I do know a simplified Elo system will work better than what we have now though.

watsu

Posts: 1,500
Registered: Dec 16, 2001

Re: Color next to our player name
Posted: Apr 10, 2019, 5:34 AM

Basically, what you're describing sounds pretty similar to what we had before ratings by sets were implemented here. I know the Elo here is modified; rw can probably call up the formula more easily than I can dig it out of an old forum post. In any case, I'd be opposed to having ratings here change to the degree to which a tied set would change them if the games were rated individually, i.e. 8 points per set with a difference of 100. Would even 4 per set with a 100-point rating difference still be too much? IMO most likely so. It might not matter much for 100-point differences, but consider the 300-point difference scenario: would I play anyone rated 300 points below me if I stood to take a net loss of 12 rating points per set (to say nothing of 24 points per set) if we split? Not likely.

ETA: okay, I dug this out of the site's FAQs, but cutting and pasting it doesn't format it correctly, so here's the link:
https://pente.org/help/helpWindow.jsp?file=faqGeneral#ratings

Ratings are calculated with two different formulas, one for provisional players, and one for established players. Here is the formula used for established players.
$$\text{new rating} = r_1 + K\left(w - \frac{1}{1 + 10^{(r_2 - r_1)/400}}\right)$$
Where r1 is your rating, and r2 is your opponent's rating.
w is 1 for a win, and 0 for a loss.
K is the largest amount your rating can change for any game, this value is set to 32 when 2 established players are playing. When playing against a provisional player, K is scaled by n / 20, where n is equal to the number of games the provisional player has played.
The '^' symbol means to the power of.
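Transcribed into Python as a minimal sketch, the established-player formula reads as follows. The function names are made up for illustration, and reading "K is scaled by n / 20" as multiplying K by n/20 is an assumption based on the FAQ wording; the FAQ does not say whether the scaled K is capped.

```python
def expected_score(r1, r2):
    """The fraction in the formula: expected score for r1 against r2."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def established_update(r1, r2, w, k=32):
    """New rating after one game; w is 1 for a win and 0 for a loss."""
    return r1 + k * (w - expected_score(r1, r2))

def k_against_provisional(n, k=32):
    """K scaled by n / 20, n = games the provisional opponent has played.
    Assumes "scaled by n / 20" means k * n / 20; capping is not described."""
    return k * n / 20

print(established_update(1600, 1600, 1))            # 1616.0
print(round(established_update(1700, 1600, 0), 1))  # 1679.5
print(k_against_provisional(10))                    # 16.0
```

Because both players use the same K, whatever one established player gains the other loses.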
For provisional players, ratings can jump about dramatically. For every game a provisional player plays, a value is first calculated. This value is equal to

value = ( r1 + r2 ) / 2 + w * 200 + e * 200
Where r1 and r2 are the same as defined above.
w is 1 for a win and -1 for a loss.
e is 0 if your opponent is provisional, otherwise it equals w.
Then that value is incorporated into your new provisional rating by the following formula.

rating = (value + (rating * total1)) / total2

Where total1 is the total games played excluding this game and total2 is the total games played including this game.
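A direct transcription of the provisional formula as a minimal Python sketch, again with hypothetical function and argument names. The FAQ does not say what r1 is for a player's very first game, so the caller has to supply some starting value.

```python
def provisional_update(rating, games_before, r_opp, won, opp_provisional):
    """New provisional rating using the FAQ's averaging formula.

    rating: your current provisional rating (r1)
    games_before: total1, games played before this one
    r_opp: opponent's rating (r2)
    """
    w = 1 if won else -1
    e = 0 if opp_provisional else w
    value = (rating + r_opp) / 2 + w * 200 + e * 200
    total1 = games_before
    total2 = games_before + 1
    return (value + rating * total1) / total2

print(provisional_update(1600, 4, 1600, True, False))  # 1680.0
print(provisional_update(1600, 4, 1600, True, True))   # 1640.0
```

Each game's value is averaged into the running rating, so the jumps shrink as total2 grows.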

Message was edited by: watsu at Apr 10, 2019 5:43 AM

Message was edited by: watsu at Apr 10, 2019 5:48 AM


haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48

Re: Color next to our player name
Posted: Apr 10, 2019, 5:52 AM

watsu: I think we just disagree on this. I think 8 points is fine in that example case, but I'm spitballing because I don't know the size of the pot (I think I read it was higher than 32; it certainly seems to be, and I don't know the reasoning there) or the percentage table for the rating difference they were using. In the chess world, 400 points corresponds to roughly a 91% expected score (10-to-1 odds), but a different number might be right here.

However, we don't need to go through a risky trial-and-error phase.

We have everyone's rating on 1/1/19. We have all of the win/loss results. We (well, someone, I'm sure) can rerun all of the results through 4/1/19 with a modified rating formula. Then we can compare how the ratings look under each formula. If the list and changes seem better with a modified formula, then it can be instituted.

In short, we can run with historical data without risking a change destabilizing the system before we know it will work. We could even try several different changes and debate about which is best and why.
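A replay harness for this can be quite small. The sketch below starts from a ratings snapshot and runs the same chronological log of set results through two candidate rules. It assumes, hypothetically, that the current set rule is an ordinary Elo update with K = 64 per set that skips split sets (watsu only says 64 is likely), and the toy log is illustration only, not real site data.

```python
from typing import Callable, Dict, List, Tuple

# (player_a, player_b, player_a's share of the set: 1, 0.5 for a split, or 0)
SetResult = Tuple[str, str, float]
Rule = Callable[[float, float, float], float]

def elo_delta(r_a, r_b, score, k=64):
    """Player A's change for one set, treating the set as a single Elo game."""
    return k * (score - 1 / (1 + 10 ** ((r_b - r_a) / 400)))

def ignore_splits(r_a, r_b, score):
    """Split sets leave ratings untouched (assumed stand-in for the current rule)."""
    return 0.0 if score == 0.5 else elo_delta(r_a, r_b, score)

def rate_splits(r_a, r_b, score):
    """Alternative: a split set counts as a draw."""
    return elo_delta(r_a, r_b, score)

def replay(start: Dict[str, float], log: List[SetResult], rule: Rule) -> Dict[str, float]:
    """Re-run a chronological result log from a ratings snapshot."""
    ratings = dict(start)
    for a, b, score in log:
        delta = rule(ratings[a], ratings[b], score)
        ratings[a] += delta
        ratings[b] -= delta  # zero-sum: B loses what A gains
    return ratings

# Toy data only: a 2000 and an 1800 split three sets.
start = {"A": 2000.0, "B": 1800.0}
log = [("A", "B", 0.5)] * 3
print(replay(start, log, ignore_splits))  # unchanged: {'A': 2000.0, 'B': 1800.0}
print(replay(start, log, rate_splits))
```

Swapping in other rule functions (per-game rating, seat bonuses, different K) only requires writing another function with the same three-argument signature.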

There are tests and tools for Elo systems in Elo's book, too. It's been years, but there might be something there to help narrow down sources of inflation. I'll try to skim it more deeply tomorrow.

madmike

Posts: 152
Registered: May 27, 2014
From: Abilene, TX
Age: 69

Re: Color next to our player name
Posted: Apr 10, 2019, 6:00 AM

With my rating of 2150-2200, if I win one game of a set against a player rated 300, 400, 500 or more points above me, I feel like I've accomplished something and I should be rewarded.

Not saying my reward should be much.

Do people really so jealously guard their rating as to not risk losing games to lower rated people? If your rating is enough higher you should win anyway.

Is there any reason not to have a rating as P1 AND a rating as P2, along with the established set-based rating?
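Here is one way seat ratings could sit alongside the set rating, as a minimal sketch and not anything the site implements. The class and function names are hypothetical; each game updates the P1 player's P1 rating against the P2 player's P2 rating with an ordinary K = 32 Elo step, and the existing set-based rating is left to whatever formula the site already uses.

```python
from dataclasses import dataclass

def expected(r_a, r_b):
    """Logistic Elo expected score for a player rated r_a against r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

@dataclass
class PlayerRatings:
    overall: float  # existing set-based rating; left to the site's own formula
    p1: float       # hypothetical rating as first player
    p2: float       # hypothetical rating as second player

def record_game(p1_player: PlayerRatings, p2_player: PlayerRatings,
                p1_won: bool, k: float = 32) -> None:
    """Update seat ratings for one game: P1 rating is scored against P2 rating."""
    score = 1.0 if p1_won else 0.0
    delta = k * (score - expected(p1_player.p1, p2_player.p2))
    p1_player.p1 += delta
    p2_player.p2 -= delta

a = PlayerRatings(2000, 2000, 2000)
b = PlayerRatings(2000, 2000, 2000)
record_game(a, b, p1_won=True)   # game 1: a is P1 and wins
record_game(b, a, p1_won=True)   # game 2: b is P1 and wins (a split)
print(a)  # PlayerRatings(overall=2000, p1=2016.0, p2=1984.0)
```

In a split set, the seat ratings record exactly the P1/P2 information that the set rating throws away, without changing the set rating itself.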

And I still think the red rated group has gotten awfully spread out.

watsu

Posts: 1,500
Registered: Dec 16, 2001

Re: Color next to our player name
Posted: Apr 10, 2019, 6:17 AM

@ madmike "Do people really so jealously guard their rating as to not risk losing games to lower rated people?" Yes, some more than others, but yes.

"If your rating is enough higher you should win anyway."
Not if you use the database well. I know not every subscriber uses the database, but the point is they can. Even without the database, it's still not that difficult to find winning P1 lines. Just look at the games played by the 10 or 15 top-rated players...

@ haijinx
The maximum change is likely 64 per set; there was some forum debate about whether it should remain 32 for sets or go up to 64, since a set is 2 games.

Sure, 8 might be fine for splitting with a 100-point difference... but is 24 fine for splitting a set with someone 300 rating points away, when so many winning P1 lines are available for subscribers to browse in the DB during TB games?
Would I have risked splitting our rated set and losing more than 24 points to gain 8 if I beat your P1 and your P2? Doubtful...

ETA:
I'm not saying anyone here would ever do this, but the possibility exists to try to mirror the other player's moves in order to have a better chance of splitting a set with them, assuming one gets rewarded for splits. Or play two sets against two higher rated players and mirror the moves between them. Or again, one can just use the DB.

Message was edited by: watsu at Apr 10, 2019 6:31 AM


haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48

Re: Color next to our player name
Posted: Apr 10, 2019, 6:44 AM

re 64 pt pots

Well, I think the games should be rated independently even if there are sets. This will make the system more responsive and accurate. And if done that way, 32 is better.

The higher-rated players still get a safety net: if they lose as P2, they can always force the win as P1, thereby drawing the match.

Given a 100 pt difference, the results would look like this:

Player A 100 pts above Player B

Ante, per game:
Player A ==> 20
Player B ==> 12

If Player A wins both games, he earns 24 points and B loses 24. If Player B wins both games, he earns 40 points and A loses 40. A tie merely shifts 8 points from A to B.

That just doesn't seem like a lot, yet it fixes a myriad of issues, including a sense of fairness that is missing now.

watsu, I have a strong suspicion that if all games were rated, the ratings curve would look a lot more like we think it should look. That would necessarily mean a contraction of inflated ratings. Mine included.

Also, you mentioned how this is different than the live rating, that it's apples and oranges. The thing is, it should not be different. Essentially the same player pool playing the same game. There's a reason why chess ratings and correspondence chess ratings follow the same curve and rating classes. They use the same system, despite books and references, etc.

I think the reason the turn based ratings look so different is that draws were discarded pretty much from the get-go, causing both inflation and, essentially, income disparity.

yet another nickel...

watsu

Posts: 1,500
Registered: Dec 16, 2001

Re: Color next to our player name
Posted: Apr 10, 2019, 7:11 AM

"Essentially the same player pool playing the same game."
Sorry, but it's just not the same game. Live Pente vs. TB is just not analogous to chess and postal chess. How many chess games last 14 moves? Yes, in theory Pente is probably both broken and simple enough that if I spent a couple of years of my life studying it I could get to the point where I knew all the twists and turns in the DB by memory and could play them live. Frankly, it's not worth the effort to me. I'd rather use those brain cells to play a better game of Boat Pente. On the other hand, it took me relatively little effort to use the database to select and study my opponents before I played them in order to arrive at my current TB Pente rating. It was still a bit boring, I'll admit, but for the most part I was just testing out a line I had found in the wedge against all the active reds I could get to play me at the time. Sure, there was a bit of competition involved as well, since Dmitri King and Kevin Sackett were both very active and moving up rapidly at the same time.
Some players have enough renju background or TB Pente under their belts (or both) that they really have little need for the database (either in live or TB games), but those are the players who everyone else studies. For the rest of us, the difference between live games and TB games is significant, and nosovs' 2352 live rating should not be compared with someone else's 2352 TB rating. In TB here I'd put him up around Pente_gon, perhaps higher, but he didn't play TB here.

One other chess/Pente example comes to mind in the apples/oranges category:
progambler AKA Pente champ was a 1700 chess player who played postal chess at a 2200 level. Twenty minutes often just wasn't long enough for him to "work out the lines in his head," as he put it during a live game I played against him here, after he had paused for about 10 minutes at move 5.


haijinx

Posts: 64
Registered: Jan 20, 2019
From: Salem Oregon
Age: 48

Re: Color next to our player name
Posted: Apr 10, 2019, 7:28 AM

Well, back to apples and oranges.

Let me jump back further.

All working Elo systems look like each other. There are similar learning curves and similar concepts of the "game" at each of the similar rating stratifications.

In USCF, an "expert" means something similar to what FIDE means by "expert" to what correspondence chess clubs mean by "expert" etc, etc. The rating curves even look alike. Inflation even looks the same in them.

The Live Pente ratings seem Elo-typical, but I haven't really investigated.

The TB Pente ratings do not seem Elo-typical. There is no clear or coherent concept of "expert" even if we did use the label. One aspect of this atypicality is the rapid inflation within the TB ratings themselves. The discarding of ratable games, which sort of violates a core principle of these systems, is assuredly a key factor in this.

I can't think of any other Elo system where the top rating hit above 2700 in so few rated games. It's really wild. And our rating ceiling is higher...

Now, none of this is to say that a player should have some universal game rating. There are people who are sharks at speed chess and suck at real time controls. There are people who are good at G/30 and weaker in real time. An expert though is an expert, a master is a master...these normally relative classifications are spread out here because of the inflation.

Message was edited by: haijinx at Apr 10, 2019 7:29 AM

watsu

Posts: 1,500
Registered: Dec 16, 2001

Re: Color next to our player name
Posted: Apr 10, 2019, 8:03 AM

"The discarding of ratable games, which sort of violates a core principle of these systems, is assuredly a key factor in this."

Ratable games have been discarded in live games here for over a decade now, yet as you put it "Live Pente ratings seem Elo-typical".

OTOH, no ratable games have been discarded on a certain other site where players play live gomoku in single games, draws are possible, and they have players with ratings in the 2900-3000 range. Admittedly, most of the highest rated players have played a whole lot of games there, but one player new to the site is rated 2504 after playing 182 games. Other active players are at 2375 after 346 games and 2334 after 426 games. One inactive player is rated 2843 after 190 games.

I'm not saying, and have not been saying, that TB ratings aren't currently hyperinflated here (they are) or that the lack of a penalty for draws isn't primarily responsible for that hyperinflation (it is).
But, anyone who wants to can be a TB expert or master at Pente, for any given value of the words "expert" and "master". The database breaks TB that much.


karlw

Posts: 973
Registered: Mar 7, 2006
From: Eugene, Oregon
Age: 36

Re: Color next to our player name
Posted: Apr 10, 2019, 8:19 AM

I'm trying to keep my record going of stating opinions as if they are fact, so here's another hot take, one that I've learned through years of experience in both pente and chess:

If you feel the need to "protect" your current rating, it's probably because you know that it's not your true rating. Ratings are established by playing, not by hoarding.
.
.
.
.
Also, though, watsu is right about TB Pente not being analogous to correspondence chess, and about the outsize effect that rigorous database study can have on the games. Please, no one look up my recent performance versus a certain K7....
