Statistic?


  • Statistic?

    I think using a "bell curve" to describe outcomes on this puzzle is a faulty notion. I doubt that the results distribute on a classic bell curve. I would say that the average is much more likely to fall in the 2nd quartile. Maybe this is well-trodden territory and already discussed.

  • #2
    With no data to back my opinion, I suspect the bell curve might actually be a more reasonable notion -- if the scores were converted to log-normal format. As it is, there should probably be a long tail toward the high score side. I also suspect that the long tail would be "fatter" than a real log normal distribution's. But this was discussed a long time ago in conversations lost in an upgrade. The bell curve shown simply puts the average score in the center and the high score two or three sigma out and arranges everything else to fit that. No, it's not real. But it's amusing and pretty harmless. This is, after all, a game. If this comes off as patronizing, I'm sorry. I don't intend that.

    • #3
      Well, my statistical education was long, long ago, but I cannot fathom why you would convert it to a "log-normal format", since this is straight numerical data. I think the error is that they take that "average" number and depict it at the center of a bell curve. Again, I have no reason to expect that the scores would distribute along a bell curve. But, as you say, it's just a game. What I am saying is that it would be interesting to see how the scores actually distribute.

      • #4
        For a log normal distribution, you'd pick a reference score and then express all other scores as a ratio to that score. So, if the median score were 250, a score of 500 would be expressed as 2.0 and a score of 125 would be 0.5. Once that was done, you'd take the natural log of all the values. A score of 250 would then have a value of ln(1), which is zero. Scores less than 250 would have values less than zero, and scores larger than 250 would have positive values. Why would you do this? Because that's how things seem to be. When I play a board on which Scheffek has the high score of 700, I usually get about 70% of that score. I don't get 200 points less. That makes a big difference with players like RussDNails, against whom I also tend to get about 70%. RussDNails will score 850 or 900 on some boards, and I will not get 200 points less. I will get about 70% of the score. To go to an extreme, take Megaword. Megaword might average 850 per board this month, by playing darned near every board that comes up. But I won't average 350 points less than Megaword on the really rich boards, where Megaword will score 1400 points. I will get about 40% of Megaword's score, same as always. I hope this sounds reasonably clear.
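
For anyone who wants to try the transformation described above, here is a minimal Python sketch. The reference score of 250 and the scores 125 and 500 are the ones from the post; the function name `log_ratio` is made up for illustration.

```python
import math

def log_ratio(score, reference):
    """Natural log of a score expressed as a ratio to a reference score."""
    return math.log(score / reference)

reference = 250  # the median score used in the post's example

for score in (125, 250, 500):
    ratio = score / reference
    print(score, ratio, round(log_ratio(score, reference), 3))
```

Halving and doubling the reference score land at equal distances on either side of zero, which is the symmetry a bell curve needs and the raw scores lack.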

        • #5
          Another thing you need to consider is how many games have been played on that board. If I were to play a board and then have the same board come up again immediately or almost immediately, I will almost always score higher, often a lot higher. If I encounter that same board and my high score is still the high score but 20 people have played the board since I last did, I am unlikely to beat my previous score and will usually score lower. Your score varies randomly, probably within a log-normal distribution for that board. If you did really well FOR YOU on that board, you probably won't do that well again. If your score weren't so outstanding, someone else would have bettered it already and perhaps you wouldn't know if you were beating your previous score. So, if you want to know if your game is really as good as someone else's on a mix of random boards, it would be really hard to tell by just comparing their score and your score on boards for which they have the posted high score. You'd get the best measure if you played boards where they were the only previous player. SO, don't get discouraged if you can't seem to surpass someone's high score. There may well be a lot of boards where they didn't do all that well and you did outscore them -- but so did some other people and you'll never know the names of all the people you outscored. Some time ago, someone (forget who) posted that they never could seem to beat my high score. It was someone whose high score I could never seem to beat. We were both correct, and that's why.
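
The "still the high score after 20 other plays" effect can be imitated with a small simulation. This is only a sketch under loud assumptions: every player's score on the board is an independent log-normal draw with the same spread (`sigma` is an arbitrary choice), and there is no learning from having seen the board before. The function name is hypothetical.

```python
import random

def replay_beats_own_high(n_others, trials=20000, sigma=0.3, seed=1):
    """Fraction of replays that beat your own score, counting only the
    trials in which your first score is still the board's high score.

    Assumption: all scores are independent log-normal draws with the
    same parameters. Real players differ in skill and learn the board.
    """
    rng = random.Random(seed)
    kept = wins = 0
    for _ in range(trials):
        mine = rng.lognormvariate(0.0, sigma)
        others = (rng.lognormvariate(0.0, sigma) for _ in range(n_others))
        if all(mine > other for other in others):
            kept += 1  # my first score survived as the high score
            if rng.lognormvariate(0.0, sigma) > mine:
                wins += 1  # the replay beat it
    return wins / kept

print(replay_beats_own_high(0))   # nobody else has played the board
print(replay_beats_own_high(20))  # 20 others played and none beat me
```

Under these assumptions, a score that has survived 20 other plays was an unusually good draw, so a replay rarely matches it. The simulation says nothing about the separate effect of scoring higher on an immediate replay, which comes from remembering the board.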

          • #6
            "Some time ago, someone (forget who) posted that they never could seem to beat my high score. It was someone whose high score I could never seem to beat. We were both correct, and that's why."

            This is why I read the forum posts... I love it (bear in mind I spend my days with kindergarteners).

            • #7
              I still don't see the advantage of expressing simple numeric data as a log. I mean, these are scores: simple descriptive statistical information that is a summation of scoring. Why log it? Seems like someone is trying to make the simple complex.
              I also have never (to my knowledge) gotten to play the same puzzle again. Seems like if I had, I'd have a deja vu feeling. I actually would welcome the chance to play the same puzzles over at least at my discretion to improve my ability to see patterns.
              Isn't anyone else genuinely curious to see how the data actually distributes? Am I alone on that?

              • #8
                I would _predict_ a right skewed distribution. https://www.statisticshowto.com/prob...-distribution/

                • #9
                  Stock market prices are simple and numeric, and two U of Chicago economics professors (Black and Scholes) wrote a mathematical model, later honored with a Nobel, showing how options prices can be derived from stock prices that are assumed to follow a log normal distribution (that's the Black-Scholes equation, which I used daily for some years to make serious money in the stock market). And yes, if scores on this game are log-normal distributed, you'd expect to see right-skewed data. More intuitively, if the average score on a board is 250, the top score would probably be about 800. There's more room on the right side of the distribution, so you'd have to expect right-skewed.
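
A quick way to see the right skew is to draw made-up scores from a log-normal distribution and compare the mean with the median. The median of 250 comes from the post; the spread `sigma = 0.4` and the sample size are assumptions chosen only for illustration, not anything measured from the game.

```python
import math
import random
import statistics

rng = random.Random(0)
median_target = 250  # figure quoted in the post
sigma = 0.4          # assumed spread of the log-scores, purely illustrative

scores = [rng.lognormvariate(math.log(median_target), sigma)
          for _ in range(10000)]

mean_score = statistics.fmean(scores)
median_score = statistics.median(scores)

print("mean  ", round(mean_score))
print("median", round(median_score))
print("lowest", round(min(scores)), "highest", round(max(scores)))
```

In a right-skewed sample like this one the mean sits above the median, and the highest score lies much farther from the median than the lowest score does.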

                  • #10
                    I can buy that for the stock market because they are trying to extrapolate and predict. The function here is simple descriptive statistics on, as you have said, a game for leisure.

                    • #11
                      The data are the data, and it matters not if someone is trying to extrapolate or predict. To me, log normal distributions are simple descriptive statistics. I'm not trying to seem superintelligent, but I used statistics heavily for a lot of years. Something you become very familiar with eventually does become simple. It's like understanding that if you think a stock is going down, you'd want to buy a put option rather than sell a call option and you wouldn't want to touch the stock. If you do that stuff every day, it IS simple. If you have never heard of options before, you're practically counting on your fingers to figure this stuff out. I know I was, when I started. But trust me on this, even if this is just a leisure-time game. Just think of it this way: stock prices can't go lower than zero, but there is no practical limit to how high they can go. They have to be skewed right. Scores on this game can't go lower than zero, but the actual limit to how high they can go is many multiples of the average score on a game. So their score distributions for each board are skewed right. And in all the time I've been playing this game, I've observed that the ratio of my score to the score of any other player seems to be roughly constant and only changes if one of us gets better. That tells me log normal is the way to bet.

                      • #12
                        Log normal also makes sense because scoring is not really one word to one score; we find words in clusters.

                        • #13
                          This is a fun discussion to follow. I'm a mathematician & not a statistician, so take the following with a grain of salt.

                          I agree with bwt that log normal makes sense. (More about this later.) He's taken the score and rewritten it as a product of the average score for the board and (roughly) a multiplier (call it g) indicating how good a player is compared to the average player. This is still skewed to the right, but taking the log of g maps it onto the real line, with an expected value of zero. Now, it makes sense that log(g) could follow a normal distribution. If so, the scores for a board will (by definition) follow a log normal distribution. The problem is that "makes sense" doesn't always correspond to reality. There are an infinite number of ways to map g onto the real line, but without actually seeing the true score distribution, it's impossible to say which produces a normal curve.

                          Another complication is that they purport to show the stats for each particular board. Individual boards could have strange score distributions. If, as 6winner9 suggests, the board does have a particularly rich cluster of words, the distribution could have two peaks, corresponding to the groups of players who find the cluster and those who don't. Anyway, I'm guessing that they just assume that scores are normally distributed, take all historical data for the board and back out the mean and standard deviation.
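
The two-peak scenario is easy to mock up. In this sketch the two groups of players (those who find the cluster and those who don't) and all of the numbers are invented for illustration. It then backs out a mean and standard deviation the way the paragraph above guesses the site does.

```python
import random
import statistics
from collections import Counter

rng = random.Random(42)

# Hypothetical board: half the players find the rich cluster, half miss it.
found = [rng.gauss(400, 30) for _ in range(500)]
missed = [rng.gauss(200, 30) for _ in range(500)]
scores = found + missed

# What a "just assume a bell curve" summary would report.
mean = statistics.fmean(scores)
sd = statistics.stdev(scores)
print("mean", round(mean), "sd", round(sd))

# Crude text histogram: each bin is labeled by its lower edge, 50 points wide.
hist = Counter(int(s // 50) * 50 for s in scores)
for edge in sorted(hist):
    print(f"{edge:4d} {'#' * (hist[edge] // 10)}")

mean_bin = int(mean // 50) * 50
```

With these invented numbers, the fitted bell curve would peak near the overall mean, which is exactly where the fewest players actually score.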

                          Still another complication is that a reasonable value for g can only be found by taking some sort of average over all boards for a particular player, and this may not really apply to any particular board (for reasons like differences in vocabulary between players, ability to see clusters, etc.). Still, g seems to be a more useful way to compare yourself to other players. As bwt notes, he (she?) scores fairly consistently relative to Megaword (although that's pulling the numbers apart in a different direction). That's a reasonable argument that g is "real", but doesn't say anything about its distribution.

                          • #14
                            The really fun thing about this discussion (is it really log normal?) is that the same discussion was had about stock prices and the option prices derived from them. Black and Scholes gave a firm mathematical ground for pricing options (work that earned a Nobel), using a log-normal distribution for stock prices (i.e. each stock's price distribution was log-normal). Since then, other scholars have argued successfully that other distributions fit even better, and that even log normal is not quite right because the tails are too fat (I'm going to skip the mathematical terms here). Some traders claim to have made money by betting against the log-normal distribution and using variants of a binomial distribution. Without having the actual data, there's really no way to know for sure. To me, a log normal feels right. But, heck, it could even be Poisson. The fundamental rule of mathematical modeling is that you pick your starting point based on evidence and on the real world. You don't take the data and then try every kind of model there is and take the one that fits best. You have to be able to defend it logically. Doing it the exact wrong way is to declare that COVID cases were going to zero in three weeks, because a cubic model fit the data. There are two kinds of cubics. One starts low and ends high, the other starts high and ends low, and both kinds have a wiggle in the middle. If you're using a cubic, you can't possibly be reflecting the real world, because you're either saying that the number of cases will become negative or will go to infinity. Using a cubic for that kind of thing is just proof that you can't be trusted with numbers because you're incompetent. If you're going to do a model, you don't want to be or to look incompetent. You don't just pick something because it fits. You pick it because it makes sense, and of all the things that make sense you pick the one that fits the best and can be logically defended.
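
The cubic problem can be shown with four made-up data points. In this sketch the case counts are hypothetical (a count that shrinks by 30% per day), and the cubic is the exact interpolating polynomial through those four points rather than a least-squares fit. The helper name `cubic_through` is invented.

```python
def cubic_through(xs, ys, x):
    """Evaluate at x the unique cubic passing through four (xs, ys) points,
    using Lagrange interpolation."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

days = [0, 1, 2, 3]
cases = [1000 * 0.7 ** d for d in days]  # hypothetical: 30% fewer each day

for day in (3, 6, 7, 10):
    actual_trend = 1000 * 0.7 ** day
    print(day, round(cubic_through(days, cases, day)), round(actual_trend))
```

The cubic matches the four data points exactly, yet a few days past the data it predicts a negative number of cases, while the process that generated the data never reaches zero.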

                            • #15
                              The thing about models of reality is that very few (if any) are true, but good ones are useful (if you recognize & stay within their limitations). Even a cubic can be accurate locally. (The fact that the real data may never go negative or off to infinity is obviously one of these limitations.) You're right about how you settle on a model. It's an iterative process of looking at the data, coming up with a model that seems to fit & makes sense, then testing it with new data, readjusting the model, etc. (Lather, rinse & repeat.) Here, we're lacking the data, so we're kind of stuck at the "makes sense" step. Like I said, it's fun though.
