Statistic?

  • #16
    Why the great effort to deny that the data is skewed, or maybe even greatly skewed? (This is really humorous if you think about it.) I still don't understand the advantage of applying the log to it.



    • #17
      Yeah, polynomials are good at interpolation (as you said, even a cubic can be good locally). There are even formulae to tell you what the expected error might be for any degree of polynomial, and given any N points, there is always a polynomial of degree N-1 or less that will fit every point exactly.

      The problem is with extrapolation. Outside the data, you have to have something that will behave reasonably or you will have something that just won't reflect reality. Not our reality, anyway. That was the problem with the cubic. It was used to extrapolate. Bad idea. Really bad. It was the equivalent of modeling a bacterial growth curve with a quadratic. Sure, it might even fit pretty well. But you know that the real curve will flatten out as resources are consumed. It can't go to infinity and beyond, no matter what Buzz Lightyear says.

      You have to be reasonable and your model has to be logical. Now, if all you're trying to do is interpolate, you can go pretty wild and probably won't do anyone any harm. If you have a lot of points and you love cubics, you can even use splines and never go to any high-order polynomials. Just don't extrapolate unless you have reason to believe that the cubic really is a logical explanation for the data.
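
      Just to make that concrete, here's a toy sketch in Python (the numbers are invented, nothing to do with the game's data): a cubic that interpolates a saturating curve just fine inside the data, then runs off once you extrapolate.

      import numpy as np

      # Made-up "saturating" process: rises and then levels off near 10.
      rng = np.random.default_rng(0)
      x = np.linspace(0, 5, 8)

      def truth(t):                       # the "real" curve flattens out
          return 10 / (1 + np.exp(-(t - 3)))

      y = truth(x) + rng.normal(0, 0.1, x.size)

      # A cubic fit looks fine inside the data...
      cubic = np.polynomial.Polynomial.fit(x, y, deg=3)
      print("interpolate at x=2.5:", round(cubic(2.5), 2), "truth:", round(truth(2.5), 2))

      # ...but extrapolation ignores the flattening and runs away from reality.
      for xe in (6, 8, 12):
          print(f"extrapolate at x={xe}: cubic={cubic(xe):.1f}, truth={truth(xe):.1f}")

      Same point as the bacterial growth example: the cubic has to head off to plus or minus infinity eventually, and the real curve doesn't.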



      • #18
        Originally posted by JanetH56:
        Why the great effort to deny that the data is skewed, or maybe even greatly skewed? (This is really humorous if you think about it.) I still don't understand the advantage of applying the log to it.
        We're not trying to deny that the data are skewed. Spike and I agree that they seem to be. We're just discussing what kind of skewness applies, and we've generally decided that we can't know because we don't have the data, but log-normal looks like a reasonable guess. See, before you take logs the data look skewed or very skewed; taking the log removes that apparent skewness, which lets you use the normal distribution to work out probabilities.
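
        If it helps, here's roughly what I mean in Python. The lognormal sample is made up purely to show the mechanics, since we don't have the real scores:

        import numpy as np
        from scipy import stats

        # Fake right-skewed "scores" (lognormal is just an assumption for illustration).
        rng = np.random.default_rng(1)
        scores = rng.lognormal(mean=3.0, sigma=0.6, size=5000)

        print("skewness before log:", round(stats.skew(scores), 2))   # clearly positive
        logs = np.log(scores)
        print("skewness after log: ", round(stats.skew(logs), 2))     # near zero

        # Once the logs look roughly normal, probabilities come from the normal CDF.
        mu, sd = logs.mean(), logs.std(ddof=1)
        threshold = 60.0                                              # arbitrary score
        p = 1 - stats.norm.cdf(np.log(threshold), loc=mu, scale=sd)
        print(f"P(score > {threshold}) is about {p:.3f}")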



        • #19
          Maybe instead of "average score" we should call it "risk free rate".



          • #20
            I was searching for any previous posts on long-word probabilities and ran across this thread. Before this, I hadn't realized that early posts were lost in the upgrade. So apologies for digging up the past, but I thought this thread was really interesting. I agree with just about everything everyone said.

            The bell curve figure pasted over the data isn't really a bell curve; or rather, it has the shape of a bell curve assuming the x-axis is linear. But the x-axis isn't linear (at least not once more than one or two people have played). It is instead a skewed distribution (fit to the only data used to fit it: 0, average, and high score, as bwt1213 pointed out), so the distribution shown really does have a long tail out toward the high score. Also, if it were a real bell curve and there were a single game played, that data point would be at the average (the high point of the curve, the peak of the distribution); instead, it is heavily shifted, with the first data point placed out at 3 sigma. Apparently 0 is used as a data point, at least for the first game, and maybe included in the average for all games too, to offset using one data point as the high score that anchors one end of the distribution at the 3-sigma point?

            Also, regarding the 0 end: games are retired after 1000 plays, and only about 0.1% of the tail past 3 sigma is lost below 0, so that seems a reasonable point at which to limit the model.
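
            That figure is easy to check if you assume the normal model:

            from scipy.stats import norm

            # Fraction of a normal distribution more than 3 sigma below the mean,
            # i.e. the bit of the model that would fall below a score of 0.
            print(norm.cdf(-3))   # about 0.00135, roughly 0.1% of games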

            Anyway, in effect, some of the more advanced ideas discussed in the thread, such as accounting for a skewed distribution by transforming the x-axis, are already being done, albeit in a simple manner. A two-step piecewise-linear scaling is used as a first approximation (beyond a single linear scale) to whatever the best continuous function applied to the x-axis would be to convert it to a normal distribution (log or something else, possibly varying, or maybe not possible with x-axis scaling alone if the distribution is binomial). Maybe, if distributions vary enough from game to game, this simple method is just as good as something more advanced, at least in early play when there are too few points to see what the distribution looks like for that particular game.
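
            To make the comparison concrete, here is a rough Python sketch of that two-piece scaling next to a log-based scaling lined up at the same anchor points. The avg and high values are invented placeholders; presumably the site uses each game's running average and high score:

            import numpy as np

            avg, high = 40.0, 220.0        # placeholder anchors, not real game data

            def piecewise_sigma(s):
                """Two linear pieces: 0..avg maps to -3..0, avg..high maps to 0..+3."""
                s = np.asarray(s, dtype=float)
                low_side = np.clip((s - avg) / avg * 3.0, -3.0, 0.0)
                high_side = np.clip((s - avg) / (high - avg) * 3.0, 0.0, 3.0)
                return np.where(s < avg, low_side, high_side)

            def log_sigma(s):
                """Smooth log scaling hitting the same anchors: avg -> 0, high -> +3."""
                s = np.maximum(np.asarray(s, dtype=float), 1.0)
                return 3.0 * (np.log(s) - np.log(avg)) / (np.log(high) - np.log(avg))

            for s in (10.0, avg, 120.0, high):
                print(s, round(float(piecewise_sigma(s)), 2), round(float(log_sigma(s)), 2))

            The two agree at the anchor points by construction and differ somewhat in between, which gives a feel for how rough the two-piece approximation is.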
            Last edited by BoggleOtaku; 01-06-2024, 04:36 PM.
