Reverse engineering of scoring algorithm

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reverse engineering of scoring algorithm

    Since the scoring algorithm has been regarded as a state secret, I took a shot at reverse engineering it from the results via a simple regression analysis. First, a few words about the data that I used. I have done about 350 puzzles and have solved 65% of them (typical of most puzzles). However, I have only collected data from my last 130 puzzles. These puzzles included from 9 to 26 clues. I am no 'whiz': my best solution time on a puzzle is 110% of the average time. My highest score is 480 points. My lowest score is 100 points. My average solution time is about 2.5 times the average time.

    When I started fiddling with the results, I noticed that if I calculated the percentage of the maximum points that I obtained on a particular puzzle and compared it to the ratio average time/my time, there was a monotonic relationship between the two for puzzles where my time was less than or equal to twice the average time (30 puzzles). I also noticed that the percentage of people solving the puzzle was completely uncorrelated with the number of points awarded. I applied a linear regression analysis to the data and developed the following relationship:

    points awarded = ((0.971 * average time/my time)-0.263)*maximum points on the puzzle.

    The regression gave a correlation coefficient of 0.99 with a standard deviation of 2.8 points. Now what about the puzzles that took longer than twice the average time to finish?
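The fit described above can be reproduced with an ordinary least-squares line through one (average time / my time, points awarded / maximum points) pair per puzzle. This is a minimal sketch in plain Python; the function names and the list-of-ratios input are assumptions for illustration, not the site's code or the exact procedure used in this post. Only the coefficients 0.971 and -0.263 come from the post.

```python
import math
from typing import List, Tuple

# Coefficients reported in the first post (fit on the 30 puzzles solved in at most 2x the average time).
SLOPE = 0.971
INTERCEPT = -0.263


def predict_points(my_time: float, avg_time: float, max_points: float) -> float:
    """Points predicted by the first-post linear model (no floor at 100, no cap at the maximum)."""
    ratio = avg_time / my_time  # ratio > 1 means faster than average
    return (SLOPE * ratio + INTERCEPT) * max_points


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of ys = a * xs + b; also returns Pearson's correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r
```

Calling fit_line(ratios, fractions) on your own puzzle history returns a slope and intercept that play the role of 0.971 and -0.263, plus the correlation coefficient quoted above.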

    As I mentioned before, there seems to be a floor of 100 in awarding points (I never use hints). Thus, as my time/average time goes above 2.0, the score calculated by the equation above drops below 100, especially for puzzles with fewer than 15 clues. I haven't been able to figure out how the calculation changes in this region, but I do notice that if my time/average time goes above 2.25, the awarded scores are all between 100 and 125 with no obvious methodology.

    Since my data is limited to solution times that are greater than the average time, I would be interested in hearing from people who have solved puzzles faster: please PM me your points, times and number of clues. I'm sure that the equation doesn't hold for very, very fast solutions: at a solution time of about 1/3.5 of the average, it would predict far more than the maximum points.
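Both boundaries discussed in the last two paragraphs can be located by solving the first-post equation for the ratio average time / my time. A small sketch, assuming the 0.971 / -0.263 coefficients above; the function names are made up, and the maximum-point values used below are hypothetical because the post does not list the maximum for each clue count.

```python
SLOPE = 0.971
INTERCEPT = -0.263


def ratio_for_fraction(fraction: float) -> float:
    """Ratio (average time / my time) at which the linear model awards `fraction` of max points."""
    return (fraction - INTERCEPT) / SLOPE


def slowest_time_multiple_for_floor(max_points: float, floor: float = 100.0) -> float:
    """Largest my_time / average_time for which the model still gives at least `floor` points."""
    return 1.0 / ratio_for_fraction(floor / max_points)


# The model reaches 100% of the maximum when average/my is about 1.30,
# i.e. when my time is about 0.77 x the average time.
full_marks_ratio = ratio_for_fraction(1.0)
# At 1/3.5 of the average time the model predicts about 3.1 x the maximum points.
overshoot = SLOPE * 3.5 + INTERCEPT
```

For a hypothetical 400-point puzzle, slowest_time_multiple_for_floor(400) is about 1.89, while for a hypothetical 800-point puzzle it is about 2.50, which fits the observation that the 100-point floor kicks in sooner on puzzles with fewer clues.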

    Maybe the administrator will pipe in and let me know if I am on the right track.


  • #2
    Originally posted by duhmel
    Maybe the administrator will pipe in and let me know if I am on the right track.
    You're warm. :-)



    • #3
      I have now solved over 400 puzzles and have been able to deduce a very accurate algorithm to estimate the number of points awarded per puzzle, with a few caveats. While my first attempt to derive the formulas used a linear regression analysis, as I obtained more results it became apparent that a quadratic regression model fit the data more accurately. The algorithm is broken into three parts: 1) puzzles solved in less than the average time, 2) those solved between the average time and 2.2 times the average time, and 3) puzzles solved in more than 2.2 times the average time. The models for 1) and 2) have a correlation coefficient of 0.99 (for you probability junkies), and the estimated points differ from the actual points with a standard deviation of about 5 points. It is important to note that for items 1) and 2), the number of points depends ONLY on the ratio of my time to the average time. Neither the puzzle solution percentage nor the absolute time to solve enters into the determination of points.

      1) For puzzles solved in less time than the average but more than 1/3 of the average (the fastest I have solved a puzzle):

      points awarded = (-0.0835*x*x + 0.451*x + 0.240)*maximum points on the puzzle.

      where x = average time/my time. For a puzzle solved in the average time (x = 1), the award is 0.61 times the maximum points for the puzzle.

      2) For puzzles solved in a time longer than the average, up to 2.2 times the average:

      points awarded = (-1.450*x*x + 2.914*x - 0.896)*maximum points on the puzzle.

      where x = average time/my time. For a puzzle solved in the average time, the award is 0.57 times the maximum points for the puzzle, so the two models are slightly off from each other at the crossover point where my time = average time.

      3) It became apparent that these equations do not hold when the solution time is greater than 2.2 times the average. In these cases, the points awarded range between 100 and 132. I see no consistency in the way points are awarded for results in this region. Obviously, solving a puzzle without getting 'hints' gets you at least 100 points and up to 132 points, but how our friendly Administrator derives the award for solution times greater than 2.2 times the average is a mystery. It is clear to me that it is not related to solution time, the ratio of solution time to average time, the percentage of people solving the puzzle, or the number of clues.
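The three regions can be collected into a single function. This is a sketch of the quadratic fits as given in this post, written with x = average time / my time (the definition confirmed by the correction in #5); the function name and the choice to return None in region 3, where no formula was found, are assumptions for illustration.

```python
from typing import Optional


def quadratic_fraction(my_time: float, avg_time: float) -> Optional[float]:
    """Fraction of maximum points from the piecewise quadratic fits in post #3.

    Returns None beyond 2.2x the average time, where awards of 100-132 points
    were observed but no formula was found.
    """
    x = avg_time / my_time  # x > 1 means faster than average
    if x >= 1.0:
        # Region 1: solved in at most the average time.
        # The fit was only checked down to about 1/3 of the average time (x <= 3).
        return -0.0835 * x * x + 0.451 * x + 0.240
    if my_time <= 2.2 * avg_time:
        # Region 2: between the average time and 2.2x the average time.
        return -1.450 * x * x + 2.914 * x - 0.896
    # Region 3: no formula; awards seemed to fall between 100 and 132 points.
    return None


for multiple in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(multiple, quadratic_fraction(multiple * 100, 100))
```

At exactly the average time this returns the region-1 value (about 0.61); the region-2 formula would give about 0.57 there, which is the small mismatch at the crossover mentioned above.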

      I welcome any comments or data on very fast solution times.
      Last edited by duhmel; 11-24-2021, 01:52 AM.



      • #4
        Maybe I'm missing something, but it looks like formula #2 is zero at about x = 1.63, which would mean negative points for 1.63<x<2.2.
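The zero crossing is easy to confirm with the quadratic formula. This plain-Python sketch (the helper name is made up) finds the real roots of -1.450x² + 2.914x - 0.896 = 0, i.e. formula #2 with x read as my time / average time.

```python
import math
from typing import List


def real_roots(a: float, b: float, c: float) -> List[float]:
    """Real roots of a*x**2 + b*x + c = 0, in ascending order."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    sq = math.sqrt(disc)
    return sorted([(-b - sq) / (2 * a), (-b + sq) / (2 * a)])


# Formula #2 from post #3, with x read as my time / average time.
roots = real_roots(-1.450, 2.914, -0.896)
print([round(r, 2) for r in roots])  # roots near 0.38 and 1.63
```

With that reading the formula turns negative for x above about 1.63, inside the claimed range of up to 2.2.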



        • #5
          Originally posted by M Schereau
          Maybe I'm missing something, but it looks like formula #2 is zero at about x = 1.63, which would mean negative points for 1.63<x<2.2.
          Good catch, that was a typo in the formula. The x-quantity should be x = average time/my time. Now a solution time of 1.63 times the average (x = 1/1.63, about 0.61) gives a score of about 0.35 times the available points. I have corrected the formula in my original post.
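As a quick check of the corrected definition, the snippet below evaluates formula #2 with x = average time / my time for a solve that took 1.63 times the average. The function name is illustrative.

```python
def formula2_fraction(my_time_multiple: float) -> float:
    """Formula #2 with x = average time / my time; the input is my time / average time."""
    x = 1.0 / my_time_multiple
    return -1.450 * x * x + 2.914 * x - 0.896


print(round(formula2_fraction(1.63), 3))  # roughly a third of the available points
```

Under this reading the formula stays positive across the whole 1x to 2.2x range, ending at about 0.13 of the maximum at 2.2x.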



          • #6
            I don't know what's better, that you're doing the work to figure this out or that your work is being peer reviewed.

            Keep up the good work!



            • #7
              Originally posted by NeonClaws
              I don't know what's better, that you're doing the work to figure this out or that your work is being peer reviewed.

              Keep up the good work!
              Doing the work satisfies an itch, as I am a retired engineer: see a problem, solve the problem. Having some people comment on the results is gratifying. Now if only I could get some more results for solutions faster than the average solution time, I would be able to improve the results in that region.



              • #8
                I collected a little bit of data, with times in the range of 0.4-0.8 times the average. It matches pretty well with your formula #1 from the 11/23 post.

                But also, if you invert the time fraction, so that:

                x = my time / average time

                And then plot it, it's pretty linear. Rounding the parameters gives:

                points awarded = (1-0.4*x)*maximum points

                This is much simpler, and the residuals come out smaller, even after rounding the parameters. I think this might be the formula used, at least in this time range.
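Putting the two forms side by side shows how close they are in the 0.4 to 0.8 range. In this sketch x = my time / average time for the linear model, while quadratic formula #1 is evaluated at its own variable, average time / my time = 1/x. The function names are hypothetical.

```python
def linear_fraction(x: float) -> float:
    """Simplified model from post #8, with x = my time / average time."""
    return 1.0 - 0.4 * x


def quadratic1_fraction(x: float) -> float:
    """Formula #1 from post #3, which uses average time / my time = 1 / x."""
    inv = 1.0 / x
    return -0.0835 * inv * inv + 0.451 * inv + 0.240


for x in (0.4, 0.5, 0.6, 0.7, 0.8):
    lin = linear_fraction(x)
    quad = quadratic1_fraction(x)
    print(f"x={x:.1f}  linear={lin:.3f}  quadratic={quad:.3f}  diff={lin - quad:+.3f}")
```

Over this range the two agree to within about 1% of the maximum points, so the data alone barely separates them; the linear form wins on simplicity and, per the post, on residuals.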



                • #9
                  Originally posted by M Schereau
                  I collected a little bit of data, with times in the range of 0.4-0.8 times the average. It matches pretty well with your formula #1 from the 11/23 post.

                  But also, if you invert the time fraction, so that:

                  x = my time / average time

                  And then plot it, it's pretty linear. Rounding the parameters gives:

                  points awarded = (1-0.4*x)*maximum points

                  This is much simpler, and the residuals come out smaller, even after rounding the parameters. I think this might be the formula used, at least in this time range.
                  Good catch. Looks like I have been overthinking the analysis. As you suggested, I flipped the value of x and got the same equation:

                  points awarded = (1-.4*x)*maximum points for the entire range 0.4 < x < 2.0.

                  The correlation comes out at 0.999 for over 200 data points. Now if only one could figure out how the points are assigned for solutions taking more than 2 times the average time (actually, for larger numbers of clues, the equation above still holds up to 2.25 x average time). It seems to me that if you get a solution, you get a value between 100 and 132, seemingly at random.



                  • #10
                    Originally posted by duhmel

                    Good catch. Looks like I have been overthinking the analysis. As you suggested, I flipped the value of x and got the same equation:

                    points awarded = (1-.4*x)*maximum points

                    The correlation comes out at 0.999 for over 200 data points. Now if only one could figure out how the points are assigned for solutions taking more than 2 times the average time (actually, for larger numbers of clues, the equation above still holds up to 2.25 x average time). It seems to me that if you get a solution, you get a value between 100 and 132, seemingly at random.
                    Looking at the points awarded in more detail, it is clear that the equation

                    points awarded = (1-.4*x)*maximum points

                    holds as long as the 'points awarded' is greater than 100, where x = (time to solve/average time). Note that the value of x that gives 100 points changes with the number of clues. If the equation gives a value less than 100, an 'arbitrary' number between 100 and 132 is awarded instead.
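The rule described here can be sketched as follows. points_or_none returns the formula value when it is at least 100 and None otherwise, because the award below that point (observed between 100 and 132) has no known formula. Setting (1 - 0.4x) × maximum points = 100 gives the crossover x* = 2.5 × (1 - 100 / maximum points), which is why the crossover depends on the number of clues. The function names and the maximum-point values in the loop are hypothetical.

```python
from typing import Optional

FLOOR = 100.0


def points_or_none(my_time: float, avg_time: float, max_points: float) -> Optional[float]:
    """(1 - 0.4x) * max_points if that is at least 100, else None (award not modelled)."""
    x = my_time / avg_time
    points = (1.0 - 0.4 * x) * max_points
    return points if points >= FLOOR else None


def floor_crossover(max_points: float) -> float:
    """Value of x = my time / average time at which the formula gives exactly 100 points."""
    return 2.5 * (1.0 - FLOOR / max_points)


for max_points in (400, 700, 1000):
    print(max_points, round(floor_crossover(max_points), 2))
```

Larger puzzles (higher maximum points) push the crossover further out, toward the 2.25x figure mentioned for larger numbers of clues.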

                    This gives an interesting outcome. I did a 21-clue puzzle at 2.25 times the average and got 109 points, as the equation predicts. However, when I solved a puzzle with the same number of clues in a longer time (2.34 x average), I got 127 points, where the equation would predict 72 points. Thus, taking longer to solve turned into a higher score. I have data that shows this happens for every number of clues. How the administrator assigns scores when the equation would give less than 100 points remains a 'state secret', but I know it does not depend on time to solve, percent solved, or solution time compared to the average. It actually looks like a random number: the administrator wanted to give everybody at least 100 points for a solution, but instead of just giving 100, he has randomized the award to between 100 and 132. I'm just guessing on this, but that's what the data shows.
                    Last edited by duhmel; 12-14-2021, 05:20 PM.
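The arithmetic behind the 21-clue example can be checked by backing out the maximum points from the 2.25x solve. This assumes, as the post says, that the 109 points came straight from the equation; the resulting maximum of about 1090 is inferred here, not taken from the site.

```python
def implied_max_points(points: float, x: float) -> float:
    """Maximum points implied by an award of `points` at x = my time / average time."""
    return points / (1.0 - 0.4 * x)


max_21 = implied_max_points(109, 2.25)        # about 1090 for this 21-clue puzzle
predicted_slow = (1.0 - 0.4 * 2.34) * max_21  # formula value for the 2.34x solve
crossover = 2.5 * (1.0 - 100 / max_21)        # x at which the formula gives 100 points

print(round(max_21), round(predicted_slow), round(crossover, 2))
```

The formula gives about 70 points for the 2.34x solve (the post quotes 72; the time ratios are rounded), and the 100-point crossover sits near x = 2.27, between the two solves. That is why the slower solve fell into the 100 to 132 range and ended up with more points.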
