Surprising word usage weightings...

  • Surprising word usage weightings...

    I just discovered that the singular ROOIBOS is rated as ultra rare while the plural ROOIBOSES is rated as widely used. I've used the singular a lot. I always keep rooibos in my tea drawer. I don't think I've ever seen or heard anyone use the plural in my life.

    Where exactly are the word usage stats coming from?

  • #2
    Word usage stats are admittedly spurious, and come from a database I licensed more than a dozen years ago at this point. Going off memory I think this database just did a simple count of how many times a word was found within a given (immense) catalogue of English literature, and the score was based off that.

    After we launched the latest batch of puzzles I installed a new form of data tracking that will tally real-world game play statistics for each individual word on each board, and the long-term plan is to recalibrate "word usage" to align with that metric instead. So ROOIBOS might be scored as "rare" on one board (i.e. where you really have to dance and jig around the board in order to spell it out, and only 5% of all users ever find it), and yet it might be "common" on another board (i.e. where it is almost completely spelled out horizontally with just one or two letter turns required and 70% of all users find it).

    My guess is we will need a solid year or more of collecting data before this new method can be implemented.
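A minimal sketch of how the per-board tally described above could work, assuming play logs are available as (board, player, words found) records. The function names, the record layout, and the tier cut-offs are all hypothetical; nothing here is the site's actual code.

```python
from collections import defaultdict

# Hypothetical cut-offs (upper bound on find rate, label); not the site's real values.
CUTOFFS = [(0.10, "ultra rare"), (0.30, "rare"), (0.60, "wide")]


def find_rates(plays):
    """Return {board: {word: share of that board's players who found the word}}.

    `plays` is an iterable of (board_id, player_id, found_words) records.
    """
    players = defaultdict(set)                       # board -> players seen
    finders = defaultdict(lambda: defaultdict(set))  # board -> word -> players
    for board_id, player_id, found_words in plays:
        players[board_id].add(player_id)
        for word in found_words:
            finders[board_id][word].add(player_id)
    return {
        board: {word: len(who) / len(players[board]) for word, who in words.items()}
        for board, words in finders.items()
    }


def tier(rate):
    """Map a find rate between 0 and 1 to a rarity label."""
    for upper, label in CUTOFFS:
        if rate < upper:
            return label
    return "common"


# Made-up logs: 1 of 20 players finds ROOIBOS on board A, 7 of 10 on board B.
plays = [("A", p, {"ROOIBOS"} if p == 0 else set()) for p in range(20)]
plays += [("B", p, {"ROOIBOS"} if p < 7 else set()) for p in range(10)]
for board, words in sorted(find_rates(plays).items()):
    print(board, tier(words["ROOIBOS"]))
```

With these invented cut-offs the same word lands in different tiers on the two boards, which is the behaviour the post describes.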
    If you enjoy our puzzles, please consider upgrading to a premium account to remove all ads and help support us financially. Thanks for your support!

    • #3
      Interesting!

      I'm not suggesting that ROOIBOS and ROOIBOSES should be at any particular level -- I don't think either word is that common. My surprise was mostly that the plural was scored as being much more common than the singular, when in my experience (as someone who drinks rooibos tea regularly) the singular seems to be much more commonly used than the plural.

      • #4
        I am surprised at the plural form of the word. Rooibos is a South African word and originates from Dutch. We would write the plural with double s, so rooibosses. (Bos means shrub or bush in this case). English never ceases to amaze me...

        • #5
          Originally posted by admin:
          After we launched the latest batch of puzzles I installed a new form of data tracking that will tally real-world game play statistics for each individual word on each board, and the long-term plan is to recalibrate "word usage" to align with that metric instead. So ROOIBOS might be scored as "rare" on one board (i.e. where you really have to dance and jig around the board in order to spell it out, and only 5% of all users ever find it), and yet it might be "common" on another board (i.e. where it is almost completely spelled out horizontally with just one or two letter turns required and 70% of all users find it).

          My guess is we will need a solid year or more of collecting data before this new method can be implemented.

          So, based on your example, would the point value of a specific word change with each board? Would the rare ROOIBOS have a higher point value than the common one? For long words, will the point value be reduced from what it is now? For instance, today I found UNEXTINGUISHABLENESSES for 44 pts. If a lot of people find it in the next year, will its value drop thereafter?

          • #6
            Yes. The idea is that there would no longer be "global" rarity values for words; their rarity scores would be based on real-world statistics from only the board currently being played. Scoring based on word length would remain unchanged.

            Here's an example data dump of what these real-world solving statistics look like for a random 5x5 board. Each percentage is the share of players who successfully played that word. It is interesting to see that unusual words like "CILL" and "SESSES" are actually very commonly found (by more than 50% of players), and yet "common" words like "ENTER" and "RETEST" are found by fewer than 2% of players. This shows just how much individual board layouts can dictate real-world rarity values.

            Note that the top three words (LITER, LITE and LITES) are more or less automatically spelled out on the second row, making them super obvious and easy to find. "ENTER", on the other hand, a word that everyone is familiar with, requires quite a few diagonal turns to uncover, which can help explain why it is so rarely found on this board.

            VERNA
            LITES
            YYSES
            NEISL
            RSCLT


            LITER -- 100%
            LITE -- 84.9%
            LITES -- 84.9%
            SEES -- 83%
            LESS -- 79.2%
            LIST -- 77.4%
            SILLS -- 75.5%
            REST -- 75.5%
            SILL -- 73.6%
            LISTER -- 71.7%
            NEST -- 71.7%
            SESS -- 69.8%
            TERN -- 69.8%
            NESS -- 67.9%
            RITES -- 66%
            RITE -- 66%
            TERNS -- 66%
            TEES -- 64.2%
            SESSES -- 64.2%
            STEELS -- 64.2%
            LITRE -- 64.2%
            SELLS -- 62.3%
            NESSES -- 62.3%
            NESTER -- 62.3%
            TIRE -- 60.4%
            LITRES -- 60.4%
            STEEL -- 60.4%
            SLEET -- 60.4%
            SELL -- 58.5%
            ESSES -- 58.5%
            REELS -- 58.5%
            REEL -- 58.5%
            LESSES -- 56.6%
            RESTER -- 56.6%
            LEES -- 56.6%
            STERN -- 56.6%
            SISTER -- 54.7%
            SLEETS -- 54.7%
            LESSEN -- 54.7%
            SISS -- 54.7%
            LEER -- 52.8%
            TREES -- 52.8%
            TEEL -- 52.8%
            SANE -- 52.8%
            CILL -- 52.8%
            SISSES -- 52.8%
            LIRE -- 52.8%
            TIRES -- 52.8%
            TEELS -- 52.8%
            LIER -- 50.9%
            STERNS -- 50.9%
            CILLS -- 50.9%
            TELLS -- 50.9%
            TEENS -- 50.9%
            TEEN -- 50.9%
            VITE -- 49.1%
            LIES -- 49.1%
            TRESS -- 49.1%
            TREE -- 49.1%
            TRESSES -- 49.1%
            ESTER -- 49.1%
            TELL -- 49.1%
            CIST -- 49.1%
            LISTEN -- 47.2%
            LETS -- 47.2%
            NETS -- 47.2%
            LISTENS -- 47.2%
            STEEN -- 45.3%
            SEER -- 45.3%
            SENS -- 45.3%
            SETS -- 45.3%
            VERT -- 45.3%
            LEETS -- 43.4%
            LEST -- 43.4%
            STEN -- 43.4%
            TILE -- 43.4%
            RETS -- 41.5%
            SEAN -- 41.5%
            STIRE -- 41.5%
            LIVER -- 41.5%
            ESSE -- 41.5%
            STEENS -- 41.5%
            SENT -- 39.6%
            SANES -- 39.6%
            LEET -- 39.6%
            TENS -- 39.6%
            RENS -- 39.6%
            SEEL -- 39.6%
            SILT -- 39.6%
            RETES -- 39.6%
            RETE -- 39.6%
            TILER -- 39.6%
            SEEN -- 39.6%
            SEELS -- 37.7%
            STENS -- 37.7%
            RENT -- 37.7%
            ESES -- 37.7%
            VIER -- 35.8%
            ERNS -- 35.8%
            SANER -- 35.8%
            REES -- 35.8%
            STILE -- 35.8%
            VIRE -- 35.8%
            SIRE -- 34%
            RISES -- 34%
            SIRES -- 34%
            LESSER -- 34%
            STEANS -- 34%
            STEAN -- 34%
            SIST -- 34%
            VERTS -- 34%
            LIVE -- 34%
            NETE -- 32.1%
            RENTS -- 32.1%
            STIRES -- 32.1%
            STEER -- 32.1%
            SEIS -- 32.1%
            CISTERN -- 32.1%
            SITES -- 32.1%
            SIES -- 32.1%
            RITS -- 32.1%
            STIR -- 32.1%
            SITE -- 32.1%
            SERE -- 30.2%
            RISE -- 30.2%
            NETES -- 30.2%
            SLEER -- 30.2%
            SESE -- 30.2%
            SEANS -- 30.2%
            CISTERNS -- 30.2%
            LESSENS -- 30.2%
            ILLS -- 28.3%
            CESSES -- 28.3%
            CESS -- 28.3%
            VISE -- 28.3%
            REAN -- 28.3%
            EELS -- 28.3%
            SENSE -- 28.3%
            TENSE -- 28.3%
            TRES -- 28.3%
            TIER -- 28.3%
            VILE -- 28.3%
            RESET -- 28.3%
            SENSES -- 28.3%
            ERNES -- 28.3%
            VISES -- 28.3%
            TEAS -- 26.4%
            SLEE -- 26.4%
            SIREN -- 26.4%
            ERNE -- 26.4%
            TEER -- 26.4%
            RESEES -- 26.4%
            TENSES -- 26.4%
            SILTS -- 26.4%
            SEYS -- 26.4%
            REANS -- 26.4%
            EASE -- 26.4%
            LITS -- 26.4%
            STERE -- 26.4%
            YENS -- 26.4%
            RESEE -- 26.4%
            SIRENS -- 24.5%
            VILER -- 24.5%
            TEASES -- 24.5%
            SEAS -- 24.5%
            ANTES -- 24.5%
            TEASE -- 24.5%
            ANTE -- 24.5%
            TERNES -- 24.5%
            YESSES -- 24.5%
            RESIST -- 22.6%
            TYES -- 22.6%
            VIRES -- 22.6%
            TERNE -- 22.6%
            RESEEN -- 22.6%
            SNEE -- 22.6%
            REIS -- 22.6%
            YITE -- 22.6%
            LERE -- 22.6%
            STYE -- 22.6%
            STYES -- 22.6%
            YESES -- 22.6%
            SEISES -- 20.8%
            STERES -- 20.8%
            RESISTER -- 20.8%
            ESNES -- 20.8%
            ICES -- 20.8%
            EASES -- 20.8%
            ANES -- 20.8%
            ESNE -- 20.8%
            RESES -- 20.8%
            SNEES -- 20.8%
            SISES -- 20.8%
            LICE -- 18.9%
            LYTES -- 18.9%
            LYTE -- 18.9%
            NEESE -- 18.9%
            NEESES -- 18.9%
            LERES -- 18.9%
            SENTS -- 18.9%
            RISEN -- 18.9%
            SEISE -- 18.9%
            TELS -- 18.9%
            YITES -- 18.9%
            EVITE -- 17%
            ELLS -- 17%
            TERES -- 17%
            ERES -- 17%
            TYER -- 17%
            LYES -- 17%
            SANEST -- 17%
            ICER -- 17%
            SEASE -- 17%
            SYES -- 17%
            STELL -- 17%
            STRESSES -- 15.1%
            REISES -- 15.1%
            EVITES -- 15.1%
            STRESS -- 15.1%
            ICERS -- 15.1%
            STELLS -- 15.1%
            SICS -- 15.1%
            SANT -- 15.1%
            SELS -- 15.1%
            SANTS -- 15.1%
            SEASES -- 15.1%
            RILE -- 15.1%
            IRES -- 15.1%
            RESETS -- 15.1%
            SEISER -- 13.2%
            SLICE -- 13.2%
            VETS -- 13.2%
            TYERS -- 13.2%
            LIERS -- 13.2%
            TELT -- 13.2%
            ISLES -- 13.2%
            ASSETS -- 13.2%
            RIVE -- 13.2%
            SEISERS -- 11.3%
            SYNES -- 11.3%
            STYLE -- 11.3%
            ASSES -- 11.3%
            STIE -- 11.3%
            TYNES -- 11.3%
            TYNE -- 11.3%
            SICE -- 11.3%
            SLICES -- 11.3%
            ASSET -- 11.3%
            CENS -- 11.3%
            ANTS -- 11.3%
            ISLET -- 11.3%
            ISLETS -- 11.3%
            SEIL -- 11.3%
            SYNE -- 11.3%
            LESES -- 9.4%
            SIREES -- 9.4%
            LEIS -- 9.4%
            CESSE -- 9.4%
            STIVE -- 9.4%
            RESENT -- 9.4%
            ISLE -- 9.4%
            SIREE -- 9.4%
            SLICER -- 9.4%
            EANS -- 9.4%
            SENAS -- 9.4%
            SLICERS -- 9.4%
            SICES -- 9.4%
            SENA -- 9.4%
            ETENS -- 7.5%
            RESIT -- 7.5%
            SEILS -- 7.5%
            RENTES -- 7.5%
            RENTE -- 7.5%
            REIST -- 7.5%
            NESTY -- 7.5%
            LYSES -- 7.5%
            REEST -- 7.5%
            LYNE -- 7.5%
            LEESE -- 7.5%
            LIEN -- 7.5%
            STYLER -- 7.5%
            VERITY -- 7.5%
            ELSE -- 7.5%
            LIENS -- 7.5%
            LYNES -- 7.5%
            SILE -- 7.5%
            ETEN -- 7.5%
            YESTER -- 7.5%
            YEST -- 7.5%
            YILLS -- 5.7%
            TEST -- 5.7%
            ANTIS -- 5.7%
            ANTI -- 5.7%
            LISTLESS -- 5.7%
            TEASELS -- 5.7%
            TEASEL -- 5.7%
            SILVER -- 5.7%
            LYSE -- 5.7%
            REVISES -- 5.7%
            RESISTLESS -- 5.7%
            SETNESS -- 5.7%
            YILL -- 5.7%
            ITER -- 5.7%
            ASSENTS -- 5.7%
            SLIEST -- 5.7%
            SILER -- 5.7%
            LEIR -- 5.7%
            LISLES -- 5.7%
            ASSENT -- 5.7%
            RELIT -- 5.7%
            RESITES -- 5.7%
            RESITE -- 5.7%
            VEIL -- 5.7%
            SIEN -- 3.8%
            STIVER -- 3.8%
            RESENTER -- 3.8%
            SIENS -- 3.8%
            RELIST -- 3.8%
            RESELL -- 3.8%
            RESELLS -- 3.8%
            LEESES -- 3.8%
            STRIVE -- 3.8%
            RECS -- 3.8%
            RIVET -- 3.8%
            RIVETS -- 3.8%
            SECS -- 3.8%
            VERITES -- 3.8%
            EVIL -- 3.8%
            ASSERTS -- 3.8%
            SENTE -- 3.8%
            LISLE -- 3.8%
            ASSERT -- 3.8%
            RETELL -- 3.8%
            RESTY -- 3.8%
            SIVER -- 3.8%
            RETELLS -- 3.8%
            ILLEST -- 3.8%
            VERITE -- 3.8%
            CLIES -- 1.9%
            ISTLES -- 1.9%
            SESSILE -- 1.9%
            TYLER -- 1.9%
            RISER -- 1.9%
            RISERS -- 1.9%
            STREEL -- 1.9%
            STREELS -- 1.9%
            ELITE -- 1.9%
            ISTLE -- 1.9%
            STEIL -- 1.9%
            REVISE -- 1.9%
            ELITES -- 1.9%
            VERILY -- 1.9%
            ETNA -- 1.9%
            EASELS -- 1.9%
            REESTY -- 1.9%
            RESTIVE -- 1.9%
            ENTIRELY -- 1.9%
            ENTIRE -- 1.9%
            RETEST -- 1.9%
            TELLIES -- 1.9%
            RETILE -- 1.9%
            CEIL -- 1.9%
            STERNA -- 1.9%
            ENTER -- 1.9%
            EASEL -- 1.9%
            ETNAS -- 1.9%
            ASSERTIVELY -- 1.9%
            TRESSIER -- 1.9%
            LISSES -- 1.9%
            SERS -- 1.9%
            STYLIER -- 1.9%
            SISSY -- 1.9%
            RELY -- 1.9%
            ASSENTER -- 1.9%
            NYES -- 1.9%

            • #7
              Out of curiosity I also checked for the stats on UNEXTINGUISHABLENESSES, since you mentioned it... so far it has been found by 8% of players on the one board where it can be found. So that would surely keep it in the "Ultra Rare" category.

              • #8
                What would the word values be the first time a new board was played?

                • #9
                  I suspect that a review of all boards played would reveal something like a degree-of-difficulty score that would increase with backward spelling (in whole or in part), the number of times the path changed direction, an unusual starting position, and the length of the word. But no person would have to figure that out, and it wouldn't have to be dependent on previous players because it would be a global statistic for all players on all games played. Once that degree-of-difficulty score had been determined, a word could be assigned a category automatically. Words might start as Common or Wide and end up as Rare or Ultra Rare if they were long enough, convoluted enough, and started in a really odd position (like the middle of the grid), then coiled widdershins. Nope, no one's going to get that one. Except maybe Lalatan or Spike or . . .
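One ingredient of such a degree-of-difficulty score, the number of direction changes, can be sketched directly from a board layout. This is only an illustration: it assumes Boggle-style rules (moves to any of the eight neighbouring cells, no cell reused within a word), which the thread does not spell out, and the helper names `paths`, `turns` and `min_turns` are made up. The board is the 5x5 example from post #6.

```python
BOARD = ["VERNA", "LITES", "YYSES", "NEISL", "RSCLT"]


def paths(board, word):
    """Yield every path (list of (row, col) cells) that spells `word`."""
    rows, cols = len(board), len(board[0])

    def extend(path):
        if len(path) == len(word):
            yield list(path)  # copy, because `path` is mutated while searching
            return
        r, c = path[-1]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if ((dr or dc) and 0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in path
                        and board[nr][nc] == word[len(path)]):
                    path.append((nr, nc))
                    yield from extend(path)
                    path.pop()

    for r in range(rows):
        for c in range(cols):
            if board[r][c] == word[0]:
                yield from extend([(r, c)])


def turns(path):
    """Count how often the step direction changes along a path."""
    steps = [(r2 - r1, c2 - c1) for (r1, c1), (r2, c2) in zip(path, path[1:])]
    return sum(a != b for a, b in zip(steps, steps[1:]))


def min_turns(board, word):
    """Fewest direction changes over all spellings, or None if not on the board."""
    counts = [turns(p) for p in paths(board, word)]
    return min(counts) if counts else None


print(min_turns(BOARD, "LITE"))   # 0: straight along the second row
print(min_turns(BOARD, "ENTER"))  # 3: every step changes direction
```

A fuller score would also need weights for word length, starting position and backward spelling; those weights would have to be fitted against real play data rather than guessed.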

                  • #10
                    Originally posted by DonGuy47:
                    What would the word values be the first time a new board was played?

                    When we release new boards in the future, those boards would likely default to our original (current) rarity ranking system until enough new data has been collected.
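A rough sketch of that fallback, assuming a hypothetical minimum sample size before per-board statistics are trusted. The threshold of 200 players, the tier cut-offs, and the function name `rarity_for` are placeholders for illustration only.

```python
MIN_PLAYERS = 200  # hypothetical: how many players count as "enough new data"

# Hypothetical cut-offs (upper bound on find rate, label).
CUTOFFS = [(0.10, "ultra rare"), (0.30, "rare"), (0.60, "wide")]


def rarity_for(word, global_rarity, finders, players, min_players=MIN_PLAYERS):
    """Return (label, source) for one word on one board.

    `global_rarity` maps word -> label under the current global ranking.
    `finders` is how many players found the word on this board, and
    `players` is how many players have played this board so far.
    """
    if players < min_players:
        # Too little data for this board: keep the existing global ranking.
        return global_rarity[word], "global"
    rate = finders / players
    for upper, label in CUTOFFS:
        if rate < upper:
            return label, "board"
    return "common", "board"


legacy = {"ROOIBOS": "ultra rare"}
print(rarity_for("ROOIBOS", legacy, finders=30, players=50))    # new board
print(rarity_for("ROOIBOS", legacy, finders=700, players=1000)) # mature board
```

Returning the source alongside the label makes it easy to see which boards are still on the old ranking.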

                    • #11
                      For what it's worth (that being nothing), I hope you do *not* make major changes to the scoring. My goal here is to improve on my stats from previous years (I have a spreadsheet, of course). All of that data and aspiration would become pointless if the scoring changed.
