I love Scrabble, but I've always thought that the letter distribution was wrong. Kat and I normally play a (perfectly legal) version where there are two bags of letters: vowels and consonants. You can choose how many of each type to take when you pick new tiles. I find this helps balance my rack; otherwise I end up with a rack full of vowels and have to waste a turn changing them. Playing with two bags reduces the randomness a little and makes the game based more on skill than chance.
Show us your Butts
The story goes that Alfred Butts, who invented Scrabble, used the front page of the New York Times in 1938 to get the frequencies of the different letters, and used those to calculate how many tiles of each letter to include in a set of 100 tiles. Sadly I've been unable to find out exactly which edition of the newspaper he used, but I can only imagine that the stories were all about India initialising institutions, because I've always found there are way too many I tiles in the game.
First I did a letter frequency analysis on the front page of the New York Times and BBC News websites, averaged the letter counts of both websites and graphed them against the Scrabble letter tiles. There were 34,980 characters in total, twice the number of letters that Alfred Butts used.
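A rough sketch of that kind of count in Python, assuming the two front pages have already been saved as plain text (the function names and sample strings here are invented for illustration, not the script actually used). Averaging the counts and then converting to percentages gives the same shares as pooling the two texts.

```python
from collections import Counter
import string


def letter_counts(text):
    """Count A-Z only, ignoring case, digits, spaces and punctuation."""
    return Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)


def averaged_percentages(text_a, text_b):
    """Average the letter counts of two texts and return them as percentages."""
    counts_a = letter_counts(text_a)
    counts_b = letter_counts(text_b)
    averaged = {ch: (counts_a[ch] + counts_b[ch]) / 2 for ch in string.ascii_uppercase}
    total = sum(averaged.values())
    if total == 0:
        # No letters at all: report zero for everything rather than divide by zero.
        return {ch: 0.0 for ch in string.ascii_uppercase}
    return {ch: 100 * n / total for ch, n in averaged.items()}


# Placeholder strings standing in for the two front pages.
first_page = "Markets rally as rain stops play"
second_page = "Queen opens new zoo exhibit"
shares = averaged_percentages(first_page, second_page)
for letter, share in sorted(shares.items(), key=lambda item: item[1], reverse=True)[:5]:
    print(f"{letter}: {share:.1f}%")
```

The resulting percentages can be read directly as "tiles per 100", which is what makes the comparison with a Scrabble set straightforward.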
So far so good: there's not a huge difference between them. The most obvious difference is that J, Q, X and Z all appear with a frequency of less than 1% on the two news websites, so strictly speaking none of those tiles should be in a 100-tile game at all, but that's going to kill a lot of the fun.
Looking at these news websites more closely, there are lots of words that aren't in the Scrabble dictionary, and a lot of small words like "the", "of" and "is" which skew the results. I don't think this is a good sample. A much better sample of words would be the complete 267,751-word Official Scrabble Words (OSW) dictionary, which I downloaded and analysed.
There were 2,250,566 characters in total, a hundred and fifty times as many characters as Alfred Butts analysed and all of them legal Scrabble words. Alfred must have spent many hours doing his analysis; my PC completed it in about five seconds.
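The reason a word list behaves differently from running text is that every word counts once, however often it is used. A toy illustration, assuming a made-up sentence rather than the real OSW file (which would simply be read line by line in place of the sample):

```python
from collections import Counter


def letter_shares(words):
    """Return each letter's share of all letters in `words`, as a percentage."""
    counts = Counter(ch for word in words for ch in word.upper() if ch.isalpha())
    total = sum(counts.values())
    return {ch: 100 * n / total for ch, n in counts.items()}


# Invented sample: "the" appears three times, as small words do in news text.
running_text = "the cat and the dog saw the bird".split()

as_running_text = letter_shares(running_text)    # every occurrence counts
as_word_list = letter_shares(set(running_text))  # each distinct word counts once

print(f"H in running text: {as_running_text['H']:.1f}%")
print(f"H in word list:    {as_word_list['H']:.1f}%")
```

In this sample the repeated "the" more than doubles the share of H compared with counting each word once, which is the skew described above.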
The OSW distribution diverges from a Scrabble set much more than the sample of news text does. It is also different from normal letter frequencies in written English, where the top five letters are E, T, A, O and I, in that order. But the Scrabble dictionary is pretty different to written English!
The OSW analysis has more R's and T's, way more S's and fewer E's, but there are still no J, Q or X letters. Let's be charitable and add in one of each of these letters and we come up with a new distribution:
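One way to turn percentages into whole tiles while being charitable to the rare letters is to round each share and then top any letter up to a minimum of one tile. This is only a sketch of that idea: the function name is made up, the example shares are invented rather than the real OSW figures, and the totals are not forced to add up to exactly 100.

```python
import string


def tiles_from_shares(shares, total_tiles=100, minimum=1):
    """Round each letter's percentage share to whole tiles, with a floor.

    `shares` maps letters to percentages. Letters missing from `shares`
    are treated as 0% and so receive `minimum` tiles.
    A real set has 98 letter tiles plus two blanks; 100 is used here
    to match the round figure in the post.
    """
    tiles = {}
    for letter in string.ascii_uppercase:
        ideal = shares.get(letter, 0.0) * total_tiles / 100
        # Python's round() sends exact halves to the nearest even number.
        tiles[letter] = max(minimum, round(ideal))
    return tiles


# Invented shares for illustration only.
example = tiles_from_shares({"E": 11.4, "S": 10.6, "Q": 0.2})
print(example["E"], example["S"], example["Q"])
```

Because of the rounding and the one-tile floor, the result usually needs a little manual adjustment to land on the exact number of tiles in a set.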
Luckily we have several Scrabble sets in the house, so we can make a new set with the adjusted number of tiles.
Here is the original Scrabble set on the left, with my amended set on the right. You can see the additional letters are a lighter colour because they're from a different edition of Scrabble. The removed letters from the original set are at the bottom on the right.
To be fair to the inventor, he didn't have the OSW or the computing power to work this out.
The next step is to test it out!
Here are the results of the first game that Kat and I played using the new set. We didn't play the two-bag rule, and I felt that the letter distribution was definitely better.
The most obvious change was that the seven extra S tiles meant that there were many more instances of making an already-played word into the plural and scoring more points. It needs more games to find out if it's a real improvement though.
Want to give it a go? You'll need an additional A, C, L, M, P, R, seven S's and two T's. You can buy individual tiles on eBay.
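As a quick check of that shopping list, here it is applied to the standard English tile distribution. This sketch only models the additions; the tiles removed from the original set aren't listed in this post, so they are left out.

```python
from collections import Counter

# Standard English Scrabble letter tiles (98 letters; the two blanks are omitted).
STANDARD = Counter({
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1,
})

# The extra tiles listed above.
EXTRA = Counter({"A": 1, "C": 1, "L": 1, "M": 1, "P": 1, "R": 1, "S": 7, "T": 2})

after_adding = STANDARD + EXTRA

print("Extra tiles to buy:", sum(EXTRA.values()))
for letter in sorted(EXTRA):
    print(f"{letter}: {STANDARD[letter]} -> {after_adding[letter]}")
```

That makes 15 extra tiles in total, with S going from 4 tiles to 11.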
What's this about?
Hi, I'm Mat and I'm addicted to new hobbies. I used to think this was a bad thing, but now I'm embracing it.
Writing them all up in this blog encourages me to finish projects, and helps me keep track of which ones I've tried.