Scrabble

Friday 2nd September 2016

I love Scrabble, but I've always thought that the letter distribution was wrong. Kat and I normally play a (perfectly legal) version where there are two bags of letters: vowels and consonants. You can choose how many of each type to take when you pick new tiles. I find this helps with the letters; otherwise I end up with a rack full of vowels and have to waste a turn changing them. Playing with two bags reduces the randomness a little and makes the game based more on skill than chance.

Show us your Butts

The story goes that Alfred Butts, who invented Scrabble, used the front page of the New York Times in 1938 to get the frequencies of the different letters, and used those to calculate how many tiles of each letter to include in a set of 100 tiles. Sadly I've been unable to find out exactly which edition of the newspaper he used, but I can only imagine that the stories were all about India initialising institutions, because I've always found there are way too many I tiles in the game.

First I did a letter frequency analysis on the front page of the New York Times and BBC News websites, averaged the letter counts of both websites and graphed them against the Scrabble letter tiles. There were 34,980 characters in total, twice the number of letters that Alfred Butts used.
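For anyone who wants to try this at home, here is a minimal sketch of this kind of letter count in Python. It is not necessarily the script used for this post: the function names and the two placeholder strings are made up for illustration, and it assumes the page text has already been saved as plain strings.

```python
from collections import Counter
import string

def letter_counts(text):
    """Count the letters A-Z in text, ignoring case, digits and punctuation."""
    return Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)

def letter_percentages(counts):
    """Convert raw letter counts into a percentage for every letter A-Z."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no letters to analyse")
    return {letter: 100 * counts[letter] / total for letter in string.ascii_uppercase}

# Placeholder strings standing in for the text of the two front pages.
page_one = "Example headline text from the first site"
page_two = "Example headline text from the second site"

# Pool the counts from both samples, then convert to percentages.
combined = letter_counts(page_one) + letter_counts(page_two)
percentages = letter_percentages(combined)

# Show the five most common letters in the pooled sample.
for letter, pct in sorted(percentages.items(), key=lambda item: -item[1])[:5]:
    print(f"{letter}: {pct:.1f}%")
```

Pooling the two counters and then taking percentages gives the same proportions as averaging the counts letter by letter, because halving every count doesn't change the ratios between them.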

 

So far so good: there's not a huge difference between them. The most obvious difference is that J, Q, X and Z each appear with a frequency of less than 1% on the two news websites, which in a 100-tile set works out at less than one tile each. Strictly, none of those tiles should be in the game at all, but that's going to kill a lot of the fun.
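The step from percentages to whole tiles can be sketched like this. It assumes simple rounding to the nearest tile, which is a guess rather than a documented method, and the function name, the `minimum` parameter and the sample percentages are all invented for the example.

```python
import math

def tiles_from_percentages(percentages, total_tiles=100, minimum=0):
    """Turn letter percentages into whole tile counts.

    Each letter gets its share of total_tiles, rounded to the nearest
    whole tile, but never fewer than `minimum` tiles.
    """
    tiles = {}
    for letter, pct in percentages.items():
        share = pct * total_tiles / 100
        tiles[letter] = max(minimum, math.floor(share + 0.5))
    return tiles

# Made-up percentages, purely to show the rounding behaviour.
sample = {"E": 11.4, "S": 9.6, "Q": 0.2}

print(tiles_from_percentages(sample))             # strict rounding
print(tiles_from_percentages(sample, minimum=1))  # every letter keeps at least one tile
```

With strict rounding a rare letter like the made-up Q above drops out of the set altogether; setting `minimum=1` keeps one tile of every letter. Because each letter is rounded on its own, the counts are not guaranteed to add up to exactly 100.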

Looking at these news websites more closely, there are lots of words that aren't in the Scrabble dictionary, and a lot of small words like "the", "of" and "is" which skew the results. I don't think this is a good sample. A much better sample of words would be the complete 267,751-word Official Scrabble Words (OSW) dictionary, which I downloaded and analysed.

There were 2,250,566 characters in total, a hundred and fifty times as many characters as Alfred Butts analysed and all of them legal Scrabble words. Alfred must have spent many hours doing his analysis; my PC completed it in about five seconds. 

The OSW distribution diverges from a Scrabble set much more than the sample of news text does. It is also different from normal letter frequencies in written English, where the top five letters are E, T, A, O and I, in that order. But the Scrabble dictionary is pretty different to written English!

The OSW analysis has more R's and T's, way more S's and fewer E's, but there are still no J, Q or X letters. Let's be charitable and add in one of each of these letters and we come up with a new distribution:

Luckily we have several Scrabble sets in the house, so we can make a new set with the adjusted number of tiles.

Here is the original Scrabble set on the left, with my amended set on the right. You can see the additional letters are a lighter colour because they're from a different edition of Scrabble. The removed letters from the original set are at the bottom on the right.

To be fair to the inventor, he didn't have the OSW or the computing power to work this out. 

The next step is to test it out!

Here are the results of the first game that Kat and I played using the new set. We didn't play the two-bag rule, and I felt that the letter distribution was definitely better.

The most obvious change was that the seven extra S tiles meant that there were many more instances of making an already-played word into the plural and scoring more points. It needs more games to find out if it's a real improvement though.

Want to give it a go? You'll need an additional A, C, L, M, P, R, seven S's and two T's. You can buy individual tiles on eBay.  
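If you'd like to check your own set against a target distribution, a small comparison helper along these lines might be handy. The standard English-language tile counts and the extra tiles are as listed above; the helper name `compare_sets` is made up, and the removed tiles aren't modelled because they aren't listed in the text.

```python
from collections import Counter

# Standard English-language Scrabble letter tiles (the two blanks are left out).
STANDARD_SET = Counter({
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1,
})

# Extra tiles from the shopping list: one each of A, C, L, M, P, R,
# plus seven S's and two T's.
EXTRA_TILES = Counter({"A": 1, "C": 1, "L": 1, "M": 1, "P": 1, "R": 1, "S": 7, "T": 2})

def compare_sets(current, target):
    """Return (tiles_to_add, tiles_to_remove) needed to turn current into target."""
    # Counter subtraction keeps only the positive differences.
    to_add = Counter(target) - Counter(current)
    to_remove = Counter(current) - Counter(target)
    return to_add, to_remove

# Target here is the standard set plus the extras only (removals not included).
with_extras = STANDARD_SET + EXTRA_TILES
to_add, to_remove = compare_sets(STANDARD_SET, with_extras)

print("Tiles to add:", dict(to_add))
print("S tiles after adding extras:", with_extras["S"])
```

Swap in your own target counts for `with_extras` and the second return value will tell you which tiles to set aside.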

From now on, new posts are going to be every other week.

What's this about?

Hi, I'm Mat and I'm addicted to new hobbies. I used to think this was a bad thing but now I'm embracing it.

Writing them all up in this blog encourages me to finish projects, and helps me keep track of which ones I've tried.
