It got me wondering about words in English. Are most words connected to each other in this way? If not, what do the distinct groups look like? (In graph parlance these are connected components, though I'll loosely call them cliques.) Do two-letter, three-letter or four-letter words have the most cliques? I decided to find out.
Preliminaries
For simplicity, I limited the edit operation to changing one letter of the word at a time (i.e. no deletions or insertions). You can think of the dictionary as an undirected graph, with each node being a word and each edge an edit operation that lets you travel between adjacent nodes.
My data source is the 2of12inf.txt file from the 12 Dicts package from wordlists.sourceforge.net. It uses American spellings and seems to be a fairly decent list of words (which is to say my 10 minutes of hunting didn't turn up anything better). It contains:
- 2-letter words: 62
- 3-letter words: 642
- 4-letter words: 2546
- 5-letter words: 5122
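To make the setup concrete, here is a minimal sketch of the edge test and the per-length tally. It is not the original script: the helper names `one_letter_apart?` and `count_by_length` are made up for illustration, and it assumes the word list is one word per line, skipping anything that isn't purely lowercase letters.

```ruby
# Hypothetical helper: true when two words have the same length and differ
# in exactly one position (a single-letter substitution, no inserts/deletes).
def one_letter_apart?(a, b)
  return false unless a.length == b.length
  a.chars.zip(b.chars).count { |x, y| x != y } == 1
end

# Hypothetical helper: tally words by length, skipping any entry that is
# not made up purely of lowercase letters (an assumption about cleanup).
def count_by_length(words)
  words.map(&:strip)
       .grep(/\A[a-z]+\z/)
       .group_by(&:length)
       .transform_values(&:size)
end

# Tiny in-memory sample; for the real data you might pass
# File.readlines("2of12inf.txt") instead (assumed one word per line).
sample = %w[aha gnu ugh bevy levy]
p count_by_length(sample)
p one_letter_apart?("bevy", "levy")
p one_letter_apart?("aha", "gnu")
```

Comparing every pair of words this way is quadratic in the size of the list, which is tolerable for a few thousand words per length but is not the only way to build the graph.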
Results
A little Ruby script yielded the following results.
All 2-letter words belong to the same clique.
Of the 3-letter words, 631 (of 642) belong to the same clique; the remaining 11 have no neighbors at all. They are:
- nth, ism, urn, ebb, obi, qua, ova, use, ugh, gnu, aha
Among the 4-letter words, the small cliques are listed below; the words in the last line are each isolated.
- ache, achy, acme, acne, acre, ashy
- info, into, onto, undo, unto
- high, nigh, sigh, sign
- afar, agar, ajar
- also, alto, auto
- eddy, edge, edgy
- icon, ikon, iron
- idle, idly, isle
- opal, oral, oval
- used, user, uses
- bevy, levy
- crud, crux
- demo, memo
- hadj, hajj
- idol, idyl
- ogle, ogre
- orzo, ouzo
- thou, thru
- adze, agog, ague, ahoy, alga, ammo, amok, anal, ankh, apse, aqua, aura, avow, awol, ayah, bozo, ciao, ditz, ebbs, echo, ecru, egad, emus, ends, envy, epee, epic, espy, euro, evil, exam, expo, guru, hymn, ibex, iffy, imam, iota, isms, jato, judo, kiwi, liar, luau, lynx, meow, myna, nevi, nova, obey, oboe, odor, ohms, okay, okra, once, onyx, orgy, ovum, rely, rhea, rhos, semi, sexy, stye, sync, tofu, tuft, ugly, ulna, upon, urge, uric, urns, void, yegg, yeti, yuan, zebu
Among the 5-letter words, the smaller cliques include the following (sizes in parentheses):
- reset, resew, resow, renew, beset, besot, besom, bosom, begot, begat, began, begun, begum, begin, vegan, bigot, bight, wight, tight, sight, sighs, signs, highs, right, night, might, light, fight, eight, beret, beget (31)
- round, wound, would, world, mould, moult, mount, fount, count, court, could, sound, pound, mound, hound, found, bound (17)
- acnes, acres, acmes, aches, ached, acted, anted, antes, antis, antic, attic, ashes, ashen, aspen, asses, asset, apses (17)
- comic, conic, cynic, tonic, toxic, toxin, topic, tunic, runic, sonic, ionic, colic (12)
- overs, overt, avert, alert, ovens, opens, omens, evens, event, avers (10)
The joys of a Sunday evening well spent. I also learnt that nth is a valid word without vowels, a zebu is a breed of cattle, and a yegg is a burglar or safecracker.