Dictionary Shenanigans

Changelog

2026-03-30: Initial publication

A Critical Note

I live in America, I’m a career programmer and I have horrific health problems.

Normally I’d assume folk know what that means but I need to be explicit: this means my survival requires me to have a practical, useful and ‘reasonable’ understanding of Generative AI. Particularly models that programmers use.

I don’t like it. I wish more humans would recognize the horrific social and environmental costs that come with these things. I loathe the fact my survival is attached to such things.

Unfortunately, my survival (again: survival; I cannot harp on survival enough) requires me to engage with these tools.

This blog is 100% human generated content. I like writing. I look forward to when I have something useful to write up on this blog. I like sharing my writing with fellow weirdos, nerds and randos. Generative AI is not used to write this blog.

That said.

Later in this post there will be a link to my code and a section about analysis / quality. These two items, and only these two items, were aided by Anthropic’s Claude. I’ve also added notes in this post at the exact points described. I am grumpy over the fact I need to understand this tool. Rather than produce excess waste, I used it to create something useful.

Absolutists can wander off after being told “You greatly underestimate what it is to live in America with health issues and what that means for survival as a programmer currently”.

Everyone else: I figured out some really interesting things related to electronic dictionaries. This post is all about my work within this topic.

Backstory

I’ve always been a reader at heart and I took to electronic reading devices of various forms like a fish takes to water. I switched to reading books electronically around 2002 and never looked back. I’m also not shy about reading ‘difficult texts’, especially now that a lot of electronic book readers include dictionary support.

The great thing about the dictionary feature in electronic book readers is I’m able to read older texts more easily and with better understanding. Over the years I have slowly worked my way backwards through time to the point I’m now seeing words that I don’t recognize, or that I know are being used in ways different than they are used in modern English texts. I’ve also gotten better at spotting when I may have a vocabulary gap. I no longer work from context exclusively. I work from context, then look up a word to verify my understanding and ‘read’ of the word. Interestingly, my read is rarely fully accurate. I’m close but there is almost always some missed, but important, nuance found when I look up the word in a dictionary.

This of course means that dictionary support in my electronic reading setup (KOReader and Xteink X4 currently) is a critical feature. Critical enough that I’m currently working on a robust dictionary feature for the Xteink X4 Crosspoint Reader firmware, as the current code does not have the feature and I found a previous attempt at adding one lacking. Clearly I’m not fucking about when it comes to dictionaries.

Dictionaries & A Problem

Historically I’ve used StarDict dictionaries from reader-dict. They use Wiktionary as a source and I find their dictionary files to be high quality. I really do recommend their free, monolingual dictionaries, particularly the English one. It’s a really good dictionary for reading texts written in Modern English, particularly texts written around and after the year 1900.

Personally, I’m closing in on the early decades of the 1800s with my reading and I’ll likely work backwards into the 1700s and further until I end up in the land of Middle English. When I get there I’ll need to learn to read a new language, as Middle (and Old) English is very different from modern English. I look forward to this day but for now I have a more immediate problem.

My immediate problem is that I noticed the Wiktionary definitions of words start to feel less robust, carry less nuance and sometimes don’t fit the context of the text when I read materials written in the mid-1800s and earlier. I can figure out the gaps, but I also know that knowing how a word may have developed across Old, Middle and Modern English provides important context for understanding. Basically, the quality of definitions has started to fade. They are still robust but I’m a reading nerd with autism; I tend to notice subtle shifts sooner than more normative folk.

I’ve also run into words that are outright missing from the Wiktionary Modern English dictionary. I’ve read enough linguistic blog posts (shout out to Dead Language Society for filling my brain with really fun facts and Actually Engaging reads) to know that some of these words are likely re-used from earlier forms of English. I’d like to know what these words mean beyond my understanding gleaned from context within a text. Even if that means I have to do a search in a Middle and/or Old English dictionary to be sure I’ve searched the whole of the English language corpus.

Fixing The Problem

Thankfully Wiktionary has Middle and Old English definitions that can be turned into StarDict dictionaries. Unfortunately for me, I could not find pre-made StarDict files for either Middle or Old English. I did dig around and came up with nothing. If you know of a set, let me know and I’ll update this post.

Since I couldn’t find a pre-made Middle or Old English dictionary, I spent the time to figure out how to convert Wiktionary Middle and Old English definitions into StarDict files I can use on my devices.

The reader-dict folk were kind enough to publish their sources on how to generate a monolingual StarDict file from Wiktionary data. With some monkey patching and haphazard programming (my sources are here; note: I had help from Anthropic’s Claude when writing this code) I was able to create usable StarDict files for Old, Middle and Modern English.
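For anyone curious what a “StarDict file” actually is: a dictionary is a small text .ifo metadata file, an .idx index of sorted headwords, and a .dict file of definition text (usually compressed to .dict.dz with dictzip). The sketch below writes those three files from a plain Python dict of headword → definition. It is not the reader-dict code or engrish.py; `write_stardict`, `stardict_sort_key`, the plain-text (`m`) entry type and the example entries are illustrative assumptions.

```python
import struct
from pathlib import Path


def stardict_sort_key(word: str) -> tuple[bytes, bytes]:
    # StarDict expects .idx entries ordered by g_ascii_strcasecmp, with strcmp as a
    # tie-breaker. bytes.lower() folds only ASCII A-Z, matching that behaviour.
    raw = word.encode("utf-8")
    return (raw.lower(), raw)


def write_stardict(base: Path, bookname: str, entries: dict[str, str]) -> None:
    """Write <base>.ifo, <base>.idx and an uncompressed <base>.dict."""
    idx = bytearray()
    dict_data = bytearray()
    for word in sorted(entries, key=stardict_sort_key):
        definition = entries[word].encode("utf-8")
        # .idx record: NUL-terminated UTF-8 headword, then the entry's offset and
        # size inside .dict as 32-bit big-endian integers.
        idx += word.encode("utf-8") + b"\0"
        idx += struct.pack(">II", len(dict_data), len(definition))
        dict_data += definition

    def path(ext: str) -> Path:
        return base.parent / (base.name + ext)

    path(".dict").write_bytes(bytes(dict_data))
    path(".idx").write_bytes(bytes(idx))
    ifo = (
        "StarDict's dict ifo file\n"
        "version=2.4.2\n"
        f"bookname={bookname}\n"
        f"wordcount={len(entries)}\n"
        f"idxfilesize={len(idx)}\n"
        # 'm' = plain UTF-8 text; with sametypesequence set, .dict entries carry
        # no per-field type byte or trailing NUL.
        "sametypesequence=m\n"
    )
    path(".ifo").write_text(ifo, encoding="utf-8")
```

Usage: `write_stardict(Path("out/ang-en"), "Old English", {"dæg": "day", "āc": "oak"})`, then `dictzip out/ang-en.dict` to produce the compressed .dict.dz most readers expect.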

I even figured out how to merge the three (Old, Middle, Modern) English dictionaries together into a monolithic dictionary. The merged dictionary entries show the Modern, Middle and Old English definitions as sections so I can see how the definition has changed over time. It’ll even omit headings if the word lacks a definition for Modern, Middle or Old English.
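The merge can be sketched as follows, assuming each layer has already been parsed into a {headword: definition} map keyed by the ISO codes used for the downloads. The heading text, section separator and `merge_layers` itself are hypothetical, not the engrish.py implementation; the point is that a heading is only emitted for layers that actually define the word.

```python
# Hypothetical layer order and headings; the real output format may differ.
LAYERS = (
    ("en", "Modern English"),
    ("enm", "Middle English"),
    ("ang", "Old English"),
)


def merge_layers(layers: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge per-layer {headword: definition} maps into one sectioned definition each."""
    headwords = set()
    for code, _ in LAYERS:
        headwords.update(layers.get(code, {}))
    merged = {}
    for word in sorted(headwords):
        sections = [
            f"{heading}\n{layers[code][word]}"
            for code, heading in LAYERS
            # Skip the heading entirely when this layer has no definition for the word.
            if word in layers.get(code, {})
        ]
        merged[word] = "\n\n".join(sections)
    return merged
```

Because every headword from every layer lands in `merged` exactly once, the merged headword count is the size of the union of the layers.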

On top of that, I was able to include the IPA phonetic characters in the definitions so you can learn to speak the word aloud. Note: Middle and Old English can lack this information, particularly Old English, as we just don’t know how some words sounded when spoken.

Problem. Solved.

The Dictionaries

To save y’all the time and pain of processing an absurd amount of text for a lengthy amount of time, I’ve included download links for each generated dictionary in this section. I’ve also included a ‘test document’: an epub that contains a number of different words so you can see how definitions look in the different dictionary files. I strongly recommend downloading this epub along with the dictionary/ies you want to use so you can quickly see how definitions will look with the dictionary you’ve selected.

Test epub

This is a combined test document that will let you look up words that are in all forms of English (Old + Middle + Modern) as well as words that are only used in a specific form of English (Old / Middle / Modern). I used this document for testing dictionary generation and I find it gives a good sense of what these dictionaries offer. I highly recommend downloading it.

test-ang_enm_en-en-20260301.epub

Note: this test document was generated by code Anthropic’s Claude wrote at my direction. This file’s generation is part of the source code linked above.

Conventions

The below dictionaries are named according to ISO language codes:

ang: Old English
enm: Middle English
en: Modern English

Additionally, file names may have an underscore (_) in them between two ISO language codes. This means the dictionary supports lookups across multiple forms of English. For example: ang_enm_en means the dictionary will allow you to look up Old English, Middle English and Modern English words.

Definitions in the dictionaries are written in Modern English, the same as Wiktionary. For multi-form dictionaries, all definitions across all English forms will be included as separate headings in a definition. If a word is not defined in a particular form of English, that heading is not included in multi-form definitions.

Files with noetym in the name have had the etymological information removed from definitions. This reduces file size and processing needs; these variants are meant for lower-resource devices and/or folk who would like more concise definitions.
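These conventions can be decoded mechanically. The sketch below assumes names follow the pattern seen in examples in this post (source forms joined by underscores, then the definition language, an optional -noetym marker and an eight-digit build date, as in en-en-noetym-20260301). That pattern is inferred rather than documented, so check it against the actual download names; `parse_dict_name` and `DictName` are hypothetical helpers.

```python
import re
from typing import NamedTuple

FORMS = {"ang": "Old English", "enm": "Middle English", "en": "Modern English"}

# Assumed naming pattern, inferred from names such as "en-en-noetym-20260301".
NAME_RE = re.compile(
    r"^(?P<source>[a-z]+(?:_[a-z]+)*)"  # one or more ISO codes joined by "_"
    r"-(?P<target>[a-z]+)"              # language the definitions are written in
    r"(?P<noetym>-noetym)?"             # optional: etymology stripped
    r"-(?P<date>\d{8})$"                # build date, YYYYMMDD
)


class DictName(NamedTuple):
    forms: list[str]
    definitions_in: str
    noetym: bool
    build_date: str


def parse_dict_name(stem: str) -> DictName:
    match = NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unrecognised dictionary name: {stem!r}")
    forms = [FORMS[code] for code in match["source"].split("_")]
    return DictName(forms, FORMS[match["target"]], match["noetym"] is not None, match["date"])
```

This only covers dictionary names; the test epub carries an extra `test-` prefix that would need stripping first.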

Single Form

These are the single form dictionaries. They only include one form of English.

Dual Form

These are the dual form dictionaries. They contain two forms of English.

Monolithic

This dictionary contains all three forms of English.

Analysis

Preface

The following sections were generated by Anthropic’s Claude at my direction.

The ‘Dictionary Analysis’ sub-section is an accounting of the dictionary data: how the dictionary files differ from one another, along with statistics.

The ‘Font Coverage Analysis’ section is an overview of common fonts used for electronic reading and their coverage for the text contained within the dictionaries. There are some very real, very meaningful gaps to consider when it comes to font selection. The dictionaries are usable despite font coverage gaps but it’s something worth noting and probably reviewing, even superficially.

Dictionary Analysis

Generated StarDict dictionaries derived from Wiktionary data via engrish.py. Date of dictionary build: 2026-03-01.

Dictionaries

ID	Full Name	Description
`ang`	Old English	Words from Old English (Anglo-Saxon), spoken roughly 450–1150 AD. The language of Beowulf and the earliest written English records. Grammatically very different from Modern English — heavily inflected with complex case endings.
`enm`	Middle English	Words from Middle English, spoken roughly 1150–1500 AD. The language of Chaucer. A transitional form — Old English grammar is simplifying, French and Latin vocabulary is flooding in after the Norman Conquest.
`en`	Modern English	Contemporary English as documented in Wiktionary. The baseline dictionary.
`ang+en`	Old English + Modern English	Combined dictionary containing all headwords from both `ang` and `en`. Words present in both layers appear once with merged entries.
`enm+en`	Middle English + Modern English	Combined dictionary containing all headwords from both `enm` and `en`.
`ang+enm+en`	Old + Middle + Modern English	The full historical stack — all three layers combined. The most comprehensive dictionary in this set.

Each dictionary also has a -noetym variant; see the noetym note below for additional detail.

Headword Counts

Dictionary	Primary Headwords	Synonym / Alt-Form Entries	Syn : Primary Ratio
`ang`	20,689	13,510	0.65
`enm`	41,711	3,228	0.08
`en`	900,726	507,296	0.56
`ang+en`	919,986	517,424	0.56
`enm+en`	936,074	499,014	0.53
`ang+enm+en`	954,566	509,345	0.53

The high synonym ratio in ang (0.65) reflects the inflected nature of Old English — many alternate spellings and grammatical forms are indexed as synonym entries pointing back to the canonical headword. Middle English (enm) has a notably low ratio (0.08), likely because its Wiktionary coverage is less complete rather than the language being less inflected.
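The headword and synonym counts, and the ratio in the last column, can be read straight from the binary files. Per the StarDict format, each .idx record is a NUL-terminated UTF-8 headword followed by a big-endian offset into .dict (32-bit unless the .ifo sets idxoffsetbits=64) and a 32-bit size; each .syn record is a NUL-terminated synonym followed by a 32-bit index into the .idx word list. This is a minimal sketch, not the script that produced the table; the function names are hypothetical.

```python
def count_idx_entries(idx: bytes, offset_bits: int = 32) -> int:
    """Count headwords in a StarDict .idx file.

    Each record is a NUL-terminated word, a 32- or 64-bit offset and a 32-bit size.
    """
    tail = offset_bits // 8 + 4
    count = pos = 0
    while pos < len(idx):
        # Skip past the word's NUL, then over the fixed-width offset and size.
        pos = idx.index(b"\0", pos) + 1 + tail
        count += 1
    return count


def count_syn_entries(syn: bytes) -> int:
    """Count .syn records: NUL-terminated synonym plus a 32-bit index into .idx."""
    count = pos = 0
    while pos < len(syn):
        pos = syn.index(b"\0", pos) + 1 + 4
        count += 1
    return count


def syn_to_primary_ratio(idx: bytes, syn: bytes, offset_bits: int = 32) -> float:
    return count_syn_entries(syn) / count_idx_entries(idx, offset_bits)
```

For ang, 13,510 synonym entries over 20,689 primary headwords works out to 0.653, shown as 0.65 in the table.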

File Sizes

Dictionary	`.dict` (raw)	`.dict.dz` (compressed)	`.idx`	`.syn`
`ang`	3.8 MB	1.2 MB	341.8 KB	228.6 KB
`enm`	6.3 MB	2.0 MB	664.8 KB	50.9 KB
`en`	177.5 MB	46.5 MB	17.5 MB	8.2 MB
`ang+en`	183.0 MB	48.3 MB	17.8 MB	8.6 MB
`enm+en`	183.5 MB	48.5 MB	18.0 MB	8.2 MB
`ang+enm+en`	189.4 MB	50.1 MB	18.3 MB	8.5 MB

Parts of Speech Distribution

Full corpus counts for all dictionaries.

POS	`ang`	`enm`	`en`	`ang+en`	`enm+en`	`ang+enm+en`
Noun	9,699	26,760	480,638	490,336	507,398	517,099
Adjective	2,976	6,046	182,059	185,036	188,105	191,081
Proper Noun	2,218	1,134	179,775	181,991	180,908	183,126
Verb	4,285	6,331	58,435	62,720	64,766	69,051
Adverb	911	2,732	27,114	28,026	29,847	30,758
Symbol	—	—	15,698	15,698	15,698	15,698
Interjection	—	—	4,683	4,714	4,757	4,788
Preposition	127	260	3,809	3,936	4,069	4,196
Prefix	148	—	2,359	2,507	2,435	2,583
Suffix	245	502	1,360	1,605	1,862	2,107
Pronoun	139	730	937	1,076	1,667	1,806
Numeral	272	302	—	1,017	1,047	1,319
Determiner	—	373	—	—	—	—
Contraction	—	—	849	—	—	—

Notable observations:

ang has a relatively high Verb share compared to Modern English — Old English had many strong/weak verb paradigms each warranting separate entries.
enm is noun-heavy even proportionally to its size, reflecting the influence of French borrowings (predominantly nouns) after 1066.
Symbol entries appear only in en and the combined dicts — these are non-alphabetic characters (mathematical symbols, currency, etc.) documented in Wiktionary. The count is identical across all three combined dicts (15,698), confirming symbols come entirely from the en layer.
Suffixes accumulate across layers: the combined ang+enm+en has the most suffix entries (2,107), reflecting all three layers’ morphological inventories.

Etymology Content

Etymology content was detected by searching for characteristic markers (Proto-, from Old, from Middle, Cognate with, Inherited from).
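Marker-based detection amounts to a substring test. The sketch below assumes case-sensitive matching on entry text with markup already stripped, which may not be exactly what the analysis did; the function names are hypothetical. It is a heuristic: an etymology phrased without any marker (for example “Borrowed from Latin…”) is missed.

```python
# Markers listed above; matched case-sensitively as plain substrings (an assumption).
ETYMOLOGY_MARKERS = ("Proto-", "from Old", "from Middle", "Cognate with", "Inherited from")


def has_etymology(entry_text: str) -> bool:
    """True if any marker appears anywhere in the entry's text."""
    return any(marker in entry_text for marker in ETYMOLOGY_MARKERS)


def etymology_share(entries: list[str]) -> float:
    """Percent of entries flagged as containing etymology."""
    return 100 * sum(has_etymology(e) for e in entries) / len(entries)
```

As a check on the arithmetic, 5,807 flagged entries out of 20,689 in ang is 28.07%, reported as 28.1%.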

Dictionary	Entries with Etymology	Total Entries	%
`ang`	5,807	20,689	28.1%
`enm`	5,445	41,711	13.1%
`en`	19,998	900,727	2.2%
`ang+en`	25,716	921,412	2.8%
`enm+en`	23,610	942,434	2.5%
`ang+enm+en`	29,328	963,123	3.0%

Note: total entry counts for the combined dictionaries exceed their headword counts (e.g. ang+en has 921,412 content entries vs 919,986 headwords). This occurs because some headwords carry separate content blocks sourced from each contributing language layer, stored as distinct entries under the same headword.

Old English entries are proportionally the richest in etymology (28%), reflecting the scholarly interest in tracing Proto-Germanic and Proto-Indo-European roots for ancient vocabulary. Modern English entries have sparse etymology coverage in Wiktionary (2.2%), likely because the sheer number of entries outpaces editorial effort. The combined dictionaries show a bump in etymology % as the historical entries (which have higher etymology density) are incorporated.

Script and Character Coverage

Headwords are overwhelmingly Latin-script, but the historical dictionaries include characters outside the basic ASCII range.

Script / Block	`ang`	`enm`	`en`
Latin-1 Supplement (à, æ, ð, þ, etc.)	6,058	1,777	6,941
Latin Extended (ā, ē, ī, ō, ū macrons, etc.)	79	1,370	3,518
Runic (ᚪ, ᛞ, ᛏ, etc.)	209	0	0
IPA Extensions	—	—	252
Greek/Coptic	1	—	289
Combining Diacritics	5	1	317

Runic characters in ang: Old English was sometimes written in the Runic alphabet (the futhorc) before the Latin alphabet became dominant. The 209 Runic character occurrences across ang headwords represent a small set of entries for individual runes (e.g., ᛞ dæg “day”, ᚪ āc “oak”) — these are the runes themselves treated as lexical items, not a parallel Runic transcription of all entries.

enm uses extended Latin characters heavily (macrons, yogh-derived forms), reflecting the varied spelling conventions of the Middle English period. en includes IPA (pronunciation transcriptions) and Greek (loanwords, scientific terminology).
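Python’s unicodedata module has no lookup for Unicode block names, so tallying characters per block needs a range table. The sketch below uses a hand-picked subset of the standard block ranges and a binary search. Counting character occurrences (rather than distinct headwords) is an assumption based on the Runic note above, and the table’s single “Latin Extended” row presumably combines Latin Extended-A and -B, which are kept separate here; `block_of` and `block_occurrences` are hypothetical helpers.

```python
import bisect
from collections import Counter

# A hand-picked subset of Unicode block ranges (inclusive), sorted by start.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x16A0, 0x16FF, "Runic"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]
_STARTS = [start for start, _, _ in BLOCKS]


def block_of(ch: str) -> str:
    """Name of the block containing ch, or "Other" if it is outside the table."""
    cp = ord(ch)
    i = bisect.bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "Other"


def block_occurrences(headwords) -> Counter:
    """Count character occurrences per block across all headwords."""
    return Counter(block_of(ch) for word in headwords for ch in word)
```

For example, yogh (ȝ, U+021D) falls in Latin Extended-B and the rune ᛞ (U+16DE) in Runic.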

Headword Overlap

Historical layers vs. Modern English

Overlap	Count	Notes
`ang` headwords unique to Old English	18,492	Not found in `enm` or `en`
`ang` headwords shared with Modern English (`en`)	1,429	Words that survived ~1,000 years
`enm` headwords unique to Middle English	34,580	Not found in `ang` or `en`
`enm` headwords shared with Modern English (`en`)	6,363	Words that survived from Middle English
`ang` ∩ `enm` (not in `en`)	768	Present in both old layers, lost in modern
`ang` ∩ `enm` ∩ `en`	588	Present across all three eras

Combined Dictionary Composition

The combined dictionaries are mathematically exact unions of their component layers — no headwords are added or lost in the merge.

Dictionary	Calculation	Expected	Actual
`ang+en`	ang (20,689) + en (900,726) − overlap (1,429)	919,986	919,986
`enm+en`	enm (41,711) + en (900,726) − overlap (6,363)	936,074	936,074
`ang+enm+en`	union of all three	954,566	954,566

The incremental contribution of each historical layer over the en baseline:

Adding ang → +19,260 headwords (+2.1%)
Adding enm → +35,348 headwords (+3.9%)
Adding both ang + enm → +53,840 headwords (+5.98%)
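These figures can be checked by inclusion-exclusion from the overlap table. The only derived input is the size of ang ∩ enm: the 768 headwords shared by the two historical layers but absent from en, plus the 588 present in all three. The sketch below reproduces the union sizes, the unique-headword counts and the incremental gains; variable names are illustrative.

```python
def union_of_three(a, b, c, ab, ac, bc, abc):
    """Size of A ∪ B ∪ C by inclusion-exclusion."""
    return a + b + c - ab - ac - bc + abc


# Headword counts and overlaps from the tables above.
ANG, ENM, EN = 20_689, 41_711, 900_726
ANG_EN, ENM_EN, ANG_ENM_EN = 1_429, 6_363, 588
ANG_ENM = 768 + ANG_ENM_EN  # "ang ∩ enm (not in en)" plus the three-way overlap

ang_en = ANG + EN - ANG_EN
enm_en = ENM + EN - ENM_EN
ang_enm_en = union_of_three(ANG, ENM, EN, ANG_ENM, ANG_EN, ENM_EN, ANG_ENM_EN)

# Headwords found in only one historical layer.
ang_only = ANG - (ANG_ENM + ANG_EN - ANG_ENM_EN)
enm_only = ENM - (ANG_ENM + ENM_EN - ANG_ENM_EN)

for label, size in (("ang", ang_en), ("enm", enm_en), ("ang + enm", ang_enm_en)):
    gain = size - EN
    print(f"Adding {label}: +{gain:,} headwords (+{100 * gain / EN:.2f}%)")
```

Running it prints `Adding ang: +19,260 headwords (+2.14%)`, `Adding enm: +35,348 headwords (+3.92%)` and `Adding ang + enm: +53,840 headwords (+5.98%)`, matching the list to the precision shown there; `ang_only` and `enm_only` come out at 18,492 and 34,580.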

The relatively small percentage gain reflects how thoroughly Modern English Wiktionary already covers vocabulary — the historical layers contribute depth (richer etymology, morphological variants) rather than breadth.

noetym Variants

Each dictionary has a -noetym companion (e.g., en-en-noetym-20260301). These are size-reduced versions intended for resource-constrained environments (older e-readers, smaller storage).

Comparison of en vs en-noetym:

File	Full	noetym	Reduction
`.dict` (raw)	177.5 MB	136.4 MB	−41.1 MB (23%)
`.dict.dz` (compressed)	46.5 MB	34.6 MB	−11.9 MB (26%)
`.idx`	17.5 MB	17.5 MB	0
`.syn`	8.2 MB	8.2 MB	0

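The .idx and .syn files are the same size between full and noetym variants because headword lists and synonym indexes are unchanged. (The .idx files are not strictly byte-identical: each .idx record stores its entry’s offset and size within .dict, and those values change when etymology is stripped.) Only the .dict content differs in substance: etymology paragraphs are stripped while definitions remain intact. The noetym variants are not analyzed separately in this document as they are structurally identical to their full counterparts.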

Font Coverage Analysis

Glyph and codepoint coverage analysis of common reading fonts against the text in the dictionary set. Generated from build date 2026-03-01.

Coverage is measured across three non-overlapping content categories:

Headwords — the index terms users look up, parsed from .idx files. Metric: percentage of headwords where every character is present in the font (fully renderable).
Entry Content — definition text, part-of-speech labels, usage notes. Sourced from -noetym .dict files, which exclude etymology. Metric: percentage of total character occurrences covered (frequency-weighted).
Etymology — the delta between full and -noetym .dict files: text present only in etymological content. Metric: percentage of total character occurrences covered (frequency-weighted).

Users running -noetym variants are only subject to Headword and Entry Content coverage.
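The two kinds of metric count different things, which a short sketch makes concrete. It assumes the font’s supported codepoints are already available as a set of integers (for example, the keys of the mapping returned by fontTools’ `TTFont(path).getBestCmap()`). Skipping whitespace in the weighted metric is an assumption, since the analysis does not say how whitespace was treated; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def headword_coverage(headwords: Iterable[str], supported: set[int]) -> float:
    """Percent of headwords whose every character is mapped by the font."""
    words = list(headwords)
    renderable = sum(all(ord(ch) in supported for ch in w) for w in words)
    return 100 * renderable / len(words)


def weighted_coverage(text: str, supported: set[int]) -> float:
    """Percent of character occurrences the font maps (frequency-weighted).

    Whitespace is skipped here; that choice is an assumption.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    covered = sum(n for ch, n in counts.items() if ord(ch) in supported)
    return 100 * covered / sum(counts.values())
```

A headword with one missing character counts as a total miss under `headword_coverage`, while the same character in running text costs only its share of occurrences under `weighted_coverage`. That is why, for the same font, headword figures can sit lower than entry content figures (enm: 96.8% vs 99.1% for several fonts).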

Fonts

Font	Category	Notes
Atkinson Hyperlegible Next	Accessibility	Designed for low-vision readers; optimised for character distinction
Lexica Ultralegible	Accessibility	Legibility-focused; similar goals to Atkinson
OpenDyslexic	Accessibility	Weighted letterforms designed to reduce dyslexic reading errors
Georgia	E-reader Staple	Serif, widely bundled on e-readers and operating systems
Garamond	E-reader Staple	Classical serif; narrow codepoint coverage
Palatino	E-reader Staple	Humanist serif; narrow codepoint coverage
Bookerly	E-reader Staple	Amazon Kindle’s primary reading font; broad Latin coverage
Caecilia	E-reader Staple	Slab serif; default on early Kindle devices
Charis SIL	Broad Coverage	Designed for linguists; full IPA and extended Latin support
DejaVu Sans	Broad Coverage	Wide Unicode coverage; common on Linux-based e-readers
Noto Sans	Broad Coverage	Google’s pan-Unicode font family

Headword Coverage

Percentage of headwords fully renderable (all characters present in font).

Font	`ang`	`enm`	`en`	`ang+en`	`enm+en`	`ang+enm+en`
Atkinson Hyperlegible Next	99.7%	96.8%	99.5%	99.5%	99.4%	99.4%
Lexica Ultralegible	99.7%	96.8%	99.5%	99.5%	99.4%	99.4%
OpenDyslexic	99.7%	96.8%	99.6%	99.6%	99.5%	99.5%
Georgia	99.7%	96.8%	99.5%	99.5%	99.4%	99.4%
Garamond	99.4%	96.8%	99.2%	99.2%	99.1%	99.1%
Palatino	99.7%	96.8%	99.4%	99.4%	99.3%	99.3%
Bookerly	99.8%	99.9%	99.6%	99.6%	99.6%	99.6%
Caecilia	99.7%	96.8%	99.5%	99.5%	99.4%	99.4%
Charis SIL	99.8%	99.9%	99.7%	99.7%	99.7%	99.7%
DejaVu Sans	99.8%	99.9%	99.8%	99.8%	99.8%	99.8%
Noto Sans	99.8%	100.0%	99.7%	99.7%	99.7%	99.7%

Headword coverage is high across all fonts. The enm column shows a cluster of fonts dropping to 96.8% — this reflects Middle English’s use of characters such as yogh (ȝ) and other extended forms that narrower fonts do not include. Noto Sans is the only font achieving 100% headword coverage for any dictionary (enm).

For the dictionaries that include ang, the small shortfall in all fonts is partly structural: ang contains 47 headwords that are Runic characters themselves (the Old English futhorc runes treated as lexical entries). No font in this set covers the Runic block; those 47 headwords will not render in any of these fonts without system fallback.

Entry Content Coverage

Percentage of character occurrences in definition text covered by the font (frequency-weighted; excludes etymology).

Font	`ang`	`enm`	`en`	`ang+en`	`enm+en`	`ang+enm+en`
Atkinson Hyperlegible Next	99.9%	99.1%	99.4%	100.0%	100.0%	100.0%
Lexica Ultralegible	100.0%	99.2%	99.4%	100.0%	100.0%	100.0%
OpenDyslexic	100.0%	99.8%	99.9%	100.0%	100.0%	100.0%
Georgia	100.0%	99.1%	99.4%	100.0%	100.0%	100.0%
Garamond	99.7%	99.1%	99.4%	100.0%	100.0%	100.0%
Palatino	99.8%	99.1%	99.4%	100.0%	100.0%	100.0%
Bookerly	100.0%	99.6%	99.7%	100.0%	100.0%	100.0%
Caecilia	100.0%	99.1%	99.4%	100.0%	100.0%	100.0%
Charis SIL	100.0%	99.9%	99.9%	100.0%	100.0%	100.0%
DejaVu Sans	100.0%	99.9%	99.9%	100.0%	100.0%	100.0%
Noto Sans	100.0%	99.9%	99.9%	100.0%	100.0%	100.0%

Entry content coverage is excellent across all fonts. The combined dictionaries (ang+en, enm+en, ang+enm+en) round to 100.0% for all fonts — the historical content is small enough by character volume relative to the Modern English base that any residual gaps are diluted to sub-rounding thresholds.

The enm standalone dictionary shows a consistent ~99.1% floor for narrower fonts, reflecting Middle English characters absent from those fonts appearing in definition text. The sub-100% figures for en across several fonts are primarily due to IPA characters used in pronunciation guides within entries (see IPA section below).

There is a bounded rendering gap in ang entry content: 30 Latin-script headwords carry a secondary sense that names a runic character, referencing the rune glyph inline (e.g. “the runic character ᛞ (/d/)”). No font in this set covers Runic; those 30 inline glyphs will not render without system fallback. The frequency impact is negligible in the coverage percentages above but the visual gap will be apparent to readers looking up those entries.

Etymology Coverage

Percentage of character occurrences in etymological text covered by the font (frequency-weighted). Relevant only for full (non-noetym) dictionary variants.

Font	`ang`	`enm`	`en`	`ang+en`	`enm+en`	`ang+enm+en`
Atkinson Hyperlegible Next	99.2%	99.4%	98.9%	98.9%	98.9%	98.9%
Lexica Ultralegible	99.5%	99.8%	99.4%	99.4%	99.4%	99.4%
OpenDyslexic	99.7%	99.9%	99.6%	99.6%	99.6%	99.6%
Georgia	99.5%	99.8%	99.5%	99.5%	99.5%	99.5%
Garamond	98.3%	98.8%	98.8%	98.8%	98.8%	98.8%
Palatino	99.1%	99.4%	98.9%	98.9%	98.9%	98.9%
Bookerly	99.7%	99.9%	99.6%	99.6%	99.6%	99.6%
Caecilia	99.4%	99.5%	98.9%	99.0%	99.0%	99.0%
Charis SIL	99.6%	99.8%	99.3%	99.3%	99.3%	99.3%
DejaVu Sans	99.7%	100.0%	99.8%	99.8%	99.8%	99.8%
Noto Sans	99.7%	99.9%	99.7%	99.7%	99.7%	99.7%

Etymology coverage is consistently high but no font achieves 100% across all dictionaries. Etymological text makes heavier use of reconstructed Proto-Germanic and Proto-Indo-European forms, cognate examples in other languages (German, Dutch, Gothic, Old Norse), and specialised linguistic notation — all of which draw on a wider codepoint range than plain definition text. Garamond shows the lowest floor (98.3% for ang), consistent with its narrow overall codepoint inventory.

A small set of Runic characters (7 codepoints) appears exclusively in ang etymological text, not in headwords or definitions. No font covers these; the rendering gap affects only etymological content in the full (non-noetym) variants that include ang.

Unicode Block Coverage

Coverage of each Unicode block present in the dictionary content, by font. Values are percentage of codepoints within that block that the font supports. Blocks are ordered by number of distinct codepoints present in the dictionaries.

Block	Codepoints in dicts	Atkinson	Lexica	OpenDyslexic	Georgia	Garamond	Palatino	Bookerly	Caecilia	Charis	DejaVu	Noto
Basic Latin	96	98%	99%	99%	99%	99%	99%	99%	99%	99%	99%	99%
Latin-1 Supplement	93	94%	100%	100%	100%	94%	100%	100%	100%	100%	100%	100%
Latin Extended	203	38%	57%	73%	50%	0%	33%	90%	46%	100%	100%	100%
Latin Extended Additional	116	3%	53%	71%	3%	0%	0%	100%	0%	100%	100%	100%
Latin Extended-D	58	0%	0%	3%	0%	0%	0%	0%	0%	95%	26%	95%
IPA Extensions	90	0%	6%	88%	0%	0%	0%	11%	0%	100%	100%	100%
Phonetic Extensions	101	0%	0%	11%	0%	0%	0%	0%	0%	100%	84%	100%
Spacing Modifier Letters	74	1%	12%	42%	12%	0%	12%	31%	12%	100%	82%	100%
Combining Diacritics	71	21%	23%	65%	11%	0%	0%	42%	0%	97%	93%	100%
Greek and Coptic	103	4%	63%	85%	63%	0%	1%	68%	4%	20%	99%	93%
Cyrillic	141	0%	0%	88%	62%	45%	0%	86%	0%	86%	100%	100%
General Punctuation	68	25%	25%	72%	31%	22%	24%	76%	24%	85%	97%	100%
Superscripts and Subscripts	38	0%	3%	50%	47%	0%	3%	47%	0%	100%	79%	100%
Number Forms	47	0%	0%	66%	9%	0%	0%	68%	0%	100%	79%	40%
Currency Symbols	34	6%	6%	59%	32%	0%	3%	12%	3%	97%	65%	97%
Letterlike Symbols	60	5%	5%	17%	10%	2%	2%	17%	5%	13%	93%	100%
Arrows	61	0%	0%	25%	0%	0%	0%	23%	0%	30%	100%	0%
Mathematical Operators	220	6%	6%	7%	6%	0%	6%	25%	6%	18%	100%	0%
Runic	44	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
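Unlike the frequency-weighted entry content metric, this table counts each distinct codepoint once, however often it occurs. A minimal sketch, assuming the dictionaries’ distinct codepoints and the font’s mapped codepoints are available as sets of integers; `block_coverage` and `RUNIC` are hypothetical names. Because it is unweighted, a font can score 0% on a block such as IPA Extensions and still reach 99%+ weighted entry coverage, as Atkinson Hyperlegible Next does for en.

```python
from typing import Optional


def block_coverage(dict_codepoints: set[int], font_codepoints: set[int],
                   start: int, end: int) -> Optional[float]:
    """Percent of the distinct dictionary codepoints in [start, end] the font maps.

    Each codepoint counts once, however often it occurs in the text.
    """
    in_block = {cp for cp in dict_codepoints if start <= cp <= end}
    if not in_block:
        return None  # the dictionaries use nothing from this block
    return 100 * len(in_block & font_codepoints) / len(in_block)


RUNIC = (0x16A0, 0x16FF)  # Unicode Runic block
```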

IPA Extensions

90 distinct IPA codepoints are present in the dictionaries, used primarily in pronunciation guides within en entry content. IPA also appears in ang entry content where rune definitions include phonetic values.

Font	IPA Coverage
Atkinson Hyperlegible Next	0% (0/90)
Lexica Ultralegible	6% (5/90)
OpenDyslexic	88% (79/90)
Georgia	0% (0/90)
Garamond	0% (0/90)
Palatino	0% (0/90)
Bookerly	11% (10/90)
Caecilia	0% (0/90)
Charis SIL	100% (90/90)
DejaVu Sans	100% (90/90)
Noto Sans	100% (90/90)

IPA coverage is a clear dividing line. Five fonts (Atkinson Hyperlegible Next, Georgia, Garamond, Palatino and Caecilia) provide no IPA support at all, and two more (Bookerly at 11% and Lexica Ultralegible at 6%) provide very little. Readers using these seven fonts will encounter rendering gaps wherever pronunciation is given inline. The three broad-coverage fonts (Charis, DejaVu, Noto) and OpenDyslexic cover IPA well; Charis SIL was specifically designed for this use case.

Runic

No font in this set covers the Runic Unicode block. The 44 Runic codepoints present in the dictionaries are confined to ang (and by extension the combined dictionaries that include ang). All Runic rendering depends entirely on system font fallback.

Runic content appears in three contexts:

47 headwords that are the Old English futhorc runes themselves, treated as lexical entries in ang
30 Latin-script headword entries that carry a secondary sense naming a runic character, with the rune glyph inline in the definition text
7 Runic codepoints that appear exclusively in etymological content in ang (coverage gap noted; affects full variant only)

For dictionaries that do not include ang (enm, en, enm+en) there is no Runic content and this does not apply.

Practical Rendering Notes

en (Modern English only) All fonts render headwords at 99%+. The primary gap is IPA in entry content: five fonts have 0% IPA coverage and two more cover only a handful of IPA codepoints, so pronunciation guides will not render fully. Users who do not rely on inline pronunciation notation are not meaningfully affected. For full IPA rendering, Charis SIL, DejaVu Sans and Noto Sans cover all 90 IPA codepoints; OpenDyslexic covers most of them (88%).

enm (Middle English only) Several fonts drop to 96.8% headword coverage due to Middle English characters (notably yogh) outside their Latin range. Bookerly, Charis SIL, DejaVu Sans, and Noto Sans hold at 99.9–100%. All three accessibility fonts (Atkinson, Lexica, OpenDyslexic) and the narrower e-reader staples (Garamond, Palatino, Caecilia, Georgia) show this gap.

ang (Old English only) Headword coverage is near-universal for Latin-script headwords across all fonts. The Runic gap is absolute: 47 headwords and 30 inline definition glyphs will not render in any of these fonts. This is a known constraint of the font set, not of the dictionary. Otherwise, entry content coverage is 99.7%+ for every font, and etymology coverage is 99%+ for every font except Garamond (98.3%).

Combined dictionaries (ang+en, enm+en, ang+enm+en) Coverage figures largely mirror en: the Modern English content dominates by volume and dilutes any historical-layer gaps in the frequency-weighted metrics. Headword coverage is 99%+ across all fonts. The Runic and IPA gaps described above apply where the relevant layers are present. Users of -noetym variants of any combined dictionary will see 100.0% entry content coverage (as rounded) for every font in this set.

Tech Reading