Changelog

2026-03-30
Initial publication

A Critical Note

I live in America, I’m a career programmer and I have horrific health problems.

Normally I’d assume folk know what that means but I need to be explicit: this means my survival requires me to have a practical, useful and ‘reasonable’ understanding of Generative AI. Particularly models that programmers use.

I don’t like it. I wish more humans would recognize the horrific social and environmental costs that come with these things. I loathe the fact my survival is attached to such things.

Unfortunately, my survival. Again: survival. I cannot harp on survival enough. Requires me to engage with these tools.

This blog is 100% human generated content. I like writing. I look forward to when I have something useful to write up on this blog. I like sharing my writing with fellow weirdos, nerds and randos. Generative AI is not used to write this blog.

That said.

Later in this blog there will be a link to my code and a section about analysis / quality. These two items, and only these two items, were aided by Anthropic’s Claude. I’ve also added notes in this post at the exact points described. I am grumpy over the fact I need to understand this tool. Rather than produce excess waste, I used it to create something useful.

Absolutists can wander off after being told “You greatly underestimate what it is to live in America with health issues and what that means for survival as a programmer currently”.

Everyone else: I figured out some really interesting things related to electronic dictionaries. This post is all about my work within this topic.

Backstory

I’ve always been a reader at heart and I took to electronic reading devices of various forms like a fish takes to water. I switched to reading books electronically around 2002 and never looked back. I’m also not shy about reading ‘difficult texts’, especially now that a lot of electronic book readers include dictionary support.

The great thing about the dictionary feature in electronic book readers is I’m able to read older texts more easily and with better understanding. Over the years I have slowly worked my way backwards through time to the point I’m now seeing words that I don’t recognize or I know are being used in ways different than they are used in modern English texts. I’ve also gotten better at spotting when I may have a vocabulary gap. I no longer work from context exclusively. I work from context then look up a word to verify my understanding and ‘read’ of the word. Interestingly, my read is rarely fully accurate. I’m close but there is almost always some missed, but important, nuance found when I look up the word in a dictionary.

This of course means that dictionary support in my electronic reading setup (KOReader and Xteink X4 currently) is a critical feature. Critical enough that I’m currently working on a robust dictionary feature for the Xteink X4 Crosspoint Reader firmware as the current code does not have the feature and a previous attempt at adding a dictionary feature I found lacking. Clearly I’m not fucking about when it comes to dictionaries.

Dictionaries & A Problem

Historically I’ve used StarDict dictionaries from reader-dict. They use Wiktionary as a source and I find their dictionary files to be high quality. I really do recommend their free, monolingual dictionaries; particularly the English one. It’s a really good dictionary for reading texts written in Modern English. Particularly texts written around and after the year 1900.

Personally, I’m closing in on the early decades of the 1800’s with my reading and I’ll likely work backwards into the 1700’s and further until I end up in the land of Middle English. When I get there I’ll need to learn to read a new language as Middle (and Old) English is very different from modern English. I look forward to this day but for now I have a more immediate problem.

My immediate problem is that I noticed the Wiktionary definitions of words start to feel less robust, carry less nuance and sometimes don’t fit the context of the text when reading materials written in the mid 1800’s and earlier. I can figure out the gaps but I also know having some context on how a word may have developed across Old, Middle and Modern English provides important context for understanding. Basically the quality of definitions has started to fade. They are still robust but I’m a reading nerd with autism; I tend to notice subtle shifts sooner than more normative folk.

I’ve also run into words that are outright missing from the Wiktionary Modern English dictionary. I’ve read enough linguistic blog posts (shout out to Dead Language Society for filling my brain with really fun facts and Actually Engaging reads) to know that some of these words are likely re-used from earlier forms of English. I’d like to know what these words mean beyond my understanding gleaned from context within a text. Even if that means I have to do a search in a Middle and/or Old English dictionary to be sure I’ve searched the whole of the English language corpus.

Fixing The Problem

Thankfully Wiktionary has Middle and Old English definitions that can be turned into StarDict dictionaries. Unfortunately for me, I could not find pre-made StarDict files for either Middle or Old English. I did dig around and came up with nothing. If you know of a set, let me know and I’ll update this post.

Since I couldn’t find a pre-made Middle or Old English dictionary, I spent the time to figure out how to convert Wiktionary Middle and Old English definitions into StarDict files I can use on my devices.

The reader-dict folk were kind enough to publish their sources on how to generate a monolingual StarDict file from Wiktionary data. With some monkey patching and haphazard programming (my sources are here; note: I had help from Anthropic’s Claude when writing this code) I was able to create usable StarDict files for Old, Middle and Modern English.

I even figured out how to merge the three (Old, Middle, Modern) English dictionaries together into a monolithic dictionary. The merged dictionary entries show the Modern, Middle and Old English definitions as sections so I can see how the definition has changed over time. It’ll even omit headings if the word lacks a definition for Modern, Middle or Old English.
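As a rough illustration of the merge described above (a sketch only, not the code from the linked sources), the idea can be expressed in a few lines of Python. The `ERAS` list, the `merge_entry` function and the dict-of-dicts layout are all hypothetical names chosen for this example.

```python
# Hypothetical sketch: build one merged definition from per-era layers.
# Section order follows the post: Modern, then Middle, then Old English.
ERAS = [
    ("en", "Modern English"),
    ("enm", "Middle English"),
    ("ang", "Old English"),
]

def merge_entry(word, layers):
    """`layers` maps an era code to a {headword: definition} dict."""
    sections = []
    for code, heading in ERAS:
        definition = layers.get(code, {}).get(word)
        if definition:  # omit the heading when this era has no definition
            sections.append(f"{heading}\n{definition}")
    return "\n\n".join(sections)

# Toy data, invented for the example.
layers = {
    "en": {"day": "A period of 24 hours."},
    "enm": {"day": "day; daytime"},
    "ang": {"dæg": "day"},
}

print(merge_entry("day", layers))
```

With this toy data the entry for "day" gets a Modern and a Middle English section and no Old English heading, because the Old English layer only knows the word as "dæg".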

On top of that, I was able to include the IPA phonetic characters in the definitions so you can learn to speak the word aloud. Note: Middle and Old English can lack this information, particularly Old English, as we just don’t know how some words sounded when spoken.

Problem. Solved.

The Dictionaries

To save y’all the time and pain of processing an absurd amount of text for a lengthy amount of time, I’ve included download links for each generated dictionary in this section. I’ve also included a ‘test document’. It’s an epub that contains a number of different words that will allow you to see how the different definitions look using the different dictionary files. I strongly recommend downloading this epub along with the dictionary/ies you want to use so you can quickly see how definitions will look with the dictionary you’ve selected.

Test epub

This is a combined test document that will let you look up words that are in all forms of English (Old + Middle + Modern) as well as words that are only used in a specific form of English (Old / Middle / Modern). I used this document for testing dictionary generation and find it does a good job of giving a sense of what these dictionaries offer. I highly recommend downloading it.

test-ang_enm_en-en-20260301.epub

Note: this test document was generated by code Anthropic’s Claude wrote at my direction. This file’s generation is part of the source code linked above.

Conventions

The below dictionaries are named according to ISO language codes:

  • ang: Old English
  • enm: Middle English
  • en: Modern English

Additionally, file names may have an underscore (_) in them between two ISO language codes. This means the dictionary supports lookups across multiple forms of English. For example: ang_enm_en means the dictionary will allow you to look up Old English, Middle English and Modern English words.

Definitions in the dictionaries are written in Modern English, the same as Wiktionary. For multi-form dictionaries, all definitions across all English forms will be included as separate headings in a definition. If a word is not defined in a particular form of English, that heading is not included in multi-form definitions.

Files with noetym in the name have had the etymological information removed from definitions. This helps reduce size, processing needs, etc. These files are meant for lower resource devices and/or folk who would like more concise definitions.

Single Form

These are the single form dictionaries. They only include one form of English.

Dual Form

These are the dual form dictionaries. They contain two forms of English.

Monolithic

This dictionary contains all three forms of English.

Analysis

Preface

The following sections were generated by Anthropic’s Claude at my direction.

The ‘Dictionary Analysis’ sub-section is an accounting of the dictionary data: statistics, plus some of the differences between the dictionary files.

The ‘Font Coverage Analysis’ section is an overview of common fonts used for electronic reading and their coverage for the text contained within the dictionaries. There are some very real, very meaningful gaps to consider when it comes to font selection. The dictionaries are usable despite font coverage gaps but it’s something worth noting and probably reviewing, even superficially.

Dictionary Analysis

Generated StarDict dictionaries derived from Wiktionary data via engrish.py. Date of dictionary build: 2026-03-01.

Dictionaries

| ID | Full Name | Description |
|---|---|---|
| ang | Old English | Words from Old English (Anglo-Saxon), spoken roughly 450–1150 AD. The language of Beowulf and the earliest written English records. Grammatically very different from Modern English: heavily inflected with complex case endings. |
| enm | Middle English | Words from Middle English, spoken roughly 1150–1500 AD. The language of Chaucer. A transitional form: Old English grammar is simplifying, French and Latin vocabulary is flooding in after the Norman Conquest. |
| en | Modern English | Contemporary English as documented in Wiktionary. The baseline dictionary. |
| ang+en | Old English + Modern English | Combined dictionary containing all headwords from both ang and en. Words present in both layers appear once with merged entries. |
| enm+en | Middle English + Modern English | Combined dictionary containing all headwords from both enm and en. |
| ang+enm+en | Old + Middle + Modern English | The full historical stack: all three layers combined. The most comprehensive dictionary in this set. |

Each dictionary also has a -noetym variant; see the noetym note below for additional detail.

Headword Counts

| Dictionary | Primary Headwords | Synonym / Alt-Form Entries | Syn : Primary Ratio |
|---|---|---|---|
| ang | 20,689 | 13,510 | 0.65 |
| enm | 41,711 | 3,228 | 0.08 |
| en | 900,726 | 507,296 | 0.56 |
| ang+en | 919,986 | 517,424 | 0.56 |
| enm+en | 936,074 | 499,014 | 0.53 |
| ang+enm+en | 954,566 | 509,345 | 0.53 |

The high synonym ratio in ang (0.65) reflects the inflected nature of Old English — many alternate spellings and grammatical forms are indexed as synonym entries pointing back to the canonical headword. Middle English (enm) has a notably low ratio (0.08), likely because its Wiktionary coverage is less complete rather than the language being less inflected.
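For readers who want to reproduce counts like these, here is a minimal sketch of counting entries in a StarDict .idx file and computing the ratio. It assumes the common 32-bit index layout (a NUL-terminated UTF-8 headword followed by a 4-byte offset and a 4-byte size, both big-endian); `count_idx_entries` and `idx_entry` are hypothetical helper names, and the sample bytes are built in memory rather than read from a real dictionary.

```python
import struct

def count_idx_entries(data: bytes) -> int:
    """Count entries in raw .idx bytes (assumes 32-bit offsets)."""
    count, pos = 0, 0
    while pos < len(data):
        end = data.index(b"\x00", pos)  # headword ends at the first NUL
        pos = end + 1 + 8               # skip NUL, 4-byte offset, 4-byte size
        count += 1
    return count

def idx_entry(word: str, offset: int, size: int) -> bytes:
    """Build one index record; used here only to fabricate sample data."""
    return word.encode("utf-8") + b"\x00" + struct.pack(">II", offset, size)

sample = idx_entry("dæg", 0, 10) + idx_entry("āc", 10, 7)
print(count_idx_entries(sample))   # 2

# Ratio from the ang row of the table: synonym entries / primary headwords.
print(round(13_510 / 20_689, 2))   # 0.65
```

A dictionary built with 64-bit offsets would need a 12-byte skip instead of 8; the .ifo file states which layout is in use.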

File Sizes

| Dictionary | .dict (raw) | .dict.dz (compressed) | .idx | .syn |
|---|---|---|---|---|
| ang | 3.8 MB | 1.2 MB | 341.8 KB | 228.6 KB |
| enm | 6.3 MB | 2.0 MB | 664.8 KB | 50.9 KB |
| en | 177.5 MB | 46.5 MB | 17.5 MB | 8.2 MB |
| ang+en | 183.0 MB | 48.3 MB | 17.8 MB | 8.6 MB |
| enm+en | 183.5 MB | 48.5 MB | 18.0 MB | 8.2 MB |
| ang+enm+en | 189.4 MB | 50.1 MB | 18.3 MB | 8.5 MB |

Parts of Speech Distribution

Full corpus counts for all dictionaries.

POS ang enm en ang+en enm+en ang+enm+en
Noun 9,699 26,760 480,638 490,336 507,398 517,099
Adjective 2,976 6,046 182,059 185,036 188,105 191,081
Proper Noun 2,218 1,134 179,775 181,991 180,908 183,126
Verb 4,285 6,331 58,435 62,720 64,766 69,051
Adverb 911 2,732 27,114 28,026 29,847 30,758
Symbol 15,698 15,698 15,698 15,698
Interjection 4,683 4,714 4,757 4,788
Preposition 127 260 3,809 3,936 4,069 4,196
Prefix 148 2,359 2,507 2,435 2,583
Suffix 245 502 1,360 1,605 1,862 2,107
Pronoun 139 730 937 1,076 1,667 1,806
Numeral 272 302 1,017 1,047 1,319
Determiner 373
Contraction 849

Notable observations:

  • ang has a relatively high Verb share compared to Modern English — Old English had many strong/weak verb paradigms each warranting separate entries.
  • enm is noun-heavy even proportionally to its size, reflecting the influence of French borrowings (predominantly nouns) after 1066.
  • Symbol entries appear only in en and the combined dicts — these are non-alphabetic characters (mathematical symbols, currency, etc.) documented in Wiktionary. The count is identical across all three combined dicts (15,698), confirming symbols come entirely from the en layer.
  • Suffixes accumulate across layers: the combined ang+enm+en has the most suffix entries (2,107), reflecting all three layers’ morphological inventories.

Etymology Content

Etymology content was detected by searching for characteristic markers (Proto-, from Old, from Middle, Cognate with, Inherited from).

| Dictionary | Entries with Etymology | Total Entries | % |
|---|---|---|---|
| ang | 5,807 | 20,689 | 28.1% |
| enm | 5,445 | 41,711 | 13.1% |
| en | 19,998 | 900,727 | 2.2% |
| ang+en | 25,716 | 921,412 | 2.8% |
| enm+en | 23,610 | 942,434 | 2.5% |
| ang+enm+en | 29,328 | 963,123 | 3.0% |

Note: total entry counts for the combined dictionaries exceed their headword counts (e.g. ang+en has 921,412 content entries vs 919,986 headwords). This occurs because some headwords carry separate content blocks sourced from each contributing language layer, stored as distinct entries under the same headword.

Old English entries are proportionally the richest in etymology (28%), reflecting the scholarly interest in tracing Proto-Germanic and Proto-Indo-European roots for ancient vocabulary. Modern English entries have sparse etymology coverage in Wiktionary (2.2%), likely because the sheer number of entries outpaces editorial effort. The combined dictionaries show a bump in etymology % as the historical entries (which have higher etymology density) are incorporated.
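The marker-based detection described at the top of this section can be sketched as a simple substring test. This is an assumption about how such a check might look, not the analysis script itself: the marker list is copied from the text, but the case-sensitive matching, the `has_etymology` and `etymology_share` names, and the sample entries are all invented for illustration.

```python
# Markers listed in the text; matching is case-sensitive here (an assumption).
MARKERS = ("Proto-", "from Old", "from Middle", "Cognate with", "Inherited from")

def has_etymology(entry_text: str) -> bool:
    """True when the entry contains at least one etymology marker."""
    return any(marker in entry_text for marker in MARKERS)

def etymology_share(entries) -> float:
    """Percentage of entries flagged as carrying etymology, to one decimal."""
    flagged = sum(1 for text in entries if has_etymology(text))
    return round(100 * flagged / len(entries), 1)

# Toy entries, not taken from the real dictionaries.
entries = [
    "From Proto-Germanic *dagaz.",
    "A tree.",
    "Inherited from Old English āc.",
]
print(etymology_share(entries))  # 66.7
```

A heuristic like this undercounts entries whose etymology uses none of the markers, so the percentages are best read as lower bounds.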

Script and Character Coverage

Headwords are overwhelmingly Latin-script, but the historical dictionaries include characters outside the basic ASCII range.

Script / Block ang enm en
Latin-1 Supplement (à, æ, ð, þ, etc.) 6,058 1,777 6,941
Latin Extended (ā, ē, ī, ō, ū macrons, etc.) 79 1,370 3,518
Runic (ᚪ, ᛞ, ᛏ, etc.) 209 0 0
IPA Extensions 252
Greek/Coptic 1 289
Combining Diacritics 5 1 317

Runic characters in ang: Old English was sometimes written in the Runic alphabet (the futhorc) before the Latin alphabet became dominant. The 209 Runic character occurrences across ang headwords represent a small set of entries for individual runes (e.g., ᛞ dæg “day”, ᚪ āc “oak”) — these are the runes themselves treated as lexical items, not a parallel Runic transcription of all entries.

enm uses extended Latin characters heavily (macrons, yogh-derived forms), reflecting the varied spelling conventions of the Middle English period. en includes IPA (pronunciation transcriptions) and Greek (loanwords, scientific terminology).

Headword Overlap

Historical layers vs. Modern English
| Overlap | Count | Notes |
|---|---|---|
| ang headwords unique to Old English | 18,492 | Not found in enm or en |
| ang headwords shared with Modern English (en) | 1,429 | Words that survived ~1,000 years |
| enm headwords unique to Middle English | 34,580 | Not found in ang or en |
| enm headwords shared with Modern English (en) | 6,363 | Words that survived from Middle English |
| ang ∩ enm (not in en) | 768 | Present in both old layers, lost in modern |
| ang ∩ enm ∩ en | 588 | Present across all three eras |
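The overlap categories above map directly onto set operations. The sketch below uses three tiny, made-up headword sets purely to show which operation corresponds to which row; the real analysis would load the headwords from each dictionary's .idx file.

```python
# Toy headword sets, invented for illustration only.
ang = {"dæg", "hand", "wer"}
enm = {"day", "hand", "wer"}
en = {"day", "hand"}

ang_only = ang - enm - en            # unique to Old English
ang_shared_with_en = ang & en        # shared with Modern English
enm_only = enm - ang - en            # unique to Middle English
old_layers_only = (ang & enm) - en   # in both old layers, lost in modern
all_three = ang & enm & en           # present across all three eras

print(sorted(ang_only))         # ['dæg']
print(sorted(old_layers_only))  # ['wer']
print(sorted(all_three))        # ['hand']
```

Note that "shared with Modern English" includes the words found in all three layers, which is why the rows of the table overlap rather than partition the headwords.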

Combined Dictionary Composition

The combined dictionaries are mathematically exact unions of their component layers — no headwords are added or lost in the merge.

| Dictionary | Calculation | Expected | Actual | Delta |
|---|---|---|---|---|
| ang+en | ang (20,689) + en (900,726) − overlap (1,429) | 919,986 | 919,986 | 0 |
| enm+en | enm (41,711) + en (900,726) − overlap (6,363) | 936,074 | 936,074 | 0 |
| ang+enm+en | union of all three | 954,566 | 954,566 | 0 |

The incremental contribution of each historical layer over the en baseline:

  • Adding ang: +19,260 headwords (+2.1%)
  • Adding enm: +35,348 headwords (+3.9%)
  • Adding both ang + enm: +53,840 headwords (+5.98%)

The relatively small percentage gain reflects how thoroughly Modern English Wiktionary already covers vocabulary — the historical layers contribute depth (richer etymology, morphological variants) rather than breadth.
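The figures in this section can be cross-checked with inclusion-exclusion, using only numbers reported in the tables above. The variable names are illustrative; the one derived value is the ang/enm overlap, which is the 768 headwords found only in the two old layers plus the 588 found in all three.

```python
# Headword counts and overlaps as reported above.
ang, enm, en = 20_689, 41_711, 900_726
ang_en, enm_en = 1_429, 6_363           # overlaps with Modern English
ang_enm = 768 + 588                     # old-layers-only plus all-three
all_three = 588

# Inclusion-exclusion for the three-way union.
union = ang + enm + en - ang_en - enm_en - ang_enm + all_three
print(union)                            # 954566

# Incremental gains over the en baseline.
gain_ang = ang - ang_en
gain_enm = enm - enm_en
gain_both = union - en
print(gain_ang, round(100 * gain_ang / en, 1))     # 19260 2.1
print(gain_enm, round(100 * gain_enm / en, 1))     # 35348 3.9
print(gain_both, round(100 * gain_both / en, 2))   # 53840 5.98
```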


noetym Variants

Each dictionary has a -noetym companion (e.g., en-en-noetym-20260301). These are size-reduced versions intended for resource-constrained environments (older e-readers, smaller storage).

Comparison of en vs en-noetym:

| File | Full | noetym | Reduction |
|---|---|---|---|
| .dict (raw) | 177.5 MB | 136.4 MB | −41.1 MB (23%) |
| .dict.dz (compressed) | 46.5 MB | 34.6 MB | −11.9 MB (26%) |
| .idx | 17.5 MB | 17.5 MB | 0 |
| .syn | 8.2 MB | 8.2 MB | 0 |

The .idx and .syn files are byte-identical between full and noetym variants — headword lists and synonym indexes are unchanged. Only the .dict content differs: etymology paragraphs are stripped while definitions remain intact. The noetym variants are not analyzed separately in this document as they are structurally identical to their full counterparts.
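One way to confirm a byte-identical claim like this is to compare file digests. The sketch below is a generic check, not part of the build tooling; `sha256_of` and `same_bytes` are hypothetical helpers, and the two temporary files stand in for a full and a noetym .idx file.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_bytes(path_a: str, path_b: str) -> bool:
    """True when both files have identical content."""
    return sha256_of(path_a) == sha256_of(path_b)

# Stand-in files with identical content (names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    full_idx = os.path.join(tmp, "full.idx")
    noetym_idx = os.path.join(tmp, "noetym.idx")
    for path in (full_idx, noetym_idx):
        with open(path, "wb") as handle:
            handle.write(b"day\x00" + bytes(8))
    identical = same_bytes(full_idx, noetym_idx)

print(identical)  # True
```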

Font Coverage Analysis

Glyph and codepoint coverage analysis of fonts in the dictionary set. Generated from build date 2026-03-01.

Coverage is measured across three non-overlapping content categories:

  • Headwords — the index terms users look up, parsed from .idx files. Metric: percentage of headwords where every character is present in the font (fully renderable).
  • Entry Content — definition text, part-of-speech labels, usage notes. Sourced from -noetym .dict files, which exclude etymology. Metric: percentage of total character occurrences covered (frequency-weighted).
  • Etymology — the delta between full and -noetym .dict files: text present only in etymological content. Metric: percentage of total character occurrences covered (frequency-weighted).

Users running -noetym variants are only subject to Headword and Entry Content coverage.
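The two metrics defined above differ in what they count, which a short sketch makes concrete. This is an illustration under stated assumptions: `supported` is a plain set of codepoints standing in for a font's character map (in practice it could come from a font library such as fontTools), and the function names and sample data are invented.

```python
from collections import Counter

def headword_coverage(headwords, supported) -> float:
    """Percentage of headwords whose every character is in the font."""
    renderable = sum(
        1 for word in headwords if all(ord(ch) in supported for ch in word)
    )
    return 100 * renderable / len(headwords)

def weighted_coverage(text, supported) -> float:
    """Percentage of character occurrences covered (frequency-weighted)."""
    counts = Counter(text)
    total = sum(counts.values())
    covered = sum(n for ch, n in counts.items() if ord(ch) in supported)
    return 100 * covered / total

# Pretend font: printable ASCII plus æ, ð and þ, but no yogh and no runes.
supported = set(range(0x20, 0x7F)) | {0xE6, 0xF0, 0xFE}

headwords = ["dæg", "þing", "ȝe", "ᛞ"]
print(headword_coverage(headwords, supported))   # 50.0
print(weighted_coverage("aaaȝ", supported))      # 75.0
```

The headword metric is all-or-nothing per word, so one missing glyph fails the whole headword, while the frequency-weighted metric lets rare characters disappear into a large volume of ordinary text.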

Fonts

| Font | Category | Notes |
|---|---|---|
| Atkinson Hyperlegible Next | Accessibility | Designed for low-vision readers; optimised for character distinction |
| Lexica Ultralegible | Accessibility | Legibility-focused; similar goals to Atkinson |
| OpenDyslexic | Accessibility | Weighted letterforms designed to reduce dyslexic reading errors |
| Georgia | E-reader Staple | Serif, widely bundled on e-readers and operating systems |
| Garamond | E-reader Staple | Classical serif; narrow codepoint coverage |
| Palatino | E-reader Staple | Humanist serif; narrow codepoint coverage |
| Bookerly | E-reader Staple | Amazon Kindle’s primary reading font; broad Latin coverage |
| Caecilia | E-reader Staple | Slab serif; default on early Kindle devices |
| Charis SIL | Broad Coverage | Designed for linguists; full IPA and extended Latin support |
| DejaVu Sans | Broad Coverage | Wide Unicode coverage; common on Linux-based e-readers |
| Noto Sans | Broad Coverage | Google’s pan-Unicode font family |

Headword Coverage

Percentage of headwords fully renderable (all characters present in font).

| Font | ang | enm | en | ang+en | enm+en | ang+enm+en |
|---|---|---|---|---|---|---|
| Atkinson Hyperlegible Next | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| Lexica Ultralegible | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| OpenDyslexic | 99.7% | 96.8% | 99.6% | 99.6% | 99.5% | 99.5% |
| Georgia | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| Garamond | 99.4% | 96.8% | 99.2% | 99.2% | 99.1% | 99.1% |
| Palatino | 99.7% | 96.8% | 99.4% | 99.4% | 99.3% | 99.3% |
| Bookerly | 99.8% | 99.9% | 99.6% | 99.6% | 99.6% | 99.6% |
| Caecilia | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| Charis SIL | 99.8% | 99.9% | 99.7% | 99.7% | 99.7% | 99.7% |
| DejaVu Sans | 99.8% | 99.9% | 99.8% | 99.8% | 99.8% | 99.8% |
| Noto Sans | 99.8% | 100.0% | 99.7% | 99.7% | 99.7% | 99.7% |

Headword coverage is high across all fonts. The enm column shows a cluster of fonts dropping to 96.8% — this reflects Middle English’s use of characters such as yogh (ȝ) and other extended forms that narrower fonts do not include. Noto Sans is the only font achieving 100% headword coverage for any dictionary (enm).

The small shortfall in all fonts across all dictionaries is partly structural: the ang dictionary contains 47 headwords that are Runic characters themselves (the Old English futhorc runes treated as lexical entries). No font in this set covers the Runic block; those 47 headwords will not render in any of these fonts without system fallback.

Entry Content Coverage

Percentage of character occurrences in definition text covered by the font (frequency-weighted; excludes etymology).

| Font | ang | enm | en | ang+en | enm+en | ang+enm+en |
|---|---|---|---|---|---|---|
| Atkinson Hyperlegible Next | 99.9% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Lexica Ultralegible | 100.0% | 99.2% | 99.4% | 100.0% | 100.0% | 100.0% |
| OpenDyslexic | 100.0% | 99.8% | 99.9% | 100.0% | 100.0% | 100.0% |
| Georgia | 100.0% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Garamond | 99.7% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Palatino | 99.8% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Bookerly | 100.0% | 99.6% | 99.7% | 100.0% | 100.0% | 100.0% |
| Caecilia | 100.0% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Charis SIL | 100.0% | 99.9% | 99.9% | 100.0% | 100.0% | 100.0% |
| DejaVu Sans | 100.0% | 99.9% | 99.9% | 100.0% | 100.0% | 100.0% |
| Noto Sans | 100.0% | 99.9% | 99.9% | 100.0% | 100.0% | 100.0% |

Entry content coverage is excellent across all fonts. The combined dictionaries (ang+en, enm+en, ang+enm+en) round to 100.0% for all fonts — the historical content is small enough by character volume relative to the Modern English base that any residual gaps are diluted to sub-rounding thresholds.

The enm standalone dictionary shows a consistent ~99.1% floor for narrower fonts, reflecting Middle English characters absent from those fonts appearing in definition text. The sub-100% figures for en across several fonts are primarily due to IPA characters used in pronunciation guides within entries (see IPA section below).

There is a bounded rendering gap in ang entry content: 30 Latin-script headwords carry a secondary sense that names a runic character, referencing the rune glyph inline (e.g. “the runic character ᛞ (/d/)”). No font in this set covers Runic; those 30 inline glyphs will not render without system fallback. The frequency impact is negligible in the coverage percentages above but the visual gap will be apparent to readers looking up those entries.

Etymology Coverage

Percentage of character occurrences in etymological text covered by the font (frequency-weighted). Relevant only for full (non-noetym) dictionary variants.

| Font | ang | enm | en | ang+en | enm+en | ang+enm+en |
|---|---|---|---|---|---|---|
| Atkinson Hyperlegible Next | 99.2% | 99.4% | 98.9% | 98.9% | 98.9% | 98.9% |
| Lexica Ultralegible | 99.5% | 99.8% | 99.4% | 99.4% | 99.4% | 99.4% |
| OpenDyslexic | 99.7% | 99.9% | 99.6% | 99.6% | 99.6% | 99.6% |
| Georgia | 99.5% | 99.8% | 99.5% | 99.5% | 99.5% | 99.5% |
| Garamond | 98.3% | 98.8% | 98.8% | 98.8% | 98.8% | 98.8% |
| Palatino | 99.1% | 99.4% | 98.9% | 98.9% | 98.9% | 98.9% |
| Bookerly | 99.7% | 99.9% | 99.6% | 99.6% | 99.6% | 99.6% |
| Caecilia | 99.4% | 99.5% | 98.9% | 99.0% | 99.0% | 99.0% |
| Charis SIL | 99.6% | 99.8% | 99.3% | 99.3% | 99.3% | 99.3% |
| DejaVu Sans | 99.7% | 100.0% | 99.8% | 99.8% | 99.8% | 99.8% |
| Noto Sans | 99.7% | 99.9% | 99.7% | 99.7% | 99.7% | 99.7% |

Etymology coverage is consistently high but no font achieves 100% across all dictionaries. Etymological text makes heavier use of reconstructed Proto-Germanic and Proto-Indo-European forms, cognate examples in other languages (German, Dutch, Gothic, Old Norse), and specialised linguistic notation — all of which draw on a wider codepoint range than plain definition text. Garamond shows the lowest floor (98.3% for ang), consistent with its narrow overall codepoint inventory.

A small set of Runic characters (7 codepoints) appear exclusively in etymological text in ang — not in headwords or definitions. No font covers these; the rendering gap exists but affects only etymological content in the full ang variant.

Unicode Block Coverage

Coverage of each Unicode block present in the dictionary content, by font. Values are percentage of codepoints within that block that the font supports. Blocks are ordered by number of distinct codepoints present in the dictionaries.

| Block | Codepoints in dicts | Atkinson | Lexica | OpenDyslexic | Georgia | Garamond | Palatino | Bookerly | Caecilia | Charis | DejaVu | Noto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Basic Latin | 96 | 98% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 99% |
| Latin-1 Supplement | 93 | 94% | 100% | 100% | 100% | 94% | 100% | 100% | 100% | 100% | 100% | 100% |
| Latin Extended | 203 | 38% | 57% | 73% | 50% | 0% | 33% | 90% | 46% | 100% | 100% | 100% |
| Latin Extended Additional | 116 | 3% | 53% | 71% | 3% | 0% | 0% | 100% | 0% | 100% | 100% | 100% |
| Latin Extended-D | 58 | 0% | 0% | 3% | 0% | 0% | 0% | 0% | 0% | 95% | 26% | 95% |
| IPA Extensions | 90 | 0% | 6% | 88% | 0% | 0% | 0% | 11% | 0% | 100% | 100% | 100% |
| Phonetic Extensions | 101 | 0% | 0% | 11% | 0% | 0% | 0% | 0% | 0% | 100% | 84% | 100% |
| Spacing Modifier Letters | 74 | 1% | 12% | 42% | 12% | 0% | 12% | 31% | 12% | 100% | 82% | 100% |
| Combining Diacritics | 71 | 21% | 23% | 65% | 11% | 0% | 0% | 42% | 0% | 97% | 93% | 100% |
| Greek and Coptic | 103 | 4% | 63% | 85% | 63% | 0% | 1% | 68% | 4% | 20% | 99% | 93% |
| Cyrillic | 141 | 0% | 0% | 88% | 62% | 45% | 0% | 86% | 0% | 86% | 100% | 100% |
| General Punctuation | 68 | 25% | 25% | 72% | 31% | 22% | 24% | 76% | 24% | 85% | 97% | 100% |
| Superscripts and Subscripts | 38 | 0% | 3% | 50% | 47% | 0% | 3% | 47% | 0% | 100% | 79% | 100% |
| Number Forms | 47 | 0% | 0% | 66% | 9% | 0% | 0% | 68% | 0% | 100% | 79% | 40% |
| Currency Symbols | 34 | 6% | 6% | 59% | 32% | 0% | 3% | 12% | 3% | 97% | 65% | 97% |
| Letterlike Symbols | 60 | 5% | 5% | 17% | 10% | 2% | 2% | 17% | 5% | 13% | 93% | 100% |
| Arrows | 61 | 0% | 0% | 25% | 0% | 0% | 0% | 23% | 0% | 30% | 100% | 0% |
| Mathematical Operators | 220 | 6% | 6% | 7% | 6% | 0% | 6% | 25% | 6% | 18% | 100% | 0% |
| Runic | 44 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
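A per-block figure like the ones in this table can be computed by bucketing the codepoints found in the dictionaries by Unicode block and checking each against the font. The sketch below hard-codes three block ranges from the Unicode standard; the `block_coverage` name and the sample codepoint sets are assumptions made for the example.

```python
# Inclusive codepoint ranges for three of the blocks in the table.
BLOCKS = {
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "IPA Extensions": (0x0250, 0x02AF),
    "Runic": (0x16A0, 0x16FF),
}

def block_coverage(used, supported):
    """Map block name -> (codepoints used, % of those the font supports)."""
    result = {}
    for name, (low, high) in BLOCKS.items():
        in_block = {cp for cp in used if low <= cp <= high}
        if in_block:  # skip blocks that never occur in the content
            percent = round(100 * len(in_block & supported) / len(in_block))
            result[name] = (len(in_block), percent)
    return result

# Toy inputs: æ, ð, ə, ʃ and the rune ᛞ occur; the font lacks ʃ and ᛞ.
used = {0xE6, 0xF0, 0x259, 0x283, 0x16DE}
supported = {0xE6, 0xF0, 0x259}

print(block_coverage(used, supported))
# {'Latin-1 Supplement': (2, 100), 'IPA Extensions': (2, 50), 'Runic': (1, 0)}
```

The denominator is the set of codepoints actually present in the dictionaries, not the full size of the Unicode block, which matches the "Codepoints in dicts" column.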

IPA Extensions

90 distinct IPA codepoints are present in the dictionaries, used primarily in pronunciation guides within en entry content. IPA also appears in ang entry content where rune definitions include phonetic values.

| Font | IPA Coverage |
|---|---|
| Atkinson Hyperlegible Next | 0% (0/90) |
| Lexica Ultralegible | 6% (5/90) |
| OpenDyslexic | 88% (79/90) |
| Georgia | 0% (0/90) |
| Garamond | 0% (0/90) |
| Palatino | 0% (0/90) |
| Bookerly | 11% (10/90) |
| Caecilia | 0% (0/90) |
| Charis SIL | 100% (90/90) |
| DejaVu Sans | 100% (90/90) |
| Noto Sans | 100% (90/90) |

IPA coverage is a clear dividing line. Seven fonts provide little to no IPA support: Atkinson Hyperlegible Next, Georgia, Garamond, Palatino and Caecilia cover none of the 90 codepoints, while Lexica Ultralegible (6%) and Bookerly (11%) cover only a handful. Readers using these fonts will encounter rendering gaps wherever pronunciation is given inline. The three broad-coverage fonts (Charis, DejaVu, Noto) and OpenDyslexic cover IPA well; Charis SIL was specifically designed for this use case.

Runic

No font in this set covers the Runic Unicode block. The 44 Runic codepoints present in the dictionaries are confined to ang (and by extension the combined dictionaries that include ang). All Runic rendering depends entirely on system font fallback.

Runic content appears in three contexts:

  • 47 headwords that are the Old English futhorc runes themselves, treated as lexical entries in ang
  • 30 Latin-script headword entries that carry a secondary sense naming a runic character, with the rune glyph inline in the definition text
  • 7 Runic codepoints that appear exclusively in etymological content in ang (coverage gap noted; affects full variant only)

For dictionaries that do not include ang (enm, en, enm+en) there is no Runic content and this does not apply.

Practical Rendering Notes

en (Modern English only) All fonts render headwords at 99%+. The primary gap is IPA in entry content: five fonts have 0% IPA coverage and two more (Lexica Ultralegible and Bookerly) cover 11% or less, meaning most pronunciation guides will not render in them. Users who do not rely on inline pronunciation notation are not meaningfully affected. For full rendering including IPA, Charis SIL, DejaVu Sans, Noto Sans, or OpenDyslexic are the suitable choices.

enm (Middle English only) Seven fonts drop to 96.8% headword coverage due to Middle English characters (notably yogh) outside their Latin range. Bookerly, Charis SIL, DejaVu Sans, and Noto Sans hold at 99.9–100%. The accessibility fonts (Atkinson, Lexica, OpenDyslexic) and the narrower e-reader staples (Garamond, Palatino, Caecilia, Georgia) all show this gap.

ang (Old English only) Headword coverage is near-universal for Latin-script headwords across all fonts. The Runic gap is absolute — 47 headwords and 30 inline definition glyphs will not render in any of these fonts. This is a known constraint of the font set, not of the dictionary. Entry content and etymology coverage are otherwise excellent (99%+).

Combined dictionaries (ang+en, enm+en, ang+enm+en) Coverage figures largely mirror en, because the Modern English content dominates by volume and dilutes any historical-layer gaps in the frequency-weighted metrics. Headword coverage is 99%+ across all fonts. The Runic and IPA gaps described above apply where the relevant layers are present. Users of -noetym variants of any combined dictionary will see entry content coverage that rounds to 100.0% for every font in this set.
