Changelog
- 2026-03-30
- Initial publication
A Critical Note
I live in America, I’m a career programmer and I have horrific health problems.
Normally I’d assume folk know what that means but I need to be explicit: this means my survival requires me to have a practical, useful and ‘reasonable’ understanding of Generative AI. Particularly models that programmers use.
I don’t like it. I wish more humans would recognize the horrific social and environmental costs that come with these things. I loathe the fact my survival is attached to such things.
Unfortunately, my survival. Again: survival. I cannot harp on survival enough. Requires me to engage with these tools.
This blog is 100% human generated content. I like writing. I look forward to when I have something useful to write up on this blog. I like sharing my writing with fellow weirdos, nerds and randos. Generative AI is not used to write this blog.
That said.
Later in this blog there will be a link to my code and a section about analysis / quality. These two items, and only these two items, were aided by Anthropic’s Claude. I’ve also added notes in this post at the exact points described. I am grumpy over the fact I need to understand this tool. Rather than produce excess waste, I used it to create something useful.
Absolutists can wander off after being told “You greatly underestimate what it is to live in America with health issues and what that means for survival as a programmer currently”.
Everyone else: I figured out some really interesting things related to electronic dictionaries. This post is all about my work within this topic.
Backstory
I’ve always been a reader at heart and I took to electronic reading devices of various forms like a fish takes to water. I switched to reading books electronically around 2002 and never looked back. I’m also not shy about reading ‘difficult texts’, especially now that a lot of electronic book readers include dictionary support.
The great thing about the dictionary feature in electronic book readers is I’m able to read older texts more easily and with better understanding. Over the years I have slowly worked my way backwards through time to the point I’m now seeing words that I don’t recognize or I know are being used in ways different than they are used in modern English texts. I’ve also gotten better at spotting when I may have a vocabulary gap. I no longer work from context exclusively. I work from context, then look up a word to verify my understanding and ‘read’ of the word. Interestingly my read is rarely fully accurate. I’m close but there is almost always some missed, but important, nuance found when I look up the word in a dictionary.
This of course means that dictionary support in my electronic reading setup (KOReader and Xteink X4 currently) is a critical feature. Critical enough that I’m currently working on a robust dictionary feature for the Xteink X4 Crosspoint Reader firmware as the current code does not have the feature and a previous attempt at adding a dictionary feature I found lacking. Clearly I’m not fucking about when it comes to dictionaries.
Dictionaries & A Problem
Historically I’ve used StarDict dictionaries from reader-dict. They use Wiktionary as a source and I find their dictionary files to be high quality. I really do recommend their free, monolingual dictionaries; particularly the English one. It’s a really good dictionary for reading texts written in Modern English. Particularly texts written around and after the year 1900.
Personally, I’m closing in on the early decades of the 1800’s with my reading and I’ll likely work backwards into the 1700’s and further until I end up in the land of Middle English. When I get there I’ll need to learn to read a new language as Middle (and Old) English is very different from modern English. I look forward to this day but for now I have a more immediate problem.
My immediate problem is that I noticed the Wiktionary definitions of words start to feel less robust, carry less nuance and sometimes don’t fit the context of the text when reading materials written in the mid 1800’s and earlier. I can figure out the gaps but I also know having some context on how a word may have developed across Old, Middle and Modern English provides important context for understanding. Basically the quality of definitions has started to fade. They are still robust but I’m a reading nerd with autism; I tend to notice subtle shifts sooner than more normative folk.
I’ve also run into words that are outright missing from the Wiktionary Modern English dictionary. I’ve read enough linguistic blog posts (shout out to Dead Language Society for filling my brain with really fun facts and Actually Engaging reads) to know that some of these words are likely re-used from earlier forms of English. I’d like to know what these words mean beyond my understanding gleaned from context within a text. Even if that means I have to do a search in a Middle and/or Old English dictionary to be sure I’ve searched the whole of the English language corpus.
Fixing The Problem
Thankfully Wiktionary has Middle and Old English definitions that can be turned into StarDict dictionaries. Unfortunately for me, I could not find pre-made StarDict files for either Middle or Old English. I did dig around and came up with nothing. If you know of a set, let me know and I’ll update this post.
Since I couldn’t find a pre-made Middle or Old English dictionary, I spent the time to figure out how to convert Wiktionary Middle and Old English definitions into StarDict files I can use on my devices.
The reader-dict folk were kind enough to publish their sources on how to generate a monolingual StarDict file from Wiktionary data. With some monkey patching and haphazard programming (my sources are here ; note: I had help from Anthropic’s Claude when writing this code) I was able to create usable StarDict files for Old, Middle and Modern English.
I even figured out how to merge the three (Old, Middle, Modern) English dictionaries together into a monolithic dictionary. The merged dictionary entries show the Modern, Middle and Old English definitions as sections so I can see how the definition has changed over time. It’ll even omit headings if the word lacks a definition for Modern, Middle or Old English.
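To make the merge behaviour concrete, here is a minimal sketch. It is not the code from the linked sources: the function name `merge_entry`, the heading labels and the plain-text layout are all assumptions made for illustration, and the real build may format entries differently.

```python
# Illustrative only: heading labels and layout are assumptions,
# not the output format of the real build.
LAYER_HEADINGS = [
    ("en", "Modern English"),
    ("enm", "Middle English"),
    ("ang", "Old English"),
]


def merge_entry(definitions):
    """Join per-layer definitions into one entry.

    `definitions` maps an ISO code ("en", "enm", "ang") to that layer's
    definition text. Layers without a definition get no heading at all.
    """
    sections = []
    for code, heading in LAYER_HEADINGS:
        text = definitions.get(code)
        if text:  # skip missing or empty layers entirely
            sections.append(f"{heading}\n{text}")
    return "\n\n".join(sections)


# A word defined in Modern and Old English but not in Middle English.
print(merge_entry({"en": "modern definition text", "ang": "old definition text"}))
```

A word that only exists in one layer comes out with a single heading, which matches how the single form dictionaries read.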
On top of that, I was able to include the IPA phonetic characters in the definitions so you can learn to speak the word aloud. Note: Middle and Old English can lack this information, particularly Old English, as we just don’t know how some words sounded when spoken.
Problem. Solved.
The Dictionaries
To save y’all time and pain of processing an absurd amount of text for a lengthy amount of time; I’ve included download links for each generated dictionary in this section. I’ve also included a ’test document’. It’s an epub that contains a number of different words that will allow you to see how the different definitions look using the different dictionary files. I strongly recommend downloading this epub along with the dictionary/ies you want to use so you can quickly see how definitions will look with the dictionary you’ve selected.
Test epub
This is a combined test document that will let you look up words that are in all forms of English (Old + Middle + Modern) as well as words that are only used in a specific form of English (Old / Middle / Modern). I used this document for testing dictionary generation and find it does a good job of giving a sense of what these dictionaries offer. I highly recommend downloading it.
test-ang_enm_en-en-20260301.epub
Note: this test document was generated by code Anthropic’s Claude wrote at my direction. This file’s generation is part of the source code linked above.
Conventions
The below dictionaries are named according to ISO language codes:
- ang: Old English
- enm: Middle English
- en: Modern English
Additionally, file names may have an underscore (_) in them between two ISO language codes. This means the dictionary supports lookups across multiple forms of English. For example: ang_enm_en means the dictionary will allow you to look up Old English, Middle English and Modern English words.
Definitions in the dictionaries are written in Modern English, the same as Wiktionary. For multi-form dictionaries, all definitions across all English forms will be included as separate headings in a definition. If a word is not defined in a particular form of English, that heading is not included in multi-form definitions.
Files with noetym in the name have had the etymological information removed from definitions. This helps reduce size, processing needs, etc. These files are meant for lower resource devices and/or folk who would like more concise definitions.
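The naming conventions above can be read mechanically. The sketch below is an illustration only: the pattern is inferred from the file names listed in this post, and reading the second language code as "the language the definitions are written in" is an assumption, as is the helper name `parse_name`.

```python
import re

# Assumed pattern, inferred from the file names in this post:
#   <forms joined by "_">-<definition language>[-noetym]-<YYYYMMDD>[.zip]
NAME_RE = re.compile(
    r"^(?P<forms>[a-z]+(?:_[a-z]+)*)"
    r"-(?P<target>[a-z]+)"
    r"(?P<noetym>-noetym)?"
    r"-(?P<date>\d{8})(?:\.zip)?$"
)

FORM_NAMES = {"ang": "Old English", "enm": "Middle English", "en": "Modern English"}


def parse_name(filename):
    """Split a dictionary file name into its forms, noetym flag and build date."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised dictionary name: {filename}")
    return {
        "forms": [FORM_NAMES.get(code, code) for code in match["forms"].split("_")],
        "definitions_in": FORM_NAMES.get(match["target"], match["target"]),
        "etymology": match["noetym"] is None,
        "build_date": match["date"],
    }


print(parse_name("ang_enm_en-en-noetym-20260301.zip"))
```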
Single Form
These are the single form dictionaries. They only include one form of English.
- ang-en-20260301.zip
- ang-en-noetym-20260301.zip
- enm-en-20260301.zip
- enm-en-noetym-20260301.zip
- en-en-20260301.zip
- en-en-noetym-20260301.zip
Dual Form
These are the dual form dictionaries. They contain two forms of English.
- ang_en-en-20260301.zip
- ang_en-en-noetym-20260301.zip
- enm_en-en-20260301.zip
- enm_en-en-noetym-20260301.zip
Monolithic
This dictionary contains all three forms of English.
Analysis
Preface
The following sections were generated by Anthropic’s Claude at my direction.
The ‘Dictionary Analysis’ sub-section is an accounting of the dictionary data, what some of the differences between dictionary files are, as well as statistics.
The ‘Font Coverage Analysis’ section is an overview of common fonts used for electronic reading and their coverage for the text contained within the dictionaries. There are some very real, very meaningful gaps to consider when it comes to font selection. The dictionaries are usable despite font coverage gaps but it’s something worth noting and probably reviewing, even superficially.
Dictionary Analysis
Generated StarDict dictionaries derived from Wiktionary data via engrish.py. Date of dictionary build: 2026-03-01.
Dictionaries
| ID | Full Name | Description |
|---|---|---|
| ang | Old English | Words from Old English (Anglo-Saxon), spoken roughly 450–1150 AD. The language of Beowulf and the earliest written English records. Grammatically very different from Modern English: heavily inflected with complex case endings. |
| enm | Middle English | Words from Middle English, spoken roughly 1150–1500 AD. The language of Chaucer. A transitional form: Old English grammar is simplifying, French and Latin vocabulary is flooding in after the Norman Conquest. |
| en | Modern English | Contemporary English as documented in Wiktionary. The baseline dictionary. |
| ang+en | Old English + Modern English | Combined dictionary containing all headwords from both ang and en. Words present in both layers appear once with merged entries. |
| enm+en | Middle English + Modern English | Combined dictionary containing all headwords from both enm and en. |
| ang+enm+en | Old + Middle + Modern English | The full historical stack: all three layers combined. The most comprehensive dictionary in this set. |
Each dictionary also has a -noetym variant; see the noetym note below for additional detail.
Headword Counts
| Dictionary | Primary Headwords | Synonym / Alt-Form Entries | Syn : Primary Ratio |
|---|---|---|---|
| ang | 20,689 | 13,510 | 0.65 |
| enm | 41,711 | 3,228 | 0.08 |
| en | 900,726 | 507,296 | 0.56 |
| ang+en | 919,986 | 517,424 | 0.56 |
| enm+en | 936,074 | 499,014 | 0.53 |
| ang+enm+en | 954,566 | 509,345 | 0.53 |
The high synonym ratio in ang (0.65) reflects the inflected nature of Old English — many alternate spellings and grammatical forms are indexed as synonym entries pointing back to the canonical headword. Middle English (enm) has a notably low ratio (0.08), likely because its Wiktionary coverage is less complete rather than the language being less inflected.
File Sizes
| Dictionary | .dict (raw) | .dict.dz (compressed) | .idx | .syn |
|---|---|---|---|---|
| ang | 3.8 MB | 1.2 MB | 341.8 KB | 228.6 KB |
| enm | 6.3 MB | 2.0 MB | 664.8 KB | 50.9 KB |
| en | 177.5 MB | 46.5 MB | 17.5 MB | 8.2 MB |
| ang+en | 183.0 MB | 48.3 MB | 17.8 MB | 8.6 MB |
| enm+en | 183.5 MB | 48.5 MB | 18.0 MB | 8.2 MB |
| ang+enm+en | 189.4 MB | 50.1 MB | 18.3 MB | 8.5 MB |
Parts of Speech Distribution
Full corpus counts for all dictionaries.
| POS | ang | enm | en | ang+en | enm+en | ang+enm+en |
|---|---|---|---|---|---|---|
| Noun | 9,699 | 26,760 | 480,638 | 490,336 | 507,398 | 517,099 |
| Adjective | 2,976 | 6,046 | 182,059 | 185,036 | 188,105 | 191,081 |
| Proper Noun | 2,218 | 1,134 | 179,775 | 181,991 | 180,908 | 183,126 |
| Verb | 4,285 | 6,331 | 58,435 | 62,720 | 64,766 | 69,051 |
| Adverb | 911 | 2,732 | 27,114 | 28,026 | 29,847 | 30,758 |
| Symbol | — | — | 15,698 | 15,698 | 15,698 | 15,698 |
| Interjection | — | — | 4,683 | 4,714 | 4,757 | 4,788 |
| Preposition | 127 | 260 | 3,809 | 3,936 | 4,069 | 4,196 |
| Prefix | 148 | — | 2,359 | 2,507 | 2,435 | 2,583 |
| Suffix | 245 | 502 | 1,360 | 1,605 | 1,862 | 2,107 |
| Pronoun | 139 | 730 | 937 | 1,076 | 1,667 | 1,806 |
| Numeral | 272 | 302 | — | 1,017 | 1,047 | 1,319 |
| Determiner | — | 373 | — | — | — | — |
| Contraction | — | — | 849 | — | — | — |
Notable observations:
- ang has a relatively high Verb share compared to Modern English: Old English had many strong/weak verb paradigms each warranting separate entries.
- enm is noun-heavy even proportionally to its size, reflecting the influence of French borrowings (predominantly nouns) after 1066.
- Symbol entries appear only in en and the combined dicts. These are non-alphabetic characters (mathematical symbols, currency, etc.) documented in Wiktionary. The count is identical across all three combined dicts (15,698), confirming symbols come entirely from the en layer.
- Suffixes accumulate across layers: the combined ang+enm+en has the most suffix entries (2,107), reflecting all three layers’ morphological inventories.
Etymology Content
Etymology content was detected by searching for characteristic markers (Proto-, from Old, from Middle, Cognate with, Inherited from).
| Dictionary | Entries with Etymology | Total Entries | % |
|---|---|---|---|
| ang | 5,807 | 20,689 | 28.1% |
| enm | 5,445 | 41,711 | 13.1% |
| en | 19,998 | 900,727 | 2.2% |
| ang+en | 25,716 | 921,412 | 2.8% |
| enm+en | 23,610 | 942,434 | 2.5% |
| ang+enm+en | 29,328 | 963,123 | 3.0% |
Note: total entry counts for the combined dictionaries exceed their headword counts (e.g. ang+en has 921,412 content entries vs 919,986 headwords). This occurs because some headwords carry separate content blocks sourced from each contributing language layer, stored as distinct entries under the same headword.
Old English entries are proportionally the richest in etymology (28%), reflecting the scholarly interest in tracing Proto-Germanic and Proto-Indo-European roots for ancient vocabulary. Modern English entries have sparse etymology coverage in Wiktionary (2.2%), likely because the sheer number of entries outpaces editorial effort. The combined dictionaries show a bump in etymology % as the historical entries (which have higher etymology density) are incorporated.
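The marker-based detection described at the top of this section can be sketched as a plain substring check. This is an illustration, not the analysis code: the marker list is the one quoted above, while case-insensitive matching and the function names are assumptions.

```python
# Markers as listed in the analysis. Matching case-insensitively is an
# assumption; the real detection may have been case-sensitive.
ETYMOLOGY_MARKERS = ("Proto-", "from Old", "from Middle", "Cognate with", "Inherited from")


def has_etymology(entry_text):
    """Return True when any etymology marker appears in the entry text."""
    lowered = entry_text.lower()
    return any(marker.lower() in lowered for marker in ETYMOLOGY_MARKERS)


def etymology_share(entries):
    """Return (entries with etymology, total entries, percent)."""
    entries = list(entries)
    hits = sum(1 for text in entries if has_etymology(text))
    percent = 100.0 * hits / len(entries) if entries else 0.0
    return hits, len(entries), percent


# Made-up entry texts, only to exercise the functions.
sample = [
    "From Proto-Germanic, an example etymology line.",
    "A definition with no etymology.",
    "Inherited from Old English, another example.",
    "Another plain definition.",
]
print(etymology_share(sample))
```

Substring markers are a heuristic. A definition that happens to contain one of the phrases would be counted as having etymology, so the percentages in the table are best read as approximate.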
Script and Character Coverage
Headwords are overwhelmingly Latin-script, but the historical dictionaries include characters outside the basic ASCII range.
| Script / Block | ang | enm | en |
|---|---|---|---|
| Latin-1 Supplement (à, æ, ð, þ, etc.) | 6,058 | 1,777 | 6,941 |
| Latin Extended (ā, ē, ī, ō, ū macrons, etc.) | 79 | 1,370 | 3,518 |
| Runic (ᚪ, ᛞ, ᛏ, etc.) | 209 | 0 | 0 |
| IPA Extensions | — | — | 252 |
| Greek/Coptic | 1 | — | 289 |
| Combining Diacritics | 5 | 1 | 317 |
Runic characters in ang: Old English was sometimes written in the Runic alphabet (the futhorc) before the Latin alphabet became dominant. The 209 Runic character occurrences across ang headwords represent a small set of entries for individual runes (e.g., ᛞ dæg “day”, ᚪ āc “oak”) — these are the runes themselves treated as lexical items, not a parallel Runic transcription of all entries.
enm uses extended Latin characters heavily (macrons, yogh-derived forms), reflecting the varied spelling conventions of the Middle English period. en includes IPA (pronunciation transcriptions) and Greek (loanwords, scientific terminology).
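For anyone wanting to reproduce counts like these, a rough sketch follows. Python's standard library does not expose Unicode block names, so the ranges are written out by hand from the Unicode standard. Treating the table's "Latin Extended" row as Latin Extended-A plus Latin Extended-B is an assumption, and the function names are hypothetical.

```python
from collections import Counter

# Block ranges from the Unicode standard. Folding Latin Extended-A and
# Latin Extended-B into one "Latin Extended" row is an assumption.
BLOCKS = [
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended", 0x0100, 0x024F),
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Combining Diacritics", 0x0300, 0x036F),
    ("Greek/Coptic", 0x0370, 0x03FF),
    ("Runic", 0x16A0, 0x16FF),
]


def block_of(char):
    """Return the tracked block name for a character, or None."""
    codepoint = ord(char)
    for name, start, end in BLOCKS:
        if start <= codepoint <= end:
            return name
    return None  # Basic Latin and anything not tracked here


def count_blocks(headwords):
    """Count character occurrences per block across all headwords."""
    counts = Counter()
    for word in headwords:
        for char in word:
            name = block_of(char)
            if name is not None:
                counts[name] += 1
    return counts


# "d\u00e6g" is dæg, "\u0101c" is āc, "\u16de" is the rune ᛞ.
print(count_blocks(["d\u00e6g", "\u0101c", "\u16de", "day"]))
```

This counts character occurrences rather than headwords, which is how the Runic figure above is described.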
Headword Overlap
Historical layers vs. Modern English
| Overlap | Count | Notes |
|---|---|---|
| ang headwords unique to Old English | 18,492 | Not found in enm or en |
| ang headwords shared with Modern English (en) | 1,429 | Words that survived ~1,000 years |
| enm headwords unique to Middle English | 34,580 | Not found in ang or en |
| enm headwords shared with Modern English (en) | 6,363 | Words that survived from Middle English |
| ang ∩ enm (not in en) | 768 | Present in both old layers, lost in modern |
| ang ∩ enm ∩ en | 588 | Present across all three eras |
Combined Dictionary Composition
The combined dictionaries are mathematically exact unions of their component layers — no headwords are added or lost in the merge.
| Dictionary | Calculation | Expected | Actual | Delta |
|---|---|---|---|---|
| ang+en | ang (20,689) + en (900,726) − overlap (1,429) | 919,986 | 919,986 | 0 |
| enm+en | enm (41,711) + en (900,726) − overlap (6,363) | 936,074 | 936,074 | 0 |
| ang+enm+en | union of all three | 954,566 | 954,566 | 0 |
The incremental contribution of each historical layer over the en baseline:
- Adding ang → +19,260 headwords (+2.1%)
- Adding enm → +35,348 headwords (+3.9%)
- Adding both ang+enm → +53,840 headwords (+5.98%)
The relatively small percentage gain reflects how thoroughly Modern English Wiktionary already covers vocabulary — the historical layers contribute depth (richer etymology, morphological variants) rather than breadth.
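The union figures can be re-derived from the overlap table with inclusion-exclusion. The sketch below uses only numbers from this post. It assumes the 1,429 and 6,363 overlap figures include the 588 words found in all three layers, which is the reading under which the ang and enm rows add up to their headword totals.

```python
# Headword counts and overlaps copied from the tables in this post.
ang, enm, en = 20_689, 41_711, 900_726
ang_en = 1_429        # ang ∩ en (assumed to include words also in enm)
enm_en = 6_363        # enm ∩ en (assumed to include words also in ang)
ang_enm_only = 768    # ang ∩ enm, not in en
all_three = 588       # ang ∩ enm ∩ en

ang_enm = ang_enm_only + all_three  # full ang ∩ enm

union_ang_en = ang + en - ang_en
union_enm_en = enm + en - enm_en
# Inclusion-exclusion for three sets.
union_all = ang + enm + en - ang_en - enm_en - ang_enm + all_three

print(union_ang_en, union_enm_en, union_all)  # 919986 936074 954566

# Incremental gain of each historical layer over the en baseline.
for label, union in [("ang", union_ang_en), ("enm", union_enm_en), ("ang+enm", union_all)]:
    gain = union - en
    print(f"{label}: +{gain:,} headwords (+{100 * gain / en:.2f}%)")
```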
noetym Variants
Each dictionary has a -noetym companion (e.g., en-en-noetym-20260301). These are size-reduced versions intended for resource-constrained environments (older e-readers, smaller storage).
Comparison of en vs en-noetym:
| File | Full | noetym | Reduction |
|---|---|---|---|
| .dict (raw) | 177.5 MB | 136.4 MB | −41.1 MB (23%) |
| .dict.dz (compressed) | 46.5 MB | 34.6 MB | −11.9 MB (26%) |
| .idx | 17.5 MB | 17.5 MB | 0 |
| .syn | 8.2 MB | 8.2 MB | 0 |
The .idx and .syn files are byte-identical between full and noetym variants — headword lists and synonym indexes are unchanged. Only the .dict content differs: etymology paragraphs are stripped while definitions remain intact. The noetym variants are not analyzed separately in this document as they are structurally identical to their full counterparts.
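One way to check the byte-identical claim on your own copies is to compare file hashes. This is a generic sketch using the standard library, not the method used for the analysis, and the paths in the commented usage are hypothetical placeholders for wherever you unzipped the dictionaries.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large .idx files are not loaded whole."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """Return True when both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


# Hypothetical paths; adjust to your own unzipped files.
# print(same_bytes("full/dictionary.idx", "noetym/dictionary.idx"))
```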
Font Coverage Analysis
Glyph and codepoint coverage analysis of fonts in the dictionary set. Generated from build date 2026-03-01.
Coverage is measured across three non-overlapping content categories:
- Headwords: the index terms users look up, parsed from .idx files. Metric: percentage of headwords where every character is present in the font (fully renderable).
- Entry Content: definition text, part-of-speech labels, usage notes. Sourced from -noetym .dict files, which exclude etymology. Metric: percentage of total character occurrences covered (frequency-weighted).
- Etymology: the delta between full and -noetym .dict files, meaning text present only in etymological content. Metric: percentage of total character occurrences covered (frequency-weighted).
Users running -noetym variants are only subject to Headword and Entry Content coverage.
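The two metrics differ in what they count, which a small sketch makes clearer. This is an illustration under stated assumptions, not the analysis code. The .idx reader assumes the common StarDict layout of a NUL-terminated UTF-8 headword followed by a 32-bit offset and a 32-bit size. The font is modelled as a plain set of codepoints; with a real font file, something like fontTools' `TTFont(path).getBestCmap()` could supply that set. All function names are hypothetical.

```python
import struct
from collections import Counter


def read_idx_headwords(data):
    """Yield headwords from raw StarDict .idx bytes (32-bit offsets assumed)."""
    position = 0
    while position < len(data):
        end = data.index(b"\x00", position)
        yield data[position:end].decode("utf-8")
        position = end + 1 + 8  # skip the NUL, then 4-byte offset and 4-byte size


def headword_coverage(headwords, font_codepoints):
    """Percent of headwords whose every character is in the font."""
    headwords = list(headwords)
    if not headwords:
        return 100.0
    renderable = sum(
        1 for word in headwords if all(ord(char) in font_codepoints for char in word)
    )
    return 100.0 * renderable / len(headwords)


def weighted_coverage(text, font_codepoints):
    """Percent of character occurrences in the text that the font covers."""
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        return 100.0
    covered = sum(n for char, n in counts.items() if ord(char) in font_codepoints)
    return 100.0 * covered / total


# Tiny hand-built .idx: "day", "d\u00e6g" (dæg) and "\u021dif" (ȝif).
sample_idx = b"".join(
    word.encode("utf-8") + b"\x00" + struct.pack(">II", 0, 0)
    for word in ["day", "d\u00e6g", "\u021dif"]
)
ascii_font = set(range(0x20, 0x7F))  # stand-in for a font covering only Basic Latin
words = list(read_idx_headwords(sample_idx))
print(headword_coverage(words, ascii_font))
print(weighted_coverage(" ".join(words), ascii_font))
```

One missing character makes a whole headword count as unrenderable, while the frequency-weighted metric only loses that single occurrence. That is why headword percentages can sit lower than entry content percentages for the same font.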
Fonts
| Font | Category | Notes |
|---|---|---|
| Atkinson Hyperlegible Next | Accessibility | Designed for low-vision readers; optimised for character distinction |
| Lexica Ultralegible | Accessibility | Legibility-focused; similar goals to Atkinson |
| OpenDyslexic | Accessibility | Weighted letterforms designed to reduce dyslexic reading errors |
| Georgia | E-reader Staple | Serif, widely bundled on e-readers and operating systems |
| Garamond | E-reader Staple | Classical serif; narrow codepoint coverage |
| Palatino | E-reader Staple | Humanist serif; narrow codepoint coverage |
| Bookerly | E-reader Staple | Amazon Kindle’s primary reading font; broad Latin coverage |
| Caecilia | E-reader Staple | Slab serif; default on early Kindle devices |
| Charis SIL | Broad Coverage | Designed for linguists; full IPA and extended Latin support |
| DejaVu Sans | Broad Coverage | Wide Unicode coverage; common on Linux-based e-readers |
| Noto Sans | Broad Coverage | Google’s pan-Unicode font family |
Headword Coverage
Percentage of headwords fully renderable (all characters present in font).
| Font | ang | enm | en | ang+en | enm+en | ang+enm+en |
|---|---|---|---|---|---|---|
| Atkinson Hyperlegible Next | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| Lexica Ultralegible | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| OpenDyslexic | 99.7% | 96.8% | 99.6% | 99.6% | 99.5% | 99.5% |
| Georgia | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| Garamond | 99.4% | 96.8% | 99.2% | 99.2% | 99.1% | 99.1% |
| Palatino | 99.7% | 96.8% | 99.4% | 99.4% | 99.3% | 99.3% |
| Bookerly | 99.8% | 99.9% | 99.6% | 99.6% | 99.6% | 99.6% |
| Caecilia | 99.7% | 96.8% | 99.5% | 99.5% | 99.4% | 99.4% |
| Charis SIL | 99.8% | 99.9% | 99.7% | 99.7% | 99.7% | 99.7% |
| DejaVu Sans | 99.8% | 99.9% | 99.8% | 99.8% | 99.8% | 99.8% |
| Noto Sans | 99.8% | 100.0% | 99.7% | 99.7% | 99.7% | 99.7% |
Headword coverage is high across all fonts. The enm column shows a cluster of fonts dropping to 96.8% — this reflects Middle English’s use of characters such as yogh (ȝ) and other extended forms that narrower fonts do not include. Noto Sans is the only font achieving 100% headword coverage for any dictionary (enm).
The small shortfall in all fonts across all dictionaries is partly structural: the ang dictionary contains 47 headwords that are Runic characters themselves (the Old English futhorc runes treated as lexical entries). No font in this set covers the Runic block; those 47 headwords will not render in any of these fonts without system fallback.
Entry Content Coverage
Percentage of character occurrences in definition text covered by the font (frequency-weighted; excludes etymology).
| Font | ang | enm | en | ang+en | enm+en | ang+enm+en |
|---|---|---|---|---|---|---|
| Atkinson Hyperlegible Next | 99.9% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Lexica Ultralegible | 100.0% | 99.2% | 99.4% | 100.0% | 100.0% | 100.0% |
| OpenDyslexic | 100.0% | 99.8% | 99.9% | 100.0% | 100.0% | 100.0% |
| Georgia | 100.0% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Garamond | 99.7% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Palatino | 99.8% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Bookerly | 100.0% | 99.6% | 99.7% | 100.0% | 100.0% | 100.0% |
| Caecilia | 100.0% | 99.1% | 99.4% | 100.0% | 100.0% | 100.0% |
| Charis SIL | 100.0% | 99.9% | 99.9% | 100.0% | 100.0% | 100.0% |
| DejaVu Sans | 100.0% | 99.9% | 99.9% | 100.0% | 100.0% | 100.0% |
| Noto Sans | 100.0% | 99.9% | 99.9% | 100.0% | 100.0% | 100.0% |
Entry content coverage is excellent across all fonts. The combined dictionaries (ang+en, enm+en, ang+enm+en) round to 100.0% for all fonts — the historical content is small enough by character volume relative to the Modern English base that any residual gaps are diluted to sub-rounding thresholds.
The enm standalone dictionary shows a consistent ~99.1% floor for narrower fonts, reflecting Middle English characters absent from those fonts appearing in definition text. The sub-100% figures for en across several fonts are primarily due to IPA characters used in pronunciation guides within entries (see IPA section below).
There is a bounded rendering gap in ang entry content: 30 Latin-script headwords carry a secondary sense that names a runic character, referencing the rune glyph inline (e.g. “the runic character ᛞ (/d/)”). No font in this set covers Runic; those 30 inline glyphs will not render without system fallback. The frequency impact is negligible in the coverage percentages above but the visual gap will be apparent to readers looking up those entries.
Etymology Coverage
Percentage of character occurrences in etymological text covered by the font (frequency-weighted). Relevant only for full (non-noetym) dictionary variants.
| Font | ang | enm | en | ang+en | enm+en | ang+enm+en |
|---|---|---|---|---|---|---|
| Atkinson Hyperlegible Next | 99.2% | 99.4% | 98.9% | 98.9% | 98.9% | 98.9% |
| Lexica Ultralegible | 99.5% | 99.8% | 99.4% | 99.4% | 99.4% | 99.4% |
| OpenDyslexic | 99.7% | 99.9% | 99.6% | 99.6% | 99.6% | 99.6% |
| Georgia | 99.5% | 99.8% | 99.5% | 99.5% | 99.5% | 99.5% |
| Garamond | 98.3% | 98.8% | 98.8% | 98.8% | 98.8% | 98.8% |
| Palatino | 99.1% | 99.4% | 98.9% | 98.9% | 98.9% | 98.9% |
| Bookerly | 99.7% | 99.9% | 99.6% | 99.6% | 99.6% | 99.6% |
| Caecilia | 99.4% | 99.5% | 98.9% | 99.0% | 99.0% | 99.0% |
| Charis SIL | 99.6% | 99.8% | 99.3% | 99.3% | 99.3% | 99.3% |
| DejaVu Sans | 99.7% | 100.0% | 99.8% | 99.8% | 99.8% | 99.8% |
| Noto Sans | 99.7% | 99.9% | 99.7% | 99.7% | 99.7% | 99.7% |
Etymology coverage is consistently high but no font achieves 100% across all dictionaries. Etymological text makes heavier use of reconstructed Proto-Germanic and Proto-Indo-European forms, cognate examples in other languages (German, Dutch, Gothic, Old Norse), and specialised linguistic notation — all of which draw on a wider codepoint range than plain definition text. Garamond shows the lowest floor (98.3% for ang), consistent with its narrow overall codepoint inventory.
A small set of Runic characters (7 codepoints) appear exclusively in etymological text in ang — not in headwords or definitions. No font covers these; the rendering gap exists but affects only etymological content in the full ang variant.
Unicode Block Coverage
Coverage of each Unicode block present in the dictionary content, by font. Values are percentage of codepoints within that block that the font supports. Blocks are ordered by number of distinct codepoints present in the dictionaries.
| Block | Codepoints in dicts | Atkinson | Lexica | OpenDyslexic | Georgia | Garamond | Palatino | Bookerly | Caecilia | Charis | DejaVu | Noto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Basic Latin | 96 | 98% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 99% |
| Latin-1 Supplement | 93 | 94% | 100% | 100% | 100% | 94% | 100% | 100% | 100% | 100% | 100% | 100% |
| Latin Extended | 203 | 38% | 57% | 73% | 50% | 0% | 33% | 90% | 46% | 100% | 100% | 100% |
| Latin Extended Additional | 116 | 3% | 53% | 71% | 3% | 0% | 0% | 100% | 0% | 100% | 100% | 100% |
| Latin Extended-D | 58 | 0% | 0% | 3% | 0% | 0% | 0% | 0% | 0% | 95% | 26% | 95% |
| IPA Extensions | 90 | 0% | 6% | 88% | 0% | 0% | 0% | 11% | 0% | 100% | 100% | 100% |
| Phonetic Extensions | 101 | 0% | 0% | 11% | 0% | 0% | 0% | 0% | 0% | 100% | 84% | 100% |
| Spacing Modifier Letters | 74 | 1% | 12% | 42% | 12% | 0% | 12% | 31% | 12% | 100% | 82% | 100% |
| Combining Diacritics | 71 | 21% | 23% | 65% | 11% | 0% | 0% | 42% | 0% | 97% | 93% | 100% |
| Greek and Coptic | 103 | 4% | 63% | 85% | 63% | 0% | 1% | 68% | 4% | 20% | 99% | 93% |
| Cyrillic | 141 | 0% | 0% | 88% | 62% | 45% | 0% | 86% | 0% | 86% | 100% | 100% |
| General Punctuation | 68 | 25% | 25% | 72% | 31% | 22% | 24% | 76% | 24% | 85% | 97% | 100% |
| Superscripts and Subscripts | 38 | 0% | 3% | 50% | 47% | 0% | 3% | 47% | 0% | 100% | 79% | 100% |
| Number Forms | 47 | 0% | 0% | 66% | 9% | 0% | 0% | 68% | 0% | 100% | 79% | 40% |
| Currency Symbols | 34 | 6% | 6% | 59% | 32% | 0% | 3% | 12% | 3% | 97% | 65% | 97% |
| Letterlike Symbols | 60 | 5% | 5% | 17% | 10% | 2% | 2% | 17% | 5% | 13% | 93% | 100% |
| Arrows | 61 | 0% | 0% | 25% | 0% | 0% | 0% | 23% | 0% | 30% | 100% | 0% |
| Mathematical Operators | 220 | 6% | 6% | 7% | 6% | 0% | 6% | 25% | 6% | 18% | 100% | 0% |
| Runic | 44 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
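The per-block percentage in the table above is a set ratio: of the codepoints the dictionaries use inside a block, how many does the font support. A minimal sketch, with hypothetical names and made-up codepoint sets rather than real font data:

```python
def block_coverage(dict_codepoints, font_codepoints, start, end):
    """Percent of the dictionaries' codepoints in [start, end] that the font supports.

    Returns None when the dictionaries use no codepoints from the block.
    """
    present = {cp for cp in dict_codepoints if start <= cp <= end}
    if not present:
        return None
    supported = present & set(font_codepoints)
    return 100.0 * len(supported) / len(present)


# Made-up example: the dictionaries use two runes and the letter "a",
# while the font only has "a".
dict_cps = {0x16DE, 0x16AA, 0x61}
font_cps = {0x61}
print(block_coverage(dict_cps, font_cps, 0x16A0, 0x16FF))  # Runic block
print(block_coverage(dict_cps, font_cps, 0x0000, 0x007F))  # Basic Latin block
```

Unlike the frequency-weighted tables earlier, this metric ignores how often a codepoint appears. A block can score 0% here and still barely dent the entry content percentages if its characters are rare.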
IPA Extensions
90 distinct IPA codepoints are present in the dictionaries, used primarily in pronunciation guides within en entry content. IPA also appears in ang entry content where rune definitions include phonetic values.
| Font | IPA Coverage |
|---|---|
| Atkinson Hyperlegible Next | 0% (0/90) |
| Lexica Ultralegible | 6% (5/90) |
| OpenDyslexic | 88% (79/90) |
| Georgia | 0% (0/90) |
| Garamond | 0% (0/90) |
| Palatino | 0% (0/90) |
| Bookerly | 11% (10/90) |
| Caecilia | 0% (0/90) |
| Charis SIL | 100% (90/90) |
| DejaVu Sans | 100% (90/90) |
| Noto Sans | 100% (90/90) |
IPA coverage is a clear dividing line. Seven fonts provide little to no IPA support: Atkinson Hyperlegible Next, Georgia, Garamond, Palatino and Caecilia cover none of the 90 codepoints, while Lexica Ultralegible (6%) and Bookerly (11%) cover only a handful. Readers using these fonts will encounter rendering gaps wherever pronunciation is given inline. The three broad-coverage fonts (Charis SIL, DejaVu Sans, Noto Sans) cover IPA fully and OpenDyslexic covers most of it (88%); Charis SIL was specifically designed for this use case.
Runic
No font in this set covers the Runic Unicode block. The 44 Runic codepoints present in the dictionaries are confined to ang (and by extension the combined dictionaries that include ang). All Runic rendering depends entirely on system font fallback.
Runic content appears in three contexts:
- 47 headwords that are the Old English futhorc runes themselves, treated as lexical entries in ang
- 30 Latin-script headword entries that carry a secondary sense naming a runic character, with the rune glyph inline in the definition text
- 7 Runic codepoints that appear exclusively in etymological content in ang (coverage gap noted; affects full variant only)
For dictionaries that do not include ang (enm, en, enm+en) there is no Runic content and this does not apply.
Practical Rendering Notes
en (Modern English only)
All fonts render headwords at 99%+. The primary gap is IPA in entry content: five fonts have 0% IPA coverage and two more (Lexica Ultralegible and Bookerly) cover 11% or less, meaning most pronunciation guides will not render. Users who do not rely on inline pronunciation notation are not meaningfully affected. For full rendering including IPA, Charis SIL, DejaVu Sans and Noto Sans are the suitable choices, with OpenDyslexic close behind at 88%.
enm (Middle English only)
Several fonts drop to 96.8% headword coverage due to Middle English characters (notably yogh) outside their Latin range. Bookerly, Charis SIL, DejaVu Sans, and Noto Sans hold at 99.9–100%. The accessibility fonts (Atkinson, Lexica, OpenDyslexic) and the narrower e-reader staples (Garamond, Palatino, Caecilia, Georgia) all show this gap.
ang (Old English only)
Headword coverage is near-universal for Latin-script headwords across all fonts. The Runic gap is absolute — 47 headwords and 30 inline definition glyphs will not render in any of these fonts. This is a known constraint of the font set, not of the dictionary. Entry content and etymology coverage are otherwise excellent (99%+).
Combined dictionaries (ang+en, enm+en, ang+enm+en)
Coverage figures largely mirror en — the Modern English content dominates by volume and dilutes any historical-layer gaps in the frequency-weighted metrics. Headword coverage is 99%+ across all fonts. The Runic and IPA gaps described above apply where the relevant layers are present. Users of -noetym variants of any combined dictionary will see 100.0% entry content coverage for most fonts.