Craft · June 12, 2026 · 8 min read

Thumbnail Text That Reads at 168 Pixels: Words, Fonts, and Placement

How many words, how big, where they go, and which font choices survive the shrink to suggested-sidebar size.

You design a thumbnail at 1280×720. Almost nobody sees it at 1280×720. The suggested sidebar — where most channels collect the bulk of their impressions — renders thumbnails at roughly 168 pixels wide. That's the real canvas. Every decision about thumbnail text — how many words, how big, what font, where it sits — should be made for the 168-pixel version, because that's the version doing the selling.

Most text mistakes come from forgetting this. The thumbnail looks great in the editor: crisp eight-word headline, elegant typeface, text tucked neatly into a corner. Then YouTube shrinks it by a factor of nearly eight (1280 ÷ 168 ≈ 7.6), slaps a duration badge over the corner, and the elegant headline becomes gray noise. Here's how to make text that survives the trip.

Three to five words — and they must say what the title can't

The word budget is three to five. Not because of some retention study, but because of arithmetic: the more words you add, the smaller every word gets, and at 168 pixels there is no headroom to give away. A five-word phrase set large is readable. A ten-word sentence set at half the size is decoration.
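That arithmetic can be sketched in a few lines. This is a rough illustration, not a typesetting model: it assumes the phrase sits on a single line, that text may span 90% of the canvas width, and that an average glyph advances about 0.6 of the font size. All three values are illustrative guesses, not measurements of any real font, and the function names are made up for this example.

```python
CANVAS_W = 1280   # design canvas width in pixels
SIDEBAR_W = 168   # approximate sidebar render width used in this article
USABLE = 0.9      # ASSUMPTION: fraction of the canvas width the text may span
ADVANCE = 0.6     # ASSUMPTION: average glyph advance as a fraction of font size


def max_font_px(phrase: str) -> float:
    """Largest font size (in canvas pixels) at which the phrase fits on one line."""
    return (CANVAS_W * USABLE) / (len(phrase) * ADVANCE)


def at_sidebar(canvas_px: float) -> float:
    """Convert a size on the 1280-wide canvas to its size at sidebar width."""
    return canvas_px * SIDEBAR_W / CANVAS_W


SHORT = "IT RUNS CYBERPUNK"
LONG = "YOU WILL NOT BELIEVE WHAT THIS VERY CHEAP PC RUNS"

for phrase in (SHORT, LONG):
    size = max_font_px(phrase)
    print(f"{len(phrase.split())} words: {size:.0f}px on canvas, "
          f"{at_sidebar(size):.1f}px in the sidebar")
```

Stacking the long phrase over two lines buys some of the size back, but it spends vertical space the image needed. The budget gets tighter with every word either way.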

The harder discipline is which words. Your thumbnail never appears alone — the title sits right next to it, always. Text that repeats the title wastes the single most expensive real estate you have. If the title says "I Built a PC for $300," the thumbnail text shouldn't say "$300 PC Build" — it should say the thing the title withholds: "it runs Cyberpunk" or "huge mistake." Title and thumbnail are two halves of one pitch, and the packaging checklist covers that pairing in depth. The short version: if deleting the thumbnail text loses nothing the title already said, the text was dead weight.
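The dead-weight check can be roughed out as a word-overlap score. This is a minimal sketch under a loud assumption: it matches exact words only, so "build" and "built" count as different, and it knows nothing about meaning. The helper names are hypothetical. Treat a high score as a prompt to rethink the text, not as a verdict.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word tokens; keeps '$' and digits so prices count as words."""
    return set(re.findall(r"[\w$]+", text.lower()))


def repeated_share(title: str, thumb_text: str) -> float:
    """Fraction of the thumbnail's words that already appear in the title."""
    thumb = words(thumb_text)
    if not thumb:
        return 0.0  # no text on the thumbnail, so nothing is repeated
    return len(thumb & words(title)) / len(thumb)


title = "I Built a PC for $300"
for candidate in ("$300 PC Build", "it runs Cyberpunk"):
    print(f"{candidate!r}: {repeated_share(title, candidate):.0%} already in the title")
```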

A useful editing pass: write the phrase, then remove one word and check whether the meaning survived. Repeat until it breaks, then put the last word back. "You won't believe what happened next" compresses to "then this happened" compresses to nothing — which tells you the phrase was filler all along.

How big: the 10% rule of thumb

On the 1280×720 canvas, letters should be roughly 10% of the frame height or taller: at least 72 pixels, with about 80 as a comfortable target. That's not a law of optics; it's a rule of thumb that happens to keep letterforms legible after the shrink to sidebar size. Text at 40px looks perfectly fine in the editor and disappears completely in the feed.
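To see what those numbers turn into after the shrink, here is the conversion as a small sketch. It assumes a plain linear downscale from 1280 to 168 pixels wide; the exact rendered width varies by surface, so read the results as approximate.

```python
CANVAS_W, CANVAS_H = 1280, 720
SIDEBAR_W = 168  # approximate sidebar width; actual render sizes vary

SCALE = SIDEBAR_W / CANVAS_W  # about 0.13, a shrink of roughly 7.6x


def sidebar_px(canvas_px: float) -> float:
    """Size at sidebar width of something measured on the 1280x720 canvas."""
    return canvas_px * SCALE


min_letter = 0.10 * CANVAS_H  # the 10% rule: 10% of 720

for height in (min_letter, 80, 40):
    print(f"{height:.0f}px on canvas -> {sidebar_px(height):.1f}px in the sidebar")
```

The 10% threshold works out to 72 pixels, so an 80-pixel letter sits slightly above it. Either lands at around ten pixels tall in the sidebar, while 40-pixel text lands at about five, which is why it disappears.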

But don't measure — test. Zoom your draft out until it's about the width of your thumb on screen, or view it at actual sidebar size, and ask one question: can I read every word in under a second without leaning in? That check beats any pixel value, because the pixel value can't account for your font choice, your contrast treatment, or how busy the image behind the text is.

Fonts: what survives the shrink and what dies

Shrinking an image is a brutality test for typography. Strokes thinner than a couple of pixels at final size simply vanish; tight letter spacing fuses into blobs. That sorts typefaces into survivors and casualties fast:

  • Survives: heavy geometric and grotesque sans-serifs. The DM Sans / Inter class of typeface at bold, extra-bold, or black weight is the safe archetype — thick, even strokes and open counters that stay distinct at any size. No font is "the best font for thumbnails," but this family of shapes is the one you can pick without thinking and never regret.
  • Dies first: thin weights. Light and regular weights of even excellent fonts lose their strokes in the downscale. If the weight name doesn't say bold or heavier, it's a body-text weight, not a thumbnail weight.
  • Dies second: script and decorative faces. Connected letterforms depend on detail that 168 pixels cannot carry. The vibe you're buying costs you the words themselves.
  • Dies quietly: condensed faces with tight tracking. Condensed fonts tempt you because they fit more words — which is the wrong goal — and their narrow counters close up into vertical stripes at small size.
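The same shrink arithmetic tells you how thick a stroke has to be on the canvas to survive. A minimal sketch, assuming "a couple of pixels" means 2 pixels at final size and a linear downscale to 168 wide; the threshold is a reading of the rule of thumb above, not a measured limit, and the function names are invented for the example.

```python
import math

CANVAS_W = 1280
SIDEBAR_W = 168
MIN_FINAL_STROKE = 2.0  # ASSUMPTION: "a couple of pixels" taken as 2px at sidebar size


def min_canvas_stroke(final_px: float = MIN_FINAL_STROKE) -> int:
    """Thinnest whole-pixel stroke on the canvas that keeps final_px after the shrink."""
    return math.ceil(final_px * CANVAS_W / SIDEBAR_W)


def survives(canvas_stroke_px: float) -> bool:
    """True if a stroke of this canvas thickness stays at or above the threshold."""
    return canvas_stroke_px * SIDEBAR_W / CANVAS_W >= MIN_FINAL_STROKE


print("minimum canvas stroke:", min_canvas_stroke(), "px")
for stroke in (6, 16, 24):
    print(f"{stroke}px stroke survives: {survives(stroke)}")
```

Measure the thinnest stroke of a letter in your editor at full size and run it through `survives`. Light and regular weights tend to fail this check at typical thumbnail text sizes, which is the arithmetic behind the list above.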

One more lever: ALL CAPS usually reads better at tiny sizes for short phrases. Caps produce a uniform block of consistent height, which holds together as a shape when individual letters start to blur. For three to five words it's almost always the right call; for longer text caps get harder to scan — which is another argument for not having longer text.

Contrast: pick one treatment, not three

Legible text at small size isn't about the font alone — it's about separation from the image behind it. There are four treatments that work:

  1. A solid color block behind the text. The bluntest and most reliable: a rectangle of flat color, text on top. Maximum contrast, zero ambiguity.
  2. A thick stroke or outline. White text with a heavy dark outline (or the reverse) stays readable over almost any background. The operative word is thick — a 1px stroke does nothing after the shrink.
  3. A drop shadow with real offset. Not a subtle 2px blur — a visible, offset shadow that creates a dark zone behind every letter.
  4. Text over a deliberately flat region. Compose the image so one area — sky, a wall, an out-of-focus zone — is quiet, and put the text there. The most elegant option, and the one that requires planning the shot.

The mistake is stacking them. Stroke plus shadow plus glow plus a gradient box reads as visual panic, and the effects start eating the letterforms they were supposed to protect. Choose one treatment per thumbnail and commit.
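If you want a number for "separation," one option is the contrast ratio used in web accessibility guidelines (WCAG). That is a borrowed yardstick, not something this article prescribes, and it compares only two flat colors, so it fits the color-block and flat-region treatments better than text over a busy photo. A sketch using the standard relative-luminance formula:

```python
RGB = tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(color: RGB) -> float:
    """Relative luminance: 0.0 for black, 1.0 for white."""
    r, g, b = (_linear(v) for v in color)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast(a: RGB, b: RGB) -> float:
    """WCAG contrast ratio, from 1 (identical colors) up to 21 (black on white)."""
    hi, lo = sorted((luminance(a), luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


WHITE, BLACK, GRAY = (255, 255, 255), (0, 0, 0), (128, 128, 128)
print(f"white on black: {contrast(WHITE, BLACK):.1f}")
print(f"white on gray:  {contrast(WHITE, GRAY):.1f}")
```

Sample the text color and the color directly behind it, then compare. A low ratio means the treatment you picked is not doing its job yet. Fix that one treatment rather than adding a second.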

Placement: where YouTube's own UI won't eat your words

Two regions of every thumbnail belong to YouTube, not to you. The bottom-right corner carries the video duration badge on essentially every surface — text placed there is partially covered for every viewer. The very bottom edge gets the red watch-progress bar for anyone who has started the video before, which is exactly the returning audience you most want to re-hook.

That leaves the upper-left and center-left as the safest zones: nothing overlays them, and the small crops that different feeds apply to thumbnails treat the left side most predictably. Keep text inside the central safe area rather than flush against any edge — the exact margins and surface-by-surface specs are in the thumbnail size and safe-zone guide.
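A placement check like this is easy to automate once you have the zone sizes. The sketch below uses placeholder rectangles for the duration badge and progress bar: the 260×110 and 30-pixel figures are invented for illustration and are NOT official YouTube measurements, so swap in real values from a spec you trust.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # left edge on the 1280x720 canvas
    y: int  # top edge
    w: int
    h: int


# Placeholder zones, NOT official measurements (hypothetical sizes).
RESERVED = {
    "duration badge": Rect(1280 - 260, 720 - 110, 260, 110),  # bottom-right corner
    "progress bar": Rect(0, 720 - 30, 1280, 30),              # bottom edge strip
}


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share any area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def placement_problems(text_box: Rect) -> list[str]:
    """Names of the reserved zones the text box collides with."""
    return [name for name, zone in RESERVED.items() if overlaps(text_box, zone)]


print(placement_problems(Rect(80, 80, 600, 200)))     # upper-left text block
print(placement_problems(Rect(900, 560, 340, 140)))   # text tucked bottom-right
```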

The cheat sheet

Decision | Guideline | Common mistake
Word count | 3–5 words that add what the title can't | A full sentence that restates the title
Letter height | Roughly 10%+ of frame height (72px or more on 1280×720; ~80px to be safe) | Editor-legible 40px text that vanishes in the feed
Treatment | One: block, thick stroke, offset shadow, or flat region | Stroke + shadow + glow stacked on one phrase
Placement | Upper-left or center-left, inside the safe area | Bottom-right (duration badge) or bottom edge (progress bar)

When no text wins

Text is a clarity tool, not a requirement. If the image already poses the question — a strong face reacting to something visible in frame, an object in a state that demands explanation, two things that shouldn't be together — words can only dilute it. Some of the most-clicked thumbnails on the platform are textless, because the image carries the tension and the title supplies the context. The principles behind that kind of image-first composition are covered in what makes a good thumbnail.

The honest test: cover the text on your draft. If the thumbnail still makes you want to know what happens, the text was optional — keep it only if it sharpens the promise. If covering the text makes the image meaningless, the text is doing real work, so give it the size and contrast that work deserves.

And if typesetting is the part you fight with — weights, strokes, placement, twelve exports to get one readable frame — that's a solved problem. An AI thumbnail maker applies heavy-weight type, a single clean treatment, and safe-zone placement by default; you describe the concept and check the result at real size. The thinking in this article still decides what the words are. The tooling just stops the words from dying at 168 pixels.

Quick Answers

Questions on this topic

How many words should thumbnail text have?
Three to five words, and many strong thumbnails use zero. The text has to read at roughly 168 pixels wide, the size of the suggested sidebar, so every extra word shrinks every other word. Say the one thing the title can't say, and cut everything else.

How big should thumbnail text be?
As a rule of thumb, letters should be at least 10% of the frame height on the 1280×720 canvas: 72 pixels or taller, with about 80 as a comfortable target. But the number is a starting point, not the test. The real test is zooming your draft out to thumb size: if you can't read it instantly, it's too small, whatever the pixel value says.

What is the best font for thumbnails?
There's no single best font, but there is a reliable archetype: a heavy geometric or grotesque sans-serif, the DM Sans / Inter class of typeface at bold or black weight. What matters is what survives the shrink: thick, even strokes and open letterforms. Thin weights, scripts, and tightly tracked condensed faces all fall apart at sidebar size.

Where should text go on a thumbnail?
Upper-left or center-left. The bottom-right corner is covered by the video duration badge on essentially every surface, and the bottom edge gets a red watch-progress bar for returning viewers. Left-side placements also hold up more predictably under the small crops that different feeds apply.
