A Screenshot and OCR Workflow for Wayland
X11 screenshot tools like scrot or maim do not work on Wayland. The compositor owns the display, and there is no equivalent of grabbing the X framebuffer. On Sway, the replacements are grim (screenshot capture) and slurp (region selection). Together with Tesseract for OCR, they form a lightweight screenshot suite that is entirely keyboard-driven.
Region Screenshot
The simplest case: select a region, capture it, copy to clipboard.
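#!/usr/bin/env bash
tmpfile=$(mktemp /tmp/screenshot-XXXXXX.png)
# Stop if the selection is cancelled or the capture fails
grim -g "$(slurp)" "$tmpfile" || { rm -f "$tmpfile"; exit 1; }
wl-copy < "$tmpfile"
mv "$tmpfile" "$HOME/Pictures/"
echo "Screenshot saved to $HOME/Pictures/${tmpfile##*/}"
slurp gives you a crosshair to draw a rectangle. grim -g captures that exact region. wl-copy puts the image in the Wayland clipboard. The file gets moved to ~/Pictures/ as a backup. If the selection is cancelled, the script exits without touching the clipboard.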
Focused Window Screenshot
Instead of manually selecting a region, this variant grabs the currently focused window by querying the Sway tree:
# tmpfile is created with mktemp, as in the region script
geometry=$(swaymsg -t get_tree | jq -r '.. | select(.focused? == true) | .rect | "\(.x),\(.y) \(.width)x\(.height)"' | head -n 1)
if [ -n "$geometry" ] && grim -g "$geometry" "$tmpfile"; then
    wl-copy < "$tmpfile"
    mv "$tmpfile" "$HOME/Pictures/"
    echo "Screenshot saved to $HOME/Pictures/${tmpfile##*/}"
fi
The jq query recursively walks the Sway window tree, finds the node with .focused == true, and extracts its position and size in the x,y widthxheight format that grim expects. The -r flag makes jq print the string without surrounding quotes.
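The script only checks that the geometry string is non-empty. A stricter check could confirm that it really has the x,y widthxheight shape before it is handed to grim. The following is a minimal sketch; is_grim_geometry is a hypothetical helper, and allowing negative x and y values is an assumption made to cover multi-monitor layouts.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if $1 looks like "x,y WIDTHxHEIGHT".
# Assumption: x and y may be negative, width and height may not.
is_grim_geometry() {
    local re='^-?[0-9]+,-?[0-9]+ [0-9]+x[0-9]+$'
    [[ $1 =~ $re ]]
}

# Example use inside a screenshot script:
#   is_grim_geometry "$geometry" && grim -g "$geometry" "$tmpfile"
```

With this in place, a malformed result such as "null,null nullxnull" would be rejected instead of being passed to grim.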
Adding OCR
This is where it gets interesting. Select a region, capture it, run it through Tesseract, and get the extracted text in your clipboard:
#!/usr/bin/env bash
tmpfile=$(mktemp /tmp/screenshot-ocr-XXXXXX.png)
# Stop if the selection is cancelled or the capture fails
grim -g "$(slurp)" "$tmpfile" || { rm -f "$tmpfile"; exit 1; }
# "-" makes Tesseract write the recognized text to stdout
tesseract "$tmpfile" - -l eng 2>/dev/null | wl-copy
copied_text=$(wl-paste)
# Keep the notification short: first 50 characters plus "..." if truncated
truncated_text="${copied_text:0:50}$([ ${#copied_text} -gt 50 ] && echo "..." || echo "")"
notify-send "OCR Complete" "Text copied to clipboard: $truncated_text"
mv "$tmpfile" "$HOME/Pictures/"
tesseract "$tmpfile" - -l eng reads the image and writes the extracted text to stdout (the - tells Tesseract to output to stdout instead of a file). That gets piped straight into wl-copy. A desktop notification shows a 50-character preview of what was captured so you get instant feedback without switching windows.
The truncation is a nice touch for notifications -- without it, a full page of OCR text would create an absurdly tall notification bubble.
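The truncation can also be pulled out into a small function, which is easier to read than the inline expansion. This is a sketch only; preview_text is a hypothetical name, and the default limit of 50 simply mirrors the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print at most $2 characters of $1 (default 50),
# followed by "..." when the text had to be cut.
preview_text() {
    local text=$1 limit=${2:-50}
    if (( ${#text} > limit )); then
        printf '%s...\n' "${text:0:limit}"
    else
        printf '%s\n' "$text"
    fi
}

# Example use in the OCR script:
#   notify-send "OCR Complete" "Text copied to clipboard: $(preview_text "$copied_text")"
```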
The Color Picker
A bonus one-liner that captures a single pixel and copies the hex color to clipboard:
bindsym CTRL+Print exec grim -g "$(slurp -p)" -t ppm - | convert - -format '%[pixel:p{0,0}]' txt:- | tail -n 1 | cut -d ' ' -f 4 | wl-copy
slurp -p selects a single pixel instead of a region. grim captures it as a PPM image piped to stdout. ImageMagick's convert extracts the hex color value. The result lands in your clipboard.
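The tail and cut steps depend on the exact column layout of ImageMagick's txt: output. A slightly more tolerant variant could search for the first #RRGGBB-style token instead. The sketch below assumes the output contains such a token; hex_from_txt is a hypothetical helper, and the sample text in the comment is illustrative rather than captured output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read ImageMagick "txt:" output on stdin and print
# the first token that looks like a hex color (# followed by 6+ hex digits).
hex_from_txt() {
    grep -oE '#[0-9A-Fa-f]{6,}' | head -n 1
}

# Illustrative input (assumed shape of the txt: output):
#   # ImageMagick pixel enumeration: 1,1,255,srgb
#   0,0: (255,0,0)  #FF0000  red
#
# Example use in place of "tail -n 1 | cut -d ' ' -f 4":
#   grim -g "$(slurp -p)" -t ppm - | convert - txt:- | hex_from_txt | wl-copy
```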
Keybindings
All five actions (four scripts plus the color picker) are bound to intuitive key combinations in the Sway config:
bindsym Print exec $USER_BIN/screenshot.sh &
bindsym Shift+Print exec $USER_BIN/screenshot-highlighted-window.sh &
bindsym Mod1+Print exec $USER_BIN/screenshot-ocr.sh &
bindsym Mod1+Shift+Print exec $USER_BIN/screenshot-window-ocr.sh &
bindsym CTRL+Print exec grim -g "$(slurp -p)" -t ppm - | convert - -format '%[pixel:p{0,0}]' txt:- | tail -n 1 | cut -d ' ' -f 4 | wl-copy
Print for region, Shift+Print for focused window, Alt+Print for region OCR, Alt+Shift+Print for window OCR, and Ctrl+Print for the color picker. Easy to remember once you think of Alt as the "OCR modifier" and Shift as the "focused window modifier."
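The keybindings reference screenshot-window-ocr.sh, which is not listed in this post. A minimal sketch of what it might look like, assuming it simply combines the focused-window query with the OCR step; the function names focused_geometry and ocr_focused_window are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical screenshot-window-ocr.sh: OCR the focused Sway window.

# Print the focused node's geometry as "x,y WIDTHxHEIGHT".
focused_geometry() {
    swaymsg -t get_tree |
        jq -r '.. | select(.focused? == true) | .rect | "\(.x),\(.y) \(.width)x\(.height)"' |
        head -n 1
}

# Capture the focused window, run Tesseract on it, copy the text.
ocr_focused_window() {
    local geometry tmpfile
    geometry=$(focused_geometry)
    [ -n "$geometry" ] || return 1
    tmpfile=$(mktemp /tmp/screenshot-ocr-XXXXXX.png) || return 1
    grim -g "$geometry" "$tmpfile" || { rm -f "$tmpfile"; return 1; }
    tesseract "$tmpfile" - -l eng 2>/dev/null | wl-copy
    mv "$tmpfile" "$HOME/Pictures/"
}

# To use this as a script, call the function on the last line:
#   ocr_focused_window
```

The notification step from the region OCR script could be added after the wl-copy line in the same way.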
The Good and The Bad
The good: this entire setup is four short shell scripts, a one-liner, and five lines of keybindings. There is no screenshot application running in the background, no GUI to navigate, no settings to configure. Press a key, get a result.
The bad: Tesseract's accuracy depends heavily on the source. Clean rendered text from a terminal or code editor works well. Text over complex backgrounds, curved text, or handwriting will produce garbage. For those cases you still need to screenshot and read it yourself. Adding -l eng+nld for multiple languages helps if you work in more than one language, but it slows down the processing noticeably.
Dependencies: grim, slurp, wl-clipboard, tesseract, jq, libnotify (for notify-send), and optionally imagemagick for the color picker. On Arch, that is pacman -S grim slurp wl-clipboard tesseract tesseract-data-eng jq libnotify imagemagick.
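Before binding the keys, it can help to confirm that every command the scripts call is actually on the PATH. The following is a sketch; missing_deps is a hypothetical helper, and the command names in the usage comment assume that wl-clipboard provides wl-copy and wl-paste and that ImageMagick provides convert.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each given command name that is not found
# on the PATH, one per line. Prints nothing if all are available.
missing_deps() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example use:
#   missing_deps grim slurp wl-copy wl-paste tesseract jq convert notify-send
```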
