Overview
TXT, or plain text, is the most fundamental digital document format: a file containing nothing but a sequence of characters, with no formatting, styling, or structural metadata. Every character in a TXT file is stored as one or more bytes according to the file's character encoding (typically UTF-8 in modern practice, or ASCII, Latin-1, or another legacy encoding in older files), and the file's meaning is determined entirely by the text itself. There are no bold or italic markers, no font specifications, no embedded images, and no hyperlinks. There are only characters, spaces, and line breaks.
This radical simplicity is both the format's greatest strength and its most obvious limitation. Virtually any computer can read a plain text file. Text files created on a mainframe in the 1970s remain readable today, provided their encoding is known (many mainframes used EBCDIC rather than ASCII). No special software is required, no proprietary codec needs licensing, and no format-version incompatibilities arise. A text editor, a terminal, a web browser, or a phone can display the contents immediately.
Plain text is the foundation on which almost all other text-based formats are built. HTML, XML, JSON, YAML, CSV, Markdown, source code in every programming language, configuration files, shell scripts, log files, and even the PostScript and SVG formats described elsewhere in this guide are all, at their core, structured plain text with conventions layered on top.
History
Plain text predates personal computing entirely. The ASCII (American Standard Code for Information Interchange) encoding, which defines the basic Latin alphabet, digits, punctuation, and control characters in 7-bit codes (0-127), was first published as ASA X3.4 in 1963 by the American Standards Association (now ANSI). ASCII became the foundation of text interchange between mainframes, minicomputers, and terminals throughout the 1960s and 1970s.
As computing globalized, the need to represent non-Latin scripts led to a proliferation of mutually incompatible encodings: single-byte sets such as Latin-1 (ISO 8859-1) for Western European languages, multi-byte schemes such as Shift_JIS for Japanese and EUC-KR for Korean, and hundreds of others. The Unicode Consortium, incorporated in 1991, developed the Unicode Standard (kept in sync with ISO/IEC 10646, the Universal Character Set) to unify all scripts in a single character set. UTF-8, designed by Ken Thompson and Rob Pike in 1992 and presented publicly in 1993, became the dominant text encoding on the web (used by roughly 98% of websites by 2024) because it is backward-compatible with ASCII while supporting every Unicode code point.
Technical Details
A TXT file has no header, no footer, and no structural markers; it is simply a stream of bytes interpreted according to a character encoding. In UTF-8, ASCII characters (0-127) occupy one byte each, while all other characters take 2, 3, or 4 bytes per code point: the high bits of the first byte indicate how long the sequence is, and each continuation byte begins with the bits 10. The Byte Order Mark (BOM, U+FEFF, encoded in UTF-8 as the bytes EF BB BF) is sometimes prepended to UTF-8 files for encoding identification, though this practice is discouraged on Unix-like systems.
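The variable-width behaviour of UTF-8 and the optional BOM can be observed directly. The following is a minimal Python sketch, not a definitive implementation: the sample characters and the helper names `utf8_width` and `strip_utf8_bom` are chosen here purely for illustration.

```python
import codecs

# Illustrative sample: one character from each UTF-8 width class.
samples = ["A", "é", "€", "😀"]


def utf8_width(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the byte count, and the raw bytes in hex.
    print(f"U+{ord(ch):04X}: {utf8_width(ch)} byte(s) -> {encoded.hex(' ')}")
```

When reading files whose origin is unknown, Python's built-in `utf-8-sig` codec is a convenient alternative to stripping the BOM by hand: it decodes UTF-8 and silently discards a leading BOM if one is present.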
Line endings differ by platform: Unix and macOS use a single Line Feed (LF, 0x0A), Windows uses Carriage Return followed by Line Feed (CRLF, 0x0D 0x0A), and classic Mac OS used a single Carriage Return (CR, 0x0D). Most modern text editors and parsers handle all three conventions transparently. The MIME type text/plain is used in HTTP and email; the optional charset parameter (e.g., text/plain; charset=utf-8) specifies the encoding. Without it, receivers must guess the encoding, which historically caused mojibake (garbled characters) when the guess was wrong.
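The line-ending conventions and the charset parameter described above can be handled with a few lines of standard-library Python. This is a sketch under stated assumptions: the function names `detect_line_endings`, `normalize_to_lf`, and `charset_from_content_type` are hypothetical helpers introduced for illustration, and the input is assumed to be raw bytes in an ASCII-compatible encoding.

```python
from email.message import Message


def detect_line_endings(data: bytes) -> dict:
    """Count how often each line-ending convention appears in raw bytes."""
    crlf = data.count(b"\r\n")
    # Subtract CRLF pairs so their CR and LF bytes are not counted twice.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"LF": lf, "CRLF": crlf, "CR": cr}


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF. CRLF is replaced first to avoid doubled newlines."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def charset_from_content_type(value: str, default=None):
    """Extract the charset parameter from a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the value and returns `default` when absent.
    return msg.get_content_charset(default)


mixed = b"unix\nwindows\r\nclassic mac\rend"
print(detect_line_endings(mixed))
print(normalize_to_lf(mixed))
print(charset_from_content_type("text/plain; charset=UTF-8"))
```

Working on bytes rather than decoded text is a deliberate choice here: opening a file in Python's default text mode already translates line endings, which would hide the very differences being measured.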
Pros & Cons
Pros
- Readable on every operating system, in every editor, and from every programming language
- No vendor lock-in, no proprietary dependencies, no format obsolescence risk
- Minimal overhead — the entire file is content with no metadata bloat
- Perfect for version control (git diff works line by line on plain text)
- Foundation for all structured text formats (HTML, JSON, XML, CSV, source code)
Cons
- No formatting — no bold, italic, headings, colors, or font control
- Character encoding ambiguity in legacy files can cause garbled text
- No embedded images, tables, or any visual structure beyond whitespace
- Line-ending differences between platforms can cause subtle issues
- No metadata for author, creation date, or document properties
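The encoding ambiguity listed among the cons is usually handled by trying likely encodings in order. The sketch below is one possible approach, not a standard algorithm: the helper name `decode_with_fallback` and the candidate order (UTF-8 first, then Windows-1252, then Latin-1 as a last resort) are assumptions made for illustration, and a successful decode does not prove the guess was right.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    data: bytes, candidates: Sequence[str] = ("utf-8-sig", "cp1252")
) -> Tuple[str, str]:
    """Decode bytes by trying each candidate encoding in turn.

    Returns the decoded text and the name of the encoding that succeeded.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every possible byte to a code point, so this never raises.
    return data.decode("latin-1"), "latin-1"


# A wrong guess produces mojibake: UTF-8 bytes read as Windows-1252.
garbled = "café".encode("utf-8").decode("cp1252")
print(garbled)

print(decode_with_fallback("café".encode("utf-8")))
print(decode_with_fallback(b"caf\xe9"))
```

UTF-8 is tried first because its strict byte structure means legacy 8-bit text rarely decodes as valid UTF-8 by accident, whereas nearly any byte sequence decodes without error under a single-byte encoding.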
Common Use Cases
- Writing and distributing README files, changelogs, and documentation
- Storing configuration files, environment variables, and application settings
- Logging application events, server access, and error messages
- Writing source code in every programming language
- Exchanging data between systems that need the simplest possible format
- Taking quick notes and drafting content without any formatting overhead