What Is a Hash Function?
A hash function takes an input of any size (a single character, a paragraph, or an entire file) and produces a fixed-length output called a digest or hash. The same input always produces the same hash, and even a tiny change in the input produces a completely different one. Crucially, a cryptographic hash function is one-way: given only a hash, there is no feasible way to recover the original input other than guessing candidate inputs and hashing each one.
These properties make hash functions indispensable in software engineering, security, and data management. They verify file integrity, store passwords safely, detect duplicates, and underpin digital signatures and blockchain technology.
How Hashing Works
When you hash a string like "hello", the algorithm processes it through a series of mathematical transformations — bitwise operations, modular arithmetic, and compression functions — to produce a fixed-length output. For SHA-256, the result is always 256 bits (64 hexadecimal characters), regardless of whether the input is 5 characters or 5 million.
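The fixed-length behavior can be checked with Python's standard `hashlib` module. This is a minimal sketch; the variable names and the 5-million-byte test input are illustrative choices, not part of any specification.

```python
import hashlib

# Two inputs of very different sizes (both chosen only for illustration).
short_input = b"hello"
long_input = b"a" * 5_000_000

short_digest = hashlib.sha256(short_input).hexdigest()
long_digest = hashlib.sha256(long_input).hexdigest()

print(short_digest)
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short_digest), len(long_digest))  # 64 64
```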
The key properties that define a good hash function are:
**Deterministic**: The same input always gives the same output. Hash "hello" a million times and you get the same result every time.
**Avalanche effect**: Change a single bit in the input and roughly half the output bits change. "hello" and "Hello" produce hashes that look completely unrelated.
**Pre-image resistance**: Given a hash value, it is computationally infeasible to find an input that produces it.
**Collision resistance**: It is extremely difficult to find two different inputs that produce the same hash.
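The first two properties are easy to observe directly. The sketch below uses Python's `hashlib`; the helper names `sha256_bits` and `differing_bits` are made up for this example. It counts how many of the 256 output bits differ between the hashes of "hello" and "Hello", two inputs that differ in a single bit.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the output bits that differ between the digests of a and b."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

# Deterministic: hashing the same input twice gives the same digest.
assert hashlib.sha256(b"hello").digest() == hashlib.sha256(b"hello").digest()

# Avalanche: 'h' (0x68) and 'H' (0x48) differ in exactly one input bit.
changed = differing_bits(b"hello", b"Hello")
print(f"{changed} of 256 output bits differ")
```

The printed count should land close to 128, which is what "roughly half the output bits" means in practice.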
Common Hash Algorithms
MD5
MD5 produces a 128-bit (32-character hex) digest. Created in 1991 by Ronald Rivest, it was widely used for file integrity checks and password storage for over a decade. However, MD5 is now considered cryptographically broken — researchers have demonstrated practical collision attacks, meaning they can create two different files with the same MD5 hash.
MD5 remains acceptable for non-security purposes: checksums that detect accidental corruption, deduplication keys over trusted data, or cache invalidation identifiers. But it should never be used for password hashing, digital signatures, or any application where an attacker could benefit from a collision, including checking downloads for deliberate tampering.
SHA-1
SHA-1 produces a 160-bit (40-character hex) digest. Developed by the NSA and published in 1995, it was the standard hash function for digital certificates, Git commits, and many security protocols. In 2017, researchers at CWI Amsterdam and Google demonstrated the first practical SHA-1 collision (the "SHAttered" attack), and the algorithm is now deprecated for security use.
Git still uses SHA-1 for its object identifiers (including commit IDs) by default, though support for SHA-256 repositories has been added and a migration is under way. For new projects, SHA-1 should be avoided in favor of SHA-2 family algorithms.
SHA-256
SHA-256, part of the SHA-2 family, produces a 256-bit (64-character hex) digest. It is one of the most widely used cryptographic hash functions today. No practical collision or pre-image attacks against SHA-256 are known, and it is the backbone of Bitcoin's proof-of-work system, TLS certificates, and countless security protocols.
SHA-256 strikes a good balance between security and performance. For most applications, it is the recommended default choice.
SHA-512
SHA-512 produces a 512-bit (128-character hex) digest. It offers a larger security margin than SHA-256 and can actually be faster on 64-bit processors for large inputs, because it operates on 64-bit words. On CPUs with dedicated SHA-256 instructions, that advantage can reverse. It is a good choice when you need extra security headroom or are working on 64-bit systems without such hardware support.
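The digest sizes quoted for the four algorithms can be confirmed with a short loop. This sketch assumes a Python build where `hashlib` exposes all four names; some restricted (for example, FIPS-mode) builds may refuse `md5`.

```python
import hashlib

message = b"hello"
digest_bits = {}

# Compute each digest and record its size in bits.
for name in ("md5", "sha1", "sha256", "sha512"):
    hasher = hashlib.new(name, message)
    digest_bits[name] = hasher.digest_size * 8
    hex_chars = len(hasher.hexdigest())
    print(f"{name:>6}: {digest_bits[name]:3d} bits, {hex_chars:3d} hex characters")
```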
Practical Applications
File Integrity Verification
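When you download software, the publisher often provides a SHA-256 hash. After downloading, you compute the hash of your file and compare it to the published value. If they match, the file was not corrupted in transit. It also rules out tampering, provided the published hash itself came from a trusted source (such as the publisher's HTTPS site) rather than from the same place as the file.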
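A download check along these lines might look like the following sketch. The function names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's digest with a published value, ignoring case and stray whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A caller would pass the path of the downloaded file and the hash string copied from the publisher's page, and treat a `False` result as a reason to discard the file.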
Password Storage
Passwords should never be stored in plain text. Instead, applications hash the password and store only the hash. When a user logs in, the application hashes the submitted password and compares it to the stored hash. Even if the database is compromised, attackers get hashes, not passwords.
For password hashing specifically, algorithms like bcrypt, scrypt, or Argon2 are preferred over raw SHA-256. They are deliberately slow, which makes brute-force guessing expensive, and they incorporate a per-password salt, which protects against rainbow table attacks.
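As a rough sketch of salted, deliberately slow password hashing, the example below uses `hashlib.scrypt` from Python's standard library (available when Python is built against OpenSSL 1.1 or later). The function names and the cost parameters are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (an assumption); tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both, never the password itself."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)
```

Because the salt is random, hashing the same password twice yields two different stored values, which is what makes precomputed rainbow tables useless.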
Data Deduplication
Hashing allows efficient duplicate detection. Rather than comparing potentially massive files byte by byte, compute their hashes and compare those. With a collision-resistant algorithm, identical hashes mean identical content for all practical purposes, while different hashes always mean different content. Cloud storage services use this technique to avoid storing the same file multiple times.
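One way to sketch hash-based duplicate detection is to group items by digest. The function name `find_duplicates` and the in-memory sample data are hypothetical; a real system would hash file contents from disk or object storage.

```python
import hashlib
from collections import defaultdict

def find_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names by the SHA-256 of their content; return groups with more than one name."""
    by_digest = defaultdict(list)
    for name, content in blobs.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]

files = {
    "a.txt": b"same content",
    "b.txt": b"different content",
    "c.txt": b"same content",
}
duplicates = find_duplicates(files)
print(duplicates)  # [['a.txt', 'c.txt']]
```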
Digital Signatures
Digital signatures combine hashing with asymmetric cryptography. Rather than signing an entire document (which would be slow), the signer hashes the document and signs only the hash. The recipient hashes the document independently and verifies the signature against their computed hash. Signing the hash is faster than signing the full document, and a valid signature proves the document has not been modified since it was signed.
Choosing the Right Algorithm
For security-critical applications (digital signatures, certificates, authentication): use SHA-256 or SHA-512. These have no known practical attacks and are widely supported.
For integrity checks (file verification, cache keys, deduplication): SHA-256 is ideal, but MD5 is acceptable when security is not a concern and speed matters.
For password storage: use bcrypt, scrypt, or Argon2 — not a general-purpose hash function.
For legacy compatibility: if you must interact with systems using MD5 or SHA-1, use them for that purpose but plan a migration path to stronger algorithms.
The bottom line: when in doubt, use SHA-256. It is fast, secure, and universally supported.