Understanding Git from the inside out — clearly, practically and with real-world analogies.
Why You Should Care About Git Internals
Most software engineers use Git daily (git add, git commit, git push) without ever thinking about what Git actually does under the hood. That’s fine, until it isn’t.
Understanding Git internals helps you
- Debug strange Git issues with confidence
- Recover lost commits and corrupted repositories
- Design better CI/CD and branching strategies
- Use Git more efficiently at scale (large repos, mono-repos)
- Explain Git clearly to your team (a leadership skill)
At its core, Git is more than a version control tool: it is a content-addressable database with version control built on top. Once we understand this single idea, everything else clicks.
Git in One Sentence
Git stores snapshots of your project as immutable objects, addressed by the cryptographic hash of their content.
The .git Directory: Git’s Brain
When we run
git init
Git creates a hidden directory
.git/
This folder contains everything Git knows about our repository. Delete it, and our project becomes a normal folder again.
Key internal directories
.git/
├── objects/   # All Git data lives here
├── refs/      # Branches and tags
├── HEAD       # Pointer to current branch
├── index      # Staging area
└── config     # Repo-specific config
We’ll focus mainly on objects, because that’s where Git truly stores data.
Git Is a Content-Addressable Object Store
Git does not store files. It stores objects.
Each object is
- Immutable (never changes)
- Identified by a SHA-1 hash (40 hex characters) by default; newer Git versions can also create SHA-256 repositories
- Stored based on its content, not its name
If two files have identical content, Git stores only one object.
This design makes Git
- Extremely space-efficient
- Naturally deduplicated
- Cryptographically verifiable
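To make the idea concrete, here is a minimal sketch of a content-addressable store in Python. The ContentStore class and its method names are hypothetical, and it hashes the raw bytes directly, whereas Git adds a small header before hashing. It only illustrates why identical content is stored once and why corruption is detectable.

```python
import hashlib


class ContentStore:
    """A toy content-addressable store: the key is the hash of the value."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Writing the same content twice lands on the same key.
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Re-hashing on read detects corruption.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("object is corrupt: " + key)
        return data

    def __len__(self):
        return len(self._objects)


store = ContentStore()
a = store.put(b"same bytes")
b = store.put(b"same bytes")   # a second "file" with identical content
c = store.put(b"other bytes")
print(a == b, len(store))      # prints: True 2
```

Two writes of the same bytes produce one stored object, which is the deduplication described above.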
The Four Core Git Objects
Git has only four object types:
- Blob – file content
- Tree – directory structure
- Commit – snapshot + metadata
- Tag – named reference (optional)
Everything in Git is built from these.
Blob Objects – Storing File Content
A blob stores the contents of a file and nothing else. Blobs do not store the file name, the file path, or any file history.
Example
File
hello.txt
Content
Hello Git
Git stores
blob "Hello Git"
We can compute its object ID with
git hash-object hello.txt
This prints a 40-character SHA-1 hash (the exact value depends on the file’s bytes, including any trailing newline), for example
557db03de997c86a4a028e1ebd3a1ceb225be238
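Once the blob is written to the object database (git add does this; git hash-object alone only prints the hash unless given -w), it lives at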
.git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238
Compressed with zlib. Immutable. Kept until garbage collection prunes it as unreachable.
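The blob ID can be reproduced without Git. The sketch below assumes a default SHA-1 repository and a loose (unpacked) object; the helper names blob_object, blob_id and loose_path are made up for illustration. Git hashes the content prefixed with a "blob <size>" header and a NUL byte, then stores the zlib-compressed result under a path derived from the hash.

```python
import hashlib
import zlib


def blob_object(content: bytes) -> bytes:
    # Git prefixes the content with "blob <size>\0" before hashing.
    return b"blob " + str(len(content)).encode() + b"\0" + content


def blob_id(content: bytes) -> str:
    return hashlib.sha1(blob_object(content)).hexdigest()


def loose_path(object_id: str) -> str:
    # First two hex characters become the directory, the rest the file name.
    return ".git/objects/" + object_id[:2] + "/" + object_id[2:]


content = b"Hello Git\n"   # assumes hello.txt ends with a newline
oid = blob_id(content)
compressed = zlib.compress(blob_object(content))

print(oid)
print(loose_path(oid))
# Decompressing the stored bytes gives back header + content.
assert zlib.decompress(compressed) == blob_object(content)
```

For the same bytes, blob_id should agree with what git hash-object prints. Note that the file name never enters the hash.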
Tree Objects: Representing Directories
A tree represents a directory.
It maps
- Filenames
- To blobs or other trees
- With permissions
Think of a tree as Git’s version of a filesystem index.
Example Tree
project/
├── README.md
└── src/
    └── app.py
Git creates
- Blob for README.md
- Blob for app.py
- Tree for src/
- Tree for project/
Trees reference blobs and other trees by hash, not by path.
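As a rough sketch of how the example project could be encoded, the Python below builds tree objects by hand. It assumes SHA-1 object IDs and the common modes 100644 (regular file) and 40000 (directory); the file contents and the helper names object_id and tree_body are invented for illustration. Each entry is the mode, the name, a NUL byte, and the 20 raw bytes of the referenced object's hash.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """entries: list of (mode, name, hex_id)."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, treating directory names as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return body


readme = object_id(b"blob", b"# project\n")        # hypothetical README.md content
app = object_id(b"blob", b"print('hi')\n")         # hypothetical app.py content
src_tree = object_id(b"tree", tree_body([("100644", "app.py", app)]))
root_tree = object_id(b"tree", tree_body([
    ("100644", "README.md", readme),
    ("40000", "src", src_tree),
]))
print(root_tree)
```

The root tree's hash depends on the src tree's hash, which depends on the blob's hash, so a change anywhere below is visible at the root.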
Commit Objects – Snapshots, Not Diffs
This is the most misunderstood part of Git.
At the object level, Git does not store diffs. It stores full snapshots.
A commit contains
- Reference to a root tree
- Parent commit(s)
- Author & committer
- Timestamp
- Commit message
Commit Structure (Conceptual)
commit
├── tree: <root-tree-hash>
├── parent: <parent-commit-hash>
├── author: Rahul <rahul@email>
├── committer: Rahul <rahul@email>
└── message: "Initial commit"
Each commit is a node in a DAG (Directed Acyclic Graph).
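The conceptual structure above maps fairly directly onto a plain-text object. The following Python sketch assembles two simplified commits and links them through the parent field. It assumes SHA-1 IDs, uses the same identity for author and committer, and ignores optional headers such as signatures; commit_body and commit_id are hypothetical helper names, and the timestamps are arbitrary.

```python
import hashlib


def commit_body(tree, parents, author, timestamp, tz, message):
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]   # zero, one, or several parents
    ident = f"{author} {timestamp} {tz}"
    lines += ["author " + ident, "committer " + ident, "", message]
    return ("\n".join(lines) + "\n").encode()


def commit_id(body: bytes) -> str:
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of a tree with no entries

first = commit_body(EMPTY_TREE, [], "Rahul <rahul@email>", 1700000000, "+0000", "Initial commit")
first_id = commit_id(first)
second = commit_body(EMPTY_TREE, [first_id], "Rahul <rahul@email>", 1700000060, "+0000", "Second commit")

print(first.decode())
print(commit_id(second))
```

Because the parent's ID is part of the child's content, each commit's hash covers its entire ancestry, which is what makes the history a DAG of hashes.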
Why Git Is Fast & Still Space Efficient
Although commits store full snapshots, Git is still fast and space-efficient. Why?
Because unchanged files reuse the same blob objects. Only new or modified files create new blobs.
This is called structural sharing — a powerful idea used in functional programming and distributed systems.
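A small Python sketch of structural sharing follows. It models a snapshot as a mapping from path to blob ID, which is a simplification of Git's tree objects; the names objects, store_blob and snapshot are hypothetical.

```python
import hashlib

objects = {}   # stands in for the object database


def store_blob(content: bytes) -> str:
    key = hashlib.sha1(b"blob " + str(len(content)).encode() + b"\0" + content).hexdigest()
    objects[key] = content
    return key


def snapshot(files: dict) -> dict:
    """Return {path: blob id} for a full snapshot of the given files."""
    return {path: store_blob(data) for path, data in files.items()}


v1 = snapshot({"README.md": b"docs\n", "src/app.py": b"print(1)\n"})
v2 = snapshot({"README.md": b"docs\n", "src/app.py": b"print(2)\n"})

shared = [p for p in v1 if v1[p] == v2[p]]
print(shared, len(objects))   # prints: ['README.md'] 3
```

Two full snapshots of two files each need only three blobs, because the unchanged README.md is referenced twice rather than stored twice.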
The Staging Area (Index) – Git’s Secret Weapon
The index (or staging area) sits between
Working Directory → Index → Repository
When we run
git add file.txt
Git
- Creates a blob
- Stores it in .git/objects
- Adds a reference in .git/index
This allows
- Partial commits
- Clean commit history
- Fine-grained control (enterprise teams love this)
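The three steps of git add, and the partial commits they enable, can be modeled in a few lines of Python. This is a toy model: the real index is a binary file that also records file metadata, and the names add, commit_snapshot, objects and index are illustrative only.

```python
import hashlib

objects = {}   # stands in for .git/objects
index = {}     # stands in for .git/index: path -> blob id


def add(path: str, content: bytes) -> None:
    blob = b"blob " + str(len(content)).encode() + b"\0" + content
    blob_id = hashlib.sha1(blob).hexdigest()
    objects[blob_id] = content   # steps 1 and 2: create the blob and store it
    index[path] = blob_id        # step 3: record it in the index


def commit_snapshot() -> dict:
    # A commit captures whatever the index holds, not the working directory.
    return dict(index)


working_dir = {"a.txt": b"ready\n", "b.txt": b"work in progress\n"}
add("a.txt", working_dir["a.txt"])   # stage only one of the two files
snap = commit_snapshot()
print(sorted(snap))                  # prints: ['a.txt']
```

Only the staged file ends up in the snapshot, even though both files exist in the working directory.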
Branches Are Just Pointers
A Git branch is not a copy of code.
It is simply
refs/heads/main → <commit-hash>
That’s it.
Creating a branch
git branch feature-x
Means
- New pointer
- Same commit
- Zero data copied
This is why branching in Git is cheap and fast.
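To see how little a branch is, the sketch below imitates loose refs as small text files in a temporary directory. It is not a real repository: the commit ID is a placeholder, create_branch is a hypothetical helper, and real Git may also keep refs in a packed-refs file.

```python
import tempfile
from pathlib import Path

repo = Path(tempfile.mkdtemp())          # scratch directory, not a real repository
heads = repo / "refs" / "heads"
heads.mkdir(parents=True)

commit = "0123456789abcdef0123456789abcdef01234567"   # placeholder commit id
(heads / "main").write_text(commit + "\n")


def create_branch(name: str, start: str = "main") -> None:
    # Creating a branch writes one small file; no project data is copied.
    target = (heads / start).read_text()
    (heads / name).write_text(target)


create_branch("feature-x")
print((heads / "feature-x").read_text().strip())
print(sorted(p.name for p in heads.iterdir()))
```

Both branch files hold the same 40-character ID, so creating a branch costs roughly one tiny file regardless of repository size.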
HEAD – Where You Are Right Now
HEAD is a special pointer.
Usually
HEAD → refs/heads/main
Detached HEAD
HEAD → <commit-hash>
Understanding HEAD helps you
- Recover lost commits
- Understand rebases
- Avoid accidental history loss
- Recover from mistakes safely
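The two states of HEAD can be told apart by looking at its content: an attached HEAD holds a line starting with "ref: ", while a detached HEAD holds a commit ID. The Python sketch below shows that distinction; resolve_head is a hypothetical helper, the refs dictionary stands in for the files under .git/refs, and the commit ID is a placeholder.

```python
def resolve_head(head_text: str, refs: dict):
    """Return (commit id, branch ref or None) for the contents of a HEAD file."""
    head_text = head_text.strip()
    if head_text.startswith("ref: "):
        ref = head_text[len("ref: "):]
        return refs.get(ref), ref   # attached: HEAD names a branch
    return head_text, None          # detached: HEAD holds a commit id directly


commit = "0123456789abcdef0123456789abcdef01234567"   # placeholder commit id
refs = {"refs/heads/main": commit}

print(resolve_head("ref: refs/heads/main\n", refs))
print(resolve_head(commit + "\n", refs))
```

In the detached case no branch moves when new commits are made, which is why those commits can appear lost once HEAD is pointed elsewhere.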
Tags – Human-Friendly Names
Tags usually point to commits.
Two types
- Lightweight tag (simple pointer)
- Annotated tag (full object)
Annotated tags are preferred for
- Releases
- Production deployments
- Audit trails
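The difference between the two tag types can be sketched in Python as follows. A lightweight tag is a ref that holds the commit ID, while an annotated tag is a separate object with its own ID, tagger, and message. The sketch assumes SHA-1 IDs and unsigned tags; tag_body, tag_id, the tag names, and the commit ID are placeholders chosen for illustration.

```python
import hashlib


def tag_body(target, name, tagger, timestamp, tz, message):
    lines = [
        "object " + target,
        "type commit",
        "tag " + name,
        f"tagger {tagger} {timestamp} {tz}",
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode()


def tag_id(body: bytes) -> str:
    return hashlib.sha1(b"tag " + str(len(body)).encode() + b"\0" + body).hexdigest()


commit = "0123456789abcdef0123456789abcdef01234567"   # placeholder commit id
refs = {}
refs["refs/tags/v1.0-light"] = commit    # lightweight: the ref points at the commit
body = tag_body(commit, "v1.0", "Rahul <rahul@email>", 1700000000, "+0000", "Release 1.0")
refs["refs/tags/v1.0"] = tag_id(body)    # annotated: the ref points at a tag object

print(body.decode())
```

The extra object is what carries the who, when, and why of a release, which is the reason annotated tags suit audit trails.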
Git Internals and Distributed Systems
From an architect’s lens, Git is a distributed system:
- Immutable data structures
- Content-addressable storage
- Merkle DAG
- Eventual consistency across clones
Every clone is
- A full replica
- Capable of independent operation
- Cryptographically verifiable
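The Merkle property behind that verifiability can be demonstrated with a deliberately simplified chain of one blob, one tree, and one commit. In this Python sketch the commit body omits author and committer lines, the file name app.py is arbitrary, and h and build are hypothetical helpers.

```python
import hashlib


def h(kind: str, body: bytes) -> str:
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build(file_content: bytes):
    blob = h("blob", file_content)
    tree = h("tree", b"100644 app.py\0" + bytes.fromhex(blob))
    # Simplified commit body: real commits also carry author and committer lines.
    commit = h("commit", ("tree " + tree + "\n\nmessage\n").encode())
    return blob, tree, commit


original = build(b"print(1)\n")
tampered = build(b"print(2)\n")
print([a != b for a, b in zip(original, tampered)])   # prints: [True, True, True]
```

Changing one byte of a file changes the blob, tree, and commit IDs, so any clone can detect a difference by comparing a single commit hash.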
This is why Git scales so well across
- Large enterprises
- Open-source ecosystems
- Global teams
Real-World Example – Recovering a Lost Commit
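Because commits are immutable, a commit that no branch points to anymore still exists in the object store for a while. Running
git reflog
shows where HEAD pointed in the past, so we can find the lost commit’s hash and recover almost anything. That is a lifesaver in production incidents.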
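The reflog itself is a plain-text log of pointer moves. The Python sketch below parses a few made-up entries in the layout assumed here for .git/logs/HEAD (old ID, new ID, identity, timestamp, timezone, then a tab and a message) and picks out the commit that a reset left behind. The IDs, timestamps, and the parse_reflog_line helper are all illustrative.

```python
ZERO = "0" * 40   # used as the "old" id of the very first entry
A = "a" * 40      # placeholder commit ids
B = "b" * 40


def parse_reflog_line(line: str) -> dict:
    # Assumed layout: "<old id> <new id> <identity> <timestamp> <tz>\t<message>"
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = meta.split(" ", 2)
    return {"old": old, "new": new, "who": rest, "message": message}


log = [
    f"{ZERO} {A} Rahul <rahul@email> 1700000000 +0000\tcommit (initial): first",
    f"{A} {B} Rahul <rahul@email> 1700000060 +0000\tcommit: second",
    f"{B} {A} Rahul <rahul@email> 1700000120 +0000\treset: moving to HEAD~1",
]
entries = [parse_reflog_line(line) for line in log]

# After the reset no branch points at B, but the log still remembers it.
lost = [e["old"] for e in entries if e["message"].startswith("reset")]
print(lost)
```

With the lost ID in hand, a new branch pointing at it is enough to bring the commit back into reach.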
Remember:
- Blob → File content
- Tree → Folder structure
- Commit → Snapshot + metadata
- Branch → Pointer
- HEAD → Current pointer
Once this clicks, Git stops being magical — and becomes predictable.
Architect’s Perspective
Git’s design is
- Simple
- Immutable
- Distributed by default
These same principles appear in
- Blockchains
- Event sourcing
- Modern data platforms
Mastering Git internals doesn’t just make you a better developer — it sharpens system design intuition.
Git Internals: A Complete Guide for Engineers
- Part 1: How Git Stores Data
- Part 2: How Git Scales with Packfiles & Compression
- Part 3: Git Branching, Merging & Rebase Internals
- Part 4: Git Internals for CI/CD, Mono-Repos
- Part 5: Git Security Internals
Frequently Asked Questions (FAQ)
Is Git a database?
Yes. Internally, Git works like a content-addressable database. Every file, directory, and commit is stored as an object identified by a cryptographic hash.
How does Git store files and history?
Git stores file contents as blob objects, directories as tree objects, and history as commit objects. Each object is stored using a hash of its content.
Does Git store snapshots or diffs?
Git stores snapshots, not diffs. Each commit represents a complete snapshot of the project, but Git optimizes storage using references and compression behind the scenes.
Why does Git use hashes?
Hashes ensure data integrity. If any content changes, its hash changes. This makes Git history tamper-evident and reliable across distributed systems.
Where does Git keep its data?
Git stores everything inside the .git directory, including objects, references, the index (staging area), and metadata.
Is it worth learning Git internals?
Yes. Understanding Git internals helps developers
1. Debug broken repositories
2. Use Git more confidently
3. Understand branching, merging, and rebasing clearly
4. Work effectively in large teams and CI/CD systems