Git Internals Explained Simply – How Git Actually Stores Data


Understanding Git from the inside out — clearly, practically and with real-world analogies.

Why You Should Care About Git Internals

Most software engineers use Git daily (git add, git commit, git push) without ever thinking about what Git actually does under the hood. That’s fine, until it isn’t.

Understanding Git internals helps you

Debug strange Git issues with confidence
Recover lost commits and corrupted repositories
Design better CI/CD and branching strategies
Use Git more efficiently at scale (large repos, mono-repos)
Explain Git clearly to your team (a leadership skill)

At its core, Git is more than a version control tool: underneath, it is a content-addressable database. Once we understand this single idea, everything else clicks.

Git in One Sentence

Git stores snapshots of your project as immutable objects, addressed by the cryptographic hash of their content.

The `.git` Directory: Git’s Brain

When we run

git init

Git creates a hidden directory

.git/

This folder contains everything Git knows about our repository. Delete it, and our project becomes a normal folder again.

Key internal directories

			
.git/
├── objects/     # All Git data lives here
├── refs/        # Branches and tags
├── HEAD         # Pointer to current branch
├── index        # Staging area
└── config       # Repo-specific config

We’ll focus mainly on objects, because that’s where Git truly stores data.

Git Is a Content-Addressable Object Store

Git does not store files. It stores objects.

Each object is

Immutable (never changes)
Identified by a hash of its content (SHA-1 by default, 40 hex characters)
Stored based on its content, not its name

If two files have identical content, Git stores only one object.

This design makes Git

Extremely space-efficient
Naturally deduplicated
Cryptographically verifiable
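To make "stored based on its content, not its name" concrete, here is a minimal sketch of a content-addressable store in Python. It is a toy model, not Git's implementation: the ContentStore class and its put/get methods are hypothetical names invented for this illustration, and the key is a plain SHA-1 of the bytes (real Git also hashes a small header, covered in the blob section).

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is derived from the value."""

    def __init__(self):
        self.objects = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        # The address is the SHA-1 of the content itself, so identical
        # content always maps to the same key, whatever the file was called.
        key = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


store = ContentStore()
a = store.put(b"same bytes")       # imagine this came from a.txt
b = store.put(b"same bytes")       # and this from copy-of-a.txt
c = store.put(b"different bytes")

print(a == b)               # True: identical content, identical address
print(len(store.objects))   # 2: only two distinct objects were stored
```

Deduplication falls out of the addressing scheme for free: storing the same bytes twice is simply a no-op.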

The Four Core Git Objects

Git has only four object types:

Blob – file content
Tree – directory structure
Commit – snapshot + metadata
Tag – annotated tag with its own metadata, pointing to another object (optional)

Everything in Git is built from these.

Blob Objects – Storing File Content

A blob stores the contents of a file and nothing else. Blobs do not store the file name, the file path, or any file history.

Example

File

hello.txt

Content

Hello Git

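Git stores (conceptually)

blob "Hello Git"

More precisely, the stored bytes are a short header (the word blob, the content length in bytes, and a NUL byte) followed by the raw file content. We can compute the ID Git assigns to it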

git hash-object hello.txt

This outputs a SHA-1 hash, for example

557db03de997c86a4a028e1ebd3a1ceb225be238

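Once the file is staged with git add (git hash-object only prints the hash unless we pass -w), Git stores this blob at

.git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238

Compressed with zlib. Immutable. Kept for as long as anything references it.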
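The hash that git hash-object prints can be reproduced by hand. The sketch below assumes the default SHA-1 object format; the function names (git_blob_id, loose_object_path, loose_object_bytes) are made up for this example and are not part of any Git API.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    # Git hashes a header "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    # The first two hex characters become the directory, the rest the file name.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    # On disk, header + content are zlib-compressed. The exact compressed
    # bytes can differ from Git's, but they decompress to the same data.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)


object_id = git_blob_id(b"hello world\n")
print(object_id)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(object_id))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Note that a trailing newline is part of the content: "hello world" with and without a final newline are two different blobs.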

Tree Objects: Representing Directories

A tree represents a directory.

It maps

Filenames
To blobs or other trees
With a file mode (for example regular file, executable, or subdirectory)

Think of a tree as Git’s version of a filesystem index.

Example Tree

			
project/
├── README.md
└── src/
    └── app.py

Git creates

Blob for README.md
Blob for app.py
Tree for src/
Tree for project/

Trees reference blobs and other trees by hash, not by path.
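As a rough sketch of how the example tree above could be serialized, the code below builds tree objects for project/ and src/ bottom-up. It assumes the SHA-1 object format and the usual modes 100644 (regular file) and 40000 (directory); the file contents and the helper names git_object_id and tree_body are invented for illustration.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    # Every object is hashed as "<type> <size>\0" followed by its body.
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """entries: iterable of (mode, name, hex_object_id) tuples."""

    def sort_key(entry):
        mode, name, _ = entry
        raw = name.encode("utf-8")
        # Git orders entries by name, treating directories as "name/".
        return raw + b"/" if mode == "40000" else raw

    body = b""
    for mode, name, object_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>\0" plus the 20 raw bytes of the hash.
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(object_id)
    return body


# Hypothetical file contents for the example layout.
readme_id = git_object_id(b"blob", b"# Project\n")
app_id = git_object_id(b"blob", b"print('hello')\n")

src_tree = tree_body([("100644", "app.py", app_id)])
src_id = git_object_id(b"tree", src_tree)

root_tree = tree_body([
    ("40000", "src", src_id),
    ("100644", "README.md", readme_id),
])
root_id = git_object_id(b"tree", root_tree)

print("src/    ", src_id)
print("project/", root_id)
```

Because the root tree embeds the hash of src/, and src/ embeds the hash of app.py, changing one byte of app.py changes every hash on the path up to the root.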

Commit Objects – Snapshots, Not Diffs

This is the most misunderstood part of Git.

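Conceptually, Git does not store diffs. Each commit points to a full snapshot of the project. (Packfiles later delta-compress objects on disk, but that is a storage optimization, not the data model.)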

A commit contains

Reference to a root tree
Parent commit(s)
Author & committer
Timestamp
Commit message

Commit Structure (Conceptual)

			
commit
├── tree: <root-tree-hash>
├── parent: <parent-commit-hash>
├── author: Rahul <rahul@email>
├── committer: Rahul <rahul@email>
└── message: "Initial commit"

		

Each commit is a node in a DAG (Directed Acyclic Graph).
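A commit object is plain text, so the conceptual structure above can be sketched in a few lines. This is an illustration under assumptions: the SHA-1 object format, a made-up timestamp (1700000000 +0000), and the well-known ID of the empty tree standing in for a real root tree. The helper names are hypothetical.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    # Every object is hashed as "<type> <size>\0" followed by its body.
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id, parent_ids, author, committer, message) -> bytes:
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]  # none for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries
who = "Rahul <rahul@email> 1700000000 +0000"             # timestamp is made up

first = commit_body(EMPTY_TREE, [], who, who, "Initial commit")
first_id = git_object_id(b"commit", first)

second = commit_body(EMPTY_TREE, [first_id], who, who, "Second commit")
second_id = git_object_id(b"commit", second)

print(second.decode("utf-8"))
print(first_id != second_id)  # True
```

Since each commit's hash covers its parent's hash, rewriting any commit changes the ID of every commit after it. That chain of hashes is what makes the DAG tamper-evident.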

Why Git Is Fast & Still Space Efficient

Although commits store full snapshots, Git is still fast and space-efficient. Why?

Because unchanged files reuse the same blob objects, and unchanged directories reuse the same tree objects. Only new or modified files create new blobs.

This is called structural sharing — a powerful idea used in functional programming and distributed systems.
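A small simulation shows the effect. Here a snapshot is modeled as a flat mapping from path to blob ID, which is a simplification (real Git uses nested tree objects); the file contents and function names are invented for the example.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Same scheme Git uses for blobs: SHA-1 over "blob <size>\0" + content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict) -> dict:
    # Simplified "commit": path -> blob ID for every file in the project.
    return {path: blob_id(content) for path, content in files.items()}


v1 = snapshot({"README.md": b"# Project\n", "src/app.py": b"print('v1')\n"})
v2 = snapshot({"README.md": b"# Project\n", "src/app.py": b"print('v2')\n"})

shared = set(v1.values()) & set(v2.values())
new_in_v2 = set(v2.values()) - set(v1.values())

print(len(shared))     # 1: README.md is stored once and referenced twice
print(len(new_in_v2))  # 1: only the edited file needed a new blob
```

In a real repository the second commit would also create new tree objects for src/ and the root, since their entries changed, but every untouched file and directory is reused as-is.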

The Staging Area (Index) – Git’s Secret Weapon

The index (or staging area) sits between

Working Directory → Index → Repository

When we run

git add file.txt

Git

Creates a blob
Stores it in .git/objects
Adds a reference in .git/index

This allows

Partial commits
Clean commit history
Fine-grained control (enterprise teams love this)
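The index is a binary file, and its first 12 bytes are easy to decode: a 4-byte signature (DIRC), a 4-byte version number, and a 4-byte entry count, all big-endian. The sketch below parses only that header. The function name is hypothetical, and the sample bytes are built by hand rather than read from a real repository.

```python
import struct


def parse_index_header(data: bytes):
    # ">4sII" = big-endian: 4-byte signature, two unsigned 32-bit integers.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Hand-built header for illustration: version 2, three staged entries.
sample = b"DIRC" + struct.pack(">II", 2, 3)
print(parse_index_header(sample))  # (2, 3)
```

Against a real repository, the same function could be fed the bytes of .git/index; the entries that follow the header (paths, modes, blob IDs, file stats) are not decoded here.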

Branches Are Just Pointers

A Git branch is not a copy of code.

It is simply

refs/heads/main → <commit-hash>

That’s it.

Creating a branch

git branch feature-x

Means

New pointer
Same commit
Zero data copied

This is why branching in Git is cheap and fast.
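To see how little work that is, the sketch below mimics branch creation as a single small file write inside a temporary directory. It is a simplification under assumptions: real Git also takes a lock, validates the name, and may keep refs in a packed-refs file; the create_branch helper and the placeholder commit hash are invented for the example.

```python
import os
import tempfile


def create_branch(git_dir: str, name: str, commit_id: str) -> str:
    # A loose branch ref is a file under refs/heads/ holding one commit hash.
    ref_path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "wb") as ref_file:
        ref_file.write(commit_id.encode("ascii") + b"\n")
    return ref_path


with tempfile.TemporaryDirectory() as git_dir:
    commit_id = "a" * 40  # placeholder, not a real commit
    path = create_branch(git_dir, "feature-x", commit_id)
    size = os.path.getsize(path)
    with open(path, "rb") as ref_file:
        stored = ref_file.read()

print(size)  # 41: forty hex characters plus a newline
```

The cost of a branch is therefore independent of repository size: no files are copied, only a 41-byte pointer is written.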

HEAD – Where You Are Right Now

HEAD is a special pointer.

Usually

HEAD → refs/heads/main

Detached HEAD

HEAD → <commit-hash>

Understanding HEAD helps you

Recover lost commits
Understand rebases
Avoid accidental history loss
Recover from mistakes safely
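The two states of HEAD correspond to two possible contents of the .git/HEAD file: a symbolic reference of the form "ref: refs/heads/main", or a bare commit hash. A minimal sketch that tells them apart (parse_head is a hypothetical helper, and the hash is a placeholder):

```python
def parse_head(text: str):
    value = text.strip()
    if value.startswith("ref: "):
        # Normal case: HEAD names a branch, and the branch names a commit.
        return ("branch", value[len("ref: "):])
    # Detached HEAD: the file holds a commit hash directly.
    return ("detached", value)


print(parse_head("ref: refs/heads/main\n"))  # ('branch', 'refs/heads/main')
print(parse_head("a" * 40 + "\n")[0])        # detached
```

This also explains why commits made on a detached HEAD are easy to lose: no branch file is updated, so once HEAD moves elsewhere only the reflog still remembers them.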

Tags – Human-Friendly Names

Tags are names that point to a specific object, almost always a commit.

Two types

Lightweight tag (simple pointer)
Annotated tag (full object)

Annotated tags are preferred for

Releases
Production deployments
Audit trails
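The difference between the two types can be sketched as follows. A lightweight tag is just a ref holding a commit hash, while an annotated tag is a separate object with its own hash that records the target, the tag name, the tagger and a message. The commit hash, tag name and timestamp below are placeholders, and the helper names are invented for illustration.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    # Every object is hashed as "<type> <size>\0" followed by its body.
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tag_body(target_id, target_type, name, tagger, message) -> bytes:
    text = (
        f"object {target_id}\n"
        f"type {target_type}\n"
        f"tag {name}\n"
        f"tagger {tagger}\n"
        "\n"
        f"{message}\n"
    )
    return text.encode("utf-8")


commit_id = "a" * 40  # placeholder, not a real commit
tagger = "Rahul <rahul@email> 1700000000 +0000"  # timestamp is made up

# Lightweight tag: the ref would contain the commit hash itself.
lightweight_ref = commit_id

# Annotated tag: the ref would contain the hash of a tag object instead.
body = tag_body(commit_id, "commit", "v1.0.0", tagger, "First release")
annotated_ref = git_object_id(b"tag", body)

print(body.decode("utf-8"))
print(annotated_ref != lightweight_ref)  # True
```

Because the annotated tag carries who tagged what and when, and can additionally be signed, it is the better fit for releases and audit trails.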

Git Internals and Distributed Systems

From an architect’s lens, Git is a distributed system:

Immutable data structures
Content-addressable storage
Merkle DAG
Eventual consistency across clones

Every clone is

A full replica
Capable of independent operation
Cryptographically verifiable

This is why Git scales so well across

Large enterprises
Open-source ecosystems
Global teams

Real-World Example – Recovering a Lost Commit

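Because commits are immutable, a commit that no branch points to any more is not destroyed right away; it stays in the object store until Git garbage-collects it.

git reflog

shows where HEAD pointed in the past. With a hash from that list we can recover almost anything (for example with git branch <name> <hash>), which is a lifesaver in production incidents.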
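The data behind git reflog lives in plain-text log files such as .git/logs/HEAD, one line per movement of HEAD: the old hash, the new hash, who moved it and when, then a tab and a message. The parser below is a sketch; the sample lines are fabricated for illustration (placeholder hashes, made-up timestamps) and parse_reflog_line is a hypothetical helper.

```python
def parse_reflog_line(line: str) -> dict:
    # Format: "<old-id> <new-id> <identity and timestamp>\t<message>"
    meta, _, message = line.rstrip("\n").partition("\t")
    old_id, new_id, _identity = meta.split(" ", 2)
    return {"old": old_id, "new": new_id, "message": message}


zero = "0" * 40
c1, c2 = "a" * 40, "b" * 40  # placeholder commit hashes
who = "Rahul <rahul@email> 1700000000 +0000"

log = [
    f"{zero} {c1} {who}\tcommit (initial): Initial commit\n",
    f"{c1} {c2} {who}\tcommit: Add feature\n",
    f"{c2} {c1} {who}\treset: moving to HEAD~1\n",
]

entries = [parse_reflog_line(line) for line in log]

# After the reset, no branch points at c2, but the log still remembers it.
current = entries[-1]["new"]
recoverable = {entry["new"] for entry in entries} - {current}
print(c2 in recoverable)  # True
```

In this made-up history, the commit dropped by the reset is still listed, so its hash can be handed to Git to create a branch at that commit again.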

Remember:

Blob → File content
Tree → Folder structure
Commit → Snapshot + metadata
Branch → Pointer
HEAD → Current pointer

Once this clicks, Git stops being magical — and becomes predictable.

Architect’s Perspective

Git’s design is

Simple
Immutable
Distributed by default

These same principles appear in

Blockchains
Event sourcing
Modern data platforms

Mastering Git internals doesn’t just make you a better developer — it sharpens system design intuition.

Git Internals: A Complete Guide for Engineers

Part 1: How Git Stores Data
Part 2: How Git Scales with Packfiles & Compression
Part 3: Git Branching, Merging & Rebase Internals
Part 4: Git Internals for CI/CD, Mono-Repos
Part 5: Git Security Internals

Frequently Asked Questions (FAQ)

Is Git a database?

Yes. Internally, Git works like a content-addressable database. Every file, directory, and commit is stored as an object identified by a cryptographic hash.

How does Git store files internally?

Git stores file contents as blob objects, directories as tree objects, and history as commit objects. Each object is stored using a hash of its content.

Does Git store diffs or full copies of files?

Git stores snapshots, not diffs. Each commit represents a complete snapshot of the project, but Git optimizes storage using references and compression behind the scenes.

Why does Git use hashes (SHA)?

Hashes ensure data integrity. If any content changes, its hash changes. This makes Git history tamper-evident and reliable across distributed systems.

Where does Git store all this data?

Git stores everything inside the .git directory, including objects, references, the index (staging area), and metadata.

Is understanding Git internals useful for everyday developers?

Yes. Understanding Git internals helps developers
1. Debug broken repositories
2. Use Git more confidently
3. Understand branching, merging, and rebasing clearly
4. Work effectively in large teams and CI/CD systems

Discover more from Rahul Suryawanshi
