🌿

Git Advanced

Master Git internals and power tools: the object model, plumbing commands, history rewriting and recovery.

38 lessons 114 quiz questions

📚 Lessons & quizzes

Each lesson ends with its own short quiz. Answer them as you go — score 90% across all lessons to earn your certificate.

1 The Git Object Database

At its core, Git is a content-addressable filesystem. Everything you store — file contents, directory layouts, commits and tags — lives in the object database under .git/objects as one of four object types: blob, tree, commit and tag.

Each object is identified by a hash computed from its content. Store the same bytes twice and you get the same object — Git deduplicates automatically. This is why most Git operations are fast and local: they are simple lookups by hash.

# Count the objects currently in the database
git count-objects -v

2 Content Addressing with SHA-1 and SHA-256

An object’s name is the hash of a short header plus its content. Historically Git used SHA-1, producing 40-character hex object IDs. Newer repositories can use SHA-256, producing 64-character IDs; the format is chosen when the repository is created, with git init --object-format=sha256.

Because the name is derived from content, an object’s ID changes the instant a single byte changes. This gives Git its integrity guarantee: if stored bytes were altered, the recomputed hash would no longer match.

# Initialise a repository using the SHA-256 object format
git init --object-format=sha256 myrepo
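
The "header plus content" rule can be checked by hand. The sketch below assumes the default SHA-1 object format and that a sha1sum utility is installed (GNU coreutils provides it; on macOS, shasum is the usual equivalent). For a blob, the header is the word blob, a space, the content length in bytes, and a NUL byte.

```shell
# Ask Git for the ID it would assign to the 6-byte content "hello\n".
# Without -w nothing is written, so this also works outside a repository.
git_id=$(printf 'hello\n' | git hash-object --stdin)

# Recompute the ID by hand: SHA-1 over "blob 6", a NUL byte, then the content.
manual_id=$(printf 'blob 6\000hello\n' | sha1sum | cut -d' ' -f1)

echo "git:    $git_id"
echo "manual: $manual_id"
```

The two lines should print the same 40-character ID. Changing even one byte of the content changes both.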

3 Blobs: Storing File Content

A blob stores the raw contents of a file — nothing else. A blob has no filename, no permissions and no timestamp; those belong to the tree that references it. Two files with identical content, anywhere in the repo, share a single blob.

You can create a blob directly from text and inspect it with plumbing commands. This is the lowest level at which Git stores your data.

# Hash content into a blob and write it to the database
echo 'hello' | git hash-object -w --stdin
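
To see that identical content shares one blob, here is a minimal sketch in a throwaway repository. The file names a.txt and docs/b.txt are made up for the example.

```shell
# Scratch repository in a temporary directory
repo=$(mktemp -d) && cd "$repo" && git init -q

# Two paths, same bytes
mkdir -p docs
echo 'same bytes' > a.txt
echo 'same bytes' > docs/b.txt
git add a.txt docs/b.txt

# Column 2 of --stage output is the blob ID recorded for each path
id_a=$(git ls-files --stage a.txt | awk '{print $2}')
id_b=$(git ls-files --stage docs/b.txt | awk '{print $2}')
echo "$id_a  a.txt"
echo "$id_b  docs/b.txt"

# The object itself is a blob and knows nothing about either filename
git cat-file -t "$id_a"
```

Both paths report the same ID because the blob holds only the content; the names live in the index and, after a commit, in trees.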

4 Trees: Representing Directories

A tree represents one directory. Each entry records a mode (file type and permissions), a name, and the hash of either a blob (a file) or another tree (a subdirectory). Trees nest to form the full directory hierarchy of a snapshot.

Use git ls-tree to read a tree and see exactly which blobs and subtrees it points to.

# List the entries of the tree at the current HEAD
git ls-tree HEAD

5 Commits: A Snapshot Plus Parents

A commit object ties together a single root tree (the full snapshot), zero or more parent commits, an author and committer with timestamps, and a message. A commit does not store diffs; it points to a complete tree, and diffs are computed on demand by comparing trees.

The first commit has no parent; a normal commit has one parent; a merge commit has two or more.

# Show the raw commit object, revealing its tree and parents
git cat-file -p HEAD
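
The parent rule is easy to confirm by counting header lines in raw commit objects. This is a sketch in a scratch repository; the identity values are placeholders needed only so that git commit can run.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'

echo one > f.txt && git add f.txt && git commit -q -m 'first'
echo two >> f.txt && git commit -q -am 'second'

# Count "parent" and "tree" header lines in each raw commit object
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ')
tip_parents=$(git cat-file -p HEAD | grep -c '^parent ')
tip_trees=$(git cat-file -p HEAD | grep -c '^tree ')

echo "root commit parents: $root_parents"
echo "second commit parents: $tip_parents"
echo "second commit trees: $tip_trees"
```

The root commit has no parent line, the second commit has exactly one, and every commit names exactly one tree.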

6 Annotated Tags as Objects

A lightweight tag is just a ref pointing at a commit. An annotated tag is a real object in the database: it stores the tagged object’s hash, a type, a tagger, a date, a message, and can be GPG-signed. Annotated tags are recommended for releases because they carry metadata and can be verified.

# Create an annotated tag and inspect the tag object
git tag -a v1.0 -m 'Release 1.0'
git cat-file -p v1.0
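
The difference between the two kinds of tag shows up when you ask for the object type behind each name. A small sketch in a scratch repository follows; the tag name light and the identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo one > f.txt && git add f.txt && git commit -q -m 'first'

git tag light                       # lightweight: just a ref
git tag -a v1.0 -m 'Release 1.0'    # annotated: a tag object

light_type=$(git cat-file -t light)   # the ref resolves straight to a commit
annot_type=$(git cat-file -t v1.0)    # the ref resolves to a tag object
peeled=$(git rev-parse 'v1.0^{commit}')  # peel the tag object to its commit

echo "light -> $light_type"
echo "v1.0  -> $annot_type (peels to $peeled)"
```

The ^{commit} suffix follows the tag object to the commit it annotates, which is the same commit the lightweight tag names directly.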

7 hash-object and cat-file

Two plumbing commands sit at the boundary of the object database. git hash-object computes an object’s ID and, with -w, writes it. git cat-file reads objects back: -t prints the type, -s the size, and -p pretty-prints the content.

Together they let you create and inspect any object by hand, which is the clearest way to understand the model.

# Inspect type, size and content of an object
git cat-file -t HEAD
git cat-file -s HEAD
git cat-file -p HEAD

8 Porcelain vs Plumbing

Git’s commands split into two layers. Porcelain commands (add, commit, log, checkout) are the friendly user interface. Plumbing commands (hash-object, cat-file, write-tree, commit-tree, update-ref, rev-parse) are low-level tools intended for scripts and for building higher-level behaviour.

Porcelain output can change between versions; plumbing output is stable and machine-parseable, which is why scripts should prefer it.

# rev-parse is plumbing: resolve a name to a full object ID
git rev-parse HEAD

9 Building a Commit by Hand

You can assemble a commit using only plumbing. git write-tree turns the current index into a tree object and prints its hash. git commit-tree takes a tree (and optional -p parents and -m message) and produces a commit object. Finally git update-ref moves a branch to that commit.

This is essentially what git commit does for you under the hood.

# Create a commit object from the current index by hand
tree=$(git write-tree)
# -p HEAD records the current commit as the parent; omit it only for a root commit
commit=$(git commit-tree "$tree" -p HEAD -m 'msg')
git update-ref refs/heads/main "$commit"
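
For a run that starts from nothing, the sketch below builds two commits in an empty scratch repository using only plumbing. It assumes Git 2.28 or later for git init -b; the file name hello.txt and the identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'

# 1. Write a blob and register it in the index under a path
blob=$(echo 'hello' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",hello.txt

# 2. Snapshot the index as a tree, wrap it in a root commit (no -p)
tree=$(git write-tree)
first=$(git commit-tree "$tree" -m 'first commit by hand')
git update-ref refs/heads/main "$first"

# 3. Change the content and commit again, this time naming the parent
blob2=$(echo 'hello again' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob2",hello.txt
tree2=$(git write-tree)
second=$(git commit-tree "$tree2" -p "$first" -m 'second commit by hand')
git update-ref refs/heads/main "$second"

git log --oneline main
```

Note that none of these commands touch the working tree: the history exists in the object database and the index even though no hello.txt file was ever written to disk.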

10 Refs and the refs Directory

A ref is a human-friendly name pointing at an object, usually a commit. Branches live under refs/heads/, remote-tracking branches under refs/remotes/, and tags under refs/tags/. A loose ref is simply a small file containing the target hash; refs can also be stored packed in .git/packed-refs.

Creating a branch is therefore extremely cheap — it writes one tiny file.

# List every ref and the object it points to
git show-ref
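
You can look at a loose ref directly. This sketch assumes the default files-based ref storage and a branch that has not yet been packed; the branch name topic and the identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo one > f.txt && git add f.txt && git commit -q -m 'first'

# Creating a branch writes one small file under refs/heads/
git branch topic
ref_file_content=$(cat .git/refs/heads/topic)
head_commit=$(git rev-parse HEAD)

echo "file content: $ref_file_content"
echo "HEAD commit:  $head_commit"
```

In scripts, prefer git rev-parse or git show-ref over reading these files, since a ref may live in .git/packed-refs instead.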

11 HEAD and Symbolic Refs

HEAD is a special ref that normally points indirectly at a branch, e.g. ref: refs/heads/main. This is a symbolic ref. When you commit, Git updates the branch HEAD names, and HEAD follows along.

A detached HEAD is when HEAD points directly at a commit rather than a branch. New commits then belong to no branch until you create one.

# Read what HEAD currently points to
git symbolic-ref HEAD
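
The two states of HEAD can be told apart in a script because git symbolic-ref fails when HEAD is not symbolic. This is a sketch in a scratch repository, assuming Git 2.28 or later for git init -b.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo one > f.txt && git add f.txt && git commit -q -m 'first'

# Attached: HEAD is a symbolic ref naming the branch
attached=$(git symbolic-ref HEAD)
echo "attached to: $attached"

# Detach: HEAD now holds a commit ID directly
git checkout -q --detach
if git symbolic-ref -q HEAD >/dev/null; then
    state=attached
else
    state=detached
fi
raw_head=$(cat .git/HEAD)
echo "state: $state, .git/HEAD contains: $raw_head"
```

git checkout main (or git switch main) reattaches HEAD to the branch.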

12 The Reflog: A Safety Net

The reflog records where each ref (including HEAD) has pointed over time, locally on your machine. Every commit, checkout, reset, rebase and merge appends an entry. This is your primary recovery tool: even after a hard reset or a botched rebase, the old commit is still reachable through the reflog.

Reflog entries expire eventually (by default after 90 days, or after 30 days for entries no longer reachable from the ref’s current tip), and they are never pushed to remotes.

# View the recent movements of HEAD
git reflog

13 Recovering with reflog and reset

Suppose git reset --hard HEAD~3 threw away three commits. Because the reflog still holds the old tip, you can find it with git reflog and restore it. Use git reset --hard <hash> to move the branch back, or git branch rescue <hash> to point a fresh branch at the lost work.

The lesson: with Git, a hard reset is rarely truly destructive — the commits linger until garbage collection.

# Find the lost commit and restore the branch to it
git reflog
git reset --hard HEAD@{1}
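
The whole accident and recovery can be rehearsed safely in a scratch repository. In this sketch the branch name rescue follows the text above; everything else (file name, identity) is a placeholder.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'

for n in 1 2 3 4; do
    echo "change $n" >> notes.txt
    git add notes.txt && git commit -q -m "commit $n"
done
old_tip=$(git rev-parse HEAD)

# The "accident": throw away the last three commits
git reset -q --hard HEAD~3

# HEAD@{1} is where HEAD pointed just before that reset
git reflog -n 2
git branch rescue 'HEAD@{1}'    # pin the lost work first
git reset -q --hard rescue      # then move main back to it

git log --oneline
```

Creating the rescue branch before resetting is a cautious habit: once a branch points at the commits, they are safe from garbage collection whatever happens to the reflog.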

14 The Index (Staging Area) Internals

The index, stored in .git/index, is a binary file listing the paths, modes and blob hashes that will form the next commit. It is the staging area: git add writes blob objects and records them in the index, and git commit turns the index into a tree.

You can read the index directly with git ls-files --stage, which shows each path’s mode, blob hash and stage number.

# Show staged entries with their modes and blob hashes
git ls-files --stage

15 Loose Objects vs Packfiles

Newly written objects are stored loose: one zlib-compressed file each under .git/objects/ab/cdef.... Over time Git consolidates them into packfiles (.pack with a .idx), which store many objects together and use delta compression — similar objects stored as differences against a base. Packs are far smaller and faster to transfer.

# Verify the contents and structure of pack indexes
git verify-pack -v .git/objects/pack/*.idx

16 Garbage Collection with git gc

git gc tidies the database: it packs loose objects, removes unreachable objects whose grace period has passed, prunes expired reflog entries, and packs refs. Objects still reachable from any ref or recent reflog entry are kept. git gc --aggressive performs a more thorough repack, but it is slow and rarely worth running routinely.

Several everyday commands (commit, merge, rebase and fetch among them) run gc --auto afterwards, which only does real work once enough loose objects or packs have accumulated.

# Run garbage collection to pack and prune the database
git gc
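
The effect of packing is visible in the counters that git count-objects -v reports. A sketch in a scratch repository, where every object is reachable and therefore gets packed:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'

for n in 1 2 3; do
    echo "version $n" > f.txt
    git add f.txt && git commit -q -m "commit $n"
done

# "count" is the number of loose objects, "in-pack" the number packed
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after"
echo "packed after gc: $packed_after"
```

Unreachable loose objects would stay loose until their grace period expires, which is why a real repository may still show a non-zero count after gc.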

17 The Commit-Graph for Performance

The commit-graph file (.git/objects/info/commit-graph) caches commit metadata — parents, generation numbers and commit dates — so that history walks (used by log, merge-base and reachability checks) avoid parsing every commit object. On large repositories this dramatically speeds up many operations.

# Build or update the commit-graph
git commit-graph write --reachable

18 Filesystem Monitor (fsmonitor)

On large working trees, git status spends most of its time scanning files for changes. The filesystem monitor (core.fsmonitor) hooks into OS file-change notifications so Git only re-examines files that actually changed, making status and other commands much faster.

Git 2.37 and later include a built-in fsmonitor daemon (initially on Windows and macOS) that you can enable with one config setting.

# Enable the built-in filesystem monitor
git config core.fsmonitor true

19 Rewriting History with git filter-repo

git filter-repo is the recommended tool for large-scale history rewriting: removing a file from all of history, stripping secrets, or splitting a repository. It is installed separately rather than shipped with Git. It walks the whole history and rewrites commits, which changes their hashes. Everyone who has a clone must re-clone or carefully reset, because the rewritten history is incompatible with the old one.

# Remove a file from the entire history
git filter-repo --path secrets.txt --invert-paths

20 Why Not git filter-branch

The older git filter-branch still exists but is discouraged: it is extremely slow, easy to misuse, and has dangerous default behaviours, so much so that Git now prints a warning recommending alternatives. git filter-repo is faster, safer and far more capable, and it is the replacement that the Git documentation itself points to.

# Keep only keep/ in the rewritten history (--force allows running outside a fresh clone)
git filter-repo --path keep/ --force

21 BFG Repo-Cleaner

The BFG Repo-Cleaner is a fast, purpose-built tool for two common jobs: deleting large files from history and redacting secrets such as passwords. It is simpler but less general than filter-repo, and like any history rewrite it changes commit hashes. After cleaning, you typically run git reflog expire and git gc to physically remove the old objects.

# Strip files larger than 50M from history with BFG
java -jar bfg.jar --strip-blobs-bigger-than 50M repo.git

22 Interactive Rebase: Reorder and Edit

git rebase -i opens a todo list of commits you can reorder, drop, reword or mark to edit. Reordering lines reorders commits; changing pick to reword lets you rewrite a message; edit stops at that commit so you can amend it. Because each commit is replayed, all rewritten commits get new hashes.

# Interactively rebase the last 4 commits
git rebase -i HEAD~4

23 Splitting and Joining Commits

During an interactive rebase you can split a commit: mark it edit, run git reset HEAD^ to undo the commit while leaving its changes in the working tree, then create two or more smaller commits and finish with git rebase --continue. To join commits, use squash (combine and edit the message) or fixup (combine and discard the squashed commit’s message). These give you a clean, logical history before sharing.

# Inside an edit stop: reset to split into multiple commits
git reset HEAD^
git add -p
git commit -m 'first part'

24 Recovering Lost Commits with fsck

When even the reflog cannot help — for instance after it was expired — git fsck --lost-found scans the object database and reports dangling commits and blobs that are no longer reachable from any ref. You can inspect a dangling commit and resurrect it by pointing a branch at its hash before garbage collection removes it.

# Find unreachable (dangling) objects in the database
git fsck --lost-found
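
To try this without waiting for a real accident, the sketch below simulates lost work by creating a commit object that no ref or reflog entry points to, then finds and rescues it. The use of commit-tree to manufacture the orphan is purely an assumption of the demo; the branch name rescue is a placeholder.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo one > f.txt && git add f.txt && git commit -q -m 'first'

# Simulated loss: a commit written to the database but referenced by nothing
lost=$(git commit-tree 'HEAD^{tree}' -p HEAD -m 'lost work')

# fsck lists it as dangling and records it under .git/lost-found/commit/
report=$(git fsck --lost-found 2>/dev/null)
echo "$report"

# Inspect it, then make it reachable again
git cat-file -p "$lost"
git branch rescue "$lost"
```

Once the branch exists the commit is reachable again, so a later fsck no longer reports it and gc will not prune it.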

25 Merge Strategies: ort, ours, theirs

Since Git 2.34 the default merge strategy is ort (Ostensibly Recursive’s Twin), a faster reimplementation of the classic recursive three-way merge. The ours strategy (-s ours) records a merge but keeps only your side’s tree, discarding the other side’s changes entirely; there is no matching theirs strategy. Do not confuse it with the -X ours / -X theirs strategy options, which only pick a side for conflicting hunks while still merging non-conflicting changes.

# Merge but resolve all conflicts in favour of our side
git merge -X ours feature
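
The contrast between the option and the strategy is clearest side by side. In this sketch the branch feature both conflicts with main on f.txt and adds a new file; the file names and identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo base > f.txt && git add f.txt && git commit -q -m 'base'

# feature: conflicting edit to f.txt plus a brand-new file
git checkout -q -b feature
echo theirs > f.txt && echo extra > new.txt
git add f.txt new.txt && git commit -q -m 'feature work'

# main: a different edit to the same line
git checkout -q main
echo ours > f.txt && git commit -q -am 'main work'

# -X ours: our side wins the conflict, but new.txt still arrives
git merge -q -X ours -m 'merge feature, our hunks win' feature
x_file=$(cat f.txt)
x_new=$([ -f new.txt ] && echo present || echo absent)

# Undo that merge and repeat with -s ours: the feature tree is ignored
git reset -q --hard HEAD~1
git merge -q -s ours -m 'merge feature, keep our tree' feature
s_new=$([ -f new.txt ] && echo present || echo absent)

echo "-X ours: f.txt=$x_file, new.txt $x_new"
echo "-s ours: new.txt $s_new"
```

Both merges record feature as a parent, so Git considers the branch merged either way; only the resulting tree differs.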

26 Octopus and Subtree Merges

The octopus strategy merges more than two branches in a single commit, useful for batching topic branches that do not conflict. The subtree strategy is a variant of the recursive merge that adjusts the tree so one project can be merged as a subdirectory of another, which underpins the subtree workflow.

# Merge several non-conflicting branches at once (octopus)
git merge topic-a topic-b topic-c

27 Merge Drivers and .gitattributes

For files where a textual three-way merge is wrong — generated files, lockfiles, or binary formats — you can define a custom merge driver. A driver is a command configured in .git/config and assigned to paths via .gitattributes. The built-in union driver, for example, keeps lines from both sides instead of conflicting.

# Assign the union merge driver to changelog files
echo 'CHANGELOG.md merge=union' >> .gitattributes

28 rerere: Reuse Recorded Resolution

rerere (reuse recorded resolution) remembers how you resolved a conflict so that if the same conflict appears again — common during long-running rebases or repeated merges — Git replays your earlier resolution automatically. It records the pre-image of the conflict and the resolution you committed, storing them under .git/rr-cache.

# Enable reuse of recorded conflict resolutions
git config rerere.enabled true

29 git notes: Attaching Metadata

git notes attach extra information to existing commits without changing them — the commit hash stays the same. Notes are stored in a separate ref (commonly refs/notes/commits) and shown by git log. They are handy for CI results, review links, or release annotations added after the fact.

Because notes live in their own ref, they must be fetched and pushed explicitly.

# Attach a note to the current commit
git notes add -m 'reviewed by QA' HEAD
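
A quick check that a note really leaves the commit untouched, sketched in a scratch repository. The remote name origin in the trailing comments is hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo one > f.txt && git add f.txt && git commit -q -m 'first'

before=$(git rev-parse HEAD)
git notes add -m 'reviewed by QA' HEAD
after=$(git rev-parse HEAD)

note=$(git notes show HEAD)
notes_ref=$(git rev-parse --verify -q refs/notes/commits)

echo "commit before: $before"
echo "commit after:  $after"
echo "note: $note (stored via refs/notes/commits at $notes_ref)"

# Sharing notes needs explicit refspecs, for example (origin is hypothetical):
#   git push origin refs/notes/commits
#   git fetch origin refs/notes/commits:refs/notes/commits
```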

30 Replace Refs and Grafts

git replace creates a replacement ref under refs/replace/ that tells Git to substitute one object for another when reading history — without altering the originals. This modern, shareable mechanism supersedes the old .git/info/grafts file, which faked parent relationships locally. Replace refs are commonly used to stitch a shallow history onto an imported older one.

# Replace one commit with another when reading history
git replace <old-sha> <new-sha>

31 Submodules vs Subtrees in Depth

A submodule embeds another repository by recording a specific commit hash as a special gitlink entry in the tree; the nested repo stays separate and must be initialised and updated. A subtree merges another project’s history directly into a subdirectory of yours, so clones need no extra steps. Submodules keep histories cleanly separated; subtrees simplify consumption at the cost of a larger repository.

# Add a nested repository as a submodule
git submodule add https://github.com/example/lib.git vendor/lib

32 Sparse-Checkout and Partial Clone

Sparse-checkout populates only a subset of the working tree, so huge monorepos check out just the paths you need while the full history stays in the repo. Partial clone (--filter=blob:none) goes further by omitting most object content at clone time and fetching blobs lazily on demand. Combined, they make working with enormous repositories practical.

# Clone without blobs, fetching them on demand
git clone --filter=blob:none https://github.com/example/huge.git

33 Worktree Internals

git worktree lets one repository have multiple working trees checked out at once, each on a different branch, sharing the same object database. The main repo records linked worktrees under .git/worktrees/, and each linked tree has a .git file (not directory) pointing back. This avoids re-cloning just to work on two branches in parallel.

# Create a second working tree on a new branch
git worktree add ../hotfix -b hotfix
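
The on-disk layout described above can be inspected after adding a worktree. A sketch using a scratch directory that holds both the main repository and the linked tree; directory names and identity are placeholders.

```shell
base=$(mktemp -d) && cd "$base"
git init -q -b main repo && cd repo
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo one > f.txt && git add f.txt && git commit -q -m 'first'

# Add a linked working tree on a new branch
git worktree add ../hotfix -b hotfix >/dev/null 2>&1

# In the linked tree, .git is a one-line file pointing back at the main repo
cat ../hotfix/.git

# The main repo keeps per-worktree state (HEAD, index) here
ls .git/worktrees/hotfix

linked_branch=$(git -C ../hotfix symbolic-ref --short HEAD)
git worktree list
```

When the extra tree is no longer needed, git worktree remove ../hotfix cleans up both the directory and the bookkeeping.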

34 Signing Commits and Tags

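You can cryptographically sign commits and tags so others can verify authorship and integrity. The gpg.format setting selects the signature type: openpgp (GPG, the default), x509 (S/MIME certificates) or ssh (SSH keys). Tools such as gitsign (keyless signing via Sigstore) integrate by acting as the signing program for one of these formats. Signed commits are verified with git verify-commit or git log --show-signature, and signed tags with git verify-tag. Signing protects against impersonation in the supply chain.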

# Create a GPG-signed commit
git commit -S -m 'signed change'

35 Server-Side Hooks

On the server, hooks enforce policy as pushes arrive. The pre-receive hook runs once per push and can reject the whole push by exiting non-zero. The update hook runs once per ref being updated, allowing per-branch rules (such as protecting main). The post-receive hook runs after acceptance for notifications or deployments. These hooks are central to access control and CI triggers.

# A pre-receive hook reads ref updates from stdin
while read -r old new ref; do echo "$ref $old -> $new"; done
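
As an illustration of policy enforcement, here is a sketch of a pre-receive check that refuses to delete main. The policy and the function name check_push are hypothetical. A deletion appears on stdin as a new value made entirely of zeros, and the check is written so that it works for both 40- and 64-character IDs.

```shell
#!/bin/sh
# Hypothetical policy: reject any push that deletes refs/heads/main.
check_push() {
    status=0
    while read -r old new ref; do
        # Stripping every "0" leaves an empty string only for the all-zero ID
        if [ "$ref" = "refs/heads/main" ] && [ -z "$(printf '%s' "$new" | tr -d 0)" ]; then
            echo "rejected: deleting $ref is not allowed" >&2
            status=1
        fi
    done
    return "$status"
}

# Simulated input, in the same "<old> <new> <ref>" form the server supplies.
# In a real hooks/pre-receive file the script would simply end with: check_push
a=1111111111111111111111111111111111111111
zero=0000000000000000000000000000000000000000
printf '%s %s refs/heads/main\n' "$a" "$zero" | check_push 2>/dev/null
echo "delete main -> exit $?"
printf '%s %s refs/heads/topic\n' "$a" "$zero" | check_push
echo "delete topic -> exit $?"
```

Because pre-receive sees the whole push at once, a single non-zero exit rejects every ref in it; the update hook is the place for accepting some refs and rejecting others.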

36 Automated Bisection with git bisect run

git bisect performs a binary search through history to find the commit that introduced a bug. Mark a known bad and a known good commit, then automate the whole search with git bisect run <script>: Git checks out a midpoint, runs your script, and uses its exit code to classify the commit. Exit code 0 means good, 125 means skip (the commit cannot be tested), any other code from 1 to 127 means bad, and anything above 127 aborts the bisection. No manual steps are needed.

# Automatically find the first failing commit
git bisect start HEAD v1.0
git bisect run ./test.sh
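
Here is a self-contained rehearsal: a scratch repository with six commits, where the fourth introduces a line containing the word bug. The file name app.txt and the inline test command are stand-ins for a real project and test script.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'

for n in 1 2 3 4 5 6; do
    if [ "$n" -eq 4 ]; then echo 'bug' >> app.txt; else echo "line $n" >> app.txt; fi
    git add app.txt && git commit -q -m "commit $n"
    if [ "$n" -eq 1 ]; then good=$(git rev-parse HEAD); fi
    if [ "$n" -eq 4 ]; then culprit=$(git rev-parse HEAD); fi
done

# bad = HEAD, good = the first commit
git bisect start HEAD "$good" >/dev/null

# The test exits 0 when no "bug" line exists (good) and 1 when it does (bad)
git bisect run sh -c '! grep -q bug app.txt' >/dev/null

# While the session is open, refs/bisect/bad names the first bad commit
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "expected: $culprit"
echo "found:    $found"
```

A real test script should exit 125 for commits it cannot build, so that bisect skips them instead of blaming them.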

37 git bundle for Offline Transfer

A bundle packs commits, trees, blobs and refs into a single file you can move over any medium — email, USB, an air-gapped network — without a live server. The recipient can git clone or git fetch directly from the bundle file. Bundles can be full or incremental (a range of revisions), making them ideal for sneakernet syncs and backups.

# Create a bundle containing all branches
git bundle create repo.bundle --all
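
A round trip through a bundle file, sketched with two scratch directories standing in for the sending and receiving machines. The directory names src and copy are placeholders.

```shell
base=$(mktemp -d) && cd "$base"
git init -q -b main src && cd src
git config user.name 'Demo User'          # placeholder identity
git config user.email 'demo@example.com'
echo one > f.txt && git add f.txt && git commit -q -m 'first'
echo two >> f.txt && git commit -q -am 'second'

# Sender: write every ref and the objects behind it into one file
git bundle create ../repo.bundle --all 2>/dev/null

# Receiver: check the file is self-contained, then clone straight from it
cd ..
git bundle verify repo.bundle >/dev/null 2>&1 && echo "bundle verifies"
git clone -q -b main repo.bundle copy

src_tip=$(git -C src rev-parse main)
copy_tip=$(git -C copy rev-parse main)
echo "source tip: $src_tip"
echo "clone tip:  $copy_tip"
```

For later incremental transfers, a range such as a previously sent tag up to main can be bundled in place of --all, and the receiver then uses git fetch from the new file.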

38 Background Maintenance with git maintenance

git maintenance schedules routine upkeep so repositories stay fast without manual gc. Enabling it registers background tasks — incremental repacking, commit-graph writes, prefetching from remotes, and loose-object cleanup — that run on a timer. It is the modern, finer-grained replacement for relying solely on automatic garbage collection.

# Register background maintenance for this repository
git maintenance start

🎓 Certificate of Completion

🔒 Complete every lesson quiz above with 90%+ to unlock your downloadable certificate.