What is git filter-repo?
git filter-repo is a tool for rewriting Git repository history efficiently and with little ceremony. It lets you remove files, rewrite commit messages, change author information, and restructure repositories while keeping the rest of the history intact.
Unlike git filter-branch, which is slow and complex and which Git's own documentation now recommends against using, git filter-repo is significantly faster and easier to use. It processes the repository history in a single optimized pass, making it a good choice for cleaning up repositories, removing sensitive data, and reorganizing project structures while keeping the resulting history consistent.
Its built-in safety features also help prevent accidental data loss, ensuring that history rewriting is performed correctly.
Installing git filter-repo
Whether you're on Windows, macOS, or Linux, installing git filter-repo is a simple process.
On macOS/Linux, using Homebrew:
brew install git-filter-repo
On Windows:
python -m pip install git-filter-repo
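Once installed, a quick check confirms that the tool is reachable. This is a minimal sketch, assuming a POSIX-style shell (Git Bash on Windows) and that the installer placed an executable named git-filter-repo on your PATH; the variable name is only illustrative.

```shell
# Look for the git-filter-repo executable on PATH.
if command -v git-filter-repo >/dev/null 2>&1; then
    filter_repo_status="installed"
else
    filter_repo_status="missing"
fi
echo "git-filter-repo is $filter_repo_status"
```

If it reports missing, check that the directory Homebrew or pip installs scripts into is on your PATH.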
Key Features of git filter-repo
1. Faster than git filter-branch (Optimized Performance)
One of the biggest pain points with git filter-branch is its slow execution, especially on large repositories. Because git filter-branch processes each commit sequentially while rewriting history, complex operations on repositories with extensive histories can take hours or even days.
In contrast, git filter-repo is built for speed. It processes Git history in a single pass, making it significantly faster than git filter-branch. Users who switch often report speedups of 10x or more. If you need to clean up a repository efficiently, git filter-repo offers a far more optimized solution.
Example: Removing a specific file from every commit in a large repo:
git filter-repo --path largefile.zip --invert-paths
This operation, which might take hours in git filter-branch, will typically complete in seconds or minutes with git filter-repo.
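To confirm the file is really gone after the rewrite, you can ask Git whether any commit on any ref still touches that path. A small sketch using only stock Git; the function name path_in_history is made up for this example.

```shell
# Exit status 0 if any commit reachable from any ref touches the path,
# non-zero if the path appears nowhere in history.
path_in_history() {
    [ -n "$(git log --all --oneline -- "$1")" ]
}

# Usage, from inside the rewritten repository:
#   path_in_history largefile.zip && echo "still present" || echo "gone"
```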
2. Easy-to-Use Syntax (Short, Clean Commands)
git filter-branch is notorious for its complex, error-prone syntax. Many of its operations require writing long shell scripts, using intricate environment variables, and parsing outputs manually, which makes it hard for even experienced users to execute correctly.
git filter-repo simplifies all of this by offering a more intuitive and direct command structure. Instead of multi-line scripts, you can achieve most repository filtering operations with a single command. This makes it accessible to both beginners and advanced Git users.
Example: Renaming a directory across the entire Git history:
git filter-repo --path-rename old_directory:new_directory
Compare this to git filter-branch, which requires complex scripting and conditional handling. The simplicity of git filter-repo means fewer mistakes and quicker execution.
3. Works on All Branches at Once (No Extra Flag Needed)
One limitation of git filter-branch is that, by default, it rewrites only the current branch. To cover every branch and tag you have to remember to append -- --all, and forgetting it leaves the other branches pointing at the old, unfiltered history.
git filter-repo rewrites every branch and tag in the repository by default, so there is nothing to switch between and no operation to repeat. It does not take an --all flag; if you want to limit the rewrite to specific refs instead, use the --refs option.
Example: Removing a sensitive file from all branches and tags:
git filter-repo --path secrets.txt --invert-paths
This applies the change to every branch and tag, avoiding manual branch-by-branch filtering.
4. Prevents Accidental Destructive History Rewrites (Built-in Safety Checks)
Rewriting Git history is a potentially dangerous operation—pushing incorrect modifications can permanently alter the history of a repository, making recovery difficult.
Unlike git filter-branch, which allows destructive changes with little warning, git filter-repo includes built-in safety measures to prevent accidental overwrites.
Key safety features:
- Requires a fresh clone: git filter-repo refuses to run on repos that aren’t fresh clones, reducing unintentional overwrites.
- Prevents data corruption: It ensures that blobs, commits, and trees are properly rewritten before applying changes.
- Guided errors & warnings: It provides clear error messages when users attempt a potentially destructive operation.
Example: git filter-repo will refuse to overwrite history in a non-fresh clone:
git filter-repo --to-subdirectory-filter src/
Error Message:
Aborting: Refusing to destructively overwrite repo history since this does not look like a fresh clone. Expected freshly packed repo.
To proceed safely, clone the repository anew and run the command there, so that a mistake cannot damage your only copy. The check can be overridden with --force, but only do that when you have a backup.
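The fresh-clone requirement translates into a short routine: clone, enter the clone, then filter. The sketch below wraps it in a function; the function name, URL, and directory are placeholders rather than anything git filter-repo defines.

```shell
# Clone a repository into a new directory and rewrite history there,
# leaving your everyday working copy untouched.
#   $1 = repository URL, $2 = directory for the fresh clone
fresh_clone_and_filter() {
    git clone "$1" "$2" &&
    ( cd "$2" && git filter-repo --to-subdirectory-filter src/ )
}

# Usage (hypothetical URL):
#   fresh_clone_and_filter https://example.com/team/project.git project-rewrite
```

When the source is a repository on the same machine, cloning with git clone --no-local is the usual way to get a clone that passes the freshness check.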
Common Use Cases & Practical Examples
1. Removing Large Files from Git History
Git repositories can accumulate large files over time, significantly increasing the overall repository size. This leads to slower cloning times, increased storage usage, and inefficient performance. Even if a large file has been deleted in a later commit, Git still retains it in history, making the repository unnecessarily large.
Using git filter-repo, you can completely remove all traces of a specific file across all commits and branches.
git filter-repo --path large-file.zip --invert-paths
Example: Removing videos.mp4 from history:
git filter-repo --path videos.mp4 --invert-paths
After running the command, the large file will be erased from all commits, drastically reducing the repository size. Because git filter-repo removes the origin remote as a safeguard, re-add it and then use git push --force to overwrite the remote history.
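Before deleting anything, it helps to know which files are actually inflating the repository. The pipeline below is one common way to list the largest blobs in history using only stock Git commands; the function name largest_blobs is invented for this sketch.

```shell
# Print the N largest blobs in history as "<size-in-bytes> <path>".
#   $1 = how many entries to show (default 10)
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "${1:-10}"
}

# Usage, from inside the repository:
#   largest_blobs 5
```

If the goal is to drop everything above a size threshold rather than one named file, git filter-repo also offers a --strip-blobs-bigger-than option (for example, --strip-blobs-bigger-than 10M).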
2. Removing Sensitive Data (Passwords, API Keys)
Accidentally committing sensitive data, such as passwords, API keys, or database credentials, can pose a huge security risk. Even if you delete the file in your latest commit, previous revisions will still contain the exposed credentials, leaving your system vulnerable.
Instead of starting a new repository, git filter-repo allows you to completely remove or replace sensitive information from all previous commits while keeping the rest of your commit history intact.
echo 'AWS_SECRET_KEY' > remove.txt
git filter-repo --replace-text remove.txt
Example: Removing hardcoded passwords:
git filter-repo --replace-text credentials.txt
This method safely eliminates sensitive credentials from every recorded commit in the repository. However, if credentials have already been pushed to a public repository, you should still revoke and regenerate any compromised keys or passwords.
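The file passed to --replace-text can hold more than a bare string. Per the git filter-repo documentation, each line is treated as literal text unless it is prefixed with regex: or glob:, and an optional ==> suffix sets the replacement (the default is ***REMOVED***). The sketch below builds such a file; the secret values and the file name expressions.txt are invented for illustration.

```shell
# One rule per line: <pattern>==><replacement>
cat > expressions.txt <<'EOF'
hunter2==>REDACTED_PASSWORD
regex:AKIA[0-9A-Z]{16}==>REDACTED_KEY_ID
EOF

# Review the rules before handing the file to git filter-repo.
cat expressions.txt
```

Running git filter-repo --replace-text expressions.txt would then apply both rules to file contents in every commit.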
3. Moving All Files to a Subdirectory Without History Breakage
Sometimes, projects evolve, and restructuring a repository becomes necessary. If you need to organize files into a subdirectory while preserving the commit history, simply moving the files manually won’t be enough.
Instead, git filter-repo allows you to move all existing files into a subdirectory while keeping previous commits as if they were always stored in that structure.
git filter-repo --to-subdirectory-filter src/
Example: Moving everything into a new backend/ folder:
git filter-repo --to-subdirectory-filter backend/
This is especially useful when converting a repository into a monorepo or when integrating an existing codebase into a larger project while maintaining full history.
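For the monorepo case, the rewritten repository can then be merged into the larger project with ordinary Git commands. A rough sketch, assuming the rewritten clone is reachable by path or URL and that its branch is called main unless you say otherwise; the function name and the remote name imported are placeholders.

```shell
# Run from inside the monorepo.
#   $1 = path or URL of the rewritten repository
#   $2 = branch to import (default: main)
import_rewritten_repo() {
    git remote add imported "$1" &&
    git fetch imported &&
    git merge --allow-unrelated-histories \
        -m "Import ${2:-main} history" "imported/${2:-main}" &&
    git remote remove imported
}

# Usage (hypothetical path):
#   import_rewritten_repo ../backend-rewrite main
```

Because every imported file already lives under its own subdirectory, the merge should not collide with existing files unless the monorepo already has a directory of the same name.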
Renaming a Directory in Every Commit
Over time, project folder structures can change. A directory name that made sense in the past may no longer be relevant, and renaming it while keeping history intact can be challenging.
Normally, renaming files or folders in Git affects only the latest commit, leaving older commits unchanged. By using git filter-repo, you can ensure that the rename is reflected across every commit in your repository.
git filter-repo --path-rename old_folder:new_folder
Example: Renaming api folder to services:
git filter-repo --path-rename api:services
This is particularly useful when rebranding a section of a project, fixing outdated directory names, or making repositories more readable when onboarding new developers.
4. Splitting a Monorepo into Multiple Repositories
A monorepo is a repository structure that contains multiple distinct projects. This setup can sometimes become too large and unwieldy, making it difficult to manage independent projects separately.
If you need to extract a portion of a monorepo and keep its original commit history, you can use git filter-repo to create a new standalone repository for a specific directory.
git filter-repo --subdirectory-filter backend/
Example: Extracting the frontend/ directory into a new repository:
git filter-repo --subdirectory-filter frontend/
After running this command, your repository will consist only of the selected subdirectory’s history, making it much easier to manage as an independent project. This approach is essential when transitioning from a monolithic to a microservices-based architecture.
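Once the subdirectory has been extracted, the result still has to be published somewhere. git filter-repo removes the origin remote after rewriting, so a remote has to be added before pushing. A sketch with a placeholder function name and URL, assuming the destination repository already exists and is empty:

```shell
# Run from inside the filtered repository.
#   $1 = URL of the new, empty repository
publish_extracted_repo() {
    git remote add origin "$1" &&
    git push -u origin --all &&
    git push origin --tags
}

# Usage (hypothetical URL):
#   publish_extracted_repo https://example.com/team/frontend.git
```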
5. Changing Commit Author Information
If you have mistakenly used the wrong Git username or email in past commits, git filter-repo allows you to fix these details across all historical commits instead of editing them manually.
This is particularly useful when:
- Changing your GitHub or corporate email to a new one
- Normalizing commit author names for better consistency
- Fixing misconfigured Git usernames
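# Callbacks are Python snippets: values arrive as bytes, and the new value must be returned.
git filter-repo --name-callback 'return b"correct_name" if name == b"wrong_name" else name'
Example: Updating an old, incorrect author email:
git filter-repo --email-callback 'return b"new@example.com" if email == b"old@example.com" else email'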
This ensures that every past commit reflects the correct author metadata, making collaboration easier and maintaining repository integrity.
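When many names or addresses need correcting, a mailmap file can be easier to maintain than callbacks. git filter-repo accepts one through its --mailmap option, using the same format as Git's own .mailmap. The sketch below writes a small file; the identities and the file name my-mailmap are examples only.

```shell
# Each line: Proper Name <proper@email> Commit Name <commit@email>
cat > my-mailmap <<'EOF'
Correct Name <new@example.com> wrong_name <old@example.com>
Correct Name <new@example.com> Correct Name <old@example.com>
EOF

# Review the mappings before using them.
cat my-mailmap
```

Running git filter-repo --mailmap my-mailmap would then rewrite any identity matching the right-hand side of a line to the name and email on the left.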
git filter-repo vs git filter-branch: Which One Should You Use?
Feature | git filter-repo | git filter-branch |
---|---|---|
Speed | Fast | Slow |
Syntax Simplicity | Easy commands | Complex & error-prone |
Works on all branches | Yes (by default) | Only with -- --all |
Actively Maintained | Yes | No (Deprecated) |
In conclusion, if you need fast, reliable, and easy history rewriting, git filter-repo is the way to go.
FAQs: Common Questions on git filter-repo
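Can I undo git filter-repo changes?
Usually not from inside the rewritten repository. After filtering, git filter-repo expires the reflogs and garbage-collects the old objects by default, so the usual escape hatch rarely has anything left to restore:
git reflog
git reset --hard HEAD@{N} # (Replace N with the correct ref)
If you haven’t pushed yet, the dependable way to undo a rewrite is to discard the filtered clone and go back to the original repository or a backup.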
If you already pushed, recovery is only possible if:
- A backup clone exists
- The repository is hosted somewhere with history retention (e.g., GitHub’s reflog for force-pushed branches)
- Someone else still has an uncorrupted local copy
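Since recovery depends on having an untouched copy, it is worth making one before any rewrite. One option is a Git bundle, a single file that contains every ref. A sketch using stock Git; the function name and file paths are placeholders.

```shell
# Run inside the repository before rewriting.
#   $1 = where to write the bundle file
backup_all_refs() {
    git bundle create "$1" --all &&
    git bundle verify "$1"
}

# Usage (hypothetical paths):
#   backup_all_refs ../project-backup.bundle
# Restore later with an ordinary clone of the bundle file:
#   git clone ../project-backup.bundle restored-project
```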
How do I rewrite history safely with git filter-repo?
Always work on a fresh clone, filter your history, and inspect your changes before pushing.
Can I use git filter-repo on a shared repository?
Be careful! You may overwrite commits that teammates rely on.
Warn your team before using git push --force.
Final Words
In conclusion, this is why you should use git filter-repo for Git history rewriting:
- It's easy to use with modern safety measures built in.
- It's a fast and efficient alternative to git filter-branch.
- It's a great tool for cleaning up history (removing large files, scrubbing sensitive data, renaming directories).
To learn more about this powerful tool, read the official git filter-repo documentation and start applying it to clean up your Git history today!