How to Use "Sparse Checkout" to Manage Large Git Repositories
If you have ever worked with a repository containing thousands of files, you are probably familiar with the frustration of waiting for basic commands like git status or git checkout to complete. Commands that run in milliseconds in a small repository can be noticeably slow (and test your patience) in a large one.
This frustration is amplified when you're focusing on a small part of the codebase and don't need to interact with 99% of the files.
You may be wondering: "Why keep track of everything if I am only working on a small part of the project? Wouldn't it be great if I could tell Git to keep track of just the couple of directories I'm working on?"
Well, good news! The sparse-checkout command does exactly that! ✌️
It enables you to work with only a specific set of files from a repository, rather than the entire repository. This allows you to benefit from a monorepo while ensuring efficient Git performance — the best of both worlds!
Introduced in Git 2.25.0, this command was designed with large Git repositories in mind, particularly monorepos. While similar functionality was previously available through the core.sparsecheckout config option (combined with a hand-edited .git/info/sparse-checkout file), the new command greatly streamlines the process.
Let's see how it works!
Getting Started with sparse-checkout
Setting up sparse-checkout is quite simple. Run the following command:
$ git sparse-checkout init
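This command configures the necessary settings, telling Git that you will specify which parts of the project you want to work with. (In recent Git versions, init is marked as deprecated because set enables sparse checkout on its own, but it still works.)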
Now you just need to specify which directories you require! You can do so by typing the following:
$ git sparse-checkout set <path1> <path2> ...
This command updates your working directory so that it contains only the paths you listed. Nothing new is downloaded here: in a regular clone, the data is already stored locally. For example, if your focus is on an Electron app located in the client/electron directory, you can use the following command to work with it:
$ git sparse-checkout set client/electron
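To see the effect for yourself, here is a minimal sketch you can run in a scratch location. The repository layout (README.md, client/electron, client/web, server) and the file names are invented for illustration, and the script uses init --cone so that the result does not depend on which mode your Git version defaults to.

```shell
# Build a throwaway repository with a hypothetical monorepo layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p client/electron client/web server
echo "readme" > README.md
echo "main"   > client/electron/main.js
echo "index"  > client/web/index.html
echo "app"    > server/app.py
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial commit"

# Restrict the working directory to client/electron.
git sparse-checkout init --cone
git sparse-checkout set client/electron

# List the files that are left on disk (ignoring the .git directory).
remaining=$(find . -type f ! -path "./.git/*")
echo "$remaining"
```

In this setup, client/web and server should disappear from the working directory, while client/electron and the files at the repository root stay in place.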
If you wish to add more paths later, you can use the add subcommand. To view the current settings, you can rely on the list subcommand, as illustrated below:
$ git sparse-checkout add <new_path>
$ git sparse-checkout list
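As a quick illustration of both subcommands, the sketch below builds a scratch repository, checks out one directory, and then adds a second one. The directory names (client/electron, docs, server) are hypothetical, and the add subcommand needs a slightly newer Git than set (2.26 or later, to the best of my knowledge).

```shell
# Build a throwaway repository with a hypothetical layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p client/electron docs server
echo "main"  > client/electron/main.js
echo "guide" > docs/guide.md
echo "app"   > server/app.py
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial commit"

# Start with a single directory.
git sparse-checkout init --cone
git sparse-checkout set client/electron

# Add a second directory without replacing the first one.
git sparse-checkout add docs

# Print the directories that are currently part of the sparse checkout.
paths=$(git sparse-checkout list)
echo "$paths"
```

Note the difference between the two: set replaces the whole list of paths, while add appends to it.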
If you encounter any issues or wish to revert to having the full working directory available, you can disable sparse-checkout by entering the following:
$ git sparse-checkout disable
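The following sketch shows the round trip in a scratch repository: files outside the selected directory vanish from disk and come back once sparse-checkout is disabled. The layout and file names are made up for the example.

```shell
# Build a throwaway repository with a hypothetical layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p client/electron server
echo "readme" > README.md
echo "main"   > client/electron/main.js
echo "app"    > server/app.py
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial commit"

# Narrow the working directory, then count the files on disk.
git sparse-checkout init --cone
git sparse-checkout set client/electron
before=$(find . -type f ! -path "./.git/*" | wc -l | tr -d ' ')

# Restore the full working directory and count again.
git sparse-checkout disable
after=$(find . -type f ! -path "./.git/*" | wc -l | tr -d ' ')

echo "files with sparse-checkout: $before, after disabling: $after"
```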
Better Performance with Cone Mode
For more efficient sparse checkouts, especially in really large repositories, you can leverage Cone Mode. It has been available since the command was introduced in Git 2.25.0 and is the default mode as of Git 2.37.0. The name makes a lot of sense: the checkout forms a cone-shaped subset of the repository tree, made up of the directories you specify plus the files that sit directly inside each of their parent directories (including the repository root).
Cone mode only accepts directory paths, not individual files or arbitrary patterns. This restriction is what makes it much faster than non-cone mode, where every path has to be tested against a list of gitignore-style patterns.
This way, Git can quickly determine if a path should be included without evaluating pattern after pattern, as it only needs to check if the path is within the "cone" of specified directories.
To enable this feature explicitly, simply add the --cone flag when initializing sparse-checkout:
$ git sparse-checkout init --cone
You can then proceed to add directories as detailed above.
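If you are curious what the "cone" looks like under the hood, the sketch below prints the pattern file Git maintains for you after a cone-mode setup. The repository layout is hypothetical, and the exact file contents are an internal detail that may differ between Git versions, so treat this as a peek rather than something to rely on.

```shell
# Build a throwaway repository with a hypothetical layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p client/electron client/web server
echo "main"  > client/electron/main.js
echo "index" > client/web/index.html
echo "app"   > server/app.py
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial commit"

# Enable cone mode and select one directory.
git sparse-checkout init --cone
git sparse-checkout set client/electron

# Cone mode is recorded in the repository configuration...
cone=$(git config core.sparseCheckoutCone)
echo "cone mode: $cone"

# ...and the selected directories are stored as a small, regular set of patterns.
cat .git/info/sparse-checkout
```

The patterns follow a fixed shape (root files, then each parent directory, then the selected directory itself), which is why Git can match paths against them so cheaply.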
What About Disk Space?
You may be surprised to learn that Sparse Checkout does not shrink the repository data itself. With a regular clone, all objects are still downloaded and stored in the local .git directory. Sparse Checkout only reduces the number of files in your working directory, and with it the number of files Git needs to scan for status updates.
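You can verify this with a small experiment. In the sketch below (scratch repository, invented file names), a file is excluded from the working directory by sparse-checkout, yet its content can still be read straight from the object database.

```shell
# Build a throwaway repository with a hypothetical layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p client/electron server
echo "main" > client/electron/main.js
echo "app"  > server/app.py
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial commit"

# Exclude everything except client/electron from the working directory.
git sparse-checkout init --cone
git sparse-checkout set client/electron

# server/app.py is no longer on disk...
[ -e server/app.py ] || echo "server/app.py is not checked out"

# ...but its content is still stored inside .git and can be read from there.
content=$(git cat-file -p HEAD:server/app.py)
echo "content from the object database: $content"
```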
To save disk space, you should consider Partial Cloning, which allows you to clone a repository without downloading all of its objects.
Here's how you can perform a partial clone:
$ git clone --filter=blob:none --sparse <repository-url>
This command significantly reduces both the initial download size and local storage requirements. Initially, it downloads only the commit and tree objects, plus the file contents needed for the (sparse) checkout itself. Git then fetches the contents of other files on demand, as you work.
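To try this without a remote server, the sketch below clones from a local scratch repository over a file:// URL. Everything about the setup is an assumption made for the demo: the layout, the file names, and the two uploadpack settings, which are only needed because the "server" here is a plain local repository (hosting services generally have filtering enabled already).

```shell
# Create a hypothetical source repository to clone from.
src=$(mktemp -d)
git -C "$src" init -q
mkdir -p "$src/server"
echo "readme" > "$src/README.md"
echo "app"    > "$src/server/app.py"
git -C "$src" add .
git -C "$src" -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial commit"

# Let this local repository serve filtered (partial) clones.
git -C "$src" config uploadpack.allowFilter true
git -C "$src" config uploadpack.allowAnySHA1InWant true

# Perform a partial, sparse clone into a fresh directory.
dst="$(mktemp -d)/clone"
git clone -q --filter=blob:none --sparse "file://$src" "$dst"

# The clone remembers which filter it was created with...
filter=$(git -C "$dst" config remote.origin.partialclonefilter)
echo "filter: $filter"

# ...and objects outside the sparse checkout have not been downloaded yet.
missing=$(git -C "$dst" rev-list --objects --missing=print HEAD | grep -c '^?')
echo "objects not yet downloaded: $missing"
```

Right after the clone, only the files at the repository root are checked out. Running git sparse-checkout set server inside the clone would fetch the missing file contents at that point.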
You can then run the sparse-checkout commands mentioned earlier. They work really well in combination!