Version control: Git 2.38 curates large repositories with Scalar

Git now ships with a built-in administration tool: Scalar is meant to tackle the problems of managing particularly large repositories with a uniform set of features.

Version 2.38 of the source code management system Git has been released. The highlight this time is Scalar, a new tool for managing repositories. According to its developers, Scalar is intended to tackle the best-known problems that regularly arise when managing particularly large repositories.

It curates and configures a feature set comprising the built-in filesystem monitor, the multi-pack index, commit-graphs and scheduled background maintenance. In addition, Scalar offers options for targeted cloning, including partial clones. With sparse checkout (roughly, "checking out in economy mode"), large monorepos also become easier to handle because they appear smaller.
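
Scalar switches these features on for you. To illustrate what two of them are, the following sketch builds a small throwaway repository and writes a multi-pack index and a commit-graph by hand with standard Git commands. It is an approximation, not necessarily what Scalar runs internally. The filesystem monitor and scheduled maintenance (git maintenance start) are left out because both start background processes. The repository contents are made up for the example.

```shell
#!/bin/sh
# Sketch: enable two of the features Scalar configures, by hand, in a throwaway repo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q -b main .
git config user.name "Demo"
git config user.email "demo@example.com"

# Create a little history so there is something to index.
for i in 1 2 3; do
  echo "change $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Pack all loose objects, then index the pack(s) with a multi-pack index.
git repack -adq
git multi-pack-index write

# Write a commit-graph covering every commit reachable from the refs.
git commit-graph write --reachable

ls .git/objects/pack/multi-pack-index .git/objects/info/commit-graph
```

Scalar writes and refreshes these files as part of its maintenance, so large repositories benefit from them without anyone having to remember the commands.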

Clone repositories and update references with Scalar

Scalar makes it relatively easy to clone repositories with the following command:

$ scalar clone /path/to/repo

With the option --full-clone, the complete working tree is checked out instead of a sparse one. To apply Scalar's recommended default configuration to an existing clone, the following suffices:

$ cd /path/to/repo
$ scalar register

Another innovation concerns managing several branches that together make up a larger feature. When working on large features, it can help to split the work into several separate branches, each building on the previous one. However, once developers change one of the earlier branches, updating the branches built on top of it with git rebase becomes tedious. For this, Git 2.38 provides a new rebase option called --update-refs, which automatically moves any branches that point to commits being rebased.

If you want to rebase this way regularly, the configuration git config --global rebase.updateRefs true tells Git to apply --update-refs automatically.

Economy mode to reduce repository size

The idea of using economy mode to cut a repository down to size is not new: Git developers had already published a long blog post with instructions and explanations ("Bring your monorepo down to size") at the beginning of 2020. It also describes the cone pattern now built into Scalar to improve performance in economy mode. At that time, work on the git sparse-checkout command was still under way, and more manual work was required than now. The trick of economy mode is that it reduces the need for Git to inspect individual files when developers run commands such as git add, git status or git checkout.

Cone mode is meant to let the feature scale further for particularly large repositories. The original sparse-checkout mechanism relies on pattern matching that follows the logic of .gitignore: the sparse-checkout file lists patterns, and if a line starts with !, paths matching that pattern are excluded. This is supposed to speed up matching, since only folders matching patterns without the exclamation mark then need to be examined. In practice, however, this can become quite complex once complicated patterns have to be specified. Large repositories may need thousands of patterns to describe a working directory, and checking each pattern against millions of paths would mean billions of pattern checks on every git checkout. That can take a long time.
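
The .gitignore-style patterns described here are what the older non-cone mode uses. The sketch below writes such a pattern list in a throwaway repository with hypothetical directory names. It assumes Git 2.35 or newer for the --no-cone flag. The list includes everything with /* and then excludes one directory with a line starting with !. Git stores the patterns verbatim in .git/info/sparse-checkout.

```shell
#!/bin/sh
# Sketch: non-cone sparse checkout with .gitignore-style patterns.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q -b main .
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir -p docs src
echo "readme" > README.md
echo "guide" > docs/guide.md
echo "code" > src/main.c
git add .
git commit -q -m "initial layout"

# Include everything, then exclude docs/ with a negated pattern.
# In non-cone mode, paths are matched against this pattern list like .gitignore.
git sparse-checkout set --no-cone '/*' '!/docs/'

cat .git/info/sparse-checkout
```

With two patterns this is cheap. The cost described above appears only when thousands of such patterns meet millions of paths.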

Under the hood: Sparse Checkout in Scalar

Most Git users can get by with a simpler mechanism instead, which restricts the working directory purely by matching folder prefixes. Git can then treat each pattern in the sparse-checkout file as either a recursive pattern or a parent pattern, which speeds up the process. When parsing the patterns, Git builds two separate hash sets, one for recursive paths and one for parent paths, and the parent directories of every recursive path are added to the parent set automatically. As a result, each path needs only a few hash lookups on its directory prefixes instead of a check against every pattern. A detailed explanation of the underlying code can be found on GitHub. The blog post also explains how to clone repositories in sparse mode. According to the development team, the configuration options are currently still experimental.

The rest of the iceberg in the release notes

The current release coincided with Google Summer of Code, in which Git took part. In this context, two students worked on the project, one on sparse index integration and one on reachability bitmaps. Their work has been incorporated into Git 2.38, as the "Tidbits" section of the blog post details.

Those interested can find a full list of changes in the release notes on GitHub. In addition, an announcement has appeared in the GitHub blog, which presents the main changes in detail.