2018-08-05

1: The Whole Picture of Git

Git is not a distributed version control system that networks centralized repositories, a point most tutorials fail to clarify at the very beginning.

Topics


About: Git

Starting Context


  • The reader knows what a version control system is in general (its purpose and the mechanisms common to the various types of version control systems), so that he or she knows (or at least can confidently guess) what terms such as 'repository' and 'working tree' (also known as 'working directory') mean.
  • The reader has read a Git tutorial, but couldn't grasp the whole picture of Git.

Target Context


  • The reader will better understand the whole picture of Git.

Orientation


Hypothesizer 7
When I, a Git novice, read one Git tutorial after another in order to introduce Git into my system, they all explained things in the same way and all ignored my concern.

I wanted to use Git to manage my own documents in a repository, with old versions preserved and retrievable on demand; I didn't intend to share the repository with others. My humble concern was to establish a single repository that resided somewhere (for example, in '~/myData/gitRepositries/documents') and to register into it my documents, which resided somewhere else (for example, in '~/myData/documents'), where I would continue to edit them.

Is that concern unreasonable? . . . In fact, I have some experience with a few centralized version control systems (a centralized version control system is one whose repositories can each be edited concurrently from multiple working trees (even per branch), typically by multiple persons), and that concern is certainly normal, or standard, I would say, for any centralized version control system.

Certainly, I knew that Git was a distributed version control system, but I expected that being distributed was just about synchronizing multiple repositories; as for handling a single repository, I expected that Git could be treated as a centralized version control system. In fact, why shouldn't I expect so?

So I read a Git tutorial, which created a repository in a directory, which was fine, and . . . began to put files directly into that directory as though that were the obvious, sole, imaginable thing to do. . . . "No, no, no, I don't intend to do such a thing: I want to have all my repositories (including this one) under one directory (I don't want to scatter repositories around), and I want to edit files at other locations (for me, the repositories directory is for storing repositories, not for editing files, and I don't agree to being forced to edit files only under the repositories directory)."

So, I read another tutorial, and . . . it did the same thing . . . Huh? And I read another, and it showed a chart like this.


"Yes! 'Working directory'! I want to have my files in the 'working directory', not in the repository directory!"

So, I eagerly read on through the tutorial, and it created a repository in a directory, and . . . began to put files directly into that directory without mentioning any 'working directory' . . .

Huh?? "What happened to the 'working directory'? Let me know how to set up the 'working directory', please?"

. . . Well, certainly, I am the one to blame in that I had come with an expectation that was natural for users of centralized version control systems, but are such users unwelcome to those tutorials? . . . As it seems so, I will try to make a clarification that welcomes anyone who fulfills 'Starting Context'.


Main Body


1: Notes


Hypothesizer 7
Note that I will omit mentioning 'branch' in this article. That is because handling multiple branches isn't a concern of this article, and scattering the term 'branch' over the article while only one branch per repository is ever considered seems to contribute to confusion rather than to clarification. For example, I will use an expression like "the contents of a repository" where, strictly speaking, it is 'the contents of a branch in a repository'.

Furthermore, I intentionally ignore 'linked working tree' in this article. That is because it is about dealing with different branches in a repository at the same time, which isn't a concern of this article, and also because, at the time of writing, it is experimental and incomplete. So, in this article, I will just say 'working tree', meaning 'main working tree' (also known as 'main working directory').


2: Strictly Speaking, the Directory Specified as the Working Tree for a Git Command Execution Is the Working Tree for the Git Command Execution


Hypothesizer 7
I had expected that I would set up a directory as a working tree of a repository and would be able to use the directory as a working tree of the repository thereafter, until I dropped the setting. In fact, Git doesn't work like that.

The 'git' command takes the '--git-dir=' and '--work-tree=' switches, which let us designate the repository directory and the working tree to be handled by that one execution. So, whatever directory we specify as the working tree is the working tree for that command execution, and only for that execution.
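As a minimal sketch of those switches (assuming Git is installed; the temporary directory, the file name 'memo.txt', and the helper function 'g' are made up for illustration), the following registers a document that lives in one directory into a repository that lives in another. The repository is created with 'git init --bare' only so that it starts without any working tree of its own.

```shell
demo=$(mktemp -d)
repo="$demo/repositories/documents"   # where the repository lives
work="$demo/documents"                # where the documents are edited
mkdir -p "$repo" "$work"
echo "first draft" > "$work/memo.txt"

# Create the repository with no working tree of its own.
git init --quiet --bare "$repo"

# Hypothetical helper: every execution names both directories explicitly.
# The -c options only supply an author identity for this demonstration.
g() {
    git -c user.name=Demo -c user.email=demo@example.com \
        --git-dir="$repo" --work-tree="$work" "$@"
}

g add --all
g commit --quiet -m "Register memo.txt"
g log --oneline
```

Nothing Git-related is written into the documents directory; the two directories are tied together only by the switches given on each command line.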


3: However, the Contents of Any Repository Are Basically Supposed to Be Edited from a Single Working Tree at a Time


Hypothesizer 7
However, as a repository keeps a single staging area and Git doesn't check for any conflict among changes staged from multiple working trees, the contents of any repository are basically supposed to be edited from a single working tree at a time. By "at a time" I mean that we can delete the existing working tree and then create a new working tree (by checking out the repository), which means editing the contents of the repository from multiple working trees, but not from multiple working trees at the same time.
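A small sketch of why that is (assuming Git is installed; the directory names 'tree-a' and 'tree-b' and the file name are made up for illustration): when a second directory is named as the working tree of the same repository, Git simply compares it with the one staging area, with no record of which directory the staged contents came from.

```shell
demo=$(mktemp -d)
repo="$demo/repo"
mkdir -p "$demo/tree-a" "$demo/tree-b"
git init --quiet --bare "$repo"

# Stage and commit a file from the first directory.
echo "first draft" > "$demo/tree-a/memo.txt"
git --git-dir="$repo" --work-tree="$demo/tree-a" add --all
git -c user.name=Demo -c user.email=demo@example.com \
    --git-dir="$repo" --work-tree="$demo/tree-a" commit --quiet -m "Add memo"

# Name a second, empty directory as the working tree of the same repository.
# Git reports memo.txt as deleted, because it is absent from this directory;
# Git has no notion that tree-a still holds the file.
git --git-dir="$repo" --work-tree="$demo/tree-b" status --short
```

Staging that apparent deletion from 'tree-b' would silently undo what was staged from 'tree-a', which is the kind of clash that nothing in Git guards against.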


4: In Fact, There Is the Default Working Tree for Any Non-Bare Repository


Hypothesizer 7
What is a 'non-bare repository'? Actually, it is a repository that has a default working tree.

Although it's fine to consistently specify the working tree by the '--work-tree=' switch, doing so is usually cumbersome. So, any repository can have a default working tree, which makes it unnecessary to specify the working tree each time.


5: Then, What Is 'Bare Repository'?


Hypothesizer 7
Obviously, a 'bare repository' is a repository that doesn't have any default working tree.

However, that doesn't mean that the contents of a bare repository cannot be edited from any working tree: they can be edited from whatever working tree is specified by the '--work-tree=' switch, although whether that is encouraged is another story.
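The following sketch (assuming Git is installed; the directory names are made up for illustration) shows a bare repository refusing a command that needs a working tree, and then accepting the same command once a working tree is named for that execution.

```shell
demo=$(mktemp -d)
git init --quiet --bare "$demo/bare"
mkdir "$demo/tree"

# A bare repository reports itself as such.
git --git-dir="$demo/bare" rev-parse --is-bare-repository    # prints "true"

# A command that needs a working tree fails when none is given ...
git --git-dir="$demo/bare" status || echo "status needs a working tree"

# ... but succeeds once a working tree is named for this execution.
git --git-dir="$demo/bare" --work-tree="$demo/tree" status
```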


6: So, What Does the 'git init' Command Really Do?


Hypothesizer 7
As most tutorials say just "the 'git init' command creates a repository", I have to wonder ". . . Then what about the working tree? How can I create the working tree?".

In fact, the 'git init' command without any further parameter does two things: it creates a repository in the '.git' directory under the current directory, and it designates the current directory as the default working tree of that repository. If those tutorials had kindly explained so, I wouldn't have been left at a loss . . .
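Those two effects can be observed directly. This is a minimal sketch, assuming Git is installed; the directory name 'project' is made up for illustration.

```shell
demo=$(mktemp -d)
mkdir "$demo/project"
cd "$demo/project"
git init --quiet

ls -d .git                          # the repository itself
git rev-parse --git-dir             # where Git finds the repository
git rev-parse --show-toplevel       # the default working tree
git rev-parse --is-bare-repository  # prints "false"
```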

Anyway, that is the default directory structure for a pair of a repository and its default working tree, but actually, we can place the repository and the default working tree each at any location by specifying the '--git-dir=' and '--work-tree=' switches for the 'git init' command.
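As a sketch of such a customized layout (assuming Git is installed; the paths and the file name are made up for illustration), the repository below is created in one directory with another directory named as its default working tree. Later commands are then told only where the repository is.

```shell
demo=$(mktemp -d)
repo="$demo/repositories/documents"
work="$demo/documents"
mkdir -p "$repo" "$work"

# Create the repository in one place and name another place as its
# default working tree.
git --git-dir="$repo" --work-tree="$work" init --quiet

# From here on, only the repository location is given; the working tree
# is looked up from the repository's own configuration.
echo "first draft" > "$work/memo.txt"
git --git-dir="$repo" add --all
git --git-dir="$repo" status --short
```

Setting the 'GIT_DIR' environment variable instead of repeating '--git-dir=' is another option, if typing the switch each time feels cumbersome.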

In fact, the location of the default working tree is recorded as the 'core.worktree' setting in the 'config' file in the repository directory (although in the default directory structure it isn't explicitly set) and can be changed afterward by modifying that file.
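A minimal sketch of inspecting and changing that setting (assuming Git is installed; the directory names 'old-place', 'new-place', and 'plain' are made up for illustration). It uses 'git config' rather than a text editor, but the entry it rewrites is the one in the repository's 'config' file.

```shell
demo=$(mktemp -d)
repo="$demo/repo"
mkdir -p "$demo/old-place" "$demo/new-place"
git --git-dir="$repo" --work-tree="$demo/old-place" init --quiet

# The default working tree is recorded under the 'core.worktree' key.
git --git-dir="$repo" config core.worktree

# Point the repository at a different directory.
git --git-dir="$repo" config core.worktree "$demo/new-place"
grep worktree "$repo/config"

# In the default directory structure, the key is not set at all.
git init --quiet "$demo/plain"
git -C "$demo/plain" config core.worktree || echo "not set explicitly"
```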


7: After All, Git Is a Distributed Version Control System That Networks Local Version Control System Repositories


Hypothesizer 7
As one should have understood by now, Git is a distributed version control system that networks local version control system repositories, not a distributed version control system that networks centralized version control system repositories, as I had expected.

Why had I conceived that erroneous expectation? Well, for one, it was natural for a user of centralized version control systems; for two, as I thought that version control systems had evolved from local to centralized to distributed, I guessed that any distributed version control system would be an enhanced centralized version control system.

Anyway, Git with a single repository isn't a centralized version control system, but a local version control system.


8: In the Intended Usage, Any Working Tree Is Given Its Private Repository, and the Two Form an Exclusive Pair


Hypothesizer 7
Let's distinguish between how Git can possibly be used and how Git is supposed to be used. Although by wantonly using the '--git-dir=' and '--work-tree=' switches we can use multiple repositories and multiple working trees in free combinations, this seems to be the way in which Git is supposed to be used: any working tree is given its private repository, and the two form an exclusive pair.

From the view of a user of centralized version control systems, Git's being able to have only one working tree per repository is just an inconvenient restriction. But from the Git-ish view, that seems to be meant to be understood the other way around: every working tree is kindly given its private repository.

I admit that that decision has made Git very lightweight: as any repository is meant to be handled from only one working tree, it can be accessed just as a group of files, without any server software that coordinates requests from multiple working trees.

On the other hand, honestly, I don't want a gratuitous repository given to my every working tree: any extra repository means a waste of disk space and a nuisance for me (as it forces me to synchronize repositories).


9: Any Repository Which Is Being Directly/Indirectly Handled by Someone Is Local/Remote for Him or Her at the Instant


Hypothesizer 7
I am troubled by some usages of the terms 'local repository' and 'remote repository' in many tutorials. That is because a repository is described as though it were local or remote in itself. In fact, being local or remote isn't an attribute of the repository, but is about how the repository is being used by someone at the instant: the repository is local or remote only for that person at the instant. So, expressions like "create a local repository" seem nonsensical: the repository can be local or remote according to each usage.

In fact, when someone is handling a repository directly, the repository is a local repository for him or her at the instant, while when someone is handling a repository indirectly (meaning 'via another repository'), the repository is a remote repository for him or her at the instant.
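To make the distinction concrete, here is a small sketch (assuming Git is installed; the names 'alice' and 'bob' are hypothetical users, and the file name is made up). One and the same repository is handled directly by one person and via another repository by the other.

```shell
demo=$(mktemp -d)
git init --quiet "$demo/alice"
echo "first draft" > "$demo/alice/memo.txt"
git -C "$demo/alice" add memo.txt
git -C "$demo/alice" -c user.name=Alice -c user.email=alice@example.com \
    commit --quiet -m "Add memo"

# Bob reaches Alice's repository only through his own clone of it,
# so for Bob it is a remote repository (named 'origin' by default).
git clone --quiet "$demo/alice" "$demo/bob"
git -C "$demo/bob" remote -v

# For Alice, who handles it directly, the very same repository is local;
# it has no remotes configured at all, so this prints nothing.
git -C "$demo/alice" remote -v
```

Both repositories sit on the same computer here, which also shows that 'remote' in this sense has nothing to do with where the repository physically resides.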

Besides, also using 'local/remote repository' to mean 'a repository that resides on a local/remote computer' is confusing. One should make up his or her mind about what 'local/remote repository' means.


10: Should Git Be Used for Centralized Version Control System Needs?


Hypothesizer 7
I read a tutorial that claimed (in effect) that one didn't need to use any centralized version control system any more because Git, as a distributed version control system, could serve as a centralized version control system.

Well, . . . that may be so, but whether Git should be used is another story. For one, Git doesn't seem to address some typical centralized version control system needs. For example, there may be a need for any file that is being edited by someone to be made read-only (or at least known to be under edit) for the others. Someone might say that such an operation model is bad or at least unnecessary, but such an operation model is more effective for some projects than Git's.

For two, using Git for centralized version control system needs necessitates extra private repositories, which are a waste of disk space and a nuisance for users.

As a conclusion, I don't think that Git should particularly be used for centralized version control system needs, although if someone decides to use it, that's, of course, fine.


11: So, What Should I Do for My System?


Hypothesizer 7
Although I had expected a centralized version control system from Git with a single repository, actually, I just need a local version control system that allows me to place repositories and working trees at arbitrary locations.

For my needs, Git with a single repository and a single (at a time) working tree at customized locations seems fine.


12: The Conclusion and Beyond


Hypothesizer 7
Now, I seem to better understand the whole picture of Git.

In short, Git isn't a distributed version control system that networks centralized version control system repositories, but one that networks local version control system repositories.

In the intended usage, any working tree is given its private repository, and such non-bare repositories plus bare repositories are networked. Although any repository is supposed to have at most one working tree at a time, the working tree can be located anywhere.

That architecture has its merits and its demerits (at least for some people), which will be material on which one can judge whether Git is optimal for his or her needs.

By the way, I have found that Git cannot store file modification times (yes, 'cannot store them', not just 'cannot restore them on checkout'), which is a critical problem, at least for me. In a future article, I will try to refute the argument that supports that decision of Git's and to understand how to solve the problem.
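A quick way to see what a commit does record is to print it. This is only a sketch, assuming Git is installed; the directory and file names are made up for illustration.

```shell
demo=$(mktemp -d)
git init --quiet "$demo/project"
cd "$demo/project"
echo "first draft" > memo.txt
git add memo.txt
git -c user.name=Demo -c user.email=demo@example.com commit --quiet -m "Add memo"

# Each entry of the committed tree lists a mode, an object type, an object
# name, and a file name; there is no field for a modification time.
git ls-tree HEAD

# The only timestamps in the commit are the author and committer dates.
git cat-file -p HEAD
```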

