Showing posts with label Let Me Understand Git. Show all posts

2018-09-30

3: Store and Restore File Modification Dates and Times in Git


Git's decision not to store or restore file modification dates and times reflects a specialized taste, which not everyone has to share.

Topics


About: Git



Starting Context


  • The reader has knowledge of the whole picture of Git.
  • The reader has knowledge of some of the basic operations (staging the files, unstaging the files, removing the files, committing the changes, checking out the commit or the files) of Git.

Target Context


  • The reader will understand how to store the file modification dates and times into the Git repository and how to restore them to the checked-out files.

Orientation


Hypothesizer 7
Git does not store any file modification date and time (the commit dates and times are stored, but cannot substitute for the file modification dates and times, at least for me). Hmm . . .

There is a document that claims to be an answer to "Why does not Git preserve the file modification times?", and "preserving" seems to mean 'restoring to the checked-out files' in the document. However, I am not asking "Why does Git not restore the file modification dates and times to the checked-out files?" (at least not for now, although I will address that later), but "Why does Git not store the file modification dates and times in the repository?".

The file modification dates and times are useful information for some administrative purposes, and I definitely want them to be recorded in the repository: being able to look them up in the repository is fine. In fact, not restoring the file modification dates and times to the checked-out files neither necessitates nor justifies not storing them in the repository.

In the end, Git does not store the file modification dates and times not because of the concern stated in the document, but because of a specialized taste: the view that file modification dates and times exist only for build tools to identify the files to be compiled, which I do not share and see no reason to be forced to share.

Even as for restoring the file modification dates and times, I do not agree with the document: when I check out a branch, I will certainly clean the project anyway, because otherwise the unnecessary and potentially troublesome derived files that do not have any corresponding source file in the checked-out branch would be left behind as trash; those derived files could cause trouble because using those phantom derived files would not be detected as an error, and the trash would be unfavorably included in the JAR file (sorry, my first programming language is Java). As checking out a branch means that I now face a possibly quite different version of the project, cleaning the project is reasonable, in my opinion.

On the other hand, files checkouts are different (the difference between 'commit checkout' and 'files checkout' is argued in this article), and not restoring the file modification dates and times makes (at least some) sense for them (certainly, I do not want to clean the project just because I have replaced a few source files). As I keep facing the same version of the project, not cleaning the project is reasonable.

So, not restoring the file modification dates and times in a files checkout is fine (the checked-out files can naturally be considered to have been modified at the checkout, with their post-modification contents happening to be the same as the contents of some files registered in the repository), but I prefer restoring the file modification dates and times in any commit checkout (it is unnatural to consider that all the files in the checked-out branch were modified at the checkout).

In the first place, is Git only for projects in compiled programming languages? The concern stated in the document does not matter at all for projects in non-compiled programming languages or for repositories of documents.

I have heard that Git was originally developed for a specific project, and certainly, I am not in any position to criticize any artifact that exists only for a single project of which I am not a member: I guess that the decision suited the tastes of a majority of the project members or the taste of the dictator of the project. If Git opts to keep being only for that specific project, let us, the outsiders, leave it alone and use another artifact that listens to general needs.

So, I wondered whether to adopt another version control system or (if possible) to tweak the behavior of Git, and I have tried the latter first.


Main Body


1: Although There Is 'Metastore', . . .


Hypothesizer 7
I have found that there is an artifact called 'Metastore', but I have also found that it does not exactly address my concern: it does not store the modification dates and times of the files registered in the repository, but the modification dates and times of the files in the working tree.

For example, when a file has been changed in the working tree without being staged, 'Metastore' will store its working-tree modification date and time when the next commit is performed, although the change itself will not be reflected in the repository.

In fact, 'Metastore' gets the modification date and time at commit time (at least in the usage that relies on the pre-commit hook included in it), which is too late, because the modification date and time of the staged version can be lost at any time after the staging.

Also, simply restoring all the registered file modification dates and times to the files in the working tree in the post-checkout hook does not produce an accurate result, because of the complicated checkout behavior: the modification dates and times of the carried-over files in the working tree should be left alone.

In addition, the metadata file's being in a binary format seems to be a problem for resolving conflicts among pushes from multiple repositories, although I have not personally experienced that issue yet.


2: A Rough Idea


Hypothesizer 7
However, the basic idea is usable (indeed, I cannot think of another one): store the modification dates and times of the files concerned (the files that belong to the commit) in a file (which I will call the 'file meta data bundle file') that is registered in the repository as a part of the commit (which means that there is a single file meta data bundle file for each commit).

The modification date and time of any file has to be recorded when the file is staged ('when the commit is executed' is too late as argued in the previous section). How can I do that?

There is no hook for staging, but we can create a filter that is called when any file is being staged.

I considered using such a filter, but found that that approach has some difficulties. First, the filter is called not only when the 'git add' command is executed, but also on some other occasions that are unfathomable to me (some commits and some checkouts), which could cause unexpected results. Second, we have to maintain the file modification dates and times data not only when the files are staged, but also when the files are unstaged or removed, for which the filter is not called.

Hmm . . ., after all, neither a hook nor a filter will do, and I do not seem to have any option other than creating a wrapper of the 'git' command.


3: The Wrapper Will Cover Only a Part of the Whole Possible Usage of the 'git' Command


Hypothesizer 7
Actually, I do not intend to make the wrapper cover every possible usage of the 'git' command, because covering some usages is rather tiresome (although not impossible), while I am not interested in those usages or they are not indispensable.

Such usage includes the interactive mode of the 'add' sub command (the '-i' switch): simply, I do not feel any necessity for it.

The patching modes of the 'add' and 'reset' sub commands (the '-p' switch or the '--patch' switch) call for more consideration. Do I need them? . . . Hmm, they are about directly editing the file in the staging area without first editing the file in the working tree, which I basically do not do, because I usually feel the need to examine the file in the working tree (for example, by building the project and testing the program or by proofreading the document) before I commit it. In the first place, why would I want to edit the file only in the staging area? . . . Probably because I want to create a spin-off version of the project, but then, for me, just creating a commit in the master branch will not do: I will want a secondary branch. So, I would rather stash the changes in the master branch, create a secondary branch, apply the stash to the secondary branch, edit the file in the working tree, examine the change, stage and commit the change to the secondary branch, return to the master branch, and pop the stash there, without using the patching feature. I know that that involves many steps, but as I need the secondary branch, just patching the file in the staging area and committing the patch in the master branch will not do anyway. I do not deny that 'patching' might sometimes be handy, but that does not motivate me to make the wrapper cover the feature through tiresome toil.

Also included is 'resetting' any file by using any commit that is not 'HEAD': that is also about directly editing the file in the staging area without first editing the file in the working tree, which I do not do. In fact, I use 'reset' only to cancel the staging (unstage the file), which is sometimes a necessary operation.

The patching mode of 'files checkout' (the '-p' switch or the '--patch' switch for the 'checkout' sub command) has some charm, certainly . . ., but the wrapper will not cover that either because the wrapper will not really call the 'checkout' sub command for any files checkout (as described in a subsequent section) and honestly, it is tiresome to make the wrapper cover the feature.

Actually, all that is in my scope is to use 'add' (without '-i', '-p', or '--patch') to stage the files, to use 'reset' just to cancel the staging (reverting to the 'HEAD' state), to use 'rm' to remove the files, to use 'commit' to commit the staged changes, and to use 'checkout' (without '-p' or '--patch') either to prepare for handling another commit (typically a branch) ('commit checkout') or to incorporate some files from another commit into the current commit ('files checkout').


4: The Format of File Meta Data Bundle File


Hypothesizer 7
Let me determine the format of the file meta data bundle file.

Each file meta data bundle file (each commit has one) will be an extended JSON file. "Extended JSON file"? . . . Actually, I have personally added the date, time, and datetime types to the JSON format because they are indispensable for me.

This is the format of file meta data bundle file.

[%commit date and time%, {%file path%: [%staged file modification date and time%, %registered file modification date and time%], . . .}]

It has the commit date and time in order to differentiate each file meta data bundle file from the other file meta data bundle files; the registered file modification date and time is retained in order to recover the staged file modification date and time when the file is reset.

'%registered file modification date and time%' will be 'null' if the file has not been registered yet; '%staged file modification date and time%' will be 'null' if the file has been committed and then removed, but the removal has not been committed yet; the two cannot both be 'null', because in such a case the file's entry is removed from the file meta data bundle file.
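The bundle format above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's wrapper: plain JSON has no date or datetime type, so ISO 8601 strings stand in for the author's extended JSON, and the names `dump_bundle` and `load_bundle` are hypothetical. The sketch also enforces the rule that an entry with both slots 'null' must not exist.

```python
import json
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of one file meta data bundle file:
# (commit date and time, {path: [staged mtime, registered mtime]})
Entry = List[Optional[datetime]]
Bundle = Tuple[Optional[datetime], Dict[str, Entry]]


def _to_text(value: Optional[datetime]) -> Optional[str]:
    # Assumption: ISO 8601 strings replace the author's JSON datetime extension.
    return None if value is None else value.isoformat()


def _from_text(value: Optional[str]) -> Optional[datetime]:
    return None if value is None else datetime.fromisoformat(value)


def dump_bundle(bundle: Bundle) -> str:
    """Serialize a bundle, refusing entries whose two slots are both null."""
    commit_time, entries = bundle
    for path, (staged, registered) in entries.items():
        if staged is None and registered is None:
            raise ValueError(f"both slots are null for {path}; drop the entry instead")
    payload = [
        _to_text(commit_time),
        {path: [_to_text(s), _to_text(r)] for path, (s, r) in sorted(entries.items())},
    ]
    return json.dumps(payload, indent=1)


def load_bundle(text: str) -> Bundle:
    """Parse the text written by dump_bundle back into a bundle."""
    commit_text, raw_entries = json.loads(text)
    entries = {path: [_from_text(s), _from_text(r)] for path, (s, r) in raw_entries.items()}
    return _from_text(commit_text), entries
```

Sorting the paths when writing keeps the text stable from commit to commit, which may also make textual merge conflicts easier to resolve than with a binary format.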


5: What the Wrapper Has to Do for Storing the File Modification Dates and Times


Hypothesizer 7
As a principle, the wrapper has to record any change to the staging area (meaning any addition, modification, or removal of any file) at the instant when the change is made; such recording is made in the file meta data bundle file, and the file meta data bundle file will be staged immediately after the recording (yes, not when the commit is executed, for a reason described below).

Changes to the working tree do not matter, because they cannot go directly into the repository, and we are concerned with the modification dates and times of the registered files.

How can any file be changed in the staging area?

An obvious way is for the file to be specified in a 'git add' command execution, which is OK (the wrapper gets the file modification date and time in the working tree and records it as the staged file modification date and time of the file in the file meta data bundle file).
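The 'git add' case might look like this, assuming the wrapper keeps the bundle entries as a dictionary from path to [staged, registered] and runs with the repository root as the current directory; `record_add` is a hypothetical name, not part of any existing tool.

```python
import os
from datetime import datetime
from typing import Dict, List, Optional

Entry = List[Optional[datetime]]  # [staged mtime, registered mtime]


def record_add(entries: Dict[str, Entry], path: str) -> None:
    """Record the working-tree mtime of `path` as its staged mtime.

    Hypothetical helper: the wrapper would call it for each file that
    'git add -v' reports, right after that command succeeds.
    """
    mtime = datetime.fromtimestamp(os.path.getmtime(path))
    if path in entries:
        entries[path][0] = mtime       # keep the registered slot untouched
    else:
        entries[path] = [mtime, None]  # not registered yet, so registered is null
```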

Another way is for the file to be specified in a 'git reset' command execution (only resetting to the 'HEAD' state is considered, as stated above), which is OK (in the file meta data bundle file, the registered file modification date and time of the file is copied into its staged slot if the registered value is not 'null'; otherwise, the file entry is removed).

Another way is for the file to be specified in a 'git rm' command execution, which is OK (in the file meta data bundle file, 'null' is put into the staged file modification date and time slot of the file if its registered file modification date and time is not 'null'; otherwise, the file entry is removed).
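The 'reset' and 'rm' rules just described translate into two small updates on the same kind of dictionary. Again, this is a sketch: `record_reset` and `record_rm` are hypothetical helpers, and 'reset' here means only resetting to 'HEAD', as stated in section 3.

```python
from datetime import datetime
from typing import Dict, List, Optional

Entry = List[Optional[datetime]]  # [staged mtime, registered mtime]


def record_reset(entries: Dict[str, Entry], path: str) -> None:
    """Apply 'git reset -- <path>' (back to HEAD) to the bundle entries."""
    entry = entries.get(path)
    if entry is None:
        return                # nothing recorded for this path
    if entry[1] is not None:
        entry[0] = entry[1]   # staged mtime reverts to the registered mtime
    else:
        del entries[path]     # never registered: the entry disappears


def record_rm(entries: Dict[str, Entry], path: str) -> None:
    """Apply 'git rm <path>' to the bundle entries."""
    entry = entries.get(path)
    if entry is None:
        return
    if entry[1] is not None:
        entry[0] = None       # removal is staged; the file is still registered
    else:
        del entries[path]     # staged-only file: drop the entry entirely
```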

Another way is for the file to be carried over by a commit checkout (see 'A-1', 'A-2', 'A-3', 'M-7', 'M-8', 'M-9', 'M-10', 'R-6', 'R-7', and 'R-8' in the previous article), which is not OK because it can happen surreptitiously (see 'A-3' and 'M-10' in the previous article). Hmm . . ., as I do not want that carrying-over behavior at all, I will block it (I will describe how later).

Another way is for the file to be automatically staged by a files checkout, which I can cope with whether or not I restore the file modification date and time. However, can I not rather cancel the automatic staging itself, which is annoying to me? . . . Hmm, just resetting the file in the staging area to the 'HEAD' state will not do, because a change may have been staged beforehand, which would be lost . . .. I will rather replace any files checkout operation with some 'show' sub command executions inside the wrapper, redirecting the outputs into the files (I will have to first identify the possibly multiple files and then execute the 'show' sub command for each file).
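Replacing a files checkout with 'show' could look roughly like the following. `git ls-tree -r --name-only <commit>` and `git show <commit>:<path>` are real Git commands, but the helper names are hypothetical, and the sketch assumes that the paths have already been expanded, that the repository root is the current directory, and that parent directories exist. Because the blob is written as stored, checkout-time conversions such as smudge filters are, as far as I know, not applied, which may matter in some repositories.

```python
import subprocess
from typing import List


def show_command(commit: str, path: str) -> List[str]:
    # 'git show <commit>:<path>' prints the contents registered for <path> in <commit>.
    return ["git", "show", f"{commit}:{path}"]


def list_files_command(commit: str) -> List[str]:
    # Lists every registered path in <commit>, one per line, recursing into subtrees.
    return ["git", "ls-tree", "-r", "--name-only", commit]


def checkout_files_without_staging(commit: str, paths: List[str]) -> None:
    """Write each file's registered contents into the working tree only.

    Unlike 'git checkout <commit> -- <paths>', nothing is staged here.
    """
    for path in paths:
        blob = subprocess.run(show_command(commit, path), check=True,
                              stdout=subprocess.PIPE).stdout
        with open(path, "wb") as f:  # binary mode keeps the bytes unchanged
            f.write(blob)
```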

The point to be considered is how to identify the files that have been changed in the staging area. . . . For the 'add' and 'rm' sub commands, it is easy because they decently report those files (with the '-v' switch for the 'add' sub command and without the '-q' or '--quiet' switch for the 'rm' sub command), but the 'reset' sub command is not so decent . . .. Hmm, it seems that I have to identify the files through the files specification expressions passed to the 'reset' sub command execution. I naturally assumed that the format of those files specification expressions would be 'glob', but . . . it turns out to be neither glob nor the regular expression format. . . . Then, what is it? . . . In fact, it is an uncanny Git-original format in which 'aa*.txt' matches 'aaa.txt' and 'aaa/aaa.txt', but not 'bbb/aaa.txt', while 'b*a.txt' matches 'bbb/aaa.txt' (really?). Hmm . . .. On the other hand, the format of the files specification expressions for the 'checkout' sub command execution is different: 'aa*.txt' matches 'aaa.txt', but neither 'aaa/aaa.txt' nor 'bbb/aaa.txt'; 'b*a.txt' does not match 'bbb/aaa.txt'; '*/aa*.txt' does not match anything; and 'aaa/aa*.txt' matches 'aaa/aaa.txt'. Hmm . . .. By the way, the files specification expressions for the 'ls-tree' sub command do not even accept any wildcard, although the manual says that the expressions are not "really raw pathnames", but "rather a list of patterns to match" . . .. Honestly, I really have begun to hate Git . . .. Anyway, the question is "Should my wrapper follow such absurd (I humbly declare that it is absurd) behavior?" . . . Really, I do not want 'bbb/aaa.txt' to be reset when I specify 'b*a.txt'. . . . So, the wrapper will replace such absurd behavior with simple glob behavior, which means that the wrapper will take glob expressions, expand them, and pass the expanded file paths to the 'git' command.
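The contrast between the matching observed for 'reset' and simple glob matching can be illustrated in Python. Python's `fnmatch.fnmatchcase` lets '*' match '/', and it happens to reproduce the 'reset' results listed above; this is an illustration, not Git's implementation. The hypothetical `glob_match` instead keeps '*' inside one path component, and the hypothetical `expand` applies it to a list of tracked paths (for example, the output of `git ls-files`), so that only the expanded paths would be passed to Git.

```python
from fnmatch import fnmatchcase
from typing import List


def reset_like_match(path: str, pattern: str) -> bool:
    # fnmatch-style: '*' also matches '/', like the 'reset' observations above.
    return fnmatchcase(path, pattern)


def glob_match(path: str, pattern: str) -> bool:
    # Plain glob: '*' stays within one component, so components are matched pairwise.
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def expand(patterns: List[str], tracked_paths: List[str]) -> List[str]:
    """Expand glob patterns against a list of paths, e.g. from 'git ls-files'."""
    return sorted({p for p in tracked_paths for pat in patterns if glob_match(p, pat)})
```

With the paths 'aaa.txt', 'aaa/aaa.txt', and 'bbb/aaa.txt', `expand(["aa*.txt"], ...)` yields only 'aaa.txt', and 'b*a.txt' yields nothing. Passing the expanded paths through Git's `--literal-pathspecs` option would keep Git from reinterpreting any wildcard characters that appear in actual file names.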


6: How Will Problematic Commit Checkouts Be Blocked?


Hypothesizer 7
In fact, problematic commit checkouts are already blocked by the measures described above: the file meta data bundle file will cause an error in any such checkout.

In fact, as each file meta data bundle file has its commit date and time (each commit can practically be assumed to have a unique commit date and time), the file meta data bundle file of the new current commit cannot have the same contents as the file meta data bundle file of the previous current commit (see this article in order to know what I mean by 'the previous current commit' and 'the new current commit'), which causes an error in the checkout if the file meta data bundle file has been changed in the working tree and in the staging area of the previous current commit (see 'M-17' and 'A-7' in the previous article). If it has not been changed, it does not matter for storing the file modification dates and times that the checkout is not blocked, because there is no change in the staging area to be carried over (changes in the working tree can be carried over, which does not matter for storing the file modification dates and times, but certainly matters for restoring them). The reason why the file meta data bundle file has to be staged immediately after it is changed is to make the situation conform to 'A-7' when the checkout is done from a state in which there is no committed file (if the file meta data bundle file were not staged, it would not block the checkout, because it would be just an untracked file).


7: What the Wrapper Has to Do for Restoring the File Modification Dates and Times in Any Commit Checkout


Hypothesizer 7
We want to restore the file modification dates and times after any commit checkout has been done.

The file meta data bundle file in the new current commit should have been extracted from the repository into the working tree: it cannot have been carried over from the previous current commit because such behavior is blocked.

However, we cannot just apply all the file modification dates and times registered in the file meta data bundle file to the files in the working tree, because some files might be ones carried over from the previous current commit. In fact, 'M-6' and 'R-5' are such cases.

Anyway, the carried-over files can be detected from the message of the checkout, which enables the wrapper to leave those files alone.
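The restoration step after a commit checkout could be sketched as follows. It assumes that the checkout output lists each carried-over file as a status letter followed by the path (as in the messages recorded in the previous article) and that the wrapper has already taken the times to restore out of the bundle. Both function names are hypothetical, and paths that Git quotes because of unusual characters are not handled.

```python
import os
from datetime import datetime
from typing import Dict, Iterable, Set


def carried_over_paths(checkout_output: str) -> Set[str]:
    """Collect the paths that 'git checkout' reports as carried over (A, M, or D)."""
    paths = set()
    for line in checkout_output.splitlines():
        parts = line.split(None, 1)  # status letter, then the path
        if len(parts) == 2 and parts[0] in {"A", "M", "D"}:
            paths.add(parts[1].strip())
    return paths


def restore_mtimes(times: Dict[str, datetime], skip: Iterable[str]) -> None:
    """Set each file's mtime to its recorded value, leaving `skip` paths alone."""
    skipped = set(skip)
    for path, when in times.items():
        if path in skipped or not os.path.exists(path):
            continue  # carried-over file, or absent from the working tree
        stamp = when.timestamp()
        os.utime(path, (stamp, stamp))  # the access time is set to the same value
```

Lines such as "Switched to branch ..." are ignored because their first word is not a single status letter.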


8: What the Wrapper Has to Do for Restoring the File Modification Dates and Times in Any Files Checkout, If Desired So


Hypothesizer 7
As I said, I do not generally mind the file modification dates and times' not being restored in a files checkout, but I would not mind having an option of restoring them, either.

Basically, it is simple: getting (not checking out) the file meta data bundle file from the specified commit and setting the modification dates and times registered in the file meta data bundle file to the checked-out files.
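For a files checkout, reading the bundle without checking it out could be sketched like this. The bundle location `.filemeta.json`, the ISO 8601 strings, and the choice of the first (staged) slot as the time to restore are all assumptions for illustration; only `git show <commit>:<path>` is a real Git command, and the helper names are hypothetical.

```python
import json
import os
import subprocess
from datetime import datetime
from typing import Dict, List

BUNDLE_PATH = ".filemeta.json"  # hypothetical location of the bundle in each commit


def read_bundle_text(commit: str) -> str:
    # Reads the bundle as registered in <commit> without touching the working tree or index.
    return subprocess.run(["git", "show", f"{commit}:{BUNDLE_PATH}"], check=True,
                          stdout=subprocess.PIPE, text=True).stdout


def times_for(bundle_text: str, paths: List[str]) -> Dict[str, datetime]:
    """Map each checked-out path to its recorded time (assumed: the first slot)."""
    _, entries = json.loads(bundle_text)
    times = {}
    for path in paths:
        recorded = entries.get(path, [None, None])[0]
        if recorded is not None:
            times[path] = datetime.fromisoformat(recorded)
    return times


def restore_after_files_checkout(commit: str, paths: List[str]) -> None:
    """Apply the recorded times to files that were just taken from <commit>."""
    for path, when in times_for(read_bundle_text(commit), paths).items():
        stamp = when.timestamp()
        os.utime(path, (stamp, stamp))
```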


9: The Conclusion and Beyond


Hypothesizer 7
Now, I seem to understand how to store the file modification dates and times in the Git repository and how to restore them to the checked-out files: I need to create a wrapper of the 'git' command, which does what is described above.

. . . Is that it? . . . Where is the wrapper? . . . Actually, I am working on it, which will be published in a future article.


References


  • Przemoc. (2018/01/06). Przemoc's software. Retrieved from http://software.przemoc.net/#metastore

2018-09-16

2: Git's Checkout Behavior


Whether it is rational or not, whatever the motive is, Git's checkout behaves like this.

Topics


About: Git



Starting Context


  • The reader has knowledge of the whole picture of Git.
  • The reader has knowledge of basic operations (staging, committing, checking out, etc.) of Git.

Target Context


  • The reader will understand how Git's checkouts behave.

Orientation


Hypothesizer 7
First, let us remember that there are two kinds of checkouts: 'commit checkout' and 'files checkout'; they are conceptually very different.

Any repository at any time (except when it has no commit yet) has a single current commit (which is actually 'HEAD'), and any commit checkout sets the current commit to be the specified commit (note that any branch is a pointer to a commit). The current commit is the commit on which we work at the time.

The commit checkout operation, basically, also sets up the working tree so that the working tree has the registered files of the new current commit (which is natural because when we begin to work on the new current commit, we usually first want to have the authentic files of the new current commit at hand, right?). But "basically"? . . . Yes, only basically: it is more complicated, which is the main issue here.

On the other hand, any files checkout creates or replaces some files in the working tree of the current commit (while the current commit stays the same), using the contents of the files registered in the specified commit (which can be the current commit or another commit).

There is an important thing to note though, as will be discussed in the main body.


Main Body


1: Commit Checkouts' Unfathomable Behavior


Hypothesizer 7
'Orientation' should have clarified what a 'commit checkout' basically is, but its behavior is unfathomable (in the sense that its guiding principle is incomprehensible, at least to me, and its rationality is questionable, at least to me). However, I can at least list how it behaves.

Note that when any commit checkout changes the current commit from a commit to another commit, I call the former commit and the latter commit 'the previous current commit' and 'the new current commit', respectively.

First, when the previous current commit is clean (meaning that all the changes have been committed), there is nothing to be specifically mentioned.

Second, when the previous current commit has any uncommitted change (whether staged or not), the commit checkout behaves in a complicated manner.

In fact, the commit checkout does not always try to fill the staging area and the working tree with the registered files of the new current commit, but sometimes tries to carry over the uncommitted changes that were being made to the previous current commit into the new current commit.

Honestly, I do not understand the rationality of the latter attempt: the changes were naturally (at least for me) for the previous current commit (why would I ever make the changes to the previous current commit if they were for the new current commit?), and I do not want the changes for the previous current commit to be carried over into the new current commit: most likely, I have just forgotten to commit the changes to the previous current commit, and Git's reminding me of that forgetfulness would be nice.

Besides, the criterion on when Git opts to do the latter try is incomprehensible to me.

In fact, these are the tests I have done in order to understand Git's commit checkout behavior. To explain how to read the descriptions below, take the test 'A-1' as an example: the previous current commit does not have the concerned file committed ('none' means that there is no file there), has the concerned file staged with the contents 'ccc', and has the concerned file in the working tree with the contents 'ccc'; the new current commit does not have the concerned file committed (as the new current commit exists only in the repository before the checkout, there is no staged state or working tree state for it at that time); the result message of the checkout is "A ccc.txt", and after the checkout, the new current commit has the concerned file staged with the contents 'ccc' and has it in the working tree with the contents 'ccc'. A 'Message' of 'none' does not mean that there is no message at all, but that there is no message to be specifically mentioned. A message "A . . ." means that the change as an added file has been carried over into the new current commit; a message "M . . ." means that the change as a modified file has been carried over into the new current commit; a message "D . . ." means that the change as a removed file has been carried over into the new current commit; any error means that Git tried to restore the registered state of the new current commit and found out that the restoration would destroy the changes that had been made to the previous current commit.
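A single test case can be reproduced in a throwaway repository. The following sketch sets up 'A-1' (a newly staged 'ccc.txt' that exists in neither commit) and switches to another branch that, in this simplified setup, points at the same commit; the branch name `new-current` and the helper names are hypothetical, and it assumes that `git` is on the PATH. It returns the checkout output and the staged file names so that the carried-over addition can be inspected.

```python
import os
import subprocess
import tempfile
from typing import Tuple


def git(repo: str, *args: str) -> str:
    """Run git in `repo`, returning stdout and stderr combined."""
    # A fixed identity and disabled signing let the sketch run without global config.
    cmd = ["git", "-c", "user.name=tester", "-c", "user.email=tester@example.com",
           "-c", "commit.gpgsign=false", *args]
    result = subprocess.run(cmd, cwd=repo, check=True, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.stdout


def reproduce_a1() -> Tuple[str, str]:
    """Test A-1: ccc.txt is staged and in the working tree, but in neither commit."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init", "-q")
        git(repo, "commit", "-q", "--allow-empty", "-m", "base")
        git(repo, "branch", "new-current")  # the new current commit (same as base here)
        with open(os.path.join(repo, "ccc.txt"), "w") as f:
            f.write("ccc")
        git(repo, "add", "ccc.txt")  # Staged -> ccc, Working -> ccc
        message = git(repo, "checkout", "new-current")  # the author recorded "A ccc.txt"
        staged = git(repo, "diff", "--cached", "--name-only")
        return message, staged
```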

# Tests for an Added File Start

Test Number: A-1
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> ccc
New Current Commit : Committed -> none
--> Checkout -->
Message : A ccc.txt
Staged : ccc
Working : ccc

Test Number: A-2
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> cccc
New Current Commit : Committed -> none
--> Checkout -->
Message : A ccc.txt
Staged : ccc
Working : cccc

Test Number: A-3
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> none
New Current Commit : Committed -> none
--> Checkout -->
Message : none
Staged : ccc
Working : none
* Although the change has been carried over into the new current commit, there is no message.

Test Number: A-4
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> ccc
New Current Commit : Committed -> ccc
--> Checkout -->
Message : none
Staged : ccc
Working : ccc

Test Number: A-5
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> cccc
New Current Commit : Committed -> ccc
--> Checkout -->
Message : M ccc.txt
Staged : ccc
Working : cccc

Test Number: A-6
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> none
New Current Commit : Committed -> ccc
--> Checkout -->
Message : D ccc.txt
Staged : ccc
Working : none

Test Number: A-7
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> ccc
New Current Commit : Committed -> cccc
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: A-8
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> cccc
New Current Commit : Committed -> cccc
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: A-9
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> none
New Current Commit : Committed -> cccc
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: A-10
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> ccc
New Current Commit : Committed -> ccccc
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: A-11
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> cccc
New Current Commit : Committed -> ccccc
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: A-12
Previous Current Commit: Committed -> none, Staged -> ccc, Working -> none
New Current Commit : Committed -> ccccc
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

# Tests for an Added File End

# Tests for a Modified File Start

Test Number: M-1
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> aaaa
New Current Commit : Committed -> none
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-2
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaa
New Current Commit : Committed -> none
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-3
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaaa
New Current Commit : Committed -> none
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-4
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaa
New Current Commit : Committed -> none
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-5
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> none
New Current Commit : Committed -> none
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-6
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> aaaa
New Current Commit : Committed -> aaa
--> Checkout -->
Message : M aaa.txt
Staged : aaa
Working : aaaa

Test Number: M-7
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaa
New Current Commit : Committed -> aaa
--> Checkout -->
Message : M aaa.txt
Staged : aaaa
Working : aaaa

Test Number: M-8
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaaa
New Current Commit : Committed -> aaa
--> Checkout -->
Message : M aaa.txt
Staged : aaaa
Working : aaaaa

Test Number: M-9
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaa
New Current Commit : Committed -> aaa
--> Checkout -->
Message : M aaa.txt
Staged : aaaa
Working : aaa

Test Number: M-10
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> none
New Current Commit : Committed -> aaa
--> Checkout -->
Message : D aaa.txt
Staged : aaaa
Working : none
* Although the modification has been carried over as staged into the new current commit, the message is on the deletion in the working tree.

Test Number: M-11
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> aaaa
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-12
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaa
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : none
Staged : aaaa
Working : aaaa
* The staging for the previous current commit has been silently lost.

Test Number: M-13
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaaa
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : M aaa.txt
Staged : aaaa
Working : aaaaa

Test Number: M-14
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaa
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : M aaa.txt
Staged : aaaa
Working : aaa

Test Number: M-15
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> none
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : D aaa.txt
Staged : aaaa
Working : none

Test Number: M-16
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> aaaa
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-17
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaa
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-18
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaaaa
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-19
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> aaa
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: M-20
Previous Current Commit: Committed -> aaa, Staged -> aaaa, Working -> none
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

# Tests for a Modified File End

# Tests for a Removed File Start

Test Number: R-1
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> none
New Current Commit : Committed -> none
--> Checkout -->
Message : none
Staged : none
Working : none

Test Number: R-2
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaa
New Current Commit : Committed -> none
--> Checkout -->
Message : error: The following untracked working tree files would be removed by checkout:
Staged : n/a
Working : n/a

Test Number: R-3
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> none
New Current Commit : Committed -> none
--> Checkout -->
Message : none
Staged : none
Working : none
* The staging for the previous current commit has been silently lost.

Test Number: R-4
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaaa
New Current Commit : Committed -> none
--> Checkout -->
Message : error: The following untracked working tree files would be removed by checkout:
Staged : n/a
Working : n/a

Test Number: R-5
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> none
New Current Commit : Committed -> aaa
--> Checkout -->
Message : D aaa.txt
Staged : aaa
Working : none

Test Number: R-6
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaa
New Current Commit : Committed -> aaa
--> Checkout -->
Message : D aaa.txt
Staged : none
Working : aaa

Test Number: R-7
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> none
New Current Commit : Committed -> aaa
--> Checkout -->
Message : D aaa.txt
Staged : none
Working : none

Test Number: R-8
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaaa
New Current Commit : Committed -> aaa
--> Checkout -->
Message : D aaa.txt
Staged : none
Working : aaaa

Test Number: R-9
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> none
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : none
Staged : aaaa
Working : aaaa

Test Number: R-10
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaa
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: R-11
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> none
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: R-12
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaaa
New Current Commit : Committed -> aaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: R-13
Previous Current Commit: Committed -> aaa, Staged -> aaa, Working -> none
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : none
Staged : aaaaa
Working : aaaaa

Test Number: R-14
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaa
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: R-15
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> none
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

Test Number: R-16
Previous Current Commit: Committed -> aaa, Staged -> none, Working -> aaaa
New Current Commit : Committed -> aaaaa
--> Checkout -->
Message : error: Your local changes to the following files would be overwritten by checkout:
Staged : n/a
Working : n/a

# Tests for a Removed File End

Hmm . . ., some results are weird, I would say.

First, Git sometimes tries to carry over the changes and sometimes tries to just restore the registered state of the new current commit, but on what criterion?

Well, I had built a hypothesis that Git would carry over the changes only when the file contents in the new current commit equaled the file contents committed in the previous current commit (supposing 'none' equaled 'none'), but some results contradict the hypothesis: 'A-5', 'A-6', 'M-13', 'M-14', and 'M-15'. Then I built another hypothesis that Git would carry over the changes only when the file contents in the new current commit equaled either the file contents committed in the previous current commit or the file contents staged in the previous current commit (supposing 'none' equaled 'none'), but some results contradict that hypothesis too: 'R-2' and 'R-4'. . . . Does removing just have a different criterion? . . . Hmm, anyway, whatever the criterion is, the behavior does not make sense to me.
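Both hypotheses can be checked mechanically against the results. Below is a minimal Python sketch that encodes only the M-8 to M-20 and R-1 to R-16 results listed above (the A tests and M-1 to M-7 are not included), with None standing for 'none' and a result of None standing for a refused checkout. A test counts against a hypothesis when the hypothesis predicts a carry-over but Git refused or did something else, or when the hypothesis predicts no carry-over but the previous staged and working contents survived in a form that simply restoring the new current commit could not have produced. All the names (Case, contradictions, and so on) are made up for this illustration.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

Contents = Optional[str]  # file contents, or None for 'none'


class Case(NamedTuple):
    name: str
    committed: Contents      # committed in the previous current commit
    staged: Contents         # staged before the checkout
    working: Contents        # in the working tree before the checkout
    new_committed: Contents  # committed in the new current commit
    # (staged, working) after the checkout, or None when Git refused with an error
    result: Optional[Tuple[Contents, Contents]]


A, A4, A5 = "aaa", "aaaa", "aaaaa"  # shorthands for the test contents
CASES = [
    Case("M-8",  A, A4, A5,   A,  (A4, A5)),
    Case("M-9",  A, A4, A,    A,  (A4, A)),
    Case("M-10", A, A4, None, A,  (A4, None)),
    Case("M-11", A, A,  A4,   A4, None),
    Case("M-12", A, A4, A4,   A4, (A4, A4)),
    Case("M-13", A, A4, A5,   A4, (A4, A5)),
    Case("M-14", A, A4, A,    A4, (A4, A)),
    Case("M-15", A, A4, None, A4, (A4, None)),
    Case("M-16", A, A,  A4,   A5, None),
    Case("M-17", A, A4, A4,   A5, None),
    Case("M-18", A, A4, A5,   A5, None),
    Case("M-19", A, A4, A,    A5, None),
    Case("M-20", A, A4, None, A5, None),
    Case("R-1",  A, A,    None, None, (None, None)),
    Case("R-2",  A, None, A,    None, None),
    Case("R-3",  A, None, None, None, (None, None)),
    Case("R-4",  A, None, A4,   None, None),
    Case("R-5",  A, A,    None, A,    (A, None)),
    Case("R-6",  A, None, A,    A,    (None, A)),
    Case("R-7",  A, None, None, A,    (None, None)),
    Case("R-8",  A, None, A4,   A,    (None, A4)),
    Case("R-9",  A, A,    None, A4,   (A4, A4)),
    Case("R-10", A, None, A,    A4,   None),
    Case("R-11", A, None, None, A4,   None),
    Case("R-12", A, None, A4,   A4,   None),
    Case("R-13", A, A,    None, A5,   (A5, A5)),
    Case("R-14", A, None, A,    A5,   None),
    Case("R-15", A, None, None, A5,   None),
    Case("R-16", A, None, A4,   A5,   None),
]


def first_hypothesis(c: Case) -> bool:
    # Carry over only when the new commit holds what the previous commit held.
    return c.new_committed == c.committed


def second_hypothesis(c: Case) -> bool:
    # ... or what was staged in the previous current commit.
    return c.new_committed in (c.committed, c.staged)


def contradictions(carries_over: Callable[[Case], bool]) -> List[str]:
    found = []
    for c in CASES:
        previous = (c.staged, c.working)
        restored = (c.new_committed, c.new_committed)
        if carries_over(c):
            # Predicted carry-over: an error or any other outcome refutes it.
            if c.result != previous:
                found.append(c.name)
        elif c.result == previous and c.result != restored:
            # Predicted no carry-over, yet the previous state survived in a form
            # that restoring the new commit could not have produced.
            found.append(c.name)
    return found


print(contradictions(first_hypothesis))   # ['M-13', 'M-14', 'M-15']
print(contradictions(second_hypothesis))  # ['R-2', 'R-4']
```

Within the encoded subset, both lists match the contradicting tests named in the text.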

Besides, some messages, or their absence, seem unreasonable to me. In fact, why doesn't 'A-3' give a message that the change has been carried over as staged? If the user does not notice that fact, the file can be committed to the new current commit unintentionally . . .. And in each of 'M-12' and 'R-3', the staging for the previous current commit has just vanished without any notice (certainly, the contents themselves remain in the staging area in the new current commit, but the fact of their having been staged for the previous current commit has just vanished). And in 'M-10', the "D" message is confusing because the modification, not the deletion, is staged for the new current commit: the next commit to the new current commit will modify, not remove, the file.

Honestly, I do not understand why Git has to have such complicated behavior. I think it should just give a warning and block the checkout whenever there is any uncommitted change in the previous current commit, unless the '-f' flag is specified.

Should I just always use the '-f' flag? . . . You know, I need the warning, which using the '-f' flag does not give.
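The blocking behavior wished for above can be approximated outside Git. Here is a minimal Python sketch, assuming Python 3.7 or later and a git executable on the PATH: it asks 'git status --porcelain --untracked-files=no' for staged or unstaged changes, prints them as a warning, and refuses the checkout unless force is set, in which case it still prints the warning and then runs 'git checkout -f'. The function names are hypothetical, and untracked files are deliberately not treated as uncommitted changes.

```python
import subprocess
import sys
from typing import List


def uncommitted_paths(porcelain: str) -> List[str]:
    """Paths that 'git status --porcelain' (format v1) reports as staged or unstaged changes."""
    paths = []
    for line in porcelain.splitlines():
        # Skip blank lines and untracked ('??') or ignored ('!!') entries.
        if not line or line.startswith(("??", "!!")):
            continue
        paths.append(line[3:])  # each entry is 'XY PATH'; renames read 'XY OLD -> NEW'
    return paths


def guarded_checkout(commit: str, force: bool = False) -> int:
    """Refuse to check out while changes are uncommitted, unless force is set; warn either way."""
    status = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        capture_output=True, text=True, check=True,
    ).stdout
    changed = uncommitted_paths(status)
    if changed:
        print("warning: uncommitted changes:", file=sys.stderr)
        for path in changed:
            print("  " + path, file=sys.stderr)
        if not force:
            print("checkout blocked; use force=True to discard them", file=sys.stderr)
            return 1
    command = ["git", "checkout"] + (["-f"] if force else []) + [commit]
    return subprocess.run(command).returncode
```

Keeping the parsing in a separate function lets the decision be inspected without running Git at all.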


2: Files Checkout Behavior to Beware of


Hypothesizer 7
Any files checkout doesn't just place the files in the working tree, but also automatically stages the files.

Hmm, that is also behavior I do not favor. . . . I almost never check out a file already convinced that I will commit it; I first examine the file (typically by rebuilding the project and doing some tests) and then decide whether to commit it. The file's being automatically staged is an annoyance: I have to take care not to commit such files unintentionally.

Should I rather use 'show'? Just showing doesn't let me do tests. Should I redirect the shown result into the file? . . . Well, if I have to, I will, although that is not particularly desirable (I have to specify the file path of the redirection, and I cannot check out multiple files at once) . . .
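The redirection idea can be wrapped so that several files are written at once and nothing is staged. Here is a minimal Python sketch, assuming a git executable on the PATH and paths given relative to the repository root with '/' separators. Instead of 'show', it swaps in 'git cat-file blob <commit>:<path>', which prints the raw committed contents; because this bypasses checkout, no smudge filters or end-of-line conversions are applied. The helper names are hypothetical, and the read_blob parameter exists only so that the writing logic can be tried without Git.

```python
import os
import subprocess
from typing import Callable, Iterable


def read_blob_with_git(commit: str, path: str) -> bytes:
    """Raw committed contents of 'path' (relative to the repository root) in 'commit'."""
    return subprocess.run(
        ["git", "cat-file", "blob", f"{commit}:{path}"],
        capture_output=True, check=True,
    ).stdout


def write_without_staging(
    commit: str,
    paths: Iterable[str],
    work_tree: str = ".",
    read_blob: Callable[[str, str], bytes] = read_blob_with_git,
) -> None:
    """Overwrite working-tree files with their committed contents; the staging area is untouched."""
    for path in paths:
        data = read_blob(commit, path)
        target = os.path.join(work_tree, *path.split("/"))
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        with open(target, "wb") as out:
            out.write(data)
```

For example, write_without_staging("HEAD~1", ["aaa.txt", "docs/bbb.txt"]) run from the top of the working tree. Git 2.23 and later (released after this article) also provide 'git restore --source=<commit> --worktree -- <paths>', which, without '--staged', restores only the working tree.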


3: The Conclusion and Beyond


Hypothesizer 7
Now, I seem to understand how Git's checkouts behave (although not their guiding principle).

In short, in any commit checkout, Git sometimes kindly (really?) carries over uncommitted changes meant for the previous current commit into the new current commit, and in any files checkout, Git kindly (really?) stages the files before the user examines them.

I have investigated that behavior because it influences (unfavorably) my scheme of storing and restoring file modification times. For example, as changes in the working tree are carried over from the previous current commit into the new current commit, I cannot just restore the file modification times that have been registered in the new current commit.

Hmm, honestly, the more I learn Git, the more I dislike it . . . I swear that I began to learn Git because I had high expectations of it, but its way of thinking doesn't fit me well . . .
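One way around the carry-over problem is to restore a recorded time only to files whose working-tree contents still equal what the new current commit holds, and to leave carried-over files alone. Here is a minimal Python sketch, assuming a hypothetical record (however it is stored) that maps each path to its recorded modification time and the blob id committed for it (such ids can be listed, for example, with 'git ls-tree -r <commit>'). It also assumes a SHA-1 repository without clean/smudge filters or end-of-line conversion, so that Git's blob id can be recomputed as the SHA-1 of 'blob <size>', a NUL byte, and the contents.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def git_blob_sha1(data: bytes) -> str:
    """Object id Git gives these bytes as a blob in a SHA-1 repository."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def restore_mtimes(work_tree: str, records: Dict[str, Tuple[float, str]]) -> List[str]:
    """Apply recorded times only to files whose contents match the committed blob.

    records maps a '/'-separated path to (recorded mtime, blob id in the new current commit).
    Returns the paths left untouched.
    """
    skipped = []
    for path, (mtime, blob_id) in records.items():
        full_path = os.path.join(work_tree, *path.split("/"))
        try:
            with open(full_path, "rb") as f:
                data = f.read()
        except FileNotFoundError:
            skipped.append(path)  # removed in the working tree
            continue
        if git_blob_sha1(data) != blob_id:
            skipped.append(path)  # a carried-over change: the recorded time does not describe it
            continue
        os.utime(full_path, (mtime, mtime))  # (access time, modification time)
    return skipped
```

Files that are skipped keep whatever time the checkout gave them, which is arguably the honest answer for contents that were never committed.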


References



2018-08-05

1: The Whole Picture of Git


Git is not a distributed versions control system that networks centralized repositories, a point most tutorials fail to clarify at the very beginning.

Topics


About: Git

The table of contents of this article


Starting Context


  • The reader has knowledge of what a versions control system in general is (its purpose and the mechanisms common to various types of versions control systems), so that he or she knows (or at least can confidently guess) what some terms (for example, 'repository' and 'working tree' (also known as 'working directory')) mean.
  • The reader has read a Git tutorial, but couldn't understand the whole picture of Git well.

Target Context


  • The reader will better understand the whole picture of Git.

Orientation


Hypothesizer 7
When I, a Git novice, read one Git tutorial after another for the purpose of introducing Git into my system, they similarly explained things and similarly ignored my concern.

I wanted to use Git in order to manage my own documents in a repository, with old versions preserved and available for lookup on demand: I didn't intend to share the repository with others. And my humble concern was to establish a single repository residing somewhere (for example, in '~/myData/gitRepositries/documents') and to register into it my documents, which resided somewhere else (for example, in '~/myData/documents'), where I continued to edit them.

Is that concern unreasonable? . . . In fact, I have some experience with a few centralized versions control systems (a centralized versions control system is a versions control system whose repositories can each be edited concertedly from multiple working trees (even per branch), typically by multiple persons), and for any centralized versions control system, that concern is certainly normal, or standard, I would say.

Certainly, I knew that Git was a distributed versions control system, but I expected that being a distributed versions control system was just about synchronizing multiple repositories; as for handling a single repository, Git could be treated as a centralized versions control system. In fact, why shouldn't I expect so?

So I read a Git tutorial, which created a repository in a directory, which was fine, and . . . began to put files directly into that directory as though that was the obvious, sole, imaginable thing to do. . . . "No, no, no, I don't intend to do such a thing: I want to have all my repositories (including this one) under one directory (I don't want to scatter repositories around), and I want to edit files at other locations (for me, the repositories directory is for storing repositories, not for editing files, and I don't agree to be forced to edit files only under the repositories directory)."

So, I read another tutorial, and . . . it did the same thing . . . Huh? And I read another, and it showed a chart like this.


"Yes! 'Working directory'! I want to have my files in the 'working directory', not in the repository directory!"

So, I eagerly read the tutorial along, and it created a repository in a directory, and . . . began to put files directly into that directory without mentioning any 'working directory' . . .

Huh?? "What happened to the 'working directory'? Let me know how to set up the 'working directory', please?"

. . . Well, certainly, I am the one to be blamed in that I had come with the expectation that was natural for centralized versions control systems users, but are centralized versions control systems users unwelcome to those tutorials? . . . As it seems so, I will try to make a clarification that welcomes anyone who fulfills 'Starting Context'.


Main Body


1: Notes


Hypothesizer 7
Note that I will omit mentioning 'branch' in this article. That is because handling multiple branches isn't any concern of this article, and scattering the term 'branch' over the article, while only one branch per repository is ever considered, seems to add confusion rather than clarity. For example, I will use an expression like "the contents of a repository" while it is, strictly speaking, 'the contents of a branch in a repository'.

Furthermore, I intentionally ignore 'linked working tree' in this article. That is because it is about dealing with different branches in a repository at the same time, which isn't any concern in this article, and also because it is experimental and incomplete. So, in this article, I will just mention 'working tree', meaning 'main working tree' (also known as 'main working directory').


2: Strictly Speaking, the Directory Specified as the Working Tree for a Git Command Execution Is the Working Tree for the Git Command Execution


Hypothesizer 7
I had expected that I would set up a directory as a working tree of a repository and would be able to use the directory as a working tree of the repository thereafter, until I dropped the setting. In fact, Git doesn't work like that.

The 'git' command takes the '--git-dir=' and '--work-tree=' switches, which let us designate the repository directory and the working tree to be handled. So, whatever directory we specify as the working tree, it is the working tree for the command execution.
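For example, here is a minimal Python sketch of running one Git command against an explicitly chosen repository and working tree; the helper names are hypothetical, and the paths in the commented example are the locations used in the Orientation. The switches must come before the subcommand, and they affect only that one execution (the GIT_DIR and GIT_WORK_TREE environment variables are the equivalent).

```python
import os
import subprocess
from typing import List


def git_args(git_dir: str, work_tree: str, *args: str) -> List[str]:
    """Build a git command line bound to one repository and one working tree."""
    return [
        "git",
        # These switches must precede the subcommand; they apply to this execution only.
        f"--git-dir={os.path.expanduser(git_dir)}",
        f"--work-tree={os.path.expanduser(work_tree)}",
        *args,
    ]


def run_git(git_dir: str, work_tree: str, *args: str) -> str:
    """Run the command and return its standard output."""
    return subprocess.run(
        git_args(git_dir, work_tree, *args),
        capture_output=True, text=True, check=True,
    ).stdout


# Example (not executed here):
# run_git("~/myData/gitRepositries/documents", "~/myData/documents", "status", "--short")
```

Passing a different work_tree on the next call makes that directory the working tree for that call, which is exactly the point of this section.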


3: However, the Contents of Any Repository Are Basically Supposed to Be Edited from a Single Working Tree at a Time


Hypothesizer 7
However, as Git doesn't check for any conflict between changes staged from multiple working trees, the contents of any repository are basically supposed to be edited from a single working tree at a time. (By "at a time", I mean that we can delete the existing working tree and then create a new working tree (by checking out the repository), which means editing the contents of the repository from multiple working trees, but not from multiple working trees at the same time.)


4: In Fact, There Is the Default Working Tree for Any Non-Bare Repository


Hypothesizer 7
What is a 'non-bare repository'? Actually, it is a repository that has a default working tree.

Although it's fine if we consistently specify the working tree by the '--work-tree=' switch, doing so is usually cumbersome. So, any repository can have a default working tree, which makes specifying the working tree each time unnecessary.


5: Then, What Is 'Bare Repository'?


Hypothesizer 7
Obviously, a 'bare repository' is a repository that doesn't have any default working tree.

However, actually, that doesn't mean that the contents in the bare repository cannot be edited from any working tree: they can be edited from any working tree specified by the '--work-tree=' switch, although whether that is encouraged is another story.


6: So, What Does the 'git init' Command Really Do?


Hypothesizer 7
As most tutorials say just "the 'git init' command creates a repository", I have to wonder ". . . Then what about the working tree? How can I create the working tree?".

In fact, the 'git init' command without any further parameter 'creates a repository in the '.git' directory in the current directory, and designates the current directory as the default working tree of the repository'. If those tutorials had kindly explained so, I wouldn't have been left at a loss . . .

Anyway, that is the default directory structure for a pair of a repository and its default working tree, but actually, we can place each of the repository and the default working tree at any location by specifying the '--git-dir=' and '--work-tree=' switches for the 'git init' command.

In fact, the location of the default working tree is set in the 'config' file in the repository directory (although in the default directory structure, it isn't explicitly set) and can be changed afterward by modifying the file.
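Here is a minimal Python sketch of the two operations just described, assuming a git executable on the PATH and absolute paths; the helpers are hypothetical, and the runner parameter only exists so that the commands can be inspected without running Git. After init_separated, the 'config' file in the repository directory should carry a 'worktree' entry under '[core]', and change_default_work_tree rewrites that entry through 'git config' rather than by editing the file by hand.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], object]


def run(args: List[str]) -> object:
    return subprocess.run(args, check=True)


def init_separated(git_dir: str, work_tree: str, runner: Runner = run) -> None:
    """Create a repository in git_dir whose default working tree is work_tree."""
    runner(["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}", "init"])


def change_default_work_tree(git_dir: str, new_work_tree: str, runner: Runner = run) -> None:
    """Set 'worktree' under [core] in git_dir/config, as editing the file by hand would."""
    runner(["git", f"--git-dir={git_dir}", "config", "core.worktree", new_work_tree])
```

Using 'git config' instead of a text editor avoids hand-formatting mistakes in the file, but the resulting entry is the same one the article describes.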


7: After All, Git Is a Distributed Versions Control System That Networks Local Versions Control System Repositories


Hypothesizer 7
As should be clear by now, Git is a distributed versions control system that networks local versions control system repositories, not a distributed versions control system that networks centralized versions control system repositories as I had expected.

Why had I conceived that erroneous expectation? Well, for one, it was natural for a centralized versions control systems user; for two, as I thought that versions control systems had evolved from local to centralized to distributed, I guessed that any distributed versions control system should be an enhanced centralized versions control system.

Anyway, Git with a single repository isn't any centralized versions control system, but a local versions control system.


8: In the Intended Usage, Any Working Tree Is Given Its Private Repository, and the Two Form an Exclusive Pair


Hypothesizer 7
Let's distinguish between how Git can possibly be used and how Git is supposed to be used. Although by wantonly using the '--git-dir=' and '--work-tree=' switches, we can use multiple repositories and multiple working trees in free combinations, this seems to be the way in which Git is supposed to be used: any working tree is given its private repository, and the two form an exclusive pair.

From a 'centralized versions control systems user'-ish view, Git's being able to have only one working tree per repository is just an inconvenient restriction. But from the Git-ish view, that seems to be meant to be understood the other way around: every working tree is kindly given its private repository.

I admit that this decision has given Git the merit of being very light: as any repository is meant to be handled by only one working tree, it can be accessed just as a group of files, without any server software that coordinates requests from multiple working trees.

On the other hand, honestly, I don't want a gratuitous repository given to every working tree of mine: any extra repository wastes disk space and is a nuisance for me (as it forces me to synchronize repositories).
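To illustrate that a repository really is just a group of files, here is a minimal Python sketch that reads a loose object directly from the repository directory with ordinary file access. Git stores each loose object at 'objects/<first two hex digits>/<remaining digits>', zlib-compressed, as a header '<type> <size>', a NUL byte, and the contents. Objects that Git has packed into 'objects/pack' are not handled, and the function name is hypothetical.

```python
import os
import zlib
from typing import Tuple


def read_loose_object(git_dir: str, object_id: str) -> Tuple[str, bytes]:
    """Return (type, contents) of a loose object, using plain file access only."""
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The decompressed data is '<type> <size>\0<contents>'.
    header, _, body = raw.partition(b"\0")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError(f"size mismatch in object {object_id}")
    return kind, body
```

No server or daemon is involved: anything that can open files in the repository directory can read the repository.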


9: Any Repository Which Is Being Directly/Indirectly Handled by Someone Is Local/Remote for Him or Her at the Instant


Hypothesizer 7
I am troubled by some usages of the terms 'local repository' and 'remote repository' in many tutorials. That is because a repository is described as though it were local or remote in itself. In fact, being local or remote isn't any attribute of the repository, but is about how the repository is being used by someone at the instant: the repository is local or remote only for that person at the instant. So, expressions like "create a local repository" make no sense: the repository can be local or remote according to each usage.

In fact, when someone is handling a repository directly, the repository is a local repository for him or her at the instant, while when someone is handling a repository indirectly (meaning 'via another repository'), the repository is a remote repository for him or her at the instant.

Besides, using 'local/remote repository' also as 'repository that resides in a local/remote computer' is confusing. One should make up his or her mind about what 'local/remote repository' means.


10: Should Git Be Used for Centralized Versions Control System Needs?


Hypothesizer 7
I read a tutorial that claimed (in effect) that one didn't need to use any centralized versions control system any more because Git, as a distributed versions control system, can serve as a centralized versions control system.

Well, . . . that may be so, but whether Git should be used is another story. For one, Git doesn't seem to address some typical centralized versions control system needs. For example, there may be a need for any file being edited by someone to be made read-only to the others (or at least to be known to them as being edited). Someone might say that such an operation model is bad or at least unnecessary, but for some projects, such an operation model is more effective than Git's.

For two, using Git for centralized versions control system needs necessitates otherwise unnecessary private repositories, which waste disk space and are nuisances for users.

As a conclusion, I don't think that Git should particularly be used for centralized versions control system needs, although if it is decided to be used by someone, that's, of course, fine.


11: So, What Should I Do for My System?


Hypothesizer 7
Although I had expected a centralized versions control system from Git with a single repository, actually, I just need a local versions control system that allows me to place repositories and working trees at arbitrary locations.

For my needs, Git with a single repository and a single (at a time) working tree at customized locations seems fine.


12: The Conclusion and Beyond


Hypothesizer 7
Now, I seem to better understand the whole picture of Git.

In short, Git isn't any distributed versions control system that networks centralized versions control system repositories, but one that networks local versions control system repositories.

In the intended usage, any working tree is given its private repository, and such non-bare repositories plus bare repositories are networked. Although any repository is supposed to have at most only one working tree at a time, the working tree can be located anywhere.

That architecture has its merits and its demerits (at least for some people), which will be material on which one can judge whether Git is optimal for his or her needs.

By the way, I have found that Git cannot store file modification times (yes, 'cannot store them', not just 'cannot restore them when being checked out'), which is a critical problem, at least for me. In a future article, I will try to refute the argument that supports this decision of Git's and to understand how to solve the problem.


References



0: The Table of Contents of the Series, 'Let Me Understand Git'


Table of Contents


1: The Whole Picture of Git
Git is not a distributed versions control system that networks centralized repositories, a point most tutorials fail to clarify at the very beginning.
2: Git's Checkout Behavior
Whether it is rational or not, whatever the motive is, Git's checkout behaves like this.
3: Store and Restore File Modification Dates and Times in Git
Git's not storing/restoring file modification dates and times is a decision of a specialized taste, which does not have to be shared by everyone.

