2018-09-30

3: Store and Restore File Modification Dates and Times in Git


Git's decision not to store or restore file modification dates and times reflects a specialized taste, which not everyone has to share.

Topics


About: Git



Starting Context


  • The reader has knowledge of the overall picture of Git.
  • The reader has knowledge of some of the basic operations of Git (staging files, unstaging files, removing files, committing changes, and checking out a commit or files).

Target Context


  • The reader will understand how to store the file modification dates and times into the Git repository and how to restore them to the checked-out files.

Orientation


Hypothesizer 7
Git does not store any file modification date and time (the commit dates and times are stored, but cannot substitute for the file modification dates and times, at least for me). Hmm . . .

There is a document that claims to answer "Why does not Git preserve the file modification times?", and in that document, "preserving" seems to mean 'restoring to the checked-out files'. However, I am not asking "Why does Git not restore the file modification dates and times to the checked-out files?" (at least for now, although I will later) but "Why does Git not store the file modification dates and times in the repository?".

The file modification dates and times are useful information for some administrative purposes, and I definitely want them to be recorded in the repository: being able to look them up in the repository is enough. In fact, not restoring the file modification dates and times to the checked-out files neither necessitates nor justifies not storing them in the repository.

In the end, Git does not store the file modification dates and times not because of the concern stated in the document but because of the specialized view that file modification dates and times exist only for build tools to identify the files to be compiled, a view that I do not share and that I see no reason to be forced to share.

Even regarding restoring the file modification dates and times, I do not agree with the document: when I check out a branch, I will certainly clean the project anyway, because otherwise, unnecessary and potentially troublesome derived files that have no corresponding source file in the checked-out branch would be left behind as trash. Those derived files could cause trouble because using such phantom derived files would not be detected as an error, and the trash would be undesirably included in the JAR file (sorry, my first programming language is Java). As checking out a branch means that I now face a possibly quite different version of the project, cleaning the project is reasonable, in my opinion.

On the other hand, files checkouts are different (the difference between 'commit checkout' and 'files checkout' is discussed in this article), and not restoring the file modification dates and times makes (at least some) sense for them (certainly, I do not want to clean the project just because I have replaced a few source files). As I keep facing the same version of the project, not cleaning the project is reasonable.

So, not restoring the file modification dates and times in a files checkout is fine (the checked-out files can naturally be considered to have been modified at the checkout, with their post-modification contents happening to be the same as the contents of some files registered in the repository), but I prefer restoring the file modification dates and times in any commit checkout (it is unnatural to consider that all the files in the checked-out branch were modified at the checkout).

In the first place, is Git only for projects in compiled programming languages? The concern stated in the document does not matter at all for projects in non-compiled languages or for repositories of document files.

I have heard that Git was originally developed for a specific project, and certainly, I am in no position to criticize an artifact that exists only for a single project of which I am not a member: I guess that the decision suited the taste of a majority of the project's members or the taste of the project's dictator. If Git opts to remain only for that specific project, let us outsiders leave it alone and use another artifact that listens to general needs.

So, I wondered whether to adopt another version control system or to tweak the behavior of Git (if possible), and I have tried the latter first.


Main Body


1: Although There Is 'Metastore', . . .


Hypothesizer 7
I have found that there is an artifact called 'Metastore', but I have also found that it does not exactly address my concern: it stores not the modification dates and times of the files registered in the repository but the modification dates and times of the files in the working tree.

For example, when a file has been changed in the working tree without being staged, 'Metastore' will store its working-tree modification date and time when the next commit is performed, although the change itself will not be reflected in the repository.

In fact, although 'Metastore' gets the modification date and time at commit time (at least in the usage that relies on the pre-commit hook included in it), that is too late, because the modification date and time of the staged version can be lost at any time after the staging.

Also, simply restoring all the registered file modification dates and times to the files in the working tree in the post-checkout hook does not produce an accurate result because of the complicated checkout behavior: the modification dates and times of the files carried over in the working tree should be left alone.

In addition, the meta data file's binary format seems to be a problem for resolving conflicts arising from 'pushes' from multiple repositories, although I have not personally experienced that issue yet.


2: A Rough Idea


Hypothesizer 7
However, the basic idea is usable (indeed, I cannot think of another one): store the modification dates and times of the files concerned (the files that belong to the commit) in a file (which I will call the 'file meta data bundle file') that is registered in the repository as a part of the commit (which means that there is a single file meta data bundle file for each commit).

The modification date and time of any file has to be recorded when the file is staged ('when the commit is executed' is too late as argued in the previous section). How can I do that?

We cannot have any hook for staging, but we can create a filter that is called when any file is being staged.

I considered using such a filter but have found that this approach has some difficulties. First, the filter is called not only when the 'git add' command is executed but also on some other occasions that are unfathomable to me (some commits and some checkouts), which could cause unexpected results. Second, we have to maintain the file modification dates and times data not only when files are staged but also when files are unstaged or removed, and the filter is not called in those cases.

Hmm . . ., after all, neither a hook nor a filter will do, and I do not seem to have any option other than creating a wrapper of the 'git' command.
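The overall shape of such a wrapper can be sketched as a small dispatcher: the sub commands discussed in this article are intercepted, and everything else is passed straight through to the real 'git' command. The following Python sketch is only an illustration under the assumption that the sub command is the first argument (global options such as '-C' are not handled); the function names are hypothetical, and the intercepted sub commands merely raise NotImplementedError where the real bookkeeping would go.

```python
import subprocess
from typing import List, Optional, Tuple

# Sub commands that change the staging area or the checked-out files,
# and therefore need bookkeeping in the file meta data bundle file.
INTERCEPTED = ("add", "reset", "rm", "commit", "checkout")


def route(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Decide who handles the invocation: ('wrapper', sub command) or ('git', None).

    Assumes the sub command is the first argument (no global options).
    """
    if argv and argv[0] in INTERCEPTED:
        return "wrapper", argv[0]
    return "git", None


def main(argv: List[str]) -> int:
    handler, sub_command = route(argv)
    if handler == "git":
        # Out-of-scope sub commands go to the real 'git' unchanged.
        return subprocess.run(["git", *argv]).returncode
    # Placeholder: a hypothetical handler would update the bundle file here.
    raise NotImplementedError(f"wrapper handler for '{sub_command}' is not written yet")
```

A launcher script would call `sys.exit(main(sys.argv[1:]))`; for example, `route(["log", "--oneline"])` yields `("git", None)`, so 'log' reaches Git untouched.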


3: The Wrapper Will Cover Only a Part of the Whole Possible Usage of the 'git' Command


Hypothesizer 7
Actually, I do not intend the wrapper to cover every possible usage of the 'git' command, because covering some usages is rather tiresome (although not impossible), while I am not interested in those usages or they are not indispensable.

Such usage includes the interactive mode of the 'add' sub command (the '-i' switch): simply, I do not feel any necessity for it.

The patching modes of the 'add' and 'reset' sub commands (the '-p' or '--patch' switch) call for more consideration. Do I need them? . . . Hmm, they are about directly editing the file in the staging area without first editing the file in the working tree, which I basically do not do, because I usually feel the need to examine the file in the working tree (for example, by building the project and testing the program, or by proofreading the document) before I commit the file. In the first place, why would I want to edit the file only in the staging area? . . . Probably, I would want to create a spin-off version of the project, but then, for me, just creating a commit in the master branch does not do: I would want a secondary branch. So, I would rather stash the changes on the master branch, create a secondary branch, apply the stash to the secondary branch, edit the file in the working tree, examine the change, stage and commit the change to the secondary branch, return to the master branch, and pop the stash onto the master branch, without using the patching feature. I know that this involves many steps, but as I need the secondary branch anyway, just patching the file in the staging area and committing the patch in the master branch does not do. I do not deny that 'patching' might sometimes be handy, but that does not motivate me to make the wrapper cover the feature through tiresome toil.

Also excluded is 'resetting' any file to any commit other than 'HEAD': that is also about directly editing the file in the staging area without first editing the file in the working tree, which I do not do. In fact, I use 'reset' only to cancel the staging (to unstage the file), which is sometimes necessary.

The patching mode of 'files checkout' (the '-p' or '--patch' switch of the 'checkout' sub command) certainly has some charm . . ., but the wrapper will not cover it either, because the wrapper will not actually call the 'checkout' sub command for any files checkout (as described in a subsequent section) and, honestly, it is tiresome to make the wrapper cover the feature.

In short, my scope is limited to the following:

  • using 'add' (without '-i', '-p', or '--patch') to stage files,
  • using 'reset' just to cancel the staging (reverting to the 'HEAD' state),
  • using 'rm' to remove files,
  • using 'commit' to commit the staged changes, and
  • using 'checkout' (without '-p' or '--patch') to prepare for handling another commit, typically a branch ('commit checkout'), or to incorporate some files from another commit into the current commit ('files checkout').
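The scope above can be enforced up front, so that an out-of-scope invocation is refused instead of silently leaving the file meta data bundle file out of sync. The Python sketch below is a hypothetical check; the switch lists are illustrative rather than exhaustive, and it only recognizes a non-'HEAD' commit in 'reset' when the paths are separated by '--'.

```python
from typing import Dict, List, Set

# Switches that select modes the wrapper deliberately does not cover.
# Illustrative, not exhaustive.
OUT_OF_SCOPE: Dict[str, Set[str]] = {
    "add": {"-i", "--interactive", "-p", "--patch"},
    "reset": {"-p", "--patch", "--soft", "--hard", "--merge", "--keep"},
    "checkout": {"-p", "--patch"},
}


def check_scope(sub_command: str, args: List[str]) -> None:
    """Raise ValueError if 'git <sub_command> <args>' is outside the wrapper's scope."""
    # Everything after a '--' separator is a path, never a switch.
    options = args[: args.index("--")] if "--" in args else args
    for arg in options:
        if arg in OUT_OF_SCOPE.get(sub_command, set()):
            raise ValueError(f"'{sub_command} {arg}' is not covered by the wrapper")
    if sub_command == "reset" and "--" in args:
        # 'reset [<commit>] -- <paths>': only 'HEAD' (or no commit) is in scope.
        commits = [arg for arg in options if not arg.startswith("-")]
        if commits not in ([], ["HEAD"]):
            raise ValueError("only resetting to 'HEAD' is covered by the wrapper")
```

Checking before calling Git keeps the failure mode simple: nothing is staged and the bundle file is not touched when an invocation is refused.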


4: The Format of File Meta Data Bundle File


Hypothesizer 7
Let me determine the format of the file meta data bundle file.

Every file meta data bundle file (each commit has one) will be an extended JSON file. "Extended JSON file"? . . . Actually, I have personally added date, time, and datetime types to the JSON format because they are indispensable for me.

This is the format of the file meta data bundle file.

[%commit date and time%, {%file path%: [%staged file modification date and time%, %registered file modification date and time%], . . .}]

It has the commit date and time in order to differentiate each file meta data bundle file from the other file meta data bundle files; the registered file modification date and time is retained in order to recover the staged file modification date and time when the file is reset.

'%registered file modification date and time%' will be 'null' if the file has not been registered yet; '%staged file modification date and time%' will be 'null' if the file has been committed and then removed but the removal has not been committed yet. The two cannot both be 'null', because in such a case the file entry is removed from the file meta data bundle file.
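The format and its 'null' rules can be checked mechanically. The Python sketch below assumes plain JSON in which ISO 8601 strings stand in for the author's extended-JSON datetime values (the extension itself is not published here), and the function name is hypothetical.

```python
import json
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# [commit datetime, {file path: [staged mtime, registered mtime]}]
Bundle = Tuple[str, Dict[str, List[Optional[str]]]]


def load_bundle(text: str) -> Bundle:
    """Parse a bundle and enforce the rules stated for its entries."""
    commit_datetime, entries = json.loads(text)
    datetime.fromisoformat(commit_datetime)  # raises ValueError if malformed
    for path, entry in entries.items():
        if len(entry) != 2:
            raise ValueError(f"{path}: expected [staged, registered]")
        staged, registered = entry
        if staged is None and registered is None:
            # Such an entry should have been removed from the bundle instead.
            raise ValueError(f"{path}: staged and registered are both null")
        for value in (staged, registered):
            if value is not None:
                datetime.fromisoformat(value)
    return commit_datetime, entries
```

For example, `["2018-09-30T10:00:00+00:00", {"a.txt": ["2018-09-29T09:00:00+00:00", null]}]` describes a newly added, not yet registered file.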


5: What the Wrapper Has to Do for Storing the File Modification Dates and Times


Hypothesizer 7
As a principle, the wrapper has to record any change to the staging area (meaning any addition, modification, or removal of any file) at the instant when the change is made; the recording is made into the file meta data bundle file, and the file meta data bundle file is staged immediately after the recording (yes, not when the commit is executed, for a reason described below).

Changes to the working tree do not matter, because they cannot go directly into the repository, and we are concerned with the modification dates and times of the registered files.

How can any file be changed in the staging area?

An obvious way is for the file to be specified in a 'git add' command execution, which is OK (the wrapper gets the modification date and time of the file in the working tree and records it as the staged file modification date and time of the file in the file meta data bundle file).
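The 'add' case can be sketched as follows: the paths come from the lines that 'git add -v' prints (of the form add 'path'), and each file's working-tree modification time becomes its staged value while its registered value is kept. This Python sketch is hypothetical; it assumes ISO 8601 strings in UTC for the datetime values and paths without quote characters, and it leaves out writing and re-staging the bundle file.

```python
import os
import re
from datetime import datetime, timezone
from typing import Dict, List, Optional

Entries = Dict[str, List[Optional[str]]]  # path -> [staged, registered]


def parse_add_verbose(output: str) -> List[str]:
    """Extract paths from 'git add -v' lines such as: add 'dir/a.txt'."""
    return re.findall(r"^add '(.*)'$", output, flags=re.MULTILINE)


def working_tree_mtime(path: str) -> str:
    """The file's modification date and time in the working tree (ISO 8601, UTC)."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc).isoformat()


def record_added(entries: Entries, added_paths: List[str]) -> None:
    """Store the working-tree mtime as the staged value; keep the registered value."""
    for path in added_paths:
        registered = entries.get(path, [None, None])[1]
        entries[path] = [working_tree_mtime(path), registered]
```

Reading the time right after 'git add' returns narrows, but does not fully close, the window in which the file could be touched again before it is recorded.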

Another way is for the file to be specified in a 'git reset' command execution (only resetting to the 'HEAD' state is considered, as stated above), which is OK (in the file meta data bundle file, the wrapper puts the registered file modification date and time of the file into its staged file modification date and time slot if the registered value is not 'null', or removes the file entry otherwise).

Another way is for the file to be specified in a 'git rm' command execution, which is OK (in the file meta data bundle file, the wrapper puts 'null' into the staged file modification date and time slot of the file if its registered file modification date and time is not 'null', or removes the file entry otherwise).
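The 'reset' and 'rm' rules just described amount to two small updates of an entry. In this hypothetical Python sketch, an entry is a [staged, registered] pair with None standing for 'null'; the paths would come from the glob expansion that the wrapper performs for 'reset' and from the rm 'path' lines that 'git rm' prints.

```python
from typing import Dict, List, Optional

Entries = Dict[str, List[Optional[str]]]  # path -> [staged, registered]


def record_reset(entries: Entries, path: str) -> None:
    """Unstage: fall back to the registered value, or drop a never-registered file."""
    if path not in entries:
        return
    registered = entries[path][1]
    if registered is not None:
        entries[path] = [registered, registered]
    else:
        del entries[path]


def record_removed(entries: Entries, path: str) -> None:
    """Stage a removal: the staged value becomes null unless the file was never registered."""
    if path not in entries:
        return
    registered = entries[path][1]
    if registered is not None:
        entries[path] = [None, registered]
    else:
        del entries[path]
```

In both functions, deleting the entry is what keeps the "both cannot be 'null'" rule true.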

Another way is for the file to be carried over by a commit checkout (see 'A-1', 'A-2', 'A-3', 'M-7', 'M-8', 'M-9', 'M-10', 'R-6', 'R-7', and 'R-8' in the previous article), which is not OK because it can happen surreptitiously (see 'A-3' and 'M-10' in the previous article). Hmm . . ., as I do not want that carrying-over behavior at all, I will block it from happening (I will explain how later).

Another way is for the file to be automatically staged by a files checkout, which I can cope with whether or not I restore the file modification date and time. However, can I not rather cancel the automatic staging itself, which is annoying to me? . . . Hmm, just resetting the file in the staging area to the 'HEAD' state does not do, because a change may have been staged, which would be lost . . .. I will rather replace any files checkout operation with some 'show' sub command executions inside the wrapper, redirecting the outputs into the files (I will have to first identify the possibly multiple files and then execute the 'show' sub command for each file).
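Replacing a files checkout with 'show' could look like the Python sketch below. It is an illustration under several assumptions. It expects to run at the repository root, because a path in '<commit>:<path>' is relative to the root. It writes the blob exactly as stored, so any smudge filter or line-ending conversion that a real checkout would apply is skipped. It also does not set the executable bit. The function names are hypothetical.

```python
import os
import subprocess
from typing import List


def show_command(commit: str, path: str) -> List[str]:
    """'git show <commit>:<path>' prints the file's contents in that commit."""
    return ["git", "show", f"{commit}:{path}"]


def write_contents(path: str, contents: bytes) -> None:
    """Overwrite (or create) the working-tree file; the staging area is untouched."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "wb") as f:
        f.write(contents)


def checkout_files_without_staging(commit: str, paths: List[str]) -> None:
    """Bring the given files from 'commit' into the working tree only."""
    for path in paths:
        result = subprocess.run(show_command(commit, path), stdout=subprocess.PIPE, check=True)
        write_contents(path, result.stdout)
```

Because only the working tree is written, any change that was already staged for those files stays in the staging area.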

The point to be considered is how to identify the files that have been changed in the staging area. As for the 'add' and 'rm' sub commands, it is easy because they decently report those files (with the '-v' switch for the 'add' sub command, and without the '-q' or '--quiet' switch for the 'rm' sub command), but the 'reset' sub command is not so decent . . .. Hmm, it seems that I have to identify the files through the files specification expressions passed to the 'reset' sub command.

I naturally thought that the format of those files specification expressions would be 'glob', but . . . it turns out to be neither glob nor the regular expression format. Then, what is it? . . . In fact, it is an uncanny Git-specific format in which 'aa*.txt' matches 'aaa.txt' and 'aaa/aaa.txt' but not 'bbb/aaa.txt', while 'b*a.txt' matches 'bbb/aaa.txt' (really?). Hmm . . .. On the other hand, the format of the files specification expressions for the 'checkout' sub command is different: 'aa*.txt' matches 'aaa.txt' but neither 'aaa/aaa.txt' nor 'bbb/aaa.txt'; 'b*a.txt' does not match 'bbb/aaa.txt'; '*/aa*.txt' does not match anything; and 'aaa/aa*.txt' matches 'aaa/aaa.txt'. Hmm . . .. By the way, the files specification expressions for the 'ls-tree' sub command do not even accept any wildcard, although the manual says that the expressions are not "really raw pathnames" but "rather a list of patterns to match" . . .. Honestly, I have really begun to hate Git . . ..

Anyway, the question is "Should my wrapper follow such absurd (I humbly declare that it is absurd) behavior?" . . . Really, I do not want 'bbb/aaa.txt' to be reset when I specify 'b*a.txt'. . . . So, the wrapper will replace such absurd behavior with simple glob behavior, which means that the wrapper will take glob expressions, expand them, and pass the expanded file paths to the 'git' command.
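Simple glob behavior, in which '*', '?', and '[...]' never match across a '/', can be sketched in Python by matching path components one by one. This is a hypothetical helper. The candidate paths would come from, for example, 'git ls-files' (for 'reset') or 'git ls-tree -r --name-only <commit>' (for a files checkout), and '**' is not supported. For reference, Git itself also offers a ':(glob)' pathspec magic and a '--glob-pathspecs' option in which '*' does not match '/'; the sketch follows the approach above of expanding the patterns in the wrapper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def glob_match(pattern: str, path: str) -> bool:
    """True if path matches pattern component by component ('*' stops at '/')."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    return len(pattern_parts) == len(path_parts) and all(
        fnmatchcase(name, part) for name, part in zip(path_parts, pattern_parts)
    )


def expand_globs(patterns: Iterable[str], candidates: Iterable[str]) -> List[str]:
    """Expand glob patterns against candidate paths, keeping the candidates' order."""
    pattern_list = list(patterns)
    return [path for path in candidates if any(glob_match(p, path) for p in pattern_list)]
```

With the candidates 'aaa.txt', 'aaa/aaa.txt', and 'bbb/aaa.txt', the pattern 'b*a.txt' expands to nothing, 'aa*.txt' only to 'aaa.txt', and '*/aa*.txt' to the two files in subdirectories.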


6: How Will Problematic Commit Checkouts Be Blocked?


Hypothesizer 7
In fact, problematic commit checkouts are already blocked by the measures described above: the file meta data bundle file will cause an error in any such checkout.

In fact, as each file meta data bundle file has its commit date and time (each commit can, practically, be assumed to have a unique commit date and time), the file meta data bundle file of the new current commit cannot have the same contents as the file meta data bundle file of the previous current commit (see this article to know what I mean by 'the previous current commit' and 'the new current commit'). This causes an error in the checkout if the file meta data bundle file has been changed in the working tree and in the staging area of the previous current commit (see 'M-17' and 'A-7' in the previous article). If it has not been changed, it does not matter for storing the file modification dates and times that the checkout is not blocked, because there is no change in the staging area to be carried over (changes in the working tree can be carried over, which does not matter for storing the file modification dates and times but certainly matters for restoring them). The reason why the file meta data bundle file has to be staged immediately after it is changed is to make the situation conform to 'A-7' when the checkout is done from a state in which there is no committed file (if the file meta data bundle file were not staged, it would not block the checkout because it would be just an untracked file).


7: What the Wrapper Has to Do for Restoring the File Modification Dates and Times in Any Commit Checkout


Hypothesizer 7
We want to restore the file modification dates and times after any commit checkout has been done.

The file meta data bundle file in the new current commit should have been extracted from the repository into the working tree: it cannot have been carried over from the previous current commit because such behavior is blocked.

However, we cannot just set all the file modification dates and times registered in the file meta data bundle file to the files in the working tree, because some files might have been carried over from the previous current commit. In fact, 'M-6' and 'R-5' in the previous article are such cases.

Anyway, the carried-over files can be detected from the output of the checkout, which enables the wrapper to leave those files alone.
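The restoring step after a commit checkout can be sketched as follows. The Python code assumes several things. It expects the checkout to report each carried-over file on a line consisting of a status letter, a tab, and the path (as in M<TAB>src/a.txt, which is how 'git checkout <branch>' typically lists the local changes it carries over), and it ignores Git's quoting of unusual paths. It uses ISO 8601 strings for the recorded datetimes and takes the staged value of each entry as the value to restore. All names are hypothetical.

```python
import os
from datetime import datetime
from typing import Dict, List, Optional, Set

Entries = Dict[str, List[Optional[str]]]  # path -> [staged, registered]


def carried_over_paths(checkout_output: str) -> Set[str]:
    """Collect paths from checkout output lines of the form <letter><TAB><path>."""
    paths = set()
    for line in checkout_output.splitlines():
        status, tab, path = line.partition("\t")
        if tab and len(status) == 1 and status.isalpha():
            paths.add(path)
    return paths


def restore_mtimes(entries: Entries, carried_over: Set[str]) -> List[str]:
    """Set recorded modification times, leaving carried-over files alone."""
    restored = []
    for path, (staged, _registered) in entries.items():
        if path in carried_over or staged is None or not os.path.exists(path):
            continue
        mtime = datetime.fromisoformat(staged).timestamp()
        # Keep the access time; only the modification time is restored.
        os.utime(path, (os.stat(path).st_atime, mtime))
        restored.append(path)
    return restored
```

Lines without a tab, such as "Switched to branch ..." or tracking information, are simply ignored by the parser.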


8: What the Wrapper Has to Do for Restoring the File Modification Dates and Times in Any Files Checkout, If Desired So


Hypothesizer 7
As I said, I do not generally mind the file modification dates and times not being restored in a files checkout, but I would not mind having an option to restore them, either.

Basically, it is simple: get (not check out) the file meta data bundle file from the specified commit, and set the modification dates and times registered in it to the checked-out files.
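An optional restore for a files checkout might look like the Python sketch below. It assumes that the bundle text has already been obtained without checking it out, for example with 'git show <commit>:<bundle path>' (the bundle's path is not fixed in this article). It also assumes that datetimes are ISO 8601 strings in plain JSON and that the staged value recorded in a committed bundle is the one to restore. The function name is hypothetical.

```python
import json
import os
from datetime import datetime
from typing import Iterable, List


def restore_from_bundle_text(bundle_text: str, checked_out: Iterable[str]) -> List[str]:
    """Apply the mtimes recorded in another commit's bundle to the checked-out files."""
    _commit_datetime, entries = json.loads(bundle_text)
    restored = []
    for path in checked_out:
        entry = entries.get(path)
        if entry is None or entry[0] is None:
            continue  # nothing to restore for this file from that commit
        mtime = datetime.fromisoformat(entry[0]).timestamp()
        os.utime(path, (os.stat(path).st_atime, mtime))
        restored.append(path)
    return restored
```

Only the files named in the files checkout are touched; every other file in the working tree keeps its current modification time.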


9: The Conclusion and Beyond


Hypothesizer 7
Now, I seem to understand how to store the file modification dates and times in the Git repository and how to restore them to the checked-out files: I need to create a wrapper of the 'git' command that does what is described above.

. . . Is that it? . . . Where is the wrapper? . . . Actually, I am working on it, and it will be published in a future article.


References


  • Przemoc. (2018/01/06). Przemoc's software. Retrieved from http://software.przemoc.net/#metastore