2018-12-16

4: "Move Semantics", "Xvalue", "Prvalue", "Glvalue"? Misnomers


They are all misnomers, as "lvalue" and "rvalue" are, and any correct understanding begins only when they are understood as misnomers.

Topics


About: C++



Starting Context



Target Context


  • The reader will have a reasonable explanation of what so-called "move semantics", "xvalue", "prvalue", and "glvalue" are in C++.

Orientation


Hypothesizer 7
What? . . . While I have finally understood what so-called "lvalue" and "rvalue" are, are there also "xvalue", "prvalue", and "glvalue"? . . . Yes, there are.

Although I will understand them in 'Main Body', I declare two things here.

First, those terms are misnomers: an "xvalue", a "prvalue", or a "glvalue" is not a value at all, but an expression (in fact, I call them 'xexpression', 'prexpression', and 'glexpression').

Second, most prevalent explanations are essentially wrong because, mainly, they are influenced by the misnomers and explain those categories of expressions as though they are categories of values, although there are some other points on which I have some objections.

In order to understand 'xexpression', 'prexpression', and 'glexpression', I have to first understand so-called "move semantics", which I will do.

Are "move semantics" and the distinctions of expressions really good ideas? . . . Actually, I do not think so, as I will argue in a section. However, whether I like them or not, they exist, and I have no option but to understand them.


Main Body


1: What Is "Move Semantics"?


Hypothesizer 7
First note that although it is called "move semantics", nothing is really moved in the sense that something's location in memory or in a CPU register is changed. Certainly, the ownership of some member data of an object may be moved to another object in some cases (not in all cases).

In fact, what exactly happens in any "move" action is just a shallow copy, possibly (but not necessarily) with the re-initialization of the original object.

Let me see an example.

@C++ Source Code
#include <string>
			
			class Owned {
				public:
					::std::string i_name;
					Owned (::std::string a_name);
			};
			
			class Owner {
				public:
					::std::string i_name;
					Owned * i_ownedPointer;
					Owner (::std::string a_name, ::std::string a_ownedName);
					~Owner ();
			};

			Owned::Owned (::std::string a_name) : i_name (a_name) {
			}
			
			Owner::Owner (::std::string a_name, ::std::string a_ownedName) : i_name (a_name) {
				i_ownedPointer = new Owned (a_ownedName);
			}
			
			Owner::~Owner () {
				if (i_ownedPointer != nullptr) {
					delete i_ownedPointer;
					i_ownedPointer = nullptr;
				}
			}
			
			void moveWithoutMoveSemantics (Owner & a_destinationOwner, Owner & a_originalOwner) {
				a_destinationOwner.i_name = a_originalOwner.i_name;
				// Release what the destination has owned so far; otherwise, it would leak.
				delete a_destinationOwner.i_ownedPointer;
				a_destinationOwner.i_ownedPointer = a_originalOwner.i_ownedPointer;
				// The re-initialization of 'a_originalOwner' Start
				a_originalOwner.i_name = ::std::string ("");
				a_originalOwner.i_ownedPointer = nullptr;
				// The re-initialization of 'a_originalOwner' End
			}
			
			void test1 () {
				Owner l_owner1 ("Owner1", "Owned1");
				Owner l_owner2 ("Owner2", "Owned2");
				moveWithoutMoveSemantics (l_owner1, l_owner2);
			}

In fact, that is not any example of "move semantics", but an example of a "move" action that does not use "move semantics": I have shown the example in order to demonstrate '"move" action' independently of "move semantics". So, in order to perform a "move" action, "move semantics" is not indispensable.

This is an example of "move semantics".

@C++ Source Code
#include <iostream>
			
			// 'Owned' and 'Owner' are the classes defined in the previous code.
			
			void copyOrMoveWithMoveSemantics (Owner & a_destinationOwner, Owner & a_originalOwner) {
				::std::cout << "The deep copy version is called." << ::std::endl << ::std::flush;
				a_destinationOwner.i_name = a_originalOwner.i_name;
				// Create the copy first, then release what the destination has owned so far.
				Owned * l_newOwnedPointer = new Owned (a_originalOwner.i_ownedPointer->i_name);
				delete a_destinationOwner.i_ownedPointer;
				a_destinationOwner.i_ownedPointer = l_newOwnedPointer;
			}
			
			void copyOrMoveWithMoveSemantics (Owner & a_destinationOwner, Owner && a_originalOwner) {
				::std::cout << "The shallow copy version is called." << ::std::endl << ::std::flush;
				a_destinationOwner.i_name = a_originalOwner.i_name;
				// Release what the destination has owned so far; otherwise, it would leak.
				delete a_destinationOwner.i_ownedPointer;
				a_destinationOwner.i_ownedPointer = a_originalOwner.i_ownedPointer;
				// The re-initialization of 'a_originalOwner' Start
				a_originalOwner.i_name = ::std::string ("");
				a_originalOwner.i_ownedPointer = nullptr;
				// The re-initialization of 'a_originalOwner' End
			}
			
			void test2 () {
				Owner l_owner1 ("Owner1", "Owned1");
				Owner l_owner2 ("Owner2", "Owned2");
				copyOrMoveWithMoveSemantics (l_owner1, l_owner2);
				copyOrMoveWithMoveSemantics (l_owner1, Owner ("Owner3", "Owned3"));
			}

'copyOrMoveWithMoveSemantics' is overloaded, and the first 'copyOrMoveWithMoveSemantics' performs a deep copy while the second 'copyOrMoveWithMoveSemantics' performs a shallow copy and reinitializes the original object. So, which is called for each of the 'copyOrMoveWithMoveSemantics' calls in 'test2'? In fact, the first call calls the first version; the second call calls the second version.

Why? . . . Because any rexpression (so-called "rvalue") opts for any '&&' argument (which, in fact, accepts only rexpressions) over any '&' argument, while any lexpression (so-called "lvalue") opts for any '&' argument (which accepts also rexpressions if the argument is constant) over any '&&' argument, and "l_owner2" is an lexpression while "Owner ("Owner3", "Owned3")" is an rexpression.

In fact, "move semantics" is a mechanism in which one of the overloads is selected according to whether each of the expressions put into a function is an lexpression or an rexpression.

To repeat, "move semantics" is not 'to perform a "move" action', but 'the mechanism in which the "move" action is selected over the "copy" action according to the expression categories of the expressions put into an overloaded function'.

Also note that "move semantics" does not automatically implement any "move" action: the programmer has to implement the "move" action, which, in fact (from the viewpoint of the mechanism), does not have to be something that really performs a move.
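
As a minimal sketch of that point (the names, 'Box' and 'assignFrom', are hypothetical ones introduced only for this illustration), the '&&' overload below deliberately just copies: the mechanism selects it for rexpressions, but whether anything is "moved" is entirely up to the body.

```cpp
#include <cassert>
#include <string>
#include <utility>

// 'Box' and 'assignFrom' are hypothetical names used only for this sketch.
struct Box {
	::std::string i_name;
};

// Selected for lexpressions.
char const * assignFrom (Box & a_destination, Box const & a_original) {
	a_destination.i_name = a_original.i_name;
	return "copy overload";
}

// Selected for rexpressions; nothing forces this body to "move" anything:
// it deliberately copies and leaves 'a_original' untouched.
char const * assignFrom (Box & a_destination, Box && a_original) {
	a_destination.i_name = a_original.i_name;
	return "&& overload";
}

// Returns true if the '&&' overload was selected and the original is still intact.
bool originalSurvivesRexpressionOverload () {
	Box l_destination {"Destination"};
	Box l_original {"Original"};
	::std::string l_selected = assignFrom (l_destination, ::std::move (l_original));
	return l_selected == "&& overload" && l_original.i_name == "Original" && l_destination.i_name == "Original";
}

void testOverloadSelection () {
	assert (originalSurvivesRexpressionOverload ());
}
```

So, the only thing the language contributes here is the selection between the two overloads.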


2: Prevalent Use Cases of "Move Semantics"


Hypothesizer 7
Although I used "move semantics" for a free (non-member) function in the above example, the most prevalent use cases of "move semantics" are "move" constructors and "move" assignment operators.

Let me see an example.

@C++ Source Code
#include <string>
#include <iostream>
			
			class Owned {
				public:
					::std::string i_name;
					Owned (::std::string a_name);
			};
			
			class Owner {
				public:
					::std::string i_name;
					Owned * i_ownedPointer;
					Owner (::std::string a_name, ::std::string a_ownedName);
					~Owner ();
					Owner (Owner const & a_originalOwner); // the copy constructor
					Owner (Owner && a_originalOwner); // the "move" constructor
					Owner & operator = (Owner const & a_originalOwner); // the copy assignment operator
					Owner & operator = (Owner && a_originalOwner); // the "move" assignment operator
			};
			
			Owned::Owned (::std::string a_name) : i_name (a_name) {
			}
			
			Owner::Owner (::std::string a_name, ::std::string a_ownedName) : i_name (a_name) {
				i_ownedPointer = new Owned (a_ownedName);
			}
			
			Owner::~Owner () {
				if (i_ownedPointer != nullptr) {
					delete i_ownedPointer;
					i_ownedPointer = nullptr;
				}
			}
			
			Owner::Owner (Owner const & a_originalOwner) {
				::std::cout << "The \"copy\" constructor is called." << ::std::endl << ::std::flush;
				i_name = a_originalOwner.i_name;
				i_ownedPointer = new Owned (a_originalOwner.i_ownedPointer->i_name);
			}
			
			Owner::Owner (Owner && a_originalOwner) {
				::std::cout << "The \"move\" constructor is called." << ::std::endl << ::std::flush;
				i_name = a_originalOwner.i_name;
				i_ownedPointer = a_originalOwner.i_ownedPointer;
				// The re-initialization of 'a_originalOwner' Start
				a_originalOwner.i_name = ::std::string ("");
				a_originalOwner.i_ownedPointer = nullptr;
				// The re-initialization of 'a_originalOwner' End
			}
			
			Owner & Owner::operator = (Owner const & a_originalOwner) {
				::std::cout << "The \"copy\" assignment operator is called." << ::std::endl << ::std::flush;
				// Guard against self-assignment, which would read from the object just deleted.
				if (this != &a_originalOwner) {
					i_name = a_originalOwner.i_name;
					if (i_ownedPointer != nullptr) {
						delete i_ownedPointer;
						i_ownedPointer = nullptr;
					}
					i_ownedPointer = new Owned (a_originalOwner.i_ownedPointer->i_name);
				}
				return *this;
			}
			
			Owner & Owner::operator = (Owner && a_originalOwner) {
				::std::cout << "The \"move\" assignment operator is called." << ::std::endl << ::std::flush;
				i_name = a_originalOwner.i_name;
				if (i_ownedPointer != nullptr) {
					delete i_ownedPointer;
					i_ownedPointer = nullptr;
				}
				i_ownedPointer = a_originalOwner.i_ownedPointer;
				// The re-initialization of 'a_originalOwner' Start
				a_originalOwner.i_name = ::std::string ("");
				a_originalOwner.i_ownedPointer = nullptr;
				// The re-initialization of 'a_originalOwner' End
				return *this;
			}
			
			void test3 () {
				Owner l_owner1 ("Owner1", "Owned1");
				Owner l_owner2 (l_owner1); // calling the copy constructor
				Owner l_owner3 (Owner ("Owner3", "Owned3")); // not really calling the move constructor because of the optimization
				l_owner2 = l_owner3; // calling the copy assignment operator
				l_owner3 = Owner ("Owner4", "Owned4"); // calling the move assignment operator
			}

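Note that the "move" constructor is not really called for 'l_owner3' in my environment because the compiler eliminates the call (copy elision, which compilers were allowed to perform before C++17 and are required to perform since C++17 for this kind of initialization).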
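
The elimination can be observed by counting the "move" constructor calls. The following is only a sketch with a hypothetical class, 'Counted', which is not part of the code above; I assume a C++17 compiler, or an earlier one with its default optimization settings, for the count of the first function to be zero.

```cpp
#include <cassert>
#include <utility>

// 'Counted' is a hypothetical class that only counts how often its "move" constructor runs.
struct Counted {
	static int s_moveCount;
	explicit Counted (int) {}
	Counted (Counted const &) {}
	Counted (Counted &&) { ++s_moveCount; }
};

int Counted::s_moveCount = 0;

// Initialization from a temporary: the "move" constructor call is expected to be eliminated
// (guaranteed since C++17, merely allowed before).
int countMovesWhenInitializedFromTemporary () {
	Counted::s_moveCount = 0;
	Counted l_counted (Counted (1));
	(void) l_counted;
	return Counted::s_moveCount;
}

// Initialization from '::std::move (...)': the "move" constructor really runs once.
int countMovesWhenInitializedViaStdMove () {
	Counted::s_moveCount = 0;
	Counted l_original (1);
	Counted l_counted (::std::move (l_original));
	(void) l_counted;
	return Counted::s_moveCount;
}

void testElision () {
	assert (countMovesWhenInitializedViaStdMove () == 1);
}
```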


3: Is "Move Semantics" a Good Idea?


Hypothesizer 7
Is "move semantics" really a good idea? . . . Hmm . . .

The problem is that the expression category does not really link to whether the programmer wants to do a "copy" or a "move". Why? . . . Although "move semantics" seems to be based on the assumption, "An expression's being an rexpression or an lexpression should link to the expression value's being a temporary object or not, and the value's being a temporary object or not should link to whether the programmer wants to do a "move" or a "copy"", neither the former part nor the latter part of the assumption is true.

As for the former part, an expression's being an lexpression does not necessarily mean the expression's not representing a temporary object, as has been seen in the previous article.

As for the latter part, a datum's not being a temporary object does not necessarily mean that the programmer wants to do a "copy". In fact, such an assertion as "the programmer should not want to do any "move" from any non-temporary datum." is a gratuitous intervention.

Certainly, there is a remedy that makes an lexpression's value "moved" by explicitly turning the lexpression into an rexpression (the remedy is the topic of the next section), but wrongfully linking an lexpression to not being "moved" and demanding that the lexpression be turned into an rexpression evokes a social injustice: why do we not remedy the prejudice itself?

Then, how do I think it should have been? . . . In my opinion, the programmer should have been allowed to specify explicitly whether he or she wants to do a "copy" or a "move" (for example, a notation that means the "move" assignment, like '<-', could have been introduced), which seems better than making the programmer wonder whether the expression is an lexpression or an rexpression (the distinction cannot be said to have been prevalently explained correctly and clearly).


4: How to Forcefully Turn any Lexpression into an Rexpression


Hypothesizer 7
As linking being an lexpression with not being "moved" is unnatural, naturally, there are some cases in which I want to "move" the values of some lexpressions. In such a case, I can turn any lexpression into an rexpression by the '::std::move' function.

Let me modify the 'test3' function above to add some use cases of the '::std::move' function.

@C++ Source Code
			// Modified
			void test3 () {
				Owner l_owner1 ("Owner1", "Owned1");
				Owner l_owner2 (l_owner1); // calling the copy constructor
				Owner l_owner3 (Owner ("Owner3", "Owned3")); // not calling the move constructor because of the optimization
				// Added
				Owner l_owner4 (::std::move (l_owner1)); // calling the move constructor
				l_owner2 = l_owner3; // calling the copy assignment operator
				l_owner3 = Owner ("Owner4", "Owned4"); // calling the move assignment operator
				// Added
				l_owner2 = ::std::move (l_owner3); // calling the move assignment operator
			}
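
Note that the '::std::move' function itself "moves" nothing: it only casts its argument to a '&&' reference. The following sketch (with the hypothetical names, 'Label' and 'selectsRexpressionOverload', which are not part of the code above) shows that an explicit 'static_cast' selects the same overload, and that the object is unchanged as long as the selected function does not touch it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// 'Label' and 'selectsRexpressionOverload' are hypothetical names used only for this sketch.
struct Label {
	::std::string i_text;
};

bool selectsRexpressionOverload (Label const &) { return false; }
bool selectsRexpressionOverload (Label &&) { return true; }

void testMoveIsOnlyACast () {
	Label l_label {"abc"};
	assert (!selectsRexpressionOverload (l_label));
	assert (selectsRexpressionOverload (::std::move (l_label)));
	// The explicit cast has the same effect on the overload selection.
	assert (selectsRexpressionOverload (static_cast <Label &&> (l_label)));
	// Nothing has been "moved", because neither overload modifies its argument.
	assert (l_label.i_text == "abc");
}
```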


5: '&&' for the Return of any Function


Hypothesizer 7
Although I have above seen '&&' for a function argument variable, '&&' can be used also for the return of any function. For what purpose?

'&&' makes any call of the function a reference and an rexpression (an xexpression, to be more specific), which means that any such function call opts for the '&&' argument over the '&' argument when passed into another function.

Note that any '&&' function return type does not turn any lexpression into an rexpression by itself: the function has to return an rexpression.
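
A minimal sketch of both points, assuming the hypothetical names, 'Item', 'passAsRexpression', and 'receivedByRexpressionArgument' (none of which appears in the code above):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// 'Item', 'passAsRexpression', and 'receivedByRexpressionArgument' are hypothetical names.
struct Item {
	int i_number;
};

// The return type alone does not convert anything: 'return a_item;' would not compile here,
// because "a_item" is an lexpression. The function has to return an rexpression.
Item && passAsRexpression (Item & a_item) {
	return ::std::move (a_item);
}

bool receivedByRexpressionArgument (Item const &) { return false; }
bool receivedByRexpressionArgument (Item &&) { return true; }

// The type of the function call expression is 'Item &&'.
static_assert (::std::is_same <decltype (passAsRexpression (::std::declval <Item &> ())), Item &&>::value, "The call is a '&&' reference.");

void testRexpressionReturn () {
	Item l_item {1};
	assert (!receivedByRexpressionArgument (l_item));
	assert (receivedByRexpressionArgument (passAsRexpression (l_item)));
}
```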


6: What Is "Xvalue"?


Hypothesizer 7
So, what is "xvalue"? . . . Actually, "xvalue" is a misnomer as "lvalue" and "rvalue" are. In fact, so-called "xvalue" is an expression, not a value, and I call it 'xexpression'.

'Xexpression' is an expression that is explicitly turned into an rexpression whether the original expression was an lexpression or not.

"or not"? Can an rexpression be turned into an rexpression? . . . Well, any rexpression can be passed into the '::std::move' function, and the function call is an xexpression, although I do not know any purpose in doing so.

Note that I mean "explicitly turned into an rexpression" in a broad meaning.

For example, when an expression has been explicitly turned into an rexpression, any member access expression on it (for a non-static, non-reference data member) is also regarded as "has been explicitly turned into an rexpression".

As another example, any call of any '&&' return function is regarded as "has been explicitly turned into an rexpression".

Let me see such examples in code.

@C++ Source Code
			Owner && getOwner (Owner & a_owner) {
				return ::std::move (a_owner);
			}
			
			void test4 () {
				Owner l_owner1 ("Owner1", "Owned1");
				::std::move (l_owner1).i_ownedPointer; // "::std::move (l_owner1).i_ownedPointer" is an xexpression.
				getOwner (l_owner1); // "getOwner (l_owner1)" is an xexpression.
				getOwner (l_owner1).i_ownedPointer; // "getOwner (l_owner1).i_ownedPointer" is an xexpression.
			}


7: What Is "Prvalue"?


Hypothesizer 7
So, what is "prvalue"? . . . Again, "prvalue" is a misnomer as "lvalue" and "rvalue" are. In fact, so-called "prvalue" is an expression, not a value, and I call it 'prexpression'.

'Prexpression' is an rexpression that is not explicitly turned into an rexpression.

So, any expression that was already an rexpression before the introduction of "move semantics" is a prexpression.


8: Any Expression Is an Lexpression, an Xexpression, or a Prexpression, Exclusively


Hypothesizer 7
After all, any expression is an lexpression, an xexpression, or a prexpression, exclusively.

To explain more, without being explicitly turned into an rexpression, any expression is an lexpression or a prexpression, exclusively, but any lexpression or prexpression can be explicitly turned into an xexpression.
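
The compiler can be asked which of the three categories an expression belongs to: for an expression 'e' of type 'T', 'decltype ((e))' is 'T &' for an lexpression, 'T &&' for an xexpression, and 'T' for a prexpression. The following sketch relies on that rule; the names, 'Category', 'CategoryOf', 'CATEGORY_OF', 'Pair', and 'makePair', are hypothetical helpers of mine, not any standard facility.

```cpp
#include <type_traits>
#include <utility>

// All the names below are hypothetical helpers for this sketch.
enum class Category { Lexpression, Xexpression, Prexpression };

// 'decltype ((e))' is 'T' for a prexpression, 'T &' for an lexpression, 'T &&' for an xexpression.
template <typename T> struct CategoryOf { static constexpr Category value = Category::Prexpression; };
template <typename T> struct CategoryOf <T &> { static constexpr Category value = Category::Lexpression; };
template <typename T> struct CategoryOf <T &&> { static constexpr Category value = Category::Xexpression; };

// The extra parentheses make 'decltype' look at the expression, not at the declared type of a name.
#define CATEGORY_OF(...) (CategoryOf <decltype ((__VA_ARGS__))>::value)

struct Pair {
	int i_first;
	int i_second;
};

Pair makePair () {
	return Pair {1, 2};
}

void testCategories () {
	Pair l_pair {1, 2};
	(void) l_pair;
	static_assert (CATEGORY_OF (l_pair) == Category::Lexpression, "a variable name");
	static_assert (CATEGORY_OF (l_pair.i_first) == Category::Lexpression, "a member of an lexpression");
	static_assert (CATEGORY_OF (::std::move (l_pair)) == Category::Xexpression, "explicitly turned");
	static_assert (CATEGORY_OF (::std::move (l_pair).i_first) == Category::Xexpression, "a member of an xexpression");
	static_assert (CATEGORY_OF (makePair ()) == Category::Prexpression, "a call that returns a non-reference");
	static_assert (CATEGORY_OF (1 + 2) == Category::Prexpression, "an arithmetic expression");
}
```

Each expression falls into exactly one of the three specializations, which matches the exclusiveness described above.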


9: What Is "Glvalue"?


Hypothesizer 7
So, what is "glvalue"? . . . Of course, "glvalue" is a misnomer as "lvalue" and "rvalue" are. In fact, so-called "glvalue" is an expression, not a value, and I call it 'glexpression'.

'Glexpression' is an lexpression or an xexpression.




10: Unfathomable Explanations


Hypothesizer 7
Although I have seen an explanation like "Any glvalue has an identity and any rvalue has no identity", that explanation is unfathomable for me.

Any xexpression is a glexpression, and "::std::move (Owner ("Owner1", "Owned1"))" is an xexpression, right? So, does "::std::move (Owner ("Owner1", "Owned1"))", an xexpression, have an identity while "Owner ("Owner1", "Owned1")", a prexpression, has no identity? . . . Does putting a prexpression through the '::std::move' function suddenly give it an identity? . . . Such an explanation is unreasonable, I would say.

And do even all the lexpressions have identities? . . . Let me see an example.

@C++ Source Code
			Owner const & getOwner () {
				return Owner ("Owner1", "Owned1");
			}
			
			void test5 () {
				::std::cout << "\"getOwner ()\" is an lexpression whose value address is '" << &(getOwner ()) << "'" << ::std::endl << ::std::flush;
			}

While "getOwner ()" is an lexpression (being able to get the address of its value is a proof), it does not seem to have any identity, to me . . . (Of course, I know returning the temporary object as a reference is a problem, but still, it is an lexpression).

I think that any explanation that tries to determine the category of an expression based on whether the expression has an identity or not is essentially broken.

Although I have also seen an explanation like "Any rvalue is near the end of its lifetime", that explanation is also unfathomable for me.

Let me see an example.

@C++ Source Code
			void test6 () {
				Owner l_owner1 ("Owner1", "Owned1");
				::std::move (l_owner1);
				// 'l_owner1' lives as long as this function call is not finished
				//
				//
				::std::cout << "'l_owner1' was not near the end of its lifetime, but is valid here: 'l_owner1.i_name' is '" << l_owner1.i_name << "'." << ::std::endl << ::std::flush;
			}

The '::std::move' function does not change any lifetime of any datum, and "::std::move (l_owner1)" (correctly speaking, its value) is not particularly near the end of its lifetime. . . . In the first place, a description like "near the end of its lifetime" is too vague! How near should something be in order to be an rexpression?

Shall we not stop giving or blindly accepting such explanations?


11: What Are So-called "Lvalue Reference" and "Rvalue Reference"? And a Complaint


Hypothesizer 7
"Lvalue reference" does not mean 'a reference that is an "lvalue"' (certainly, any "lvalue reference" variable (or any call of a function that returns an "lvalue reference") is an lexpression, but that is not the reason for the naming, because any "rvalue reference" variable is also an lexpression); it seems to be intended to mean 'a reference that accepts "lvalues"'. But the name is not appropriate (if not wrong): certainly, any "lvalue reference" accepts any lexpression, but any constant "lvalue reference" variable accepts also any rexpression, and a name that cites only one of the two categories while the reference accepts both categories does not seem to be appropriate.

"Rvalue reference" does not mean 'a reference that is an "rvalue"' (any "rvalue reference" variable is an lexpression, although any call of a function that returns an "rvalue reference" is an rexpression); it seems to be intended to mean 'a reference that accepts "rvalues"'. But the name is not appropriate (if not wrong): as some "lvalue reference" variables also accept rexpressions, the name does not distinguish the two types of references.

. . . Shall we not begin to adopt more appropriate terms? . . . Well, how about 'lexpression-phil reference' and 'rexpression-phil reference'? Just adding '-phil' makes them much more appropriate: first, "-phil" clarifies that 'lexpression' or 'rexpression' is not about whether the reference is so, but about what the reference has affinity for; second, "-phil" means that the reference prefers lexpressions or rexpressions, not that it rejects the others (so, an lexpression-phil reference's accepting also an rexpression does not betray the name at all). . . . Or '& reference' and '&& reference' would be fine: although they are not names that evoke the concepts, at least they do not irritate me by evoking what they do not really represent.

Anyway, I will use terms, 'lexpression-phil reference' and 'rexpression-phil reference', and 'lexpression-phil reference' is a reference defined with '&'; 'rexpression-phil reference' is a reference defined with '&&'.
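
The point that any rexpression-phil reference variable is itself an lexpression can be checked with a small sketch like the following (the function names, 'whichReference', 'passNaively', and 'passWithMove', are hypothetical ones chosen for this illustration):

```cpp
#include <cassert>
#include <string>
#include <utility>

// 'whichReference', 'passNaively', and 'passWithMove' are hypothetical names.
::std::string whichReference (::std::string const &) { return "&"; }
::std::string whichReference (::std::string &&) { return "&&"; }

// 'a_text' is declared with '&&', but the expression "a_text" is an lexpression,
// so the '&' overload is selected.
::std::string passNaively (::std::string && a_text) {
	return whichReference (a_text);
}

// Explicitly turning "a_text" into an rexpression selects the '&&' overload.
::std::string passWithMove (::std::string && a_text) {
	return whichReference (::std::move (a_text));
}

void testNamedRexpressionPhilReference () {
	assert (passNaively (::std::string ("abc")) == "&");
	assert (passWithMove (::std::string ("abc")) == "&&");
	// A constant lexpression-phil reference accepts an rexpression, too, when it is the only candidate;
	// here, the '&&' overload exists and is preferred.
	assert (whichReference (::std::string ("abc")) == "&&");
}
```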


12: An Unfathomable Restriction for Lexpression-phil Reference and Rexpression-phil Reference's Partially-Remedial Side Effect


Hypothesizer 7
Although there is the restriction that any non-constant lexpression-phil reference does not accept any rexpression, I have not encountered any satisfactory explanation of the rationale for it: why does being an rexpression require constantness?

I guess that the restriction intends to keep temporary objects constant, but being a temporary object has nothing to do with being constant, does it? . . . Someone might say "As any temporary object will vanish in a short while, any change wouldn't be visible after the demise"; I would reply "Certainly. So what? . . . Any temporary object lives for a while (even if it is a short while), and until the demise, any change can be useful and visible!".

Let me see an example.

@C++ Source Code
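#include <cctype>
#include <string>
#include <iostream>
#include <sstream>
	
			::std::string getCapitalizedString (::std::istringstream & a_stream) {
				char l_character = (char) -1;
				::std::string l_string;
				while (a_stream.get (l_character)) {
					// 'toupper' takes and returns an 'int', so convert explicitly in both directions.
					l_string += static_cast <char> (::std::toupper (static_cast <unsigned char> (l_character)));
				}
				return l_string;
			}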
			
			void test7 () {
				//::std::cout << "The capitalized string is '" << getCapitalizedString (::std::istringstream (::std::string ("abc"))) << "'." << ::std::endl << ::std::flush; // This is unreasonably rejected.
				::std::istringstream l_stream (::std::string ("abc"));
				::std::cout << "The capitalized string is '" << getCapitalizedString (l_stream) << "'." << ::std::endl << ::std::flush;
			}

You know, the stream has to change inside the 'getCapitalizedString' function, because reading the stream means advancing the current read position in the stream. The changes are useful and necessary whether the changes are visible after the function call finishes or not.

Is changing any temporary object impossible, because of a technical reason? . . . I do not think so: the proof is the introduction of 'rexpression-phil reference' ('rexpression-phil reference' realizes that).

Now, with the introduction of 'rexpression-phil reference', we are unchained from that unreasonable restriction, partially.

Why partially? . . . Because any rexpression-phil reference can accept only rexpressions. So, we will have to turn any lexpression into an rexpression by putting it through the '::std::move' function (not particularly harmful, but odd) in order to use the lexpression for the rexpression-phil reference.
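
As a sketch, here is a variant of the function above whose argument is an rexpression-phil reference ('getCapitalizedStringFromTemporary' is a hypothetical name of mine): it accepts the temporary stream directly and changes it while reading, but any named stream has to be put through the '::std::move' function.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>

// 'getCapitalizedStringFromTemporary' is a hypothetical variant of 'getCapitalizedString'.
::std::string getCapitalizedStringFromTemporary (::std::istringstream && a_stream) {
	char l_character = '\0';
	::std::string l_string;
	// Reading advances the read position, which means that the temporary stream is being changed.
	while (a_stream.get (l_character)) {
		l_string += static_cast <char> (::std::toupper (static_cast <unsigned char> (l_character)));
	}
	return l_string;
}

void testTemporaryStream () {
	// The temporary is accepted directly.
	assert (getCapitalizedStringFromTemporary (::std::istringstream (::std::string ("abc"))) == "ABC");
	// The named stream is an lexpression, so it has to be turned into an rexpression.
	::std::istringstream l_stream (::std::string ("def"));
	assert (getCapitalizedStringFromTemporary (::std::move (l_stream)) == "DEF");
}
```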


13: The Conclusion and Beyond


Hypothesizer 7
Now, I seem to understand what "move semantics", "xvalue", "prvalue", and "glvalue" are.

Although I suspect that the introduction of such semantics and distinctions was a bad decision, I have to face the reality and understand what they are.

As there are some other unsatisfactory explanations on C++, I will try to make more reasonable explanations in future articles.

