2018-11-25

2: Reference in C++: No! Not "an alias of a variable"!


Any sloppy explanation like "a reference is an alias of a variable" should be discarded. It is a shame if one does not perceive it as sloppy.

Topics


About: C++



Starting Context


  • The reader has a basic knowledge of C++, even if he or she does not accurately understand some of its widely misrepresented elements.

Target Context


  • The reader will have a reasonable explanation of what 'reference' is in C++.

Orientation


Hypothesizer 7
I have found many explanations of what 'reference' is in C++, but no satisfactory one.

A typical explanation is like "'C++ reference' is an alias for a variable.". . . . Certainly, that explanation explains this case well (although this usage does not seem meaningful, because I do not see any necessity for that alias: why would I not use the original variable instead?).

@C++ Source Code
int l_integer = 1;
int & l_integerReference = l_integer;

The reference, 'l_integerReference', is an alias of the variable, 'l_integer', right? . . . However, in the succeeding code, the reference, 'a_integerReference', is an alias of . . . which variable?

@C++ Source Code
#include <iostream>

void referenceArgumentFunction (int const & a_integerReference) {
    ::std::cout << "### 'a_integerReference' is " << a_integerReference << " at " << &a_integerReference << "." << ::std::endl << ::std::flush;
}

int main (int a_argumentsNumber, char const * a_arguments []) {
	referenceArgumentFunction (2 * 3);
	int l_integer1 (2 * 3);
	::std::cout << "### 'l_integer1' is " << l_integer1 << " at " << &l_integer1 << "." << ::std::endl << ::std::flush;
	int l_integer2 (2 * 3);
	::std::cout << "### 'l_integer2' is " << l_integer2 << " at " << &l_integer2 << "." << ::std::endl << ::std::flush;
}

'2 * 3'? Is '2 * 3' officially a variable? . . . I do not think so because of the following reason.

First, let me clarify what 'name' is. Any name is an expression that represents the same entity any time it is used. If an expression represents different entities in two occurrences, that expression is not any name.

Second, as any variable is a named box, anything that does not have any name is not any variable.

As '2 * 3' represents different entities in its multiple occurrences (the result of the above code on my computer shows that the first '2 * 3' and the second '2 * 3' are two data at different locations; on another computer, the result may be different, but at least the three '2 * 3's cannot all be at the same location), '2 * 3' is not a name, and so '2 * 3' is not a variable, which is a named box.

Besides, 'reference' is not any alias (another name) in the following case.

@C++ Source Code
int l_integer = 1;
int const & l_integerReference = l_integer;

Unlike in the first code, the reference, 'l_integerReference', is meaningful here, because it is not another name for the variable, 'l_integer', but a different variable that has a different attribute: being constant.
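
As a minimal sketch of that difference in attributes (the function name, 'constantReferenceDemonstration', is my own hypothetical one, not anything standard), the following code checks that the two variables represent the same datum at the same location while having different types, and that the datum can be changed via the original variable but not via the constant reference variable.

@C++ Source Code
```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: returns the value seen via the constant reference
// variable after the datum has been changed via the original variable.
int constantReferenceDemonstration () {
	int l_integer = 1;
	int const & l_integerReference = l_integer;
	// The two variables have different declared types: only the 'constantness' differs.
	static_assert (::std::is_same <decltype (l_integer), int>::value, "the original variable is a non-constant int");
	static_assert (::std::is_same <decltype (l_integerReference), int const &>::value, "the reference variable is a constant int reference");
	// l_integerReference = 2; // would not compile: the datum cannot be changed via the reference variable
	l_integer = 2; // allowed: the datum is changed via the original variable
	// Both of the variables reside at the same location.
	assert (&l_integerReference == &l_integer);
	// The change is visible via the reference variable.
	return l_integerReference;
}
```

Calling 'constantReferenceDemonstration ()' from a 'main' function is expected to return 2.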

So, that explanation seems intrinsically unsatisfactory.

Another typical explanation is like "'C++ reference' is an alias for an object". . . . Hmm, that explanation does not make sense to me. As any alias is another name, first, there must be an original name, and then an alias can exist. Since an object, for example the one created by the expression, '2 * 3', does not have any name (as discussed above), which is the original name? . . .

And again, any explanation of 'reference' as being a name is not satisfactory.

How about an explanation that 'reference' is a fancy pointer? . . . Actually, I used to adopt that interpretation, and I will consider that interpretation later, although I do not adopt that interpretation now.


Main Body


1: A Note


Hypothesizer 7
I am not interested in swallowing what an official document says or what existing implementations are like. The reasons are explained in an article. I am interested in a theory that consistently explains the whole behavior (there can be multiple such theories).


2: What 'Reference' Is


Hypothesizer 7
In fact, there are two kinds of references: reference variables and reference 'function returns'. Although the term, 'reference', has the same meaning in the two, it will be helpful to first study each of them independently.

Any reference variable is a variable that is defined over an existing datum.

Let us refrain from using words like 'alias' carelessly. Any alias is just a name, but any reference variable is not just a name, but a named box that has independent attributes like 'type', 'scope', and 'constantness'.

What does "defined over an existing datum" mean? . . . When any non-reference variable is defined, it is allocated at an empty lot. So, when we want to set an existing datum, which exists somewhere, to the variable, we copy the datum into the box, which is the variable. On the other hand, when any reference variable is defined, it is defined at the location where the specified existing datum resides, which is the reason why we have to associate a datum with the reference variable when the reference variable is defined: otherwise, the location of the reference variable cannot be determined.

It is also natural that any reference variable cannot be associated with another datum after being defined: if it could, that would mean that the reference variable was moved to the location of the other datum, but no variable can be moved in C++ (although there is the confusing term, "move semantics", nothing is really moved in it; I will discuss "move semantics" in a future article).
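
The point can be checked with a small sketch like the following one (the function name, 'referenceStaysAtItsDatum', is a hypothetical one of mine): an assignment to a reference variable does not move the variable to another datum, but copies a value into the datum over which the variable is defined.

@C++ Source Code
```cpp
#include <cassert>

// Hypothetical helper: returns true if the reference variable is still at the
// location of its original datum after an assignment to it.
bool referenceStaysAtItsDatum () {
	int l_integer1 = 1;
	int l_integer2 = 2;
	int & l_integerReference = l_integer1;
	// This does not re-associate the reference variable with 'l_integer2':
	// it copies the value of 'l_integer2' into the datum at 'l_integer1'.
	l_integerReference = l_integer2;
	assert (l_integer1 == 2);
	assert (l_integer2 == 2);
	return &l_integerReference == &l_integer1 && &l_integerReference != &l_integer2;
}
```

Calling 'referenceStaysAtItsDatum ()' is expected to return true.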

Any reference 'function return' is a mechanism in which any function call expression of the function represents the datum specified by the return statement, not any copy of the datum.

Again, note that the word, 'alias', does not apply here. As any function return does not have any name and any function call expression is not any name, there is no name involved there.
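
A minimal sketch of a reference 'function return', assuming a hypothetical class, 'Counter', and a hypothetical helper function, 'referenceFunctionReturnDemonstration', is this: the function call expression, 'l_counter.getCountByReference ()', represents the member datum itself, while 'l_counter.getCountByValue ()' represents a copy.

@C++ Source Code
```cpp
#include <cassert>

// Hypothetical class that holds a datum and exposes it in the two ways.
class Counter {
	private:
		int i_count = 0;
	public:
		// The call expression represents 'i_count' itself.
		int & getCountByReference () { return i_count; }
		// The call expression represents a copy of 'i_count'.
		int getCountByValue () const { return i_count; }
};

// Hypothetical helper: returns the count held in the Counter instance at the end.
int referenceFunctionReturnDemonstration () {
	Counter l_counter;
	// Assigning to the call expression changes the datum inside 'l_counter'.
	l_counter.getCountByReference () = 5;
	// Changing a copy does not change the datum inside 'l_counter'.
	int l_copy = l_counter.getCountByValue ();
	++l_copy;
	assert (l_copy == 6);
	return l_counter.getCountByValue ();
}
```

Calling 'referenceFunctionReturnDemonstration ()' is expected to return 5, not 6.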

After all, the common trait of any reference whether it is a reference variable or a reference 'function return' is that it is something that represents an existing datum.


3: The Usefulness of 'Reference'


Hypothesizer 7
A reason why a reference variable is useful is that it is not a name, but a variable that has its own attributes ('type', 'scope', 'constantness', etc.).

In fact, having multiple names for a variable does not seem useful: whichever name is used, it means the same variable with the same type, the same scope, and the same constantness.

A typical use case of 'reference variable' is a reference variable argument of a function: as the reference variable argument has the scope inside the function, a datum that was inaccessible inside the function becomes accessible inside the function by being represented by the reference variable argument.

As another use case, a class member reference variable makes a datum accessible inside the class.
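
The two use cases can be sketched as follows, where the function, 'addTo', the class, 'Accumulator', and the helper function, 'scopeDemonstration', are all hypothetical names of mine; note that the datum, 'l_total', is made to outlive the object that holds the member reference variable.

@C++ Source Code
```cpp
#include <cassert>

// Hypothetical function: the argument, 'a_total', is a reference variable
// defined over the caller's datum, which is otherwise inaccessible here.
void addTo (int & a_total, int const a_amount) {
	a_total += a_amount;
}

// Hypothetical class: the member reference variable, 'i_total', is defined
// over a datum that resides outside the class.
class Accumulator {
	private:
		int & i_total;
	public:
		explicit Accumulator (int & a_total) : i_total (a_total) {}
		void add (int const a_amount) { i_total += a_amount; }
};

// Hypothetical helper: returns the datum after it has been changed
// via the reference argument and via the member reference variable.
int scopeDemonstration () {
	int l_total = 0;
	addTo (l_total, 2);
	Accumulator l_accumulator (l_total); // 'l_total' outlives 'l_accumulator'
	l_accumulator.add (3);
	return l_total;
}
```

Calling 'scopeDemonstration ()' is expected to return 5: no copy of 'l_total' is created anywhere.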

'Scope' is not necessarily the reason why a reference variable is used, though. Having a constant version of an original variable can also be useful: it can enhance the readability of a source file, because any reader can be basically (not definitely, as 'constantness' can be wantonly stripped off in C++) assured that the datum represented by the reference variable is not changed via the reference variable (tracing the changes of a datum is a typical concern in reading a source file, right?).

Although any reference 'function return' is not any variable, its usefulness is similar: the function caller can see a datum that is seen in the function.

Anyway, an essence of using a reference (whether it is a reference variable or a reference 'function return') is to directly use an existing datum, without creating unnecessary copies of the datum.

Having unnecessary copies is a nuisance because the copying acts are costly, copies occupy memory space, and having copies raises the doubt as to the consistency among the copies and necessitates synchronizing the copies.

As for the last reason, when I regard a program as a simulation of a reality, which I always do, having a single instance in the program for a single entity in the reality is reasonable and natural.


4: The Necessity of 'Reference'


Hypothesizer 7
But, we can use 'pointer' in order to do anything we can do using 'reference', right? . . . Right, if we do not mind certain syntax.

A favorable case for 'reference' is an operator argument.

For example, if the argument of a '+' operator of a class were a pointer in order to prevent the so-called actual argument datum from being copied, we would have to specify an address for the right-hand side of the '+' operator: so, it would look as if we were adding an address to a non-address. Although that is not necessarily wrong or bad (considering that any operator is a function), the intuitive feeling of the '+' operator would be broken.
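
The contrast can be sketched with a hypothetical class, 'Amount', that has both versions of the '+' operator (the class and the helper function, 'operatorDemonstration', are only assumptions for illustration): neither version copies the right-hand side datum, but only the reference version reads naturally.

@C++ Source Code
```cpp
#include <cassert>

// Hypothetical class with a reference-argument '+' and a pointer-argument '+'.
class Amount {
	private:
		int i_value;
	public:
		explicit Amount (int const a_value) : i_value (a_value) {}
		int getValue () const { return i_value; }
		// The reference version: used like 'l_amount1 + l_amount2'.
		Amount operator + (Amount const & a_other) const { return Amount (i_value + a_other.i_value); }
		// The pointer version: used like 'l_amount1 + &l_amount2'.
		Amount operator + (Amount const * a_other) const { return Amount (i_value + a_other->i_value); }
};

// Hypothetical helper: returns the sum computed by the reference version.
int operatorDemonstration () {
	Amount l_amount1 (1);
	Amount l_amount2 (2);
	Amount l_sumByReference = l_amount1 + l_amount2;
	// This looks like adding an address to a non-address.
	Amount l_sumByPointer = l_amount1 + &l_amount2;
	assert (l_sumByReference.getValue () == l_sumByPointer.getValue ());
	return l_sumByReference.getValue ();
}
```

Calling 'operatorDemonstration ()' is expected to return 3 via either version.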


5: The Lifetime of the Value of any Reference Variable


Hypothesizer 7
As any reference variable is a variable, it has a scope. However, the value of the reference variable is not necessarily valid throughout the scope. Let us see an example.

@C++ Source Code
#include <iostream>
#include <string>

class Greeter {
	private:
		::std::string const i_name;
		::std::string const & i_nickname;
	public:
		Greeter (::std::string const a_name, ::std::string const & a_nickname);
		void selfIntroduce ();
};

Greeter::Greeter (::std::string const a_name, ::std::string const & a_nickname) : i_name (a_name), i_nickname (a_nickname) {
	::std::cout << "### A Greeter instance is created with the name/nickname, '" << i_name << "'/'" << i_nickname << "'." << ::std::endl << ::std::flush;
}

void Greeter::selfIntroduce () {
	::std::cout << "### I am " << i_name << " also known as " << i_nickname << "!" << ::std::endl << ::std::flush;
}

int main (int a_argumentsNumber, char const * a_arguments []) {
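	// Each '::std::string (...)' below creates a temporary datum that is destroyed at the end of its statement,
	// so each 'i_nickname' is left representing a destroyed datum.
	Greeter l_greeter (::std::string ("Greeter 1"), ::std::string ("Reeter 1"));
	Greeter l_greeted (::std::string ("Greeted 1"), ::std::string ("Reeted 1"));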
	l_greeter.selfIntroduce ();
	l_greeted.selfIntroduce ();
}

The datum created by the expression, '::std::string ("Reeter 1")', exists only until the statement in which the expression appears finishes. After the statement, although the object, 'l_greeter', and its member reference variable, 'i_nickname', exist, the value of the reference variable is no longer valid, and using it is undefined behavior: the output is a weird one on my computer (it might not be so on some computers).


6: The Safety of 'Reference'


Hypothesizer 7
As seen in the previous section, using a reference, instead of a pointer, does not guarantee safety: an invalid memory area could be accessed through the reference.


7: "Pass-by-Reference"?


Hypothesizer 7
"pass-by-reference"? . . . Hmm, that seems a weird expression.

Pass-by-value means that a value is passed (to be exact, copied into the function); pass-by-address means that an address is passed (copied into the function), right? Then, in a "pass-by-reference", is a reference passed (copied into the function)? . . . Such a statement does not make sense to me in any definition of 'reference'. In my definition of 'reference', the reference is newly created in the function (so, it cannot or need not be passed). In a definition of 'reference' as being an alias, is a name ('alias' means 'another name') passed? But passing a name does not seem useful, because what can the function do with just the name? Analyzing a name like "a_integerReference" character by character does not let the function access any datum.

What is really passed in "pass-by-reference" seems to be an address, according to my guess; so, "pass-by-reference" is really a pass-by-address, in regard to what is passed. The difference between using a reference or a pointer as a function argument seems to be not what is passed (an address is passed either way), but how the passed address is used (whether a variable is defined at the address or the address is stored into the pointer).
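
The following sketch (with hypothetical function names of mine) does not show what is passed inside any implementation, which is only my guess anyway; it shows only the observable part: in both of the functions, the function reaches the datum at the same address, while how the address is used differs.

@C++ Source Code
```cpp
#include <cassert>

// Hypothetical function: the reference variable is defined at the location of
// the datum, so '&a_integerReference' is the address of the datum.
int const * addressViaReference (int const & a_integerReference) {
	return &a_integerReference;
}

// Hypothetical function: the address of the datum is stored in the pointer variable.
int const * addressViaPointer (int const * a_integerPointer) {
	return a_integerPointer;
}

// Hypothetical helper: returns true if both of the functions reach the same datum.
bool sameAddressEitherWay () {
	int l_integer = 1;
	return addressViaReference (l_integer) == &l_integer && addressViaPointer (&l_integer) == &l_integer;
}
```

Calling 'sameAddressEitherWay ()' is expected to return true.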


8: Can 'Reference' Be Interpreted as a Fancy Pointer?


Hypothesizer 7
In fact, I had regarded 'reference' as a mutated pointer, for a long time. That is, I had thought that any reference contained an address as any pointer did although it was treated weirdly syntactically.

Is that interpretation valid? . . . I think so, although I do not adopt that now, in favor of the interpretation explained above.

In fact, as long as I accept an explanation that "C++ just treats 'reference' so syntactically.", probably any behavior can be explained.

For example, in this example, I say that ""a_argument2 = a_argument1" means really '*a_argument2 = *a_argument1' although C++ requires the former expression syntactically."

@C++ Source Code
void swap (int & a_argument1, int & a_argument2) {
	int l_savedArgument2 = a_argument2;
	a_argument2 = a_argument1;
	a_argument1 = l_savedArgument2;
}

Then, how do I explain the fact that the address of any reference as a pointer cannot be obtained (the '&' operator returns the address of the datum pointed to by the reference, not the address of the reference as a pointer)? Is that not a proof that any reference is not a pointer? . . . Not really. As I have adopted the explanation, "C++ just treats 'reference' so syntactically.", I just say, "the '&' operator in C++ just returns the address of the datum pointed to by the reference, syntactically, and C++ does not allow any syntax that gets the address of the reference as a pointer, although there really exists the address of the reference as a pointer.".

Whether such an explanation that attributes anything to odd syntax is desirable is another story. However, to repeat, I do not accept any refutation to the effect that that explanation does not accord with most of (or even all of) the existing implementations: no implementation dictates the specification, and theoretically, a new implementation that is totally different from the existing implementations can appear in the future.
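
What the syntax lets us observe can be sketched as follows (the type, 'LargeDatum', and the function, 'referenceSyntaxShowsOnlyTheDatum', are hypothetical names of mine): both the '&' operator and the 'sizeof' operator applied to a reference yield the address and the size of the datum, whereas a pointer variable shows its own address. The sketch is compatible with either interpretation; it does not decide between them.

@C++ Source Code
```cpp
#include <cassert>

// Hypothetical type, made large so that its size differs from the size of a typical pointer.
struct LargeDatum {
	char i_bytes [64];
};

// Hypothetical helper: returns true if the reference syntax shows only the datum.
bool referenceSyntaxShowsOnlyTheDatum () {
	LargeDatum l_datum = {};
	LargeDatum & l_datumReference = l_datum;
	LargeDatum * l_datumPointer = &l_datum;
	// '&' on the reference yields the address of the datum.
	bool l_sameAddress = (&l_datumReference == &l_datum);
	// 'sizeof' on the reference yields the size of the datum.
	bool l_sameSize = (sizeof (l_datumReference) == sizeof (LargeDatum));
	// A pointer variable, on the other hand, has its own address, distinct from the address of the datum.
	bool l_pointerHasOwnAddress = (static_cast <void const *> (&l_datumPointer) != static_cast <void const *> (&l_datum));
	return l_sameAddress && l_sameSize && l_pointerHasOwnAddress;
}
```

Calling 'referenceSyntaxShowsOnlyTheDatum ()' is expected to return true.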


9: The Conclusion and Beyond


Hypothesizer 7
Now, I have a satisfactory explanation of what 'reference' is.

There are some other concepts in C++ whose prevalent explanations are not satisfactory. I will try to make reasonable explanations of those concepts in future articles. "lvalue" and "rvalue" are two of them.

