2021-10-17

2: Reference in C++: No! Not "an alias of a variable"!


"Reference is an alias of a variable." is too sloppy an explanation: it is untrue and it misses the point of why reference is meaningful.

Topics


About: C++

Starting Context


  • The reader has a basic knowledge of C++, even if he or she doesn't accurately understand some of its widely-misrepresented elements.

Target Context


  • The reader will have a reasonable explanation of what reference is in C++.

Orientation


There is an article that distinguishes 'datum', 'variable', 'expression', and 'value'.


Main Body

Stage Direction
Here is Special-Student-7 in a room in an old rather isolated house surrounded by some mountains in Japan.


1: "Reference is an alias of a variable." Is Untrue


Special-Student-7-Hypothesizer
The most pervasively accepted explanation of C++ reference seems to be "Reference is an alias of a variable.".

Special-Student-7-Rebutter
"alias" should mean 'another name'.

Special-Student-7-Hypothesizer
So, "an alias of a variable" should mean 'another name for a variable', which should imply that the variable existed before the reference and already had its canonical name before the reference.

Special-Student-7-Rebutter
I am not sure whether one of the names has to be declared canonical, but at least, a name must have existed before the alias, because otherwise, the alias would not be "another" name.

Special-Student-7-Hypothesizer
I am calling it "canonical" in order to distinguish it from the alias, although it may not be so "canonical".

Now, let us see this code.

@C++ Source Code
#include <iostream>

void referenceArgumentFunction (int const & a_integerReference) {
    ::std::cout << "### 'a_integerReference' is " << a_integerReference << " at " << &a_integerReference << "." << ::std::endl << ::std::flush;
}

int main (int a_argumentsNumber, char const * a_arguments []) {
	referenceArgumentFunction (2 * 3);
	int l_integer1 (2 * 3);
	::std::cout << "### 'l_integer1' is " << l_integer1 << " at " << &l_integer1 << "." << ::std::endl << ::std::flush;
	int l_integer2 (2 * 3);
	::std::cout << "### 'l_integer2' is " << l_integer2 << " at " << &l_integer2 << "." << ::std::endl << ::std::flush;
}

@Output
### 'a_integerReference' is 6 at 0x7ffecef538ec.
### 'l_integer1' is 6 at 0x7ffecef5385c.
### 'l_integer2' is 6 at 0x7ffecef53858.

"a_integerReference" is a reference, but it is another name of what variable?

Special-Student-7-Rebutter
I can guess that the people who are happy with that sloppy explanation would say that it was "2 * 3", but "2 * 3" is not really any variable.

Special-Student-7-Hypothesizer
You know, any variable is a box that can contain a datum, but "2 * 3" is not any box. It is a datum, not any variable.

Special-Student-7-Rebutter
Well, apparently, they are meaning the datum by "variable".

Special-Student-7-Hypothesizer
But they sometimes mean variable by "variable".

Special-Student-7-Rebutter
The obvious problem is that those people are quite lax in their terminology.

Special-Student-7-Hypothesizer
The obvious issue is that the distinction between datum and variable is crucial in C++.

Special-Student-7-Rebutter
The distinction is crucial in any programming language: a variable takes a datum now and another datum later, and a datum can be shared by multiple variables (at least in programming languages that have pointers, like Java and Python). We cannot properly talk about programming without the 2 concepts being distinguished.

Special-Student-7-Hypothesizer
Especially, we are talking about reference, which is an occasion that requires special attention to the distinction . . .

Special-Student-7-Rebutter
In fact, the distinction is at the center of the argument.

Special-Student-7-Hypothesizer
Besides, supposing (falsely) "2 * 3" was a "variable", what would be its canonical name?

Special-Student-7-Rebutter
Well, would they say that it was "2 * 3"?

Special-Student-7-Hypothesizer
Any name has to identify the entity represented by the name (which should be the purpose of the name), but "2 * 3" does not identify the "variable" (in fact, a datum), as has been shown by the outputs concerning "l_integer1" and "l_integer2".

Special-Student-7-Rebutter
"l_integer1" and "l_integer2" also contain "2 * 3", but the 3 "2 * 3"s are different entities, as their addresses are different.

Special-Student-7-Hypothesizer
So, "2 * 3" does not identify the "variable", so it cannot be called a name of the "variable".

Special-Student-7-Rebutter
Probably, they would say something like ""2 * 3" is a temporary name that is valid only at the specific point.".

Special-Student-7-Hypothesizer
A very far-fetched story. . . . Where are the specifications about such "temporary names" written?

Special-Student-7-Rebutter
I do not know.


2: "Reference is an alias of a variable." Is Missing the Point of Why Reference Is Meaningful


Special-Student-7-Hypothesizer
The sloppy explanation says "alias", but who needs such an alias? I mean, why would I want to call a variable 'l_integer1Alias' instead of the canonical 'l_integer1'?

Special-Student-7-Rebutter
I cannot imagine, if we are really talking about an alias of a variable. It is just confusing to call the same variable with different names.

Special-Student-7-Hypothesizer
The reason why the sloppy explanation is unwise is that it is missing the point of why reference is meaningful: any reference is meaningful because the reference is a new variable.

Special-Student-7-Rebutter
That above "a_integerReference" is a new variable in its own right, not another name for a variable that exists somewhere else.

Special-Student-7-Hypothesizer
That is the point! As "a_integerReference" is a new variable, it has a new scope and a new constantness, which is the reason why we have used the reference.

Let us see this code.

@C++ Source Code
int l_integer = 1;
int const & l_integerReference = l_integer;

"l_integerReference" is meaningful because it is a new variable that has a new constantness.

"an alias of a variable" is really meaningless.

Special-Student-7-Rebutter
Again, the wording of the sloppy explanation is really bad.


3: What Reference Really Is and Its Merits


Special-Student-7-Hypothesizer
In fact, there are two kinds of references: reference variables and reference 'function returns'. Although the term, 'reference', has a common undertone in the two, it will be helpful to first study each of them independently.

Any reference variable is a variable that is defined over an existing datum.

Let us refrain from using inappropriate words like 'alias' here. Any alias is just a name, but any reference variable is not just a name, but a named box that has its own attributes like type, scope, and constantness.

What does "defined over an existing datum" mean? . . . When any non-reference variable is defined, it is allocated at an empty lot. So, when we want to set an existing datum, which exists somewhere, into the variable, we copy the datum into the box, which is the variable. On the other hand, when any reference variable is defined, it is defined at the location where the specified existing datum resides. That is the reason why we have to associate a datum with the reference variable when the reference variable is defined: otherwise, the location of the reference variable could not be determined.

It is also natural that any reference variable cannot be associated with another datum after being defined: if it could, that would mean that the reference variable was moved to the location of the other datum, but no variable can be moved in C++ (although there is the confusing term, "move semantics", nothing is really moved in it (I will discuss "move semantics" in a future article)).

Do you see? With a proper explanation, all the pieces fit together.

Special-Student-7-Rebutter
Yes. With the sloppy explanation, the restriction sounds like a mean needless prohibition; why should an "alias" not be reassigned to another "variable"?

Special-Student-7-Hypothesizer
On the other hand, any reference 'function return' is a mechanism in which any function call expression of the function represents the datum specified by the return statement, not any copy of the datum.

Again, note that the word, 'alias', does not apply here. As any function return does not have any name and any function call expression is not any name, there is no name involved there.

After all, the common trait of any reference whether it is a reference variable or a reference 'function return' is that it is something that represents an existing datum.

A typical use case of reference variable is a reference variable argument of a function: as the reference variable argument has the scope inside the function, a datum that was inaccessible inside the function becomes accessible inside the function by being represented by the reference variable argument.

As another use case, a class member reference variable makes a datum accessible inside the class.

'Scope' is not necessarily the reason why a reference variable is used, though: having a constant version of an original variable can also be useful. It can enhance the readability of the source file, because any reader can be basically (not definitely, as constantness can be wantonly stripped off in C++) assured that the datum represented by the reference variable is not changed via the reference variable (tracing the changes of a datum is a typical concern of reading a source file, right?).

Although any reference 'function return' is not any variable, its usefulness is similar: the function caller can see a datum that could be seen only in the function otherwise.

Anyway, an essence of using a reference (whether it is a reference variable or a reference 'function return') is to directly use an existing datum, without creating unnecessary copies of the datum.

Having unnecessary copies is a nuisance because the copying acts are costly, copies occupy memory space, and having copies raises the doubt as to the consistency among the copies and necessitates synchronizing the copies.

As for the last reason, when I regard a program as a simulation of a reality, which I always do, having a single instance in the program for a single entity in the reality is reasonable and natural.


4: The Lifetime of the Value of any Reference Variable: Reference Is Not Particularly Safe


Special-Student-7-Hypothesizer
As any reference variable is a variable, it has a scope. However, the value of the reference variable is not necessarily valid throughout the scope. Let us see an example.

@C++ Source Code
#include <iostream>
#include <string>

class Greeter {
	private:
		::std::string const i_name;
		::std::string const & i_nickname;
	public:
		Greeter (::std::string const a_name, ::std::string const & a_nickname);
		void selfIntroduce ();
};

Greeter::Greeter (::std::string const a_name, ::std::string const & a_nickname) : i_name (a_name), i_nickname (a_nickname) {
	::std::cout << "### A Greeter instance is created with the name/nickname, '" << i_name << "'/'" << i_nickname << "'." << ::std::endl << ::std::flush;
}

void Greeter::selfIntroduce () {
	::std::cout << "### I am " << i_name << " also known as " << i_nickname << "!" << ::std::endl << ::std::flush;
}

int main (int a_argumentsNumber, char const * a_arguments []) {
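	// Each nickname datum is a temporary that is destroyed when its declaration statement finishes,
	// which leaves the member reference variable, 'i_nickname', without a valid value.
	Greeter l_greeter (::std::string ("Greeter 1"), ::std::string ("Reeter 1"));
	Greeter l_greeted (::std::string ("Greeted 1"), ::std::string ("Reeted 1"));
	// Undefined behavior: 'i_nickname' is read after the datum it was defined over has been destroyed.
	l_greeter.selfIntroduce ();
	l_greeted.selfIntroduce ();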
}

The datum created by the expression, '::std::string ("Reeter 1")', exists only until the statement in which the expression appears finishes. After the statement, although the object, 'l_greeter', and its member reference variable, 'i_nickname', still exist, the value of the reference variable is no longer valid, and reading it is undefined behavior: the output is a weird one on my computer (it might look normal on some computers).

As is seen in that example, using a reference, instead of a pointer, does not guarantee safety: an invalid memory area could be accessed through the reference.


5: The Difference from Pointer


Special-Student-7-Hypothesizer
Pointer is different from reference variable, and the distinction is very important.

There are some people who adamantly refuse to make the distinction, but it is a very bad move, I say.

Special-Student-7-Rebutter
What those people are saying is that they want to jumble the 2 distinct concepts of pointer and reference together as just "reference", and they want to call Java variables and Python variables "references", which are really pointers.

Special-Student-7-Hypothesizer
What we are saying is that such a jumbling does not work: the distinction has to be clearly made.

Special-Student-7-Rebutter
Certainly, naming is somewhat arbitrary and people may have different tastes for naming, but at least, the demarcation of important concepts must be made, which is what we are claiming.

Special-Student-7-Hypothesizer
Any pointer is different from any reference variable in 2 respects. First, the pointer is a box allocated at an empty lot, while the reference variable is a box defined over an existing datum. Second, the contents of the pointer are an address of a datum, while the contents of the reference variable are a datum.

The distinction is very important, because without the correct knowledge of it, you cannot do appropriate programming.


6: What "Pass-by-Reference" Means, Compared with "Pass-by-Value", an Odd Term


Special-Student-7-Hypothesizer
"pass-by-reference" is a prevalent term, but what it means is not "pass a reference", but 'pass by the reference mechanism'.

Special-Student-7-Rebutter
"pass a reference" would be odd: any reference variable is not something to be passed, because any reference argument is something newly defined inside the function.

Special-Student-7-Hypothesizer
What is being passed is not the reference, but the datum (which is the value of the reference variable) over which the reference argument is defined.

On the other hand, what does "pass-by-value" mean? . . . That is an odd term, actually.

It is odd, because it should not mean "pass a value", as "pass-by-reference" also passes a value (the datum passed by pass-by-reference is the value of the reference argument), and it should not mean "pass by the value mechanism", because what the hell is a "value mechanism"?

"pass-by-copy" will be a better term.

In pass-by-copy, the value is copied into the function argument.

When the function argument is a pointer, the value is passed by pass-by-copy, because the pointer takes an address, the address is copied into the pointer, and the address is nothing but the value of the pointer.

What happens for Java and Python function arguments is exactly that, and that is the reason why Java and Python variables must be called pointers.

