2018-11-18

1: Datum, Variable, Expression, Value in C++ or Any Programming Language

<The previous article in this series | The table of contents of this series | The next article in this series>

Regrettably, those terms are widely used sloppily, and such usage has ruined many descriptions that would otherwise have been intelligible.

Topics


About: C++

Starting Context


  • The reader has a basic knowledge of C++, even if he or she does not accurately understand some of its widely misrepresented elements.

Target Context


  • The reader will distinctly understand what 'datum', 'variable', 'expression', and 'value' are in C++ or any other programming language.

Orientation


Hypothesizer 7
Are such things as 'datum', 'variable', 'expression', and 'value' too obvious? . . . In reality, those terms are not used in the world with each clearly distinguished from the others, and that is a cause of many unreasonable explanations about C++ or any other programming language.

For example, I frequently see explanations like "'C++ reference' is an alias for a variable". . . . Really? . . . Let me think of an example (an example of a typical usage of 'reference', not an unlikely nit-picking one).

@C++ Source Code
#include <iostream>

void referenceArgumentFunction (int const & a_integerReference) {
	::std::cout << "### The argument value is " << a_integerReference << "." << ::std::endl << ::std::flush;
}

int main (int a_argumentsNumber, char const * a_arguments []) {
	referenceArgumentFunction (2 * 3);
}

Which is the variable for which the reference ('a_integerReference') is an alias? . . . '2 * 3'? Is '2 * 3' officially a variable? . . . If so, I demand the definition of 'variable' on which the statement is based.

Well, why don't we use terms more carefully? . . . One of my primary opinions is that any adequate description is based on an adequate terminology, and in any adequate terminology, each term has to represent a single distinct concept.

Of course, I know that most words in human languages actually have several meanings each, but in my opinion, that is because such a word is used as a term in multiple domains, and within each domain, the term still has to have a single meaning (I am saying that that is a target to be aimed at, not that I always manage to hit the bull's-eye). For example, 'variable' has several meanings (in the domain of meteorology, 'variable' seems to mean a variable wind as contrasted with 'trade wind'), but has to represent a single concept in the domain of programming languages.

I also insist that misnomers should be avoided.

For example, one of my pet peeves is "serverless". Should one call something "serverless" when it is not serverless at all? . . . Certainly, logically speaking, if a term is defined clearly and used consistently, any misunderstanding can be avoided, whatever the name is (any name is intrinsically arbitrary). However, any name has a certain connotation and unavoidably evokes an image that corresponds to the connotation: the name "serverless" evokes (against my will) an image of something serverless, and I have to correct myself that it is not serverless at all, which irritates me every time I encounter the usage of "serverless". . . . In fact, I suspect an intention of fraud in such a term. So, I think that no term should be given a name whose connotation blatantly betrays the concept represented by the term.

In this article, I will distinguish four basic terms, 'datum', 'variable', 'expression', and 'value'.


Main Body


1: A Note


Hypothesizer 7
I know that my definitions of the four terms, 'datum', 'variable', 'expression', and 'value', may not coincide with some people's (for example, some people use "value" to mean what I call 'datum'), and I also know that I am not in any position to impose my definitions on anybody: anyone can make a different terminology.

However, I insist that in any terminology, each term has to represent a single distinct concept and has to be used consistently to mean that concept. As I cannot find any satisfactory widely-used terminology, I have no option but to make one myself: I am not particularly trying to be different.


2: What Is 'Datum'?


Hypothesizer 7
What is 'datum'? . . . Any datum ('memory datum', to be exact) is a single presumed physical in-memory existence of a piece of information.

Please note the traces of painstaking effort in that expression.

First, there can be file data in hard disks, of course, but I am talking about only memory data in this article, and I use 'datum' meaning 'memory datum' in this article (seeing 'memory datum' every time would be annoying, right?).

Second, I am not at all interested in whether a bit sequence is really stored in a stack or lives only in a CPU register (or is totally eliminated as unnecessary): that is a matter of the optimization by each compiler implementation, which I do not think is a matter that programmers have to be concerned with. So, I presume that any so-called "temporary object" is stored in a stack (so, any "temporary object" is a datum), although I know that that is not guaranteed. That is the meaning of "presumed" in the expression above.

Third, any datum is a single physical existence, meaning that if a piece of information has two copies in the memory, those copies are two data.
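
As a minimal sketch of that third point (the function name, 'copiesAreDistinctData', is my hypothetical one, introduced only for this illustration), two copies of the same piece of information occupy two areas, so they count as two data in my terminology:

```cpp
#include <cassert>

// Hypothetical helper: returns true if two copies of the same information are held in two separate areas.
bool copiesAreDistinctData () {
	int l_original = 6;
	int l_copy = l_original; // the same information, copied into another area
	// The information is the same, but the two areas (so, the two data) are different.
	return (l_original == l_copy) && (&l_original != &l_copy);
}
```

Taking the addresses here is only a way to make the two areas observable; the point is that equality of the information does not make the two data one datum.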


3: What Is 'Variable'?


Hypothesizer 7
What is 'variable'? . . . Any variable is a named box that can contain one datum at a time.

Note that I used the word 'named'. Any area that accommodates a datum can be called a box, but it is not necessarily a variable because the box may not be named.
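
The following is a sketch of a box that is not a variable, under my definitions (the function name, 'readFromUnnamedBox', is hypothetical): the area allocated by 'new' accommodates a datum but has no name, while 'l_integerPointer' is a named box that contains the address of that area.

```cpp
#include <cassert>

// Hypothetical helper: puts data into an unnamed box and reads the last datum back.
int readFromUnnamedBox () {
	// 'l_integerPointer' is a variable; the area allocated by 'new' is a box without any name.
	int * l_integerPointer = new int (6);
	*l_integerPointer = 7; // the unnamed box now contains another datum
	int l_result = *l_integerPointer;
	delete l_integerPointer;
	return l_result;
}
```

The unnamed box can be reached only through an expression like "*l_integerPointer", never by a name of its own.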


4: What Is 'Expression'?


Hypothesizer 7
What is 'expression'? . . . Any expression is a string in a source file that represents a datum at run time.

For example, these are expressions.

@C++ Source Code
#include <iostream>

void function () {
	int l_integer = 0;
	int * l_integerPointer = &l_integer;
	int & l_integerReference = l_integer;
	
	::std::cout << "### " << l_integer << ::std::endl << ::std::flush; // "l_integer" is an expression.
	::std::cout << "### " << l_integerPointer << ::std::endl << ::std::flush; // "l_integerPointer" is an expression.
	::std::cout << "### " << *l_integerPointer << ::std::endl << ::std::flush; // "*l_integerPointer" is an expression.
	::std::cout << "### " << l_integerReference << ::std::endl << ::std::flush; // "l_integerReference" is an expression.
	::std::cout << "### " << &l_integerReference << ::std::endl << ::std::flush; // "&l_integerReference" is an expression.
	::std::cout << "### " << l_integer + 1 << ::std::endl << ::std::flush; // "l_integer + 1" is an expression.
	::std::cout << "### " << 2 * 3 << ::std::endl << ::std::flush; // "2 * 3" is an expression.
}

Note that any variable name can constitute an expression by itself because that represents the datum contained in the variable; any string that creates a so-called "temporary object" is an expression because the "temporary object" is a datum in my definition, and the string represents the datum.


5: What Is 'Value'?


Hypothesizer 7
What is 'value'? . . . Any value is the datum represented by an expression. In particular, the datum contained in a variable is the value of the variable.

Note that talking about a value without specifying the expression whose value it is makes no sense: any value is the value of an expression.

So, any value is a datum, but any datum mentioned without being related to any expression has to be called 'datum', not 'value'.
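
As a small illustration of why a value has to be tied to an expression (this is only my sketch, and the function name, 'valuesOfSameExpression', is hypothetical), the same expression, "l_integer + 1", represents different data at different points of a run:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: evaluates the same expression, "l_integer + 1", at two points of a run.
::std::pair <int, int> valuesOfSameExpression () {
	int l_integer = 1;
	int l_firstValue = l_integer + 1; // here, the value of "l_integer + 1" is the datum, '2'
	l_integer = 5;
	int l_secondValue = l_integer + 1; // here, the value of the same expression is the datum, '6'
	return ::std::make_pair (l_firstValue, l_secondValue);
}
```

The expression is one string in the source file, while its values are separate data created at run time.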


6: Some Examples of 'Datum', 'Variable', 'Expression', and 'Value'


Hypothesizer 7
Let me look at this code.

@C++ Source Code
void function () {
	int l_integer = 1 * 2 * 3;
	int * l_integerPointer = &l_integer;
	int & l_integerReference = l_integer;
}

'1', '2', '3', '2' (created by '1 * 2'), '6' (created by '1 * 2 * 3'), and the address of '6', which are presumed to be stored in the memory at run time, are data. Note that the datum, '1', may not really be stored in the memory but only in a CPU register, but that is a matter the compiler should be concerned with (the compiler can place anything anywhere as long as that does not break the promised behavior of the program), not a matter the programmer should be concerned with, in my opinion. Also note that the two '2's are different data (again, I am not concerned with whether the compiler really allocates two areas for the two '2's: that is a matter of the optimization).

'l_integer', 'l_integerPointer', and 'l_integerReference' are variables.

"1", "2", "3", "1 * 2", "1 * 2 * 3", "&l_integer", and "l_integer" (in the third line), which are strings in the source file, are expressions. Note that the expressions exist in the source file, in contrast to the data, which exist at run time.

The values of the expressions, "1", "2", "3", "1 * 2", "1 * 2 * 3", "&l_integer", and "l_integer", are the data, '1', '2', '3', '2' (a different datum from the previous '2'), '6', the address of '6', and '6' (the same datum as the previous '6'), respectively.


7: The Distinction Between 'Datum Type' and 'Variable Type'


Hypothesizer 7
As the terms literally say, 'datum type' is the type of a datum and 'variable type' is the type of a variable, and the two are different things.

I have expressly said so because I frequently see descriptions that use those terms confusingly.

For example, this document about UNO types claims to be about datum types, but is 'any' really a datum type? . . . I do not think so. 'any' is a variable type, not a datum type: there is no such thing as a datum of the 'any' type, but there is an 'any' type variable (a box) that can contain any datum (the datum contained in the variable is sometimes an 'int' datum, sometimes a 'string' datum, but never an 'any' datum).


8: 'Pointer' and 'Reference' Are Variable Types, Not Datum Types


Hypothesizer 7
Being a normal variable (I mean a variable that is neither a pointer nor a reference), a pointer, or a reference is a property of a variable, not of any datum.

Let me look at this example.

@C++ Source Code
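class ClassA {
	protected:
		int i_memberA;
	public:
		ClassA (int a_memberA) : i_memberA (a_memberA) {
		}
		// A virtual destructor is required because 'function' below deletes a 'ClassB' datum via a 'ClassA *' variable.
		virtual ~ClassA () = default;
};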
class ClassB : public ClassA {
	public:
		ClassB (int a_memberA) : ClassA (a_memberA) {
		}
};

void function () {
	ClassA * l_classAPointer = new ClassB (2 * 3);
	ClassA & l_classAReference = *l_classAPointer;
	delete l_classAPointer;
}

One will understand that a question like "Is 'ClassB' a reference type or a pointer type?" is nonsense: 'ClassB' is a datum type, while 'ClassA *' is a pointer type and 'ClassA &' is a reference type.
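
The distinction can be observed at run time, as in this minimal sketch (the function names, 'pointsToClassBDatum' and 'variableTypeDiffersFromDatumType', are hypothetical, and the classes are stripped-down stand-ins for the ones above, given a virtual destructor so that 'typeid' inspects the pointed datum):

```cpp
#include <cassert>
#include <typeinfo>

class ClassA {
	public:
		virtual ~ClassA () = default; // makes the class polymorphic
};
class ClassB : public ClassA {
};

// Hypothetical helper: true if the datum pointed to by the 'ClassA *' variable is a 'ClassB' datum.
bool pointsToClassBDatum (ClassA const * a_classAPointer) {
	return typeid (*a_classAPointer) == typeid (ClassB);
}

// Hypothetical helper: checks the variable types and the datum type separately.
bool variableTypeDiffersFromDatumType () {
	ClassB l_classB;
	ClassA * l_classAPointer = &l_classB; // the variable type is 'ClassA *'
	ClassA & l_classAReference = l_classB; // the variable type is 'ClassA &'
	return typeid (l_classAPointer) == typeid (ClassA *) // about the variable
		&& pointsToClassBDatum (l_classAPointer) // about the datum
		&& typeid (l_classAReference) == typeid (ClassB); // about the datum
}
```

The pointer and the reference are declared with 'ClassA', while the datum they lead to stays a 'ClassB' datum throughout.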

Then, is a statement like "Classes in Java are reference types" not nonsense? . . . Hmm, that is at least a very weird expression. Here, I try to make more reasonable statements, but note that I use the C++ terms, 'pointer' and 'reference', unless explicitly stated otherwise: "reference" in the Java terminology is really 'pointer', not 'reference', in the C++ terminology.

More reasonably speaking, in Java, a class can be a datum type (not every class can be, because no abstract class can be instantiated), but no class can be a variable type, although the type of the address of a class instance can be a variable type.

In other words, Java does not allow any normal class instance variable or any class instance reference variable, but only class instance pointer variables.

Strictly speaking, classes in Java are not reference types (whatever that means), but are datum types for whose instances Java allows only reference (in the Java terminology) variables.

Let me look at this example.

@Java Source Code
class ClassA {
	protected int i_memberA;
	
	public ClassA (int a_memberA) {
		i_memberA = a_memberA;
	}
}

class ClassB extends ClassA {
	public ClassB (int a_memberA) {
		super (a_memberA);
	}
}

class ClassC {
	public void function () {
		ClassA l_classAPointer = new ClassB (2 * 3);
	}
}

Is the class, 'ClassA', not a reference (in the Java terminology) type? . . . It is not. Although the Java syntax is the cause of the confusion, "ClassA l_classAPointer" is really 'ClassA * l_classAPointer', and 'ClassA *' is the real variable type; it is just that 'ClassA * l_classAPointer' is written as 'ClassA l_classAPointer' in Java, syntactically, omitting the obvious '*' (obvious because there is no option but to put '*' there in Java).
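
To see why 'pointer' is the fitting C++ term, here is a sketch in C++ of what a Java assignment between two class-typed variables does (the function name, 'assignLikeJava', and the stripped-down 'ClassA' with a public member are my hypothetical ones for this illustration): the assignment copies the address, not the 'ClassA' datum, and the variable can later be made to point to nothing, which a C++ reference cannot do.

```cpp
#include <cassert>

class ClassA {
	public:
		int i_memberA;
		explicit ClassA (int a_memberA) : i_memberA (a_memberA) {
		}
};

// Hypothetical helper that mimics Java's 'ClassA l_second = l_first; l_second.i_memberA = 7;'.
int assignLikeJava () {
	ClassA * l_first = new ClassA (2 * 3);
	ClassA * l_second = l_first; // copies the address datum, not the 'ClassA' datum
	l_second->i_memberA = 7; // changes the single 'ClassA' datum that both variables point to
	int l_result = l_first->i_memberA; // the change is seen via the first variable
	l_second = nullptr; // like assigning 'null' in Java: the variable now points to nothing
	delete l_first; // Java would leave this to the garbage collector
	return l_result;
}
```

The returned number is the one written via 'l_second', because there was only ever one 'ClassA' datum.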


9: The Conclusion and Beyond


Hypothesizer 7
Now, I distinctly understand what 'datum', 'variable', 'expression', and 'value' are.

So, what is 'reference' (in 'Orientation', I explained why an explanation like "'C++ reference' is an alias for a variable" is unacceptable)? I will take that on in the next article of this series.

Related to the term, 'value', what are "lvalue" and "rvalue"? . . . Well, I suspect that they are misnomers (I am talking not only about the "left" or "right" part, but also about the "value" part). Are "lvalue" and "rvalue" really values? I will look into what "lvalue" and "rvalue" are in a future article.

