C++ references - Quality Coders

C++ references are one of the basic and almost mandatory topics to be covered. In this post, we will focus on the most crucial things. The examples will be very concise and easy to follow.

We will work on the example repository, available on my GitHub page. [1] I highly recommend using the C++ environment setup that is described in our previous post. [2]

Very Brief definition

We can think of a reference as an alias for an already existing variable. References are usually considered a safer alternative to pointers. They are not guaranteed to be 100% safe, though (a reference can dangle if the object it refers to is destroyed first), so we still need to be careful when using them.

Unique Features

To get a good fundamental understanding of C++ references, we need to talk about some of their unique features. Let’s list those features:

Cannot be lazily initialized.
Cannot be re-assigned.
They don’t have their own identity.
Cannot have multiple levels of indirection.
They are not iterable.
Cannot be stored in contiguous memory space.

Lazy initialization

Lazy initialization means postponing the moment at which a variable receives its value. A reference must be bound to an object at the point of its declaration. It is a strong constraint, but it also makes code more straightforward, easier to read, and less error-prone.

Usually, it is recommended to use a reference because of that constraint, but lazy initialization has its purpose, for instance when we need to create a complex object that requires multiple steps at different points in time. In contrast to references, pointers can be lazily initialized or even reset, thanks to nullptr.

The code below shows a straightforward example of lazy initialization. It can be found in our example repository in the file lazy_initialization.cpp. The ptr is initialized with nullptr, meaning it does not point to any object yet, so we are postponing its real initialization. The next line contains commented-out code that would result in a compilation error, because every reference must be bound to an object when it is declared. That means we cannot lazily initialize a reference as we did with ptr. Line 9 of the file shows the valid way of creating a reference variable named ref. In line 10 we finally initialize ptr with the address of value, the same variable that ref refers to.

After printing to the console, we reset the ptr variable by assigning nullptr, which means we can reuse it later in a different context. This cannot be done with a reference. Once it is bound to a given object, it stays that way for its whole lifetime.

#include <iostream>

int main()
{
    int value{5};

    int *ptr{nullptr};
    // int &invalidRef; //compilation error
    int &ref{value};
    ptr = &value;

    std::cout << "ref: " << ref << std::endl;
    std::cout << "ptr: " << *ptr << std::endl;
    ptr = nullptr;

    return 0;
}

Re-assignment

The re-assignment constraint means that we cannot change which variable a reference refers to. Once a reference is bound to ObjectA, we cannot make it refer to ObjectB.

To illustrate this rule we will use re_assignment.cpp. The code from that file is pasted below, so let’s walk through it. In contrast to the previous example we need an actual structure, hence we defined Point. It contains the data members x and y, which represent coordinates, and a helper method called describe, which prints the state of the structure.

The main function consists of two parts. The first one shows the assignment, and the second one, separated by “=” symbols, represents the re-assignment. At the very beginning of main, we initialize two variables representing the points (lines 16-17). Later on, we proceed with the assignment (lines 19-20). Next, the program multiplies the x and y data members of the point by 10, using the pointer and the reference respectively. The expected values for point1 are x=50 and y=50.

In the second part of the program, we do the re-assignment to point2, whose data members both have the base value 6. We might therefore expect point2 to end up with x=60 and y=60 and point1 to stay unchanged, but the actual results are different. Let’s check the program console output after the listing.

#include <iostream>

struct Point
{
    int x{0};
    int y{0};

    void describe(const char *id) const
    {
        std::cout << id << " x: " << x << " y: " << y << std::endl;
    }
};

int main()
{
    Point point1{5, 5};
    Point point2{6, 6};

    Point *ptr{&point1};
    Point &ref{point1};

    ptr->x *= 10;
    ref.y *= 10;

    point1.describe("point1");
    point2.describe("point2");
    ref.describe("ref");
    ptr->describe("ptr");
    std::cout << "====================" << std::endl;

    ptr = &point2;
    ref = point2;

    ptr->x *= 10;
    ref.y *= 10;

    point1.describe("point1");
    point2.describe("point2");
    ref.describe("ref");
    ptr->describe("ptr");

    return 0;
}

The output of the first part is as expected. The second part differs from our expectations. The program did not change the y data member of point2, because ref was still referring to point1 in the second part of the program. Once a reference is initialized, we cannot change the object it refers to. The statement ref = point2 therefore did not rebind the reference. Instead, it invoked the copy assignment operator and copied the values of point2 into point1, so the point1 data members changed from 50 to 6. Next, we multiplied by 10 through ref, which changed the y data member of point1 to 60 and left the y data member of point2 intact. The pointer, in contrast, really was re-pointed to point2, which is why the x data member of point2 became 60.

point1 x: 50 y: 50
point2 x: 6 y: 6
ref x: 50 y: 50
ptr x: 50 y: 50
====================
point1 x: 6 y: 60
point2 x: 60 y: 6
ref x: 6 y: 60
ptr x: 60 y: 6

Own identity

Own identity is probably the vaguest term on the list. It means that if we use the sizeof or & operator on a reference, we receive the size and address of the object it refers to instead of those of the reference itself.

The following example is a simplified version of the previous one. The code is placed in the identity.cpp file in the example repository. The most crucial part is printing to the console. We can see 3 lines using std::cout. Those prints do the following:

Print the size and address of the pointer variable itself
Dereference the pointer and print the size and address of the underlying Point variable
Print the size and address obtained through the reference variable

#include <iostream>
#include <cstdint>

struct Point
{
    std::uint64_t x{0};
    std::uint64_t y{0};
};

int main()
{
    Point point1{5, 5};

    Point *ptr{&point1};
    Point &ref{point1};

    std::cout << "ptr size: " << sizeof(ptr) << " address: " << &ptr << std::endl;
    std::cout << "dereferenced ptr size: " << sizeof(*ptr) << " address: " << &*ptr << std::endl;
    std::cout << "ref size: " << sizeof(ref) << " address: " << &ref << std::endl;

    return 0;
}

Let’s take a look at the program console output. The first print shows the size of the pointer variable itself. A pointer is an actual data type that has its own size and memory address. On my system, the pointer size is 8 bytes.

The dereferenced pointer and the reference both show the size and the address of the underlying variable. Hence their sizes and addresses are identical.

That small detail is the real reason behind the statement that a reference does not have its own identity. Through a reference we can only get information about the underlying object and not about the reference itself, while for a pointer we can do both: get the size and address of the pointer itself as well as those of the underlying object.

ptr size: 8 address: 0x7ffe3b128d60
dereferenced ptr size: 16 address: 0x7ffe3b128d68
ref size: 16 address: 0x7ffe3b128d68

It’s worth mentioning the following quote from cppreference, as it provides implementation details on references from the compiler’s perspective:

References are not objects; they do not necessarily occupy storage, although the compiler may allocate storage if it is necessary to implement the desired semantics [https://en.cppreference.com/w/cpp/language/reference]

Multiple levels of indirection

For a pointer, multiple levels of indirection mean, for instance: int*** arr;. With references such nesting is not possible, because a reference to a reference cannot be declared directly. When one arises through a type alias or a template, reference collapsing reduces it to a single reference. [4] Let’s take a look at a slightly modified version of the code from cppreference that explains what reference collapsing is.

The code example below is available in our repository in the file reference_collapsing.cpp. In the first code block we can see the declaration of two helper types, lref (lvalue reference) and rref (rvalue reference), and the initialization of an integer variable named n.

The comments in the next code block describe the final C++ reference type of each variable. As we can see, we can achieve only one level of indirection, since lvalue and rvalue references each count as one level. The two kinds differ in what they can bind to: a non-const lvalue reference binds to an lvalue such as n, while an rvalue reference binds to an rvalue such as the literal 1 and extends the lifetime of the temporary it binds to.

#include <iostream>

int main()
{
    using lref = int &;
    using rref = int &&;
    int n{0};

    lref &r1 = n;  // type of r1 is int&
    lref &&r2 = n; // type of r2 is int&
    rref &r3 = n;  // type of r3 is int&
    rref &&r4 = 1; // type of r4 is int&&

    std::cout << "r1: " << r1 << std::endl;
    std::cout << "r2: " << r2 << std::endl;
    std::cout << "r3: " << r3 << std::endl;
    std::cout << "r4: " << r4 << std::endl;

    return 0;
}

The table below summarizes the previous code. The conclusion is simple: collapsing yields an rvalue reference only when both references are rvalue references. In all other cases the result is an lvalue reference.

	lvalue reference (&)	rvalue reference (&&)
lvalue reference (&)	lvalue reference (&)	lvalue reference (&)
rvalue reference (&&)	lvalue reference (&)	rvalue reference (&&)

Iteration support

The code below can be found in the iterable.cpp file. References do not support iteration. In contrast, pointers do: we can use the increment (++) and decrement (--) operators on them. Doing so moves the pointer by the size in bytes of the type it points to, that is, to the next or the previous element.

In the example below we have created a C-style array of size 3 and initialized it with some values. Next, we have created a pointer that points to the beginning of that array. The next block of code shows the iteration process, which consists of printing the value and incrementing the pointer variable. Such iteration is possible only for pointers and not for a reference type.

#include <iostream>

int main()
{
    constexpr int size{3};
    int arr[size] = {10, 20, 30};
    int *ptr = arr;

    for (int i{0}; i < size; i++)
    {
        std::cout << i << ": " << *ptr << std::endl;
        ptr++;
    }

    return 0;
}

Storing in contiguous memory space

A plain C++ reference cannot be stored in a contiguous memory space, but pointers can. The code from the example below can be found in the contiguous_memory_space.cpp file.

In the first lines, we initialize two integer helper variables that will be used inside the vectors. The vec1 variable can be created without any surprise, but line 11, if uncommented, gives a compilation error when we try to store references. As we already know, references are not objects, might or might not occupy memory, and don’t have their own identity, so a plain C++ reference cannot be stored that way.

In order to store references in a contiguous memory space we need to use a wrapper. The C++ standard library provides such a utility in the functional header: std::reference_wrapper. [5] It has its limitations. For instance, just like a plain reference, it must be given an object at initialization. On the other hand, it meets the requirements the vector container places on its elements, such as being copy constructible and copy assignable.

#include <iostream>
#include <vector>
#include <functional>

int main()
{
    int value1{1};
    int value2{2};

    std::vector<int *> vec1 = {&value1, &value2};
    // std::vector<int &> vec2 = {value1, value2}; // compilation error
    std::vector<std::reference_wrapper<int>> vec3 = {value1, value2};

    return 0;
}

Summary

This article covered the most crucial differences between C++ references and pointers: lazy initialization, re-assignment, identity, levels of indirection, iteration, and storage in contiguous memory space. Additionally, we briefly described lvalue and rvalue references and the reference collapsing mechanism. We have also shown how to store references in a contiguous memory space by using std::reference_wrapper.

I think it is important to have good fundamental knowledge, as all advanced topics rely on it to a great extent. Now we have a good base to talk about more advanced topics.