One-Shot Learning of C++ r-value, &&, and Move

Written by holmeshe | Published 2017/09/07
Tech Story Tags: programming | cplusplus | valuer | move | reverse-engineering


C++ is hard, and the newer versions are even harder. This article deals with some of the hard parts of C++: rvalues, rvalue references (**&&**) and move semantics. I am going to reverse engineer (not a metaphor) these complex and correlated topics, so you can understand them in one shot.

Firstly, let’s examine

What is an rvalue?

Roughly speaking, an **r**value is an expression that can only appear on the right side of an assignment: it is a temporary value rather than a named storage location.

Example:

int var;       // too much JavaScript recently :)
var = 8;       // OK! lvalue (yes, there is an lvalue) on the left
8 = var;       // ERROR! rvalue on the left
(var + 1) = 8; // ERROR! rvalue on the left

Simple enough. Now let’s look at a more subtle case, **r**values returned by functions:

#include <stdio.h>

int g_var = 8;

int& returnALvalue() {
    return g_var; // here we return an lvalue
}

int returnARvalue() {
    return g_var; // here we return an rvalue
}

int main() {
    printf("%d", returnALvalue()++); // prints 8, then g_var becomes 9
    printf("%d", returnARvalue());   // prints 9
}

Result:

89

It is worth noting that returning an lvalue the way the example does (a non-const reference to a global variable) is generally considered bad practice, so avoid it in real-world programming.

Beyond the theoretical level

Whether an expression is an rvalue made a difference in real programming even before **&&** was invented.

For example, this line

const int& var = 8;

compiles fine, while this:

int& var = 8; // use an lvalue reference for an rvalue

generates the following error:

rvalue.cc:24:6: error: non-const lvalue reference to type 'int' cannot bind to a temporary of type 'int'

The error message means that an rvalue can bind to an lvalue reference only if that reference is const.

A more interesting example:

#include <stdio.h>
#include <string>

void print(const std::string& name) {
    printf("rvalue detected:%s\n", name.c_str());
}

void print(std::string& name) {
    printf("lvalue detected:%s\n", name.c_str());
}

int main() {
    std::string name = "lvalue";
    std::string rvalu = "rvalu";

    print(name);        // compiler can detect the right function for lvalue
    print(rvalu + "e"); // likewise for rvalue
}

Result:

lvalue detected:lvalue
rvalue detected:rvalue

The difference is significant enough that the compiler can use it to choose between overloaded functions.

So is an rvalue a constant value?

Not exactly. And this is where **&&** (rvalue reference) comes in.

Example:

#include <stdio.h>
#include <string>

void print(const std::string& name) {
    printf("const value detected:%s\n", name.c_str());
}

void print(std::string& name) {
    printf("lvalue detected:%s\n", name.c_str());
}

void print(std::string&& name) {
    printf("rvalue detected:%s\n", name.c_str());
}

int main() {
    std::string name = "lvalue";
    const std::string cname = "cvalue";
    std::string rvalu = "rvalu";

    print(name);
    print(cname);
    print(rvalu + "e");
}

Result:

lvalue detected:lvalue
const value detected:cvalue
rvalue detected:rvalue

If the functions are overloaded for rvalues, an rvalue argument chooses the more specific version (the one taking **&&**) over the version that takes a const reference parameter, which is compatible with both. Thus, **&&** can further distinguish rvalues from const values.

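Below I summarize which kinds of arguments each overload accepts by default:

  1. print(std::string&) accepts only non-const lvalues.
  2. print(const std::string&) accepts non-const lvalues, const lvalues and rvalues.
  3. print(std::string&&) accepts only (non-const) rvalues.

You can verify the result by selectively commenting out lines in the example above.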

It sounds cool to further differentiate rvalues from constant values, since they are indeed not exactly the same. But what is the practical value?

What problem does && solve exactly?

The problem is the unnecessary deep copy when the argument is an rvalue.

To be more specific: the **&&** notation marks a parameter as an rvalue reference, which can be used to avoid the deep copy when the rvalue 1) is passed as an argument to either a constructor or an assignment operator, and 2) belongs to a class that contains a pointer (or pointers) referring to a dynamically allocated resource (memory).

An example makes this concrete:

#include <stdio.h>
#include <string>
#include <algorithm>

using namespace std;

class ResourceOwner {
public:
  ResourceOwner(const char res[]) {
    theResource = new string(res);
  }

  ResourceOwner(const ResourceOwner& other) {
    printf("copy %s\n", other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }

  ResourceOwner& operator=(const ResourceOwner& other) {
    ResourceOwner tmp(other);
    swap(theResource, tmp.theResource);
    printf("assign %s\n", other.theResource->c_str());
    return *this;
  }

  ~ResourceOwner() {
    if (theResource) {
      printf("destructor %s\n", theResource->c_str());
      delete theResource;
    }
  }

private:
  string* theResource;
};

void testCopy() { // case 1
  printf("=====start testCopy()=====\n");

  ResourceOwner res1("res1");
  ResourceOwner res2 = res1; // copy res1

  printf("=====destructors for stack vars, ignore=====\n");
}

void testAssign() { // case 2
  printf("=====start testAssign()=====\n");

  ResourceOwner res1("res1");
  ResourceOwner res2("res2");
  res2 = res1; // copy res1, assign res1, destructor res2

  printf("=====destructors for stack vars, ignore=====\n");
}

void testRValue() { // case 3
  printf("=====start testRValue()=====\n");

  ResourceOwner res2("res2");
  res2 = ResourceOwner("res1"); // copy res1, assign res1, destructor res2, destructor res1

  printf("=====destructors for stack vars, ignore=====\n");
}

int main() {
  testCopy();
  testAssign();
  testRValue();
}

Result:

=====start testCopy()=====
copy res1
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testAssign()=====
copy res1
assign res1
destructor res2
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testRValue()=====
copy res1
assign res1
destructor res2
destructor res1
=====destructors for stack vars, ignore=====
destructor res1

The results are all good for the first two test cases, i.e., testCopy() and testAssign(), in which the resource in res1 is copied for res2. It is reasonable to copy the resource because the two objects are separate entities and each needs its own unshared resource (a string).

However, in the third case, the (deep) copy of the resource in res1 is superfluous, because the anonymous rvalue (returned by ResourceOwner("res1")) is destroyed right after the assignment and thus does not need the resource anymore:

res2 = ResourceOwner("res1");

Please note that “destructor res1” is printed right after this line, before the point where the stack variables are destroyed.

This is a good point to repeat the problem statement:

The **&&** notation marks a parameter as an rvalue reference, which can be used to avoid the deep copy when the rvalue 1) is passed as an argument to either a constructor or an assignment operator, and 2) belongs to a class that contains a pointer (or pointers) referring to a dynamically allocated resource (memory).

If copying of a resource that is about to disappear is not optimal, what is the right operation then? The answer is

Move

The idea is pretty straightforward: if the argument is an rvalue, we do not need to copy. Rather, we can simply “move” the resource (that is, the memory the rvalue points to). Now let’s overload the assignment operator using the new technique:

ResourceOwner& operator=(ResourceOwner&& other) {
  if (this != &other) {
    delete theResource; // release the resource we currently own
    theResource = other.theResource;
    other.theResource = NULL;
  }
  return *this;
}

This new assignment operator is called a move assignment operator. And a move constructor can be programmed in a similar way.

A good way of understanding this is: when you sell your old property and move to a new house, you do not have to toss all the furniture and buy it again, as we did in case 3, right? Rather, you can simply move the furniture to the new home.

All good.

What is std::move?

Besides the move assignment operator and move constructor discussed above, there is one last missing piece in this puzzle, std::move.

Again, we look at the problem first:

When 1) we know a variable is in fact an rvalue, but 2) the compiler does not, the right version of the overloaded functions cannot be called.

A common case is when we add another layer on top of the resource owner, ResourceHolder. The relation of the three entities is given below:

holder
  |
  |----->owner
           |
           |----->resource

(N.b., in the following example, I complete the implementation of ResourceOwner’s move constructor as well)

Example:

#include <stdio.h>
#include <string>
#include <algorithm>

using namespace std;

class ResourceOwner {
public:
  ResourceOwner(const char res[]) {
    theResource = new string(res);
  }

  ResourceOwner(const ResourceOwner& other) {
    printf("copy %s\n", other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }

  // ++ added: move constructor
  ResourceOwner(ResourceOwner&& other) {
    printf("move cons %s\n", other.theResource->c_str());
    theResource = other.theResource;
    other.theResource = NULL;
  }

  ResourceOwner& operator=(const ResourceOwner& other) {
    ResourceOwner tmp(other);
    swap(theResource, tmp.theResource);
    printf("assign %s\n", other.theResource->c_str());
    return *this;
  }

  // ++ added: move assignment operator
  ResourceOwner& operator=(ResourceOwner&& other) {
    printf("move assign %s\n", other.theResource->c_str());
    if (this != &other) {
      delete theResource; // release the resource we currently own
      theResource = other.theResource;
      other.theResource = NULL;
    }
    return *this;
  }

  ~ResourceOwner() {
    if (theResource) {
      printf("destructor %s\n", theResource->c_str());
      delete theResource;
    }
  }

private:
  string* theResource;
};

class ResourceHolder {

……

  ResourceHolder& operator=(ResourceHolder&& other) {
    printf("holder move assign\n");
    resOwner = other.resOwner;
    return *this;
  }

……

private:
  ResourceOwner resOwner;
};

In ResourceHolder’s move assignment operator, we want to call ResourceOwner’s move assignment operator, since “a non-pointer member of an rvalue should be an rvalue too”. However, a named parameter such as other is itself an lvalue inside the function body, even though its type is an rvalue reference. So when we simply write resOwner = other.resOwner, what gets invoked is actually ResourceOwner’s normal (copy) assignment operator, which again incurs the extra copy.

It’s a good chance to repeat the problem statement again:

When 1) we know a variable is in fact an rvalue, but 2) the compiler does not, the right version of the overloaded functions cannot be called.

As a solution, we use std::move to cast the variable to an rvalue, so the right version of ResourceOwner’s assignment operator can be called.

ResourceHolder& operator=(ResourceHolder&& other) {
  printf("holder move assign\n");
  resOwner = std::move(other.resOwner);
  return *this;
}

What is std::move exactly?

We know that a type cast is not always just a compiler placebo telling the compiler “I know what I am doing”. A numeric cast, for example, can generate instructions that move a value into a bigger or smaller register (e.g., %eax->%cl) to conduct the “cast”.

So what exactly does std::move do behind the scenes? I did not know myself when I started writing this paragraph, so let’s find out together.

First we modify main a bit (I tried to make the style consistent):

Example:

int main() {
  ResourceOwner res("res1");
  asm("nop"); // remember me
  ResourceOwner && rvalue = std::move(res);
  asm("nop"); // remember me
}

Compile it, and disassemble the object file using

clang++ -g -c -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Result:

0000000000000000 <_main>:
   0: 55                      push   %rbp
   1: 48 89 e5                mov    %rsp,%rbp
   4: 48 83 ec 20             sub    $0x20,%rsp
   8: 48 8d 7d f0             lea    -0x10(%rbp),%rdi
   c: 48 8d 35 41 03 00 00    lea    0x341(%rip),%rsi   # 354 <GCC_except_table5+0x18>
  13: e8 00 00 00 00          callq  18 <_main+0x18>
  18: 90                      nop    // remember me
  19: 48 8d 75 f0             lea    -0x10(%rbp),%rsi
  1d: 48 89 75 f8             mov    %rsi,-0x8(%rbp)
  21: 48 8b 75 f8             mov    -0x8(%rbp),%rsi
  25: 48 89 75 e8             mov    %rsi,-0x18(%rbp)
  29: 90                      nop    // remember me
  2a: 48 8d 7d f0             lea    -0x10(%rbp),%rdi
  2e: e8 00 00 00 00          callq  33 <_main+0x33>
  33: 31 c0                   xor    %eax,%eax
  35: 48 83 c4 20             add    $0x20,%rsp
  39: 5d                      pop    %rbp
  3a: c3                      retq
  3b: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Here is a brief explanation of what happens between the two nops:

  1. assign the address of one stack variable (presumably ResourceOwner res) to %rsi
  2. assign the value of %rsi to another stack variable (this one is anonymous)
  3. assign the value of the anonymous stack variable back to %rsi (what?)
  4. assign the value of %rsi to yet another stack variable (presumably ResourceOwner && rvalue)
  5. so the whole operation can be summarized as “assigning the address of ResourceOwner res to ResourceOwner && rvalue”, which is the same as a normal reference assignment.

If we turn on optimization (-O1) for the compiler, all those dummy instructions are gone.

clang++ -g -c -O1 -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Moreover, if we change the critical line to a normal reference assignment:

ResourceOwner & rvalue = res;

then, except for some minor differences in variable offsets, the generated assembly code is mostly identical, as we assumed in point 5 above.

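The test shows that std::move itself is pure syntactic sugar: it is a compile-time cast that generates no instructions of its own, and the machine does not care about it at all. The run-time difference of move semantics comes only from which overload (copy or move) the compiler selects as a result of the cast.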

Thanks for coming along, and hope to see you next time.


Published by HackerNoon on 2017/09/07