Some of you may be tired of this topic already, but our software engineers picked 7 examples and tried to explain their behavior using the Standard (the latest draft at the time of writing):
struct A {
    int data_mem;
    void non_static_mem_fn() {}
    static void static_mem_fn() {}
};
void foo(int) {}
A* p{nullptr};
/*1*/ *p;
/*2*/ foo((*p, 5));
/*3*/ A a{*p};
/*4*/ p->data_mem;
/*5*/ int b{p->data_mem};
/*6*/ p->non_static_mem_fn();
/*7*/ p->static_mem_fn();
One obvious yet important point is that p initialized with a null pointer can’t point to a valid object because it’s “distinguishable from every other value of object pointer” (conv.ptr#1).
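As a small illustration of that point (a sketch of ours, not one of the seven examples), the comparisons below show a null pointer comparing unequal to a pointer to an actual object. The helper name null_differs_from_object is hypothetical, and nothing here performs indirection through the null pointer.

```cpp
struct A {
    int data_mem;
};

// Hypothetical helper: compares a null pointer with a pointer to a real object.
constexpr bool null_differs_from_object() {
    A a{};               // an actual object
    A* null_p{nullptr};  // null pointer value, like p in the examples
    A* obj_p{&a};        // points to the object above
    return null_p == nullptr && obj_p != nullptr && null_p != obj_p;
}

static_assert(null_differs_from_object(),
              "a null pointer never equals a pointer to an object");
```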
*p;
It’s an expression statement with *p being a discarded-value expression which needs to be evaluated nevertheless (stmt.expr#1). By definition (expr.unary.op#1), the unary operator * “performs indirection,” and the result is
“an lvalue referring to the object or function to which the expression points.”
The semantics are clear, but it is not clear whether there’s a precondition that an object has to exist. A null pointer is not mentioned there even once.
One could try to conclude that the behavior is undefined from the fact that the operator performs indirection, because basic.stc#4 says that
“indirection through an invalid pointer value . . . have undefined behavior”.
However, that exact paragraph contains the definition of an invalid pointer value and refers to basic.compound#3.4, where a null pointer value and an invalid pointer value are listed as different values of a pointer.
There’s also a note in dcl.ref#5 saying that
“the only way to create such a reference would be to bind it to the “object” obtained by indirection through a null pointer, which causes undefined behavior,”
but it’s not clear which part the last clause is referring to. If it refers to “to bind it,” then binding to a non-existent object is undefined behavior, which is in line with the normative text of that paragraph.
Since the Standard leaves room for interpretation instead of being clear on this particular topic, let’s turn to the core language issues list, where the Core Working Group refines the wording of the Standard, among other things. There’s a dedicated issue for our topic, where CWG came to an informal consensus (that’s how “drafting” status is defined) that
“p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.”
If the “informal consensus” doesn’t sound good enough, there’s another issue dedicated to example 7, where CWG says it should be allowed for that exact reason.
We’ll take this consensus into account in what follows. If a future Standard prohibits indirection through a null pointer as C does (N2176, 6.5.3.2 and footnote 104), then all the examples will contain undefined behavior.
foo((*p, 5));
In order to call foo(), the parameter needs to be initialized, which leads to the evaluation of the comma operator. Its operands are evaluated from left to right, and all of them except the rightmost are discarded-value expressions (expr.comma#1). So this example is well-formed, too.
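The comma operator’s behavior can be observed safely with a valid pointer. The sketch below is ours and deliberately avoids a null pointer. The names echo, deref_and_count, CommaResult and comma_demo are hypothetical helpers. It shows that the left operand is evaluated, its result is discarded, and the rightmost operand becomes the argument.

```cpp
// Hypothetical stand-in for foo() that returns its argument so it can be checked.
constexpr int echo(int x) { return x; }

// Performs indirection through a valid pointer and records that it ran.
constexpr int& deref_and_count(int* p, int& calls) {
    ++calls;
    return *p;
}

struct CommaResult {
    int argument;    // value that reached echo()
    int left_calls;  // how many times the left operand was evaluated
};

constexpr CommaResult comma_demo() {
    int value = 7;
    int calls = 0;
    // Left operand: evaluated for its side effect, result discarded.
    // Right operand: 5, which initializes the parameter of echo().
    int argument = echo((deref_and_count(&value, calls), 5));
    return {argument, calls};
}

static_assert(comma_demo().argument == 5, "the rightmost operand is the result");
static_assert(comma_demo().left_calls == 1, "the left operand is still evaluated");
```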
A a{*p};
The implicitly declared copy constructor will be picked to initialize a, and its const A& parameter needs to be bound to a valid object in order to call it; otherwise, the behavior is undefined (dcl.ref#5). However, there’s no such object in our case.
p->data_mem;
The expression of this expression statement will be converted to (*(p)).data_mem per expr.ref#2, which designates “the corresponding member subobject of the object designated by the first expression” (expr.ref#6.2).
It’s once again not clear whether there’s a precondition that an object has to exist. Seeing “to refer” and “to designate” used interchangeably in basic.lookup.qual#1 makes it all the more similar to example 1. I’d say that this example is well-formed because of that, but some compilers disagree. See the “Checking with constant expressions” section at the end of this article for more details.
int b{p->data_mem};
Continuing the previous example, we’ll try to initialize an int with the result of the expression instead of discarding it. It needs to be converted to a prvalue, because expressions of this category initialize objects (basic.lval#1.2).
Since the target type is int, the result of the expression will be accessed (conv.lval#3.4), which leads to undefined behavior in our case, because none of the conditions in basic.lval#11 are met.
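Because it is the access that triggers undefined behavior here, the usual defensive pattern is to test the pointer before reading through it. The following is a minimal sketch of ours, not part of the seven examples. The helper read_data_or and the object sample are hypothetical names.

```cpp
struct A {
    int data_mem;
};

// Hypothetical helper: reads data_mem only when p actually points to an object,
// so the lvalue-to-rvalue conversion never happens on a null pointer.
constexpr int read_data_or(const A* p, int fallback) {
    return p != nullptr ? p->data_mem : fallback;
}

constexpr A sample{42};

static_assert(read_data_or(&sample, -1) == 42, "reads the member of a real object");
static_assert(read_data_or(nullptr, -1) == -1, "falls back instead of reading through null");
```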
p->non_static_mem_fn();
class.mfct.non-static#1 reads that
“a non-static member function may be called for an object of its class type, or for an object of a class derived from its class type,”
where “may be” means permission, and not a possibility (ISO Directives Part 2). So the behavior is undefined since there’s no object.
p->static_mem_fn();
As we mentioned in the description of example 1, CWG says that this example is valid code. The only thing to add is that indirection through the expression to the left of -> is performed even when its result is not required (footnote 59).
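If you’d rather not depend on this reading at all, a static member function can be called through the class name, which involves neither a pointer nor an object. The sketch below is ours. The function returns a value only to make the call observable (the original static_mem_fn returns void), and call_without_object is a hypothetical name.

```cpp
struct A {
    // Returns a value only so that the call is observable in this sketch.
    static constexpr int static_mem_fn() { return 7; }
};

// No object and no pointer involved: the qualified name is enough.
constexpr int call_without_object() {
    return A::static_mem_fn();
}

static_assert(call_without_object() == 7, "static member function called without an object");
```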
Since a constant expression can’t rely on undefined behavior (expr.const#5), we can ask compilers for their opinion on our examples. Even though their diagnostics are not ideal, they are at least sometimes right. We edited our examples a bit to fit them into constant expression evaluation and fed them to three popular compilers. We commented out the examples they considered bad, because the diagnostic messages of GCC and MSVC leave a lot to be desired on those particular examples.
The tests themselves can be found on godbolt, and the summary of our results is presented in the table below.
The results make us a bit doubtful about our conclusion on example 6, and even more on example 4. But it’s also interesting to see all of us share the same opinion about the key example 1.
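For readers who want to try the technique themselves, here is a minimal sketch of the idea, modeled on example 5. It is not one of the tests from the godbolt link. The names read_member and object are hypothetical. The commented-out line is the one we would expect a conforming compiler to reject, because reading through a null pointer is undefined behavior and so the call is not a constant expression.

```cpp
struct A {
    int data_mem;
};

// Mirrors example 5: the member is read, so an lvalue-to-rvalue conversion happens.
constexpr int read_member(const A* p) {
    int b{p->data_mem};
    return b;
}

constexpr A object{42};

// Accepted: p points to an object, so the evaluation is free of undefined behavior.
static_assert(read_member(&object) == 42, "a real object can be read at compile time");

// Expected to be rejected when uncommented: the evaluation would read through
// a null pointer, which makes the call a non-constant expression.
// static_assert(read_member(nullptr) == 0, "");
```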
Thanks for staying with me to follow the adventures of a null pointer in C++! :-) Usually, we share fragments of code taken from our current firmware development projects, but this time our software engineers were interested in purely "philosophical" questions, so the examples were synthesized.
If you share our love of contradictions in C++, feel free to share your code and comments.
Previously published at https://dev.to/promwad_team/null-pointers-in-c-what-you-can-and-can-t-do-25ic