Does the C++ standard allow for an uninitialized bool to crash a program?
I know that an "undefined behaviour" in C++ can pretty much allow the compiler to do anything it wants. However, I had a crash that surprised me, as I assumed that the code was safe enough.
In this case, the real problem happened only on a specific platform using a specific compiler, and only if optimization was enabled.
I tried several things in order to reproduce the problem and simplify it to the maximum. Here's an extract of a function called Serialize, that would take a bool parameter, and copy the string true or false to an existing destination buffer.
If this function came up in a code review, would there be any way to tell that it could, in fact, crash if the bool parameter were an uninitialized value?
// Zero-filled global buffer of 16 characters
char destBuffer[16];

void Serialize(bool boolValue) {
    // Determine which string to print based on boolValue
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    const size_t len = strlen(whichString);

    // Copy string into destination buffer, which is zero-filled (thus already null-terminated)
    memcpy(destBuffer, whichString, len);
}
If this code is executed with clang 5.0.0 + optimizations, it will/can crash.
The ternary expression boolValue ? "true" : "false" looked safe enough to me. I was assuming, "Whatever garbage value is in boolValue doesn't matter, since it will evaluate to true or false anyhow."
I have set up a Compiler Explorer example that shows the problem in the disassembly; here is the complete example. Note: the only combination I've found that reproduces the issue is Clang 5.0.0 with -O2 optimisation.
#include <iostream>
#include <cstring>

// Simple struct, with an empty constructor that doesn't initialize anything
struct FStruct {
    bool uninitializedBool;

    __attribute__ ((noinline)) // Note: the constructor must be declared noinline to trigger the problem
    FStruct() {};
};

char destBuffer[16];

// Small utility function that copies the string "true" or "false" into destBuffer, depending on the value of the parameter
void Serialize(bool boolValue) {
    // Determine which string to copy depending if 'boolValue' is evaluated as true or false
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    size_t len = strlen(whichString);

    memcpy(destBuffer, whichString, len);
}

int main()
{
    // Locally construct an instance of our struct here on the stack. The bool member uninitializedBool is uninitialized.
    FStruct structInstance;

    // Copy "true" or "false" into destBuffer
    Serialize(structInstance.uninitializedBool);
    return 0;
}
The problem arises because of the optimizer: it was clever enough to deduce that the strings "true" and "false" differ in length by only 1. So instead of really calculating the length, it uses the value of the bool itself, which should technically be either 0 or 1, and goes like this:
const size_t len = strlen(whichString); // original code
const size_t len = 5 - boolValue; // clang clever optimization
While this is "clever", so to speak, my question is: Does the C++ standard allow a compiler to assume a bool can only have an internal numerical representation of '0' or '1' and use it in such a way?
Or is this a case of implementation-defined, in which case the implementation assumed that all its bools will only ever contain 0 or 1, and any other value is undefined behaviour territory?
c++ llvm undefined-behavior abi
182
It's a great question. It's a solid illustration of how undefined behavior isn't just a theoretical concern. When people say anything can happen as a result of UB, that "anything" can really be quite surprising. One might assume that undefined behavior still manifests in predictable ways, but these days with modern optimizers that's not at all true. OP took the time to create a MCVE, investigated the problem thoroughly, inspected the disassembly, and asked a clear, straightforward question about it. Couldn't ask for more.
– John Kugelman
Jan 10 at 2:04
6
Observe that the requirement that “non-zero evaluates to true” is a rule about Boolean operations including “assignment to a bool” (which might implicitly invoke a static_cast<bool>() depending on specifics). It is however not a requirement about the internal representation of a bool chosen by the compiler.
– Euro Micelli
Jan 10 at 3:48
2
Comments are not for extended discussion; this conversation has been moved to chat.
– Samuel Liew♦
Jan 11 at 12:28
3
On a very related note, this is a "fun" source of binary incompatibility. If you have an ABI A that zero-pads values before calling a function, but compiles functions such that it assumes parameters are zero-padded, and an ABI B that's the opposite (doesn't zero-pad, but doesn't assume zero-padded parameters), it'll mostly work, but a function using the B ABI will cause issues if it calls a function using the A ABI that takes a 'small' parameter. IIRC you have this on x86 with clang and ICC.
– TLW
Jan 12 at 19:36
1
@TLW: Although the Standard does not require that implementations provide any means of calling or being called by outside code, it would have been helpful to have a means of specifying such things for implementations where they are relevant (implementations where such details aren't relevant could ignore such attributes).
– supercat
Jan 12 at 22:14
asked Jan 10 at 1:39 by Remz; edited Jan 27 at 16:52 by double-beep
5 Answers
Yes, ISO C++ allows (but doesn't require) implementations to make this choice.
But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose). So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t, even though that's required to be a fixed-layout type with no trap representations.
It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.
You're compiling for the x86-64 System V ABI, which specifies that a bool as a function arg in a register is represented by the bit-patterns false=0 and true=1 in the low 8 bits of the register (see footnote 1). In memory, bool is a 1-byte type that again must have an integer value of 0 or 1.
(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)
ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool, for any architecture (not just x86). It allows optimizations like implementing !mybool with xor eax,1 to flip the low bit (see "Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction"), or compiling a&&b to a bitwise AND for bool types. Some compilers do actually take advantage of this; see "Boolean values as 8 bit in compilers. Are operations on them inefficient?".
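To make the benefit of the 0-or-1 guarantee concrete, here is a minimal sketch that models a bool's byte as a std::uint8_t. The helper names are hypothetical and only mirror the kind of code a compiler may choose to emit; they are not taken from any compiler.

```cpp
#include <cstdint>

// Model the ABI's bool byte as an integer that is ASSUMED to be 0 or 1.
// All names here are illustrative.

// !b as a single XOR of the low bit (the `xor eax, 1` trick).
constexpr std::uint8_t not_via_xor(std::uint8_t b) {
    return static_cast<std::uint8_t>(b ^ 1u);
}

// a && b as a bitwise AND; only valid when both inputs are 0 or 1.
constexpr std::uint8_t and_via_bitand(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a & b);
}

// bool -> int as plain zero-extension.
constexpr int to_int(std::uint8_t b) {
    return b;
}
```

With a byte outside {0, 1} the shortcuts stop matching the language-level operators: not_via_xor(2) yields 3 rather than 0, and and_via_bitand(2, 1) yields 0 even though both inputs are non-zero.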
In general, the as-if rule allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)
The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString) to 5U - boolValue. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpy as stores of immediate data; see footnote 2.)
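As a rough source-level picture of that transformation, the two functions below (names chosen for illustration) agree for every valid bool. The second is only a sketch of the arithmetic, not clang's actual output.

```cpp
#include <cstddef>
#include <cstring>

// Straightforward version: pick a string, then measure it.
std::size_t len_via_strlen(bool b) {
    const char* s = b ? "true" : "false";
    return std::strlen(s);
}

// What the optimizer effectively computes: "false" has 5 characters and
// "true" has 4, so the length is 5 minus the bool's integer value.
// Equivalent only when the bool's underlying byte is exactly 0 or 1.
std::size_t len_via_subtract(bool b) {
    return 5U - static_cast<unsigned>(b);
}
```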
Or the compiler could have created a table of pointers and indexed it with the integer value of the bool, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)
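Written out by hand, such a lookup-table lowering might look like the sketch below. bool_name is a hypothetical helper, and the table layout is an assumption for illustration.

```cpp
#include <cstring>

// Hypothetical table-based lowering of `b ? "true" : "false"`.
const char* bool_name(bool b) {
    static const char* const names[2] = {"false", "true"};
    // The index is 0 or 1 for any valid bool. Machine code that indexed with
    // a raw garbage byte instead would read past the end of the table.
    return names[static_cast<unsigned>(b)];
}
```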
Your __attribute__((noinline)) constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool. It made space for the object in main with push rax (which is smaller and for various reasons about as efficient as sub rsp, 8), so whatever garbage was in AL on entry to main is the value it used for uninitializedBool. This is why you actually got values that weren't just 0.
5U - random garbage can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
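The wrap-around is easy to reproduce safely by doing the same subtraction on an ordinary unsigned byte. In this sketch, raw_byte is an illustrative stand-in for whatever happened to be in AL; no uninitialized memory is read.

```cpp
#include <cstddef>
#include <cstdint>

// Simulate `5 - boolValue` when the "bool" byte holds an arbitrary value.
// Unsigned arithmetic wraps modulo 2^N, so any byte above 5 produces a
// huge length instead of a negative one.
constexpr std::size_t length_from_raw_byte(std::uint8_t raw_byte) {
    return std::size_t{5} - raw_byte;
}
```

For example, a byte of 6 gives the largest possible std::size_t, far more than the 16 bytes available in destBuffer.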
Other implementations could make different choices, e.g. false=0 and true=any non-zero value. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other than what x86-64 does for bool, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.
ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool. (e.g. by memcpying the bool into unsigned char, which you're allowed to do because char* can alias anything. And unsigned char is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation.)
You've partially "hidden" the UB on this execution path from the compiler with noinline. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition is inside the class{} definition, so all translation units must have the same definition, like with the inline keyword.)
So a compiler could emit just a ret or ud2 (illegal instruction) as the definition for main, because the path of execution starting at the top of main unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)
Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if() branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.
GCC and Clang in practice do actually sometimes emit ud2 on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void function, gcc will sometimes omit a ret instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non-inlined version of your function might look in asm.
Another fun example is "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?". x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t* be a problem? Because alignof(uint16_t) == 2, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
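One common way to read from a possibly misaligned address without promising the compiler any alignment is to go through memcpy. This is a minimal sketch with a hypothetical helper name, not the code from the linked question.

```cpp
#include <cstdint>
#include <cstring>

// Read a uint16_t from an address that may not be 2-byte aligned.
// memcpy has no alignment requirement, unlike dereferencing a uint16_t*,
// so the compiler cannot assume alignof(uint16_t) for the source pointer.
std::uint16_t load_u16_unaligned(const unsigned char* p) {
    std::uint16_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

Mainstream compilers usually turn this into a single load on targets where unaligned loads are cheap, so the defined-behaviour version typically costs nothing extra.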
See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.
Key point: if the compiler noticed the UB at compile time, it could "break" (emit surprising asm) the path through your code that causes UB even if targeting an ABI where any bit-pattern is a valid object representation for bool.
Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv.)
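Because the compiler assumes signed overflow never happens, a check written as "add first, then see if it wrapped" can be optimized away. A sketch of a check that stays within defined behaviour by testing before adding (the function name is illustrative):

```cpp
#include <limits>

// Returns true if a + b would overflow int. The test is done on the
// operands, so no overflowing addition is ever evaluated.
constexpr bool add_would_overflow(int a, int b) {
    if (b > 0) return a > std::numeric_limits<int>::max() - b;
    if (b < 0) return a < std::numeric_limits<int>::min() - b;
    return false;
}
```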
Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.
Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN doesn't let them optimize a<0 as always-false, only that tmp is always negative. (So they don't backtrack from the inputs of a calculation to derive range info, only on the results based on the assumption of no signed overflow: example on Godbolt. I don't know if this is intentional user-friendliness or simply a missed optimization.)
Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128)
for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *)
does unaligned loads by taking a misaligned __m128i*
arg, not a void*
or char*
. See Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?
GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv
), separately from the normal signed-overflow UB rules. (This is UB in ISO C++, while right shifts of signed numbers are implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ doesn't specify). This is documented in the GCC manual's Integer section, along with defining implementation-defined behaviour that C standards require implementations to define one way or another.
There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.
Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.
(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)
For example, a caller might have calculated a & 0x01010101
in RDI and used it for something else, before calling bool_func(a&1)
. The caller could optimize away the &1
because it already did that to the low byte as part of and edi, 0x01010101
, and it knows the callee is required to ignore the high bytes.
Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem]
instead of movzx edx, [mem]
, saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10]
instead of movzx edi, byte [r10]
, because both require a REX prefix anyway.
This is why clang emits movzx eax, dil
in Serialize
, instead of sub eax, edi
. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits. Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?
So I was interested to see that it doesn't do the same thing for bool
.)
Footnote 2: After branching, you'd just have a 4-byte mov
-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.
OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7):
block in glibc's memcpy/memmove. Or at least, memcpy's branching to select a chunk size goes the same way for either value of the boolean.
If inlining, you could use 2x mov
-immediate + cmov
and a conditional offset, or you could leave the string data in memory.
Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb
might be optimal. glibc memcpy
might start using rep movsb
for small sizes on CPUs with that feature, saving a lot of branching.
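The overlapping-copy trick mentioned in this footnote can be sketched in C++ as follows. This is a simplified illustration of the idea, not glibc's actual implementation, and the function name `copy_4_to_7` is made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes, where 4 <= n <= 7, without branching on the exact length:
// one 4-byte chunk from the start and one 4-byte chunk ending at the last
// byte. The two chunks overlap whenever n < 8, which is harmless because
// the overlapping bytes are written with the same values.
inline void copy_4_to_7(char* dst, const char* src, std::size_t n) {
    std::uint32_t head, tail;
    std::memcpy(&head, src, 4);          // bytes [0, 4)
    std::memcpy(&tail, src + n - 4, 4);  // bytes [n-4, n)
    std::memcpy(dst, &head, 4);
    std::memcpy(dst + n - 4, &tail, 4);
}
```

With `n` equal to 4 ("true") or 5 ("false"), the same instructions run either way; only the offset of the second chunk differs.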
Tools for detecting UB and usage of uninitialized values
In gcc and clang, you can compile with -fsanitize=undefined
to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch uninitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit).
See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
To find usage of uninitialized data, there's Memory Sanitizer in clang/LLVM. (Address Sanitizer targets out-of-bounds and use-after-free accesses rather than uninitialized reads.) https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie
detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2
in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory
changes the generated asm, and might result in a check for this.)
It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.
MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).
It should work for this case because the call to glibc memcpy
with a length
calculated from uninitialized memory will (inside the library) result in a branch based on length
. If it had inlined a fully branchless version that just used cmov
, indexing, and two stores, it might not have worked.
Valgrind's memcheck
will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.
Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.
1
I've seen a worse case where the variable took a value not in range of an 8 bit integer, but only of the entire CPU register. And Itanium has a worse one yet, use of an uninitialized variable can crash outright.
– Joshua
Jan 11 at 3:27
5
xkcd.com/499 is pretty good explanation of what UB is.
– val
Jan 11 at 4:30
7
Moreover, this also illustrates why the UB featurebug was introduced in the design of the languages C and C++ in the first place: because it gives the compiler exactly this kind of freedom, which has now permitted the most modern compilers to perform these high-quality optimizations that make C/C++ such high-performance mid-level languages.
– The_Sympathizer
Jan 11 at 7:04
1
And so the war between C++ compiler writers and C++ programmers trying to write useful programs continues. This answer, totally comprehensive in answering this question, could also be used as is as convincing ad copy for vendors of static analysis tools ...
– davidbak
Jan 12 at 2:45
3
@The_Sympathizer: UB was included to allow implementations to behave in whatever ways would be most useful to their customers. It was not intended to suggest that all behaviors should be considered equally useful.
– supercat
Jan 12 at 22:23
The compiler is allowed to assume that a boolean value passed as an argument is a valid boolean value (i.e. one which has been initialised or converted to true
or false
). The true
value doesn't have to be the same as the integer 1 -- indeed, there can be various representations of true
and false
-- but the parameter must be some valid representation of one of those two values, where "valid representation" is implementation-defined.
So if you fail to initialise a bool
, or if you succeed in overwriting it through some pointer of a different type, then the compiler's assumptions will be wrong and Undefined Behaviour will ensue. You have been warned:
50) Using a bool value in ways described by this International Standard as “undefined”, such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false. (Footnote to para 6 of §6.9.1, Fundamental Types)
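A minimal sketch of the fix on the caller's side, assuming a struct similar to the one in the question: give the member a default initializer so it always holds a valid `bool`, even when the constructor body does nothing. The names `Flag` and `to_text` are hypothetical.

```cpp
#include <cstring>

// Hypothetical struct for illustration: the default member initializer
// guarantees `value` is a valid bool even though the constructor body
// itself initializes nothing.
struct Flag {
    bool value = false;
    Flag() {}
};

// Same ternary as in the question; safe here because the argument is
// always a properly initialized bool.
inline const char* to_text(bool b) {
    return b ? "true" : "false";
}
```

Value-initializing at the point of use (for example `bool b{};`) is another way to get the same guarantee for a plain local variable.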
11
The "true
value doesn't have to be the same as the integer 1" is kind of misleading. Sure, the actual bit pattern could be something else, but when implicitly converted/promoted (the only way you'd see a value other than true
/false
), true
is always 1
, and false
is always 0
. Of course, such a compiler would also be unable to use the trick this compiler was trying to use (using the fact that bool
's actual bit pattern could only be 0
or 1
), so it's kind of irrelevant to the OP's problem.
– ShadowRanger
Jan 10 at 2:08
3
@ShadowRanger You can always inspect the object representation directly.
– T.C.
Jan 10 at 2:12
6
@shadowranger: my point is that the implementation is in charge. If it limits valid representations of true
to the bit pattern 1
, that's its prerogative. If it chooses some other set of representations, then it indeed could not use the optimisation noted here. If it does choose that particular representation, then it can. It only needs to be internally consistent. You can examine the representation of a bool
by copying it into a byte array; that is not UB (but it is implementation-defined)
– rici
Jan 10 at 2:28
3
Yes, optimizing compilers (i.e. real-world C++ implementations) will sometimes emit code that depends on a bool
having a bit-pattern of 0
or 1
. They don't re-booleanize a bool
every time they read it from memory (or a register holding a function arg). That's what this answer is saying. Examples: gcc4.7+ can optimize return a||b
to or eax, edi
in a function returning bool
, or MSVC can optimize a&b
to test cl, dl
. x86's test
is a bitwise and
, so if cl=1
and dl=2
, test sets flags according to cl&dl = 0
.
– Peter Cordes
Jan 10 at 8:21
4
The point about undefined behavior is that the compiler is allowed to draw far more conclusions about it, e.g. to assume that a code path which would lead to accessing an uninitialized value is never taken at all, as ensuring that is precisely the responsibility of the programmer. So it’s not just about the possibility that the low level values could be different than zero or one.
– Holger
Jan 10 at 10:47
The function itself is correct, but in your test program, the statement that calls the function causes undefined behaviour by using the value of an uninitialized variable.
The bug is in the calling function, and it could be detected by code review or static analysis of the calling function. Using your compiler explorer link, the gcc 8.2 compiler does detect the bug. (Maybe you could file a bug report against clang that it doesn't find the problem).
Undefined behaviour means anything can happen, which includes the program crashing a few lines after the event that triggered the undefined behaviour.
NB. The answer to "Can undefined behaviour cause _____ ?" is always "Yes". That's literally the definition of undefined behaviour.
2
Is the first clause true? Does merely copying an uninitialized bool
trigger UB?
– Joshua Green
Jan 10 at 3:25
10
@JoshuaGreen see [dcl.init]/12 "If an indeterminate value is produced by an evaluation, the behaviour is undefined except in the following cases:" (and none of those cases have an exception for bool
). Copying requires evaluating the source
– M.M
Jan 10 at 3:34
8
@JoshuaGreen And the reason for that is that you might have a platform that triggers a hardware fault if you access some invalid values for some types. These are sometimes called "trap representations".
– David Schwartz
Jan 10 at 11:15
4
Itanium, while obscure, is a CPU that's still in production, has trap values, and has two at least semi-modern C++ compilers (Intel/HP). It literally has true
, false
and not-a-thing
values for booleans.
– MSalters
Jan 10 at 20:03
3
On the flip side, the answer to "Does the standard require all compilers to process something a certain way" is generally "no", even/especially in cases where it's obvious that any quality compiler should do so; the more obvious something is, the less need there should be for the authors of the Standard to actually say it.
– supercat
Jan 10 at 21:23
A bool is only allowed to hold the values 0
or 1
, and the generated code can assume that it will only hold one of these two values. The code generated for the ternary in the assignment could use the value as the index into an array of pointers to the two strings, i.e. it might be converted to something like:
// the compiler could make asm that "looks" like this, from your source
static const char *strings[] = {"false", "true"};
const char *whichString = strings[boolValue];
If boolValue
is uninitialized, it could actually hold any integer value, which would then cause accessing outside the bounds of the strings
array.
1
@SidS Thanks. Theoretically, the internal representations could be the opposite of how they cast to/from integers, but that would be perverse.
– Barmar
Jan 10 at 2:09
1
You are right, and your example will also crash. However it is "visible" to a code review that you are using an uninitialized variable as an index to an array. Also, it would crash even in debug (for example some debugger/compiler will initialize with specific patterns to make it easier to see when it crashes). In my example, the surprising part is that the usage of the bool is invisible: The optimizer decided to use it in a calculation not present in the source code.
– Remz
Jan 10 at 2:25
3
@Remz I'm just using the array to show what the generated code could be equivalent to, not suggesting that anyone would actually write that.
– Barmar
Jan 10 at 2:28
1
@Remz Recast the bool
to int
with *(int *)&boolValue
and print it for debugging purposes, see if it is anything other than 0
or 1
when it crashes. If that's the case, it pretty much confirms the theory that the compiler is optimizing the inline-if as an array which explains why it is crashing.
– Havenard
Jan 10 at 2:57
2
@MSalters: std::bitset<8>
doesn't give me nice names for all my different flags. Depending on what they are, that may be important.
– Martin Bonner
Jan 11 at 15:13
Summarising your question a lot, you are asking Does the C++ standard allow a compiler to assume a bool
can only have an internal numerical representation of '0' or '1' and use it in such a way?
The standard says nothing about the internal representation of a bool
. It only defines what happens when casting a bool
to an int
(or vice versa). Mostly, because of these integral conversions (and the fact that people rely rather heavily on them), the compiler will use 0 and 1, but it doesn't have to (although it has to respect the constraints of any lower level ABI it uses).
So, the compiler, when it sees a bool
is entitled to consider that said bool
contains either of the 'true
' or 'false
' bit patterns and do anything it feels like. So if the values for true
and false
are 1 and 0, respectively, the compiler is indeed allowed to optimise strlen
to 5 - <boolean value>
. Other fun behaviours are possible!
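To make that optimisation concrete, here is a sketch comparing the length as the source computes it with the length an optimizer may compute instead when it assumes `true` is stored as 1 and `false` as 0. Both function names are made up for illustration, and this is not the compiler's literal output.

```cpp
#include <cstddef>
#include <cstring>

// What the source says: pick a string, then measure it.
inline std::size_t len_from_source(bool b) {
    const char* s = b ? "true" : "false";
    return std::strlen(s);
}

// What an optimizer may compute instead, assuming b is stored as 0 or 1:
// strlen("false") == 5 and strlen("true") == 4, so the length is 5 - b.
inline std::size_t len_as_optimized(bool b) {
    return 5u - static_cast<unsigned>(b);
}
```

For every valid `bool` the two functions agree, which is all the as-if rule requires. If the underlying byte held some other value, the same subtraction done directly on that byte could wrap to a huge unsigned length.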
As gets repeatedly stated here, undefined behaviour has undefined results. Including but not limited to
- Your code working as you expected it to
- Your code failing at random times
- Your code not being run at all.
See What every programmer should know about undefined behavior
Yes, ISO C++ allows (but doesn't require) implementations to make this choice.
But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose). So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t
. Even though that's required to be a fixed-layout type with no trap representations.
It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.
You're compiling for the x86-64 System V ABI, which specifies that a bool
as a function arg in a register is represented by the bit-patterns false=0
and true=1
in the low 8 bits of the register1. In memory, bool
is a 1-byte type that again must have an integer value of 0 or 1.
(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)
ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool
, for any architecture (not just x86). It allows optimizations like !mybool
with xor eax,1
to flip the low bit: Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction. Or compiling a&&b
to a bitwise AND for bool
types. Some compilers do actually take advantage Boolean values as 8 bit in compilers. Are operations on them inefficient?.
In general, the as-if rule allows allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)
The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString)
to5U - boolValue
. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpy
as stores of immediate data2.)
Or the compiler could have created a table of pointers and indexed it with the integer value of the bool
, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)
Your __attribute((noinline))
constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool
. It made space for the object in main
with push rax
(which is smaller and for various reason about as efficient as sub rsp, 8
), so whatever garbage was in AL on entry to main
is the value it used for uninitializedBool
. This is why you actually got values that weren't just 0
.
5U - random garbage
can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
Other implementations could make different choices, e.g. false=0
and true=any non-zero value
. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other what x86-64 does for bool
, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.
ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool
. (e.g. by memcpy
ing the bool
into unsigned char
, which you're allowed to do because char*
can alias anything. And unsigned char
is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool
, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation.)
You've partially "hidden" the UB on this execution path from the compiler with noinline
. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition in inside the class{}
definition so all translation units must have the same definition. Like with the inline
keyword.)
So a compiler could emit just a ret
or ud2
(illegal instruction) as the definition for main
, because the path of execution starting at the top of main
unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)
Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if()
branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret
, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.
GCC and Clang in practice do actually sometimes emit ud2
on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void
function, gcc will sometimes omit a ret
instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non inlined version of your function might look in asm.
Another fun example is Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?. x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t*
be a problem? Because alignof(uint16_t) == 2
, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.
Key point: if the compiler noticed the UB at compile time, it could "break" (emit surprising asm) the path through your code that causes UB even if targeting an ABI where any bit-pattern is a valid object representation for bool
.
Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall
and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv
.)
Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.
Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN
doesn't let them optimize a<0
as always-true, only that tmp
is always negative. (So they don't backtrack from the inputs of a calculation to derive range info, only on the results based on the assumption of no signed overflow: example on Godbolt. I don't know if this is intentional user-friendliness or simply a missed optimization.)
Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128)
for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *)
does unaligned loads by taking a misaligned __m128i*
arg, not a void*
or char*
. Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?
GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv
), separately from the normal signed-overflow UB rules. (This is UB in ISO C++, while right shifts of signed numbers are implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ doesn't specify). This is documented in the GCC manual's Integer section, along with defining implementation-defined behaviour that C standards require implementations to define one way or another.
There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.
Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.
(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)
For example, a caller might have calculated a & 0x01010101
in RDI and used it for something else, before calling bool_func(a&1)
. The caller could optimize away the &1
because it already did that to the low byte as part of and edi, 0x01010101
, and it knows the callee is required to ignore the high bytes.
Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem]
instead of movzx edx, [mem]
, saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10]
instead of movzx edi, byte [r10]
, because both require a REX prefix anyway.
This is why clang emits movzx eax, dil
in Serialize
, instead of sub eax, edi
. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits. Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?
So I was interested to see that it doesn't do the same thing for bool
.)
Footnote 2: After branching, you'd just have a 4-byte mov
-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.
OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7):
block in glibc's memcpy/memmove. Or at least, go the same way for either boolean in memcpy's branching to select a chunk size.
If inlining, you could use 2x mov
-immediate + cmov
and a conditional offset, or you could leave the string data in memory.
Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb
might be optimal. glibc memcpy
might start using rep movsb
for small sizes on CPUs with that feature, saving a lot of branching.
Tools for detecting UB and usage of uninitialized values
In gcc and clang, you can compile with -fsanitize=undefined
to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch unitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit).
See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
To find usage of uninitialized data, there's Address Sanitizer and Memory Sanitizer in clang/LLVM. https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie
detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2
in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory
changes the generated asm, and might result in a check for this.)
It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.
MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).
It should work for this case because the call to glibc memcpy
with a length
calculated from uninitialized memory will (inside the library) result in a branch based on length
. If it had inlined a fully branchless version that just used cmov
, indexing, and two stores, it might not have worked.
Valgrind's memcheck
will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.
Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.
1
I've seen a worse case where the variable took a value not in range of an 8 bit integer, but only of the entire CPU register. And Itanium has a worse one yet, use of an uninitialized variable can crash outright.
– Joshua
Jan 11 at 3:27
5
xkcd.com/499 is pretty good explanation of what UB is.
– val
Jan 11 at 4:30
7
Moreover, this also illustrates why the UB featurebug was introduced in the design of the languages C and C++ in the first place: because it gives the compiler exactly this kind of freedom, which has now permitted the most modern compilers to perform these high-quality optimizations that make C/C++ such high-performance mid-level languages.
– The_Sympathizer
Jan 11 at 7:04
1
And so the war between C++ compiler writers and C++ programmers trying to write useful programs continues. This answer, totally comprehensive in answering this question, could also be used as is as convincing ad copy for vendors of static analysis tools ...
– davidbak
Jan 12 at 2:45
3
@The_Sympathizer: UB was included to allow implementations to behave in whatever ways would be most useful to their customers. It was not intended to suggest that all behaviors should be considered equally useful.
– supercat
Jan 12 at 22:23
Yes, ISO C++ allows (but doesn't require) implementations to make this choice.
But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose.) So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t, even though that's required to be a fixed-layout type with no trap representations.
It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.
You're compiling for the x86-64 System V ABI, which specifies that a bool as a function arg in a register is represented by the bit-patterns false=0 and true=1 in the low 8 bits of the register (see Footnote 1). In memory, bool is a 1-byte type that again must have an integer value of 0 or 1.
(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)
ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool, for any architecture (not just x86). It allows optimizations like implementing !mybool with xor eax,1 to flip the low bit (see "Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction"), or compiling a&&b to a bitwise AND for bool types. Some compilers do actually take advantage of this; see "Boolean values as 8 bit in compilers. Are operations on them inefficient?".
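To make the 0-or-1 assumption concrete, here is a minimal sketch that models the byte of a bool as an unsigned 8-bit integer and applies the same shortcuts. The function names not_via_xor and and_via_bitand are hypothetical, and this is only a model of the idea, not what any particular compiler emits.

```cpp
#include <cstdint>

// Models `!b` compiled to `xor eax, 1`: only correct when raw is 0 or 1.
std::uint8_t not_via_xor(std::uint8_t raw) {
    return static_cast<std::uint8_t>(raw ^ 1u);
}

// Models `a && b` compiled to a bitwise AND: only correct for 0/1 inputs.
std::uint8_t and_via_bitand(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a & b);
}
```

For 0 and 1 these agree with !b and a&&b. For a garbage byte such as 2, not_via_xor gives 3 (neither true nor false in the ABI's terms), and and_via_bitand(2, 1) gives 0 even though both inputs are non-zero.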
In general, the as-if rule allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)
The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString) to 5U - boolValue. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpy as stores of immediate data; see Footnote 2.)
Or the compiler could have created a table of pointers and indexed it with the integer value of the bool, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)
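The table-of-pointers alternative can be sketched like this. The names kNames and name_for are hypothetical, and the range check exists only so the sketch itself stays well-defined; a compiler relying on the ABI guarantee would index the table directly and read past its end for any byte other than 0 or 1.

```cpp
#include <cstdint>
#include <cstring>

// Two-entry table, as a compiler might build for `b ? "true" : "false"`.
static const char* const kNames[2] = {"false", "true"};

// Models indexing the table with the raw byte of the bool.
// Returns nullptr where an unchecked lookup would read out of bounds.
const char* name_for(std::uint8_t raw) {
    return raw < 2 ? kNames[raw] : nullptr;
}
```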
Your __attribute((noinline)) constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool. It made space for the object in main with push rax (which is smaller and for various reasons about as efficient as sub rsp, 8), so whatever garbage was in AL on entry to main is the value it used for uninitializedBool. This is why you actually got values that weren't just 0.
5U - random garbage can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
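The wrap-around is easy to see in a small model of the folded length computation. The function name folded_length is hypothetical, and using std::size_t for the arithmetic is an assumption of this sketch rather than a claim about the exact register width clang used.

```cpp
#include <cstddef>
#include <cstdint>

// Models strlen(b ? "true" : "false") folded to 5 - b, applied to the raw byte.
std::size_t folded_length(std::uint8_t raw) {
    return std::size_t{5} - raw;  // unsigned arithmetic: wraps for raw > 5
}
```

With raw equal to 0 or 1 this gives 5 or 4, the lengths of "false" and "true". Any raw value above 5 wraps to a huge length, far beyond the 16-byte destination buffer.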
Other implementations could make different choices, e.g. false=0 and true=any non-zero value. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other than what x86-64 does for bool, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.
ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool. (e.g. by memcpying the bool into unsigned char, which you're allowed to do because char* can alias anything. And unsigned char is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation.)
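A minimal sketch of looking at the object representation of a properly initialized bool, assuming sizeof(bool) == 1 (true on x86-64 System V, not required by ISO C++). The helper name raw_byte is hypothetical.

```cpp
#include <cstring>

// Copies the single-byte object representation of a bool into an unsigned char.
unsigned char raw_byte(bool b) {
    static_assert(sizeof(bool) == 1, "sketch assumes a 1-byte bool");
    unsigned char out;
    std::memcpy(&out, &b, 1);  // raw copy: no conversion to 0/1 takes place
    return out;
}
```

On an ABI like x86-64 System V this returns 1 for true and 0 for false, which is exactly the guarantee the optimizer leans on.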
You've partially "hidden" the UB on this execution path from the compiler with noinline. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition is inside the class{} definition, so all translation units must have the same definition, like with the inline keyword.)
So a compiler could emit just a ret or ud2 (illegal instruction) as the definition for main, because the path of execution starting at the top of main unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)
Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if() branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.
GCC and Clang in practice do actually sometimes emit ud2 on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void function, gcc will sometimes omit a ret instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non-inlined version of your function might look in asm.
Another fun example is "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?". x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t* be a problem? Because alignof(uint16_t) == 2, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
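One common way to read from a possibly misaligned address without ever forming or dereferencing a misaligned uint16_t* is to go through memcpy. This is a minimal sketch with a hypothetical helper name, load_u16_unaligned; it is not taken from the linked question.

```cpp
#include <cstdint>
#include <cstring>

// Reads a uint16_t from an address that may not be 2-byte aligned.
// memcpy has no alignment requirement on its arguments.
std::uint16_t load_u16_unaligned(const void* p) {
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

Because the compiler is never told that the source is an aligned uint16_t, it has no alignment guarantee to exploit when vectorizing a loop built on this helper.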
See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.
Key point: if the compiler noticed the UB at compile time, it could "break" (emit surprising asm) the path through your code that causes UB even if targeting an ABI where any bit-pattern is a valid object representation for bool.
Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv.)
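As a small illustration of the signed-overflow point: a check written as "add first, then test whether the result wrapped" relies on the very overflow the compiler assumes never happens. A sketch of a check that stays within defined behaviour tests the range before adding. The helper name checked_add is hypothetical.

```cpp
#include <limits>

// Returns true and writes a + b to out only when the sum fits in int.
// The range test happens before the addition, so no overflow ever occurs.
bool checked_add(int a, int b, int& out) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;
    out = a + b;
    return true;
}
```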
Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.
Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN doesn't let them optimize a<0 as always-false, only that tmp is always negative. (So they don't backtrack from the inputs of a calculation to derive range info, only on the results based on the assumption of no signed overflow: example on Godbolt. I don't know if this is intentional user-friendliness or simply a missed optimization.)
Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128) for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *) does unaligned loads by taking a misaligned __m128i* arg, not a void* or char*. See "Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?"
GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv), separately from the normal signed-overflow UB rules. (This is UB in ISO C++, while right shifts of signed numbers are implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ doesn't specify.) This is documented in the GCC manual's Integer section, along with defining implementation-defined behaviour that C standards require implementations to define one way or another.
There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.
Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.
(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)
For example, a caller might have calculated a & 0x01010101 in RDI and used it for something else, before calling bool_func(a&1). The caller could optimize away the &1 because it already did that to the low byte as part of and edi, 0x01010101, and it knows the callee is required to ignore the high bytes.
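The arithmetic behind that caller-side shortcut can be checked with a tiny model. The function low_byte is hypothetical and stands in for "what the callee is allowed to look at" when a bool arrives in a 64-bit register.

```cpp
#include <cstdint>

// Models the part of a register a callee may read for a bool argument:
// only the low 8 bits; everything above is ignored.
std::uint8_t low_byte(std::uint64_t reg) {
    return static_cast<std::uint8_t>(reg);
}
```

For any a, low_byte(a & 0x01010101) equals a & 1, so a register that already holds a & 0x01010101 can be passed as the bool argument unchanged.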
Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem] instead of movzx edx, [mem], saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10] instead of movzx edi, byte [r10], because both require a REX prefix anyway.
This is why clang emits movzx eax, dil in Serialize, instead of sub eax, edi. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits. See "Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?" So I was interested to see that it doesn't do the same thing for bool.)
Footnote 2: After branching, you'd just have a 4-byte mov-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.
OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7): block in glibc's memcpy/memmove. Or at least, memcpy's branching to select a chunk size goes the same way for either value of the boolean.
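The overlapping-copy trick can be sketched in portable C++. This is only an illustration of the idea, not glibc's code; the function name copy_4_to_7 is hypothetical, and it assumes the caller guarantees 4 <= n <= 7.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies n bytes (4 <= n <= 7) using two 4-byte moves that may overlap,
// so the same sequence of operations runs for every n in that range.
void copy_4_to_7(char* dst, const char* src, std::size_t n) {
    std::uint32_t head, tail;
    std::memcpy(&head, src, 4);          // first 4 bytes
    std::memcpy(&tail, src + n - 4, 4);  // last 4 bytes (overlaps head if n < 8)
    std::memcpy(dst, &head, 4);
    std::memcpy(dst + n - 4, &tail, 4);
}
```

For "true" (n = 4) the two moves cover exactly the same bytes; for "false" (n = 5) they overlap by three bytes. Either way there is no branch on the length inside this range.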
If inlining, you could use 2x mov-immediate + cmov and a conditional offset, or you could leave the string data in memory.
Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb might be optimal. glibc memcpy might start using rep movsb for small sizes on CPUs with that feature, saving a lot of branching.
Tools for detecting UB and usage of uninitialized values
In gcc and clang, you can compile with -fsanitize=undefined to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch uninitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit.)
See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
To find usage of uninitialized data, there's Address Sanitizer and Memory Sanitizer in clang/LLVM. https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2 in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory changes the generated asm, and might result in a check for this.)
It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.
MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).
It should work for this case because the call to glibc memcpy with a length calculated from uninitialized memory will (inside the library) result in a branch based on length. If it had inlined a fully branchless version that just used cmov, indexing, and two stores, it might not have worked.
Valgrind's memcheck will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.
Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.
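The tools above only detect the problem; the fix in the source is to make sure the bool always holds a valid value before it is read. A minimal sketch, reusing Serialize and destBuffer from the question and assuming a hypothetical struct name InitializedBool in place of the question's struct:

```cpp
#include <cstddef>
#include <cstring>

// Zero-filled global buffer of 16 characters, as in the question.
char destBuffer[16];

// Hypothetical replacement for the question's struct: the default member
// initializer means the bool is never read while indeterminate.
struct InitializedBool {
    bool value = false;
};

void Serialize(bool boolValue) {
    const char* whichString = boolValue ? "true" : "false";
    const std::size_t len = std::strlen(whichString);
    std::memcpy(destBuffer, whichString, len);
}
```

With the member initialized, whatever length computation the compiler chooses operates on a genuine 0 or 1.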
Yes, ISO C++ allows (but doesn't require) implementations to make this choice.
But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose.) So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t, even though that's required to be a fixed-layout type with no trap representations.
It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.
You're compiling for the x86-64 System V ABI, which specifies that a bool as a function arg in a register is represented by the bit-patterns false=0 and true=1 in the low 8 bits of the register (see Footnote 1). In memory, bool is a 1-byte type that again must have an integer value of 0 or 1.
(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)
ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool, for any architecture (not just x86). It allows optimizations like implementing !mybool with xor eax,1 to flip the low bit (see the Q&A "Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction"), or compiling a&&b to a bitwise AND for bool types. Some compilers do actually take advantage of this: see "Boolean values as 8 bit in compilers. Are operations on them inefficient?".
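To make those two optimizations concrete, here is a minimal sketch of what the generated code effectively computes. It works on raw bytes (uint8_t) rather than on bool, so the sketch itself has no UB; the function names are made up for illustration and are not something a compiler emits.

```cpp
#include <cassert>   // only needed for the assert-based checks in the usage note
#include <cstdint>

// What !b can compile to when the ABI guarantees that b is 0 or 1:
// flip the low bit (like xor eax,1) instead of comparing against zero.
inline std::uint8_t not_as_xor(std::uint8_t raw) {
    return static_cast<std::uint8_t>(raw ^ 1u);
}

// What a && b can compile to for bool operands: a plain bitwise AND.
inline std::uint8_t and_as_bitwise(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a & b);
}
```

For bytes that really are 0 or 1 these match the logical operators, e.g. assert(not_as_xor(1) == 0). For other bytes they don't: and_as_bitwise(1, 2) is 0 even though both inputs are non-zero, and not_as_xor(2) is 3.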
In general, the as-if rule allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)
The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString) to 5U - boolValue. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpy as stores of immediate data; see Footnote 2.)
Or the compiler could have created a table of pointers and indexed it with the integer value of the bool, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)
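Both strategies can be modeled in ordinary C++ by passing the raw byte as an unsigned char, which keeps the sketch free of UB. This is only an illustration under the assumption that the subtraction is done at size_t width; modeled_length and pick_by_table are hypothetical names.

```cpp
#include <cassert>   // only needed for the assert-based checks in the usage note
#include <cstddef>
#include <cstring>

// strlen("false") == 5 and strlen("true") == 4, so for a valid bool byte
// (0 or 1) the length equals 5 - value. Any other byte gives nonsense.
inline std::size_t modeled_length(unsigned char raw) {
    return std::size_t{5} - raw;   // wraps around for raw > 5
}

// The alternative code-gen: index a two-entry table with the raw byte.
// Only valid for raw == 0 or raw == 1; anything else reads out of bounds.
inline const char* pick_by_table(unsigned char raw) {
    static const char* const table[2] = {"false", "true"};
    return table[raw];
}
```

With valid inputs, modeled_length(1) == std::strlen(pick_by_table(1)). With a garbage byte such as 6, modeled_length returns the largest size_t value, which is the kind of length that sends memcpy into unmapped memory.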
Your __attribute((noinline)) constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool. It made space for the object in main with push rax (which is smaller and for various reasons about as efficient as sub rsp, 8), so whatever garbage was in AL on entry to main is the value it used for uninitializedBool. This is why you actually got values that weren't just 0.
5U - random garbage can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
Other implementations could make different choices, e.g. false=0 and true=any non-zero value. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other than what x86-64 does for bool, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.
ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool. (e.g. by memcpying the bool into unsigned char, which you're allowed to do because char* can alias anything. And unsigned char is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation.)
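A minimal sketch of that kind of inspection, assuming a 1-byte bool (checked with a static_assert) and an initialized input; object_byte_of is a hypothetical helper name.

```cpp
#include <cassert>   // only needed for the assert-based check in the usage note
#include <cstring>

// Copy the object representation of a bool into an unsigned char.
// This reads the stored byte as-is, with no conversion to 0 or 1.
inline unsigned char object_byte_of(bool b) {
    static_assert(sizeof(bool) == 1, "this sketch assumes a 1-byte bool");
    unsigned char raw;
    std::memcpy(&raw, &b, sizeof raw);
    return raw;
}
```

On the x86-64 System V ABI you'd expect object_byte_of(false) to be 0 and object_byte_of(true) to be 1, but ISO C++ only lets you rely on the two being different representations, e.g. assert(object_byte_of(true) != object_byte_of(false)).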
You've partially "hidden" the UB on this execution path from the compiler with noinline. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition is inside the class{} definition, so all translation units must have the same definition. Like with the inline keyword.)
So a compiler could emit just a ret or ud2 (illegal instruction) as the definition for main, because the path of execution starting at the top of main unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)
Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if() branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.
GCC and Clang in practice do actually sometimes emit ud2 on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void function, gcc will sometimes omit a ret instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non-inlined version of your function might look in asm.
Another fun example is "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?". x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t* be a problem? Because alignof(uint16_t) == 2, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
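For reference, one well-defined way to read a possibly-misaligned 16-bit value is to memcpy it out of the byte buffer instead of dereferencing a misaligned uint16_t*. This is a sketch of that idiom, not code from the linked question; load_u16_unaligned is a hypothetical name.

```cpp
#include <cassert>   // only needed for the assert-based check in the usage note
#include <cstdint>
#include <cstring>

// Read 2 bytes starting at p, which may have any alignment.
// memcpy makes no alignment assumption, so the compiler can't make one either.
inline std::uint16_t load_u16_unaligned(const unsigned char* p) {
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

For example, with const unsigned char buf[3] = {0x00, 0x34, 0x34}, load_u16_unaligned(buf + 1) is 0x3434 regardless of endianness. Optimizing compilers typically turn the memcpy into a single load on targets where that's safe.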
See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.
Key point: if the compiler noticed the UB at compile time, it could "break" (emit surprising asm) the path through your code that causes UB even if targeting an ABI where any bit-pattern is a valid object representation for bool.
Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv.)
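If you need to detect signed overflow, the check has to happen before the addition, using only operations that can't overflow themselves. A minimal sketch (add_would_overflow is a hypothetical helper, not a standard function):

```cpp
#include <cassert>   // only needed for the assert-based checks in the usage note
#include <limits>

// True if a + b would overflow int. Every subtraction here stays in range,
// so the check itself never triggers signed-overflow UB.
inline bool add_would_overflow(int a, int b) {
    if (b > 0) return a > std::numeric_limits<int>::max() - b;
    if (b < 0) return a < std::numeric_limits<int>::min() - b;
    return false;
}
```

Checking after the fact with something like a + b < a is exactly the kind of code a compiler may optimize away, because it is allowed to assume the overflow never happened.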
Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.
Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN doesn't let them optimize a<0 as always-false, only that tmp is always negative. (So they don't backtrack to derive range info for the inputs of a calculation, only for the results, based on the assumption of no signed overflow: example on Godbolt. I don't know if this is intentional user-friendliness or simply a missed optimization.)
Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128) for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *) does unaligned loads by taking a misaligned __m128i* arg, not a void* or char*. See "Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?"
GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv), separately from the normal signed-overflow UB rules. (This is UB in ISO C++, while right shifts of signed numbers are implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ doesn't specify). This is documented in the GCC manual's Integer section, along with defining implementation-defined behaviour that C standards require implementations to define one way or another.
There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.
Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.
(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)
For example, a caller might have calculated a & 0x01010101 in RDI and used it for something else, before calling bool_func(a&1). The caller could optimize away the &1 because it already did that to the low byte as part of and edi, 0x01010101, and it knows the callee is required to ignore the high bytes.
Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem] instead of movzx edx, [mem], saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10] instead of movzx edi, byte [r10], because both require a REX prefix anyway.
This is why clang emits movzx eax, dil in Serialize, instead of sub eax, edi. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits. See "Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?" So I was interested to see that it doesn't do the same thing for bool.)
Footnote 2: After branching, you'd just have a 4-byte mov-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.
OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7): block in glibc's memcpy/memmove. Or at least, go the same way for either boolean in memcpy's branching to select a chunk size.
If inlining, you could use 2x mov-immediate + cmov and a conditional offset, or you could leave the string data in memory.
Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb might be optimal. glibc memcpy might start using rep movsb for small sizes on CPUs with that feature, saving a lot of branching.
Tools for detecting UB and usage of uninitialized values
In gcc and clang, you can compile with -fsanitize=undefined to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch uninitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit).
See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
To find usage of uninitialized data, there's Address Sanitizer and Memory Sanitizer in clang/LLVM. https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2 in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory changes the generated asm, and might result in a check for this.)
It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.
MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).
It should work for this case because the call to glibc memcpy with a length calculated from uninitialized memory will (inside the library) result in a branch based on length. If it had inlined a fully branchless version that just used cmov, indexing, and two stores, it might not have worked.
Valgrind's memcheck will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.
Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.
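Whatever the tools report, the fix on the source side is to make sure the bool always has a value before it is read. Here is a minimal sketch of that, modeled on the question's Serialize and destBuffer; the struct name InitializedFlag and its member are hypothetical, and unlike the question's version this one also copies the terminating null byte so repeated calls don't leave stale characters behind.

```cpp
#include <cassert>   // only needed for the assert-based checks in the usage note
#include <cstddef>
#include <cstring>

// Zero-filled global buffer of 16 characters, as in the question.
char destBuffer[16];

// Hypothetical stand-in for the question's struct: the default member
// initializer gives the bool a value even though the constructor is empty.
struct InitializedFlag {
    bool value = false;
    InitializedFlag() {}
};

void Serialize(bool boolValue) {
    const char* whichString = boolValue ? "true" : "false";
    const std::size_t len = std::strlen(whichString);
    // len + 1 also copies the '\0', so "true" after "false" isn't "truee".
    std::memcpy(destBuffer, whichString, len + 1);
}
```

With this, InitializedFlag f; Serialize(f.value); leaves "false" in destBuffer, and the compiler's 0-or-1 assumption about the argument actually holds.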
edited Jan 13 at 14:08
answered Jan 10 at 9:42
Peter Cordes
1
I've seen a worse case where the variable took a value not in range of an 8 bit integer, but only of the entire CPU register. And Itanium has a worse one yet, use of an uninitialized variable can crash outright.
– Joshua
Jan 11 at 3:27
5
xkcd.com/499 is pretty good explanation of what UB is.
– val
Jan 11 at 4:30
7
Moreover, this also illustrates why the UB featurebug was introduced in the design of the languages C and C++ in the first place: because it gives the compiler exactly this kind of freedom, which has now permitted the most modern compilers to perform these high-quality optimizations that make C/C++ such high-performance mid-level languages.
– The_Sympathizer
Jan 11 at 7:04
1
And so the war between C++ compiler writers and C++ programmers trying to write useful programs continues. This answer, totally comprehensive in answering this question, could also be used as is as convincing ad copy for vendors of static analysis tools ...
– davidbak
Jan 12 at 2:45
3
@The_Sympathizer: UB was included to allow implementations to behave in whatever ways would be most useful to their customers. It was not intended to suggest that all behaviors should be considered equally useful.
– supercat
Jan 12 at 22:23
The compiler is allowed to assume that a boolean value passed as an argument is a valid boolean value (i.e. one which has been initialised or converted to true or false). The true value doesn't have to be the same as the integer 1 -- indeed, there can be various representations of true and false -- but the parameter must be some valid representation of one of those two values, where "valid representation" is implementation-defined.
So if you fail to initialise a bool, or if you succeed in overwriting it through some pointer of a different type, then the compiler's assumptions will be wrong and Undefined Behaviour will ensue. You had been warned:
50) Using a bool value in ways described by this International Standard as “undefined”, such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false. (Footnote to para 6 of §6.9.1, Fundamental Types)
edited Jan 10 at 2:32
answered Jan 10 at 1:59
rici
11
The "true value doesn't have to be the same as the integer 1" is kind of misleading. Sure, the actual bit pattern could be something else, but when implicitly converted/promoted (the only way you'd see a value other than true/false), true is always 1, and false is always 0. Of course, such a compiler would also be unable to use the trick this compiler was trying to use (using the fact that bool's actual bit pattern could only be 0 or 1), so it's kind of irrelevant to the OP's problem.
– ShadowRanger
Jan 10 at 2:08
3
@ShadowRanger You can always inspect the object representation directly.
– T.C.
Jan 10 at 2:12
6
@shadowranger: my point is that the implementation is in charge. If it limits valid representations of true to the bit pattern 1, that's its prerogative. If it chooses some other set of representations, then it indeed could not use the optimisation noted here. If it does choose that particular representation, then it can. It only needs to be internally consistent. You can examine the representation of a bool by copying it into a byte array; that is not UB (but it is implementation-defined)
– rici
Jan 10 at 2:28
3
Yes, optimizing compilers (i.e. real-world C++ implementation) often will sometimes emit code that depends on a bool having a bit-pattern of 0 or 1. They don't re-booleanize a bool every time they read it from memory (or a register holding a function arg). That's what this answer is saying. examples: gcc4.7+ can optimize return a||b to or eax, edi in a function returning bool, or MSVC can optimize a&b to test cl, dl. x86's test is a bitwise and, so if cl=1 and dl=2 test sets flags according to cl&dl = 0.
– Peter Cordes
Jan 10 at 8:21
4
The point about undefined behavior is that the compiler is allowed to draw far more conclusions about it, e.g. to assume that a code path which would lead to accessing an uninitialized value is never taken at all, as ensuring that is precisely the responsibility of the programmer. So it’s not just about the possibility that the low level values could be different than zero or one.
– Holger
Jan 10 at 10:47
3
3
Yes, optimizing compilers (i.e. real-world C++ implementation) often will sometimes emit code that depends on a
bool
having a bit-pattern of 0
or 1
. They don't re-booleanize a bool
every time they read it from memory (or a register holding a function arg). That's what this answer is saying. examples: gcc4.7+ can optimize return a||b
to or eax, edi
in a function returning bool
, or MSVC can optimize a&b
to test cl, dl
. x86's test
is a bitwise and
, so if cl=1
and dl=2
test sets flags according to cl&dl = 0
.– Peter Cordes
Jan 10 at 8:21
Yes, optimizing compilers (i.e. real-world C++ implementation) often will sometimes emit code that depends on a
bool
having a bit-pattern of 0
or 1
. They don't re-booleanize a bool
every time they read it from memory (or a register holding a function arg). That's what this answer is saying. examples: gcc4.7+ can optimize return a||b
to or eax, edi
in a function returning bool
, or MSVC can optimize a&b
to test cl, dl
. x86's test
is a bitwise and
, so if cl=1
and dl=2
test sets flags according to cl&dl = 0
.– Peter Cordes
Jan 10 at 8:21
4
4
The point about undefined behavior is that the compiler is allowed to draw far more conclusions about it, e.g. to assume that a code path which would lead to accessing an uninitialized value is never taken at all, as ensuring that is precisely the responsibility of the programmer. So it’s not just about the possibility that the low level values could be different than zero or one.
– Holger
Jan 10 at 10:47
The point about undefined behavior is that the compiler is allowed to draw far more conclusions about it, e.g. to assume that a code path which would lead to accessing an uninitialized value is never taken at all, as ensuring that is precisely the responsibility of the programmer. So it’s not just about the possibility that the low level values could be different than zero or one.
– Holger
Jan 10 at 10:47
|
show 6 more comments
The function itself is correct, but in your test program, the statement that calls the function causes undefined behaviour by using the value of an uninitialized variable.
The bug is in the calling function, and it could be detected by code review or static analysis of the calling function. Using your compiler explorer link, the gcc 8.2 compiler does detect the bug. (Maybe you could file a bug report against clang that it doesn't find the problem).
Undefined behaviour means anything can happen, which includes the program crashing a few lines after the event that triggered the undefined behaviour.
NB. The answer to "Can undefined behaviour cause _____ ?" is always "Yes". That's literally the definition of undefined behaviour.
answered Jan 10 at 2:12
M.M
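One way to remove the bug on the calling side is to make sure the bool can never be left uninitialized. The sketch below reuses Serialize and destBuffer from the question; the Settings struct and the serialize_default_settings helper are hypothetical names introduced only for this illustration, not the types from the original test program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Zero-filled global buffer, as in the question.
char destBuffer[16];

// Same logic as the question's Serialize.
void Serialize(bool boolValue) {
    const char* whichString = boolValue ? "true" : "false";
    const std::size_t len = std::strlen(whichString);
    std::memcpy(destBuffer, whichString, len);
}

// Hypothetical caller-side type: the default member initializer guarantees
// that the bool always holds a real value, even if a constructor forgets it.
struct Settings {
    bool enabled = false;
    Settings() {}  // deliberately empty; `enabled` is still initialized
};

inline void serialize_default_settings() {
    Settings s;            // s.enabled == false, never indeterminate
    Serialize(s.enabled);  // reads an initialized value: no UB here
    assert(std::strcmp(destBuffer, "false") == 0);
}
```

With the initializer in place, Serialize only ever sees true or false, so the optimizer's assumption about the bool holds.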
2
Is the first clause true? Does merely copying an uninitialized bool trigger UB?
– Joshua Green
Jan 10 at 3:25
10
@JoshuaGreen see [dcl.init]/12 "If an indeterminate value is produced by an evaluation, the behaviour is undefined except in the following cases:" (and none of those cases have an exception for bool). Copying requires evaluating the source.
– M.M
Jan 10 at 3:34
8
@JoshuaGreen And the reason for that is that you might have a platform that triggers a hardware fault if you access some invalid values for some types. These are sometimes called "trap representations".
– David Schwartz
Jan 10 at 11:15
4
Itanium, while obscure, is a CPU that's still in production, has trap values, and has two at least semi-modern C++ compilers (Intel/HP). It literally has true, false and not-a-thing values for booleans.
– MSalters
Jan 10 at 20:03
3
On the flip side, the answer to "Does the standard require all compilers to process something a certain way" is generally "no", even/especially in cases where it's obvious that any quality compiler should do so; the more obvious something is, the less need there should be for the authors of the Standard to actually say it.
– supercat
Jan 10 at 21:23
A bool is only allowed to hold the values 0 or 1, and the generated code can assume that it will only hold one of these two values. The code generated for the ternary in the assignment could use the value as the index into an array of pointers to the two strings, i.e. it might be converted to something like:
// the compiler could make asm that "looks" like this, from your source
static const char *strings[] = {"false", "true"};
const char *whichString = strings[boolValue];
If boolValue is uninitialized, it could actually hold any integer value, which would then cause accessing outside the bounds of the strings array.
edited Jan 10 at 9:45
Peter Cordes
answered Jan 10 at 2:02
Barmar
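The table-lookup idea can be made concrete with a small model in which the byte stored in the bool is passed around as an unsigned char, so nothing indeterminate is ever read. This is a sketch under that assumption; kStrings and lookup are illustrative names, and the range check exists only to make the out-of-bounds case observable.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>

// The two-entry table from the answer, as a constexpr array.
constexpr std::array<std::string_view, 2> kStrings{"false", "true"};

// Hypothetical model: `rawByte` stands for whatever byte is sitting in the
// bool's storage. Code that trusts the 0/1 invariant would index directly;
// this version checks the range so the bad case is visible.
constexpr std::optional<std::string_view> lookup(unsigned char rawByte) {
    if (rawByte >= kStrings.size()) {
        return std::nullopt;  // an unchecked index would read past the table
    }
    return kStrings[rawByte];
}

inline void check_lookup() {
    assert(*lookup(0) == "false");
    assert(*lookup(1) == "true");
    assert(!lookup(2).has_value());  // garbage byte: no valid table entry
}
```

Generated code has no such check, which is the point of the answer: the compiler is entitled to omit it because a valid bool never needs it.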
1
@SidS Thanks. Theoretically, the internal representations could be the opposite of how they cast to/from integers, but that would be perverse.
– Barmar
Jan 10 at 2:09
1
You are right, and your example will also crash. However it is "visible" to a code review that you are using an uninitialized variable as an index to an array. Also, it would crash even in debug (for example some debugger/compiler will initialize with specific patterns to make it easier to see when it crashes). In my example, the surprising part is that the usage of the bool is invisible: The optimizer decided to use it in a calculation not present in the source code.
– Remz
Jan 10 at 2:25
3
@Remz I'm just using the array to show what the generated code could be equivalent to, not suggesting that anyone would actually write that.
– Barmar
Jan 10 at 2:28
1
@Remz Recast the bool to int with *(int *)&boolValue and print it for debugging purposes, see if it is anything other than 0 or 1 when it crashes. If that's the case, it pretty much confirms the theory that the compiler is optimizing the inline-if as an array which explains why it is crashing.
– Havenard
Jan 10 at 2:57
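A caveat on the cast suggested above: *(int *)&boolValue reads sizeof(int) bytes from an object that is typically one byte wide, and it accesses the bool through an unrelated type, so the debugging aid may itself misbehave. A more conservative sketch copies the bytes out with memcpy instead. The helper names here are made up for illustration, and the byte values you see are implementation-defined.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Copies the object representation of a bool into a byte array.
// Taking the argument by reference avoids converting the value on the way in.
inline std::array<unsigned char, sizeof(bool)> object_bytes(const bool& value) {
    std::array<unsigned char, sizeof(bool)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(bool));
    return bytes;
}

inline void check_object_bytes() {
    const bool t = true;
    const bool f = false;
    // Two different values cannot share the same object representation.
    assert(object_bytes(t) != object_bytes(f));
}
```

Printing each byte as an unsigned integer then shows whether the storage held something other than the implementation's patterns for true and false.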
2
@MSalters:std::bitset<8>
doesn't give me nice names for all my different flags. Depending on what they are, that may be important.
– Martin Bonner
Jan 11 at 15:13
Summarising your question a lot, you are asking: Does the C++ standard allow a compiler to assume a bool can only have an internal numerical representation of '0' or '1' and use it in such a way?
The standard says nothing about the internal representation of a bool. It only defines what happens when casting a bool to an int (or vice versa). Mostly, because of these integral conversions (and the fact that people rely rather heavily on them), the compiler will use 0 and 1, but it doesn't have to (although it has to respect the constraints of any lower level ABI it uses).
So, the compiler, when it sees a bool, is entitled to consider that said bool contains either of the 'true' or 'false' bit patterns and do anything it feels like. So if the values for true and false are 1 and 0, respectively, the compiler is indeed allowed to optimise strlen to 5 - <boolean value>. Other fun behaviours are possible!
As gets repeatedly stated here, undefined behaviour has undefined results, including but not limited to:
- Your code working as you expected it to
- Your code failing at random times
- Your code not being run at all.
See What every programmer should know about undefined behavior
answered Jan 10 at 11:48
Tom Tanner
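To see why "5 - <boolean value>" can end in a crash, the arithmetic can be modelled on a raw byte instead of a real bool. The sketch below assumes the 0/1 representation discussed in the answer; modelled_length and kDestSize are hypothetical names, with 16 taken from the size of destBuffer in the question.

```cpp
#include <cassert>
#include <cstddef>

// Size of the destination buffer in the question.
constexpr std::size_t kDestSize = 16;

// Hypothetical model of "strlen becomes 5 - <boolean value>": `rawByte`
// stands for the byte stored in the bool. The arithmetic is done in
// std::size_t, so a byte larger than 5 wraps around to a huge length.
constexpr std::size_t modelled_length(unsigned char rawByte) {
    return std::size_t{5} - rawByte;
}

inline void check_modelled_length() {
    assert(modelled_length(0) == 5);           // length of "false"
    assert(modelled_length(1) == 4);           // length of "true"
    assert(modelled_length(200) > kDestSize);  // would overrun the buffer
}
```

For the two valid bytes the shortcut gives exactly the right lengths; for any other byte the length passed to memcpy is wrong, and for large bytes it is far beyond the 16-byte buffer.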
182
It's a great question. It's a solid illustration of how undefined behavior isn't just a theoretical concern. When people say anything can happen as a result of UB, that "anything" can really be quite surprising. One might assume that undefined behavior still manifests in predictable ways, but these days with modern optimizers that's not at all true. OP took the time to create a MCVE, investigated the problem thoroughly, inspected the disassembly, and asked a clear, straightforward question about it. Couldn't ask for more.
– John Kugelman
Jan 10 at 2:04
6
Observe that the requirement that “non-zero evaluates to true” is a rule about Boolean operations including “assignment to a bool” (which might implicitly invoke a static_cast<bool>() depending on specifics). It is however not a requirement about the internal representation of a bool chosen by the compiler.
– Euro Micelli
Jan 10 at 3:48
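The distinction drawn in this comment, between the conversion rule and the stored representation, can be shown with the conversions alone. In this small sketch the helper names are invented for illustration; it only exercises well-defined integer-to-bool and bool-to-int conversions and says nothing about how the bool is stored.

```cpp
#include <cassert>

// Converting an integer to bool: zero becomes false, anything else true.
constexpr bool to_bool(int n) { return static_cast<bool>(n); }

// Converting a bool back to int always yields exactly 0 or 1.
constexpr int round_trip(int n) { return static_cast<int>(to_bool(n)); }

inline void check_round_trip() {
    assert(round_trip(0) == 0);
    assert(round_trip(1) == 1);
    assert(round_trip(42) == 1);  // non-zero is normalized by the conversion
    assert(round_trip(-7) == 1);
}
```

The normalization happens at the conversion, which is why a bool that was never assigned through such a conversion carries no such guarantee.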
3
On a very related note, this is a "fun" source of binary incompatibility. If you have an ABI A that zero-pads values before calling a function, but compiles functions such that it assumes parameters are zero-padded, and an ABI B that's the opposite (doesn't zero-pad, but doesn't assume zero-padded parameters), it'll mostly work, but a function using the B ABI will cause issues if it calls a function using the A ABI that takes a 'small' parameter. IIRC you have this on x86 with clang and ICC.
– TLW
Jan 12 at 19:36
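The mismatch described here can be illustrated with a toy model in which a 32-bit value stands for the register carrying a one-byte argument. Everything in this sketch is hypothetical (the function names, the register width, the stale bit pattern); it does not reproduce any real calling convention, only the shape of the problem.

```cpp
#include <cassert>
#include <cstdint>

// Caller following convention A: zero-extends the byte into the register.
constexpr std::uint32_t pass_zero_extended(std::uint8_t arg) {
    return arg;
}

// Caller following convention B: leaves whatever was in the upper bits.
constexpr std::uint32_t pass_with_stale_upper_bits(std::uint8_t arg,
                                                   std::uint32_t stale) {
    return (stale & 0xFFFFFF00u) | arg;
}

// Callee compiled for convention A: trusts the whole register.
constexpr std::uint32_t callee_trusting_register(std::uint32_t reg) {
    return reg;
}

// Callee compiled for convention B: masks down to the low byte first.
constexpr std::uint32_t callee_masking_register(std::uint32_t reg) {
    return reg & 0xFFu;
}

inline void check_abi_model() {
    // Matching conventions work.
    assert(callee_trusting_register(pass_zero_extended(1)) == 1);
    assert(callee_masking_register(pass_with_stale_upper_bits(1, 0xABCD1200u)) == 1);
    // B-style caller into A-style callee: the stale bits leak through.
    assert(callee_trusting_register(pass_with_stale_upper_bits(1, 0xABCD1200u)) != 1);
}
```

Only the last combination goes wrong, which matches the comment's observation that such mixing mostly works until a small parameter crosses the boundary in the unlucky direction.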
1
@TLW: Although the Standard does not require that implementations provide any means of calling or being called by outside code, it would have been helpful to have a means of specifying such things for implementations where they are relevant (implementations where such details aren't relevant could ignore such attributes).
– supercat
Jan 12 at 22:14