GCC-10.3.0优化器在处理类型强转时的bug
下面代码使用-O2和-O3编译会得到不同的结果
doris-sandbox04 :: ~/public » /home/disk1/doris-deps/toolchain/installed/gcc-10.3.0/bin/g++ -std=gnu++17 -O3 -msse4.2 -mavx2 -fopt-info-vec-optimized test.cpp -o opt3.s test.cpp:63:23: optimized: loop vectorized using 32 byte vectors /home/disk1/doris-deps/toolchain/installed/gcc-10.3.0/include/c++/10.3.0/ext/new_allocator.h:115:41: optimized: basic block part vectorized using 32 byte vectors /home/disk1/doris-deps/toolchain/installed/gcc-10.3.0/include/c++/10.3.0/ext/new_allocator.h:115:41: optimized: basic block part vectorized using 32 byte vectors test.cpp:82:18: optimized: basic block part vectorized using 32 byte vectors doris-sandbox04 :: ~/public » ./opt3.s 4138674677912027985 4138674677912027985 4138674677912027985 4138674677912027985 doris-sandbox04 :: ~/public » /home/disk1/doris-deps/toolchain/installed/gcc-10.3.0/bin/g++ -std=gnu++17 -O2 -msse4.2 -mavx2 -fopt-info-vec-optimized test.cpp -o opt2.s doris-sandbox04 :: ~/public » ./opt2.s 4138674677912027985 17614482930881034518 9674455539515676295 16429943614018478328
复现代码如下:
#include <cstdint> #include <vector> #include <iostream> static const uint64_t MURMUR_PRIME = 0xc6a4a7935bd1e995ULL; static const uint32_t MURMUR_SEED = 0xadc83b19ULL; // Our hash function is MurmurHash2, 64 bit version. // It was modified in order to provide the same result in // big and little endian archs (endian neutral). uint64_t murmur_hash64A (const void* key, int32_t len, unsigned int seed) { const uint64_t m = MURMUR_PRIME; const int r = 47; uint64_t h = seed ^ (len * m); const uint8_t *data = (const uint8_t *)key; const uint8_t *end = data + (len-(len&7)); while(data != end) { uint64_t k; #if (BYTE_ORDER == BIG_ENDIAN) k = (uint64_t) data[0]; k |= (uint64_t) data[1] << 8; k |= (uint64_t) data[2] << 16; k |= (uint64_t) data[3] << 24; k |= (uint64_t) data[4] << 32; k |= (uint64_t) data[5] << 40; k |= (uint64_t) data[6] << 48; k |= (uint64_t) data[7] << 56; #else k = *((uint64_t*)data); #endif k *= m; k ^= k >> r; k *= m; h ^= k; h *= m; data += 8; } switch(len & 7) { case 7: h ^= (uint64_t)data[6] << 48; case 6: h ^= (uint64_t)data[5] << 40; case 5: h ^= (uint64_t)data[4] << 32; case 4: h ^= (uint64_t)data[3] << 24; case 3: h ^= (uint64_t)data[2] << 16; case 2: h ^= (uint64_t)data[1] << 8; case 1: h ^= (uint64_t)data[0]; h *= m; }; h ^= h >> r; h *= m; h ^= h >> r; return h; } // static const uint32_t MURMUR_SEED = 0xadc83b19ULL; // uint64_t murmur_hash64A (const void* key, int32_t len, unsigned int seed); void update_double(const std::vector<double>& values, std::vector<uint64_t>& hashes) { auto size = values.size(); for (int i = 0; i < size; ++i) { auto v = values[i]; uint64_t value = murmur_hash64A(&v, sizeof(v), MURMUR_SEED); hashes[i] = value; } } const int N = 3; int main() { uint64_t x = 0; uint64_t xv = murmur_hash64A(&x, sizeof(x), MURMUR_SEED); printf("%llu\n", xv); std::vector<double> values(N); std::vector<uint64_t> hashes(N); for (int i = 0; i < N; ++i) { values[i] = i + 1; } update_double(values, hashes); for (auto hash : hashes) { std::cout << hash << std::endl; } return 0; }
GCC官方回复如下: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100224
You are accessing 'double' via a pointer to uint64_t * here:
k = ((uint64_t)data);
that violates type based aliasing rules. You can use -fno-strict-aliasing to work around your bug or use
typedef uint64_t aliasing_uint64_t __attribute__((may_alias)); k = ((aliasing_uint64_t)data);
我有个比较有趣的观察是,如果使用这个murmurhash + seed去计算 uint64_t x= 0的话,得到的hash value也是 4138674677912027985 查看汇编代码的话,可以看到有类似下面的语句
movabsq $4138674677912027985, %rdi
然后在 update_counter 函数里面也有类似的数值,相当于把这个预先计算的值copy到了所有的结果上,而且还是向量化的copy.
我的理解是,就像gcc bug里面那个人说的,编译器估计认为double地址和int64地址不会成为alias, 那么激进的优化策略就是假设int64地址上的内容为0,那么hash值就可以预先计算出来,然后只要copy出去就行。
另外就是如果把mumurhash单独编译成为函数,是没有这个问题的,优化没有办法跨编译单元进行。
gcc 编译参数 -fno-strict-aliasing - 云+社区 - 腾讯云 https://cloud.tencent.com/developer/article/1159055. 这篇文章里面说-O2就会把 -fstrict-aliasing 打开,但是我们开了-O2也没有遇到过着问题,也是比较奇怪。