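The Extra Overhead of Virtual Functions
The ClickHouse (CK) code reduces virtual call overhead by fetching a function address once, through a virtual getter, and then calling through that pointer in the inner loop.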
src/AggregateFunctions/IAggregateFunction.h
    class IAggregateFunction : public std::enable_shared_from_this<IAggregateFunction>
    {
        /** The inner loop that uses the function pointer is better than using the virtual function.
          * The reason is that in the case of virtual functions GCC 5.1.2 generates code,
          * which, at each iteration of the loop, reloads the function address (the offset value in the virtual function table) from memory to the register.
          * This gives a performance drop on simple queries around 12%.
          * After the appearance of better compilers, the code can be removed.
          */
        using AddFunc = void (*)(const IAggregateFunction *, AggregateDataPtr, const IColumn **, size_t, Arena *);
        virtual AddFunc getAddressOfAddFunction() const = 0;
        // codes ...
    };

    template <typename Derived>
    class IAggregateFunctionHelper : public IAggregateFunction
    {
    private:
        static void addFree(const IAggregateFunction * that, AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena * arena)
        {
            static_cast<const Derived &>(*that).add(place, columns, row_num, arena);
        }

    public:
        AddFunc getAddressOfAddFunction() const override { return &addFree; }
        // codes ...
    };
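According to the comment, older compilers generate poor code for virtual calls, and newer compilers should no longer have this problem. My understanding is:
- A poor compiler emits something like lea (, rcx, rbx) rax; call *rax, where rcx holds the offset into the vtable. An older compiler cannot guarantee that this offset is a constant.
- A good compiler emits mov 0x32(rbx) rax; call *rax, where 0x32 is the offset into the vtable.
- A static function is simply call 0x16eff, where the address is a constant.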
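The pattern in the IAggregateFunction excerpt can be sketched in a self-contained form as follows. This is only an illustration under simplifying assumptions: the names IAgg, AggHelper, SumAgg and aggregate are hypothetical, and the signature uses a plain long state and a long column instead of ClickHouse's AggregateDataPtr, IColumn and Arena.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for IAggregateFunction (not the ClickHouse API).
class IAgg {
public:
    using AddFunc = void (*)(const IAgg *, long & state, const long * column, std::size_t row);

    virtual ~IAgg() = default;
    virtual void add(long & state, const long * column, std::size_t row) const = 0;
    virtual AddFunc getAddressOfAddFunction() const = 0;
};

// CRTP helper: addFree knows the concrete type Derived. When Derived is final,
// the compiler can bind the call to Derived::add directly and may inline it.
template <typename Derived>
class AggHelper : public IAgg {
    static void addFree(const IAgg * that, long & state, const long * column, std::size_t row) {
        static_cast<const Derived &>(*that).add(state, column, row);
    }

public:
    AddFunc getAddressOfAddFunction() const override { return &addFree; }
};

class SumAgg final : public AggHelper<SumAgg> {
public:
    void add(long & state, const long * column, std::size_t row) const override {
        state += column[row];
    }
};

// One virtual call fetches the pointer; the loop then calls through a local
// variable, so no vtable lookup is needed inside the loop.
long aggregate(const IAgg & func, const std::vector<long> & column) {
    long state = 0;
    const IAgg::AddFunc add = func.getAddressOfAddFunction();
    for (std::size_t row = 0; row < column.size(); ++row)
        add(&func, state, column.data(), row);
    return state;
}
```

For example, calling aggregate with a SumAgg and the column {1, 2, 3, 4} sums the four rows. The call inside the loop is still indirect, but the pointer sits in a local variable, so the compiler does not have to reload it from the vtable on every iteration.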
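Ordinary (statically linked) functions have the following advantages over virtual functions:
- The function address is a constant, whereas the address of a virtual function has to be loaded from memory.
- If an ordinary function is implemented in a header file, it can be inlined.
- With older compilers the virtual function offset is not known to be fixed, so the function address has to be recomputed on every call.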
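The inlining point can be illustrated with a related sketch, again under assumptions: IBatchAgg, BatchHelper, MaxAgg, addBatch and maxOf are hypothetical names for this example and are not taken from the excerpt. A single virtual call handles a whole batch, and the per-row add is an ordinary member function whose body is visible in the header, so the compiler is free to inline it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interface: one virtual call processes a whole batch of rows.
class IBatchAgg {
public:
    virtual ~IBatchAgg() = default;
    virtual void addBatch(long & state, const std::vector<long> & column) const = 0;
};

template <typename Derived>
class BatchHelper : public IBatchAgg {
public:
    void addBatch(long & state, const std::vector<long> & column) const override {
        const Derived & self = static_cast<const Derived &>(*this);
        for (std::size_t row = 0; row < column.size(); ++row)
            self.add(state, column, row);  // non-virtual call, eligible for inlining
    }
};

class MaxAgg final : public BatchHelper<MaxAgg> {
public:
    // Ordinary (non-virtual) member function defined in the header.
    void add(long & state, const std::vector<long> & column, std::size_t row) const {
        if (column[row] > state)
            state = column[row];
    }
};

// The only virtual dispatch happens here, once per batch.
long maxOf(const IBatchAgg & func, const std::vector<long> & column, long initial) {
    long state = initial;
    func.addBatch(state, column);
    return state;
}
```

Whether the compiler actually inlines add depends on the optimization level and its heuristics. The sketch only shows that the call target is known at compile time.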