In GCC it's __builtin_pow which the compiler knows it should optimize.
Uhm, no. GCC doesn't optimize __builtin_pow nearly as much as it optimizes __builtin_powi. In particular,
double foo(double p)
{
    return __builtin_pow(p, 10);
}
double bar(double p)
{
    return __builtin_powi(p, 10);
}
compiles down on my machine to the following assembly:
.text
.p2align 4,,15
.globl _Z3food
.def _Z3food; .scl 2; .type 32; .endef
.seh_proc _Z3food
_Z3food:
.LFB0:
.seh_endprologue
movsd .LC0(%rip), %xmm1
jmp pow
.seh_endproc
.p2align 4,,15
.globl _Z3bard
.def _Z3bard; .scl 2; .type 32; .endef
.seh_proc _Z3bard
_Z3bard:
.LFB1:
.seh_endprologue
movapd %xmm0, %xmm1
mulsd %xmm0, %xmm1
mulsd %xmm1, %xmm0
mulsd %xmm1, %xmm0
mulsd %xmm0, %xmm0
ret
.seh_endproc
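To make the listing for bar easier to follow, here is a rough C++ transcription of its four mulsd instructions. The name pow10_by_hand is made up for this illustration and is not something GCC provides; the comments map each line to the instruction it mirrors.

```cpp
// Hypothetical helper: a by-hand transcription of the code GCC emitted for bar().
// It computes p^10 with four multiplications instead of a call to pow.
double pow10_by_hand(double p)
{
    double p2 = p * p;    // movapd + mulsd %xmm0, %xmm1 : xmm1 = p^2
    double p3 = p * p2;   // mulsd %xmm1, %xmm0          : xmm0 = p^3
    double p5 = p3 * p2;  // mulsd %xmm1, %xmm0          : xmm0 = p^5
    return p5 * p5;       // mulsd %xmm0, %xmm0          : xmm0 = p^10
}
```

foo, by contrast, only loads the constant exponent into xmm1 and tail-calls the library pow.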
What may be interesting is that, for std::pow, this optimization takes place only when C++11 mode is disabled. With -std=c++11, or even -std=gnu++11, for
#include <cmath>
double qux(double p)
{
    return std::pow(p, 10);
}
GCC 4.8.2 20130701 generates a library call:
.p2align 4,,15
.globl _Z3quxd
.def _Z3quxd; .scl 2; .type 32; .endef
.seh_proc _Z3quxd
_Z3quxd:
.LFB258:
.seh_endprologue
movsd .LC0(%rip), %xmm1
jmp pow
.seh_endproc
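If the multiplication sequence is wanted regardless of the language mode, one option is to avoid std::pow for integer exponents, either by calling __builtin_powi directly as in bar, or with a small portable helper. Below is a minimal sketch of the latter using repeated squaring. The name ipow is hypothetical and not part of any library, and whether a given GCC version fully unrolls it for a constant exponent is an assumption you would need to check in the generated assembly.

```cpp
// Hypothetical helper (not a library function): integer power by repeated squaring.
double ipow(double base, int exp)
{
    // Widen before negating so that the most negative int does not overflow.
    long long e = exp;
    bool negative = e < 0;
    if (negative)
        e = -e;

    double result = 1.0;
    while (e > 0) {
        if (e & 1)          // current low bit set: fold this power of base in
            result *= base;
        base *= base;       // base, base^2, base^4, ...
        e >>= 1;
    }
    return negative ? 1.0 / result : result;
}
```

Like __builtin_powi, this multiplies in a different order than the library pow, so the last bits of the result may differ from what pow returns.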