I believe that what Hapax meant by error accumulation is the error (inaccuracy) caused by std::chrono, which does accumulate when using multiple measurements.
It does not, in this case. Let me explain. What we are looking at is a sequence of time points:
{ t0, t1, ..., tk }
where the individual times tn are composed of a fixed 'true' time n * dt (dt being the fixed, unvarying GPU frame time) and an arbitrary small error en, where en is never larger than the maximal measurement error (std::chrono's inaccuracy):
tn = n * dt + en
Now we increase k. We have
tk = k * dt + ek
When we take the average a, we get
a = tk / k
a = dt + ek / k
So, after k measurements, the error of the average is precisely ek / k, which approaches 0 as k grows. There is no way for an error to accumulate in this system: the error of the average depends only on the error of the last measurement, divided by the total number of measurements.
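If you want to check this numerically, here is a minimal sketch of the argument above. It is a simulation, not my actual measurement code: the function names, the uniform error distribution, and the choice e0 = 0 are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: build time points t_n = n * dt + e_n for n = 0..k,
// where each e_n is a random error with |e_n| <= max_err.
// Assumption: e_0 = 0, so the sequence starts exactly at 0.
std::vector<double> simulate_timestamps(std::size_t k, double dt,
                                        double max_err, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> err(-max_err, max_err);
    std::vector<double> t(k + 1);
    t[0] = 0.0;
    for (std::size_t n = 1; n <= k; ++n) {
        t[n] = static_cast<double>(n) * dt + err(rng);
    }
    return t;
}

// Average frame time a = t_k / k. Only the last time point is used,
// so only its error e_k enters the result. Requires k >= 1,
// i.e. at least two time points.
double average_frame_time(const std::vector<double>& t) {
    const std::size_t k = t.size() - 1;
    return t[k] / static_cast<double>(k);
}
```

Calling average_frame_time(simulate_timestamps(k, dt, max_err, seed)) for growing k should give values whose distance from dt never exceeds max_err / k, whatever errors the intermediate time points carried.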
I have to eat my words concerning another statement I made, though. I wrote earlier that the average frame time did not come out at 16.6 / 20.0 ms respectively. This was due to sloppy programming on my part: I stupidly did a std::chrono::duration_cast<std::chrono::milliseconds> on my times, which truncated each value to a whole number of milliseconds. That produced the 0.5 difference, because the error from my faulty maths sure *did* accumulate. I have now changed the code to use std::chrono::duration_cast<std::chrono::microseconds>, assign the result to a float and divide by 1000.0. Lo and behold, the timing is spot on. And it only takes a few frames to stabilize, so 25 is ample for a sample.