OK... really... just stay away from those kinds of articles. I can hardly think of a way to pack more wrong ideas into a single article. While the author's approach does eventually get the job done, it is not only inefficient, it might not even address the real problem.
1. The author assumes that the reason for offloading the loading to another thread is that uploading the texture data, as opposed to the far more likely culprit of reading the image from disk, takes the most time. This is simply not true unless you have a REALLY broken system. Memory buses are always orders of magnitude faster than storage IO controllers, even if you have a cheap IGP. What you should do is the complete opposite: load from storage in a separate thread and upload the texture in the main thread (see the sketch after this point). If the author's runLoader() function contains both, then either they were too lazy to separate them, or they overestimate the time the upload takes. It is true that the OpenGL call to load the texture data blocks, but depending on what the driver really does, it can block for very different amounts of time. Since all modern PCIe hardware uses DMA to transfer data, you can reliably assume that anything you transfer to the GPU gets buffered in a separate DMA buffer in system RAM, which the GPU then "pulls" over the PCIe bus when instructed to. A "pushing" variant of the driver would simply be too slow and would eat up precious CPU cycles, which is not something we want. Basically, the call to glTexImage2D "does not block", or blocks for as little time as any other moderately complex operation involving memory copies.
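To make that split concrete, here is a minimal C++11 sketch of what I mean: decode on a worker thread, upload on the thread that owns the GL context. loadImageFile(), the queue, and all the names here are mine, not the article's, and error handling is omitted.

```cpp
// Sketch only: slow disk/decode work on a worker thread,
// the (cheap, DMA-backed) glTexImage2D on the GL thread.
#include <GL/gl.h>
#include <thread>
#include <mutex>
#include <queue>
#include <vector>
#include <string>

struct DecodedImage {
    std::vector<unsigned char> pixels; // RGBA8
    int width = 0, height = 0;
};

// Hypothetical decode helper (e.g. a stb_image wrapper); no GL calls inside.
DecodedImage loadImageFile(const std::string& path);

static std::mutex g_queueMutex;
static std::queue<DecodedImage> g_readyImages;

// Worker thread: only file IO and decoding, never touches the GL context.
void loaderThread(const std::vector<std::string>& paths) {
    for (const auto& path : paths) {
        DecodedImage img = loadImageFile(path);
        std::lock_guard<std::mutex> lock(g_queueMutex);
        g_readyImages.push(std::move(img));
    }
}

// Main/GL thread, called once per frame: drain finished decodes and upload.
void uploadPendingTextures() {
    for (;;) {
        DecodedImage img;
        {
            std::lock_guard<std::mutex> lock(g_queueMutex);
            if (g_readyImages.empty())
                break;
            img = std::move(g_readyImages.front());
            g_readyImages.pop();
        }
        GLuint tex = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, img.width, img.height,
                     0, GL_RGBA, GL_UNSIGNED_BYTE, img.pixels.data());
    }
}
```

No second context, no glFinish(), and the only thing the main thread ever waits on is an uncontended mutex.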
2. The author states that when loading in a separate thread, you need to call glFinish() so as not to corrupt the texture memory by reading it while it is being written to... First of all, I don't see why you would call glFinish() in the same thread that you loaded from; it makes absolutely no sense. Assuming they meant the main thread, do they not realize that calling glFinish() is exactly what they wanted to avoid in the first place? It is in fact even worse than a simple blocking glTexImage2D() (if that even blocks). Calling glFinish() typically causes a sync object to be inserted into the command stream and waits for ALL operations up to that point to complete, not just the glTexImage2D. This makes you wait longer than you would have without it. The best part about this point is that the author assumes the broken behaviour they describe is due to a race condition and rules out any driver bugs. The whole idea of using "names" for resources, which you request from OpenGL with the corresponding glGen* functions, is that OpenGL can properly manage resources, even across multiple threads. When you request a secondary shared context to work from, the driver should be smart enough to recognize accesses to the same resource and make sure that no race conditions actually occur. Even if it didn't care, when you "use" a texture you normally just bind it and do your rendering. That is also a queued command, and it will NOT execute on the GPU at the same time as the data transfer. Broken drivers/implementation and an even more broken workaround. (If you genuinely need explicit synchronization, a fence is the right tool; see the sketch below.)
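For the record: if you ever really do need to wait on a specific upload, the proper tool (since GL 3.2 / ARB_sync) is a fence, which waits only for the commands issued before it instead of draining the entire queue the way glFinish() does. A rough sketch, assuming a loader like GLEW or glad is already in place; the function name is mine:

```cpp
// Sketch only (requires GL 3.2+ or ARB_sync): wait for just the upload,
// not for every queued command the way glFinish() would.
#include <GL/glew.h> // or any loader exposing GL 3.2 / ARB_sync

GLuint uploadWithFence(const unsigned char* pixels, int w, int h) {
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h,
                 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    // The fence covers only commands issued up to this point.
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

    // Flush so the fence is actually submitted, then block (with a
    // 1-second timeout) until the upload alone has completed.
    glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000ull);
    glDeleteSync(fence);
    return tex;
}
```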
3. The author is obviously worried about every nanosecond of CPU time wasted waiting for the texture load to finish. They seem to have forgotten how much time OpenGL context creation and switching takes up in turn. Instead of performing explicit synchronization themselves, they just push the effort onto the OS/driver and expect to magically gain performance. I'm willing to bet money that unless you load 1000+ textures, you won't even be able to measure the difference, assuming this implementation is faster at all...
Another badly written article, containing nothing but myths that should really just die out someday. Combining C++11 code with code that tries to simulate proper buffer object usage on top of the legacy API is just... wrong... in all sorts of ways. If you need asynchronous transfers, use buffer objects. It's that simple.
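For completeness, this is roughly what the buffer object route looks like: stage the pixels in a GL_PIXEL_UNPACK_BUFFER so the glTexImage2D call sources from driver-owned memory and returns immediately, leaving the actual DMA transfer to happen asynchronously. A sketch only, assuming GL 2.1+ and a function loader; the name and the lack of error handling are mine:

```cpp
// Sketch: asynchronous texture upload via a pixel buffer object (GL 2.1+).
#include <GL/glew.h> // or any loader exposing buffer object entry points
#include <cstring>

GLuint uploadViaPBO(const unsigned char* pixels, int w, int h) {
    const GLsizeiptr size = GLsizeiptr(w) * h * 4; // RGBA8

    GLuint pbo = 0;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, size, nullptr, GL_STREAM_DRAW);

    // Copy into driver-owned memory; the driver can DMA from here later.
    void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    std::memcpy(dst, pixels, size);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // With a PBO bound, the last argument is an offset into the buffer,
    // not a client pointer. The call returns immediately; the transfer
    // is queued and can overlap with other work.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h,
                 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    glDeleteBuffers(1, &pbo); // storage stays alive until the transfer ends
    return tex;
}
```

No extra thread, no extra context, no glFinish(), and the driver is free to schedule the copy whenever the bus is idle.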