I'm no expert on this, but the number one thing I ALWAYS hear the experts saying on this subject is that you want to minimize your draw calls. As far as I know there are two main ways of doing this:
1) Draw several things with a single draw() call by using a VertexArray instead of a ton of sprites (ideal for things like tiled levels)
2) Simply not drawing things that won't be visible on screen (this is called "culling"), either because they're outside the window entirely or they're covered up by other stuff.
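To make those two ideas concrete, here's a rough sketch of how they can combine for a tiled level: work out which tiles overlap the view, then pack only those into one vertex list. This is plain C++ with stand-in types I made up for illustration (`Vertex2D`, `ViewRect`, `TileRange`, `visibleRange`, `buildVisibleQuads` are all hypothetical names, not SFML's API). In SFML 2.x the vertex list would be an sf::VertexArray, and as far as I know you'd draw it with one window.draw() call.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for a renderer vertex (position only). Hypothetical; in SFML this
// role is played by sf::Vertex inside an sf::VertexArray.
struct Vertex2D { float x; float y; };

// Visible area in world coordinates.
struct ViewRect { float left; float top; float width; float height; };

// Half-open range [first, last) of tile indices along one axis.
struct TileRange { int first; int last; };

// Tile indices overlapped by [viewStart, viewStart + viewSize), clamped to the map.
TileRange visibleRange(float viewStart, float viewSize, float tileSize, int tileCount) {
    int first = static_cast<int>(std::floor(viewStart / tileSize));
    int last  = static_cast<int>(std::ceil((viewStart + viewSize) / tileSize));
    first = std::max(0, std::min(first, tileCount));
    last  = std::max(0, std::min(last, tileCount));
    return {first, last};
}

// Culling + batching: emit four corners per visible tile into one vertex list,
// which could then be submitted with a single draw call.
std::vector<Vertex2D> buildVisibleQuads(const ViewRect& view, float tileSize,
                                        int cols, int rows) {
    const TileRange xs = visibleRange(view.left, view.width, tileSize, cols);
    const TileRange ys = visibleRange(view.top, view.height, tileSize, rows);

    std::vector<Vertex2D> vertices;
    if (xs.first >= xs.last || ys.first >= ys.last) {
        return vertices;  // nothing on screen, so nothing to draw
    }
    vertices.reserve(static_cast<std::size_t>(xs.last - xs.first) *
                     static_cast<std::size_t>(ys.last - ys.first) * 4);

    for (int ty = ys.first; ty < ys.last; ++ty) {
        for (int tx = xs.first; tx < xs.last; ++tx) {
            const float x0 = tx * tileSize;
            const float y0 = ty * tileSize;
            vertices.push_back({x0, y0});                        // top-left
            vertices.push_back({x0 + tileSize, y0});             // top-right
            vertices.push_back({x0 + tileSize, y0 + tileSize});  // bottom-right
            vertices.push_back({x0, y0 + tileSize});             // bottom-left
        }
    }
    return vertices;
}
```

This only handles the "outside the window" kind of culling. Skipping things that are covered up by other stuff is a separate problem that I haven't tried to sketch here.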
My impression is that things like RenderTexture don't matter all that much, except in the sense that drawing something to a RenderTexture and then drawing that texture to a RenderWindow is two separate draw() calls where you might have gotten away with just one.
Of course, if you're doing something silly like loading new textures, creating new RenderTextures, or capturing the screen every single frame (yes, I've done this myself >_>), then that's probably a big problem you can easily fix.
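The usual fix for the "loading every frame" mistake is to load each resource once and hand out references afterward. Here's a minimal sketch of that idea, assuming a made-up `FakeTexture` type in place of a real texture (`TextureCache`, `get`, and `loadCount` are also hypothetical names of mine, not part of SFML):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for an expensive resource. Hypothetical; a real program would hold
// something like sf::Texture here and load it from disk.
struct FakeTexture { std::string sourcePath; };

class TextureCache {
public:
    // Returns the cached texture for `path`, loading it only on the first request.
    const FakeTexture& get(const std::string& path) {
        auto it = textures_.find(path);
        if (it == textures_.end()) {
            ++loadCount_;  // the expensive step would happen here, once per path
            it = textures_.emplace(path, FakeTexture{path}).first;
        }
        return it->second;
    }

    // Number of actual loads performed so far (useful for spotting repeats).
    std::size_t loadCount() const { return loadCount_; }

private:
    std::unordered_map<std::string, FakeTexture> textures_;
    std::size_t loadCount_ = 0;
};
```

Calling get() with the same path inside the game loop then costs a map lookup instead of a reload.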
Some specific responses:
1) If all you're doing is rendering everything to a RenderTexture and then rendering the result to the window, all you've done is add an extra draw() call for no benefit. Of course, if you ever want to do post-processing effects, you'll have to do this anyway, but don't bother with it if you don't plan to do any of that.
2) Not really. Again, drawing the texture to the window afterward is just an extra draw() call that doesn't gain you anything. I've never heard of RenderTexture being somehow faster or slower than RenderWindow.
3) clear() and display() should always be called exactly once per frame no matter what, so I'm not sure why this question matters. However, my gut feeling is that display() merely swaps buffers (look up Double Buffering) so draw() would be doing all the real work.
4) The official tutorials explicitly state you should not use both. In my personal experiments, setting vsync doesn't do much, but framerate limiting works flawlessly. What vsync does is tell the graphics card not to bother rendering frames faster than the monitor can display them, which typically means 60fps. There are a LOT of details about this process I'm not going to try explaining here (e.g., why Triple Buffering becomes relevant). Check out http://www.tweakguides.com/Graphics_9.html for a far better explanation than I can provide.
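For the framerate limiting half of point 4, my understanding is that a software limiter mostly boils down to sleeping away whatever is left of each frame's time budget. Here's a small sketch of that idea in plain C++ (the names `Micros`, `remainingFrameTime`, and `limitFrame` are mine, purely for illustration). In SFML itself you'd just call setFramerateLimit() on the window rather than writing this by hand.

```cpp
#include <chrono>
#include <thread>

using Micros = std::chrono::microseconds;

// Time still left in the current frame's budget, or zero if the frame overran.
Micros remainingFrameTime(Micros elapsed, int fpsLimit) {
    if (fpsLimit <= 0) {
        return Micros::zero();  // treat zero or negative as "no limit"
    }
    const Micros budget(1000000 / fpsLimit);  // e.g. 20000 us per frame at 50 fps
    return elapsed < budget ? budget - elapsed : Micros::zero();
}

// Call once at the end of each frame, passing the time the frame started.
void limitFrame(std::chrono::steady_clock::time_point frameStart, int fpsLimit) {
    const auto elapsed = std::chrono::duration_cast<Micros>(
        std::chrono::steady_clock::now() - frameStart);
    std::this_thread::sleep_for(remainingFrameTime(elapsed, fpsLimit));
}
```

Sleep timing isn't perfectly precise, so a limiter like this only gets you approximately the requested framerate. Vsync works differently, by tying buffer swaps to the monitor's refresh.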
This is just what I've gotten from other sources. Since I'm not even close to an expert, I won't try to analyze your code myself. I'd probably get it wrong.