*Sigh*, yes, I have run performance tests several times over, and of course I read all the tutorials before posting (except the one on interfacing directly with OpenGL).
I'm already a few months into the project and it runs at a solid 60 FPS on my laptop in battery-saving mode (it doesn't run well on the Eee PC, though).
I'm currently rendering 3000+ tiles at a time by precaching the possible combinations of layers at a given tileX,tileY position into new 32*32 pixel textures, so each combined texture can be reused many times. It works really well once the long loading time is over. That initial loading time is what's giving me grief.
The tiles themselves aren't always opaque or square; some have boundaries that deliberately overlap slightly, and in certain levels they are prone to frequent change.
This is why I may need to cache several layers together. In my honest opinion, anything that could give a decent performance boost is worth a try (especially if it can also be used with a possible 2D lighting system later).
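For illustration, here is a minimal sketch of that kind of layer-combination cache. It is not the project's actual code: `Pixel`, `Tile`, `LayerKey`, `blendOver` and `CompositeCache` are hypothetical names, tiles are assumed to be plain 32*32 RGBA buffers with non-premultiplied alpha, and no graphics library is involved. One deliberate difference from the approach described above: the sketch builds each combination on first request instead of building every combination up front, which is one possible way to spread out the loading cost.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
constexpr int kTileSize = 32;
struct Pixel { std::uint8_t r, g, b, a; };
using Tile = std::array<Pixel, kTileSize * kTileSize>;
using LayerKey = std::vector<int>;  // source tile ids, bottom layer first

// "Source over" alpha blend of src onto dst (non-premultiplied alpha).
inline Pixel blendOver(Pixel dst, Pixel src) {
    const int sa = src.a;
    const int da = dst.a * (255 - sa) / 255;  // destination coverage left visible
    const int outA = sa + da;
    if (outA == 0) return Pixel{0, 0, 0, 0};
    auto mix = [&](int s, int d) {
        return static_cast<std::uint8_t>((s * sa + d * da) / outA);
    };
    return Pixel{mix(src.r, dst.r), mix(src.g, dst.g), mix(src.b, dst.b),
                 static_cast<std::uint8_t>(outA)};
}

class CompositeCache {
public:
    explicit CompositeCache(std::vector<Tile> sources)
        : sources_(std::move(sources)) {}

    // Returns the flattened tile for a layer stack; composites only on first use.
    const Tile& get(const LayerKey& layers) {
        auto it = cache_.find(layers);
        if (it != cache_.end()) return it->second;

        Tile out{};  // zero-initialized, i.e. fully transparent
        for (int id : layers) {
            assert(id >= 0 && static_cast<std::size_t>(id) < sources_.size());
            const Tile& src = sources_[static_cast<std::size_t>(id)];
            for (std::size_t i = 0; i < out.size(); ++i)
                out[i] = blendOver(out[i], src[i]);
        }
        ++composites_;
        return cache_.emplace(layers, out).first->second;
    }

    // Number of combinations actually composited so far.
    std::size_t compositesBuilt() const { return composites_; }

private:
    std::vector<Tile> sources_;
    std::map<LayerKey, Tile> cache_;
    std::size_t composites_ = 0;
};
```

In a real renderer, the flattened buffer would be uploaded to a texture once and then drawn for every map cell that shares the same layer stack. If a level changes a tile, only the entries whose key contains that tile id would need to be rebuilt.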
Edit 2: Sorry, I didn't mean to come across as pretentious; it was 4:30 AM when I posted and I was a bit irritable.
Please read the red box in this tutorial... 256 tiles are nothing. Have you measured the performance impact? How do you draw the tiles?
It would be possible to draw a rectangle shape with no blend mode, but before complicating your code, please make sure you need it and the disadvantages of this cache won't affect your application.