The main advantage AND the main disadvantage of your approach is that the only reasonable way to do it is per-pixel collision detection.
It's very precise, but it also costs a lot of CPU time.
If you really want to implement it, divide the scene into individual objects, as was said earlier. Then generate a collision bitmap, seen from the top view, for every object. When checking two objects, first test their axis-aligned bounding boxes (AABB) against each other, and only if those overlap, compare their bitmaps pixel by pixel.
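Here is a minimal sketch of that two-phase check in Python. It assumes each object's top-view collision bitmap has already been rasterized into a 2D boolean mask placed at integer world coordinates. The `Sprite` class and the `aabb_overlap` / `pixel_collision` functions are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sprite:
    """Hypothetical object: a top-view collision mask placed in the world."""
    x: int  # world x of the mask's top-left pixel
    y: int  # world y of the mask's top-left pixel
    mask: List[List[bool]]  # mask[row][col] is True where the object is solid

    @property
    def width(self) -> int:
        return len(self.mask[0]) if self.mask else 0

    @property
    def height(self) -> int:
        return len(self.mask)


def aabb_overlap(a: Sprite, b: Sprite) -> bool:
    """Cheap first step: do the bounding rectangles intersect at all?"""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def pixel_collision(a: Sprite, b: Sprite) -> bool:
    """Expensive second step, run only when the AABBs overlap."""
    if not aabb_overlap(a, b):
        return False
    # Only the intersection rectangle can contain shared solid pixels.
    left, right = max(a.x, b.x), min(a.x + a.width, b.x + b.width)
    top, bottom = max(a.y, b.y), min(a.y + a.height, b.y + b.height)
    for wy in range(top, bottom):
        for wx in range(left, right):
            # Convert world coordinates to each mask's local coordinates.
            if a.mask[wy - a.y][wx - a.x] and b.mask[wy - b.y][wx - b.x]:
                return True
    return False


if __name__ == "__main__":
    corner = Sprite(0, 0, [[True, False], [False, False]])
    other = Sprite(1, 1, [[False, False], [False, True]])
    print(aabb_overlap(corner, other))     # True: the rectangles overlap
    print(pixel_collision(corner, other))  # False: no solid pixel is shared
```

The per-pixel loop only scans the overlapping rectangle, so its cost grows with the overlap area. That is why the AABB rejection comes first: most object pairs never reach the expensive part.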
I have also assumed that you need something more than rectangle-rectangle collisions. If you don't, just skip the second step, the per-pixel bitmap test.