Whether you need to sort transparent meshes depends on the type of transparency. It's hard to tell from the screenshots whether the textures are fully blended with an 8-bit alpha channel or whether pixels of a certain colour are simply discarded. Blended textures require back-to-front sorting when the usual source-alpha blend is used (think of smoke or tinted glass); a purely additive blend, such as a fire particle system, is the exception, because its result doesn't depend on draw order. Discarded pixels give 'sharp' edges where the transparent coloured pixels would meet the textured pixels, so there is no actual blending of source colour into destination and no sorting is required.
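If the screenshots don't settle it, the texture data will. Here is a minimal sketch, assuming you can pull each texture's alpha values out as a flat list of integers from 0 to 255. The function name and the three labels are made up for illustration; a colour-keyed texture with no alpha channel at all would simply count as the "cutout" case.

```python
def classify_alpha(alpha_values):
    """Classify a texture by its 8-bit alpha values (0 = transparent, 255 = opaque)."""
    values = set(alpha_values)
    if values <= {255}:
        return "opaque"   # no transparency at all
    if values <= {0, 255}:
        return "cutout"   # on/off transparency: discard pixels, no sorting needed
    return "blended"      # partial alpha: real blending, so draw order matters


print(classify_alpha([255, 255, 255]))    # opaque
print(classify_alpha([0, 255, 0, 255]))   # cutout
print(classify_alpha([0, 64, 128, 255]))  # blended
```

Only meshes whose textures come back as "blended" would need to go through a sort.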

For extra FPS in any z-buffered renderer, you should render opaque geometry front to back to minimize overdraw: the depth test rejects fragments that are behind what has already been written (drawn) to the buffer (screen), and with early depth testing that happens before they are shaded.
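As a rough sketch of what that ordering could look like, assuming each mesh is a dict with a hypothetical 'name', a 'center' point and a 'blended' flag (none of these names come from your editor): opaque meshes are sorted nearest first, and anything alpha-blended is drawn afterwards, farthest first.

```python
def squared_distance(a, b):
    """Squared distance between two points; enough for ordering, no sqrt needed."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_draw_order(meshes, camera_pos):
    """Return meshes in draw order: opaque front to back, then blended back to front."""
    def dist(mesh):
        return squared_distance(mesh["center"], camera_pos)

    opaque = sorted((m for m in meshes if not m["blended"]), key=dist)
    blended = sorted((m for m in meshes if m["blended"]), key=dist, reverse=True)
    return opaque + blended


meshes = [
    {"name": "wall_far", "center": (0, 0, 10), "blended": False},
    {"name": "glass_near", "center": (0, 0, 3), "blended": True},
    {"name": "wall_near", "center": (0, 0, 2), "blended": False},
    {"name": "smoke_far", "center": (0, 0, 8), "blended": True},
]
order = [m["name"] for m in build_draw_order(meshes, (0, 0, 0))]
print(order)  # ['wall_near', 'wall_far', 'smoke_far', 'glass_near']
```

Sorting by mesh centre is only an approximation; large or intersecting meshes can still end up in the wrong order.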

For a top-down view of a planar world (like GTA2 maps) you probably wouldn't save enough to make sorting worthwhile, as very little is rendered behind visible geometry. However, if the camera is looking out across the map from an FPS-like perspective, then you'll benefit from sorting before you render each frame.
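To put rough numbers on that difference, here is a toy model: a one-dimensional "scanline" with a depth buffer, where every fragment that passes the depth test counts as shaded. The spans and depths are invented for illustration, and it assumes the depth test runs before shading.

```python
def count_shaded(draws, width):
    """draws: list of (start, end, depth) spans, drawn in the order given.
    Returns how many fragments pass the depth test and therefore get shaded."""
    depth_buffer = [float("inf")] * width
    shaded = 0
    for start, end, depth in draws:
        for x in range(start, end):
            if depth < depth_buffer[x]:
                depth_buffer[x] = depth
                shaded += 1
    return shaded


# Top-down view: tiles sit side by side and nothing hides anything else.
top_down = [(0, 4, 1.0), (4, 8, 1.0)]
print(count_shaded(top_down, 8))                  # 8
print(count_shaded(list(reversed(top_down)), 8))  # 8

# First-person view: three walls stacked behind each other across the screen.
first_person = [(0, 8, 1.0), (0, 8, 2.0), (0, 8, 3.0)]  # front to back
print(count_shaded(first_person, 8))                  # 8
print(count_shaded(list(reversed(first_person)), 8))  # 24
```

In the top-down case the order makes no difference, while in the first-person case drawing back to front shades three times as many fragments.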

I see that the visible portion of the map in your editor is clipped to a quad centered on the camera's position. Are you also able to render the map clipped to the camera's view frustum? I think it'd be awesome if you could walk around the map in first person!
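For a flat map, clipping to the view could start out as a simple 2D test in place of a full six-plane frustum. The sketch below assumes map blocks are tested by their centre point on the ground plane and that the camera has a non-zero 2D facing direction; the function name and all parameters are hypothetical.

```python
import math


def in_view_cone(camera_pos, camera_dir, fov_degrees, max_distance, point):
    """True if a 2D point lies within the camera's horizontal field of view.
    camera_dir must be a non-zero (x, y) vector; it need not be normalized."""
    dx = point[0] - camera_pos[0]
    dy = point[1] - camera_pos[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return True   # the point is at the camera itself
    if distance > max_distance:
        return False  # beyond the far limit
    dir_len = math.hypot(camera_dir[0], camera_dir[1])
    # Cosine of the angle between the view direction and the direction to the point.
    cos_angle = (dx * camera_dir[0] + dy * camera_dir[1]) / (distance * dir_len)
    return cos_angle >= math.cos(math.radians(fov_degrees / 2.0))


camera = (0.0, 0.0)
facing = (0.0, 1.0)  # looking along +y
print(in_view_cone(camera, facing, 90.0, 50.0, (0.0, 10.0)))   # True
print(in_view_cone(camera, facing, 90.0, 50.0, (10.0, 0.0)))   # False
print(in_view_cone(camera, facing, 90.0, 50.0, (0.0, 100.0)))  # False
```

Because only centres are tested, blocks at the edge of the view can pop in and out; widening the angle slightly or testing block corners would hide that.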