You would put the objects in the scene. Duh. You can place all of these objects under the Viewport node and still have them interact with your main scene. The Viewport outputs a single texture, which you can then make transparent (don't forget the ViewportContainer).
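Here is a rough sketch of that setup in GDScript, assuming Godot 3.x (where the Viewport and ViewportContainer nodes exist under those names). The `group_path` export and the 0.5 alpha are made up for illustration; the same tree can just as well be built in the editor instead of in code.

```gdscript
extends Node2D

# Hypothetical: path to a node whose children should fade as one image.
export(NodePath) var group_path

func _ready():
    # The container is what actually draws the Viewport's texture on screen.
    var container = ViewportContainer.new()
    container.stretch = true  # the Viewport is resized to match the container
    container.rect_size = get_viewport_rect().size

    var viewport = Viewport.new()
    viewport.transparent_bg = true  # keep empty areas see-through
    container.add_child(viewport)
    add_child(container)

    # Move the objects under the Viewport so they render into its one texture.
    var group = get_node(group_path)
    for child in group.get_children():
        group.remove_child(child)
        viewport.add_child(child)

    # Fade the composed texture as a whole (assumed alpha value).
    container.modulate = Color(1, 1, 1, 0.5)
```

Because the alpha is applied to the finished texture, overlapping objects do not show through each other the way they would if each one were modulated separately.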
The best solution would be for the engine to have a node with a modulate property that automatically composites its children into one texture. Sadly, a node that merges its children like that is only available in the alpha version of the engine.