To Question 1:
I think the reason the tilemaps aren't rendering the way you'd like is that the y-sort origin of each tile isn't at the bottom of the tile image, but is constrained to the top face. That's a limitation of Godot 3's tilesets, as far as I know.
I experimented with TileMaps myself in Godot 3.4.4. This is an axonometric one where each tile image is taller than the grid cell it occupies. Because the tiles are y-sorted, this layer looks fine. Importantly, the tiles extend 'downwards' from their cells. You can't make a tileset where the tiles 'emerge' upwards from the grid cells.
If each tile were a hand-placed sprite, I'd make sure the origin used for y-sorting was at the bottom of the sprite, like in the two-block-tall sprite you show in your gif. The TileMap can't do this, though. Even if you set the TileMap's Tile Origin property to Bottom Left, that means the bottom left of the grid cell, not of the tile image, so it still won't y-sort properly.
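To show why the origin position matters, here's a rough sketch of how y-sorting decides draw order. This is plain Python, not Godot code, and the names (Drawable, origin_y) and pixel values are made up for the example. The idea is just that things are drawn in order of ascending origin y, so whatever has the larger y ends up in front.

```python
from dataclasses import dataclass


@dataclass
class Drawable:
    name: str
    origin_y: float  # y value used for sorting (pixels, y grows downwards)


def draw_order(drawables):
    """Return names back-to-front: smaller origin_y is drawn first (behind)."""
    return [d.name for d in sorted(drawables, key=lambda d: d.origin_y)]


# Hypothetical scene: a tall tile whose image spans y=32 (top) to y=96 (base),
# and a player whose feet are at y=80, i.e. standing behind the tile's base.
TILE_TOP, TILE_BASE, PLAYER_FEET = 32, 96, 80

# Origin at the base of the image: the tile is drawn after the player,
# so it correctly covers them.
origin_at_base = draw_order(
    [Drawable("tile", TILE_BASE), Drawable("player", PLAYER_FEET)]
)

# Origin stuck at the top of the image: the tile is drawn before the player,
# so the player wrongly appears in front of it.
origin_at_top = draw_order(
    [Drawable("tile", TILE_TOP), Drawable("player", PLAYER_FEET)]
)

print(origin_at_base)  # ['player', 'tile']
print(origin_at_top)   # ['tile', 'player']
```

The second case is roughly what I think is happening in your map: the sort point sits too high on tall tiles, so anything standing just behind their base gets drawn over them.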
In Godot 4 this issue would be very easy to solve, because the tileset editor lets you set the Y Sort Origin for each tile (even below the image, which is what you'd want for tile layers above the ground).
That said, you're using Godot 3.5, not 3.4.4. The TileMap > Cell > Custom Transform property does nothing for me, but if it works in Godot 3.5 it could be used to offset the tiles so they 'emerge' from the grid, which would fix the problem. I'm only speculating here: the documentation isn't that helpful and I haven't tried Godot 3.5.
To Question 2:
That sounds like a really tricky thing to implement. I don't think you can manipulate tiles in a TileMap like sprites, so you'd have to write your own custom tilemapping tool :|
Godot 3's TileMap system sucks a lot. Godot 4's is way more flexible, so I'd recommend switching if you can. One of my favourite features is that each unique tile can carry custom data, so you could sample the tile below the player to check which footstep sound should play, for example.
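The footstep idea boils down to a lookup like the one below. Again, this is a plain Python model rather than Godot's API, and everything in it (CELL_SIZE, TILE_DATA, CELLS, footstep_at) is a made-up name for illustration. It also assumes a simple square grid, so an isometric map would need a different position-to-cell conversion. In Godot 4 itself you'd define a custom data layer on the TileSet and read it from the tile's data instead.

```python
import math

CELL_SIZE = 16  # hypothetical square cell size in pixels

# Hypothetical per-tile custom data, keyed by tile id.
TILE_DATA = {
    0: {"footstep": "grass"},
    1: {"footstep": "stone"},
}

# Hypothetical map contents: (cell_x, cell_y) -> tile id.
CELLS = {(0, 0): 0, (1, 0): 1}


def footstep_at(x, y, default="none"):
    """Look up the footstep sound for the tile under a world position."""
    # floor() keeps negative positions in the correct cell.
    cell = (math.floor(x / CELL_SIZE), math.floor(y / CELL_SIZE))
    tile_id = CELLS.get(cell)
    if tile_id is None:
        return default  # no tile at this cell
    return TILE_DATA[tile_id].get("footstep", default)


print(footstep_at(5, 3))   # grass
print(footstep_at(20, 3))  # stone
print(footstep_at(-4, 3))  # none
```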