For a world as dense in tiles as Terraria, you will get better performance with custom systems.
One general way to optimize those games is to "flyweight" your tiles. Don't create objects or nodes for all of them. Your tiles should be represented either as packed arrays of ints, or as textures if you want to use the power of your graphics card. Then you create temporary structures on the fly when interacting with them, but most of the time you don't need to keep track of or store anything. Of course, chunking always helps for large grids, especially if the world is infinite.
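To make the packed-array and chunking idea concrete, here is a minimal sketch in Python. It only illustrates the data layout, not Godot code: the names `TileWorld`, `CHUNK_SIZE` and `AIR` are made up for the example, and the chunk size and 16-bit IDs are assumptions you would tune for your game.

```python
from array import array

CHUNK_SIZE = 32  # tiles per chunk side (assumed value, tune as needed)
AIR = 0          # hypothetical tile ID meaning "empty"


class TileWorld:
    """Stores tile IDs in packed arrays, one per chunk, allocated lazily."""

    def __init__(self):
        # (chunk_x, chunk_y) -> packed array of unsigned 16-bit tile IDs
        self.chunks = {}

    @staticmethod
    def _index(x, y):
        # Python's % is always non-negative here, so negative
        # world coordinates map correctly into the chunk.
        return (y % CHUNK_SIZE) * CHUNK_SIZE + (x % CHUNK_SIZE)

    def get_tile(self, x, y):
        chunk = self.chunks.get((x // CHUNK_SIZE, y // CHUNK_SIZE))
        if chunk is None:
            return AIR  # chunks that were never written are all air
        return chunk[self._index(x, y)]

    def set_tile(self, x, y, tile_id):
        key = (x // CHUNK_SIZE, y // CHUNK_SIZE)
        chunk = self.chunks.get(key)
        if chunk is None:
            # Allocate the chunk only when something is first written to it.
            chunk = array("H", [AIR]) * (CHUNK_SIZE * CHUNK_SIZE)
            self.chunks[key] = chunk
        chunk[self._index(x, y)] = tile_id
```

With this layout a tile costs 2 bytes, and an "infinite" world only uses memory for chunks that were actually touched. Saving or unloading a region is then just a matter of writing out or dropping whole chunk arrays.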
Godot's 2D physics engine is general purpose and isn't that good at handling large numbers of colliders. Instead, you could go for something more specific such as AABB physics, which means everything is a box that slides on other, static boxes. If the world is a grid, it makes things tremendously faster because you don't need to actually instantiate a box collider for every single one of those tens of thousands of tiles (FYI one node is about 300 bytes). You only need to query the few tiles around your player, do box intersections with only them, and then throw them away. You can do this anytime, anywhere.
I implemented this in 3D in C++, the same way Minecraft does. It might be portable to 2D by removing the Z coordinate: https://github.com/Zylann/godot_voxel/blob/137059f514c25fa0559df494d8ca8ab4a4988cfd/terrain/voxel_box_mover.cpp#L78
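For illustration, here is a rough 2D sketch of the general idea in Python. It is not a port of the linked code, just a simplified version under a few assumptions: tiles are 1x1 units, solid tiles are given as a set of `(tx, ty)` coordinates, the box moves less than one tile per step (so no tunneling handling), and the function names are made up for the example.

```python
import math

EPS = 1e-9  # tolerance so a box resting exactly on a tile edge does not overlap it


def overlapping_tiles(x, y, w, h):
    """Yield tile coordinates overlapped by the box with top-left (x, y) and size (w, h)."""
    x0, x1 = math.floor(x + EPS), math.ceil(x + w - EPS) - 1
    y0, y1 = math.floor(y + EPS), math.ceil(y + h - EPS) - 1
    for ty in range(y0, y1 + 1):
        for tx in range(x0, x1 + 1):
            yield tx, ty


def move_box(solid, x, y, w, h, dx, dy):
    """Move the box by (dx, dy), one axis at a time, stopping against solid tiles.

    `solid` is any container supporting `(tx, ty) in solid`.
    Returns the new top-left position of the box.
    """
    # Resolve the X axis first: move, then push back out of any solid tile.
    x += dx
    for tx, ty in list(overlapping_tiles(x, y, w, h)):
        if (tx, ty) in solid:
            if dx > 0:
                x = min(x, tx - w)      # stop at the tile's left edge
            elif dx < 0:
                x = max(x, tx + 1)      # stop at the tile's right edge

    # Then resolve the Y axis the same way, using the corrected X.
    y += dy
    for tx, ty in list(overlapping_tiles(x, y, w, h)):
        if (tx, ty) in solid:
            if dy > 0:
                y = min(y, ty - h)      # stop at the tile's top edge
            elif dy < 0:
                y = max(y, ty + 1)      # stop at the tile's bottom edge

    return x, y
```

Note that no collider object exists anywhere: the only tiles looked at are the handful under the box after each move, and they are read straight from the tile data. Resolving one axis at a time is what gives the "sliding along walls" behaviour.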
Otherwise, a middle ground would be to create only the few surrounding tiles like you said, but it means you have to track those objects (with reference counts?) so that you know when to destroy them, even if multiple players need the same tile.
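A possible shape for that tracking, sketched in Python. Everything here is hypothetical: `TileColliderPool`, `acquire`, `release` and `update_player` are names invented for the example, and the `create`/`destroy` callbacks stand in for whatever actually adds and frees a collider in your engine.

```python
class TileColliderPool:
    """Keeps one collider per tile position, shared by everyone who needs it."""

    def __init__(self, create, destroy):
        self._create = create    # callback: (tx, ty) -> collider object
        self._destroy = destroy  # callback: collider object -> None
        self._entries = {}       # (tx, ty) -> [reference count, collider]

    def acquire(self, pos):
        entry = self._entries.get(pos)
        if entry is None:
            # First user of this tile: actually create the collider.
            entry = [0, self._create(pos)]
            self._entries[pos] = entry
        entry[0] += 1

    def release(self, pos):
        entry = self._entries[pos]
        entry[0] -= 1
        if entry[0] == 0:
            # Last user is gone: the collider can be destroyed.
            self._destroy(entry[1])
            del self._entries[pos]

    def __len__(self):
        return len(self._entries)


def update_player(pool, old_tiles, new_tiles):
    """Switch one player from needing `old_tiles` to needing `new_tiles` (sets of positions)."""
    for pos in new_tiles - old_tiles:
        pool.acquire(pos)
    for pos in old_tiles - new_tiles:
        pool.release(pos)
    return new_tiles
```

Each player keeps the set of tile positions it currently needs and calls `update_player` whenever it moves. A tile shared by two players is created once and only destroyed when both have moved away from it.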
About graphics, straight away, don't use nodes for each tile. There will be too many. TileMap might be OK if it only has visuals (no colliders, no occluders, no navigation). It already stores tiles as IDs, and because of that it might be suitable. But if you need even more performance, a GPU tilemap can render more. In a technique like this, your tilemap would be a texture in which each pixel contains an ID rather than a color, and a shader would match that ID to a position in a tileset with a quick formula and a bit of tileset conventions (no ifs). This can render in a single draw call (or a few). Unfortunately I never tried this myself and I don't have enough time to explain more how it would be done, but it's something that's been around, notably here: https://github.com/MightyPrinny/godot-gputilemap
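To give an idea of what such a "quick formula" could look like, here is the arithmetic written as plain Python rather than shader code. This is only my guess at one possible convention, not how the linked project does it: I assume the tileset is a regular grid atlas with `atlas_columns` tiles per row, tile IDs index it in row-major order, and all tiles are `tile_px` pixels wide and high. In a real shader the same computation would run per pixel, reading the ID from the map texture.

```python
def atlas_pixel(px, py, map_ids, map_width, tile_px, atlas_columns):
    """Return the (x, y) pixel to sample in the atlas for world pixel (px, py).

    map_ids is a flat, row-major list of tile IDs (the CPU stand-in for the
    ID texture); map_width is the map width in tiles.
    """
    # Which tile of the map does this pixel fall in?
    tx = px // tile_px
    ty = py // tile_px
    tile_id = map_ids[ty * map_width + tx]

    # Where is that tile in the atlas? Pure arithmetic, no branching.
    atlas_tx = tile_id % atlas_columns
    atlas_ty = tile_id // atlas_columns

    # Offset inside the tile is the same in the map and in the atlas.
    return (atlas_tx * tile_px + px % tile_px,
            atlas_ty * tile_px + py % tile_px)
```

For example, with 16-pixel tiles and a 4-column atlas, tile ID 5 sits at column 1, row 1 of the atlas, so every pixel of that tile is fetched from the 16x16 region starting at (16, 16).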
I don't know if Terraria does this, but I suspect it uses a texture to render lighting, where again each pixel contains the color of the light in one tile, which modulates the screen in some way.
It might even be useful for simulations like water, using viewports maybe. But I bet Terraria actually does it on the CPU with C#.
If there are 10 players on a server and approaches like the ones I mentioned are applied, I think it should run pretty well. It all depends on the connection and the complexity of your game.
Sorry if I didn't give you ready-to-use code, but it's a broad question and there are several ways you could achieve a game like this, and it requires experimentation.