This kind of problem gets asked and solved over and over again. It all boils down to segmentation at multiple levels: visual, physical and data segmentation.
The simplest segmentation you can do in 2D is to represent your world as a grid of square chunks, as you already thought. You then store the loaded chunks in a 2D array, or in a dictionary with Vector2s as keys (usually in a scaled-down "chunk coordinate system", so that neighboring chunks have consecutive integer coordinates).
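To make the "chunk coordinate system" idea concrete, here is a minimal, engine-agnostic sketch in Python. The names `CHUNK_SIZE`, `world_to_chunk` and `loaded_chunks` are made up for illustration, and a plain tuple stands in for a Vector2 key. The chunk size of 256 is an arbitrary assumption.

```python
import math
from typing import Dict, Tuple

CHUNK_SIZE = 256  # assumed chunk edge length in world units (arbitrary)


def world_to_chunk(x: float, y: float, chunk_size: int = CHUNK_SIZE) -> Tuple[int, int]:
    """Convert a world position to integer chunk coordinates.

    Flooring (rather than truncating) keeps negative positions consistent:
    x = -10 lands in chunk -1, not chunk 0.
    """
    return (math.floor(x / chunk_size), math.floor(y / chunk_size))


# Loaded chunks keyed by chunk coordinates; the value is a placeholder
# for whatever a chunk holds in your game (tiles, nodes, colliders...).
loaded_chunks: Dict[Tuple[int, int], dict] = {}
loaded_chunks[world_to_chunk(300.0, -10.0)] = {"tiles": []}
```

The flooring matters if your world extends into negative coordinates: plain integer truncation would map both sides of the origin to chunk 0.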
Then, regularly check the chunk "slots" in a range around the player. If some aren't present, load them, and make sure chunks outside of the range get unloaded. This can be optimized by only redoing the check when the player crosses the boundary of its current chunk.
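The load/unload loop described above could look roughly like this. It is a sketch, not engine code: `ChunkStreamer`, `_load_chunk` and `_unload_chunk` are hypothetical names, the chunk contents are placeholders, and a square range of `radius` chunks around the player is assumed.

```python
import math
from typing import Dict, Optional, Tuple

ChunkCoord = Tuple[int, int]


class ChunkStreamer:
    def __init__(self, chunk_size: int = 256, radius: int = 1):
        self.chunk_size = chunk_size  # assumed chunk edge length in world units
        self.radius = radius          # how many chunks to keep around the player
        self.loaded: Dict[ChunkCoord, dict] = {}
        self._last_center: Optional[ChunkCoord] = None

    def _load_chunk(self, coord: ChunkCoord) -> dict:
        # Placeholder: read from disk, generate terrain, instance nodes, etc.
        return {"coord": coord}

    def _unload_chunk(self, coord: ChunkCoord, chunk: dict) -> None:
        # Placeholder: save changes, free nodes, etc.
        pass

    def update(self, player_x: float, player_y: float) -> bool:
        """Call regularly. Returns True if the set of chunks was re-evaluated."""
        center = (
            math.floor(player_x / self.chunk_size),
            math.floor(player_y / self.chunk_size),
        )
        # Optimization: do nothing until the player enters a different chunk.
        if center == self._last_center:
            return False
        self._last_center = center

        cx, cy = center
        r = self.radius
        wanted = {
            (cx + dx, cy + dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
        }
        # Load the missing slots in range.
        for coord in wanted - set(self.loaded):
            self.loaded[coord] = self._load_chunk(coord)
        # Unload everything that fell out of range.
        for coord in set(self.loaded) - wanted:
            self._unload_chunk(coord, self.loaded.pop(coord))
        return True
```

You would call `update()` with the player position every frame or on a timer. In a real game the loading would likely be spread over several frames or done in a thread to avoid stutter, which this sketch leaves out.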
I said there are multiple levels of segmentation. The simplest approach is to merge them all, so that one chunk holds its data, visuals and physics in the same object, with the same grid size. But that's not mandatory: you could have the whole map loaded as data, but only instance part of it as visual/physics chunks. It also depends on which nodes you use and which technical limitations you need to solve (like neighboring). If your map is finite and small enough to fit in memory, you could keep all the data loaded with no segmentation. If you are unsure of the size, or if it's infinite, everything must be segmented.
More complex segmentation involves quadtrees or LOD grids, but in 2D you might not even need that. They are more likely to be used in perspective 3D, because there you can see across many orders of magnitude (from a flower near you to mountains in the distance), while in 2D or orthographic views you see a fixed range regardless of where you look.