What is the best data structure for storing spatially sensitive remote data?


Consider I have the data about meteo station coordinates, historical temperature values and city center coordinates worldwide. Meteo stations are be placed on different distances to the city centers. The task is to determine average historical values of city temperature by the data of meteo stations.

To solve it for each city I need to find the set of closest meteo stations in some radius and average their data. The brutforce way is to calculate distances from each city to each meteo station, but it's too slow for my data. So I thought some tree data structure could help here. I tried to use R-trees to partition meteo stations by coordinates but there is a problem - such approach allows me to find meteo stations in some tree node, but it doesn't give me information about neighbouring nodes, to calculate radius condition fast (for example in case city is very close to the R-tree's node's border).

Is there a standard tree data structure which allows to search needed node fast, but also provides the set of spatial neighbours on the same tree level?

You should probably not be concerned about 'neighbours on same level' or such, this information does not necessarily mean much. I think you should probably

  1. Decide whether you want all weather stations within a given distance (range query) or the closest k weather stations.
  2. Then I would just use the API of the index that you are using to find the stations.
  3. Then calculate the distance.

R-Trees are okay for that, but they are usually quite slow for loading. If loading times are a problem, you may want to try R+tree, R*tree, or maybe Quadtrees (for small datasets) or PH-Tree (for large datasets, my implementation in Java).

How the data is organsized inside the tree should not be a concern. Who ever implemented the tree probably implemented the most efficient way of finding desired neighbours.