by Xiaohan Zhang, Tavis Shore, Chen Chen, Oscar Mendez, Simon Hadfield and Safwan Wshah
Abstract:
In this paper, we present a high-performing solution to the UAVM 2025 Challenge, which focuses on matching narrow Field-of-View (FOV) street-level images to corresponding satellite imagery using the University-1652 dataset. As panoramic Cross-View Geo-Localisation nears peak performance, it becomes increasingly important to explore more practical problem formulations. Real-world scenarios rarely offer panoramic street-level queries; instead, queries typically consist of limited-FOV images captured with unknown camera parameters. Our work prioritises discovering the highest achievable performance under these constraints, pushing the limits of existing architectures. Our method begins by retrieving candidate satellite image embeddings for a given query, followed by a re-ranking stage that selectively enhances retrieval accuracy within the top candidates. This two-stage approach enables more precise matching, even under the significant viewpoint and scale variations inherent in the task. Through experimentation, we demonstrate that our approach achieves competitive results - specifically attaining R@1 and R@10 retrieval rates of 30.21% and 63.13% respectively. This underscores the potential of optimised retrieval and re-ranking strategies in advancing practical geo-localisation performance. Code is available at github.com/tavisshore/VICI.
Reference:
VICI: VLM-Instructed Cross-view Image-localisation (Xiaohan Zhang, Tavis Shore, Chen Chen, Oscar Mendez, Simon Hadfield and Safwan Wshah), In Proceedings of the 3rd International Workshop on UAVs in Multimedia: Capturing the World from a New Perspective, Association for Computing Machinery, 2025.
Bibtex Entry:
@inproceedings{10.1145/3728482.3757386,
author = {Zhang, Xiaohan and Shore, Tavis and Chen, Chen and Mendez, Oscar and Hadfield, Simon and Wshah, Safwan},
title = {VICI: VLM-Instructed Cross-view Image-localisation},
year = {2025},
isbn = {9798400718397},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3728482.3757386},
doi = {10.1145/3728482.3757386},
abstract = {In this paper, we present a high-performing solution to the UAVM 2025 Challenge, which focuses on matching narrow Field-of-View (FOV) street-level images to corresponding satellite imagery using the University-1652 dataset. As panoramic Cross-View Geo-Localisation nears peak performance, it becomes increasingly important to explore more practical problem formulations. Real-world scenarios rarely offer panoramic street-level queries; instead, queries typically consist of limited-FOV images captured with unknown camera parameters. Our work prioritises discovering the highest achievable performance under these constraints, pushing the limits of existing architectures. Our method begins by retrieving candidate satellite image embeddings for a given query, followed by a re-ranking stage that selectively enhances retrieval accuracy within the top candidates. This two-stage approach enables more precise matching, even under the significant viewpoint and scale variations inherent in the task. Through experimentation, we demonstrate that our approach achieves competitive results - specifically attaining R@1 and R@10 retrieval rates of 30.21\% and 63.13\% respectively. This underscores the potential of optimised retrieval and re-ranking strategies in advancing practical geo-localisation performance. Code is available at github.com/tavisshore/VICI.},
booktitle = {Proceedings of the 3rd International Workshop on UAVs in Multimedia: Capturing the World from a New Perspective},
pages = {21--25},
numpages = {5},
keywords = {image localisation, cross-view geo-localisation, vision-language model, image retrieval},
location = {Ireland},
series = {UAVM '25},
comment = {<a href="http://github.com/tavisshore/VICI">Code</a>},
pdf = {http://personalpages.surrey.ac.uk/s.hadfield/papers/Shore25c.pdf},
}