Spatio-textual data integration with Artificial Intelligence (AI): toponym interlinking

Ntzoufas, Alexandros; Ντζούφας, Αλέξανδρος

Files

ntzoufas_2019.pdf (4.77 MB)

Date

2020-07-30

Authors

Ntzoufas, Alexandros

Ντζούφας, Αλέξανδρος

Supervisor

Papageorgiou, Haris
Παπαγεωργίου, Χάρης

Available from

2020-12-21 13:26:09

Abstract

Toponym matching is the problem of identifying spatio-textual entities that refer to the same real-world place based exclusively on their names. It is a fundamental problem for several applications in geographical information retrieval and the geographical information sciences, such as the conflation of digital gazetteers or point-of-interest datasets, address parsing in geocoding and map search services, and toponym resolution over textual content, digitized maps and digital library content (Santos, Murrieta-Flores, Pável, & Martins, 2017). This study deals with pairs of toponyms that either refer to the same place or do not. Given an arbitrary toponym pair, it uses classification algorithms to predict whether the pair is matching or non-matching (true or false). The toponym matching approach followed in this study rests on three pillars: a) word embedding learning models, b) feature extraction methods, and c) machine learning and deep learning classification algorithms. As expected, the deep learning algorithms outperformed the machine learning algorithms. The fully connected neural network reached the highest F1-score and accuracy, followed by LSTM and CNN, while MLP performed better than XGBoost and Random Forest. More specifically, the F1-score and accuracy of the fully connected model were 85.2% and 85.05%, respectively. It is worth mentioning that the results of our approach significantly exceeded several published results based on string similarity metrics (Santos, Murrieta-Flores, & Martins, 2018), while remaining quite close to the state of the art.
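
To make the task setup concrete, the following is a minimal sketch of toponym pair classification and of the two reported metrics (accuracy and F1-score). It is not the method of the thesis: it uses no word embeddings and no neural network, only two string similarity features and a fixed threshold, roughly in the spirit of the string similarity baselines mentioned in the abstract. The function names, the choice of features and the threshold of 0.5 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def char_ngrams(name, n=2):
    """Return the set of character n-grams of a lower-cased, space-padded name."""
    padded = f" {name.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def pair_features(a, b):
    """Two illustrative (hypothetical) similarity features for a toponym pair."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    # Jaccard overlap of character bigrams.
    jaccard = len(grams_a & grams_b) / len(union) if union else 0.0
    # Similarity ratio from the standard library's sequence matcher.
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return jaccard, ratio


def predict_match(a, b, threshold=0.5):
    """Label a pair as matching when the mean feature value reaches the threshold.

    The threshold is an assumed value, not one taken from the thesis.
    """
    return sum(pair_features(a, b)) / 2 >= threshold


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1-score for boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy pairs invented for this example; they are not data from the study.
    examples = [("Athens", "Athens", True), ("Athens", "London", False)]
    labels = [label for _, _, label in examples]
    predictions = [predict_match(a, b) for a, b, _ in examples]
    print(accuracy_and_f1(labels, predictions))
```

In the approach described in the abstract, the hand-written features and the threshold would be replaced by learned embeddings, extracted features and a trained classifier; the evaluation with accuracy and F1-score stays the same.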

Keywords

Toponym matching, Geographic Information Retrieval (GIR), Natural Language Processing (NLP), Machine learning, Deep learning, Αντιστοίχιση τοπονυμίων, Ανάκτηση γεωγραφικών πληροφοριών, Επεξεργασία φυσικής γλώσσας, Μηχανική μάθηση, Βαθιά μάθηση