AUEB Library - Digital Repository

PYXIDA Institutional Repository and Digital Library


Collections :	AUEB Institutional Repository > School of Informatics > Department of Informatics > Postgraduate dissertations

Title :	Analysis of deep learning methods for producing high-quality word representations of the Greek language

Alternative Title :	Deep learning methods analysis for producing high-quality word representations for the Greek language

Creator :	Λιουδάκης, Μιχαήλ

Contributor :	Βαζιργιάννης, Μιχαήλ (Supervising professor); Ανδρουτσόπουλος, Ίων (Examiner); Γκρίτζαλης, Δημήτριος (Examiner); Athens University of Economics and Business, Department of Accounting and Finance (Degree granting institution)

Type :	Text

Extent :	42 pages

Language :	Greek (el)

Identifier :	http://www.pyxida.aueb.gr/index.php?op=view_object&object_id=7314

Abstract :	Natural Language Processing (NLP) is a very active research subfield of Artificial Intelligence that, in recent years, has evolved alongside neural networks and deep learning. Many NLP applications rely on word representations (word embeddings) to achieve the best possible results. Yet despite the considerable progress on word embeddings for English, only a few published works address word embeddings for Greek. This thesis aims to produce high-quality word representations for the Greek language. It also proposes a new method for producing word embeddings, Continuous Bag-of-Skip-grams (CBOS), which combines the two most popular approaches to learning word representations: Continuous Bag-of-Words (CBOW) and Continuous Skip-gram. These methods, together with position-dependent CBOW, are compared on the word analogy task over three data sources: an English Wikipedia corpus, a Greek Wikipedia corpus, and a Greek Web Content corpus. The comparison across these datasets shows that CBOS achieves state-of-the-art performance.
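The record does not say how CBOS merges its two parent objectives, so what follows is only a minimal sketch, assuming the standard word2vec formulations. It shows how CBOW and Skip-gram turn the same context window into different training examples, and how the 3CosAdd rule (a common way to score the word analogy task) answers "a is to b as c is to ?". The function names, toy sentence and toy vectors are hypothetical illustrations, not the thesis's code or data.

```python
import numpy as np

# Hypothetical helpers illustrating standard word2vec example construction (not CBOS).

def cbow_examples(tokens, window):
    """CBOW: one example per position, predicting the centre word from its context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # skip positions with no neighbours (e.g. one-word sentences)
            examples.append((context, target))
    return examples


def skipgram_examples(tokens, window):
    """Skip-gram: one (centre, context) pair per neighbour inside the window."""
    examples = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                examples.append((centre, tokens[j]))
    return examples


def analogy(a, b, c, vectors):
    """3CosAdd: return the word d maximising cos(d, b - a + c), excluding a, b and c."""
    unit = {w: v / np.linalg.norm(v) for w, v in vectors.items()}
    query = unit[b] - unit[a] + unit[c]
    query /= np.linalg.norm(query)
    candidates = (w for w in unit if w not in (a, b, c))
    return max(candidates, key=lambda w: float(unit[w] @ query))


# Hypothetical toy data: dimensions loosely mean (base, royalty, female).
sentence = ["the", "cat", "sat", "on", "mat"]
toy_vectors = {
    "man": np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 0.0, 1.0]),
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "apple": np.array([0.0, 0.0, 1.0]),
}

print(cbow_examples(sentence, 1)[1])              # (['the', 'sat'], 'cat')
print(skipgram_examples(sentence, 1)[:3])         # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
print(analogy("man", "king", "woman", toy_vectors))  # queen
```

With a window of 1, CBOW yields one example per word (5 here), each bundling its neighbours into a single input, while Skip-gram yields one pair per neighbour (8 here). In word2vec, CBOW averages or sums the context vectors into one prediction, whereas Skip-gram trains on each pair separately.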


Subject :	Word representations; Natural language processing; Deep learning; Artificial intelligence; Continuous Bag-of-Words; Continuous Skip-gram

Date Available :	2019-09-23 10:53:06

Date Issued :	08/02/2019

Date Submitted :	2019-09-23 10:53:06

Access Rights :	Free access

Licence :

File: Lioudakis_2019.pdf

Type: application/pdf
