AUEB Library - Digital Repository

PYXIDA Institutional Repository and Digital Library


Collections :	AUEB Institutional Repository > School of Informatics > Department of Statistics > Postgraduate dissertations

Title :	Statistical models for natural language processing and topic modelling in R

Alternative Title :	Στατιστικά μοντέλα για επεξεργασία της φυσικής γλώσσας και μοντελοποίηση θεμάτων με χρήση της γλώσσας προγραμματισμού R

Creator :	Kavour, Efthimios-Ioannis (Καβούρ, Ευθύμιος-Ιωάννης)

Contributor :	Papastamoulis, Panagiotis (Supervisor); Papageorgiou, Ioulia (Examiner); Pedeli, Xanthi (Examiner); Athens University of Economics and Business, Department of Statistics (Degree granting institution)

Type :	Text

Extent :	78p.

Language :	en

Identifier :	http://www.pyxida.aueb.gr/index.php?op=view_object&object_id=11516

Abstract :	The aim of this thesis is the in-depth analysis and application of the Latent Dirichlet Allocation (LDA) method, which categorizes textual data into thematic groups. First, a brief introduction to machine learning is provided, followed by a detailed study of the model of interest. Finally, the method is applied to a collection of books in order to analyze and categorize their descriptions.


Subject :	Machine learning (ML); Web scraping; Latent Dirichlet Allocation (LDA); Natural Language Processing (NLP); Latent semantic analysis

Date Available :	2024-09-18 18:57:25

Date Issued :	2024-09-17

Date Submitted :	2024-09-18 18:57:25

Access Rights :	Free access

Licence :

File: Kavour_2024.pdf

Type: application/pdf
