AUEB Library - Digital Repository

PYXIDA Institutional Repository and Digital Library


Collections :	AUEB Institutional Repository / School of Informatics / Department of Informatics / Postgraduate dissertations

Title :	Resource management in big data processing systems

Alternative Title :	Διαχείριση πόρων στα συστήματα επεξεργασίας big data

Creator :	Kostopoulos, Victor Efstathios

Contributor :	Kalogeraki, Vana (Supervisor); Apostolopoulos, Theodoros (Examiner); Athens University of Economics and Business, Department of Informatics (Degree granting institution)

Type :	Text

Extent :	40 p.

Language :	en

Identifier :	http://www.pyxida.aueb.gr/index.php?op=view_object&object_id=7054

Abstract :	This thesis examines Apache Spark, a big data processing framework, and compares it with similar systems such as Apache Storm and Apache Flink. It first introduces the term memory elasticity, together with the different approaches to it and some techniques for achieving it. The main part presents an Apache Spark management application that runs on a cluster of computers. Two different processes are run, a Spark process and a Spark Streaming process, while nodes are added or removed automatically depending on overall performance and resource availability, so that jobs complete more efficiently. Finally, comparison tests that simulate real cluster conditions inspect the system's performance while a job executes and one characteristic is changed per test, such as the number of working nodes, the memory of each node, or the persistence level. The aim is to identify the factors that drive resource consumption, so that resources can be used optimally and unnecessary consumption avoided.
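
The node add/remove behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the ClusterMetrics type, the scaling_decision function, and every threshold value are hypothetical, chosen only to show one simple threshold-based way of turning performance and resource-availability readings into a scaling decision.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    """Hypothetical snapshot of cluster state (all names are assumptions)."""
    cpu_load: float     # mean CPU utilisation across workers, 0.0 to 1.0
    free_memory: float  # fraction of worker memory still free, 0.0 to 1.0
    workers: int        # number of worker nodes currently in the cluster


def scaling_decision(
    m: ClusterMetrics,
    min_workers: int = 1,
    max_workers: int = 8,
    high_load: float = 0.8,   # assumed threshold, not taken from the thesis
    low_load: float = 0.3,    # assumed threshold, not taken from the thesis
    low_memory: float = 0.1,  # assumed threshold, not taken from the thesis
) -> int:
    """Return +1 to add a worker, -1 to remove one, 0 to leave the cluster as is."""
    # The cluster counts as overloaded if CPU is saturated or memory is nearly exhausted.
    overloaded = m.cpu_load > high_load or m.free_memory < low_memory
    # The cluster counts as idle only if CPU is low and memory is not under pressure.
    idle = m.cpu_load < low_load and m.free_memory >= low_memory

    if overloaded and m.workers < max_workers:
        return 1
    if idle and m.workers > min_workers:
        return -1
    return 0
```

A monitoring loop could call scaling_decision on each metrics sample (for example, readings collected by a monitoring tool such as Ganglia) and start or stop a worker accordingly. The worker bounds keep the sketch from scaling below one node or above a fixed cap.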


Subject :	Spark Streaming; Big data; Ganglia; Python; Management application; Resource availability; Memory control

Date Available :	2019-06-10 19:37:37

Date Issued :	2019-05-15

Date Submitted :	2019-06-10 19:37:37

Access Rights :	Free access

Licence :

File: Kostopoulos_2019.pdf

Type: application/pdf
