Browsing by Supervisor "Vazirgiannis, Michalis"
Now showing 1 - 6 of 6
Item: Crawling Facebook: a social network analysis (09-2011)
Papageorgiou, Theodore (Παπαγεωργίου, Θεόδωρος); Athens University of Economics and Business, Department of Informatics; Stamoulis, George D.; Vazirgiannis, Michalis
Online Social Networks (OSNs) play an integral role in our everyday life, affecting the social life and activity of people in various ways. Social networking sites have hundreds of millions of registered users who use these sites to share thoughts, experiences and photographs, meet new people, contact long-lost friends and family members, find jobs, spread information, and more. The idea of social networks, and that social phenomena can be explained when we go beyond the properties of individuals and examine their personal and social ties, has been around for over a century. Social networks play a critical role in the social, economic, health and educational aspects of our life and behavior in general. Their structure affects the way information flows amongst people, the way diseases spread, our purchase choices, the decisions we make and the way our society evolves. In this thesis we perform a study that includes crawling the most popular online social networking site, Facebook, and performing a proof-of-concept social network analysis. We describe the collection process of the crawlers, implemented in Python. Moreover, we provide graph visualizations and study several graph metrics with the help of Gephi, an open-source program for visualizing and analyzing large graphs. We provide metrics and analyze network graph properties such as degree distribution, centrality measures and community detection, among others. From our extracted anonymized data we choose to further analyze users' likes in conjunction with their relationships, and provide basic statistics and analysis.
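The basic graph metrics named above can be illustrated on a toy adjacency-list graph (a minimal pure-Python sketch with made-up data; it is not the thesis's Facebook dataset or its Gephi workflow):

```python
from collections import Counter

# A toy undirected friendship graph as an adjacency list
# (illustrative only -- not the thesis's crawled data).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

def degree_distribution(g):
    """Map each degree value to the number of nodes having that degree."""
    return dict(Counter(len(nbrs) for nbrs in g.values()))

def degree_centrality(g):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}

print(degree_distribution(graph))     # {3: 1, 2: 3, 1: 1}
print(degree_centrality(graph)["a"])  # 0.75
```

On real crawled graphs these quantities are computed by tools such as Gephi, which the thesis uses; the sketch only shows what the numbers mean.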
We analyze the community detection mechanism and raise the question of whether community unfolding results can be reproduced and/or improved when we take the users' common preferences (likes) into consideration.

Item: Linear and non-linear dimensionality reduction for distributed knowledge discovery
Magdalinos, Panagis; Athens University of Economics and Business, Department of Informatics; Vazirgiannis, Michalis
An increasing number of contemporary applications produce massive volumes of very high-dimensional data. In scientific databases, for example, it is common to encounter large sets of observations represented by hundreds or even thousands of coordinates. Unfortunately, the rate of data generation and accumulation significantly outpaces our ability to explore and analyze the data. Nevertheless, in order to extract knowledge from these datasets, we need to access the underlying, hidden information. However, the size and complexity of these collections makes their processing and analysis impractical or even ineffective [13, 47]. Therefore, scaling up knowledge discovery algorithms for data of both high dimensionality and cardinality has recently been recognized as one of the top-10 problems in data mining research [95]. In parallel, the evolution of the internet as well as the emergence of novel applications, such as peer-to-peer systems, has led to an unprecedented distribution of available information. Data is dispersed among network nodes, making the cost of centralizing and subsequently processing it prohibitive. Consequently, distributed data mining and distributed knowledge discovery have also emerged as highly challenging tasks [95]. Nevertheless, the vast amount of generated data dictates methods that are fast, exhibit low requirements in terms of computational resources, and can be applied to various network setups.
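As a toy illustration of the kind of cheap dimensionality reduction such settings call for, the sketch below represents each point by its distances to a few randomly chosen landmark points. This is only a generic landmark-style sketch with synthetic data (the function name and parameters are made up); it is not FEDRA or any other algorithm from the thesis:

```python
import math
import random

def landmark_embed(points, k, seed=0):
    """Embed each point as its Euclidean distances to k random landmarks.
    The output dimensionality is k, regardless of the input dimension."""
    rng = random.Random(seed)
    landmarks = rng.sample(points, k)

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    return [[dist(p, lm) for lm in landmarks] for p in points]

# 20 synthetic 100-dimensional points reduced to 3 coordinates each.
data = [[random.Random(i).random() for _ in range(100)] for i in range(20)]
embedded = landmark_embed(data, k=3)
print(len(embedded), len(embedded[0]))  # 20 3
```

The appeal of this family in a distributed setting is that only the landmarks need to be shared among nodes; each node can then embed its local points independently.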
Motivated by the previous analysis, this thesis attempts to provide a solution through the definition of efficient and effective dimensionality reduction algorithms. The proposed methods exhibit minor requirements in terms of computational resources without compromising the quality of the produced results; they can therefore be exploited in the context of centralized and distributed preprocessing for knowledge discovery. Towards this end:
• We introduce FEDRA (Chapter 3, [62, 63]), a dimensionality reduction algorithm which poses minimal time and space requirements and is ideal for large datasets of particularly high cardinality and dimensionality (FEDRA is an acronym for Fast and Efficient Dimensionality Reduction Algorithm).
• Inspired by the nature of landmark-based dimensionality reduction algorithms (Chapter 2), we introduce the distributed adaptation of FEDRA ([62, 61]) and extend its underlying methodology in order to derive a framework for the decentralization of any landmark-based dimensionality reduction algorithm (Chapter 3, Section 3.4).
• We propose a distributed non-linear dimensionality reduction algorithm, the Distributed Isomap (Chapter 4, [66, 65]), which to the best of our knowledge is the first of its kind. Additionally, motivated by recent research results on text mining ([41, 17, 101, 78, 71]), we propose its application to hard dimensionality reduction problems related to text mining.
• Finally, we introduce X-SDR (Chapter 5, [64]), a prototype that enables the integration and evaluation of any dimensionality reduction algorithm.
X-SDR is an open-source tool that supports the evaluation of methods through experimentation on artificial and real-world datasets, thus promoting itself as an ideal candidate platform for research and teaching in academia.

Item: Organizing and Searching Data in Unstructured P2P Networks
Doulkeridis, Christos; Athens University of Economics and Business, Department of Informatics; Vazirgiannis, Michalis
As data generation becomes increasingly inherently distributed, either due to user-generated (multimedia) content or because of application-specific needs (sensor networks, data streams, etc.), traditional centralized architectures fail to address the new challenges of contemporary data management. A promising solution for the design and deployment of global-scale applications is the exploitation of the peer-to-peer (P2P) paradigm. P2P has emerged as a powerful model for organizing and searching large data repositories distributed over autonomous, independent sources. The main topic and contribution of this thesis is the unsupervised organization of content into Semantic Overlay Networks (SONs), in a decentralized and distributed manner, and subsequently a variety of techniques for efficient searching and query processing in unstructured P2P systems. SONs have been proposed in the relevant research literature as a way to organize peers into thematic groups, thereby enabling query routing to specific peer groups in a deliberate way, instead of blind forwarding. In particular, this work focuses on unstructured P2P networks that preserve peer autonomy. A novel protocol for unsupervised, distributed and decentralized SON construction is proposed, named DESENT [35, 38], which employs distributed clustering of peer contents, respecting the requirements imposed by the distributed nature of the environment [138]. Exploiting the generated SONs, we propose efficient routing strategies for answering similarity search queries [37, 39].
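The idea of routing a query to the most relevant SON, rather than blindly flooding every peer group, can be sketched with toy cluster centroids and cosine similarity (the data, SON names and function names below are hypothetical; this is not DESENT itself):

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy Semantic Overlay Networks, each summarized by a cluster centroid.
sons = {
    "sports":  {"football": 0.9, "score": 0.5},
    "biology": {"gene": 0.8, "protein": 0.7},
}

def route(query, sons):
    """Forward the query to the SON whose centroid is most similar,
    instead of blindly forwarding it to every peer group."""
    return max(sons, key=lambda name: cosine(query, sons[name]))

print(route({"protein": 1.0, "gene": 0.4}, sons))  # biology
```

In a real overlay the centroids act as compact routing summaries, so a query touches only the thematically relevant group of peers.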
The approach is applied and tested in a distributed IR setting, aiming to address some of the limitations of P2P IR/web search. Towards this goal, a distributed dimensionality reduction algorithm is proposed [96], in order to reduce the high-dimensional feature space and improve clustering quality. Assuming a super-peer architecture, we propose an approach called SIMPEER [43] that efficiently supports similarity search over data distributed over a large set of peers. We show how range queries and nearest-neighbor queries can be processed. We also explore how to support non-traditional queries (such as top-k [141] and skylines [139]) that involve ranking. Furthermore, by relaxing the restriction of a completely unsupervised environment and assuming a semi-supervised context, a novel technique for P2P summary caching of hierarchical information is presented, exploiting either predefined taxonomies [104] or XML schema information [36, 40], which is applied in mobile P2P context-aware environments to improve query routing [45, 44].

Item: Semi-automatic semantic video indexing and retrieval (31-03-2005)
Pitkanen, Reetta; Athens University of Economics and Business, Department of Informatics; Vazirgiannis, Michalis
As the use of digital video increases, so does the need to provide effective management of and access to such data. To achieve this, we can create annotations that describe the content. This can be done manually, automatically, or assisted by a professional indexer. Manual annotation of video is a very time-consuming task, and different indexers are likely to use different terminology, which leads to inconsistencies. Low-level features, such as color and texture, can be automatically extracted from video data without user intervention. However, they are not sufficient to index video at a higher level. Semantic video indexing is a promising approach to enable semi-automatic video annotation and semantic video retrieval via keywords.
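Keyword-based semantic indexing of this kind can be caricatured as mapping each extracted keyword to itself plus its ancestor concepts in a hierarchy, so that a search for a broader concept also retrieves the video. A minimal sketch with a made-up ontology (the `parent` map and all terms are hypothetical, far simpler than a real ontology):

```python
# Toy concept hierarchy: child -> parent (hypothetical ontology fragment).
parent = {
    "goal": "football", "football": "sport", "sport": "activity",
    "lecture": "education", "education": "activity",
}

def annotate(keywords, parent):
    """Expand extracted keywords to the set of ontology concepts they imply,
    walking up the hierarchy from each keyword to the root."""
    concepts = set()
    for kw in keywords:
        node = kw
        concepts.add(node)
        while node in parent:
            node = parent[node]
            concepts.add(node)
    return concepts

print(sorted(annotate(["goal"], parent)))
# ['activity', 'football', 'goal', 'sport']
```

A query for "sport" would then match a video annotated only with the extracted keyword "goal".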
In order to express semantics, we need to use prior knowledge about the domain of the video in the indexing process. Semantic information cannot be extracted fully automatically; human intervention is unavoidable. For this reason, we should see indexing as a collaborative process between the user and the system, and design interactive indexing systems. We have designed and developed an innovative end-to-end semantic video indexing and search tool. Our tool can automatically extract keywords from the video and uses ontologies to represent domain knowledge in the system. The indexed video is semantically enhanced by mapping the extracted keywords to a set of ontology concepts organized in a hierarchy. We have also developed a search tool that allows users to efficiently search and retrieve the indexed video presentations.

Item: Stability of spectral learning algorithms: theory, methodologies and applications
Mavroeidis, Dimitrios; Athens University of Economics and Business, Department of Informatics; Vazirgiannis, Michalis
In data mining, an important research problem is the identification and analysis of theoretical properties that characterize and explain the behavior of learning algorithms. Based on such theoretical tools, the comparison and analysis of algorithms can rest on rigorous and sound criteria rather than simply their empirical behavior. In this context, an important criterion that is widely used for assessing the quality of learning algorithms is stability. Stability essentially evaluates the sensitivity of the output of a learning algorithm with respect to small perturbations of the input. An algorithm is said to be stable if it is insensitive to perturbations, and unstable if even small perturbations of the input can significantly alter the algorithm's output. Based on this definition, it is natural to require that an algorithm be stable.
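The notion of stability just defined can be made concrete with a toy bootstrap experiment: resample the input, recompute a learner's output (here, the leading principal direction of 2-D data), and compare it with the full-sample output. This is only a simplified illustration in the spirit of bootstrap-based stability measurement, not the thesis's actual methodology:

```python
import math
import random

def leading_direction(pts):
    """Unit leading eigenvector of the 2x2 covariance matrix of 2-D points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / n
    syy = sum((p[1] - my) ** 2 for p in pts) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]], then its eigenvector.
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy) or 1.0
    return (vx / norm, vy / norm)

def bootstrap_stability(pts, runs=50, seed=0):
    """Mean |cos angle| between the full-sample principal direction and the
    directions from bootstrap resamples: 1.0 means perfectly stable."""
    rng = random.Random(seed)
    ref = leading_direction(pts)
    sims = []
    for _ in range(runs):
        sample = [rng.choice(pts) for _ in pts]
        v = leading_direction(sample)
        sims.append(abs(ref[0] * v[0] + ref[1] * v[1]))
    return sum(sims) / runs

# A strongly elongated toy cloud: its leading direction is very stable.
rng = random.Random(1)
cloud = [(rng.gauss(0, 5), rng.gauss(0, 0.5)) for _ in range(200)]
print(bootstrap_stability(cloud) > 0.99)  # True
```

For nearly isotropic data (close eigenvalues) the same score drops, which is exactly the instability phenomenon the thesis studies.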
A learning paradigm that has recently been developed in the context of data mining considers the use of spectral techniques for addressing several data mining tasks, ranging from dimensionality reduction to clustering and classification. Spectral algorithms are commonly perceived as approximate solutions to certain combinatorial optimization problems and derive their solutions from the eigenvectors of appropriate input matrices. The stability of the results depends on certain algebraic properties of these matrices; thus the study of stability reduces to the study of these algebraic properties. In this thesis we focus on the stability of spectral learning algorithms and consider the following research questions: How can we measure the stability of spectral learning algorithms? What is the relation of sample size and (in)stability? What is the relation between stability and efficiency? How can we identify the sufficient sample size for guaranteeing stability? How can we enhance the stability of spectral learning algorithms? How can the concept of stability be employed in distributed learning algorithms? Based on the aforementioned research directions, the contributions of this thesis can be summarized as follows:
• We define an efficient bootstrap-based methodology for measuring the stability of Principal Components Analysis (PCA) [88], Spectral k-Means and Normalized Spectral Clustering [85].
• We propose a Feature Selection Framework that enhances the stability of Principal Components Analysis (PCA) [88] and Spectral Clustering/Ordering [83].
• We propose a Semi-Supervised Framework that enhances the stability and efficiency of Spectral Ordering [83, 84].
• Motivated by [87], where high classification accuracy is achieved with very small training datasets, we associate the concept of sampling variability with instability and sample size, and propose a Spectral Clustering algorithm that automatically identifies the sufficient sample size that guarantees stability [85].
• We define a Distributed Spectral Clustering framework that aims at minimizing instability. The proposed framework is demonstrated to obtain highly accurate models with low bandwidth consumption [85].

Item: Word sense disambiguation and text relatedness based on word thesauri
Tsatsaronis, Georgios; Athens University of Economics and Business, Department of Informatics; Vazirgiannis, Michalis
As the immense amount of text data increases rapidly over the years, the need to improve the quality of algorithms in text-related tasks is evident. Traditional models for representing documents, like the standard vector space model (VSM), often neglect the semantic relatedness between words, suffering from the restriction of exact keyword matching when exploring the similarity or relatedness between segments of text. In critical tasks, like text classification and retrieval, which have been studied intensively over the past decades, this assumption of exact keyword matching is often the reason for poor performance. This thesis aims to explore the potential of incorporating semantic relatedness between documents in several text-related applications, like text classification, retrieval and paraphrase recognition. Several aspects have been taken into account, like natural language processing techniques and the use of a word thesaurus, namely WordNet, in an effort to exhaust as many possibilities as possible in the workflow from analyzing and preprocessing documents up to successfully embedding the semantic information, in a machine-readable manner, in those tasks.
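A crude stand-in for thesaurus-based relatedness is inverse shortest-path distance between words in a hypernym graph. The sketch below uses a tiny hand-made graph (illustrative only; WordNet and the relatedness measures studied in the thesis are far richer):

```python
# Tiny hand-made hypernym graph: word -> set of hypernyms (hypothetical).
edges = {
    "car": {"vehicle"}, "bicycle": {"vehicle"}, "vehicle": {"artifact"},
    "hammer": {"tool"}, "tool": {"artifact"},
}

def neighbors(word):
    """Treat hypernym links as undirected edges."""
    out = set(edges.get(word, set()))
    out |= {w for w, hyps in edges.items() if word in hyps}
    return out

def relatedness(a, b):
    """1 / (1 + shortest-path length) in the thesaurus graph; 0 if unconnected."""
    if a == b:
        return 1.0
    seen, frontier, depth = {a}, {a}, 0
    while frontier:
        depth += 1
        frontier = {n for w in frontier for n in neighbors(w)} - seen
        if b in frontier:
            return 1.0 / (1 + depth)
        seen |= frontier
    return 0.0

print(relatedness("car", "bicycle"))  # 1/3: path car - vehicle - bicycle
```

Unlike exact keyword matching, such a measure gives "car" and "bicycle" a nonzero score even though the strings never co-occur.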
The outcome of this thesis shows that lexical semantic similarity can be used efficiently in the studied tasks and that it can boost their performance, widening the possibilities for more efficient algorithms in text applications. This thesis is part of research project number 03ΕΔ850/8.3.1, implemented within the framework of the Greek Reinforcement Programme of Human Research Manpower (PENED) and co-financed by Greek national and European Union funds (25% from the Greek Ministry of Development - General Secretariat of Research and Technology, and 75% from the E.U. - European Social Fund).