Abstract : | We are entering the era of Big Data. Half a century after computers entered our society, data has become the center of attention. Not only is there a massive amount of information that never existed before, but this information is also proliferating faster and takes a variety of forms. Big Data therefore brings new opportunities for the development of society and poses new challenges to data scientists. It has unique features that are not shared by traditional data sets. Specifically, it is characterized by high dimensionality and large sample size. Because of this, ordinary statistical methods do not work. Thus, we need new effective statistical procedures and computational methods. In this thesis, we study methods for regression on large datasets. After reviewing the existing methodologies, we examine an updating method that estimates the regression coefficients and is faster than ordinary least squares estimation. We also investigate regression by orthogonalization and regression using QR decomposition. Afterwards, we compare the methods by the running time required to estimate the regression coefficients. In the same way, generalised linear models are studied and an updating method for them is attempted. Finally, several proposals for deeper exploration of these subjects are discussed.
|
---|