Big Data Concepts, Theories, and Applications PDF


Shui Yu · Song Guo
Editors

Big Data Concepts, Theories, and Applications

Editors

Shui Yu
School of Information Technology
Deakin University
Burwood, VIC, Australia

Song Guo
School of Computer Science and Engineering
The University of Aizu
Aizu-Wakamatsu City, Fukushima, Japan

ISBN 978-3-319-27761-5
ISBN 978-3-319-27763-9 (eBook)
DOI 10.1007/978-3-319-27763-9
Library of Congress Control Number: 2015958772
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

Preface

Big data is one of the hottest research topics in the science and technology communities, and it has great potential in every sector of our society, such as climate, economy, health, social science, and so on. Big data is currently treated as data sets with sizes beyond the ability of commonly used software tools to capture, curate, and manage. We have tasted the power of big data in various applications, such as finance, business, health, and so on. However, big data is still in its infancy, which is evidenced by its vague definition, limited application, unsolved security and privacy barriers for pervasive implementation, and so forth. It is certain that we will face many unprecedented problems and challenges along the way of this unfolding revolutionary chapter of human history.

Big data is driven by applications and aims to obtain knowledge or conclusions directly from big data sets. As an application-oriented field, it inevitably needs to integrate domain knowledge into information systems. This is similar to traditional database systems, which possess a rigorous mathematical foundation, a set of design rules, and implementation mechanisms. We imagine that we may have similar counterparts in big data.

We have witnessed significant development in big data from various communities, such as the mining and learning algorithms from the artificial intelligence community, networking facilities from the networking community, and software platforms from the software engineering community. However, big data applications introduce unprecedented challenges to us, and existing theories and techniques have to be extended and upgraded to serve the forthcoming real big data applications.
With high probability, we will need to invent new tools for big data applications. With the increasing volume and complexity of big data, theoretical insights have to be employed to achieve the original goal of big data applications. As the foundation of theoretical exploration, constant refinements or adjustments of big data definitions and measurements are necessary and demanded. Ideally, theoretical calculation and inference will replace the current brute force strategy. We have seen the effort from different communities in this direction, such as big data modeling, big task scheduling, privacy frameworks, and so on. Once again, these theoretical attempts are still insufficient for most of the incoming big data applications.

Motivated by these problems and challenges, we proposed this book, aiming to collect the latest research output in big data from various perspectives. We hope our effort will provide a solid starting ground for researchers and engineers who are going to start their exploration in this almost uncharted land of big data. The book is therefore organized in three parts: concepts, theories, and applications. We received many submissions and finally accepted twelve chapters after a strict selection and revision process. It is regrettable that many good submissions have been excluded due to our theme and space limitations. From our limited statistics, we notice that there is great interest in the security and application aspects of big data, which reflects the current reality of the domain: big data applications are valuable and expected, and security and privacy issues have to be appropriately handled before the pervasive practice of big data in our society. On the other hand, the share of theoretical contributions on big data was not as high as we expected. We fully believe the theoretical effort in big data is essential and highly demanded in problem solving in the big data age, and it is worthwhile to invest our energy and passion in this direction without any reservation.
Finally, we thank all the authors and reviewers of this book for their great effort and cooperation. Many people helped us in this book project, and we appreciate their guidance and support. In particular, we would like to take this opportunity to express our sincere appreciation and cherished memory to the late Professor Ivan Stojmenovic, a great mentor and friend. At Springer, we would like to thank Susan Lagerstrom-Fife and Jennifer Malat for their professional support.

Melbourne, VIC, Australia    Shui Yu
Fukushima, Japan    Song Guo

Contents

1 Big Continuous Data: Dealing with Velocity by Composing Event Streams
  Genoveva Vargas-Solar, Javier A. Espinosa-Oviedo, and José Luis Zechinelli-Martini
2 Big Data Tools and Platforms
  Sourav Mazumder
3 Traffic Identification in Big Internet Data
  Binfeng Wang, Jun Zhang, Zili Zhang, Wei Luo, and Dawen Xia
4 Security Theories and Practices for Big Data
  Lei Xu and Weidong Shi
5 Rapid Screening of Big Data Against Inadvertent Leaks
  Xiaokui Shu, Fang Liu, and Danfeng (Daphne) Yao
6 Big Data Storage Security
  Mi Wen, Shui Yu, Jinguo Li, Hongwei Li, and Kejie Lu
7 Cyber Attacks on MapReduce Computation Time in a Hadoop Cluster
  William Glenn and Wei Yu
8 Security and Privacy for Big Data
  Shuyu Li and Jerry Gao
9 Big Data Applications in Engineering and Science
  Kok-Leong Ong, Daswin De Silva, Yee Ling Boo, Ee Hui Lim, Frank Bodi, Damminda Alahakoon, and Simone Leao
10 Geospatial Big Data for Environmental and Agricultural Applications
  Athanasios Karmas, Angelos Tzotsos, and Konstantinos Karantzalos
11 Big Data in Finance
  Bin Fang and Peng Zhang
12 Big Data Applications in Business Analysis
  Sien Chen, Yinghua Huang, and Wenqiang Huang

Chapter 1
Big Continuous Data: Dealing with Velocity by Composing Event Streams

Genoveva Vargas-Solar, Javier A. Espinosa-Oviedo, and José Luis Zechinelli-Martini

Abstract The rate at which we produce data is growing steadily, thus creating even larger streams of continuously evolving data. Online news, micro-blogs, and search queries are just a few examples of these continuous streams of user activities. The value of these streams lies in their freshness and relatedness to on-going events. Modern applications consuming these streams need to extract behaviour patterns that can be obtained by aggregating and mining huge event histories, both statically and dynamically. An event is the notification that a happening of interest has occurred. Event streams must be combined or aggregated to produce more meaningful information. By combining and aggregating them, either from multiple producers or from a single one during a given period of time, a limited set of events describing meaningful situations may be notified to consumers. Event streams, with their volume and continuous production, relate mainly to two of the characteristics given to Big Data by the 5V's model: volume and velocity. Techniques such as complex pattern detection, event correlation, event aggregation, event mining and stream processing have been used for composing events. Nevertheless, to the best of our knowledge, few approaches integrate different composition techniques (online and post-mortem) for dealing with Big Data velocity. This chapter gives an analytical overview of event stream processing and composition approaches: complex event languages, services and event querying systems on distributed logs. Our analysis underlines the challenges introduced by Big Data velocity and volume and uses them as a reference for identifying the scope and limitations of results stemming from different disciplines: networks, distributed systems, stream databases, event composition services, and data mining on traces.
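The idea of combining event streams from multiple producers and aggregating them over a given period of time can be illustrated with a minimal Python sketch. Nothing in it comes from the chapter itself: the `Event` fields, the `compose` function, the tumbling-window policy, and the `threshold` parameter are all assumptions chosen for illustration, and real event composition systems offer far richer operators.

```python
from collections import Counter
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    """Notification that a happening of interest occurred (hypothetical schema)."""
    timestamp: float  # seconds since an arbitrary origin
    producer: str     # who emitted the event
    kind: str         # what happened


def _flush(start: float, counts: Counter, threshold: int) -> Iterator[Tuple[float, str, int]]:
    """Yield one composite event per kind that reached the threshold."""
    for kind, n in sorted(counts.items()):
        if n >= threshold:
            yield (start, kind, n)


def compose(streams: Iterable[Iterable[Event]], window: float,
            threshold: int) -> Iterator[Tuple[float, str, int]]:
    """Merge time-ordered streams and aggregate them in tumbling windows.

    Assumes each input stream is already sorted by timestamp. Yields
    (window_start, kind, count) for every kind seen at least `threshold`
    times within one window.
    """
    counts: Counter = Counter()
    current_start = None
    for event in merge(*streams, key=lambda e: e.timestamp):
        start = (event.timestamp // window) * window
        if current_start is not None and start != current_start:
            # The previous window is complete: notify consumers, then reset.
            yield from _flush(current_start, counts, threshold)
            counts.clear()
        current_start = start
        counts[event.kind] += 1
    if current_start is not None:
        yield from _flush(current_start, counts, threshold)


if __name__ == "__main__":
    news = [Event(0.5, "news", "query"), Event(1.2, "news", "query"),
            Event(11.0, "news", "post")]
    blog = [Event(0.9, "blog", "query"), Event(3.0, "blog", "post"),
            Event(12.5, "blog", "post")]
    for composite in compose([news, blog], window=10.0, threshold=2):
        print(composite)
```

The sketch reduces many raw events to a small set of composite notifications, which is the effect the abstract describes; the choice of counting per event kind is only one of many possible aggregation rules.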
G. Vargas-Solar (✉) · J.A. Espinosa-Oviedo
CNRS-LIG-LAFMIA, 681 rue de la Passerelle BP 72, Saint Martin d'Hères, 38402 Grenoble, France
e-mail: [email protected]; [email protected]

J.L. Zechinelli-Martini
UDLAP-LAFMIA, Exhacienda Sta. Catarina Mártir s/n, San Andrés Cholula, 72810 Cholula, Mexico
e-mail: [email protected]

© Springer International Publishing Switzerland 2016
S. Yu, S. Guo (eds.), Big Data Concepts, Theories, and Applications, DOI 10.1007/978-3-319-27763-9_1