Big data became popular with the MapReduce programming model, and its ecosystem grew large around Hadoop, an open-source implementation of the MapReduce model. Programming is the key to understanding big data processing. This lecture introduces three popular big data processing paradigms and their related programming issues. First, programming for batch data processing is introduced; the content covers the motivation for a new big data processing model, the MapReduce programming model, the MapReduce framework, and the Hadoop ecosystem. Next, programming for graph processing is presented, as the graph is one of the most popular data structures in the big data era; this part covers the graph programming model, graph processing frameworks, and related system issues. Finally, streaming data processing is introduced. Stream data processing refers to processing data that arrives at high velocity, as well as real-time data processing. This part introduces typical stream processing system architectures, typical systems, and related programming models.
Hai Jin is a Cheung Kung Scholars Chair Professor of computer science and engineering at Huazhong University of Science and Technology (HUST) in China. Jin received his PhD in computer engineering from HUST in 1994. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz in Germany. Jin worked at The University of Hong Kong between 1998 and 2000, and as a visiting scholar at the University of Southern California between 1999 and 2000. He was awarded the Excellent Youth Award from the National Science Foundation of China in 2001. Jin is the chief scientist of ChinaGrid, the largest grid computing project in China, and the chief scientist of the National 973 Basic Research Program Projects of Virtualization Technology of Computing System, and Cloud Security.
Jin is a Fellow of the IEEE, a Fellow of the CCF, and a life member of the ACM. He has co-authored 22 books and published over 800 research papers. His research interests include computer architecture, virtualization technology, cluster computing and cloud computing, peer-to-peer computing, network storage, and network security.