Cloud Computing and Big Data

Fall Term, 2018, Computer Science, Undergraduate, 4th Year


Data is growing faster than ever before: more data has been created in the past two years than in all of previous human history. By the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. Our accumulated data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes. The number of devices is also growing quickly. By 2020, there will be over 6.1 billion smartphone users globally and over 50 billion smart connected devices in the world, all developed to collect and share data. Processing these large volumes of data to extract insights in real time presents new challenges and opportunities for existing parallel data processing platforms and cloud computing infrastructures.

This course introduces cloud computing and big data, and demonstrates the core tools used to wrangle and analyze big data on the cloud. No prior experience is required: you will walk through hands-on examples with the Hadoop and Spark frameworks, two of the most common in the industry, and manage elastic processing environments using Amazon Web Services. You will also explore the basics of cloud services and cloud deployment models. You will become acquainted with commonly used industry terms, typical business scenarios and applications for the cloud, and the benefits and limitations inherent in the cloud paradigm.


  • Large-scale Data Science
  • Parallel Processing Architectures
  • Large-scale Processing on the Cloud
  • Practical Aspects of Cloud Computing
  • Foundations of Data Processing
  • Batch Data Processing
  • Dataflow Processing
  • Stream Data Processing

Assignments: The programming models you will use in this course are mainly MapReduce/Hadoop and Spark.

Hands-on Labs: The cloud infrastructures you will use in this course are mainly AWS and OpenNebula.

Practical Cases and Reading Assignments: The material you will discuss in the course includes articles about cloud pricing, cloud adoption strategies, private cloud designs, cloud and big data case studies, and edge computing.


  • All course materials, including the syllabus, handouts, slides, and midterms, will be posted on Google Classroom.