Big Data Analysis

Course Overview

Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway. This training is for those new to data science and interested in understanding why the Big Data Era has come to be. It is for those who want to become conversant with the terminology and the core concepts behind big data problems, applications, and systems. It is for those who want to start thinking about how Big Data might be useful in their business or career. It provides an introduction to one of the most common frameworks, Hadoop, that has made big data analysis easier and more accessible -- increasing the potential for data to transform our world!

Course Objectives

Participants will be able to:

  • Describe the Big Data landscape, including examples of real-world big data problems and the three key sources of Big Data: people, organizations, and sensors
  • Explain the V’s of Big Data (volume, velocity, variety, veracity, valence, and value) and why each impacts data collection, monitoring, storage, analysis and reporting
  • Get value out of Big Data by using a 5-step process to structure your analysis
  • Identify what is and is not a big data problem, and recast big data problems as data science questions
  • Explain the architectural components and programming models used for scalable big data analysis
  • Summarize the features and value of core Hadoop stack components, including the YARN resource and job management system, the HDFS file system, and the MapReduce programming model

Course Content

Day 1:

  • What's in Big Data Applications and Systems?
  • Data -- it's been around (even digitally) for a while. What makes data "big" and where does big data come from?
  • What launched the Big Data era?
  • What makes big data valuable?
  • Saving lives with Big Data: Using Big Data to Help Patients
  • Machine-Generated Data: It's Everywhere and There's a Lot!
  • Machine-Generated Data: Advantages
  • Big Data Generated By People: The Unstructured Challenge
  • Organization-Generated Data: Structured but often siloed
  • Organization-Generated Data: Benefits Come From Combining with Other Data Types
  • The Key: Integrating Diverse Data
  • Practical Lab: How to Use IBM Watson Analytics for Big Data Analytics

Day 2:

  • Characteristics of Big Data and Dimensions of Scalability
  • Characteristics Of Big Data
  • Six V’s of Big Data
  • Data Science: Getting Value out of Big Data
  • Building a Big Data Strategy
  • How Does Big Data Science Happen? Five Components of Data Science
  • Five P's of Data Science
  • Asking the Right Question
  • Steps in the Data Science Process
  • Step 1: Acquiring Data
  • Step 2-A: Exploring Data
  • Step 2-B: Pre-Processing Data
  • Step 3: Analyzing Data
  • Step 4: Communicating Results
  • Step 5: Turning Insights into Action
  • Practical Lab: Using Alteryx and Tableau for Big Data Exploratory Analysis

Day 3:

  • Foundations for Big Data Systems and Programming
  • What is a Distributed File System?
  • Scalable Computing over the Internet
  • Programming Models for Big Data
  • Intro to Hadoop
  • Intro to Spark
  • Understand by Doing: MapReduce
  • Running Hadoop MapReduce Programs
  • Practical Lab: Running Hadoop MapReduce Programs

Course Methodology

The training is a highly interactive combination of lectures, group discussions, questionnaires, individual reflections, role plays, simulations, and videos.

Target Audience

This training is for those new to data science and interested in understanding why the Big Data Era has come to be. It is for those who want to become conversant with the terminology and the core concepts behind big data problems, applications, and systems.

Duration

4 Days (08:30–14:30) with appropriate breaks for tea/refreshments and lunch.
