Data Streaming Nanodegree v1.0.0

Nanodegree key: nd029

Version: 1.0.0

Locale: en-us

Learn the latest skills for processing data in real time by building fluency in modern data engineering tools such as Apache Spark, Kafka, Spark Streaming, and Kafka Streaming.

Content

Part 01 : Welcome to the Data Streaming Nanodegree Program

Module 01: Welcome to the Data Streaming Nanodegree Program
- Lesson 01: Data Streaming Nanodegree Program Introduction
  You are starting a challenging but rewarding journey! Take a few minutes to read how to get help with projects and content.
- Lesson 02: Introduction to Data Streaming
  Learn how to process data in real time by building fluency in modern data engineering tools such as Apache Spark, Kafka, Spark Streaming, and Kafka Streaming.
  - Concept 01: Introduction to Data Streaming Part 1
  - Concept 02: Introduction to Data Streaming Part 2
- Lesson 03: Nanodegree Career Services
  The Careers team at Udacity is here to help you move forward in your career - whether it's finding a new job, exploring a new career path, or applying new skills to your current job.
  - Concept 01: Career Services

Part 02 : Data Ingestion with Kafka & Kafka Streaming

Learn to use REST Proxy, Kafka Connect, KSQL, and Faust, a Python stream processing library. Then apply them, along with Kafka and the wider Kafka ecosystem, to build a stream processing application that shows the status of public transit trains in real time.

Module 01: Data Ingestion with Kafka & Kafka Streaming
- Lesson 01: Introduction to Stream Processing
  In this lesson, students will learn what data streaming is, what its pros and cons are, and how it compares to traditional data strategies.
- Lesson 02: Apache Kafka
  In this lesson we’ll review the architecture and configuration of Apache Kafka.
- Lesson 03: Data Schemas and Apache Avro
  This lesson covers data schemas and data schema management, with a focus on Apache Avro.
- Lesson 04: Kafka Connect and REST Proxy
  This lesson covers producing data into Kafka and consuming data from it with Kafka Connect and REST Proxy.
- Lesson 05: Stream Processing Fundamentals
  Learn to build real-time applications that process events as they arrive, and learn the core concepts of stream processing: state storage, windowed processing, and stateful versus stateless processing.
- Lesson 06: Stream Processing with Faust
  Students will learn how to use the Python stream processing library Faust to rapidly create powerful stream processing applications.
- Lesson 07: KSQL
  Learn how to write simple SQL queries to turn Kafka topics into KSQL streams and tables, and then write those tables back out to Kafka.
- Lesson 08: Optimizing Public Transportation
  For your first project, you’ll stream public transit statuses using Kafka and the Kafka ecosystem to build a stream processing application that shows the status of trains in real time.
  
  Project Description - Optimizing Public Transportation
  
  Project Rubric - Optimizing Public Transportation
- Lesson 09: Optimize Your GitHub Profile
  Other professionals are collaborating on GitHub and growing their networks. Submit your profile for review to ensure it is on par with those of leaders in your field.
  
  Project Description - Optimize Your GitHub Profile
  
  Project Rubric - Optimize Your GitHub Profile
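
As a rough illustration of the windowed, stateful processing named in Lesson 05, the sketch below counts events per fixed, non-overlapping (tumbling) window in plain Python. It is not taken from the course material: the function name `tumbling_window_counts`, the in-memory dictionary standing in for a state store, and the sample train-arrival events are all assumptions made for the example. Libraries such as Faust or KSQL provide their own windowing constructs.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs. The dictionary
    below is a hypothetical stand-in for a stream processor's state store.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align the timestamp to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        # Stateful step: the result depends on events seen earlier.
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sample data: (seconds since start, station name).
events = [(0, "clark"), (12, "clark"), (59, "lake"), (61, "clark"), (125, "lake")]
result = tumbling_window_counts(events, window_seconds=60)

for (window_start, station), count in sorted(result.items()):
    print(window_start, station, count)
```

A stateless operation, by contrast, would handle each event on its own (for example, filtering out a station) and would need no `counts` dictionary at all.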

Part 03 : Apache Spark and Spark Streaming

Module 01: Apache Spark and Spark Streaming
- Lesson 01: The Power of Spark
  In this lesson, you will learn about the problems that Apache Spark is designed to solve. You'll also learn about the greater Big Data ecosystem and how Spark fits into it.
- Lesson 02: Data Wrangling with Spark
  In this lesson, we'll dive into how to use Spark for cleaning and aggregating data.
- Lesson 03: Intro to Spark Streaming
  In this lesson, students will learn what Apache Spark Streaming is. Students will review the core architecture of Spark and distinguish between Spark Streaming and Structured Streaming.
- Lesson 04: Structured Streaming APIs
  In this lesson, we’ll go over commonly used functions in RDD/DataFrame/Dataset. We’ll continue to learn about Spark Streaming APIs and how you can use them to solve real-time analytic problems.
- Lesson 05: Integration of Spark Streaming and Kafka
  In this lesson, students will learn the core components involved in integrating Spark Streaming and Kafka.
- Lesson 06: SF Crime Statistics with Spark Streaming
  In this project, you will analyze a real-world SF crime dataset, extracted from Kaggle, and provide statistical analysis using Apache Spark Structured Streaming.
  
  Project Description - SF Crime Statistics with Spark Streaming
  
  Project Rubric - SF Crime Statistics with Spark Streaming
- Lesson 07: Take 30 Min to Improve your LinkedIn
  Find your next job or connect with industry peers on LinkedIn. Ensure your profile attracts relevant leads that will grow your professional network.
  
  Project Description - Improve Your LinkedIn Profile
  
  Project Rubric - Improve Your LinkedIn Profile