Course Overview

This course provides a technical overview of Apache Hadoop. It includes high-level information about the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. It also serves as an optional primer for those who plan to attend hands-on, instructor-led courses.

Duration: 1 day

Course Objectives
    • Describe the use case for Hadoop
    • Identify Hadoop ecosystem architectural categories
      • Data Management
      • Data Access
      • Data Governance and Integration
      • Security
      • Operations
    • Detail the HDFS architecture
    • Describe data ingestion options and frameworks for batch and real-time streaming
    • Explain the fundamentals of parallel processing
    • See popular data transformation and processing engines in action
      • Apache Hive
      • Apache Pig
      • Apache Spark
    • Detail the architecture and features of YARN
    • Describe how to secure Hadoop

Hands-On Labs
    • Operational overview with Ambari
    • Loading data into HDFS
    • Data manipulation with Hive
    • Risk Analysis with Pig
    • Risk Analysis with Spark and Zeppelin
    • Securing Hive with Ranger

Prerequisites
  • No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Target Audience
  • Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure teams, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.