This course Provides instruction on the processes and practice of data science, including machine learning and natural language processing. Included are: tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, Scikitlearn), the Natural Language Toolkit (NLTK), and Spark MLlib.
This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data and an introduction to Spark Core and Spark SQL.
This course is designed as an entry point for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. Topics include:
- An overview of the Hortonworks Data Platform (HDP), including HDFS and YARN
- Using Spark Core APIs for interactive data exploration
- Spark SQL and DataFrame operations
- Spark Streaming and DStream operations
- Data visualization, reporting, and collaboration
- Performance monitoring and tuning
- Building and deploying Spark applications
- Introduction to the Spark Machine Learning Library
This course is designed for experienced administrators who manage Hortonworks Data Platform (HDP) clusters with Ambari. It covers upgrades, configuration, application management, and other common tasks.
This course is designed for Data Stewards or Data Flow Managers who are looking forward to automate the flow of data between systems. Topics Include Introduction to NiFi, Installing and Configuring NiFi, Detail explanation of NiFi User Interface, Explanation of its components and Elements associated with each. How to Build a dataflow, NiFi Expression Language, Understanding NiFi Clustering, Data Provenance, Security around NiFi, Monitoring Tools and HDF Best practices.
This course is intended for systems administrators who will be responsible for the design, installation, configuration, and management of the Hortonworks Data Platform (HDP). The course provides in-depth knowledge and experience in using Apache Ambari as the operational management platform for HDP. This course presumes no prior knowledge or experience with Hadoop.
This course is designed for experienced administrators who will be implementing secure Hadoop clusters using authentication, authorization, auditing and data protection strategies and tools.
This course provides a technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led courses.