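Course Information

Course name: Hadoop Developer (CCDH) Certification

Public classes, customized classes

Start date: 2024-03-26

Course Description

Hadoop Developer (CCDH) Certification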


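Course Overview

As a core big data technology, Hadoop gives enterprises a highly scalable, highly redundant, fault-tolerant, and cost-effective "data-driven" solution. In response to the current shortage of engineers skilled in large-scale data technologies, Cloudera introduced a certification for developers: Cloudera Certified Developer for Apache Hadoop (CCDH). In the CCDH training course at 青蓝咨询 you will learn: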

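* Hadoop core technologies

* How HDFS and MapReduce work

* How to develop MapReduce applications

* How to unit test MapReduce applications

* How to use MapReduce combiners, partitioners, and the distributed cache

* How to develop and debug MapReduce applications

* How to implement data input and output in MapReduce applications

* Common MapReduce algorithms

* How to join data sets with MapReduce

* How to integrate Hadoop into an existing enterprise computing environment

* How to use Mahout for machine learning

* How to use Hive and Pig to rapidly develop data analysis applications

* How to use Oozie to create and manage workflows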


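Target Audience

Enterprise managers, CIOs, CTOs, government IT officials, project (development) managers, consultants, IT managers, IT consultants, IT support specialists, system engineers, data center administrators, cloud computing administrators, anyone who wants to move into cloud computing, and software developers who need to use Apache Hadoop to build powerful data analysis applications.

Participants should have programming experience, in particular Java skills and background. No prior Hadoop knowledge or experience is required.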


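Course Content

Understand how MapReduce and HDFS fit together to provide a scalable, powerful system.

Learn to write programs against the Hadoop API and master the basic skills needed to write more interesting data processing jobs.

Learn how to deploy Hadoop on data center servers or on Amazon EC2, and how to use Hadoop to extend existing systems.

Learn how to import different types of data into Hadoop for further analysis, and how to use Sqoop to import data from existing databases.

Learn how to use Hive, including importing data, creating tables, and running queries.

Learn best practices that make MapReduce programs easier to debug, along with tools and techniques for local testing and for debugging at scale.

Gain an in-depth understanding of the Hadoop API, including custom data types and file formats, direct access to HDFS, partitioning of intermediate data, and other tools such as DistributedCache.

Gain an in-depth understanding of graph algorithms and PageRank. Learn strategies for performing joins efficiently, and compare the techniques suited to different data models.

Learn how to optimize MapReduce programs to improve performance.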


Modules and Content

The Motivation for Hadoop

* Problems with Traditional Large-Scale Systems

* Introducing Hadoop

* Hadoopable Problems

Hadoop: Basic Concepts and HDFS

* The Hadoop Project and Hadoop Components

* The Hadoop Distributed File System

Introduction to MapReduce V2

* MapReduce Overview

* Example: WordCount

* Mappers

* Reducers

Hadoop Clusters and the Hadoop Ecosystem

* Hadoop Cluster Overview

* Hadoop Jobs and Tasks

* Other Hadoop Ecosystem Components

Writing a MapReduce Program in Java

* Basic MapReduce API Concepts

* Writing MapReduce Drivers, Mappers, and Reducers in Java

* Speeding Up Hadoop Development by Using Eclipse

* Differences Between the Old and New MapReduce APIs

Writing a MapReduce Program Using Streaming

* Writing Mappers and Reducers with the Streaming API
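
To illustrate the Streaming module, here is a minimal WordCount sketch in Python. It assumes the usual Streaming convention of tab-separated `key<TAB>value` text records and a reducer that receives its input sorted by key. The function names `map_lines`, `reduce_lines`, and `run_locally` are hypothetical, and `run_locally` only imitates the framework's sort step in memory; it is not part of Hadoop.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts of consecutive records that share a key.

    A Streaming reducer sees a sorted stream of records rather than
    pre-grouped values, so the grouping is done here.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Stand-in for the framework (assumption): map, sort by key, reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In an actual Streaming job the mapper and the reducer would typically be two separate scripts that read records from standard input and write them to standard output; the functions above keep that logic testable without a cluster.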

Unit Testing MapReduce Programs

* Unit Testing

* The JUnit and MRUnit Testing Frameworks

* Writing Unit Tests with MRUnit

* Running Unit Tests

Delving Deeper into the Hadoop API

* Using the ToolRunner Class

* Setting Up and Tearing Down Mappers and Reducers

* Decreasing the Amount of Intermediate Data with Combiners

* Accessing HDFS Programmatically

* Using the Distributed Cache

* Using the Hadoop API's Library of Mappers, Reducers, and Partitioners

Practical Development Tips and Techniques

* Strategies for Debugging MapReduce Code

* Testing MapReduce Code Locally by Using LocalJobRunner

* Writing and Viewing Log Files

* Retrieving Job Information with Counters

* Reusing Objects

* Creating Map-Only MapReduce Jobs

Partitioners and Reducers

* How Partitioners and Reducers Work Together

* Determining the Optimal Number of Reducers for a Job

* Writing Custom Partitioners
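
The idea behind the partitioner topics above can be sketched in a few lines of Python: a partitioner maps each intermediate key to one reducer index, so that all values for the same key reach the same reducer. This is an illustration only. `zlib.crc32` is used here as a deterministic stand-in hash (Hadoop's default hash partitioner works on the key's Java `hashCode()` instead), and `first_letter_partition` is a hypothetical custom rule.

```python
import zlib


def partition(key, num_reducers):
    """Return the 0-based reducer index for a key.

    zlib.crc32 is an assumption made for this sketch: a deterministic hash
    that behaves the same across runs. It is not the hash Hadoop uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def first_letter_partition(key, num_reducers):
    """Hypothetical custom partitioner: route keys by their first character."""
    if not key:
        return 0
    return ord(key[0].lower()) % num_reducers
```

In Hadoop's Java API the equivalent custom logic would go into a subclass of `Partitioner` that overrides `getPartition`. Whatever rule is chosen, it has to be deterministic and return a value between 0 and the number of reducers minus one.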

Data Input and Output

* Creating Custom Writable and WritableComparable Implementations

* Saving Binary Data Using SequenceFile and Avro Data Files

* Issues to Consider When Using File Compression

* Implementing Custom InputFormats and OutputFormats

Common MapReduce Algorithms

* Sorting and Searching Large Data Sets

* Indexing Data

* Computing Term Frequency-Inverse Document Frequency (TF-IDF)

* Calculating Word Co-Occurrence

* Performing Secondary Sort
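
For the TF-IDF topic listed above, the arithmetic can be sketched as follows. This is a single-process illustration under one common definition (term frequency as count divided by document length, inverse document frequency as the natural log of total documents over documents containing the term); the course may use a different variant, and a MapReduce version would usually split this work across several jobs. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {doc_id: {term: score}} for a dict of doc_id -> text.

    tf  = occurrences of the term in the document / terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    term_counts = {
        doc_id: Counter(text.lower().split())
        for doc_id, text in documents.items()
    }

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(documents)
    scores = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores
```

A term that occurs in every document gets an IDF of zero under this definition, which is why very common words drop out of the ranking.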

Joining Data Sets in MapReduce Jobs

* Writing a Map-Side Join

* Writing a Reduce-Side Join
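
The reduce-side join named above can be illustrated with a small in-memory sketch: the map phase tags every record with its source and emits it under the join key, and the reduce phase combines the records that share a key. The record layouts (`customer_id`, `name`, `item`) and the function name `reduce_side_join` are assumptions made for this example, and the dictionary stands in for the framework's shuffle.

```python
from collections import defaultdict


def reduce_side_join(customers, orders):
    """Inner-join two record lists on customer id.

    customers: iterable of (customer_id, name)
    orders:    iterable of (customer_id, item)
    """
    # Map phase: tag each record with its source, keyed by the join key.
    shuffled = defaultdict(list)
    for customer_id, name in customers:
        shuffled[customer_id].append(("customer", name))
    for customer_id, item in orders:
        shuffled[customer_id].append(("order", item))

    # Reduce phase: per key, pair every customer record with every order.
    joined = []
    for customer_id in sorted(shuffled):
        names = [v for tag, v in shuffled[customer_id] if tag == "customer"]
        items = [v for tag, v in shuffled[customer_id] if tag == "order"]
        for name in names:
            for item in items:
                joined.append((customer_id, name, item))
    return joined
```

A map-side join avoids this shuffle: when one data set is small enough to fit in memory, each mapper can load it and look up matches while reading the larger data set.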

Integrating Hadoop into the Enterprise Workflow

* Integrating Hadoop into an Existing Enterprise

* Loading Data from an RDBMS into HDFS by Using Sqoop

* Managing Real-Time Data Using Flume

* Accessing HDFS from Legacy Systems with FuseDFS and HttpFS

An Introduction to Hive, Impala, and Pig

* The Motivation for Hive, Impala, and Pig

* Hive Overview

* Impala Overview

* Pig Overview

* Choosing Between Hive, Impala, and Pig

An Introduction to Oozie

* Introduction to Oozie

* Creating Oozie Workflows

Conclusion

* Conclusion



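Note: The actual start date may be adjusted as circumstances require. Please follow the official 青蓝咨询 public account for updates, or contact a course advisor.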



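[Contact 青蓝咨询]

Address: Room 309, 3rd Floor, Block B, TCL Building, No. 06 Gaoxin South 1st Road, Nanshan District, Shenzhen (Bus stop: Dachong; Metro: Line 1, Hi-Tech Park station, Exit C)

    Postal code: 518057

    Phone: 0755-86950769

    Email: peixun@shzhchina.com

    Website: http://www.shzhchina.com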

 
