Hadoop Tutorial for Big Data Enthusiasts: The Optimal Way of Learning Hadoop

"Hadoop Tutorial" is one of the most searched terms on the internet today. Do you know why? Because Hadoop is the major framework of the Big Data world.

If you don't know anything about Big Data, don't worry: I have something for you that is completely free, a series of ***520+ Big Data Tutorials***. This free tutorial series aims to make you proficient in Big Data in just a few weeks. I have also explained a little about Big Data in this blog.

"Hadoop is a technology to store massive datasets on a cluster of cheap machines in a distributed manner." It was created by Doug Cutting and Mike Cafarella.

Doug Cutting's kid had named one of his toys, a yellow elephant, Hadoop. Doug then used the name for his open-source project because it was easy to spell and pronounce, and was not used elsewhere.

Interesting, right?

Hadoop Tutorial

Now, let's begin our Hadoop tutorial with a basic introduction to Big Data.

What is Big Data?

Big Data refers to datasets too large and complex for traditional systems to store and process. The major problems of Big Data fall under three Vs: volume, velocity, and variety.

***Do you know?*** *Every minute we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand Tweets, and upload 200,000 photos to Facebook.*

Volume: Data is being generated on the order of terabytes to petabytes. The largest contributor of data is social media. For instance, Facebook generates 500 TB of data every day, and Twitter generates 8 TB of data daily.

Velocity: Every enterprise has its own requirement for the time frame within which it must process data. Many use cases, such as credit card fraud detection, have only a few seconds to process the data in real time and detect fraud. Hence there is a need for a framework capable of high-speed data computation.

Variety: Data from various sources comes in varied formats such as text, XML, images, audio, and video. Hence Big Data technology should be capable of performing analytics on a variety of data.

Hope you have checked the free Big Data DataFlair tutorial series. Here is one more interesting article for you: Top Big Data Quotes by the Experts.

Why Was Hadoop Invented?

Let us discuss the shortcomings of the traditional approach that led to the invention of Hadoop:

1. Storage for Large Datasets

A conventional RDBMS is incapable of storing huge amounts of data. The cost of data storage in available RDBMSs is very high, as it incurs the cost of both hardware and software.

2. Handling Data in Different Formats

An RDBMS is capable of storing and manipulating data in a structured format. But in the real world we have to deal with data in structured, unstructured, and semi-structured formats.

3. Data Generated at High Speed

Data is pouring out on the order of terabytes to petabytes daily. Hence we need a system that can process data in real time, within a few seconds. Traditional RDBMSs fail to provide real-time processing at such speeds.

What is Hadoop?

Hadoop is the solution to the above Big Data problems. It is a technology to store massive datasets on a cluster of cheap machines in a distributed manner. Beyond storage, it also provides Big Data analytics through a distributed computing framework.

It is open-source software developed as a project of the Apache Software Foundation. Doug Cutting created Hadoop. In 2008, Yahoo gave Hadoop to the Apache Software Foundation. Since then, two versions of Hadoop have been released: version 1.0 in 2011 and version 2.0.6 in 2013. Hadoop comes in various flavors such as Cloudera, IBM BigInsights, MapR, and Hortonworks.

Prerequisites to Learn Hadoop

  • Familiarity with some basic Linux commands: Hadoop is set up on the Linux operating system, preferably Ubuntu, so one must know certain ***basic Linux commands***. These commands are used for uploading a file to HDFS, downloading a file from HDFS, and so on.
  • Basic Java concepts: Folks who want to learn Hadoop can get started with it while simultaneously grasping the basic concepts of Java. We can also write map and reduce functions in other languages such as Python, Perl, C, and Ruby. This is possible via the streaming API, which supports reading from standard input and writing to standard output. Hadoop also has high-level abstraction tools like Pig and Hive, which do not require familiarity with Java.
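To make the streaming idea above concrete, here is a minimal sketch of a word-count mapper and reducer in Python. The function names `mapper` and `reducer` and the sample input are assumptions made for illustration; in a real streaming job each function would live in its own script that loops over standard input and prints to standard output, and the framework would do the sorting in between.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local stand-in for the framework: map, sort by key, then reduce.
sample = ["Hadoop stores data", "Hadoop processes data"]
for out in reducer(sorted(mapper(sample))):
    print(out)
```

Because the mapper and reducer only exchange tab-separated text lines, the same logic could be written in any language that can read and write standard streams.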

Hadoop consists of three core components:

  • **Hadoop Distributed File System (HDFS)**: the storage layer of Hadoop.

  • **MapReduce**: the data processing layer of Hadoop.

  • **YARN**: the resource management layer of Hadoop.

Core Components of Hadoop

Let us understand these Hadoop components in detail.

1. HDFS

HDFS, short for Hadoop Distributed File System, provides distributed storage for Hadoop. HDFS has a master-slave topology.

The master is a high-end machine, whereas the slaves are inexpensive computers. Big Data files are divided into a number of blocks. Hadoop stores these blocks in a distributed fashion on the cluster of slave nodes. The metadata is stored on the master.
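The block splitting described above can be sketched in a few lines of Python. The 128 MB block size is an assumed value (HDFS lets you configure it), and the round-robin placement with the node names `dn1` to `dn3` is a deliberately simple stand-in, not the placement policy HDFS really uses.

```python
BLOCK_SIZE_MB = 128  # assumed block size; configurable in a real cluster


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of this size occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


def assign_round_robin(num_blocks, datanodes):
    """Toy placement: hand blocks to slave nodes in turn (hypothetical policy)."""
    return {f"block-{i}": datanodes[i % len(datanodes)] for i in range(num_blocks)}


blocks = split_into_blocks(500)
print(blocks)
print(assign_round_robin(len(blocks), ["dn1", "dn2", "dn3"]))
```

The mapping from block to node is exactly the kind of metadata the master keeps, while the slaves hold the block contents.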

HDFS has two daemons running for it:

NameNode: The NameNode performs the following functions:

  • The NameNode daemon runs on the master machine.

  • It is responsible for maintaining, monitoring, and managing the DataNodes.

  • It records the metadata of files, such as the location of blocks, file size, permissions, and hierarchy.

  • It captures all changes to the metadata, such as the deletion, creation, and renaming of files, in edit logs.

  • It regularly receives heartbeats and block reports from the DataNodes.

DataNode: The functions of the DataNode are as follows:

  • The DataNode runs on the slave machines.

  • It stores the actual business data.

  • It serves read and write requests from the user.

  • It does the groundwork of creating, replicating, and deleting blocks on the command of the NameNode.

  • Every 3 seconds, by default, it sends a heartbeat to the NameNode to report that it is alive and healthy.
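The heartbeat mechanism can be pictured with a small Python simulation. The class `HeartbeatMonitor` and the 30-second timeout are hypothetical, chosen only to keep the example short; a real cluster has its own configurable timeout and does far more bookkeeping.

```python
HEARTBEAT_INTERVAL_S = 3  # default interval stated in the text
DEAD_AFTER_S = 30         # hypothetical timeout chosen for this sketch


class HeartbeatMonitor:
    """Toy NameNode-side tracker of the last heartbeat seen from each DataNode."""

    def __init__(self, dead_after_s=DEAD_AFTER_S):
        self.dead_after_s = dead_after_s
        self.last_seen = {}

    def record(self, datanode, now_s):
        self.last_seen[datanode] = now_s

    def dead_nodes(self, now_s):
        return sorted(
            node for node, seen in self.last_seen.items()
            if now_s - seen > self.dead_after_s
        )


monitor = HeartbeatMonitor()
for t in range(0, 60, HEARTBEAT_INTERVAL_S):
    monitor.record("dn1", t)      # dn1 keeps reporting
    if t < 15:
        monitor.record("dn2", t)  # dn2 goes silent after t = 12
print(monitor.dead_nodes(now_s=60))
```

Once a node is considered dead, the master can stop sending work to it and arrange for its blocks to be copied from the surviving replicas.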

Erasure Coding in HDFS

Up to Hadoop 2.x, replication was the only method for providing fault tolerance. Hadoop 3.0 introduces one more method called erasure coding. Erasure coding provides the same level of fault tolerance but with lower storage overhead.

Erasure coding is commonly used in RAID (Redundant Array of Inexpensive Disks) storage. RAID implements erasure coding via striping: it divides the data into smaller units (bits, bytes, or blocks) and stores consecutive units on different disks. Hadoop calculates parity cells for each stripe of these units (cells); this process is called encoding. When certain cells are lost, Hadoop recomputes them by decoding, a process in which the lost cells are recovered from the remaining original and parity cells.
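Encoding and decoding are easiest to see with the simplest possible scheme: a single XOR parity cell. The sketch below is an illustration only, with made-up cell contents and helper names; XOR parity can recover just one lost cell per stripe, and schemes that tolerate several losses typically rely on Reed-Solomon codes instead.

```python
from functools import reduce


def xor_cells(cells):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*cells))


def encode(data_cells):
    """Encoding: compute a single parity cell for a stripe of data cells."""
    return xor_cells(data_cells)


def decode(surviving_cells, parity):
    """Decoding: recover the one missing data cell from survivors plus parity."""
    return xor_cells(surviving_cells + [parity])


stripe = [b"HADO", b"OP-3", b"DATA"]
parity = encode(stripe)
lost = stripe[1]  # pretend the disk holding this cell failed
recovered = decode([stripe[0], stripe[2]], parity)
print(recovered == lost)
```

The key point is that the parity cell is much smaller than a full extra copy of the stripe, which is where the storage savings over replication come from.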

Erasure coding is mostly used for warm or cold data, which undergoes less frequent I/O access. The replication factor of an erasure-coded file is always one, and we cannot change it with the -setrep command. Under erasure coding, the storage overhead is never more than 50%.

Under conventional Hadoop storage, a replication factor of 3 is the default. This means 6 blocks are replicated into 6*3, i.e. 18 blocks, which gives a storage overhead of 200% (12 extra blocks for 6 blocks of data). In contrast, the erasure coding technique stores 6 data blocks and 3 parity blocks, which gives a storage overhead of only 50%.
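The overhead figures above follow from one small formula: extra blocks stored divided by original data blocks. A quick Python check, with the helper name `storage_overhead` chosen for this sketch:

```python
def storage_overhead(data_blocks, extra_blocks):
    """Extra storage as a percentage of the original data."""
    return 100 * extra_blocks / data_blocks


data_blocks = 6

# 3-way replication: 6 * 3 = 18 blocks stored, so 12 are extra copies.
replication_factor = 3
replicated_total = data_blocks * replication_factor
print(storage_overhead(data_blocks, replicated_total - data_blocks))  # 200.0

# Erasure coding with 6 data + 3 parity blocks: 9 stored, 3 are extra.
parity_blocks = 3
print(storage_overhead(data_blocks, parity_blocks))  # 50.0
```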

The File System Namespace

HDFS supports hierarchical file organization. One can create, remove, move, or rename a file. The NameNode maintains the file system namespace and records any changes to it. It also stores the replication factor of each file.

2. MapReduce

It is the data processing layer of Hadoop. It processes data in two phases:

Map phase: This phase applies business logic to the data. The input data is converted into key-value pairs.

Reduce phase: The Reduce phase takes the output of the Map phase as its input. It applies aggregation based on the key of the key-value pairs.

MapReduce works in the following way:

  • The client specifies the input file for the Map function. The framework splits it into tuples.

  • The Map function defines a key and a value from the input. The output of the Map function is this key-value pair.

  • The MapReduce framework sorts the key-value pairs from the Map function.

  • The framework merges the tuples that have the same key.

  • The reducers get these merged key-value pairs as input.

  • Each reducer applies aggregate functions to the key-value pairs.

  • The output from the reducer is written to HDFS.
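The steps above can be simulated on a single machine. The following Python sketch totals sales per city; the record format, the city names, and the function names are all assumptions for illustration, and a plain list stands in for the final write to HDFS.

```python
from collections import defaultdict


def map_record(line):
    """Map: turn one 'city,amount' line into a (key, value) pair."""
    city, amount = line.split(",")
    return city.strip(), int(amount)


def shuffle(pairs):
    """Sort by key and merge values that share a key, as the framework does."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return dict(grouped)


def reduce_group(key, values):
    """Reduce: aggregate all values of one key (here, a sum)."""
    return key, sum(values)


records = ["Pune,40", "Delhi,25", "Pune,10", "Delhi,5", "Agra,7"]
pairs = [map_record(r) for r in records]
grouped = shuffle(pairs)
result = [reduce_group(k, v) for k, v in grouped.items()]
print(result)
```

Only `map_record` and `reduce_group` correspond to code a developer would write; the sorting and merging in `shuffle` is what the framework does between the two phases.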

3. YARN

YARN, short for Yet Another Resource Negotiator, has the following components:

Resource Manager

  • The Resource Manager runs on the master node.

  • It knows the location of the slaves (rack awareness).

  • It is aware of how many resources each slave has.

  • The Resource Scheduler is one of the important services run by the Resource Manager.

  • The Resource Scheduler decides how resources are assigned to various tasks.

  • The Application Manager is another service run by the Resource Manager.

  • The Application Manager negotiates the first container for an application.

  • The Resource Manager keeps track of the heartbeats from the Node Managers.

Node Manager

  • It runs on the slave machines.

  • It manages containers. A container is simply a fraction of the Node Manager's resource capacity.

  • It monitors the resource utilization of each container.

  • It sends heartbeats to the Resource Manager.

Job Submitter

The application startup process is as follows:

  • The client submits the job to the Resource Manager.

  • The Resource Manager contacts the Resource Scheduler and allocates a container.

  • The Resource Manager then contacts the relevant Node Manager to launch the container.

  • The container runs the Application Master.

The basic idea of YARN is to split the tasks of resource management and job scheduling. It has one global Resource Manager and a per-application Application Master. An application can be either a single job or a DAG of jobs.

The Resource Manager's job is to assign resources to the various competing applications. The Node Manager runs on the slave nodes. It is responsible for containers, monitoring their resource utilization and reporting it to the Resource Manager.

The job of the Application Master is to negotiate resources from the Resource Manager. It also works with the Node Manager to execute and monitor the tasks.
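The interplay between these roles can be sketched as a toy model in Python. Everything here is hypothetical: the classes only borrow YARN's vocabulary, memory is the only resource tracked, and the "first node that fits" rule stands in for a real scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int
    containers: list = field(default_factory=list)

    def launch(self, container_id, memory_mb, role):
        """Carve a container out of this node's remaining capacity."""
        self.free_memory_mb -= memory_mb
        self.containers.append((container_id, role))


@dataclass
class ResourceManager:
    nodes: list
    next_id: int = 1

    def allocate(self, memory_mb, role):
        """Toy scheduler: pick the first node with enough free memory."""
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                container_id = f"container-{self.next_id}"
                self.next_id += 1
                node.launch(container_id, memory_mb, role)
                return node.name, container_id
        raise RuntimeError("no node has enough free memory")

    def submit(self, app_name, am_memory_mb=1024):
        """The first container of an application runs its Application Master."""
        return self.allocate(am_memory_mb, f"ApplicationMaster({app_name})")


rm = ResourceManager([NodeManager("nm1", 512), NodeManager("nm2", 4096)])
print(rm.submit("wordcount"))         # nm1 is too small, so the AM lands on nm2
print(rm.allocate(2048, "map-task"))  # a task container requested later
```

In the real system the later task containers are requested by the Application Master rather than by the client, which is the split of responsibilities the paragraphs above describe.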

***Wait before scrolling further! This is the time to read about the top 15 Hadoop Ecosystem components.***

Why Hadoop?

Let us now understand why Hadoop is so popular and why Apache Hadoop captures more than 90% of the big data market.

Apache Hadoop is not only a storage system but a platform for data storage as well as processing. It is scalable (we can add more nodes on the fly) and fault-tolerant (even if a node goes down, its data is processed by another node).
The following characteristics of Hadoop make it a unique platform:

  • Flexibility to store and mine any type of data, whether structured, semi-structured, or unstructured. It is not bound by a single schema.

  • It excels at processing data of a complex nature. Its scale-out architecture divides workloads across many nodes. Another advantage is that its flexible file system eliminates ETL bottlenecks.

  • It scales economically; as discussed, it can be deployed on commodity hardware. Apart from this, its open-source nature guards against vendor lock-in.

What is Hadoop Architecture?

After understanding what Apache Hadoop is, let us now understand the Hadoop architecture in detail.

How Hadoop Works

Hadoop works in a master-slave fashion. There is one master node and n slave nodes, where n can be in the thousands. The master manages, maintains, and monitors the slaves, while the slaves are the actual worker nodes. In the Hadoop architecture, the master should be deployed on well-configured hardware, not just commodity hardware, as it is the centerpiece of the Hadoop cluster.

The master stores the metadata (data about data), while the slaves are the nodes that store the actual data. Data is stored in a distributed manner across the cluster. The client connects to the master node to perform any task. Now, in this Hadoop tutorial for beginners, we will discuss the different features of Hadoop in detail.

Hadoop Features

Here are the top Hadoop features that make it popular:

1. Reliability

In a Hadoop cluster, if any node goes down, it does not disable the whole cluster. Instead, another node takes the place of the failed node, and the cluster continues functioning as if nothing had happened. Hadoop has a built-in fault tolerance feature.

2. Scalable

Hadoop integrates with cloud-based services. If you are installing Hadoop on the cloud, you need not worry about scalability: you can easily procure more hardware and expand your Hadoop cluster within minutes.

3. Economical

Hadoop is deployed on commodity hardware, that is, cheap machines. This makes Hadoop very economical. Also, as Hadoop is open-source software, there is no license cost.

4. Distributed Processing

In Hadoop, any job submitted by the client is divided into a number of sub-tasks. These sub-tasks are independent of each other, so they execute in parallel, giving high throughput.
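As a single-machine analogy for this idea, the Python sketch below splits a job into independent sub-tasks and runs them on a thread pool. The chunking and the function names are assumptions for illustration; in Hadoop the sub-tasks would run on different nodes rather than on threads.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """One independent sub-task: count the words in its own chunk."""
    return len(chunk.split())


def run_job(chunks, workers=4):
    """Run the sub-tasks in parallel and combine their partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return sum(partial_counts)


chunks = ["big data needs", "distributed processing", "on many nodes"]
print(run_job(chunks))
```

Because no sub-task depends on another's result, adding workers shortens the job without changing the answer.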

5. Distributed Storage

Hadoop splits each file into a number of blocks. These blocks are stored in a distributed manner across the cluster of machines.

6. Fault Tolerance

Hadoop replicates every block of a file several times, depending on the replication factor, which is 3 by default. If any node goes down, the data on that node can be recovered, because copies of it are available on other nodes thanks to replication. This makes Hadoop fault tolerant.
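A small Python simulation shows why replication keeps data readable after a failure. The placement rule (each block on three consecutive nodes) and the node names are hypothetical simplifications, not how HDFS actually chooses nodes.

```python
REPLICATION_FACTOR = 3  # default stated in the text


def place_replicas(blocks, nodes, replication=REPLICATION_FACTOR):
    """Toy placement: put each block on `replication` consecutive nodes."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement


def readable_blocks(placement, failed_nodes):
    """A block stays readable while at least one replica is on a live node."""
    return sorted(b for b, holders in placement.items() if holders - set(failed_nodes))


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["b0", "b1", "b2", "b3"], nodes)
print(readable_blocks(placement, failed_nodes=["dn2"]))
print(readable_blocks(placement, failed_nodes=["dn1", "dn2", "dn3"]))
```

With one node down every block is still readable; a block is lost only when all three nodes holding its replicas fail together, as the second call illustrates for `b0`.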

***Are you looking for more features? Here are the additional Hadoop features that make it special.***

Hadoop Flavors

This section of the Hadoop tutorial talks about the various flavors of Hadoop.

  • Apache: The vanilla flavor, as the actual code resides in the Apache repositories.
  • Hortonworks: A popular distribution in the industry.
  • Cloudera: The most popular distribution in the industry.
  • MapR: It has rewritten HDFS, and its file system is faster than the others.
  • IBM: Its proprietary distribution is known as BigInsights.

Major databases provide native connectivity with Hadoop for fast data transfer, because transferring data from, say, Oracle to Hadoop requires a connector.

All flavors are almost the same, so if you know one, you can easily work with the others as well.

Hadoop Future Scope

There is going to be a lot of investment in the Big Data industry in the coming years. According to a report by Forbes, 90% of global organizations will be investing in Big Data technology. Hence the demand for Hadoop skills will also grow. Learning Apache Hadoop can accelerate your career growth, and it also tends to increase your pay package.

There is a large gap between the supply of and demand for Big Data professionals. Skills in Big Data technologies continue to be in high demand, because companies grow as they try to get the most out of their data. As a result, the salaries of Big Data professionals are quite high compared to those of professionals in other technologies.

The managing director of Dice, Alice Hills, has said that Hadoop jobs have seen a 64% increase from the previous year. It is evident that Hadoop is ruling the Big Data market and that its future is bright. The demand for Big Data analytics professionals is ever increasing, as it is a known fact that data is nothing without the power to analyze it.

You must check the Experts' Prediction for the Future of Hadoop.

Summary: Hadoop Tutorial

In concluding this Hadoop tutorial, we can say that Apache Hadoop is the most popular and powerful big data tool. Hadoop stores huge amounts of data in a distributed manner and processes the data in parallel on a cluster of nodes. It provides the world's most reliable storage layer, HDFS; a batch processing engine, MapReduce; and a resource management layer, YARN.

To summarize this Hadoop tutorial, here is a quick revision of all the topics we have discussed:

  • The concept of Big Data

  • Reason for Hadoop's invention

  • Prerequisites to learn Hadoop

  • Introduction to Hadoop

  • Core components of Hadoop

  • Why Hadoop

  • Hadoop architecture

  • Features of Hadoop

  • Hadoop flavors

  • Future scope of Hadoop

Hope this Hadoop tutorial helped you. If you face any difficulty in understanding Hadoop concepts, comment below.

***This is the right time to start your Hadoop learning with industry experts.***

https://data-flair.training/blogs/hadoop-tutorial
