Implementing Hadoop high availability
Why high availability is needed
Hadoop has the following key roles. NameNode and ResourceManager are critical: if either of these nodes fails, the whole cluster becomes unusable, so they need a high-availability setup.
- NameNode
- DataNode
- SecondaryNameNode
- MapReduce
- ResourceManager
- NodeManager
Making NameNode and ResourceManager highly available
The official documentation offers two solutions:
- HDFS HA with NFS (shared storage)
- HDFS HA with QJM (Quorum Journal Manager)
Comparing the two HA approaches
Aspect | HA with NFS (shared storage) | HA with QJM |
---|---|---|
Shared edits storage | a directory on an external NFS server (NFSv3/NFSv4), mounted on both NameNodes | a quorum of dedicated JournalNode daemons (typically 3) |
Access | NameNodes read and write edits through the NFS mount | NameNodes talk to JournalNodes over Hadoop RPC |
Data consistency | depends on the NFS server, mount options and protocol version | quorum writes: a majority of JournalNodes must acknowledge every edit |
Performance | bound by the NFS server and the network | tuned for the edit-log workload inside HDFS |
Scalability | limited by the single NFS server | grows with the JournalNode quorum |
High availability | the NFS server itself needs its own HA solution (e.g. redundant NFS servers) | NameNode HA via QJM, with no extra shared-storage HA needed |
Failure recovery | depends on the NFS server's recovery capabilities | fast automatic failover to the standby NameNode |
Security | depends on the NFS server's security settings | Hadoop's own security features (e.g. Kerberos authentication) |
Complexity | relatively simple (just an NFS mount) | more moving parts to configure (JournalNodes, ZooKeeper, ZKFC) |
Use cases | sites that already run reliable shared NFS storage | clusters that need HDFS HA for large-scale data processing |
Both approaches give a hot standby, but the NFS server cannot itself be clustered and so remains a single point of failure; this writeup therefore uses QJM.
NameNode HA has to keep two kinds of metadata in sync: the fsimage and the edits log.
- QJM uses two NameNodes, one Active and one Standby, and every DataNode sends its heartbeats and block reports to both. Running the second NameNode therefore solves the fsimage side (the Standby also takes over the periodic checkpointing that SecondaryNameNode performs in a non-HA cluster).
- The edits (the namespace change log) are not kept only on local disk: the Active NameNode ships every edit to the JournalNode (JNN) cluster, and the Standby NameNode reads and replays them, as the sketch below shows.
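Once the cluster configured below is running, the shared edit log can be inspected on any JournalNode. A minimal sketch (the directory comes from dfs.journalnode.edits.dir in hdfs-site.xml below; the exact segment file names will differ):
# on any JournalNode: finalized edit segments, plus the edits_inprogress_* file
# that the Active NameNode is currently writing
[root@dnode1 ~]# ls /var/hadoop/jounal/nngroup/current/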
ZooKeeper overview
ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services to distributed applications: configuration maintenance, naming, distributed synchronization, group services, and so on.
ZooKeeper roles and characteristics
- Leader: accepts all proposal requests from Followers, coordinates and initiates the votes on proposals, and handles internal data exchange with all Followers
- Follower: serves clients directly and votes on proposals, exchanging data with the Leader
- Observer: serves clients directly but does not vote on proposals; it also syncs data from the Leader (and can be deployed across a WAN)
How ZooKeeper scales
- When a client submits a request: a read is answered straight from the local replica of whichever server the client is connected to, while a write has to go through the consistency protocol (Zab)
- Zab requires every client write to be forwarded to the single Leader of the ZK service. The Leader turns the request into a Proposal, the other servers vote on the Proposal, and the Leader collects the votes. Once a majority has voted for it, the Leader broadcasts a commit notification to all servers; when the server the client is connected to receives it, it applies the update to its in-memory state and answers the client's write. The session below shows this split from the client side.
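A minimal client-side illustration, once the ensemble below is running (the znode /demo and its value are arbitrary examples):
# connect to any member of the ensemble
[root@dnode1 zookeeper]# ./bin/zkCli.sh -server dnode1:2181
# write: forwarded to the Leader, committed once a majority of servers ack it
create /demo hello
# read: answered from the local replica of the server we connected to
get /demo
delete /demo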
Environment
Host | IP | Roles |
---|---|---|
nnode1 | 192.168.126.21 | NameNode ResourceManager ZKFC |
dnode1 | 192.168.126.22 | DataNode ZooKeeper JournalNode |
dnode2 | 192.168.126.23 | DataNode ZooKeeper JournalNode |
dnode3 | 192.168.126.70 | DataNode ZooKeeper JournalNode |
nnode2 | 192.168.126.71 | NameNode ResourceManager ZKFC |
Key ZooKeeper configuration (identical on all three ZooKeeper nodes)
vim conf/zoo.cfg
# dataDir holds this server's local replica and its myid file
dataDir=/usr/local/zookeeper/data
clientPort=2181
# server.N=host:peerPort:electionPort ; N must match the myid on that host
server.1=192.168.126.70:3188:3288
server.2=192.168.126.22:3188:3288
server.3=192.168.126.23:3188:3288
[root@dnode3 zookeeper]# cat data/myid
1
[root@dnode1 zookeeper]# cat data/myid
2
[root@dnode2 zookeeper]# cat data/myid
3
[root@dnode2 zookeeper]# ./bin/zkServer.sh start
[root@dnode2 zookeeper]# ./bin/zkServer.sh status
[root@dnode1 zookeeper]# ./bin/zkServer.sh start
[root@dnode1 zookeeper]# ./bin/zkServer.sh status
[root@dnode3 zookeeper]# ./bin/zkServer.sh start
[root@dnode3 zookeeper]# ./bin/zkServer.sh status
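To verify the ensemble formed, zkServer.sh status above should report one leader and two followers. The same can be checked over the wire with ZooKeeper's four-letter-word commands (a sketch using nc; enabled by default on 3.4.x, while newer releases require whitelisting via 4lw.commands.whitelist):
# liveness probe: a healthy server answers imok
[root@dnode1 zookeeper]# echo ruok | nc 192.168.126.22 2181
# stat shows this server's Mode: (leader or follower) and client connections
[root@dnode1 zookeeper]# echo stat | nc 192.168.126.22 2181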
Hadoop HA configuration
The configuration files involved:
- core-site.xml
- hadoop-env.sh
- hdfs-site.xml
- slaves
- yarn-site.xml
Environment preparation
Configure host resolution on all nodes:
cat > /etc/hosts <<EOF
127.0.0.1 localhost
192.168.126.21 nnode1
192.168.126.71 nnode2
192.168.126.22 dnode1
192.168.126.23 dnode2
192.168.126.70 dnode3
EOF
Sync the SSH key pair between the two NameNodes (the sshfence method configured below needs passwordless root SSH between them):
[root@nnode1 hadoop]# rsync -aXSH --delete /root/.ssh nnode2:/root/
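Sanity-check the fencing prerequisite before going further (a quick sketch; both directions must log in without a password prompt):
# each NameNode must be able to ssh to the other as root non-interactively
[root@nnode1 hadoop]# ssh nnode2 hostname
[root@nnode2 ~]# ssh nnode1 hostname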
1. slaves
[root@nnode1 hadoop]# cat etc/hadoop/slaves
dnode1
dnode2
dnode3
2. hadoop-env.sh
[root@nnode1 hadoop]# cat etc/hadoop/hadoop-env.sh
# ${JAVA_HOME} only resolves if JAVA_HOME is already exported in the environment;
# otherwise hard-code the JDK path here
export JAVA_HOME=${JAVA_HOME}
#export JSVC_HOME=${JSVC_HOME}
export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop/"
3. core-site.xml
vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://nngroup</value>
<description>the default filesystem is the HDFS nameservice nngroup; the default file:/// would be the local filesystem</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop</value>
<description>root directory for all Hadoop data</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>dnode1:2181,dnode2:2181,dnode3:2181</value>
<description>the ZooKeeper ensemble used for automatic failover</description>
</property>
</configuration>
4. hdfs-site.xml
[root@nnode1 hadoop]# cat etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>nngroup</value>
<description>logical name of the nameservice</description>
</property>
<property>
<name>dfs.ha.namenodes.nngroup</name>
<value>nnode1,nnode2</value>
<description>the NameNode IDs in this nameservice</description>
</property>
<property>
<name>dfs.namenode.rpc-address.nngroup.nnode1</name>
<value>nnode1:8020</value>
<description>RPC address of NameNode nnode1</description>
</property>
<property>
<name>dfs.namenode.rpc-address.nngroup.nnode2</name>
<value>nnode2:8020</value>
<description>RPC address of NameNode nnode2</description>
</property>
<property>
<name>dfs.namenode.http-address.nngroup.nnode1</name>
<value>nnode1:50075</value>
<description>HTTP (web UI) address of nnode1; the stock default is port 50070</description>
</property>
<property>
<name>dfs.namenode.http-address.nngroup.nnode2</name>
<value>nnode2:50075</value>
<description>HTTP (web UI) address of nnode2</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://dnode1:8485;dnode2:8485;dnode3:8485/nngroup</value>
<description>the JournalNode quorum that stores the shared edit log</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/hadoop/jounal</value>
<description>local path where each JournalNode stores the edits</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nngroup</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>client-side class that resolves which NameNode is active</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<description>fencing method used to isolate the old active NameNode</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
<description>SSH private key used by sshfence</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>enable automatic failover (via the ZKFC daemons)</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>default number of replicas per block</description>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/usr/local/hadoop/etc/hadoop/exclude</value>
<description>file listing DataNodes to exclude/decommission</description>
</property>
</configuration>
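To confirm the daemons will see the merged configuration as intended, hdfs getconf can query any key (a quick sketch; it reads the same files the NameNode does):
[root@nnode1 hadoop]# ./bin/hdfs getconf -confKey dfs.nameservices
nngroup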
5. mapred-site.xml
[root@nnode1 hadoop]# cat etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>the runtime framework for executing MapReduce jobs; can be local, classic or yarn</description>
</property>
</configuration>
6. yarn-site.xml
[root@nnode1 hadoop]# vim etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>dnode1:2181,dnode2:2181,dnode3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-ha</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>nnode1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>nnode2</value>
</property>
<property>
<!-- RM web UI address; default: ${yarn.resourcemanager.hostname}:8088 -->
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>nnode1:8088</value>
</property>
<property>
<!-- RM web UI address; default: ${yarn.resourcemanager.hostname}:8088 -->
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>nnode2:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>auxiliary service for NodeManagers; mapreduce_shuffle serves map output to the reducers</description>
</property>
</configuration>
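With recovery enabled, the RM checkpoints application state into ZooKeeper, so the standby can resume running jobs after a failover. Once YARN is up, the store is visible from any ZooKeeper client (a sketch; /rmstore is the default parent znode, settable via yarn.resourcemanager.zk-state-store.parent-path):
[root@dnode1 zookeeper]# ./bin/zkCli.sh -server dnode1:2181
ls /rmstore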
Starting the cluster
1. Sync the configuration to all machines
[root@nnode1 hadoop]# for i in nnode2 dnode{1..3};do rsync -aXSH --delete /usr/local/hadoop ${i}:/usr/local/ & done
2. Format the failover state in ZooKeeper
ZooKeeper must be running first:
[root@dnode1 hadoop]# ../zookeeper/bin/zkServer.sh start
[root@dnode2 hadoop]# ../zookeeper/bin/zkServer.sh start
[root@dnode3 hadoop]# ../zookeeper/bin/zkServer.sh start
[root@nnode1 hadoop]# ./bin/hdfs zkfc -formatZK
3. Start the JournalNode on the three DataNodes
[root@dnode1 hadoop]# ./sbin/hadoop-daemon.sh start journalnode
[root@dnode2 hadoop]# ./sbin/hadoop-daemon.sh start journalnode
[root@dnode3 hadoop]# ./sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-dnode3.out
[root@dnode3 hadoop]# jps
4836 JournalNode
4887 Jps
4. Format NN1
Format on NN1; all the JournalNodes must be up:
[root@nnode1 hadoop]# ./bin/hdfs namenode -format
5. Copy NN1's metadata to NN2
[root@nnode2 hadoop]# rsync -aSH --delete nnode1:/var/hadoop /var/
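The rsync above copies the freshly formatted metadata by hand; the built-in equivalent, run on the standby, is:
[root@nnode2 hadoop]# ./bin/hdfs namenode -bootstrapStandby
Either way works; pick one, not both.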
6. Initialize the JournalNodes (JNN)
-initializeSharedEdits runs on NN1, where the freshly formatted metadata lives:
[root@nnode1 hadoop]# for i in dnode{1..3};do ssh ${i} rm -fr /var/hadoop/jounal/nngroup/* ;done
[root@nnode1 hadoop]# ./bin/hdfs namenode -initializeSharedEdits
7. Stop the JournalNode on the three DataNodes (start-all.sh below starts them again itself)
[root@nnode1 hadoop]# ssh dnode1 /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
[root@nnode1 hadoop]# ssh dnode2 /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
[root@nnode1 hadoop]# ssh dnode3 /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
8. Start the cluster from NN1
[root@nnode1 hadoop]# ./sbin/start-all.sh
9. Start the hot standby on NN2 (start-all.sh only starts the ResourceManager on the local node, so the standby RM is started by hand)
[root@nnode2 hadoop]# ./sbin/yarn-daemon.sh start resourcemanager
Checking cluster status
[root@nnode1 hadoop]# ./bin/hdfs dfsadmin -report
[root@nnode1 hadoop]# ./bin/hdfs haadmin -getServiceState nnode2
standby
[root@nnode1 hadoop]# ./bin/hdfs haadmin -getServiceState nnode1
active
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm1
active
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm2
standby
10. Daemons on each node
[root@nnode1 ~]# jps
3312 DFSZKFailoverController
4128 NameNode
4054 ResourceManager
10541 Jps
[root@nnode2 ~]# jps
20928 ResourceManager
15346 DFSZKFailoverController
20995 NameNode
93607 Jps
[root@dnode1 ~]# jps
2834 NodeManager
2743 JournalNode
9001 Jps
2075 QuorumPeerMain
2667 DataNode
HA verification
Whether it is a NameNode or a ResourceManager, stopping the active instance on either node makes the standby take over as active. For example, failing the RM pair back and forth:
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm1
active
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm2
standby
[root@nnode1 hadoop]# ./sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm2
active
[root@nnode1 hadoop]# ./sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nnode1.out
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm1
standby
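The NameNode pair behaves the same way (a sketch mirroring the RM test above; stop whichever NameNode is currently active and give the ZKFC a few seconds to fail over):
[root@nnode1 hadoop]# ./sbin/hadoop-daemon.sh stop namenode
[root@nnode1 hadoop]# ./bin/hdfs haadmin -getServiceState nnode2
active
[root@nnode1 hadoop]# ./sbin/hadoop-daemon.sh start namenode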
Upload the data to analyze
[root@nnode1 hadoop]# ./bin/hadoop fs -mkdir /input
[root@nnode1 hadoop]# ./bin/hadoop fs -ls hdfs://nngroup/
Found 1 items
drwxr-xr-x - root supergroup 0 2024-06-22 20:21 hdfs://nngroup/input
[root@nnode1 hadoop]# ./bin/hadoop fs -put *.txt /input/
[root@nnode1 hadoop]# ./bin/hadoop fs -ls /input
Found 3 items
-rw-r--r-- 2 root supergroup 106210 2024-06-22 20:25 /input/LICENSE.txt
-rw-r--r-- 2 root supergroup 15830 2024-06-22 20:25 /input/NOTICE.txt
-rw-r--r-- 2 root supergroup 1366 2024-06-22 20:25 /input/README.txt
The cluster state is now:
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm1
standby
[root@nnode1 hadoop]# ./bin/yarn rmadmin -getServiceState rm2
active
[root@nnode1 hadoop]# ./bin/hdfs haadmin -getServiceState nnode1
standby
[root@nnode1 hadoop]# ./bin/hdfs haadmin -getServiceState nnode2
active
Run a word-frequency job on the cluster, then immediately stop the NameNode and ResourceManager on NN2 (both active at this point, per the state above) and check whether the analysis is affected:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /input /ooresult
[root@nnode2 hadoop]# ./sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[root@nnode2 hadoop]# ./sbin/hadoop-daemon.sh stop namenode
stopping namenode
The processing was somewhat affected, running a few tens of seconds slower than normal, but it completed: the cluster recovered from the failure automatically and stayed available. Bring NN2's daemons back afterwards:
[root@nnode2 hadoop]# ./sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nnode2.out
[root@nnode2 hadoop]# ./sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nnode2.out
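To confirm the job survived the failover, inspect its output (part-r-00000 is the conventional name of the first reducer's output file):
[root@nnode1 hadoop]# ./bin/hadoop fs -ls /ooresult
[root@nnode1 hadoop]# ./bin/hadoop fs -cat /ooresult/part-r-00000 | head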
Errors and fixes
AM Container for appattempt_1719067220150_0002_000002 exited with exitCode: 1
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /input /ooresult
24/06/22 22:42:48 INFO mapreduce.Job: Job job_1719067220150_0002 failed with state FAILED due to: Application application_1719067220150_0002 failed 2 times due to AM Container for appattempt_1719067220150_0002_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2024-06-22 22:42:48.164]Exception from container-launch.
Container id: container_e09_1719067220150_0002_02_000001
Exit code: 1
[2024-06-22 22:42:48.166]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
A CSDN blog post about the same error on hadoop-3.3.0 ("Container exited with a non-zero exit code 1. Error file: prelaunch.err.") points at the fix: the RM web UI addresses must be set explicitly:
[root@nnode1 hadoop]# vim etc/hadoop/yarn-site.xml
<property>
<!-- RM web UI address; default: ${yarn.resourcemanager.hostname}:8088 -->
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>nnode1:8088</value>
</property>
<property>
<!-- RM web UI address; default: ${yarn.resourcemanager.hostname}:8088 -->
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>nnode2:8088</value>
</property>
Quick reset after something goes wrong (run from nnode1; ZooKeeper must still be running):
ssh nnode2 /usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
for i in nnode1 nnode2 dnode{1..3};do ssh ${i} rm -fr /var/hadoop;done
for i in nnode2 dnode{1..3};do rsync -aXSH --delete /usr/local/hadoop ${i}:/usr/local/ & done
./bin/hdfs zkfc -formatZK
for i in dnode{1..3};do ssh ${i} /usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode;done
./bin/hdfs namenode -format
ssh nnode2 rsync -aSH --delete nnode1:/var/hadoop /var/
for i in dnode{1..3};do ssh ${i} rm -fr /var/hadoop/jounal/nngroup/* ;done
./bin/hdfs namenode -initializeSharedEdits
for i in dnode{1..3};do ssh ${i} /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode ;done
./sbin/start-all.sh
ssh nnode2 /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
./bin/yarn rmadmin -getServiceState rm1
./bin/yarn rmadmin -getServiceState rm2
./bin/hdfs haadmin -getServiceState nnode1
./bin/hdfs haadmin -getServiceState nnode2