Hadoop HDFS Distributed Cluster Deployment
Hadoop Environment Preparation
1. OS: CentOS 7
2. Disable the firewall
3. Disable SELinux (important)
Host | IP | Role |
---|---|---|
nnode1 | 192.168.126.21 | NameNode, SecondaryNameNode, ResourceManager |
dnode1 | 192.168.126.22 | DataNode, NodeManager |
dnode2 | 192.168.126.23 | DataNode, NodeManager |
dnode3 | 192.168.126.70 | DataNode, NodeManager |
dnode4 | 192.168.126.71 | DataNode |
fsgw | 192.168.126.51 | NFS gateway (nfsgw) |
1. Set the hostname
hostnamectl set-hostname <hostname>
2. Configure hostname resolution (a hard dependency; it must be configured)
Add name resolution for all hosts to /etc/hosts:
cat >> /etc/hosts <<EOF
192.168.126.21 nnode1
192.168.126.22 dnode1
192.168.126.23 dnode2
192.168.126.70 dnode3
EOF
3. Passwordless SSH login
On all hosts, edit the SSH client config:
vim /etc/ssh/ssh_config
and disable strict host-key checking.
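A minimal sketch of the change (assuming the stock OpenSSH client config; this silences the interactive host-key prompt for all hosts):
Host *
    StrictHostKeyChecking no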
Generate a key pair and distribute the public key to all hosts, including this one:
ssh-keygen
ssh-copy-id nnode1
ssh-copy-id dnode1
ssh-copy-id dnode2
ssh-copy-id dnode3
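A quick sanity check (a sketch, using the host list above) that every login is now passwordless:
for i in nnode1 dnode{1..3}; do ssh ${i} hostname; done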
Hadoop Download and Installation
Hadoop needs a Java environment; installing the OpenJDK development package is enough:
yum install -y java-1.8.0-openjdk-devel
wget https://dlcdn.apache.org/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz
[root@nnode1 software]# tar -zxf hadoop-2.10.2.tar.gz -C /usr/local/
[root@nnode1 software]# ls /usr/local/hadoop-2.10.2/
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
[root@nnode1 local]# cd /usr/local/
[root@nnode1 local]# mv hadoop-2.10.2 hadoop
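Optionally (an addition, not in the original steps), put the Hadoop binaries on PATH so the ./bin/ and ./sbin/ prefixes used below can be dropped:
echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> ~/.bashrc
source ~/.bashrc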
Basic Hadoop HDFS Configuration
Configuration file | Purpose |
---|---|
hdfs-site.xml | Configuration for HDFS (Hadoop Distributed FileSystem): cluster-specific settings such as block size, replication factor, and metadata storage locations. |
slaves | Lists the hostnames or IP addresses of all data nodes in the cluster; used when starting or stopping the cluster to reach every data node automatically. |
yarn-site.xml | Configuration for YARN (Yet Another Resource Negotiator), the resource-management and scheduling framework in Hadoop 2.x and later: ResourceManager and NodeManager addresses, scheduler policy, and so on. |
hadoop-env.sh | Hadoop environment variables: the Java environment, daemon heap sizes, and other runtime customizations. |
core-site.xml | Core Hadoop configuration: the default filesystem address (usually HDFS), the Hadoop temporary directory, default ports, and so on. |
mapred-site.xml.template | A template for configuring MapReduce jobs. In Hadoop 2.x and later, MapReduce runs on YARN, and the file actually read is mapred-site.xml (without .template). It configures the MapReduce framework: the JobHistory Server address, task memory allocation, and so on. |
Reference configuration files ship with the distribution (e.g. under share/doc/).
Basic format of a configuration entry:
<property>
<name>parameter name</name>
<value>value</value>
<description>description</description>
</property>
1. Configure hadoop-env.sh
On each machine, export JAVA_HOME in ~/.bashrc; the exact path depends on the OpenJDK build installed there:
echo "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre" >> ~/.bashrc
source ~/.bashrc
On a machine with a different OpenJDK build:
echo "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-2.b14.el7.x86_64/jre/" >> ~/.bashrc
source ~/.bashrc
echo $JAVA_HOME
The Java path differs slightly from machine to machine because the installed versions differ, so we reference the environment variable; if every machine is identical, the path string can be hard-coded directly.
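One way to discover the correct JRE path on a given machine (a sketch, assuming java resolves through the alternatives symlinks):
readlink -f "$(which java)" | sed 's|/bin/java$||'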
Edit the Hadoop environment script and set the Java path and the Hadoop configuration path:
[root@nnode1 hadoop-2.10.2]# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=$JAVA_HOME
export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop/"
2. Configure core-site.xml
[root@nnode1 hadoop]# vim etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://nnode1:9000</value>
<description>What the cluster uses as storage; the default file:/// is the local filesystem</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop</value>
<description>Root directory for all Hadoop data</description>
</property>
</configuration>
3. Configure hdfs-site.xml
[root@nnode1 hadoop]# vim etc/hadoop/hdfs-site.xml
<property>
<name>dfs.namenode.http-address</name>
<value>nnode1:50075</value>
<description>HTTP address of the NameNode</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>nnode1:50090</value>
<description>Host that runs the SecondaryNameNode</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default number of replicas per block</description>
</property>
4. Configure slaves
Delete localhost and declare the data nodes:
dnode1
dnode2
dnode3
5. Format the NameNode
Create the data directory, then format the NameNode:
mkdir /var/hadoop
[root@nnode1 hadoop]# ./bin/hdfs namenode -format
INFO common.Storage: Storage directory /var/hadoop/dfs/name has been successfully formatted.
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nnode1/192.168.126.21
************************************************************/
6. Sync the configuration
The first time, sync the entire installation:
for i in dnode{1..3};do rsync -aXSH --delete /usr/local/hadoop ${i}:/usr/local/ & done
Afterwards, sync only the configuration directory, and create the data directory on every node:
for i in dnode{1..3};do rsync -aXSH --delete /usr/local/hadoop/etc/hadoop/ ${i}:/usr/local/hadoop/etc/hadoop/ & done
for i in dnode{1..3};do ssh ${i} "mkdir /var/hadoop" ;done
7. Start the HDFS services
Starting creates a logs directory under the Hadoop installation:
./sbin/start-dfs.sh
[root@nnode1 hadoop]# ./sbin/start-dfs.sh
Starting namenodes on [nnode1]
nnode1: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nnode1.out
dnode3: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-dnode3.out
dnode1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-dnode1.out
dnode2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-dnode2.out
Starting secondary namenodes [nnode1]
nnode1: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-nnode1.out
[root@nnode1 hadoop]# jps
9217 SecondaryNameNode
8997 NameNode
9343 Jps
[root@dnode1 hadoop]# jps
8280 DataNode
8364 Jps
[root@dnode2 local]# jps
6416 Jps
6334 DataNode
[root@dnode3 ~]# jps
6358 DataNode
6442 Jps
If startup fails, stop the services, delete the logs, fix the configuration, resync, and restart:
./sbin/stop-dfs.sh
for i in dnode{1..3};do ssh ${i} "rm -f /usr/local/hadoop/logs/*" ;done
for i in dnode{1..3};do rsync -aXSH --delete /usr/local/hadoop/etc/hadoop/ ${i}:/usr/local/hadoop/etc/hadoop/ & done
./sbin/start-dfs.sh
Then inspect the .log files in the logs directory to find the cause:
cat hadoop-root-datanode-dnode1.log
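A quick way (a sketch) to surface problems across all log files at once:
grep -iE 'error|fatal' /usr/local/hadoop/logs/*.log | tail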
8. Check node status
[root@nnode1 hadoop]# ./bin/hdfs dfsadmin -report
Configured Capacity: 130326466560 (121.38 GB)
Present Capacity: 61742063616 (57.50 GB)
DFS Remaining: 61742026752 (57.50 GB)
DFS Used: 36864 (36 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.126.22:50010 (dnode1)
Hostname: dnode1
Decommission Status : Normal
Configured Capacity: 51213500416 (47.70 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 25428824064 (23.68 GB)
DFS Remaining: 25784664064 (24.01 GB)
DFS Used%: 0.00%
DFS Remaining%: 50.35%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Jun 22 10:34:56 CST 2024
Last Block Report: Sat Jun 22 10:33:04 CST 2024
Name: 192.168.126.23:50010 (dnode2)
Hostname: dnode2
Decommission Status : Normal
Configured Capacity: 51213500416 (47.70 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 22111420416 (20.59 GB)
DFS Remaining: 29102067712 (27.10 GB)
DFS Used%: 0.00%
DFS Remaining%: 56.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Jun 22 10:34:56 CST 2024
Last Block Report: Sat Jun 22 10:33:04 CST 2024
Name: 192.168.126.70:50010 (dnode3)
Hostname: dnode3
Decommission Status : Normal
Configured Capacity: 27899465728 (25.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 21044158464 (19.60 GB)
DFS Remaining: 6855294976 (6.38 GB)
DFS Used%: 0.00%
DFS Remaining%: 24.57%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Jun 22 10:34:53 CST 2024
Last Block Report: Sat Jun 22 10:33:05 CST 2024
MapReduce Configuration
[root@nnode1 hadoop]# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be local, classic, or yarn.</description>
</property>
YARN Configuration
vim etc/hadoop/yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nnode1</value>
<description>Address of the ResourceManager.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Auxiliary service for the NodeManagers; use mapreduce_shuffle for MapReduce.</description>
</property>
Start YARN
[root@nnode1 hadoop]# ./sbin/start-yarn.sh
Verify the services are up:
[root@nnode1 hadoop]# jps
2798 ResourceManager
1951 NameNode
2175 SecondaryNameNode
3071 Jps
[root@dnode1 ~]# jps
9175 NodeManager
8280 DataNode
9295 Jps
[root@dnode2 ~]# jps
7001 NodeManager
6334 DataNode
7119 Jps
[root@dnode3 ~]# jps
8308 NodeManager
6358 DataNode
8491 Jps
[root@nnode1 hadoop]# ./bin/yarn node -list
24/06/22 11:07:16 INFO client.RMProxy: Connecting to ResourceManager at nnode1/192.168.126.21:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dnode3:42037 RUNNING dnode3:8042 0
dnode1:41147 RUNNING dnode1:8042 0
dnode2:43926 RUNNING dnode2:8042 0
HDFS Operations
[root@nnode1 hadoop]# ./bin/hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
The commands are mostly identical to their shell equivalents; touch is the exception, as the flag here is touchz:
[root@nnode1 hadoop]# ./bin/hadoop fs -touchz /touch.txt
[root@nnode1 hadoop]# ./bin/hadoop fs -ls /
Found 1 items
-rw-r--r-- 2 root supergroup 0 2024-06-22 11:04 /touch.txt
Upload
-put <local path> <remote path>
[root@nnode1 hadoop]# ./bin/hadoop fs -mkdir /oo
[root@nnode1 hadoop]# ./bin/hadoop fs -put *.txt /oo
[root@nnode1 hadoop]# ./bin/hadoop fs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2024-06-22 11:09 /oo
-rw-r--r-- 2 root supergroup 0 2024-06-22 11:04 /touch.txt
Download
-get <remote path> <local path>
[root@nnode1 hadoop]# ./bin/hadoop fs -get /touch.txt /tmp/
[root@nnode1 hadoop]# ls /tmp/touch.txt
/tmp/touch.txt
[root@nnode1 hadoop]# ls -l /tmp/touch.txt
-rw-r--r-- 1 root root 0 Jun 22 11:12 /tmp/touch.txt
Running a Job Against the Cluster
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount <HDFS input path> <HDFS output path>
The input is split across map tasks; the job is done once reduce reaches 100% (see the progress lines in the log below).
[root@nnode1 hadoop]# ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /oo /ooresult
24/06/22 11:17:16 INFO client.RMProxy: Connecting to ResourceManager at nnode1/192.168.126.21:8032
24/06/22 11:17:17 INFO input.FileInputFormat: Total input files to process : 3
24/06/22 11:17:17 INFO mapreduce.JobSubmitter: number of splits:3
24/06/22 11:17:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1719025149534_0001
24/06/22 11:17:17 INFO conf.Configuration: resource-types.xml not found
24/06/22 11:17:17 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
24/06/22 11:17:17 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
24/06/22 11:17:17 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
24/06/22 11:17:18 INFO impl.YarnClientImpl: Submitted application application_1719025149534_0001
24/06/22 11:17:18 INFO mapreduce.Job: The url to track the job: http://nnode1:8088/proxy/application_1719025149534_0001/
24/06/22 11:17:18 INFO mapreduce.Job: Running job: job_1719025149534_0001
24/06/22 11:17:26 INFO mapreduce.Job: Job job_1719025149534_0001 running in uber mode : false
24/06/22 11:17:26 INFO mapreduce.Job: map 0% reduce 0%
24/06/22 11:17:32 INFO mapreduce.Job: map 33% reduce 0%
24/06/22 11:17:36 INFO mapreduce.Job: map 100% reduce 0%
24/06/22 11:17:37 INFO mapreduce.Job: map 100% reduce 100%
24/06/22 11:17:38 INFO mapreduce.Job: Job job_1719025149534_0001 completed successfully
24/06/22 11:17:38 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50617
FILE: Number of bytes written=942275
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=123698
HDFS: Number of bytes written=36016
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=18146
Total time spent by all reduces in occupied slots (ms)=3073
Total time spent by all map tasks (ms)=18146
Total time spent by all reduce tasks (ms)=3073
Total vcore-milliseconds taken by all map tasks=18146
Total vcore-milliseconds taken by all reduce tasks=3073
Total megabyte-milliseconds taken by all map tasks=18581504
Total megabyte-milliseconds taken by all reduce tasks=3146752
Map-Reduce Framework
Map input records=2459
Map output records=17410
Map output bytes=190413
Map output materialized bytes=50629
Input split bytes=292
Combine input records=17410
Combine output records=3122
Reduce input groups=2848
Reduce shuffle bytes=50629
Reduce input records=3122
Reduce output records=2848
Spilled Records=6244
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=412
CPU time spent (ms)=4660
Physical memory (bytes) snapshot=817803264
Virtual memory (bytes) snapshot=8393662464
Total committed heap usage (bytes)=412364800
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=123406
File Output Format Counters
Bytes Written=36016
[root@nnode1 hadoop]# ./bin/hadoop fs -ls /ooresult
Found 2 items
-rw-r--r-- 2 root supergroup 0 2024-06-22 11:25 /ooresult/_SUCCESS
-rw-r--r-- 2 root supergroup 36016 2024-06-22 11:25 /ooresult/part-r-00000
[root@nnode1 hadoop]# ./bin/hadoop fs -cat /ooresult/part-r-00000
Web UIs
http://192.168.126.21:50090 (SecondaryNameNode)
http://192.168.126.21:50075/ (NameNode, per the dfs.namenode.http-address configured above)
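A headless check (a sketch; /jmx is the metrics servlet exposed on Hadoop's HTTP ports) that the NameNode web endpoint answers:
curl -s http://192.168.126.21:50075/jmx | head -n 5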
Managing Hadoop DataNodes
Adding a node
0. Disable SELinux first
1. Configure passwordless SSH from the management node
2. Add the new node to /etc/hosts on all nodes
3. Install the Java environment and set the environment variables
4. Add the node to the slaves configuration file
5. Copy the Hadoop installation directory from the management node to the new node
6. Run the start script on the new node; this command only affects the local machine (an action followed by a role):
./sbin/hadoop-daemon.sh start datanode
7. Restore and rebalance the data
[root@compute ~]# hostnamectl set-hostname dnode4
[root@compute ~]# yum install -y java-1.8.0-openjdk-devel
[root@dnode4 ~]# echo "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-2.b14.el7.x86_64/jre/" >> ~/.bashrc
[root@dnode4 ~]# source ~/.bashrc
[root@dnode4 ~]# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-2.b14.el7.x86_64/jre/
[root@nnode1 hadoop]# ssh-copy-id dnode4
[root@nnode1 hadoop]# cat /etc/hosts
127.0.0.1 localhost
192.168.126.21 nnode1
192.168.126.22 dnode1
192.168.126.23 dnode2
192.168.126.70 dnode3
192.168.126.71 dnode4
[root@nnode1 hadoop]# cat etc/hadoop/slaves
dnode1
dnode2
dnode3
dnode4
for i in dnode{1..4};do rsync -aXSH --delete /usr/local/hadoop/etc/hadoop/ ${i}:/usr/local/hadoop/etc/hadoop/ & done
Update /etc/hosts on every node:
cat >/etc/hosts<<EOF
127.0.0.1 localhost
192.168.126.21 nnode1
192.168.126.22 dnode1
192.168.126.23 dnode2
192.168.126.70 dnode3
192.168.126.71 dnode4
EOF
Start the new node
[root@dnode4 ~]# cd /usr/local/hadoop/
[root@dnode4 hadoop]# ./sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-dnode4.out
[root@dnode4 hadoop]# jps
1529 Jps
1451 DataNode
[root@dnode4 ~]# ./bin/hdfs dfsadmin -report
Configured Capacity: 158225932288 (147.36 GB)
Present Capacity: 84709842944 (78.89 GB)
DFS Remaining: 84708253696 (78.89 GB)
DFS Used: 1589248 (1.52 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (4):
Raise the balancer bandwidth cap (bytes per second per DataNode) so the transfer isn't starved for bandwidth, then rebalance the data:
[root@dnode4 hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 60000000
Balancer bandwidth is set to 60000000
[root@dnode4 hadoop]# ./sbin/start-balancer.sh
starting balancer, logging to /usr/local/hadoop/logs/hadoop-root-balancer-dnode4.out
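start-balancer.sh also accepts a threshold (a sketch; the value is the allowed deviation, in percent, of each node's utilization from the cluster average):
./sbin/start-balancer.sh -threshold 5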
Check the cluster status
[root@dnode4 hadoop]# ./bin/hdfs dfsadmin -report
Repairing a node
Same as adding a node; only the data needs to be restored.
Removing a node
First migrate the data off the node to avoid losing it.
Data node | DFS Used at start | After uploading a 396 MB file | After removing the node |
---|---|---|---|
dnode1 | 0 | 387M | 398M |
dnode2 | 0 | 129M | 130M |
dnode3 | 0 | 12M | 270M |
dnode4 | 0 | 269M | 269M |
Upload data:
[root@nnode1 hadoop]# ./bin/hadoop fs -put /root/software/hadoop-2.10.2.tar.gz /hadoop.tar.gz
Check the result:
[root@nnode1 hadoop]# ./bin/hdfs dfsadmin -report
[root@nnode1 hadoop]# echo $((387+129+12+269))
797
Since the replication factor is 2, the total stored (387+129+12+269 = 797 MB) is a bit more than twice the original file size.
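As an aside (a sketch, not part of the original steps), the replication of an existing file can be changed with setrep; -w waits until the target replica count is reached:
./bin/hadoop fs -setrep -w 3 /hadoop.tar.gz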
vim etc/hadoop/hdfs-site.xml
Add the dfs.hosts.exclude setting:
<property>
<name>dfs.hosts.exclude</name>
<value>/usr/local/hadoop/etc/hadoop/exclude</value>
<description>Path to the file listing data nodes to exclude</description>
</property>
Create the exclude file and write in the hostname of each node to remove:
[root@nnode1 hadoop]# echo "dnode4">>/usr/local/hadoop/etc/hadoop/exclude
Remove the node from the slaves file, then refresh the node list:
[root@nnode1 hadoop]# ./bin/hdfs dfsadmin -refreshNodes
Refresh nodes successful
While migrating, the report shows: Decommission Status : Decommission in progress
When migration completes: Decommission Status : Decommissioned
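A quick way (a sketch) to watch the decommission progress per node:
./bin/hdfs dfsadmin -report | grep -B 2 'Decommission Status'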
Completely removing the data node
[root@dnode4 ~]# reboot
Alternatively, killing the processes listed by jps also works.
To make the node vanish from the report entirely, remove it from slaves and restart DFS:
[root@nnode1 hadoop]# ./sbin/stop-dfs.sh
Stopping namenodes on [nnode1]
nnode1: stopping namenode
dnode2: stopping datanode
dnode3: stopping datanode
dnode1: stopping datanode
Stopping secondary namenodes [nnode1]
nnode1: stopping secondarynamenode
[root@nnode1 hadoop]# ./sbin/start-dfs.sh
YARN NodeManager Management
Adding a node
After copying the Hadoop configuration over, start the NodeManager directly on the target node:
[root@dnode4 hadoop]# ./sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-dnode4.out
Then check from the management node (the new node appears once it registers):
[root@nnode1 hadoop]# ./bin/yarn node -list
24/06/22 14:19:54 INFO client.RMProxy: Connecting to ResourceManager at nnode1/192.168.126.21:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dnode1:42648 RUNNING dnode1:8042 0
dnode2:35939 RUNNING dnode2:8042 0
dnode3:46093 RUNNING dnode3:8042 0
[root@nnode1 hadoop]# ./bin/yarn node -list
24/06/22 14:20:03 INFO client.RMProxy: Connecting to ResourceManager at nnode1/192.168.126.21:8032
Total Nodes:4
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dnode1:42648 RUNNING dnode1:8042 0
dnode2:35939 RUNNING dnode2:8042 0
dnode3:46093 RUNNING dnode3:8042 0
dnode4:45649 RUNNING dnode4:8042 0
Removing a node
Just stop it; the ResourceManager takes a short while to drop it from the list:
[root@dnode4 hadoop]# ./sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
[root@nnode1 hadoop]# ./bin/yarn node -list
24/06/22 14:20:03 INFO client.RMProxy: Connecting to ResourceManager at nnode1/192.168.126.21:8032
Total Nodes:4
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dnode1:42648 RUNNING dnode1:8042 0
dnode2:35939 RUNNING dnode2:8042 0
dnode3:46093 RUNNING dnode3:8042 0
dnode4:45649 RUNNING dnode4:8042 0
[root@nnode1 hadoop]# ./bin/yarn node -list
24/06/22 14:22:38 INFO client.RMProxy: Connecting to ResourceManager at nnode1/192.168.126.21:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dnode1:42648 RUNNING dnode1:8042 0
dnode2:35939 RUNNING dnode2:8042 0
dnode3:46093 RUNNING dnode3:8042 0
Hadoop NFS Gateway
Driving HDFS through the hadoop client commands is cumbersome; the NFS gateway exposes HDFS over NFS, so a client can simply mount it and use it.
The nfsgw gateway:
- Compatibility and ease of use
  - Lets users browse and operate on the HDFS filesystem through a familiar, OS-compatible NFSv3 client.
  - No extra tools or interfaces to install or learn, which lowers the barrier to using HDFS.
- Data download
  - Users can download files from HDFS straight to the local filesystem through the gateway's transparent access.
- Data streaming
  - Data can be streamed directly through the NFS mount point, which is very useful for large-scale processing and analysis.
  - File append is supported, so users can add data to the end of a file.
- Write restrictions
  - Although append is supported, random writes generally are not: HDFS is designed for large-scale data processing, and its data model and architecture do not suit frequent small random writes.
- Mounting HDFS as a filesystem
  - HDFS can be mounted as part of the client's filesystem, i.e. treated as a directory or partition of the local filesystem, enabling seamless file operations.
- Version support
  - The gateway supports NFSv3, a widely deployed NFS version that covers the basic file-sharing functionality.
Environment setup
yum install -y java-1.8.0-openjdk-devel
Configure hosts:
192.168.126.21 nnode1
192.168.126.22 dnode1
192.168.126.23 dnode2
192.168.126.70 dnode3
192.168.126.71 dnode4
192.168.126.51 fsgw
echo "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64/jre" >> ~/.bashrc
source ~/.bashrc
echo $JAVA_HOME
The gateway conflicts with the host's native NFS services, so rpcbind and the NFS server must be removed first:
yum remove rpcbind
yum remove nfs-server
Configure the proxy user
Add the proxy user on both the NameNode and the NFSGW.
The proxy user's UID, GID, and username must be identical on both machines.
If for some special reason the UID on the client cannot match the one on the NFS gateway, configure a static mapping in nfs.map:
nfs.map
uid 10 100 # Map the remote UID 10 to the local UID 100
gid 11 101 # Map the remote GID 11 to the local GID 101
[root@fsgw ~]# hostnamectl set-hostname fsgw
[root@fsgw ~]# groupadd -g 800 nfsuser
[root@fsgw ~]# useradd -u 800 -g 800 -r -d /var/hadoop nfsuser
[root@nnode1 hadoop]# groupadd -g 800 nfsuser
[root@nnode1 hadoop]# useradd -u 800 -g 800 -r -d /var/hadoop nfsuser
Stop all cluster services:
[root@nnode1 hadoop]# ./sbin/stop-all.sh
[root@nnode1 hadoop]# vim etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.nfsuser.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.nfsuser.hosts</name>
<value>*</value>
</property>
Sync the configuration to the other nodes:
[root@nnode1 hadoop]# for i in dnode{1..3};do rsync -aXSH --delete /usr/local/hadoop/etc/hadoop/ ${i}:/usr/local/hadoop/etc/hadoop/ & done
Restart the services and check the status:
[root@nnode1 hadoop]# ./sbin/start-dfs.sh
[root@nnode1 hadoop]# ./bin/hdfs dfsadmin -report
[root@fsgw ~]# rsync -aXSH nnode1:/usr/local/hadoop /usr/local/
[root@fsgw ~]# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>nfs.exports.allowed.hosts</name>
<value>* rw</value>
<description>rw = read-write; ro = read-only (the default)</description>
</property>
<property>
<name>nfs.dump.dir</name>
<value>/var/nfstmp</value>
<description>NFS cache (dump) directory</description>
</property>
nfs.exports.allowed.hosts
- By default the export can be mounted by any client. For better access control, set an access policy. Within an entry, the machine name and access privilege are separated by whitespace; the machine name may be a single host, a Java regular expression, or an IPv4 address.
- Use rw or ro to give the export read-write or read-only access.
- If no access policy is provided, the default is read-only. Multiple entries are separated by ';'.
Set permissions: logs is owned by root by default, so give nfsuser access as well:
[root@fsgw ~]# rm -f /usr/local/hadoop/logs/*
[root@fsgw ~]# setfacl -m nfsuser:rwx /usr/local/hadoop/logs
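To confirm the ACL took effect (a sketch):
getfacl /usr/local/hadoop/logs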
Start the portmap service as root:
[root@fsgw hadoop]# ./sbin/hadoop-daemon.sh --script bin/hdfs start portmap
starting portmap, logging to /usr/local/hadoop/logs/hadoop-root-portmap-fsgw.out
Start nfs3 as the proxy user; portmap must be started first:
[root@fsgw hadoop]# sudo -u nfsuser ./sbin/hadoop-daemon.sh --script ./bin/hdfs start nfs3
starting nfs3, logging to /usr/local/hadoop/logs/hadoop-nfsuser-nfs3-fsgw.out
[root@fsgw hadoop]# jps
3570 Nfs3
3621 Jps
3318 Portmap
Client usage
Install nfs-utils on the client:
yum install nfs-utils
Mount test on the client:
[root@dnode4 hadoop]# mount -t nfs -o vers=3,proto=tcp,nolock,noatime,noacl,sync 192.168.126.51:/ /mnt
[root@dnode4 hadoop]# ls /mnt/
hadoop.tar.gz oo ooresult system tmp touch.txt user
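To make the mount persistent (a sketch; the _netdev option defers mounting until the network is up), an /etc/fstab entry could look like:
192.168.126.51:/ /mnt nfs vers=3,proto=tcp,nolock,noatime,noacl,sync,_netdev 0 0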
Mount options
- vers=3: the gateway currently supports only NFSv3
- proto=tcp: use TCP as the transport (UDP is not supported)
- nolock: NLM (the file-locking mechanism) is not supported
- noatime: disable access-time updates
- noacl: disable ACL extended permissions
- sync: synchronous writes; minimizes write reordering, which otherwise causes unpredictable throughput and unreliable behavior when uploading large files
Errors and Fixes
DataNode fails to start
The error:
2024-06-21 23:15:20,517 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.BindException: Problem binding to [nnode1:50070] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
The cause was an error in the configuration file
hdfs-site.xml
<property>
<name>dfs.datanode.http.address</name>
<value>nnode1:50075</value>
<description>Host the DataNode binds to</description>
</property>
It should instead be the following; the parameter name had been copied incorrectly:
<property>
<name>dfs.namenode.http-address</name>
<value>nnode1:50075</value>
<description>namenode的端口地址</description>
</property>
Troubleshooting via Logs
In the logs directory:
[root@nnode1 hadoop]# ls logs/
hadoop-root-namenode-nnode1.log hadoop-root-namenode-nnode1.out.4 hadoop-root-secondarynamenode-nnode1.out.2
hadoop-root-namenode-nnode1.out hadoop-root-namenode-nnode1.out.5 hadoop-root-secondarynamenode-nnode1.out.3
hadoop-root-namenode-nnode1.out.1 hadoop-root-secondarynamenode-nnode1.log hadoop-root-secondarynamenode-nnode1.out.4
hadoop-root-namenode-nnode1.out.2 hadoop-root-secondarynamenode-nnode1.out hadoop-root-secondarynamenode-nnode1.out.5
hadoop-root-namenode-nnode1.out.3 hadoop-root-secondarynamenode-nnode1.out.1 SecurityAuth-root.audit
Naming: hadoop-<user>-<role>-<host>.log/.out, e.g. hadoop-root-namenode-nnode1.log.
The .log files are the logs; the .out files are the startup output.
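To follow a daemon's log live while reproducing a problem (a sketch):
tail -f /usr/local/hadoop/logs/hadoop-root-namenode-nnode1.log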