Alluxio Operations
Alluxio Commands
alluxio fsadmin
# Check service status
alluxio fsadmin report
# Show the IPs of workers that have gone down
alluxio fsadmin report capacity -lost
alluxio getConf
# Show configuration parameters (--master prints the values in effect on the master)
alluxio getConf --master
Alluxio Operations in Practice
Worker node down
Checking the service status shows that one worker node has been lost.
Find out which node is lost:
$ alluxio fsadmin report capacity -lost
sjsysc-hh405-zbhx700w
Log in to the lost worker node and start the worker:
$ ssh sjsysc-hh405-zbhx700w
$ alluxio-start.sh worker SudoMount
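When several workers are lost at once, the two manual steps above can be wrapped in a small loop. The sketch below is illustrative only: the function name restart_lost_workers and the DRY_RUN switch are assumptions and not part of Alluxio, and it assumes alluxio-start.sh is on the PATH of a non-interactive SSH session on each worker. Host names are passed in by hand rather than parsed from the report output.

```shell
# Hypothetical helper: restart the Alluxio worker on each host given as an argument.
# With DRY_RUN=1 (the default) it only prints the commands it would run.
restart_lost_workers() {
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $host alluxio-start.sh worker SudoMount"
    else
      # Assumes passwordless SSH and alluxio-start.sh on the remote PATH.
      ssh "$host" 'alluxio-start.sh worker SudoMount'
    fi
  done
}
```

Running `restart_lost_workers sjsysc-hh405-zbhx700w` prints the command for review; setting DRY_RUN=0 executes it. Afterwards, `alluxio fsadmin report capacity -lost` can be run again to confirm the worker is no longer listed.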
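Setting up sub-directory mount points
Once Alluxio has started, users can mount additional sub-directories, for example an HDFS directory from another Hadoop cluster.
When mounting HDFS clusters with different configurations, the configuration files for each HDFS (hdfs-site.xml, core-site.xml) can be specified at mount time: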
alluxio fs mount /ia_test hdfs://nameservice1/ia_test \
--option alluxio.underfs.hdfs.configuration=/opt/alluxio/hdfs/ia_conf/hdfs-site.xml:/opt/alluxio/hdfs/ia_conf/core-site.xml
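Because the option value is a colon-separated list of two long paths, it is easy to mistype. As a rough sketch, the value can be built from the directory that holds the two files and the full command printed for review before it is run. The function names hdfs_conf_option and print_mount_cmd are hypothetical, and the sketch assumes both files sit in the same directory.

```shell
# Hypothetical helper: build the alluxio.underfs.hdfs.configuration value
# from a directory assumed to contain hdfs-site.xml and core-site.xml.
hdfs_conf_option() {
  local dir
  dir=${1%/}   # drop a trailing slash, if any
  echo "alluxio.underfs.hdfs.configuration=$dir/hdfs-site.xml:$dir/core-site.xml"
}

# Print the mount command instead of running it.
# $1 = Alluxio path, $2 = UFS URI, $3 = directory with the HDFS config files
print_mount_cmd() {
  echo "alluxio fs mount $1 $2 --option $(hdfs_conf_option "$3")"
}
```

For the mount shown above, `print_mount_cmd /ia_test hdfs://nameservice1/ia_test /opt/alluxio/hdfs/ia_conf` prints the same command on a single line.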
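Mount requirements:
Open ports
(1) Port 8020 on the HDFS cluster's NameNode must be reachable from the Alluxio cluster.
If this port is not open, the following error is reported: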
java.net.UnknownHostException: nameservice1
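(2) Ports 9866 and 9867 on the HDFS cluster's DataNodes must be reachable from the Alluxio cluster.
If these ports are not open, operations on Alluxio files report errors like the following: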
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000016_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4563885290_3807183975 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000016_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000083_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564100089_3807398774 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000083_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000115_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564170915_3807469600 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000115_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000079_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564086733_3807385418 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000079_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000041_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4563964409_3807263094 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000041_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000103_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564147300_3807445985 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000103_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000046_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4563978019_3807276704 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000046_0.gz (Zero Copy GrpcDataReader)
The HDFS configuration files must be distributed to every node in the Alluxio cluster, and the configuration files and all of their parent directories must have 755 permissions.
Otherwise, mounting reports the following error:
java.net.UnknownHostException: nameservice1
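If you only mount an HDFS directory, distributing the HDFS configuration files to all Alluxio master nodes is enough. However, if the files are not also distributed to all Alluxio worker nodes, operating on Alluxio files reports the following error: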
[alluxio@sjsysc-hh405-zbhx1135w ~]$ alluxio fs copyToLocal /hive-test/dm_m_ia_prefer_label_app_top5/month_id=202104/prov_id=084/000002_0.gz .
Failed to read block ID=287209160704 from tiered storage and UFS tier: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1 (Zero Copy GrpcDataReader)
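Note: configuration files are usually placed under a directory such as /opt or /usr/local, because these directories already have execute permission. Do not put them under /home/<user>/: setting 755 on that parent directory breaks passwordless SSH login!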
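The mount requirements above (reachable ports, configuration files readable through every parent directory) can be spot-checked on each Alluxio node before mounting. The following is a minimal sketch under stated assumptions: check_conf_perms and port_open are hypothetical names, `stat -c` assumes GNU coreutils, and port_open relies on bash's /dev/tcp pseudo-device and the `timeout` utility. The permission check approximates the 755 rule by verifying that "others" have r-x on each directory and r on the file.

```shell
# Hypothetical helper: verify a config file is readable by "others" and that
# every parent directory up to $2 (default /) grants r-x to "others".
# $1 must be an absolute path. Returns 0 if OK, 1 on a permission problem.
check_conf_perms() {
  local path stop mode others
  path=$1
  stop=${2:-/}
  case $path in
    /*) ;;
    *) echo "absolute path required: $path" >&2; return 2 ;;
  esac
  [ -f "$path" ] || { echo "no such file: $path" >&2; return 1; }
  mode=$(stat -c '%a' "$path") || return 1
  others=${mode#"${mode%?}"}          # last octal digit = bits for "others"
  case $others in
    4|5|6|7) ;;
    *) echo "not readable by others: $path" >&2; return 1 ;;
  esac
  path=$(dirname "$path")
  while :; do
    mode=$(stat -c '%a' "$path") || return 1
    others=${mode#"${mode%?}"}
    case $others in
      5|7) ;;
      *) echo "missing o+rx: $path" >&2; return 1 ;;
    esac
    if [ "$path" = "$stop" ] || [ "$path" = "/" ]; then break; fi
    path=$(dirname "$path")
  done
  return 0
}

# Hypothetical helper: return 0 if a TCP connection to host $1, port $2
# succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, `check_conf_perms /opt/alluxio/hdfs/ia_conf/hdfs-site.xml` checks the path used in the mount command, and `port_open <namenode-host> 8020` or `port_open <datanode-host> 9866` checks the ports listed above. The host must be a real NameNode or DataNode address; nameservice1 is a logical name resolved through the HDFS configuration, not a hostname.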