Hadoop 3.3.6 + Tez 0.10.3 + Hive 4.0.0 Installation Guide

According to the official Hive release notes, Hive 4.0.0 is compatible with Hadoop 3.3.6 and Tez 0.10.3. This guide walks through setting up a single-node environment for learning purposes.

I. System Installation and Configuration

  1. Operating system: RHEL 9.4
  2. IP address: 192.168.1.10
  3. Hostname: hadoop
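    The rest of this guide assumes the hostname "hadoop" resolves to 192.168.1.10 (it appears later as hdfs://hadoop:9000). If the host is not yet configured, a minimal sketch:
    # Set the hostname and add a hosts entry
    hostnamectl set-hostname hadoop
    echo "192.168.1.10 hadoop" >> /etc/hosts
    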
  4. Create the hadoop user
    # Create the parent directory for user home directories
    mkdir /user
    # Create the user
    useradd -m -d /user/hadoop hadoop
    # Set its password
    passwd hadoop
    
  5. Configure passwordless SSH login
    # Switch to the hadoop user (login shell, so /etc/profile.d is sourced)
    su - hadoop
    # Generate an SSH key pair
    ssh-keygen -t rsa
    # Copy the public key to this host
    ssh-copy-id hadoop
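    To confirm passwordless login works (it should print the date without prompting for a password):
    ssh hadoop date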
    
  6. Install Java 8
    # Extract the archive
    tar xvf jdk-8u411-linux-x64.tar.gz -C /opt
    # Configure environment variables
    vim /etc/profile.d/hadoop.sh
    # JAVA
    export JAVA_HOME=/opt/jdk1.8.0_411
    export PATH=$PATH:$JAVA_HOME/bin
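    To verify the JDK is on the PATH:
    source /etc/profile
    java -version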
    
  7. Install MySQL
    # Install the MySQL server
    yum -y install mysql-server
    # Start MySQL
    systemctl start mysqld
    # Enable start on boot
    systemctl enable mysqld
    # Set the root password (press Enter at the prompt for the empty
    # initial password, then enter the new password)
    mysqladmin -u root -p password
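    To verify the new password:
    mysql -u root -p -e "SELECT VERSION();"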
    

II. Hadoop 3.3.6

  1. Configure environment variables
    vim /etc/profile.d/hadoop.sh
    # HADOOP
    export HADOOP_HOME=/opt/hadoop-3.3.6
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
  2. Extract the installation package
    source /etc/profile
    tar xvf hadoop-3.3.6.tar.gz -C /opt
    chown -R hadoop:hadoop $HADOOP_HOME
    
  3. Edit the configuration files
    • $HADOOP_HOME/etc/hadoop/workers
    hadoop
    
    • $HADOOP_HOME/etc/hadoop/core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop:9000</value>
        </property>
        <property>
            <name>hadoop.proxyuser.hadoop.hosts</name>
            <value>*</value>
        </property>
        <property>
            <name>hadoop.proxyuser.hadoop.groups</name>
            <value>*</value>
        </property>
    </configuration>
    
    • $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
    
    • $HADOOP_HOME/etc/hadoop/mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
        </property>
    </configuration>
    
    • $HADOOP_HOME/etc/hadoop/yarn-site.xml
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
        </property>
    </configuration>
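
    The start scripts launch the daemons over SSH, where /etc/profile.d may not be sourced; as a precaution, JAVA_HOME can also be set directly in Hadoop's own environment file:
    • $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/opt/jdk1.8.0_411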
    
  4. Format the NameNode
    hdfs namenode -format
    
  5. Start HDFS and YARN
    start-dfs.sh
    start-yarn.sh
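    To verify that all daemons came up (the web UIs default to http://hadoop:9870 for HDFS and http://hadoop:8088 for YARN):
    # Expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
    jps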
    
  6. Create the HDFS home directory
    hdfs dfs -mkdir -p /user/hadoop
    

III. Tez 0.10.3

  1. Configure environment variables
    vim /etc/profile.d/hadoop.sh
    # TEZ (the conf directory must be on the classpath so tez-site.xml is found)
    export TEZ_HOME=/opt/tez-0.10.3
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/conf:$TEZ_HOME/*:$TEZ_HOME/lib/*
    
  2. Extract the installation package
    # Extract
    source /etc/profile
    tar xvf apache-tez-0.10.3-bin.tar.gz
    mv apache-tez-0.10.3-bin $TEZ_HOME
    chown -R hadoop:hadoop /opt/tez-0.10.3
    # Upload the Tez tarball to HDFS
    cd $TEZ_HOME/share
    hdfs dfs -mkdir /user/tez
    hdfs dfs -put tez.tar.gz /user/tez
    # Remove the extra SLF4J binding so it does not conflict with Hadoop's
    cd $TEZ_HOME/lib
    rm slf4j-reload4j-1.7.36.jar
    
  3. Edit the configuration file $TEZ_HOME/conf/tez-site.xml
    <configuration>
        <property>
            <name>tez.lib.uris</name>
            <value>hdfs://hadoop:9000/user/tez/tez.tar.gz</value>
        </property>
    </configuration>
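
    To verify Tez independently of Hive, you can run the bundled example job (a sketch; the examples jar name assumes the 0.10.3 binary distribution layout):
    # Put a small input file into HDFS and run the orderedwordcount example
    echo "hello tez hello hadoop" > words.txt
    hdfs dfs -put words.txt /user/hadoop/
    hadoop jar $TEZ_HOME/tez-examples-0.10.3.jar orderedwordcount /user/hadoop/words.txt /user/hadoop/wc_out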
    

IV. Hive 4.0.0

  1. Configure environment variables
    vim /etc/profile.d/hadoop.sh
    # HIVE
    export HIVE_HOME=/opt/hive-4.0.0
    export PATH=$PATH:$HIVE_HOME/bin
    
  2. Extract the installation package
    source /etc/profile
    tar xvf apache-hive-4.0.0-bin.tar.gz
    mv apache-hive-4.0.0-bin /opt/hive-4.0.0
    # Copy the MySQL JDBC driver into Hive's lib directory
    cp mysql-connector-j-8.4.0.jar /opt/hive-4.0.0/lib
    chown -R hadoop:hadoop /opt/hive-4.0.0
    # Remove the duplicate SLF4J binding (Hadoop already provides one)
    cd $HIVE_HOME/lib
    rm log4j-slf4j-impl-2.18.0.jar
    
  3. Edit the configuration file and initialize the metastore database
    • $HIVE_HOME/conf/hive-site.xml
    <configuration>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://hadoop:3306/hive</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.cj.jdbc.Driver</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>Mysql.123</value>
        </property>
        <property>
            <name>hive.execution.engine</name>
            <value>tez</value>
        </property>
    </configuration>
    
    # Create the Hive metastore database
    mysql -u root -p
    CREATE DATABASE hive;
    CREATE USER 'hive'@'%' IDENTIFIED BY 'Mysql.123';
    GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
    FLUSH PRIVILEGES;
    
    # Initialize the metastore schema
    schematool -dbType mysql -initSchema
    # Create the Hive warehouse directory in HDFS
    hdfs dfs -mkdir -p /user/hive/warehouse
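    To check that initialization succeeded:
    schematool -dbType mysql -info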
    
  4. Start the metastore and HiveServer2 services
    mkdir $HIVE_HOME/logs
    nohup hive --service metastore > $HIVE_HOME/logs/metastore.log 2>&1 &
    nohup hive --service hiveserver2 > $HIVE_HOME/logs/hiveserver2.log 2>&1 &
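    HiveServer2 can take a while to come up; the default ports are 9083 (metastore) and 10000 (HiveServer2):
    ss -tlnp | grep -E '9083|10000'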
    
  5. Connection test
    beeline -u jdbc:hive2://hadoop:10000 -n hadoop
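
    Once connected, a quick smoke test (t1 is a throwaway table name) confirms that queries actually run on Tez:
    CREATE TABLE t1 (id INT);
    INSERT INTO t1 VALUES (1), (2);   -- should launch a Tez DAG
    SELECT COUNT(*) FROM t1;          -- expect 2
    DROP TABLE t1;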