【干货】使用hive应该知道基础点-技术圈

正念的观察，让你又重回事物本身，看清自己的每一个动作、每一个瞬间，你将明白自己有权利快乐而从容地完成每一件事情！

《和繁重的工作一起修行》

一、数据类型

数值型
tinyint	1 byte 有符号整数
smallint	2 byte 有符号整数
int/integer	4 byte 有符号整数
bigint	8 byte 有符号整数
float	4 byte 单精度浮点数
double	8 byte 双精度浮点数
decimal	指定小数位，最大38位精度。例decimal(5,3) 5位数保留3位小数
numeric	类似decimal，从3.0.0开始支持
日期时间类型
timestamp	年月日时分秒毫秒
date	年月日
interval
String类型
string
varchar
char
其他类型
boolean
binary
复合类型
数组	array<data_type>
maps	map<key_type,data_type>
structs	struct<col_name:data_type[comment col_comment],...>

二、数据库操作

1.建库语法

create (database/schema) [if not exist] database_name  [comment database_comment]  [location hdfs_path]  [with dbproperties (pro_name=pro_val,...)];

2.示例

普通创建hive> create database mydb1;OKTime taken: 2.73 seconds
指定hdfs地址hive> create database mydb2 location '/opt';OKTime taken: 5.211 second
带有注释hive> create database mydb3 comment "这是数据库3";OKTime taken: 2.07 second
带有属性hive> create database mydb4    > with dbproperties('pro1'='val1','pro2'='val2');OKTime taken: 2.086 second
如果不存在就创建hive> create schema if not exists mydb2;OKTime taken: 0.017 seconds

3.查看

hive> show databases;OKdefaultmydbmydb1mydb2mydb3mydb4Time taken: 0.019 seconds, Fetched: 6 row(s)
hive> desc database mydb4;OKmydb4           hdfs://mycluster/user/hive/warehouse/mydb4.db   root    USERTime taken: 0.028 seconds, Fetched: 1 row(s)
hive> desc database mydb3;OKmydb3   这是数据库3     hdfs://mycluster/user/hive/warehouse/mydb3.db   root    USERTime taken: 0.018 seconds, Fetched: 1 row(s)

4.删库语法，默认restrict(严格模式)

drop (database/schema) [if exist] database_name[restrict|cascade];

5.删库示例

hive> drop database mydb1;OKTime taken: 2.146 secondshive> show databases;OKdefaultmydbmydb2mydb3mydb4Time taken: 0.011 seconds, Fetched: 5 row(s)

三、创建表

1.建表语法

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]  [COMMENT table_comment]  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)     [STORED AS DIRECTORIES]  [   [ROW FORMAT row_format]    [STORED AS file_format]     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)  ]  [LOCATION hdfs_path]  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)  [AS select_statement];

2.建表示例

普通建表hive> create table t_user(    > uid int comment 'id',    > uname string comment '姓名',    > sex boolean comment '性别',    > birth date comment '生日',    > likes array<string> comment '兴趣',    > other map<string,string> comment '其他'    > );OKTime taken: 2.342 seconds
指定数据在hdfs文件中的分隔符hive> create table t_user1(    >   uid int comment 'id',    >   uname string comment '姓名',    >   sex boolean comment '性别',    >   birth date comment '生日',    >   likes array<string> comment '兴趣',    >   other map<string,string> comment '其他'    > )    > row format    > delimited    >  fields terminated by ','    >  collection items terminated by '-'    >  map keys terminated by ':'    >  lines terminated by '\n';OKTime taken: 5.186 seconds
指定存储格式hive> create table t_user2(    >   uid int comment 'id',    >   uname string comment '姓名',    >   sex boolean comment '性别',    >   birth date comment '生日',    >   likes array<string> comment '兴趣',    >   other map<string,string> comment '其他'    > )    > stored as orc;OKTime taken: 4.439 seconds
创建外部表hive> create external table t_user3(    >   uid int comment 'id',    >   uname string comment '姓名',    >   sex boolean comment '性别',    >   birth date comment '生日',    >   likes array<string> comment '兴趣',    >   other map<string,string> comment '其他'    > );OKTime taken: 2.126 seconds
指定hdfs文件hive> create table t_user4(    >   uid int comment 'id',    >   uname string comment '姓名',    >   sex boolean comment '性别',    >   birth date comment '生日',    >   likes array<string> comment '兴趣',    >   other map<string,string> comment '其他'    > )    > location '/user/root/user.txt';OKTime taken: 2.108 seconds
创建分区表hive> create table t_user5(    >   uid int comment 'id',    >   uname string comment '姓名',    >   birth date comment '生日',    >   likes array<string> comment '兴趣',    >   other map<string,string> comment '其他'    > )    > partitioned by (sex string comment '性别',age int comment '年龄');OKTime taken: 2.11 seconds

3.表数据存储格式

stored as avro	avro格式存储数据
stored as sequencefile	压缩的序列文件存储
stored as jsonfile	json文件存储
stored as orc	orc文件格式，支持ACID事务并做了基于资源的优化。存储列级别的元数据
stored as parquet	parquet列存储格式
stored as textfile	默认，普通文本文件格式， ‍
stored as rcfile	列记录文件格式
stored by‍	使用非本地表格式存储。创建或链接到非本地表

4.内部表和外部表

内部表	表或分区删除了，根表或分区关联的数据和元数据一并删除
外部表	在外部文件存储元数据及表结构，可以被外部进程管理和访问，可以访问存储于ASV或者远程HDFS上的数据。当表删除后不会删除数据

5.查看

hive> desc formatted t_user1;OK# col_name              data_type               comment
uid                     int                     iduname                   string                  姓名sex                     boolean                 性别birth                   date                    生日likes                   array<string>           兴趣other                   map<string,string>      其他
# Detailed Table InformationDatabase:               mydbOwner:                  rootCreateTime:             Mon Jul 05 21:21:01 CST 2021LastAccessTime:         UNKNOWNProtect Mode:           NoneRetention:              0Location:               hdfs://mycluster/user/hive/warehouse/mydb.db/t_user1Table Type:             MANAGED_TABLETable Parameters:        transient_lastDdlTime   1625491261
# Storage InformationSerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDeInputFormat:            org.apache.hadoop.mapred.TextInputFormatOutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatCompressed:             NoNum Buckets:            -1Bucket Columns:         []Sort Columns:           []Storage Desc Params:        colelction.delim        -        field.delim             ,        line.delim              \n        mapkey.delim            :        serialization.format    ,Time taken: 0.12 seconds, Fetched: 35 row(s)

6.like和as建表

表1和表2表结构一样，没有数据create table table_name_1 like table_name_2;
从表2查询出col1、col2列的数据集，并建立成一个名为表1的表create table table_name1 as   select col1,col2 from table_name2

至此，hive基础操作就完成了，至于查询语法和常规的数据库操作大同小异，欢迎朋友留言指正。。。

END