【干货】使用hive应该知道基础点

正念的观察,让你又重回事物本身,看清自己的每一个动作、每一个瞬间,你将明白自己有权利快乐而从容地完成每一件事情!
《和繁重的工作一起修行》
一、数据类型
| 数值型 | |
| tinyint | 1 byte 有符号整数 | 
| smallint | 2 byte 有符号整数 | 
| int/integer | 4 byte 有符号整数 | 
| bigint | 8 byte 有符号整数 | 
| float | 4 byte 单精度浮点数 | 
| double | 8 byte 双精度浮点数 | 
| decimal | 指定小数位,最大38位精度。例decimal(5,3) 5位数保留3位小数 | 
| numeric | 类似decimal,从3.0.0开始支持 | 
| 日期时间类型 | |
| timestamp | 年月日时分秒毫秒 | 
| date | 年月日 | 
| interval | |
| String类型 | |
| string | |
| varchar | |
| char | |
| 其他类型 | |
| boolean | |
| binary | |
| 复合类型 | |
| 数组 | array<data_type> | 
| maps | map<key_type,data_type> | 
| structs | struct<col_name:data_type[comment col_comment],...> | 
二、数据库操作
1.建库语法
create (database/schema) [if not exist] database_name[comment database_comment][location hdfs_path][with dbproperties (pro_name=pro_val,...)];
2.示例
普通创建create database mydb1;OKTime taken: 2.73 seconds指定hdfs地址create database mydb2 location '/opt';OKTime taken: 5.211 second带有注释create database mydb3 comment "这是数据库3";OKTime taken: 2.07 second带有属性create database mydb4> with dbproperties('pro1'='val1','pro2'='val2');OKTime taken: 2.086 second如果不存在就创建create schema if not exists mydb2;OKTime taken: 0.017 seconds
3.查看
hive> show databases;OKdefaultmydbmydb1mydb2mydb3mydb4Time taken: 0.019 seconds, Fetched: 6 row(s)hive> desc database mydb4;OKmydb4 hdfs://mycluster/user/hive/warehouse/mydb4.db root USERTime taken: 0.028 seconds, Fetched: 1 row(s)hive> desc database mydb3;OKmydb3 这是数据库3 hdfs://mycluster/user/hive/warehouse/mydb3.db root USERTime taken: 0.018 seconds, Fetched: 1 row(s)
4.删库语法,默认restrict(严格模式)
drop (database/schema) [if exist] database_name[restrict|cascade];
5.删库示例
drop database mydb1;OKTime taken: 2.146 secondsshow databases;OKdefaultmydbmydb2mydb3mydb4Time taken: 0.011 seconds, Fetched: 5 row(s)
三、创建表
1.建表语法
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])][COMMENT table_comment][PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)][CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS][SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)[STORED AS DIRECTORIES][[ROW FORMAT row_format][STORED AS file_format]| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)][LOCATION hdfs_path][TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)[AS select_statement];
2.建表示例
普通建表hive> create table t_user(> uid int comment 'id',> uname string comment '姓名',> sex boolean comment '性别',> birth date comment '生日',> likes array<string> comment '兴趣',> other map<string,string> comment '其他'> );OKTime taken: 2.342 seconds指定数据在hdfs文件中的分隔符hive> create table t_user1(> uid int comment 'id',> uname string comment '姓名',> sex boolean comment '性别',> birth date comment '生日',> likes array<string> comment '兴趣',> other map<string,string> comment '其他'> )> row format> delimited> fields terminated by ','> collection items terminated by '-'> map keys terminated by ':'> lines terminated by '\n';OKTime taken: 5.186 seconds指定存储格式hive> create table t_user2(> uid int comment 'id',> uname string comment '姓名',> sex boolean comment '性别',> birth date comment '生日',> likes array<string> comment '兴趣',> other map<string,string> comment '其他'> )> stored as orc;OKTime taken: 4.439 seconds创建外部表hive> create external table t_user3(> uid int comment 'id',> uname string comment '姓名',> sex boolean comment '性别',> birth date comment '生日',> likes array<string> comment '兴趣',> other map<string,string> comment '其他'> );OKTime taken: 2.126 seconds指定hdfs文件hive> create table t_user4(> uid int comment 'id',> uname string comment '姓名',> sex boolean comment '性别',> birth date comment '生日',> likes array<string> comment '兴趣',> other map<string,string> comment '其他'> )> location '/user/root/user.txt';OKTime taken: 2.108 seconds创建分区表hive> create table t_user5(> uid int comment 'id',> uname string comment '姓名',> birth date comment '生日',> likes array<string> comment '兴趣',> other map<string,string> comment '其他'> )> partitioned by (sex string comment '性别',age int comment '年龄');OKTime taken: 2.11 seconds
3.表数据存储格式
| stored as avro | avro格式存储数据 | 
| stored as sequencefile | 压缩的序列文件存储 | 
| stored as jsonfile | json文件存储 | 
| stored as orc | orc文件格式,支持ACID事务并做了基于资源的优化。存储列级别的元数据 | 
| stored as parquet | parquet列存储格式 | 
| stored as textfile | 默认,普通文本文件格式,  | 
| stored as rcfile | 列记录文件格式 | 
| stored by | 使用非本地表格式存储。创建或链接到非本地表 | 
4.内部表和外部表
| 内部表 | 表或分区删除了,根表或分区关联的数据和元数据一并删除 | 
| 外部表 | 在外部文件存储元数据及表结构,可以被外部进程管理和访问,可以访问存储于ASV或者远程HDFS上的数据。当表删除后不会删除数据 | 
5.查看
desc formatted t_user1;OK# col_name data_type commentuid int iduname string 姓名sex boolean 性别birth date 生日likes array<string> 兴趣other map<string,string> 其他# Detailed Table InformationDatabase: mydbOwner: rootCreateTime: Mon Jul 05 21:21:01 CST 2021LastAccessTime: UNKNOWNProtect Mode: NoneRetention: 0Location: hdfs://mycluster/user/hive/warehouse/mydb.db/t_user1Table Type: MANAGED_TABLETable Parameters:transient_lastDdlTime 1625491261# Storage InformationSerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDeInputFormat: org.apache.hadoop.mapred.TextInputFormatOutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormatCompressed: NoNum Buckets: -1Bucket Columns: []Sort Columns: []Storage Desc Params:-,\n:,Time taken: 0.12 seconds, Fetched: 35 row(s)
6.like和as建表
表1和表2表结构一样,没有数据create table table_name_1 like table_name_2;从表2查询出col1、col2列的数据集,并建立成一个名为表1的表create table table_name1 asselect col1,col2 from table_name2
至此,hive基础操作就完成了,至于查询语法和常规的数据库操作大同小异,欢迎朋友留言指正。。。
END评论
