[Essentials] Hive basics you should know
Mindful observation brings you back to the thing itself; by seeing each action and each moment clearly, you will realize that you have the right to finish every task happily and at ease.
— 《和繁重的工作一起修行》
I. Data Types
| Type | Description |
| --- | --- |
| **Numeric** | |
| tinyint | 1-byte signed integer |
| smallint | 2-byte signed integer |
| int/integer | 4-byte signed integer |
| bigint | 8-byte signed integer |
| float | 4-byte single-precision floating point |
| double | 8-byte double-precision floating point |
| decimal | Fixed precision with a chosen scale, up to 38 digits; e.g. decimal(5,3) holds 5 digits with 3 after the decimal point |
| numeric | Same as decimal; available since Hive 3.0.0 |
| **Date/time** | |
| timestamp | Year, month, day, hour, minute, second and fractional seconds |
| date | Year, month, day |
| interval | |
| **String** | |
| string | |
| varchar | |
| char | |
| **Other** | |
| boolean | |
| binary | |
| **Complex** | |
| arrays | array<data_type> |
| maps | map<key_type,data_type> |
| structs | struct<col_name:data_type [comment col_comment], ...> |
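Complex-typed columns are addressed in queries with bracket and dot syntax: array elements by zero-based index, map values by key, struct fields by name. A minimal sketch, using a hypothetical table t_demo whose columns exist only for illustration:

```sql
-- hypothetical table used only to show the access syntax for complex types
create table t_demo(
  likes array<string>,
  other map<string,string>,
  addr  struct<city:string, zip:string>
);

select likes[0],        -- first element of the array (0-based index)
       other['phone'],  -- value stored under the key 'phone'
       addr.city        -- struct field, accessed with dot notation
from t_demo;
```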
II. Database Operations
1. Create database syntax
create (database|schema) [if not exists] database_name
[comment database_comment]
[location hdfs_path]
[with dbproperties (pro_name=pro_val,...)];
2. Examples
Basic creation
create database mydb1;
OK
Time taken: 2.73 seconds
指定hdfs地址
'/opt'; create database mydb2 location
OK
Time taken: 5.211 second
With a comment
create database mydb3 comment "this is database 3";
OK
Time taken: 2.07 seconds
With properties
create database mydb4
> with dbproperties('pro1'='val1','pro2'='val2');
OK
Time taken: 2.086 seconds
Create only if it does not already exist
create schema if not exists mydb2;
OK
Time taken: 0.017 seconds
3. Viewing databases
hive> show databases;
OK
default
mydb
mydb1
mydb2
mydb3
mydb4
Time taken: 0.019 seconds, Fetched: 6 row(s)
hive> desc database mydb4;
OK
mydb4 hdfs://mycluster/user/hive/warehouse/mydb4.db root USER
Time taken: 0.028 seconds, Fetched: 1 row(s)
hive> desc database mydb3;
OK
mydb3	this is database 3	hdfs://mycluster/user/hive/warehouse/mydb3.db	root	USER
Time taken: 0.018 seconds, Fetched: 1 row(s)
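Note that desc database does not print the dbproperties defined on mydb4; the extended form includes them (a small sketch, output omitted):

```sql
-- DESCRIBE DATABASE EXTENDED additionally lists the dbproperties
-- ('pro1'='val1','pro2'='val2' in the mydb4 example above)
desc database extended mydb4;
```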
4. Drop database syntax; the default is restrict (strict mode)
drop (database|schema) [if exists] database_name
[restrict|cascade];
5. Drop database example
drop database mydb1;
OK
Time taken: 2.146 seconds
show databases;
OK
default
mydb
mydb2
mydb3
mydb4
Time taken: 0.011 seconds, Fetched: 5 row(s)
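With the default restrict behavior the drop fails if the database still contains tables; adding cascade removes the tables along with the database. A sketch, assuming mydb4 might not be empty:

```sql
-- cascade drops any tables inside the database before dropping the database itself
drop database if exists mydb4 cascade;
```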
III. Creating Tables
1. Create table syntax
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement];
2. Create table examples
Basic table creation
hive> create table t_user(
> uid int comment 'id',
> uname string comment 'name',
> sex boolean comment 'sex',
> birth date comment 'birthday',
> likes array<string> comment 'hobbies',
> other map<string,string> comment 'other'
> );
OK
Time taken: 2.342 seconds
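To put a row with array and map values into t_user, the array() and map() constructors can be used in a SELECT without a FROM clause (supported in reasonably recent Hive versions; the values below are made up for illustration):

```sql
-- one illustrative row; complex literals are built with array() and map()
insert into table t_user
select 1, 'Tom', true, date '1990-01-01',
       array('reading', 'music'),
       map('phone', '123', 'city', 'bj');
```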
Specifying the delimiters of the data file on HDFS
hive> create table t_user1(
> uid int comment 'id',
> uname string comment 'name',
> sex boolean comment 'sex',
> birth date comment 'birthday',
> likes array<string> comment 'hobbies',
> other map<string,string> comment 'other'
> )
> row format
> delimited
> fields terminated by ','
> collection items terminated by '-'
> map keys terminated by ':'
> lines terminated by '\n';
OK
Time taken: 5.186 seconds
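With those delimiters, a data file and a load statement for t_user1 would look roughly like this (the file path and sample row are made up for illustration):

```sql
-- /input/user.txt (illustrative contents), one record per line:
--   1,Tom,true,1990-01-01,reading-music,phone:123-city:bj
-- fields split on ',', array items on '-', map key/value pairs on ':'
-- (use 'load data local inpath' instead to load from the local file system)
load data inpath '/input/user.txt' into table t_user1;
```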
Specifying the storage format
hive> create table t_user2(
> uid int comment 'id',
> uname string comment 'name',
> sex boolean comment 'sex',
> birth date comment 'birthday',
> likes array<string> comment 'hobbies',
> other map<string,string> comment 'other'
> )
> stored as orc;
OK
Time taken: 4.439 seconds
Creating an external table
hive> create external table t_user3(
> uid int comment 'id',
> uname string comment 'name',
> sex boolean comment 'sex',
> birth date comment 'birthday',
> likes array<string> comment 'hobbies',
> other map<string,string> comment 'other'
> );
OK
Time taken: 2.126 seconds
Specifying an HDFS location for the table data
hive> create table t_user4(
> uid int comment 'id',
> uname string comment 'name',
> sex boolean comment 'sex',
> birth date comment 'birthday',
> likes array<string> comment 'hobbies',
> other map<string,string> comment 'other'
> )
> location '/user/root/user.txt';
OK
Time taken: 2.108 seconds
Creating a partitioned table
hive> create table t_user5(
> uid int comment 'id',
> uname string comment 'name',
> birth date comment 'birthday',
> likes array<string> comment 'hobbies',
> other map<string,string> comment 'other'
> )
> partitioned by (sex string comment 'sex', age int comment 'age');
OK
Time taken: 2.11 seconds
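The partition columns sex and age are not stored in the data files; they are supplied when data is written or a partition is added. A minimal sketch (partition values are illustrative):

```sql
-- static-partition insert: the (sex, age) pair becomes a subdirectory of the table
insert into table t_user5 partition (sex='male', age=20)
select uid, uname, birth, likes, other from t_user;

-- a partition can also be declared before any data exists
alter table t_user5 add if not exists partition (sex='female', age=18);
```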
3. Table data storage formats

| Clause | Description |
| --- | --- |
| stored as avro | Data stored in Avro format |
| stored as sequencefile | Compressed SequenceFile storage |
| stored as jsonfile | JSON file storage |
| stored as orc | ORC file format; supports ACID transactions and cost-based optimization, and stores column-level metadata |
| stored as parquet | Parquet columnar storage format |
| stored as textfile | Plain text files; the default |
| stored as rcfile | Record Columnar File (RCFile) format |
| stored by | Stored by a non-native table handler; creates or links to a non-native table |
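An existing text-format table can be rewritten into one of these formats with create table ... as select; a sketch reusing t_user1 from above (the new table name is illustrative):

```sql
-- copy t_user1 into a new ORC-backed table; the schema comes from the select list
create table t_user1_orc stored as orc
as select * from t_user1;
```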
4. Managed (internal) and external tables

| Table type | Behavior |
| --- | --- |
| Managed (internal) table | When the table or a partition is dropped, the data and the metadata associated with it are deleted together |
| External table | The metadata and table structure describe data kept outside the warehouse; the data can be managed and accessed by external processes and may reside in ASV (Azure Storage) or on a remote HDFS cluster. Dropping the table does not delete the data |
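The practical difference shows up at drop time; a sketch using the managed table t_user2 and the external table t_user3 created earlier:

```sql
drop table t_user2;  -- managed: the table's data directory is deleted as well
drop table t_user3;  -- external: only the metadata is removed, the files stay on HDFS
-- the remaining directories can be checked from the hive prompt, e.g.:
-- dfs -ls /user/hive/warehouse/mydb.db;
```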
5. Viewing table details
desc formatted t_user1;
OK
# col_name data_type comment
uid int id
uname	string	name
sex	boolean	sex
birth	date	birthday
likes	array<string>	hobbies
other	map<string,string>	other
# Detailed Table Information
Database: mydb
Owner: root
CreateTime: Mon Jul 05 21:21:01 CST 2021
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/mydb.db/t_user1
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1625491261
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
colelction.delim	-
field.delim	,
line.delim	\n
mapkey.delim	:
serialization.format	,
Time taken: 0.12 seconds, Fetched: 35 row(s)
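Besides desc formatted, the full DDL of an existing table can be reproduced with show create table (output omitted here):

```sql
-- prints a CREATE TABLE statement equivalent to the table's current definition
show create table t_user1;
```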
6. Creating tables with like and as
Creates table_name_1 with the same structure as table_name_2, but without any data:
create table table_name_1 like table_name_2;
Queries columns col1 and col2 from table_name2 and materializes the result set as a new table named table_name1:
create table table_name1 as
select col1,col2 from table_name2;
That covers the basic Hive operations. The query syntax and everyday database operations are largely the same as in a conventional relational database; comments and corrections are welcome.