2019年秋招部分大厂SQL题
01
某滴有一张订单表ordr_list,共有三列:ordr_id(订单id)-- bigint,uid(乘客id)-- bigint,start_time(发单时间)-- datetime,对于同一个乘客,每个订单的发单时间均不同。
解题要求:
1)写SQL求得每个用户最早发单的订单id,每个乘客一行;
2)输出两列:uid(乘客id),ordr_id(最早发单的订单id);
3)可使用union、left join、case when等,不能使用右连接、全连接、row_number()、rank()
select a.uid, a.ordr_id
from ordr_list as a
left join
( select uid, min(start_time) as min_time
from ordr_list
group by uid
) as b
on a.uid = b.uid and a.start_time = b.min_time;
1.做题时遵守题目要求;
2.注意题中给的数据库系统的类型,此题给的是SQLite3,有些函数不能用或不一致;
select uid, min(ordr_id)
from
( select a.uid, a.ordr_id
from ordr_list as a
left join
( select uid, min(start_time) as min_time
from ordr_list
group by uid
) as b
on a.uid = b.uid and a.start_time =b.min_time
)t
group by uid;
解法1:
with tmp as (
select user_id, count(distinct dt) as sum_dt, sum(amt) as sum_amt
from table1
group by user_id
)
select user_id, sum_dt, sum_amt, '购买天数最多' as info
from tmp
where sum_dt = (select max(sum_dt) from tmp)
union all
select user_id, sum_dt, sum_amt, '购买金额最多' as info
from tmp
where sum_amt = (select max(sum_amt) from tmp);
解法2:
select TOP 1 user_id, count(distinct dt), sum(amt), '购买天数最多' as info
from table1
group by user_id
order by count(distinct dt) desc
union all
select TOP 1 user_id, count(), sum(amt), '购买金额最多' as info
from table1
group by user_id
order by sum(amt) desc;
某浪有一用户观看视频的行为记录表user_video,一次观看即一条记录,表结构大致如下,其中date,uid,vid为主键:
date(日期) | uid(用户id) | vid(视频id) | tag(视频标签) |
20180711 | 1001 | 2001 | 体育 |
20180711 | 1002 | 2002 | 娱乐 |
... | ... | ... | ... |
每天新增用户的信息存于新增用户表中,表名为new_user_info,表结构大致如下,其中uid为主键:
uid(用户id) | new_date(日期) | new_chl(渠道) | new_chl_type(渠道类型) |
1013 | 2018-06-30 | 11010 | 应用宝 |
1015 | 2018-07-01 | 12010 | 应用商店 |
... | ... | ... | ... |
用户信息存于用户信息表中,表名为user_info,表结构大致如下,其中uid为主键:
uid(用户id) | gender(性别) | age(年龄) | city(城市) |
1001 | 女 | 20 | 北京 |
1002 | 男 | 25 | 上海 |
... | ... | ... | ... |
1) 抽取各渠道类型的新增用户在7月10日,7月11日的人均观看次数,统计7月11日人均观看次数对比7月10日的涨幅,输出结果:第一列为渠道类型,第二列为人均观看次数,第三列为涨幅(保留4位小数,四舍五入)。
select new_chl_type
,round(sum(vid)/count(distinct uid), 4)
,round((sum(case when date = '20200711' then vid else 0 end)/count(distinct case when date = '20200711' then uid else null end))
/(sum(case when date = '20200710' then vid else 0 end)/count(distinct case when date = '20200710' then uid else null end)- 1, 4)
from
(
select a.uid, a.new_chl_type, b.date, b.vid
from new_user_info as a
left join
(
select date, uid, count(vid) as vid
from user_video
where date <= '20190711' and date >= '20190710'
group by date, uid
) as b
on a.uid = b.uid
) as t
group by new_chl_type;
2) 取25岁以下女性用户在美食标签2019年8月1日-7日观看视频次数超过10次且8月8日观看次数小于3次的用户,输出结果:输出符合条件的用户id。
select a.uid
from
(
select uid
from user_info
where age < 25 and gender = '女'
group by uid
) as a
left join
(
select uid, count(vid)
from user_video
where tag = '美食' and date <= '20190807' and date >= '20190801'
group by uid
having count(vid) > 10
) as b
on a.uid = b.uid
left join
(
select uid
from user_video
where tag = '美食' and date = '20190808'
group by uid
having count(uid) < 3
) as c
on a.uid = c.uid
where b.uid is not null and c.uid is not null;
04
select a.clk_id, b.ordr_id
from
(
select *
from
(
select *, row_number() over(partition by user_id, clk_goods order by clk_time desc) as rnk
from clk_tbl
) as t
where rnk = 1
) as a
left join ordr_tbl as b
on a.user_id = b.user_id
and a.clk_goods = b.ordr_goods
and a.clk_time < b.ordr_time;
注意:
不要丢失重要的限定条件,比如对于成交来说,点击时间肯定是早于下单时间的。
05
附赠一道求留存的题目,模糊记忆,是提前批的时候英语流利说考到的,那时候那道题可是难倒了周围不少小伙伴。不是题目本身难,而是因为不熟悉留存的定义,无法向下进行,而这道题目对于实习过的同学来说可能就比较友好了。
有一张活跃用户表active_user,包括两个字段:pt('年-月-日'),user_id(用户id),求当天用户在第二天、第三天的回访比例。
select a.pt
,count(distinct case when a.pt = date_sub(b.pt, 1) then a.user_id else null end)/count(distinct a.user_id) as '第二天回访比例'
,count(distinct case when a.pt = date_sub(b.pt, 2) then a.user_id else null end)/count(distinct a.user_id) as '第三天回访比例'
from active_user as a
left join active_user as b
on a.user_id = b.user_id
group by a.pt;
注意:
第二天的回访比例也就是常说的次留(次日留存率),上述方法就是以user_id为连接键,通过判断当天活跃的用户在第二天(第n天)还在不在来计数求取。除求留存外,求新增用户也常出现在笔面试中。
原文链接在这:解一下TMD几道热门数据分析面试题。
select user_name, goods_kind
from
(
select user_name, goods_kind, num, row_number() over(partition by user_name order by num desc) as rnk
from
(
select user_name, goods_kind, count(1) as num
from user_goods_table
group by user_name, goods_kind
) as t1
) as t2
where rnk = 1;
写在最后:
1.对于SQL题目,其实正经来说,比算法coding要简单的多。最基本的要求就是知道基础的函数及使用方法,就像做数学题一样,遇到什么类型的题目,需要用到什么样的解法,做的多了,自然也就熟悉了,也就会更熟练。
2.做题时注意题目中所给的条件,一些条件是显性的,如“超过”、“小于”、“最早”、“最多”、“几分之几”、“前百分之多少”等等;一些条件是隐性的,需要自己细心才能注意到,如上题中“点击时间小于下单时间”,还有一些题目中“快递邮寄时间早于快递接收时间”等。
3.有自己良好的书写习惯和风格,是先写大框架再填补,还是从内而外书写SQL语句,是通过空格或tab进行缩进来标志不同的子查询还是直接通过括号就能识别出来。好的习惯不仅给别人的可读性好,给自己在遇到逻辑毕竟复杂的题目时,也留有了后路。
4.做完题目记得复查。子查询是否定义了别名、表连接时select出两表中同名字段时需要加表别名、各个条件是否已与题目中条件一一对应等等,犯错的次数多了,记忆会更加深刻,也会更加注意,要学会总结。
5.最后再强调一次,想刷SQL题目,除了牛客和leetcode,请从牛客上找笔面经中的SQL题目练习,从各种公众号推文中练习。以后再有人问我从哪里练习SQL题,此种问题一概不回答。
6.上述题目有错误的地方或更好的方法欢迎指正。