Handling Missing Values in Pandas: Detection and Removal
2. Detecting Missing Values
DataFrame.isna()
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
                   'name': ['Alfred', 'Batman', ''],
                   'toy': [None, 'Batmobile', 'Joker']})
df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
ser = pd.Series([5, 6, np.nan])
ser.isna()
0    False
1    False
2     True
# For a DataFrame, we usually care more about how many values are missing in each column
df.isna().sum()
age     1
born    1
name    0
toy     1
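In the output above, the empty string in the name column is not counted as missing: isna() flags None, NaN, and NaT, but an empty string is an ordinary value. The following is a minimal sketch, assuming you want empty strings to count as missing in your own data; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Alfred', 'Batman', ''],
                   'toy': [None, 'Batmobile', 'Joker']})

# The empty string is a regular value, so isna() does not flag it
before = df['name'].isna().sum()

# Assumption for this sketch: empty strings should be treated as missing
cleaned = df.replace('', np.nan)
after = cleaned['name'].isna().sum()

print(before, after)
```

Whether an empty string really means "missing" depends on the data source, so this replacement is a judgment call rather than a default step.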
DataFrame.isnull()
df.isnull()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
# Count the missing values in a single column
df['age'].isnull().sum()
1
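isnull() is an alias of isna(), so the two calls return the same result. The sketch below checks that on a small frame and also shows one common follow-up, the share of missing values per column, computed here by taking the mean of the boolean mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'toy': [None, 'Batmobile', 'Joker']})

# isnull() and isna() produce identical boolean frames
same = df.isna().equals(df.isnull())

# The mean of a boolean column is the fraction of True values,
# i.e. the fraction of missing entries in that column
missing_ratio = df.isna().mean()

print(same)
print(missing_ratio)
```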
DataFrame.notna()
df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
DataFrame.notnull()
df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
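notna() and notnull() return the logical inverse of isna(), which makes them convenient as boolean masks for selecting rows. A minimal sketch on a reduced version of the frame above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'name': ['Alfred', 'Batman', '']})

# notna() is the element-wise negation of isna()
is_inverse = df.notna().equals(~df.isna())

# Use the mask of one column to keep only rows where 'age' is present
has_age = df[df['age'].notna()]

print(is_inverse)
print(has_age)
```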
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   age     2 non-null      float64
 1   born    2 non-null      datetime64[ns]
 2   name    3 non-null      object
 3   toy     2 non-null      object
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 224.0+ bytes
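df.info() prints its summary rather than returning it, so the counts cannot be reused directly. If the numbers are needed in code, one option is df.count(), which returns the non-null count per column; the sketch below derives the missing counts from it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
                   'name': ['Alfred', 'Batman', ''],
                   'toy': [None, 'Batmobile', 'Joker']})

# count() gives the number of non-missing values per column,
# matching the "Non-Null Count" column printed by info()
non_null = df.count()

# Subtracting from the number of rows gives the missing values per column
missing = len(df) - non_null

print(missing)
```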
3. Removing Missing Values
DataFrame.dropna
DataFrame.dropna(axis=0, how='any', thresh=None,
subset=None, inplace=False)
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
# Drop rows that contain at least one missing value
df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
# Drop columns that contain missing values; this requires axis='columns'
df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
# Drop only rows in which every value is missing (no such row here, so nothing is removed)
df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
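Because no row in the example frame is entirely missing, how='all' leaves it unchanged. The sketch below uses a different, made-up frame with one fully empty row to show the option actually removing something.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has no values at all
df = pd.DataFrame({'name': ['Alfred', np.nan, 'Catwoman'],
                   'toy': [np.nan, np.nan, 'Bullwhip']})

# how='all' drops a row only when every column in it is missing
all_dropped = df.dropna(how='all')

# how='any' (the default) drops a row as soon as one value is missing
any_dropped = df.dropna(how='any')

print(all_dropped)
print(any_dropped)
```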
# Keep only rows that have at least 2 non-missing values
df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
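thresh is easy to misread: it is the minimum number of non-missing values a row needs in order to be kept, not the number of missing values allowed. As a rough illustration, the same selection can be written by hand with notna(); this is only a sketch of the idea, not how pandas implements it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Built-in: keep rows with at least 2 non-missing values
result = df.dropna(thresh=2)

# Hand-written equivalent: count non-missing values per row and filter
non_missing_per_row = df.notna().sum(axis=1)
manual = df[non_missing_per_row >= 2]

print(result.equals(manual))
```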
# Look for missing values only in the 'name' and 'born' columns
df.dropna(subset=['name', 'born'])
     name        toy       born
1  Batman  Batmobile 1940-04-25
# Modify df itself instead of returning a new DataFrame
df.dropna(inplace=True)
df
     name        toy       born
1  Batman  Batmobile 1940-04-25
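With inplace=True the call returns None, so assigning its result back to a variable would lose the data. The sketch below shows that behavior next to the reassignment style, which leaves the original frame untouched; resetting the index afterwards is an optional extra step added here for illustration.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                         "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                         "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# In-place variant: df is modified and the call itself returns None
df = original.copy()
returned = df.dropna(inplace=True)

# Reassignment variant: original stays as it was, a new frame is returned;
# reset_index(drop=True) renumbers the remaining rows from 0
clean = original.dropna().reset_index(drop=True)

print(returned)
print(clean)
```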