Pandas Missing-Value Handling: Detection and Removal
2. Detecting Missing Values
DataFrame.isna() 
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'born': [pd.NaT, pd.Timestamp('1939-05-27'),
                            pd.Timestamp('1940-04-25')],
                   'name': ['Alfred', 'Batman', ''],
                   'toy': [None, 'Batmobile', 'Joker']})
df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

ser = pd.Series([5, 6, np.nan])
ser.isna()
0    False
1    False
2     True
dtype: bool

# For a DataFrame we usually care more about how many values are missing in each column, so count them.
# Note that the empty string '' in 'name' is not treated as missing.
df.isna().sum()
age     1
born    1
name    0
toy     1
dtype: int64

DataFrame.isnull()
df.isnull()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

# Count the missing values in a single column
df['age'].isnull().sum()
1

DataFrame.notna()
df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
DataFrame.notnull() 
df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
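The isna/isnull outputs above match each other, and so do the notna/notnull outputs, because pandas defines isnull as an alias of isna and notnull as an alias of notna. notna is simply the element-wise negation of isna. A minimal check of these relationships, using a small hypothetical frame named `demo` that is not part of the original example:

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only
demo = pd.DataFrame({'age': [5, 6, np.nan],
                     'toy': [None, 'Batmobile', 'Joker']})

# isnull() and notnull() return the same masks as isna() and notna()
same_as_isnull = demo.isna().equals(demo.isnull())
same_as_notnull = demo.notna().equals(demo.notnull())

# notna() is the element-wise negation of isna()
negation_holds = demo.notna().equals(~demo.isna())

print(same_as_isnull, same_as_notnull, negation_holds)
```

Since the pairs are interchangeable, picking one spelling (for example isna/notna) and using it consistently keeps code easier to read.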
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   age     2 non-null      float64
 1   born    2 non-null      datetime64[ns]
 2   name    3 non-null      object
 3   toy     2 non-null      object
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 224.0+ bytes
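In the example frame, the name column reports no missing values even though one entry is an empty string, because pandas does not treat '' as missing. If empty strings should count as missing in your data (an assumption that depends on the dataset), one possible approach is to convert them to NaN first and then count. The sketch below uses a hypothetical frame named `people` and also shows the share of missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only
people = pd.DataFrame({'age': [5, 6, np.nan],
                       'name': ['Alfred', 'Batman', ''],
                       'toy': [None, 'Batmobile', 'Joker']})

# The empty string is not detected as missing
empty_counted_before = people['name'].isna().sum()

# Assumption: in this dataset an empty name means "unknown", so map it to NaN
cleaned = people.copy()
cleaned['name'] = cleaned['name'].replace('', np.nan)

# Number of missing values per column
missing_counts = cleaned.isna().sum()

# Fraction of missing values per column (mean of a boolean mask)
missing_share = cleaned.isna().mean()

print(missing_counts)
print(missing_share)
```

Taking the mean of the boolean mask works because True counts as 1 and False as 0, so the result is the proportion of missing entries in each column.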
3. Dropping Missing Values
DataFrame.dropna 
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
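df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

# Drop rows that contain any missing value
df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

# Drop columns that contain any missing value; this requires axis='columns'
df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

# Drop only rows in which every value is missing (there are none here, so nothing is dropped)
df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

# Keep only rows that have at least 2 non-missing values
df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

# Look for missing values only in the listed columns
df.dropna(subset=['name', 'born'])
     name        toy       born
1  Batman  Batmobile 1940-04-25

# Modify df in place instead of returning a new DataFrame
df.dropna(inplace=True)
df
     name        toy       born
1  Batman  Batmobile 1940-04-25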
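In the example above, how='all' leaves the frame unchanged because no row is entirely missing. The sketch below uses a hypothetical frame named `heroes`, which does contain a fully empty row, to show how how='all', thresh and subset differ. It also shows that dropna returns a new DataFrame by default, so reassigning the result is an alternative to inplace=True:

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only; row 1 is entirely missing
heroes = pd.DataFrame({'name': ['Alfred', np.nan, 'Catwoman'],
                       'toy': [np.nan, np.nan, 'Bullwhip'],
                       'born': [pd.NaT, pd.NaT, pd.Timestamp('1940-04-25')]})

# how='all': drop a row only if every value in it is missing (row 1)
without_empty_rows = heroes.dropna(how='all')

# thresh=2: keep rows with at least 2 non-missing values (only row 2)
at_least_two = heroes.dropna(thresh=2)

# subset: check for missing values in the 'name' column only (rows 0 and 2 remain)
has_name = heroes.dropna(subset=['name'])

# dropna returned new frames each time; the original still has all 3 rows
original_rows = len(heroes)

print(list(without_empty_rows.index))
print(list(at_least_two.index))
print(list(has_name.index))
print(original_rows)
```

Reassigning the result (for example `heroes = heroes.dropna(how='all')`) has the same effect as inplace=True while keeping the operation explicit in the code.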