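[Data Competition] A Kaggle Trick: Using the Sigmoid Function for Regression Problems

Authors: 尘沙樱落, 杰少

Designing a Sigmoid-Based Regression Loss

Background

This is a very interesting loss design. Whenever you face a regression problem it is worth a try. It is not guaranteed to help on every problem, but on some specific problems it can bring a large improvement, and at worst it gives you one more model to use in a later stacking stage.

The approach was designed by the data scientist danzel, who describes why it works as follows: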

I used a sigmoid-output and scaled its range afterwards (to look like the target). Training like this helps the model to converge faster and gives better results.

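Design idea

Suppose our regression problem minimizes the squared loss, and let the label of the $i$-th sample be $y_i$, $i = 1, \dots, N$, where $N$ is the number of samples.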

1. Baseline Loss

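The output layer usually has the form Dense(1, activation='linear').

2. Sigmoid-based loss

The output layer has the form Dense(1, activation='sigmoid') * (max_val - min_val) + min_val, where min_val and max_val are bounds chosen to cover the range of the target.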
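Written out, the two setups differ only in the prediction that enters the squared loss. The notation below is an assumption made for illustration and does not come from the original post: $y_i$ is the label of sample $i$, $N$ the number of samples, $f_\theta(x_i)$ the raw network output before the final activation, $\sigma$ the logistic sigmoid, and $y_{\min}$, $y_{\max}$ stand for min_val and max_val.

```latex
\begin{aligned}
\text{Baseline:}\quad & \hat{y}_i = f_\theta(x_i), \\
\text{Sigmoid:}\quad  & \hat{y}_i = y_{\min} + (y_{\max} - y_{\min})\,\sigma\big(f_\theta(x_i)\big),
                        \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \\
\text{Loss:}\quad     & L(\theta) = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - \hat{y}_i\big)^2 .
\end{aligned}
```

Because $0 < \sigma(z) < 1$, the sigmoid variant can only predict values between $y_{\min}$ and $y_{\max}$, which is the sense in which the output is scaled "to look like the target".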

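Case study

Does the idea above actually hold up? We take a Kaggle dataset and run an experiment, since seeing is believing. Interested readers can download the data from the links at the end of the article.

1. Importing packages

1.1 Import the packages used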

import numpy                 as np
import pandas                as pd
import seaborn               as sns
import tensorflow            as tf
import xgboost               as xgb
from lightgbm                import LGBMRegressor
from sklearn.metrics         import mean_squared_error
from sklearn.model_selection import KFold
from tqdm                    import tqdm

def RMSE(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))
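
One detail is worth keeping in mind when RMSE is used directly as the training loss: the square root is taken per batch, and Keras typically reports the epoch loss as an average over batches, so the logged number is in general not the RMSE over the whole dataset. The pure-Python sketch below illustrates the gap on made-up numbers; the helper `rmse` and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    sq = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

# Toy data: the first batch is predicted perfectly, the second is off by 3.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 6.0, 7.0]

global_rmse = rmse(y_true, y_pred)
batch_rmses = [rmse(y_true[i:i + 2], y_pred[i:i + 2]) for i in (0, 2)]
mean_batch_rmse = sum(batch_rmses) / len(batch_rmses)

print(global_rmse)      # RMSE over all four samples
print(mean_batch_rmse)  # average of the two per-batch RMSEs
```

The averaged per-batch value can never exceed the global RMSE, so the logged losses are best read as a slightly optimistic proxy when batches differ in difficulty.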

1.2 Reading the data

train = pd.read_csv('./data/train.csv')
test  = pd.read_csv('./data/test.csv')
sub   = pd.read_csv('./data/sample_submission.csv')

2. Data preprocessing

2.1 Concatenating train and test

train_test = pd.concat([train,test],axis=0,ignore_index=True)

train_test.head()

	id	cont1	cont2	cont3	cont4	cont5	cont6	cont7	cont8	cont9	cont10	cont11	cont12	cont13	cont14	target
0	1	0.670390	0.811300	0.643968	0.291791	0.284117	0.855953	0.890700	0.285542	0.558245	0.779418	0.921832	0.866772	0.878733	0.305411	7.243043
1	3	0.388053	0.621104	0.686102	0.501149	0.643790	0.449805	0.510824	0.580748	0.418335	0.432632	0.439872	0.434971	0.369957	0.369484	8.203331
2	4	0.834950	0.227436	0.301584	0.293408	0.606839	0.829175	0.506143	0.558771	0.587603	0.823312	0.567007	0.677708	0.882938	0.303047	7.776091
3	5	0.820708	0.160155	0.546887	0.726104	0.282444	0.785108	0.752758	0.823267	0.574466	0.580843	0.769594	0.818143	0.914281	0.279528	6.957716
4	8	0.935278	0.421235	0.303801	0.880214	0.665610	0.830131	0.487113	0.604157	0.874658	0.863427	0.983575	0.900464	0.935918	0.435772	7.951046

2.2 GaussRank for neural network preprocessing

For the details, see the RankGaussian part of our earlier post.

import numpy as np
from joblib import Parallel, delayed
from scipy.interpolate import interp1d
from scipy.special import erf, erfinv
from sklearn.preprocessing import QuantileTransformer,PowerTransformer
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import FLOAT_DTYPES, check_array, check_is_fitted

class GaussRankScaler(BaseEstimator, TransformerMixin):
    """Transform features by scaling each feature to a normal distribution.
    Parameters
        ----------
        epsilon : float, optional, default 1e-4
            A small amount added to the lower bound or subtracted
            from the upper bound. This value prevents infinite number
            from occurring when applying the inverse error function.
        copy : boolean, optional, default True
            If False, try to avoid a copy and do inplace scaling instead.
            This is not guaranteed to always work inplace; e.g. if the data is
            not a NumPy array, a copy may still be returned.
        n_jobs : int or None, optional, default None
            Number of jobs to run in parallel.
            ``None`` means 1 and ``-1`` means using all processors.
        interp_kind : str or int, optional, default 'linear'
           Specifies the kind of interpolation as a string
            ('linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
            'previous', 'next', where 'zero', 'slinear', 'quadratic' and 'cubic'
            refer to a spline interpolation of zeroth, first, second or third
            order; 'previous' and 'next' simply return the previous or next value
            of the point) or as an integer specifying the order of the spline
            interpolator to use.
        interp_copy : bool, optional, default False
            If True, the interpolation function makes internal copies of x and y.
            If False, references to `x` and `y` are used.
        Attributes
        ----------
        interp_func_ : list
            The interpolation function for each feature in the training set.
        """

    def __init__(self, epsilon=1e-4, copy=True, n_jobs=None, interp_kind='linear', interp_copy=False):
        self.epsilon     = epsilon
        self.copy        = copy
        self.interp_kind = interp_kind
        self.interp_copy = interp_copy
        self.fill_value  = 'extrapolate'
        self.n_jobs      = n_jobs

    def fit(self, X, y=None):
        """Fit interpolation function to link rank with original data for future scaling
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The data used to fit interpolation function for later scaling along the features axis.
        y
            Ignored
        """
        X = check_array(X, copy=self.copy, estimator=self, dtype=FLOAT_DTYPES, force_all_finite=True)

        self.interp_func_ = Parallel(n_jobs=self.n_jobs)(delayed(self._fit)(x) for x in X.T)
        return self

    def _fit(self, x):
        x = self.drop_duplicates(x)
        rank = np.argsort(np.argsort(x))
        bound = 1.0 - self.epsilon
        factor = np.max(rank) / 2.0 * bound
        scaled_rank = np.clip(rank / factor - bound, -bound, bound)
        return interp1d(
            x, scaled_rank, kind=self.interp_kind, copy=self.interp_copy, fill_value=self.fill_value)

    def transform(self, X, copy=None):
        """Scale the data with the Gauss Rank algorithm
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The data used to scale along the features axis.
        copy : bool, optional (default: None)
            Copy the input X or not.
        """
        check_is_fitted(self, 'interp_func_')

        copy = copy if copy is not None else self.copy
        X = check_array(X, copy=copy, estimator=self, dtype=FLOAT_DTYPES, force_all_finite=True)

        X = np.array(Parallel(n_jobs=self.n_jobs)(delayed(self._transform)(i, x) for i, x in enumerate(X.T))).T
        return X

    def _transform(self, i, x):
        return erfinv(self.interp_func_[i](x))

    def inverse_transform(self, X, copy=None):
        """Scale back the data to the original representation
        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]
            The data used to scale along the features axis.
        copy : bool, optional (default: None)
            Copy the input X or not.
        """
        check_is_fitted(self, 'interp_func_')

        copy = copy if copy is not None else self.copy
        X = check_array(X, copy=copy, estimator=self, dtype=FLOAT_DTYPES, force_all_finite=True)

        X = np.array(Parallel(n_jobs=self.n_jobs)(delayed(self._inverse_transform)(i, x) for i, x in enumerate(X.T))).T
        return X

    def _inverse_transform(self, i, x):
        inv_interp_func = interp1d(self.interp_func_[i].y, self.interp_func_[i].x, kind=self.interp_kind,
                                   copy=self.interp_copy, fill_value=self.fill_value)
        return inv_interp_func(erf(x))

    @staticmethod
    def drop_duplicates(x):
        is_unique = np.zeros_like(x, dtype=bool)
        is_unique[np.unique(x, return_index=True)[1]] = True
        return x[is_unique]
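
To make the idea behind the class above concrete, here is a minimal pure-Python sketch of the same transform on a handful of values: rank the data, rescale the ranks to the interval (-1, 1), then apply the inverse error function. It assumes distinct input values and leaves out interpolation and parallelism. `gauss_rank` and `erfinv` are hypothetical helpers written for this illustration, and erfinv is derived from the standard library's NormalDist instead of SciPy.

```python
import math
from statistics import NormalDist

def erfinv(x):
    """Inverse error function via the standard normal quantile function."""
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)

def gauss_rank(values, epsilon=1e-4):
    """Map distinct values to roughly Gaussian scores based on their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for r, i in enumerate(order):
        rank[i] = r                      # rank 0 = smallest value
    bound = 1.0 - epsilon
    factor = max(rank) / 2.0 * bound     # same scaling as _fit above
    scaled = [min(max(r / factor - bound, -bound), bound) for r in rank]
    return [erfinv(s) for s in scaled]

# The outlier 100.0 ends up only one rank step above 5.0 after the transform.
scores = gauss_rank([0.2, 5.0, 0.9, 100.0, 1.1])
print([round(s, 3) for s in scores])
```

Only the ordering of the inputs matters, so extreme values are pulled in to the tails of a bell-shaped distribution, which tends to suit neural networks better than raw, heavy-tailed features.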

2.3 Applying RankGaussian

feature_names = ['cont1', 'cont2', 'cont3', 'cont4', 'cont5', 'cont6', 'cont7','cont8', 'cont9', 'cont10', 'cont11', 'cont12', 'cont13', 'cont14']
scaler_linear    = GaussRankScaler(interp_kind='linear',) 
for c in feature_names:
    train_test[c+'_linear_grank'] = scaler_linear.fit_transform(train_test[c].values.reshape(-1,1))
    
gaussian_linear_feature_names = [c + '_linear_grank' for c in feature_names]

3. Neural network modeling

from tensorflow.keras import regularizers
from sklearn.model_selection import KFold, StratifiedKFold
import tensorflow as tf
# import tensorflow_addons as tfa
import tensorflow.keras.backend as K
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.layers import Input
import os

3.1 Train/validation split

Split the data into a training set and a validation set. The split is not shuffled: the first fold of a 5-fold split is skipped and the second fold is used for validation.

tr = train_test.iloc[:train.shape[0],:].copy()
te = train_test.iloc[train.shape[0]:,:].copy() 
# shuffle=False, so random_state is omitted: it has no effect here and
# scikit-learn 0.24+ raises an error for this combination
kf          = KFold(n_splits=5,shuffle=False) 
cnt         = 0
for trn_idx, test_idx in kf.split(tr,tr['target']):
    # Skip the first fold and use the second one as the validation fold
    if cnt == 0:
        cnt += 1
        continue
    X_tr_gbdt,X_val_gbdt = tr[feature_names].iloc[trn_idx],tr[feature_names].iloc[test_idx]
    X_tr_dnn_linear_gaussian,X_val_dnn_linear_gaussian = tr[gaussian_linear_feature_names].iloc[trn_idx],tr[gaussian_linear_feature_names].iloc[test_idx]
    y_tr,y_val = tr['target'].iloc[trn_idx],tr['target'].iloc[test_idx]
    break
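
For reference, an unshuffled K-fold split cuts the rows into contiguous blocks, so skipping the first fold and taking the second one holds out one contiguous slice of the training data. The sketch below reproduces that arithmetic in pure Python. `contiguous_folds` is a hypothetical helper that mimics how scikit-learn's KFold sizes its folds when shuffle=False, and the row count of 300000 is an inference from the training log (240000 training plus 60000 validation samples), not a number stated in the text.

```python
def contiguous_folds(n_samples, n_splits):
    """Return (start, stop) row ranges of each validation fold, in order.

    The first n_samples % n_splits folds get one extra row, which is
    how scikit-learn's KFold sizes its folds.
    """
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for k in range(n_splits):
        stop = start + base + (1 if k < extra else 0)
        folds.append((start, stop))
        start = stop
    return folds

folds = contiguous_folds(300_000, 5)
val_start, val_stop = folds[1]          # second fold, as in the loop above
n_val = val_stop - val_start
n_train = 300_000 - n_val
print(val_start, val_stop, n_train, n_val)
```

If the rows of train.csv are ordered in any meaningful way, a contiguous validation block may not be representative, in which case shuffle=True together with a random_state would be the safer choice.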

3.2 MLP model (sigmoid): 0.7108

Sigmoid-based regression

class MLP_Model(tf.keras.Model): 
    def __init__(self):
        super(MLP_Model, self).__init__() 
        self.dense1 =Dense(1000, activation='relu')  
        self.drop1  = Dropout(0.25)
        self.dense2 =Dense(500, activation='relu')  
        self.drop2  = Dropout(0.25) 
        self.dense_out =Dense(1,activation='sigmoid') 

    def call(self, inputs):
        min_target = 0
        max_target = 10.26757
        x1      = self.dense1(inputs)
        x1      = self.drop1(x1)
        x2      = self.dense2(x1)
        x2      = self.drop2(x2)
        outputs      = self.dense_out(x2)
        outputs  =  outputs * (max_target - min_target) + min_target  
        return outputs
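
What the last two lines of call do can be checked in isolation. The following sketch, written in plain Python rather than TensorFlow, applies the same scaling to a few raw scores and shows that the result never leaves the chosen range. `sigmoid` and `scaled_sigmoid` are hypothetical helpers for illustration, and the bounds are copied from the model above.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def scaled_sigmoid(z, min_target, max_target):
    """Squash a raw score into the interval [min_target, max_target]."""
    return sigmoid(z) * (max_target - min_target) + min_target

MIN_TARGET, MAX_TARGET = 0.0, 10.26757   # bounds used in MLP_Model.call

for z in (-50.0, -2.0, 0.0, 2.0, 50.0):
    print(z, scaled_sigmoid(z, MIN_TARGET, MAX_TARGET))
```

A practical consequence is that the bounds must cover every target value the model should be able to predict. Instead of being hard-coded, they could be derived from the training targets, for example from their minimum and maximum.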

# RMSE is the loss function defined in section 1.1
model = MLP_Model()
adam = tf.optimizers.Adam(learning_rate=1e-3)  # `lr` is a deprecated alias
model.compile(optimizer=adam, loss=RMSE)

K.clear_session() 
model_weights = f'./models/model_gauss_mlp_mlp.h5'
# Monitor val_loss so that the saved weights correspond to the best validation score
checkpoint = ModelCheckpoint(model_weights, monitor='val_loss', verbose=0, save_best_only=True, mode='min',
                             save_weights_only=True)
plateau        = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, verbose=1, min_delta=1e-4, mode='min')
early_stopping = EarlyStopping(monitor="val_loss", patience=25)
history = model.fit(X_tr_dnn_linear_gaussian.values, y_tr.values,
                        validation_data=(X_val_dnn_linear_gaussian.values, y_val.values),
                    batch_size=1024, epochs=100,
                    callbacks=[plateau, checkpoint, early_stopping],
                    verbose=2
                   )
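
The callbacks decide where training ends. As a rough illustration, the sketch below mimics the patience rule of EarlyStopping in plain Python: training stops once the monitored value has failed to improve for `patience` consecutive epochs. `early_stop_epoch` is a hypothetical helper and the loss values are invented. The real Keras callback has further options (such as min_delta and restore_best_weights) that are ignored here.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is None if the patience budget is never exhausted.
    """
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:                  # strict improvement resets the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

toy_losses = [0.80, 0.74, 0.75, 0.76, 0.745, 0.75]   # made-up values
print(early_stop_epoch(toy_losses, patience=3))       # stops 3 epochs after the best one
print(early_stop_epoch(toy_losses, patience=10))      # never triggers on this short run
```

With patience=25, a run whose best validation loss occurs at epoch 34 would stop at epoch 59, which is consistent with the log that follows.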

Train on 240000 samples, validate on 60000 samples
Epoch 1/100
240000/240000 - 1s - loss: 0.8020 - val_loss: 0.7203
Epoch 2/100
240000/240000 - 0s - loss: 0.7345 - val_loss: 0.7225
Epoch 3/100
240000/240000 - 0s - loss: 0.7290 - val_loss: 0.7183
Epoch 4/100
240000/240000 - 0s - loss: 0.7270 - val_loss: 0.7197
Epoch 5/100
240000/240000 - 0s - loss: 0.7247 - val_loss: 0.7170
Epoch 6/100
240000/240000 - 0s - loss: 0.7232 - val_loss: 0.7190
Epoch 7/100
240000/240000 - 0s - loss: 0.7227 - val_loss: 0.7157
Epoch 8/100
240000/240000 - 0s - loss: 0.7205 - val_loss: 0.7215
Epoch 9/100
240000/240000 - 0s - loss: 0.7199 - val_loss: 0.7144
Epoch 10/100
240000/240000 - 0s - loss: 0.7185 - val_loss: 0.7148
Epoch 11/100
240000/240000 - 0s - loss: 0.7175 - val_loss: 0.7176
Epoch 12/100
240000/240000 - 0s - loss: 0.7170 - val_loss: 0.7147
Epoch 13/100
240000/240000 - 0s - loss: 0.7165 - val_loss: 0.7142
Epoch 14/100
240000/240000 - 0s - loss: 0.7157 - val_loss: 0.7140
Epoch 15/100
240000/240000 - 0s - loss: 0.7150 - val_loss: 0.7132
Epoch 16/100
240000/240000 - 0s - loss: 0.7145 - val_loss: 0.7127
Epoch 17/100
240000/240000 - 0s - loss: 0.7136 - val_loss: 0.7127
Epoch 18/100
240000/240000 - 0s - loss: 0.7131 - val_loss: 0.7124
Epoch 19/100
240000/240000 - 0s - loss: 0.7126 - val_loss: 0.7165
Epoch 20/100
240000/240000 - 0s - loss: 0.7120 - val_loss: 0.7130
Epoch 21/100
240000/240000 - 0s - loss: 0.7116 - val_loss: 0.7119
Epoch 22/100
240000/240000 - 0s - loss: 0.7111 - val_loss: 0.7129
Epoch 23/100
240000/240000 - 0s - loss: 0.7104 - val_loss: 0.7129
Epoch 24/100
240000/240000 - 0s - loss: 0.7102 - val_loss: 0.7136
Epoch 25/100
240000/240000 - 0s - loss: 0.7097 - val_loss: 0.7120
Epoch 26/100
240000/240000 - 0s - loss: 0.7089 - val_loss: 0.7126
Epoch 27/100
240000/240000 - 0s - loss: 0.7084 - val_loss: 0.7154
Epoch 28/100
240000/240000 - 0s - loss: 0.7078 - val_loss: 0.7111
Epoch 29/100
240000/240000 - 0s - loss: 0.7075 - val_loss: 0.7132
Epoch 30/100
240000/240000 - 0s - loss: 0.7074 - val_loss: 0.7126
Epoch 31/100
240000/240000 - 0s - loss: 0.7062 - val_loss: 0.7129
Epoch 32/100
240000/240000 - 0s - loss: 0.7059 - val_loss: 0.7119
Epoch 33/100
240000/240000 - 0s - loss: 0.7054 - val_loss: 0.7135
Epoch 34/100
240000/240000 - 0s - loss: 0.7048 - val_loss: 0.7108
Epoch 35/100
240000/240000 - 0s - loss: 0.7048 - val_loss: 0.7116
Epoch 36/100
240000/240000 - 0s - loss: 0.7037 - val_loss: 0.7161
Epoch 37/100
240000/240000 - 0s - loss: 0.7034 - val_loss: 0.7131
Epoch 38/100
240000/240000 - 0s - loss: 0.7031 - val_loss: 0.7148
Epoch 39/100
240000/240000 - 0s - loss: 0.7022 - val_loss: 0.7113
Epoch 40/100
240000/240000 - 0s - loss: 0.7013 - val_loss: 0.7117
Epoch 41/100
240000/240000 - 0s - loss: 0.7012 - val_loss: 0.7124
Epoch 42/100
240000/240000 - 0s - loss: 0.7008 - val_loss: 0.7116
Epoch 43/100
240000/240000 - 0s - loss: 0.7001 - val_loss: 0.7124
Epoch 44/100

Epoch 00044: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
240000/240000 - 0s - loss: 0.6995 - val_loss: 0.7113
Epoch 45/100
240000/240000 - 0s - loss: 0.6962 - val_loss: 0.7116
Epoch 46/100
240000/240000 - 0s - loss: 0.6954 - val_loss: 0.7118
Epoch 47/100
240000/240000 - 0s - loss: 0.6940 - val_loss: 0.7116
Epoch 48/100
240000/240000 - 0s - loss: 0.6938 - val_loss: 0.7120
Epoch 49/100
240000/240000 - 0s - loss: 0.6930 - val_loss: 0.7118
Epoch 50/100
240000/240000 - 0s - loss: 0.6927 - val_loss: 0.7123
Epoch 51/100
240000/240000 - 0s - loss: 0.6920 - val_loss: 0.7123
Epoch 52/100
240000/240000 - 0s - loss: 0.6915 - val_loss: 0.7125
Epoch 53/100
240000/240000 - 0s - loss: 0.6912 - val_loss: 0.7144
Epoch 54/100

Epoch 00054: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
240000/240000 - 0s - loss: 0.6905 - val_loss: 0.7146
Epoch 55/100
240000/240000 - 0s - loss: 0.6885 - val_loss: 0.7123
Epoch 56/100
240000/240000 - 0s - loss: 0.6874 - val_loss: 0.7135
Epoch 57/100
240000/240000 - 0s - loss: 0.6872 - val_loss: 0.7136
Epoch 58/100
240000/240000 - 0s - loss: 0.6868 - val_loss: 0.7138
Epoch 59/100
240000/240000 - 0s - loss: 0.6863 - val_loss: 0.7134

3.3 MLP model (linear): 0.7137

class MLP_Model(tf.keras.Model):

    def __init__(self):
        super(MLP_Model, self).__init__() 
        self.dense1 =Dense(1000, activation='relu')  
        self.drop1  = Dropout(0.25)
        self.dense2 =Dense(500, activation='relu') 
        self.drop2  = Dropout(0.25) 
        self.dense_out =Dense(1)

    def call(self, inputs): 
        x1      = self.dense1(inputs)
        x1      = self.drop1(x1)
        x2      = self.dense2(x1)
        x2      = self.drop2(x2)
        outputs = self.dense_out(x2) 
        
        return outputs

model = MLP_Model()
adam = tf.optimizers.Adam(learning_rate=1e-3)  # `lr` is a deprecated alias
model.compile(optimizer=adam, loss=RMSE)

K.clear_session() 
model_weights = f'./models/model_gauss_mlp_mlp.h5'
# Monitor val_loss so that the saved weights correspond to the best validation score
checkpoint = ModelCheckpoint(model_weights, monitor='val_loss', verbose=0, save_best_only=True, mode='min',
                             save_weights_only=True)
plateau        = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, verbose=1, min_delta=1e-4, mode='min')
early_stopping = EarlyStopping(monitor="val_loss", patience=25)
history = model.fit(X_tr_dnn_linear_gaussian.values, y_tr.values,
                        validation_data=(X_val_dnn_linear_gaussian.values, y_val.values),
                    batch_size=1024, epochs=100,
                    callbacks=[plateau, checkpoint, early_stopping],
                    verbose=2
                   )

Train on 240000 samples, validate on 60000 samples
Epoch 1/100
240000/240000 - 1s - loss: 1.3292 - val_loss: 0.7767
Epoch 2/100
240000/240000 - 0s - loss: 0.8163 - val_loss: 0.7251
Epoch 3/100
240000/240000 - 0s - loss: 0.8072 - val_loss: 0.7251
Epoch 4/100
240000/240000 - 0s - loss: 0.8040 - val_loss: 0.7496
Epoch 5/100
240000/240000 - 0s - loss: 0.7997 - val_loss: 0.7324
Epoch 6/100
240000/240000 - 0s - loss: 0.7982 - val_loss: 0.7271
Epoch 7/100
240000/240000 - 0s - loss: 0.7936 - val_loss: 0.7202
Epoch 8/100
240000/240000 - 0s - loss: 0.7950 - val_loss: 0.7249
Epoch 9/100
240000/240000 - 0s - loss: 0.7914 - val_loss: 0.7284
Epoch 10/100
240000/240000 - 0s - loss: 0.7882 - val_loss: 0.7313
Epoch 11/100
240000/240000 - 0s - loss: 0.7886 - val_loss: 0.7303
Epoch 12/100
240000/240000 - 0s - loss: 0.7857 - val_loss: 0.7292
Epoch 13/100
240000/240000 - 0s - loss: 0.7855 - val_loss: 0.7257
Epoch 14/100
240000/240000 - 0s - loss: 0.7847 - val_loss: 0.7204
Epoch 15/100
240000/240000 - 0s - loss: 0.7825 - val_loss: 0.7224
Epoch 16/100
240000/240000 - 0s - loss: 0.7813 - val_loss: 0.7220
Epoch 17/100

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
240000/240000 - 0s - loss: 0.7808 - val_loss: 0.7208
Epoch 18/100
240000/240000 - 0s - loss: 0.7752 - val_loss: 0.7187
Epoch 19/100
240000/240000 - 0s - loss: 0.7743 - val_loss: 0.7234
Epoch 20/100
240000/240000 - 0s - loss: 0.7730 - val_loss: 0.7190
Epoch 21/100
240000/240000 - 0s - loss: 0.7750 - val_loss: 0.7196
Epoch 22/100
240000/240000 - 0s - loss: 0.7742 - val_loss: 0.7286
Epoch 23/100
240000/240000 - 0s - loss: 0.7722 - val_loss: 0.7198
Epoch 24/100
240000/240000 - 0s - loss: 0.7720 - val_loss: 0.7227
Epoch 25/100
240000/240000 - 0s - loss: 0.7724 - val_loss: 0.7176
Epoch 26/100
240000/240000 - 0s - loss: 0.7705 - val_loss: 0.7194
Epoch 27/100
240000/240000 - 0s - loss: 0.7689 - val_loss: 0.7206
Epoch 28/100
240000/240000 - 0s - loss: 0.7696 - val_loss: 0.7168
Epoch 29/100
240000/240000 - 0s - loss: 0.7695 - val_loss: 0.7171
Epoch 30/100
240000/240000 - 0s - loss: 0.7681 - val_loss: 0.7164
Epoch 31/100
240000/240000 - 0s - loss: 0.7676 - val_loss: 0.7225
Epoch 32/100
240000/240000 - 0s - loss: 0.7681 - val_loss: 0.7177
Epoch 33/100
240000/240000 - 0s - loss: 0.7660 - val_loss: 0.7198
Epoch 34/100
240000/240000 - 0s - loss: 0.7668 - val_loss: 0.7202
Epoch 35/100
240000/240000 - 0s - loss: 0.7653 - val_loss: 0.7160
Epoch 36/100
240000/240000 - 0s - loss: 0.7647 - val_loss: 0.7248
Epoch 37/100
240000/240000 - 0s - loss: 0.7638 - val_loss: 0.7173
Epoch 38/100
240000/240000 - 0s - loss: 0.7626 - val_loss: 0.7197
Epoch 39/100
240000/240000 - 0s - loss: 0.7624 - val_loss: 0.7182
Epoch 40/100
240000/240000 - 0s - loss: 0.7615 - val_loss: 0.7195
Epoch 41/100
240000/240000 - 0s - loss: 0.7621 - val_loss: 0.7195
Epoch 42/100
240000/240000 - 0s - loss: 0.7616 - val_loss: 0.7192
Epoch 43/100
240000/240000 - 0s - loss: 0.7604 - val_loss: 0.7162
Epoch 44/100
240000/240000 - 0s - loss: 0.7592 - val_loss: 0.7152
Epoch 45/100
240000/240000 - 0s - loss: 0.7600 - val_loss: 0.7193
Epoch 46/100
240000/240000 - 0s - loss: 0.7594 - val_loss: 0.7206
Epoch 47/100
240000/240000 - 0s - loss: 0.7578 - val_loss: 0.7201
Epoch 48/100
240000/240000 - 0s - loss: 0.7583 - val_loss: 0.7164
Epoch 49/100
240000/240000 - 0s - loss: 0.7581 - val_loss: 0.7163
Epoch 50/100
240000/240000 - 0s - loss: 0.7572 - val_loss: 0.7163
Epoch 51/100
240000/240000 - 0s - loss: 0.7554 - val_loss: 0.7166
Epoch 52/100
240000/240000 - 0s - loss: 0.7564 - val_loss: 0.7212
Epoch 53/100
240000/240000 - 0s - loss: 0.7560 - val_loss: 0.7156
Epoch 54/100

Epoch 00054: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
240000/240000 - 0s - loss: 0.7547 - val_loss: 0.7180
Epoch 55/100
240000/240000 - 0s - loss: 0.7530 - val_loss: 0.7154
Epoch 56/100
240000/240000 - 0s - loss: 0.7534 - val_loss: 0.7150
Epoch 57/100
240000/240000 - 0s - loss: 0.7531 - val_loss: 0.7148
Epoch 58/100
240000/240000 - 0s - loss: 0.7530 - val_loss: 0.7156
Epoch 59/100
240000/240000 - 0s - loss: 0.7523 - val_loss: 0.7166
Epoch 60/100
240000/240000 - 0s - loss: 0.7522 - val_loss: 0.7152
Epoch 61/100
240000/240000 - 0s - loss: 0.7520 - val_loss: 0.7155
Epoch 62/100
240000/240000 - 0s - loss: 0.7514 - val_loss: 0.7148
Epoch 63/100
240000/240000 - 0s - loss: 0.7514 - val_loss: 0.7149
Epoch 64/100
240000/240000 - 0s - loss: 0.7506 - val_loss: 0.7156
Epoch 65/100
240000/240000 - 0s - loss: 0.7508 - val_loss: 0.7150
Epoch 66/100
240000/240000 - 0s - loss: 0.7516 - val_loss: 0.7154
Epoch 67/100

Epoch 00067: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
240000/240000 - 0s - loss: 0.7507 - val_loss: 0.7153
Epoch 68/100
240000/240000 - 0s - loss: 0.7502 - val_loss: 0.7149
Epoch 69/100
240000/240000 - 0s - loss: 0.7497 - val_loss: 0.7147
Epoch 70/100
240000/240000 - 0s - loss: 0.7496 - val_loss: 0.7148
Epoch 71/100
240000/240000 - 0s - loss: 0.7502 - val_loss: 0.7142
Epoch 72/100
240000/240000 - 0s - loss: 0.7492 - val_loss: 0.7148
Epoch 73/100
240000/240000 - 0s - loss: 0.7487 - val_loss: 0.7148
Epoch 74/100
240000/240000 - 0s - loss: 0.7485 - val_loss: 0.7143
Epoch 75/100
240000/240000 - 0s - loss: 0.7496 - val_loss: 0.7154
Epoch 76/100
240000/240000 - 0s - loss: 0.7482 - val_loss: 0.7144
Epoch 77/100
240000/240000 - 0s - loss: 0.7488 - val_loss: 0.7142
Epoch 78/100
240000/240000 - 0s - loss: 0.7492 - val_loss: 0.7145
Epoch 79/100
240000/240000 - 0s - loss: 0.7483 - val_loss: 0.7143
Epoch 80/100
240000/240000 - 0s - loss: 0.7478 - val_loss: 0.7143
Epoch 81/100

Epoch 00081: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
240000/240000 - 0s - loss: 0.7481 - val_loss: 0.7143
Epoch 82/100
240000/240000 - 0s - loss: 0.7480 - val_loss: 0.7146
Epoch 83/100
240000/240000 - 0s - loss: 0.7477 - val_loss: 0.7141
Epoch 84/100
240000/240000 - 0s - loss: 0.7471 - val_loss: 0.7139
Epoch 85/100
240000/240000 - 0s - loss: 0.7475 - val_loss: 0.7140
Epoch 86/100
240000/240000 - 0s - loss: 0.7473 - val_loss: 0.7141
Epoch 87/100
240000/240000 - 0s - loss: 0.7469 - val_loss: 0.7141
Epoch 88/100
240000/240000 - 0s - loss: 0.7474 - val_loss: 0.7148
Epoch 89/100
240000/240000 - 0s - loss: 0.7467 - val_loss: 0.7138
Epoch 90/100
240000/240000 - 0s - loss: 0.7466 - val_loss: 0.7142
Epoch 91/100
240000/240000 - 0s - loss: 0.7460 - val_loss: 0.7141
Epoch 92/100
240000/240000 - 0s - loss: 0.7465 - val_loss: 0.7138
Epoch 93/100
240000/240000 - 0s - loss: 0.7469 - val_loss: 0.7142
Epoch 94/100
240000/240000 - 0s - loss: 0.7467 - val_loss: 0.7141
Epoch 95/100
240000/240000 - 0s - loss: 0.7465 - val_loss: 0.7148
Epoch 96/100
240000/240000 - 0s - loss: 0.7465 - val_loss: 0.7138
Epoch 97/100
240000/240000 - 0s - loss: 0.7461 - val_loss: 0.7138
Epoch 98/100
240000/240000 - 0s - loss: 0.7456 - val_loss: 0.7140
Epoch 99/100

Epoch 00099: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
240000/240000 - 0s - loss: 0.7463 - val_loss: 0.7139
Epoch 100/100
240000/240000 - 0s - loss: 0.7461 - val_loss: 0.7137
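
To compare the two runs without reading the logs by eye, the best validation loss can be pulled out of the log text. The sketch below parses lines in the format shown above with a regular expression. `best_val_loss` is a hypothetical helper, and the sample string contains only three lines copied from the sigmoid run, so the returned index is a position within the sample and not the true epoch number.

```python
import re

VAL_LOSS = re.compile(r"val_loss:\s*([0-9.]+)")

def best_val_loss(log_text):
    """Return (best_value, line_index) from Keras verbose=2 log text.

    line_index counts matching lines from 1 and equals the epoch number
    only if the text contains every epoch of the run.
    """
    values = [float(m.group(1)) for m in VAL_LOSS.finditer(log_text)]
    best = min(values)
    return best, values.index(best) + 1

sample = """240000/240000 - 0s - loss: 0.7054 - val_loss: 0.7135
240000/240000 - 0s - loss: 0.7048 - val_loss: 0.7108
240000/240000 - 0s - loss: 0.7048 - val_loss: 0.7116"""

print(best_val_loss(sample))

sigmoid_best, linear_best = 0.7108, 0.7137   # values from the section titles
relative_gain = (linear_best - sigmoid_best) / linear_best
print(relative_gain)
```

On this single validation fold the sigmoid output improves the best validation RMSE by roughly 0.4% relative to the linear output. That is a small margin from one fold, so it is best treated as an indication and not as a guarantee.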

References

https://www.kaggle.com/c/tabular-playground-series-jan-2021/data
https://www.kaggle.com/c/tabular-playground-series-jan-2021/discussion/216037
