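23 Excellent Public Datasets for Machine Learning Training, All in One Article!
Author | Nikola M. Zivkovic, reposted from AI前线
Translator | Wang Qiang
Planning | Ling Min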
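Palmer Penguins dataset
Bike Sharing Demand dataset
Wine Classification dataset
Boston Housing dataset
Ionosphere dataset
Fashion MNIST dataset
Cats vs. Dogs dataset
Breast Cancer Wisconsin (Diagnostic) dataset
Twitter Sentiment Analysis and Sentiment140 datasets
BBC News dataset
SMS Spam Classifier dataset
CelebA dataset
YouTube-8M dataset
Amazon Reviews dataset
Banknote Authentication dataset
LabelMe dataset
Sonar dataset
Pima Indians Diabetes dataset
Wheat Seeds dataset
Jeopardy! dataset
Abalone dataset
Fake News Detection dataset
ImageNet dataset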
import pandas as pd

data = pd.read_csv(".\\Datasets\\penguins_size.csv")  # path is relative to the working directory
data.head()
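The loading snippets in this article hard-code Windows-style paths such as .\\Datasets\\penguins_size.csv. As a hedged alternative, the sketch below builds the same paths with pathlib so they also resolve on Linux and macOS. DATASETS_DIR and dataset_path are hypothetical names introduced here, and the location of the Datasets folder is an assumption taken from the paths in the snippets.

```python
from pathlib import Path

# Assumption: the CSV files live in a "Datasets" folder under the working
# directory, mirroring the ".\\Datasets\\..." paths used in this article.
DATASETS_DIR = Path("Datasets")


def dataset_path(*parts: str) -> Path:
    """Build a path inside DATASETS_DIR using the OS-appropriate separator."""
    return DATASETS_DIR.joinpath(*parts)


penguins_csv = dataset_path("penguins_size.csv")
fake_news_csv = dataset_path("fake_news", "train.csv")

# as_posix() renders the path with forward slashes on every platform.
print(penguins_csv.as_posix())   # Datasets/penguins_size.csv
print(fake_news_csv.as_posix())  # Datasets/fake_news/train.csv
```

The returned Path objects can be passed directly to pd.read_csv in place of the string literals.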
Introduction (https://allisonhorst.github.io/palmerpenguins/articles/intro.html)
GitHub(https://github.com/allisonhorst/palmerpenguins)
Kaggle(https://www.kaggle.com/parulpandey/palmer-archipelago-antarctica-penguin-data)
data = pd.read_csv(".\\Datasets\\hour.csv")  # hourly rental records
data.head()
data = pd.read_csv(".\\Datasets\\day.csv")  # daily rental records
data.head()
UCI(https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset)
Kaggle(https://www.kaggle.com/c/bike-sharing-demand)
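# Note: the copy distributed by UCI is semicolon-delimited; pass sep=";" if you use that file unchanged.
data = pd.read_csv(".\\Datasets\\winequality-white.csv")
data.head()
Introduction (https://www.vinhoverde.pt/en/about-vinho-verde)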
UCI(https://archive.ics.uci.edu/ml/datasets/Wine+Quality)
data = pd.read_csv(".\\Datasets\\boston_housing.csv")
data.head()
Introduction (https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)
Kaggle(https://www.kaggle.com/c/boston-housing)
data = pd.read_csv(".\\Datasets\\ionsphere.csv")
data.head()
UCI(https://archive.ics.uci.edu/ml/datasets/Ionosphere)
GitHub(https://github.com/zalandoresearch/fashion-mnist)
Kaggle(https://www.kaggle.com/zalando-research/fashionmnist)
Introduction (https://www.microsoft.com/en-us/download/details.aspx?id=54765)
Kaggle(https://www.kaggle.com/c/dogs-vs-cats)
data = pd.read_csv(".\\Datasets\\breast-cancer-wisconsin.csv")
data.head()
Kaggle(https://www.kaggle.com/uciml/breast-cancer-wisconsin-data)
UCI(https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))
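# The Sentiment140 training file is Latin-1 encoded and has no header row.
data = pd.read_csv(".\\Datasets\\training.1600000.processed.noemoticon.csv", encoding="latin-1", header=None)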
data.head()
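The Sentiment140 training file ships without a header row and is not UTF-8 encoded, so a bare read_csv call tends to either fail or treat the first tweet as column names. The sketch below is one possible way to load it with named columns and a readable label. The column names follow the dataset's published description; load_sentiment140 and the sentiment column are hypothetical names introduced here, and the two demo rows are made up so the example runs without the 1.6-million-row file.

```python
import io

import pandas as pd

# Column names as described for Sentiment140; the file itself has no header.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]


def load_sentiment140(source) -> pd.DataFrame:
    """Read a Sentiment140-style CSV from a path or a binary file-like object."""
    df = pd.read_csv(source, encoding="latin-1", header=None, names=COLUMNS)
    # In the training file, target 0 marks a negative tweet and 4 a positive one.
    df["sentiment"] = df["target"].map({0: "negative", 4: "positive"})
    return df


# Made-up stand-in rows (not real tweets), encoded as Latin-1 like the real file.
demo_bytes = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","caf\xe9 was closed"\n'
    '"4","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","user_b","great day"\n'
).encode("latin-1")

demo = load_sentiment140(io.BytesIO(demo_bytes))
print(demo[["target", "sentiment", "text"]])
```

For the real data, the same function can be called with the path to training.1600000.processed.noemoticon.csv instead of the in-memory buffer.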
Kaggle(https://www.kaggle.com/c/twitter-sentiment-analysis2)
Kaggle(https://www.kaggle.com/kazanova/sentiment140)
data = pd.read_csv(".\\Datasets\\BBC News Train.csv")
data.head()
Kaggle(https://www.kaggle.com/c/learn-ai-bbc)
ham What you doing?how are you?
ham Ok lar... Joking wif u oni...
ham dun say so early hor... U c already then say...
ham MY NO. IN LUTON 0125698789 RING ME IF UR AROUND! H*
ham Siva is in hostel aha:-.
ham Cos i was out shopping wif darren jus now n i called him 2 ask wat present he wan lor. Then he started guessing who i was wif n he finally guessed darren lor.
spam FreeMsg: Txt: CALL to No: 86888 & claim your reward of 3 hours talk time to use from your phone now! ubscribe6GBP/ mnth inc 3hrs 16 stop?txtStop
spam Sunshine Quiz! Win a super Sony DVD recorder if you canname the capital of Australia? Text MQUIZ to 82277. B
spam URGENT! Your Mobile No 07808726822 was awarded a L2,000 Bonus Caller Prize on 02/09/03! This is our 2nd attempt to contact YOU! Call 0871-872-9758 BOX95QU
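Each record in the sample above is a label (ham or spam) followed by the message text. As a rough sketch of how such lines could be parsed and tallied, the example below assumes the label and message are separated by a tab, as in the raw UCI file; parse_sms is a hypothetical helper introduced here, and the three rows are copied from the sample.

```python
from collections import Counter

# Three rows copied from the sample above. Assumption: label and message are
# tab-separated, as in the raw UCI file.
sample = (
    "ham\tWhat you doing?how are you?\n"
    "ham\tOk lar... Joking wif u oni...\n"
    "spam\tSunshine Quiz! Win a super Sony DVD recorder if you canname "
    "the capital of Australia? Text MQUIZ to 82277. B\n"
)


def parse_sms(raw: str) -> list:
    """Split each non-empty line into a (label, message) pair at the first tab."""
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, _, message = line.partition("\t")
        rows.append((label, message))
    return rows


rows = parse_sms(sample)
label_counts = Counter(label for label, _ in rows)
print(label_counts)
```

Counting labels first is a quick way to see how unbalanced the two classes are before training a classifier.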
UCI(https://archive.ics.uci.edu/ml/datasets/sms+spam+collection)
Kaggle(https://www.kaggle.com/uciml/sms-spam-collection-dataset)
Introduction (http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
mkdir -p ~/yt8m/2/frame/train
cd ~/yt8m/2/frame/train
curl data.yt8m.org/download.py | partition=2/frame/train mirror=us python
Introduction (https://arxiv.org/abs/1609.08675)
Download (http://research.google.com/youtube8m/)
Introduction and download (https://jmcauley.ucsd.edu/data/amazon/)
data = pd.read_csv(".\\Datasets\\data_banknote_authentication.csv")
data.head()
UCI(https://archive.ics.uci.edu/ml/datasets/banknote+authentication#)
Kaggle(https://www.kaggle.com/ritesaluja/bank-note-authentication-uci-data)
Introduction and download (http://labelme.csail.mit.edu/Release3.0/index.php)
data = pd.read_csv(".\\Datasets\\sonar.csv")
data.head()
Introduction (https://www.is.umk.pl/projects/datasets.html#Sonar)
UCI(https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks))
data = pd.read_csv(".\\Datasets\\pima-indians-dataset.csv")
data.head()
Introduction (https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.names)
Kaggle(https://www.kaggle.com/uciml/pima-indians-diabetes-database)
data = pd.read_csv(".\\Datasets\\seeds_dataset.csv")
data.head()
UCI(https://archive.ics.uci.edu/ml/datasets/seeds)
Kaggle(https://www.kaggle.com/jmcaro/wheat-seedsuci)
data = pd.read_csv(".\\Datasets\\joepardy.csv")
data.head()
Kaggle(https://www.kaggle.com/tunguz/200000-jeopardy-questions)
data = pd.read_csv(".\\Datasets\\abalone.csv")
data.head()
UCI(https://archive.ics.uci.edu/ml/datasets/abalone)
Kaggle(https://www.kaggle.com/rodolfomendes/abalone-dataset)
data = pd.read_csv(".\\Datasets\\fake_news\\train.csv")
data.head()
Kaggle(https://www.kaggle.com/c/fake-news/overview)
Official website (https://image-net.org/)