How "Scary" Is a Programmer Looking After a Baby?!
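It should run on cheap hardware, such as a Raspberry Pi with a cheap USB microphone attached.
It should detect the baby's cries and notify me when he starts or stops crying (ideally on my phone), or log the cries to a dashboard, or do any other cry monitoring I want.
It should be able to play the audio on any device: my own speakers, a smartphone, a computer and so on. Playback should work regardless of the distance between the sound source and the speaker, without me having to move speakers around the house.
It should also come with a camera, so that I can check on the baby in real time, or get a photo or short video of the crib when the baby starts crying.
Recording audio samples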
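The recorded samples are used to train a Tensorflow model. Install a Linux operating system on an SD card, ideally for a Raspberry Pi 3 or later. You also need a compatible microphone. Then install the required packages:
[sudo] apt-get install ffmpeg lame libatlas-base-dev alsa-utils
[sudo] pip3 install tensorflow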
arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0
card 2: Device_1 [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0
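The card and device numbers in this listing are what ALSA device names such as hw:2,0 or plughw:2,0 are built from. The following is a minimal sketch of that mapping; alsa_device_names is a hypothetical helper written for illustration and is not part of arecord or micmon.

```python
import re

# Matches lines such as:
# card 2: Device_1 [USB PnP Sound Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r'^card (\d+): .*?, device (\d+):')


def alsa_device_names(arecord_output, use_plug=True):
    """Return one ALSA device name per capture device found in the listing."""
    prefix = 'plughw' if use_plug else 'hw'
    names = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line.strip())
        if match:
            card, device = match.groups()
            names.append(f'{prefix}:{card},{device}')
    return names


listing = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0
card 2: Device_1 [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0
"""

print(alsa_device_names(listing))
```

On a real system the listing could be captured with subprocess.run(['arecord', '-l'], capture_output=True, text=True) instead of being pasted in.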
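Here card 2, device 0 is used to record the audio. ALSA (Advanced Linux Sound Architecture) identifies this second microphone as hw:2,0 (which accesses the hardware device directly) or plughw:2,0 (which adds the sample-rate and format conversion plugins). Make sure the SD card has enough free space, or attach an external USB storage device. Then start recording audio:
arecord -D plughw:2,0 -c 1 -f cd | lame - audio.mp3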
Press Ctrl C to stop the recording. Repeat the recording several times over one or more days.
Labeling the audio samples
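Copy the recordings to your computer; scp or copying directly from the SD card or USB device both work. Place them under ~/datasets/sound-detect/audio. Create a separate directory for each sample. Each directory contains an audio file named audio.mp3 and a label file named labels.json, which marks the positive and negative segments of the audio. The directory structure looks roughly like this:
~/datasets/sound-detect/audio
  -> sample_1
    -> audio.mp3
    -> labels.json
  -> sample_2
    -> audio.mp3
    -> labels.json
  ...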
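Each sample needs its own labels.json label file. Identify the exact times at which the crying starts and stops, and record them in labels.json as key-value pairs in the form time -> label, for example:
{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}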
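To make the time -> label format concrete, here is a small sketch of how such a file could be turned into a label for each fixed-length audio segment. The helpers to_seconds and label_at are hypothetical, and labelling a segment by its start time is an assumption made for illustration; micmon's own logic may differ.

```python
import json


def to_seconds(timestamp):
    """Convert an 'mm:ss' timestamp to a number of seconds."""
    minutes, seconds = timestamp.split(':')
    return int(minutes) * 60 + int(seconds)


def label_at(labels, t):
    """Return the label in effect at time t (in seconds)."""
    current = None
    for start, label in sorted((to_seconds(k), v) for k, v in labels.items()):
        if start <= t:
            current = label
        else:
            break
    return current


labels = json.loads('''{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}''')

# Label a few 2-second segments around the first transition at 02:13 (133 s).
sample_duration = 2
segments = [(start, label_at(labels, start))
            for start in range(130, 138, sample_duration)]
print(segments)
```

Each entry marks the moment from which a label applies, so a segment keeps the most recent label until the next timestamp is reached.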
Generating the dataset
git clone git@github.com:/BlackLight/micmon.git
cd micmon
[sudo] pip3 install -r requirements.txt
[sudo] python3 setup.py build install
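micmon provides the logic to compute the FFT (Fast Fourier Transform) over segments of the audio samples, group the resulting spectrum into bands using low-pass and high-pass filters, and save the result to a set of numpy compressed (.npz) files. This is done with the micmon-datagen command-line tool:
micmon-datagen \
    --low 250 --high 2500 --bins 100 \
    --sample-duration 2 --channels 1 \
    ~/datasets/sound-detect/audio ~/datasets/sound-detect/data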
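The command above generates a dataset from the raw audio in the ~/datasets/sound-detect/audio directory and stores it under ~/datasets/sound-detect/data.
--low and --high specify the lowest and highest frequency to keep in the resulting spectrum. The defaults are 20 Hz (the lowest frequency audible to the human ear) and 20 kHz (the highest frequency audible to a healthy young person). You will probably want to tune these values so that they capture as much as possible of the sound you want to detect while limiting any other background noise and unrelated harmonics. In my case the 250–2500 Hz range was enough to detect baby cries. Baby cries are high-pitched (for comparison, the highest note of an opera soprano is around 1000 Hz), and you usually want to at least double the highest frequency of interest to capture enough higher harmonics (the harmonics are the higher frequencies that actually give a sound its timbre), but not go so high that the harmonics of other background sounds pollute the spectrum. I ignored everything below 250 Hz, because a baby's cry does not occur at such low frequencies and those sounds would only skew the detection. A good approach is to open a positive audio sample in Audacity, or any other equalizer or spectrum analyzer, check which frequencies dominate in the positive samples, and center the dataset around those frequencies.
--bins specifies the number of groups the frequency space is divided into (default: 100). A higher value means a higher frequency resolution/granularity, but if it is too high the model becomes prone to overfitting.
--sample-duration specifies how long each audio segment should be (default: 2 seconds). A higher value works better for longer sounds, but it makes the detection slower to react and fails on short sounds. A lower value works better for short sounds, but if the sound is longer the captured segments may not carry enough information to identify it reliably.
Besides micmon-datagen, there is another way to generate the dataset: calling the Python API provided by micmon:
import os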
from micmon.audio import AudioDirectory, AudioPlayer, AudioFile
from micmon.dataset import DatasetWriter

basedir = os.path.expanduser('~/datasets/sound-detect')
audio_dir = os.path.join(basedir, 'audio')
datasets_dir = os.path.join(basedir, 'data')
cutoff_frequencies = [250, 2500]

# Scan the base audio_dir for labelled audio samples
audio_dirs = AudioDirectory.scan(audio_dir)

# Save the spectrum information and labels of the samples to a
# different compressed file for each audio file.
for audio_dir in audio_dirs:
    dataset_file = os.path.join(datasets_dir, os.path.basename(audio_dir.path) + '.npz')
    print(f'Processing audio sample {audio_dir.path}')

    with AudioFile(audio_dir) as reader, \
            DatasetWriter(dataset_file,
                          low_freq=cutoff_frequencies[0],
                          high_freq=cutoff_frequencies[1]) as writer:
        for sample in reader:
            writer += sample
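To illustrate what the low/high cutoffs and the number of bins do to a spectrum, here is a rough sketch using a naive DFT from the standard library. It is not micmon's implementation: band_spectrum is a hypothetical function, it simply discards coefficients outside the range rather than applying real filters, and the sample rate, segment length and bin count are chosen only to keep the example small.

```python
import cmath
import math


def band_spectrum(signal, sample_rate, low, high, bins):
    """Group DFT magnitudes between low and high (Hz) into equal-width bands."""
    n = len(signal)
    width = (high - low) / bins
    bands = [0.0] * bins
    # For a real signal only the first n // 2 coefficients are distinct.
    for k in range(n // 2):
        freq = k * sample_rate / n
        if not (low <= freq < high):
            continue  # outside the frequency range of interest
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        bands[int((freq - low) / width)] += abs(coeff)
    return bands


sample_rate = 8000
n = 400  # 0.05 s of audio, i.e. 20 Hz per DFT coefficient
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n)]

# 9 bands of 250 Hz each between 250 Hz and 2500 Hz
bands = band_spectrum(tone, sample_rate, low=250, high=2500, bins=9)
peak = max(range(len(bands)), key=bands.__getitem__)
print(peak)  # index of the band that contains the 1000 Hz tone
```

With these numbers the 1000 Hz tone falls into the band covering 1000–1250 Hz (index 3). More bins would narrow each band, which is the resolution versus overfitting trade-off described for --bins.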
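Whether you use micmon-datagen or the micmon Python API, you end up with .npz files under the ~/datasets/sound-detect/data directory, one dataset file for each labelled raw audio file. This dataset is used to train our neural network for sound detection.
Training the model
micmon uses Tensorflow+Keras to define and train the model, which is easy to do with the existing Python API:
import os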
from tensorflow.keras import layers
from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser('~/models/sound-detect')

# This is the number of training epochs for each dataset sample
epochs = 2

# Load the datasets from the compressed files.
# 70% of the data points will be included in the training set,
# 30% of the data points will be included in the evaluation set
# and used to evaluate the performance of the model.
datasets = Dataset.scan(datasets_dir, validation_split=0.3)
labels = ['negative', 'positive']
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have 75% of the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2 * freq_bins), activation='relu'),
        layers.Dense(int(0.75 * freq_bins), activation='relu'),
        layers.Dense(len(labels), activation='softmax'),
    ],
    labels=labels,
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)
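The last layer of the network uses a softmax activation with one unit per label. As a rough illustration of how such an output can be turned into a label name, here is a standalone sketch. The softmax and to_label helpers are hypothetical, and it is only an assumption that micmon's Model does something equivalent internally.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(probabilities, labels):
    """Pick the label with the highest probability."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


labels = ['negative', 'positive']
probs = softmax([0.2, 1.7])  # made-up scores for one audio segment
print(to_label(probs, labels))
```

The order of the labels list matters here: index 0 of the output corresponds to 'negative' and index 1 to 'positive'.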
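After the script has run, the new model is saved under ~/models/sound-detect. In my case, collecting about 5 hours of sound from the baby's room and choosing a good frequency range was enough to train a model with an accuracy above 96%.
Using the model for detection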
import os
from micmon.audio import AudioDevice
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
audio_system = 'alsa'        # Supported: alsa and pulse
audio_device = 'plughw:2,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording
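The script prints negative when no crying is detected, and positive otherwise.
To turn these predictions into notifications, install Redis and Platypush with the HTTP and Pushbullet integrations:
[sudo] apt-get install redis-server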
[sudo] systemctl start redis-server.service
[sudo] systemctl enable redis-server.service
[sudo] pip3 install 'platypush[http,pushbullet]'
Edit the ~/.config/platypush/config.yaml file to enable the HTTP and Pushbullet integrations:
backend.http:
  enabled: True
pushbullet:
  token: YOUR_TOKEN
#!/usr/bin/python3
import argparse
import logging
import os
import sys

from platypush import RedisBus
from platypush.message.event.custom import CustomEvent
from micmon.audio import AudioDevice
from micmon.model import Model

logger = logging.getLogger('micmon')


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('model_path', help='Path to the file/directory containing the saved Tensorflow model')
    parser.add_argument('-i', help='Input sound device (e.g. hw:0,1 or default)', required=True, dest='sound_device')
    parser.add_argument('-e', help='Name of the event that should be raised when a positive event occurs', required=True, dest='event_type')
    parser.add_argument('-s', '--sound-server', help='Sound server to be used (available: alsa, pulse)', required=False, default='alsa', dest='sound_server')
    parser.add_argument('-P', '--positive-label', help='Model output label name/index to indicate a positive sample (default: positive)', required=False, default='positive', dest='positive_label')
    parser.add_argument('-N', '--negative-label', help='Model output label name/index to indicate a negative sample (default: negative)', required=False, default='negative', dest='negative_label')
    parser.add_argument('-l', '--sample-duration', help='Length of the FFT audio samples (default: 2 seconds)', required=False, type=float, default=2., dest='sample_duration')
    parser.add_argument('-r', '--sample-rate', help='Sample rate (default: 44100 Hz)', required=False, type=int, default=44100, dest='sample_rate')
    parser.add_argument('-c', '--channels', help='Number of audio recording channels (default: 1)', required=False, type=int, default=1, dest='channels')
    parser.add_argument('-f', '--ffmpeg-bin', help='FFmpeg executable path (default: ffmpeg)', required=False, default='ffmpeg', dest='ffmpeg_bin')
    parser.add_argument('-v', '--verbose', help='Verbose/debug mode', required=False, action='store_true', dest='debug')
    parser.add_argument('-w', '--window-duration', help='Duration of the look-back window (default: 10 seconds)', required=False, type=float, default=10., dest='window_length')
    parser.add_argument('-n', '--positive-samples', help='Number of positive samples detected over the window duration to trigger the event (default: 1)', required=False, type=int, default=1, dest='positive_samples')
    opts, args = parser.parse_known_args(sys.argv[1:])
    return opts


def main():
    args = get_args()
    if args.debug:
        logger.setLevel(logging.DEBUG)

    model_dir = os.path.abspath(os.path.expanduser(args.model_path))
    model = Model.load(model_dir)
    window = []
    cur_prediction = args.negative_label
    bus = RedisBus()

    with AudioDevice(system=args.sound_server,
                     device=args.sound_device,
                     sample_duration=args.sample_duration,
                     sample_rate=args.sample_rate,
                     channels=args.channels,
                     ffmpeg_bin=args.ffmpeg_bin,
                     debug=args.debug) as source:
        for sample in source:
            source.pause()  # Pause recording while we process the frame
            prediction = model.predict(sample)
            logger.debug(f'Sample prediction: {prediction}')
            has_change = False

            if len(window) < args.window_length:
                window += [prediction]
            else:
                window = window[1:] + [prediction]

            positive_samples = len([pred for pred in window if pred == args.positive_label])
            if args.positive_samples <= positive_samples and \
                    prediction == args.positive_label and \
                    cur_prediction != args.positive_label:
                cur_prediction = args.positive_label
                has_change = True
                logging.info(f'Positive sample threshold detected ({positive_samples}/{len(window)})')
            elif args.positive_samples > positive_samples and \
                    prediction == args.negative_label and \
                    cur_prediction != args.negative_label:
                cur_prediction = args.negative_label
                has_change = True
                logging.info(f'Negative sample threshold detected ({len(window)-positive_samples}/{len(window)})')

            if has_change:
                evt = CustomEvent(subtype=args.event_type, state=prediction)
                bus.post(evt)

            source.resume()  # Resume recording


if __name__ == '__main__':
    main()
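Save the script as ~/bin/micmon_detect.py. The script triggers an event only when at least positive_samples positive samples have been detected within a sliding window of length window_length, and only when the current prediction changes from negative to positive or from positive to negative. The event is dispatched to Platypush over the RedisBus. The script is generic: it works not only with the baby-cry model but with any sound model, any positive/negative labels, any frequency range and any type of output event.
Next, create a Platypush script that reacts to these events:
mkdir -p ~/.config/platypush/scripts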
cd ~/.config/platypush/scripts
# Define the directory as a module
touch __init__.py
# Create a script for the baby-cry events
vi babymonitor.py
The content of babymonitor.py is as follows:
from platypush.context import get_plugin
from platypush.event.hook import hook
from platypush.message.event.custom import CustomEvent


@hook(CustomEvent, subtype='baby-cry', state='positive')
def on_baby_cry_start(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby is crying!')


@hook(CustomEvent, subtype='baby-cry', state='negative')
def on_baby_cry_stop(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby stopped crying - good job!')
mkdir -p ~/.config/systemd/user
wget -O ~/.config/systemd/user/platypush.service \
https://raw.githubusercontent.com/BlackLight/platypush/master/examples/systemd/platypush.service
systemctl --user start platypush.service
systemctl --user enable platypush.service
Create the service file ~/.config/systemd/user/babymonitor.service:
[Unit]
Description=Monitor to detect my baby's cries
After=network.target sound.target
[Service]
ExecStart=/home/pi/bin/micmon_detect.py -i plughw:2,0 -e baby-cry -w 10 -n 2 ~/models/sound-detect
Restart=always
RestartSec=10
[Install]
WantedBy=default.target
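This service monitors the microphone at plughw:2,0. If at least 2 positive 2-second samples are detected within the past 10 seconds and the previous state was negative, it triggers a baby-cry event with state=positive. If fewer than 2 positive samples are detected within the past 10 seconds and the previous state was positive, it triggers the event with state=negative.
systemctl --user start babymonitor.service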
systemctl --user enable babymonitor.service
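The thresholding rule used by the service (-w 10 -n 2) can be tried out without a microphone. The sketch below replays a list of made-up predictions through the same windowing logic as the detection script; detect_changes is a hypothetical helper, and the window is expressed in samples here (5 samples, assuming 10 seconds of 2-second samples).

```python
def detect_changes(predictions, window_size, min_positive,
                   positive='positive', negative='negative'):
    """Yield (index, new_state) every time the debounced state flips."""
    window = []
    state = negative
    for i, prediction in enumerate(predictions):
        # Keep only the most recent window_size predictions.
        window = (window + [prediction])[-window_size:]
        positives = window.count(positive)
        if positives >= min_positive and prediction == positive and state != positive:
            state = positive
            yield i, state
        elif positives < min_positive and prediction == negative and state != negative:
            state = negative
            yield i, state


predictions = ['negative', 'positive', 'negative', 'positive', 'positive',
               'negative', 'negative', 'negative', 'negative', 'negative']

events = list(detect_changes(predictions, window_size=5, min_positive=2))
print(events)
```

A single isolated positive at index 1 raises no event. The state flips to positive at index 3, once two positives are inside the window, and back to negative at index 8, once only one positive is left in the window.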
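You can also extend micmon_detect.py so that the captured audio samples are streamed over HTTP as well, for example with a Flask wrapper for sending and ffmpeg for the audio conversion. Another interesting use case is to send a data point to your local database whenever the baby starts or stops crying. This is a useful set of data for tracking when the baby sleeps, wakes up or needs feeding. See how to use Platypush + PostgreSQL + Mosquitto + Grafana to create flexible dashboards.
Baby camera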
[sudo] pip3 install 'platypush[http,camera,picamera]'
Add the camera configuration to ~/.config/platypush/config.yaml:
camera.pi:
    listen_port: 5001
wget http://raspberry-pi:8008/camera/pi/photo.jpg
http://raspberry-pi:8008/camera/pi/video.mjpg
mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts
touch __init__.py
vi camera.py
from platypush.context import get_plugin
from platypush.event.hook import hook
from platypush.message.event.application import ApplicationStartedEvent


@hook(ApplicationStartedEvent)
def on_application_started(event, **_):
    cam = get_plugin('camera.pi')
    cam.start_streaming()
vlc tcp/h264://raspberry-pi:5001
Audio monitoring
git clone https://github.com/BlackLight/micstream.git
cd micstream
[sudo] python3 setup.py install
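Run micstream --help to get the available command-line options. For example, to stream from an audio input device (use arecord -l to list all audio devices) on the /baby.mp3 endpoint, listening on port 8088 with a 96 kbps bitrate, the command is:
micstream -i plughw:3,0 -e '/baby.mp3' -b 96 -p 8088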
Open http://your-rpi:8088/baby.mp3 and you can listen to the baby in real time.