This repository contains a sample showing how to perform anomaly detection on machine sounds (based on the MIMII dataset) using several approaches.
Running time: once the dataset is downloaded, it takes roughly an hour and a half to go through all these notebooks from start to finish.
Industrial companies have been collecting massive amounts of time series data about their operating processes, manufacturing production lines and industrial equipment, sometimes storing years of data in historian systems. Whether they want to prevent an equipment breakdown that would stop a production line, avoid catastrophic failures in a power generation facility, or improve their end product quality by adjusting process parameters, they need to be able to process this time series data, a challenge that modern cloud technologies are well equipped to meet. In this post, we are going to focus on preventing machine breakdowns.
In many cases, machine failures are tackled either by reactive action (stopping the line and repairing...) or by costly preventive maintenance, which requires building a proper inventory of replacement parts and scheduling regular maintenance activities. Skilled machine operators are the most valuable assets in such settings: years of experience give them a fine knowledge of how the machinery should operate, and they become expert listeners, able to detect unusual behavior and sounds in rotating and moving machines. However, production lines are becoming more and more automated, and augmenting these operators with AI-generated insights is one way to maintain and develop the expertise needed to keep industrial companies from falling back on a reactive posture when machines break down.
This is a companion repository for a blog post on the AWS Machine Learning Blog, in which we compare and contrast two approaches to identifying a malfunctioning machine from its sound recordings. We start by building a neural network based on an autoencoder architecture. We then use an image-based approach, feeding “images from the sound” (namely spectrograms) to an image-based automated ML classification feature (Amazon Rekognition Custom Labels).
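A spectrogram is a time-frequency picture of a sound: each column is a short time frame and each row is a frequency bin. The sketch below computes a log-power spectrogram with a short-time Fourier transform in plain NumPy. It is a minimal sketch, not the notebooks' implementation. The function name, the Hann window, the FFT size of 1024 and the hop length of 512 are illustrative assumptions, not the exact settings used in the notebooks.

```python
import numpy as np


def log_spectrogram(signal, n_fft=1024, hop_length=512):
    """Return a log-power spectrogram (in dB) of shape (n_fft // 2 + 1, n_frames).

    Illustrative sketch: window size and hop length are assumptions, not the
    values used in this repository's notebooks.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short signals so that at least one full frame exists
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop_length
    window = np.hanning(n_fft)
    # Slice the signal into overlapping, windowed frames: (n_frames, n_fft)
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Power of the one-sided FFT of each frame: (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Transpose so time runs along columns; the small offset avoids log(0)
    return 10.0 * np.log10(power.T + 1e-10)
```

For a signal sampled at 16 kHz with n_fft=1024, each row spans 16000 / 1024 = 15.625 Hz. A pure 1 kHz tone therefore peaks in row 64. The resulting 2-D array can then be saved as an image and used for image classification.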
Create an AWS account if you do not already have one, and log in.
Navigate to the SageMaker console and create a new notebook instance. We recommend an ml.c5.2xlarge instance with a 25 GB attached EBS volume to process the dataset comfortably.
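If you prefer to script this step, the sketch below wraps the boto3 SageMaker client's create_notebook_instance call with the recommended instance type and volume size. It assumes an execution role already exists; unlike the console flow, this call will not create one for you. The helper name, notebook name and role ARN are hypothetical placeholders.

```python
def create_notebook_instance(sm_client, name, role_arn,
                             instance_type="ml.c5.2xlarge", volume_gb=25):
    """Create a SageMaker notebook instance sized as recommended in this README.

    Hypothetical helper: `sm_client` is expected to be boto3.client("sagemaker"),
    and `role_arn` must point to an existing SageMaker execution role.
    Returns the ARN of the new notebook instance.
    """
    response = sm_client.create_notebook_instance(
        NotebookInstanceName=name,
        InstanceType=instance_type,
        RoleArn=role_arn,
        VolumeSizeInGB=volume_gb,  # EBS volume that will hold the unzipped dataset
    )
    return response["NotebookInstanceArn"]
```

With boto3 installed and credentials configured, a call would look like create_notebook_instance(boto3.client("sagemaker"), "mimii-anomaly-detection", role_arn). Here role_arn is your own execution role's ARN.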
You need to ensure that this notebook instance has an IAM role which allows it to call the Amazon Rekognition Custom Labels API:
- In your IAM console, look for the SageMaker execution role assumed by your notebook instance (a role with a name like AmazonSageMaker-ExecutionRole-yyyymmddTHHMMSS).
- Click on Attach Policies and look for this managed policy: AmazonRekognitionCustomLabelsFullAccess.
- Check the box next to it and click on Attach Policy.
Your SageMaker notebook instance can now call the Rekognition Custom Labels APIs.
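The console steps above can also be scripted with the boto3 IAM client, using its list_attached_role_policies and attach_role_policy calls. This is a minimal sketch. The policy ARN below is the expected ARN for the AWS managed policy named above; treat it as an assumption and confirm it in the IAM console. The helper name is hypothetical, and the role name must be replaced with your own execution role.

```python
# Assumed ARN of the AWS managed policy; verify it in your IAM console
POLICY_ARN = "arn:aws:iam::aws:policy/AmazonRekognitionCustomLabelsFullAccess"


def attach_custom_labels_policy(iam_client, role_name, policy_arn=POLICY_ARN):
    """Attach `policy_arn` to `role_name` unless it is already attached.

    Hypothetical helper: `iam_client` is expected to be boto3.client("iam").
    Only the first page of attached policies is inspected (pagination is
    ignored), which is enough for a typical execution role.
    Returns True if an attach call was made, False otherwise.
    """
    attached = iam_client.list_attached_role_policies(RoleName=role_name)
    attached_arns = {p["PolicyArn"] for p in attached["AttachedPolicies"]}
    if policy_arn in attached_arns:
        return False  # nothing to do: the role can already call Custom Labels
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return True
```

Checking first makes the helper safe to run more than once. The credentials you run it with need permission to modify IAM roles, which the notebook's own execution role usually does not have.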
You can now navigate back to the Amazon SageMaker console, then to the Notebook Instances menu. Start your instance and launch either a Jupyter or a JupyterLab session. From there, you can open a new terminal and clone this repository onto the notebook instance using git clone.
Once you've cloned this repo, browse to the data exploration notebook: this first notebook will download and prepare the data necessary for the other ones.
The dataset used is a subset of the MIMII dataset dedicated to industrial fan sounds. This 10 GB archive will be downloaded to the /tmp directory: if you're using a SageMaker notebook instance, the ephemeral volume should have enough space for it. The unzipped data is around 15 GB and will be stored on the EBS volume, so make sure that volume is large enough to avoid out-of-space errors.
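Before starting the download, you can check free space with the standard library's shutil.disk_usage. This is a minimal sketch. It assumes the archive goes to /tmp and the extracted files go to the notebook instance's EBS volume, which SageMaker mounts at /home/ec2-user/SageMaker. The 10 and 15 figures are the approximate sizes quoted above. The helper names are hypothetical.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB; slightly stricter than a decimal GB


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def check_space(requirements):
    """Map each path to True if its filesystem has at least the required free GiB.

    `requirements` is a dict {path: required_gib}; every path must exist.
    """
    return {path: free_gib(path) >= needed for path, needed in requirements.items()}


# Assumed locations and approximate sizes for the MIMII fan subset:
# the 10 GB archive in /tmp, the ~15 GB of unzipped data on the EBS volume.
REQUIREMENTS = {"/tmp": 10, "/home/ec2-user/SageMaker": 15}
```

From a notebook cell on the instance, check_space(REQUIREMENTS) returns one boolean per location. Any False suggests resizing the EBS volume before running the data exploration notebook.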
```
.
|
+-- README.md                          <-- This instruction file
|
+-- autoencoder/
|   |-- model.py                       <-- The training script used as an entrypoint of the
|   |                                      TensorFlow container
|   \-- requirements.txt               <-- Requirements file to update the training container
|                                          at launch
|
+-- pictures/                          <-- Assets used in the introduction and README.md
|
+-- tools/
|   |-- rekognition_tools.py           <-- Utilities to manage Rekognition Custom Labels models
|   |                                      (start, stop, get inference...)
|   |-- sound_tools.py                 <-- Utilities to manage sound datasets
|   \-- utils.py                       <-- Various tools to build file lists, plot curves and
|                                          confusion matrices...
|
+-- 0_introduction.ipynb               <-- Introduces the context
|
+-- 1_data_exploration.ipynb           <-- START HERE: data exploration notebook, useful to
|                                          generate the datasets, get familiar with sound datasets
|                                          and basic frequency analysis
|
+-- 2_custom_autoencoder.ipynb         <-- Uses the SageMaker TensorFlow container to build a
|                                          custom autoencoder
|
\-- 3_rekognition_custom_labels.ipynb  <-- Performs the same task by calling the Rekognition
                                           Custom Labels API
```
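The 2_custom_autoencoder.ipynb notebook builds a custom autoencoder. A common way to turn such a model into an anomaly detector is to train it on normal sounds only and then flag samples it reconstructs poorly. The NumPy sketch below shows that scoring step under this assumption. The function names and the 99th-percentile threshold are illustrative choices, not taken from the notebook.

```python
import numpy as np


def reconstruction_errors(inputs, reconstructions):
    """Mean squared error per sample between inputs and autoencoder outputs."""
    inputs = np.asarray(inputs, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    # Average over every axis except the first (sample) axis
    axes = tuple(range(1, inputs.ndim))
    return np.mean((inputs - reconstructions) ** 2, axis=axes)


def fit_threshold(normal_errors, percentile=99.0):
    """Pick a decision threshold from errors measured on normal data only."""
    return float(np.percentile(normal_errors, percentile))


def flag_anomalies(errors, threshold):
    """True where a sample reconstructs worse than the threshold allows."""
    return np.asarray(errors) > threshold
```

In this setup, reconstructions would come from the trained model's predictions on the same spectrogram features it was trained on. A lower percentile catches more faulty machines but also raises more false alarms on healthy ones.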
Please contact @michaelhoarau or raise an issue on this repository.
See CONTRIBUTING for more information.
This collection of notebooks is licensed under the MIT-0 License. See the LICENSE file.