The Power of Embeddings in Tiny AI Sensor Nodes
Embeddings, a way of representing data such as text, images, and sounds as vectors of numbers, are a popular topic in data science and AI tech media, despite their dry mathematical nature.
The reason is that embeddings are one of the most versatile techniques in machine learning, applicable to a wide range of use cases, from information search, similarity checks, and fraud detection to learning user behavior and relationships among items. One of the best-known embedding techniques is Word2Vec, introduced by researchers at Google in 2013. Embeddings also hold great potential for the Edge AI of the future: in real-world deployments, they are poised to play a key role in the long-awaited adoption of neural networks in industrial IoT applications.
Embeddings are one of the most useful ideas we employ in data preprocessing. When the raw data is messy (for instance, a noisy signal from sensors), we want to boil it down to a shorter, cleaner representation that preserves its underlying patterns. Simply put, an embedding is a vector of numbers, each representing a salient feature of the processed data. Working with these vectors instead of the raw data makes downstream tasks such as classification much easier.
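To make the classification point concrete, here is a minimal sketch, assuming the embeddings have already been computed: it averages a few labelled embeddings into one centroid per class and assigns a new embedding to the class whose centroid has the highest cosine similarity. The labels (`normal`, `imbalance`, `bearing_wear`) and the 3-value vectors are hypothetical, and nearest-centroid is just one simple choice of downstream classifier.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_centroid(embedding, centroids):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine_similarity(embedding, centroids[label]))

# Hypothetical 3-value embeddings of labelled sensor windows.
training = {
    "normal": [[0.9, 0.1, 0.0], [1.0, 0.2, 0.1]],
    "imbalance": [[0.1, 0.9, 0.2], [0.0, 1.0, 0.1]],
    "bearing_wear": [[0.1, 0.2, 1.0], [0.2, 0.1, 0.9]],
}
centroids = {label: centroid(vs) for label, vs in training.items()}

print(nearest_centroid([0.8, 0.3, 0.1], centroids))  # prints "normal"
```

Because the classifier only compares short vectors, it stays cheap no matter how long the original raw signal window was.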
There are many ways to make a neural network extract important information from data and store it in embeddings. The most common is to use two networks, an encoder and a decoder, together known as an autoencoder. The encoder compresses the raw data into an embedding, which is then fed to the decoder. The decoder, in turn, tries to reconstruct the raw data from the embedding alone. Over many training iterations, both networks are adjusted to minimize the reconstruction error, so the embedding learns to retain the information that matters most for describing the data.
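The encoder-decoder loop can be sketched as a tiny linear autoencoder trained by plain full-batch gradient descent in pure Python. This is an illustrative stand-in, not the architecture any particular product uses: both networks are single linear layers, and the 4-channel readings are hypothetical, generated from two hidden factors so that a 2-value embedding is enough to reconstruct them.

```python
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def train_linear_autoencoder(data, dim_z, lr=0.01, epochs=2000, seed=0):
    """Fit encoder w_e (dim_z x n) and decoder w_d (n x dim_z) by gradient
    descent on the mean squared reconstruction error."""
    rng = random.Random(seed)
    n = len(data[0])
    w_e = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(dim_z)]
    w_d = [[rng.uniform(-0.5, 0.5) for _ in range(dim_z)] for _ in range(n)]
    losses = []
    for _ in range(epochs):
        g_e = [[0.0] * n for _ in range(dim_z)]
        g_d = [[0.0] * dim_z for _ in range(n)]
        loss = 0.0
        for x in data:
            z = matvec(w_e, x)        # encoder: raw reading -> embedding
            x_hat = matvec(w_d, z)    # decoder: embedding -> reconstruction
            r = [xh - xi for xh, xi in zip(x_hat, x)]
            loss += sum(ri * ri for ri in r) / len(data)
            # Backpropagate d(loss)/d(x_hat) = 2 r / N through both layers.
            dx = [2 * ri / len(data) for ri in r]
            dz = [sum(w_d[i][j] * dx[i] for i in range(n)) for j in range(dim_z)]
            for i in range(n):
                for j in range(dim_z):
                    g_d[i][j] += dx[i] * z[j]
            for j in range(dim_z):
                for k in range(n):
                    g_e[j][k] += dz[j] * x[k]
        # Update both networks after accumulating gradients over all readings.
        for i in range(n):
            for j in range(dim_z):
                w_d[i][j] -= lr * g_d[i][j]
        for j in range(dim_z):
            for k in range(n):
                w_e[j][k] -= lr * g_e[j][k]
        losses.append(loss)
    return w_e, w_d, losses

# Hypothetical 4-channel readings driven by two hidden factors a and b.
grid = [-1.0, -0.5, 0.5, 1.0]
data = [[a + b, a - b, 2 * a, b] for a in grid for b in grid]

w_e, w_d, losses = train_linear_autoencoder(data, dim_z=2)
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.6f}")
print("embedding of first reading:", matvec(w_e, data[0]))
```

Real autoencoders add nonlinear layers and are usually trained with a framework such as PyTorch or TensorFlow, but the objective is the same: minimize reconstruction error so that the embedding keeps what the decoder needs.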
This approach enables what is widely known as transfer learning: domain knowledge accumulated in a neural network is transferred to complete specific tasks within that domain. Most or even all layers of the underlying network usually stay frozen, so that the embeddings remain consistent, and only a small task-specific model is trained on top of them.
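A minimal sketch of the frozen-encoder pattern, assuming a hypothetical pre-trained encoder (the fixed `FROZEN_ENCODER` weights below) and a tiny labelled dataset: only a logistic-regression head on the embeddings is trained, while the encoder weights are never updated. The readings, the `normal`/fault labels, and the hyperparameters are illustrative.

```python
import math

# Hypothetical pre-trained encoder weights (2 x 4). In transfer learning they
# stay frozen: nothing below ever modifies them.
FROZEN_ENCODER = (
    (0.5, 0.5, 0.0, 0.0),
    (0.0, 0.0, 0.5, 0.5),
)

def encode(x):
    """Frozen encoder: map a 4-value reading to a 2-value embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in FROZEN_ENCODER]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_head(samples, labels, lr=0.5, epochs=500):
    """Train only a logistic-regression head on top of the frozen embeddings."""
    weights = [0.0, 0.0]
    bias = 0.0
    embeddings = [encode(x) for x in samples]  # computed once: encoder is fixed
    for _ in range(epochs):
        for z, y in zip(embeddings, labels):
            p = sigmoid(sum(w * zi for w, zi in zip(weights, z)) + bias)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            weights = [w - lr * err * zi for w, zi in zip(weights, z)]
            bias -= lr * err
    return weights, bias

def predict(x, weights, bias):
    """Return True if the head classifies the reading as a fault."""
    z = encode(x)
    return sigmoid(sum(w * zi for w, zi in zip(weights, z)) + bias) > 0.5

# Hypothetical readings: label 0 = normal, label 1 = fault.
samples = [
    [1.0, 0.9, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.2],
    [0.2, 0.1, 1.0, 0.9],
    [0.0, 0.3, 0.8, 1.0],
]
labels = [0, 0, 1, 1]

weights, bias = train_head(samples, labels)
print(predict([0.9, 1.0, 0.1, 0.1], weights, bias))  # prints False
print(predict([0.1, 0.0, 0.9, 1.0], weights, bias))  # prints True
```

Since the encoder is fixed, the embeddings are computed once and reused, so only the handful of head parameters has to be learned for each new task.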
This makes it possible to build small, ultra-low-power Tiny AI devices, such as smart sensor nodes for predictive maintenance. For example, a NASP solution uses the same encoder-decoder approach to reduce the data flow from vibration sensors by a factor of 1000, transmitting only the embeddings extracted from the raw data over LoRa (or another low-power wireless technology). Notably, autoencoder-based systems and their embeddings can also detect unfamiliar classes of vibration signals, even ones they were never trained to recognize.
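One common way an autoencoder flags unfamiliar signals is through reconstruction error: patterns similar to the training data are reconstructed well, while new patterns are not. The sketch below assumes a hypothetical linear encoder/decoder defined by a fixed orthonormal basis (`BASIS`) and an arbitrary threshold of 0.1; it is a generic illustration of the idea, not the NASP implementation.

```python
import math

# Hypothetical orthonormal basis "learned" from familiar vibration patterns.
# Encoding projects a reading onto this basis; decoding maps it back.
BASIS = (
    (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0),
    (0.0, 0.0, 1 / math.sqrt(2), 1 / math.sqrt(2)),
)

def encode(x):
    """Project a 4-value reading onto the basis to get a 2-value embedding."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in BASIS]

def decode(z):
    """Map a 2-value embedding back to a 4-value reading."""
    return [sum(zj * BASIS[j][i] for j, zj in enumerate(z)) for i in range(len(BASIS[0]))]

def reconstruction_error(x):
    """Squared distance between a reading and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def is_unfamiliar(x, threshold=0.1):
    """Flag a reading whose pattern the encoder/decoder cannot reproduce."""
    return reconstruction_error(x) > threshold

print(is_unfamiliar([0.8, 0.8, 0.3, 0.3]))   # prints False (familiar pattern)
print(is_unfamiliar([1.0, -1.0, 0.0, 0.0]))  # prints True (new pattern)
```

No labelled examples of the new pattern are needed: the node only has to notice that its own reconstruction has become poor, and it can report that alongside the embedding.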
Using embeddings reduces the measurement data sent to the cloud from machinery, tracks, railway cars, wind turbines, and oil and gas pumps, addressing the fundamental problem of the limited bandwidth available to IoT systems.
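The size of the saving is simple arithmetic. The worked example below uses hypothetical figures, not measurements from any real device: a one-second window of 16,000 samples (assuming a 16 kHz sampling rate) at 2 bytes per sample, compared with a 16-value embedding at 2 bytes per value.

```python
def reduction_factor(raw_samples, raw_bytes_per_sample,
                     embedding_size, embedding_bytes_per_value):
    """Ratio of raw-window size to embedding size, both in bytes."""
    raw_bytes = raw_samples * raw_bytes_per_sample
    embedding_bytes = embedding_size * embedding_bytes_per_value
    return raw_bytes / embedding_bytes

# Hypothetical: 32,000 bytes of raw samples vs. a 32-byte embedding.
print(reduction_factor(16_000, 2, 16, 2))  # prints 1000.0
```

With these assumed sizes, each window shrinks from tens of kilobytes to a payload small enough for a low-power radio link.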