In our last blog post, we focused on the digital transformation of the manufacturing industry, sharing our initial thoughts around micro-optimisations and MTL – short for Mesh Twin Learning – our own concept for enabling the technical implementation of various automation mechanisms on production lines.
This concept addresses most of the common challenges organisations face in Industry 4.0 transformation, such as complexity, gaps in standardisation or the inability to capture benefits from the wider scope of the solution.
This time, we would like to take a step further and dive into the MTL concept from the implementation perspective – as opposed to the purely technical one – for two reasons. The first is to build a better understanding of the general concept and the competitive advantages that such an implementation brings with it, but we also want to discuss possible use cases, which are far more advanced and extend well beyond the initial manufacturing vertical.
A Long (hi-)story Made Short
The MTL solution was invented by me and my friend Maciej Mazur – we are both Solutions Architects, each with a different set of experiences.
Maciej – our Chief Data Scientist – has a strong background in telecommunication, industry, and Data Science (surprise!). He has also always worked heavily on large scale projects, including many cutting-edge solutions driven by vast amounts of data.
I, however, grew up on slightly different ground, where modern technologies – especially those connected with the Cloud, networks and high-performance solutions – are used to build bleeding-edge systems in the FinTech and eCommerce spaces.
We’ve got a few things in common, but two are the most relevant for this story:
- We both like to solve real-life problems using the best possible technologies – even if it means we have to push the latter beyond its current limits
- We are always looking at how to combine our expertise and experiences to design solutions that are able to push the world forward
With that in mind, it was over a morning coffee, some time ago, that we first started to analyse one of the freshest cases that had come to us. A manufacturing company wanted to track all of their assets, processes and logistics across warehouses and production lines. There were several solutions available on the market, but none of them were able to fulfil all of these requirements. Thus, we had to design something that catches the beats these other products miss.
Needless to say, it all went well (we wouldn’t be here talking about it if it hadn’t!), the client was satisfied and the solution was stable. However, while this served the initial purpose, it wasn’t enough for us…
Technologies That Aim To Serve
Our first thought was that, with the modern technologies available on the market that we use in everyday projects, there has to be a way to get more out of them – not least of all in terms of scale, even if we are not thinking about the largest global players. This is how we came up with the Mesh Twin Learning concept – which we want to describe step by step.
However, before we jump into a detailed description of MTL’s technical architecture, it’s important to start from ground zero – in other words, the fundamental knowledge around the technologies we used to make this concept so effective, unique and special.
So, first of all, let’s quickly look at the definitions for the technology and terms that we will be using later on:
- Cloud – A large network of interconnected data centres, which operate as a single ecosystem, providing the necessary resources, computing power, and services to run applications or deliver the content to end-users.
- Edge Computing – A type of system architecture, where data storage and computing capabilities are moved to edge locations. This is supported by specialised devices connected to the global network, allowing for distributed computing with immediate access to results and seamless communication with Cloud resources.
- Digital Twins – A virtualised copy of a physical device, component or living entity, which represents its state as a series of parameters that are constantly monitored and updated over time. This enables the digitisation of everything from processes, systems and places up to people. Digital Twins enable advanced simulations, better monitoring and more effective decision making over a distributed system’s state.
- Machine Learning – A concept that builds into applications the capability to automatically learn and improve from experience, using available data, without being explicitly implemented as a set of static behaviours.
- Data Science – A set of scientific methods and techniques, aiming to extract knowledge from the surroundings, as well as use structured and unstructured data from all available sources, to support all processes focused on decisions, predictions or business management.
- Mesh – An approach to building, scaling and operating distributed ecosystems. The main focus is put on the security, connectivity and reliability of the designed architecture, limiting the need for full asset sharing and instead introducing cross-functional capabilities across various parts of the system.
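To make the Digital Twin definition above concrete, here is a minimal sketch in Python (the class and parameter names are our own illustration, not part of any specific SDK):

```python
import time

class DigitalTwin:
    """Minimal digital twin: a timestamped set of monitored parameters
    mirroring the state of one physical device."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.parameters = {}     # parameter name -> latest known value
        self.last_update = None  # epoch seconds of the last state change

    def update(self, readings):
        """Merge a batch of sensor readings into the virtual state."""
        self.parameters.update(readings)
        self.last_update = time.time()

    def state(self):
        """Snapshot of the current state, e.g. as input to a simulation."""
        return dict(self.parameters)

twin = DigitalTwin("furnace-01")
twin.update({"temperature_c": 182.5, "pressure_bar": 2.1})
```

A real twin would track many more parameters, but the principle stays the same: the virtual state is only ever touched through monitored updates.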
Now that we have these core definitions established, you’re probably thinking: “what the hell? I knew this already, so where is Mesh Twin Learning hiding? They’ve got nothing!”
Well, as always, the best is yet to come!
Each of these technologies brings plenty of benefits on its own, yet they are typically connected only as separate systems that pass meta-data between each other.
Try to imagine the following scenario: you want to introduce a system capable of gathering all available information from every sensor built into a tyre production line. This data should be consumed and utilised to monitor the process, such as establishing velocity, alerts on issues and so on. At first, it seems straightforward, as we can take the data from sensors and, through IoT gateways, transfer them to the Cloud, where Big Data technology will work for us to visualise everything that we need. At the end of all this, the manager will be informed about production progress.
While this sounds great, we’ve typically got two main challenges to deal with:
- Connectivity. A factory often has no stable, highly performant network connection that could handle the amount of data that needs to be transferred. A single machine at a factory can have several sensors, each of which is able to produce a few data streams. However, if we were to combine all of them, we could end up with anything from 100 to 200 parameters per production nest. And, while short lines can have around 20 nests, something like a car assembly line can have over 80 – it’s a lot to handle!
- Flexible Automation. Each sensor that we want to read from has its own standards for communication, data formatting and more. The number of IoT devices required to handle them is enormous – and each needs to be configured. On the other hand, we also have human operators observing production lines and taking action based on single parameters. Even with the best dashboards and reports, an operator will always be limited in making decisions based on the vast amounts of information passed through all these readings. As a result, such reactions can only be triggered for the most obvious and clearly known alerts, with the potential for true improvements and experiments passing by unnoticed.
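To put the connectivity challenge into numbers, a rough back-of-the-envelope estimate (the sampling rate and payload size are illustrative assumptions, not measurements):

```python
# Illustrative telemetry volume for a long car assembly line.
params_per_nest = 200     # upper bound mentioned above
nests = 80                # long assembly line
samples_per_second = 10   # assumed sensor sampling rate
bytes_per_sample = 16     # assumed payload per reading (value + metadata)

bytes_per_second = params_per_nest * nests * samples_per_second * bytes_per_sample
print(f"{bytes_per_second / 1_000_000:.2f} MB/s of raw telemetry")  # 2.56 MB/s
```

Sustaining megabytes per second of raw readings, around the clock, is exactly the kind of load a typical factory network was never provisioned for.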
These problems are not only applicable to industrial spaces. In fact, we face them in everyday life – for example, connectivity in cars, smart city monitoring, logistics and much more. With the entire strength of Cloud systems and the vast improvements in Data Science, we had to answer just one question: how can we force the technology to serve our purpose without sacrificing its benefits?
Smart Connectivity For A Better Future
Mesh Twin Learning, as a concept, focuses on the two critical challenges that we mentioned in the previous paragraph: connectivity and flexible automation. Let’s go top to bottom and see how the solution is built, as well as how it answers the needs of the modern world.
From a high-level architecture perspective (presented in figure 1 below) we can clearly distinguish three main parts of the solution where important patterns are implemented and processing takes place: the Edge devices, Cloud infrastructure and ML models. While the architecture below was prepared based on AWS services, the concept itself can be implemented within any of the public Cloud providers, such as GCP or Azure.
The above diagram shows a simplified solution plugged into factory production lines (helping to solve the visualisation challenge, as we’ve covered earlier). To provide some extra explanatory details:
- Production nests at each factory are integrated with dedicated edge devices – they follow Industrial Internet of Things (IIoT) standards
- Each device runs several data and ML related services, all under the supervision of a real-time operating system
- Data collected from these machines is transmitted over to the Cloud using managed services and secure protocols (certificates, VPN connections etc.)
- This Cloud part is mainly responsible for:
- The virtualisation of machine state (all parameters) through a farm of Digital Twins (Device Shadows Farm)
- Feeding the analytical engine, which is actually the data lake for the Machine Learning engine.
- Everything is connected in such a way that information is passed back and forth in near-real-time, including updating services and models deployed on edge devices – all via over the air updates.
- We can plug any number of edge devices into the Cloud infrastructure and digitalise an unlimited number of processes and physical devices.
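The data path from edge to Cloud described above can be sketched as follows (the transport layer is simplified; a real deployment would use the Cloud provider’s IoT gateway SDK and add encryption on top of the compression shown here):

```python
import json
import time
import zlib

def collect_readings(sensors):
    """Poll every attached sensor once; each sensor exposes read() -> float."""
    return {name: sensor.read() for name, sensor in sensors.items()}

def build_batch(device_id, readings):
    """Aggregate a state snapshot and compress it before upload."""
    payload = {"device": device_id, "ts": time.time(), "state": readings}
    return zlib.compress(json.dumps(payload).encode("utf-8"))

# On a real edge device this logic would run under the RTOS scheduler,
# pushing one batch to the Cloud gateway every few seconds over a VPN.
```

Batching and compressing at the edge is what keeps the factory’s limited uplink usable, even with hundreds of parameters per nest.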
At first glance, it doesn’t seem to be much different from the solutions available on the market. Let’s dig a bit more and try to understand the benefits of Mesh Twin Learning’s design, as well as where all the key strengths (as well as differences compared to others) are hidden.
Meshes, Meshes Everywhere
As you already know from the previous definition, we can call something a mesh when we are dealing with things that work well together in a common way. If we expand this concept in the direction of service meshes, where we build a dedicated layer that enables safe, fast and reliable communication between various services or devices, we will be in a great position to address most of the challenges coming from the automated devices and edge computing space.
Typically, the machines that we are dealing with during the automation process come in various ages and standards, which immediately raises the complexity of the final solution. Simultaneously, within a large manufacturing plant, we are talking about hundreds of different interfaces that we have to pull data from, all in a secure and reliable way. This is the area where common solutions start to reveal their first limitations, even during pilot projects, as they either lose data during transmission, complete the transfers in a non-optimal (slow) way, or – worst of all – violate security standards.
Secondly, we have to remember that the entire responsibility for data processing (which comes in different formats) is thrown at the Cloud infrastructure, which raises the operational cost (time, performance, billing, etc.).
These are the main reasons why we decided to design specialised mini-PCs (with valid industry certification), enabling:
- Creation of connected devices – This ensures we are ready for any kind of integration. These edge devices are compliant with IIoT, run under specialised environment, allow us to introduce an abstraction tier, and are resilient to respective machine standards. They are prepared to consume any type of data sources and deal with them locally and at scale. Data collected here is used in three ways:
- In Batch process – Aggregated states (raw information) are compressed and encrypted, in order to be transferred over to the Cloud infrastructure through a configured device gateway. Batches are sent over once every 2-5 seconds, which limits the network usage.
- Through device virtualisation – Each edge device creates a local digital twin of the monitored machine, allowing it to quickly represent the current state and take desired actions or simulations, without the necessity to call Cloud services. Only the meta-data for digitalised assets is sent over to the Cloud, which, as a result, limits the size of the uploads.
- Via ML light models – Every state change or event noticed on the physical device being monitored is interpreted by a light, trained Machine Learning model, and may cause a direct reaction, such as an adjustment in the production parameters, for example. Local models are lightweight representations that are under constant supervision of the main model, which is deployed in the Cloud infrastructure.
- Usage of a real-time environment – This is adjusted for the specific configuration of the machine that it runs on, allowing for immediate reactions to parameter changes, events and any issues that may arise. Additionally, real-time operating systems (RTOS) allow for the deployment of native Cloud services in a local environment, with additional management and security levels, which makes the entire solution more robust and maintainable in the long term.
- Secured communication – This setup, based on native services built on top of the RTOS, with additional management services, allows for setting multiple levels of security: data encryption at local devices, secured transmission over Virtual Private Networks (VPNs) and secured networks, and rotation of keys and certificates. In the MTL solution, we are not sending raw data, but aggregated states or meta-information that only has meaning on an edge device or in the Cloud. Additionally, the edge device itself can be treated as a software protector of the connected machines, as it doesn’t expose the physical device interfaces directly to the Internet.
- Exchange of the configurations – Every single edge device with the configuration optimised for a given type of physical device can be easily transferred to the Cloud and propagated to other similar devices, or newly plugged facilities, which starts to introduce the economy of scale concept.
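The local reaction path from the list above (state change → light model → direct action) could look roughly like this; the threshold model below is a trivial stand-in for a trained lightweight ML model, and all names are our own illustration:

```python
class ThresholdModel:
    """Hypothetical stand-in for a light ML model: nudge the furnace
    temperature up when measured elasticity drops below a threshold."""

    def predict(self, state):
        if state.get("elasticity", 1.0) < 0.8:
            return {"temperature_delta_c": 3}
        return None  # no adjustment needed

def react_locally(twin_state, light_model, apply_action):
    """Run the light model on the current twin state and, if it recommends
    an adjustment, apply it directly at the machine - no Cloud round trip."""
    adjustment = light_model.predict(twin_state)
    if adjustment is not None:
        apply_action(adjustment)
    return adjustment
```

The key property is that the decision is taken on the edge device itself; the Cloud only learns about it afterwards, through the batched meta-data.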
After connecting all of the above into a single solution, it quickly becomes obvious that Mesh Twin Learning provides a few options that have not been considered on the market so far.
With this solution, we start to be resilient to end-device standards and configurations, as we have full control over the integration, including how data is received. Locally deployed environments with their own computing capabilities (edges) offer self-sufficient systems, which not only deal with incoming data, but also host Machine Learning solutions and digitalised views of the monitored components. This gives us two main benefits: the ML model is constantly fed with information, allowing it to take immediate action as conditions change, and we are not sending vast amounts of data over the network.
The most important part of this approach, however, is the economy of scale. Edges are similar in terms of architecture and can take over the configuration from other edges, which means that, every time we plug a new machine into the system, it becomes near-instantly configured to the current optimal state of the entire network.
We can interconnect not only machines within the same production line or building, but also devices located in various facilities. At each level, we are creating a mesh of edge devices that are in constant cooperation and exchange of information, all of which leads to the most unique part of MTL – competition and optimisation.
As mentioned in the previous section, each edge device does basically two things: it hosts light Machine Learning models and transfers meaningful data over to the Cloud using secure protocols. So, let’s take a look at the Cloud architecture:
Here, we have a few areas worth mentioning. First of all is the Device Management area, which is directly connected to the Device Shadows Farm. The main purpose of this backend system is to monitor state changes of virtualised edges and react to events. Events triggering changes may come from two different sources:
- IoT Rules Engine – depending on the current state of the machine, we can configure alerts and various behaviours that can be immediately passed to an edge device, and then straight to the physical machine. This gives production managers and operators a centralised point of control with a smart and adaptive adjustment engine. An operator can, for example, remotely pass the order to increase the temperature at the nest when the factory temperature has gone down and is negatively impacting the quality of produced components.
- Central Machine Learning engine – a core part of Mesh Twin Learning. This is the central place where the main ML models are constantly fed with near-real-time data and produce predictions over the optimisation processes. New models produced for parts of the process are built and transmitted over to the edge devices as lightweight versions, ready to execute actions locally.
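A declarative rule of the kind the IoT Rules Engine evaluates could be sketched like this (the rule format, names and thresholds are our own illustration, not a specific provider’s syntax):

```python
# Each rule maps a condition over the device shadow state to a command
# that is sent back to the relevant edge device.
RULES = [
    {
        "name": "cold-factory-compensation",
        "condition": lambda shadow: shadow.get("ambient_temp_c", 20) < 15,
        "command": {"target": "furnace-01", "set": {"temperature_delta_c": 5}},
    },
]

def evaluate_rules(shadow, rules=RULES):
    """Return the commands of every rule whose condition matches."""
    return [rule["command"] for rule in rules if rule["condition"](shadow)]
```

In production, each returned command would be delivered to the target edge device, which translates it into a direct action on the physical machine.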
Now, you may be wondering where the competition and scale advantage of MTL comes from. Well, imagine that every local model, as well as the digital twins, constantly update the core ML with their findings, parameters and operations. The results of these actions (here we are talking about multiple edges and models, just to be clear) are compared with each other and replicated in the Digital Twin Simulator area. The winner of such a competition – the model that contains the most optimal parameters for sub-processes or production steps – is wrapped and distributed across all edge devices. This is an ongoing process so, in essence, it means we get constant updates, with the most efficient settings distributed globally.
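The competition between models boils down to scoring every candidate configuration in the twin simulator and distributing the winner. A minimal sketch, where the scoring function is a hypothetical placeholder for a real quality metric:

```python
def pick_winner(candidates, simulator):
    """Score each candidate configuration in the Digital Twin Simulator
    and return the one with the best result."""
    name, params = max(candidates.items(), key=lambda item: simulator(item[1]))
    return name, params

def quality_score(params):
    """Hypothetical metric: closer to an assumed optimum of 190 C is better."""
    return -abs(params["furnace_temp_c"] - 190)

winner, config = pick_winner(
    {"edge-A": {"furnace_temp_c": 185}, "edge-B": {"furnace_temp_c": 191}},
    quality_score,
)
# `config` would then be wrapped as a light model and pushed to all edges.
```

Because this loop never stops, the globally distributed configuration is always the best one the network has found so far.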
But what does this mean in practice? It means we can perform micro-optimisations at scale, in a fully automated way. To understand it better at the factory level, let’s come back to the example of a tyre production plant.
For simplicity, let’s consider a short production cycle, where we have to heat up and vulcanise the rubber, form it into a tyre, and cool it down for inspection at the quality gate. Our supplier delivers a new type of rubber and, after a full production cycle, our edge devices quickly detect at the quality gate that the new tyre is not elastic enough, so we probably have to use a higher temperature at the beginning. This information is passed to the central ML engine, where simulations are processed, and ultimately the edge device responsible for the furnace increases the temperature by a few degrees.
At the same time, on another production line – or even in another facility – we already have the information that, thanks to this new rubber, we can expect a less elastic tyre, so the ML model additionally increases the pressure parameter.
After the next cycle, both models are compared in terms of results and we discover that both actions have a positive effect on the final product. All edges are immediately updated with the latest configuration, as well as with new light ML models, which start the entire optimisation process from the beginning.
In essence, we are combining experience from multiple sources, all around the world and in an automated fashion. This makes us capable of making decisions at a micro-scale, with results that increase quality, velocity or safety at a macro scale, all without a long process full of time-consuming iterations.
Hopefully, you should now understand Mesh Twin Learning and how it comes from the combination of a number of technologies, most notably Digital Twins, the Cloud with a Service Mesh, and Machine Learning. It is designed in a way that allows for mesh learning, which in essence means that all physical devices and edges are connected into one mesh, enabling them to share experiences (configurations) and converge on optimal parameters, increasing the overall operational quality of all supervised processes.
This constant competition between ML models ensures that we always get the best possible settings, with changes and updates that are distributed globally – in no time at all – to every configured device. Once MTL is in place, every new facility that is plugged in will immediately receive the most effective and validated settings from the wider network – reducing the kick-off time for the business.
Furthermore, it’s worth remembering that the entire concept was prepared in a way to address the most challenging aspects of IoT, Cloud, and Machine Learning: data size, connectivity, security and operational efficiency. Combined together in just the right way, we’ve managed to harmoniously get the best of all of them, unlocking a few new advantages along the way!