
Short Summary

The use of modern machine learning techniques in industrial manufacturing processes. This article describes the concept of anomaly detection in production and introduces the first steps of integrating such solutions into the FabOS environment.

 

Article

Today's industrial production faces various tasks and challenges, such as increasing quality requirements and product complexity, constant cost and innovation pressure, and the shift from mass production to customer-specific products. An efficient way to address these challenges is the use of artificial intelligence (AI) [1] [2] [3]. AI can already be used sensibly in today's production: for condition monitoring tasks, especially in predictive maintenance applications, and for decision support in adaptive process optimization, for example through the integration of pattern recognition algorithms or neural networks. However, for AI to be used effectively, the production infrastructure must fulfil various requirements. First, data availability must be ensured [4]: access to sensors, machines and processes must be granted, reliable and synchronized. Furthermore, the data must be available in high quality, so semantic descriptions are needed for an easy integration of new data sources. Standardized interfaces and data structures thus form the backbone for a wider use of AI in industrial production. This blog post gives a short introduction to one of the industrial applications of FabOS: anomaly detection within an industrial milling process.

 

A possible definition of an anomaly is given by Douglas Hawkins: “an [anomaly] is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism.” [5] The identification of such deviations is a central problem for machine learning techniques in the field of industrial applications.

 

Since the occurring anomalies are highly specific to the corresponding application, it is nearly impossible to find consistent definitions or to create universal models across different tasks, domains or machines. This aspect sets anomaly detection apart from other machine learning problems [6]. Additionally, the noise inherent in the data is a common difficulty in applying anomaly detection.

 

Within industrial use cases, the most common type of anomaly detection is so-called point anomaly detection, where anomalies occur as individual points in the data that do not conform to the accepted normal behavior. In the context of machine learning, different methods exist to implement anomaly detection. The most common techniques for unsupervised anomaly detection problems, i.e., problems where no ground truth is available for training the models, are nearest-neighbor-based, clustering-based and statistical methods. Nearest-neighbor techniques assign an anomaly score based on properties of a data point's neighborhood; the basic assumption is that normal data points lie in dense neighborhoods, while anomalies or outliers lie in sparse neighborhoods. Clustering methods learn clusters from the given data sets and assign an anomaly score based on a point's relationship to the nearest cluster; the general assumption here is that anomalous points do not belong to any cluster or are very distant from the nearest cluster representative. Statistical methods estimate a model from the data and evaluate the probability of a point under that model. These kinds of methods are applicable if the normal instances without anomalies can be modeled via statistical distributions [6].
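As a rough illustration of the nearest-neighbor idea, the following sketch (not FabOS code; the data, the distance metric and the choice of k are made up) scores each point by its mean distance to its k nearest neighbours, so points in sparse neighborhoods receive high anomaly scores:

```python
import math

def knn_anomaly_scores(points, k=3):
    """Score each point by its mean distance to the k nearest neighbours.
    Large scores indicate sparse neighborhoods, i.e., likely anomalies."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Dense cluster of "normal" readings plus one outlier
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (8.0, 8.0)]
scores = knn_anomaly_scores(data, k=3)
print(max(range(len(data)), key=scores.__getitem__))  # 4 (the outlier)
```

Real implementations use spatial index structures instead of the quadratic all-pairs loop shown here, but the scoring principle is the same.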

 

If process knowledge is already available, classification methods have proven to be very effective techniques to learn classifiers from the training data and apply labels or scores to test data. These methods can be divided into one-class methods, which assign points either to one class or to none if an anomaly is detected, and multi-class models, where points that do not belong to any of the normal classes are classified as anomalous. Basic models for classification are support vector machines (SVM), neural networks (e.g., autoencoders), Bayesian models and rule-based systems [6].
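A minimal one-class example, assuming the normal data follows a single Gaussian distribution, is sketched below. This is a deliberately simple stand-in for SVMs or autoencoders, and all numbers are invented:

```python
import statistics

class OneClassGaussian:
    """Minimal one-class model: fit mean/std on anomaly-free training
    data, then flag test points whose z-score exceeds a threshold."""
    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.stdev(values)
        return self

    def predict(self, value):
        z = abs(value - self.mean) / self.std
        return "anomaly" if z > self.threshold else "normal"

# Vibration amplitudes from known-good machining runs (made-up numbers)
model = OneClassGaussian().fit([0.9, 1.0, 1.1, 1.0, 0.95, 1.05])
print(model.predict(1.02), model.predict(4.0))  # normal anomaly
```

Points assigned to the one "normal" class pass; everything else is flagged, mirroring the one-class behavior described above.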

 

Modern machine tools for metal cutting are used in industrial production chains for turning, milling, and drilling operations. Depending on the respective area of application, they appear in different degrees of automation. To produce single parts and small series, a standard CNC machine with automatic tool change is usually sufficient. As the number of pieces increases, further expansion stages, for example to a machining center or cell, become economical. Multi-machine systems, i.e., flexible manufacturing systems, are mostly used in mass production, as they offer significant economic advantages due to a high degree of automation. Such systems can produce workpieces efficiently in 24/7 series production. However, the higher the degree of automation, the less flexibly the systems can respond to changes. This means that especially in mass production there is a strong need for data-driven systems that can react autonomously to process-dependent changes across machines. Figure 1 shows an example of a milling tool. Sensor-based systems that use recorded process data for condition monitoring or adaptive optimization purposes add significant value to automatic process analysis and manufacturing, thus contributing to the increased productivity needed to meet the requirements of today's production processes.

 

 

State-of-the-art machine tools are designed for productivity, functionality, and accuracy. Both the mechanical design and the machine control are technically advanced, but they are designed for functionality rather than for adaptivity and connectivity. This allows only limited retrofitting of adaptive solutions or control of further complementary solutions. For automatisms, there are usually only a few solutions available, which is why in-depth production expertise is still required. Finally, this is also because the machine-integrated sensor technology is usually basic, can hardly be addressed or read out externally, and is subject to severe limitations in terms of sampling rates and accuracy.

 

The use case presented within the FabOS project is a three-axis machine tool of the type DMG HSC-55 at the Fraunhofer IPT. It is equipped with additional vibration sensors, acoustic emission sensors and an industrial microphone. The following figure shows the positions of exemplary vibration and acoustic emission sensors on the machine’s spindle axis.

 

 

For data acquisition, the Fraunhofer vBox is used. This is a sensor data acquisition unit that offers different connectors for various sensor types. Its internal electronics support sampling rates of up to 100 kHz. The sampled sensor data is transferred to an additionally connected IPC, which is housed within the machine cabinet.

 

As the specific use case, the manufacturing process of a small turbine-blade-like model was chosen. By its nature, this process tends to show high vibrations when it is not optimized. This makes it an ideal use case for anomaly detection, since the process can easily be adjusted to manufacture either normal parts, which fulfill the required workpiece quality, or “bad” parts.

 

For the creation of the machine learning model for anomaly detection, previously acquired data from the manufacturing processes will be used. In the course of the FabOS project, the model shall be integrated into an ML pipeline to provide online anomaly detection during running manufacturing processes. In an initial version, active process interaction will not be possible; detected anomalies will therefore only be shown as a warning message to the operator on the screen mounted next to the machine. At the current stage, the model is still under development and trained using offline data.
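The planned warning-only behavior can be sketched as a simple monitoring loop. This is a hypothetical illustration, not the actual FabOS pipeline; the fixed-threshold detector is merely a placeholder for the trained model:

```python
def monitor(stream, is_anomaly, warn):
    """Pass live sensor samples through a trained detector; on an anomaly,
    only issue a warning to the operator (no active process interaction)."""
    warnings = []
    for t, sample in enumerate(stream):
        if is_anomaly(sample):
            msg = f"sample {t}: value {sample} flagged as anomalous"
            warn(msg)
            warnings.append(msg)
    return warnings

# Stand-in detector: a fixed vibration-amplitude threshold
alerts = monitor([0.9, 1.1, 5.2, 1.0], lambda x: x > 3.0, print)
print(len(alerts))  # 1
```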

 

[1] K. Ahlborn, G. Bachmann, F. Biegel, J. Bienert, S. Falk, A. Fay, T. Gamer, K. Garrels, J. Grotepass, A. Heindl and J. Heizmann, “Technology Scenario ‘Artificial Intelligence in Industrie 4.0’,” 2019. [Online]. Available: https://www.plattform-i40.de/IP/Redaktion/EN/Downloads/Publikation/AI-in-Industrie4.0.pdf?__blob=publicationFile&v=5.

[2] T. Wuest, D. Weimer, C. Irgens and K.-D. Thoben, “Machine learning in manufacturing: advantages, challenges, and applications,” Production & Manufacturing Research, vol. 4, no. 1, pp. 23–45, 2016.

[3] A. Diez-Olivan, J. Del Ser, D. Galar and B. Sierra, “Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0,” Information Fusion, vol. 50, no. 2, pp. 92–111, 2019.

[4] S. Jeschke, C. Brecher, H. Song and D. B. Rawat, Eds., Industrial Internet of Things, Cham: Springer, 2017, p. 715.

[5] D. M. Hawkins, Identification of Outliers, Monographs on Statistics and Applied Probability, p. 1, 1980.

[6] V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection,” in Encyclopedia of Machine Learning and Data Mining, 2016.

 

Authors: Pierre Kehl, Tim Geerken

Company: Fraunhofer IPT

Created by l.demes94 28.07.2022 13:22.

Modified by l.demes94 28.07.2022 13:28.

Short Summary

A wizard that helps you automate the generation of Data Driven Services. 

 

Article

Shortage of skilled workers in mechanical and plant engineering  

According to Handelsblatt, HR managers in mechanical and plant engineering complain about a shortage of academics in 81% of cases and a shortage of skilled workers in 90% of cases. Technological change, driven by digitization and the mobility transition, is expected to create attractive jobs, while many employees are going to retire [1].

 

One approach to keeping the burden that the shortage of skilled workers places on the competitiveness and ability of companies to act to a minimum is the automated monitoring of production processes and the prediction of maintenance work. For this purpose, machine data is recorded and models are trained on the sensor values: the so-called Data Driven Services.

 

What can Data Driven Services do?  

Data Driven Services can monitor the condition of a machine and detect tool wear at an early stage based on abnormalities in the sensor values, in order to minimize defective production, ensure quality and protect the machine. Worn cutting tools and faulty components can thus be detected automatically. With a Predictive Maintenance Service, maintenance work can even be predicted in order to avoid downtimes and to schedule maintenance so that production is hindered as little as possible. With the Predictive Quality Service, the production parameters are monitored and, if necessary, optimized. In this way, the productivity of a machine can be increased. In addition, predictions can be made as to how the machine condition will affect product quality.

 

With the increasing number of sensors installed in industrial machines over the past several years, enormous amounts of data are already being accumulated, which are all too often not used at all or still have to be prepared for emerging questions.

 

Will the problem of the shortage of skilled workers be shifted from machine construction and engineering to the field of data science and data engineering, where qualified personnel are also desperately being sought? This is exactly where we start with our goals for FabOS:  

 

Automated generation of Data Driven Services  

With our wizard, we virtually bring the AI to your data. The aim of the wizard is to let you use your data profitably even without a data science department, and without additional support from IT. The wizard is designed so that your domain experts, i.e., machine builders or machine operators, can independently create high-quality Data Driven Services and then put them directly into operation.

 

Data integration via the FabOS operating system  

You can use the FabOS operating system to connect your machines and integrate their data. The wizard offers a graphical user interface via which the user first selects the respective machine and the desired Data Driven Service. The Data Driven Services are described in the user interface to make the choice easier. All relevant data sets are then suggested. The user is a subject matter expert for his or her machine, knows which sensor data is necessary to monitor the machine's condition for the respective application, and selects it accordingly. In addition, the length of the history can be adjusted to exclude characteristic changes in the data set, caused for example by changed environmental conditions or changes in production. This is a crucial step in data cleansing. After the data has been selected, further automated preprocessing takes place, exploiting the rules that result from data integration via FabOS.

  

Wizard powered by Auto-ML  

Meta-learning analyzes the available data and preselects the algorithms that have provided promising results for similar data sets. The actual AutoML procedure is thus able to build complex pipelines for the given problem very efficiently [2]. It determines the best possible model with the ideal hyperparameters after just a few iterations over different algorithms. If you are interested, all steps carried out in AutoML can be traced using the XAutoML tool newly developed by USU [3]. Additional transparency into the model's reasoning is provided by an automatically generated decision tree that describes the overall model. In addition, predictions for individual data points can be explained. These measures increase human acceptance of the Data Driven Service and reduce the risk of misbehavior of the algorithm.
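In a drastically simplified form, the selection step can be illustrated as picking the candidate pipeline with the best validation score. This is a toy sketch, not the AutoML procedure from [2]; the candidate "pipelines" and the validation data are invented:

```python
def auto_select(candidates, X_val, y_val):
    """Pick the candidate model with the best validation accuracy;
    a drastically simplified stand-in for a full AutoML search."""
    def accuracy(model):
        return sum(model(x) == y for x, y in zip(X_val, y_val)) / len(y_val)
    return max(candidates, key=lambda name: accuracy(candidates[name]))

# Candidate "pipelines": threshold detectors with different hyperparameters
candidates = {
    "thr=2.0": lambda x: x > 2.0,
    "thr=4.0": lambda x: x > 4.0,
}
X_val = [1.0, 3.0, 5.0, 0.5]
y_val = [False, True, True, False]
print(auto_select(candidates, X_val, y_val))  # thr=2.0
```

A real AutoML system additionally searches over preprocessing steps and model families and uses cross-validation rather than a single holdout set.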

 

Service Lifecycle Management  

The user can now choose whether to accept the model with the best metric or opt for another one. To reduce the time to go-live of the Data Driven Service, the wizard offers further assistance functions. It supports the user during deployment, where you can choose between cloud and edge. Finally, the live data of the machine is fed into the deployed model, and the predictions are visualized on a dashboard, where explanations for the predictions are also available. The service is monitored by an AI supervisor, which can provide extra resources under load. In addition, the AI supervisor issues an alert if the prediction quality drops, which is very often due to changes in environmental conditions or a change in production control. In this case the model must be retrained, and the wizard can support this step as well.
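The supervisor's quality alert can be sketched as a rolling-accuracy check. This is a hypothetical illustration; the window size and accuracy floor are assumptions, not FabOS parameters:

```python
from collections import deque

class QualityMonitor:
    """AI-supervisor sketch: track rolling prediction accuracy over the
    last `window` labeled outcomes and alert when it drops below a floor."""
    def __init__(self, window=50, floor=0.8):
        self.results = deque(maxlen=window)
        self.floor = floor

    def record(self, prediction, actual):
        self.results.append(prediction == actual)
        acc = sum(self.results) / len(self.results)
        return acc >= self.floor  # False -> raise an alert, consider retraining

mon = QualityMonitor(window=4, floor=0.75)
outcomes = [(1, 1), (0, 0), (1, 0), (0, 1)]  # accuracy falls to 0.5
print([mon.record(p, a) for p, a in outcomes])  # [True, True, False, False]
```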

 

[1] https://www.handelsblatt.com/politik/konjunktur/nachrichten/fachkraeftemangel-maschinenbauer-wollen-personal-aufstocken/27843750.html?ticket=ST-340117-T6GmkIysmEDH4uXTcWeg-ap6  

[2] Zöller, M.-A., Nguyen, T.-D., & Huber, M. F. (2021). Incremental Search Space Construction for Machine Learning Pipeline Synthesis. International Symposium on Intelligent Data Analysis, 103–115. https://doi.org/10.1007/978-3-030-74251-5_9  

[3] Zöller, M.-A., Titov, W., Schlegel, T. & Huber, M. F. (2022). XAutoML: A Visual Analytics Tool for Establishing Trust in Automated Machine Learning. https://doi.org/10.48550/arXiv.2202.11954

 

Author: Carolin Walter

Company: USU Software AG

Created by l.demes94 06.07.2022 09:50.

Modified by l.demes94 28.07.2022 13:27.

Article

With Industry 4.0, intelligent and connected factories, so-called smart factories, are a bright vision of the future. Currently, however, companies still often face the challenge of digitizing workflows and processes in a way that promotes efficiency without sacrificing flexibility and usability.

 

This problem is easily explained using components from the manufacturing industry as an example. At the moment, many companies are not yet talking about smart machines and processes. However, flexible production systems are often operated, which means that the components produced by a system change on a daily, hourly or even minute-by-minute basis. What initially sounds good and flexible, however, also harbors problems.

 

On the one hand, employees often do not know which component they are dealing with because the construction plans change so quickly, which significantly increases the risk of confusion. On the other hand, the finished components then have to be assigned to the correct customers or projects in a time-consuming process using paper lists, or markings have to be applied to the components from the outside (e.g. using a laser) in order to be able to recognize which component it is afterwards. This procedure is also error-prone and should no longer be necessary. Artificial intelligence should enable part identification that recognizes the respective parts in real time and assigns them to the appropriate blueprint. 

 

Currently, there is no solution on the market that allows the identification of arbitrary free-form parts. At the moment, there are only partial solutions for specific, pre-defined components. However, these have the disadvantage that adding and training further components is very time-consuming and cost-intensive, which makes these solutions extremely inflexible. A solution that remains flexible and can identify any free-form part without further training does not yet exist on the market, but it is essential for the concept of full digitization or the smart factory and an important point for the competitiveness of German companies.

 

Content based similarity comparison for component identification 

This innovation is being developed by Compaile Solutions GmbH within the FabOS research project. The aim of the project is the content-based (not purely optical) similarity comparison of unknown components based on neural networks. This enables the assignment of arbitrary components to their corresponding construction plans as well as the use of this and similar technologies in networked edge and cloud computing. It makes production more flexible and less prone to errors and enables individual components to be manufactured much more cheaply and quickly. A machine that is flexible and can act in real time saves companies valuable warehouse space, as components are created when they are needed and there is no need for pre-production. Furthermore, this technology can also be extended to quality assurance.

 

 

Increased quality through AI in industry 

With the help of an AI, more reliable quality checks can be performed during and after production. The advantage compared to currently used camera systems or manual quality control is speed and flexibility. Unlike a classic camera system, an AI is not dependent on a specific position or orientation of the components in order to be able to detect defects. Even complex or new components can be easily taught to the AI and subsequently analyzed by it within seconds. In conjunction with fully automated production, the necessary steps can be taken directly to rectify the detected defects before the affected parts are processed further and major damage occurs. 

 

Author: Kaja Wehner

Company: COMPAILE Solutions GmbH

 

Created by l.demes94 22.06.2022 09:31.

Modified by l.demes94 27.06.2022 10:18.

Short Summary

This article gives a short overview over the topic of safety in industrial applications, highlighting legal aspects and corresponding requirements for products and their development process. Furthermore, problems of industry 4.0 and machine learning regarding adherence to current legal regulations are discussed, and possible solutions under development as part of the FabOS project are introduced.  

 

Article

Since its very beginning, industrialization has been characterized by steady change. Important milestones were the shift from pure manual labor to the use of simple machines and of steam power, the beginning of mass production, driven by electrification and the assembly line, and the still ongoing automation through the use of computer systems. With Industry 4.0, the next industrial milestone is already at hand. It is characterized by the increasing interconnection of production assets, human-robot collaboration and the use of artificial intelligence.

  

 
Human and robot collaborating on an assembly task ©KIT 

  

The profound changes to production during these ‘industrial revolutions’ came with changing risks for factory workers. Especially the latter half of the 19th century is painfully remembered for its lack of occupational safety, which led to frequent and severe injuries of workers and child labourers alike. Consequently, occupational safety became an increasingly important subject of discussion at the end of the 19th century, and the foundation for today’s strict occupational safety laws was laid. Nowadays, there is a broad range of laws and standards regarding factory and machine safety (e.g. Machinery Directive 2006/42/EC, IEC 61508, ISO 13849) with the goal of avoiding injuries to workers. With the upcoming changes induced by Industry 4.0, it will be a demanding challenge to keep fulfilling the requirements of these laws and standards - a purpose FabOS is going to contribute to.

 

As mentioned in the previous section, a wide variety of requirements must be fulfilled to achieve safety for workers in an industrial environment. To avoid potential danger from machinery, that is, to achieve machine safety, it is necessary to obey the European Machinery Directive, which has been transposed into German national law in the form of the Product Safety Act (‘ProdSG’) and the Machinery Ordinance (‘9. ProdSV’). To simplify adherence to safety-related laws, so-called harmonised standards can be applied during the development and deployment of machines. In simple words, these standards can be seen as manuals that describe the requirements and necessary actions in the development and deployment process of safety-related machines. The most general safety standard is IEC 61508, while more specific standards like ISO 13849 (‘Safety of machinery - Safety-related parts of control systems’) are derived from it. When a harmonised standard is applied successfully (keep in mind: not all standards are harmonised), it can be assumed that the legal regulations are met. This simplifies the development of safe machines significantly.

  

Safety-related laws and selection of safety-related standards, together with their relations ©KIT 

  

It is important to note that safety is not an isolated property, but rather a holistic process that spans the whole product life cycle. It already starts with the selection of suitable and qualified individuals for the development and with requirements for the organizational structure. The primary goal of this process is (among others) that the developed product will be highly reliable, meaning the almost complete absence of dangerous malfunctions. The reliability of the product is measured as the “Probability of dangerous Failure per Hour” (PFH), which applies to dangerous malfunctions of all safety-relevant functionalities. As an example, a PFH value below 10⁻⁶ is required for applications in the field of human-robot collaboration; this equals a maximum of one dangerous failure per 1,000,000 hours of operation. Even when the development and production of a machine is finished and all safety requirements are met, there is still more to obey, as the final application context is also relevant for safety. Thus, the final deployment in a production cell must undergo a risk assessment, and suitable countermeasures for identified risk factors must be employed before commissioning can take place.
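The arithmetic behind that PFH limit can be illustrated as follows. This is a simplified sketch; real PFH values are derived statistically from component failure data according to IEC 61508, not from a single observed ratio:

```python
# Checking an observed dangerous-failure rate against the PFH limit
# named in the text for human-robot collaboration (PFH below 1e-6 per hour).
def pfh(dangerous_failures, operating_hours):
    """Naive point estimate of the probability of dangerous failure per hour."""
    return dangerous_failures / operating_hours

# One dangerous failure observed over 5 million operating hours
rate = pfh(1, 5_000_000)
print(rate < 1e-6)  # True
```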

 

Particularly in view of the main characteristics of Industry 4.0, it becomes increasingly difficult to adhere to current safety regulations while embracing recent technological advancements. Today’s AI applications have no provably correct behaviour; they function based on large amounts of data and associated desired results, making mathematical proofs almost impossible. Significant differences between the available training data and the data captured at the final operation site can also lead to unpredictable behaviour. Thus, AI cannot be applied to safety-relevant functionalities without additional safety measures, as humans could be injured due to AI malfunction (e.g. when an AI is controlling a robot). Furthermore, the desired modularity and changeability of Industry 4.0 conflicts with current safety regulations: today, a risk assessment is needed for each specific production cell configuration, and even the slightest changes make a repeated risk assessment necessary. This contrasts with the demand for a freely changeable production.

 

FabOS is going to contribute to solving these problems by investigating possibilities for the application of AI in safety-related functionalities as well as the operation of a changeable production without repeated risk assessments. To enable the application of AI, a so-called safety supervisor shall become part of FabOS. The supervisor is tasked with validating AI decisions with respect to fixed safety criteria based on the current state of the production devices. In the case of an AI that controls a robot, the safety supervisor would monitor the current position of all parts of the robot and make sure that it does not leave its predefined working space, to avoid collisions with humans. If the AI decided to leave the working space, the safety supervisor would either discard the move command before it is executed by the robot, or cause a safety stop of the robot shortly before it would leave the working space. Thus, the responsibility for human safety is transferred from the AI to the safety supervisor, which makes decisions based on clear rules whose correctness can be proven. To ensure safety in a changeable production, so-called Conditional Safety Certificates (ConSerts [1]) will be used. The basic idea behind the approach is to abstract from specific hardware and software by formulating requirements for the functionalities these devices offer. The risk assessment is lifted to the functionality level by formulating requirements for functionalities and for the devices offering them. Using a laser scanner as an example, such requirements could be installation at position X, a sampling frequency of at least 50 Hz and at least performance level d (a measure for the reliability of a safety-relevant device or function, defined in EN ISO 13849).
When the laser scanner is exchanged, an automated or partially automated check verifies whether the safety requirements formulated in the associated ConSert are fulfilled, making a repeated risk assessment unnecessary.
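The workspace check described above can be sketched as follows. This is a hypothetical illustration, assuming an axis-aligned box as working space; a real supervisor would monitor all robot links over time, not just a single target point:

```python
def supervise(move_target, workspace, stop_robot):
    """Safety-supervisor sketch: discard any AI-issued move command whose
    target lies outside the predefined working space (an axis-aligned box)."""
    lo, hi = workspace
    inside = all(a <= c <= b for c, a, b in zip(move_target, lo, hi))
    if not inside:
        stop_robot()  # discard the command / trigger a safety stop
    return inside

workspace = ((0.0, 0.0, 0.0), (1.0, 1.0, 0.5))  # x/y/z bounds in metres
print(supervise((0.4, 0.2, 0.3), workspace, lambda: None))  # True
print(supervise((0.4, 1.5, 0.3), workspace, lambda: None))  # False
```

The rule is trivially provable (a coordinate comparison), which is exactly why responsibility can be shifted from the AI to the supervisor.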

 

It is the goal of the research on the safety supervisor and ConSerts to highlight possibilities and approaches to deal with the different aspects of safety in Industry 4.0 applications in the future. Obviously, the full-fledged safety certified development of components is too costly for research projects like FabOS. However, there will be prototypes being developed, which can serve as a starting point for further safety-related work in FabOS and Industry 4.0 alike. 

 

Author: Patrick Schlosser

Firma: KIT

Created by l.demes94 08.06.2022 09:04.

Modified by l.demes94 28.07.2022 14:35.

Short Summary

An efficient lot size of one requires a changeable factory. However, safeguarding a changeable factory requires a new approach to safety engineering. In this article, we explore the possibility of employing a modular and model-driven approach to safety engineering by leveraging Conditional Safety Certificates.

 

Article

1. Introduction 

Industrie 4.0 promises to revolutionize production by making plants more open and flexible, which subsequently makes it possible to manufacture highly individualized products as well as a large number of product variants, directly tailored to the customer’s needs. This individualization and alteration potential through Industrie 4.0 also contributes to the UN Sustainable Development Goal 9: “Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation” [1]. The high degree of flexibility in Industrie 4.0 is partially achieved through the employment of human-robot collaboration, which marks a step away from the rigid separation of automation and manual labour. Humans and robots now share a common workspace, where the robot, for example, assists the human worker in physically demanding and simple, repetitive tasks, while the human performs the product individualization and more challenging tasks. Though this cooperation increases the flexibility of production while maintaining high productivity, it comes with its own challenges to worker safety: not only is there no separation between human and robot, but the production environment and the devices involved also change frequently to account for the currently assembled product variant. Therefore, regulations for machine and worker safety in production environments, like EU Directive 2006/42/EC, remain highly relevant in Industrie 4.0.

  

2. The Safety Engineering Process in Industrie 4.0 

Based on these challenges, FabOS work package 3.2 investigates how a changeable production plant can be safeguarded. Traditionally, a safety engineer is tasked with manually inspecting and approving the plant (e.g. along checklists). Changeability of the plant would require either a) regular extensive audits or b) irregular, complex audits to check a consolidated safety concept for all plant variants. To avoid both, we plan to employ the concept of “Conditional Safety Certificates” [2] (ConSerts), which allows safety arguments to be model-driven and modular. A production planner and a safety engineer could then change the plant and check its safety at the same time in a model-driven fashion. The use of asset administration shells (AAS) and their submodels makes it possible to develop manufacturer-independent safety concepts and to integrate them into production in a semi-automated manner to create a safe plant.

  

3. Modular Safety Assurance for a Robotic Bin-Picking Use Case 

Fig. 1 Two scanners (black circle-cuts) with two safety zones (yellow and red) inform a robot to slow-down or stop if a human approaches it (© 2020 KIT). 

  

Not all safety concepts are static and inherently safe; instead, technical protection systems are used to transmit, evaluate, and act on safety-critical information at runtime, e.g. to trigger an emergency stop. ConSerts can be used at runtime to abstract from the specific protection system by synthesizing runtime monitors on their basis. For example, a set of ConSerts describes a safety concept to safeguard a pick-and-place application consisting of a robot and laser scanners (cf. Fig. 1). The scanner reports the occupation of different safety zones, and the robot demands that a certain space around it is free, depending among other factors on its current speed. The ConSert-based monitors (not depicted) act as mediators at this point, collecting information from the laser scanner and providing guarantees to the robot about the occupation of the workspace.
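The mediator role of such a monitor can be sketched as a simple rule evaluation. This is a hypothetical illustration; the zone names and the mapping to robot behaviour are assumptions, not the actual ConSert models:

```python
def consert_monitor(zone_occupied, scanner_ok):
    """ConSert-style runtime monitor sketch: turn the laser scanner's
    zone guarantees into a guarantee (demanded behaviour) for the robot."""
    if not scanner_ok:
        return "stop"        # no valid guarantee from the scanner available
    if zone_occupied["red"]:
        return "stop"        # human very close: safety stop
    if zone_occupied["yellow"]:
        return "slow"        # human approaching: reduced speed
    return "full_speed"

print(consert_monitor({"red": False, "yellow": True}, scanner_ok=True))  # slow
```

Note the conservative default: if the scanner cannot uphold its guarantee, the monitor demands a stop rather than assuming a free workspace.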

Fig. 2 Each component comes with a Platform I4.0 [3] compatible ConSert submodel and a runtime monitor, containing structure and logic of the safety argumentation. 

  

The structure of the safety concept and the system components are visible in Fig. 2. Here it is also clear what role asset administration shells and submodels play. They allow runtime data (such as safety zone occupancy) to be used to make safe decisions (such as allowing robot movements). Here, the “Safe Robot Application” submodel can be seen as a system service with a safety concept that can be developed independently of specific components. Similarly, the production assets (robots, laser scanners) are developed independently of the safety strategy and merely provide their guarantees and requirements in the form of jointly defined ConSerts. 
It should be mentioned at this point that the focus of the work package is on modeling the safety concepts and making these concepts usable at runtime. The safety of runtime environments (OS, containers) and communication networks (protocols, transmission methods) is considered, since it is a prerequisite for a safe overall system, but it is not the focus. Here, technologies such as Time-Sensitive Networking (TSN) and real-time-capable containers are to be seen as potential enablers [4]. 

  

4. Explainable AI as an Enabler for Dynamic Risk Management 

Closely related to the topic of ensuring worker safety is the challenge of avoiding property damage through employed robot systems. In contrast to worker safety, this involves avoiding collisions of the robot with its environment and inappropriate actions of the robot that may damage the handled workpiece. Especially AI applications - an integral part of Industry 4.0 - with their sometimes unpredictable behavior can be seen as a major risk factor here. For example, using an AI to determine grasp positions for workpieces can lead to severe property damage when the determined position is incorrect: the robot may drop the workpiece when the grasp is not stable, or collide with the environment when the grasp position is predicted inside surrounding objects. To detect and/or avoid such defective and potentially harmful AI decisions, extensive research is performed on the explainability of AI and the determination of uncertainties of AI decisions [5]. 

 

In this use case, the above-mentioned grasp position algorithm is to be extended by a component for determining uncertainty. This then serves as an additional input for the ConSert-based monitor and thus influences the robot speed. Thus, both classical safety mechanisms (human in proximity causes an emergency stop) and mechanisms of dynamic risk management are combined. In the latter case, the approaching speed is coupled to the uncertainty of the decision, in order to keep the risk of material damage low, while at the same time ensuring efficiency.
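The coupling of approach speed to decision uncertainty can be illustrated with a short sketch. The linear scaling and the parameter names are our own illustrative assumptions; the actual FabOS component may use a different mapping.

```python
def approach_speed(v_max, uncertainty, human_nearby=False):
    """Couple the robot's approach speed to the uncertainty of the AI's
    grasp-position decision (illustrative linear scaling).

    uncertainty  -- value in [0, 1] from the uncertainty component
    human_nearby -- classical safety mechanism: emergency stop
    """
    if human_nearby:
        return 0.0  # the classical safeguard always wins
    # Scale the speed down as the decision becomes less certain.
    return v_max * (1.0 - uncertainty)

print(approach_speed(1.0, 0.1))                     # confident grasp -> fast
print(approach_speed(1.0, 0.9))                     # uncertain grasp -> slow
print(approach_speed(1.0, 0.1, human_nearby=True))  # human nearby -> stop
```

This mirrors the combination described above: the emergency stop is unconditional, while the dynamic risk management path trades speed against the risk of material damage.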

 

5. Outlook 

In work package 3.2 of the FabOS project, these concepts are going to be implemented by the end of the project and integrated into demonstrators. As concrete results, various safety submodels will be implemented and a safety supervisor component is to be developed. In addition, the delivery of asset administration shells and submodels linked with runtime behavior is considered. Through this combination, an easily manageable AASX file can be provided and directly integrated into the off-the-shelf components of the I4.0 middleware BaSyx [6]. This contributes to linking safety engineering with asset administration shells - so that future open, distributed, and flexible Industry 4.0 plants do not have to compromise on safety. 

 

[1] https://sdgs.un.org/goals/goal9 

[2] Schneider, Daniel, and Mario Trapp. "Engineering conditional safety certificates for open adaptive systems." IFAC Proceedings Volumes 46.22 (2013): 139-144. 

[3]  https://drops.dagstuhl.de/opus/volltexte/2020/12001/pdf/OASIcs-Fog-IoT-2020-7.pdf 

[4] https://drops.dagstuhl.de/opus/volltexte/2020/12001/pdf/OASIcs-Fog-IoT-2020-7.pdf 

[5] Kläs, Michael, and Lena Sembach. "Uncertainty wrappers for data-driven models." International Conference on Computer Safety, Reliability, and Security. Springer, Cham, 2019. 

[6]  https://www.eclipse.org/basyx/ 

 

Author: Andreas Schmidt (Fraunhofer IESE), Denis Uecker (Fraunhofer IESE), Tom Huck (KIT), Christoph Ledermann (KIT), Frank Schnicke (Fraunhofer IESE)

Firma: Fraunhofer IESE, KIT

Created by l.demes94 25.05.2022 09:39.

Modified by l.demes94 25.05.2022 09:42.

Short Summary

This article introduces the concept of collaborative robots (“Cobots”) and discusses advantages of Cobots compared to traditional industrial robots as well as open challenges for Cobot deployment. The main focus lies on safety as a central issue in collaborative robotics. Various safety challenges are discussed, especially with regard to the combination of artificial intelligence and collaborative robotics. Finally, the article highlights how FabOS can contribute to the safe deployment of Cobots in a production environment. 

 

Article

“Cobots”: Collaborative robots 

  

 

Cobots are specially designed to interact with humans, for example as seen here, through hand guidance.  ©KIT 

  

If you have visited any automation fair within the last few years, you have probably seen a “Cobot” (short for “collaborative robot”). Almost all robot manufacturers, from established companies to newcomers, have introduced robots specifically designed for human-robot collaboration (HRC). The trend towards collaborative robotics is driven by a number of factors that are crucial for the production of the future: Due to individualized products and smaller lot sizes, traditional fully-automated robot systems are often not flexible enough. Due to increasing labor costs, on the other hand, manual labor is also unattractive. HRC presents a trade-off between these two opposites and offers a low-cost entry into the world of robotic automation, while maintaining some of the flexibility that is inherent to manual labor. Furthermore, collaborative robots can assist human workers with tasks that are physically or ergonomically challenging - a point that is especially important when considering demographic trends in many industrialized countries. 

 

Safety: A major challenge in HRC 

 

However, all these advantages should not hide the fact that deploying a collaborative robot is far from easy: several issues need to be addressed before commissioning the system. The most important one is safety: robots, even relatively small and lightweight cobots, can move fast and exert great forces in case of a collision. Thus, safety is paramount when humans and robots share a workspace. Ensuring safety, however, is not trivial. Even though cobots usually come with a wide variety of safety functions such as velocity limitation, workspace limitation, or collision detection, they are not inherently safe: safety is not a property of the robot alone, but also depends on the application and on the system environment in which the robot is used. Thus, the robot safety standard ISO 10218 requires that a risk assessment is performed to identify and assess potential hazards and to configure the robot’s safety functions accordingly. The risk assessment procedure itself is specified by ISO 12100. Nowadays, risk assessments are typically performed on the basis of expert knowledge, experience, and simple tools such as checklists. However, current research aims to develop support tools based on simulation and intelligent expert systems. 

 

Although a proper risk assessment is important, it is not the only challenge. HRC is also very demanding with regard to the components and communication channels that are used. Safety-critical robot functions (e.g. measuring the human-robot distance and transmitting that information to the robot) must fulfil the safety requirements expressed by Performance Level (PL) d according to ISO 13849 or Safety Integrity Level (SIL) 2 according to IEC 62061. This requirement significantly reduces the choice of components and communication channels and increases costs. System designers need to consider carefully which functions are safety-critical and which are not. To avoid errors and keep costs low, safety-critical functions should, if possible, be separated and implemented locally. When this is not possible and a network has to be used to transmit safety-critical signals, users should assess carefully whether the network infrastructure can fulfil the strict safety requirements. 

  

Safety-rated components do not necessarily make a safe system. Safety is a system-wide property. ©KIT 

  

Collaborative robots and artificial intelligence: A good combination? 

  

At first glance, artificial intelligence (AI) and HRC are a perfect combination: Machine Learning can enable robots to adapt to their human counterparts or to changes in the production environment. But again, one must consider the safety challenge: In current industrial practice, programs are typically hard-coded on programmable logic controllers (PLCs). In contrast, AI adapts its behavior by learning from data. Thus, AI-driven components might act in an unforeseen way. This makes it very hard to provide safety guarantees as required by the aforementioned safety standards. AI-based systems with safety guarantees are an active research issue. Although there are promising approaches, it will probably take quite a while to deploy “safe” AI in a real-world industrial environment. 

 

Thus, in the short to medium term, the most practical solution will be to deploy AI only in non-safety-critical robot functions. If an AI system nevertheless requires access to safety-critical robot functions (such as motion control), there should be another, non-AI-based component that supervises AI decisions with respect to certain boundaries (e.g. workspace or velocity limits). 
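Such a supervising component can be sketched as a simple clamp on AI-generated motion commands. The command layout, workspace format, and limits below are hypothetical; the point is only that a deterministic, non-AI component enforces the boundaries.

```python
def supervise(command, workspace, v_max):
    """Clamp an AI-generated motion command to safe bounds (sketch).

    command   -- {"target": (x, y, z), "speed": float} (hypothetical layout)
    workspace -- ((x_min, x_max), (y_min, y_max), (z_min, z_max))
    v_max     -- velocity limit enforced regardless of the AI's request
    """
    safe = dict(command)
    # Cap the requested speed at the configured limit.
    safe["speed"] = min(command["speed"], v_max)
    # Clamp each target coordinate into the allowed workspace.
    safe["target"] = tuple(min(max(c, lo), hi)
                           for c, (lo, hi) in zip(command["target"], workspace))
    return safe

# The AI requests a fast move far outside the allowed workspace:
cmd = {"target": (5.0, 0.2, 0.5), "speed": 3.0}
limits = ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0))
print(supervise(cmd, limits, v_max=0.5))
# {'target': (1.0, 0.2, 0.5), 'speed': 0.5}
```

Because the supervisor is plain, verifiable code, it can be certified against the safety standards mentioned above even though the AI component it guards cannot.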

 

How can FabOS support the deployment of collaborative robots in a production environment? 

 

As we have made clear in this article, safety is a crucial challenge for HRC applications in a production environment, especially when AI components are involved. Planning and risk assessment are time-intensive, the costs of safety-rated components are relatively high, and the choice of components must be considered carefully. 

 

Motivated by this challenge, FabOS conducts several research activities related to the safety of AI-based systems, including HRC applications. The FabOS Safety Supervisor will provide a generic component to supervise AI-driven systems at runtime, which - among other benefits - is expected to simplify the deployment of collaborative robots with AI-components. 

 

Furthermore, FabOS will investigate the integration of so-called “Conditional Safety Certificates”[1] (ConSerts). The use of ConSerts will simplify the process of checking whether the current safety configuration of a production system is valid and the used components fulfil the relevant safety requirements. After exchanging or modifying a component that is part of the safety configuration, for instance, ConSerts help to determine whether the system still fulfills the original safety requirements. More details about Safety-Supervisor and ConSerts can be found in our blog article about functional safety. 

 

Finally, FabOS will also simplify collaborative robot use by providing interfaces to various simulation tools. These interfaces are beneficial because an increasing number of HRC applications are planned and tested in simulation. The integration of simulation tools in FabOS is expected to facilitate the building of digital twins, simulation-based risk assessment, and virtual commissioning of HRC applications. 

 

[1] Schneider, Daniel, and Mario Trapp. "Engineering conditional safety certificates for open adaptive systems." IFAC Proceedings Volumes 46.22 (2013): 139-144. 

 

Author: Patrick Schlosser

Firma: KIT

Created by l.demes94 11.05.2022 10:29.

Modified by l.demes94 11.05.2022 10:32.

Short Summary

Machine Learning and simulations are key technologies for Industrie 4.0. How can they be combined to provide even more benefit? This article illustrates different use cases for the combination of both technologies. 

 

Article

1. Introduction

Simulation techniques have a long tradition in manufacturing and are also a key to success for Industrie 4.0 [1]. Exemplary applications are the prediction of process properties or the simulation of distributed supply chains. With the recent advances in machine learning and computing power, it is natural to ask how simulations can be improved by machine learning. 

 

Machine learning and simulation have a similar goal: to predict the behavior of a system (e.g. its energy consumption) with the help of data analysis and physical modeling. However, the approaches to achieve that goal are quite different. The most widely used approach in the simulation community is the manual creation of behavior models using mathematical-physical modeling, i.e. a theoretical modeling of the subsystems and processes involved. Even when using preconfigured libraries, model building for complex systems can be very time-consuming, depending on the desired level of detail. The greater the model accuracy, the higher the demands on the modeler’s prior knowledge. The modeler must consider the physical interactions of the system and its environment (e.g. other systems or subsystems) in order to identify the relevant influences on the system properties. 

 

Another possibility is an experimental approach, where the input and output variables of a system are evaluated on a test bench and the captured data is used to parametrize mathematical equations describing the behavior of the system. 

 

For both approaches (theoretical and experimental model building), a human expert has to choose, based on their domain knowledge, the appropriate equations (e.g. differential or algebraic equations) that are suitable for the problem at hand. 
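A tiny example of the experimental approach: assuming the expert has chosen an affine equation y = a·x + b for the system, its parameters can be estimated from test-bench measurements by least squares. The data below is fictitious.

```python
def fit_affine(xs, ys):
    """Least-squares fit of y = a*x + b from test-bench measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for the slope and intercept.
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Fictitious test-bench data: power drawn by a drive vs. load torque.
torque = [0.0, 1.0, 2.0, 3.0]
power = [10.0, 12.0, 14.0, 16.0]
a, b = fit_affine(torque, power)
print(a, b)  # 2.0 10.0 (the fit is exact for this data)
```

Note that the structure of the equation still comes from the expert; only its parameters come from the data, which is exactly the division of labor described above.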

 

In contrast, the machine learning approach is based on choosing an appropriate inference machine (such as a neural network) and providing enough training data to train the underlying model. The following table depicts a qualitative comparison of the advantages and disadvantages of these approaches to behavior modeling, in the application domain of virtual validation of autonomous and highly automated driving functions. 

 

| Aspect | Theoretical | Experimental | DL-Based |
| --- | --- | --- | --- |
| Domain-Knowledge | high | middle | low |
| Manual effort for model creation | high | middle | low |
| Amount of Empirical Data | - | middle | high |
| Model preciseness | high | middle | middle |
| Execution Speed of Model | low | high | high |
| IP Protection | low | middle | high |
| Effort for automation | high | low | low |
| Effort for model training | - | middle | high |


As the table shows, simulation and machine learning approaches each have pros and cons, and hence it is natural to consider cases where both can be combined. In this article we therefore discuss three generic use cases with potential applications that combine simulation and machine learning. 

 

2. Combining Machine Learning & Simulation 

2.1 Integrating Machine Learning Models into Simulations 

World energy consumption has continued to increase in recent years. As a major consumer, industry has accounted for about one third of global energy use over the last few decades. In the presence of renewable energies, it is beneficial to shift energy-intensive production processes to times when photovoltaics and wind turbines provide enough energy. Hence, energy analysis and optimization are essential topics within a sustainable manufacturing strategy. 

 

However, as we’ve outlined in the introduction, predicting the (physical) behavior of individual systems with physical simulation models is already challenging, and giving precise forecasts for whole factories is a notoriously difficult and time-consuming task. Hence, methods based purely on statistics have often been the only means to achieve that goal. For energy forecasts, time series analysis with approaches like SARIMA (Seasonal Autoregressive Integrated Moving Average) is an established method that is already used in industry. Nonetheless, as the name suggests, autoregression assumes that previous observations in the time series provide a good estimate of future observations. This might be appropriate in a traditional manufacturing environment that produces the same product (with slight modifications) all the time; in the context of Industrie 4.0, with changeable production and customized products, it seems inappropriate. Moreover, a naive approach using machine learning alone might also be inappropriate: on the one hand, a whole factory simply has too many parameters to be learned efficiently, even with the newest advances in deep neural networks. So a careful pre-selection of the parameters that need to be learned has to be done by a human expert. Even then, it will be hard to provide enough training data given the changeable production. As a consequence, the machine learning algorithm might suffer from over-fitting: it is very well suited to predict the energy consumption of the products it has seen before, but it will perform badly on new products. 
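To make the autoregression assumption tangible, here is a minimal AR(1) forecast, the simplest relative of SARIMA-style models: the next value is predicted purely from the previous one via a single learned coefficient. This is a didactic sketch, not a production forecasting method.

```python
def ar1_forecast(series, steps=1):
    """Minimal AR(1) forecast: the next value is a learned multiple of the
    previous one (didactic stand-in for SARIMA-style autoregression)."""
    pairs = list(zip(series, series[1:]))
    # Least-squares estimate of the single autoregressive coefficient.
    phi = sum(a * b for a, b in pairs) / sum(a * a for a, _ in pairs)
    forecast, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        forecast.append(last)
    return forecast

# Works well while consumption keeps behaving like the past:
print(ar1_forecast([1.0, 2.0, 4.0, 8.0], steps=2))  # [16.0, 32.0]
```

The forecast extrapolates whatever pattern the history showed, which is precisely why it breaks down once the product mix, and thus the consumption pattern, changes.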

 

Hence, neither simulation nor machine learning alone will be sufficient for accurate energy estimation of Industrie 4.0 plants. However, when we combine both approaches, we might achieve a highly accurate simulation model for the energy consumption of a factory. 

 

Clearly, the first step of such a procedure is data acquisition. With FabOS and the asset administration shell, this data acquisition becomes feasible, as outlined in Figure 1 in a simplified manner. Here, the MES controls the production in the plant by invoking services (e.g. drilling) which are provided via the asset administration shell. Moreover, relevant properties of the device, such as its energy consumption, are also provided by the device via the asset administration shell to the MES as feedback. The executed services, together with the energy consumption of the device, can be collected by a data acquisition system and stored, e.g., in a time series database. The acquired data can afterwards be used to train a machine learning algorithm to generate an energy model, e.g. as a neural network. The crux of this approach is that we estimate the energy consumption based on the executed services, which provides a much more fine-grained energy model than evaluating the energy consumption of the plant as a whole. 

 

Hence, the “device energy simulation twin” has to offer the same service interface as the original device and, based on the invoked service, predicts the current energy consumption. 

 

Note that this encapsulated machine learning model will not suffer from overfitting in the presence of new products, as we can now estimate the energy consumption of a new product based on the recipe (i.e. the service invocations) used to actually produce it.[1]
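A minimal sketch of such a service-based energy model: a simple per-service mean stands in for the trained neural network, and the `invoke` method mimics the idea that the twin offers the same service interface as the device. Class and method names are illustrative assumptions.

```python
from collections import defaultdict

class DeviceEnergyTwin:
    """Sketch of a 'device energy simulation twin' (illustrative API).

    Trained from (service, energy) pairs logged via the asset administration
    shell; a per-service mean stands in for the neural network mentioned in
    the text.
    """

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def train(self, log):
        """log: iterable of (service_name, measured_energy_kwh) pairs."""
        for service, energy in log:
            self._totals[service] += energy
            self._counts[service] += 1

    def invoke(self, service):
        """Predict the energy consumed by invoking a service."""
        return self._totals[service] / self._counts[service]

twin = DeviceEnergyTwin()
twin.train([("drill", 0.5), ("drill", 0.7), ("mill", 1.2)])
print(twin.invoke("drill"))  # 0.6 (mean of the logged drill invocations)
```

Because the model is keyed by service, the energy of a new product can be estimated by summing the predictions for the services in its recipe, which is the overfitting argument made above.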

 

Figure 1: Generating an energy simulation twin 

 

Using these simulation twins, we can build up a holistic system-of-systems simulation for our Industrie 4.0 factory, as depicted in Figure 2: 

 

Figure 2: Simulating the Energy Consumption of a Factory 

 

In that setting, the real ERP system is replaced by a data generator that triggers the production process by issuing production orders to the simulated twin of the MES system, which controls the simulated production process. Here, our machine-learned energy models come into play, reporting the predicted energy consumption based on the executed services. 

 

2.2 Machine Learning of existing Simulation Models  

Another branch where simulation and machine learning can be combined is the learning of existing behavior simulation models (sometimes also called learning of white-box models). As discussed in the introduction, behavior models are often created manually by means of mathematical-physical modeling (theoretical modeling). This can lead to highly accurate, but also highly complex models with long simulation runtimes. For example, the simulation of Dynamic Random Access Memories (DRAMs) requires highly accurate models due to the complex timing and power behavior of DRAMs. However, cycle-accurate DRAM models often become the bottleneck for the overall simulation time. With machine learning, significant simulation acceleration can be achieved with little loss of model accuracy, as discussed in [2], where neural networks are used to speed up DRAM simulations. Another field where machine learning is applied for simulation speedup is the simulation of physics. For example, simulators for particle physics describe the low-level interactions of particles with matter, which are very computationally intensive and consume a significant amount of simulation time. As discussed in the survey article [3], generative adversarial networks, among others, have been successfully applied for simulation acceleration. Further examples are given in [4], where different machine learning models are trained to predict the flow over an airfoil using data from a large-scale computational fluid dynamics (CFD) simulation. Another use case in that paper is the reduction of a high-fidelity finite element model to a low-dimensional machine learning model. The machine learning models used are neural networks, polynomial linear regression, k-nearest neighbors (kNN), and decision trees. All of them compute the result much faster than the compute-intensive fluid dynamics simulation, but at the price of sometimes very inaccurate results. 
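The surrogate idea can be sketched in a few lines: input/output pairs precomputed with the expensive simulator are reused by a k-nearest-neighbor model that answers new queries quickly. The data and distance metric are illustrative; real surrogates as in [4] are trained on large CFD datasets.

```python
def knn_surrogate(samples, query, k=3):
    """k-nearest-neighbor surrogate for an expensive simulation (sketch).

    samples -- (input_vector, simulated_output) pairs precomputed with the
               slow, high-fidelity simulator
    query   -- new input for which a fast approximation is needed
    """
    def dist(a, b):
        # Euclidean distance between input vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    return sum(output for _, output in nearest) / k

# Pretend these pairs came from a CFD run: input = (angle_of_attack, speed).
samples = [((0.0, 1.0), 0.1), ((1.0, 1.0), 0.3),
           ((2.0, 1.0), 0.5), ((3.0, 1.0), 0.7)]
print(knn_surrogate(samples, (1.5, 1.0), k=2))  # averages the two closest
```

The trade-off from the paper is visible even here: the lookup is essentially free compared to rerunning the simulator, but the answer is only as good as the coverage of the precomputed samples.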

  

Besides simulation acceleration, the transformation into a neural network additionally results in intellectual property (IP) protection. This is gaining importance due to the increasing number of behavioral models exchanged between companies, especially in the automotive sector. 

 

3. Summary 
In this article, we discussed the improvement of simulations using machine learning techniques by means of generic use cases. In FabOS, we will evaluate these generic use cases in concrete applications. Additionally, we will provide the necessary components and interfaces to support the FabOS user in applying the aforementioned techniques. Obviously, there are more conceivable ways to combine simulation and machine learning. For example, given the large number of simulations used in the manufacturing domain, it is desirable to support the selection of proper simulation tools or the parametrization of simulations using machine learning techniques. Another topic that is currently getting a lot of attention is the validation of machine learning components using simulation techniques. However, discussing both topics in detail is beyond the scope of this article. 

 

[1] Gunal, Murat M. Simulation for Industry 4.0. Basel, Switzerland: Springer Nature Switzerland AG, 2019. 

[2]  Feldmann, Johannes, et al. "Fast and accurate dram simulation: Can we further accelerate it?." 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2020. 

[3]  Guest, Dan, Kyle Cranmer, and Daniel Whiteson. "Deep learning and its application to LHC physics." Annual Review of Nuclear and Particle Science 68 (2018): 161-181. 

[4] Swischuk, Renee, et al. "Projection-based model reduction: Formulations for physics-based machine learning." Computers & Fluids 179 (2019): 704-717. 

 

[1] Obviously, depending on the device under investigation and the needed accuracy even the service alone might not be sufficient. However, the important thing here is that all needed information will be available via the asset administration shell and we can acquire the appropriate training data for machine learning. 

 

Author: Frank Schnicke, Andreas Morgenstern, Oliver Bleisinger, Florian Balduf

Firma: Fraunhofer IESE

 

 

Created by l.demes94 27.04.2022 09:12.

Modified by l.demes94 27.04.2022 09:16.

Article: One Stop Shop: From initial information until use in the factory 

FabOS is a highly complex, versatile operating system for production consisting of numerous individual components. The One Stop Shop supports interested parties with the configuration and offers them orientation, e.g. for the following questions:

  • Which components are the right ones to cover my use case?
  • Are all components compatible?
  • Which requirements have to be met?
  • With which configuration can my production be optimized?
  • How does it all get into my factory?

Central point  

 

The One Stop Shop is the central point of contact: from initial information to commissioning and ongoing optimization of the system. In the first step, users enter information about their production processes, their factory, and their goals. The configurator matches these with the software and hardware catalogue in which all FabOS components are listed. From this, the One Stop Shop creates an individual proposal for a FabOS overall system, considering the compatibility and the circumstances in the respective production.
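The matching step could look roughly like the following sketch: use-case requirements are matched against the component catalogue, and only mutually compatible components end up in the proposal. The catalogue schema and component names are invented for illustration; the real configurator is far more elaborate.

```python
def propose_configuration(requirements, catalogue):
    """Match use-case requirements against a component catalogue and keep
    only mutually compatible components (greedy, illustrative sketch)."""
    # Keep components that cover at least one stated requirement.
    candidates = [c for c in catalogue if requirements & set(c["covers"])]
    proposal = []
    for comp in candidates:
        # Every kept component must be compatible with all earlier picks.
        if all(p["name"] in comp["compatible_with"]
               or comp["name"] in p["compatible_with"] for p in proposal):
            proposal.append(comp)
    return [c["name"] for c in proposal]

catalogue = [
    {"name": "vision-service", "covers": ["quality-control"],
     "compatible_with": ["edge-runtime"]},
    {"name": "edge-runtime", "covers": ["deployment"],
     "compatible_with": ["vision-service"]},
]
print(propose_configuration({"quality-control", "deployment"}, catalogue))
# ['vision-service', 'edge-runtime']
```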

The entire purchase process is mapped via the shop – from product selection and configuration to the shopping cart and the actual ordering and payment process. The delivery and commissioning are carried out both by the shop and by the FabOS partners.

 

One Stop Shop enables continuous improvement of one's own production 

On a voluntary basis, FabOS users can share and visualize data with each other and with the One Stop Shop. This offers numerous opportunities for optimizing their own production. In addition, the shared data can continuously improve both the configurator and their own FabOS: for example, the configurator suggests suitable new services. This mechanism opens up new potential for users of FabOS, which can be exploited with the help of suitable FabOS components. The development partners of FabOS recognize new needs based on the data released to them and develop new software services in a targeted manner.

 

FabOS-ecos - More than just a shop 

FabOS-ecos, the companion platform where the One Stop Shop resides, includes:

  • Configuration
  • Handling of the purchase process
  • Continuous implementation
  • Optimization
  • Blog
  • Catalog of all partners in the consortium with the respective core competencies that they bring to the project

Classification into the overall FabOS project

 

The One Stop Shop is the central information and sales channel for the result of the overall project. It brings the results of the research project to industry and helps to make them usable in practice.

The One Stop Shop offers its users low-threshold access to FabOS – in-depth IT knowledge is not required. In addition, it is able to individually optimize the configuration of the operating system for production based on data.

 

Article: FabOS-ecos: Services for users, software and AI-provider 

FabOS-ecos is a comprehensive portal with services for users of FabOS, but also for the providers of software, AI components, and FabOS itself. It enables networking, the development of a community, and more.  

 

innoecos as base for FabOS-ecos  

FabOS-ecos is based on software that has already been used successfully in numerous clusters and funding projects in industry: innoecos. The flexible collaboration platform is particularly characterized by the fact that it enables groups to work together and network. Thanks to a consistent and comprehensive role and rights concept, it offers the possibility of secure and differentiated sharing of data and information. In addition, the platform can be adapted to customer requirements and various use cases.  

 

Optimized for IIoT-applications  

innoecos has been further developed for IIoT applications and enables communication between humans and machines as well as between machines, without restricting the basic functions mentioned above, such as data security and the role and rights concept. This opens up valuable opportunities for FabOS users to optimize their own business processes.  

 

Use Case Inline-quality control  

If the systems of a production are operated with FabOS, the data collected by the systems about tools and workpieces can initially simply be transferred to our secure, independent IIoT cloud via an interface. The physical tools and workpieces receive their digital counterpart, the so-called digital twin (administration shell). This data record can be continuously filled with data over the life cycle of the tool or workpiece (e.g. geometry data, process data, type data, etc.). Ideally, this happens not just in a single company, but across all parties involved in a value chain, for example the manufacturer of a tool and the user who uses the tool to manufacture their own products.  

In this example, both parties would first collect data, feed it into the cloud and decide individually which of the data the other party is allowed to use for its own further processing. This way, the company's know-how remains protected and each partner retains sovereignty of their own data. 
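This per-party data sovereignty can be pictured as fields of a digital-twin record, each annotated with the parties allowed to read it. The schema below is a deliberately simplified illustration, not the asset administration shell data model.

```python
def shared_view(record, viewer):
    """Return only the fields of a digital-twin record that the given party
    is allowed to read (deliberately simplified schema)."""
    return {field: value
            for field, (value, readers) in record.items()
            if viewer in readers}

tool_twin = {
    "geometry":     ("d=8mm, l=80mm", ["manufacturer", "user"]),
    "process_data": ("12000 rpm avg", ["user"]),  # the user keeps this private
    "wear_level":   ("23 %",          ["manufacturer", "user"]),
}
print(shared_view(tool_twin, "manufacturer"))
# {'geometry': 'd=8mm, l=80mm', 'wear_level': '23 %'}
```

Each party decides per field what the other may read, so know-how stays protected while shared fields still feed cross-company analyses.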

 

Analyse data – improve processes  

The existing data can then be analyzed using machine learning and AI components, which enables a large number of optimizations in planning and production, but also, e.g., in customer service and maintenance. For example, a tool manufacturer would now find out what exactly their customers do with the tools and could use this to improve their own products or even implement a new business model such as pay-per-use. The user of the tools could check their quality directly on the basis of the data and gain knowledge about how long the tools can actually be used. Thus, maintenance of the systems could be planned more precisely. If an end customer later has a complaint, the data would immediately provide information on whether a product left the factory in perfect condition or not.  

 

This is an overview of the advantages:  

  • Increase of productivity  
  • More efficient use of resources and personnel as well as improvement of product development and production  
  • Strengthening the customer service and reducing the effort in quality assurance  
  • Access to data along the entire value chain or from the value network 
  • Access to data from application technology  
  • Simplified data exchange thanks to the use of a standardized interface across all companies involved  

 

Usage of FabOS as entry point to IIoT  

With FabOS, users become part of a growing community and create the conditions to benefit from the possibilities of the Industrial Internet of Things. FabOS-ecos as a platform paves the way for this.  

 

If you are further interested in IIoT applications, then find out more here.  

 

Author: Theresa Höhn

Firma: inno-focus digital gmbh

Created by l.demes94 13.04.2022 10:13.

Modified by t.hoehn2 29.04.2022 13:46.

Short Summary

This article compares popular workflow orchestration frameworks, based on their way of defining workflows, supported programming languages, available toolbox and scalability. 

 

Article

Introduction

A workflow is loosely defined as an organized pattern of activities. There are many related terminologies, depending on the domain the workflow is applied to. In the following, the elements of a workflow will be called tasks. An example workflow is presented in the following figure. 

This figure shows a workflow for packaging and shipping a product, or recycling said product, based on the results of a measurement (and the subsequent processing of the measurement data). Some of these tasks are carried out by automated services, others manually by humans. Such workflows (sometimes also called processes in other domains) are necessary in order to have a standardized approach to decision making in an organization. 
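Expressed as code, such a workflow is just a sequence of tasks with a decision branch. The task names and the tolerance threshold are illustrative assumptions based on the figure.

```python
def measure(product):                     # automated task
    return product["deviation_mm"]

def within_tolerance(deviation):          # processing of measurement data
    return deviation <= 0.05              # hypothetical tolerance

def package_and_ship(product):            # could also be a manual task
    return "shipped"

def recycle(product):
    return "recycled"

def run_workflow(product):
    """Measure the product, then branch to shipping or recycling."""
    deviation = measure(product)
    if within_tolerance(deviation):
        return package_and_ship(product)
    return recycle(product)

print(run_workflow({"deviation_mm": 0.02}))  # shipped
print(run_workflow({"deviation_mm": 0.30}))  # recycled
```

The interesting question for the rest of the article is not the tasks themselves, but where this branching logic lives: in one central component or spread across many.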

 

Orchestration vs. Choreography

If we consider automated workflows implemented by software components, workflow orchestration and workflow choreography are two patterns which aim to assure the interaction of components and build a workflow, i.e., a sequence of tasks implemented as software components that accomplishes a given goal. Orchestration is a centralized approach in which one entity orchestrates the interaction between the components. The central orchestrator manages the interaction between the components explicitly. The components implement logic which can be seen as one working step, or task, in the workflow, but they only interact with the orchestrator, not with each other. In this case, in order to make changes to the workflow, only the logic of the orchestrator has to be changed. Orchestration is represented below. 

 

 

Choreography is a decentralized approach in which the components of a system implement not only the working steps, or tasks, but also the logic which defines the workflow. The components interact with each other directly, and parts of the workflow are scattered implicitly across the components. Choreography is represented below.

 

 

Orchestration and choreography are the two patterns used in decentralized (container-based) microservice architectures to ensure modularity and ease of extensibility. Choreography is not necessarily an explicit design decision: it is currently the most common communication pattern between microservices and is in some cases adopted as the default solution without an explicit design decision. 

Both orchestration and choreography have their advantages and disadvantages. The central orchestrator allows an a priori overview of the workflow (in many cases even a graphical one), and changes to the workflow need to be made only in this central place. On the other hand, the orchestrator can be a single point of failure, and this approach can lead to more communication overhead. Choreography allows for an easier extension of functionality, as new components can be added to the system with only localized changes (or in some cases no changes at all). These systems tend to have less overhead and do not rely on a central single point of failure, although using a message broker somewhat erodes this advantage. There are tools which can create a graphical overview of a choreographed workflow, but only a posteriori, reconstructed from traces or logs. 

Choreography can be found in many modern microservice infrastructures. The decentralized nature of choreographed systems correlates well with the autonomous-small-teams approach to microservice development; besides the technical characteristics listed above, this may be one reason for its popularity. Orchestration, on the other hand, is the basis of most DevOps systems, as well as of other xOps approaches such as MLOps, DataOps and AIOps. These are all essentially workflow orchestration systems with pipelines specialized for machine learning, data science and AI related tasks. In the following, general-purpose orchestration tools are analyzed (even if some, like Argo, have their roots in *Ops). Specialized orchestrators for MLOps (e.g., Kubeflow) and DevOps (e.g., GitHub Actions) are not considered in this section. 

In practice, choreography in a microservices architecture in most cases relies on some type of publish-subscribe communication pattern. As this is a decentralized approach, it is inherently characterized by the lack of an explicit workflow management framework or a central software component enabling the pattern, and it is therefore hard to compare solutions and implementations. Choreographed microservices in most cases use a messaging system (such as MQTT, Redis Pub/Sub, AMQP, ZeroMQ, ROS or Kafka) to communicate with each other. 
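The choreographed variant of the packaging/recycling example can be sketched as follows. A hypothetical in-process publish-subscribe mechanism stands in for a real broker such as MQTT or Kafka; all topic and service names are made up.

```python
# Minimal sketch of the choreography pattern: components subscribe to
# topics and react to each other's events; no single component sees
# the whole workflow. A dict stands in for a real message broker.

subscribers = {}  # topic -> list of handler functions

def subscribe(topic, handler):
    subscribers.setdefault(topic, []).append(handler)

def publish(topic, payload):
    for handler in subscribers.get(topic, []):
        handler(payload)

log = []

# Each component carries its own implicit slice of the workflow logic.
def measurement_service(product):
    ok = product["weight"] <= 100  # toy quality check
    publish("measurement.ok" if ok else "measurement.failed", product)

def packaging_service(product):
    log.append("packaged")

def recycling_service(product):
    log.append("recycled")

subscribe("product.arrived", measurement_service)
subscribe("measurement.ok", packaging_service)
subscribe("measurement.failed", recycling_service)

publish("product.arrived", {"weight": 80})
print(log)  # -> ['packaged']
```

Here, changing the workflow means touching several components, which is exactly the trade-off against orchestration described above.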

Orchestration has one distinctive central component, the orchestrator, which explicitly manages the workflow. Examples of workflow orchestration systems are Apache Airflow, Argo, Uber Cadence, Camunda BPM, Netflix Conductor, Lyft Flyte, Apache NiFi and Camunda Zeebe. Since there is an explicit workflow definition in this case, the way this workflow is described matters. (It is interesting to note that companies famous for their microservice-based approach, like Uber and Netflix, have chosen to develop orchestrators for their microservices.) Some orchestrators, like Camunda, NiFi and Zeebe, use a visual programming language (VPL) to define the workflow (these visual languages are, of course, serializable). Some of these, like Camunda and Zeebe, use a standardized language, in this case BPMN; others, like NiFi, use a proprietary VPL. Orchestrators which do not use a VPL define workflows in a programming language, like Python (e.g., Airflow, Flyte) or Java (e.g., Cadence, Flyte), or in a JSON/YAML DSL (e.g., Argo, Conductor). 

In terms of horizontal scalability, most orchestrators support a (micro)services approach for scaling, but there is a difference in the level at which this happens. At one end of the spectrum is Apache NiFi, which distributes at the workflow-instance level: the orchestrator does not make network calls directly (the implemented tasks might, but not the orchestrator), and scaling happens per workflow instance. At the other end of the spectrum are Zeebe, Conductor and Cadence, which distribute at the task level: the orchestrator only makes network calls, and each task is expected to be implemented independently of the orchestrator. Scaling happens both at the task level, as many workers can register to complete the same task, and at the orchestration level; Zeebe, for example, supports a built-in load-balanced scaling model at the orchestrator level. 

 

Workflow types: Pipeline vs. DAG vs. Generic Flow

In a strict sense, a pipeline can be considered the simplest form of a workflow, in which the components are chained together in a sequence, without any option for bifurcation. Pipelines are popular in DevOps, as in many cases a pipeline is expressive enough to describe the intended workflow: if something goes wrong in a DevOps pipeline, notifying and stopping the workflow instance is the only reasonable action. (Some workflow engines which call their workflow a pipeline, e.g. Azure DevOps, are more permissive with the pipeline definition and allow some bifurcations of the pipeline.) 

A more complex approach to workflow definition than a simple pipeline is a DAG (Directed Acyclic Graph). Every pipeline can also be expressed as a DAG. This approach models the workflow as a directed graph which, however, is not allowed to include cycles. This is the main limitation of DAGs: they are very popular for stream processing, where cycles are not needed, but cannot express complex workflows which include them. 
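The acyclicity constraint is what makes DAG scheduling tractable: tasks can be ordered by a topological sort, which fails exactly when a cycle is present. This can be sketched in Python with Kahn's algorithm (a hypothetical helper, not taken from any of the engines discussed here):

```python
# Why DAG-based engines reject cycles: a workflow can only be
# scheduled by topologically sorting its graph, and a topological
# order exists if and only if the graph is acyclic.
from collections import deque

def topological_order(edges):
    """Return a valid execution order for (src, dst) task edges."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    queue.append(dst)
    if len(order) != len(nodes):  # leftover nodes -> cycle
        raise ValueError("cycle detected: not a DAG")
    return order

# A pipeline is just the simplest DAG: a -> b -> c
print(topological_order([("a", "b"), ("b", "c")]))  # -> ['a', 'b', 'c']
```

A workflow with a rework loop (b back to a) would make `topological_order` raise, which is why such workflows need the more general models discussed next.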

As there is no special naming convention for workflows which include cycles, we call them generic workflows in this article. A generic workflow can describe everything a pipeline or a DAG can describe, and may additionally include cycles in its definition. BPMN (Business Process Model and Notation) is a standard for defining workflows (or processes, in BPMN terminology) in a visual manner which can be represented as an XML file. BPMN workflows can include cycles. 
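As a sketch of what such a generic workflow looks like in BPMN's XML serialization, the following abridged example models a rework loop, i.e. a cycle. The element ids and names are made up, and the diagram (BPMNDI) section and several attributes that real tooling emits are omitted.

```xml
<!-- Abridged BPMN 2.0 XML sketch of a process containing a cycle. -->
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             targetNamespace="http://example.org/bpmn">
  <process id="reworkProcess">
    <startEvent id="start"/>
    <task id="measure" name="Measure product"/>
    <exclusiveGateway id="check" name="Within tolerance?"/>
    <task id="rework" name="Rework"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start"   targetRef="measure"/>
    <sequenceFlow id="f2" sourceRef="measure" targetRef="check"/>
    <sequenceFlow id="f3" sourceRef="check"   targetRef="end"/>
    <sequenceFlow id="f4" sourceRef="check"   targetRef="rework"/>
    <sequenceFlow id="f5" sourceRef="rework"  targetRef="measure"/>
  </process>
</definitions>
```

The flow f5 back to the measurement task is exactly the kind of edge a DAG-based engine would reject.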

 

Activiti 

Activiti is a “light-weight workflow and Business Process Management (BPM) Platform”. The project website can be found here 

 

Workflow definition 

The workflows are defined in BPMN, according to the BPMN standard, in XML format. There is no special tool provided for Activiti to define, view or edit a workflow visually (Activiti Designer has been deprecated).

Pre-defined Toolbox of tasks 

There is no predefined toolbox of ready-made tasks which can be used, other than the gateways defined in BPMN (OR, XOR, PARALLEL, …). 

Task Implementation 

In Activiti, tasks are implemented directly in Java and invoked as Java function calls from the BPMN engine instance. 

Microservices compatibility 

Activiti itself can be packaged as a microservice. However, as task implementations have to be done in Java, Activiti tasks are not out-of-the-box compatible with microservices. One could call different microservices from the Java code implementing the task. 

License 

Apache-2.0 License

 

Apache Airflow 

Apache Airflow is a platform to “programmatically author, schedule and monitor workflows”. The project website can be found here 

 

Workflow definition 

The workflows are defined as a DAG using a Python SDK or a REST API. There is no visual editor of the DAG available. 
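To illustrate, a minimal Airflow DAG definition might look roughly as follows. This is a sketch assuming Airflow 2.x is installed; the DAG id, task ids and commands are made up.

```python
# Sketch of an Airflow DAG definition file (Airflow 2.x style).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def transform():
    print("transforming data")  # placeholder task logic

with DAG(
    dag_id="example_etl",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # run only when triggered manually
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    process = PythonOperator(task_id="transform", python_callable=transform)
    extract >> process  # DAG edge: extract runs before transform
```

The `>>` operator is Airflow's way of declaring edges of the DAG in plain Python code.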

Pre-defined Toolbox of tasks 

There is a large selection of predefined tasks (instantiated using Operators or Sensors in Airflow) which deal with various jobs like reading/writing from/to databases, reading/writing to streams, applying transformations to the data, etc. 

Task Implementation 

In Airflow, tasks can be implemented as configurations for operators (e.g. the command for a Bash operator, which will be executed as a Bash script) or implemented in Python and invoked as Python function calls from the Airflow executor instance. 

Microservices compatibility 

Airflow itself can be packaged as a microservice. However, as task implementations have to be done in Python, Airflow tasks are not out-of-the-box compatible with microservices. One could call different microservices from the Python code implementing the task. 

License 

Apache-2.0 License 

 

Apache NiFi 

Apache NiFi is not explicitly aimed at workflow orchestration. It is defined as a “system to process and distribute data”. However, looking at its capabilities and features, it comes very close to a workflow engine. The project website can be found here 

 

Workflow definition 

The workflows are defined as a directed graph, called a dataflow, using a graphical user interface and a custom visual programming language. The dataflow does not have to be acyclic; loops are explicitly permitted in NiFi, although their intended purpose appears to be error handling rather than custom logic. 

Pre-defined Toolbox of tasks 

There is a large selection of predefined tasks (called Components or Processors in NiFi) which deal with various jobs like reading/writing from/to databases, reading/writing to streams, applying transformations to the data, etc. 

Task Implementation 

In NiFi, tasks are implemented in Java. The NiFi archive files (.nar files), written according to a detailed specification, along with a metadata description of the task, have to be made available to the NiFi instance on the system path. 

Microservices compatibility 

NiFi itself can be packaged as a microservice. However, as task implementations have to be done in Java, NiFi tasks are not out-of-the-box compatible with microservices. One could call different microservices from the Java code implementing the task. 

License 

Apache 2.0 License 

 

Netflix Conductor 

“Conductor is a workflow orchestration engine that runs in the cloud.” The project website can be found here

 

Workflow definition 

Workflows are defined in a JSON DSL. There is no visual editor for defining workflows and workflows are defined as DAGs. 
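For illustration, an abridged Conductor workflow definition in the JSON DSL could look roughly like this. This is a sketch: the workflow and task names are made up, and further fields such as input/output parameter mappings are omitted (see the Conductor documentation for the full schema).

```json
{
  "name": "example_workflow",
  "version": 1,
  "schemaVersion": 2,
  "tasks": [
    {
      "name": "measure_product",
      "taskReferenceName": "measure",
      "type": "SIMPLE"
    },
    {
      "name": "package_product",
      "taskReferenceName": "package",
      "type": "SIMPLE"
    }
  ]
}
```

Each `SIMPLE` task is completed by an external worker that polls the engine, which is how Conductor decouples task implementations from the orchestrator.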

Pre-defined Toolbox of tasks 

There is no predefined toolbox of ready-made tasks which can be used. 

Task Implementation 

In Conductor, tasks are implemented as Java or Python applications (or over HTTP REST calls). The workers implementing a task are decoupled from the workflow engine itself and communicate with it over HTTP REST. 

Microservices compatibility 

Conductor itself can be packaged as a microservice. Furthermore, as task implementations are meant to be decoupled from the workflow engine, it is out-of-the-box compatible with microservices. 

License 

Apache 2.0 License 

 

Node-RED 

Node-RED is not explicitly a workflow orchestration engine; it describes itself as “low-code programming for event-driven applications”. However, looking at its capabilities and features, it comes very close to a workflow engine. The project website can be found here 

 

Workflow definition 

Workflows are defined in a proprietary visual programming language and are stored as JSON files. 

Pre-defined Toolbox of tasks 

There is a large pre-defined toolbox and many community developed extensions. 

Task Implementation 

The tasks in Node-RED are implemented as JavaScript code and run in Node.js. 

Microservices compatibility 

Node-RED itself can be packaged as a microservice. However, as task implementations have to be done in JavaScript, Node-RED tasks are not out-of-the-box compatible with microservices. One could call different microservices from the JavaScript code implementing the task. 

License 

Apache License 2.0

 

Uber Cadence 

“Cadence is a distributed, scalable, durable, and highly available orchestration engine”. The project website can be found here 

 

Workflow definition 

Workflows are defined using a Java SDK. There is no visual editor for defining workflows, and workflows are defined as DAGs. 

Pre-defined Toolbox of tasks 

There is no predefined toolbox of ready-made tasks which can be used. 

Task Implementation 

In Cadence, tasks are implemented as Java or Go applications (Python and .NET support are under development). The workers implementing a task are decoupled from the workflow engine itself and communicate with it over HTTP REST. 

Microservices compatibility 

Cadence itself can be packaged as a microservice. Furthermore, as task implementations are meant to be decoupled from the workflow engine, it is out-of-the-box compatible with microservices. 

License 

MIT License 

 

Argo 

The project website can be found here 

 

Workflow definition 

The workflows are defined as a DAG using YAML configuration files. There is no visual editor for the DAG, but it can be visualized at runtime. Special features of the Argo DAG are the option to iterate over a list of items as a loop and the support for recursion (although the workflow is explicitly called a DAG).
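An abridged Argo Workflow manifest might look roughly like this. This is a sketch: the names, image and commands are made up, and real manifests typically carry further fields (parameters, artifacts, etc.).

```yaml
# Abridged sketch of an Argo Workflow defining a two-step DAG.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: example-dag-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: extract
            template: echo
          - name: transform
            dependencies: [extract]   # DAG edge: runs after extract
            template: echo
    - name: echo
      container:
        image: alpine:3.15
        command: [echo, "step done"]
```

Each task runs as its own container, which is how Argo maps workflow steps onto Kubernetes workloads.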

Pre-defined Toolbox of tasks 

There is no predefined toolbox of ready-made tasks which can be used. 

Task Implementation 

In Argo, tasks are implemented as (Kubernetes) services. 

Microservices compatibility 

Argo is by design intended for use in a Kubernetes environment and therefore supports containerization. The tasks (called steps in Argo) are implemented as Kubernetes services, hence Argo is out-of-the-box compatible with tasks implemented as microservices. Interestingly, the input and output of tasks are passed via environment variables and standard input/output. 

License 

Apache 2.0 License 

 

Zeebe 

The project website can be found here 

 

Workflow definition 

The workflows are defined in BPMN, according to the BPMN standard, in XML format. There was a Zeebe Modeler tool for visually editing BPMN diagrams for Zeebe, but it was deprecated and replaced by a more generic BPMN modeler from the same company. 

Pre-defined Toolbox of tasks 

There is no predefined toolbox of ready-made tasks which can be used.

Task Implementation 

In Zeebe, tasks are implemented as Java, Go, .NET or Python applications (or over gRPC calls). The workers implementing a task are decoupled from the workflow engine itself and communicate with it over gRPC. 

Microservices compatibility 

Zeebe itself can be packaged as a microservice. Furthermore, as task implementations are meant to be decoupled from the workflow engine, it is out-of-the-box compatible with microservices. 

License 

Zeebe Community License Version 1.1 

 

Comparison

| Name      | Workflow Definition | Toolbox | Task Implementation                | Tasks as Microservices out of the box | License                      |
|-----------|---------------------|---------|------------------------------------|----------------------------------------|------------------------------|
| Activiti  | BPMN                | No      | Java                               | No                                     | Apache 2.0                   |
| Airflow   | Python              | Yes     | Python or Config                   | No                                     | Apache 2.0                   |
| NiFi      | Proprietary VPL     | Yes     | Java                               | No                                     | Apache 2.0                   |
| Conductor | JSON DSL            | No      | Java                               | Yes                                    | Apache 2.0                   |
| Node-RED  | Proprietary VPL     | Yes     | JavaScript                         | No                                     | Apache 2.0                   |
| Cadence   | Java                | No      | Java or Go                         | Yes                                    | MIT                          |
| Argo      | YAML                | No      | Any Kubernetes service             | Yes                                    | Apache 2.0                   |
| Zeebe     | BPMN                | No      | Java, Python, Go, .NET or any gRPC | Yes                                    | Zeebe Community License 1.1  |

 

Conclusions

Looking at the comparison table, it becomes clear that most workflow orchestration engines are meant for fully automated processes; workflows defined in configuration files are not targeted at generic use cases. It is interesting to observe the low-code movement gaining traction, with Node-RED and NiFi as flagship examples. Visual programming languages, as no-code or low-code platforms, are intended to let non-programming-savvy users achieve similar results as their programming-savvy counterparts. This is where BPMN excels, as one of the few standardized ways to represent generic business processes. It serves as a visual programming language for defining workflows in the case of Activiti and Zeebe; however, in contrast to NiFi and Node-RED, there is no predefined toolbox to interact with other systems. All task implementations have to be custom made, and without them Activiti and Zeebe have very limited applicability. This can also be seen as a cooperation mechanism, in which the roles of the people implementing the services and those defining the workflows are decoupled. Furthermore, the workflows are defined in a standardized way used throughout enterprises everywhere, and the visual representation of the workflow is coupled with the actual workflow which the workflow engine executes. Compared to Activiti, Zeebe stands out by its treatment of microservices as first-class citizens. The drawback of Zeebe is its licensing model, which prohibits commercial exploitation in the form of a workflow orchestration service provider. 

 

Author: Dr.-Ing. Akos Csiszar

Firma: ISW Uni Stuttgart

Created by l.demes94 30.03.2022 08:05.

Modified by l.demes94 30.03.2022 08:12.