Monitoring and Alerting
The Monitoring and Alerting component allows users to collect data, eg KPIs and other data points, from machines, infrastructure, and zApps. It also alerts users and other ZDMP components when a KPI moves outside its defined limits, reducing the impact of crises and losses on smart factories.
The KPIs delivered via the platform’s message bus can be configured to be stored in the Storage component to collect historic data. If historic data is collected, the user can choose between different histograms to decide how the data is presented.
To notify users about potential problems, users can define limits for data points, together with qualifiers (eg energy consumption is ‘larger than’ & ‘100 kWh’), to trigger alerts, ie SMS, emails, push notifications, and calls to HTTP endpoints, when these limits are crossed for the first time. These limits can also be used as goals (for example by Autonomous Computing), where a process can be started if the limit is not reached.
Additional alerts can be sent if no response has been received after a defined time has passed. Receivers should be able to confirm that the problem has been recognized, so the system knows it is already being acted on. If this has not happened once a critical value is passed, the system notifies other receivers. A reset period can also be defined: a duration for which the data point must be back within its regular defined values before the alert can be triggered again.
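The interaction between a limit, its qualifier, the first crossing, and the reset period can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the component's implementation: the class name LimitAlert, the qualifier strings, and the reading of the reset period as "the value must stay within limits for this long before the alert is re-armed" are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical qualifier names mapped to comparison functions.
QUALIFIERS: Dict[str, Callable[[float, float], bool]] = {
    "larger than": operator.gt,
    "smaller than": operator.lt,
}


@dataclass
class LimitAlert:
    qualifier: str                # eg "larger than"
    limit: float                  # eg 100 (kWh)
    reset_period: float           # seconds the value must stay within limits
    armed: bool = True            # True while the alert is allowed to fire
    in_range_since: Optional[float] = None

    def update(self, value: float, now: float) -> bool:
        """Return True if an alert should be sent for this reading."""
        crossed = QUALIFIERS[self.qualifier](value, self.limit)
        if crossed:
            self.in_range_since = None   # any crossing restarts the reset timer
            if self.armed:
                self.armed = False       # fire only on the first crossing
                return True
            return False
        # Value is within limits: start or continue the reset timer.
        if self.in_range_since is None:
            self.in_range_since = now
        if now - self.in_range_since >= self.reset_period:
            self.armed = True            # re-arm after a full reset period
        return False
```

With a limit of ‘larger than’ 100 and a reset period of 60 seconds, the readings (time, value) = (0, 90), (10, 120), (20, 130), (30, 95), (40, 110), (100, 95), (170, 95), (180, 105) produce alerts only at times 10 and 180 in this sketch: the crossings at 20 and 40 are suppressed because the value has not yet stayed within limits for a full reset period.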
|Latest Release||Vs 1.0.0|
|X Open API Spec||Link|
|Generation date of this content||25 June 2023|
The following images are illustrative screenshots of the components:
|Company Name||ZDMP Acronym||Website||Logo|
The following diagram shows the position of this component in the ZDMP architecture.
Figure 20: Position of Component in ZDMP Architecture
Collect various types of data, eg KPIs and other data from machines, sensors, infrastructure, and zApps, by subscribing to topics in the message bus
Create KPIs and track how their values change over time
Store the collected data and KPIs in the Storage component and make them available for analysis
Notify users when KPI values are not within the defined conditions, reducing the impact of failures and crises on smart factories
Create and send alerts to users in the form of emails or notifications
Use Grafana to compose observability dashboards with data from Prometheus
This component offers the following features:
View KPI’s Historic Data
Create Message Template
Send Notification / Send Alert
Allow the user to create KPIs to extract important values from the data transmitted through the Message Bus. The user needs to specify the topics from which the data is extracted.
View KPI’s Historic Data
Allow the user to create Alerts that notify users when KPI values are not within the range expected by quality standards, or in any other situation where the user wants to be notified that the KPI values meet a certain criterion.
Create Message Template
Allow the user to create a rich text message that uses KPI properties, such as the description, inside the template, so that the message carries complete and pertinent real-time information.
Send Notification / Send Alert
Allow the user to send an email directly to one or more users, without needing an alert to do so. This feature is only available in the API and not in the UI, since it is intended for use by other APIs rather than by people.
- 2 CPUs
64GB disk space
Associated ZDMP services
Installation via miniZDMP
Installing the Monitoring and Alerting component on a miniZDMP instance requires some preliminary work. This guide assumes that the installation and setup of the miniZDMP platform has been completed. Instructions on how to do this can be found here (https://gitlab-zdmp.platform.zdmp.eu/enterprise-tier/t6.4-application-run-time/-/tree/master/minizdmp or D087 Platform Integration and Federation).
Before the component can be used, the following must be ready:
T6.4 Service and Message Bus (or RabbitMQ Message Bus)
Access data for a mail server (reference platform mail server is used as default)
- Please log in to your Rancher instance and navigate to the Apps section.
Figure 21 - Rancher Launch an App
Please click on the Launch button to launch a new application on the miniZDMP Kubernetes cluster
Search for the Entry of “Monitoring and Alerting / zdmp-monitoring-and-alerting” and click on it.
Figure 22 - Launch the Monitoring and Alerting App
This step requires a few configuration settings. Please set them according to your needs.
Please set this to LOCAL or NFS. NFS requires an NFS server to be configured in your miniZDMP platform. LOCAL means that the data collected by this component is saved to the local server disk. With NFS, the data is saved to a network shared directory (the recommended option, as it prevents data loss).
Here you can adjust the domain name under which this component will be available. The default is zdmp.home, which is also the default of the miniZDMP platform.
- Private Registry Settings
These settings are necessary to download the container from the ZDMP registry. For the HCE GitLab, please use your own credentials as username and password.
You can choose between a central database (where you need to provide the credentials) or a preconfigured database for this component.
- Backend environment vars
The access data for the mail server and the message bus must be stored here. By default, this data is configured to use the services of the reference platform.
- Environment Vars
Here it is possible to adapt the URLs in order to call up either other components or internal components. In a standard installation of miniZDMP, no changes should be necessary here.
- Please click on the Launch button to start deploying the Monitoring and Alerting component to the miniZDMP cluster.
Installation via Docker-Compose
The Monitoring and Alerting component can be installed via docker-compose. This also requires the credentials of an email server and of the message bus component:
- Download the latest docker-compose file from ZDMP’s GitLab
- Add the environment variable values. Choose how to do this by following the Docker instructions: https://docs.docker.com/compose/environment-variables/.
As an example, create a file named ‘.env’ in the same folder as the docker-compose file, with the following information:
TIME_BETWEEN_NOTIFICATIONS_FROM_SAME_ALERT is a value in milliseconds. When a KPI value has triggered an alert and a later change to the KPI value would trigger the same alert again, a new alert is sent only once TIME_BETWEEN_NOTIFICATIONS_FROM_SAME_ALERT milliseconds have passed since the last alert was sent.
MESSAGE_BUS_SECURE should be set to “true” if the message bus server uses a connection protected via TLS.
- Install and start the component by executing the following command:
docker-compose up -d
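The effect of TIME_BETWEEN_NOTIFICATIONS_FROM_SAME_ALERT described in the steps above can be pictured with the following Python sketch. It is a minimal illustration, not the component's code: the class NotificationThrottle, the alert identifiers, and the 60000 ms fallback value are assumptions made for this example.

```python
import os
from typing import Dict

# Read the interval from the environment. The 60000 ms fallback is an
# assumption for this sketch, not a documented default.
INTERVAL_MS = int(os.environ.get("TIME_BETWEEN_NOTIFICATIONS_FROM_SAME_ALERT", "60000"))


class NotificationThrottle:
    """Suppress repeat notifications from the same alert within an interval."""

    def __init__(self, interval_ms: int = INTERVAL_MS) -> None:
        self.interval_ms = interval_ms
        self._last_sent_ms: Dict[str, int] = {}   # alert id -> last send time

    def should_send(self, alert_id: str, now_ms: int) -> bool:
        """Return True and record the send time if a notification may go out."""
        last = self._last_sent_ms.get(alert_id)
        if last is not None and now_ms - last < self.interval_ms:
            return False      # too soon after the previous notification
        self._last_sent_ms[alert_id] = now_ms
        return True
```

In this sketch the interval is tracked per alert, so a second alert is not delayed by notifications that were sent for the first one.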
How to use
The Monitoring and Alerting component can be used through an API or a user-friendly interface. The URLs differ between the two installation options, so please use the URLs that match your installation method.
API: Please refer to http://localhost:28001/api or https://monitoring-and-alerting-api-zdmp.zdmp.home/api (miniZDMP) for the Swagger instructions on how to use the API. It lists all the requests the component accepts, together with their expected parameters or body content. The API can be accessed at http://localhost:28001/ or https://monitoring-and-alerting-api-zdmp.zdmp.home/ (miniZDMP)
User Interface (UI): Open http://localhost:28002 or https://monitoring-and-alerting-api-zdmp.zdmp.home/ (miniZDMP) to access the user interface.
A KPI references a data value that holds a significant meaning for the user, for example the length of a pencil produced by an automatic machine. As the length of the pencil is one of the key measures of production quality, a KPI can be created for the length of the pencil. To create a KPI, the following is necessary:
Description to identify the KPI
Message Bus topics that should be used to extract the KPI value
Data format expected and the query used to extract the data. The possible data types are JSON and XML
The following example extracts the length of the pencils produced:
Figure 23 – Create new KPI
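To make the idea of extracting a KPI value from a message concrete, the sketch below pulls a value out of a JSON payload and out of an XML payload. The query syntax the component actually accepts is not described here, so the dotted path for JSON, the ElementTree path for XML, the function names, and the payload layout are all assumptions for illustration.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Optional


def extract_json(payload: str, path: str) -> Any:
    """Walk a dotted path such as 'pencil.length' through a JSON document."""
    node: Any = json.loads(payload)
    for key in path.split("."):
        # Numeric path segments index into lists, other segments into objects.
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def extract_xml(payload: str, path: str) -> Optional[str]:
    """Return the text of the first element matching an ElementTree path.

    The path is relative to the root element; None is returned if no
    element matches.
    """
    return ET.fromstring(payload).findtext(path)
```

For a hypothetical message such as {"machine": "M1", "pencil": {"length": 12.8}}, the path pencil.length yields the number 12.8. The XML variant returns the element text as a string, so it would still need to be converted to a number before being compared against a limit.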
After the KPI is created, the item appears in the KPI List View and is available for creating Alerts, or for use in other components that use the list of saved KPIs:
Figure 24 – KPI List View
One or more conditions can be applied to KPIs to ensure the quality of the products, and if a KPI value indicates a quality failure, an Alert can be sent to one or more users. Following the pencil example, an alert can be created for when the length of the pencil is outside the range set by quality standards.
To create an Alert, the following is necessary:
Description to identify the alert.
Condition that compares the values of one KPI.
When more than one condition is provided, a logic query identifying the relation between the conditions needs to be provided. (See example below)
Message to be sent when the conditions are matched.
One or more users to receive the message when the conditions are matched.
The following example alerts a user when the length of the produced pencil is out of the range delimited by quality standards:
Figure 25 – Create new Alert – Part 1
The Conditions query must be formulated using the conditions identifiers and the logical operators available in the drag and drop UI.
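How such a query combines individual conditions can be sketched in Python as follows. The textual form used here (identifiers such as A and B joined by AND, OR, NOT and parentheses) is an assumption made for illustration, since the actual query is assembled in the drag and drop UI; the function name and the pencil limits in the usage note are likewise hypothetical.

```python
import re
from typing import Dict, List


def evaluate_query(query: str, results: Dict[str, bool]) -> bool:
    """Evaluate a query such as 'A OR (B AND NOT C)' over condition results."""
    # Tokens are parentheses and words; any other character is ignored.
    tokens: List[str] = re.findall(r"\(|\)|\w+", query)
    pos = 0

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def next_is(word: str) -> bool:
        return pos < len(tokens) and tokens[pos] == word

    def parse_or() -> bool:           # lowest precedence
        value = parse_and()
        while next_is("OR"):
            take()
            rhs = parse_and()         # always parse, then combine
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while next_is("AND"):
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:          # NOT, parentheses, identifiers
        if next_is("NOT"):
            take()
            return not parse_not()
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token not in results:
            raise ValueError(f"unknown condition identifier: {token}")
        return results[token]

    value = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return value
```

For the pencil example, with hypothetical conditions A (length smaller than 17.0) and B (length larger than 19.0), the alert would fire when evaluate_query("A OR B", {"A": length < 17.0, "B": length > 19.0}) is true, that is, when the length is outside the allowed range.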
The alert message can include data from the KPI at the time the alert was sent. The buttons in the upper area are used to add these values, such as the KPI ID, value, or description:
Figure 26 – Create new Alert – Part 2
After the Alert is created, the item appears in the Alert List View, and the conditions start to be monitored by the component:
Figure 27 – Alert List View
When the conditions are matched, an e-mail is sent to the users:
Figure 28 – E-mail sent by Alert
The Push Gateway instance can be accessed at http://localhost:9091 or https://pushgateway-zdmp.zdmp.home/ (miniZDMP). The Push Gateway sends the KPI value update to the Prometheus instance, which also makes it available in Grafana.
Whenever a KPI value is updated, this value is sent to Prometheus with the following information:
kpiValue: [kpiValue] (this label is only present when the KPI value is not a numeric value)
kpi_value: [kpiValue] (when the KPI value is a numeric value)
kpi_value: [updateTime] (when the KPI value is not a numeric value)
Users can also use the Push Gateway to add further values to be monitored by Prometheus. Note that the Push Gateway only accepts numeric metrics; both integer and float values are valid. More information about how to use the Push Gateway can be found in the official documentation and in this tutorial.
The following is an example of the Push Gateway dashboard, where the KPI Pencil Production received an update with the value 12.8:
Figure 29 – Push gateway dashboard
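Pushing an additional numeric value to the Push Gateway, as mentioned above, can be sketched in Python using only the standard library. The sketch relies on the Prometheus Pushgateway HTTP interface (a request to /metrics/job/<job> with a body in the Prometheus text exposition format); the job name and metric name in the usage note are hypothetical, and the base URL is the local one given above.

```python
import urllib.parse
import urllib.request


def build_push_request(base_url: str, job: str, metric: str,
                       value: float) -> urllib.request.Request:
    """Build a POST request that pushes one gauge value for a job."""
    # Prometheus text exposition format; the body must end with a newline.
    body = f"# TYPE {metric} gauge\n{metric} {value}\n"
    url = f"{base_url.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    # POST replaces only metrics with the same name within this job group.
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


def push(request: urllib.request.Request) -> int:
    """Send the request to the Push Gateway and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A call such as push(build_push_request("http://localhost:9091", "machine_room", "machine_room_temperature", 21.5)) would then make the value visible in the Push Gateway dashboard, provided the instance is reachable. Building the request separately from sending it makes the URL and body easy to inspect without a running gateway.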
The Prometheus instance can be accessed at http://localhost:9090 or https://prometheus-zdmp.zdmp.home/ (miniZDMP). Prometheus receives data from the Push Gateway that can be used in queries and alerts. The metric for the KPI updates is kpi_value.
More information about how to use Prometheus can be found in the official documentation.
The following is an example of the Prometheus query panel, with values sent by the Push Gateway, where the KPI Pencil Production received an update with the value 12.8:
Figure 30 – Prometheus Query Panel
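The same kpi_value metric can also be read programmatically through the Prometheus HTTP API instead of the query panel. The Python sketch below uses the instant query endpoint /api/v1/query and assumes the result is an instant vector; the function names are hypothetical, and the labels returned depend on how the KPI was pushed.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]   # (labels, value)


def query_url(base_url: str, expr: str) -> str:
    """Build the instant query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(response_body: str) -> List[Sample]:
    """Extract (labels, value) pairs from an instant vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each sample carries [timestamp, "value"]; the value is sent as a string.
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def instant_query(base_url: str, expr: str) -> List[Sample]:
    """Run an instant query against a reachable Prometheus instance."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=5) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

Calling instant_query("http://localhost:9090", "kpi_value") against the local installation would return one entry per KPI series currently known to Prometheus.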
The Grafana instance can be accessed at http://localhost:3000 or https://grafana-zdmp.zdmp.home/. Grafana can compose observability dashboards, and query, visualize, alert on, and explore the metrics received from Prometheus.
More information about how to use Grafana can be found in the official documentation.
The following is an example of the Grafana panel, with values collected from the Prometheus instance:
Figure 31 – Grafana Panel