
Operation and Network Administration Management of Telecom 5G Network Functions Using Openshift Kubernetes Tools

Learn how 5G Telecom Operators can adopt Prometheus, Grafana, and Alert Manager tools for monitoring and alerting their Network Functions in the Kubernetes cloud.

By BINU SUDHAKARAN PILLAI · May. 08, 24 · Analysis

The telecom world is evolving rapidly, and operators need to keep pace. As 5G becomes the norm, operators must transition smoothly from 4G network functions, which typically run on an OpenStack cloud, to 5G network functions, which run on Kubernetes.

Today, operators invest in multiple vendor-specific monitoring tools, which raises costs and makes operations less efficient. With 5G, operators can instead adopt a unified monitoring and alerting system for all their products. A single system that monitors network equipment, customer devices, and service platforms gives a holistic view of the entire network, reducing complexity and improving efficiency.

By adopting a Prometheus-based monitoring and alerting system, operators can streamline operations, reduce costs, and improve customer experience. A single monitoring system covers the entire 5G deployment, which helps operators maintain performance and catch disruptions early.

This practical solution eliminates the need for a complete overhaul and offers a cost-effective transition. 

Let's dive deep.

Prometheus, Grafana, and Alert Manager

Prometheus is a monitoring and alerting tool built on a pull model: it scrapes metrics over HTTP from configured targets and stores them as Key Performance Indicators (KPIs) with labels and timestamps. In the 5G telecom world, the targets are the workloads running in the network functions' namespaces.

Grafana is a web application for visualizing data. It lets the 5G telecom operator build charts, graphs, and dashboards through a GUI (graphical user interface), with multiple graphing and dashboarding modes. Grafana integrates directly with the data collected by Prometheus and can combine different data sources in one dashboard, enabling continuous monitoring. It can also alert the telecom operator's team when an incident emerges, which shortens response times and helps minimize 5G network function downtime.

The Alert Manager handles the alerts that the Prometheus server raises from its alerting rules. It manages the received alerts, including silencing and inhibiting them, and sends out notifications via email or chat. The Alert Manager also deduplicates alerts, groups them, and routes them to the operator's centralized webhook receiver.
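
As an illustration of an alerting rule that would feed the Alert Manager, here is a minimal sketch. It assumes the Prometheus Operator custom resources are installed (the installation values later in this article suggest an operator-based deployment); the rule name, the `smf` namespace, and the `severity` label value are hypothetical and not taken from the original setup.

```yaml
# Hypothetical PrometheusRule; names, namespace, and labels are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: smf-availability        # assumed name
  namespace: smf                # assumed network function namespace
spec:
  groups:
    - name: smf.rules
      rules:
        - alert: SmfTargetDown
          # "up" is 0 when the most recent scrape of a target failed
          expr: up{namespace="smf"} == 0
          # the condition must hold for 5 minutes before the alert fires
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "SMF scrape target {{ $labels.instance }} is down"
```

When the expression stays true for the `for` duration, Prometheus sends the alert, with its labels, to the Alert Manager, which then applies grouping, inhibition, and routing.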

Architectural Diagram

Prometheus Server Architectural Diagram

Prometheus

Components of Prometheus (Specific to a 5G Telecom Operator)

  • Core component: The Prometheus server scrapes HTTP endpoints and stores the data as time series.
  • Targets: The Prometheus server collects metrics from the Prometheus targets. In our context, these targets are in the Kubernetes cluster that houses the 5G network functions.
  • Time series database (TSDB): Prometheus stores telecom metrics as time series data.
  • HTTP server: An API to query data stored in the TSDB; the Grafana dashboard can query this data for visualization.
  • Telecom operator-specific libraries (5G) for instrumenting application code.
  • Push gateway: A scrape target for short-lived jobs.
  • Service discovery: In the world of 5G, network function pods are constantly being added or deleted by telecom operators to scale up or down. Prometheus's service discovery component tracks the ever-changing list of pods.
  • Prometheus Web UI: Accessible through port 9090, the Web UI lets users query and view Prometheus data interactively.
  • Alert Manager: A companion component that handles alerts. When an alerting rule's condition is met, Prometheus sends the alert to the Alert Manager, which notifies users through channels such as email or messenger.
  • Grafana for dashboard visualization (actual graphs).

Together, these components let a telecom operator monitor 5G network functions for resource utilization, performance, availability errors, and more.
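
To make the service discovery component concrete, the sketch below shows how a network function's metrics endpoint could be registered as a scrape target. It assumes the Prometheus Operator's `ServiceMonitor` resource is available; the resource name, namespace, Service label, and port name are hypothetical placeholders.

```yaml
# Hypothetical ServiceMonitor; all names and labels below are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: smf-metrics             # assumed name
  namespace: smf                # assumed network function namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: smf   # assumed label on the network function's Service
  endpoints:
    - port: metrics             # assumed name of the Service port that exposes metrics
      path: /metrics
      interval: 30s             # how often Prometheus scrapes this endpoint
```

As pods behind the Service are added or removed, Prometheus updates its target list automatically, so scaling a network function needs no change to the monitoring configuration.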

Prometheus Features

  • A multi-dimensional data model in which time series are identified by metric name and key/value labels, with PromQL (Prometheus Query Language) as the query language and an HTTP pull model for collection.
  • Telecom operators can discover 5G network function targets through service discovery or static configuration.
  • Multiple modes of graphing and dashboarding support give users a customizable view of the data.

Prometheus Remote Write to Central Prometheus from Network Functions


5G Operators will have multiple network functions from various vendors, such as SMF (Session Management Function), UPF (User Plane Function), AMF (Access and Mobility Management Function), PCF (Policy Control Function), and UDM (Unified Data Management).

Using multiple Prometheus/Grafana dashboards for each network function can lead to a complex and inefficient 5G network operator monitoring process. To address this, it is highly recommended that all data/metrics from individual Prometheus be consolidated into a single Central Prometheus, simplifying the monitoring process and enhancing efficiency.

The 5G network operator can then monitor all the data in one place, the Central Prometheus, which provides a comprehensive view of the network's performance.
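
For the Central Prometheus to accept samples pushed by the per-network-function instances, its remote-write receiver has to be enabled. The fragment below is a minimal sketch, assuming the Central Prometheus is also deployed through the Prometheus Operator with the same Helm values layout used in this article; that is an assumption, not something the original setup states.

```yaml
# Assumed Helm values for the Central Prometheus (receiver side).
prometheus:
  prometheusSpec:
    # Lets this Prometheus ingest samples sent via the remote write protocol.
    enableRemoteWriteReceiver: true
```

With the receiver enabled, senders typically push to the `/api/v1/write` path of the Central Prometheus URL.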

Grafana

 Grafana Features 

  • Panels: Operators can visualize Telecom 5G data in many ways, including histograms, graphs, maps, and KPIs.
  • Plugins: Plugins render Telecom 5G data in real time and let operators create data source plugins that retrieve metrics from any API (Application Programming Interface).
  • Transformations: Operators can rename, summarize, combine, and run calculations on KPI query results across 5G network function data sources.
  • Annotations: Rich events from different Telecom 5G network function data sources are used to annotate metrics-based graphs.
  • Panel editor: A consistent graphical user interface for configuring and customizing 5G telecom metrics panels.

Grafana Sample Dashboard GUI for 5G


Alert Manager

Alert Manager Components

  • The Ingester swiftly ingests all alerts, while the Grouper groups them into categories. 
  • The De-duplicator prevents repetitive alerts, ensuring you're not bombarded with notifications. 
  • The Silencer is there to mute alerts based on a label, and the Throttler regulates the frequency of alerts. 
  • Finally, the Notifier will ensure that third parties are notified promptly. 

Alert Manager Functionalities

  1. Grouping: 
    • Grouping combines similar alerts into a single notification. This is helpful during larger outages, when many 5G network functions fail at once and many alerts fire simultaneously.
    • The telecom operator then receives a single page while still being able to see exactly which service instances are affected.
  2. Inhibition:
    • Inhibition suppresses notifications for specific low-priority alerts if certain major/critical alerts are already firing.
    • For example, when a critical alert fires, indicating that an entire 5G SMF (Session Management Function) cluster is not reachable, Alert Manager can mute all other minor/warning alerts concerning this cluster.
  3. Silences: Silences mute alerts for a given time. Each incoming alert is checked against the matchers (equality or regular expression) of every active silence. If it matches, no notifications are sent for that alert.
  4. High availability: Telecom operators should not load balance traffic between Prometheus and its Alert Managers; instead, they should point Prometheus to a list of all Alert Managers.
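
The inhibition example above can be expressed as an Alert Manager configuration fragment. This is a sketch under stated assumptions: the `severity` label and its values (`critical`, `minor`, `warning`) are hypothetical and must match whatever labels the operator's alerting rules actually set.

```yaml
# Alert Manager configuration fragment; the severity label values are assumptions.
inhibit_rules:
  - source_matchers:
      - severity="critical"           # while an alert like this is firing...
    target_matchers:
      - severity=~"minor|warning"     # ...notifications for these alerts are muted
    # only inhibit when both alerts carry the same namespace label value
    equal: ['namespace']
```

The `equal` list restricts the rule so that a critical SMF alert mutes only lower-priority alerts from the same namespace, not those from other network functions.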

Dashboard Visualization

Grafana dashboard visualizes the Alert Manager webhook traffic notifications as shown below:


Configuration YAMLs (YAML Ain't Markup Language)

Telecom Operators can install and run Prometheus using the configuration below:

YAML
 
prometheus:
  enabled: true
  route:
    enabled: {}
  nameOverride: prometheus
  tls:
    enabled: true
    certificatesSecret: backstage-prometheus-certs
    certFilename: tls.crt
    certKeyFilename: tls.key
  volumePermissions:
    enabled: true
  initdbScriptsSecret: backstage-prometheus-initdb
  prometheusSpec:
    retention: 3d
    replicas: 2
    prometheusExternalLabelName: prometheus_cluster
    image:
      repository: <5G operator image repository for Prometheus>
      tag: <Version example v2.39.1>
      sha: ""
    podAntiAffinity: "hard"
    securityContext: null
    resources:
      limits:
        cpu: 1
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
    serviceMonitorNamespaceSelector:
      matchExpressions:
        - {key: namespace, operator: In, values: [<Network function 1 namespace>, <Network function 2 namespace>]}
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false


The configuration below segregates scraped data by namespace and routes it to the Central Prometheus.

  • Note: Append the configuration below under prometheusSpec in the Prometheus installation YAML above.
YAML
 
    remoteWrite:
      - url: <Central Prometheus URL for namespace 1 by 5G operator>
        basicAuth:
          username:
            name: <secret username for namespace 1>
            key: username
          password:
            name: <secret password for namespace 1>
            key: password
        tlsConfig:
          insecureSkipVerify: true  # disables TLS certificate verification; avoid in production
        writeRelabelConfigs:
        - sourceLabels:
          - namespace
          regex: <namespace 1>
          action: keep
      - url: <Central Prometheus URL for namespace 2 by 5G operator>
        basicAuth:
          username:
            name: <secret username for namespace 2>
            key: username
          password:
            name: <secret password for namespace 2>
            key: password
        tlsConfig:
          insecureSkipVerify: true  # disables TLS certificate verification; avoid in production
        writeRelabelConfigs:
        - sourceLabels:
          - namespace
          regex: <namespace 2>
          action: keep


Telecom Operators can install and run Grafana using the configuration below.

YAML
 
grafana:
  replicas: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: "app.kubernetes.io/name"
                operator: In
                values:
                  - grafana
          topologyKey: "kubernetes.io/hostname"
  securityContext: false
  rbac:
    pspEnabled: false # Must be disabled due to tenant permissions
    namespaced: true
  adminPassword: admin # Placeholder only; set a strong password or use a secret
  image:
    repository: <artifactory>/grafana
    tag: <version>
    sha: ""
    pullPolicy: IfNotPresent
  persistence:
    enabled: false
  initChownData:
    enabled: false
  sidecar:
    image:
      repository: <artifactory>/k8s-sidecar
      tag: <version>
      sha: ""
    imagePullPolicy: IfNotPresent
    resources:
     limits:
       cpu: 100m
       memory: 100Mi
     requests:
       cpu: 50m
       memory: 50Mi
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "<vendor-name>" # Placeholder; Kubernetes label values cannot contain spaces
    datasources:
      enabled: true
      defaultDatasourceEnabled: false
  additionalDataSources:
  - name: Prometheus
    type: prometheus
    url: http://<prometheus-operated>:9090
    access: proxy
    isDefault: true
    jsonData:
      timeInterval: 30s
  resources:
   limits:
     cpu: 400m     
     memory: 512Mi 
   requests:
     cpu: 50m      
     memory: 206Mi 
  extraContainers: 
    - name: oauth-proxy
      image: <artifactory>/origin-oauth-proxy:<version>
      imagePullPolicy: IfNotPresent
      ports:
        - name: proxy-web
          containerPort: 4181
      args:
        - --https-address=:4181
        - --provider=openshift
        # Service account name here must be "<Helm Release name>-grafana"
        - --openshift-service-account=monitoring-grafana
        - --upstream=http://localhost:3000
        - --tls-cert=/etc/tls/private/tls.crt
        - --tls-key=/etc/tls/private/tls.key
        - --cookie-secret=SECRET
        - --pass-basic-auth=false
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 50m
          memory: 128Mi
      volumeMounts:
        - mountPath: /etc/tls/private
          name: grafana-tls
  extraContainerVolumes:
    - name: grafana-tls
      secret:
        secretName: grafana-tls
  serviceAccount:
    annotations:
      "serviceaccounts.openshift.io/oauth-redirecturi.first":  https://[SPK exposed IP for Grafana]
  service:
    targetPort: 4181
    annotations:
      service.alpha.openshift.io/serving-cert-secret-name: <secret>


Telecom Operators can install and run Alert Manager using the configuration below.

YAML
 
alertmanager:
  enabled: true
  alertmanagerSpec:
    image:
      repository: prometheus/alertmanager
      tag: <version>
    replicas: 2
    podAntiAffinity: hard
    securityContext: null
    resources:
      requests:
        cpu: 25m
        memory: 200Mi
      limits:
        cpu: 100m
        memory: 400Mi
    containers:
      - name: config-reloader
        resources:
          requests:
            cpu: 10m
            memory: 10Mi
          limits:
            cpu: 25m
            memory: 50Mi


The configuration below routes Alert Manager notifications to the operator's centralized webhook receiver.

  • Note: Append the configuration below under the alertmanager key in the Alert Manager installation YAML above.
YAML
 
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
      - receiver: '<Network function 1>'
        group_wait: 10s
        group_interval: 10s
        group_by: ['alertname','oid','action','time','geid','ip']
        matchers:
        - namespace="<namespace 1>"
      - receiver: '<Network function 2>'
        group_wait: 10s
        group_interval: 10s
        group_by: ['alertname','oid','action','time','geid','ip']
        matchers:
        - namespace="<namespace 2>"
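
Every receiver named in a route must also be defined in the configuration. The fragment below sketches the matching `receivers` section, placed as a sibling of `route:`; the webhook URL placeholders are assumptions to be filled in by the operator.

```yaml
    receivers:
      # Default receiver with no integrations: alerts routed here are dropped.
      - name: 'null'
      - name: '<Network function 1>'
        webhook_configs:
          - url: '<centralized webhook receiver URL for network function 1>'
            send_resolved: true   # also notify when the alert clears
      - name: '<Network function 2>'
        webhook_configs:
          - url: '<centralized webhook receiver URL for network function 2>'
            send_resolved: true
```

Alerts that match neither namespace fall through to the `null` receiver and produce no notification.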


Conclusion

The open-source OAM (Operation and Maintenance) tools Prometheus, Grafana, and Alert Manager can benefit 5G Telecom operators. Prometheus periodically captures the status of monitored 5G Telecom network functions over HTTP, and any component can be connected to the monitoring as long as it exposes the corresponding HTTP interface. Prometheus and Grafana Agent give the 5G Telecom operator control over which metrics are reported; once forwarded, the data can also be stored in a Grafana-side database for additional redundancy.

In conclusion, Prometheus allows 5G Telecom operators to improve their operations and offer better customer service. Adopting a unified monitoring and alert system like Prometheus is one way to achieve this.
