Installing the Network Observability Operator

Installing Loki is a prerequisite for using the Network Observability Operator. It is recommended to install Loki using the Loki Operator; therefore, these steps are documented below prior to the Network Observability Operator installation.

The Loki Operator integrates a gateway that implements multi-tenancy and authentication with Loki for storing flow data. The LokiStack resource manages Loki, which is a scalable, highly available, multi-tenant log aggregation system, and a web proxy with OKD authentication. The LokiStack proxy uses OKD authentication to enforce multi-tenancy and facilitate the saving and indexing of data in Loki log stores.

The Loki Operator can also be used for Logging with the LokiStack. The Network Observability Operator requires a dedicated LokiStack separate from Logging.

Installing the Loki Operator

It is recommended to install Loki using the Loki Operator version 5.6. This version provides the ability to create a LokiStack instance using the openshift-network tenant configuration mode. It also provides fully automatic, in-cluster authentication and authorization support for Network Observability.

Prerequisites

  • Supported Log Store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)

  • OKD 4.10+.

  • Linux Kernel 4.18+.

There are several ways to install Loki. One way to install the Loki Operator is through the OperatorHub in the OKD web console.

Procedure

  1. Install the Loki Operator:

    1. In the OKD web console, click Operators → OperatorHub.

    2. Choose Loki Operator from the list of available Operators, and click Install.

    3. Under Installation Mode, select All namespaces on the cluster.

    4. Verify that you installed the Loki Operator. Visit the Operators → Installed Operators page and look for Loki Operator.

    5. Verify that Loki Operator is listed with Status as Succeeded in all the projects.

  2. Create a Secret YAML file. You can create this secret in the web console or CLI.

    1. Using the web console, navigate to the Project → All Projects dropdown and select Create Project. Name the project netobserv and click Create.

    2. Navigate to the Import icon (+) in the top right corner. Drop your YAML file into the editor. Create the secret in the netobserv namespace, and use the access_key_id and access_key_secret fields to specify your credentials.

    3. Once you create the secret, you should see it listed under Workloads → Secrets in the web console.

      The following shows an example secret YAML file:

      apiVersion: v1
      kind: Secret
      metadata:
        name: loki-s3
        namespace: netobserv
      stringData:
        access_key_id: QUtJQUlPU0ZPRE5ON0VYQU1QTEUK
        access_key_secret: d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQo=
        bucketnames: s3-bucket-name
        endpoint: https://s3.eu-central-1.amazonaws.com
        region: eu-central-1
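
The credential values in the example secret look base64-encoded. Kubernetes treats values under stringData as plain text and encodes them itself, whereas values under data must already be base64-encoded. The following Python sketch uses only the standard library to decode the sample access_key_id from the example and show what it contains. The helper name decode_secret_value is hypothetical and exists only for this illustration.

```python
import base64


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 string and strip a trailing newline, if present."""
    return base64.b64decode(encoded).decode("utf-8").rstrip("\n")


# Sample value copied from the example secret above (not a real credential).
sample_access_key_id = "QUtJQUlPU0ZPRE5ON0VYQU1QTEUK"

if __name__ == "__main__":
    # Prints the plain-text form of the sample key ID.
    print(decode_secret_value(sample_access_key_id))
```

If your own credentials are plain text, they can go under stringData unchanged. If they are already base64-encoded, the data field is the usual place for them.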

To uninstall Loki, refer to the uninstallation process that corresponds with the method you used to install Loki. ClusterRoles, ClusterRoleBindings, data in the object store, and persistent volumes might remain after uninstallation and must be removed.

Create a LokiStack custom resource

It is recommended to deploy the LokiStack in the same namespace referenced by the FlowCollector specification, spec.namespace. You can use the web console or CLI to create a namespace, or new project.

Procedure

  1. Navigate to Operators → Installed Operators.

  2. In the Details, under Provided APIs, select LokiStack and click Create LokiStack.

  3. Ensure the following fields are specified in either Form View or YAML view:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: loki
      namespace: netobserv
    spec:
      size: 1x.small
      storage:
        schemas:
        - version: v12
          effectiveDate: '2022-06-01'
        secret:
          name: loki-s3
          type: s3
      storageClassName: gp3  # (1)
      tenants:
        mode: openshift-network

    (1) Use a storage class name that is available on the cluster for ReadWriteOnce access mode. You can use oc get storageclasses to see what is available on your cluster.

    You must not reuse the same LokiStack that is used for cluster logging.

Deployment Sizing

Sizing for Loki follows the format <N>x.<size>, where <N> is the number of instances and <size> specifies performance capabilities.

1x.extra-small is for demo purposes only, and is not supported.

Table 1. Loki Sizing

                            1x.extra-small    1x.small              1x.medium
Data transfer               Demo use only.    500GB/day             2TB/day
Queries per second (QPS)    Demo use only.    25-50 QPS at 200ms    25-75 QPS at 200ms
Replication factor          None              2                     3
Total CPU requests          5 vCPUs           36 vCPUs              54 vCPUs
Total Memory requests       7.5Gi             63Gi                  139Gi
Total Disk requests         150Gi             300Gi                 450Gi
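
As an illustration of the size format described above, the following Python sketch splits a size string such as 1x.small into its instance count and size name, and looks up the replication factor from Table 1. The names LokiSize, parse_size, and REPLICATION_FACTOR are hypothetical and are not part of the Loki Operator.

```python
import re
from typing import NamedTuple


class LokiSize(NamedTuple):
    instances: int  # the <N> part of <N>x.<size>
    size: str       # the <size> part, for example "small"


# Replication factors copied from Table 1; None means no replication.
REPLICATION_FACTOR = {"extra-small": None, "small": 2, "medium": 3}

# Digits, a literal "x.", then a lowercase size name that may contain hyphens.
_SIZE_PATTERN = re.compile(r"^(\d+)x\.([a-z-]+)$")


def parse_size(value: str) -> LokiSize:
    """Split a size string into its instance count and size name."""
    match = _SIZE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a valid size string: {value!r}")
    return LokiSize(instances=int(match.group(1)), size=match.group(2))


if __name__ == "__main__":
    parsed = parse_size("1x.small")
    print(parsed, REPLICATION_FACTOR[parsed.size])
```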

Configuring LokiStack ingestion

The LokiStack instance comes with default settings according to the configured size. It is possible to override some of these settings, such as the ingestion and query limits. You might want to update them if you get Loki errors showing up in the Console plugin, or in flowlogs-pipeline logs.

Here is an example of configured limits:

    spec:
      limits:
        global:
          ingestion:
            ingestionBurstSize: 40
            ingestionRate: 20
            maxGlobalStreamsPerTenant: 25000
          queries:
            maxChunksPerQuery: 2000000
            maxEntriesLimitPerQuery: 10000
            maxQuerySeries: 3000

Refer to the LokiStack API reference for more information on these settings.

It is good practice to define an alert so that you are notified when these limits are reached. In the following example, the alert uses a metric provided by the Loki Operator, loki_request_duration_seconds_count:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: loki-alerts
      namespace: openshift-operators-redhat
    spec:
      groups:
      - name: LokiRateLimitAlerts
        rules:
        - alert: LokiTenantRateLimit
          annotations:
            message: |-
              {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
            summary: "At any number of requests are responded with the rate limit error code."
          expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 0
          for: 10s
          labels:
            severity: warning
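
To make the alert expression more concrete, the following Python sketch reproduces its arithmetic on a handful of made-up samples: for each (job, namespace, route) group, it divides the rate of requests answered with status code 429 by the rate of all requests, multiplies by 100, and keeps the groups where the result is greater than 0. It does not talk to Prometheus. The function names and the sample label values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]                 # (job, namespace, route)
Sample = Tuple[str, str, str, str, float]  # job, namespace, route, status_code, requests/s


def rate_limited_percent(samples: Iterable[Sample]) -> Dict[Key, float]:
    """Percentage of requests answered with 429, per (job, namespace, route)."""
    total: Dict[Key, float] = defaultdict(float)
    limited: Dict[Key, float] = defaultdict(float)
    for job, namespace, route, status_code, rate in samples:
        key = (job, namespace, route)
        total[key] += rate
        if status_code == "429":
            limited[key] += rate
    # Groups without any 429 traffic get 0.0 here. In PromQL they would be
    # missing from the result instead; the "> 0" filter below hides the difference.
    return {key: limited[key] / total[key] * 100 for key in total if total[key] > 0}


def firing_groups(samples: Iterable[Sample]) -> Dict[Key, float]:
    """Groups for which the alert condition (percentage > 0) holds."""
    return {k: v for k, v in rate_limited_percent(samples).items() if v > 0}


if __name__ == "__main__":
    # Made-up per-second rates, for illustration only.
    example = [
        ("gateway", "netobserv", "push", "429", 1.0),
        ("gateway", "netobserv", "push", "204", 3.0),
        ("gateway", "netobserv", "query", "200", 5.0),
    ]
    print(firing_groups(example))
```

Because the threshold is 0, any non-zero share of 429 responses that persists for the configured duration triggers the alert.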

Create roles for authentication and authorization

Specify authentication and authorization configurations by defining ClusterRole and ClusterRoleBinding. You can create a YAML file to define these roles.

Procedure

  1. Using the web console, click the Import icon (+).

  2. Drop your YAML file into the editor and click Create:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: loki-netobserv-tenant
    rules:
    - apiGroups:
      - 'loki.grafana.com'
      resources:
      - network
      resourceNames:
      - logs
      verbs:
      - 'get'
      - 'create'
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: loki-netobserv-tenant
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: loki-netobserv-tenant
    subjects:
    - kind: ServiceAccount
      name: flowlogs-pipeline  # (1)
      namespace: netobserv
    - kind: ServiceAccount
      name: netobserv-plugin  # (2)
      namespace: netobserv

    (1) The flowlogs-pipeline writes to Loki. If you are using Kafka, this value is flowlogs-pipeline-transformer.
    (2) The netobserv-plugin reads in the data.
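
The choice of service account names in the binding can be sketched in Python as follows. This is only an illustration of the rule stated in the callouts: the writer account is flowlogs-pipeline, or flowlogs-pipeline-transformer when Kafka is used, and the reader account is netobserv-plugin. The function name tenant_subjects and its parameters are hypothetical.

```python
from typing import Dict, List


def tenant_subjects(namespace: str = "netobserv", use_kafka: bool = False) -> List[Dict[str, str]]:
    """Build the subjects list for the loki-netobserv-tenant ClusterRoleBinding."""
    # The component that writes to Loki runs under a different service
    # account name when Kafka is used.
    writer = "flowlogs-pipeline-transformer" if use_kafka else "flowlogs-pipeline"
    return [
        {"kind": "ServiceAccount", "name": writer, "namespace": namespace},
        # The console plugin reads the data.
        {"kind": "ServiceAccount", "name": "netobserv-plugin", "namespace": namespace},
    ]


if __name__ == "__main__":
    for subject in tenant_subjects(use_kafka=True):
        print(subject)
```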

Installing Kafka (optional)

The Kafka Operator is supported for large-scale environments. You can install the Kafka Operator as Red Hat AMQ Streams from the OperatorHub, in the same way that you install the Loki Operator and the Network Observability Operator.

To uninstall Kafka, refer to the uninstallation process that corresponds with the method you used to install it.

Installing the Network Observability Operator

You can install the Network Observability Operator using the OperatorHub in the OKD web console. When you install the Operator, it provides the FlowCollector custom resource definition (CRD). You can set specifications in the web console when you create the FlowCollector.

Prerequisites

  • Loki is installed. It is recommended to install Loki using the Loki Operator version 5.6.

Procedure

  1. In the OKD web console, click Operators → OperatorHub.

  2. Choose Network Observability Operator from the list of available Operators in the OperatorHub, and click Install.

  3. Navigate to Operators → Installed Operators. Under Provided APIs for Network Observability, select the Flow Collector link.

    1. Navigate to the Flow Collector tab, and click Create FlowCollector. Make the following selections in the form view:

      • spec.agent.ebpf.Sampling: Specify a sampling size for flows. Lower sampling sizes have a higher impact on resource utilization. For more information, see the FlowCollector API reference, under spec.agent.ebpf.

      • spec.deploymentModel: If you are using Kafka, verify Kafka is selected.

      • loki.url: Since authentication is specified separately, this URL needs to be updated to https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network. The first part of the URL, “loki”, should match the name of your LokiStack.

      • loki.tenantID: Set this to network.

      • loki.statusUrl: Set this to https://loki-query-frontend-http.netobserv.svc:3100/.

      • loki.authToken: Select the HOST value.

      • tls.enable: Verify that the box is checked so that TLS is enabled.

    2. Click Create.
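
The two Loki URLs in the form follow a pattern that can be sketched in Python. The source states only that the leading "loki" in loki.url should match the name of your LokiStack. Applying the same prefix to loki.statusUrl, and substituting the namespace into the host name according to the usual Kubernetes service naming, are assumptions made for this sketch. The function name loki_urls is hypothetical.

```python
from typing import Dict


def loki_urls(lokistack_name: str = "loki", namespace: str = "netobserv") -> Dict[str, str]:
    """Derive the FlowCollector Loki URLs from a LokiStack name and namespace.

    Assumption: services are named <lokistack_name>-gateway-http and
    <lokistack_name>-query-frontend-http in the given namespace.
    """
    return {
        "url": f"https://{lokistack_name}-gateway-http.{namespace}.svc:8080/api/logs/v1/network",
        "statusUrl": f"https://{lokistack_name}-query-frontend-http.{namespace}.svc:3100/",
    }


if __name__ == "__main__":
    # With the defaults, this prints the URLs given in the procedure above.
    for field, value in loki_urls().items():
        print(f"loki.{field}: {value}")
```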

Verification

To confirm that the installation was successful, navigate to Observe and verify that Network Traffic is listed in the options.

In the absence of Application Traffic within the OKD cluster, default filters might show that there are “No results”, which results in no visual flow. Beside the filter selections, select Clear all filters to see the flow.

If you installed Loki using the Loki Operator, it is advised not to use querierUrl, as it can break the console access to Loki. If you installed Loki using another type of Loki installation, this does not apply.

Additional resources

To see more information about Flow Collector specifications, see the Flow Collector API Reference and the Flow Collector sample resource.

Uninstalling the Network Observability Operator

You can uninstall the Network Observability Operator using the OKD web console OperatorHub, working in the Operators → Installed Operators area.

Procedure

  1. Uninstall the FlowCollector custom resource.

    1. Click Flow Collector, which is next to the Network Observability Operator in the Provided APIs column.

    2. Click the options menu (kebab icon) for the cluster and select Delete FlowCollector.

  2. Uninstall the Network Observability Operator.

    1. Navigate back to the Operators → Installed Operators area.

    2. Click the options menu (kebab icon) next to the Network Observability Operator and select Uninstall Operator.