MetalLB troubleshooting and support

If you need to troubleshoot your MetalLB configuration, refer to the following sections for commonly used commands.

Troubleshooting BGP issues

The BGP implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. As a cluster administrator, you troubleshoot BGP configuration issues by running commands in the FRR container.
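Every command in the following procedures uses the same oc exec pattern, so you can script it. The following is a minimal Python sketch, not a Red Hat tool. It assumes that oc is on your PATH and logged in to the cluster, assumes Python 3.9 or later, and uses hypothetical helper names such as vtysh_argv. It lists the speaker pods with a JSONPath query and runs one vtysh command in the frr container of each pod.

```python
import subprocess

NAMESPACE = "metallb-system"
SPEAKER_SELECTOR = "app.kubernetes.io/component=speaker"


def speaker_pods_argv() -> list[str]:
    """Build the `oc get pods` command that prints speaker pod names, space separated."""
    return [
        "oc", "get", "pods",
        "-n", NAMESPACE,
        "-l", SPEAKER_SELECTOR,
        "-o", "jsonpath={.items[*].metadata.name}",
    ]


def vtysh_argv(pod: str, command: str) -> list[str]:
    """Build the `oc exec` command that runs one vtysh command in the frr container."""
    return ["oc", "exec", "-n", NAMESPACE, pod, "-c", "frr", "--", "vtysh", "-c", command]


def run(argv: list[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def vtysh_on_all_speakers(command: str) -> dict[str, str]:
    """Run the same vtysh command on every speaker pod, keyed by pod name."""
    pods = run(speaker_pods_argv()).split()
    return {pod: run(vtysh_argv(pod, command)) for pod in pods}
```

For example, vtysh_on_all_speakers("show bgp summary") returns a dictionary that maps each speaker pod name to its BGP summary. Passing an argument list to subprocess.run, rather than a shell string, avoids quoting problems with multi-word vtysh commands.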

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure

  1. Display the names of the speaker pods:

      $ oc get -n metallb-system pods -l app.kubernetes.io/component=speaker

    Example output

      NAME            READY   STATUS    RESTARTS   AGE
      speaker-66bth   4/4     Running   0          56m
      speaker-gvfnf   4/4     Running   0          56m
      ...
  2. Display the running configuration for FRR:

      $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show running-config"

    Example output

      Building configuration...
      Current configuration:
      !
      frr version 7.5.1_git
      frr defaults traditional
      hostname some-hostname
      log file /etc/frr/frr.log informational
      log timestamp precision 3
      service integrated-vtysh-config
      !
      router bgp 64500  (1)
       bgp router-id 10.0.1.2
       no bgp ebgp-requires-policy
       no bgp default ipv4-unicast
       no bgp network import-check
       neighbor 10.0.2.3 remote-as 64500  (2)
       neighbor 10.0.2.3 bfd profile doc-example-bfd-profile-full  (3)
       neighbor 10.0.2.3 timers 5 15
       neighbor 10.0.2.4 remote-as 64500  (2)
       neighbor 10.0.2.4 bfd profile doc-example-bfd-profile-full  (3)
       neighbor 10.0.2.4 timers 5 15
       !
       address-family ipv4 unicast
        network 203.0.113.200/30  (4)
        neighbor 10.0.2.3 activate
        neighbor 10.0.2.3 route-map 10.0.2.3-in in
        neighbor 10.0.2.4 activate
        neighbor 10.0.2.4 route-map 10.0.2.4-in in
       exit-address-family
       !
       address-family ipv6 unicast
        network fc00:f853:ccd:e799::/124  (4)
        neighbor 10.0.2.3 activate
        neighbor 10.0.2.3 route-map 10.0.2.3-in in
        neighbor 10.0.2.4 activate
        neighbor 10.0.2.4 route-map 10.0.2.4-in in
       exit-address-family
      !
      route-map 10.0.2.3-in deny 20
      !
      route-map 10.0.2.4-in deny 20
      !
      ip nht resolve-via-default
      !
      ipv6 nht resolve-via-default
      !
      line vty
      !
      bfd
       profile doc-example-bfd-profile-full  (3)
        transmit-interval 35
        receive-interval 35
        passive-mode
        echo-mode
        echo-interval 35
        minimum-ttl 10
       !
      !
      end
    (1) The router bgp section indicates the ASN for MetalLB.
    (2) Confirm that a neighbor <ip-address> remote-as <peer-ASN> line exists for each BGP peer custom resource that you added.
    (3) If you configured BFD, confirm that the BFD profile is associated with the correct BGP peer and that the BFD profile appears in the command output.
    (4) Confirm that the network <ip-address-range> lines match the IP address ranges that you specified in the address pool custom resources that you added.
  3. Display the BGP summary:

      $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp summary"

    Example output

      IPv4 Unicast Summary:
      BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0
      BGP table version 1
      RIB entries 1, using 192 bytes of memory
      Peers 2, using 29 KiB of memory
      Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
      10.0.2.3        4      64500       387       389        0    0    0 00:32:02            0        1  (1)
      10.0.2.4        4      64500         0         0        0    0    0    never       Active        0  (2)
      Total number of neighbors 2

      IPv6 Unicast Summary:
      BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0
      BGP table version 1
      RIB entries 1, using 192 bytes of memory
      Peers 2, using 29 KiB of memory
      Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
      10.0.2.3        4      64500       387       389        0    0    0 00:32:02        NoNeg  (1)
      10.0.2.4        4      64500         0         0        0    0    0    never       Active        0  (2)
      Total number of neighbors 2
    (1) Confirm that the output includes a line for each BGP peer custom resource that you added.
    (2) Output that shows 0 messages received and 0 messages sent indicates a BGP peer that does not have a BGP session. An Up/Down value of never and a State/PfxRcd value of Active also indicate that the session has never been established. Check network connectivity and the BGP configuration of the BGP peer.
  4. Display the BGP peers that received an address pool:

      $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp ipv4 unicast 203.0.113.200/30"

    Replace ipv4 with ipv6 to display the BGP peers that received an IPv6 address pool. Replace 203.0.113.200/30 with an IPv4 or IPv6 address range from an address pool.

    Example output

      BGP routing table entry for 203.0.113.200/30
      Paths: (1 available, best #1, table default)
        Advertised to non peer-group peers:
        10.0.2.3  (1)
        Local
          0.0.0.0 from 0.0.0.0 (10.0.1.2)
            Origin IGP, metric 0, weight 32768, valid, sourced, local, best (First path received)
            Last update: Mon Jan 10 19:49:07 2022
    (1) Confirm that the output includes an IP address for a BGP peer.
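When you check many speaker pods, reading each summary by eye gets tedious. The following Python sketch parses show bgp summary output in the layout shown in step 3 and reports every neighbor whose State/PfxRcd column is not a number, such as Active or NoNeg in the example output. It also builds the per-prefix command from step 4, choosing ipv4 or ipv6 from the address range. The column positions are an assumption based on the example output; other FRR versions can add columns or wrap long neighbor names, so treat this as a starting point rather than a complete parser. All names in it are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class BgpNeighbor:
    family: str        # for example "IPv4 Unicast"
    address: str
    remote_as: int
    msg_rcvd: int
    msg_sent: int
    up_down: str       # session uptime, or "never"
    state_or_pfx: str  # State/PfxRcd: a prefix count, or a state such as "Active"

    @property
    def exchanging_prefixes(self) -> bool:
        # Assumption: a numeric State/PfxRcd value means prefixes are exchanged.
        return self.state_or_pfx.isdigit()


def parse_bgp_summary(text: str) -> list[BgpNeighbor]:
    """Parse the neighbor tables in `vtysh -c "show bgp summary"` output."""
    neighbors: list[BgpNeighbor] = []
    family = ""
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and " Summary" in stripped:
            # "IPv4 Unicast Summary:" starts a new address-family section.
            family = stripped.split(" Summary", 1)[0]
            in_table = False
        elif stripped.startswith("Neighbor"):
            in_table = True  # column header row; data rows follow
        elif not stripped or stripped.startswith("Total number"):
            in_table = False
        elif in_table:
            fields = stripped.split()
            if len(fields) < 10:
                continue  # skip wrapped or unexpected rows
            neighbors.append(
                BgpNeighbor(
                    family=family,
                    address=fields[0],
                    remote_as=int(fields[2]),
                    msg_rcvd=int(fields[3]),
                    msg_sent=int(fields[4]),
                    up_down=fields[8],
                    state_or_pfx=fields[9],
                )
            )
    return neighbors


def problem_neighbors(text: str) -> list[BgpNeighbor]:
    """Neighbors that are not exchanging prefixes for an address family."""
    return [n for n in parse_bgp_summary(text) if not n.exchanging_prefixes]


def bgp_prefix_command(prefix: str) -> str:
    """Build the step 4 vtysh command, picking ipv4 or ipv6 from the range."""
    version = ipaddress.ip_network(prefix, strict=False).version
    return f"show bgp ipv{version} unicast {prefix}"
```

Run against the example output in step 3, problem_neighbors reports 10.0.2.4 in both address families and 10.0.2.3 in the IPv6 Unicast table, where its State/PfxRcd column shows NoNeg.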

Troubleshooting BFD issues

The Bidirectional Forwarding Detection (BFD) implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. The BFD implementation requires that each BFD peer is also configured as a BGP peer with an established BGP session. As a cluster administrator, you troubleshoot BFD configuration issues by running commands in the FRR container.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure

  1. Display the names of the speaker pods:

      $ oc get -n metallb-system pods -l app.kubernetes.io/component=speaker

    Example output

      NAME            READY   STATUS    RESTARTS   AGE
      speaker-66bth   4/4     Running   0          26m
      speaker-gvfnf   4/4     Running   0          26m
      ...
  2. Display the BFD peers:

      $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bfd peers brief"

    Example output

      Session count: 2
      SessionId  LocalAddress  PeerAddress  Status
      =========  ============  ===========  ======
      3909139637 10.0.1.2      10.0.2.3     up  (1)
    (1) Confirm that the PeerAddress column includes each BFD peer. If the output does not list a BFD peer IP address that you expected, troubleshoot BGP connectivity with the peer. If the Status column shows down, check for connectivity on the links and equipment between the node and the peer. You can determine the node name for the speaker pod with a command like oc get pods -n metallb-system speaker-66bth -o jsonpath='{.spec.nodeName}'.
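The following Python sketch automates the check described in the callout. It parses show bfd peers brief output, reports expected peers that are missing from the PeerAddress column, and reports listed peers whose Status is not up. The expected peer set is a hypothetical input that you would build from your BGP peer custom resources, and node_name_argv is a hypothetical helper that builds the same oc get pods command shown in the callout.

```python
NAMESPACE = "metallb-system"


def parse_bfd_peers_brief(text: str) -> dict[str, str]:
    """Map each PeerAddress to its Status from `vtysh -c "show bfd peers brief"` output."""
    peers: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Session rows start with a numeric SessionId. This skips the
        # "Session count" line, the column header, and the "=====" separator.
        if len(fields) >= 4 and fields[0].isdigit():
            peers[fields[2]] = fields[3]
    return peers


def bfd_findings(text: str, expected_peers: set[str]) -> dict[str, list[str]]:
    """Return expected peers that are missing and listed peers that are not up."""
    peers = parse_bfd_peers_brief(text)
    return {
        # Missing peers: troubleshoot BGP connectivity with these peers first.
        "missing": sorted(expected_peers - set(peers)),
        # Peers that are not up: check the links and equipment toward these peers.
        "not_up": sorted(peer for peer, status in peers.items() if status != "up"),
    }


def node_name_argv(pod: str) -> list[str]:
    """Build the command that prints the node that a speaker pod runs on."""
    return ["oc", "get", "pods", "-n", NAMESPACE, pod, "-o", "jsonpath={.spec.nodeName}"]
```

With the example output and an expected set of 10.0.2.3 and 10.0.2.4, bfd_findings reports 10.0.2.4 as missing, which points you back to the BGP troubleshooting procedure for that peer.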

MetalLB metrics for BGP and BFD

OKD captures the following metrics related to MetalLB, BGP peers, and BFD profiles:

  • metallb_bfd_control_packet_input counts the number of BFD control packets received from each BFD peer.

  • metallb_bfd_control_packet_output counts the number of BFD control packets sent to each BFD peer.

  • metallb_bfd_echo_packet_input counts the number of BFD echo packets received from each BFD peer.

  • metallb_bfd_echo_packet_output counts the number of BFD echo packets sent to each BFD peer.

  • metallb_bfd_session_down_events counts the number of times the BFD session with a peer entered the down state.

  • metallb_bfd_session_up indicates the connection state with a BFD peer. A value of 1 indicates that the session is up, and 0 indicates that the session is down.

  • metallb_bfd_session_up_events counts the number of times the BFD session with a peer entered the up state.

  • metallb_bfd_zebra_notifications counts the number of BFD Zebra notifications for each BFD peer.

  • metallb_bgp_announced_prefixes_total counts the number of load balancer IP address prefixes that are advertised to BGP peers. The terms prefix and aggregated route have the same meaning.

  • metallb_bgp_session_up indicates the connection state with a BGP peer. A value of 1 indicates that the session is up, and 0 indicates that the session is down.

  • metallb_bgp_updates_total counts the number of BGP update messages that were sent to a BGP peer.


About collecting MetalLB data

You can use the oc adm must-gather CLI command to collect information about your cluster, your MetalLB configuration, and the MetalLB Operator. The following features and objects are associated with MetalLB and the MetalLB Operator:

  • The namespace that the MetalLB Operator is deployed in, and the child objects in that namespace

  • All MetalLB Operator custom resource definitions (CRDs)

The oc adm must-gather CLI command collects the following information from FRRouting (FRR) that Red Hat uses to implement BGP and BFD:

  • /etc/frr/frr.conf

  • /etc/frr/frr.log

  • /etc/frr/daemons configuration file

  • /etc/frr/vtysh.conf

The log and configuration files in the preceding list are collected from the frr container in each speaker pod.

In addition to the log and configuration files, the oc adm must-gather CLI command collects the output from the following vtysh commands:

  • show running-config

  • show bgp ipv4

  • show bgp ipv6

  • show bgp neighbor

  • show bfd peer
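To capture the same vtysh output from a single speaker pod without running a full must-gather, for example to compare one pod before and after a configuration change, you can collect it yourself. The following Python sketch is an illustration under stated assumptions, not how must-gather is implemented: it assumes oc is on your PATH and logged in, runs each vtysh command in the preceding list in the frr container of one pod, and writes each output to its own file. The helper names and the file naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path

# The vtysh commands that oc adm must-gather collects, as listed above.
MUST_GATHER_VTYSH_COMMANDS = [
    "show running-config",
    "show bgp ipv4",
    "show bgp ipv6",
    "show bgp neighbor",
    "show bfd peer",
]


def vtysh_argv(pod: str, command: str, namespace: str = "metallb-system") -> list[str]:
    """Build the `oc exec` command that runs one vtysh command in the frr container."""
    return ["oc", "exec", "-n", namespace, pod, "-c", "frr", "--", "vtysh", "-c", command]


def output_filename(command: str) -> str:
    """Turn a vtysh command into a file name, for example 'show-bgp-ipv4.txt'."""
    return command.replace(" ", "-") + ".txt"


def capture(pod: str, out_dir: Path) -> list[Path]:
    """Run every command against one pod and write each output to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for command in MUST_GATHER_VTYSH_COMMANDS:
        result = subprocess.run(
            vtysh_argv(pod, command), check=True, capture_output=True, text=True
        )
        path = out_dir / f"{pod}_{output_filename(command)}"
        path.write_text(result.stdout)
        written.append(path)
    return written
```

For a complete support data set, including the FRR configuration and log files, use oc adm must-gather instead.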

No additional configuration is required when you run the oc adm must-gather CLI command.
