Troubleshooting Windows container workload issues

Windows Machine Config Operator does not install

If you have completed the process of installing the Windows Machine Config Operator (WMCO), but the Operator is stuck in the InstallWaiting phase, the cause is most likely a networking issue.

The WMCO requires your OKD cluster to be configured with hybrid networking using OVN-Kubernetes; the WMCO cannot complete the installation process without hybrid networking available. Hybrid networking is necessary to manage nodes on multiple operating systems (OS) and OS variants, and it must be configured during the installation of your cluster.

For more information, see Configuring hybrid networking.
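
To confirm whether missing hybrid networking is the cause, you can inspect the cluster network configuration. The following is a minimal sketch, not an official procedure. It assumes that the hybrid overlay settings are stored under spec.defaultNetwork.ovnKubernetesConfig.hybridOverlayConfig of the cluster Network operator configuration; the function name check_hybrid_overlay is hypothetical.

```shell
# Sketch: report whether the cluster network has a hybrid overlay configured.
# Assumption: the settings live under
# spec.defaultNetwork.ovnKubernetesConfig.hybridOverlayConfig.
check_hybrid_overlay() {
    overlay=$(oc get network.operator.openshift.io cluster \
        -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.hybridOverlayConfig}')
    if [ -n "$overlay" ]; then
        # A non-empty stanza means hybrid networking was configured.
        echo "hybrid overlay configured: $overlay"
    else
        echo "hybrid overlay not configured"
        return 1
    fi
}
```

Running check_hybrid_overlay prints the configured stanza, or returns a non-zero status when the stanza is empty.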

Investigating why a Windows Machine does not become a compute node

A Windows Machine can fail to become a compute node for various reasons. The best way to investigate this problem is to collect the Windows Machine Config Operator (WMCO) logs.

Prerequisites

  • You installed the Windows Machine Config Operator (WMCO) using Operator Lifecycle Manager (OLM).

  • You have created a Windows machine set.

Procedure

  • Run the following command to collect the WMCO logs:

    $ oc logs -f deployment/windows-machine-config-operator -n openshift-windows-machine-config-operator
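
When the log is long, it can be easier to look at error lines only. The following is a minimal sketch that takes a snapshot of the log (no -f, so the command returns) and filters it. It assumes that relevant lines contain the word "error" in some letter case; the function name wmco_errors is hypothetical.

```shell
# Sketch: show only lines mentioning "error" from a snapshot of the WMCO logs.
# Extra arguments (for example --since=1h) are passed through to `oc logs`.
wmco_errors() {
    oc logs deployment/windows-machine-config-operator \
        -n openshift-windows-machine-config-operator "$@" \
        | grep -i 'error'
}
```

For example, wmco_errors --since=1h limits the output to the last hour. The function returns a non-zero status when no line matches.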

Accessing a Windows node

Windows nodes cannot be accessed using the oc debug node command; the command requires running a privileged pod on the node, which is not yet supported for Windows. Instead, a Windows node can be accessed using a secure shell (SSH) or Remote Desktop Protocol (RDP). An SSH bastion is required for both methods.

Accessing a Windows node using SSH

You can access a Windows node by using a secure shell (SSH).

Prerequisites

  • You have installed the Windows Machine Config Operator (WMCO) using Operator Lifecycle Manager (OLM).

  • You have created a Windows machine set.

  • You have added the key used in the cloud-private-key secret and the key used when creating the cluster to the ssh-agent. For security reasons, remember to remove the keys from the ssh-agent after use.

  • You have connected to the Windows node using an ssh-bastion pod.

Procedure

  • Access the Windows node by running the following command:

    $ ssh -t -o StrictHostKeyChecking=no -o ProxyCommand='ssh -A -o StrictHostKeyChecking=no \
        -o ServerAliveInterval=30 -W %h:%p core@$(oc get service --all-namespaces -l run=ssh-bastion \
        -o go-template="{{ with (index (index .items 0).status.loadBalancer.ingress 0) }}{{ or .hostname .ip }}{{end}}")' <username>@<windows_node_internal_ip> (1) (2)

    (1) Specify the cloud provider username, such as Administrator for Amazon Web Services (AWS) or capi for Microsoft Azure.
    (2) Specify the internal IP address of the node, which can be discovered by running the following command:

        $ oc get nodes <node_name> -o jsonpath={.status.addresses[?\(@.type==\"InternalIP\"\)].address}
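
The SSH command above combines two lookups: the bastion address and the node's internal IP. The following is a minimal sketch that separates them into small functions and prints the assembled command for review instead of running it. The function names are hypothetical, and the sketch assumes a single ssh-bastion service with a load balancer ingress, as in the command above.

```shell
# Sketch: look up the internal IP address of a node.
# $1: node name as shown by `oc get nodes`
windows_node_internal_ip() {
    oc get nodes "$1" \
        -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
}

# Sketch: look up the hostname or IP address of the ssh-bastion service.
ssh_bastion_host() {
    oc get service --all-namespaces -l run=ssh-bastion \
        -o go-template='{{ with (index (index .items 0).status.loadBalancer.ingress 0) }}{{ or .hostname .ip }}{{end}}'
}

# Sketch: print the full SSH command so it can be checked before use.
# $1: cloud provider username, $2: node name
windows_ssh_command() {
    printf "ssh -t -o StrictHostKeyChecking=no -o ProxyCommand='ssh -A -o StrictHostKeyChecking=no -o ServerAliveInterval=30 -W %%h:%%p core@%s' %s@%s\n" \
        "$(ssh_bastion_host)" "$1" "$(windows_node_internal_ip "$2")"
}
```

For example, windows_ssh_command Administrator <node_name> prints the command with both addresses filled in.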

Accessing a Windows node using RDP

You can access a Windows node by using the Remote Desktop Protocol (RDP).

Prerequisites

  • You installed the Windows Machine Config Operator (WMCO) using Operator Lifecycle Manager (OLM).

  • You have created a Windows machine set.

  • You have added the key used in the cloud-private-key secret and the key used when creating the cluster to the ssh-agent. For security reasons, remember to remove the keys from the ssh-agent after use.

  • You have connected to the Windows node using an ssh-bastion pod.

Procedure

  1. Run the following command to set up an SSH tunnel:

    $ ssh -L 2020:<windows_node_internal_ip>:3389 \ (1)
        core@$(oc get service --all-namespaces -l run=ssh-bastion -o go-template="{{ with (index (index .items 0).status.loadBalancer.ingress 0) }}{{ or .hostname .ip }}{{end}}")

    (1) Specify the internal IP address of the node, which can be discovered by running the following command:

        $ oc get nodes <node_name> -o jsonpath={.status.addresses[?\(@.type==\"InternalIP\"\)].address}

  2. From within the resulting shell, SSH into the Windows node and run the following command to create a password for the user:

    C:\> net user <username> * (1)

    (1) Specify the cloud provider username, such as Administrator for AWS or capi for Azure.

You can now remotely access the Windows node at localhost:2020 using an RDP client.

Collecting Kubernetes node logs for Windows containers

Windows container logging works differently from Linux container logging; the Kubernetes node logs for Windows workloads are streamed to the C:\var\logs directory by default. Therefore, you must gather the Windows node logs from that directory.

Prerequisites

  • You installed the Windows Machine Config Operator (WMCO) using Operator Lifecycle Manager (OLM).

  • You have created a Windows machine set.

Procedure

  1. To view the logs under all directories in C:\var\logs, run the following command:

    $ oc adm node-logs -l kubernetes.io/os=windows --path= \
        /ip-10-0-138-252.us-east-2.compute.internal containers \
        /ip-10-0-138-252.us-east-2.compute.internal hybrid-overlay \
        /ip-10-0-138-252.us-east-2.compute.internal kube-proxy \
        /ip-10-0-138-252.us-east-2.compute.internal kubelet \
        /ip-10-0-138-252.us-east-2.compute.internal pods

  2. You can now list files in the directories using the same command and view the individual log files. For example, to view the kubelet logs, run the following command:

    $ oc adm node-logs -l kubernetes.io/os=windows --path=/kubelet/kubelet.log
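
To see which log files exist before fetching one, you can list each directory in turn. The following is a minimal sketch that loops over the five directories shown in the first step. It assumes that a --path value ending in a slash lists the directory contents; the function name list_windows_log_dirs is hypothetical.

```shell
# Sketch: list the files in each log directory for all Windows nodes.
# Assumption: a trailing slash in --path returns a directory listing.
list_windows_log_dirs() {
    for dir in containers hybrid-overlay kube-proxy kubelet pods; do
        echo "== /$dir/ =="
        oc adm node-logs -l kubernetes.io/os=windows --path="/$dir/"
    done
}
```

Any file name that appears in a listing can then be passed to --path in the same way as /kubelet/kubelet.log.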

Collecting Windows application event logs

The Get-WinEvent shim on the kubelet logs endpoint can be used to collect application event logs from Windows machines.

Prerequisites

  • You installed the Windows Machine Config Operator (WMCO) using Operator Lifecycle Manager (OLM).

  • You have created a Windows machine set.

Procedure

  • To view logs from all applications logging to the event logs on the Windows machine, run:

    $ oc adm node-logs -l kubernetes.io/os=windows --path=journal

    The same command is executed when collecting logs with oc adm must-gather.

    Other Windows application logs from the event log can also be collected by specifying the respective service with the -u flag. For example, you can run the following command to collect logs for the Docker runtime service:

    $ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker
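
If you need the event logs for more than one service, you can save each one to its own file. The following is a minimal sketch that repeats the -u command above for every service name it is given. The function name collect_windows_event_logs and the output layout are hypothetical; docker is the only service name taken from this document.

```shell
# Sketch: save event-log output for several services, one file per service.
# $1: output directory; remaining arguments: service names passed to -u
collect_windows_event_logs() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    for unit in "$@"; do
        oc adm node-logs -l kubernetes.io/os=windows --path=journal -u "$unit" \
            > "$outdir/$unit.log"
    done
}
```

For example, collect_windows_event_logs ./event-logs docker writes the Docker service events to ./event-logs/docker.log.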

Collecting Docker logs for Windows containers

The Windows Docker service does not stream its logs to stdout; instead, it writes them to the Windows event log. You can view the Docker event logs to investigate issues that you think might be caused by the Windows Docker service.

Prerequisites

  • You installed the Windows Machine Config Operator (WMCO) using Operator Lifecycle Manager (OLM).

  • You have created a Windows machine set.

Procedure

  1. SSH into the Windows node and enter PowerShell:

    C:\> powershell

  2. View the Docker logs by running the following command:

    C:\> Get-EventLog -LogName Application -Source Docker

Additional resources