Kops tasks and effort estimation

List of open issues and features and their effort estimation as of .

TODO Issues listed below require proper labels to be assigned, especially P0.

Priorities

P0: Must have fixes and features, needed to make existing vSphere support in kops work.

P1: Important fixes and features, required to give vSphere users a more useful kubernetes cluster deployment experience, with multiple masters and HA.

P2: Rarely occurring issues and features that will bring vSphere support closer to AWS and GCE support in kops.

Notes:

  • Effort estimation includes fix for an issue or implementation for a feature, testing and generating a PR.
  • There are a few issues that are related to startup and base image. If we can resolve "Use PhotonOS for vSphere node template" issue first and replace init-cloud with guestinfo, those issues might get resolved automatically. But further investigation is needed and fixed issues will need verifications and testings.
PriorityTaskType (bug, feature, testEffort estimate(in days)Remarks
P0Kops for vSphere is broken kubernetes/kops#2729Bug1
P0AWS EBS is set as default volume provisioner, instead of vSphere kubernetes/kops#2732Bug1Looks like the fix is available, need to be tested again before committing https://github.com/vmware/kops/pull/70/files
P0Package installation through nodeup causing delay in cluster deployment kubernetes/kops#2742Bug2If we get the "Use PhotonOS for vSphere node template" done first, this one may just be avoided. Verification required.
P0Connection to api.clustername.skydns.local is failing kubernetes/kops#2744Bug1This might be fixed by Abrar's PR in kubernetes already. Just need to verify.
P0Make update command, that includes scale up and down, work kubernetes/kops#2738. There are two possible ways to implement this- without auto scaling group (ASG) or with auto scaling groupFeature4 days Assuming ASG is available. 4 days ASG is not available.This effort estimation needs more analysis.
P0Make end-to-end CI/CD work on vmware/kops kubernetes/kops#2730Bug, Test3
P1Use PhotonOS for vSphere node template kubernetes/kops#2735. Use guestinfo instead of init-cloud kubernetes/kops#2726Feature5This was originally P2 issue. However several other issues might be affected by this one. So bring it to P1. Problem to solve: init-cloud on PhotonOS not working properly. Or, get rid of init-cloud and use guestinfo instead.
P1Default image name extraction for vSphere needs to be fixed kubernetes/kops#2740Bug2
P1Multi-master HA setup kubernetes/kops#2734Feature5
P1Add unit tests for vSphere workflows kubernetes/kops#2745Test8This effort estimation needs more analysis as estimator is not familiar with all the components for which tests need to be written.
P1Documentation of- 1) Existing commands and usage, blog kubernetes/kops#2739, 2) Behavior for all flags for ‘kops cluster create’ command kubernetes/kops#27413
P2vCenter running out of HTTP sessions kubernetes/kops#2747Bug3
P2Long boot time for template VM kubernetes/kops#2746Bug3This estimation needs more analysis. Same here: if Use PhotonOS for vSphere node template can be resolved first, this might not be valid. Still, verification needed.
P2Improve methods that create user-data and meta-data for ISO kubernetes/kops#2748, or use guestinfo instead of cloud-init for passing in VM specific informations kubernetes/kops#2726Feature2We need decide If we should use guestinfo instead of cloud-init.
P2Support for rolling upgrade, normal upgradeFeature7This estimation needs more analysis. Presence of ASG might simplify the implementation of this feature. Some changes might be needed in core kops code, as current rolling upgrade implementation is very much AWS specific.
P2Make ETCD volumes re-attachable for vSphere (AWS and GCE already support this) kubernetes/kops#2736Feature7This task needs more analysis for design and implementation. Estimate might change accordingly.
P2Security and isolation- 1) Networking for master, worker nodes kubernetes/kops#2731. 2) Credentials in plain text kubernetes/kops#2743FeatureDon't have enough information on this to give an estimate for effort involved.
P2Explore vSphere DRS cluster’s anti-affinity rules to achieve better master HA and meaningful zone allotment for masters kubernetes/kops#2733Feature8This estimation needs more analysis. DRS anti-affinity rule needs to be explored, to see if it's even fit for this problem.
P2Enable user-defined dns zone name kubernetes/kops#2727Feature2
Total 67

Kops commands behavior for vSphere

List of all kops commands and how they behave for vSphere cloud provider, as of .

Column explanation

  • Command, option and usage example are self-explanatory.
  • vSphere support: whether or not the command is supported for vSphere cloud provider (Yes/No), followed by current status of that command and explanation of any failures.
  • Graceful termination needed: If the command will not supported, does it need additional code to fail gracefully for vSphere provider?
  • Remark: Miscellaneous comments about the command.
CommandOptionUsage examplevSphere supportGraceful termination needed (if not fixed)Remark
completionbashkops completion bashYesNoOutput shell completion code for the given shell (bash), which can easily be incorporated in a bash script to run kops commands as bash functions.
createclusterYes. Supported/tested command flags: cloud, dns, dns-zone, image, networking, node-count, vsphere-server, vsphere-datacenter, vsphere-resource-pool, vsphere-datastore, vsphere-coredns-server, yes, zones.Yes. Check for unsupported flags. Terminate command, if needed, with appropriate message.Creates cluster spec and configs. If —yes is specified then creates resources as well.
createinstancegroupkops create ig —name=v1c1.skydns.local —role=Node —subnet=vmw-zone nodes2No. InstanceGroup spec gets created in object store. Command however shows this error even after setting 'image' value in spec: I0412 11:08:23.025842 80677 populate_instancegroup_spec.go:257] Cannot set default Image for CloudProvider="vsphere"Yes. Either add a check for vSphere, or fix the issue causing the failure.
createsecretkops create secret sshpublickey test_key -i ~/.ssh/git_rsa.pubYesNoCreates and delete secrets can be used in combination to replace existing secrets. Justin's explanation: "k8s in theory supports multiple certificates but it was not working until 1.5 so I don't think we actually enable it in kops This will be how we do certificate rotation though - add a certificate, roll that out, roll out a new key and switch to the new key"
create-f FILENAMEThree yams files are required- cluster: kops create -f ~/kops.yaml, master IG: kops create -f ~/kops.nodeig.yaml, node IG: kops create -f ~/kops.masterig.yamlYesNo
deleteclusterkops delete cluster v2c1.skydns.local —yesYesNo
deleteinstancegroupkops delete instancegroup —name=v2c1.skydns.local nodes.v2c1.skydns.localNo. No implementation available to list resources. Method corresponding to AWS is getting called and crashing with panic, without any useful message.Yes
deletesecretYes-
delete-f FILENAMEkops delete -f config.yaml —name=v2c1.skydns.localNo. Cluster deletion works. Instance group deletion is failing with error: panic: interface conversion: *vsphere.VSphereCloud is not awsup.AWSCloud: missing method AddAWSTags goroutine 1 [running]: panic(0x26fbd20, 0xc420770780) /usr/local/go/src/runtime/panic.go:500 +0x1a1 k8s.io/kops/upup/pkg/kutil.FindCloudInstanceGroupsYesDelete cluster, ig specified by the file.
describesecretskops describe secretsYesNoDescribe secrets, based on the kubectl context.
editclusterkops edit cluster —name=v2c1.skydns.local nodesYes. Edited spec gets updated in object store.YesEdit works. But it would be a bad user experience if we allow users to edit the spec, followed by a failed 'kops update' and then no way to go back to the older spec.
editigkops edit ig —name=v2c1.skydns.local nodesYes. Edited spec gets updated in object store.YesEdit works. But it would be a bad user experience if we allow users to edit the spec, followed by a failed 'kops update' and then no way to go back to the older spec.
editfederationNoYesFederation is a group of k8s clusters. This doesn't look an important goal for vSphere in near future. Q: "How is a federation getting created? I see update and edit methods for a federation but I am not clear how to get a federation in the first place." A: Justin's reply: I'm chatting with the federation folk about kubefed & kops and whether we should integrate them etc. The federation stuff was very alpha and I believe is (trivially) broken right now, but I'm debating integrating with kubefed vs fixing kops federation. kubefed worked fine when I tried it the other day.
exportkubecfgkops export kubecfg v1c1.skydns.localYes-Sets kubectl context to given cluster.
getclustersYes-Gets list of clusters. If yaml output is specified, this output can be modified and used for 'kops replace' command.
getfederationsYes-Gets list of federations. For now empty list.
getinstancesgroupsYes-Gets list of intancegroups. If yaml output is specified, this output can be modified and used for 'kops replace' command.
getsecretsYes-Gets list of secrets.
importclusterkops import cluster —region=us-west-2 —name=v2c1.skydns.local nodesNo. Current implementation is very aws specific. Multiple aws services are queried to construct the api.Cluster object.YesImports spec for an existing cluster into the object store. While this functionality is good for importing and managing existing k8s clusters using kops, it doesn't seem like a high priority functionality at this point of time.
replacekops replace -f FILENAMENoYesOutput of kops get cluster name -oyaml or kops get ig name -oyaml can be updated and passed to 'kops replace' command.
rolling-updateclusterNo. Current implementation is aws specific.Yes
secretscreate--Legacy command, points to 'kops create secrets'.
secretsdescribe--Legacy command, points to 'kops describe secrets'.
secretsexpose--Legacy command, points to 'kops get secrets -oplaintext'.
secretsget--Legacy command, point to 'kops get secret'.
toolboxdumpNo. Current implementation is aws specific.YesDumps cloud information for the given cluster. This looks like a good to have functionality. Once resource listing is available for vsphere, which will anyways get used for deletion operation as well, this command should become easier to implement.
toolboxconvert-importedNo. Current implementation is aws specific.YesDoesn't look like a high priority functionality.
updateclusterkops update cluster —name=v2c1.skydns.local —yesNo. 1) Works for new cluster. 2) Existing cluster scale up: vSphere provisioning code tries to provision all master and node VMs from scratch. New nodes get created and registered successfully. Existing resources keep failing with 'already exists' error. 3) Existing cluster scale down: Won't work, no resource listing or deletion logic available for vSphere. On top of that all listed resources- masters and workers are attempted for creation and fail with 'already exists' error.Yes
updatefederationNoYesFederation is a group of k8s clusters. This doesn't look an important goal for vSphere in near future.
upgradeclusterkops upgrade cluster —name=v1c1.skydns.local —yesNo. Seeing this error: W0413 11:48:52.216116 15456 upgrade_cluster.go:202] No matching images specified in channel; cannot prompt for upgradeYesFind out more about 'channel' in context of kops. Note that no —channel argument is specified.
validateclusterkops validate cluster —name=v1c1.skydns.localYes. Not working right now. Failing with this error: cannot get nodes for "v1c1.skydns.local": Get https://api.v1c1.skydns.local/api/v1/nodes: dial tcp: lookup api.v1c1.skydns.local: no such host-Investigation is already going on- https://github.com/kubernetes/kops/issues/2744. This issue will most likely get fixed by a fix in cloud-provider code that is not returning appropriate internal and external IP for the node.
versionkops versionYes-Prints client version information. Eg: Version 1.6.0-alpha.1 (git-500cb69)