From NDS-766

Kubeadm exploration:

  1. Provision 3 nodes via horizon: m1.large flavor with the fedora-25-cloud image (kadm{1,2,3})
  2. On each via ssh:
    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=http://yum.kubernetes.io/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
           https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    EOF
    setenforce 0
    dnf -y update
    dnf install -y --nogpgcheck docker kubelet kubeadm kubectl kubernetes-cni
    systemctl enable docker && systemctl start docker
    systemctl enable kubelet && systemctl start kubelet
  3. On kadm1 (master) in sudo shell
    env | grep OS_ >> /etc/kubernetes/cloud-config
    kubeadm init
  4. Save the join command (including the token) printed at the end of the kubeadm init output.
  5. On kadm2 and kadm3, in a sudo shell, paste the saved command, e.g.: kubeadm join --token=1c9133.347bb13bd6bef75d 192.168.100.203
  6. Switch to master
  7. Verify that all nodes have joined: kubectl get nodes
  8. Check that the pods look normal: kubectl get pods --all-namespaces.
    DNS stuck in perpetual ContainerCreating is normal until a pod network is applied to the cluster. The other pods should be ready, and represent kubernetes itself running as daemonsets within kubernetes.
  9. The following is what seems to work; the alternatives tried and the issues hit are covered in the notes at the end.
  10. Install calico for networking: kubectl apply -f http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/hosted/kubeadm/calico.yaml
  11. When calico is fully ready, the DNS pods should come online and show ready.
    [root@kadm-1 fedora]# kubectl get pods --all-namespaces
    NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
    kube-system   calico-etcd-sdg0q                           1/1     Running   0          9m
    kube-system   calico-node-9jqqb                           2/2     Running   0          9m
    kube-system   calico-node-lp52n                           2/2     Running   0          9m
    kube-system   calico-node-sx3vb                           2/2     Running   0          9m
    kube-system   calico-policy-controller-917753764-h4nwf    1/1     Running   0          9m
    kube-system   dummy-2088944543-107c1                      1/1     Running   0          12m
    kube-system   etcd-kadm-1.os.ncsa.edu                     1/1     Running   0          12m
    kube-system   kube-apiserver-kadm-1.os.ncsa.edu           1/1     Running   0          12m
    kube-system   kube-controller-manager-kadm-1.os.ncsa.edu  1/1     Running   0          12m
    kube-system   kube-discovery-1769846148-kfkrj             1/1     Running   0          12m
    kube-system   kube-dns-2924299975-1r784                   4/4     Running   0          12m
    kube-system   kube-proxy-mnfcx                            1/1     Running   0          11m
    kube-system   kube-proxy-rscmv                            1/1     Running   0          12m
    kube-system   kube-proxy-s6wpm                            1/1     Running   0          11m
    kube-system   kube-scheduler-kadm-1.os.ncsa.edu           1/1     Running   0          12m
  12. Any other addons (dashboard/monitoring/etc.) can be deployed via kubectl apply directly from the kubernetes source URLs under addons (a sketch is included just after this list).
  13. Kubernetes is fully functional at this point and should support normal kubernetes deployments - including WB if we had self-hosted shared gluster storage, the solution discussed in NDS-764.
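
For step 12, a minimal sketch of pulling in an addon this way, using the dashboard as the example - the manifest URL/path here is an assumption for the release in use and may have moved between versions:

    # deploy the dashboard addon straight from its upstream manifest
    # (URL/path is an assumption and may differ between releases)
    kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml

    # confirm the addon pod comes up alongside the other kube-system pods
    kubectl get pods --namespace=kube-system | grep dashboard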

Notes on Problems, Issues, Workarounds, and Future Work

  • Network plugin - originally attempted the flannel plugin since we are familiar with it. Flannel seemed to work for pod networking, but DNS did not go Ready as it should have. Deleted that deployment and deployed the calico option - calico came up and DNS went Ready. A simple test with a fedora container deploy using .cluster.local names worked as expected (sketched below, after these notes).
  • Reboot - rebooted all 3 nodes, after which containers on all nodes came back, but kubectl on the master was unable to connect. The suspicion is that this was flannel configuration left over from the initial deploy that was never cleaned up. Did a kubeadm reset and attempted a redeploy, but kubeadm init on the master would not complete, so the whole system was unrecoverable without serious investigation.
  • Automating via openstack cli - originally attempted to script the process using nova boot from the openstack cli: nova boot --flavor m1.large --key-name dr --nic net-id=38a19005-2177-4e32-96d3-009fcae6dfaa --image fedora-25-cloud --security-groups "remote SSH" kadm2 - but the nova cli instances hung in boot obtaining an IP address from openstack (an openstack-cli loop equivalent is sketched below, after these notes).
  • Ansible integration - deploy-tools is intertwined with coreos and the storage model for glfs-global, and the storage setup is coreos-specific and tied to the brick deployments. The suggested changes to separate those dependencies and integrate kubeadm into the process are:
    • Split the storage (glfs volumes and mounts) out from openstack_provision - openstack_provision becomes simple server deploys for all nodes in the cluster - just the os/image/flavor/etc.
    • Move the storage allocation and initialization into an alternate playbook for on-prem openstack deploys with deploy-tools, or discard the storage provisioning in ansible entirely, replacing it with simple configmaps and the strategy discussed in NDS-764, which is more portable across cloud providers.
  • Openstack provider - kubeadm init did not complete with --cloud-provider=openstack and /etc/kubernetes/cloud-config in various formats (ini file, json, etc.) containing the OS_ environment variables. It is unclear why, as the documentation here is quite thin and scattered across git issues, etc. (one likely cloud-config format is sketched below, after these notes). Enabling the openstack volume provider and dynamic provisioning would enable the dynamic self-hosted gluster brick-servers-in-pods via statefulsets. The integration is there, but time did not permit running this down.
  • OS and docker versions - I attempted f25-cloud, which ships docker 1.12, a version not officially supported. Fought a bit with ubuntu cloud images but was unsuccessful with the qemu-format images; it is unclear why. Suggest that centos7 be used going forward, but even there the docker version in the OS may be rev'd past the officially supported kubernetes versions.
  • 1.6 release - the kubeadm team gave an update some weeks ago suggesting that HA-scale deployments on all supported cloud providers would be part of the release, due 3-22. Looked for updated info in their features repo, which tracks release features (an interesting alternative approach to managing release features and burn-down tracking), but everything still appears in-progress with no recent updates. Note that 1.6 is a stabilization release - hardening and widening to corner cases - so expect that 1.6 kubeadm should include most of the features NDS needs.
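
Sketch of the in-cluster DNS test mentioned in the network plugin note above - the deployment name is a placeholder and kubectl run behaviour differs slightly between releases:

    # start a throwaway fedora container and attach to it
    kubectl run -i --tty dnstest --image=fedora -- bash

    # inside the container, resolve an in-cluster name via kube-dns
    getent hosts kubernetes.default.svc.cluster.local
    # (nslookup/dig would need: dnf install -y bind-utils)

    # clean up the test deployment afterwards
    kubectl delete deployment dnstest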
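
Sketch of the openstack-cli equivalent of the nova boot attempt above, reusing the same flavor/image/key/net parameters (untested here given the IP-allocation hang, and it assumes an OS_* credential rc file has been sourced):

    # provision kadm1-3 with the unified openstack client instead of nova boot
    for n in 1 2 3; do
      openstack server create \
        --flavor m1.large \
        --image fedora-25-cloud \
        --key-name dr \
        --nic net-id=38a19005-2177-4e32-96d3-009fcae6dfaa \
        --security-group "remote SSH" \
        kadm${n}
    done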
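
For the openstack provider note above, the in-tree provider expects an INI-style cloud-config rather than raw OS_ variables; a sketch of one plausible /etc/kubernetes/cloud-config built from those variables follows (the key names are my best understanding of the provider and were not verified working in this exploration):

    # build /etc/kubernetes/cloud-config from the exported OS_* variables
    # (run in the root shell that has the OS_* credentials exported;
    #  key names are an assumption, not verified in this exploration)
    cat <<EOF > /etc/kubernetes/cloud-config
    [Global]
    auth-url=$OS_AUTH_URL
    username=$OS_USERNAME
    password=$OS_PASSWORD
    tenant-name=$OS_TENANT_NAME
    region=$OS_REGION_NAME
    EOF

    # then retry the init pointing at the provider
    kubeadm init --cloud-provider=openstack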

Could work with some changes to how we provision
Could work in a local VM version
1.6 (due 3/22) should include enterprise-scale deployments
