1.3.32.kubectl commands are failing due to proxy
kubectl commands may fail with the following error:

    E0403 14:04:05.944359 3116197 memcache.go:265] couldn't get current server API group list: Get "https://swimlane-dev.swimlane.example.com:6443/api?timeout=32s": Service Unavailable

Check whether a proxy is enabled in the environment:

    env | grep -i proxy

If a proxy is enabled, the output looks like this:

    https_proxy=http://192.168.x.x:8080
    http_proxy=http://192.168.x.x:8080

Bypass the proxy for local and cluster addresses on the node:

    export no_proxy=127.0.0.1,192.168.x.x,.default,kubernetes,.local,localhost,.svc,load-balancer-fqdn

Replace 192.168.x.x and load-balancer-fqdn with the values for your environment. Also add the line above to $HOME/.bashrc; otherwise a node reboot will wipe it.

Check that the proxy bypass is in place:

    env | grep -i proxy

The output should now look like this:

    https_proxy=http://192.168.x.x:8080
    http_proxy=http://192.168.x.x:8080
    no_proxy=127.0.0.1,192.168.x.x,.default,kubernetes,.local,localhost,.svc,load-balancer-fqdn

kubectl commands should work now.
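The step of persisting no_proxy in $HOME/.bashrc can be sketched as a small idempotent helper, so that rerunning it does not add duplicate lines. This is a minimal sketch assuming bash; the function names no_proxy_has and persist_no_proxy are hypothetical and not part of any Swimlane tooling.

```shell
#!/usr/bin/env bash
# Sketch: persist no_proxy in a shell profile without duplicating the line.
# The function names are illustrative assumptions.

# Return 0 if the comma-separated list $1 contains the exact entry $2.
no_proxy_has() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Append an export line to the profile file $2 only if it is not already there.
persist_no_proxy() {
  local list="$1" profile="$2"
  local line="export no_proxy=${list}"
  touch "$profile"
  grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

A possible call, with the placeholders replaced by real values first: persist_no_proxy "127.0.0.1,192.168.x.x,.default,kubernetes,.local,localhost,.svc,load-balancer-fqdn" "$HOME/.bashrc". The no_proxy_has helper can then confirm that an entry such as .svc is present in "$no_proxy".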
1.3.33.Upload a New TLS Certificate
If you have already gone through the setup process once and want to upload new TLS certificates, run this command to restore the ability to upload them:

    kubectl -n default annotate secret kotsadm-tls acceptAnonymousUploads=1

Adding this annotation temporarily creates a vulnerability: until new TLS certificates have been uploaded, an attacker could maliciously upload certificates. Once TLS certificates have been uploaded again, the vulnerability goes away.

After adding the annotation, restart the kurl proxy server. The simplest way is to delete the kurl proxy pod; it is restarted automatically. This command returns the name of the kurl proxy pod:

    kubectl get pods -A | grep kurl-proxy | awk '{print $2}'

Delete the pod, using the name returned above:

    kubectl delete pods <proxy server>

After the pod has restarted, point your browser to http://<your ip>:8800/tls to see the same page that you saw during the initial installation, then load your TLS certificate.

Swimlane recommends that you complete this process as soon as possible so that no one else can upload TLS certificates in the meantime. After the process is complete, the vulnerability is closed and uploading new TLS certificates is disallowed again. Repeat the steps above whenever you need to upload new TLS certificates.
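The annotate-and-restart sequence above can be sketched as one helper, which keeps the window during which anonymous uploads are accepted short. This is a minimal sketch, not the vendor's procedure: the function names are hypothetical, and --overwrite (a standard kubectl annotate flag) is added on the assumption that the annotation may already exist from an earlier run.

```shell
#!/usr/bin/env bash
# Sketch: re-enable TLS uploads and restart the kurl proxy pod in one pass.
# The function names are illustrative assumptions.

# Read `kubectl get pods -A` output on stdin; print "<namespace> <pod name>"
# for every pod whose name starts with kurl-proxy.
filter_kurl_proxy() {
  awk '$2 ~ /^kurl-proxy/ {print $1, $2}'
}

# Annotate the secret, then delete each kurl-proxy pod so it is recreated.
reenable_tls_upload() {
  kubectl -n default annotate secret kotsadm-tls acceptAnonymousUploads=1 --overwrite
  kubectl get pods -A | filter_kurl_proxy | while read -r ns pod; do
    kubectl -n "$ns" delete pod "$pod"
  done
}
```

Printing the namespace together with the pod name avoids relying on the current kubectl context when the pod is deleted.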
1.3.34.How to Install and Use the restic CLI
Sometimes access to restic restore objects is needed to troubleshoot why an SPI instance snapshot failed to restore. To manipulate restic restore objects, install the restic CLI into the swimlane-tools pod.

1. Download the current restic CLI. This example uses restic 0.10.0 from GitHub (the version will change as time passes). In a browser, navigate to https://github.com/restic/restic/releases/v0.10.0 and download restic_0.10.0_linux_amd64.bz2.

2. Use bzip2 to uncompress restic:

    bzip2 -d restic_0.10.0_linux_amd64.bz2

3. Copy the restic CLI binary into the swimlane-tools pod:

    kubectl cp restic_0.10.0_linux_amd64 swimlane-tools-0:/tmp

4. Get the values for the environment variables that must be created in the swimlane-tools pod (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and RESTIC_PASSWORD). Note: If you are unsure how to get the secrets, reach out to Support.

5. Exec into the swimlane-tools pod and set the environment variables:

    kubectl exec -it swimlane-tools-0 -- bash
    export AWS_ACCESS_KEY_ID=<value above>
    export AWS_SECRET_ACCESS_KEY=<value above>
    export RESTIC_PASSWORD=<value above>

6. Execute restic commands (still in the swimlane-tools pod). Use the file name of the version you copied into the pod; the example below was captured with restic 0.14.0. If the binary is not executable, run chmod +x on it first.

    /tmp/restic_0.14.0_linux_amd64 list snapshots -r 's3:http://kotsadm-fs-minio.default:9000/velero/restic/default'
    repository 15049d8e opened (repository version 2) successfully, password is correct
    created new cache in /root/.cache/restic
    6b98083019de01590d6fa12ba46ed4bb46c7c7ee8bcc3192128041c851667d70
    768e57bdfb53c7e5a3e3c282f68bf9cab83159bb051f47c492053903f6b7ae19
    80d918b1f28835eba6e1c955286c19c19b81a28f33a72eb0e0d4b49447c68833
    8189de2cfb080be3063b4a52966099526ed143fbda1d03a6092f2c732703dad6
    825bfd64bddd769645485662dff836999bf7bd86a7f20da8994e38cb573e8e01
    d4035bcfff85355fdb0f16fc32ea8ad38d1bbe4a87fe2ef6e8b172e9f9650368
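Because the restic version changes over time, it can help to derive every file name from a single version value so that the downloaded archive, the copied binary, and the path used inside the pod stay in sync. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the direct download URL follows the usual GitHub release-asset layout and should be verified against the release page, and curl is assumed to be available on the workstation.

```shell
#!/usr/bin/env bash
# Sketch: derive restic file names from one version value.
# Function names are illustrative; the download URL layout is an assumption.

# Print the name of the uncompressed linux/amd64 binary for version $1.
restic_binary_name() {
  printf 'restic_%s_linux_amd64\n' "$1"
}

# Print the assumed release download URL for version $1.
restic_download_url() {
  printf 'https://github.com/restic/restic/releases/download/v%s/%s.bz2\n' \
    "$1" "$(restic_binary_name "$1")"
}

# Download, unpack, mark executable, and copy the binary into the pod.
install_restic_in_pod() {
  local version="$1" bin
  bin="$(restic_binary_name "$version")"
  curl -fsSLO "$(restic_download_url "$version")"
  bzip2 -d "${bin}.bz2"
  chmod +x "$bin"
  kubectl cp "$bin" "swimlane-tools-0:/tmp/${bin}"
}
```

For example, install_restic_in_pod 0.10.0 would place the binary at /tmp/restic_0.10.0_linux_amd64 in the pod, which is then the path to use for the restic commands.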
1.3.35.Resizing the MongoDB PVC on existing EKS clusters using gp2 as the MongoDB volume StorageClass
On existing clusters, the customer is responsible for the EKS configuration. When gp2 is used as the MongoDB volume StorageClass (rather than OpenEBS, which we use for embedded clusters), the volumes must be resized in EKS. Customers with the same existing-cluster configuration can follow these steps.

Prerequisites
- Ensure that you have permissions to edit volumes and persistent volume claims (PVCs) within your EKS environment.
- Confirm that you can access the KOTS Admin Console and the AWS console.

From KOTS Admin Console: update the KOTS config and redeploy the KOTS version with the resized DB
1. Log in to the KOTS Admin Console.
2. Navigate to the Settings or Configuration section.
3. Under the database configuration, update the MongoDB volume size field to the new desired size.
4. Redeploy the KOTS version to apply the updated configuration.

Note: Ensure that the updated size is reflected in the configuration to avoid discrepancies when Kubernetes applies the volume size.

From common cluster: update the volume size in the persistent volume configs

Resize the volume in the AWS console (gp2 storage class)

These steps provide a thorough approach for resizing MongoDB volumes using gp2 in a cluster on EKS.
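Before changing the size, a few pre-checks can catch common problems: Kubernetes does not allow a PVC to shrink, and a StorageClass must permit expansion before a larger request is honored. The sketch below is illustrative only. The function names are hypothetical, it assumes sizes are expressed in whole Gi, and the PVC name you pass to pvc_capacity must be looked up in your own cluster.

```shell
#!/usr/bin/env bash
# Sketch: pre-checks before resizing a gp2-backed PVC.
# Function names are illustrative assumptions; sizes are assumed to be in Gi.

# Strip a trailing "Gi" and print the integer part, e.g. "100Gi" -> 100.
gi_value() {
  printf '%s\n' "${1%Gi}"
}

# Return 0 only if the requested size $2 is larger than the current size $1.
is_growth() {
  [ "$(gi_value "$2")" -gt "$(gi_value "$1")" ]
}

# Print "true" when the gp2 StorageClass permits volume expansion.
gp2_allows_expansion() {
  kubectl get storageclass gp2 -o jsonpath='{.allowVolumeExpansion}'
}

# Print the capacity currently reported by PVC $2 in namespace $1.
pvc_capacity() {
  kubectl get pvc "$2" -n "$1" -o jsonpath='{.status.capacity.storage}'
}
```

After the resize, running pvc_capacity again with the same namespace and PVC name is one way to confirm that the new size is reported by the cluster.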
1.3.36.Troubleshooting Velero Snapshot Backup and Restore Failures
In some cases, Velero backups or restores fail without clear error messages. Use these steps to gather logs, diagnose issues, and understand the root cause.

Prerequisite: set the namespace. For an embedded cluster, the value should be "default".

    export NS=<your namespace>

1. Collect Velero debug logs

To generate detailed logs for a specific backup or restore:

    velero get backup
    velero debug --backup <backup name>
    velero get restore
    velero debug --restore <restore name>

This creates a bundle-<date>.tar.gz file containing logs and metadata.

2. Capture a support bundle

Gather the cluster's state after the failure:

    kubectl support-bundle --interactive=false secret/${NS}/kotsadm-swimlane-platform-supportbundle
    kubectl support-bundle -n $NS --interactive=false https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/host/default.yaml

3. Look for out-of-memory (OOM) events

On each node:

    dmesg -T | grep -i oom
    dmesg -T | egrep -i 'killed process'
    journalctl -k | grep -i 'killed process'

From any node, check specific pods:

    kubectl describe pod swimlane-tools-0 -n $NS | grep -i oom
    kubectl get pod swimlane-tools-0 -n $NS -o jsonpath="{.status.containerStatuses[*].lastState.terminated.reason}"
    kubectl describe pod swimlane-sw-mongo-0 -n $NS | grep -i oom
    kubectl get pod swimlane-sw-mongo-0 -n $NS -o jsonpath="{.status.containerStatuses[*].lastState.terminated.reason}"

4. Check node and container memory configuration

On each node:

    kubectl describe node <node name> | grep -i memory
    journalctl -u kubelet | tail -100 > kubelet.log

The journalctl -u kubelet command helps uncover kubelet-level issues.

From any node, check resource limits for the mongo and tools containers:

    kubectl get sts swimlane-sw-mongo -n $NS -o jsonpath="{.spec.template.spec.containers[*].resources}" && echo
    kubectl get sts swimlane-tools -n $NS -o jsonpath="{.spec.template.spec.containers[*].resources}" && echo

5. Check for large MongoDB collections

Run the scripts below inside the mongo shell. To open the shell:

Swimlane 10.x:

    kubectl exec -it -n $NS swimlane-sw-mongo-0 -- mongosh -u admin -p --authenticationDatabase admin --tls --tlsAllowInvalidCertificates admin

Note: For older versions, change "mongosh" to "mongo".

Turbine:

    kubectl exec -it -n $NS mongo-0 -- mongosh -u admin -p --authenticationDatabase admin --tls --tlsAllowInvalidCertificates admin

Then identify the top collections by data size and by index size using the JavaScript scripts below.

Large MongoDB collections:

    let collections = [];
    db.getMongo().getDBNames().forEach(function(dbName) {
      const currentDb = db.getSiblingDB(dbName);
      currentDb.getCollectionInfos().forEach(function(collInfo) {
        // Skip views and system collections
        if (collInfo.type === "collection" && !collInfo.name.startsWith("system.")) {
          const stats = currentDb.getCollection(collInfo.name).stats();
          collections.push({
            db: dbName,
            collection: collInfo.name,
            sizeBytes: stats.size,
            sizeMB: (stats.size / (1024 * 1024)).toFixed(2),
            sizeGB: (stats.size / (1024 * 1024 * 1024)).toFixed(2)
          });
        }
      });
    });
    // Largest first
    collections.sort((a, b) => b.sizeBytes - a.sizeBytes);
    print("Top 10 largest collections by data size:");
    collections.slice(0, 10).forEach(function(item, index) {
      print(`${index + 1}. ${item.db}.${item.collection}: ${item.sizeBytes} bytes | ${item.sizeMB} MB | ${item.sizeGB} GB`);
    });

MongoDB index size:

    let indexStats = [];
    db.getMongo().getDBNames().forEach(function(dbName) {
      const currentDb = db.getSiblingDB(dbName);
      currentDb.getCollectionInfos().forEach(function(collInfo) {
        // Skip views and system collections
        if (collInfo.type === "collection" && !collInfo.name.startsWith("system.")) {
          const stats = currentDb.getCollection(collInfo.name).stats();
          indexStats.push({
            db: dbName,
            collection: collInfo.name,
            indexBytes: stats.totalIndexSize,
            indexMB: (stats.totalIndexSize / (1024 * 1024)).toFixed(2),
            indexGB: (stats.totalIndexSize / (1024 * 1024 * 1024)).toFixed(2)
          });
        }
      });
    });
    // Largest first
    indexStats.sort((a, b) => b.indexBytes - a.indexBytes);
    print("Top 10 collections by index size:");
    indexStats.slice(0, 10).forEach(function(item, index) {
      print(`${index + 1}. ${item.db}.${item.collection}: ${item.indexBytes} bytes | ${item.indexMB} MB | ${item.indexGB} GB`);
    });

6. Check disk space and dump folder content

Inside the swimlane-tools pod:

    kubectl exec -n $NS $(kubectl get pod -n $NS -l app=swimlane-tools -o name) -- ls -lh /dump

On each node, check disk usage:

    df -h | head -20

7. Additional log collection

To improve verbosity during troubleshooting, use the --log-level argument. The available values, in increasing verbosity, are:

- error: logs only critical errors
- warn: logs warnings and errors
- info: default; logs general operational messages
- debug: logs detailed debug information
- trace: logs everything, including low-level internal operations (very verbose)

You can set this in the Velero deployment like this:

    kubectl edit deployment/velero -n velero

    # Add the argument --log-level=debug under the velero container's args
    spec:
      containers:
      - name: velero
        args:
        - server
        - --log-level=debug

This provides deeper insight into what Velero is doing during the backup or restore.

8. Additional recommendations

- Capture screenshots of error messages in the UI.
- Note the failure timestamp and the backup or restore names.
- For a restore, ensure that the backup being used is complete.
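The pod-level checks from steps 3, 4, and 6 can be gathered into one timestamped directory that is easy to attach to a support case. This is a minimal sketch, not an official collection tool: OUT_DIR, collect, and collect_velero_diagnostics are hypothetical names, and the pod and StatefulSet names are the Swimlane 10.x ones used in the steps above.

```shell
#!/usr/bin/env bash
# Sketch: save the output of each diagnostic command to its own file.
# OUT_DIR, collect, and collect_velero_diagnostics are hypothetical names.

OUT_DIR="${OUT_DIR:-velero-diag-$(date +%Y%m%d-%H%M%S)}"

# Run a command, store stdout and stderr in $OUT_DIR/<label>.txt, and keep
# going even if the command fails so one error does not stop the collection.
collect() {
  local label="$1"
  shift
  mkdir -p "$OUT_DIR"
  "$@" > "$OUT_DIR/${label}.txt" 2>&1 || echo "exit code: $?" >> "$OUT_DIR/${label}.txt"
}

# Gather the pod-level checks for namespace $1.
collect_velero_diagnostics() {
  local ns="$1" pod
  for pod in swimlane-tools-0 swimlane-sw-mongo-0; do
    collect "describe-${pod}" kubectl describe pod "$pod" -n "$ns"
    collect "last-state-${pod}" kubectl get pod "$pod" -n "$ns" \
      -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
  done
  collect "sts-mongo-resources" kubectl get sts swimlane-sw-mongo -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'
  collect "sts-tools-resources" kubectl get sts swimlane-tools -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'
  collect "disk-usage" df -h
}
```

Calling collect_velero_diagnostics "$NS" writes one text file per check. Node-level commands such as dmesg and journalctl still need to be run on each node separately.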