4234 shaares
3 results
tagged
Kubernetes
CLI to analyze IaC: Terraform, CloudFormation, Kubernetes, Helm, ARM Templates and Serverless framework
tl;dr :
- Start using [your software] in production in a non-critical capacity (by sending a small percentage of traffic to it, on a less critical service, etc)
- try to have each incident only once
- Understand what is ok to break and isn’t
For example, with Kubernetes:
- ok to break:
- any stateless control plane component can crash or be cycled out or go down for 5 minutes at any time
- kubernetes networking can break as much as it wants because we decided not to use it to start
- not ok to break
- for us, if etcd goes down for 10 minutes, that’s ok
- containers not starting or crashing on startup
- containers not having access to the resources they need
- pods being terminated unexpectedly by Kubernetes
With Envoy, the breakdown is pretty different:
we operate Kubernetes as follows to try and minimise it:
- We run multiple production clusters and teams are able to choose which clusters to run their application in. We don’t use Federation yet (we’re waiting on AWS support) but we use Envoy instead to load-balance across the different cluster Ingress load-balancers. We can automate much of this with our Continuous Delivery pipeline (we use Drone) and other AWS services.
- All clusters are configured with the same Namespaces. These map approximately 1:1 with teams.
- We use RBAC to control access to Namespaces. All access is authenticated and authorised against our corporate identity in Active Directory.
- Clusters are auto-scaled and we do as much as we can to optimise node start-up time.
- Applications auto-scale using application-level metrics exported from Prometheus.