
Running out of Private IP in EKS

 


Pods sometimes stay in the Pending state even though the cluster has not run out of
CPU or memory.

One possible cause is a shortage of private IP addresses. In that case the node group is shown
as "Degraded" in the EKS cluster configuration, and the following error appears under Health issues.

"Amazon Autoscaling was unable to launch instances because there are not enough free addresses
in the subnet associated with your AutoScaling group(s)."

You can also see that the number of "Available IPv4 addresses" for the AWS VPC subnet used by the
node group is 0.
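
For a sense of how few addresses a subnet actually offers, the sketch below computes the usable
count for a CIDR block. AWS reserves five addresses in every subnet (the first four and the last
one). The CIDR values are hypothetical examples, not taken from this cluster.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return how many addresses in the subnet can be assigned to nodes and pods."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AWS_RESERVED_PER_SUBNET


# Hypothetical subnets, for illustration only.
print(usable_ips("10.0.1.0/24"))  # 251
print(usable_ips("10.0.2.0/26"))  # 59
```

Every node and every pod takes an address from this pool, so a small subnet can reach 0 long
before CPU or memory runs out.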

By limiting how many IP addresses each node keeps pre-allocated, through the environment variables
of the aws-node daemonset, you can get some IPs back.


kubectl set env -n kube-system daemonset/aws-node MINIMUM_IP_TARGET=10 WARM_IP_TARGET=2
kubectl get daemonset -n kube-system aws-node -o json | jq -r '.spec.template.spec.containers[] |select ( .name == "aws-node" ).env'


Afterwards, the number of "Available IPv4 addresses" for the AWS VPC subnet should increase.
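
A rough mental model of why these two settings free addresses: with WARM_IP_TARGET and
MINIMUM_IP_TARGET set, the VPC CNI tries to keep about
max(MINIMUM_IP_TARGET, assigned pods + WARM_IP_TARGET) pod IPs on each node, instead of keeping
whole spare ENIs attached. The function below is a simplified sketch of that rule, not the actual
ipamd logic. It ignores ENI primary addresses and per-instance-type limits, and the pod counts
are made-up examples.

```python
def target_pod_ips(assigned_pods: int,
                   minimum_ip_target: int = 10,
                   warm_ip_target: int = 2) -> int:
    """Approximate number of pod IPs a node tries to hold (simplified model)."""
    # Keep at least the minimum; beyond that, keep a small warm buffer of free IPs.
    return max(minimum_ip_target, assigned_pods + warm_ip_target)


# Hypothetical pod counts per node, using the values set by the kubectl command above.
for pods in (0, 5, 9, 30):
    print(pods, target_pod_ips(pods))
```

Under this model a lightly loaded node holds about 10 addresses, and a busy node holds only two
more than it is actually using.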

If there are still not enough IPs, consider the following two approaches.

1. Check the HPA status and adjust it appropriately. If the CPU and memory allocated to the
application are too small, the HPA creates too many pods, and every pod consumes an IP.

kubectl get hpa -n test
NAME         REFERENCE               TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
test-nginx   Deployment/test-nginx   10%/80%, 9%/80%   7         200       7          14d

---

apiVersion: autoscaling/v2beta2   # use autoscaling/v2 on Kubernetes 1.23+; v2beta2 was removed in 1.26
kind: HorizontalPodAutoscaler
metadata:
  name: test-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-nginx
  minReplicas: 7
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # <= adjust this value
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # <= adjust this value


---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-nginx
spec:
  selector:
    matchLabels:
      app: test-nginx
  replicas: 7
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 50%
  template:
    spec:
      containers:
        - name: nginx
          imagePullPolicy: Always
          resources:
            requests:
              memory: "200Mi"   # <= adjust this value
              cpu: "100m"       # <= adjust this value
            limits:
              memory: "1Gi"
              cpu: "500m"
      nodeSelector:
        team: test
        environment: prod
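
To see why small requests inflate the pod count: the HPA measures utilization relative to the
container's requests and scales roughly by
ceil(currentReplicas * currentUtilization / targetUtilization). The sketch below applies that
formula with the replica bounds and the 80% target from the manifests above. The per-pod CPU
usage figures are hypothetical, and the HPA's tolerance and stabilization behavior are ignored.

```python
import math


def desired_replicas(current_replicas: int,
                     usage_millicores: float,
                     request_millicores: float,
                     target_utilization_pct: float = 80,
                     min_replicas: int = 7,
                     max_replicas: int = 200) -> int:
    """Simplified HPA calculation for a single CPU utilization metric."""
    # Utilization is expressed as a percentage of the container's request.
    utilization_pct = 100 * usage_millicores / request_millicores
    raw = math.ceil(current_replicas * utilization_pct / target_utilization_pct)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical load: each pod uses 160m CPU on average.
print(desired_replicas(7, usage_millicores=160, request_millicores=100))  # 14
print(desired_replicas(7, usage_millicores=160, request_millicores=200))  # 7
```

With the same load, doubling the CPU request halves the reported utilization, so the HPA stays at
7 pods instead of growing to 14, and 7 fewer pod IPs are needed.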

2. If there are still not enough IPs after tuning, add a node group that is assigned to another
subnet. (The following is a Terraform snippet.)

test = {
  desired_capacity = 5
  max_capacity     = 15
  min_capacity     = 4
  subnets          = [element(module.vpc.private_subnets, 0)]
  disk_size        = 30
  k8s_labels = {
    team        = "test"
    environment = "prod"
  }
},

# <= added node group, placed in a different subnet
test2 = {
  desired_capacity = 5
  max_capacity     = 15
  min_capacity     = 4
  subnets          = [element(module.vpc.private_subnets, 4)]
  disk_size        = 30
  k8s_labels = {
    team        = "test"
    environment = "prod"
  }
},

