Debugging 'Too many open files' in Kubernetes: nofile vs inotify/fsnotify

· 5 min read
Kobbi Gal
I like to pick things apart and see how they work inside

When you see too many open files in a containerized app, it’s tempting to jump straight to ulimit -n. Sometimes that’s correct. But on Linux (especially with Go apps using fsnotify), the error can also be caused by inotify limits—even if your process has a huge file-descriptor limit.

This post is a practical, copy/paste-friendly checklist to debug the problem on a real Kubernetes cluster.

Step 0: Decide which “limit” you’re hitting

There are (at least) three common failure modes that can all look like “too many open files”:

  • Per-process file descriptor limit (classic EMFILE)
    • Think: ulimit -n, /proc/<pid>/limits, or systemd LimitNOFILE.
  • System-wide file table exhaustion (ENFILE)
    • Think: /proc/sys/fs/file-nr approaching /proc/sys/fs/file-max.
  • inotify instance/watch limits (common with fsnotify)
    • Think: fs.inotify.max_user_instances and fs.inotify.max_user_watches.

The rest of the tutorial helps you quickly identify which one applies.

Step 1: Check the real limits of the failing process (not your shell)

Inside the pod:

# Find your process (adjust the pattern for your app)
ps -eo pid,comm,args | grep -E 'myapp|server|gateway' | grep -v grep

# Replace <pid> with the real PID
cat /proc/<pid>/limits | sed -n '/Max open files/p'

# Count currently open FDs
ls /proc/<pid>/fd | wc -l

Why this matters:

  • ulimit -n shows the limit for your current shell, which may be totally different from a process started by an init system (systemd, supervisord, Kubernetes runtime, etc.).
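To see the difference concretely, compare the shell's own soft limit with what the kernel reports for a specific process (a quick sketch; Linux-only):

```shell
# The shell's soft limit for open files
ulimit -n

# The limit the kernel actually enforces for a given process
# (here we inspect the shell itself via $$; substitute any PID)
grep 'Max open files' /proc/$$/limits
```

If the two numbers disagree, trust /proc — that is what the kernel enforces for that process.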

Step 2: Check node-wide file table pressure (ENFILE)

On the Kubernetes node:

cat /proc/sys/fs/file-nr
cat /proc/sys/fs/file-max

file-nr is usually three numbers: allocated, unused, max. If allocated is near max, you have a node-level exhaustion problem that can break unrelated workloads.
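As a quick sketch, this reads file-nr and reports utilization of the system-wide file table (field meanings as described above; "unused" is historically 0 on modern kernels):

```shell
# file-nr fields: allocated, unused, max
read -r allocated unused max < /proc/sys/fs/file-nr
awk -v a="$allocated" -v m="$max" \
  'BEGIN { printf "allocated=%d max=%d utilization=%.1f%%\n", a, m, 100*a/m }'
```

Anything above roughly 80–90% utilization deserves attention before it turns into node-wide ENFILE errors.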

Step 3: Check inotify limits (the usual fsnotify culprit)

On the node (or inside the container—these are node kernel settings):

cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_queued_events

If your app uses Go file watching, fsnotify’s Linux notes are worth skimming: fsnotify README (Linux).

Key concept:

  • max_user_instances is per-UID. In Kubernetes, multiple containers/processes can share the same numeric UID (e.g., a non-root “app user”), which means they share the same inotify instance budget on that node.

Step 4: Count inotify “instances” currently in use (who’s consuming them?)

On Linux, inotify instances show up as file descriptors named anon_inode:inotify.

Run this on the node to see which processes (and UIDs) are holding inotify instances:

for pid in /proc/[0-9]*; do
  p=${pid#/proc/}
  [ -r "$pid/fd" ] || continue
  c=0
  for fd in "$pid"/fd/*; do
    [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ] && c=$((c+1))
  done
  [ "$c" -gt 0 ] || continue
  uid=$(awk '/^Uid:/{print $2}' "$pid/status" 2>/dev/null)
  cmd=$(tr '\0' ' ' < "$pid/cmdline" 2>/dev/null)
  echo "uid=$uid pid=$p inotify_instances=$c $cmd"
done | sort -t= -k4 -nr | head -n 50

What to look for:

  • A single UID with inotify_instances totals near max_user_instances
  • Your app process holding many anon_inode:inotify FDs
  • A node “agent” (log collector, metrics sidecar, etc.) consuming a lot

Step 5: (Optional) Count how many watches each inotify FD holds

If you suspect “too many watches” (not instances), you can inspect /proc/<pid>/fdinfo/*:

pid=<pid>
for fd in /proc/$pid/fd/*; do
  [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ] || continue
  n=$(grep -c '^inotify wd' "/proc/$pid/fdinfo/${fd##*/}" 2>/dev/null || true)
  echo "pid=$pid fd=${fd##*/} watches=$n"
done | sort -t= -k4 -nr | head -n 20

Step 6: “How do I kill the file watcher?”

You generally can’t “kill a watch” directly. The kernel releases it when the owning process closes the FD.

Practical actions:

  • Restart the pod / process that owns the inotify instances.
  • If a node is saturated, cordon/drain and reschedule to another node as a temporary workaround.
  • Fix the root cause by raising limits and/or reducing watcher usage.

Step 7: Mitigation—raise inotify instance limits (and make it persistent)

If you’ve confirmed max_user_instances is the bottleneck, increasing it is often the quickest fix.

Temporary (until reboot):

sysctl -w fs.inotify.max_user_instances=1024

Persistent:

  • Add a sysctl config file on the node (exact location varies by distro), for example:
    • /etc/sysctl.d/99-inotify.conf
  • Then reload sysctls (varies by environment), commonly:
    • sysctl --system
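For example, on a systemd-based distro the persistent config might look like this (values are illustrative; tune them for your workload):

```shell
# Write the sysctl drop-in (exact path may vary by distro)
cat <<'EOF' | sudo tee /etc/sysctl.d/99-inotify.conf
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 524288
EOF

# Apply all sysctl config files
sudo sysctl --system
```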

In managed Kubernetes, you may prefer:

  • baking sysctl settings into your node image/bootstrap
  • setting them via a privileged DaemonSet (policy-dependent)
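A minimal sketch of the DaemonSet approach (names and values are hypothetical, and whether a privileged pod is allowed depends on your cluster's policy):

```shell
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: inotify-sysctl
  namespace: kube-system
spec:
  selector:
    matchLabels: { app: inotify-sysctl }
  template:
    metadata:
      labels: { app: inotify-sysctl }
    spec:
      containers:
        - name: sysctl
          image: busybox
          securityContext: { privileged: true }
          # Set the node-level sysctl, then idle so the pod stays Running
          command: ["sh", "-c"]
          args:
            - sysctl -w fs.inotify.max_user_instances=1024 && while true; do sleep 3600; done
EOF
```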

Step 8: Don’t forget nofile (it’s still real)

Even if inotify was the cause this time, it’s worth capturing nofile facts for your runbook:

  • Systemd defaults:
systemctl show --property DefaultLimitNOFILE
systemctl show --property DefaultLimitNOFILESoft
  • Unit-specific limits (examples):
systemctl show kubelet --property LimitNOFILE
systemctl show containerd --property LimitNOFILE

And always verify the actual process limit via /proc/<pid>/limits (Step 1).

Summary: the fastest “root cause” loop

  1. Check the failing process’s actual Max open files and FD count.
  2. Check node file-nr/file-max for system-wide FD exhaustion.
  3. Check inotify sysctls and enumerate anon_inode:inotify to find the UID/process consuming instances.
  4. Apply the smallest safe mitigation (restart offender, reschedule, raise sysctl) and confirm the error disappears.

The 6GHz Black Hole: Troubleshooting SSH Failures Across WiFi Bands

· 3 min read
Kobbi Gal
I like to pick things apart and see how they work inside

Have you ever had a service that works perfectly on localhost but acts like it doesn't exist to the rest of your network? We recently spent an afternoon debugging a Mac Mini M4 that refused to accept SSH connections, despite every local check saying "All Systems Go."

The culprit wasn't a firewall or a wrong config; it was the invisible wall between the 5GHz and 6GHz WiFi bands.

The Setup

  • Target: Mac Mini M4 (192.168.x.10) connected via a WiFi Extender on the 5GHz band.
  • Host: Laptop (192.168.x.20) connected directly to the main router on the 6GHz (Wi-Fi 6E) band.
  • Goal: Simple SSH access from Host to Target.

Step 1: Is the service actually running?

First, we checked if the target was actually listening on Port 22.

sudo lsof -i :22

Result: sshd was listening on all interfaces (*:22). Testing ssh localhost on the target worked perfectly. The "phone was off the hook," but the call wasn't getting through.

Step 2: Checking the "Shields"

We checked the standard macOS suspects:

  • Application Firewall: Turned OFF.
  • Stealth Mode: Disabled.
  • Third-Party Tools: None (clean install).

Despite this, a simple nc -vz 192.168.x.10 22 from the host timed out, and ping failed entirely.

Step 3: Following the Breadcrumbs (The Topology)

By looking at the router’s device list, we noticed a discrepancy:

  • The Host was on the 6GHz band.
  • The Target was on the 5GHz band (linked via an extender).

The Theory: Why it Failed

When we disabled the 6GHz radio on the router, forcing the Host onto the 5GHz band, everything started working immediately. Why? There are three concrete theories for this behavior:

  1. The ARP Bridge Failure (Layer 2): For the laptop to talk to the Mac Mini, it needs to know its MAC address via an ARP (Address Resolution Protocol) request. Many routers and extenders fail to "bridge" these broadcast requests across different physical radios (6GHz to 5GHz). If the ARP request never crosses the bridge, the Host has no "physical address" to send the SSH packets to.

  2. Extender "Station Separation": WiFi extenders often use a technique called MAC Translation to manage clients. This can sometimes create a "one-way mirror" effect where the client on the extender can reach the internet, but unsolicited incoming traffic (like a New SSH Connection) from the main router's side is dropped because the extender doesn't recognize the route.

  3. mDNS/Bonjour Isolation: macOS relies heavily on Bonjour (mDNS) for discovery. 6GHz is often treated as a "high-security" or "high-speed" isolated segment by consumer routers. If the router isn't explicitly configured to repeat Multicast traffic between the 6GHz and 5GHz radios, the devices remain invisible to each other.
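Theory 1 is easy to check directly. After attempting a connection, look for the target's MAC in the host's neighbor table (192.168.x.10 is the placeholder IP from this post; substitute your own — and note that ping flags differ slightly between macOS and Linux):

```shell
# Verify Layer-2 (ARP) resolution from the host to the target
TARGET=192.168.x.10
ping -c 1 "$TARGET" >/dev/null 2>&1 || echo "ping failed (consistent with ARP never resolving)"

# Look for the target's MAC in the ARP/neighbor table
# (macOS/BSD: arp -a; Linux equivalent: ip neigh show)
arp -a 2>/dev/null | grep -F "$TARGET" || echo "no ARP entry for $TARGET"
```

An "incomplete" or missing entry while the target is demonstrably up points squarely at the Layer-2 bridge between radios.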

The Solution

If you encounter "Port 22 Timeout" on a modern Wi-Fi 6E/7 network:

  1. Check your bands: Ensure both devices are on the same frequency (2.4, 5, or 6GHz).

  2. Disable "Smart Connect" / SON: Give your 5GHz and 6GHz networks different names to control exactly where your "Server" devices land.

  3. Wired Backhaul: If using an extender, connect it to the router via Ethernet or MoCA to avoid wireless bridging bugs.

The lesson? Just because they have the same IP subnet doesn't mean they're on the same "road."

How To Deploy Application with Azure Workload Identity

· 3 min read
Kobbi Gal (Akeyless)
Escalations Engineer at Akeyless

This tutorial is a guide on how to deploy an application in Kubernetes that authenticates using Azure Workload Identity on Azure Kubernetes Service (AKS).

Prerequisites

  • Access to the Azure CLI and an Azure account.
  • kubectl installed and access to the AKS cluster.
  • helm.

See links for more information about Azure Identity and AKS.

Enable OIDC on AKS

  1. Check if the OIDC issuer is enabled in the AKS cluster. Enable it if it's not.
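Concretely, that check and fix look like this (using the same resource group and cluster name variables as the rest of the tutorial):

```shell
# Print the OIDC issuer URL; empty output means the issuer is not enabled
az aks show \
  --resource-group "$AZURE_RESOURCE_GROUP" \
  --name "$AKS_CLUSTER_NAME" \
  --query "oidcIssuerProfile.issuerUrl" -o tsv

# Enable it if the output was empty
az aks update \
  --resource-group "$AZURE_RESOURCE_GROUP" \
  --name "$AKS_CLUSTER_NAME" \
  --enable-oidc-issuer
```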

(Optional) Enable Workload Identity plugin

az aks update \
  --resource-group "$AZURE_RESOURCE_GROUP" \
  --name "$AKS_CLUSTER_NAME" \
  --enable-workload-identity

This will deploy a Deployment named azure-wi-webhook-controller-manager in the kube-system namespace:

❯ kubectl get deploy -n kube-system
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
azure-wi-webhook-controller-manager   2/2     2            2           48d

This step is optional since we can explicitly specify the application that will use Azure Workload Identity to mount the Azure token as a volume. More on that in a bit.

Create User Assigned Managed Identity for Application

# Replace with your preferred names and location
IDENTITY_NAME="app-wi"
IDENTITY_RG="$AZURE_RESOURCE_GROUP"
LOCATION="${AZURE_LOCATION:-eastus}"

az identity create --resource-group "$IDENTITY_RG" --name "$IDENTITY_NAME" --location "$LOCATION"
CLIENT_ID=$(az identity show --resource-group "$IDENTITY_RG" --name "$IDENTITY_NAME" --query clientId -o tsv)
PRINCIPAL_ID=$(az identity show --resource-group "$IDENTITY_RG" --name "$IDENTITY_NAME" --query principalId -o tsv)
TENANT_ID=$(az account show --query tenantId -o tsv)
OIDC_ISSUER=$(az aks show --resource-group "$AZURE_RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" --query "oidcIssuerProfile.issuerUrl" -o tsv)

Create a Federated Credential

# namespace and service account name that your test app will use
NAMESPACE="default"
SA_NAME="app-wi-sa"

az identity federated-credential create \
  --resource-group "$IDENTITY_RG" \
  --name "${IDENTITY_NAME}-fc" \
  --identity-name "$IDENTITY_NAME" \
  --issuer "$OIDC_ISSUER" \
  --subject "system:serviceaccount:${NAMESPACE}:${SA_NAME}"

Install Azure Workload Identity Webhook

This is what injects AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, and the projected token volume into pods that use the label. See Service Principal for more info on those environment variables.

helm repo add azure-workload-identity https://azure.github.io/azure-workload-identity/charts
helm repo update
kubectl create namespace azure-workload-identity-system 2>/dev/null || true
helm upgrade --install workload-identity-webhook azure-workload-identity/workload-identity-webhook \
  --namespace azure-workload-identity-system \
  --set azureTenantId="$TENANT_ID"

Create a Kubernetes ServiceAccount

Here is where the link between Kubernetes and Azure Workload Identity happens:

kubectl create namespace "$NAMESPACE" 2>/dev/null || true
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: $SA_NAME
  namespace: $NAMESPACE
  annotations:
    azure.workload.identity/client-id: "$CLIENT_ID"
EOF

As we can see, we annotate the ServiceAccount with azure.workload.identity/client-id: "$CLIENT_ID".

Deploy Application with Workload Identity

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-wid
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-wid
  template:
    metadata:
      labels:
        app: hello-wid
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: $SA_NAME
      containers:
        - name: alpine
          image: alpine
          command:
            - "sh"
            - "-c"
            - 'echo "Workload Identity tutorial done! Sleeping..." && sleep 10000'
EOF

The main things we're doing here are:

  1. We set the application Deployment to use the ServiceAccount we created in the previous step and that is linked to an Azure Workload Identity.
  2. We set the Deployment Pod specification to use Azure Workload Identity by setting the label azure.workload.identity/use: "true".
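To confirm the wiring worked, you can inspect the running pod for the injected environment variables and the projected token (the token path is whatever AZURE_FEDERATED_TOKEN_FILE points at, so we read it via the variable rather than assuming a path):

```shell
# Grab the pod name from the Deployment's label selector
POD=$(kubectl get pod -n "$NAMESPACE" -l app=hello-wid -o jsonpath='{.items[0].metadata.name}')

# The webhook should have injected AZURE_CLIENT_ID, AZURE_TENANT_ID, etc.
kubectl exec -n "$NAMESPACE" "$POD" -- env | grep '^AZURE_'

# A non-zero byte count confirms the federated token was projected
kubectl exec -n "$NAMESPACE" "$POD" -- sh -c 'wc -c < "$AZURE_FEDERATED_TOKEN_FILE"'
```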

How to Deploy Kubernetes Services using Gateway API/AWS Load Balancer Controller

· 9 min read
Kobbi Gal (Akeyless)
Escalations Engineer at Akeyless

This tutorial contains a working example of exposing TCP services (LDAP/LDAPS + SSH) from a single-node k3s cluster running on an EC2 instance, using:

  • Kubernetes Gateway API
  • AWS Load Balancer Controller (LBC) for:
    • NLB (L4) via TCPRoute
    • ALB (L7) via HTTPRoute/GRPCRoute (example file included)

The key implementation detail for k3s-on-EC2 with the default overlay networking (flannel): use instance targets + NodePorts for L4 routes. ClusterIP + pod IP targets won’t work unless pods are VPC-routable (AWS VPC CNI).
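As a sketch of the L4 piece, a TCPRoute attaching an LDAP backend to a Gateway listener might look like this (names here are hypothetical; the full working manifests are in the example files):

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: ldap
spec:
  parentRefs:
    - name: nlb-gateway       # Gateway backed by the AWS LBC (NLB)
      sectionName: ldap       # TCP listener defined on the Gateway
  rules:
    - backendRefs:
        - name: openldap      # Service for the LDAP backend
          port: 389
```

With instance targets, the LBC registers the node(s) by instance ID and forwards to the route's NodePort, which is why the NodePort requirement above matters.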

How To Set Up Split Tunneling with VPN

· 6 min read
Kobbi Gal (Akeyless)
Escalations Engineer at Akeyless

Introduction

A lot of our work nowadays requires connecting to a Virtual Private Network (VPN) in order to access certain resources (e.g. databases, websites, REST APIs) that were deemed important to protect from the public internet. When we connect to the VPN, we're able to access these resources.

The nature of our modern digital work requires simultaneous access to a plethora of services. Some of these services require an active VPN connection and some can be accessed without.

At times, the VPN we need to connect to is geolocated far from us. In addition, the VPN may serve the entire company and not be very performant. These factors add latency to everything you access, restricted and unrestricted resources alike.

If you've found yourself in this situation, this post explains how to pay the latency cost only when accessing the restricted resources, rather than all of them, by modifying the operating system's routing tables in what's called 'split tunneling'.

To do this you will need to have root/administrator access to the UNIX operating system.

We begin by collecting the relevant information and later performing the modifications.
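As a preview of the end state, split tunneling boils down to route entries like these (Linux iproute2 shown; the subnet and tunnel interface name are hypothetical placeholders for values we'll collect below):

```shell
# Send only the restricted corporate subnet through the VPN tunnel interface
sudo ip route add 10.20.0.0/16 dev tun0

# Everything else keeps using the default route over the regular interface
ip route show
```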

3-Way Data Migration between Support Systems

· 12 min read
Kobbi Gal
I like to pick things apart and see how they work inside

Introduction

The company I work for decided a few months ago that we’ll be moving all customer tickets and assets from two separate systems (one for chat and one for old-school tickets) into a new, integrated system which provides both capabilities. My task was to perform the migration between the systems. Even though I’m not a data engineer by any means, I accepted the challenge and thought it would teach me a lot about the planning and execution of such a complex project. It would also allow me to hone my development/scripting skills and finally get some hands-on experience with a Python library I was always interested in working with, pandas.