Kubernetes Topics To Learn

#Cloud #DevOps #Networking #Kubernetes #CNI #NetworkPolicy

Mastering Kubernetes Networking: Beyond the Basics for Scalable, Secure Clusters

Mistakes in Kubernetes networking aren’t theoretical: they’re outages, breached perimeters, or load balancers burning CPU. Often, the first sign of trouble is a stuck rollout or an alert about unreachable APIs. The root cause is usually a misunderstanding of the actual mechanics of cluster networking.

Beyond Pod-to-Pod: Where Problems Grow

Kubernetes assigns every pod an IP, with built-in assumptions: global connectivity, no NAT between pods, and seamless service discovery. But as clusters grow, those abstractions become points of failure and attack. Overlay networks conceal complexity, and default openness creates surface for lateral movement.

Key architectural layers:

  • Overlay networks: Decouple node and pod IP space. Useful, but increase MTU fragility and debugging difficulty.
  • Service abstraction: Automates VIPs and client routing, but leaks (e.g., via NodePort) if misconfigured.
  • Network policies: A coarse firewall in a world used to zero-trust microsegmentation.
  • Ingress controllers/Load Balancers: External traffic is handled here—an easy target for misroutes or over-permissioned paths.
  • DNS: Kubernetes leans hard on internal DNS; misconfigurations snowball into cross-service outages.

Default networking will get a playground cluster online. Production clusters need deliberate design.


Four Networking Domains Every Cluster Operator Must Master

1. Container Network Interface (CNI): The Non-Optional Plug-in

Kubernetes doesn’t wire containers together itself; CNI plugins do the heavy lifting. Your choice controls security capabilities, performance, and operational friction. Differences matter:

Plugin     | Major Features                    | Notes
Calico     | Policy enforcement, eBPF, BGP     | Strong defaults for security
Flannel    | Simple, minimal overlay network   | Lacks advanced policy
Cilium     | eBPF, L7 visibility, native IPv6  | Higher resource usage; best on modern kernels (5.x+)
Weave Net  | Network encryption, simple setup  | Deploys easily; watch for MTU issues

Assess before install. For example, Calico v3.25+ supports Kubernetes 1.27+ and enables Network Policy by default—unlike Flannel.

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

Install manifest for Calico 3.x (verify version before use).

Side note: Not all CNIs support Kubernetes NetworkPolicy. Calico and Cilium do; Flannel doesn’t (except in special hybrid setups). This is a known cause for “network policies not working” tickets.
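When triaging one of those tickets, the first step is confirming which CNI is actually running. A quick sketch (daemonset names below are the common defaults and may differ in managed distributions):

```shell
# List CNI-related workloads in kube-system; names vary by distribution.
kubectl get daemonsets -n kube-system | grep -Ei 'calico|flannel|cilium|weave'

# On a node, the installed CNI configuration usually lives under /etc/cni/net.d/.
ls /etc/cni/net.d/
```

Managed offerings (EKS, GKE, AKS) often ship their own CNI, so an empty grep does not necessarily mean no CNI is present.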


2. Kubernetes Service Types: Traffic Surfacing, Done Right

Three service types, three stages of exposure. Mixing them without intent leads to hazards.

  • ClusterIP: Intra-cluster only. Preferred for backend/stateful workloads.
  • NodePort: Maps a static port (e.g., 30080) on all nodes to your service. Acceptable for bare metal, brittle at scale.
  • LoadBalancer: Talks to external cloud or on-prem L4 balancers, allocates VIP per service. Costs and quota constraints apply.

Example (tested on GKE 1.26+):

apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: LoadBalancer
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP
  selector:
    app: frontend-app

This exposes the frontend app (container port 8443) externally on port 443 via the provider’s L4 load balancer. Double-check firewall rules: a cloud LB does not imply security-group lock-down.
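After applying the manifest, the external VIP can take a minute or two to provision. Watching the service (name matches the example above) shows when the address lands:

```shell
# Wait for the cloud provider to allocate an external IP.
# EXTERNAL-IP reads <pending> until the LB is provisioned.
kubectl get service frontend --watch
```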

Trade-off: Most cloud providers cap the number of LoadBalancer services per account or region. Review your quotas, or expect provisioning failures such as AWS’s “TooManyLoadBalancers” error.


3. NetworkPolicy: Microsegmentation, Not Just a Checkbox

Clusters ship wide open. Any pod, any namespace, any traffic. Once sensitive workloads and compliance land, this fails audits.

Sample: Lock backend access strictly

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-restrict
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
  policyTypes:
    - Ingress

This only allows pods labeled app=api-gateway to reach backend pods in prod. No policy means all traffic is permitted.
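NetworkPolicies are additive, so a common baseline (a sketch; apply per namespace as needed) is a default-deny for ingress, on top of which targeted allow rules like the one above are layered:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod
spec:
  podSelector: {}     # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress         # no ingress rules listed, so all inbound traffic is denied
```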

Non-obvious tip: Test egress rules as well—many external integrations break once outbound traffic is filtered. Fine-tune policies with kubectl exec ... wget/curl instead of waiting for application-level timeouts.


4. DNS Inside Kubernetes: It’s the Glue

Every name under *.svc.cluster.local is resolved by CoreDNS (or legacy kube-dns). If DNS flaps, readiness checks fail and dependencies disappear.

Sample resolution:

kubectl exec -it test-pod -- nslookup mysql.prod.svc.cluster.local

Typical troubleshooting:

  • CoreDNS logs with repeated SERVFAIL
  • Pods with CrashLoopBackOff if DNS is unreachable
  • MTU issues on the overlay network surfacing as “unable to resolve host” errors

Known issue: Heavy use of headless services or StatefulSets increases DNS query volume. Defaults can bottleneck—consider tuning the Corefile (e.g., the cache plugin) or scaling CoreDNS replicas.
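As a sketch, the Corefile (stored in the coredns ConfigMap in kube-system) can raise the cache TTL to cut upstream query load; exact values depend on your workload:

```
# Excerpt of a typical Corefile; 'cache 60' keeps answers for up to 60s.
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    cache 60
    forward . /etc/resolv.conf
    loop
    reload
}
```

Edit via kubectl -n kube-system edit configmap coredns; the reload plugin picks up changes without a restart.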


Debugging and Tuning: Concrete Steps

  • Validate CNI status with kubectl get pods -n kube-system -o wide (look for restarts/CrashLoopBackOff in CNI daemonset).
  • List routes and IP layouts: kubectl get pods -o wide, ip route within a node.
  • Test NetworkPolicy enforcements using ephemeral test pods.
  • Check NodePort allocation: Confirm host ports aren’t blocked by local firewalls/SELinux.
  • Inspect CoreDNS health: kubectl logs -n kube-system -l k8s-app=kube-dns
  • Ingress troubleshooting: Confirm correct IngressClass and backend service mapping; error 404 Not Found often signals misconfigured backend service, not ingress itself.
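For the NetworkPolicy check above, a short-lived pod avoids touching real workloads. A minimal sketch (the service name, port, and labels are illustrative, matching the earlier backend example):

```shell
# Spin up a throwaway pod and probe the backend service; --rm cleans up on exit.
# With the default-deny/allow policies in place, this unlabeled pod should time out.
kubectl run np-test --rm -it --image=busybox --restart=Never \
  -n prod -- wget -qO- --timeout=3 http://backend:8080

# Repeat with the allowed label to confirm the policy admits legitimate traffic.
kubectl run np-test-allowed --rm -it --image=busybox --restart=Never \
  -n prod --labels=app=api-gateway -- wget -qO- --timeout=3 http://backend:8080
```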

Gotcha: Some cloud providers inject their own CNI or custom DNS, which may not play nicely with manual tweaks. “ClusterIP not working” is frequently misdiagnosed—often it’s a misconfigured CNI or security group, not a Kubernetes bug.


Conclusions: The Real-World Checklist

Kubernetes networking isn’t “set and forget.” Prioritize:

  • CNI plugin fit for your security and scaling needs.
  • Explicit NetworkPolicy, especially for multi-tenant, regulated, or production clusters.
  • Proper usage of Service exposure—default to least-exposure, escalate intentionally.
  • Proactive DNS and routing monitoring.

The hardest outages trace to silent network misconfigurations. Invest time on day zero—debugging in production is slow and costly.


For further detail—such as eBPF tracing for pod-level packet flow, Ingress controller best practices for HTTP, or advanced CNI policy scenarios—see related deep dives and operational field notes.