



"It was just supposed to be a simple ETL microservice. Fetch some data, transform it, and load it into Postgres. 30 minutes, tops."
— Famous last words before a 5-hour DNS debugging marathon
It's a regular Tuesday afternoon, and I'm building what I thought would be a straightforward microservices setup in Kubernetes. Simple enough, right? A data loader microservice that fetches user data from an API, transforms it, and loads it into a PostgreSQL database. ConfigMaps for non-sensitive configs, Secrets for credentials: textbook Kubernetes stuff.
The architecture looked clean on paper:
Data Loader Microservice: Python app containerized and ready to roll
PostgreSQL Database: Running in its own pod with persistent storage
ConfigMaps: Handling API URLs, log levels, DB port, DB name
Secrets: Safeguarding DB credentials (properly base64 encoded, thank you very much)
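For concreteness, here's a minimal sketch of what such a ConfigMap and Secret might look like; the names, keys, and values are illustrative, not my exact manifests:
apiVersion: v1
kind: ConfigMap
metadata:
  name: etl-config  # hypothetical name
data:
  API_URL: "https://random-data-api.com/api/v2/users"  # illustrative endpoint
  LOG_LEVEL: "INFO"
  DB_PORT: "5432"
  DB_NAME: "etl_db"  # illustrative
---
apiVersion: v1
kind: Secret
metadata:
  name: etl-db-credentials  # hypothetical name
type: Opaque
data:
  DB_USER: cG9zdGdyZXM=  # base64("postgres"), illustrative
  DB_PASSWORD: czNjcjN0  # base64("s3cr3t"), illustrative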

I built my Docker image, configured my deployments, and applied my ConfigMaps and Secrets. The PostgreSQL pod started up beautifully. Green lights everywhere.
Then I deployed the ETL microservice and... 💥

Me: "That's odd. Let me check the logs..."
ERROR:root:Error: could not translate host name 'postgres-service' to address:
Temporary failure in name resolution

Wait, what? The service is right there! I can see it in kubectl get svc. The name is correct. The port is correct. What's happening?
I tried the classic debugging move, checking whether it was an external network issue:
kubectl logs etl-deployment-7cd7bd469b-7dqfl
...
Failed to resolve 'random-data-api.com': Temporary failure in name resolution

Oh no. It's not just the internal service. DNS is completely broken.
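A triage step worth knowing here (not something I thought of in the moment): if a raw IP is reachable while hostnames aren't, the network itself is fine and DNS is the culprit:
# 8.8.8.8 is Google's public resolver; any known-good IP works
kubectl run -it --rm net-test --image=busybox:1.28 --restart=Never -- ping -c 2 8.8.8.8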
Time to investigate CoreDNS:
kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS             RESTARTS
coredns-668d6bf9bc-8xg47   0/1     CrashLoopBackOff   142

There it was. My DNS server was having an existential crisis, trapped in an infinite CrashLoopBackOff.

kubectl logs -n kube-system coredns-668d6bf9bc-8xg47
Listen: listen tcp :53: bind: permission denied

Ah! A permission error. CoreDNS can't bind to port 53. Easy fix: just add the `NET_BIND_SERVICE` capability!
I edited the CoreDNS deployment:
securityContext:
  capabilities:
    add:
    - NET_BIND_SERVICE

Surely this would work... right?
Narrator: It did not work.
Next, I added `allowPrivilegeEscalation: true` to the security context and restarted CoreDNS.
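At that point, the container's securityContext looked roughly like this (reconstructed; only the relevant fields shown):
securityContext:
  allowPrivilegeEscalation: true
  capabilities:
    add:
    - NET_BIND_SERVICE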
kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS             RESTARTS
coredns-668d6bf9bc-tsn9r   0/1     CrashLoopBackOff   12

Still crashing. Still the same error.
Me, increasingly frustrated: "But `NET_BIND_SERVICE` is literally right there in the deployment file!"

Okay, if port 53 is causing problems because it's a privileged port (port numbers below 1024), let's just use a non-privileged port instead!
I edited the CoreDNS ConfigMap and changed the port to 1053:
data:
  Corefile: |
    .:1053 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
    }

Restarted CoreDNS:
kubectl -n kube-system delete pod -l k8s-app=kube-dns

AND IT WORKED! CoreDNS was finally running!
kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS
coredns-698f4cc5c9-xflhm   1/1     Running   0

Me: "YES! We did it! DNS is working now!"
But then I tested my ETL pod...
kubectl logs etl-deployment-7cd7bd469b-7dqfl
ERROR:root:Error: could not translate host name 'postgres-service' to address:
Temporary failure in name resolution

Me: "WHAT? How is it STILL broken?"

The problem? CoreDNS was now listening on port 1053, but all the pods were still configured to use port 53 for DNS lookups. I'd essentially moved the DNS server to a different address without telling anyone where it went.
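Had I inspected the kube-dns Service at that point, the mismatch would have jumped out. One way to check, using plain kubectl:
kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.ports}'
# Shows port 53 with targetPort 53, while CoreDNS was now listening on 1053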
At this point, I was ready to throw my laptop out the window. Instead, I did the next best thing: nuked everything and started fresh:
minikube delete
rm -rf ./minikube # Clean slate, baby!
minikube start

Fresh install, clean DNS configuration, port 53 as intended. Everything should work now.
But still: same error. Same CrashLoopBackOff. Same "permission denied" message.
Me, on the verge of tears: "WHY?! It's a fresh install! What is happening?!"
After hours of debugging, countless chats with ChatGPT, Claude Code, and Gemini, and more coffee than is probably healthy, I finally understood the root cause:
The CoreDNS container runs as a non-root user by default, for security.

Port 53 is a privileged port that requires root access to bind to. Even with the `NET_BIND_SERVICE` capability, even with `allowPrivilegeEscalation: true`, CoreDNS running as a non-root user simply couldn't bind to port 53.
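You can verify what a CoreDNS pod is actually allowed to do by dumping its securityContext (a generic check, not from my original session):
kubectl -n kube-system get pod -l k8s-app=kube-dns \
  -o jsonpath='{.items[0].spec.containers[0].securityContext}'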
The solution? Allow CoreDNS to run as root (in a controlled way):
kubectl patch deployment coredns -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/securityContext/runAsUser", "value": 0}]'

This sets `runAsUser: 0`, telling the container to run as root (UID 0).
I also enabled host networking to give CoreDNS direct access to the host's network namespace:
kubectl patch deployment coredns -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]'

Restarted CoreDNS one more time:
kubectl -n kube-system delete pod -l k8s-app=kube-dns

And then... silence. No crash. No errors.
kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS
coredns-698f4cc5c9-xflhm   1/1     Running   0

I nervously checked the logs:
kubectl logs -n kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration SHA512 = b141a95c4ad45582b9d8d1be55241970...
CoreDNS-1.11.3
linux/amd64, go1.21.11, a6338e9
[INFO] 127.0.0.1:54374 - 17618 "HINFO IN 1085771236002924139.3914264802840083309...

It was actually working! Time to test DNS resolution:
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

✅ Internal DNS working!
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup postgres-service
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: postgres-service
Address 1: 10.98.33.166 postgres-service.default.svc.cluster.local

✅ Service discovery working!
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup random-data-api.com
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: random-data-api.com
Address 1: 67.205.161.199

✅ External DNS working!

I redeployed my ETL microservice and watched the logs with bated breath:

THE ETL JOB COMPLETED SUCCESSFULLY!

You might be wondering: "Port 1053 is non-privileged. CoreDNS was running. Why didn't it work?"
The answer: It's like changing your phone number without updating your contacts.
When a Kubernetes pod resolves a hostname:
1. Pod reads `/etc/resolv.conf` → points to 10.96.0.10:53
2. Query sent to the kube-dns Service on port 53
3. Service forwards to CoreDNS on port 53 (defined in targetPort)
4. CoreDNS responds
The problem? When I changed CoreDNS to port 1053, only step 4 changed. Steps 1-3 still expected port 53!
| Component | What I Changed | What I Missed |
|---|---|---|
| CoreDNS | ✅ Listening on 1053 | - |
| kube-dns Service | - | ❌ Still forwarding to port 53 |
| kubelet (all nodes) | - | ❌ Still configured for port 53 |
| Pod resolv.conf | - | ❌ Still pointing to port 53 |
Result: CoreDNS was listening on a port nothing in the cluster knew about. It's like moving apartments without telling the post office.
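To make the mismatch concrete, here's a trimmed sketch of the kube-dns Service spec (the real object carries more fields; the ClusterIP matches the resolver address every pod uses):
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  clusterIP: 10.96.0.10  # the address baked into every pod's /etc/resolv.conf
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53        # what pods dial
    protocol: UDP
    targetPort: 53  # where the Service forwards; still 53, not 1053
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53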

Port 1053 solved the permission problem (CoreDNS could bind without root), but created a communication problem (nothing could reach it there).
Port 53 has been the DNS standard since 1987. Changing it means updating every component in the cluster: nodes, services, pods, everything.
The simpler fix? Let CoreDNS run as root (`runAsUser: 0`) and keep port 53. One change instead of cluster-wide reconfiguration.
Three lessons from 5 hours of debugging:
Privileged ports need root access - This is Linux security, not a bug
System defaults exist for a reason - Port 53 is everywhere; changing it cascades
A simple root-cause fix beats complex workarounds - Sometimes "just run as root" is the right answer
If you hit CoreDNS issues:
# Check CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Check logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns
# Test DNS resolution
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
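# Confirm what resolver pods actually point at (should be the kube-dns ClusterIP, port 53)
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- cat /etc/resolv.conf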
Remember: Fix DNS first, everything else second.
