



Last month, my manager dropped a folder on my desk. "The dev team deployed our new ETL pipeline on AKS. It works, but security is wide open. Lock it down with Kubernetes RBAC. No Azure AD, pure Kubernetes only."
I nodded confidently. "I'll have it done by Friday."
It was Tuesday. I didn't finish until the following Tuesday. Here's what went wrong, and what I learned.
Setting up the infrastructure was straightforward:
export RESOURCE_GROUP="CloudWave-RG"
export LOCATION="eastus"
export ACR_NAME="cloudwaveacr$RANDOM"
export AKS_CLUSTER_NAME="CloudWaveAKS"
# Create everything
az group create --name $RESOURCE_GROUP --location $LOCATION
az acr create --resource-group $RESOURCE_GROUP --name $ACR_NAME --sku Basic
az aks create \
--resource-group $RESOURCE_GROUP \
--name $AKS_CLUSTER_NAME \
--node-count 1 \
--node-vm-size Standard_D2s_v3 \
--enable-managed-identity \
--generate-ssh-keys
# Connect AKS to ACR and get credentials
az aks update -n $AKS_CLUSTER_NAME -g $RESOURCE_GROUP --attach-acr $ACR_NAME
az aks get-credentials --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME
Pro tip: Append $RANDOM to your ACR name. ACR names are globally unique, and I once wasted 30 minutes because "mycompanyacr" was already taken by someone in Europe.
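If you want to check availability before creating anything, the CLI can tell you up front; a quick check (assuming you're already logged in with az):
az acr check-name --name mycompanyacr --output table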
The ETL script itself was simple: fetch data from an API, store it in PostgreSQL. It had retry logic for database connections:
retries = 5
while retries > 0:
    try:
        # remaining connection parameters (user, password, ...) elided here
        conn = psycopg2.connect(host=db_host, dbname=db_name, ...)
        return conn
    except psycopg2.OperationalError:
        retries -= 1
        time.sleep(5)  # brief pause before the next attempt
I containerized the script and pushed it to ACR; the push looked roughly like the sketch below. By Wednesday morning, I was feeling good.
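I no longer have my exact commands, but the build-and-push step was roughly this, assuming a Dockerfile sits in the project root:
# Build in Azure and push the result straight into the attached registry
az acr build --registry $ACR_NAME --image etl-script:v1 .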
That didn't last long.
I created separate namespaces for isolation:
kubectl create namespace db
kubectl create namespace etl
Think of namespaces like apartments in a building: shared infrastructure, but you can't walk into someone else's place without permission.
I deployed the database:
kubectl apply -f db-setup.yaml
kubectl get statefulset -n db
Output: `0/1 READY` 😱

I checked the logs:
kubectl logs postgres-statefulset-0 -n db
The error:
initdb: error: directory "/var/lib/postgresql/data" exists but is not empty
It contains a lost+found directory
What I learned: Azure persistent disks come formatted with ext4, and ext4 puts a lost+found directory at the root of every filesystem. PostgreSQL refuses to initialize if it finds anything at all in its data directory.
The fix: Tell PostgreSQL to use a subdirectory:
env:
  - name: PGDATA
    value: /var/lib/postgresql/data/pgdata
But here's the catch: I had to delete the old persistent volume claim first.
kubectl delete -f db-setup.yaml
kubectl delete pvc postgres-storage-postgres-statefulset-0 -n db
kubectl apply -f db-setup.yaml

Key insight: Kubernetes doesn't auto-delete PVCs. This protects your data, but you need to clean up manually when troubleshooting.
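If you hit the same loop, it helps to see what's actually left behind before re-applying; something like:
# Claims (and the data they hold) that survived the StatefulSet delete
kubectl get pvc -n db
# The underlying volumes and their reclaim policy
kubectl get pv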
✅ Database finally started!
I deployed the ETL job:
kubectl apply -f etl-cronjob.yaml
kubectl create job --from=cronjob/data-fetcher-cronjob test-run -n etl
Result: ErrImagePull - no such host

The problem: My YAML had a placeholder image name from a tutorial. I needed my actual ACR name:
az acr list --resource-group $RESOURCE_GROUP --output table
Updated the YAML with the correct registry (cloudwaveacr26077.azurecr.io/etl-script:v1) and redeployed.
Lesson: Never use placeholder values. Always double-check your image paths.
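A quick way to do that double-checking is to ask ACR what it actually holds; roughly:
# List the repositories in the registry, then the tags for the ETL image
az acr repository list --name $ACR_NAME --output table
az acr repository show-tags --name $ACR_NAME --repository etl-script --output table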
The pod started but immediately failed. The logs pointed at a missing secret: the job couldn't find postgres-secret.
But I HAD created the secret! I checked:
kubectl get secrets -n db
There it was, in the db namespace, but not in etl. Then it hit me.

Secrets are namespace-scoped. My ETL pod in the `etl` namespace couldn't access a secret in the `db` namespace. This isn't a bug; it's intentional isolation!
The fix: Copy the secret to the ETL namespace:
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: etl   # Different namespace!
type: Opaque
data:
  POSTGRES_PASSWORD: bXlTdXBlclNlY3JldFBhc3N3b3Jk
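A note on that data value: it's only base64-encoded, not encrypted, and you can generate it yourself:
echo -n 'mySuperSecretPassword' | base64
# bXlTdXBlclNlY3JldFBhc3N3b3Jk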
kubectl apply -f etl-secret.yaml
✅ Finally, the ETL job completed successfully!
This namespace isolation moment was crucial: it showed me how Kubernetes enforces boundaries, and those boundaries became the foundation of our RBAC security.
By Friday, the app was working. Now for the security lockdown.
I explained it to myself like a nightclub VIP system:
Role = The VIP list (what you're allowed to do)
ServiceAccount = Your ID card (who you are)
RoleBinding = The bouncer (connects your ID to the VIP list)

Database Admin Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: db
  name: db-admin-role
rules:
  - apiGroups: ["", "apps"]
    resources: ["statefulsets", "services", "secrets", "persistentvolumeclaims", "pods"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
ETL Admin Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: etl
  name: etl-admin-role
rules:
  - apiGroups: ["batch", "", "apps"]
    resources: ["cronjobs", "jobs", "pods"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Key differences:
Different namespaces
Different resources (databases vs. jobs)
ETL role includes the batch API group for CronJobs
# ETL Manager Identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: etl-manager-sa
  namespace: etl
---
# Link identity to permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bind-etl-manager
  namespace: etl
subjects:
  - kind: ServiceAccount
    name: etl-manager-sa
    namespace: etl
roleRef:
  kind: Role
  name: etl-admin-role
  apiGroup: rbac.authorization.k8s.io
Repeated the same pattern for db-manager-sa in the database namespace.
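If you'd rather not write more YAML, the same pattern can be created imperatively; a rough equivalent (the binding name bind-db-manager is just my placeholder):
kubectl create serviceaccount db-manager-sa -n db
kubectl create rolebinding bind-db-manager \
  --role=db-admin-role \
  --serviceaccount=db:db-manager-sa \
  -n db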
Applied everything:
kubectl apply -f db-role.yaml
kubectl apply -f etl-role.yaml
kubectl apply -f rbac-setup.yaml
This was the moment of truth. I used the --as flag to impersonate ServiceAccounts:
Test 1: ETL Manager accessing ETL resources (should work)
kubectl get cronjobs -n etl --as=system:serviceaccount:etl:etl-manager-sa
✅ Success! Shows the CronJob.
Test 2: ETL Manager trying to access DB resources (should fail)
kubectl get pods -n db --as=system:serviceaccount:etl:etl-manager-sa
✅ Perfect failure! This is exactly what we want.

Tests 3 & 4: DB Manager
# Can access DB namespace ✅
kubectl get statefulsets -n db --as=system:serviceaccount:db:db-manager-sa
# Cannot access ETL namespace ❌
kubectl get cronjobs -n etl --as=system:serviceaccount:db:db-manager-sa

Perfect isolation achieved! Each manager can only access their designated namespace.
Namespaces separate resources for RBAC and organization, but pods can still talk across namespaces. The ETL pod connects to the database using postgres-db-service.db.svc.cluster.local. For network isolation, you need NetworkPolicies.
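For completeness, here's a minimal sketch of such a policy, assuming the Postgres pods carry an app: postgres label and the cluster runs a plugin that actually enforces policies (Azure CNI network policy or Calico on AKS):
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-etl-to-postgres
  namespace: db
spec:
  podSelector:
    matchLabels:
      app: postgres            # assumption: label on the Postgres StatefulSet pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: etl
      ports:
        - protocol: TCP
          port: 5432
EOF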
Namespace-scoped secrets seemed annoying at first, but the isolation is excellent security: it forces explicit sharing. Copying the database password into the etl namespace was the right call; the job legitimately needs it.
Kubernetes protects your data by not auto-deleting persistent volumes. This felt annoying during debugging, but would be a lifesaver in production.
Even with init containers and readiness probes, application-level retries are crucial. Databases restart, networks hiccup, and distributed systems are inherently unreliable.
The --as flag is your testing superpower. Impersonating ServiceAccounts makes testing RBAC painless: no tokens, no kubeconfig gymnastics, just pretend to be the user and see what happens.
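A related trick: kubectl auth can-i answers yes/no without actually running the request; for example:
kubectl auth can-i create cronjobs -n etl --as=system:serviceaccount:etl:etl-manager-sa   # yes
kubectl auth can-i get pods -n db --as=system:serviceaccount:etl:etl-manager-sa           # no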
My error triage cheat sheet from the week:
CrashLoopBackOff → Check logs
ErrImagePull → Check image path
Secret not found → Check namespace
Forbidden → RBAC working correctly!
After presenting this at our team meeting, people immediately saw uses:
Multi-tenant SaaS: One namespace per customer with isolated permissions
Compliance: SOC2 auditors love the separation of duties
Team boundaries: Dev teams manage their namespaces without breaking production
CI/CD: Build jobs can't touch production resources

It took me a week and three major errors to secure our Kubernetes cluster, but I learned more from those failures than any tutorial could teach.
What I wish I'd known on day one:
Start with working apps, then add security
Expect failures; they're learning opportunities
Test thoroughly with the --as flag
Document everything
Security is iterative, not a one-time task
The key lesson: Security isn't about building a perfect fortress on the first try. It's about understanding tools, learning from mistakes, and continuously improving.

Don't forget to delete resources if you're following along:
az group delete --name $RESOURCE_GROUP --yes --no-wait
Your cloud bill will thank you! 💰
