


When I first started working with data, one phrase kept coming up in every team meeting: "Be careful with PII."
Our Snowflake environment had sensitive information scattered everywhere. My manager would remind me, "One mistake, one unmasked column exposed to the wrong person, and we're looking at a compliance violation." I'd nod and make a mental note of which tables were sensitive.
But as our data warehouse grew, that mental map quickly broke down.
Three months in, our compliance lead sent me a Slack message: "Can you send me a list of all tables and columns containing PII by EoD?"
I stared at my screen.
We had 40+ tables, some I knew contained PII. Others? I'd have to dig through schemas, guess from table names, or query each one individually. There was no systematic way to answer this question.
That's when I discovered object tagging in Snowflake. I realized I'd been manually solving a problem that had a built-in solution.
Object tagging is Snowflake's built-in labeling system for your data. Think of it like moving to a new house. You label boxes: "Kitchen," "Fragile," "Books." You know what's inside without opening each one.
In Snowflake, tags work the same way. You can attach labels to databases, schemas, tables, warehouses, and even individual columns. A tag is simply a key-value pair.
For example:

-- Create the tag
CREATE TAG DATA_CLASSIFICATION;

-- Apply the tag to a column
ALTER TABLE CUSTOMERS
MODIFY COLUMN EMAIL SET TAG DATA_CLASSIFICATION = 'PII';

No more guessing. No more manual tracking.
But here's where it gets powerful: tags aren't just labels. They enable automated governance, security policies, and instant visibility across your entire data landscape.
Remember that compliance request? "Send me all tables containing PII." With tags, that becomes a simple query instead of a manual hunt.
Here's how it works:
I created a DATA_CLASSIFICATION tag and applied it to every table or column containing sensitive information.
Marketing tables get tagged as PII.
Financial tables get tagged as Financial Data.
Operations tables get tagged as Internal Only.
-- Tag tables by data classification
ALTER TABLE CUSTOMERS
SET TAG DATA_CLASSIFICATION = 'PII';

ALTER TABLE TRANSACTIONS
SET TAG DATA_CLASSIFICATION = 'Financial Data';

ALTER TABLE LOGS
SET TAG DATA_CLASSIFICATION = 'Internal Only';

When compliance asks for all PII tables, I just need to query the tag references:
SELECT OBJECT_NAME,
       OBJECT_DATABASE,
       TAG_VALUE
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
WHERE TAG_NAME = 'DATA_CLASSIFICATION'
  AND TAG_VALUE = 'PII';

This provides an instant answer. No guessing. No missed tables.
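One thing to keep in mind: ACCOUNT_USAGE views can lag behind real time by a couple of hours. When I need an immediate check on a single object, the TAG_REFERENCES table function in INFORMATION_SCHEMA works as well. A minimal sketch, run in the database that holds the CUSTOMERS table from the example above:

-- Real-time tag lookup for one column (no ACCOUNT_USAGE lag)
SELECT TAG_NAME, TAG_VALUE, LEVEL
FROM TABLE(INFORMATION_SCHEMA.TAG_REFERENCES('CUSTOMERS.EMAIL', 'COLUMN'));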
This applies beyond compliance. I can organize objects by business unit, project, cost center, or any classification system the organization needs. Tags create a flexible metadata layer that scales with data.
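For example, here is a minimal sketch of what cost attribution could look like; the COST_CENTER tag and the ANALYTICS_WH and FINANCE_DB names are hypothetical, not objects from my environment:

-- A separate tag for chargeback reporting
CREATE TAG COST_CENTER;

-- Attribute compute and storage to business units
ALTER WAREHOUSE ANALYTICS_WH SET TAG COST_CENTER = 'Marketing';
ALTER DATABASE FINANCE_DB SET TAG COST_CENTER = 'Finance';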
Here's where tags become more than just labels. They can trigger security policies automatically.
Let's say we have 20 columns across different tables containing email addresses. Normally, we would apply a masking policy to each column individually. With tags, we do it once.
-- Create a tag for PII data
CREATE TAG PII_DATA;

-- Tag the sensitive columns
ALTER TABLE CUSTOMERS
MODIFY COLUMN EMAIL SET TAG PII_DATA = 'Email';

ALTER TABLE MARKETING_CONTACTS
MODIFY COLUMN CONTACT_EMAIL SET TAG PII_DATA = 'Email';

-- Create a masking policy
CREATE MASKING POLICY mask_pii AS (val STRING)
RETURNS STRING ->
CASE
    WHEN CURRENT_ROLE() IN ('ACCOUNTADMIN', 'SECURITYADMIN') THEN val
    ELSE '***MASKED***'
END;

-- Link the policy to the tag
ALTER TAG PII_DATA SET MASKING POLICY mask_pii;

Every column tagged with PII_DATA is automatically masked.
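A quick way to sanity-check the policy, assuming a non-admin role such as ANALYST exists and has SELECT access to the table:

-- A role outside the allow-list should see the masked value
USE ROLE ANALYST;                      -- hypothetical non-admin role
SELECT EMAIL FROM CUSTOMERS LIMIT 5;   -- returns '***MASKED***' for every row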
Now, let's consider the scenario six months later:

Our data warehouse has doubled in size.
New data sources are feeding in without a central place to track sensitivity.
Marketing launches a customer rewards program and needs a LOYALTY_MEMBERS table to store customer emails and phone numbers.

Without tags, my workflow would look like this:

Create the LOYALTY_MEMBERS table.
Manually add it to my spreadsheet of “tables with PII.”
Remember to apply masking policies to each sensitive column individually.
When compliance asks for an updated report:
Open the spreadsheet and hope I remembered to update it.
Manually verify each entry against the warehouse.
Spend 2 hours producing a list I’m only 80% confident is correct.
As the warehouse grows, this process doesn’t just get slower; it gets riskier.
With tags, the workflow is far simpler. I create the table and tag the sensitive columns:
-- Create the table
CREATE TABLE LOYALTY_MEMBERS (
    CUSTOMER_ID NUMBER,
    EMAIL STRING,
    PHONE STRING,
    CREATED_AT TIMESTAMP
);

-- Set the PII tag on both columns
ALTER TABLE LOYALTY_MEMBERS
MODIFY COLUMN EMAIL
SET TAG DATA_CLASSIFICATION = 'PII';

ALTER TABLE LOYALTY_MEMBERS
MODIFY COLUMN PHONE
SET TAG DATA_CLASSIFICATION = 'PII';

Masking policies apply automatically to any column tagged as DATA_CLASSIFICATION = 'PII'.
When compliance asks for a report, I can just query ACCOUNT_USAGE for all columns tagged as PII and get an accurate, up-to-date list in seconds.
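In practice the report is one query against the same TAG_REFERENCES view, this time filtered to column-level references:

-- All columns currently tagged as PII, account-wide
SELECT OBJECT_DATABASE,
       OBJECT_SCHEMA,
       OBJECT_NAME,
       COLUMN_NAME
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
WHERE TAG_NAME = 'DATA_CLASSIFICATION'
  AND TAG_VALUE = 'PII'
  AND DOMAIN = 'COLUMN';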
No manual cross-checks. No “I hope I didn’t miss anything.”
The data warehouse grew in size, but data governance didn’t break, because our policies follow tags, not ad hoc lists of tables and columns.
Object tagging solved my data governance problem. What used to require manual tracking, spreadsheets, and guesswork became a systematic, scalable process built into Snowflake.
Tags are metadata labels that you attach to databases, schemas, tables, and columns. Think of them as organizing your data warehouse the same way you'd label moving boxes.
Tags enable automated governance. Link a masking policy to a tag once, and every tagged column is automatically protected. No manual policy application for each column.
Tags make data discoverable. Finding all tables with PII across your organization becomes a simple query instead of a manual hunt through schemas.
Tags scale effortlessly. As your warehouse grows and new tables and data sources are added, tags keep governance intact where manual tracking breaks down.
Now whenever my compliance team asks for a list of sensitive data, I have the answer in seconds instead of hours.
Want to master Snowflake and prepare for the SnowPro Core Certification?
Check out the SnowPro Core Certification Skill Path on Enqurious Academy and start your certification journey.
