This hands-on project guides you through building a complete Medallion Architecture (Bronze–Silver–Gold) pipeline in Databricks Community Edition using a real-world e-commerce scenario from GlobalMart.
You'll implement all three layers of the medallion architecture:
- Bronze Layer: Ingest raw customer, order, and transaction data exactly as received from source systems
- Silver Layer: Apply data quality checks including duplicate detection, missing value handling, email validation, and referential integrity constraints
- Gold Layer: Create aggregated business metrics, such as order count and total spending per customer, that are ready for analytics
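The three layers above can be sketched in plain Python before moving to Databricks. This is only a minimal illustration of the flow, not the project's Spark code: the sample records, the field names (`customer_id`, `email`, `order_id`, `amount`), the email pattern, and every function name are assumptions made for the example. In the actual project each layer would be a Spark DataFrame or Delta table.

```python
import re
from collections import defaultdict

# Hypothetical sample records standing in for raw source extracts.
raw_customers = [
    {"customer_id": 1, "email": "ana@example.com"},
    {"customer_id": 1, "email": "ana@example.com"},  # duplicate
    {"customer_id": 2, "email": "not-an-email"},     # invalid email
    {"customer_id": 3, "email": None},               # missing value
    {"customer_id": 4, "email": "raj@example.com"},
]
raw_orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25.0},
    {"order_id": 101, "customer_id": 1, "amount": 15.5},
    {"order_id": 102, "customer_id": 4, "amount": 40.0},
    {"order_id": 103, "customer_id": 99, "amount": 10.0},  # unknown customer
]

# Deliberately simple pattern (an assumption): something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def to_bronze(records):
    """Bronze: keep records exactly as received (copied, not cleaned)."""
    return [dict(r) for r in records]


def customers_to_silver(bronze_customers):
    """Silver: drop duplicate IDs, missing emails, and malformed emails."""
    seen, clean = set(), []
    for row in bronze_customers:
        email = row["email"]
        if row["customer_id"] in seen or not email or not EMAIL_RE.match(email):
            continue
        seen.add(row["customer_id"])
        clean.append(row)
    return clean


def orders_to_silver(bronze_orders, silver_customers):
    """Silver: referential integrity, keep only orders with a known customer."""
    valid_ids = {c["customer_id"] for c in silver_customers}
    return [o for o in bronze_orders if o["customer_id"] in valid_ids]


def build_gold(silver_orders):
    """Gold: order count and total spending per customer."""
    metrics = defaultdict(lambda: {"order_count": 0, "total_spent": 0.0})
    for order in silver_orders:
        entry = metrics[order["customer_id"]]
        entry["order_count"] += 1
        entry["total_spent"] += order["amount"]
    return dict(metrics)


bronze_customers = to_bronze(raw_customers)
bronze_orders = to_bronze(raw_orders)
silver_customers = customers_to_silver(bronze_customers)
silver_orders = orders_to_silver(bronze_orders, silver_customers)
gold = build_gold(silver_orders)
print(gold)
```

With this sample data, Bronze still holds all five customer rows, Silver keeps only customers 1 and 4 and their three orders, and Gold reports two orders totaling 40.5 for customer 1 and one order totaling 40.0 for customer 4.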
What makes this project different:
- Complete end-to-end implementation from raw ingestion to business metrics
- Practical data quality validation techniques including constraint enforcement
- Real-world e-commerce datasets with actual data quality issues to resolve
- Step-by-step guidance with code examples and validation checkpoints
By the end of this project, you'll have a working medallion architecture pipeline that ingests raw data, enforces data quality rules, and produces business-ready metrics.