


In the health insurance industry, risk misclassification can cost crores each month, whether through approving risky claims or misallocating premiums. PrimerInsurance, a leading health insurer, faces this very challenge: identifying which policyholders are genuinely “high-risk.” Without a predictive system, underwriters rely on static rules, leading to inconsistent decisions, revenue leakage, and inflated claim ratios.
The stakes are high. Every misclassified customer increases claim exposure and erodes profitability. Over time, the absence of data-driven risk segmentation results in delayed approvals, increased churn among low-risk customers, and higher operational costs. What PrimerInsurance needs is an automated, explainable, and efficient risk classification model that flags high-risk customers before policy issuance.
As a data scientist at PrimerInsurance, you’ll build an end-to-end machine learning workflow to predict customer risk categories using Python, Pandas, scikit-learn, and MLflow. You’ll work with three datasets — customers, policy details, and claims — containing demographics, medical history, and claim outcomes. The project follows a complete ML lifecycle: data preparation, model training, evaluation, and ROI interpretation.
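To make the data preparation step concrete, here is a minimal sketch of how the three datasets might be joined into one row per customer with Pandas. The brief does not specify the schema, so every column name (`customer_id`, `policy_id`, `annual_premium`, `claim_amount`, and so on), the toy values, and the rule used to derive the `high_risk` label are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy tables: the real column names and values are not given in the brief.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34, 58, 45, 29],
    "smoker": [0, 1, 1, 0],
})
policies = pd.DataFrame({
    "policy_id": [10, 11, 12, 13],
    "customer_id": [1, 2, 3, 4],
    "annual_premium": [12000, 30000, 22000, 9000],
})
claims = pd.DataFrame({
    "claim_id": [100, 101, 102],
    "policy_id": [11, 11, 12],
    "claim_amount": [50000, 20000, 15000],
})


def build_customer_table(customers, policies, claims):
    """Join customers, policies, and aggregated claims into one row per customer."""
    # Collapse the claims table to one row per policy.
    claim_totals = claims.groupby("policy_id", as_index=False).agg(
        total_claimed=("claim_amount", "sum"),
        n_claims=("claim_id", "count"),
    )
    table = (
        customers
        # validate= raises if a customer unexpectedly holds several policies.
        .merge(policies, on="customer_id", how="left", validate="one_to_one")
        .merge(claim_totals, on="policy_id", how="left")
    )
    # Policies with no claims get zeros instead of NaN.
    claim_cols = ["total_claimed", "n_claims"]
    table[claim_cols] = table[claim_cols].fillna(0)
    return table


table = build_customer_table(customers, policies, claims)

# Assumed labelling rule (not from the brief): a customer is high-risk
# when total claims exceed the annual premium.
table["high_risk"] = (table["total_claimed"] > table["annual_premium"]).astype(int)
print(table[["customer_id", "total_claimed", "annual_premium", "high_risk"]])
```

A table shaped like this could then feed the scikit-learn training and MLflow tracking steps. If the real data allows several policies per customer, the `validate="one_to_one"` check will raise, and the policies would need to be aggregated first.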
What You’ll Build:
By the end, you’ll have built a fully functional risk prediction engine, trained and evaluated across multiple models, and validated with measurable business outcomes. Submit your Python code snippets, solution notebook, and evaluation outputs to demonstrate your implementation and insights.