Customer churn prediction is crucial in the FinTech domain. Your task is to build a robust pipeline addressing data extraction, model development, and system design.

Access the dataset and a sample solution reference using the links below.

Hypothesis Building

State a concise hypothesis connecting features to potential influences on customer churn. If you evaluate more than one hypothesis, build separate pipelines for them and showcase your understanding of pipelines by re-using components.
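
One possible way to re-use components across hypothesis-specific pipelines is to write each step as a small function and compose them. The sketch below is only an illustration: the column names (complaints, tenure_months, balance, num_products) and the two hypotheses are hypothetical and are not taken from the dataset.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Step = Callable[[List[Row]], List[Row]]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Shared step: remove rows that contain any missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_ratio(name: str, numerator: str, denominator: str) -> Step:
    """Step factory: derive a ratio feature, returning 0.0 when the denominator is 0."""
    def step(rows: List[Row]) -> List[Row]:
        return [
            {**r, name: r[numerator] / r[denominator] if r[denominator] else 0.0}
            for r in rows
        ]
    return step


def make_pipeline(*steps: Step) -> Step:
    """Compose steps into one callable that applies them in order."""
    def run(rows: List[Row]) -> List[Row]:
        for s in steps:
            rows = s(rows)
        return rows
    return run


# Two hypothetical hypotheses that share the same cleaning step.
pipeline_h1 = make_pipeline(
    drop_incomplete, add_ratio("complaints_per_month", "complaints", "tenure_months")
)
pipeline_h2 = make_pipeline(
    drop_incomplete, add_ratio("balance_per_product", "balance", "num_products")
)

# Made-up rows, for illustration only.
rows = [
    {"complaints": 3, "tenure_months": 6, "balance": 1200.0, "num_products": 2},
    {"complaints": None, "tenure_months": 12, "balance": 300.0, "num_products": 1},
]
h1 = pipeline_h1(rows)
h2 = pipeline_h2(rows)
```

Both pipelines call the same drop_incomplete step, so a fix to the cleaning logic reaches every hypothesis at once.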

Standard EDA

  • Covariance and Correlation Matrix: Display matrices to understand feature relationships.
  • Data Quality Check: Handle missing values, outliers, and other data quality issues.
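
The two EDA items above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the column names and values are made up, median imputation and the 1.5 x IQR fence are just one reasonable choice each, and in practice a dataframe library would likely be used instead.

```python
from statistics import mean, median, quantiles, stdev


def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    sx, sy = stdev(x), stdev(y)
    return covariance(x, y) / (sx * sy) if sx and sy else 0.0


def matrix(columns, fn):
    """Apply fn to every pair of columns and return a nested dict."""
    names = list(columns)
    return {a: {b: fn(columns[a], columns[b]) for b in names} for a in names}


def impute_median(values):
    """Replace missing (None) entries with the median of the observed values."""
    m = median(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def iqr_outliers(values):
    """Indices of values outside the 1.5 * IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [i for i, v in enumerate(values) if v < lo or v > hi]


# Made-up columns, for illustration only.
columns = {
    "tenure_months": [2, 4, None, 8, 10, 6],
    "monthly_spend": [20.0, 40.0, 60.0, 80.0, 100.0, 900.0],
}
clean_tenure = impute_median(columns["tenure_months"])
outlier_rows = iqr_outliers(columns["monthly_spend"])

clean = {"tenure_months": clean_tenure, "monthly_spend": columns["monthly_spend"]}
cov = matrix(clean, covariance)
corr = matrix(clean, pearson)
```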

Feature Engineering and Reduction

  • Create a maximum of 6 raw or derived features. Use suitable techniques for feature selection or reduction.
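
As one example of a simple filter-style selection that respects the six-feature cap, candidate features can be ranked by the absolute value of their correlation with the churn label. The sketch below assumes a binary churn column and uses made-up feature names and values; other techniques (for example, model-based importance or PCA) would be equally valid choices.

```python
import math

MAX_FEATURES = 6  # cap stated in the brief


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def select_features(features, target, k=MAX_FEATURES):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs_corr(vals, target) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up data, for illustration only: seven candidates, one of them constant.
churn = [0, 0, 1, 1, 0, 1]
features = {
    "support_tickets": [0, 1, 4, 5, 1, 6],
    "tenure_months": [24, 36, 3, 5, 30, 2],
    "monthly_fee": [10, 12, 15, 14, 11, 16],
    "num_products": [3, 2, 1, 1, 3, 1],
    "app_logins": [20, 25, 4, 6, 22, 3],
    "age": [30, 45, 33, 52, 41, 29],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
selected = select_features(features, churn)
```

The constant column scores 0.0 and is the one left out here. To avoid leakage, a ranking like this would normally be computed on the training split only.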

Model Evaluation Metrics

  • Decide on appropriate evaluation metrics for churn and connect them to business KPIs (e.g., customer lifetime value, retention cost).
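
To illustrate how a classification metric can be tied to a business KPI, the sketch below computes precision and recall and then estimates the net value of a retention campaign aimed at every predicted churner. The labels, the customer lifetime value, the offer cost, and the save rate are all hypothetical numbers chosen for the example, and the value formula is a deliberately simplified assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means churn."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision and recall, each 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def campaign_value(tp, fp, clv, offer_cost, save_rate):
    """Simplified net value: retained lifetime value minus the cost of all offers sent.

    Assumes every predicted churner receives an offer and that a fraction
    save_rate of the true churners who receive it are retained.
    """
    return tp * save_rate * clv - (tp + fp) * offer_cost


# Made-up labels and business figures, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
value = campaign_value(tp, fp, clv=500.0, offer_cost=20.0, save_rate=0.3)
```

Under this framing, recall tracks how much churn-related lifetime value is reachable at all, while precision tracks how much of the retention budget is spent on customers who would have stayed anyway.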

Model Development

  • Model Creation: Develop at least two models for comparison (e.g., Logistic Regression, Random Forest).
  • Implement hyperparameter tuning for at least one model.
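
The tuning step can be sketched as a grid search scored by cross-validated accuracy. To keep the example dependency-free it tunes k for a small k-nearest-neighbours classifier, which is a stand-in and not one of the models named above; the data points are made up, and a library implementation would normally replace all of this.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict by majority vote among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=4):
    """Accuracy over interleaved folds: fold f holds out indices f, f+folds, ..."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        tr_X = [X[i] for i in range(len(X)) if i not in test_idx]
        tr_y = [y[i] for i in range(len(X)) if i not in test_idx]
        correct += sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


def grid_search(X, y, grid):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cv_accuracy(X, y, k) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up two-feature data, for illustration only (label 1 means churn).
X = [
    (1.0, 1.0), (9.0, 8.0), (2.0, 1.5), (8.0, 9.0), (1.5, 2.5), (9.5, 9.0),
    (2.5, 2.0), (8.5, 7.5), (1.0, 3.0), (7.5, 8.5), (3.0, 1.0), (5.0, 5.5),
]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

best_k, scores = grid_search(X, y, grid=[1, 3, 5])
```

Odd values of k are used so that a two-class vote cannot tie. The same grid_search shape applies to any model once cv_accuracy is swapped for that model's fit-and-score routine.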

Airflow/Kubeflow Integration

  • Create an Airflow pipeline for data processing and model training.
  • Use Kubeflow for managing the ML workflow on a simulated cluster. Senior+ only

Container Deployment

  • Containerize the model using Docker.
  • Deploy the containerized model on a local cluster with minikube or kind. Senior+ only

Model Deployment Plan and Architecture Design

  • Create a working solution and share recorded video or screenshots.
  • Highlight components for model serving, monitoring, logging, and iteration/updates. Senior+ only
  • Achieve an accuracy above 70% on the test dataset.
  • Packaging: Include a README for installation and execution of the end-to-end pipeline.
  • CI/CD: Implement CI using GitHub Actions. Implement CD if deploying on the cloud with ECR/EC2/EKS. Senior+ only
  • Post Model-Serving Stages: Present a strategic roll-out and A/B/N testing plan. Explain how you would handle drift detection with your pipelines. Senior+ only
  • Documentation Skills: Show how your documentation adds value for the organization.
  • Version Control: Use version control for code and for artifacts (model, data, pipelines). Senior+ only
  • Report (PDF):
    • Pipeline description and design choices.
    • Model performance evaluation.
    • Discussion of scaling up the pipeline. Senior+ only
    • Future work discussion.
  • Source Code: All working code.
  • Video/Screenshots: A video or screenshots of a working solution. A 5-minute video pitch explaining the solution and components is preferred.
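
For the drift detection item under Post Model-Serving Stages, one common starting point is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The sketch below is a minimal version under stated assumptions: equal-width bins taken from the training range, a small floor to avoid empty bins, and made-up data. The 0.25 alert level is a commonly quoted rule of thumb, not a requirement of this brief.

```python
import math


def psi(expected, actual, bins=5, eps=1e-6):
    """Population Stability Index of actual against expected.

    Bins are equal-width over the range of expected; values of actual that
    fall outside that range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if expected is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up feature values, for illustration only.
train_values = [float(v) for v in range(100)]
live_same = list(train_values)
live_shifted = [v + 40.0 for v in train_values]

psi_same = psi(train_values, live_same)
psi_shifted = psi(train_values, live_shifted)

DRIFT_THRESHOLD = 0.25  # assumed alert level (rule of thumb)
needs_retraining = psi_shifted > DRIFT_THRESHOLD
```

A check like this could run as a scheduled pipeline task per feature, with a breach triggering an alert or a retraining run.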

Timeline

One week for submission. Please contact the appropriate party for any extension requests.

Note

While we are not expecting everything to be covered, please ensure the solution is sufficiently complete for evaluation on coding standards, ML, documentation, and system design.