Zepto Demand Forecasting System — ML System Design


Tags: ML System Design, Time Series Forecasting, Scalability, Expert

The Challenge: Hyperlocal Demand Forecasting for Zepto

You're an ML Engineer at Zepto, the quick commerce grocery delivery platform. Zepto operates numerous "dark stores" (micro-fulfillment centers) across various Indian cities. To ensure optimal inventory and minimize stockouts/wastage, Zepto needs an accurate demand forecasting system. How would you design a system to forecast the demand for thousands of SKUs (e.g., 5,000-10,000 SKUs per dark store, across 100s of dark stores) for the next 1-7 days, at a daily or even hourly granularity, considering promotions, holidays, and hyperlocal factors?

Initial Thoughts & Clarifications

Definition of "Demand": Is it actual sales, or sales adjusted for stockouts (true demand)? How to estimate true demand?
Forecasting Horizon & Granularity: Next 1-7 days is mentioned. Is daily sufficient, or is hourly needed for specific fast-moving items or operational planning within the dark store?
Product Scope: All SKUs? Or focus on top N% movers initially? How to handle long-tail items?
Geographical Scope: Forecast per SKU per dark store? Or aggregate at a city level first?
Data Sources: What historical data is available? (Sales transactions with timestamps, SKU ID, dark store ID, price, promotions applied). Product metadata (category, brand, perishability, shelf life). Store characteristics (size, location demographics). External data (holidays, weather, local events, competitor activity).
Key Business Goals: Minimize stockouts (lost sales, bad CX), minimize wastage (especially for perishables), optimize inventory holding costs, improve delivery ETAs by having products ready. What are the relative costs of over-forecasting vs. under-forecasting?
Scalability: Forecasting for potentially millions of (SKU x Dark Store) time series.
Cold Start Problem: How to forecast for new SKUs or new dark stores with limited history?
Existing Systems: Any current forecasting methods in place? What are their limitations?
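To make the scalability point concrete, here is a back-of-the-envelope sizing sketch. The SKU range and 7-day horizon come from the problem statement; the store count (300) and history length (one year) are assumed values chosen only for illustration.

```python
# Back-of-the-envelope sizing for the forecasting workload.
SKUS_PER_STORE = (5_000, 10_000)  # range given in the problem statement
NUM_STORES = 300                  # assumption: stands in for "100s of dark stores"
HORIZON_DAYS = 7                  # forecast horizon from the problem statement
HISTORY_DAYS = 365                # assumption: one year of daily history


def sizing(skus_per_store: int) -> dict:
    """Return rough counts of series, daily forecast rows, and training rows."""
    series = skus_per_store * NUM_STORES
    return {
        "series": series,
        "daily_forecast_rows": series * HORIZON_DAYS,
        "training_rows": series * HISTORY_DAYS,
    }


low, high = (sizing(n) for n in SKUS_PER_STORE)
print("low estimate: ", low)
print("high estimate:", high)
```

Under these assumptions the system handles a few million series and on the order of a billion daily training rows, which is why per-series model fitting becomes a concern later in the discussion.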

Framework to Consider (Demand Forecasting System):

Problem Definition & Scope:
- Define forecast target (demand), horizon, granularity (daily/hourly), and scope (SKU x Store).
- Identify key business objectives and trade-offs.
Data Sources & Understanding:
- Identify all relevant internal and external data sources.
- Analyze data quality, completeness, and potential biases (e.g., stockouts).
Data Preparation & Feature Engineering for Time Series:
- Time series preprocessing: Handling missing sales data, outlier detection/treatment, demand estimation during stockouts.
- Feature creation: Lagged sales, rolling statistics (mean, std, min, max over various windows), calendar features (day of week, week of year, month, holidays - including regional ones like Ugadi, Sankranti, Diwali, local festivals), promotion flags/durations, price elasticity features.
- Product/Store features: Category, brand, perishability, shelf life, store location type, weather data for store location.
- Train/validation/test splitting: Time-series cross-validation (e.g., walk-forward validation, rolling origin validation) to prevent data leakage.
Model Selection & Architecture:
- Explore options:
  - Classical: ARIMA, SARIMA, Exponential Smoothing (ETS), Prophet.
  - Machine Learning: Gradient Boosting (XGBoost, LightGBM), Random Forest (often need careful feature engineering for time series).
  - Deep Learning: RNNs (LSTM, GRU), Temporal Convolutional Networks (TCNs), Transformers adapted for time series.
- Consider model per time series vs. global models (that learn across many time series using product/store embeddings).
- Hierarchical forecasting (e.g., reconcile forecasts from SKU level to category level to store total).
- Cold start strategies for new products/stores.
Scalability & System Architecture:
- Design for training and inference for millions of time series.
- Batch processing for model retraining and daily/weekly forecast generation.
- Near real-time updates if hourly forecasts or very reactive promotion handling is needed.
- Technology stack for data processing (Spark), model training (distributed frameworks), model serving, and forecast storage.
Rollout Strategy & Risk Management:
- Phased rollout (e.g., by city, by category, by store).
- Shadow mode deployment to compare against existing methods.
- A/B testing forecast impact on operational KPIs.
- Rollback plans. Communication with operations teams.
Evaluation:
- Time series metrics: MAE, RMSE, MAPE (undefined on zero-sales days, which are common for long-tail SKUs), WMAPE (Weighted MAPE, often weighted by sales volume/value). Quantile loss for probabilistic forecasts.
- Forecast bias analysis.
- Business metrics: Stockout rate reduction, wastage reduction, inventory turnover improvement, impact on order fulfillment times.
Human-in-the-Loop & Continuous Improvement:
- Mechanism for operations teams/category managers to review and potentially adjust forecasts for exceptional situations.
- Feedback loop to retrain models with corrected data or new features.
- Monitoring for concept drift (e.g., changing customer behavior).
Ethical Considerations & Bias Mitigation:
- Ensure fairness in forecasting if it impacts regional product assortments or availability.
- Be aware of biases in historical data (e.g., stockouts under-representing true demand for certain items).
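As an illustration of the evaluation items in the framework above, the following is a minimal sketch of volume-weighted MAPE and forecast bias. The function names and the sample numbers are made up for the example. WMAPE is computed here as total absolute error divided by total actual units, one common definition that stays well-defined when individual days have zero sales.

```python
def wmape(actual, forecast):
    """Weighted MAPE: total absolute error divided by total actual volume."""
    total = sum(actual)
    if total == 0:
        raise ValueError("WMAPE is undefined when total actual demand is zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total


def forecast_bias(actual, forecast):
    """Signed error as a share of actual volume; negative means under-forecasting."""
    total = sum(actual)
    if total == 0:
        raise ValueError("bias is undefined when total actual demand is zero")
    return sum(f - a for a, f in zip(actual, forecast)) / total


# Illustrative numbers only; note the zero-sales day that would break plain MAPE.
actual = [10, 0, 5, 5]
forecast = [8, 1, 5, 8]
print("WMAPE:", wmape(actual, forecast))
print("Bias: ", forecast_bias(actual, forecast))
```

Reporting bias alongside WMAPE matters here because a model can have an acceptable WMAPE while systematically under-forecasting, which is the costlier direction for stockouts.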

Simulated Conversation

Round 1: Problem Understanding & Scope Definition

Interviewer 1 (I1 - Lead ML Engineer): Welcome! At Zepto, we pride ourselves on 10-minute grocery delivery. Accurate demand forecasting is absolutely critical to make this happen by ensuring the right products are in the right dark stores at the right time. We need a system to forecast demand for thousands of SKUs per dark store, across our network of hundreds of dark stores. The horizon is the next 1-7 days. How would you approach designing this system? Let's start with clarifying the problem.

Candidate (C): This is a classic and highly impactful time series forecasting problem. Before diving into solutions, I'd like to clarify a few points:

Definition of "Demand": Are we forecasting actual sales, or are we trying to estimate "true demand" by accounting for periods when items were out of stock? Estimating true demand is harder but gives a better picture for inventory planning.
Forecast Granularity: The horizon is 1-7 days. Is a daily forecast per SKU per dark store the primary goal? Or is there a need for sub-daily (e.g., hourly) forecasts for very fast-moving items or for intra-day operational adjustments?
Product Scope: Are we forecasting for all SKUs, including very slow-moving or new ones? Or do we prioritize, say, the top 80-90% of SKUs by sales volume/frequency initially?
Geographical Granularity: The core unit seems to be SKU x Dark Store. Is there any need for aggregated forecasts at a city or regional level for higher-level planning?
Key Business Objectives & Trade-offs: The main goals are minimizing stockouts and wastage. What's the relative business cost of under-forecasting (leading to stockouts and lost sales) versus over-forecasting (leading to wastage for perishables, or higher holding costs)? This will influence our choice of loss functions and model evaluation.
New Product/Store Handling: How frequently are new SKUs introduced or new dark stores launched? We'll need a strategy for the "cold start" problem.

Understanding these will help scope the initial version of the system effectively.

Strong Clarification: Candidate asks critical questions about demand definition, granularity, scope, and business objectives, showing a good understanding of forecasting nuances.

Interviewer 2 (I2 - Head of Supply Chain): Excellent questions.

Demand: Ideally, true demand. We have data on when items were marked out-of-stock (OOS).
Granularity: Let's start with daily forecasts per SKU per dark store for the next 7 days. Hourly could be a future enhancement.
Product Scope: Let's aim for comprehensive coverage, but we can prioritize models or apply simpler heuristics for very long-tail items if needed.
Geo Granularity: SKU x Dark Store is primary. Aggregates can be derived later.
Trade-offs: Stockouts are very bad for our 10-minute promise and customer retention. Wastage is also a concern, especially for perishables like vegetables, fruits, and dairy (think Amul milk packets, ID dosa batter, fresh coriander). For now, let's assume stockouts are slightly more costly.
New Products/Stores: New SKUs are added weekly, new stores monthly. A cold start solution is essential.

C: Understood. That clarifies the initial focus well. So, the core task is to predict daily sales units for each SKU at each dark store for the next 7 days, trying to estimate true demand by considering stockout periods, with a higher penalty for under-forecasting.

Good Summary: Candidate succinctly summarizes the refined problem statement.
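One way to encode "a higher penalty for under-forecasting" is the quantile (pinball) loss with a target quantile above 0.5. The sketch below is illustrative: the quantile of 0.7 is an assumed value standing in for "stockouts are slightly more costly", not a number from the discussion. LightGBM exposes a comparable objective through `objective="quantile"` with an `alpha` parameter.

```python
def pinball_loss(actual, forecast, quantile=0.7):
    """Mean quantile (pinball) loss.

    With quantile > 0.5, an under-forecast (actual > forecast) costs more
    than an over-forecast of the same size.
    """
    if not 0 < quantile < 1:
        raise ValueError("quantile must be strictly between 0 and 1")
    total = 0.0
    for a, f in zip(actual, forecast):
        diff = a - f
        # diff >= 0: under-forecast, weighted by quantile
        # diff < 0:  over-forecast, weighted by (1 - quantile)
        total += quantile * diff if diff >= 0 else (quantile - 1) * diff
    return total / len(actual)


under = pinball_loss([10], [8])   # forecast is 2 units too low
over = pinball_loss([10], [12])   # forecast is 2 units too high
print("under-forecast loss:", under)
print("over-forecast loss: ", over)
```

The ratio between the two penalties is `quantile / (1 - quantile)`, so the quantile can in principle be set from the estimated cost of a lost sale relative to the cost of a wasted unit.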

Round 2: Data Sources & Understanding

I1 (Lead ML Engineer): Okay, now that the scope is clearer, what data sources would you expect to use for building this forecasting system at Zepto, and what are some potential challenges or quality issues you'd anticipate with this data?

C: For a robust demand forecasting system, we'd need a rich set of internal and potentially external data sources.

Potential Data Sources:

Internal Data:
- Historical Sales Data: This is the primary time series.
  - Key fields: Timestamp of sale, SKU ID, Dark Store ID, quantity sold, selling price, any discounts/promotions applied at item level.
  - Granularity: Ideally, transaction-level, which can be aggregated to daily/hourly.
- Stockout Data: Information on when an SKU was out of stock at a specific dark store. This is crucial for estimating true demand.
  - Fields: SKU ID, Dark Store ID, OOS start timestamp, OOS end timestamp.
- Promotions Data:
  - Fields: Promotion ID, SKU ID(s) involved, Dark Store ID(s) where active, promotion type (e.g., discount, BOGO, combo), start/end date, promotion mechanics.
  - This helps model promotional lift.
- Product Metadata (SKU Master):
  - Fields: SKU ID, product name, description, category (e.g., "Dairy," "Fresh Vegetables," "Snacks"), sub-category, brand, pack size, price, perishability flag, shelf life, country of origin, any specific attributes like "organic," "vegan."
  - Important for understanding product characteristics and for cold start (finding similar products).
- Dark Store Metadata:
  - Fields: Dark Store ID, location (latitude/longitude, city, pincode), size/capacity, opening date, operating hours, surrounding neighborhood demographics (if available).
- Inventory Data (Optional but useful): Current and historical inventory levels at dark stores. Can help validate stockout periods or understand replenishment cycles.
- Pricing History: Changes in base price over time for each SKU.
External Data (Contextual Factors):
- Calendar/Holiday Data: Public holidays (national and regional - e.g., Diwali, Eid, Christmas, local city festivals like Ganesh Chaturthi in Mumbai/Hyderabad, Ugadi), special event days (e.g., big cricket matches, New Year's Eve).
- Weather Data: For each dark store's location: daily temperature (min/max/avg), precipitation, humidity, extreme weather events. This can impact demand for certain categories (e.g., ice cream, hot beverages).
- Local Events Data (Harder to get systematically): Major local festivals, school holidays, potentially even local strikes or disruptions if they significantly impact movement and demand.
- Competitor Data (Very hard to get): Information on major competitor promotions or stock situations, if ethically and legally obtainable. This is usually a stretch.
- Macroeconomic Indicators (Less likely for short-term forecast): Inflation, GDP growth (more relevant for long-term strategic forecasting).

Potential Data Challenges & Quality Issues:

Stockout Data Accuracy: Is the OOS data reliably captured? There might be delays in marking items OOS, or brief periods of OOS might be missed.
True Demand Estimation: Even with OOS data, estimating how much more would have sold is non-trivial. Simplistic imputation might be needed initially.
Promotion Data Completeness: Are all types of promotions (centralized, store-specific, ad-hoc) captured systematically? Are their actual start/end times accurate?
Data Granularity Mismatches: Sales might be at transaction level, promotions at daily level, weather at hourly/daily. Aggregation/disaggregation needs care.
Product Lifecycle: SKUs get introduced, become popular, and then might decline or be delisted. Handling these lifecycle stages is important.
Data Sparsity: Many SKUs (long-tail items) will have very sparse sales history (many zero-sale days), making individual forecasting difficult.
Outliers & Anomalies: Unusual sales spikes (e.g., bulk order by one customer, data entry errors) can distort the historical pattern.
Missing Values: In any of the data streams.
Time Zone Consistency: Ensure all timestamped data is in a consistent time zone (e.g., UTC or IST).
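The time zone and granularity points can be sketched together: late-evening orders stored in UTC must be assigned to the correct local business date before daily aggregation. The snippet below assumes transactions arrive as `(utc_timestamp, sku_id, store_id, quantity)` tuples; the SKU and store identifiers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# India Standard Time is a fixed UTC+05:30 offset with no daylight saving.
IST = timezone(timedelta(hours=5, minutes=30))


def daily_sales(transactions):
    """Aggregate (utc_timestamp, sku_id, store_id, qty) rows by IST calendar date."""
    totals = defaultdict(int)
    for ts_utc, sku_id, store_id, qty in transactions:
        if ts_utc.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        local_date = ts_utc.astimezone(IST).date()
        totals[(sku_id, store_id, local_date)] += qty
    return dict(totals)


transactions = [
    # 17:00 UTC is 22:30 IST on the same day.
    (datetime(2024, 1, 1, 17, 0, tzinfo=timezone.utc), "milk_500ml", "blr_01", 2),
    # 19:00 UTC is 00:30 IST on the next day.
    (datetime(2024, 1, 1, 19, 0, tzinfo=timezone.utc), "milk_500ml", "blr_01", 3),
]
result = daily_sales(transactions)
for key, qty in sorted(result.items()):
    print(key, qty)
```

Aggregating on the UTC date instead would put both orders on January 1 and shift part of each evening's demand onto the wrong day.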

Thorough Data Understanding: Candidate lists a comprehensive set of relevant internal and external data sources and proactively identifies potential quality issues and challenges.

Round 3: Data Preparation & Feature Engineering for Time Series

I1 (Lead ML Engineer): That's a good overview of the data landscape. Now, let's get into the specifics of preparing this data for your forecasting models. This is a critical step. Can you detail your approach to time series preprocessing and, very importantly, the feature engineering you would perform? How would you structure your train/validation/test splits for time series data?

C: Data preparation and feature engineering are indeed where a lot of the magic happens in time series forecasting. My approach would involve several steps:

Time Series Preprocessing:

Data Aggregation:
- Aggregate raw transaction data to the desired forecast granularity (e.g., daily sales quantity per SKU per Dark Store).
Handling Missing Sales Data (Zero Sales vs. True Missing):
- For days where an SKU-Store combination has no sales record, we need to distinguish if it was a true zero-sale day (product was available but not sold) or if data is missing. If inventory data is available, it can help. If not, we might assume zero sales if the product was active.
True Demand Estimation (Addressing Stockouts):
- For periods where an SKU was marked OOS:
  - Simple Imputation: Replace OOS period sales with an average of sales from surrounding non-OOS periods (e.g., average of 2 days before and 2 days after OOS, or average for the same day of the week in previous weeks).
  - Model-based Imputation (Advanced): Train a separate model to predict sales during OOS, using features like sales before OOS, day of week, promotions, etc. This is more complex.
  - Initially, I'd start with simpler imputation methods and flag these imputed values.
Outlier Detection and Treatment:
- Identify sales figures that are abnormally high or low (e.g., using rolling standard deviations, IQR method).
  - For Zepto, a sudden bulk order for a typically low-volume item by a single customer could be an outlier.
- Treatment: Cap/floor outliers, or replace with a rolling median/mean, or treat as missing and impute. Careful not to remove true demand spikes caused by valid events.
Time Series Alignment: Ensure all time series (sales, promotions, weather, etc.) are aligned on the same daily/hourly index for each SKU-Store combination. Forward-fill features where appropriate (e.g., a promotion that is active for the whole day). Use back-fill with care: copying a later observation into earlier rows can leak future information into training features.
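The simple imputation option for stockouts could look like the sketch below. It assumes OOS is flagged at whole-day level (real OOS windows are timestamped and may cover only part of a day) and uses the mean of the same weekday in earlier, non-OOS weeks. The function name, the four-week lookback, and the sample data are assumptions.

```python
from datetime import date, timedelta
from statistics import mean


def impute_stockouts(sales, oos_days, weeks_back=4):
    """Estimate demand on OOS days from the same weekday in prior weeks.

    sales:    dict mapping date -> units actually sold
    oos_days: set of dates flagged as out of stock
    Returns (adjusted_sales, imputed_dates) so imputed values stay flagged.
    """
    adjusted = dict(sales)
    imputed = set()
    for day in sorted(oos_days):
        if day not in sales:
            continue
        history = []
        for w in range(1, weeks_back + 1):
            prev = day - timedelta(weeks=w)
            # Only use earlier days that were themselves in stock.
            if prev in sales and prev not in oos_days:
                history.append(sales[prev])
        if history:
            # Units sold before the shelf emptied are a lower bound on demand.
            adjusted[day] = max(sales[day], mean(history))
            imputed.add(day)
    return adjusted, imputed


sales = {
    date(2024, 1, 1): 20,   # Monday
    date(2024, 1, 8): 24,   # Monday
    date(2024, 1, 15): 5,   # Monday, stocked out early in the day
}
adjusted, imputed = impute_stockouts(sales, oos_days={date(2024, 1, 15)})
print(adjusted[date(2024, 1, 15)], imputed)
```

Returning the set of imputed dates keeps the flagging step explicit, so downstream training can down-weight or exclude estimated values if they turn out to be unreliable.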

Feature Engineering:

The goal is to create features that capture trend, seasonality, promotional effects, and other influencing factors.

Lagged Features (Autoregressive component):
- Lagged sales values: Sales from `t-1` day, `t-2` days, ..., `t-7` days, `t-14` days, `t-28` days. The choice of lags depends on seasonality and product lifecycle.
  - Example: For `milk`, `t-1` and `t-7` (same day last week) are likely very important.
Rolling Window Statistics:
- Rolling mean, median, min, max, std dev of sales over various past windows (e.g., last 3 days, 7 days, 14 days, 28 days).
  - Example: `rolling_mean_sales_7d`, `rolling_std_sales_28d`.
- These capture recent trends and volatility.
Date & Calendar Features:
- Day of the week (encoded, e.g., one-hot or cyclical).
- Day of the month, day of the year.
- Week of the month, week of the year.
- Month, Quarter, Year.
- Is_weekend flag.
- Holiday Features:
  - Binary flag for public holidays (national & regional relevant to the dark store's city).
    - Consider Telugu festivals like Ugadi, Sankranti, Dasara, and national ones like Diwali, Eid, Christmas.
  - Days before/after a major holiday (e.g., `days_until_diwali`, `days_after_diwali`).
  - Special event days (e.g., major cricket matches).
Promotion Features:
- Binary flag: `is_on_promotion` (1 if active, 0 otherwise).
- Promotion type (categorical, e.g., "Discount," "BOGO," "Combo").
- Discount percentage or amount (numeric).
- Days since promotion started, days until promotion ends.
- Interaction features: e.g., promotion active on a weekend.
Price Features:
- Current selling price.
- Price change from previous period.
- Ratio of current price to average historical price (a relative-price signal from which the model can learn price sensitivity; it is not an elasticity estimate by itself).
Product Metadata Features (Static, but useful for global models or cold start):
- Category, sub-category, brand (often one-hot encoded or target encoded).
- Perishability flag.
- Shelf life (numeric).
- Embeddings derived from product title/description (if using NLP to find similar products for cold start).
Dark Store Metadata Features (Static):
- Store location cluster/type (e.g., "residential," "commercial area," "student area").
- Store age.
External Regressors (Exogenous Variables):
- Weather data: Lagged and future (if weather forecast is available) temperature, precipitation for the store's location.
  - Example: Demand for ice cream might increase with `temperature_t+1`.
Interaction Features:
- E.g., `day_of_week * product_category`, `promotion_active * is_weekend`.
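A few of the features above can be sketched in plain Python for a single series. The key property is that features for day `t` use only observations strictly before `t`, so nothing from the target day leaks in. The function names and the toy series are hypothetical; a production job would compute the same thing per (SKU, Store) in Spark or pandas.

```python
import math
from statistics import mean


def build_features(series, t, lags=(1, 7), windows=(7,)):
    """Lag and rolling features for predicting series[t], using only series[:t]."""
    max_lookback = max(max(lags), max(windows))
    if t < max_lookback:
        raise ValueError("not enough history before index t")
    feats = {f"lag_{k}": series[t - k] for k in lags}
    for w in windows:
        # The window ends at t - 1, so the target day is excluded.
        feats[f"rolling_mean_sales_{w}d"] = mean(series[t - w:t])
    return feats


def cyclical_day_of_week(dow):
    """Encode day of week (0-6) on a circle so day 6 and day 0 are adjacent."""
    angle = 2 * math.pi * dow / 7
    return math.sin(angle), math.cos(angle)


series = list(range(1, 15))  # toy daily sales: 1, 2, ..., 14
feats = build_features(series, t=10)
print(feats)
print(cyclical_day_of_week(0))
```

The cyclical encoding is an alternative to one-hot encoding; tree models usually cope well with either, while neural models tend to benefit more from the sine/cosine form.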

Train/Validation/Test Splitting for Time Series:

This is critical to avoid data leakage and get a realistic estimate of future performance.

Avoid Random Splitting: Randomly shuffling time series data and then splitting will lead to the model seeing future data during training, giving overly optimistic results.
Walk-Forward Validation (or Rolling Origin Validation):
1. Train on an initial period of data (e.g., first 60% of available history).
2. Validate on the next N days (e.g., next 7 days, matching our forecast horizon).
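3. Move the forecast origin forward (e.g., by N days or by 1 day), either extending the training window (expanding window, as in the diagram below) or sliding a fixed-length window, and repeat the validation.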
```
Train: [1...T]              Validate: [T+1...T+h]
Train: [1...T+1]            Validate: [T+2...T+h+1] (if retraining daily)
...
Test:  [Last_Available_Data_Point - h + 1 ... Last_Available_Data_Point] for final hold-out
```
This simulates how the model would be used in production: trained on past data to predict the future.
Fixed Hold-Out Test Set: Keep the most recent chunk of data (e.g., last 1-2 months) completely separate as a final test set, which the model never sees during training or hyperparameter tuning.
Per-SKU-Store Considerations: If training individual models per SKU-Store, ensure each series has enough data for meaningful splits. For global models, the split is on time across all series.
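The walk-forward scheme with a final hold-out can be expressed as a small generator over day indices. This is a sketch with an expanding training window; the parameter names and the 100/60/7/14 values in the example are assumptions. For a global model the same index split would be applied to the date column across all series.

```python
def walk_forward_splits(n_days, initial_train, horizon, step=None, test_days=0):
    """Yield (train_indices, val_indices) with an expanding training window.

    The last `test_days` days are reserved as a hold-out and never appear
    in any training or validation fold.
    """
    step = step or horizon
    usable = n_days - test_days
    train_end = initial_train
    while train_end + horizon <= usable:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step


folds = list(
    walk_forward_splits(n_days=100, initial_train=60, horizon=7, test_days=14)
)
for train_idx, val_idx in folds:
    print(f"train: 0..{train_idx[-1]}  validate: {val_idx[0]}..{val_idx[-1]}")
```

Every validation index is later than every training index in the same fold, which is the property random shuffling breaks.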

This structured approach to data preparation and feature engineering will provide a solid foundation for our forecasting models.

Detailed & Time-Series Aware: Candidate provides a comprehensive list of preprocessing steps and feature engineering ideas specific to time series forecasting, including crucial aspects like true demand estimation and appropriate train/val/test splitting (walk-forward validation).

Round 4: Model Selection & Architecture

I1 (Lead ML Engineer): That's a very solid plan for data preparation and feature engineering. Now, let's talk about the core modeling. Given the scale (thousands of SKUs per store, hundreds of stores) and the daily 7-day forecast requirement, what kind of models would you consider? Discuss their pros and cons in this context. Also, how would you address the cold start problem for new SKUs or new dark stores?

C: Choosing the right model involves balancing accuracy, scalability, interpretability, and ease of maintenance. For Zepto's scale, we'd likely need models that can handle many time series efficiently.

Model Selection Considerations:

Classical Time Series Models:
- Examples: ARIMA, SARIMA (Seasonal ARIMA), Exponential Smoothing (ETS), Theta method. Facebook's Prophet also falls somewhat in this category by decomposing time series.
- Pros:
  - Well-understood, statistically grounded.
  - Often good for individual time series with clear trend/seasonality.
  - Prophet is particularly good at handling holidays and seasonality, and is robust to missing data and outliers.
- Cons:
  - Typically univariate (though SARIMAX/ARIMAX can include exogenous regressors).
  - Fitting thousands/millions of individual models can be computationally intensive and hard to manage.
  - May not easily leverage cross-series information (e.g., similar products behave similarly).
  - Less flexible in incorporating complex non-linear relationships from many features.
- Zepto Context: Could be a good baseline, especially Prophet, or for very stable, high-volume SKUs. But likely not the primary solution for all SKUs due to scalability and cross-learning limitations.
Machine Learning Models (Tree-based, etc.):
- Examples: Gradient Boosting (LightGBM, XGBoost, CatBoost), Random Forest.
- Pros:
  - Excellent at handling tabular data with many features (our engineered lags, calendar, promo features, etc.).
  - Can capture complex non-linear relationships.
  - LightGBM/XGBoost are highly scalable and efficient.
  - Can be trained as a "global" model: one model trained on data from all SKU-Store time series, using SKU ID and Store ID (or their embeddings/features) as categorical features. This allows learning across series.
  - Relatively good interpretability (feature importance).
- Cons:
  - Require careful feature engineering to capture time series dynamics (lags, rolling stats are essential).
  - Don't inherently model time dependencies as well as sequence models unless features are very well crafted.
- Zepto Context: This is a very strong contender, especially LightGBM, due to its scalability and ability to use rich features. A global LightGBM model is likely a good primary approach.
Deep Learning Models (Sequence Models):
- Examples: RNNs (LSTMs, GRUs), Temporal Convolutional Networks (TCNs), Transformers adapted for time series (e.g., Informer, Autoformer). Amazon's DeepAR is also relevant here (probabilistic forecasting using RNNs).
- Pros:
  - Can automatically learn temporal dependencies and complex patterns from raw time series (or with minimal feature engineering).
  - Can naturally handle multivariate time series and exogenous variables.
  - Global DL models can learn shared representations across many time series using embeddings for SKU/Store IDs.
  - Can produce probabilistic forecasts (predicting a distribution, not just a point estimate), which is useful for setting safety stock.
- Cons:
  - Data-hungry: Typically require large amounts of data per series or many series for global models to perform well.
  - Computationally expensive to train and tune.
  - Can be harder to interpret (black-box nature).
  - More complex to implement and maintain.
- Zepto Context: A promising direction, especially for global modeling and probabilistic forecasts. Might be an iteration after establishing a strong tree-based baseline, or for specific categories where complex patterns are evident. TCNs or Transformer-based models could be more efficient than LSTMs for long sequences.

Proposed Primary Model & Architecture:

I would propose starting with a global Gradient Boosting model, likely LightGBM, as the primary workhorse due to its balance of performance, scalability, and feature handling capabilities.

Input: Each row in the training data would represent a `(SKU, Store, Date)` combination.
- Features: Lagged sales, rolling statistics, calendar features, promotion features, price features, product metadata (encoded), store metadata (encoded), weather data.
Target: Sales quantity for `Date + 1 day`, `Date + 2 days`, ..., `Date + 7 days`. This can be done by:
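- Direct Multi-Output Forecasting: Train one model to predict all 7 days simultaneously. Only some GBDT libraries support multi-output regression natively (CatBoost's MultiRMSE objective is one example); as far as I know LightGBM does not, so it would need a wrapper that fits one model per output.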
- Iterative Forecasting (Recursive): Train a model to predict `t+1`. Use the prediction for `t+1` as a feature to predict `t+2`, and so on. More prone to error accumulation.
- Independent Models per Horizon: Train 7 separate models, one for each day in the forecast horizon (Model_D1, Model_D2, ..., Model_D7). Computationally more expensive but can be more accurate. This is often a good approach with GBDTs.
I'd likely start by trying independent models per horizon or direct multi-output if supported well.
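For the independent-models-per-horizon option, the main mechanical step is building one training set per horizon: features are taken at the forecast origin `t` (the last observed day) and the target is shifted `h` days ahead. The sketch below shows only that dataset construction for a single series, with hypothetical names; each resulting dataset would then be fed to its own regressor (Model_D1 through Model_D7).

```python
def make_horizon_datasets(series, horizons=range(1, 8), lags=(1, 7)):
    """Build {h: [(features, target), ...]} for direct per-horizon training.

    For forecast origin t, lag k is measured back from day t + 1, so lag 1
    is series[t] (the latest known day) and lag 7 is series[t - 6].
    The target for horizon h is series[t + h].
    """
    max_lag = max(lags)
    datasets = {}
    for h in horizons:
        rows = []
        for t in range(max_lag - 1, len(series) - h):
            features = [series[t + 1 - k] for k in lags]
            rows.append((features, series[t + h]))
        datasets[h] = rows
    return datasets


series = list(range(20))  # toy series where each value equals its day index
datasets = make_horizon_datasets(series)
print(len(datasets[1]), datasets[1][0])
print(len(datasets[7]), datasets[7][0])
```

Because every model sees only information available at the origin, no model consumes another model's prediction as input, which is what avoids the error accumulation of the recursive approach. Longer horizons do get fewer training rows.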

Hierarchical Forecasting (Optional Enhancement):

Forecasts might be more robust if reconciled across a hierarchy (e.g., total sales for a store, category sales within a store, SKU sales within a category).

Generate base forecasts at the SKU-Store level.
Generate forecasts at higher levels (e.g., Category-Store).
Use reconciliation methods (e.g., top-down, bottom-up, MinT) to ensure forecasts are consistent across the hierarchy. This can improve overall accuracy and stability.
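The two simplest reconciliation methods can be sketched for a two-level hierarchy (SKU within category, per store). Bottom-up sums SKU forecasts to get the category figure; top-down rescales SKU forecasts so they add up to an independently produced category forecast while keeping their relative mix. MinT additionally needs an estimate of the forecast error covariance and is not shown. All names and numbers are illustrative.

```python
def bottom_up(sku_forecasts, sku_to_category):
    """Category forecast = sum of its SKU forecasts."""
    totals = {}
    for sku, qty in sku_forecasts.items():
        category = sku_to_category[sku]
        totals[category] = totals.get(category, 0.0) + qty
    return totals


def top_down(category_forecast, sku_forecasts):
    """Rescale SKU forecasts to sum to the category forecast, keeping their mix."""
    base_total = sum(sku_forecasts.values())
    if base_total == 0:
        # No signal on the mix: split the category forecast evenly.
        share = category_forecast / len(sku_forecasts)
        return {sku: share for sku in sku_forecasts}
    scale = category_forecast / base_total
    return {sku: qty * scale for sku, qty in sku_forecasts.items()}


sku_forecasts = {"milk_500ml": 30.0, "curd_400g": 10.0}
sku_to_category = {"milk_500ml": "Dairy", "curd_400g": "Dairy"}

category_totals = bottom_up(sku_forecasts, sku_to_category)
reconciled = top_down(50.0, sku_forecasts)  # 50.0: assumed category-level forecast
print(category_totals)
print(reconciled)
```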

Addressing Cold Start Problem:

This is crucial for new SKUs and new dark stores.

New SKUs (No Sales History):
- Attribute-Based Similarity: Find existing SKUs that are most similar to the new SKU based on its metadata (category, sub-category, brand, price range, product description embeddings).
- Initial Forecast: Use the (potentially scaled) forecast of the average or median of these similar SKUs. For example, if a new brand of 2-minute noodles is introduced, its initial forecast could be based on existing popular 2-minute noodle brands in that store.
- Faster Retraining/Adaptation: As soon as a few days/weeks of sales data for the new SKU become available, incorporate it quickly into model retraining or use a separate, rapidly adapting model for new items.
New Dark Stores (No Store-Specific History):
- Store Attribute Similarity: Find existing dark stores that are most similar based on location, demographics, size, and initial product assortment.
- Initial Forecast: Use an average or weighted average of forecasts from similar stores for the same SKUs, potentially adjusted for store size/capacity.
- Again, rapidly incorporate new store data as it becomes available.
Global Models Advantage: Global models (LightGBM or DL) that use product/store features/embeddings can inherently handle cold starts better if the new product/store shares features with existing ones the model has seen. The model can generalize from seen feature combinations.
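The attribute-based similarity idea for new SKUs can be sketched as a nearest-neighbour lookup over product metadata. The scoring rule below (count of matching categorical attributes plus a price-closeness term), the catalog entries, and the forecast values are all assumptions made for illustration. A real system might use learned embeddings instead.

```python
CATEGORICAL_ATTRS = ("category", "sub_category", "brand")


def similarity(a, b):
    """Hypothetical score: matching attributes plus price closeness in [0, 1]."""
    matches = sum(a[attr] == b[attr] for attr in CATEGORICAL_ATTRS)
    price_closeness = 1 - abs(a["price"] - b["price"]) / max(a["price"], b["price"])
    return matches + price_closeness


def cold_start_forecast(new_sku, catalog, forecasts, k=2):
    """Average the forecasts of the k most similar existing SKUs."""
    ranked = sorted(catalog, key=lambda s: similarity(new_sku, s), reverse=True)
    neighbours = ranked[:k]
    return sum(forecasts[s["sku_id"]] for s in neighbours) / len(neighbours)


catalog = [
    {"sku_id": "noodles_a", "category": "Snacks", "sub_category": "Instant Noodles",
     "brand": "BrandX", "price": 14.0},
    {"sku_id": "noodles_b", "category": "Snacks", "sub_category": "Instant Noodles",
     "brand": "BrandY", "price": 15.0},
    {"sku_id": "milk_500ml", "category": "Dairy", "sub_category": "Milk",
     "brand": "BrandZ", "price": 30.0},
]
forecasts = {"noodles_a": 40.0, "noodles_b": 30.0, "milk_500ml": 100.0}

new_sku = {"sku_id": "noodles_new", "category": "Snacks",
           "sub_category": "Instant Noodles", "brand": "BrandNew", "price": 15.0}
estimate = cold_start_forecast(new_sku, catalog, forecasts)
print("initial forecast for new SKU:", estimate)
```

The two existing noodle SKUs rank above milk, so the initial forecast is their average; it would be replaced by model output as soon as the new SKU accumulates its own sales history.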

My strategy would be to start with a robust global LightGBM, rigorously evaluate it, and then explore DL models or hierarchical approaches as further improvements.

Well-Reasoned Model Choices: Candidate discusses pros/cons of different model families in context, proposes a sensible primary model (global GBDT), and outlines practical strategies for multi-step forecasting and cold start.

Round 5: Scalability & System Architecture

I1 (Lead ML Engineer): Your plan to use a global LightGBM model for potentially millions of SKU-Store time series sounds reasonable for tackling the "many series" problem. Let's elaborate on the system architecture. How would you design the end-to-end pipeline to train this model, generate daily forecasts for all SKU-Store combinations, and make these forecasts available to downstream systems like inventory planning? Consider the scale.

C: Designing for this scale requires a robust, automated, and distributed system. Here’s a high-level architecture:

End-to-End System Architecture for Demand Forecasting:

I. Data Ingestion & Storage Layer:

Data Sources: As discussed (Sales DB, Promotions DB, Product Master, Store Master, Weather APIs, Holiday Calendars).
Ingestion:
- Batch Ingestion: Daily/hourly ETL jobs (e.g., using Apache Airflow to orchestrate Spark jobs or SQL transformations) to pull data from source systems into a data lake (e.g., AWS S3, Google Cloud Storage).
- Streaming Ingestion (Optional, for future near real-time features): Kafka for real-time sales/inventory updates if needed for very short-term reactive forecasting.
Data Lake/Warehouse:
- Store raw and processed data (e.g., S3).
- Use a data warehouse (e.g., Snowflake, BigQuery, Redshift) or a lakehouse architecture (e.g., Databricks Delta Lake) for structured, queryable historical data.
Feature Store (Highly Recommended):
- Centralized repository for curated features (lags, rolling stats, promo flags, calendar features).
  - Ensures consistency between training and serving.
- Supports batch computation for training and potentially low-latency retrieval for real-time inference if ever needed. Tools like Feast or Tecton.

II. Model Training Pipeline (Batch, e.g., Daily/Weekly):

Orchestration: Apache Airflow, Kubeflow Pipelines, or AWS Step Functions to manage the training workflow.
Data Preparation & Feature Engineering Job:
- A distributed processing job (e.g., Spark, Dask) that:
  - Reads historical sales, product, store, promotion, weather data.
  - Performs time series preprocessing (OOS handling, outlier cleaning).
  - Generates all the engineered features (lags, rolling windows, calendar, etc.) for each (SKU, Store, Date).
  - Constructs the final training dataset (tabular format).
  - Saves features to the Feature Store and/or the training dataset to the data lake.
Model Training Job:
- Reads the prepared training data.
- Trains the global LightGBM model (or potentially multiple models for different horizons/segments).
  - This can run on a distributed training framework if the dataset is very large (e.g., LightGBM on Spark, or using a managed ML platform like SageMaker, Vertex AI, Azure ML).
- Performs hyperparameter optimization (e.g., using Optuna, Hyperopt) with appropriate time series cross-validation.
- Evaluates the model on a validation set.
Model Registry:
- Version and store the trained model artifact (e.g., using MLflow, SageMaker Model Registry).
- Store model metadata (training parameters, evaluation metrics, lineage).
Model Deployment (for Batch Inference): The "deployed" model for batch forecasting is simply the registered artifact that the inference pipeline will pick up.

III. Batch Forecasting Pipeline (Daily):

Orchestration: Scheduled daily by Airflow (or similar).
Feature Generation for Prediction:
- For each (SKU, Store) combination, generate the required features for the next 7 days (or the prediction horizon). This involves using the most recent available data to compute lags, rolling stats, and fetching future calendar/promo/weather features. This also uses the Feature Store.
Prediction Job:
- Load the latest approved/production model from the Model Registry.
- Perform inference on the generated feature set for all (SKU, Store) combinations to get the 7-day forecasts.
  - This can also be a distributed job (Spark + LightGBM UDFs).
Post-processing & Storage:
- Apply any business rules or constraints (e.g., forecasts cannot be negative).
- Perform hierarchical reconciliation if implemented.
- Store the final forecasts in a database (e.g., PostgreSQL, MySQL, or a dedicated forecast DB) accessible by downstream systems.
  - Schema: `(SKU_ID, Store_ID, Forecast_Date, Forecast_Horizon_Day, Predicted_Quantity, Model_Version, Run_Timestamp)`.
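The post-processing and storage step can be illustrated with the schema above. The sketch uses an in-memory SQLite database purely so it runs anywhere; the discussion names PostgreSQL or MySQL for production. Column types, the primary key, the `CHECK` constraint, and the sample rows are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE forecasts (
    sku_id               TEXT    NOT NULL,
    store_id             TEXT    NOT NULL,
    forecast_date        TEXT    NOT NULL,
    forecast_horizon_day INTEGER NOT NULL,
    predicted_quantity   REAL    NOT NULL CHECK (predicted_quantity >= 0),
    model_version        TEXT    NOT NULL,
    run_timestamp        TEXT    NOT NULL,
    PRIMARY KEY (sku_id, store_id, forecast_date, run_timestamp)
)
"""


def store_forecasts(conn, rows, model_version, run_ts):
    """Clip negative predictions to zero (business rule) and persist the rows."""
    cleaned = [
        (sku, store, day, horizon, max(0.0, qty), model_version, run_ts)
        for sku, store, day, horizon, qty in rows
    ]
    conn.executemany("INSERT INTO forecasts VALUES (?, ?, ?, ?, ?, ?, ?)", cleaned)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
run_ts = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc).isoformat()
raw_predictions = [
    ("milk_500ml", "blr_01", "2024-01-02", 1, 41.6),
    ("coriander_100g", "blr_01", "2024-01-02", 1, -0.8),  # invalid raw model output
]
store_forecasts(conn, raw_predictions, model_version="lgbm_v1", run_ts=run_ts)

for row in conn.execute(
    "SELECT sku_id, predicted_quantity FROM forecasts ORDER BY sku_id"
):
    print(row)
```

Keeping `model_version` and `run_timestamp` on every row lets downstream systems pick the latest run and lets the team compare forecasts from different model versions against realized sales later.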

IV. Forecast Serving Layer:

API Service: Downstream systems (inventory planning, replenishment, dark store operations dashboards) can query the forecast database via an API to get the latest forecasts.
Direct DB Access: For some batch processes (like generating purchase orders), direct access to the forecast DB might be used.

V. Monitoring & Alerting Layer (Covered in detail if asked):

Dashboards for forecast accuracy, data drift, model drift, system health.
Alerts for significant deviations or failures.

Technology Stack Considerations (Examples):

Data Lake: AWS S3, GCS, Azure Blob Storage.
Data Warehouse/Lakehouse: Snowflake, BigQuery, Redshift, Databricks Delta Lake.
ETL/Orchestration: Apache Airflow, Spark, AWS Glue, Azure Data Factory.
Feature Store: Feast, Tecton, or custom build on top of data lake/warehouse.
ML Platform/Training: SageMaker, Vertex AI, Azure ML, Databricks ML, or custom Kubeflow setup.
Model Registry: MLflow, SageMaker Model Registry.
Forecast DB: PostgreSQL, MySQL, Cassandra (if very high write load).
API: Python (FastAPI/Flask) + Docker + Kubernetes/Serverless.

This architecture emphasizes automation, scalability through distributed processing, and MLOps best practices like feature stores and model registries.

Scalable & End-to-End Architecture: Candidate outlines a comprehensive system architecture covering data ingestion, training, batch inference, and serving, with good technology choices and considerations for MLOps.

Round 6: System Rollout Strategy & Risk Management

I2 (Head of Supply Chain): This is a complex system, and our dark store operations heavily depend on accurate inventory. Rolling out a new forecasting system like this carries significant operational risk. If the forecasts are wildly off, we could face massive stockouts or huge amounts of wastage. How would you plan the rollout to de-risk it and ensure a smooth transition?

C: You're absolutely right. Operational impact is paramount. A "big bang" deployment would be irresponsible. We need a cautious, phased rollout with continuous validation and clear rollback plans.

Phased Rollout Strategy & Risk Management:

Phase 0: Offline Validation & Benchmarking (Pre-Rollout):
- Rigorous Backtesting: Extensively backtest the new model against historical data using walk-forward validation.
  - Compare its performance (MAE, MAPE, WMAPE, bias) against any existing forecasting methods or simple baselines (e.g., naive forecast, moving average).
- Segment Analysis: Analyze model performance across different product categories (e.g., perishables vs. non-perishables, fast vs. slow movers), store types, and regions. Identify potential weak spots.
- Sanity Checks: Ensure forecasts are plausible (e.g., not negative, not orders of magnitude different from recent history without good reason like a major promotion).
Phase 1: Shadow Mode Deployment (No Operational Impact):
- Deploy the new forecasting system to generate daily forecasts in parallel with the existing system (if any).
- These new forecasts are not used for actual inventory decisions yet.
- Purpose:
  - Compare new forecasts against actual sales and existing forecasts in a live environment.
  - Identify discrepancies, bugs, or unexpected behavior.
  - Allow the operations and category teams to review the new forecasts and build confidence/provide feedback.
- Duration: Several weeks to cover different scenarios (weekdays, weekends, minor events).
Phase 2: Pilot Program / Canary Release (Limited Live Impact):
- Select Pilot Stores/Categories: Choose a small, manageable set of dark stores (e.g., 2-3 stores in one city) or a few specific product categories (e.g., a mix of perishables and non-perishables) to go live with the new forecasts.
  - These pilot stores should ideally have co-operative staff who can provide detailed feedback.
- Intensive Monitoring:
  - Track inventory levels, stockout rates, wastage rates, and fulfillment times very closely for the pilot stores/categories.
  - Compare these KPIs against control stores/categories still using the old system.
  - Gather qualitative feedback from dark store managers and pickers.
- Refine & Iterate: Based on pilot results, make necessary adjustments to the model, features, or post-processing rules.
Phase 3: Gradual Expansion (A/B Testing on Operational KPIs):
- If the pilot is successful, gradually expand the rollout to more stores, cities, or categories.
- A/B Testing Approach:
  - For a given city or set of similar stores:
    - Group A (Control): Continues using the old forecasting method.
    - Group B (Treatment): Uses the new forecasting system for inventory decisions.
  - Measure and compare key operational KPIs (stockouts, wastage, inventory holding cost, sales) over a significant period (e.g., 4-8 weeks).
  - This provides quantitative evidence of the new system's impact.
Phase 4: Full Rollout & Continuous Monitoring:
- Once A/B tests demonstrate clear benefits and stability, proceed with full rollout.
- Even after full rollout, continuously monitor forecast accuracy and its impact on business KPIs.
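
One way to judge the Phase 3 comparison is an exact permutation test on a per-store KPI. The sketch below is an assumption about how the analysis could be done, not a method stated in the plan: `permutation_p_value` is a hypothetical helper and the stockout rates are made up. It is only practical for a small number of stores, since it enumerates every possible assignment.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treatment):
    """One-sided exact permutation test.

    Returns the share of store-to-group assignments whose
    (control mean - treatment mean) is at least as large as observed.
    A small value suggests the treatment group's KPI is genuinely lower.
    """
    pooled = list(control) + list(treatment)
    n_control = len(control)
    observed = mean(control) - mean(treatment)
    total = 0
    at_least_as_extreme = 0
    for idx in combinations(range(len(pooled)), n_control):
        chosen = set(idx)
        ctrl = [pooled[i] for i in range(len(pooled)) if i in chosen]
        trt = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so float rounding cannot exclude the observed split.
        if mean(ctrl) - mean(trt) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Made-up stockout rates (%) per store over the test window.
control_oos = [6.1, 5.8, 6.4, 5.9, 6.3]
treatment_oos = [4.9, 5.2, 4.7, 5.0, 5.1]
p_value = permutation_p_value(control_oos, treatment_oos)
```

With five stores per group there are 252 possible assignments, so the smallest achievable p-value is 1/252. More stores or a longer test window are needed for finer resolution.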

Risk Management & Mitigation:

Clear Go/No-Go Criteria: Define clear metrics and thresholds at each phase to decide whether to proceed with the next phase of rollout.
Rollback Plan: At every stage, have a well-defined and tested plan to quickly revert to the previous forecasting system or manual overrides if the new system causes significant issues.
Forecast Overrides/Adjustments: Initially, provide a mechanism for experienced planners or category managers to review and manually adjust system-generated forecasts, especially for critical items or during uncertain periods. Log these overrides to learn from them.
Communication & Training: Thoroughly train operations teams, inventory planners, and dark store staff on how the new system works, its expected outputs, and how to interpret the forecasts.
Dedicated Support Team: Have a dedicated team ready to address any issues or questions during the rollout phases.
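
The go/no-go criteria can be encoded as an explicit gate so that the decision at each phase is reproducible. This is a sketch under stated assumptions: the `PhaseKpis` fields, the `go_no_go` function, and the 0.5-point wastage tolerance are illustrative and would need to be agreed with the business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseKpis:
    wmape_pct: float          # forecast error against actual sales
    stockout_rate_pct: float
    wastage_rate_pct: float


def go_no_go(new, baseline, max_wastage_increase_pct=0.5):
    """Return (decision, reasons). All thresholds are illustrative."""
    reasons = []
    if new.wmape_pct >= baseline.wmape_pct:
        reasons.append("WMAPE not better than baseline")
    if new.stockout_rate_pct >= baseline.stockout_rate_pct:
        reasons.append("stockout rate not reduced")
    if new.wastage_rate_pct > baseline.wastage_rate_pct + max_wastage_increase_pct:
        reasons.append("wastage increased beyond tolerance")
    return ("GO" if not reasons else "NO-GO", reasons)


# Made-up pilot results.
baseline = PhaseKpis(wmape_pct=32.0, stockout_rate_pct=6.0, wastage_rate_pct=4.0)
pilot = PhaseKpis(wmape_pct=27.5, stockout_rate_pct=4.8, wastage_rate_pct=4.3)
decision, reasons = go_no_go(pilot, baseline)
```

Returning the reasons alongside the decision gives the rollout team something concrete to act on when a phase is blocked.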

This careful, data-driven rollout approach helps build confidence, identify problems early, and minimize the risk of large-scale operational disruption.

Cautious & Data-Driven Rollout: Candidate proposes a well-structured phased rollout, including shadow mode, pilot programs, and A/B testing, with strong emphasis on risk mitigation, monitoring, and communication.

Round 7: Model Evaluation

I1 (Lead ML Engineer): Your rollout plan is solid. Let's talk about how you'd specifically evaluate the performance of your forecasting models, both offline during development and online once the system is impacting inventory decisions. What metrics are most important for Zepto?

C: Evaluating a demand forecasting system requires a mix of statistical accuracy metrics and business-oriented KPIs.

Offline Model Evaluation (During Development & Retraining):

These are calculated on a hold-out validation set using walk-forward validation.
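
As a minimal sketch of walk-forward validation with an expanding training window: the function name and the parameter values below are illustrative assumptions, and a rolling (fixed-length) window would be an equally valid variant.

```python
def walk_forward_splits(n_days, initial_train_days, horizon_days, step_days):
    """Yield (train_indices, test_indices) with an expanding training window.

    Every test window starts right after its training window ends, so no
    future observation leaks into training.
    """
    train_end = initial_train_days
    while train_end + horizon_days <= n_days:
        train_idx = range(0, train_end)
        test_idx = range(train_end, train_end + horizon_days)
        yield train_idx, test_idx
        train_end += step_days


# 28 days of history, first fold trains on 14 days, each fold tests 7 days.
splits = list(walk_forward_splits(n_days=28, initial_train_days=14,
                                  horizon_days=7, step_days=7))
```

Each fold mimics one production run: train on everything known up to a cutoff, then forecast the next 7 days.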

Scale-Dependent Error Metrics:
- Mean Absolute Error (MAE): `Average(|Actual - Forecast|)`. Easy to understand, shows average error in units.
- Root Mean Squared Error (RMSE): `Sqrt(Average((Actual - Forecast)^2))`. Penalizes larger errors more heavily.
Percentage Error Metrics:
- Mean Absolute Percentage Error (MAPE): `Average(|(Actual - Forecast) / Actual|) * 100`.
  - Caution: Undefined when actuals are zero and explodes when they are near zero, which is common for slow-moving SKUs at daily or hourly granularity. Can be skewed by low-volume items.
- Weighted Mean Absolute Percentage Error (WMAPE) / Mean Absolute Scaled Error (MASE):
  - WMAPE (Volume-Weighted): `Sum(|Actual - Forecast|) / Sum(Actual) * 100`. This is often a key metric in retail as it gives more weight to high-volume items where forecast accuracy matters more for overall business impact. This would be very relevant for Zepto.
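  - MASE: Divides the model's MAE by the MAE of a naive baseline (e.g., a seasonal naive forecast) computed on the training history. MASE < 1 means the model is better than naive. Unlike MAPE, it stays well-defined when individual actuals are zero.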
Forecast Bias:
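- Mean Error (ME) or Mean Percentage Error (MPE): ME is `Average(Actual - Forecast)`; MPE is `Average((Actual - Forecast) / Actual) * 100`. With this sign convention, a negative value indicates systematic over-forecasting and a positive value indicates systematic under-forecasting. We want this to be close to zero.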
Quantile Loss (for Probabilistic Forecasts):
- If the model produces quantile forecasts (e.g., P10, P50, P90), we can use quantile loss (pinball loss) to evaluate the accuracy of these specific quantiles. This is important for setting safety stock based on uncertainty.
  - For Zepto, P90 or P95 forecasts might be used to minimize stockouts.
Segment-Level Evaluation:
- Evaluate metrics separately for different segments:
  - Product categories (e.g., fresh produce, dairy, pantry staples).
  - Fast-movers vs. slow-movers.
  - Promotional vs. non-promotional periods.
  - Different dark stores or city clusters.
  This helps identify where the model performs well or poorly.
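
The offline metrics above can be written out directly from their formulas. This is a plain-Python sketch for illustration; the function names, the weekly `season=7` default, and the sample numbers are assumptions, and a production pipeline would more likely compute these in a vectorized way per segment.

```python
from math import sqrt


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def wmape(actual, forecast):
    """Volume-weighted: total absolute error over total actual volume, in %."""
    total = sum(actual)
    if total == 0:
        raise ValueError("WMAPE is undefined when total actual demand is zero")
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / total


def mean_error(actual, forecast):
    """Positive = under-forecasting, negative = over-forecasting."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, history, season=7):
    """Model MAE divided by the in-sample MAE of a seasonal naive forecast."""
    naive_errors = [abs(history[t] - history[t - season])
                    for t in range(season, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("MASE is undefined when the naive baseline has zero error")
    return mae(actual, forecast) / scale


def pinball_loss(actual, forecast, q):
    """Average quantile (pinball) loss for a forecast of quantile q."""
    return sum(q * max(0.0, a - f) + (1 - q) * max(0.0, f - a)
               for a, f in zip(actual, forecast)) / len(actual)


# Made-up daily demand and forecasts for one SKU-store pair.
actual = [10, 0, 20, 30]
forecast = [12, 1, 18, 25]
history = [1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8]
```

Note that the zero actual on the second day would make MAPE undefined, while WMAPE, MAE, and MASE remain computable.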

Online Evaluation & Business KPIs (Post-Deployment, often via A/B Testing):

These measure the actual impact of the forecasts on operations.

Inventory & Availability Metrics:
- Stockout Rate / Out-of-Stock Percentage (OOS%): Percentage of (SKU-Store-Day) instances where an item was OOS. This is a primary KPI for Zepto. Target: Significant reduction.
- Service Level: Percentage of demand met from available stock.
- Lost Sales Estimation: Estimated revenue lost due to stockouts.
Wastage/Expiry Metrics (Especially for Perishables):
- Wastage Rate: Percentage of stock (by value or quantity) that expires or becomes unsellable. Critical for fresh categories. Target: Significant reduction.
Inventory Efficiency Metrics:
- Inventory Holding Cost: Cost associated with storing inventory.
- Inventory Turnover: How quickly inventory is sold. Higher is generally better.
- Days of Supply (DOS): How many days current inventory would last based on current sales rate.
Operational Efficiency:
- Order Fulfillment Time: While not directly a forecast metric, accurate stock levels contribute to faster picking and delivery.
- Replenishment Efficiency: Smoother, more predictable replenishment cycles.
Customer Satisfaction (Indirect):
- Reduction in customer complaints related to item unavailability.
- Improvement in overall CSAT scores.
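
For reference, the core online KPIs reduce to simple ratios. The helper names and the sample figures below are illustrative assumptions; the definitions follow the descriptions in the list above.

```python
def stockout_rate_pct(oos_flags):
    """oos_flags: one boolean per (SKU, store, day) instance."""
    return 100.0 * sum(1 for flag in oos_flags if flag) / len(oos_flags)


def service_level_pct(units_fulfilled, units_demanded):
    """Share of demand met from available stock."""
    return 100.0 * units_fulfilled / units_demanded


def wastage_rate_pct(units_wasted, units_received):
    """Share of received stock that expired or became unsellable."""
    return 100.0 * units_wasted / units_received


def days_of_supply(on_hand_units, avg_daily_sales):
    """How many days current inventory lasts at the current sales rate."""
    return on_hand_units / avg_daily_sales


# Made-up figures for one store and category.
oos_pct = stockout_rate_pct([True, False, False, False])
service_pct = service_level_pct(units_fulfilled=95, units_demanded=100)
waste_pct = wastage_rate_pct(units_wasted=12, units_received=200)
dos = days_of_supply(on_hand_units=40, avg_daily_sales=8)
```

Service level depends on `units_demanded`, which is itself an estimate during stockouts, so it inherits any error in the true-demand estimation.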

For Zepto, the most critical metrics would likely be:

Offline: WMAPE (volume-weighted), Forecast Bias (ME/MPE), and Quantile Loss (if using probabilistic forecasts for safety stock).
Online (via A/B testing): Stockout Rate, Wastage Rate (especially for perishables like milk, bread, vegetables), and potentially overall sales lift in treated stores/categories.

We'd need dashboards to track these metrics continuously and trigger alerts if performance degrades significantly.

Comprehensive & Business-Aware Evaluation: Candidate covers a wide range of relevant offline statistical metrics and online business KPIs, correctly identifying the most critical ones for Zepto (WMAPE, stockout rate, wastage rate).

Round 8: Ethical Considerations & Bias

I2 (Head of Supply Chain): This is good. Now, a system that influences what products are available in which neighborhood dark stores can have unintended consequences. What ethical considerations or potential biases should we be mindful of when designing and deploying this demand forecasting system for Zepto, especially in a diverse market like India?

C: That's a very important consideration. A demand forecasting system, while aiming for efficiency, can inadvertently perpetuate or introduce biases if not carefully designed and monitored.

Ethical Considerations & Bias Mitigation for Zepto's Demand Forecasting:

Bias from Historical Data (Stockouts & Underrepresentation):
- Concern: If certain products (e.g., niche regional items, specific dietary preference items like vegan products) were frequently out of stock in the past in certain neighborhoods due to poor prior forecasting or supply issues, the historical sales data will underrepresent their true demand. The new system might learn this pattern and continue to under-forecast for these items, creating a self-fulfilling prophecy of unavailability. This could disproportionately affect certain customer segments or cultural preferences.
- Mitigation:
  - Robust True Demand Estimation: Invest heavily in accurately estimating demand during OOS periods. This is crucial.
  - Monitor Forecasts for Consistently Low-Stocked Items: Flag SKUs that are consistently under-forecasted and frequently OOS, especially if they cater to specific demographic or regional needs. Investigate if this is a model bias or a persistent supply chain issue.
  - Feedback from Store Managers: Dark store managers often have good local knowledge. Provide a channel for them to flag items they believe are consistently understocked despite demand.
Geographical/Socioeconomic Bias (Assortment & Availability):
- Concern: If the model (or the data it's trained on) implicitly learns correlations between neighborhood socioeconomic status/demographics and product demand, it might lead to systematically different (and potentially less diverse or lower quality) assortments in less affluent areas compared to wealthier ones, even if latent demand for certain products exists. For example, under-forecasting healthier or premium options in lower-income areas.
- Mitigation:
  - Audit Assortment Diversity: Regularly audit the diversity and quality of product assortments forecasted and stocked across different neighborhood types.
  - Fairness-Aware Modeling (Advanced): Explore techniques that can incorporate fairness constraints or re-weight samples to ensure equitable representation of demand across different demographic groups, if such data is available and ethically usable.
  - Local Curation Input: Allow for some level of local curation or minimum assortment guarantees for essential items across all stores, regardless of purely data-driven forecasts for those specific items.
Bias Towards Popular/Mainstream Products:
- Concern: Models might naturally become better at forecasting high-volume, mainstream products, potentially at the expense of accurately forecasting demand for niche, local, or emerging items. This could reduce product diversity over time.
- Mitigation:
  - Segmented Modeling/Evaluation: Evaluate model performance specifically for long-tail or niche items. Consider separate modeling strategies or heuristics if global models underperform here.
  - Cold Start Strategies: Ensure good cold-start strategies for new and niche items to give them a fair chance.
  - Protect Diversity Goals: Business rules might be needed to ensure a certain level of diversity in stocked items, even if some have lower or more volatile forecasted demand.
Impact of Promotions:
- Concern: If promotional data is biased (e.g., promotions historically targeted only certain customer segments or product types), the model might learn to only boost forecasts for those segments during promotions, missing opportunities elsewhere.
- Mitigation:
  - Ensure promotion data used for training is representative or that the model can generalize promotional lift effects.
  - Carefully analyze promotional lift across different product types and customer demographics.
Transparency & Explainability (Internal):
- Concern: If inventory planners or category managers don't understand why the model is making certain forecasts, they may not trust it or may override it incorrectly.
- Mitigation:
  - Provide feature importance scores from the model.
  - Show confidence intervals or prediction intervals if using probabilistic forecasts.
  - Develop tools to allow planners to see key drivers for a particular forecast (e.g., recent trend, upcoming holiday, active promotion).

Regular audits for fairness, continuous monitoring of impact on different segments, and a strong feedback loop involving local store insights and category management are crucial to mitigate these ethical risks and ensure the system serves all customers equitably.
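
The monitoring idea from the first mitigation (flagging SKUs that are both frequently out of stock and under-forecasted) could be sketched as follows. The record layout, the `flag_underserved` name, and the 30% OOS threshold are hypothetical. Bias is measured only on in-stock days, because sales on OOS days are capped by availability and would understate demand.

```python
from collections import defaultdict


def flag_underserved(records, min_oos_share=0.3):
    """records: iterable of dicts with keys
    sku_id, store_id, actual, forecast, was_oos (bool).

    Flags (sku_id, store_id) pairs with a high OOS share AND a positive mean
    error (actual - forecast) on in-stock days, i.e. under-forecasting.
    """
    stats = defaultdict(lambda: {"days": 0, "oos_days": 0,
                                 "err_sum": 0.0, "in_stock_days": 0})
    for r in records:
        s = stats[(r["sku_id"], r["store_id"])]
        s["days"] += 1
        if r["was_oos"]:
            s["oos_days"] += 1
        else:
            s["err_sum"] += r["actual"] - r["forecast"]
            s["in_stock_days"] += 1
    flagged = []
    for key, s in stats.items():
        oos_share = s["oos_days"] / s["days"]
        bias = s["err_sum"] / s["in_stock_days"] if s["in_stock_days"] else 0.0
        if oos_share >= min_oos_share and bias > 0:
            flagged.append(key)
    return sorted(flagged)


def _rec(sku, actual, forecast, was_oos):
    return {"sku_id": sku, "store_id": "S1", "actual": actual,
            "forecast": forecast, "was_oos": was_oos}


# Made-up history: a niche item that is often OOS, and a staple that is not.
records = [
    _rec("vegan-milk", 6, 4, False), _rec("vegan-milk", 0, 4, True),
    _rec("vegan-milk", 5, 4, False), _rec("vegan-milk", 0, 4, True),
    _rec("bread", 50, 52, False), _rec("bread", 48, 50, False),
    _rec("bread", 51, 50, False), _rec("bread", 49, 50, False),
]
flagged = flag_underserved(records)
```

A flagged pair is a prompt for investigation, not a verdict: the cause may be model bias or a persistent supply issue, as noted above.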

Thoughtful Ethical Considerations: Candidate identifies several relevant biases (historical stockouts, geographical, popularity) and proposes concrete mitigation strategies, emphasizing true demand estimation, fairness audits, and local input.

Round 9: Advanced Technical Challenge & Trade-offs

I1 (Lead ML Engineer): This is a thorough plan. Let's consider a specific advanced challenge. Zepto operates in a dynamic environment. How would your forecasting system adapt to sudden, unpredictable external events not easily captured by historical data or pre-defined features? For example, a sudden city-wide announcement of a 3-day festival leading to an unexpected surge in demand for specific items, or a flash competitor sale significantly drawing demand away for a short period. How can the system be made more resilient or reactive to such shocks?

C: That's a great question. Standard time series models trained on historical patterns often struggle with truly novel, unannounced shocks. Making the system resilient and reactive requires a multi-pronged approach:

Adapting to Sudden, Unpredictable Events:

Near Real-Time Anomaly Detection & Alerting on Demand Signals:
- Monitor Point-of-Sale (POS) Data: Continuously monitor actual sales data at a granular level (e.g., hourly or even every 10-15 minutes for very fast movers) for significant deviations from short-term forecasts or typical patterns.
  - If sales for "sweet boxes" or "pooja items" suddenly spike across multiple stores in Hyderabad, it might indicate an uncaptured local event or festival.
- Alerting System: If such anomalies are detected (e.g., sales > 3 standard deviations above expected for a sustained period), trigger alerts to human planners/category managers.
Rapid Manual Override & Adjustment Capabilities:
- Provide an easy-to-use interface for operations or category teams to quickly input information about such events (e.g., "Unexpected local festival in Area X, expect +50% demand for sweets & flowers for 2 days") and apply a temporary uplift or suppression to the system's baseline forecasts for affected SKUs/stores.
- This human-in-the-loop intervention is crucial for events the model couldn't have known.
Short-Term Reactive Models / Nowcasting (More Advanced):
- Alongside the main 1-7 day forecast, develop very short-term "nowcasting" models (e.g., predicting demand for the next 1-4 hours) that are highly sensitive to the most recent sales trends and external signals (if any can be streamed, like social media sentiment for very specific events).
- These models could use simpler techniques (e.g., exponential smoothing with a high smoothing factor alpha, which gives the model a short memory, or a very reactive ARIMA) and be retrained/updated much more frequently (e.g., hourly). Their output can inform immediate operational decisions within dark stores (e.g., prioritize picking for surging items).
Exogenous Shock Event Logging & Feature Creation (Post-Event):
- Once such an event is identified (either by anomaly detection or human input), log it meticulously: event type, start/end time, affected SKUs/stores, estimated impact.
- Over time, build a library of these "shock events." For future model retraining, try to create features representing these past shocks (e.g., a binary flag for "similar_past_local_festival_type_X"). This allows the model to potentially learn from past, similar (though not identical) shocks if they have recurring characteristics. This is hard because shocks are often unique.
Scenario-Based Adjustments / What-If Analysis Tools:
- Provide tools for planners to simulate the impact of potential shocks. For example, "If competitor Y runs a 50% discount on dairy, what's the likely impact on our dairy sales?" This might require some pre-defined rules or simpler elasticity models.
Model Ensembles & Robustness:
- Sometimes, an ensemble of different model types (e.g., a statistical model + a GBDT model) can be more robust to certain types of shocks if one model is less affected than another.
Focus on Resilience in Inventory Strategy:
- Recognize that no forecast will be perfect for extreme unpredictable events. Part of the solution lies in inventory strategy:
  - Slightly higher safety stock for A-class items.
  - Agile replenishment capabilities to quickly restock if a sudden surge occurs.
  - Fast communication channels between dark stores and central supply chain teams.
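
To make the nowcasting idea concrete, here is a minimal simple-exponential-smoothing sketch. The function name, the alpha values, and the hourly sales figures are illustrative assumptions. The point is only to show how the smoothing factor controls reactivity.

```python
def exp_smooth_nowcast(hourly_sales, alpha=0.8):
    """Simple exponential smoothing; returns the next-hour level estimate.

    alpha close to 1 weights the latest observations heavily (reactive);
    alpha close to 0 averages over a long history (stable).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(hourly_sales[0])
    for y in hourly_sales[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Made-up hourly sales with a surge in the last two hours.
sales = [10, 10, 10, 10, 30, 32]
reactive = exp_smooth_nowcast(sales, alpha=0.8)  # about 30.8
stable = exp_smooth_nowcast(sales, alpha=0.2)    # about 17.6
```

The reactive setting tracks the surge within two observations, at the cost of also chasing noise. That trade-off is why such a model suits hour-level operational decisions rather than the 1-7 day replenishment forecast.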

For truly "black swan" events, the system's primary role shifts from perfect prediction to rapid detection, enabling quick human intervention, and then learning from the event to improve (marginally) for any future similar occurrences.
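
The rapid-detection step can be sketched as a threshold rule on recent sales. This follows the "more than 3 standard deviations for a sustained period" example given earlier; the function name, the `min_consecutive` parameter, and the data are assumptions, and a real detector would compare against the short-term forecast rather than a flat history.

```python
from statistics import mean, stdev


def is_demand_spike(recent_sales, reference_sales, z_threshold=3.0, min_consecutive=2):
    """Flag a spike when the last `min_consecutive` observations all exceed
    mean + z_threshold * std of the reference window."""
    limit = mean(reference_sales) + z_threshold * stdev(reference_sales)
    tail = recent_sales[-min_consecutive:]
    return len(tail) == min_consecutive and all(x > limit for x in tail)


# Made-up hourly sales for one SKU across a city's stores.
reference = [10, 12, 11, 9, 10, 11, 12, 10]
sustained = is_demand_spike([11, 25, 27], reference)  # last two hours both high
one_off = is_demand_spike([11, 25, 12], reference)    # single high hour only
```

Requiring consecutive breaches trades a little detection delay for fewer false alerts to planners.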

Handling Shocks & Unpredictability: Candidate proposes a practical multi-layered approach involving anomaly detection, human overrides, reactive short-term models, and learning from past events, acknowledging the limits of pure prediction for novel shocks.

I2 (Head of Supply Chain): That makes sense. One final question on trade-offs. For Zepto, stockouts are very costly (lost sales, customer dissatisfaction, damage to our 10-minute promise). Wastage of perishables is also a significant cost. If you had to tune your primary forecasting model, and there was a clear trade-off, would you err on the side of slight over-forecasting (risking more wastage but fewer stockouts) or slight under-forecasting (risking more stockouts but less wastage)? How would your choice of loss function reflect this?

C: Given that stockouts directly undermine Zepto's core value proposition of rapid delivery and customer retention, and are likely somewhat more costly than wastage, I would lean towards tuning the system to err on the side of slight over-forecasting, especially for A-class (high-volume/high-impact) items and key perishables where availability is paramount.

Reasoning & Loss Function Implication:

Asymmetric Business Costs:
- The cost of a stockout includes not just the lost margin on that sale, but also potential loss of a customer to a competitor, and damage to Zepto's brand promise. This "customer lifetime value" impact can be substantial.
- The cost of over-forecasting for non-perishables is mainly holding cost. For perishables, it's the cost of goods + disposal, which is significant, but perhaps still less than losing a customer.
Loss Function Choice/Modification:
- Standard Loss Functions (MAE, MSE): These treat over-prediction and under-prediction errors symmetrically.
- Asymmetric Loss Function: We could use a custom loss function that penalizes under-prediction errors more heavily than over-prediction errors. For example, a modified Mean Squared Error:
```
Loss = if (Actual > Forecast):  Cost_Underforecast * (Actual - Forecast)^2
       else:                   Cost_Overforecast * (Actual - Forecast)^2
```
  Where `Cost_Underforecast > Cost_Overforecast`. The ratio of these costs would need to be determined through business analysis (e.g., `Cost_Underforecast` might be 1.2 to 1.5 times `Cost_Overforecast`).
- Quantile Regression / Quantile Loss: This is a very natural way to achieve this. Instead of predicting the median (the P50 quantile) or the mean, we could train the model to predict a higher quantile, for instance, the P70 or P75 quantile of the demand distribution. If the P75 forecast is well calibrated and we stock to it, demand should exceed stock only about 25% of the time, which builds in a buffer against stockouts while accepting a higher chance of some overstock. The quantile loss function automatically handles this asymmetry.
```
QuantileLoss_q(y, y_pred) = q * max(0, y - y_pred) + (1-q) * max(0, y_pred - y)
```
  By choosing `q > 0.5` (e.g., `q=0.75`), we penalize under-predictions more. LightGBM and some deep learning frameworks support quantile regression directly.
Segment-Specific Strategy:
- This bias towards over-forecasting might be applied more aggressively for:
  - High-margin / High-volume (A-class) SKUs.
  - Key Value Items (KVIs) that customers expect Zepto to always have (e.g., milk, bread, onions, tomatoes).
  - SKUs with very high stockout costs.
- For very low-margin, highly perishable, or very slow-moving items, we might accept a slightly lower service level (i.e., forecast closer to the median or a lower quantile) to minimize wastage.

So, the primary approach would be to use quantile regression to target a higher percentile (e.g., P70-P80) of the demand distribution for most items, or implement a custom asymmetric loss function. This directly aligns the model's optimization objective with the business priority of minimizing stockouts while being mindful of wastage.
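
Both loss options can be illustrated in plain Python. This is a sketch under stated assumptions: the function names, the 1.5 cost ratio, and the demand values are illustrative. The gradient and Hessian are shown because gradient boosting libraries that accept custom objectives typically ask for them; wiring this into a specific library is not shown here.

```python
def asymmetric_squared_loss(actual, forecast, cost_under=1.5, cost_over=1.0):
    """Squared error, weighted more heavily when the forecast is too low."""
    err = actual - forecast
    weight = cost_under if err > 0 else cost_over
    return weight * err ** 2


def asymmetric_grad_hess(actual, forecast, cost_under=1.5, cost_over=1.0):
    """Gradient and Hessian of the loss above with respect to the forecast."""
    err = actual - forecast
    weight = cost_under if err > 0 else cost_over
    return -2.0 * weight * err, 2.0 * weight


def pinball_loss(actual, forecast, q):
    return q * max(0.0, actual - forecast) + (1 - q) * max(0.0, forecast - actual)


def best_constant_forecast(demand, q):
    """Brute-force the constant forecast (among observed values) that
    minimizes total pinball loss; it lands on the empirical q-quantile."""
    return min(sorted(set(demand)),
               key=lambda c: sum(pinball_loss(y, c, q) for y in demand))


# Made-up daily demand with one surge day.
demand = [8, 9, 10, 11, 12, 14, 30]
p50 = best_constant_forecast(demand, 0.5)   # 11, the median
p75 = best_constant_forecast(demand, 0.75)  # 14, a higher quantile
```

Raising `q` from 0.5 to 0.75 moves the optimal forecast from 11 to 14 units on this sample, which is the built-in buffer against stockouts described above.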

Strategic Trade-off with Loss Function: Candidate makes a clear choice based on business priorities (penalizing stockouts more) and correctly links this to using asymmetric loss functions or quantile regression (targeting a higher quantile) to steer the model appropriately.

Interview Conclusion

I1 (Lead ML Engineer): This has been an incredibly thorough and well-reasoned discussion. You've demonstrated a deep understanding of time series forecasting, the practical challenges of scaling such a system, and the nuances of applying it in a dynamic business like Zepto. Your structured approach, from data understanding to model deployment and evaluation, is exactly what we look for.

I2 (Head of Supply Chain): I'm very impressed. You didn't just focus on the algorithms; you consistently tied your technical decisions back to our business needs – minimizing stockouts, managing wastage, and ensuring a great customer experience. Your thoughts on handling unpredictable events and ethical considerations were particularly insightful.

C: Thank you. I really enjoyed discussing this problem. Zepto's operational model presents fascinating ML challenges, and aligning the forecasting system with those core business drivers is key. I'd be keen to learn more about the current scale of your data and any specific pain points you're facing with existing forecasting approaches.

I1 (Lead ML Engineer): We'd be happy to discuss that further. Based on this conversation, we're very keen to proceed to the next steps. Expect to hear from HR soon.

What to Learn from This Case

Clarify Time Series Specifics: Understand demand definition, forecast horizon/granularity, stockout handling, and business costs of over/under forecasting.
Rich Data Sources are Key: Identify internal (sales, promotions, product/store metadata, OOS data) and external (calendar, weather) data. Anticipate quality issues.
Time Series Feature Engineering is Crucial: Detail lags, rolling stats, calendar/holiday features, promotion effects, price features, and external regressors. Explain how they capture temporal dynamics.
Correct Time Series Splitting: Emphasize walk-forward validation or rolling origin validation to prevent data leakage and realistically simulate production deployment.
Model Selection Trade-offs: Discuss pros/cons of classical, ML (tree-based), and DL (sequence) models in the context of scalability, feature richness, and interpretability. Global models are often preferred for many series.
Address Cold Start: Have clear strategies for new products and new locations using attribute similarity or leveraging global model features.
Scalable System Architecture: Design for batch training/inference pipelines using orchestration tools, distributed processing, feature stores, and model registries.
Phased Rollout & Risk Management: Detail shadow mode, pilot programs, A/B testing on operational KPIs, and clear rollback plans.
Time Series Evaluation Metrics: Use MAE, RMSE, MAPE, WMAPE, MASE, forecast bias, and quantile loss. Connect these to business KPIs like stockout rate, wastage, and inventory turnover.
Ethical Considerations & Bias: Discuss potential biases from historical data (stockouts), geographical disparities, popularity, and how to mitigate them through data, modeling, and process.
Handling Unpredictable Shocks: Propose strategies like anomaly detection, human overrides, reactive short-term models, and learning from past events.
Align Loss Functions with Business Objectives: Use asymmetric loss functions or quantile regression to penalize more costly errors (e.g., under-forecasting leading to stockouts).
Communicate Business Impact: Consistently link technical decisions to their impact on business objectives and user experience.