Zepto Demand Forecasting
The Challenge: Hyperlocal Demand Forecasting for Zepto
You're an ML Engineer at Zepto, the quick commerce grocery delivery platform. Zepto operates numerous "dark stores" (micro-fulfillment centers) across various Indian cities. To ensure optimal inventory and minimize stockouts/wastage, Zepto needs an accurate demand forecasting system. How would you design a system to forecast the demand for thousands of SKUs (e.g., 5,000-10,000 SKUs per dark store, across 100s of dark stores) for the next 1-7 days, at a daily or even hourly granularity, considering promotions, holidays, and hyperlocal factors?
Initial Thoughts & Clarifications
- Definition of "Demand": Is it actual sales, or sales adjusted for stockouts (true demand)? How to estimate true demand?
- Forecasting Horizon & Granularity: Next 1-7 days is mentioned. Is daily sufficient, or is hourly needed for specific fast-moving items or operational planning within the dark store?
- Product Scope: All SKUs? Or focus on top N% movers initially? How to handle long-tail items?
- Geographical Scope: Forecast per SKU per dark store? Or aggregate at a city level first?
- Data Sources: What historical data is available? (Sales transactions with timestamps, SKU ID, dark store ID, price, promotions applied). Product metadata (category, brand, perishability, shelf life). Store characteristics (size, location demographics). External data (holidays, weather, local events, competitor activity).
- Key Business Goals: Minimize stockouts (lost sales, bad CX), minimize wastage (especially for perishables), optimize inventory holding costs, improve delivery ETAs by having products ready. What are the relative costs of over-forecasting vs. under-forecasting?
- Scalability: Forecasting for potentially millions of (SKU x Dark Store) time series.
- Cold Start Problem: How to forecast for new SKUs or new dark stores with limited history?
- Existing Systems: Any current forecasting methods in place? What are their limitations?
- Problem Definition & Scope:
- Define forecast target (demand), horizon, granularity (daily/hourly), and scope (SKU x Store).
- Identify key business objectives and trade-offs.
- Data Sources & Understanding:
- Identify all relevant internal and external data sources.
- Analyze data quality, completeness, and potential biases (e.g., stockouts).
- Data Preparation & Feature Engineering for Time Series:
- Time series preprocessing: Handling missing sales data, outlier detection/treatment, demand estimation during stockouts.
- Feature creation: Lagged sales, rolling statistics (mean, std, min, max over various windows), calendar features (day of week, week of year, month, holidays - including regional ones like Ugadi, Sankranti, Diwali, local festivals), promotion flags/durations, price elasticity features.
- Product/Store features: Category, brand, perishability, shelf life, store location type, weather data for store location.
- Train/validation/test splitting: Time-series cross-validation (e.g., walk-forward validation, rolling origin validation) to prevent data leakage.
- Model Selection & Architecture:
- Explore options:
- Classical: ARIMA, SARIMA, Exponential Smoothing (ETS), Prophet.
- Machine Learning: Gradient Boosting (XGBoost, LightGBM), Random Forest (often need careful feature engineering for time series).
- Deep Learning: RNNs (LSTM, GRU), Temporal Convolutional Networks (TCNs), Transformers adapted for time series.
- Consider model per time series vs. global models (that learn across many time series using product/store embeddings).
- Hierarchical forecasting (e.g., reconcile forecasts from SKU level to category level to store total).
- Cold start strategies for new products/stores.
- Scalability & System Architecture:
- Design for training and inference for millions of time series.
- Batch processing for model retraining and daily/weekly forecast generation.
- Near real-time updates if hourly forecasts or very reactive promotion handling is needed.
- Technology stack for data processing (Spark), model training (distributed frameworks), model serving, and forecast storage.
- Rollout Strategy & Risk Management:
- Phased rollout (e.g., by city, by category, by store).
- Shadow mode deployment to compare against existing methods.
- A/B testing forecast impact on operational KPIs.
- Rollback plans. Communication with operations teams.
- Evaluation:
- Time series metrics: MAE, RMSE, MAPE, WMAPE (Weighted MAPE, often weighted by sales volume/value). Quantile loss for probabilistic forecasts.
- Forecast bias analysis.
- Business metrics: Stockout rate reduction, wastage reduction, inventory turnover improvement, impact on order fulfillment times.
- Human-in-the-Loop & Continuous Improvement:
- Mechanism for operations teams/category managers to review and potentially adjust forecasts for exceptional situations.
- Feedback loop to retrain models with corrected data or new features.
- Monitoring for concept drift (e.g., changing customer behavior).
- Ethical Considerations & Bias Mitigation:
- Ensure fairness in forecasting if it impacts regional product assortments or availability.
- Be aware of biases in historical data (e.g., stockouts under-representing true demand for certain items).
Simulated Conversation
Round 1: Problem Understanding & Scope Definition
- Definition of "Demand": Are we forecasting actual sales, or are we trying to estimate "true demand" by accounting for periods when items were out of stock? Estimating true demand is harder but gives a better picture for inventory planning.
- Forecast Granularity: The horizon is 1-7 days. Is a daily forecast per SKU per dark store the primary goal? Or is there a need for sub-daily (e.g., hourly) forecasts for very fast-moving items or for intra-day operational adjustments?
- Product Scope: Are we forecasting for all SKUs, including very slow-moving or new ones? Or do we prioritize, say, the top 80-90% of SKUs by sales volume/frequency initially?
- Geographical Granularity: The core unit seems to be SKU x Dark Store. Is there any need for aggregated forecasts at a city or regional level for higher-level planning?
- Key Business Objectives & Trade-offs: The main goals are minimizing stockouts and wastage. What's the relative business cost of under-forecasting (leading to stockouts and lost sales) versus over-forecasting (leading to wastage for perishables, or higher holding costs)? This will influence our choice of loss functions and model evaluation.
- New Product/Store Handling: How frequently are new SKUs introduced or new dark stores launched? We'll need a strategy for the "cold start" problem.
- Demand: Ideally, true demand. We have data on when items were marked out-of-stock (OOS).
- Granularity: Let's start with daily forecasts per SKU per dark store for the next 7 days. Hourly could be a future enhancement.
- Product Scope: Let's aim for comprehensive coverage, but we can prioritize models or apply simpler heuristics for very long-tail items if needed.
- Geo Granularity: SKU x Dark Store is primary. Aggregates can be derived later.
- Trade-offs: Stockouts are very bad for our 10-minute promise and customer retention. Wastage is also a concern, especially for fresh produce like vegetables, fruits, dairy – think items like Amul milk packets, ID dosa batter, fresh coriander. For now, let's assume stockouts are slightly more costly.
- New Products/Stores: New SKUs are added weekly, new stores monthly. A cold start solution is essential.
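Since stockouts are taken to be slightly more costly than wastage, one way to encode that trade-off is to forecast a quantile above the median instead of the mean. The sketch below is illustrative only: the per-unit costs `cost_under` and `cost_over` are hypothetical values, not figures from the conversation. It shows the newsvendor critical ratio and the pinball (quantile) loss that pairs with it.

```python
def critical_ratio(cost_under: float, cost_over: float) -> float:
    """Target quantile from the newsvendor model: Cu / (Cu + Co)."""
    return cost_under / (cost_under + cost_over)


def pinball_loss(actual: float, forecast: float, q: float) -> float:
    """Quantile loss: under-forecasts are weighted by q, over-forecasts by 1 - q."""
    diff = actual - forecast
    return q * diff if diff >= 0 else (q - 1) * diff


# Hypothetical costs: a lost sale costs 1.5 units, a wasted unit costs 1.0.
q = critical_ratio(cost_under=1.5, cost_over=1.0)
under = pinball_loss(actual=10, forecast=8, q=q)   # under-forecast by 2 units
over = pinball_loss(actual=10, forecast=12, q=q)   # over-forecast by 2 units
```

With these assumed costs the target quantile is above 0.5, so the same 2-unit miss is penalised more when it is an under-forecast than when it is an over-forecast.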
Round 2: Data Sources & Understanding
Potential Data Sources:
- Internal Data:
- Historical Sales Data: This is the primary time series.
- Key fields: Timestamp of sale, SKU ID, Dark Store ID, quantity sold, selling price, any discounts/promotions applied at item level.
- Granularity: Ideally, transaction-level, which can be aggregated to daily/hourly.
- Stockout Data: Information on when an SKU was out of stock at a specific dark store. This is crucial for estimating true demand.
- Fields: SKU ID, Dark Store ID, OOS start timestamp, OOS end timestamp.
- Promotions Data:
- Fields: Promotion ID, SKU ID(s) involved, Dark Store ID(s) where active, promotion type (e.g., discount, BOGO, combo), start/end date, promotion mechanics.
- This helps model promotional lift.
- Product Metadata (SKU Master):
- Fields: SKU ID, product name, description, category (e.g., "Dairy," "Fresh Vegetables," "Snacks"), sub-category, brand, pack size, price, perishability flag, shelf life, country of origin, any specific attributes like "organic," "vegan."
- Important for understanding product characteristics and for cold start (finding similar products).
- Dark Store Metadata:
- Fields: Dark Store ID, location (latitude/longitude, city, pincode), size/capacity, opening date, operating hours, surrounding neighborhood demographics (if available).
- Inventory Data (Optional but useful): Current and historical inventory levels at dark stores. Can help validate stockout periods or understand replenishment cycles.
- Pricing History: Changes in base price over time for each SKU.
- External Data (Contextual Factors):
- Calendar/Holiday Data: Public holidays (national and regional - e.g., Diwali, Eid, Christmas, local city festivals like Ganesh Chaturthi in Mumbai/Hyderabad, Ugadi), special event days (e.g., big cricket matches, New Year's Eve).
- Weather Data: For each dark store's location: daily temperature (min/max/avg), precipitation, humidity, extreme weather events. This can impact demand for certain categories (e.g., ice cream, hot beverages).
- Local Events Data (Harder to get systematically): Major local festivals, school holidays, potentially even local strikes or disruptions if they significantly impact movement and demand.
- Competitor Data (Very hard to get): Information on major competitor promotions or stock situations, if ethically and legally obtainable. This is usually a stretch.
- Macroeconomic Indicators (Less likely for short-term forecast): Inflation, GDP growth (more relevant for long-term strategic forecasting).
Potential Data Challenges & Quality Issues:
- Stockout Data Accuracy: Is the OOS data reliably captured? There might be delays in marking items OOS, or brief periods of OOS might be missed.
- True Demand Estimation: Even with OOS data, estimating how much more would have sold is non-trivial. Simplistic imputation might be needed initially.
- Promotion Data Completeness: Are all types of promotions (centralized, store-specific, ad-hoc) captured systematically? Are their actual start/end times accurate?
- Data Granularity Mismatches: Sales might be at transaction level, promotions at daily level, weather at hourly/daily. Aggregation/disaggregation needs care.
- Product Lifecycle: SKUs get introduced, become popular, and then might decline or be delisted. Handling these lifecycle stages is important.
- Data Sparsity: Many SKUs (long-tail items) will have very sparse sales history (many zero-sale days), making individual forecasting difficult.
- Outliers & Anomalies: Unusual sales spikes (e.g., bulk order by one customer, data entry errors) can distort the historical pattern.
- Missing Values: In any of the data streams.
- Time Zone Consistency: Ensure all timestamped data is in a consistent time zone (e.g., UTC or IST).
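The time zone point matters because a late-evening UTC timestamp can belong to the next calendar day in India. A minimal sketch, assuming transactions arrive as timezone-aware UTC datetimes and that daily buckets should follow IST; the tuple layout and the SKU/store IDs are hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def daily_sales_ist(transactions):
    """Sum quantities per (sku_id, store_id, IST calendar date).

    `transactions` is an iterable of (utc_timestamp, sku_id, store_id, qty),
    where utc_timestamp is a timezone-aware datetime in UTC.
    """
    totals = defaultdict(int)
    for ts_utc, sku_id, store_id, qty in transactions:
        local_date = ts_utc.astimezone(IST).date()
        totals[(sku_id, store_id, local_date)] += qty
    return dict(totals)


# Hypothetical rows: 19:00 UTC on 1 March is 00:30 IST on 2 March.
rows = [
    (datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc), "milk_500ml", "blr_01", 3),
    (datetime(2024, 3, 1, 19, 0, tzinfo=timezone.utc), "milk_500ml", "blr_01", 2),
]
daily = daily_sales_ist(rows)
```

Bucketing by the UTC date instead would assign both sales to 1 March and shift part of each evening's demand onto the wrong day.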
Round 3: Data Preparation & Feature Engineering for Time Series
Time Series Preprocessing:
- Data Aggregation:
- Aggregate raw transaction data to the desired forecast granularity (e.g., daily sales quantity per SKU per Dark Store).
- Handling Missing Sales Data (Zero Sales vs. True Missing):
- For days where an SKU-Store combination has no sales record, we need to distinguish whether it was a true zero-sale day (product was available but not sold) or the data is missing. Inventory data, if available, can help. Otherwise, we might assume zero sales whenever the product was active.
- True Demand Estimation (Addressing Stockouts):
- For periods where an SKU was marked OOS:
- Simple Imputation: Replace OOS period sales with an average of sales from surrounding non-OOS periods (e.g., average of 2 days before and 2 days after OOS, or average for the same day of the week in previous weeks).
- Model-based Imputation (Advanced): Train a separate model to predict sales during OOS, using features like sales before OOS, day of week, promotions, etc. This is more complex.
- Initially, I'd start with simpler imputation methods and flag these imputed values.
- Outlier Detection and Treatment:
- Identify sales figures that are abnormally high or low (e.g., using rolling standard deviations, IQR method).
- For Zepto, a sudden bulk order for a typically low-volume item by a single customer could be an outlier.
- Treatment: Cap/floor outliers, replace them with a rolling median/mean, or treat them as missing and impute. Take care not to remove true demand spikes caused by valid events.
- Time Series Alignment: Ensure all time series (sales, promotions, weather, etc.) are aligned on the same daily/hourly index for each SKU-Store combination. Forward-fill or back-fill for features where appropriate (e.g., a promotion is active for the whole day).
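The simple stockout imputation described above (average of the same weekday in previous weeks, with imputed values flagged) could look roughly like this. It is a sketch under stated assumptions: one SKU-store series at a time, a gap-free daily list, and a hypothetical fallback to the series mean when no same-weekday history is available.

```python
from statistics import mean


def impute_stockout_days(qty, oos, n_weeks=4):
    """Replace sales on OOS days with the mean of the same weekday in prior weeks.

    qty: daily sales quantities for one SKU-store series, oldest first.
    oos: booleans of the same length, True where the SKU was out of stock.
    Returns (demand, imputed_flags); observed days are returned unchanged.
    """
    available = [q for q, o in zip(qty, oos) if not o]
    fallback = float(mean(available)) if available else 0.0
    demand, flags = [], []
    for t, (q, o) in enumerate(zip(qty, oos)):
        if not o:
            demand.append(float(q))
            flags.append(False)
            continue
        # Same weekday in up to n_weeks earlier weeks, skipping OOS days.
        history = [qty[t - 7 * k] for k in range(1, n_weeks + 1)
                   if t - 7 * k >= 0 and not oos[t - 7 * k]]
        demand.append(float(mean(history)) if history else fallback)
        flags.append(True)
    return demand, flags


# Hypothetical series: 15 days, with a stockout on the last day.
qty = [10, 5, 5, 5, 5, 5, 5, 14, 5, 5, 5, 5, 5, 5, 0]
oos = [False] * 14 + [True]
demand, imputed = impute_stockout_days(qty, oos)
```

The returned flags can be kept as a feature or used to down-weight imputed rows during training.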
Feature Engineering:
The goal is to create features that capture trend, seasonality, promotional effects, and other influencing factors.
- Lagged Features (Autoregressive component):
- Lagged sales values: Sales from `t-1` day, `t-2` days, ..., `t-7` days, `t-14` days, `t-28` days. The choice of lags depends on seasonality and product lifecycle.
- Example: For `milk`, `t-1` and `t-7` (same day last week) are likely very important.
- Rolling Window Statistics:
- Rolling mean, median, min, max, std dev of sales over various past windows (e.g., last 3 days, 7 days, 14 days, 28 days).
- Example: `rolling_mean_sales_7d`, `rolling_std_sales_28d`.
- These capture recent trends and volatility.
- Date & Calendar Features:
- Day of the week (encoded, e.g., one-hot or cyclical).
- Day of the month, day of the year.
- Week of the month, week of the year.
- Month, Quarter, Year.
- Is_weekend flag.
- Holiday Features:
- Binary flag for public holidays (national & regional relevant to the dark store's city).
- Consider Telugu festivals like Ugadi, Sankranti, Dasara, and national ones like Diwali, Eid, Christmas.
- Days before/after a major holiday (e.g., `days_until_diwali`, `days_after_diwali`).
- Special event days (e.g., major cricket matches).
- Promotion Features:
- Binary flag: `is_on_promotion` (1 if active, 0 otherwise).
- Promotion type (categorical, e.g., "Discount," "BOGO," "Combo").
- Discount percentage or amount (numeric).
- Days since promotion started, days until promotion ends.
- Interaction features: e.g., promotion active on a weekend.
- Price Features:
- Current selling price.
- Price change from previous period.
- Ratio of current price to average historical price (a relative-price signal from which the model can learn price sensitivity).
- Product Metadata Features (Static, but useful for global models or cold start):
- Category, sub-category, brand (often one-hot encoded or target encoded).
- Perishability flag.
- Shelf life (numeric).
- Embeddings derived from product title/description (if using NLP to find similar products for cold start).
- Dark Store Metadata Features (Static):
- Store location cluster/type (e.g., "residential," "commercial area," "student area").
- Store age.
- External Regressors (Exogenous Variables):
- Weather data: Lagged and future (if weather forecast is available) temperature, precipitation for the store's location.
- Example: Demand for ice cream might increase with `temperature_t+1`.
- Interaction Features:
- E.g., `day_of_week * product_category`, `promotion_active * is_weekend`.
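A few of the features listed above (lags, a rolling mean, and basic calendar flags) can be sketched with pandas as follows. The column names `sku_id`, `store_id`, `date`, and `qty` are assumptions for illustration, and the frame is assumed to have one row per day per series with no gaps.

```python
import pandas as pd


def add_lag_and_rolling_features(df, lags=(1, 7), windows=(7,)):
    """Add per-series lag, rolling-mean, and calendar features to daily sales."""
    df = df.sort_values(["sku_id", "store_id", "date"]).copy()
    grouped = df.groupby(["sku_id", "store_id"])["qty"]
    for lag in lags:
        df[f"qty_lag_{lag}"] = grouped.shift(lag)
    for w in windows:
        # shift(1) first so the window ends at t-1 and never includes day t.
        df[f"rolling_mean_sales_{w}d"] = grouped.transform(
            lambda s, w=w: s.shift(1).rolling(w, min_periods=1).mean()
        )
    df["day_of_week"] = df["date"].dt.dayofweek
    df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
    return df


# Hypothetical single series with sales 1..10 starting on a Monday.
sales = pd.DataFrame({
    "sku_id": "milk_500ml",
    "store_id": "blr_01",
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "qty": list(range(1, 11)),
})
features = add_lag_and_rolling_features(sales)
```

Shifting before rolling is the important design choice here: without it, the rolling statistic for day `t` would include day `t`'s own sales and leak the target into the features.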
Train/Validation/Test Splitting for Time Series:
This is critical to avoid data leakage and get a realistic estimate of future performance.
- Avoid Random Splitting: Randomly shuffling time series data and then splitting will lead to the model seeing future data during training, giving overly optimistic results.
- Walk-Forward Validation (or Rolling Origin Validation):
- Train on an initial period of data (e.g., first 60% of available history).
- Validate on the next N days (e.g., next 7 days, matching our forecast horizon).
- Slide the training window forward (e.g., by N days or by 1 day) and repeat the validation.
Train: [1...T], Validate: [T+1...T+h]
Train: [1...T+1], Validate: [T+2...T+h+1] (if retraining daily)
...
Test: [Last_Available_Data_Point - h + 1 ... Last_Available_Data_Point] for final hold-out
- Fixed Hold-Out Test Set: Keep the most recent chunk of data (e.g., last 1-2 months) completely separate as a final test set, which the model never sees during training or hyperparameter tuning.
- Per-SKU-Store Considerations: If training individual models per SKU-Store, ensure each series has enough data for meaningful splits. For global models, the split is on time across all series.
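The walk-forward scheme with a fixed hold-out can be expressed as a small generator over day indices. This is a sketch with hypothetical sizes (100 days of history, a 14-day hold-out, 7-day folds), using an expanding training window:

```python
def walk_forward_splits(n_days, initial_train, horizon, step):
    """Yield (train_idx, valid_idx) ranges for expanding-window validation.

    Training always starts at day 0 and ends just before the validation
    window, so no validation day is ever seen during training.
    """
    train_end = initial_train
    while train_end + horizon <= n_days:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step


# Hypothetical setup: 100 days of history, with the last 14 held out as a test set.
n_days, test_days = 100, 14
folds = list(walk_forward_splits(n_days - test_days, initial_train=60, horizon=7, step=7))
test_idx = range(n_days - test_days, n_days)
```

For a global model the same index ranges would be applied to the date column across all SKU-store series, so every series is split at the same points in time.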
This structured approach to data preparation and feature engineering will provide a solid foundation for our forecasting models.
Round 4: Model Selection & Architecture
Model Selection Considerations:
- Classical Time Series Models:
- Examples: ARIMA, SARIMA (Seasonal ARIMA), Exponential Smoothing (ETS), Theta method. Facebook's Prophet also falls somewhat in this category by decomposing time series.
- Pros:
- Well-understood, statistically grounded.
- Often good for individual time series with clear trend/seasonality.
- Prophet is particularly good at handling holidays and seasonality, and is robust to missing data and outliers.
- Cons:
- Typically univariate (though SARIMAX/ARIMAX can include exogenous regressors).
- Fitting thousands/millions of individual models can be computationally intensive and hard to manage.
- May not easily leverage cross-series information (e.g., similar products behave similarly).
- Less flexible in incorporating complex non-linear relationships from many features.
- Zepto Context: Could be a good baseline, especially Prophet, or for very stable, high-volume SKUs. But likely not the primary solution for all SKUs due to scalability and cross-learning limitations.
- Machine Learning Models (Tree-based, etc.):
- Examples: Gradient Boosting (LightGBM, XGBoost, CatBoost), Random Forest.
- Pros:
- Excellent at handling tabular data with many features (our engineered lags, calendar, promo features, etc.).
- Can capture complex non-linear relationships.
- LightGBM/XGBoost are highly scalable and efficient.
- Can be trained as a "global" model: one model trained on data from all SKU-Store time series, using SKU ID and Store ID (or their embeddings/features) as categorical features. This allows learning across series.
- Relatively good interpretability (feature importance).
- Cons:
- Require careful feature engineering to capture time series dynamics (lags, rolling stats are essential).
- Don't inherently model time dependencies as well as sequence models unless features are very well crafted.
- Zepto Context: This is a very strong contender, especially LightGBM, due to its scalability and ability to use rich features. A global LightGBM model is likely a good primary approach.
- Deep Learning Models (Sequence Models):
- Examples: RNNs (LSTMs, GRUs), Temporal Convolutional Networks (TCNs), Transformers adapted for time series (e.g., Informer, Autoformer). Amazon's DeepAR is also relevant here (probabilistic forecasting using RNNs).
- Pros:
- Can automatically learn temporal dependencies and complex patterns from raw time series (or with minimal feature engineering).
- Can naturally handle multivariate time series and exogenous variables.
- Global DL models can learn shared representations across many time series using embeddings for SKU/Store IDs.
- Can produce probabilistic forecasts (predicting a distribution, not just a point estimate), which is useful for setting safety stock.
- Cons:
- Data-hungry: Typically require large amounts of data per series or many series for global models to perform well.
- Computationally expensive to train and tune.
- Can be harder to interpret (black-box nature).
- More complex to implement and maintain.
- Zepto Context: A promising direction, especially for global modeling and probabilistic forecasts. Might be an iteration after establishing a strong tree-based baseline, or for specific categories where complex patterns are evident. TCNs or Transformer-based models could be more efficient than LSTMs for long sequences.
Proposed Primary Model & Architecture:
I would propose starting with a global Gradient Boosting model, likely LightGBM, as the primary workhorse due to its balance of performance, scalability, and feature handling capabilities.
- Input: Each row in the training data would represent a `(SKU, Store, Date)` combination.
- Features: Lagged sales, rolling statistics, calendar features, promotion features, price features, product metadata (encoded), store metadata (encoded), weather data.
- Target: Sales quantity for `Date + 1 day`, `Date + 2 days`, ..., `Date + 7 days`. This can be done by:
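- Direct Multi-Output Forecasting: Train one model to predict all 7 days simultaneously. Some GBDT libraries (e.g., XGBoost, CatBoost) support multi-output regression; LightGBM's standard regressor predicts a single target, so it would need a wrapper that fits one model per output.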
- Iterative Forecasting (Recursive): Train a model to predict `t+1`. Use the prediction for `t+1` as a feature to predict `t+2`, and so on. More prone to error accumulation.
- Independent Models per Horizon: Train 7 separate models, one for each day in the forecast horizon (Model_D1, Model_D2, ..., Model_D7). Computationally more expensive but can be more accurate. This is often a good approach with GBDTs.
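For the independent-models-per-horizon option, the main data-preparation step is building one target column per horizon. A minimal pandas sketch, assuming hypothetical column names and a gap-free daily series (the shift is row-based, so missing dates would misalign the targets):

```python
import pandas as pd


def add_horizon_targets(df, horizons=range(1, 8)):
    """Add one target column per horizon: target_d{h} is qty h days ahead."""
    df = df.sort_values(["sku_id", "store_id", "date"]).copy()
    grouped = df.groupby(["sku_id", "store_id"])["qty"]
    for h in horizons:
        # Negative shift pulls future sales back onto the forecast-origin row.
        df[f"target_d{h}"] = grouped.shift(-h)
    return df


# Hypothetical single series with sales 1..10 over ten consecutive days.
sales = pd.DataFrame({
    "sku_id": "milk_500ml",
    "store_id": "blr_01",
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "qty": list(range(1, 11)),
})
with_targets = add_horizon_targets(sales)

# Model_D7 would train only on rows whose 7-day-ahead target is observed.
train_d7 = with_targets.dropna(subset=["target_d7"])
```

Each of Model_D1 through Model_D7 would then be fit on the same feature columns against its own `target_d{h}` column, so no model ever consumes another model's prediction.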
Hierarchical Forecasting (Optional Enhancement):
Forecasts might be more robust if reconciled across a hierarchy (e.g., total sales for a store, category sales within a store, SKU sales within a category).
- Generate base forecasts at the SKU-Store level.
- Generate forecasts at higher levels (e.g., Category-Store).
- Use reconciliation methods (e.g., top-down, bottom-up, MinT) to ensure forecasts are consistent across the hierarchy. This can improve overall accuracy and stability.
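Two of the simpler reconciliation methods can be sketched in a few lines. This example covers bottom-up and a proportional top-down split only; MinT needs an estimate of the forecast-error covariance and is not shown. The SKU names and numbers are hypothetical.

```python
def bottom_up(sku_forecasts):
    """Category-level forecast implied by summing the SKU-level forecasts."""
    return sum(sku_forecasts.values())


def top_down(category_forecast, sku_forecasts):
    """Split a category forecast across SKUs in proportion to their base forecasts."""
    total = sum(sku_forecasts.values())
    if total == 0:
        # No signal at SKU level: fall back to an equal split.
        share = 1.0 / len(sku_forecasts)
        return {sku: category_forecast * share for sku in sku_forecasts}
    return {sku: category_forecast * f / total for sku, f in sku_forecasts.items()}


# Hypothetical base forecasts for one store's dairy category on one day.
sku_base = {"milk_500ml": 60.0, "curd_400g": 30.0, "paneer_200g": 10.0}
category_base = 120.0  # independent category-level forecast

implied_total = bottom_up(sku_base)
reconciled = top_down(category_base, sku_base)
```

Bottom-up trusts the SKU-level models and top-down trusts the category-level model; which works better is an empirical question to settle in backtesting.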
Addressing Cold Start Problem:
This is crucial for new SKUs and new dark stores.
- New SKUs (No Sales History):
- Attribute-Based Similarity: Find existing SKUs that are most similar to the new SKU based on its metadata (category, sub-category, brand, price range, product description embeddings).
- Initial Forecast: Use the (potentially scaled) forecast of the average or median of these similar SKUs. For example, if a new brand of 2-minute noodles is introduced, its initial forecast could be based on existing popular 2-minute noodle brands in that store.
- Faster Retraining/Adaptation: As soon as a few days/weeks of sales data for the new SKU become available, incorporate it quickly into model retraining or use a separate, rapidly adapting model for new items.
- New Dark Stores (No Store-Specific History):
- Store Attribute Similarity: Find existing dark stores that are most similar based on location, demographics, size, and initial product assortment.
- Initial Forecast: Use an average or weighted average of forecasts from similar stores for the same SKUs, potentially adjusted for store size/capacity.
- Again, rapidly incorporate new store data as it becomes available.
- Global Models Advantage: Global models (LightGBM or DL) that use product/store features/embeddings can inherently handle cold starts better if the new product/store shares features with existing ones the model has seen. The model can generalize from seen feature combinations.
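The attribute-based similarity idea for new SKUs can be illustrated with a deliberately crude scoring rule. Everything here is an assumption for the sake of the example: the similarity function, the metadata fields, the catalog entries, and the choice of `k` would all need tuning on real data (and description embeddings would replace the exact-match scoring).

```python
def similarity(new_sku, existing_sku):
    """Crude metadata similarity: attribute matches plus price closeness in [0, 1]."""
    score = sum(new_sku[k] == existing_sku[k]
                for k in ("category", "sub_category", "brand"))
    price_gap = (abs(new_sku["price"] - existing_sku["price"])
                 / max(new_sku["price"], existing_sku["price"]))
    return score + (1.0 - price_gap)


def cold_start_forecast(new_sku, catalog, forecasts, k=2):
    """Average the forecasts of the k most similar existing SKUs."""
    ranked = sorted(catalog,
                    key=lambda sku_id: similarity(new_sku, catalog[sku_id]),
                    reverse=True)
    neighbours = ranked[:k]
    return sum(forecasts[s] for s in neighbours) / len(neighbours), neighbours


# Hypothetical catalog and next-day forecasts for one dark store.
catalog = {
    "noodles_a": {"category": "Snacks", "sub_category": "Instant Noodles", "brand": "A", "price": 14.0},
    "noodles_b": {"category": "Snacks", "sub_category": "Instant Noodles", "brand": "B", "price": 15.0},
    "chips_c": {"category": "Snacks", "sub_category": "Chips", "brand": "C", "price": 20.0},
}
forecasts = {"noodles_a": 40.0, "noodles_b": 30.0, "chips_c": 100.0}
new_noodles = {"category": "Snacks", "sub_category": "Instant Noodles", "brand": "D", "price": 15.0}

initial_forecast, used = cold_start_forecast(new_noodles, catalog, forecasts)
```

The same pattern applies to new dark stores by swapping SKU metadata for store attributes and averaging across similar stores instead of similar SKUs.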
My strategy would be to start with a robust global LightGBM, rigorously evaluate it, and then explore DL models or hierarchical approaches as further improvements.
Round 5: Scalability & System Architecture
End-to-End System Architecture for Demand Forecasting:
I. Data Ingestion & Storage Layer:
- Data Sources: As discussed (Sales DB, Promotions DB, Product Master, Store Master, Weather APIs, Holiday Calendars).
- Ingestion:
- Batch Ingestion: Daily/hourly ETL jobs (e.g., using Apache Airflow to orchestrate Spark jobs or SQL transformations) to pull data from source systems into a data lake (e.g., AWS S3, Google Cloud Storage).
- Streaming Ingestion (Optional, for future near real-time features): Kafka for real-time sales/inventory updates if needed for very short-term reactive forecasting.
- Data Lake/Warehouse:
- Store raw and processed data (e.g., S3).
- Use a data warehouse (e.g., Snowflake, BigQuery, Redshift) or a lakehouse architecture (e.g., Databricks Delta Lake) for structured, queryable historical data.
- Feature Store (Highly Recommended):
- Centralized repository for curated features (lags, rolling stats, promo flags, calendar features).
- Ensures consistency between training and serving.
- Supports batch computation for training and potentially low-latency retrieval for real-time inference if ever needed. Tools like Feast or Tecton.
II. Model Training Pipeline (Batch, e.g., Daily/Weekly):
- Orchestration: Apache Airflow, Kubeflow Pipelines, or AWS Step Functions to manage the training workflow.
- Data Preparation & Feature Engineering Job:
- A distributed processing job (e.g., Spark, Dask) that:
- Reads historical sales, product, store, promotion, weather data.
- Performs time series preprocessing (OOS handling, outlier cleaning).
- Generates all the engineered features (lags, rolling windows, calendar, etc.) for each (SKU, Store, Date).
- Constructs the final training dataset (tabular format).
- Saves features to the Feature Store and/or the training dataset to the data lake.
- Model Training Job:
- Reads the prepared training data.
- Trains the global LightGBM model (or potentially multiple models for different horizons/segments).
- This can run on a distributed training framework if the dataset is very large (e.g., LightGBM on Spark, or using a managed ML platform like SageMaker, Vertex AI, Azure ML).
- Performs hyperparameter optimization (e.g., using Optuna, Hyperopt) with appropriate time series cross-validation.
- Evaluates the model on a validation set.
- Model Registry:
- Version and store the trained model artifact (e.g., using MLflow, SageMaker Model Registry).
- Store model metadata (training parameters, evaluation metrics, lineage).
- Model Deployment (for Batch Inference): The "deployed" model for batch forecasting is simply the registered artifact that the inference pipeline will pick up.
III. Batch Forecasting Pipeline (Daily):
- Orchestration: Scheduled daily by Airflow (or similar).
- Feature Generation for Prediction:
- For each (SKU, Store) combination, generate the required features for the next 7 days (or the prediction horizon). This involves using the most recent available data to compute lags, rolling stats, and fetching future calendar/promo/weather features. This also uses the Feature Store.
- Prediction Job:
- Load the latest approved/production model from the Model Registry.
- Perform inference on the generated feature set for all (SKU, Store) combinations to get the 7-day forecasts.
- This can also be a distributed job (Spark + LightGBM UDFs).
- Post-processing & Storage:
- Apply any business rules or constraints (e.g., forecasts cannot be negative).
- Perform hierarchical reconciliation if implemented.
- Store the final forecasts in a database (e.g., PostgreSQL, MySQL, or a dedicated forecast DB) accessible by downstream systems.
- Schema: `(SKU_ID, Store_ID, Forecast_Date, Forecast_Horizon_Day, Predicted_Quantity, Model_Version, Run_Timestamp)`.
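The post-processing rule and storage schema above can be sketched as a typed record. This is illustrative only: the field types are assumptions, and `forecast_date` is read here as the date being forecast, which the schema does not spell out.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class ForecastRow:
    sku_id: str
    store_id: str
    forecast_date: date
    forecast_horizon_day: int
    predicted_quantity: float
    model_version: str
    run_timestamp: datetime


def to_forecast_row(sku_id, store_id, forecast_date, horizon_day,
                    raw_prediction, model_version, run_timestamp):
    """Apply the non-negativity rule and package one forecast for storage."""
    return ForecastRow(
        sku_id=sku_id,
        store_id=store_id,
        forecast_date=forecast_date,
        forecast_horizon_day=horizon_day,
        predicted_quantity=max(0.0, float(raw_prediction)),
        model_version=model_version,
        run_timestamp=run_timestamp,
    )


# Hypothetical raw model output that is slightly negative.
row = to_forecast_row(
    "milk_500ml", "blr_01", date(2024, 3, 2), 1, -0.4,
    "lgbm-v1", datetime(2024, 3, 1, 18, 0, tzinfo=timezone.utc),
)
```

Keeping `model_version` and `run_timestamp` on every row makes it possible to compare forecasts from different model versions for the same date during shadow mode.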
IV. Forecast Serving Layer:
- API Service: Downstream systems (inventory planning, replenishment, dark store operations dashboards) can query the forecast database via an API to get the latest forecasts.
- Direct DB Access: For some batch processes (like generating purchase orders), direct access to the forecast DB might be used.
V. Monitoring & Alerting Layer (Covered in detail if asked):
- Dashboards for forecast accuracy, data drift, model drift, system health.
- Alerts for significant deviations or failures.
Technology Stack Considerations (Examples):
- Data Lake: AWS S3, GCS, Azure Blob Storage.
- Data Warehouse/Lakehouse: Snowflake, BigQuery, Redshift, Databricks Delta Lake.
- ETL/Orchestration: Apache Airflow, Spark, AWS Glue, Azure Data Factory.
- Feature Store: Feast, Tecton, or custom build on top of data lake/warehouse.
- ML Platform/Training: SageMaker, Vertex AI, Azure ML, Databricks ML, or custom Kubeflow setup.
- Model Registry: MLflow, SageMaker Model Registry.
- Forecast DB: PostgreSQL, MySQL, Cassandra (if very high write load).
- API: Python (FastAPI/Flask) + Docker + Kubernetes/Serverless.
This architecture emphasizes automation, scalability through distributed processing, and MLOps best practices like feature stores and model registries.
Round 6: System Rollout Strategy & Risk Management
Phased Rollout Strategy & Risk Management:
- Phase 0: Offline Validation & Benchmarking (Pre-Rollout):
- Rigorous Backtesting: Extensively backtest the new model against historical data using walk-forward validation.
- Compare its performance (MAE, MAPE, WMAPE, bias) against any existing forecasting methods or simple baselines (e.g., naive forecast, moving average).
- Segment Analysis: Analyze model performance across different product categories (e.g., perishables vs. non-perishables, fast vs. slow movers), store types, and regions. Identify potential weak spots.
- Sanity Checks: Ensure forecasts are plausible (e.g., not negative, not orders of magnitude different from recent history without good reason like a major promotion).
- Phase 1: Shadow Mode Deployment (No Operational Impact):
- Deploy the new forecasting system to generate daily forecasts in parallel with the existing system (if any).
- These new forecasts are not used for actual inventory decisions yet.
- Purpose:
- Compare new forecasts against actual sales and existing forecasts in a live environment.
- Identify discrepancies, bugs, or unexpected behavior.
- Allow the operations and category teams to review the new forecasts and build confidence/provide feedback.
- Duration: Several weeks to cover different scenarios (weekdays, weekends, minor events).
- Phase 2: Pilot Program / Canary Release (Limited Live Impact):
- Select Pilot Stores/Categories: Choose a small, manageable set of dark stores (e.g., 2-3 stores in one city) or a few specific product categories (e.g., a mix of perishables and non-perishables) to go live with the new forecasts.
- These pilot stores should ideally have co-operative staff who can provide detailed feedback.
- Intensive Monitoring:
- Track inventory levels, stockout rates, wastage rates, and fulfillment times very closely for the pilot stores/categories.
- Compare these KPIs against control stores/categories still using the old system.
- Gather qualitative feedback from dark store managers and pickers.
- Refine & Iterate: Based on pilot results, make necessary adjustments to the model, features, or post-processing rules.
- Phase 3: Gradual Expansion (A/B Testing on Operational KPIs):
- If the pilot is successful, gradually expand the rollout to more stores, cities, or categories.
- A/B Testing Approach:
- For a given city or set of similar stores:
- Group A (Control): Continues using the old forecasting method.
- Group B (Treatment): Uses the new forecasting system for inventory decisions.
- Measure and compare key operational KPIs (stockouts, wastage, inventory holding cost, sales) over a significant period (e.g., 4-8 weeks).
- This provides quantitative evidence of the new system's impact.
- Phase 4: Full Rollout & Continuous Monitoring:
- Once A/B tests demonstrate clear benefits and stability, proceed with full rollout.
- Even after full rollout, continuously monitor forecast accuracy and its impact on business KPIs.
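The Phase 3 comparison of stockout rates between control and treatment groups can be sketched as a two-proportion z-test. This is a minimal illustration, not a prescribed method: the function name and the counts in the example are hypothetical, and it treats every SKU-store-day as an independent observation, which real data (clustered by store and SKU) would violate, so the p-value should be read as optimistic.

```python
import math


def two_proportion_z_test(oos_a, n_a, oos_b, n_b):
    """Compare stockout rates of control (A) and treatment (B).

    oos_*: number of SKU-store-day instances that were out of stock.
    n_*:   total number of SKU-store-day instances observed.
    Returns (rate_b - rate_a, z statistic, two-sided p-value).
    """
    p_a = oos_a / n_a
    p_b = oos_b / n_b
    pooled = (oos_a + oos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical counts: 6.0% OOS in control vs 4.8% in treatment.
diff, z, p_value = two_proportion_z_test(600, 10_000, 480, 10_000)
print(f"diff={diff:.4f} z={z:.2f} p={p_value:.5f}")
```

A negative difference with a small p-value would support expanding the rollout; the same function could be applied to wastage counts, subject to the same independence caveat.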
Risk Management & Mitigation:
- Clear Go/No-Go Criteria: Define clear metrics and thresholds at each phase to decide whether to proceed with the next phase of rollout.
- Rollback Plan: At every stage, have a well-defined and tested plan to quickly revert to the previous forecasting system or manual overrides if the new system causes significant issues.
- Forecast Overrides/Adjustments: Initially, provide a mechanism for experienced planners or category managers to review and manually adjust system-generated forecasts, especially for critical items or during uncertain periods. Log these overrides to learn from them.
- Communication & Training: Thoroughly train operations teams, inventory planners, and dark store staff on how the new system works, its expected outputs, and how to interpret the forecasts.
- Dedicated Support Team: Have a dedicated team ready to address any issues or questions during the rollout phases.
This careful, data-driven rollout approach helps build confidence, identify problems early, and minimize the risk of large-scale operational disruption.
Round 7: Model Evaluation
Offline Model Evaluation (During Development & Retraining):
These are calculated on a hold-out validation set using walk-forward validation.
- Scale-Dependent Error Metrics:
- Mean Absolute Error (MAE): `Average(|Actual - Forecast|)`. Easy to understand, shows average error in units.
- Root Mean Squared Error (RMSE): `Sqrt(Average((Actual - Forecast)^2))`. Penalizes larger errors more heavily.
- Percentage Error Metrics:
- Mean Absolute Percentage Error (MAPE): `Average(|(Actual - Forecast) / Actual|) * 100`.
- Caution: MAPE is undefined when actuals are zero and explodes when they are near zero, which is common for slow-moving SKUs. It is also skewed by low-volume items, where a one-unit miss is a large percentage error.
- Weighted Mean Absolute Percentage Error (WMAPE) / Mean Absolute Scaled Error (MASE):
- WMAPE (Volume-Weighted): `Sum(|Actual - Forecast|) / Sum(Actual) * 100`. This is often a key metric in retail as it gives more weight to high-volume items where forecast accuracy matters more for overall business impact. This would be very relevant for Zepto.
- MASE: Compares the model's MAE to the MAE of a naive baseline (e.g., seasonal naive forecast). MASE < 1 means the model is better than naive.
- Forecast Bias:
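- Mean Error (ME): `Average(Actual - Forecast)`. Mean Percentage Error (MPE): `Average((Actual - Forecast) / Actual) * 100`. With this sign convention, a negative value indicates systematic over-forecasting (ME < 0) and a positive value indicates systematic under-forecasting (ME > 0). We want this to be close to zero.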
- Quantile Loss (for Probabilistic Forecasts):
- If the model produces quantile forecasts (e.g., P10, P50, P90), we can use quantile loss (pinball loss) to evaluate the accuracy of these specific quantiles. This is important for setting safety stock based on uncertainty.
- For Zepto, P90 or P95 forecasts might be used to minimize stockouts.
- Segment-Level Evaluation:
- Evaluate metrics separately for different segments:
- Product categories (e.g., fresh produce, dairy, pantry staples).
- Fast-movers vs. slow-movers.
- Promotional vs. non-promotional periods.
- Different dark stores or city clusters.
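The offline metrics above can be written out in a few lines of plain Python. This is a sketch for illustration: the function names, the toy numbers, and the `season` parameter of `mase` are assumptions, and the MASE scale here is the in-sample MAE of a seasonal naive forecast computed on a separate `history` series.

```python
import math


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def wmape(actual, forecast):
    total = sum(actual)
    if total == 0:
        raise ValueError("WMAPE is undefined when total actual demand is zero")
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / total


def mean_error(actual, forecast):
    # Positive: under-forecasting on average. Negative: over-forecasting.
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, history, season=7):
    # Scale by the MAE of the seasonal naive forecast y[t] = y[t - season].
    naive_errors = [abs(history[t] - history[t - season])
                    for t in range(season, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale


def pinball_loss(actual, forecast, q):
    return sum(q * max(0.0, a - f) + (1 - q) * max(0.0, f - a)
               for a, f in zip(actual, forecast)) / len(actual)


# Toy example; note the zero actual, which WMAPE tolerates but MAPE would not.
actual = [10, 0, 20, 30]
forecast = [12, 1, 18, 27]
history = [8, 12, 10, 14, 12, 16]
print(mae(actual, forecast), wmape(actual, forecast), mean_error(actual, forecast))
```

For segment-level evaluation, the same functions can be applied to the rows of each segment (category, store cluster, promo vs. non-promo) separately.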
Online Evaluation & Business KPIs (Post-Deployment, often via A/B Testing):
These measure the actual impact of the forecasts on operations.
- Inventory & Availability Metrics:
- Stockout Rate / Out-of-Stock Percentage (OOS%): Percentage of (SKU-Store-Day) instances where an item was OOS. This is a primary KPI for Zepto. Target: Significant reduction.
- Service Level: Percentage of demand met from available stock.
- Lost Sales Estimation: Estimated revenue lost due to stockouts.
- Wastage/Expiry Metrics (Especially for Perishables):
- Wastage Rate: Percentage of stock (by value or quantity) that expires or becomes unsellable. Critical for fresh categories. Target: Significant reduction.
- Inventory Efficiency Metrics:
- Inventory Holding Cost: Cost associated with storing inventory.
- Inventory Turnover: How quickly inventory is sold. Higher is generally better.
- Days of Supply (DOS): How many days current inventory would last based on current sales rate.
- Operational Efficiency:
- Order Fulfillment Time: While not directly a forecast metric, accurate stock levels contribute to faster picking and delivery.
- Replenishment Efficiency: Smoother, more predictable replenishment cycles.
- Customer Satisfaction (Indirect):
- Reduction in customer complaints related to item unavailability.
- Improvement in overall CSAT scores.
For Zepto, the most critical metrics would likely be:
- Offline: WMAPE (volume-weighted), Forecast Bias (ME/MPE), and Quantile Loss (if using probabilistic forecasts for safety stock).
- Online (via A/B testing): Stockout Rate, Wastage Rate (especially for perishables like milk, bread, vegetables), and potentially overall sales lift in treated stores/categories.
We'd need dashboards to track these metrics continuously and trigger alerts if performance degrades significantly.
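As a rough illustration of how the headline online KPIs could be computed for such a dashboard, the sketch below uses a hypothetical `SkuStoreDay` record. The field names and example values are assumptions, and wastage is measured by quantity relative to units received, which is only one of the possible definitions mentioned above.

```python
from dataclasses import dataclass


@dataclass
class SkuStoreDay:
    sku: str
    store: str
    day: str
    was_oos: bool        # item was out of stock at some point that day
    units_received: int  # units added to stock that day
    units_wasted: int    # units expired or written off that day


def stockout_rate(rows):
    """Percentage of SKU-store-day instances that were out of stock."""
    return 100.0 * sum(r.was_oos for r in rows) / len(rows)


def wastage_rate(rows):
    """Percentage of received units that were wasted (by quantity)."""
    received = sum(r.units_received for r in rows)
    if received == 0:
        raise ValueError("no units received; wastage rate is undefined")
    return 100.0 * sum(r.units_wasted for r in rows) / received


def days_of_supply(on_hand, recent_daily_sales):
    """Days the current stock would last at the recent average sales rate."""
    avg = sum(recent_daily_sales) / len(recent_daily_sales)
    return float("inf") if avg == 0 else on_hand / avg


rows = [
    SkuStoreDay("milk_1l", "store_a", "2024-01-01", False, 40, 2),
    SkuStoreDay("milk_1l", "store_a", "2024-01-02", True, 30, 0),
    SkuStoreDay("bread", "store_a", "2024-01-01", False, 20, 3),
    SkuStoreDay("bread", "store_a", "2024-01-02", False, 10, 0),
]
print(stockout_rate(rows), wastage_rate(rows), days_of_supply(30, [10, 20, 15]))
```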
Round 8: Ethical Considerations & Bias
Ethical Considerations & Bias Mitigation for Zepto's Demand Forecasting:
- Bias from Historical Data (Stockouts & Underrepresentation):
- Concern: If certain products (e.g., niche regional items, specific dietary preference items like vegan products) were frequently out of stock in the past in certain neighborhoods due to poor prior forecasting or supply issues, the historical sales data will underrepresent their true demand. The new system might learn this pattern and continue to under-forecast for these items, creating a self-fulfilling prophecy of unavailability. This could disproportionately affect certain customer segments or cultural preferences.
- Mitigation:
- Robust True Demand Estimation: Invest heavily in accurately estimating demand during OOS periods. This is crucial.
- Monitor Forecasts for Consistently Low-Stocked Items: Flag SKUs that are consistently under-forecasted and frequently OOS, especially if they cater to specific demographic or regional needs. Investigate if this is a model bias or a persistent supply chain issue.
- Feedback from Store Managers: Dark store managers often have good local knowledge. Provide a channel for them to flag items they believe are consistently understocked despite demand.
- Geographical/Socioeconomic Bias (Assortment & Availability):
- Concern: If the model (or the data it's trained on) implicitly learns correlations between neighborhood socioeconomic status/demographics and product demand, it might lead to systematically different (and potentially less diverse or lower quality) assortments in less affluent areas compared to wealthier ones, even if latent demand for certain products exists. For example, under-forecasting healthier or premium options in lower-income areas.
- Mitigation:
- Audit Assortment Diversity: Regularly audit the diversity and quality of product assortments forecasted and stocked across different neighborhood types.
- Fairness-Aware Modeling (Advanced): Explore techniques that can incorporate fairness constraints or re-weight samples to ensure equitable representation of demand across different demographic groups, if such data is available and ethically usable.
- Local Curation Input: Allow for some level of local curation or minimum assortment guarantees for essential items across all stores, regardless of purely data-driven forecasts for those specific items.
- Bias Towards Popular/Mainstream Products:
- Concern: Models might naturally become better at forecasting high-volume, mainstream products, potentially at the expense of accurately forecasting demand for niche, local, or emerging items. This could reduce product diversity over time.
- Mitigation:
- Segmented Modeling/Evaluation: Evaluate model performance specifically for long-tail or niche items. Consider separate modeling strategies or heuristics if global models underperform here.
- Cold Start Strategies: Ensure good cold-start strategies for new and niche items to give them a fair chance.
- Protect Diversity Goals: Business rules might be needed to ensure a certain level of diversity in stocked items, even if some have lower or more volatile forecasted demand.
- Impact of Promotions:
- Concern: If promotional data is biased (e.g., promotions historically targeted only certain customer segments or product types), the model might learn to only boost forecasts for those segments during promotions, missing opportunities elsewhere.
- Mitigation:
- Ensure promotion data used for training is representative or that the model can generalize promotional lift effects.
- Carefully analyze promotional lift across different product types and customer demographics.
- Transparency & Explainability (Internal):
- Concern: If inventory planners or category managers don't understand why the model is making certain forecasts, they may not trust it or may override it incorrectly.
- Mitigation:
- Provide feature importance scores from the model.
- Show confidence intervals or prediction intervals if using probabilistic forecasts.
- Develop tools to allow planners to see key drivers for a particular forecast (e.g., recent trend, upcoming holiday, active promotion).
Regular audits for fairness, continuous monitoring of impact on different segments, and a strong feedback loop involving local store insights and category management are crucial to mitigate these ethical risks and ensure the system serves all customers equitably.
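One of the mitigations above, flagging SKUs that are both frequently out of stock and consistently under-forecasted, can be sketched as a simple audit. The function name, thresholds, and sample rows are hypothetical. Because recorded sales on OOS days are censored, the measured bias understates the true under-forecast, so flagged SKUs are candidates for investigation rather than proof of model bias.

```python
from collections import defaultdict


def flag_understocked(rows, min_oos_rate=0.2, min_bias=0.0):
    """Flag SKUs with a high OOS rate and positive mean error.

    rows: iterable of (sku, actual_sales, forecast, was_oos) tuples.
    Returns (sku, oos_rate, mean_error) tuples, highest OOS rate first.
    Mean error is Average(Actual - Forecast); positive means under-forecast.
    """
    stats = defaultdict(lambda: {"n": 0, "oos": 0, "err": 0.0})
    for sku, actual, forecast, was_oos in rows:
        s = stats[sku]
        s["n"] += 1
        s["oos"] += int(was_oos)
        s["err"] += actual - forecast

    flagged = []
    for sku, s in stats.items():
        oos_rate = s["oos"] / s["n"]
        mean_error = s["err"] / s["n"]
        if oos_rate >= min_oos_rate and mean_error > min_bias:
            flagged.append((sku, oos_rate, mean_error))
    return sorted(flagged, key=lambda item: -item[1])


sample_rows = [
    ("vegan_milk", 6, 4, True),
    ("vegan_milk", 5, 4, False),
    ("vegan_milk", 7, 4, True),
    ("cola", 20, 22, False),
    ("cola", 18, 19, False),
]
flagged = flag_understocked(sample_rows)
print(flagged)
```

Running the same audit per neighborhood type or store cluster would show whether the flags concentrate in particular areas.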
Round 9: Advanced Technical Challenge & Trade-offs
Adapting to Sudden, Unpredictable Events:
- Near Real-Time Anomaly Detection & Alerting on Demand Signals:
- Monitor Point-of-Sale (POS) Data: Continuously monitor actual sales data at a granular level (e.g., hourly or even every 10-15 minutes for very fast movers) for significant deviations from short-term forecasts or typical patterns.
- If sales for "sweet boxes" or "pooja items" suddenly spike across multiple stores in Hyderabad, it might indicate an uncaptured local event or festival.
- Alerting System: If such anomalies are detected (e.g., sales > 3 standard deviations above expected for a sustained period), trigger alerts to human planners/category managers.
- Rapid Manual Override & Adjustment Capabilities:
- Provide an easy-to-use interface for operations or category teams to quickly input information about such events (e.g., "Unexpected local festival in Area X, expect +50% demand for sweets & flowers for 2 days") and apply a temporary uplift or suppression to the system's baseline forecasts for affected SKUs/stores.
- This human-in-the-loop intervention is crucial for events the model couldn't have known.
- Short-Term Reactive Models / Nowcasting (More Advanced):
- Alongside the main 1-7 day forecast, develop very short-term "nowcasting" models (e.g., predicting demand for the next 1-4 hours) that are highly sensitive to the most recent sales trends and external signals (if any can be streamed, like social media sentiment for very specific events).
- These models could use simpler techniques (e.g., exponential smoothing with a high smoothing factor alpha, so that recent observations dominate, or a very reactive ARIMA) and be retrained/updated much more frequently (e.g., hourly). Their output can inform immediate operational decisions within dark stores (e.g., prioritize picking for surging items).
- Exogenous Shock Event Logging & Feature Creation (Post-Event):
- Once such an event is identified (either by anomaly detection or human input), log it meticulously: event type, start/end time, affected SKUs/stores, estimated impact.
- Over time, build a library of these "shock events." For future model retraining, try to create features representing these past shocks (e.g., a binary flag for "similar_past_local_festival_type_X"). This allows the model to potentially learn from past, similar (though not identical) shocks if they have recurring characteristics. This is hard because shocks are often unique.
- Scenario-Based Adjustments / What-If Analysis Tools:
- Provide tools for planners to simulate the impact of potential shocks. For example, "If competitor Y runs a 50% off on dairy, what's the likely impact on our dairy sales?" This might require some pre-defined rules or simpler elasticity models.
- Model Ensembles & Robustness:
- Sometimes, an ensemble of different model types (e.g., a statistical model + a GBDT model) can be more robust to certain types of shocks if one model is less affected than another.
- Focus on Resilience in Inventory Strategy:
- Recognize that no forecast will be perfect for extreme unpredictable events. Part of the solution lies in inventory strategy:
- Slightly higher safety stock for A-class items.
- Agile replenishment capabilities to quickly restock if a sudden surge occurs.
- Fast communication channels between dark stores and central supply chain teams.
For truly "black swan" events, the system's primary role shifts from perfect prediction to rapid detection, enabling quick human intervention, and then learning from the event to improve (marginally) for any future similar occurrences.
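The detection and nowcasting ideas above can be sketched together in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the expected values and standard deviations are assumed to come from the main forecasting system, and the "sustained period" is modeled as a fixed number of consecutive intervals.

```python
def exp_smooth_nowcast(sales, alpha=0.8):
    """One-step-ahead simple exponential smoothing forecast.

    A high alpha makes the level track the most recent observations closely.
    """
    if not sales:
        raise ValueError("need at least one observation")
    level = float(sales[0])
    for y in sales[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def sustained_spike(observed, expected, std, z=3.0, sustain=2):
    """True if the last `sustain` observations all exceed expected + z * std."""
    if len(observed) < sustain:
        return False
    recent = zip(observed[-sustain:], expected[-sustain:], std[-sustain:])
    return all(o > e + z * s for o, e, s in recent)


# Hypothetical hourly sales for one SKU-store with a sudden surge at the end.
hourly_sales = [10, 12, 11, 30, 34]
expected = [11, 11, 11, 11, 11]  # assumed output of the main forecast
std = [3, 3, 3, 3, 3]            # assumed forecast uncertainty

alert = sustained_spike(hourly_sales, expected, std)
reactive = exp_smooth_nowcast(hourly_sales, alpha=0.8)
sluggish = exp_smooth_nowcast(hourly_sales, alpha=0.2)
print(alert, round(reactive, 2), round(sluggish, 2))
```

The comparison between the two alpha values shows why a reactive nowcast uses a high alpha: the low-alpha level is still far below the surge after two elevated hours.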
Reasoning & Loss Function Implication:
- Asymmetric Business Costs:
- The cost of a stockout includes not just the lost margin on that sale, but also potential loss of a customer to a competitor, and damage to Zepto's brand promise. This "customer lifetime value" impact can be substantial.
- The cost of over-forecasting for non-perishables is mainly holding cost. For perishables, it's the cost of goods + disposal, which is significant, but perhaps still less than losing a customer.
- Loss Function Choice/Modification:
- Standard Loss Functions (MAE, MSE): These treat over-prediction and under-prediction errors symmetrically.
- Asymmetric Loss Function: We could use a custom loss function that penalizes under-prediction errors more heavily than over-prediction errors. For example, a modified Mean Squared Error:
`Loss = Cost_Underforecast * (Actual - Forecast)^2 if Actual > Forecast, else Cost_Overforecast * (Actual - Forecast)^2`
Where `Cost_Underforecast > Cost_Overforecast`. The ratio of these costs would need to be determined through business analysis (e.g., `Cost_Underforecast` might be 1.2 to 1.5 times `Cost_Overforecast`).
- Quantile Regression / Quantile Loss: This is a very natural way to achieve this. Instead of predicting the median (the P50 quantile) or the mean, we could train the model to predict a higher quantile, for instance, the P70 or P75 quantile of the demand distribution. Predicting the 75th percentile means that, if the model is well calibrated, demand falls at or below the forecast about 75% of the time, thus inherently building in a buffer against stockouts and accepting a higher chance of some overstock. The quantile loss function automatically handles this asymmetry:
`QuantileLoss_q(y, y_pred) = q * max(0, y - y_pred) + (1-q) * max(0, y_pred - y)`
By choosing `q > 0.5` (e.g., `q=0.75`), we penalize under-predictions more. LightGBM and some deep learning frameworks support quantile regression directly.
- Segment-Specific Strategy:
- This bias towards over-forecasting might be applied more aggressively for:
- High-margin / High-volume (A-class) SKUs.
- Key Value Items (KVIs) that customers expect Zepto to always have (e.g., milk, bread, onions, tomatoes).
- SKUs with very high stockout costs.
- For very low-margin, highly perishable, or very slow-moving items, we might accept a slightly lower service level (i.e., forecast closer to the median or a lower quantile) to minimize wastage.
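One way to turn the segment-specific strategy into concrete target quantiles is the newsvendor critical ratio, `q = Cost_Under / (Cost_Under + Cost_Over)`, which is the optimal quantile when both costs are linear per unit. This framing is an assumption added for illustration and is not stated in the case itself. The segment names and cost values below are hypothetical placeholders for what business analysis would supply.

```python
def target_quantile(cost_under, cost_over):
    """Critical ratio: quantile that balances linear per-unit costs."""
    if cost_under <= 0 or cost_over <= 0:
        raise ValueError("costs must be positive")
    return cost_under / (cost_under + cost_over)


# Hypothetical (cost of one unit short, cost of one unit over) per segment.
SEGMENT_COSTS = {
    "a_class": (4.0, 1.0),          # stockouts are very costly
    "kvi": (3.0, 1.0),              # milk, bread, onions, tomatoes
    "slow_perishable": (1.0, 1.0),  # wastage is as costly as a missed sale
}

SEGMENT_QUANTILES = {
    segment: target_quantile(cost_under, cost_over)
    for segment, (cost_under, cost_over) in SEGMENT_COSTS.items()
}
print(SEGMENT_QUANTILES)
```

With these placeholder costs, A-class and KVI items land in the P75 to P80 range while slow-moving perishables stay at the median.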
So, the primary approach would be to use quantile regression to target a higher percentile (e.g., P70-P80) of the demand distribution for most items, or implement a custom asymmetric loss function. This directly aligns the model's optimization objective with the business priority of minimizing stockouts while being mindful of wastage.
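The two loss options can be written out directly. The sketch below is illustrative: the function names, the default cost weights, and the toy demand series are assumptions. The last helper searches over observed values for the constant forecast that minimizes total quantile loss, to show that a higher `q` pulls the optimal forecast upward.

```python
def asymmetric_squared_loss(actual, forecast, cost_under=1.5, cost_over=1.0):
    """Squared error weighted more heavily when the forecast is too low."""
    err = actual - forecast
    weight = cost_under if err > 0 else cost_over
    return weight * err ** 2


def quantile_loss(actual, forecast, q):
    """Pinball loss for a single observation at quantile q."""
    return q * max(0.0, actual - forecast) + (1 - q) * max(0.0, forecast - actual)


def best_constant_forecast(demand, q):
    """Observed value that minimizes total quantile loss over `demand`."""
    candidates = sorted(set(demand))
    return min(candidates,
               key=lambda c: sum(quantile_loss(y, c, q) for y in demand))


# Toy daily demand for one SKU-store.
demand = [8, 10, 12, 14, 16, 18, 20]
p50 = best_constant_forecast(demand, 0.5)
p75 = best_constant_forecast(demand, 0.75)
print(p50, p75)
```

In LightGBM, the equivalent training setup is the `quantile` objective with its `alpha` parameter set to the target quantile; an asymmetric squared loss would instead need a custom objective supplying gradients and Hessians.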
Interview Conclusion
What to Learn from This Case
- Clarify Time Series Specifics: Understand demand definition, forecast horizon/granularity, stockout handling, and business costs of over/under forecasting.
- Rich Data Sources are Key: Identify internal (sales, promotions, product/store metadata, OOS data) and external (calendar, weather) data. Anticipate quality issues.
- Time Series Feature Engineering is Crucial: Detail lags, rolling stats, calendar/holiday features, promotion effects, price features, and external regressors. Explain how they capture temporal dynamics.
- Correct Time Series Splitting: Emphasize walk-forward validation or rolling origin validation to prevent data leakage and realistically simulate production deployment.
- Model Selection Trade-offs: Discuss pros/cons of classical, ML (tree-based), and DL (sequence) models in the context of scalability, feature richness, and interpretability. Global models are often preferred for many series.
- Address Cold Start: Have clear strategies for new products and new locations using attribute similarity or leveraging global model features.
- Scalable System Architecture: Design for batch training/inference pipelines using orchestration tools, distributed processing, feature stores, and model registries.
- Phased Rollout & Risk Management: Detail shadow mode, pilot programs, A/B testing on operational KPIs, and clear rollback plans.
- Time Series Evaluation Metrics: Use MAE, RMSE, MAPE, WMAPE, MASE, forecast bias, and quantile loss. Connect these to business KPIs like stockout rate, wastage, and inventory turnover.
- Ethical Considerations & Bias: Discuss potential biases from historical data (stockouts), geographical disparities, popularity, and how to mitigate them through data, modeling, and process.
- Handling Unpredictable Shocks: Propose strategies like anomaly detection, human overrides, reactive short-term models, and learning from past events.
- Align Loss Functions with Business Objectives: Use asymmetric loss functions or quantile regression to penalize more costly errors (e.g., under-forecasting leading to stockouts).
- Communicate Business Impact: Consistently link technical decisions to their impact on business objectives and user experience.