Karthikeya Silk House: Wedding Season Recommendation Test

Problem Statement

"Karthikeya Silk House," a renowned saree retailer with branches across Hyderabad, Vijayawada, and Rajahmundry, wants to test a new personalized recommendation system during the wedding season. The system suggests sarees based on customers' previous purchases and browsing history. Based on a pilot study with offline metrics, the new system appears to perform better than their current recommendation approach.

A/B Test Design for Recommendation System

MODERATE

How would you design an A/B test to evaluate if the new recommendation system actually increases average purchase value and customer satisfaction across different regional branches (Hyderabad, Vijayawada, Rajahmundry) with varying customer preferences (like Gadwal sarees in Telangana vs. Uppada sarees in coastal Andhra)?

Solution

Karthikeya Silk House wants to see if their new saree recommendation system helps customers spend more and be happier, especially during the busy wedding season. They have branches in Hyderabad, Vijayawada, and Rajahmundry, and people in different areas like different sarees (e.g., Gadwal in Telangana, Uppada in coastal Andhra).

Here's how we'd design a fair test (A/B Test):

Two Groups: We'll randomly divide customers (whether they shop on the website or, if the system also runs in-store, at a branch) into two groups:
- Group A (Control): Sees the current, old recommendation system.
- Group B (Treatment): Sees the new, personalized recommendation system.
Random is Key: It's crucial that customers are put into Group A or B randomly. This helps ensure the groups are similar in other ways, so any difference we see is likely due to the recommendation system.
Testing Everywhere, But Watching Separately: We run the test for customers across all branches (Hyderabad, Vijayawada, Rajahmundry). However, when we look at the results, we'll also analyze each branch separately, and even by saree preference (e.g., how did it do for Gadwal saree lovers vs. Uppada saree lovers?). This is called segmentation (doing the same split before the random assignment, so each segment gets a balanced A/B mix, is called stratification), and it helps us see if the new system works well everywhere or only in certain places or for certain sarees.
What We Measure: We'll track if customers buy more expensive sarees (average purchase value) and if they say they are happy with the suggestions (customer satisfaction, maybe via a quick survey).
How Long: We need to run the test long enough (e.g., a few weeks of the wedding season) to get enough data and to cover different types of shopping days.

This way, Karthikeya Silk House can confidently see if the new system is truly better across all their important Telugu customer groups and for various traditional sarees.
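
As a rough illustration of "watching separately", here is a minimal Python sketch that compares average purchase value between the two groups within each branch. The order records, their values, and the function name `aov_by_branch` are all made up for the example; a real analysis would read from the retailer's order data and would also need a significance test per branch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order records: (branch, group, order value in rupees).
orders = [
    ("Hyderabad", "A", 12000), ("Hyderabad", "A", 18000),
    ("Hyderabad", "B", 15000), ("Hyderabad", "B", 21000),
    ("Vijayawada", "A", 9000), ("Vijayawada", "A", 11000),
    ("Vijayawada", "B", 10000), ("Vijayawada", "B", 14000),
    ("Rajahmundry", "A", 8000), ("Rajahmundry", "A", 10000),
    ("Rajahmundry", "B", 9000), ("Rajahmundry", "B", 9000),
]

def aov_by_branch(records):
    """Average order value per group and percentage lift of B over A, per branch.

    Assumes every branch has at least one order in both groups.
    """
    values = defaultdict(lambda: defaultdict(list))
    for branch, group, value in records:
        values[branch][group].append(value)

    summary = {}
    for branch, groups in values.items():
        aov_a, aov_b = mean(groups["A"]), mean(groups["B"])
        summary[branch] = {
            "A": aov_a,
            "B": aov_b,
            "lift_pct": 100 * (aov_b - aov_a) / aov_a,
        }
    return summary

branch_summary = aov_by_branch(orders)
for branch, row in branch_summary.items():
    print(f"{branch}: A={row['A']:.0f} B={row['B']:.0f} lift={row['lift_pct']:+.1f}%")
```

In this toy data the new system lifts purchase value in Hyderabad and Vijayawada but not in Rajahmundry, which is exactly the kind of difference an overall average would hide.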

To design an A/B test for Karthikeya Silk House's new personalized recommendation system, aiming to increase average purchase value and customer satisfaction across diverse regional branches (Hyderabad, Vijayawada, Rajahmundry) with varying saree preferences (e.g., Gadwal vs. Uppada), I would propose the following design:

1. Define Objective and Scope:
- Objective: To determine if the new personalized recommendation system (Variant B) leads to a statistically significant increase in average purchase value and customer satisfaction compared to the current system (Variant A) during the wedding season.
- Scope: The test would run on the e-commerce platform (if applicable) and/or on in-store digital interfaces where recommendations are shown. It would target customers interacting with saree product pages or specific recommendation widgets.
2. Participant Randomization and Groups:
- Unit of Randomization: Individual customers (e.g., based on user ID for logged-in users, or session ID/cookie for guest users).
- Groups:
  - Variant A (Control): Customers are exposed to the current recommendation system.
  - Variant B (Treatment): Customers are exposed to the new personalized recommendation system.
- Assignment: Typically a 50/50 random split. Ensure users are consistently assigned to the same variant across their session and, if possible, across multiple sessions.
3. Handling Regional Preferences (Stratification/Segmentation):
- Pre-Test Stratification (Optional but Recommended): If technically feasible and baseline data on regional preferences (e.g., primary branch association like Hyderabad, Vijayawada, Rajahmundry, or strong interest in Gadwal vs. Uppada sarees) is available, we could stratify the randomization. This means ensuring a balanced split into A and B within each key region or preference segment. This helps improve the precision of segment-specific estimates.
- Post-Test Segmentation (Essential): Regardless of pre-test stratification, it's crucial to analyze the results by segments:
  - By Branch Location (Hyderabad, Vijayawada, Rajahmundry).
  - By Inferred Saree Preference (e.g., users who primarily browse/purchase Gadwal, Uppada, Kanjeevaram, Pochampally sarees, etc.). This might require a system to tag users based on their interaction history.
  This allows Karthikeya Silk House to understand if the new system performs uniformly well or if its effectiveness varies, informing targeted rollouts or further personalization.
4. Key Metrics (KPIs) (covered in detail in the next question; listed here because they drive the design):
- Primary: Average Order Value (AOV), Conversion Rate (from recommendation interaction to purchase).
- Secondary: Customer Satisfaction Score (CSAT) via post-interaction/purchase survey, Items Per Order (IPO), Revenue Per User (RPU), Click-Through Rate (CTR) on recommendations.
5. Sample Size and Duration:
- Calculate required sample size based on baseline AOV and its variability, desired Minimum Detectable Effect (MDE) for AOV, statistical power (e.g., 80%), and significance level (e.g., 5%). If segment-level conclusions (per branch or per saree preference) are a primary goal, size the test so that each segment reaches the required sample on its own, which means a larger overall sample.
- Run the test for a sufficient duration during the wedding season to capture enough data and representative user behavior (e.g., 2-4 weeks, considering typical purchase cycles for wedding sarees).
6. Implementation Details:
- Ensure robust tracking for all defined metrics for both variants.
- Minimize technical differences between variants other than the recommendation logic itself (e.g., ensure similar loading times for recommendation widgets).
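
The sample-size step in point 5 can be sketched with the standard normal-approximation formula for comparing two means, n per group = 2 (z_alpha + z_power)² σ² / MDE². The function name `sample_size_per_group` and the input figures (an AOV standard deviation of ₹8,000 and an MDE of ₹500) are placeholders for illustration, not values from Karthikeya Silk House; the real inputs would come from historical order data.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sigma, mde, alpha=0.05, power=0.80, two_sided=True):
    """Approximate orders needed per variant to detect a difference in means.

    sigma: assumed standard deviation of order value (same in both groups)
    mde:   minimum detectable effect, in the same units as sigma
    """
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / mde) ** 2)

# Hypothetical inputs: sigma = Rs. 8,000, MDE = Rs. 500.
n_two_sided = sample_size_per_group(sigma=8000, mde=500)
n_one_sided = sample_size_per_group(sigma=8000, mde=500, two_sided=False)
print(n_two_sided, n_one_sided)
```

Because the required sample grows with (σ / MDE)², halving the MDE roughly quadruples the number of orders needed. Order values are usually right-skewed, so this should be treated as a planning estimate rather than an exact requirement.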

This design allows for an overall comparison while also providing crucial insights into how the new recommendation system performs for different customer segments and regional preferences, which is vital for a retailer like Karthikeya Silk House with a diverse Telugu customer base.
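
One common way to get the consistent 50/50 assignment described in point 2 is to hash a stable customer identifier instead of drawing a random number on every visit. The sketch below assumes a string customer ID and a made-up experiment name; `assign_variant` is a hypothetical helper, not part of any existing system.

```python
import hashlib

def assign_variant(customer_id: str, experiment: str = "reco-wedding-season") -> str:
    """Deterministically map a customer to variant "A" or "B" (about 50/50).

    Hashing the experiment name together with the ID means the same customer
    always lands in the same variant for this test, while a different
    experiment name would reshuffle customers independently.
    """
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in 0..99
    return "B" if bucket < 50 else "A"

# The same ID always gets the same answer, across sessions and devices.
print(assign_variant("customer-1042"), assign_variant("customer-1042"))
```

This is simple randomization: it balances the groups within each branch only approximately. If exact balance per branch or preference segment is required, the assignment would need to be done within each stratum instead. Guest users identified only by a cookie may also be assigned again if the cookie is cleared.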

Hypotheses and Metrics for Wedding Season

MODERATE

For Karthikeya Silk House's A/B test, what would be your null and alternative hypotheses? What specific metrics would you track that align with both business goals (increasing purchase value, customer satisfaction) and Telugu wedding shopping patterns (e.g., high-value purchases, multiple saree purchases for different ceremonies)?

Solution

For Karthikeya Silk House's test of the new saree recommendation system, we need a clear "guess" and ways to measure if it's true.

Our Guesses (Hypotheses):

Null Hypothesis (H₀ - "No Change" Guess): The new personalized recommendation system does not increase how much money customers spend on average (average purchase value) and does not make them happier (customer satisfaction); compared to the old system, it is no better and might even be worse.
Alternative Hypothesis (H₁ - "It Works!" Guess): The new personalized recommendation system does increase how much money customers spend on average AND/OR it does make them happier. (We're hoping for this!)

What We'll Track (Metrics), Keeping Telugu Wedding Shopping in Mind:

Telugu weddings often mean buying special, high-value sarees (like beautiful Gadwal or Uppada silks) and sometimes several sarees for different ceremonies or family members.

Main Goals to Measure (Primary Metrics):
- Average Order Value (AOV): How much money does each customer spend on average per purchase? We hope the new system suggests sarees that lead to bigger bills.
- Conversion Rate from Recommendation: Of the customers who are shown recommendations, what percentage interact with one (click on it) and end up buying a saree?
Other Important Clues (Secondary Metrics):
- Customer Satisfaction (CSAT): After they buy or see recommendations, we can ask them (e.g., with a simple star rating or short survey) how much they liked the suggestions. Happy customers are good for business!
- Items Per Order (IPO): Are customers buying more sarees in one go? This is important for wedding shopping.
- Revenue Per User (RPU): Overall revenue generated per user exposed to the system.
- Click-Through Rate (CTR) on Recommendations: Are people even clicking on the recommended sarees? This shows if the suggestions are catching their eye.
- Purchase of High-Value Sarees: We can specifically track if the new system leads to more sales of expensive sarees typically bought for weddings.

By tracking these, Karthikeya Silk House can see if the new recommendation system truly helps their business and makes saree shopping better for their Telugu customers during the important wedding season.

For Karthikeya Silk House's A/B test on the new personalized recommendation system, the null and alternative hypotheses, along with specific metrics, would be defined as follows:

Null and Alternative Hypotheses:

We would typically state a pair of hypotheses for each primary metric. For example, for average purchase value, measured as Average Order Value (AOV), and for Customer Satisfaction (CSAT):

For Average Purchase Value (AOV):
- Null Hypothesis (H₀): The new personalized recommendation system (Variant B) results in an AOV that is less than or equal to the AOV from the current system (Variant A). (μ_{B_AOV} ≤ μ_{A_AOV})
- Alternative Hypothesis (H₁): The new personalized recommendation system (Variant B) results in a higher AOV than the current system (Variant A). (μ_{B_AOV} > μ_{A_AOV})
  (This is a one-tailed test, as the business goal is improvement. A two-tailed test, μ_{B_AOV} ≠ μ_{A_AOV}, could also be used if any significant change is of interest, but usually, we look for positive impact.)
For Customer Satisfaction (CSAT):
- Null Hypothesis (H₀): The new personalized recommendation system (Variant B) results in a CSAT score that is less than or equal to the CSAT score from the current system (Variant A). (μ_{B_CSAT} ≤ μ_{A_CSAT})
- Alternative Hypothesis (H₁): The new personalized recommendation system (Variant B) results in a higher CSAT score than the current system (Variant A). (μ_{B_CSAT} > μ_{A_CSAT})
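
To make the one-tailed hypotheses concrete, here is a minimal sketch of a one-sided test for H₁: μ_B > μ_A using unequal-variance (Welch-style) standard errors and a normal approximation, which is reasonable only for large samples. The order values are invented, and `one_sided_mean_test` is a hypothetical helper. For small samples a t-distribution is more appropriate, for example via SciPy's `scipy.stats.ttest_ind` with `equal_var=False` and `alternative="greater"`.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def one_sided_mean_test(control, treatment):
    """Test H0: mu_B <= mu_A against H1: mu_B > mu_A.

    Returns (difference in means, z statistic, one-sided p-value).
    """
    diff = mean(treatment) - mean(control)
    std_err = sqrt(variance(control) / len(control)
                   + variance(treatment) / len(treatment))
    z = diff / std_err
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return diff, z, p_value

# Hypothetical order values in rupees: 200 orders per variant.
control_aov = [10000, 12000, 14000, 16000, 18000] * 40
treatment_aov = [value + 1000 for value in control_aov]

diff, z, p_value = one_sided_mean_test(control_aov, treatment_aov)
print(f"difference={diff:.0f}, z={z:.2f}, p={p_value:.4f}")
```

We would reject H₀ at the 5% level when the p-value is below 0.05. The same function applies to CSAT scores if they are treated as numeric.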

Specific Metrics to Track:

These metrics should align with Karthikeya Silk House's business goals and reflect typical Telugu wedding shopping patterns (e.g., high-value items, multiple purchases for different events/family members, interest in traditional sarees like Gadwal or Uppada):

Primary Metrics:
- 1. Average Order Value (AOV): Total revenue from transactions influenced by recommendations / Number of transactions influenced by recommendations.
  - Relevance: Directly measures if the new system encourages customers to purchase higher-value sarees, crucial during wedding season when budgets are often larger.
- 2. Recommendation-Influenced Conversion Rate: Number of users who make a purchase after interacting with a recommendation / Total number of users who were shown recommendations from that system.
  - Relevance: Indicates how effective the recommendations are at turning browsing into actual sales.
Secondary Metrics:
- 3. Customer Satisfaction (CSAT): Measured via post-interaction surveys (e.g., "How relevant were these saree suggestions?") or overall post-purchase satisfaction if recommendations were a key part of their journey.
  - Relevance: A good recommendation system should enhance the shopping experience, especially for important occasions like weddings.
- 4. Items Per Order (IPO) for recommendation-influenced orders:
  - Relevance: Telugu wedding shopping often involves buying multiple sarees (for the bride, for gifting, for different ceremonies). A good system might suggest complementary items or sarees for other related needs.
- 5. Revenue Per User (RPU) or Revenue Per Visitor (RPV): Total revenue / Total users (or visitors) exposed to each system.
  - Relevance: Provides an overall view of the monetary impact per user.
- 6. Click-Through Rate (CTR) on Recommended Sarees: Clicks on recommended items / Impressions of recommended items.
  - Relevance: Measures the initial appeal and relevance of the suggested sarees (e.g., Gadwal, Uppada).
- 7. Add-to-Cart (ATC) Rate from Recommendations:
  - Relevance: Indicates stronger purchase intent from the recommendations.
- 8. Proportion of High-Value Sarees in Recommendation-Influenced Orders: Track the percentage of sales from premium/wedding collections (e.g., sarees above a certain price point, or specific types like authentic Gadwal or Uppada pattu sarees) that were influenced by recommendations.
  - Relevance: Directly addresses the goal of selling more high-value items during the wedding season.
- 9. Engagement with Recommendation Widget: Time spent viewing recommendations, scroll depth within the widget.
Guardrail Metrics (to ensure no negative impact):
- Overall Site Conversion Rate: Ensure the new system doesn't negatively impact overall conversion.
- Page Load Time: Ensure the new recommendation system doesn't slow down the website.
- Bounce Rate on pages with recommendations.
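
The ratio metrics above can be computed per variant from per-user totals. The sketch below uses a made-up `UserSummary` record and toy numbers. For simplicity it assumes every order counts as recommendation-influenced, whereas a real pipeline would need an explicit attribution rule (for example, a purchase within some window after a recommendation click).

```python
from dataclasses import dataclass

@dataclass
class UserSummary:
    """Per-user totals over the test period (hypothetical schema)."""
    variant: str      # "A" (current system) or "B" (new system)
    impressions: int  # recommended sarees shown
    clicks: int       # clicks on recommended sarees
    orders: int       # orders placed (assumed recommendation-influenced)
    items: int        # sarees across those orders
    revenue: float    # total order value in rupees

def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def metrics_for(users, variant):
    """Compute AOV, conversion rate, IPO, RPU and CTR for one variant."""
    rows = [u for u in users if u.variant == variant]
    orders = sum(u.orders for u in rows)
    revenue = sum(u.revenue for u in rows)
    return {
        "aov": safe_div(revenue, orders),
        "conversion_rate": safe_div(sum(1 for u in rows if u.orders > 0), len(rows)),
        "items_per_order": safe_div(sum(u.items for u in rows), orders),
        "revenue_per_user": safe_div(revenue, len(rows)),
        "ctr": safe_div(sum(u.clicks for u in rows), sum(u.impressions for u in rows)),
    }

users = [
    UserSummary("A", 20, 2, 1, 1, 12000.0), UserSummary("A", 10, 0, 0, 0, 0.0),
    UserSummary("A", 30, 3, 0, 0, 0.0), UserSummary("A", 40, 5, 1, 2, 20000.0),
    UserSummary("B", 20, 4, 1, 2, 24000.0), UserSummary("B", 10, 1, 0, 0, 0.0),
    UserSummary("B", 30, 6, 2, 3, 36000.0), UserSummary("B", 40, 5, 0, 0, 0.0),
]
metrics_a = metrics_for(users, "A")
metrics_b = metrics_for(users, "B")
print(metrics_a)
print(metrics_b)
```

Note that AOV and RPU can move differently: here both variants convert the same share of users, yet B earns more per user because its buyers place more and larger orders.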

Tracking these metrics will provide a comprehensive view of the new system's performance, helping Karthikeya Silk House understand not just the direct financial impact but also its effect on customer experience during the critical wedding shopping period across their Hyderabad, Vijayawada, and Rajahmundry customer base.

Accounting for Seasonal Variations

ADVANCED

How would you account for seasonal variations in saree purchasing behavior during major Telugu festivals (like Dasara, Sankranti) and specific wedding muhurtham dates when conducting this A/B test for Karthikeya Silk House's new recommendation system?

Solution

Saree shopping for Telugu weddings and festivals like Dasara or Sankranti has its own rhythm – some days are super busy, others are quieter. We need to make sure these natural ups and downs don't confuse our test results for Karthikeya Silk House.

How to Handle the Busy Seasons:

Run Both Systems at the Same Time: The most important thing is that both the old (Group A) and new (Group B) recommendation systems are active simultaneously. So, if there's a big rush for wedding muhurtham dates, both groups experience that rush. This way, the "season effect" impacts both equally, and we can still see the difference caused by the recommendation system.
Test for a Decent Period: Don't just test for one super busy weekend. Run the test for several weeks. This helps average out the super busy days and the normal days, giving a more balanced view of how the new system performs overall. Ideally, cover a full cycle of shopping behavior if possible.
Look at Busy Times Separately (Segmentation): After the test, we can specifically compare how the new system performed during peak wedding muhurtham dates versus other days. Maybe the new system is especially helpful when customers are in a hurry and need good suggestions quickly, or perhaps it helps them discover more items during less frantic periods. This helps understand when it's most effective.
Keep an Eye on External Events: We should note down when major festivals or clusters of wedding dates occur during the test. This helps in explaining any unusual spikes or dips in the data.

By doing this, Karthikeya Silk House can get a clearer picture of how their new saree recommendation system works, even with all the exciting hustle and bustle of Telugu wedding and festival seasons in Hyderabad, Vijayawada, and Rajahmundry.

Accounting for seasonal variations in saree purchasing behavior during major Telugu festivals (like Dasara, Sankranti) and specific wedding muhurtham dates is crucial for obtaining reliable results from Karthikeya Silk House's A/B test. These periods often see significant spikes in demand, changes in customer intent (e.g., higher urgency, larger budgets), and shifts in preferred saree types (e.g., traditional Gadwal or Uppada silks for weddings).

Here’s how I would approach this:

1. Concurrent Experimentation:
- The most fundamental principle is to ensure that both Variant A (control) and Variant B (treatment) run simultaneously throughout the entire test period. This means any external seasonal factor (e.g., a surge in demand during a wedding muhurtham week) will affect both groups equally, allowing the comparison between them to remain valid for isolating the feature's effect.
2. Sufficient Test Duration:
- Run the A/B test for a duration that is long enough to encompass typical seasonal cycles or multiple instances of peak events (if feasible). For instance, if testing during the broader "wedding season," aim for several weeks to average out the impact of specific clusters of muhurtham dates.
- This helps ensure that the results are not skewed by an unusually high or low period that might coincide with only a part of the test.
3. Segmentation by Time Period / Event:
- During analysis, segment the data based on different time periods:
  - Peak wedding muhurtham days/weeks vs. non-peak days/weeks.
  - Festival periods (e.g., Dasara, Sankranti) vs. regular periods.
- Analyze the performance of Variant A vs. Variant B separately for these segments. This can reveal if the new recommendation system's effectiveness changes with seasonal demand (e.g., it might be more impactful during high-intent periods or, conversely, less noticeable amidst a general buying frenzy).
- This provides richer insights than a single aggregated result, helping Karthikeya Silk House understand the feature's performance under different market conditions.
4. Covariate Adjustment / Regression Modeling:
- If precise dates of festivals or peak muhurthams are known, these can be incorporated as covariates in a regression model analyzing the A/B test results. This statistical technique can help to control for the effect of these known seasonal events, providing a more precise estimate of the treatment effect.
- For example, a model could be: `Sales = β₀ + β₁*IsTreatment + β₂*IsWeddingPeak + β₃*(IsTreatment*IsWeddingPeak) + ...` Here β₁ is the treatment effect on non-peak days, β₂ is the seasonal shift for the control group during wedding peaks, and β₃ is the additional treatment effect during peaks (the interaction).
5. Pre-Period and Post-Period Analysis (if applicable):
- If the test is run for a very long duration, one could compare trends before, during, and after major seasonal peaks. This is closer to observational methods such as Difference-in-Differences (DiD), which are used when a randomized A/B setup is hard to sustain over long periods, but the same before/during/after view can help interpret the A/B results.
6. Avoid Starting/Stopping Mid-Event:
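- Try not to start or end the A/B test right in the middle of a major festival or a cluster of highly auspicious wedding dates. Because both variants run concurrently, a mid-event boundary does not favor one group over the other, but it captures only the ramp-up or the wind-down of the event, so the measured effect may not be representative of the full peak period.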
7. Monitor External Factors:
- Keep a log of significant external events that occur during the test period (e.g., competitor promotions, major local events in Hyderabad, Vijayawada, or Rajahmundry) that could influence purchasing behavior, even if they are not strictly seasonal in the same way as festivals.
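
For the interaction model in point 4, when the only regressors are the two 0/1 indicators and their product, the least-squares coefficients can be read directly off the four cell means, so no regression library is needed for a first look. The sketch below uses invented sales figures and a hypothetical helper `interaction_coefficients`. With additional covariates, or to get standard errors, a proper regression package would be needed.

```python
from statistics import mean

def interaction_coefficients(rows):
    """Coefficients of Sales = b0 + b1*T + b2*P + b3*(T*P) from cell means.

    rows: iterable of (is_treatment, is_wedding_peak, sales) with 0/1 flags.
    Assumes every one of the four (T, P) cells has at least one observation.
    """
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for is_treatment, is_peak, sales in rows:
        cells[(is_treatment, is_peak)].append(sales)
    m = {key: mean(values) for key, values in cells.items()}

    return {
        "b0": m[(0, 0)],                                         # control, non-peak
        "b1": m[(1, 0)] - m[(0, 0)],                             # treatment effect, non-peak
        "b2": m[(0, 1)] - m[(0, 0)],                             # peak shift for control
        "b3": (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)]), # extra effect in peak
    }

# Hypothetical per-order sales in rupees.
rows = [
    (0, 0, 10000), (0, 0, 12000), (1, 0, 12000), (1, 0, 14000),
    (0, 1, 15000), (0, 1, 17000), (1, 1, 20000), (1, 1, 22000),
]
coefficients = interaction_coefficients(rows)
print(coefficients)
```

In this toy data the treatment effect during wedding peaks is b1 + b3, which is larger than the non-peak effect b1, illustrating a system that helps more when purchase intent is high.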

By employing these strategies, Karthikeya Silk House can gain a more robust understanding of how their new personalized recommendation system performs, disentangling its effect from the powerful seasonal currents that shape saree purchasing patterns in the Telugu states.

Drape Your A/B Expertise!

What are your thoughts on these scenarios? Try answering the questions yourself and share your insights or alternative approaches in the comments section below!
