Untitled — Python Coding — Nerchuko Academy

Replace Words with Roots (Stemming)

MEDIUM

You are given a dictionary of "roots" (e.g., a list or set of strings like {"cat", "bat", "rat"}) and a "sentence" (a string of words, e.g., "the cattle was rattled by the battery"). Write a Python function to replace each word in the sentence with its corresponding root if the word is a "successor" of a root (i.e., the word starts with a root).

If a word can be formed by multiple roots, it should be replaced by the shortest root.

If a word is not a successor of any root, it should remain unchanged.

Examples:

Example 1:

Input: roots = ["cat", "bat", "rat"], sentence = "the cattle was rattled by the battery"
Output: "the cat was rat by the bat"

Example 2:

Input: roots = ["a", "b", "c"], sentence = "apple banana carrot"
Output: "a b c"

Example 3 (Shortest Root):

Input: roots = ["sub", "substitute", "replace"], sentence = "substitute all replacements"
Output: "sub all replace" ( "substitute" is replaced by "sub" because it's shorter)

Example 4 (No Replacement):

Input: roots = ["xyz"], sentence = "a quick brown fox"
Output: "a quick brown fox"

Constraints:

The input roots will be a list of strings.
The input sentence will be a string.
Roots and words consist of lowercase English letters.

Function Signature (Python):

from typing import List, Set # Set is good for roots for O(1) avg lookup if needed, but list is fine too

class Solution:
    def replace_with_roots(self, roots: List[str], sentence: str) -> str:
        # Your code here
        pass

Solution: Replace Words with Roots

The Goal: We have a list of "root" words (like "cat", "bat") and a sentence ("the cattle was rattled by the battery"). If a word in the sentence starts with one of our roots (e.g., "cattle" starts with "cat"), we replace that word with the root. If a word could be formed by multiple roots (e.g., if "sub" and "substitute" are roots, and the word is "substituting"), we pick the shortest root ("sub").

Approach: Iterative Processing with Shortest Root Selection

The general strategy is:

Split the input sentence into individual words.
For each word in the sentence:
1. Initialize a variable (e.g., best_root_for_word) to store the shortest root found so far for this word (initially none).
2. Iterate through all the roots in the given roots list/set.
3. If the current word startswith() a root:
  - If best_root_for_word is currently none, or if the current root is shorter than the current best_root_for_word, then update best_root_for_word to this current root.
4. After checking all roots, if a best_root_for_word was found, this word in our processed list becomes the best_root_for_word. Otherwise, it remains the original word.
Join the list of processed words back into a sentence string, separated by spaces.

Using a set for roots can be beneficial if you were checking for exact matches. For startswith, you still need to iterate. Sorting the roots by length beforehand can be an optimization if you want to find the first (and thus shortest) match more quickly, but you'd still need to ensure it's *the* shortest among all possible matches.

from typing import List, Set

class Solution:
    def replace_with_roots(self, roots: List[str], sentence: str) -> str:
        # Convert roots to a set for potentially faster lookups if we were doing exact matches,
        # but for startswith, we still need to iterate. Sorting roots by length can help find
        # a short root quickly, but we must ensure it's the *shortest overall*.
        # For this problem, iterating through all roots for each word is straightforward.
        root_set = set(roots) # Using a set helps if we pre-filter roots based on word length for opt.

        words = sentence.split(' ')
        result_words = []

        for word in words:
            shortest_root_found = None
            for root in root_set: # Iterate through all potential roots
                if word.startswith(root):
                    if shortest_root_found is None or len(root) < len(shortest_root_found):
                        shortest_root_found = root
            
            if shortest_root_found:
                result_words.append(shortest_root_found)
            else:
                result_words.append(word)
                
        return " ".join(result_words)

# Example Usage:
# sol = Solution()
# roots1 = ["cat", "bat", "rat"]
# sentence1 = "the cattle was rattled by the battery"
# print(f"Input: roots={roots1}, sentence='{sentence1}'")
# print(f"Output: '{sol.replace_with_roots(roots1, sentence1)}'") # Expected: the cat was rat by the bat

# roots2 = ["sub", "substitute", "replace"]
# sentence2 = "substitute all replacements"
# print(f"\nInput: roots={roots2}, sentence='{sentence2}'")
# print(f"Output: '{sol.replace_with_roots(roots2, sentence2)}'") # Expected: sub all replace

Complexity Analysis:

Let N be the number of words in the sentence, R be the number of roots in the dictionary, L_{max_word} be the maximum length of a word in the sentence, and L_{max_root} be the maximum length of a root.

Splitting the sentence: O(S) where S is the length of the sentence string. This results in N words.
For each of the N words:
- We iterate through all R roots.
- word.startswith(root) takes time proportional to the length of the root, O(L_{max_root}) in the worst case.
Joining the words: O(S') where S' is the length of the resulting sentence string.

So, the dominant part is the nested loop: Time Complexity: O(N * R * L_{max_root}). If we convert `roots` to a set first, it's O(sum of lengths of roots). The iteration for `startswith` remains.

Space Complexity: O(N + R) to store the words and the roots (or set of roots). If the output sentence string is considered, it's also proportional to its length.

Approach 2: Using a Trie (Prefix Tree) for Roots (Advanced Optimization)

For a very large dictionary of roots, iterating through all roots for every word can be inefficient. A Trie (prefix tree) can optimize the process of finding if a word starts with any of the roots, and finding the shortest such root.

Build Trie: Insert all roots into a Trie. Mark the end of each root word in the Trie.
Process Sentence: For each word in the sentence:
- Traverse the Trie character by character with the current word.
- Keep track of the shortest root found during traversal. If you reach a node in the Trie that is marked as the end of a root, that root is a candidate. Continue traversing: if you find another, shorter root that is a prefix, update your choice.
- If no root prefix is found, use the original word. Otherwise, use the shortest root found.
Join the words back.

Implementing a Trie is more involved and might be beyond the scope of an initial question unless the interviewer specifically pushes for performance optimization with a very large root dictionary.

# Conceptual structure for Trie approach (implementation details omitted for brevity here)
class TrieNode:
    def __init__(self):
        self.children = {} # char -> TrieNode
        self.is_end_of_root = False
        self.root_word = None # Store the actual root word here if it ends here

class SolutionWithTrie:
    def replace_with_roots_trie(self, roots: List[str], sentence: str) -> str:
        # 1. Build Trie from roots
        trie_root_node = TrieNode()
        for root in roots:
            node = trie_root_node
            for char in root:
                if char not in node.children:
                    node.children[char] = TrieNode()
                node = node.children[char]
            node.is_end_of_root = True
            node.root_word = root # Store the full root string

        words = sentence.split(' ')
        result_words = []

        for word in words:
            node = trie_root_node
            shortest_found_root = word # Default to original word
            for i, char in enumerate(word):
                if char not in node.children:
                    break # No root starts with this prefix further
                node = node.children[char]
                if node.is_end_of_root:
                    # Found a root that is a prefix of the word.
                    # Since we want the shortest, and we are traversing char by char,
                    # the first one we find will be the shortest.
                    shortest_found_root = node.root_word 
                    break # Found shortest root for this word
            result_words.append(shortest_found_root)
            
        return " ".join(result_words)

Complexity Analysis (Trie Approach):

Let R be the number of roots, L_{avg_root} be the average length of a root, N be the number of words in the sentence, and L_{avg_word} be the average length of a word.

Trie Construction: O(Total characters in all roots) = O(R * L_{avg_root}).
Processing Sentence: For each of N words, we traverse the Trie up to the length of the word (L_{avg_word}). This is O(N * L_{avg_word}).

Time Complexity: O(R * L_{avg_root} + N * L_{avg_word}). This is generally much better than the O(N * R * L_{max_root}) of the naive approach if R is large.

Space Complexity: O(R * L_{avg_root}) for storing the Trie.

Key Takeaways for Interviews:

Clarify Requirements: Ask about handling of case sensitivity (problem says lowercase), punctuation in sentence (assume simple space separation as per example), and tie-breaking for roots of same shortest length (here, any of them is fine).
Basic Approach: The iterative approach (Approach 1) is a good starting point to demonstrate understanding. Clearly explain the logic for finding a match and then selecting the shortest root.
String Methods: Show proficiency with .split(), .startswith(), and .join().
Data Structure for Roots: Using a set for roots might be mentioned for general unique storage, but for startswith, iteration is still needed. Sorting the roots by length (shortest first) and then iterating can allow early exit if the *first* match found is guaranteed to be the *overall shortest*. However, simply iterating all roots and tracking the shortest found is more straightforward to implement correctly under pressure.
Optimization (Trie): For a follow-up on performance with many roots, mentioning a Trie as an optimized data structure for prefix matching shows advanced knowledge. Be prepared to briefly explain how it would work.