Replace Words with Roots (Stemming)
You are given a dictionary of "roots" (e.g., a list or set of strings like {"cat", "bat", "rat"}) and a "sentence" (a string of words, e.g., "the cattle was rattled by the battery"). Write a Python function to replace each word in the sentence with its corresponding root if the word is a "successor" of a root (i.e., the word starts with a root).
If a word can be formed by multiple roots, it should be replaced by the shortest root.
If a word is not a successor of any root, it should remain unchanged.
Examples:
Example 1:
Input: roots = ["cat", "bat", "rat"], sentence = "the cattle was rattled by the battery"
Output: "the cat was rat by the bat"
Example 2:
Input: roots = ["a", "b", "c"], sentence = "apple banana carrot"
Output: "a b c"
Example 3 (Shortest Root):
Input: roots = ["sub", "substitute", "replace"], sentence = "substitute all replacements"
Output: "sub all replace" ( "substitute" is replaced by "sub" because it's shorter)
Example 4 (No Replacement):
Input: roots = ["xyz"], sentence = "a quick brown fox"
Output: "a quick brown fox"
Constraints:
- The input
rootswill be a list of strings. - The input
sentencewill be a string. - Roots and words consist of lowercase English letters.
Function Signature (Python):
from typing import List, Set # Set is good for roots for O(1) avg lookup if needed, but list is fine too
class Solution:
def replace_with_roots(self, roots: List[str], sentence: str) -> str:
# Your code here
pass
Related Python Concepts
.split(), .startswith(), .join()) List Manipulation Loops (nested) Conditional Logic Set for Efficient Lookup (optional for roots) Sorting (optional for roots for tie-breaking if rules were different) Trie Data Structure (Advanced Optimization)Hint
The main steps are to process each word of the sentence individually.
- Tokenize Sentence: First, break the sentence into a list of words. The
.split()method is useful here. - For Each Word:
- You need to find if any root in your
rootsdictionary/list is a prefix of the current word. Theword.startswith(root)method is key. - Keep track of the "best" root found so far for the current word. "Best" means the shortest one if multiple roots match.
- Initialize
shortest_root_found_for_this_wordtoNoneor some indicator. - Iterate through all
rootin yourroots.- If
word.startswith(root):- If no root has been found yet for this word OR if this
rootis shorter than theshortest_root_found_for_this_word, then updateshortest_root_found_for_this_word = root.
- If no root has been found yet for this word OR if this
- If
- After checking all roots, if a
shortest_root_found_for_this_wordexists, use it. Otherwise, use the original word.
- You need to find if any root in your
- Reconstruct Sentence: After processing all words, join them back into a single string, separated by spaces. The
" ".join(list_of_words)method is perfect. - Optimization (Optional): If the list of roots is very large, converting it to a
setcan make checking for root existence faster if you were doing exact matches, but here you need to iterate to checkstartswith. Sorting roots by length first (shortest to longest) could allow you to stop early once a match is found, as any subsequent matches will be longer or equal. However, you still need to check all roots of the same shortest length if that's a tie-breaking rule (not specified here, shortest overall is the rule). A Trie (prefix tree) built from the roots would be the most efficient data structure for prefix matching.
Solution: Replace Words with Roots
The Goal: We have a list of "root" words (like "cat", "bat") and a sentence ("the cattle was rattled by the battery"). If a word in the sentence starts with one of our roots (e.g., "cattle" starts with "cat"), we replace that word with the root. If a word could be formed by multiple roots (e.g., if "sub" and "substitute" are roots, and the word is "substituting"), we pick the shortest root ("sub").
Approach: Iterative Processing with Shortest Root Selection
The general strategy is:
- Split the input sentence into individual words.
- For each word in the sentence:
- Initialize a variable (e.g.,
best_root_for_word) to store the shortest root found so far for this word (initially none). - Iterate through all the roots in the given
rootslist/set. - If the current word
startswith()a root:- If
best_root_for_wordis currently none, or if the currentrootis shorter than the currentbest_root_for_word, then updatebest_root_for_wordto this currentroot.
- If
- After checking all roots, if a
best_root_for_wordwas found, this word in our processed list becomes thebest_root_for_word. Otherwise, it remains the original word.
- Initialize a variable (e.g.,
- Join the list of processed words back into a sentence string, separated by spaces.
set for roots can be beneficial if you were checking for exact matches. For startswith, you still need to iterate. Sorting the roots by length beforehand can be an optimization if you want to find the first (and thus shortest) match more quickly, but you'd still need to ensure it's *the* shortest among all possible matches.
from typing import List, Set
class Solution:
def replace_with_roots(self, roots: List[str], sentence: str) -> str:
# Convert roots to a set for potentially faster lookups if we were doing exact matches,
# but for startswith, we still need to iterate. Sorting roots by length can help find
# a short root quickly, but we must ensure it's the *shortest overall*.
# For this problem, iterating through all roots for each word is straightforward.
root_set = set(roots) # Using a set helps if we pre-filter roots based on word length for opt.
words = sentence.split(' ')
result_words = []
for word in words:
shortest_root_found = None
for root in root_set: # Iterate through all potential roots
if word.startswith(root):
if shortest_root_found is None or len(root) < len(shortest_root_found):
shortest_root_found = root
if shortest_root_found:
result_words.append(shortest_root_found)
else:
result_words.append(word)
return " ".join(result_words)
# Example Usage:
# sol = Solution()
# roots1 = ["cat", "bat", "rat"]
# sentence1 = "the cattle was rattled by the battery"
# print(f"Input: roots={roots1}, sentence='{sentence1}'")
# print(f"Output: '{sol.replace_with_roots(roots1, sentence1)}'") # Expected: the cat was rat by the bat
# roots2 = ["sub", "substitute", "replace"]
# sentence2 = "substitute all replacements"
# print(f"\nInput: roots={roots2}, sentence='{sentence2}'")
# print(f"Output: '{sol.replace_with_roots(roots2, sentence2)}'") # Expected: sub all replace
Complexity Analysis:
Let N be the number of words in the sentence, R be the number of roots in the dictionary, Lmax_word be the maximum length of a word in the sentence, and Lmax_root be the maximum length of a root.
- Splitting the sentence:
O(S)where S is the length of the sentence string. This results in N words. - For each of the N words:
- We iterate through all R roots.
word.startswith(root)takes time proportional to the length of the root, O(Lmax_root) in the worst case.
- Joining the words:
O(S')where S' is the length of the resulting sentence string.
So, the dominant part is the nested loop: Time Complexity: O(N * R * Lmax_root). If we convert `roots` to a set first, it's O(sum of lengths of roots). The iteration for `startswith` remains.
Space Complexity: O(N + R) to store the words and the roots (or set of roots). If the output sentence string is considered, it's also proportional to its length.
Approach 2: Using a Trie (Prefix Tree) for Roots (Advanced Optimization)
For a very large dictionary of roots, iterating through all roots for every word can be inefficient. A Trie (prefix tree) can optimize the process of finding if a word starts with any of the roots, and finding the shortest such root.
- Build Trie: Insert all roots into a Trie. Mark the end of each root word in the Trie.
- Process Sentence: For each word in the sentence:
- Traverse the Trie character by character with the current word.
- Keep track of the shortest root found during traversal. If you reach a node in the Trie that is marked as the end of a root, that root is a candidate. Continue traversing: if you find another, shorter root that is a prefix, update your choice.
- If no root prefix is found, use the original word. Otherwise, use the shortest root found.
- Join the words back.
Implementing a Trie is more involved and might be beyond the scope of an initial question unless the interviewer specifically pushes for performance optimization with a very large root dictionary.
# Conceptual structure for Trie approach (implementation details omitted for brevity here)
class TrieNode:
def __init__(self):
self.children = {} # char -> TrieNode
self.is_end_of_root = False
self.root_word = None # Store the actual root word here if it ends here
class SolutionWithTrie:
def replace_with_roots_trie(self, roots: List[str], sentence: str) -> str:
# 1. Build Trie from roots
trie_root_node = TrieNode()
for root in roots:
node = trie_root_node
for char in root:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end_of_root = True
node.root_word = root # Store the full root string
words = sentence.split(' ')
result_words = []
for word in words:
node = trie_root_node
shortest_found_root = word # Default to original word
for i, char in enumerate(word):
if char not in node.children:
break # No root starts with this prefix further
node = node.children[char]
if node.is_end_of_root:
# Found a root that is a prefix of the word.
# Since we want the shortest, and we are traversing char by char,
# the first one we find will be the shortest.
shortest_found_root = node.root_word
break # Found shortest root for this word
result_words.append(shortest_found_root)
return " ".join(result_words)
Complexity Analysis (Trie Approach):
Let R be the number of roots, Lavg_root be the average length of a root, N be the number of words in the sentence, and Lavg_word be the average length of a word.
- Trie Construction: O(Total characters in all roots) = O(R * Lavg_root).
- Processing Sentence: For each of N words, we traverse the Trie up to the length of the word (Lavg_word). This is O(N * Lavg_word).
Time Complexity: O(R * Lavg_root + N * Lavg_word). This is generally much better than the O(N * R * Lmax_root) of the naive approach if R is large.
Space Complexity: O(R * Lavg_root) for storing the Trie.
Key Takeaways for Interviews:
- Clarify Requirements: Ask about handling of case sensitivity (problem says lowercase), punctuation in sentence (assume simple space separation as per example), and tie-breaking for roots of same shortest length (here, any of them is fine).
- Basic Approach: The iterative approach (Approach 1) is a good starting point to demonstrate understanding. Clearly explain the logic for finding a match and then selecting the shortest root.
- String Methods: Show proficiency with
.split(),.startswith(), and.join(). - Data Structure for Roots: Using a
setforrootsmight be mentioned for general unique storage, but forstartswith, iteration is still needed. Sorting the roots by length (shortest first) and then iterating can allow early exit if the *first* match found is guaranteed to be the *overall shortest*. However, simply iterating all roots and tracking the shortest found is more straightforward to implement correctly under pressure. - Optimization (Trie): For a follow-up on performance with many roots, mentioning a Trie as an optimized data structure for prefix matching shows advanced knowledge. Be prepared to briefly explain how it would work.