28. Find the Index of the First Occurrence in a String Medium

Naive

O(nm) solution which basically just do 2 nested loop to compare the needle with a sliding substring in haystack. So need to take care of some edge case if the loop condition cannot handle.

Z algorithm

Algorithm detail can be found here

The idea is to calculate the z array for needle$haystack where $ does not exist in both of the input string. if any value in the z array is equal to the length of the needle string we found it. This is because the z array store the value for the longest prefix substring starting from each of corresponding index, a value equal to the length of needle string means there is a substring as a prefix starting from this index.

To construct the z array, we maintain a tupple (l ,r) which form a range that is known to be a prefix substring. We start to iterate from 1 as z[0] does not matter.

if i > r, we start to calculate the new (l, r), by setting both to i, and increment r if we keep finding longer prefix substring, once done, set z[i] to r - l + 1;
if i < r, this means we are in a known prefix substring, and we can use this information to help determin z[i]. We can see input[i] == input[i - r]. Now if z[i - r] = 0. this means there is no prefix substring starting from i - r, same for i. If z[i - r] > 0, that means there is a prefix substring with length z[i - r], since (l, r) is a prefix substring, we have 2 different case ** if z[i - r] < r - i + 1, this is basically the remaining characters in the (l, r) starting from i. Basically this means the prefix substring (l, r) covers all the characters in prefix substring starting from i - r. In this case we can conclude z[i] = z[i - r]. ** otherwise prefix substring (l, r) does not cover all the characters in the prefix subtring starting from i - r (this is basically input[i-r..i-r+z[i-r]] inclusive). Now we don't know what comes after the the index r, so we need to recalculate the range (l, r), setting l to i and increment r the same way as before and set z[i] the same way as the first condition as well.

This will give us the z array in linear time since we never compare a character twice. And the space complexity is O(m + n)

kmp

For KMP we preprocess the needle to construct an lps array with the same length as the needle. lps means the longest prefix that is also a sufix, this exclude the string itself. Each value in the lps array indicates the length of the longest prefix that is also a sufix for the substring from 0 to index inclusive. so lps[0] is 0, since we exclude the string itself (otherwise its always the whole string). lps[1] = 0 if pattern[0] != pattern[1] or 1 if pattern[0] == pattern[1], and 1 is the largest value here.

1// init i from 1
2let (mut i, mut j) = (1, 0);
3while i < pattern.len() {
4  if pattern[i] == pattern[j] {
5    j += 1;
6    lps[i] = j;
7    i += 1;
8  } else {
9    // not match and j is not 0, j > 0 means that 
10    // there is a prefix with length j - 1 match the 
11    // the suffix with the same length ending at i - 1.
12    // not pattern[i] != pattern[j], in substring pattern[0..j - 1](inclusive), 
13    // any suffix is equal to the suffix with same length ending at i - 1.
14    // so for the length lps[j - 1], the pattern[0..lps[j - 1]](exclusive) is equal
15    // to the suffix ending at i - 1 with length lps[j - 1], we can then try index
16    // lps[j - 1] with i in next loop
17    if j > 0 {
18      j = lps[j - 1];
19    // not match and j is 0, start from here
20    } else {
21      lps[i] = 0;
22      i += 1;
23    }
24  }
25}
26

When searching with lps preprocessed.

We start comparison of pat[j] with j = 0 with characters of current window of text.
We keep matching characters txt[i] and pat[j] and keep incrementing i and j while pat[j] and txt[i] keep matching.
When we see a mismatch
- We know that characters pat[0..j-1] match with txt[i-j…i-1] (Note that j starts with 0 and increment it only when there is a match).
- We also know (from above definition) that lps[j-1] is count of characters of pat[0…j-1] that are both proper prefix and suffix.
- From above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. Let us consider above example to understand this.

1/**
2 * [28] Implement strStr()
3 *
4 * Implement <a href="http://www.cplusplus.com/reference/cstring/strstr/" target="_blank">strStr()</a>.
5 * Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
6 * Clarification:
7 * What should we return when needle is an empty string? This is a great question to ask during an interview.
8 * For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C's <a href="http://www.cplusplus.com/reference/cstring/strstr/" target="_blank">strstr()</a> and Java's <a href="https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String)" target="_blank">indexOf()</a>.
9 *  
10 * Example 1:
11 *
12 * Input: haystack = "hello", needle = "ll"
13 * Output: 2
14 *
15 * Example 2:
16 *
17 * Input: haystack = "aaaaa", needle = "bba"
18 * Output: -1
19 *
20 *  
21 * Constraints:
22 *
23 * 1 <= haystack.length, needle.length <= 10^4
24 * haystack and needle consist of only lowercase English characters.
25 *
26 */
27pub struct Solution {}
28
29// problem: https://leetcode.com/problems/implement-strstr/
30// discuss: https://leetcode.com/problems/implement-strstr/discuss/?currentPage=1&orderBy=most_votes&query=
31
32// submission codes start here
33
34impl Solution {
35    pub fn str_str_naive(haystack: String, needle: String) -> i32 {
36        if needle.len() > haystack.len() {
37            return -1;
38        }
39        let haystack_chs: Vec<char> = haystack.chars().collect();
40        let needle_chs: Vec<char> = needle.chars().collect();
41        for i in 0..haystack_chs.len() - needle_chs.len() + 1 {
42            let mut j = 0;
43            while j < needle_chs.len() {
44                if haystack_chs[i + j] == needle_chs[j] {
45                    j += 1;
46                } else {
47                    break;
48                }
49            }
50            if j == needle_chs.len() {
51                return i as i32;
52            }
53        }
54        -1
55    }
56
57    pub fn str_str_kmp(haystack: String, needle: String) -> i32 {
58        if needle.len() == 0 {
59            return 0;
60        }
61        let haystack_chs: Vec<char> = haystack.chars().collect();
62        let needle_chs: Vec<char> = needle.chars().collect();
63        let mut lps = vec![0; needle_chs.len()];
64        Self::calc_lps(&needle_chs, &mut lps);
65        let (mut i, mut j) = (0, 0);
66        while i < haystack_chs.len() && j < needle_chs.len() {
67            if haystack_chs[i] == needle_chs[j] {
68                i += 1;
69                j += 1;
70            }
71            if j == needle_chs.len() {
72                return (i - j) as i32;
73            } else if i < haystack_chs.len()
74                && j < needle_chs.len()
75                && haystack_chs[i] != needle_chs[j]
76            {
77                if j > 0 {
78                    j = lps[j - 1];
79                } else {
80                    i += 1;
81                }
82            }
83        }
84        -1
85    }
86
87    fn calc_lps(pattern: &Vec<char>, lps: &mut Vec<usize>) {
88        let (mut i, mut j) = (1, 0);
89        while i < pattern.len() {
90            if pattern[i] == pattern[j] {
91                j += 1;
92                lps[i] = j;
93                i += 1;
94            } else {
95                if j > 0 {
96                    j = lps[j - 1];
97                } else {
98                    lps[i] = 0;
99                    i += 1;
100                }
101            }
102        }
103    }
104
105    pub fn str_str_z(haystack: String, needle: String) -> i32 {
106        let mut chs: Vec<char> = needle.chars().collect();
107        chs.push('$');
108        chs.append(&mut haystack.chars().collect::<Vec<char>>());
109        let mut z_arr = vec![0; chs.len()];
110        Self::z(&chs, &mut z_arr);
111
112        for (i, v) in z_arr.iter().enumerate() {
113            if *v == needle.len() {
114                return i as i32 - 1 - needle.len() as i32;
115            }
116        }
117        -1
118    }
119
120    fn z(chs: &Vec<char>, z_arr: &mut Vec<usize>) {
121        let (mut l, mut r) = (0, 0);
122        for i in 1..chs.len() {
123            if i > r {
124                (r, l) = (i, i);
125                while r < chs.len() && chs[r - l] == chs[r] {
126                    r += 1;
127                }
128                z_arr[i] = r - l;
129                r -= 1;
130            } else {
131                let k = i - l;
132
133                if z_arr[k] < r - i + 1 {
134                    z_arr[i] = z_arr[k];
135                } else {
136                    l = i;
137                    while r < chs.len() && chs[r - l] == chs[r] {
138                        r += 1;
139                    }
140                    z_arr[i] = r - l;
141                    r -= 1;
142                }
143            }
144        }
145    }
146}
147
148// submission codes end
149
150#[cfg(test)]
151mod tests {
152    use super::*;
153
154    #[test]
155    fn test_28() {
156        assert_eq!(
157            Solution::str_str_naive("abb".to_string(), "aabaa".to_string()),
158            -1
159        );
160        assert_eq!(Solution::str_str_naive("a".to_string(), "a".to_string()), 0);
161        assert_eq!(
162            Solution::str_str_naive("hello".to_string(), "ll".to_string()),
163            2
164        );
165        assert_eq!(
166            Solution::str_str_naive("aaaaa".to_string(), "bba".to_string()),
167            -1
168        );
169        assert_eq!(
170            Solution::str_str_z("hello".to_string(), "ll".to_string()),
171            2
172        );
173        assert_eq!(
174            Solution::str_str_z("aaaaa".to_string(), "bba".to_string()),
175            -1
176        );
177        assert_eq!(
178            Solution::str_str_z("abb".to_string(), "aabaa".to_string()),
179            -1
180        );
181        assert_eq!(Solution::str_str_z("a".to_string(), "a".to_string()), 0);
182        assert_eq!(
183            Solution::str_str_kmp("hello".to_string(), "ll".to_string()),
184            2
185        );
186        assert_eq!(
187            Solution::str_str_kmp("aaaaa".to_string(), "bba".to_string()),
188            -1
189        );
190        assert_eq!(
191            Solution::str_str_kmp("abb".to_string(), "aabaa".to_string()),
192            -1
193        );
194        assert_eq!(Solution::str_str_kmp("a".to_string(), "a".to_string()), 0);
195    }
196}
197

Back