28. Find the Index of the First Occurrence in a String Medium
Naive
O(nm) solution which basically just do 2 nested loop to compare the needle with a sliding substring in haystack. So need to take care of some edge case if the loop condition cannot handle.
Z algorithm
Algorithm detail can be found here
The idea is to calculate the z array for needle$haystack where $ does not exist in both of the input string. if any value in the z array is equal to the length of the needle string we found it. This is because the z array store the value for the longest prefix substring starting from each of corresponding index, a value equal to the length of needle string means there is a substring as a prefix starting from this index.
To construct the z array, we maintain a tupple (l ,r) which form a range that is known to be a prefix substring. We start to iterate from 1 as z[0] does not matter.
- if i > r, we start to calculate the new (l, r), by setting both to i, and increment r if we keep finding longer prefix substring, once done, set z[i] to r - l + 1;
- if i < r, this means we are in a known prefix substring, and we can use this information to help determin z[i]. We can see input[i] == input[i - r]. Now if z[i - r] = 0. this means there is no prefix substring starting from i - r, same for i. If z[i - r] > 0, that means there is a prefix substring with length z[i - r], since (l, r) is a prefix substring, we have 2 different case ** if z[i - r] < r - i + 1, this is basically the remaining characters in the (l, r) starting from i. Basically this means the prefix substring (l, r) covers all the characters in prefix substring starting from i - r. In this case we can conclude z[i] = z[i - r]. ** otherwise prefix substring (l, r) does not cover all the characters in the prefix subtring starting from i - r (this is basically input[i-r..i-r+z[i-r]] inclusive). Now we don't know what comes after the the index r, so we need to recalculate the range (l, r), setting l to i and increment r the same way as before and set z[i] the same way as the first condition as well.
This will give us the z array in linear time since we never compare a character twice. And the space complexity is O(m + n)
kmp
For KMP we preprocess the needle to construct an lps array with the same length as the needle. lps means the longest prefix that is also a sufix, this exclude the string itself. Each value in the lps array indicates the length of the longest prefix that is also a sufix for the substring from 0 to index inclusive. so lps[0] is 0, since we exclude the string itself (otherwise its always the whole string). lps[1] = 0 if pattern[0] != pattern[1] or 1 if pattern[0] == pattern[1], and 1 is the largest value here.
1// init i from 1 2let (mut i, mut j) = (1, 0); 3while i < pattern.len() { 4 if pattern[i] == pattern[j] { 5 j += 1; 6 lps[i] = j; 7 i += 1; 8 } else { 9 // not match and j is not 0, j > 0 means that 10 // there is a prefix with length j - 1 match the 11 // the suffix with the same length ending at i - 1. 12 // not pattern[i] != pattern[j], in substring pattern[0..j - 1](inclusive), 13 // any suffix is equal to the suffix with same length ending at i - 1. 14 // so for the length lps[j - 1], the pattern[0..lps[j - 1]](exclusive) is equal 15 // to the suffix ending at i - 1 with length lps[j - 1], we can then try index 16 // lps[j - 1] with i in next loop 17 if j > 0 { 18 j = lps[j - 1]; 19 // not match and j is 0, start from here 20 } else { 21 lps[i] = 0; 22 i += 1; 23 } 24 } 25} 26
When searching with lps preprocessed.
- We start comparison of pat[j] with j = 0 with characters of current window of text.
- We keep matching characters txt[i] and pat[j] and keep incrementing i and j while pat[j] and txt[i] keep matching.
- When we see a mismatch
- We know that characters pat[0..j-1] match with txt[i-j…i-1] (Note that j starts with 0 and increment it only when there is a match).
- We also know (from above definition) that lps[j-1] is count of characters of pat[0…j-1] that are both proper prefix and suffix.
- From above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. Let us consider above example to understand this.
1/**
2 * [28] Implement strStr()
3 *
4 * Implement <a href="http://www.cplusplus.com/reference/cstring/strstr/" target="_blank">strStr()</a>.
5 * Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
6 * Clarification:
7 * What should we return when needle is an empty string? This is a great question to ask during an interview.
8 * For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C's <a href="http://www.cplusplus.com/reference/cstring/strstr/" target="_blank">strstr()</a> and Java's <a href="https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String)" target="_blank">indexOf()</a>.
9 *
10 * Example 1:
11 *
12 * Input: haystack = "hello", needle = "ll"
13 * Output: 2
14 *
15 * Example 2:
16 *
17 * Input: haystack = "aaaaa", needle = "bba"
18 * Output: -1
19 *
20 *
21 * Constraints:
22 *
23 * 1 <= haystack.length, needle.length <= 10^4
24 * haystack and needle consist of only lowercase English characters.
25 *
26 */
27pub struct Solution {}
28
29// problem: https://leetcode.com/problems/implement-strstr/
30// discuss: https://leetcode.com/problems/implement-strstr/discuss/?currentPage=1&orderBy=most_votes&query=
31
32// submission codes start here
33
34impl Solution {
35 pub fn str_str_naive(haystack: String, needle: String) -> i32 {
36 if needle.len() > haystack.len() {
37 return -1;
38 }
39 let haystack_chs: Vec<char> = haystack.chars().collect();
40 let needle_chs: Vec<char> = needle.chars().collect();
41 for i in 0..haystack_chs.len() - needle_chs.len() + 1 {
42 let mut j = 0;
43 while j < needle_chs.len() {
44 if haystack_chs[i + j] == needle_chs[j] {
45 j += 1;
46 } else {
47 break;
48 }
49 }
50 if j == needle_chs.len() {
51 return i as i32;
52 }
53 }
54 -1
55 }
56
57 pub fn str_str_kmp(haystack: String, needle: String) -> i32 {
58 if needle.len() == 0 {
59 return 0;
60 }
61 let haystack_chs: Vec<char> = haystack.chars().collect();
62 let needle_chs: Vec<char> = needle.chars().collect();
63 let mut lps = vec![0; needle_chs.len()];
64 Self::calc_lps(&needle_chs, &mut lps);
65 let (mut i, mut j) = (0, 0);
66 while i < haystack_chs.len() && j < needle_chs.len() {
67 if haystack_chs[i] == needle_chs[j] {
68 i += 1;
69 j += 1;
70 }
71 if j == needle_chs.len() {
72 return (i - j) as i32;
73 } else if i < haystack_chs.len()
74 && j < needle_chs.len()
75 && haystack_chs[i] != needle_chs[j]
76 {
77 if j > 0 {
78 j = lps[j - 1];
79 } else {
80 i += 1;
81 }
82 }
83 }
84 -1
85 }
86
87 fn calc_lps(pattern: &Vec<char>, lps: &mut Vec<usize>) {
88 let (mut i, mut j) = (1, 0);
89 while i < pattern.len() {
90 if pattern[i] == pattern[j] {
91 j += 1;
92 lps[i] = j;
93 i += 1;
94 } else {
95 if j > 0 {
96 j = lps[j - 1];
97 } else {
98 lps[i] = 0;
99 i += 1;
100 }
101 }
102 }
103 }
104
105 pub fn str_str_z(haystack: String, needle: String) -> i32 {
106 let mut chs: Vec<char> = needle.chars().collect();
107 chs.push('$');
108 chs.append(&mut haystack.chars().collect::<Vec<char>>());
109 let mut z_arr = vec![0; chs.len()];
110 Self::z(&chs, &mut z_arr);
111
112 for (i, v) in z_arr.iter().enumerate() {
113 if *v == needle.len() {
114 return i as i32 - 1 - needle.len() as i32;
115 }
116 }
117 -1
118 }
119
120 fn z(chs: &Vec<char>, z_arr: &mut Vec<usize>) {
121 let (mut l, mut r) = (0, 0);
122 for i in 1..chs.len() {
123 if i > r {
124 (r, l) = (i, i);
125 while r < chs.len() && chs[r - l] == chs[r] {
126 r += 1;
127 }
128 z_arr[i] = r - l;
129 r -= 1;
130 } else {
131 let k = i - l;
132
133 if z_arr[k] < r - i + 1 {
134 z_arr[i] = z_arr[k];
135 } else {
136 l = i;
137 while r < chs.len() && chs[r - l] == chs[r] {
138 r += 1;
139 }
140 z_arr[i] = r - l;
141 r -= 1;
142 }
143 }
144 }
145 }
146}
147
148// submission codes end
149
150#[cfg(test)]
151mod tests {
152 use super::*;
153
154 #[test]
155 fn test_28() {
156 assert_eq!(
157 Solution::str_str_naive("abb".to_string(), "aabaa".to_string()),
158 -1
159 );
160 assert_eq!(Solution::str_str_naive("a".to_string(), "a".to_string()), 0);
161 assert_eq!(
162 Solution::str_str_naive("hello".to_string(), "ll".to_string()),
163 2
164 );
165 assert_eq!(
166 Solution::str_str_naive("aaaaa".to_string(), "bba".to_string()),
167 -1
168 );
169 assert_eq!(
170 Solution::str_str_z("hello".to_string(), "ll".to_string()),
171 2
172 );
173 assert_eq!(
174 Solution::str_str_z("aaaaa".to_string(), "bba".to_string()),
175 -1
176 );
177 assert_eq!(
178 Solution::str_str_z("abb".to_string(), "aabaa".to_string()),
179 -1
180 );
181 assert_eq!(Solution::str_str_z("a".to_string(), "a".to_string()), 0);
182 assert_eq!(
183 Solution::str_str_kmp("hello".to_string(), "ll".to_string()),
184 2
185 );
186 assert_eq!(
187 Solution::str_str_kmp("aaaaa".to_string(), "bba".to_string()),
188 -1
189 );
190 assert_eq!(
191 Solution::str_str_kmp("abb".to_string(), "aabaa".to_string()),
192 -1
193 );
194 assert_eq!(Solution::str_str_kmp("a".to_string(), "a".to_string()), 0);
195 }
196}
197
Back