Longest Common Substring

Tags: String, LintCode Copyright, Medium

Question

Problem Statement

Given two strings, find the longest common substring.

Return the length of it.

Notice

The characters in substring should occur continuously in original string.
This is different with subsequence.

Example

Given A = "ABCD", B = "CBCE", return 2.

Challenge **

O(n x m) time and memory.

题解1 - 暴力枚举

求最长公共子串,注意「子串」和「子序列」的区别!简单考虑可以暴力使用两根指针索引分别指向两个字符串的当前遍历位置,若遇到相等的字符时则同时向后移动一位。

Python

  1. class Solution:
  2. # @param A, B: Two string.
  3. # @return: the length of the longest common substring.
  4. def longestCommonSubstring(self, A, B):
  5. if not (A and B):
  6. return 0
  7. lcs = 0
  8. for i in range(len(A)):
  9. for j in range(len(B)):
  10. lcs_temp = 0
  11. while (i + lcs_temp < len(A) and
  12. j + lcs_temp < len(B) and
  13. A[i + lcs_temp] == B[j + lcs_temp]):
  14. lcs_temp += 1
  15. # update lcs
  16. if lcs_temp > lcs:
  17. lcs = lcs_temp
  18. return lcs

C++

  1. class Solution {
  2. public:
  3. /**
  4. * @param A, B: Two string.
  5. * @return: the length of the longest common substring.
  6. */
  7. int longestCommonSubstring(string &A, string &B) {
  8. if (A.empty() || B.empty()) {
  9. return 0;
  10. }
  11. int lcs = 0;
  12. for (int i = 0; i < A.length(); ++i) {
  13. for (int j = 0; j < B.length(); ++j) {
  14. int lcs_temp = 0;
  15. while ((i + lcs_temp < A.length()) &&
  16. (j + lcs_temp < B.length()) &&
  17. (A[i + lcs_temp] == B[j + lcs_temp])) {
  18. ++lcs_temp;
  19. }
  20. // update lcs
  21. if (lcs_temp > lcs) lcs = lcs_temp;
  22. }
  23. }
  24. return lcs;
  25. }
  26. };

Java

  1. public class Solution {
  2. /**
  3. * @param A, B: Two string.
  4. * @return: the length of the longest common substring.
  5. */
  6. public int longestCommonSubstring(String A, String B) {
  7. if ((A == null || A.isEmpty()) ||
  8. (B == null || B.isEmpty())) {
  9. return 0;
  10. }
  11. int lcs = 0;
  12. for (int i = 0; i < A.length(); i++) {
  13. for (int j = 0; j < B.length(); j++) {
  14. int lcs_temp = 0;
  15. while (i + lcs_temp < A.length() &&
  16. j + lcs_temp < B.length() &&
  17. A.charAt(i + lcs_temp) == B.charAt(j + lcs_temp)) {
  18. lcs_temp++;
  19. }
  20. // update lcs
  21. if (lcs_temp > lcs) lcs = lcs_temp;
  22. }
  23. }
  24. return lcs;
  25. }
  26. }

源码分析

  1. 异常处理,空串时返回0.
  2. 分别使用ij表示当前遍历的索引处。若当前字符相同时则共同往后移动一位。
  3. 没有相同字符时比较此次遍历的lcs_templcs大小,更新lcs.
  4. 返回lcs.

注意在while循环中不可直接使用++i或者++j,即两根指针依次向前移动,不能在内循环处更改,因为有可能会漏解!

复杂度分析

双重 for 循环,最坏时间复杂度约为 O(mn \cdot lcs), lcs 最大可为 \min{m, n} .

题解2 - 动态规划

题解1中使用了两根指针指向当前所取子串的起点,在实际比较过程中存在较多的重复计算,故可以考虑使用记忆化搜索或者动态规划对其进行优化。动态规划中状态的确定及其状态转移方程最为关键,如果直接以题目所求为状态,我们会发现其状态转移方程似乎写不出来,但退而求其次,我们不妨采用子串/子序列中常用的状态定义——『以(i,j)结尾(如 A[i-1], B[j-1])且其字符相等的子串lcs, 状态转移时只需判断两个字符串后一位字符是否相等,最后再次遍历二维状态数组即可。

Python

  1. class Solution:
  2. # @param A, B: Two string.
  3. # @return: the length of the longest common substring.
  4. def longestCommonSubstring(self, A, B):
  5. if not (A and B):
  6. return 0
  7. n, m = len(A), len(B)
  8. f = [[0 for i in range(m + 1)] for j in range(n + 1)]
  9. for i in range(n):
  10. for j in range(m):
  11. if A[i] == B[j]:
  12. f[i + 1][j + 1] = 1 + f[i][j]
  13. lcs = max(map(max, f))
  14. return lcs

C++

  1. class Solution {
  2. public:
  3. /**
  4. * @param A, B: Two string.
  5. * @return: the length of the longest common substring.
  6. */
  7. int longestCommonSubstring(string &A, string &B) {
  8. if (A.empty() || B.empty()) {
  9. return 0;
  10. }
  11. int n = A.length();
  12. int m = B.length();
  13. vector<vector<int> > f = vector<vector<int> >(n + 1, vector<int>(m + 1, 0));
  14. for (int i = 0; i < n; ++i) {
  15. for (int j = 0; j < m; ++j) {
  16. if (A[i] == B[j]) {
  17. f[i + 1][j + 1] = 1 + f[i][j];
  18. }
  19. }
  20. }
  21. // find max lcs
  22. int lcs = 0;
  23. for (int i = 1; i <= n; ++i) {
  24. for (int j = 1; j <= m; ++j) {
  25. if (f[i][j] > lcs) lcs = f[i][j];
  26. }
  27. }
  28. return lcs;
  29. }
  30. };

Java

  1. public class Solution {
  2. /**
  3. * @param A, B: Two string.
  4. * @return: the length of the longest common substring.
  5. */
  6. public int longestCommonSubstring(String A, String B) {
  7. if ((A == null || A.isEmpty()) ||
  8. (B == null || B.isEmpty())) {
  9. return 0;
  10. }
  11. int n = A.length();
  12. int m = B.length();
  13. int[][] f = new int[n + 1][m + 1];
  14. for (int i = 0; i < n; i++) {
  15. for (int j = 0; j < m; j++) {
  16. if (A.charAt(i) == B.charAt(j)) {
  17. f[i + 1][j + 1] = 1 + f[i][j];
  18. }
  19. }
  20. }
  21. // find max lcs
  22. int lcs = 0;
  23. for (int i = 1; i <= n; i++) {
  24. for (int j = 1; j <= m; j++) {
  25. if (f[i][j] > lcs) lcs = f[i][j];
  26. }
  27. }
  28. return lcs;
  29. }
  30. }

源码分析

  1. 异常处理
  2. 列出状态转移方程,关键处在于以 (i,j) 结尾的两个字符串

复杂度分析

两次双重 for 循环,时间复杂度约为 O(mn), 空间复杂度为 O(mn). 对于这个题而言,使用动态规划的思维其复杂度并未得到明显下降。

Reference