Đã đăng vào thg 3 15, 2:57 SA 0 phút đọc

Repeated Dna sequence - Funny

Problem description

Given a string s repeating a DNA sequence ( For you don't know what is DNA sequence, it includes 'A', 'C', 'G', 'T'). Your mission is figuring out all the 10 letter long subsequences (substrings) that occur more than once in the sequence.

Approach

Use a sliding window technique to extract all substrings of length 10 from s.
Maintain a seen to store all substrings your loop has encountered.
Maintain a duplicated to store all substrings that encoutered more than once.
Iterate through a list (0 -> len- L + 1)
- If it not seen, add to seen list
- If it has been in seen list before, add to duplicated ( there are two ways to implement duplciated here, use set first or when return)
Converted duplicated to set if needed

Time complexity: O(NL) (N len of str, L is len of substring) - because when you create a substring, your effort is equal to substring too.

Space complexity O(NL)

Solution

from typing import List

class Solution:
  def findRepeatedDnaSequences(self, s: str) -> List[str]:
    return self.findRepeatedLenN(s, 10)
  def findRepeatedLenN(self, s: str, n: int) -> List[str]:
    seen = set()
    duplicated = set()
    
    for i in range(len(s) - n + 1):
      sub = s[i: i + n]
      if sub in seen:
        duplicated.add(sub)
      else:
        seen.add(sub)
        
    return list(duplicated)
      
solution = Solution()
print(solution.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))

sliding window

Problem description

Approach

Time complexity: O(NL) (N len of str, L is len of substring) - because when you create a substring, your effort is equal to substring too.

Space complexity O(NL)

Mục lục