Repeated Dna sequence - Funny
Problem description
- Given a string s repeating a DNA sequence ( For you don't know what is DNA sequence, it includes 'A', 'C', 'G', 'T'). Your mission is figuring out all the 10 letter long subsequences (substrings) that occur more than once in the sequence.
Approach
- Use a sliding window technique to extract all substrings of length 10 from s.
- Maintain a seen to store all substrings your loop has encountered.
- Maintain a duplicated to store all substrings that encoutered more than once.
- Iterate through a list (0 -> len- L + 1)
- If it not seen, add to seen list
- If it has been in seen list before, add to duplicated ( there are two ways to implement duplciated here, use set first or when return)
- Converted duplicated to set if needed
Time complexity: O(NL) (N len of str, L is len of substring) - because when you create a substring, your effort is equal to substring too.
Space complexity O(NL)
Solution
from typing import List
class Solution:
def findRepeatedDnaSequences(self, s: str) -> List[str]:
return self.findRepeatedLenN(s, 10)
def findRepeatedLenN(self, s: str, n: int) -> List[str]:
seen = set()
duplicated = set()
for i in range(len(s) - n + 1):
sub = s[i: i + n]
if sub in seen:
duplicated.add(sub)
else:
seen.add(sub)
return list(duplicated)
solution = Solution()
print(solution.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
All rights reserved