Python | Substring Frequency between Uppercases

Sometimes while working with Strings, we can have a problem in which we have to find substrings that occur between the uppercases and find their frequencies. This is a very unique problem and has less of applications. Let us discuss certain ways in which this task can be performed.
Method #1 : Using groupby() + regex + loop
The combination of the above functions can be used to perform this task. In this, we use groupby() to group the regex-extracted substrings. And all is compiled using loop.
Python3
| # Python3 code to demonstrate working of# Substring Frequency between Uppercases# Using groupby() + regex + loopfromitertools importgroupbyimportre# initializing stringtest_str ="GeEkSForGeEkS"# printing original stringprint("The original string is : "+test_str)# Substring Frequency between Uppercases# Using groupby() + regex + loopres ={}fori, j ingroupby(re.findall(r'[A-Z][a-z]*\d*', test_str)):    res[i] =res[i] +1ifi inres.keys() else1# printing resultprint("The grouped Substring Frequency : "+str(res)) | 
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Method #2: Using dictionary comprehension + sorted() + groupby() + regex
The combination of the above functions can be used to perform this task. In this, the task of cumulation is performed using dictionary comprehension.
Python3
| # Python3 code to demonstrate working of# Substring Frequency between Uppercases# Using dictionary comprehension + sorted() + groupby() + regexfromitertools importgroupbyimportre# initializing stringtest_str ="GeEkSForGeEkS"# printing original stringprint("The original string is : "+test_str)# Substring Frequency between Uppercases# Using dictionary comprehension + sorted() + groupby() + regexres ={i: len(list(j)) fori, j ingroupby(    sorted(re.findall(r'[A-Z][a-z]*\d*', test_str)))}# printing resultprint("The grouped Substring Frequency : "+str(res)) | 
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ek': 2, 'For': 1, 'Ge': 2, 'S': 2}
Time Complexity: O(n)
Auxiliary Space: O(n)
Method #3 : Using isupper(),split(),list(),set(),remove(),count() methods
Approach:
- Replace uppercase character with “*”+uppercase character(using isupper())
- Split on * that will result in a list(using split())
- Create a new list with unique strings of above list(using list(),set())
- Create a dictionary with unique strings as key and its count in original list. (using count() method)
Example:
Python3
| # Python3 code to demonstrate working of# Substring Frequency between Uppercases# initializing stringtest_str ="GeEkSForGeEkS"# printing original stringprint("The original string is : "+test_str)# Substring Frequency between Uppercasesres =dict()x =""fori intest_str:    ifi.isupper():        x +="*"+i    else:        x +=iy =x.split("*")y.remove('')z =list(set(y))fori inz:    res[i] =y.count(i)# printing resultprint("The grouped Substring Frequency : "+str(res)) | 
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'S': 2, 'Ge': 2, 'Ek': 2, 'For': 1}
Time Complexity : O(N)
Auxiliary Space : O(N)
Method #4: Using regular expression and Counter()
- A regular expression pattern is defined to match all the substrings that start with an uppercase letter followed by zero or more lowercase letters.
- The re.findall() method is used to find all the matches of the pattern in the given string. This method returns a list of all the matches.
- The Counter() method is used to count the frequency of each match in the list.
- The resulting dictionary of the Counter() method is the desired output.
Example:
Python3
| importrefromcollections importCounter# initializing stringtest_str ="GeEkSForGeEkS"# printing original stringprint("The original string is : "+test_str)# Substring Frequency between Uppercasespattern =r'[A-Z][a-z]*'matches =re.findall(pattern, test_str)res =dict(Counter(matches))# printing resultprint("The grouped Substring Frequency : "+str(res)) | 
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n), where n is the length of the input string. 
Auxiliary space: O(n), where n is the length of the input string.
Method #5: Using a for loop and string slicing
- Initialize an empty dictionary res to store the substring frequencies.
- Initialize a variable start to 0.
- Loop through each character in the test_str starting from the second character.
- If the current character is uppercase, slice the substring between start and the current index (exclusive) and add it to the res dictionary. Update the start variable to the current index.
- After the loop, slice the substring between the last uppercase letter and the end of the string and add it to the res dictionary.
- Print the res dictionary.
Example:
Python3
| # initializing stringtest_str ="GeEkSForGeEkS"# printing original stringprint("The original string is : "+test_str)# Substring Frequency between Uppercasesres ={}start =0fori inrange(1, len(test_str)):    iftest_str[i].isupper():        substr =test_str[start:i]        ifsubstr inres:            res[substr] +=1        else:            res[substr] =1        start =isubstr =test_str[start:]ifsubstr inres:    res[substr] +=1else:    res[substr] =1# printing resultprint("The grouped Substring Frequency : "+str(res)) | 
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n), where n is the length of test_str.
Auxiliary space: O(k), where k is the number of unique substrings between uppercase letters in test_str.
Method #6: Using a list comprehension and zip
Steps:
- Create a list comprehension that uses zip to iterate over the string and create substrings between uppercase letters.
- Use a for loop to iterate over the list of substrings.
- Use the count method to count the occurrences of the current substring in the original string.
- Add the substring and its count to a dictionary.
- Print the resulting dictionary.
Example:
Python3
| # initializing stringtest_str ="GeEkSForGeEkS"# printing original stringprint("The original string is : "+test_str)# create a list of substrings between uppercase letters using a list comprehension and zipsubstr_list =[test_str[i:j] fori, j inzip([0] +[j                                                    forj inrange(1, len(test_str))                                                    iftest_str[j].isupper()],                                             [j forj inrange(1, len(test_str)) iftest_str[j].isupper()] +[None])]# create a dictionary to store the counts of each substringres ={}forsubstr insubstr_list:    res[substr] =test_str.count(substr)# printing resultprint("The grouped Substring Frequency : "+str(res)) | 
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n^2) (due to the use of the count method)
Auxiliary space: O(k) (where k is the number of distinct substrings)
 
				 
					


