Python | Substring Frequency between Uppercases

0 0 4 minutes read

Sometimes while working with Strings, we can have a problem in which we have to find substrings that occur between the uppercases and find their frequencies. This is a very unique problem and has less of applications. Let us discuss certain ways in which this task can be performed.

Method #1 : Using groupby() + regex + loop

The combination of the above functions can be used to perform this task. In this, we use groupby() to group the regex-extracted substrings. And all is compiled using loop.

Python3

# Python3 code to demonstrate working of
# Substring Frequency between Uppercases
# Using groupby() + regex + loop
 
from itertools import groupby
import re
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
# Using groupby() + regex + loop
res = {}
for i, j in groupby(re.findall(r'[A-Z][a-z]*\d*', test_str)):
    res[i] = res[i] + 1 if i in res.keys() else 1
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Method #2: Using dictionary comprehension + sorted() + groupby() + regex

The combination of the above functions can be used to perform this task. In this, the task of cumulation is performed using dictionary comprehension.

Python3

# Python3 code to demonstrate working of
# Substring Frequency between Uppercases
# Using dictionary comprehension + sorted() + groupby() + regex
 
from itertools import groupby
import re
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
# Using dictionary comprehension + sorted() + groupby() + regex
res = {i: len(list(j)) for i, j in groupby(
    sorted(re.findall(r'[A-Z][a-z]*\d*', test_str)))}
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ek': 2, 'For': 1, 'Ge': 2, 'S': 2}

Time Complexity: O(n)
Auxiliary Space: O(n)

Method #3 : Using isupper(),split(),list(),set(),remove(),count() methods

Approach:

Replace uppercase character with “*”+uppercase character(using isupper())
Split on * that will result in a list(using split())
Create a new list with unique strings of above list(using list(),set())
Create a dictionary with unique strings as key and its count in original list. (using count() method)

Example:

Python3

# Python3 code to demonstrate working of
# Substring Frequency between Uppercases
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
res = dict()
x = ""
for i in test_str:
    if i.isupper():
        x += "*"+i
    else:
        x += i
 
y = x.split("*")
y.remove('')
 
z = list(set(y))
 
for i in z:
    res[i] = y.count(i)
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'S': 2, 'Ge': 2, 'Ek': 2, 'For': 1}

Time Complexity : O(N)
Auxiliary Space : O(N)

Method #4: Using regular expression and Counter()

A regular expression pattern is defined to match all the substrings that start with an uppercase letter followed by zero or more lowercase letters.
The re.findall() method is used to find all the matches of the pattern in the given string. This method returns a list of all the matches.
The Counter() method is used to count the frequency of each match in the list.
The resulting dictionary of the Counter() method is the desired output.

Example:

Python3

import re
from collections import Counter
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
pattern = r'[A-Z][a-z]*'
matches = re.findall(pattern, test_str)
res = dict(Counter(matches))
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Time complexity: O(n), where n is the length of the input string.
Auxiliary space: O(n), where n is the length of the input string.

Method #5: Using a for loop and string slicing

Initialize an empty dictionary res to store the substring frequencies.
Initialize a variable start to 0.
Loop through each character in the test_str starting from the second character.
If the current character is uppercase, slice the substring between start and the current index (exclusive) and add it to the res dictionary. Update the start variable to the current index.
After the loop, slice the substring between the last uppercase letter and the end of the string and add it to the res dictionary.
Print the res dictionary.

Example:

Python3

# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
res = {}
start = 0
for i in range(1, len(test_str)):
    if test_str[i].isupper():
        substr = test_str[start:i]
        if substr in res:
            res[substr] += 1
        else:
            res[substr] = 1
        start = i
 
substr = test_str[start:]
if substr in res:
    res[substr] += 1
else:
    res[substr] = 1
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Time complexity: O(n), where n is the length of test_str.
Auxiliary space: O(k), where k is the number of unique substrings between uppercase letters in test_str.

Method #6: Using a list comprehension and zip

Steps:

Create a list comprehension that uses zip to iterate over the string and create substrings between uppercase letters.
Use a for loop to iterate over the list of substrings.
Use the count method to count the occurrences of the current substring in the original string.
Add the substring and its count to a dictionary.
Print the resulting dictionary.

Example:

Python3

# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# create a list of substrings between uppercase letters using a list comprehension and zip
substr_list = [test_str[i:j] for i, j in zip([0] + [j
                                                    for j in range(1, len(test_str))
                                                    if test_str[j].isupper()],
                                             [j for j in range(1, len(test_str)) if test_str[j].isupper()] + [None])]
 
# create a dictionary to store the counts of each substring
res = {}
for substr in substr_list:
    res[substr] = test_str.count(substr)
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Time complexity: O(n^2) (due to the use of the count method)
Auxiliary space: O(k) (where k is the number of distinct substrings)

Namachila 2 weeks ago

0 0 4 minutes read

Python | Substring Frequency between Uppercases

Python3

Python3

Python3

Python3

Python3

Python3

Namachila

Leave a Reply Cancel reply

Hibs and Ross County fans on final

Signs of life for Apple stock as Wall St

The ultimate guide to herbal teas

After all is said and done, more is said than done

Knowledge is power

The Future Of Possible

Xbox boss talks Project Scorpio price

Hibs and Ross County fans on final

Tip of the day: That man again

Python3

Python3

Python3

Python3

Python3

Python3

Namachila

Subscribe to our mailing list to get the new updates!

Install Stock ROM on Xiaomi Mi Pad 7.9? [Stock Firmware/ Unbrick]

10 Creative Ways to Improve Your English Language Skills

Related Articles

How to Unlock Bootloader on Samsung Galaxy Tab Active3? OEM Unlock!

PyQt5 – Background color to lineedit of ComboBox on mouse hover

Install TWRP Recovery on Samsung Galaxy F41? [Easy Steps]

Python NumPy – Return real parts if input is complex with all imaginary parts close to zero

Leave a Reply Cancel reply

Hibs and Ross County fans on final

Signs of life for Apple stock as Wall St

The ultimate guide to herbal teas

After all is said and done, more is said than done

Knowledge is power

The Future Of Possible

Xbox boss talks Project Scorpio price

Hibs and Ross County fans on final

Tip of the day: That man again