Python NLTK | tokenize.regexp()

With the help of the NLTK tokenize.regexp() module, we can extract tokens from a string by matching a regular expression with the RegexpTokenizer() class.
Syntax :
tokenize.RegexpTokenizer(pattern, gaps=False)
Return : A list of tokens extracted from the string using the regular expression. With the default gaps = False the pattern describes the tokens themselves; with gaps = True it describes the separators between tokens.
Example #1 :
In this example we use the RegexpTokenizer() method with gaps = True, so the regular expression r'\s+' matches the whitespace between tokens and the string is split on it.
# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer

# Create a reference variable for Class RegexpTokenizer
# gaps = True means the pattern matches the separators, not the tokens
tk = RegexpTokenizer(r'\s+', gaps=True)

# Create a string input
gfg = "I love Python"

# Use tokenize method
geek = tk.tokenize(gfg)

print(geek)
Output :
['I', 'love', 'Python']
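When gaps is left at its default of False, the pattern matches the tokens themselves rather than the gaps between them. The following is a minimal sketch of that mode; the token pattern r'\w+' is just an illustrative choice, not part of the example above.

# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer

# With gaps = False (the default), the pattern matches the tokens directly:
# r'\w+' keeps runs of word characters and discards everything else
tk = RegexpTokenizer(r'\w+')

print(tk.tokenize("I love Python"))   # ['I', 'love', 'Python']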
Example #2 :
# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer

# Create a reference variable for Class RegexpTokenizer
# gaps = True splits the string wherever the pattern matches
tk = RegexpTokenizer(r'\s+', gaps=True)

# Create a string input
gfg = "Geeks for Geeks"

# Use tokenize method
geek = tk.tokenize(gfg)

print(geek)
Output :
['Geeks', 'for', 'Geeks']
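Because the output is driven entirely by the regular expression, the same class can strip punctuation while tokenizing. Below is a rough sketch assuming the word pattern r'\w+' and a made-up sample sentence; nltk.tokenize also provides the one-shot regexp_tokenize() helper with the same behaviour.

# import RegexpTokenizer() and the regexp_tokenize() helper from nltk
from nltk.tokenize import RegexpTokenizer, regexp_tokenize

# Keep only runs of word characters, so punctuation is discarded
tk = RegexpTokenizer(r'\w+')
print(tk.tokenize("Geeks, for Geeks!"))              # ['Geeks', 'for', 'Geeks']

# One-shot helper: same result without creating a tokenizer object
print(regexp_tokenize("Geeks, for Geeks!", r'\w+'))  # ['Geeks', 'for', 'Geeks']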