How to select the rows of a dataframe using the indices of another dataframe?

Prerequisites:
Using Pandas module it is possible to select rows from a data frame using indices from another data frame. This article discusses that in detail. It is advised to implement all the codes in jupyter notebook for easy implementation.
Approach:
- Import module
- Create first data frame. In the example given below choice(), randint() and random() all belonging to random module are used to generate a data frame.
1) choice() – choice() is an inbuilt function in Python programming language that returns a random item from a list, tuple, or string.
Syntax: random.choice(sequence)
Parameters: Sequence is a mandatory parameter that can be a list, tuple, or string.
Returns: The choice() returns a random item.
2) randint()- This function is used to generate random numbers
Syntax : randint(start, end)
Parameters :
(start, end) : Both of them must be integer type values.
Returns :
A random integer in range [start, end] including the end points.
3) random()- Used to generate floating numbers between 0 and 1.
- Create another data frame using the random() function and randomly selecting the rows of the first dataset.
- Now we will use dataframe.loc[] function to select the row values of the first data frame using the indexes of the second data frame. Pandas DataFrame.loc[] attribute access a group of rows and columns by label(s) or a boolean array in the given DataFrame.
Syntax: DataFrame.loc
Parameter : None
Returns : Scalar, Series, DataFrame
- Display selected rows
Implementation using the above concept is given below:
Program:
Python3
# Importing Required Librariesimport pandas as pdimport random# Creating data for main dataframecol1 = [random.randint(1, 9) for i in range(15)]col2 = [random.random() for i in range(15)]col3 = [random.choice(['Geeks', 'of', 'Computer', 'Science']) for i in range(15)]col4 = [random.randint(1, 9) for i in range(15)]col5 = [random.randint(1, 9) for i in range(15)]# Defining Column name for main dataframedata_generated = { 'value1': col1, 'value2': col2, 'value3': col3, 'value4': col4, 'value5': col5}# Creating the dataframe using DataFrame() functionprint("First data frame")dataframe = pd.DataFrame(data_generated)display(dataframe)# Creating a second dataframe that will be the subset of main dataframeprint("Second data frame")dataframe_second = dataframe[['value1', 'value2', 'value3']].sample(n=4)display(dataframe_second)# Rows of a dataframe using the indices of another dataframeprint("selecting rows of first dataframe using second dataframe")display(dataframe.loc[dataframe_second.index]) |
Output:




