How to select the rows of a dataframe using the indices of another dataframe?

0 0 2 minutes read

Prerequisites:

Using Pandas module it is possible to select rows from a data frame using indices from another data frame. This article discusses that in detail. It is advised to implement all the codes in jupyter notebook for easy implementation.

Approach:

Import module
Create first data frame. In the example given below choice(), randint() and random() all belonging to random module are used to generate a data frame.

1) choice() – choice() is an inbuilt function in Python programming language that returns a random item from a list, tuple, or string.

Syntax: random.choice(sequence)

Parameters: Sequence is a mandatory parameter that can be a list, tuple, or string.

Returns: The choice() returns a random item.

2) randint()- This function is used to generate random numbers

Syntax : randint(start, end)

Parameters :

(start, end) : Both of them must be integer type values.

Returns :

A random integer in range [start, end] including the end points.

3) random()- Used to generate floating numbers between 0 and 1.

Create another data frame using the random() function and randomly selecting the rows of the first dataset.
Now we will use dataframe.loc[] function to select the row values of the first data frame using the indexes of the second data frame. Pandas DataFrame.loc[] attribute access a group of rows and columns by label(s) or a boolean array in the given DataFrame.

Syntax: DataFrame.loc

Parameter : None

Returns : Scalar, Series, DataFrame

Display selected rows

Implementation using the above concept is given below:

Program:

Python3

# Importing Required Libraries
import pandas as pd
import random
 
# Creating data for main dataframe
col1 = [random.randint(1, 9) for i in range(15)]
col2 = [random.random() for i in range(15)]
col3 = [random.choice(['Geeks', 'of', 'Computer', 'Science'])
        for i in range(15)]
col4 = [random.randint(1, 9) for i in range(15)]
col5 = [random.randint(1, 9) for i in range(15)]
 
# Defining Column name for main dataframe
data_generated = {
    'value1': col1,
    'value2': col2,
    'value3': col3,
    'value4': col4,
    'value5': col5
}
 
# Creating the dataframe using DataFrame() function
print("First data frame")
dataframe = pd.DataFrame(data_generated)
display(dataframe)
 
# Creating a second dataframe that will be the subset of main dataframe
print("Second data frame")
dataframe_second = dataframe[['value1', 'value2', 'value3']].sample(n=4)
display(dataframe_second)
 
# Rows of a dataframe using the indices of another dataframe
print("selecting rows of first dataframe using second dataframe")
display(dataframe.loc[dataframe_second.index])