How to check if two Pandas DataFrames are equal

Data comparison is another important feature of Pandas. Its built-in functions make it easy to locate discrepancies between two DataFrames. In this tutorial, we will cover how to check if two Pandas DataFrames are equal, and how to pinpoint any differences between them.


Using equals() method to check if DataFrames are equal:

The equals() method compares two DataFrames (or Series) and returns True if they have the same shape and elements, and False otherwise. NaN values in the same location are treated as equal. It takes the DataFrame to compare with as an argument. The row and column labels do not need to be of the same type, as long as their values are considered equal. However, corresponding columns must have the same data type (dtype).

Of all the functions, operators, and methods covered in this tutorial, this is probably the most straightforward one when all you need is a single True or False answer.

In the following code, we declared and initialized three Pandas DataFrames. We assigned identical values to df1 and df2, but changed “Oswald” to “Os” in df3. Then we used the equals() method to compare the DataFrames.

import pandas as pd
df1 = pd.DataFrame([["Oswald", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

df2 = pd.DataFrame([["Oswald", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

df3 = pd.DataFrame([["Os", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

print(df1.equals(df2))
print(df1.equals(df3))

Output:

True 
False

As expected, when we compared df1 to df2, the method returned True as both DataFrames were identical. However, when we compared df1 to df3, the method returned False as their elements are not identical.
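
The two finer points of equals() described above, NaN handling and dtype sensitivity, can be seen in a small sketch. The frames and column names below are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Illustrative frames: NaN sits in the same position in both
with_nan = pd.DataFrame({"score": [1.0, np.nan]})
same_nan = pd.DataFrame({"score": [1.0, np.nan]})

# Same numbers, but stored as int64 in one frame and float64 in the other
as_int = pd.DataFrame({"age": [67, 25]})
as_float = pd.DataFrame({"age": [67.0, 25.0]})

nan_result = with_nan.equals(same_nan)    # NaN in the same location counts as equal
dtype_result = as_int.equals(as_float)    # differing column dtypes count as unequal

print(nan_result)    # True
print(dtype_result)  # False
```

So equals() is forgiving about missing values but strict about column dtypes.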


Using equality (==) operator to check if DataFrames are equal:

As you probably already know, the equality operator compares two objects and returns True if they are the same, and False otherwise. When we compare two DataFrames with the equality operator, the comparison is done element by element and the result is a DataFrame of boolean values. Each cell holds True if the two corresponding values are the same and False if they are not. Both DataFrames must have identical row and column labels, otherwise Pandas raises a ValueError.

One reason to use this over the equals() method is that it is not type sensitive, so it will treat 67 and 67.0 as equal. It also shows exactly which values differ, which is useful for data analysis.

import pandas as pd
df1 = pd.DataFrame([["Oswald", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

df2 = pd.DataFrame([["Oswald", "Male", 67.0], 
                    ["Jack", "Male", 25.0],
                    ["Lacie", "Female", 32.0]])

df3 = pd.DataFrame([["Os", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

print(df1 == df2)
print("")
print(df1 == df3)

Output:

      0     1     2
0  True  True  True
1  True  True  True
2  True  True  True

       0     1     2
0  False  True  True
1   True  True  True
2   True  True  True

Even though we changed all the integers in df2 to floats, every cell in the first result is still True because the equality operator is not type sensitive.

If you simply want to receive one boolean value while comparing two DataFrames, you can use the all() method. This method returns True if all the boolean values along an axis are True, but returns False if even one value is False. Since a DataFrame has two axes, we call the method twice for each comparison: the first call reduces each column to a single boolean, and the second reduces that result to one value.

import pandas as pd
df1 = pd.DataFrame([["Oswald", "Male", 67],
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

df2 = pd.DataFrame([["Oswald", "Male", 67.0], 
                    ["Jack", "Male", 25.0],
                    ["Lacie", "Female", 32.0]])

df3 = pd.DataFrame([["Os", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

print((df1 == df2).all().all())
print((df1 == df3).all().all())

Output:

True
False
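
One caveat is worth a short sketch: unlike equals(), the equality operator follows the usual floating-point rule that NaN is not equal to NaN, so two frames with missing values in the same places will not pass the all() check. The frames below are made up for illustration, and the NaN-aware expression is only one possible workaround, not something the original example uses.

```python
import numpy as np
import pandas as pd

# Illustrative frames with a NaN in the same position
left = pd.DataFrame({"score": [1.0, np.nan]})
right = pd.DataFrame({"score": [1.0, np.nan]})

# Plain element-wise comparison: the NaN cell compares as False
strict = (left == right).all().all()

# Treat a cell as equal if the values match OR both are missing
nan_aware = ((left == right) | (left.isna() & right.isna())).all().all()

print(strict)              # False
print(nan_aware)           # True
print(left.equals(right))  # True
```

If the data may contain missing values, equals() or a NaN-aware check like the one above is likely the safer choice.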

Using compare() method to find the differences between DataFrames:

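The compare() method compares two DataFrames and shows the differences between them. It takes another DataFrame as an argument and returns a DataFrame containing only the rows and columns in which the two differ. Within that result, equal values are shown as NaN. Like the equality operator, it is not type sensitive and requires both DataFrames to have identical row and column labels. It is available in Pandas 1.1.0 and later.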

import pandas as pd
df1 = pd.DataFrame([["Oswald", "Male", 67],
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

df2 = pd.DataFrame([["Oswald", "Male", 67.0], 
                    ["Jack", "Male", 25.0],
                    ["Lacie", "Female", 32.0]])

df3 = pd.DataFrame([["Os", "Male", 67], 
                    ["Alice", "Female", 25],
                    ["Lacie", "Female", 32]])

print(df1.compare(df2))
print("")
print(df1.compare(df3))

Output:

Empty DataFrame
Columns: []
Index: []

        0            1
     self  other  self   other
0  Oswald     Os   NaN     NaN
1    Jack  Alice  Male  Female

The comparison between df1 and df2 returned an empty DataFrame as no values differ between the two. However, comparing df1 to df3 returned a DataFrame which displays the differences between the two.

self refers to df1 (the object that called the method) and other refers to df3 (the object passed as an argument).
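
Since compare() returns an empty DataFrame when nothing differs, its result can also be turned into a single True or False. The sketch below assumes Pandas 1.1.0 or later and uses shortened versions of the frames above. The keep_shape and keep_equal parameters belong to compare() itself, while the variable names are our own.

```python
import pandas as pd

df1 = pd.DataFrame([["Oswald", "Male", 67],
                    ["Jack", "Male", 25]])
df3 = pd.DataFrame([["Os", "Male", 67],
                    ["Jack", "Male", 25]])

# Only the differing row and column survive in the default result
diff = df1.compare(df3)
frames_match = diff.empty
print(frames_match)  # False

# keep_shape=True keeps every row and column;
# keep_equal=True shows equal values instead of NaN
full = df1.compare(df3, keep_shape=True, keep_equal=True)
print(full)
```

The default form is handy for spotting changes quickly, while the full form makes it easier to read each difference in context.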


Using assert_frame_equal() function to check if DataFrames are equal:

The assert_frame_equal() function compares two DataFrames and, if they are not equal, raises an AssertionError with a message describing the difference. It takes several parameters, including left, right, and check_dtype. The first DataFrame is left and the second DataFrame is right. check_dtype is used to specify whether data types should also be compared, and is True by default.

It’s commonly used for unit testing. It returns nothing when the two DataFrames match and raises an error when they do not, so we use it with exception handling to print True or False.

import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame([["Oswald", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]])

df2 = pd.DataFrame([["Oswald", "Male", 67.0], 
                    ["Jack", "Male", 25.0],
                    ["Lacie", "Female", 32.0]])

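# Default check: dtypes must match, so int64 vs float64 fails
try:
    assert_frame_equal(df1, df2)
    print("True")
except AssertionError:
    print("False")

# Relaxed check: ignore dtypes and compare the values only
try:
    assert_frame_equal(df1, df2, check_dtype=False)
    print("True")
except AssertionError:
    print("False")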

Output:

False
True

In the first test, check_dtype was True (the default), so the integer and float columns did not match, an error was raised, and “False” was printed. In the second test, check_dtype was False, so floats could be compared to integers and “True” was printed.
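
If this check is needed in several places, the try/except pattern above can be wrapped in a small helper. The function name frames_equal below is hypothetical (it is not part of Pandas), and the one-column frames are a trimmed-down version of the example. This is a minimal sketch rather than a recommended replacement for proper unit tests.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal(left, right, **kwargs):
    """Hypothetical helper: True if assert_frame_equal passes, else False."""
    try:
        # Extra keyword arguments (e.g. check_dtype) are passed straight through
        assert_frame_equal(left, right, **kwargs)
    except AssertionError as err:
        print(err)  # show why the frames differ
        return False
    return True


df1 = pd.DataFrame({"age": [67, 25, 32]})
df2 = pd.DataFrame({"age": [67.0, 25.0, 32.0]})

strict = frames_equal(df1, df2)                      # dtypes differ
relaxed = frames_equal(df1, df2, check_dtype=False)  # values only
print(strict, relaxed)
```

Catching only AssertionError means unrelated problems, such as passing something that is not a DataFrame by mistake, are less likely to be silently reported as a plain False.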


Additional Tips:

If you need to compare the values of two DataFrames that have different column labels or index labels, you can save the first DataFrame’s columns and index in separate temporary variables and then assign the second DataFrame’s columns and index to the first DataFrame. You can then compare the DataFrames, and afterwards restore the first DataFrame’s original columns and index from the temporary variables.

import pandas as pd

df1 = pd.DataFrame([["Oswald", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]],
                    index=[1,2,3],
                    columns=["Name", "Gender", "Age"])


df2 = pd.DataFrame([["Oswald", "Male", 67], 
                    ["Jack", "Male", 25],
                    ["Lacie", "Female", 32]],
                    index=[1,4,3],
                    columns=["User", "Gen", "Age"])

print(df1.equals(df2))

temp_col = df1.columns
df1.columns = df2.columns
temp_row = df1.index
df1.index = df2.index

print(df1.equals(df2))

df1.columns = temp_col
df1.index = temp_row

Output:

False
True

At the start, the column and index labels of df1 and df2 were different, so equals() returned False. After we assigned df2’s labels to df1, it returned True.
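
As an alternative sketch, the same comparison can be done without modifying df1 at all by building a relabeled copy with set_axis(). This swaps in a different technique from the one shown above and assumes a reasonably recent Pandas version in which set_axis() returns a new DataFrame by default. The frames are shortened versions of the example.

```python
import pandas as pd

df1 = pd.DataFrame([["Oswald", "Male", 67],
                    ["Jack", "Male", 25]],
                   index=[1, 2],
                   columns=["Name", "Gender", "Age"])

df2 = pd.DataFrame([["Oswald", "Male", 67],
                    ["Jack", "Male", 25]],
                   index=[1, 4],
                   columns=["User", "Gen", "Age"])

# Build a copy of df1 carrying df2's index and columns; df1 itself is untouched
relabeled = df1.set_axis(df2.index, axis=0).set_axis(df2.columns, axis=1)

values_match = relabeled.equals(df2)
print(values_match)       # True
print(list(df1.columns))  # ['Name', 'Gender', 'Age']
```

Because nothing is assigned back to df1, there are no temporary variables to restore afterwards.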


This marks the end of the “How to check if Pandas DataFrames are equal” Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.
