How to make Deep Copy of Pandas DataFrame

When copying an object, we have to be mindful of how the copy relates to the original. We may not want changes made to one copy of an object to show up in other copies of the same object. There are two ways to copy a Pandas DataFrame: a shallow copy and a deep copy.

Shallow Copy vs Deep Copy

Let’s start by discussing the differences between shallow copy and deep copy:

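Shallow Copy creates a new DataFrame object but copies only references to the original's data and index, so both objects share the same underlying data. In pandas without Copy-on-Write, any change to the data of the original is reflected in the shallow copy, and any change in the shallow copy is reflected in the original. With Copy-on-Write enabled (optional in pandas 2.x and planned as the default from pandas 3.0), the data is still shared at first, but a write to either object triggers a copy, so the change does not appear in the other.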

Deep Copy creates a new object with its own copy of the data and index. Changes to the data of the original object are not reflected in the deep copy, and changes to the deep copy are not reflected in the original.
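One way to see this difference without modifying anything is to check whether each copy shares memory with the original. The sketch below assumes a small numeric DataFrame (the column name a and its values are only illustrative) and uses NumPy's shares_memory() on the arrays returned by to_numpy().

```python
import numpy as np
import pandas as pd

# Illustrative single-column numeric frame (hypothetical data)
df = pd.DataFrame({"a": [1, 2, 3]})

shallow = df.copy(deep=False)  # new object, same underlying data
deep = df.copy(deep=True)      # new object, separate data

original_values = df["a"].to_numpy()

# Compare the memory behind each copy with the original's memory
shallow_shares = np.shares_memory(original_values, shallow["a"].to_numpy())
deep_shares = np.shares_memory(original_values, deep["a"].to_numpy())

print("shallow copy shares memory:", shallow_shares)
print("deep copy shares memory:", deep_shares)
```

The shallow copy is expected to report shared memory and the deep copy is not, which is the distinction the two definitions above describe.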


copy() method:

Both kinds of copy are made with the DataFrame copy() method. It takes a Boolean parameter named deep. If deep=True, a deep copy is made, and if deep=False, a shallow copy is made. The default is deep=True, so calling copy() with no arguments returns a deep copy.
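As a quick check of the default, the following sketch (with made-up sample data) calls copy() with and without the deep argument and confirms that both results match the original while being separate objects.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Break", "Xai"], "City": ["London", "Toronto"]})

default_copy = df.copy()            # deep defaults to True
explicit_copy = df.copy(deep=True)  # same call, spelled out

# Both copies hold the same values as the original
same_as_each_other = default_copy.equals(explicit_copy)
same_as_original = default_copy.equals(df)

# A copy is always a new DataFrame object, never the original itself
is_new_object = default_copy is not df

print(same_as_each_other, same_as_original, is_new_object)
```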

Shallow Copy of a Pandas DataFrame:

First, create and initialize a Pandas DataFrame and then call the copy() method on it. Make sure to pass deep=False to create a shallow copy.

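import pandas as pd

# Sample data, labelled 1..3 in the index
df = pd.DataFrame([["Break","London","25/09/1986"],
                   ["Xai","Toronto","22/04/1971"],
                   ["Rufus","Paris","08/08/1998"]],
                  index=[1,2,3],
                  columns=["Name", "City", "DOB"])

df2 = df.copy(deep=False)  # shallow copy: shares data with df
print(df2)
print("")
# Assign with .loc instead of chained indexing (df["Name"][2] = ...),
# which pandas warns against
df.loc[2, "Name"] = "Oscar"
print(df2)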

Output:

    Name     City         DOB
1  Break   London  25/09/1986
2    Xai  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

    Name     City         DOB
1  Break   London  25/09/1986
2  Oscar  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

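We made a shallow copy of df and assigned it to df2. Then we changed the value of an element in df (“Xai” was replaced by “Oscar”) and printed df2. As shown above, the change made in df also appears in df2. This output comes from pandas without Copy-on-Write; with Copy-on-Write enabled, df2 would keep “Xai”.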
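Because the result of this experiment depends on the pandas version and on whether Copy-on-Write is active, it can help to check the behavior in your own environment. The following is a small sketch; the helper name shallow_copy_reflects_changes is hypothetical and not part of pandas.

```python
import pandas as pd

def shallow_copy_reflects_changes() -> bool:
    """Return True if a write to the original shows up in a shallow copy."""
    df = pd.DataFrame({"Name": ["Break", "Xai", "Rufus"]}, index=[1, 2, 3])
    shallow = df.copy(deep=False)
    df.loc[2, "Name"] = "Oscar"  # modify the original only
    return bool(shallow.loc[2, "Name"] == "Oscar")

reflects = shallow_copy_reflects_changes()
print("pandas version:", pd.__version__)
if reflects:
    print("Changes to the original are visible in the shallow copy.")
else:
    print("The shallow copy was not affected (Copy-on-Write behavior).")
```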


Deep Copy of a Pandas DataFrame:

Similar to the above example, create and initialize a Pandas DataFrame and then use the copy() method. You can either pass no arguments or pass True as an argument.

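import pandas as pd

# Same sample data as in the shallow copy example
df = pd.DataFrame([["Break","London","25/09/1986"],
                   ["Xai","Toronto","22/04/1971"],
                   ["Rufus","Paris","08/08/1998"]],
                  index=[1,2,3],
                  columns=["Name", "City", "DOB"])

df2 = df.copy(deep=True)  # deep copy: df2 gets its own data
print(df2)
print("")
# Assign with .loc instead of chained indexing (df["Name"][2] = ...)
df.loc[2, "Name"] = "Oscar"
print(df2)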

Output:

    Name     City         DOB
1  Break   London  25/09/1986
2    Xai  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

    Name     City         DOB
1  Break   London  25/09/1986
2    Xai  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

Since we created a deep copy, df2 remained the same even when we changed an element in df. As shown above, changes made in the original were not reflected in the copy.
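One caveat, noted in the pandas documentation for copy(): a deep copy duplicates the DataFrame's data, but Python objects stored inside cells (such as lists) are not copied recursively. The sketch below uses a hypothetical Tags column holding lists to show that an in-place change to one of those lists is visible through the deep copy, while replacing a cell value is not.

```python
import pandas as pd

# "Tags" is an illustrative column whose cells hold Python lists
df = pd.DataFrame({"Name": ["Break", "Xai"],
                   "Tags": [["a"], ["b"]]})

deep = df.copy(deep=True)

# Mutate the list object itself; the deep copy references the same list
df.loc[0, "Tags"].append("z")
tags_seen_by_copy = deep.loc[0, "Tags"]

# Replace a cell value; this only affects the original
df.loc[1, "Name"] = "Oscar"
name_seen_by_copy = deep.loc[1, "Name"]

print(tags_seen_by_copy)
print(name_seen_by_copy)
```

If fully independent nested objects are needed, they would have to be copied separately, for example by rebuilding that column from copies of its elements.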


Comparison of Shallow and Deep Copy

Finally, we will show a shallow copy and a deep copy side by side in the same program. After creating the copies, we change an element in the original DataFrame. The change shows up in the original and in the shallow copy (in pandas without Copy-on-Write), but not in the deep copy.

import pandas as pd

df = pd.DataFrame([["Break","London","25/09/1986"],
                   ["Xai","Toronto","22/04/1971"],
                   ["Rufus","Paris","08/08/1998"]],
                  index=[1,2,3],
                  columns=["Name", "City", "DOB"])

df2 = df.copy(deep=False)  # shallow copy
df3 = df.copy()            # deep copy (deep=True is the default)
# Assign with .loc instead of chained indexing (df["Name"][2] = ...)
df.loc[2, "Name"] = "Oscar"

print("Original:")
print(df)
print("\nShallow Copy:")
print(df2)
print("\nDeep Copy:")
print(df3)

Output:

Original:
    Name     City         DOB
1  Break   London  25/09/1986
2  Oscar  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

Shallow Copy:
    Name     City         DOB
1  Break   London  25/09/1986
2  Oscar  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

Deep Copy:
    Name     City         DOB
1  Break   London  25/09/1986
2    Xai  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

deepcopy() method

Another way to make a deep copy of a Pandas DataFrame is the deepcopy() function from Python's standard copy module. To use it, we first have to import the copy module. After this, pass the DataFrame as an argument to copy.deepcopy() and it will return a deep copy of it. We will also use the equals() method to compare the two DataFrames.

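import pandas as pd
import copy

df = pd.DataFrame([["Break","London","25/09/1986"],
                   ["Xai","Toronto","22/04/1971"],
                   ["Rufus","Paris","08/08/1998"]],
                  index=[1,2,3],
                  columns=["Name", "City", "DOB"])

df2 = copy.deepcopy(df)  # deep copy via the standard library
# Assign with .loc instead of chained indexing (df["City"][3] = ...)
df.loc[3, "City"] = "Detroit"
print(df)
print("")
print(df2)
print("")
# equals() returns True only if both DataFrames hold the same values
print(df.equals(df2))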

Output:

    Name     City         DOB
1  Break   London  25/09/1986
2    Xai  Toronto  22/04/1971
3  Rufus  Detroit  08/08/1998

    Name     City         DOB
1  Break   London  25/09/1986
2    Xai  Toronto  22/04/1971
3  Rufus    Paris  08/08/1998

False

As you can see, even though we altered the original DataFrame, the change does not appear in the copy. Because the two DataFrames now differ, df.equals(df2) returns False.
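The equals() method only tells us whether the two DataFrames match. To see where they differ, one option is the DataFrame compare() method (available in pandas 1.1 and later). A minimal sketch, reusing the same data as above:

```python
import copy
import pandas as pd

df = pd.DataFrame([["Break", "London", "25/09/1986"],
                   ["Xai", "Toronto", "22/04/1971"],
                   ["Rufus", "Paris", "08/08/1998"]],
                  index=[1, 2, 3],
                  columns=["Name", "City", "DOB"])

df2 = copy.deepcopy(df)
df.loc[3, "City"] = "Detroit"

# compare() keeps only the rows and columns that differ;
# "self" is the caller (df) and "other" is the argument (df2)
diff = df.compare(df2)
print(diff)
```

Here only row 3 of the City column should be reported, with the value from df under self and the value from df2 under other.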


