Error: Python Pandas – Only The Identically-Labeled Objects Can Be Compared

Before pandas 0.19, this error was raised only for DataFrames, not Series; since 0.19 it is raised for both.
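To illustrate the Series case, here is a minimal sketch, assuming pandas 0.19 or later. The names s1 and s2 are made up for this example. The two Series hold the same labels in a different order, which is enough to trigger the error.

```python
import pandas as pd

# Same labels, but in a different order.
s1 = pd.Series([1, 2], index=[0, 1])
s2 = pd.Series([2, 1], index=[1, 0])

try:
    s1 == s2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Sorting the index makes the labels identical, so == works.
result = s1 == s2.sort_index()
print(result.all())
```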

In[1]: df1 = pd.DataFrame([

   [1, 2],

   [3, 4]

])

In[2]: df2 = pd.DataFrame([

   [3, 4],

   [1, 2]

], index = [1, 0])

In[3]: df1 == df2

ValueError: Can only compare identically-labeled DataFrame objects

One solution is to sort the index first (some functions require sorted indexes anyway).

In[4]: df2.sort_index(inplace = True)

In[5]: df1 == df2

Out[5]:
      0     1
0  True  True
1  True  True

Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):

In[11]: df1.sort_index().sort_index(axis = 1) == df2.sort_index().sort_index(axis = 1)

Out[11]:
      0     1
0  True  True
1  True  True
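The two sort_index calls above can be wrapped in a small helper. This is only a sketch: compare_sorted is a hypothetical name, not a pandas function, and it still raises ValueError when the two frames do not contain the same set of labels.

```python
import pandas as pd

def compare_sorted(left, right):
    """Sort rows and columns of both frames, then compare element-wise.

    Hypothetical helper, not part of pandas. Still raises ValueError if
    the two frames do not contain the same set of labels.
    """
    left = left.sort_index().sort_index(axis=1)
    right = right.sort_index().sort_index(axis=1)
    return left == right

# Same data, but rows and columns are in a different order.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"b": [4, 3], "a": [2, 1]}, index=[1, 0])

result = compare_sorted(df1, df2)
print(result)
```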

If the index is not needed for the comparison, you can also drop it:

print(df1.reset_index(drop = True) == df2.reset_index(drop = True))

I have used this same technique in a unit test like so:

from pandas.testing import assert_frame_equal

assert_frame_equal(actual.reset_index(drop = True), expected.reset_index(drop = True))
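If the labels matter but their order does not, assert_frame_equal also accepts check_like=True. The sketch below assumes a reasonably recent pandas, and the expected and actual frames are invented for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

expected = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
actual = pd.DataFrame({"b": [4, 3], "a": [2, 1]}, index=["y", "x"])

# check_like=True ignores the order of index and columns,
# but the labels themselves must still match.
assert_frame_equal(actual, expected, check_like=True)

# Without check_like the different ordering is reported as a failure.
try:
    assert_frame_equal(actual, expected)
    order_matters = False
except AssertionError:
    order_matters = True
print(order_matters)
```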

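pandas also has a DataFrame.equals method for testing equality. You use it like this:

df1.equals(df2)

To compare individual columns of two DataFrames row by row, this should work: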

import pandas as pd

import numpy as np

firstProductSet = {
   'Product1': ['Computer', 'Phone', 'Printer', 'Desk'],
   'Price1': [1200, 800, 200, 350]
}
df1 = pd.DataFrame(firstProductSet, columns = ['Product1', 'Price1'])

secondProductSet = {
   'Product2': ['Computer', 'Phone', 'Printer', 'Desk'],
   'Price2': [900, 800, 300, 350]
}
df2 = pd.DataFrame(secondProductSet, columns = ['Product2', 'Price2'])

df1['Price2'] = df2['Price2'] # add the Price2 column from df2 to df1

# create a new column in df1 to check if prices match
df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')

# create a new column in df1 for the price difference
df1['priceDiff?'] = np.where(df1['Price1'] == df2['Price2'], 0, df1['Price1'] - df2['Price2'])

print(df1)

With ==, aka DataFrame.__eq__, no explicit instruction is given as to alignment, so frames whose labels differ (even only in order) raise an error:

In[1]: import pandas as pd

In[2]: df1 = pd.DataFrame(index = [0, 1, 2], data = {
   'col1': list('abc')
})

In[3]: df2 = pd.DataFrame(index = [2, 0, 1], data = {
   'col1': list('cab')
})

In[4]: df1 == df2
---------------------------------------------------------------------------
…
ValueError: Can only compare identically-labeled DataFrame objects

Alignment is explicitly broken with DataFrame.equals, DataFrame.values, and DataFrame.reset_index(): no labels are matched up, so the differently ordered rows compare as unequal:

In[5]: df1.equals(df2)
Out[5]: False

In[9]: df1.values == df2.values
Out[9]:
array([[False],
       [False],
       [False]])

In[10]: (df1.values == df2.values).all().all()
Out[10]: False

Alignment is explicitly enforced with DataFrame.eq and DataFrame.sort_index():

In[6]: df1.eq(df2)

Out[6]:
   col1
0  True
1  True
2  True

In[8]: df1.eq(df2).all().all()

Out[8]: True
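One consequence of eq aligning first is worth a small sketch: when the two frames do not share all labels, the result covers the union of the labels, and a label missing on one side compares as False. The numeric data below is made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"col1": [3, 1, 9]}, index=[2, 0, 3])

# eq aligns on the union of the labels; a label missing from one
# side is filled with NaN, and NaN never compares equal.
aligned = df1.eq(df2)
print(aligned)

# Restrict the comparison to the labels both frames share.
common = df1.index.intersection(df2.index)
shared = df1.loc[common].eq(df2.loc[common])
print(shared)
```

Whether a missing label should count as a mismatch depends on the use case, which is why the second comparison is limited to the shared labels.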

Here is a complete example of how to handle this error when the two DataFrames have different lengths, which can happen when they come from different sources. The shorter DataFrame is padded with rows of zeros.

import pandas as pd

import numpy as np

# df1 with 9 rows
df1 = pd.DataFrame({
   'Name': ['John', 'Mike', 'Smith', 'Wale', 'Marry', 'Tom', 'Menda', 'Bolt', 'Yuswa'],
   'Age': [23, 45, 12, 34, 27, 44, 28, 39, 40]
})

# df2 with 8 rows
df2 = pd.DataFrame({
   'Name': ['John', 'Mike', 'Wale', 'Marry', 'Tom', 'Menda', 'Bolt', 'Yuswa'],
   'Age': [25, 45, 14, 34, 26, 44, 29, 42]
})

# get lengths of df1 and df2
df1_len = len(df1)
df2_len = len(df2)
diff = df1_len - df2_len

rows_to_be_added1 = rows_to_be_added2 = 0
if diff < 0:
   rows_to_be_added1 = abs(diff)
else:
   rows_to_be_added2 = diff

# add rows of zeros to df1 (DataFrame.append was removed in pandas 2.0, so use pd.concat)
if rows_to_be_added1 > 0:
   df1 = pd.concat([df1, pd.DataFrame(np.zeros((rows_to_be_added1, len(df1.columns))), columns = df1.columns)])

# add rows of zeros to df2
if rows_to_be_added2 > 0:
   df2 = pd.concat([df2, pd.DataFrame(np.zeros((rows_to_be_added2, len(df2.columns))), columns = df2.columns)])

# at this point we have two dataframes with the same number of rows, and maybe different indexes
# drop the indexes of both, so we can compare the dataframes and do other operations like update etc.
df2.reset_index(drop = True, inplace = True)
df1.reset_index(drop = True, inplace = True)

# add a new column to df1
df1['New_age'] = None

# compare the Age column of df1 and df2, and update the New_age column of df1
# with the Age column of df2 if they match, else None
df1['New_age'] = np.where(df1['Age'] == df2['Age'], df2['Age'], None)

# drop rows where Name is 0.0
df2 = df2.drop(df2[df2['Name'] == 0.0].index)

# now we don't get the error ValueError: Can only compare identically-labeled Series objects
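Padding compares rows by position, so a name missing from the middle of one DataFrame shifts every later row. As an alternative that the example above does not use, the rows can be lined up by Name with merge. This is a minimal sketch on a shortened version of the data; the suffixes and the New_age logic are assumptions about what the comparison is meant to achieve.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["John", "Mike", "Smith"], "Age": [23, 45, 12]})
df2 = pd.DataFrame({"Name": ["John", "Mike"], "Age": [25, 45]})

# Line the rows up by Name instead of by position; names missing
# from df2 get NaN in Age_2.
merged = df1.merge(df2, on="Name", how="left", suffixes=("_1", "_2"))

# Keep df2's age only where both ages agree (NaN never matches).
merged["New_age"] = merged["Age_2"].where(merged["Age_1"] == merged["Age_2"])
print(merged)
```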

When comparing two DataFrames using ==, you may get the error ValueError: Can only compare identically-labeled DataFrame objects. To avoid it, you can use equals instead of ==.

This error occurs when the two DataFrames being compared have different index labels. You can fix it either by giving the DataFrames the same index labels or by comparing them in a way that does not rely on the labels.

The DataFrame.equals function compares two Series or DataFrames to see whether they have the same shape and elements. It returns False instead of raising when the labels differ.

You can also reset the indexes with .reset_index(drop=True) before comparing.

The NumPy function array_equal, applied to the .values of the two DataFrames or of their columns, also avoids the error.
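Before picking one of these fixes, it can help to see which labels actually differ. The following sketch uses Index.equals and Index.symmetric_difference; describe_label_mismatch is a hypothetical helper name and the two small frames are invented for the example.

```python
import pandas as pd

def describe_label_mismatch(left, right):
    """Report why == would refuse to compare two frames (hypothetical helper)."""
    return {
        "same_index": left.index.equals(right.index),
        "same_columns": left.columns.equals(right.columns),
        # Labels present on one side only.
        "index_only_in_one": list(left.index.symmetric_difference(right.index)),
        "columns_only_in_one": list(left.columns.symmetric_difference(right.columns)),
    }

df1 = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
df2 = pd.DataFrame({"a": [1, 2]}, index=["x", "z"])

report = describe_label_mismatch(df1, df2)
print(report)
```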

In Python, a value is the information stored within an object.

Python raises a ValueError when a built-in operation or function receives an argument that has the right type but an inappropriate value.
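A short illustration of that distinction, using only the built-in int function (the variable names are arbitrary):

```python
# int() accepts a str, so the type is fine, but the text
# "abc" is not a valid integer literal.
try:
    int("abc")
    outcome = "no error"
except ValueError as exc:
    outcome = f"ValueError: {exc}"
print(outcome)

# Passing the wrong type raises TypeError instead.
try:
    int([1, 2])
    wrong_type = "no error"
except TypeError:
    wrong_type = "TypeError"
print(wrong_type)
```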

In this case, the data we want to compare has the right type, but the DataFrames have index labels that are inappropriate for comparison. Let's look at an example with two DataFrames that have the same columns but different index labels:

import pandas as pd

df1 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [135, 150, 170, 140, 180, 155]
   },
   index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [145, 120, 180, 220, 175, 110]
   },
   index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

print(df1)

print(df2)

Let's run this part of the program to see the DataFrames:

          Bodyweight (kg)  Bench press (kg)
lifter_1               76               135
lifter_2               84               150
lifter_3               93               170
lifter_4              106               140
lifter_5              120               180
lifter_6               56               155
          Bodyweight (kg)  Bench press (kg)
lifter_A               76               145
lifter_B               84               120
lifter_C               93               180
lifter_D              106               220
lifter_E              120               175
lifter_F               56               110

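Let's compare the DataFrames with the == operator:

print(df1 == df2)

Because the index labels differ, this raises ValueError: Can only compare identically-labeled DataFrame objects.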

To solve this error, we can use the DataFrame.equals function. It compares two Series or DataFrames to see whether they have the same shape and elements. Let's look at the revised code:

print(df1.equals(df2))

False
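Note that equals is stricter than an element-wise comparison in one respect and more lenient in another. The sketch below, with made-up frames, shows that it also checks dtypes and that it treats NaN values in the same position as equal.

```python
import numpy as np
import pandas as pd

ints = pd.DataFrame({"a": [1, 2]})
floats = pd.DataFrame({"a": [1.0, 2.0]})

# Element-wise the values are equal, but the dtypes differ
# (integer vs float), so equals reports False.
elementwise = bool((ints == floats).all().all())
same_object = ints.equals(floats)
print(elementwise, same_object)

# NaN in the same position counts as equal for equals,
# although NaN == NaN is False element-wise.
with_nan = pd.DataFrame({"a": [1.0, np.nan]})
nan_equal = with_nan.equals(with_nan.copy())
nan_elementwise = bool((with_nan == with_nan.copy()).all().all())
print(nan_equal, nan_elementwise)
```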

We can also compare the DataFrames after calling the reset_index method on each of them. The drop parameter must be set to True so that the old index labels are discarded. Let's look at the revised code:

df1 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [145, 120, 180, 220, 175, 110]
   },
   index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [145, 120, 180, 220, 175, 110]
   },
   index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

df1 = df1.reset_index(drop = True)

df2 = df2.reset_index(drop = True)

print(df1)

print(df2)

   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110
   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110

print(df1.equals(df2))

print(df1 == df2)
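To see why drop=True matters here, the following small sketch contrasts the two settings on a cut-down, made-up frame. With the default drop=False the old labels come back as an extra column, which would then take part in the comparison.

```python
import pandas as pd

df = pd.DataFrame({"Bodyweight (kg)": [76, 84]}, index=["lifter_1", "lifter_2"])

# Default drop=False keeps the old labels as a new "index" column.
kept = df.reset_index()
# drop=True discards the old labels entirely.
dropped = df.reset_index(drop=True)

print(list(kept.columns))
print(list(dropped.columns))
```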

We can also use numpy.array_equal to check whether two arrays have the same shape and elements. The .values attribute extracts the underlying array from a DataFrame. Let's look at the revised code:

import pandas as pd

import numpy as np

df1 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [135, 150, 170, 140, 180, 155]
   },
   index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [145, 120, 180, 220, 175, 110]
   },
   index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

print(np.array_equal(df1.values, df2.values))

False

We can also use array_equal to compare individual columns. Let's look at the revised code:

import pandas as pd

import numpy as np

df1 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [135, 150, 170, 140, 180, 155]
   },
   index = ['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])

df2 = pd.DataFrame({
      'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
      'Bench press (kg)': [145, 120, 180, 220, 175, 110]
   },
   index = ['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])

# Get individual columns of DataFrames using iloc
df1_bodyweight = df1.iloc[:, 0]
df1_bench = df1.iloc[:, 1]
df2_bodyweight = df2.iloc[:, 0]
df2_bench = df2.iloc[:, 1]

# Compare bodyweight and bench columns separately
print(np.array_equal(df1_bodyweight.values, df2_bodyweight.values))
print(np.array_equal(df1_bench.values, df2_bench.values))

True

False
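With more than a couple of columns, the same check can be written as a loop over column names rather than positions. This is a sketch on shortened, made-up data; the column names weight and bench are placeholders.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"weight": [76, 84], "bench": [135, 150]},
                   index=["lifter_1", "lifter_2"])
df2 = pd.DataFrame({"weight": [76, 84], "bench": [145, 120]},
                   index=["lifter_A", "lifter_B"])

# Compare each shared column by position, ignoring index labels.
shared_columns = df1.columns.intersection(df2.columns)
matches = {
    col: bool(np.array_equal(df1[col].to_numpy(), df2[col].to_numpy()))
    for col in shared_columns
}
print(matches)
```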

I am trying to create a Python app that compares two DataFrames. When I ran this code, it raised ValueError: Can only compare identically-labeled DataFrame objects.

import pandas as pd

dfa = pd.DataFrame({
      'Bodyweight (kg)': [760, 840, 930, 1060, 1200, 560],
      'Bench press (kg)': [1350, 1500, 1700, 1400, 1080, 1505]
   },
   index = ['index_1', 'index_2', 'index_3', 'index_4', 'index_5', 'index_6'])

dfb = pd.DataFrame({
      'Bodyweight (kg)': [756, 840, 903, 1006, 1200, 560],
      'Bench press (kg)': [1405, 1020, 1080, 2200, 1075, 1010]
   },
   index = ['index_A', 'index_B', 'index_C', 'index_D', 'index_E', 'index_F'])

print(dfa == dfb)

Traceback (most recent call last):
  File "main.py", line 7, in <module>
    print(dfa == dfb)
  File "/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 701, in f
    self, other = _align_method_FRAME(self, other, axis, level=None, flex=False)
  File "/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 510, in _align_method_FRAME
    raise ValueError(
ValueError: Can only compare identically-labeled DataFrame objects

 


Resetting the index of both DataFrames before comparing them resolves the error:

import pandas as pd

dfa = pd.DataFrame({
      'Bodyweight (kg)': [760, 840, 930, 1060, 1200, 560],
      'Bench press (kg)': [1350, 1500, 1700, 1400, 1080, 1505]
   },
   index = ['index_1', 'index_2', 'index_3', 'index_4', 'index_5', 'index_6'])

dfb = pd.DataFrame({
      'Bodyweight (kg)': [756, 840, 903, 1006, 1200, 560],
      'Bench press (kg)': [1405, 1020, 1080, 2200, 1075, 1010]
   },
   index = ['index_A', 'index_B', 'index_C', 'index_D', 'index_E', 'index_F'])

dfa = dfa.reset_index(drop = True)
dfb = dfb.reset_index(drop = True)

print(dfa == dfb)

   Bodyweight (kg)  Bench press (kg)
0            False             False
1             True             False
2            False             False
3            False             False
4             True             False
5             True             False
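A grid of True and False shows where the DataFrames differ but not by how much. As one possible follow-up, assuming pandas 1.1 or later, DataFrame.compare lists the differing values side by side. It has the same identical-label requirement as ==, so the index is reset first. The data below is a shortened, made-up variant of the example.

```python
import pandas as pd

dfa = pd.DataFrame({"weight": [760, 840, 930], "bench": [1350, 1500, 1700]},
                   index=["index_1", "index_2", "index_3"])
dfb = pd.DataFrame({"weight": [756, 840, 930], "bench": [1350, 1020, 1700]},
                   index=["index_A", "index_B", "index_C"])

# compare also requires identical labels, so reset the index first.
left = dfa.reset_index(drop=True)
right = dfb.reset_index(drop=True)

# Keeps only rows and columns with at least one difference;
# "self" holds the value from left, "other" the value from right.
diff = left.compare(right)
print(diff)
```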


Abdullah