This error comes from pandas. Here is a small example that demonstrates it. It applied only to DataFrames, not Series, until pandas 0.19, where it applies to both.
In[1]: df1 = pd.DataFrame([
[1, 2],
[3, 4]
])
In[2]: df2 = pd.DataFrame([
[3, 4],
[1, 2]
], index = [1, 0])
In[3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
One solution is to sort the index first (some functions also require sorted indexes):
In[4]: df2.sort_index(inplace = True)
In[5]: df1 == df2
Out[5]:
      0     1
0  True  True
1  True  True
Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):
In[11]: df1.sort_index().sort_index(axis = 1) == df2.sort_index().sort_index(axis = 1)
Out[11]:
      0     1
0  True  True
1  True  True
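Sorting only helps when both frames carry the same set of labels, so it can be worth checking that first. A minimal sketch of such a check; the helper name same_label_sets is hypothetical and not part of pandas:

```python
import pandas as pd

def same_label_sets(left, right):
    """Return True if both frames have the same row and column labels, in any order."""
    same_rows = left.index.sort_values().equals(right.index.sort_values())
    same_cols = left.columns.sort_values().equals(right.columns.sort_values())
    return same_rows and same_cols

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

if same_label_sets(df1, df2):
    # Sorting both axes puts the labels in the same order, so == no longer raises.
    result = df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
    print(result.all().all())
```

If the check fails, sorting will not help, and one of the label-ignoring approaches is probably the better fit.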
You can also try dropping the index if it is not needed for the comparison:
print(df1.reset_index(drop = True) == df2.reset_index(drop = True))
I have used this same technique in a unit test like so:
from pandas.testing import assert_frame_equal
assert_frame_equal(actual.reset_index(drop = True), expected.reset_index(drop = True))
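If the labels match and only their order differs, assert_frame_equal also accepts check_like=True, which ignores the order of the index and columns. A small sketch reusing the data from the first example; which frame is called actual and which expected is arbitrary here:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

expected = pd.DataFrame([[1, 2], [3, 4]])
actual = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

# check_like=True compares rows and columns by label, ignoring their order.
assert_frame_equal(actual, expected, check_like=True)

# Without check_like, the differing index order makes the assertion fail.
try:
    assert_frame_equal(actual, expected)
    order_sensitive_failed = False
except AssertionError:
    order_sensitive_failed = True
print(order_sensitive_failed)
```

Unlike reset_index(drop=True), this still fails when a label is present in one frame and missing from the other.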
You can also use DataFrame.equals, like this:
df1.equals(df2)
This should work:
import pandas as pd
import numpy as np
firstProductSet = {
    'Product1': ['Computer', 'Phone', 'Printer', 'Desk'],
    'Price1': [1200, 800, 200, 350]
}
df1 = pd.DataFrame(firstProductSet, columns=['Product1', 'Price1'])
secondProductSet = {
    'Product2': ['Computer', 'Phone', 'Printer', 'Desk'],
    'Price2': [900, 800, 300, 350]
}
df2 = pd.DataFrame(secondProductSet, columns=['Product2', 'Price2'])
df1['Price2'] = df2['Price2']  # add the Price2 column from df2 to df1
# create new column in df1 to check if prices match
df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')
# create new column in df1 for price diff
df1['priceDiff?'] = np.where(df1['Price1'] == df2['Price2'], 0, df1['Price1'] - df2['Price2'])
print(df1)
With no explicit instruction as to alignment, == (that is, DataFrame.__eq__) raises an exception:
In[1]: import pandas as pd
In[2]: df1 = pd.DataFrame(index=[0, 1, 2], data={
    'col1': list('abc')
})
In[3]: df2 = pd.DataFrame(index=[2, 0, 1], data={
    'col1': list('cab')
})
In[4]: df1 == df2
---------------------------------------------------------------------------
…
ValueError: Can only compare identically-labeled DataFrame objects
Alignment is explicitly broken with DataFrame.equals, DataFrame.values, or DataFrame.reset_index():
In[5]: df1.equals(df2)
Out[5]: False
In[9]: df1.values == df2.values
Out[9]:
array([
[False],
[False],
[False]
])
In[10]: (df1.values == df2.values).all().all()
Out[10]: False
Alignment is explicitly enforced with DataFrame.eq or DataFrame.sort_index():
In[6]: df1.eq(df2)
Out[6]:
   col1
0  True
1  True
2  True
In[8]: df1.eq(df2).all().all()
Out[8]: True
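Another way to make the alignment explicit is to reorder one frame to the other's labels before using ==. A minimal sketch, assuming both frames contain the same labels (any label missing from df2 would become a row of NaN):

```python
import pandas as pd

df1 = pd.DataFrame(index=[0, 1, 2], data={'col1': list('abc')})
df2 = pd.DataFrame(index=[2, 0, 1], data={'col1': list('cab')})

# Reorder df2 so its rows and columns follow df1's labels exactly.
df2_aligned = df2.reindex(index=df1.index, columns=df1.columns)

# The labels are now identical and in the same order, so == compares element by element.
matches = df1 == df2_aligned
print(matches.all().all())
```

This keeps df1's label order in the result, whereas sorting both frames reorders both sides.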
Here is a complete example of how to handle this error. It pads the shorter DataFrame with rows of zeros so that both have the same length, which helps when your DataFrames come from a number of different sources.
import pandas as pd
import numpy as np
# df1 with 9 rows
df1 = pd.DataFrame({
    'Name': ['John', 'Mike', 'Smith', 'Wale', 'Marry', 'Tom', 'Menda', 'Bolt', 'Yuswa'],
    'Age': [23, 45, 12, 34, 27, 44, 28, 39, 40]
})
# df2 with 8 rows
df2 = pd.DataFrame({
    'Name': ['John', 'Mike', 'Wale', 'Marry', 'Tom', 'Menda', 'Bolt', 'Yuswa'],
    'Age': [25, 45, 14, 34, 26, 44, 29, 42]
})
# get lengths of df1 and df2
df1_len = len(df1)
df2_len = len(df2)
diff = df1_len - df2_len
rows_to_be_added1 = rows_to_be_added2 = 0
# rows_to_be_added1 = np.zeros(diff)
if diff < 0:
    rows_to_be_added1 = abs(diff)
else:
    rows_to_be_added2 = diff
# add rows of zeros to df1 (DataFrame.append was removed in pandas 2.0, so use pd.concat)
if rows_to_be_added1 > 0:
    df1 = pd.concat([df1, pd.DataFrame(np.zeros((rows_to_be_added1, len(df1.columns))), columns=df1.columns)])
# add rows of zeros to df2
if rows_to_be_added2 > 0:
    df2 = pd.concat([df2, pd.DataFrame(np.zeros((rows_to_be_added2, len(df2.columns))), columns=df2.columns)])
# at this point we have two dataframes with the same number of rows, and maybe different indexes
# drop the indexes of both, so we can compare the dataframes and do other operations like update etc.
df2.reset_index(drop=True, inplace=True)
df1.reset_index(drop=True, inplace=True)
# add a new column to df1
df1['New_age'] = None
# compare the Age column of df1 and df2, and update the New_age column of df1
# with the Age column of df2 if they match, else None
df1['New_age'] = np.where(df1['Age'] == df2['Age'], df2['Age'], None)
# drop rows where Name is 0.0
df2 = df2.drop(df2[df2['Name'] == 0.0].index)
# now we don't get the error ValueError: Can only compare identically-labeled Series objects
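The padding step can also be written with reindex, which fills the added rows with NaN rather than zeros. This is a sketch of an alternative on shortened data, not a drop-in replacement: because the fill value differs, any later filtering would need isna() rather than a comparison with 0.0.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Mike', 'Smith'], 'Age': [23, 45, 12]})
df2 = pd.DataFrame({'Name': ['John', 'Mike'], 'Age': [25, 45]})

# Give both frames the same 0..n-1 index, where n is the longer length.
n = max(len(df1), len(df2))
df1 = df1.reset_index(drop=True).reindex(range(n))
df2 = df2.reset_index(drop=True).reindex(range(n))

# Rows added by reindex hold NaN, and NaN never compares equal to a number.
ages_match = df1['Age'] == df2['Age']
print(ages_match.tolist())
```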
When comparing two DataFrames using ==, you may get the error ValueError: Can only compare identically-labeled DataFrame objects. You can avoid this problem by using equals instead of ==.
This error may occur when you compare two DataFrames that have different index labels. You can fix it by making the DataFrames share the same index labels.
You can either reset the indexes using .reset_index(), or use .equals(), which returns False instead of raising an error when the labels differ.
You can also use the NumPy function array_equal to compare the DataFrames' underlying values, which ignores the labels.
The equals function lets us compare two Series or DataFrames to see whether they have the same shape and elements.
In Python, a value is something you can store in an object.
Python raises a ValueError when using a built-in operation or function that receives an argument that is the right type but an inappropriate value.
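Because the failure is an ordinary ValueError, it can be caught like any other exception. The sketch below wraps == in a hypothetical helper, frames_equal, that treats mismatched labels as "not equal" instead of raising; whether that is the right policy depends on your use case.

```python
import pandas as pd

def frames_equal(left, right):
    """Compare element-wise with ==; treat mismatched labels as not equal."""
    try:
        return bool((left == right).all().all())
    except ValueError:
        # Raised by pandas when the row or column labels differ.
        return False

a = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])
b = pd.DataFrame({'x': [1, 2]}, index=['s1', 's2'])

print(frames_equal(a, a.copy()))
print(frames_equal(a, b))
```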
In the example below, the two DataFrames hold the same kind of data, but df1 is indexed by lifter_1 to lifter_6 while df2 is indexed by lifter_A to lifter_F, so their labels do not match.
import pandas as pd
df1 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [135, 150, 170, 140, 180, 155]
},
    index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])
df2 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [145, 120, 180, 220, 175, 110]
},
    index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])
print(df1)
print(df2)
Let's run this part of the program to see the DataFrames:
          Bodyweight (kg)  Bench press (kg)
lifter_1               76               135
lifter_2               84               150
lifter_3               93               170
lifter_4              106               140
lifter_5              120               180
lifter_6               56               155
          Bodyweight (kg)  Bench press (kg)
lifter_A               76               145
lifter_B               84               120
lifter_C               93               180
lifter_D              106               220
lifter_E              120               175
lifter_F               56               110
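Next, let's compare the DataFrames with ==, which raises the ValueError because the index labels differ:
print(df1 == df2)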
The DataFrame.equals function can be used to solve this error. The equals function allows us to compare Series and DataFrames to see if they have the same shape and elements. Let's look at the revised code:
print(df1.equals(df2))
False
We can also compare the DataFrames after resetting their indexes with the reset_index method. We need to set the drop parameter to True. Let's look at the revised code:
df1 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [145, 120, 180, 220, 175, 110]
},
    index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])
df2 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [145, 120, 180, 220, 175, 110]
},
    index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])
df1 = df1.reset_index(drop = True)
df2 = df2.reset_index(drop = True)
print(df1)
print(df2)
   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110
   Bodyweight (kg)  Bench press (kg)
0               76               145
1               84               120
2               93               180
3              106               220
4              120               175
5               56               110
print(df1.equals(df2))
print(df1 == df2)
We can use numpy.array_equal to check whether two arrays have the same shape and elements. We can extract the array from each DataFrame using .values. Let's look at the revised code:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [135, 150, 170, 140, 180, 155]
},
    index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])
df2 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [145, 120, 180, 220, 175, 110]
},
    index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])
print(np.array_equal(df1.values, df2.values))
False
We can also use array_equal to compare individual columns. Let's look at the revised code:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [135, 150, 170, 140, 180, 155]
},
    index=['lifter_1', 'lifter_2', 'lifter_3', 'lifter_4', 'lifter_5', 'lifter_6'])
df2 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93, 106, 120, 56],
    'Bench press (kg)': [145, 120, 180, 220, 175, 110]
},
    index=['lifter_A', 'lifter_B', 'lifter_C', 'lifter_D', 'lifter_E', 'lifter_F'])
# Get individual columns of DataFrames using iloc
df1_bodyweight = df1.iloc[:, 0]
df1_bench = df1.iloc[:, 1]
df2_bodyweight = df2.iloc[:, 0]
df2_bench = df2.iloc[:, 1]
# Compare bodyweight and bench columns separately
print(np.array_equal(df1_bodyweight.values, df2_bodyweight.values))
print(np.array_equal(df1_bench.values, df2_bench.values))
True
False
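The same per-column check can be written as a loop over the column names, which avoids hard-coding column positions. A minimal sketch on the first three lifters; it assumes both frames have the same column names:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93],
    'Bench press (kg)': [135, 150, 170]
}, index=['lifter_1', 'lifter_2', 'lifter_3'])
df2 = pd.DataFrame({
    'Bodyweight (kg)': [76, 84, 93],
    'Bench press (kg)': [145, 120, 180]
}, index=['lifter_A', 'lifter_B', 'lifter_C'])

# Map each column name to whether its values match, ignoring the index labels.
column_matches = {
    col: np.array_equal(df1[col].to_numpy(), df2[col].to_numpy())
    for col in df1.columns
}
print(column_matches)
```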
I am trying to create a Python app with two DataFrames, and I want to compare them. When I ran this code, I got the exception ValueError: Can only compare identically-labeled DataFrame objects.
The code and the full exception are:
import pandas as pd
dfa = pd.DataFrame({
    'Bodyweight (kg)': [760, 840, 930, 1060, 1200, 560],
    'Bench press (kg)': [1350, 1500, 1700, 1400, 1080, 1505]
},
    index=['index_1', 'index_2', 'index_3', 'index_4', 'index_5', 'index_6'])
dfb = pd.DataFrame({
    'Bodyweight (kg)': [756, 840, 903, 1006, 1200, 560],
    'Bench press (kg)': [1405, 1020, 1080, 2200, 1075, 1010]
},
    index=['index_A', 'index_B', 'index_C', 'index_D', 'index_E', 'index_F'])
print(dfa == dfb)
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    print(dfa == dfb)
  File "/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 701, in f
    self, other = _align_method_FRAME(self, other, axis, level=None, flex=False)
  File "/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 510, in _align_method_FRAME
    raise ValueError(
ValueError: Can only compare identically-labeled DataFrame objects
import pandas as pd
dfa = pd.DataFrame({
    'Bodyweight (kg)': [760, 840, 930, 1060, 1200, 560],
    'Bench press (kg)': [1350, 1500, 1700, 1400, 1080, 1505]
},
    index=['index_1', 'index_2', 'index_3', 'index_4', 'index_5', 'index_6'])
dfb = pd.DataFrame({
    'Bodyweight (kg)': [756, 840, 903, 1006, 1200, 560],
    'Bench press (kg)': [1405, 1020, 1080, 2200, 1075, 1010]
},
    index=['index_A', 'index_B', 'index_C', 'index_D', 'index_E', 'index_F'])
dfa = dfa.reset_index(drop = True)
dfb = dfb.reset_index(drop = True)
print(dfa == dfb)
   Bodyweight (kg)  Bench press (kg)
0            False             False
1             True             False
2            False             False
3            False             False
4             True             False
5             True             False
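Once the indexes match, DataFrame.compare (available in pandas 1.1 and later) lists only the cells that differ, which can be easier to read than a grid of booleans. A short sketch using the first three rows of dfa and dfb:

```python
import pandas as pd

dfa = pd.DataFrame({
    'Bodyweight (kg)': [760, 840, 930],
    'Bench press (kg)': [1350, 1500, 1700]
}, index=['index_1', 'index_2', 'index_3'])
dfb = pd.DataFrame({
    'Bodyweight (kg)': [756, 840, 903],
    'Bench press (kg)': [1405, 1020, 1080]
}, index=['index_A', 'index_B', 'index_C'])

# compare() also requires identical labels, so reset the indexes first.
dfa = dfa.reset_index(drop=True)
dfb = dfb.reset_index(drop=True)

# Only differing cells are kept, under 'self' and 'other' sub-columns; equal cells show NaN.
differences = dfa.compare(dfb)
print(differences)
```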