pivot() raises an error
When using pivot() to reshape a long table into a wide table, the following error can occur:
ValueError: Index contains duplicate entries, cannot reshape
Example:
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'A', 'B'],
...                    "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   A    3
3  two   B    4
>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
  ...
ValueError: Index contains duplicate entries, cannot reshape
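Cause: the first two rows share the same combination of 'foo' and 'bar' values ('one', 'A'). pivot() would have to place two 'baz' values (1 and 2) into a single cell, and it refuses to choose between them.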
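Before picking a fix, it can help to see exactly which rows collide. The following is a minimal sketch using the same example frame; the names dup_mask and conflicts are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "A", "A", "B"],
    "baz": [1, 2, 3, 4],
})

# keep=False flags every row whose (foo, bar) pair occurs more than once,
# not just the second and later occurrences.
dup_mask = df.duplicated(subset=["foo", "bar"], keep=False)

# The rows that would compete for the same cell in the pivoted table.
conflicts = df[dup_mask]
print(conflicts)
```

Here the flagged rows are rows 0 and 1, the two ('one', 'A') entries. If conflicts is empty, pivot() on these index/columns keys should not raise this particular error.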
Solutions:
- Use pivot_table()
- Drop the duplicate rows
- Aggregate first, then call pivot()
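1. Use pivot_table()
pivot_table() aggregates duplicate entries instead of raising. The default aggregation is the mean, so the cell for ('one', 'A') becomes (1 + 2) / 2 = 1.5.
>>> df.pivot_table(index='foo', columns='bar', values='baz')
bar    A    B
foo
one  1.5  NaN
two  3.0  4.0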
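The mean is not always the aggregation you want. As a sketch, pivot_table() also accepts an explicit aggfunc and a fill_value for missing cells; the variable names wide_sum and wide_filled are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "A", "A", "B"],
    "baz": [1, 2, 3, 4],
})

# Sum the duplicated entries instead of averaging them.
wide_sum = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")

# Same aggregation, but put 0 in cells that have no source rows.
wide_filled = df.pivot_table(
    index="foo", columns="bar", values="baz", aggfunc="sum", fill_value=0
)

print(wide_sum)
print(wide_filled)
```

With aggfunc="sum" the ('one', 'A') cell holds 1 + 2 = 3, and with fill_value=0 the empty ('one', 'B') cell holds 0 instead of NaN.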
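2. Drop the duplicate rows
drop_duplicates() keeps the first row of each ('foo', 'bar') pair by default, so the row with baz = 2 is discarded. Note that the assignment below overwrites df.
>>> df = df.drop_duplicates(['foo', 'bar'])
>>> df.pivot(index='foo', columns='bar', values='baz')
bar    A    B
foo
one  1.0  NaN
two  3.0  4.0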
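Which duplicate survives is controlled by the keep parameter of drop_duplicates(). The sketch below keeps the last occurrence instead of the first and stores the result under a new name (deduped_last, an illustrative choice) so the original frame stays intact.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "A", "A", "B"],
    "baz": [1, 2, 3, 4],
})

# keep="last" retains the final row of each (foo, bar) pair.
deduped_last = df.drop_duplicates(subset=["foo", "bar"], keep="last")

# With unique (foo, bar) pairs, pivot() no longer raises.
wide_last = deduped_last.pivot(index="foo", columns="bar", values="baz")
print(wide_last)
```

In this case the ('one', 'A') cell holds 2 rather than 1. Dropping rows silently discards data, so this approach fits best when the duplicates are true repeats or when one occurrence is known to be authoritative.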
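3. Aggregate first, then call pivot()
Applied to the original four-row df (not the de-duplicated frame from method 2), the group sum for ('one', 'A') is 1 + 2 = 3.
>>> df_agg = df.groupby(by=['foo', 'bar']).sum().reset_index()
>>> df_agg.pivot(index='foo', columns='bar', values='baz')
bar    A    B
foo
one  3.0  NaN
two  3.0  4.0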
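A slightly more explicit variant of the aggregate-then-pivot approach is sketched below. It selects the value column before aggregating, so any other columns in a larger frame stay out of the sum, and it uses as_index=False in place of reset_index(). The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "A", "A", "B"],
    "baz": [1, 2, 3, 4],
})

# One row per (foo, bar) pair; as_index=False keeps foo and bar as columns.
df_agg = df.groupby(["foo", "bar"], as_index=False)["baz"].sum()

# The keys are now unique, so pivot() can reshape without ambiguity.
wide = df_agg.pivot(index="foo", columns="bar", values="baz")
print(wide)
```

Swapping sum() for another reduction such as max() or mean() changes how the colliding values are combined while leaving the pivot step unchanged.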