[python] Prepend a level to a pandas MultiIndex

Question 1

I have a DataFrame with a MultiIndex that was created by some grouping:

import numpy as np
import pandas as p
from numpy.random import randn

df = p.DataFrame({
    'A' : ['a1', 'a1', 'a2', 'a3']
  , 'B' : ['b1', 'b2', 'b3', 'b4']
  , 'Vals' : randn(4)
}).groupby(['A', 'B']).sum()

df

Output>            Vals
Output> A  B
Output> a1 b1 -1.632460
Output>    b2  0.596027
Output> a2 b3 -0.619130
Output> a3 b4 -0.002009

How can I add a level in front of the MultiIndex so that it becomes the following?

Output>                       Vals
Output> FirstLevel A  B
Output> Foo        a1 b1 -1.632460
Output>               b2  0.596027
Output>            a2 b3 -0.619130
Output>            a3 b4 -0.002009

Question 2

A nice way to do this in one line is with pandas.concat():

import pandas as pd

pd.concat([df], keys=['Foo'], names=['Firstlevel'])

An even shorter way:

pd.concat({'Foo': df}, names=['Firstlevel'])

This generalizes to any number of DataFrames; see the documentation.
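As a sketch of that generalization, assuming two small hypothetical frames `df1` and `df2` that share the same `A`/`B` index structure, passing a dict puts each key into the new outer level for that frame's rows:

```python
import pandas as pd

# Two hypothetical frames with the same two-level (A, B) index
df1 = pd.DataFrame(
    {'Vals': [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples([('a1', 'b1'), ('a1', 'b2')], names=['A', 'B']),
)
df2 = pd.DataFrame(
    {'Vals': [3.0]},
    index=pd.MultiIndex.from_tuples([('a2', 'b3')], names=['A', 'B']),
)

# Each dict key becomes the value of the new outermost level for that frame's rows;
# names=['Firstlevel'] names only the new level, A and B are kept from the frames
combined = pd.concat({'Foo': df1, 'Bar': df2}, names=['Firstlevel'])
print(combined)
```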

Question 3

You can first add it as a regular column and then append it to the current index:

df['Firstlevel'] = 'Foo'
df.set_index('Firstlevel', append=True, inplace=True)

If needed, change the level order with:

df = df.reorder_levels(['Firstlevel', 'A', 'B'])  # returns a new DataFrame, so assign it back

Result:

                      Vals
Firstlevel A  B
Foo        a1 b1  0.871563
              b2  0.494001
           a2 b3 -0.167811
           a3 b4 -1.353409
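A self-contained version of the three steps above, using fixed made-up values in place of `randn` so the result is reproducible:

```python
import pandas as pd

# A deterministic stand-in for the grouped frame from the question
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': [1.0, 2.0, 3.0, 4.0],
}).groupby(['A', 'B']).sum()

# 1. Add the new level's value as an ordinary column
df['Firstlevel'] = 'Foo'
# 2. Append that column to the existing index; it becomes the innermost level
df = df.set_index('Firstlevel', append=True)
# 3. Move it to the front; reorder_levels returns a new object, so assign it back
df = df.reorder_levels(['Firstlevel', 'A', 'B'])

print(df)
```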

Question 4

I think this is a more general solution:

import pandas as pd

# Convert index to dataframe
old_idx = df.index.to_frame()

# Insert new level at specified location;
# new_level_values is a scalar or a list-like with one value per row
old_idx.insert(0, 'new_level_name', new_level_values)

# Convert back to MultiIndex
df.index = pd.MultiIndex.from_frame(old_idx)

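Some advantages over the other answers:

The new level can be added at any position, not just at the top.
It is purely a manipulation of the index; it does not require touching the data, as the concatenation trick does.
It does not require adding a column as an intermediate step, which can break a multi-level column index.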
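To illustrate the first advantage, here is a sketch that inserts the new level in the middle rather than at the top; the frame and the level name `Mid` are made up for the example:

```python
import pandas as pd

# Hypothetical two-level index standing in for the grouped frame
df = pd.DataFrame(
    {'Vals': [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b3'), ('a3', 'b4')], names=['A', 'B']
    ),
)

idx_frame = df.index.to_frame()
# Position 1 places the new level between A and B instead of at the front
idx_frame.insert(1, 'Mid', 'Foo')
df.index = pd.MultiIndex.from_frame(idx_frame)

print(df)
```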

Question 5

I made a small function out of cxrodgers' answer. IMHO it is the best solution, because it works purely on the index, regardless of whether that index belongs to a DataFrame or a Series.

There is one fix I added: the to_frame() method invents new names for index levels that don't have one, so the new index would end up with names that did not exist in the old index. I added code to revert this renaming.
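A short sketch of the renaming described above, on a small unnamed MultiIndex made up for the example; passing the original names back to `from_frame` is what undoes it:

```python
import pandas as pd

# A MultiIndex whose levels have no names
mi = pd.MultiIndex.from_tuples([('x', 1), ('y', 2)])

# to_frame() needs column labels, so unnamed levels get their positions 0 and 1
frame = mi.to_frame()
print(list(frame.columns))

# Rebuilding without explicit names keeps those invented labels as level names
roundtrip = pd.MultiIndex.from_frame(frame)
print(list(roundtrip.names))

# Passing the original names explicitly restores the unnamed levels
restored = pd.MultiIndex.from_frame(frame, names=list(mi.names))
print(list(restored.names))
```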

Below is the code. I've been using it myself for a while and it seems to work fine. If you find any problems or edge cases, I'd be happy to adjust my answer.

from typing import Any

import pandas as pd

def _handle_insert_loc(loc: int, n: int) -> int:
    """
    Computes the insert index from the right if loc is negative for a given size of n.
    """
    return n + loc + 1 if loc < 0 else loc


def add_index_level(old_index: pd.Index, value: Any, name: str = None, loc: int = 0) -> pd.MultiIndex:
    """
    Expand a (multi)index by adding a level to it.

    :param old_index: The index to expand
    :param name: The name of the new index level
    :param value: Scalar or list-like, the values of the new index level
    :param loc: Where to insert the level in the index, 0 is at the front, negative values count back from the rear end
    :return: A new multi-index with the new level added
    """
    loc = _handle_insert_loc(loc, len(old_index.names))
    old_index_df = old_index.to_frame()
    old_index_df.insert(loc, name, value)
    new_index_names = list(old_index.names)  # sometimes new index level names are invented when converting to a df,
    new_index_names.insert(loc, name)        # here the original names are reconstructed
    new_index = pd.MultiIndex.from_frame(old_index_df, names=new_index_names)
    return new_index

It passes the following unittest code:

import unittest

import numpy as np
import pandas as pd

class TestPandaStuff(unittest.TestCase):

    def test_add_index_level(self):
        df = pd.DataFrame(data=np.random.normal(size=(6, 3)))
        i1 = add_index_level(df.index, "foo")

        # it does not invent new index names where they are missing
        self.assertEqual([None, None], i1.names)

        # the new level values are added
        self.assertTrue(np.all(i1.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i1.get_level_values(1) == df.index))

        # it does not invent new index names where they are missing
        i2 = add_index_level(i1, ["x", "y"]*3, name="xy", loc=2)
        i3 = add_index_level(i2, ["a", "b", "c"]*2, name="abc", loc=-1)
        self.assertEqual([None, None, "xy", "abc"], i3.names)

        # the new level values are added
        self.assertTrue(np.all(i3.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i3.get_level_values(1) == df.index))
        self.assertTrue(np.all(i3.get_level_values(2) == ["x", "y"]*3))
        self.assertTrue(np.all(i3.get_level_values(3) == ["a", "b", "c"]*2))

        # df.index = i3
        # print()
        # print(df)

Question 6

How about building it from scratch with pandas.MultiIndex.from_tuples?

df.index = p.MultiIndex.from_tuples(
    [(nl, A, B) for nl, (A, B) in
        zip(['Foo'] * len(df), df.index)],
    names=['FirstLevel', 'A', 'B'])

Like cxrodgers' solution, this is a flexible method and it does not modify the DataFrame's underlying data array.
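The same idea can be written without hard-coding the number of levels or their names. This is a sketch assuming `df.index` is already a MultiIndex, so that iterating over it yields tuples (a plain single-level Index would yield scalars instead):

```python
import pandas as pd

df = pd.DataFrame(
    {'Vals': [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b3'), ('a3', 'b4')], names=['A', 'B']
    ),
)

# Prefix each existing index tuple with the new value and reuse the existing
# level names instead of hard-coding 'A' and 'B'
# (assumes df.index is a MultiIndex, so each key is a tuple)
df.index = pd.MultiIndex.from_tuples(
    [('Foo', *key) for key in df.index],
    names=['FirstLevel', *df.index.names],
)
print(df)
```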