Pandas

[Pandas / Basics] Modifying Data in pandas - replace, rename, lower, apply, lambda

씨주 2024. 1. 10. 17:47

📍 Modifying Data

✅ Opening an Excel File

: pd.read_excel('filename.xlsx', index_col='column')

In [1]:

import pandas as pd

df = pd.read_excel('score.xlsx', index_col='지원번호') # use the 지원번호 (applicant number) column as the index
df

Out[1]:
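score.xlsx itself is not included in this post, so here is a minimal sketch of an in-memory stand-in for trying the later cells. The column names come from the post, but every value below is invented for illustration, and only a few of the columns are included.

```python
import pandas as pd

# Hypothetical stand-in for score.xlsx: column names follow the post,
# but all of the values are made up for illustration.
data = {
    '지원번호': ['1번', '2번', '3번'],
    '이름': ['A', 'B', 'C'],
    '학교': ['북산고', '북산고', '능남고'],
    '키': [197, 184, 168],
    'SW특기': ['Python', 'Java', None],
}

# set_index plays the same role here as index_col='지원번호' in read_excel
df = pd.DataFrame(data).set_index('지원번호')
print(df)
```

Note that read_excel needs an Excel engine such as openpyxl installed for .xlsx files, which is one more reason a small in-memory frame is handy for quick experiments.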

✅ Modifying Column Data

: df['column'].replace({'old_value' : 'new_value'})

In [2]:

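# Change 북산고 to 상북고
df['학교'] = df['학교'].replace({'북산고':'상북고'}) # assign the result back; inplace=True on a column selection warns in recent pandas and may not update df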
df

Out[2]:
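One thing worth seeing in isolation: replace matches whole cell values and returns a new Series, so the original is untouched unless you assign the result back. The sketch below uses a tiny made-up Series in place of the 학교 column. For substring replacement, .str.replace is the tool to reach for instead.

```python
import pandas as pd

# Made-up values standing in for the 학교 column
school = pd.Series(['북산고', '능남고', '북산고'], name='학교')

# replace() matches whole values and returns a new Series
renamed = school.replace({'북산고': '상북고'})

# For substrings, use the .str accessor (regex=False means a literal match)
expanded = school.str.replace('고', '고등학교', regex=False)

print(renamed.tolist())
print(expanded.tolist())
print(school.tolist())  # the original Series is unchanged
```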

: df.rename({'old_name' : 'new_name'})

In [3]:

df.rename({'학교': 'school'}, axis=1, inplace=True) # axis=1 renames column labels
df.rename({'1번':'0번'}, axis=0, inplace=True) # axis=0 renames index (row) labels
df

Out[3]:
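If the axis numbers are hard to remember, rename also accepts columns= and index= keywords, which say the same thing more explicitly. A small sketch with made-up data:

```python
import pandas as pd

# Made-up frame with the same labels used above
df = pd.DataFrame({'학교': ['북산고', '능남고']}, index=['1번', '2번'])

# columns= and index= are the keyword equivalents of axis=1 and axis=0
renamed = df.rename(columns={'학교': 'school'}, index={'1번': '0번'})

print(renamed)
print(df.columns.tolist())  # without inplace=True the original is unchanged
```

Labels that are missing from the mapping (here '2번') are simply left as they are.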

✅ Converting to Lowercase

: df['column'].str.lower()

In [4]:

df = pd.read_excel('score.xlsx', index_col='지원번호')
df['SW특기'] = df['SW특기'].str.lower()
df

Out[4]:
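The .str accessor lives on a Series (a single column), not on the whole DataFrame, and it skips over missing values instead of raising. A minimal sketch with invented values:

```python
import pandas as pd

# Invented values; None stands for a missing SW특기 entry
skills = pd.Series(['Python', 'JAVA', None, 'C#'])

lowered = skills.str.lower()   # missing values stay missing
uppered = skills.str.upper()   # the same accessor has upper(), title(), and more

print(lowered)
print(uppered)
```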

✅ Concatenating Strings

: 'str' + 'str'

In [5]:

df['학교'] = df['학교'] + '등학교' # e.g. '북산고' becomes '북산고등학교'
df

Out[5]:
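The + operator works element by element, so it can join a column with a fixed string or with another string column. In the sketch below the values and the 소개 column are hypothetical, added only to show column-to-column concatenation.

```python
import pandas as pd

# Made-up rows; '소개' is a hypothetical column added for this example
df = pd.DataFrame({'이름': ['A', 'B'], '학교': ['북산고', '능남고']})

df['학교'] = df['학교'] + '등학교'          # column + fixed string
df['소개'] = df['학교'] + ' ' + df['이름']  # column + column

print(df)
```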

✅ Applying a Function

: df['column'].apply(function)

In [6]:

# Raises an error because of the dtype mismatch (int + str)
df['키'] = df['키'] + 'cm'
df

---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 df['키'] = df['키'] + 'cm'
      2 df

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/ops/common.py:81, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     77             return NotImplemented
     79 other = item_from_zerodim(other)
---> 81 return method(self, other)

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/arraylike.py:186, in OpsMixin.__add__(self, other)
     98 @unpack_zerodim_and_defer("__add__")
     99 def __add__(self, other):
    100     """
    101     Get Addition of DataFrame and other, column-wise.
    102 
   (...)
    184     moose     3.0     NaN
    185     """
--> 186     return self._arith_method(other, operator.add)

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/series.py:6112, in Series._arith_method(self, other, op)
   6110 def _arith_method(self, other, op):
   6111     self, other = ops.align_method_SERIES(self, other)
-> 6112     return base.IndexOpsMixin._arith_method(self, other, op)

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/base.py:1348, in IndexOpsMixin._arith_method(self, other, op)
   1345 rvalues = ensure_wrapped_if_datetimelike(rvalues)
   1347 with np.errstate(all="ignore"):
-> 1348     result = ops.arithmetic_op(lvalues, rvalues, op)
   1350 return self._construct_result(result, name=res_name)

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/ops/array_ops.py:232, in arithmetic_op(left, right, op)
    228     _bool_arith_check(op, left, right)
    230     # error: Argument 1 to "_na_arithmetic_op" has incompatible type
    231     # "Union[ExtensionArray, ndarray[Any, Any]]"; expected "ndarray[Any, Any]"
--> 232     res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]
    234 return res_values

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/ops/array_ops.py:171, in _na_arithmetic_op(left, right, op, is_cmp)
    168     func = partial(expressions.evaluate, op)
    170 try:
--> 171     result = func(left, right)
    172 except TypeError:
    173     if not is_cmp and (is_object_dtype(left.dtype) or is_object_dtype(right)):
    174         # For object dtype, fallback to a masked operation (only operating
    175         #  on the non-missing values)
    176         # Don't do this for comparisons, as that will handle complex numbers
    177         #  incorrectly, see GH#32047

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U2')) -> None
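The last line of the traceback is the important part: the column holds int64 values, and NumPy has no way to add an integer to a string. A minimal sketch that reproduces this on a made-up Series and catches it (UFuncTypeError is a subclass of TypeError, so a plain TypeError handler is enough):

```python
import pandas as pd

# Made-up heights standing in for the 키 column
height = pd.Series([197, 184, 168], name='키')
print(height.dtype)  # an integer dtype, not strings

raised = False
try:
    height + 'cm'  # integer column + str has no matching operation
except TypeError as exc:
    raised = True
    print(type(exc).__name__)

print(raised)
```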

In [7]:

# Function that appends 'cm' to a height
def add_cm(height):
    return str(height) + 'cm'

df['키'] = df['키'].apply(add_cm) # call add_cm on each value in 키 and store the result back in the column
df

Out[7]:
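apply is the general-purpose route, but for this particular case converting the column to strings first gives the same values without a helper function. A sketch on made-up heights that compares the two approaches:

```python
import pandas as pd

# Made-up heights standing in for the 키 column
height = pd.Series([197, 184, 168], name='키')

def add_cm(value):
    # Convert to str first so the concatenation is str + str
    return str(value) + 'cm'

with_apply = height.apply(add_cm)
with_astype = height.astype(str) + 'cm'  # alternative without apply

print(with_apply.tolist())
print(with_astype.tolist())
```

Either way the column now holds strings, so numeric operations such as mean() no longer work on it. Keeping the numeric column and formatting only for display is often the safer choice.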

✅ Applying a Function (lambda)

: df['column'].apply(lambda x: ...)

In [8]:

# Capitalize the first letter and lowercase the rest (missing values are passed through unchanged)
df['SW특기'].apply(lambda x : x.capitalize() if pd.notnull(x) else x)

Out[8]:

지원번호
1번        Python
2번          Java
3번    Javascript
4번           NaN
5번           NaN
6번             C
7번        Python
8번            C#
Name: SW특기, dtype: object
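The pd.notnull check matters because a missing value is not a string and has no capitalize method. As a rough comparison, the .str accessor handles missing values by itself, so the lambda is not strictly needed here. A sketch with invented values:

```python
import pandas as pd

# Invented values; None stands for a missing SW특기 entry
skills = pd.Series(['python', 'JAVA', None, 'c#'])

# lambda version: guard against missing values explicitly
via_lambda = skills.apply(lambda x: x.capitalize() if pd.notnull(x) else x)

# .str version: missing values are skipped automatically
via_str = skills.str.capitalize()

print(via_lambda)
print(via_str)
```

Like the cell above, neither line changes skills itself. Assign the result back to the column if you want to keep it.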

✅ Modifying a Cell

: df.loc['index', 'column'] = data

In [9]:

df = pd.read_excel('score.xlsx', index_col='지원번호')
df.loc['4번', 'SW특기'] = 'Python' # set the SW특기 value of applicant 4번 to Python
df

Out[9]:

In [10]:

df.loc['5번', ['학교', 'SW특기']] = ['능남고', 'C'] # for applicant 5번, set 학교 to 능남고 and SW특기 to C
df

Out[10]:
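Both assignments can be tried on a two-row frame. The sketch below uses made-up starting values and also shows .at, which is the scalar-only counterpart of .loc for a single cell.

```python
import pandas as pd

# Made-up starting values for two applicants
df = pd.DataFrame(
    {'학교': ['북산고', '북산고'], 'SW특기': [None, None]},
    index=['4번', '5번'],
)

df.loc['4번', 'SW특기'] = 'Python'                 # one cell
df.loc['5번', ['학교', 'SW특기']] = ['능남고', 'C']  # several cells in one row
df.at['4번', '학교'] = '상북고'                     # .at accesses exactly one cell

print(df)
```

Writing through a single .loc or .at call, rather than chaining something like df['SW특기']['4번'] = ..., keeps the assignment on the original frame.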

✅ Reordering Columns

: df[new_column_list]

In [11]:

cols = list(df.columns)
cols

Out[11]:

['이름', '학교', '키', '국어', '영어', '수학', '과학', '사회', 'SW특기']

In [12]:

# Move the last column (SW특기) to the front, followed by the remaining columns in their original order
df = df[[cols[-1]] + cols[0:-1]]
df

Out[12]:
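Selecting with a list of column names returns the columns in exactly that order, which is all the reordering trick relies on. A sketch on a made-up three-column frame, with a name-based variant in case the column to move is not the last one:

```python
import pandas as pd

# Made-up single-row frame with three of the post's columns
df = pd.DataFrame({'이름': ['A'], '학교': ['북산고'], 'SW특기': ['Python']})

# Position-based: last column first, then everything else
cols = list(df.columns)
by_position = df[[cols[-1]] + cols[:-1]]

# Name-based: move one named column to the front
front = 'SW특기'
by_name = df[[front] + [c for c in df.columns if c != front]]

print(by_position.columns.tolist())
print(by_name.columns.tolist())
```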

✅ Renaming Columns

: df.columns = new_column_list

In [13]:

df = df[['이름', '학교']]
df

Out[13]:

In [14]:

df.columns

Out[14]:

Index(['이름', '학교'], dtype='object')

In [15]:

df.columns = ['Name', 'School']
df

Out[15]:
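Assigning to df.columns replaces every label at once, so the new list must have exactly one name per column. The sketch below, on a made-up frame, shows the ValueError that a list of the wrong length raises.

```python
import pandas as pd

# Made-up frame with the two columns kept above
df = pd.DataFrame({'이름': ['A'], '학교': ['북산고']})

df.columns = ['Name', 'School']  # one new label per column, in order

length_error = None
try:
    df.columns = ['Name']        # too few labels
except ValueError as exc:
    length_error = str(exc)

print(df.columns.tolist())
print(length_error)
```

When only some of the labels need to change, df.rename(columns={...}) from earlier in the post is usually less error-prone, since it does not depend on column order.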

Reference: 나도코딩, free Python coding course (Applications Part 5) - Data Analysis and Visualization, Finish It All with This One Video

(https://youtu.be/PjhlUzp_cU0?si=LW_MjXLjZVY9PrUt)
