
[Pandas / Basics] Adding and deleting pandas data - loc, drop, datetime, date_range, to_datetime

씨주 2024. 1. 10. 17:11

📍 Adding data

✅ Opening an Excel file

: pd.read_excel('filename.xlsx', index_col='column')

In [1]:

import pandas as pd

df = pd.read_excel('score.xlsx', index_col='지원번호')  # use the 지원번호 (applicant number) column as the index
df

Out[1]:
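Since score.xlsx is not reproduced in this post, the sketch below shows what index_col does using pd.read_csv on a small in-memory CSV instead (a swapped-in reader; pd.read_excel accepts the same index_col parameter). The rows and the 이름 values are made-up sample data.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for score.xlsx (made-up sample rows)
csv_text = """지원번호,이름,국어
1번,Student A,90
2번,Student B,80
"""

# index_col turns the 지원번호 column into the row labels instead of a regular column
df = pd.read_csv(io.StringIO(csv_text), index_col='지원번호')

print(df.index.tolist())    # ['1번', '2번']
print(df.columns.tolist())  # ['이름', '국어']
```

Because 지원번호 becomes the index, later calls such as df.loc['9번'] and df.drop(index='4번') can refer to rows by applicant number.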

✅ Adding a row

: df.loc['index'] = [list]  (one value per column, in column order)

In [2]:

df.loc['9번'] = ['이정환', '해남고등학교', 184, 90, 90, 90, 90, 90, 'Kotlin']
df

Out[2]:
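A minimal sketch of how df.loc[label] = [list] behaves, on a hypothetical two-column table rather than score.xlsx: a new label appends a row, an existing label overwrites it, and a list whose length does not match the number of columns is rejected.

```python
import pandas as pd

# Hypothetical mini score table (not the real score.xlsx data)
df = pd.DataFrame(
    {'이름': ['A', 'B'], '국어': [90, 80]},
    index=pd.Index(['1번', '2번'], name='지원번호'),
)

# A label that does not exist yet appends a new row ...
df.loc['3번'] = ['C', 70]
# ... while an existing label overwrites that row
df.loc['1번'] = ['A', 95]

# The list needs exactly one value per column, otherwise pandas raises ValueError
mismatch_raised = False
try:
    df.loc['4번'] = ['D', 60, 'extra']  # 3 values for 2 columns
except ValueError:
    mismatch_raised = True

print(df)
print(mismatch_raised)  # True
```

This is why In [2] passes exactly nine values: one for each column of the score table.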

✅ Adding a column

: df['new_column'] = data

In [3]:

df['총합'] = df['국어'] + df['영어'] + df['수학'] + df['과학'] + df['사회']
df

Out[3]:

In [4]:

df['결과'] = 'Fail'
df

Out[4]:
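The two cells above use two kinds of right-hand side: a Series computed from other columns (In [3]) and a single scalar that is broadcast to every row (In [4]). A small sketch with hypothetical scores, plus sum(axis=1) as an equivalent way to build the total:

```python
import pandas as pd

# Hypothetical scores for two students
df = pd.DataFrame(
    {'국어': [90, 70], '영어': [80, 60], '수학': [100, 50]},
    index=['1번', '2번'],
)

# Element-wise addition of Series gives one total per row
df['총합'] = df['국어'] + df['영어'] + df['수학']

# Row-wise sum over a list of columns gives the same totals
total_alt = df[['국어', '영어', '수학']].sum(axis=1)

# A scalar is broadcast to every row
df['결과'] = 'Fail'

print(df)
```

One difference worth knowing: with +, a missing value (NaN) in any column makes that row's total NaN, whereas sum(axis=1) skips NaN by default.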

✅ Modifying data with a condition

Related post: 2024.01.10 - [Pandas] - [Pandas / Basics] Selecting pandas data - conditions

In [5]:

df.loc[df['총합'] > 400, '결과'] = 'Pass'  # set 결과 to 'Pass' for every row whose 총합 (total) is greater than 400
df

Out[5]:
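The pattern df.loc[row_mask, column] = value selects rows with a boolean mask and a column by name in a single indexing step. A sketch on hypothetical totals:

```python
import pandas as pd

# Hypothetical totals for three students
df = pd.DataFrame({'총합': [450, 390, 401]}, index=['1번', '2번', '3번'])
df['결과'] = 'Fail'

# Boolean mask picks the rows; the second loc argument picks the column to overwrite
mask = df['총합'] > 400
df.loc[mask, '결과'] = 'Pass'

print(df)
```

Doing both steps in one loc call matters: chained indexing such as df[mask]['결과'] = 'Pass' may write to a temporary copy (pandas warns with SettingWithCopyWarning) and leave df unchanged.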

📍 datetime (dates and times)

✅ date_range

: pd.date_range(start, end, periods, freq)

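start : start date
end : end date
periods : number of values to generate
freq : frequency (interval between values), e.g. 'D' for daily or '15h' for every 15 hours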
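pandas expects exactly three of start, end, periods and freq; if freq is omitted while one of the other three is missing, it falls back to daily ('D'). A minimal sketch of three common combinations (the dates are arbitrary):

```python
import pandas as pd

# start + periods + freq: 3 timestamps, 15 hours apart
a = pd.date_range('20240101', periods=3, freq='15h')

# start + end + freq: every day from Jan 1 to Jan 4 (end is included here)
b = pd.date_range('2024-01-01', '2024-01-04', freq='D')

# start + end + periods: 5 evenly spaced points; the spacing is derived automatically
c = pd.date_range('2024-01-01', '2024-01-02', periods=5)

print(a)
print(b)
print(c)
```

The first form is the one used in In [6], with periods set to the number of rows so that every row gets a timestamp.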

In [6]:

# one timestamp per row, 15 hours apart ('15H' still works but is deprecated since pandas 2.2)
dates = pd.date_range('20240101', periods=df.shape[0], freq='15h')
df['date'] = dates
df

Out[6]:

In [7]:

df['date'].info()

<class 'pandas.core.series.Series'>
Index: 9 entries, 1번 to 9번
Series name: date
Non-Null Count  Dtype         
--------------  -----         
9 non-null      datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 144.0+ bytes

✅ The dt accessor

: the .dt accessor gives access to the date/time attributes below (the Series must have a datetime dtype)

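pd.Series.dt.year: year
pd.Series.dt.month: month
pd.Series.dt.day: day
pd.Series.dt.hour: hour
pd.Series.dt.minute: minute
pd.Series.dt.second: second
pd.Series.dt.microsecond: microseconds
pd.Series.dt.nanosecond: nanoseconds
pd.Series.dt.week: week of the year (removed in pandas 2.0; use pd.Series.dt.isocalendar().week)
pd.Series.dt.weekofyear: week of the year, same as week (also removed in pandas 2.0)
pd.Series.dt.dayofweek: day of the week (Monday 0, Sunday 6)
pd.Series.dt.weekday: day of the week (same as dayofweek)
pd.Series.dt.dayofyear: day of the year
pd.Series.dt.quarter: quarter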
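A quick sketch of several of these attributes on the first three timestamps generated in In [6], including isocalendar().week as the replacement for the removed week/weekofyear:

```python
import pandas as pd

# Same start and frequency as In [6], but only 3 values
s = pd.Series(pd.date_range('20240101', periods=3, freq='15h'))

print(s.dt.year.tolist())       # [2024, 2024, 2024]
print(s.dt.day.tolist())        # [1, 1, 2]
print(s.dt.hour.tolist())       # [0, 15, 6]
print(s.dt.dayofweek.tolist())  # [0, 0, 1]  (2024-01-01 is a Monday)
print(s.dt.dayofyear.tolist())  # [1, 1, 2]
print(s.dt.quarter.tolist())    # [1, 1, 1]

# ISO week number; replaces .dt.week / .dt.weekofyear
print(s.dt.isocalendar().week.tolist())  # [1, 1, 1]
```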

In [8]:

df['date'].dt.year.head()

Out[8]:

지원번호
1번    2024
2번    2024
3번    2024
4번    2024
5번    2024
Name: date, dtype: int32

In [9]:

df['date'].dt.dayofweek.head()

Out[9]:

지원번호
1번    0
2번    0
3번    1
4번    1
5번    2
Name: date, dtype: int32

✅ Converting to datetime type

: pd.to_datetime(df['column'])

In [10]:

df = pd.read_csv('seoul_bicycle.csv')
df.head()

Out[10]:

In [11]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 327231 entries, 0 to 327230
Data columns (total 11 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   대여일자    327231 non-null  object 
 1   대여소번호   327231 non-null  int64  
 2   대여소명    327231 non-null  object 
 3   대여구분코드  327231 non-null  object 
 4   성별      272841 non-null  object 
 5   연령대코드   327231 non-null  object 
 6   이용건수    327231 non-null  int64  
 7   운동량     327231 non-null  object 
 8   탄소량     327231 non-null  object 
 9   이동거리    327231 non-null  float64
 10  이용시간    327231 non-null  int64  
dtypes: float64(1), int64(3), object(7)
memory usage: 27.5+ MB

In [12]:

# 대여일자 (rental date) is stored as object (strings), so convert it to datetime before using the .dt accessor
df['대여일자'] = pd.to_datetime(df['대여일자'])
df['대여일자'].dt.year.head()

Out[12]:

0    2020
1    2020
2    2020
3    2020
4    2020
Name: 대여일자, dtype: int32
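pd.to_datetime also accepts an explicit format string and an errors argument. The sketch below uses made-up date strings; the '%Y-%m-%d' format is an assumption for this sample and is not taken from the actual 대여일자 column.

```python
import pandas as pd

# Made-up date strings, one of which is not a valid date
raw = pd.Series(['2020-01-01', '2020-01-15', 'not a date'])

# errors='coerce' turns unparseable strings into NaT instead of raising an error
dates = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(dates.isna().tolist())  # [False, False, True]
print(dates.dt.day.head(2).tolist())  # [1, 15]
```

Passing format also makes parsing faster and more predictable on large columns like the 327,231-row bicycle data, since pandas does not have to guess the layout.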

📍 Deleting data

✅ Deleting a column

: df.drop(columns=[column_list])

In [13]:

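df = pd.read_excel('score.xlsx', index_col='지원번호')
df['총합'] = df['국어'] + df['영어'] + df['수학'] + df['과학'] + df['사회']  # 총합 is not stored in score.xlsx, so re-create it after reloading
df.drop(columns=['총합'])  # drop the 총합 column (returns a new DataFrame; df itself is unchanged)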

Out[13]:

In [14]:

df.drop(columns=['국어', '영어', '수학'], inplace=True)
df

Out[14]:
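The difference between In [13] and In [14] is inplace: without it, drop returns a new DataFrame; with it, df is modified and the call returns None. A sketch on a hypothetical one-row table:

```python
import pandas as pd

# Hypothetical one-row table using the post's column names
df = pd.DataFrame({'국어': [90], '영어': [80], '수학': [70], 'SW특기': ['Python']})

# Without inplace, drop returns a new DataFrame and leaves df as it was
dropped = df.drop(columns=['국어'])
print(df.columns.tolist())       # ['국어', '영어', '수학', 'SW특기']
print(dropped.columns.tolist())  # ['영어', '수학', 'SW특기']

# With inplace=True, the frame itself is modified and the call returns None
copy_df = df.copy()
result = copy_df.drop(columns=['국어', '영어', '수학'], inplace=True)
print(result)                    # None
print(copy_df.columns.tolist())  # ['SW특기']

# Reassigning the result gives the same outcome without inplace
df = df.drop(columns=['국어', '영어', '수학'])
print(df.columns.tolist())       # ['SW특기']
```

A common mistake is writing df = df.drop(..., inplace=True), which replaces df with None.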

: df.drop([column_list], axis=1)

In [15]:

df.drop('SW특기', axis=1)

Out[15]:
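columns='SW특기' and 'SW특기', axis=1 are two spellings of the same operation. A sketch on a hypothetical row, also showing errors='ignore' for a label that may not exist:

```python
import pandas as pd

# Hypothetical single row
df = pd.DataFrame({'키': [184], 'SW특기': ['Kotlin']}, index=['9번'])

by_columns = df.drop(columns='SW특기')
by_axis = df.drop('SW특기', axis=1)  # axis=1: the labels are column names
print(by_columns.equals(by_axis))    # True

# A missing label raises KeyError unless errors='ignore' is passed
safe = df.drop(columns=['총합'], errors='ignore')
print(safe.columns.tolist())         # ['키', 'SW특기']
```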

✅ Deleting a row

: df.drop(index=[row_list])

In [16]:

df.drop(index='4번')  # drop the row for applicant 4번

Out[16]:
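Rows are the default axis for drop, so the index= keyword, a bare label, and axis=0 all remove the same row. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical math scores indexed by applicant number
df = pd.DataFrame({'수학': [95, 70, 85]}, index=['1번', '4번', '6번'])

# Three equivalent ways to drop the row labelled '4번'
a = df.drop(index='4번')
b = df.drop('4번')            # axis=0 (rows) is the default
c = df.drop(['4번'], axis=0)

print(a.index.tolist())             # ['1번', '6번']
print(a.equals(b) and b.equals(c))  # True
```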

In [17]:

df = pd.read_excel('score.xlsx', index_col='지원번호')
filt = df['수학'] < 80  # select students whose 수학 (math) score is below 80
df[filt]

Out[17]:

In [18]:

df[filt].index

Out[18]:

Index(['2번', '3번', '4번', '5번', '7번'], dtype='object', name='지원번호')

In [19]:

df.drop(index=df[filt].index)  # drop students whose math score is below 80

Out[19]:
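Dropping the index labels selected by a mask gives the same rows as keeping the inverted mask with df[~filt]. A sketch on hypothetical scores:

```python
import pandas as pd

# Hypothetical math scores
df = pd.DataFrame({'수학': [95, 70, 60, 85]}, index=['1번', '2번', '3번', '4번'])
filt = df['수학'] < 80

# Drop the rows the mask selects, by passing their index labels
dropped = df.drop(index=df[filt].index)

# Keeping the complement of the mask gives the same result
kept = df[~filt]

print(dropped.index.tolist())  # ['1번', '4번']
print(dropped.equals(kept))    # True
```

The two approaches differ only when the index has duplicate labels: drop removes every row carrying a selected label, while df[~filt] works row by row, so the boolean version is the safer choice on a non-unique index.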

Reference: 나도코딩 free Python coding course (Applications part 5) - Data analysis and visualization, all in this one video

(https://youtu.be/PjhlUzp_cU0?si=LW_MjXLjZVY9PrUt)
