[Python] pandas_4. Series within a DataFrame (df_Series), indexing, slicing, transpose, values

#Summary

#04df_Series - Extracting columns (Series) from a DataFrame

#df[0:3] slicing : a DataFrame itself can be sliced by row position.
- print(df[:3])
- print(df[3:])
- print(df[3:6])
- print(df[3:len(df)])
#df.loc[[0,3]] indexing : selecting rows with a list of index labels requires loc.
**print(df[[0,3]]) #KeyError - df[[...]] looks up column labels, so rows must be selected through loc.
- print(df.loc[[0,3]]) #OK
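
The difference between slicing and list indexing can be checked with a minimal sketch. The small two-column frame here is a hypothetical stand-in for the employee table used in these notes.

```python
import pandas as pd

# Hypothetical sample data, a cut-down version of the employee table.
df = pd.DataFrame({
    'emp_id': [100, 101, 200, 201, 300, 400],
    'name': ['kim', 'lee', 'choi', 'han', 'gang', 'yoon'],
})

first_three = df[:3]       # slice: rows at positions 0, 1, 2
picked = df.loc[[0, 3]]    # list of row labels: rows labeled 0 and 3

# df[[0, 3]] searches for *columns* named 0 and 3, which do not exist here.
try:
    df[[0, 3]]
    raised = False
except KeyError:
    raised = True

print(first_three)
print(picked)
print("KeyError raised:", raised)
```

A slice inside `df[...]` is interpreted as a row range, while a list inside `df[...]` is interpreted as column labels, which is why only the list form needs `loc`.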
#df['column name'] : extracts a single column as a Series
- print(df['emp_id']) / print(df.emp_id) #both forms work.
#A column's data can be replaced in several ways
- df['emp_id'] = pd.Series([11,22,33,44,55,66]) #replace the column with a Series
- df['emp_id'] = np.array([111,222,333,444,555,666]) #replace the column with a NumPy array
- df['emp_id'] = np.arange(100,100+6)
- df['emp_id'] = range(200,200+6) #range
- df['emp_id'] = [i for i in range(300,300+6)] #list
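
One caveat about the Series form: a Series is matched to the frame by index label, whereas arrays, ranges, and lists are assigned by position. In these notes the frame has the default 0..5 index, so both behave the same. The sketch below uses a hypothetical frame with string labels to make the difference visible.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-default index labels.
df = pd.DataFrame({'emp_id': [100, 101, 200]}, index=['a', 'b', 'c'])

# An ndarray (or list, or range) is assigned by position; only the length must match.
df['by_array'] = np.arange(300, 303)

# A Series is aligned on index labels first. Its default labels 0, 1, 2 do not
# match 'a', 'b', 'c', so every row ends up as NaN.
df['by_series'] = pd.Series([11, 22, 33])

# Giving the Series the same index makes the labels line up.
df['by_aligned'] = pd.Series([11, 22, 33], index=df.index)

print(df)
```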
#value_counts() : returns how many times each unique value occurs
- print(df['location'].value_counts())
- print(df.location.value_counts())
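
A minimal sketch of what `value_counts()` returns, using the same location values as the example data in these notes:

```python
import pandas as pd

location = pd.Series(['seoul', 'seoul', 'busan', 'busan', 'daegu', 'incheon'])

# The result is a Series: unique values become the index, counts become the data.
counts = location.value_counts()
print(counts)

# normalize=True returns relative frequencies instead of raw counts.
ratios = location.value_counts(normalize=True)
print(ratios)
```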

#df[['column name']] : extracts one or more columns as a DataFrame
- print(df[['location']])
- print(df[['dept_id','location']])
#df.T : transpose : a DataFrame can be transposed (rows and columns swapped).
- print(df.T[0],type(df.T[0])) #OK Series : the record of employee 0 becomes a single column.
- print(df.T[[0]],type(df.T[[0]])) #OK DataFrame : the record of employee 0 becomes a single column.
- print(df.T[[0,3]])
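
The single-bracket versus double-bracket distinction applies to the transposed frame too. The following is a small sketch on a hypothetical three-column frame, checking the resulting types and confirming that column 0 of `df.T` holds the same values as row 0 of `df`.

```python
import pandas as pd

# Hypothetical sample data: two employees, three fields.
df = pd.DataFrame({
    'emp_id': [300, 301],
    'name': ['kim', 'lee'],
    'location': ['seoul', 'seoul'],
})

t = df.T                 # former row labels 0, 1 are now column labels
col_series = t[0]        # single brackets -> Series
col_frame = t[[0]]       # double brackets -> DataFrame with one column

print(type(col_series).__name__, type(col_frame).__name__)
print(col_frame.shape)
print(col_series.tolist() == df.loc[0].tolist())
```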

<df.values : extracting specific rows by position>
- print(df.values,type(df.values)) #[300 'kim' 8000000 1 'seoul'], [301 'lee' 7000000 1 'seoul'], .. >> 2-D ndarray
- print(df.values[0]) #[300 'kim' 8000000 1 'seoul'] >> the first row as a 1-D ndarray
- print(df.values[1])
#df.values[0:3] slicing
- print(df.values[3:len(df)])
#df.values[[0,3]] indexing
- Selecting several rows takes a list inside the brackets, hence [[ ]]. With a single pair of brackets, values[0,3] would mean the element at row 0, column 3.
- print(df.values[[0,3]])

#df.values[0][0],df.values[0][1] : extracting a single value
- print(df.values[0][0],df.values[0][1])
- print(df.values[0,0],df.values[0,1]) #element access on a 2-D array >> one pair of brackets
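
The shapes involved are easier to see side by side. This is a minimal sketch on a hypothetical two-column frame. Note that `values` is an attribute, not a method, so it is written without parentheses. The pandas documentation recommends `df.to_numpy()` for new code, but `values` behaves the same way here.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'emp_id': [300, 301, 302, 303],
    'name': ['kim', 'lee', 'choi', 'han'],
})

arr = df.values          # 2-D ndarray, one row per record
row = arr[0]             # integer index -> 1-D array (first record)
rows = arr[[0, 3]]       # list of positions -> 2-D array with rows 0 and 3
cell = arr[0, 1]         # row 0, column 1 -> a single value
same_cell = arr[0][1]    # same value, reached in two steps

print(arr.shape, row.shape, rows.shape)
print(cell, same_cell)
```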

<boolean: extracting data with conditions>
#(single condition) : where location = 'seoul'

- print(df[df['location']=='seoul'])
- print(df[df.location=='seoul'])

#(multiple conditions) : where location = 'seoul' and salary >= 7,000,000
- print(df[(df['location']=='seoul') & (df['salary']>=7000000)])
- print(df[(df.location=='seoul') & (df.salary>=7000000)])
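
As a sketch of what happens in the multi-condition case: each comparison produces a boolean Series, `&` combines them element by element, and the combined mask selects the rows. The parentheses are required because `&` binds more tightly than `==` and `>=`. The four-row frame is a hypothetical subset of the example data.

```python
import pandas as pd

# Hypothetical subset of the employee table.
df = pd.DataFrame({
    'name': ['kim', 'lee', 'choi', 'han'],
    'salary': [8000000, 7000000, 7500000, 5000000],
    'location': ['seoul', 'seoul', 'busan', 'busan'],
})

# Each comparison yields a boolean Series; & is an element-wise AND.
mask = (df['location'] == 'seoul') & (df['salary'] >= 7000000)
result = df[mask]

print(mask.tolist())
print(result)
```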

--View example code--

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

#from numpy import *
#from pandas import Series,DataFrame
# >> With this import, Series and DataFrame can be written without the pd. prefix.

print("pandas version",pd.__version__) #pandas version check
print("---------------")
#dict values: only list or tuple can be used as column data, not set
emp = {
	'emp_id':[100,101,200,201,300,400],
	'name':('kim','lee','choi','han','gang','yoon'),
	'salary':[8000000,7000000,7500000,5000000,4000000,6000000],
	'dept_id':[1,1,2,2,3,4],
	'location':['seoul','seoul','busan','busan','daegu','incheon']
}

df = pd.DataFrame(emp)
print(df)
print("len(df):",len(df))

print("-----df[0:3] slicing-----------")
#A DataFrame itself can be sliced by row position.
print(df[:3])
print(df[0:3])
print(df[3:])
print(df[3:6])
print("len(df):",len(df))
print(df[3:len(df)])

print("-----df.loc[[0,3]] indexing-----------")
#Selecting rows with a list of index labels requires loc.
#print(df[[0,3]]) #KeyError - df[[...]] looks up column labels, so use loc for rows.
print(df.loc[[0,3]]) #OK


print("---df['column name'] : Series ---")
print(df['emp_id'])
print(df.emp_id)  
#A column's data can be replaced in several ways
df['emp_id'] = pd.Series([11,22,33,44,55,66]) #replace the column with a Series
df['emp_id'] = np.array([111,222,333,444,555,666]) #replace the column with a NumPy array
df['emp_id'] = np.arange(100,100+6)
df['emp_id'] = range(200,200+6) #range
df['emp_id'] = [i for i in range(300,300+6)] #list

print(df['name'])
print(df.name)

print("---value_counts()---")
print(df['location'].value_counts())
print(df.location.value_counts())

print("---df[['column name']] : DataFrame ---")
#One or more columns can be extracted as a DataFrame
print(df[['location']])
print(df[['dept_id','location']])
print(df[['emp_id']])
print(df[['emp_id','salary']])

print("-----df.T : transpose-----------")
#A DataFrame can be transposed.
#print(df[0]) #KeyError - there is no column labeled 0
print(df.T)
print(df.T[0],type(df.T[0])) #OK Series : the record of employee 0 becomes a single column.
print(df.T[[0]],type(df.T[[0]])) #OK DataFrame : the record of employee 0 becomes a single column.
print(df.T[[0,3]])

print("-----df.values-----------")
print(df.values,type(df.values)) #2-D ndarray
print("-----df.values[0]-----------")
print(df.values[0])
print(df.values[1])


print("-----df.values[0:3] slicing-----------")
print(df.values[:3])
print(df.values[0:3])
print(df.values[3:])
print(df.values[3:6])
print("len(df):",len(df))
print(df.values[3:len(df)])

print("-----df.values[[0,3]] indexing-----------")
#Selecting several rows takes a list inside the brackets, hence [[ ]]; values[0,3] would mean row 0, column 3.
print(df.values[[0,3]])
print("-----df.values[0][0],df.values[0][1] cell-----------")
print(df.values[0][0],df.values[0][1])
print(df.values[0,0],df.values[0,1]) #element access on a 2-D array >> one pair of brackets
