Filter Data pada DataFrame Pandas
Dalam tahapan preprocessing data ataupun sebagai bagian dari analisis, seringkali dilakukan proses filter data-data sesuai kebutuhan. Tutorial ini kita akan membahas teknik-teknik untuk memfilter data menggunakan pustaka pandas
. Versi pandas
yang digunakan adalah versi 2.0.2
.
Python
# pip install pandas import pandas as pd print(pd.__version__) # 2.0.2
Penyiapan Data
Silahkan menggunakan data apa saja sepanjang dapat di impor ke dalam dataframe pandas
. Data yang digunakan pada tutorial ini adalah data sintetis atau buatan yang berisi informasi mengenai transaksi suatu retail. Format data yang digunakan adalah file parquet
. Jika ingin menggunakan data yang sama, klik di sini.
Python
# membaca data (format parquet) data = pd.read_parquet("retail_transaction.parquet") # menampilkan informasi data data.info()
Output
<class 'pandas.core.frame.DataFrame'> RangeIndex: 75620 entries, 0 to 75619 Data columns (total 8 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 tr_date 75620 non-null datetime64[ns] 1 state 75620 non-null object 2 city 75620 non-null object 3 membership_id 43923 non-null object 4 num_items 75620 non-null int64 5 coupon 20942 non-null object 6 discount_amt 20942 non-null float64 7 order_amt 75620 non-null float64 dtypes: datetime64[ns](1), float64(2), int64(1), object(4) memory usage: 4.6+ MB
Note : Pada tutorial ini, untuk menampilkan output pada layar akan menggunakan fungsi print
dan Dataframe dikonversi terlebih dahulu dengan method to_string
dengan maksimal data ditampilkan sebanyak 10 baris.
Contoh menampilkan variabel data
:
Python
# mencetak variabel data # Jika menggunakan jupyter notebook akan menampilkan format tabel rapi # data # print(data) # yang digunakan sepanjang tutorial print(data.to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 2 2020-02-20 Georgia Atlanta R04-941-15 22 None NaN 106.82 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 ... ... ... ... ... ... ... ... ... 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Filter Null, NaN, None
Nilai null
, NaN
, None
dapat difilter menggunakan fungsi isna
atau isnull
. Kedua fungsi ini pada pandas
akan memberikan hasil yang sama.
Contoh 1 – Filter data dengan kolom membership_id
bernilai None
:
Python
# menggunakan fungsi isna print(data[data["membership_id"].isna()].to_string(max_rows=10)) # menggunakan fungsi isnull print(data[data["membership_id"].isnull()].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 5 2020-02-16 Florida Miami None 26 None NaN 104.69 7 2019-11-07 Florida Homestead None 18 None NaN 94.76 9 2020-10-01 Georgia Atlanta None 6 Yes 0.36 202.52 ... ... ... ... ... ... ... ... ... 75606 2020-06-06 Florida West Palm Beach None 11 None NaN 44.94 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75612 2020-07-17 Florida Miami None 21 None NaN 198.88 75613 2020-10-16 Alabama Birmingham None 29 Yes 0.13 158.22 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Contoh 2 – Filter data dengan kolom discount_amt
bernilai NaN
:
Python
# menggunakan fungsi isna print(data[data["discount_amt"].isna()].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 2 2020-02-20 Georgia Atlanta R04-941-15 22 None NaN 106.82 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 5 2020-02-16 Florida Miami None 26 None NaN 104.69 6 2020-04-24 Florida Daytona Beach O68-957-14 6 None NaN 23.50 ... ... ... ... ... ... ... ... ... 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Filter Data Numerik
Filter pada data numerik dapat menggunakan operator perbandingan:
- sama dengan :
==
- tidak sama dengan :
!=
- lebih besar dari :
>
- lebih besar atau sama dengan :
>=
- lebih kecil dari :
<
- lebih kecil atau sama dengan :
<=
- ada di dalam list atau range:
.isin(...)
- berada di antara batas-batas:
.between(...)
- gabungan kondisi :
(...) & (...)
(harus semua terpenuhi),(...) | (...)
(setidaknya 1 saja terpenuhi) - negasi :
~
Contoh 3 – Filter data dimana kolom order_amt
lebih besar atau sama dengan $200
:
Agar sintaks lebih mudah dibaca, kita dapat memisahkan filter atau statemen kondisi pada variabel tersendiri. Hal ini sangat bermanfaat pada saat jumlah kondisi yang digunakan semakin banyak.
Python
filter_1 = data["order_amt"] >= 200 # data[filter_1] print(data[filter_1].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 9 2020-10-01 Georgia Atlanta None 6 Yes 0.36 202.52 20 2019-10-31 Florida West Palm Beach K73-446-05 19 None NaN 208.82 24 2020-08-13 Georgia Gainesville None 31 None NaN 216.70 26 2020-04-22 Florida Pompano Beach None 11 None NaN 204.15 ... ... ... ... ... ... ... ... ... 75601 2020-02-17 Florida Orlando E75-924-94 5 Yes 0.16 202.56 75602 2020-02-03 Florida Lake Worth None 34 None NaN 220.64 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75611 2020-08-18 Alabama Anniston Q10-224-44 24 None NaN 216.49 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75
Contoh 4 – Filter data dengan discount_amt
sama dengan 0.2 dan order_amt
lebih dari $100 :
Jika filter terdiri dari 2 atau lebih kondisi dan menggunakan operator perbandingan (==
,!=
, dst) pastikan untuk menutup setiap kondisi dalam tanda kurung (kondisi 1) & (kondisi 2) | ...
Python
# pastikan setiap kondisi dimasukkan dalam tanda (...) filter_2 = (data["discount_amt"] == 0.2) & (data["order_amt"] < 50) # data[filter_2] print(data[filter_2].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 472 2019-11-13 Florida Jacksonville None 29 Yes 0.2 24.73 762 2019-11-28 Georgia Atlanta S26-460-55 7 Yes 0.2 36.73 1007 2019-12-25 Georgia Atlanta None 29 Yes 0.2 24.76 2389 2019-11-11 Florida Gainesville O52-218-69 32 Yes 0.2 25.04 3639 2019-12-22 Alabama Montgomery U35-263-94 3 Yes 0.2 35.21 ... ... ... ... ... ... ... ... ... 67271 2020-03-02 Alabama Birmingham None 29 Yes 0.2 29.44 70579 2020-05-14 Florida Pensacola None 7 Yes 0.2 46.67 73366 2020-01-31 Florida Melbourne G71-142-05 11 Yes 0.2 43.71 74204 2020-06-01 Florida Clearwater None 19 Yes 0.2 42.16 74889 2020-06-01 Florida Brooksville C95-444-50 26 Yes 0.2 45.33
Contoh 5 – Filter nilai num_items
antara 10-19
dan order_amt
< $50
:
Python
filter_3 = data["num_items"].isin(range(10, 20)) & (data["order_amt"] < 50) # data[filter_3] print(data[filter_3].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 28 2020-04-28 Florida Gainesville Q80-771-31 13 None NaN 45.35 40 2020-01-13 Florida Saint Augustine O45-092-88 17 None NaN 24.27 63 2019-10-27 Florida Hialeah B70-879-96 10 Yes 0.19 32.38 94 2019-12-23 Florida West Palm Beach G74-909-31 12 None NaN 22.60 129 2020-10-06 Alabama Mobile None 15 None NaN 29.49 ... ... ... ... ... ... ... ... ... 75501 2019-10-01 Georgia Decatur None 14 None NaN 22.71 75538 2020-06-28 Florida Fort Lauderdale L86-855-25 12 None NaN 43.00 75566 2020-10-10 Alabama Montgomery None 13 None NaN 25.74 75591 2020-09-10 Georgia Atlanta I01-339-59 16 None NaN 29.67 75606 2020-06-06 Florida West Palm Beach None 11 None NaN 44.94
Contoh 6 – Filter nilai num_items
bukan [3, 5 atau 7] dan order_amt
> $100 :
Python
filter_4 = (~data["num_items"].isin([3, 5, 7])) & (data["order_amt"] > 100) # data[filter_4] print(data[filter_4].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 2 2020-02-20 Georgia Atlanta R04-941-15 22 None NaN 106.82 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 5 2020-02-16 Florida Miami None 26 None NaN 104.69 ... ... ... ... ... ... ... ... ... 75613 2020-10-16 Alabama Birmingham None 29 Yes 0.13 158.22 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Filter Data Text dan Objek
Untuk data yang berisi text atau objek dapat dibuat difilter menggunakan operator serta fungsi sebagai berikut:
- sama dengan :
==
- tidak sama dengan :
!=
- diawali dengan :
str.startswith
- diakhiri dengan :
str.endswith
- mengandung text :
str.contains
- ada di dalam list :
isin
- negasi :
~
- regex :
str.contains
(regex=True
)
Contoh 7 – Filter kolom state=
“Florida"
:
Python
filter_5 = data["state"] == "Florida" # data[filter_5] print(data[filter_5].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 5 2020-02-16 Florida Miami None 26 None NaN 104.69 6 2020-04-24 Florida Daytona Beach O68-957-14 6 None NaN 23.50 ... ... ... ... ... ... ... ... ... 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75612 2020-07-17 Florida Miami None 21 None NaN 198.88 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27
Contoh 8 – Filter kolom state
adalah "Florida"
dan city
bukan "Miami"
:
Python
filter_6 = (data["state"] == "Florida") & (data["city"] != "Miami") # data[filter_6] print(data[filter_6].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 6 2020-04-24 Florida Daytona Beach O68-957-14 6 None NaN 23.50 7 2019-11-07 Florida Homestead None 18 None NaN 94.76 10 2020-02-07 Florida Fort Lauderdale A23-124-53 7 Yes 0.42 172.93 ... ... ... ... ... ... ... ... ... 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27
Contoh 9 – Filter kolom city
diawali kata "Fort"
:
Python
filter_7 = data["city"].str.startswith("Fort") data[filter_7] print(data[filter_7].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 10 2020-02-07 Florida Fort Lauderdale A23-124-53 7 Yes 0.42 172.93 19 2019-10-11 Florida Fort Lauderdale O72-081-20 29 None NaN 187.34 42 2019-10-25 Florida Fort Pierce J63-913-89 24 None NaN 93.12 49 2020-07-06 Florida Fort Myers B45-088-52 5 None NaN 216.23 59 2019-11-07 Florida Fort Lauderdale None 15 None NaN 151.35 ... ... ... ... ... ... ... ... ... 75547 2020-06-01 Florida Fort Pierce None 8 None NaN 176.79 75552 2020-02-12 Florida Fort Lauderdale None 6 None NaN 212.13 75562 2020-01-07 Florida Fort Lauderdale F20-296-85 16 None NaN 124.53 75575 2020-03-10 Florida Fort Lauderdale None 28 None NaN 86.91 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75
Contoh 10 – Filter kolom state
adalah "Florida"
dan kolom city
mengandung kata "land"
:
Python
filter_8 = (data["state"] == "Florida") & data["city"].str.contains("land") data[filter_8] print(data[filter_8].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 16 2020-10-30 Florida Orlando None 25 None NaN 51.31 145 2019-11-21 Florida Lakeland None 22 Yes 0.50 84.69 150 2020-02-14 Florida Orlando N28-332-99 16 None NaN 55.73 157 2020-01-24 Florida Orlando None 5 None NaN 149.51 166 2019-10-03 Florida Orlando R30-922-39 14 None NaN 183.37 ... ... ... ... ... ... ... ... ... 75456 2019-10-07 Florida Lakeland None 24 Yes 0.44 191.45 75564 2020-10-22 Florida Orlando Z46-564-21 25 Yes 0.25 157.63 75601 2020-02-17 Florida Orlando E75-924-94 5 Yes 0.16 202.56 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95
Contoh 11 – Filter kolom state
adalah "Florida"
, "Georgia"
atau "Alabama"
dan kolom coupon
bernilai "Yes"
:
Python
selected_state = ["Florida", "Georgia", "Alabama"] filter_9_1 = data["state"].isin(selected_state) filter_9_2 = data["coupon"] == "Yes" # data[filter_9_1 & filter_9_2] print(data[filter_9_1 & filter_9_2].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 8 2020-03-06 Georgia Lawrenceville H90-561-44 28 Yes 0.17 105.71 9 2020-10-01 Georgia Atlanta None 6 Yes 0.36 202.52 10 2020-02-07 Florida Fort Lauderdale A23-124-53 7 Yes 0.42 172.93 13 2020-04-22 Alabama Birmingham None 22 Yes 0.41 176.76 ... ... ... ... ... ... ... ... ... 75600 2020-01-19 Florida Jacksonville None 5 Yes 0.22 122.68 75601 2020-02-17 Florida Orlando E75-924-94 5 Yes 0.16 202.56 75605 2020-06-21 Georgia Columbus None 17 Yes 0.34 155.56 75610 2020-01-16 Alabama Mobile E29-461-87 31 Yes 0.14 102.49 75613 2020-10-16 Alabama Birmingham None 29 Yes 0.13 158.22
Contoh 12 – Filter data membership_id
dengan patern [X-Z]?9-9??-[1-5]?
:
Python
filter_10_1 = ~data["membership_id"].isna() filter_10_2 = data["membership_id"].str.contains("[X-Z]\d9-9\d\d-[1-5]\d", regex=True) # data[filter_10_1 & filter_10_2] print(data[filter_10_1 & filter_10_2].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 50 2020-06-10 Alabama Birmingham Y09-907-14 28 No 0.00 128.69 4106 2020-01-22 Florida Boynton Beach Y79-959-18 33 No 0.00 179.57 5349 2020-02-19 Alabama Mobile Y39-904-45 24 No 0.00 225.17 11146 2020-03-25 Florida Palm Bay Z29-955-38 30 Yes 0.19 191.54 19880 2020-08-21 Alabama Mobile X49-948-24 20 No 0.00 194.33 ... ... ... ... ... ... ... ... ... 52524 2020-03-29 Florida Bradenton Z89-902-27 6 Yes 0.46 104.35 61588 2019-10-17 Florida Daytona Beach Y79-918-38 16 No 0.00 83.69 70711 2019-12-24 Alabama Birmingham Y69-936-49 19 No 0.00 185.18 71649 2020-09-20 Mississippi Jackson X09-955-14 3 No 0.00 138.51 72698 2020-01-26 Florida Ocala Z49-998-39 29 No 0.00 174.92
Filter Data DateTime
Kolom-kolom bertipe datetime
dapat difilter menggunakan operator sama sepeti pada data numerik.
Contoh 13 – Filter berdasarkan tanggal tertentu :
Python
filter_11 = data["tr_date"] == "2020-01-01" # data[filter_11] print(data[filter_11].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 577 2020-01-01 Florida Jacksonville Y90-746-84 3 None NaN 163.18 1424 2020-01-01 Georgia Duluth N54-069-70 33 None NaN 213.16 1792 2020-01-01 South Carolina Columbia None 3 None NaN 120.63 1918 2020-01-01 Florida Vero Beach Z81-325-76 23 None NaN 193.26 1985 2020-01-01 Georgia Savannah None 28 None NaN 64.50 ... ... ... ... ... ... ... ... ... 71875 2020-01-01 Florida Saint Petersburg I10-625-73 9 None NaN 104.78 72554 2020-01-01 Florida Fort Lauderdale K35-525-05 8 None NaN 201.99 72737 2020-01-01 Florida Lehigh Acres T33-942-96 34 None NaN 22.86 73302 2020-01-01 Florida Cape Coral K41-060-07 32 None NaN 97.19 73881 2020-01-01 Alabama Montgomery L82-292-59 20 None NaN 214.39
Contoh 14 – Filter menurut tahun tertentu dan bulan tertentu :
Untuk mendapatkan nilai tahun, bulan, hari pada data datetime dapat melalui method dt
.
Python
# Tahun 2019 Bulan Juli-Des filter_12 = (data["tr_date"].dt.year == 2019) & (data["tr_date"].dt.month > 6) # data[filter_12] print(data[filter_12].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 7 2019-11-07 Florida Homestead None 18 None NaN 94.76 18 2019-11-27 Florida Jacksonville None 9 None NaN 125.20 19 2019-10-11 Florida Fort Lauderdale O72-081-20 29 None NaN 187.34 20 2019-10-31 Florida West Palm Beach K73-446-05 19 None NaN 208.82 25 2019-11-16 Florida Miami Beach T87-267-53 33 None NaN 61.08 ... ... ... ... ... ... ... ... ... 75599 2019-12-25 Florida Ocala V99-965-64 4 Yes 0.29 193.25 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Contoh 15 – Filter antara 2 rentang tanggal :
Python
start = pd.to_datetime("2020-06-01") end = pd.to_datetime("2020-09-30") # inclusive # both : start dan end termasuk dalam range # neither : start dan end tidak termasuk dalam range # left : start masuk dalam range dan end tidak # right : start tidak masuk dalam range dan end masuk filter_13 = data["tr_date"].isin(pd.date_range(start, end, inclusive="both")) data[filter_13] print(data[filter_13].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 14 2020-06-21 Alabama Birmingham J25-217-44 33 None NaN 47.86 15 2020-08-13 Florida Tampa None 26 None NaN 42.29 21 2020-07-27 Georgia Atlanta None 6 Yes 0.28 67.64 ... ... ... ... ... ... ... ... ... 75611 2020-08-18 Alabama Anniston Q10-224-44 24 None NaN 216.49 75612 2020-07-17 Florida Miami None 21 None NaN 198.88 75614 2020-08-26 Alabama Birmingham U38-420-51 5 None NaN 98.44 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50
Mengelola Data Hasil Filter
Data yang sudah difilter dapat dilakukan proses lebih lanjut, misalnya:
Menganti Nilai NaN
dengan 0 pada kolom discount_amt
dan nilai None
pada kolom coupon
menjadi No
:
Python
# filter baris yang memiliki nilai discount_amt = NaN filter_13 = data["discount_amt"].isnull() # ganti nilainya dengan 0 data.loc[filter_13, "discount_amt"] = 0 # filter baris yang memiliki nilai coupon = None filter_14 = data["coupon"].isna() # ganti nilainya dengan No data.loc[filter_13, "coupon"] = "No" # mencetak data yang sudah di-update print(data.to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 No 0.00 213.68 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 2 2020-02-20 Georgia Atlanta R04-941-15 22 No 0.00 106.82 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 No 0.00 72.93 ... ... ... ... ... ... ... ... ... 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 No 0.00 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 No 0.00 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 No 0.00 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 No 0.00 161.50 75619 2019-11-02 Georgia Athens None 15 No 0.00 179.96
Filter data untuk tahun 2020
untuk wilayah Florida
dan Georgia
dan simpan ke dalam file CSV :
Python
# filter baris yang memiliki tr_date : tahun 2020 filter_15 = (data["tr_date"].dt.year == 2020) & data["state"].isin(["Florida", "Georgia"]) # simpan hasil filter pada variabel data_2020 data_2020 = data[filter_15] # simpan data tahun 2020 ke dalam file csv data_2020.to_csv("Florida_Georgia_2020.csv", index=False)
Tutorial Pandas