Filter Data pada DataFrame Pandas


Dalam tahapan preprocessing data ataupun sebagai bagian dari analisis, seringkali dilakukan proses filter data-data sesuai kebutuhan. Tutorial ini kita akan membahas teknik-teknik untuk memfilter data menggunakan pustaka pandas. Versi pandas yang digunakan adalah versi 2.0.2.
Python
# pip install pandas import pandas as pd print(pd.__version__) # 2.0.2
Penyiapan Data
Silahkan menggunakan data apa saja sepanjang dapat di impor ke dalam dataframe pandas. Data yang digunakan pada tutorial ini adalah data sintetis atau buatan yang berisi informasi mengenai transaksi suatu retail. Format data yang digunakan adalah file parquet. Jika ingin menggunakan data yang sama, klik di sini.
Python
# membaca data (format parquet)
data = pd.read_parquet("retail_transaction.parquet")
# menampilkan informasi data
data.info()Output
<class 'pandas.core.frame.DataFrame'> RangeIndex: 75620 entries, 0 to 75619 Data columns (total 8 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 tr_date 75620 non-null datetime64[ns] 1 state 75620 non-null object 2 city 75620 non-null object 3 membership_id 43923 non-null object 4 num_items 75620 non-null int64 5 coupon 20942 non-null object 6 discount_amt 20942 non-null float64 7 order_amt 75620 non-null float64 dtypes: datetime64[ns](1), float64(2), int64(1), object(4) memory usage: 4.6+ MB
Note : Pada tutorial ini, untuk menampilkan output pada layar akan menggunakan fungsi print dan Dataframe dikonversi terlebih dahulu dengan method to_string dengan maksimal data ditampilkan sebanyak 10 baris.
Contoh menampilkan variabel data :
Python
# mencetak variabel data # Jika menggunakan jupyter notebook akan menampilkan format tabel rapi # data # print(data) # yang digunakan sepanjang tutorial print(data.to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 2 2020-02-20 Georgia Atlanta R04-941-15 22 None NaN 106.82 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 ... ... ... ... ... ... ... ... ... 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Filter Null, NaN, None
Nilai null, NaN, None dapat difilter menggunakan fungsi isna atau isnull. Kedua fungsi ini pada pandas akan memberikan hasil yang sama.
Contoh 1 – Filter data dengan kolom membership_id bernilai None:
Python
# menggunakan fungsi isna print(data[data["membership_id"].isna()].to_string(max_rows=10)) # menggunakan fungsi isnull print(data[data["membership_id"].isnull()].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 5 2020-02-16 Florida Miami None 26 None NaN 104.69 7 2019-11-07 Florida Homestead None 18 None NaN 94.76 9 2020-10-01 Georgia Atlanta None 6 Yes 0.36 202.52 ... ... ... ... ... ... ... ... ... 75606 2020-06-06 Florida West Palm Beach None 11 None NaN 44.94 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75612 2020-07-17 Florida Miami None 21 None NaN 198.88 75613 2020-10-16 Alabama Birmingham None 29 Yes 0.13 158.22 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Contoh 2 – Filter data dengan kolom discount_amt bernilai NaN :
Python
# menggunakan fungsi isna print(data[data["discount_amt"].isna()].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 2 2020-02-20 Georgia Atlanta R04-941-15 22 None NaN 106.82 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 5 2020-02-16 Florida Miami None 26 None NaN 104.69 6 2020-04-24 Florida Daytona Beach O68-957-14 6 None NaN 23.50 ... ... ... ... ... ... ... ... ... 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Filter Data Numerik
Filter pada data numerik dapat menggunakan operator perbandingan:
- sama dengan :
== - tidak sama dengan :
!= - lebih besar dari :
> - lebih besar atau sama dengan :
>= - lebih kecil dari :
< - lebih kecil atau sama dengan :
<= - ada di dalam list atau range:
.isin(...) - berada di antara batas-batas:
.between(...) - gabungan kondisi :
(...) & (...)(harus semua terpenuhi),(...) | (...)(setidaknya 1 saja terpenuhi) - negasi :
~
Contoh 3 – Filter data dimana kolom order_amt lebih besar atau sama dengan $200:
Agar sintaks lebih mudah dibaca, kita dapat memisahkan filter atau statemen kondisi pada variabel tersendiri. Hal ini sangat bermanfaat pada saat jumlah kondisi yang digunakan semakin banyak.
Python
filter_1 = data["order_amt"] >= 200 # data[filter_1] print(data[filter_1].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 9 2020-10-01 Georgia Atlanta None 6 Yes 0.36 202.52 20 2019-10-31 Florida West Palm Beach K73-446-05 19 None NaN 208.82 24 2020-08-13 Georgia Gainesville None 31 None NaN 216.70 26 2020-04-22 Florida Pompano Beach None 11 None NaN 204.15 ... ... ... ... ... ... ... ... ... 75601 2020-02-17 Florida Orlando E75-924-94 5 Yes 0.16 202.56 75602 2020-02-03 Florida Lake Worth None 34 None NaN 220.64 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75611 2020-08-18 Alabama Anniston Q10-224-44 24 None NaN 216.49 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75
Contoh 4 – Filter data dengan discount_amt sama dengan 0.2 dan order_amt lebih dari $100 :
Jika filter terdiri dari 2 atau lebih kondisi dan menggunakan operator perbandingan (==,!=, dst) pastikan untuk menutup setiap kondisi dalam tanda kurung (kondisi 1) & (kondisi 2) | ...
Python
# pastikan setiap kondisi dimasukkan dalam tanda (...) filter_2 = (data["discount_amt"] == 0.2) & (data["order_amt"] < 50) # data[filter_2] print(data[filter_2].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 472 2019-11-13 Florida Jacksonville None 29 Yes 0.2 24.73 762 2019-11-28 Georgia Atlanta S26-460-55 7 Yes 0.2 36.73 1007 2019-12-25 Georgia Atlanta None 29 Yes 0.2 24.76 2389 2019-11-11 Florida Gainesville O52-218-69 32 Yes 0.2 25.04 3639 2019-12-22 Alabama Montgomery U35-263-94 3 Yes 0.2 35.21 ... ... ... ... ... ... ... ... ... 67271 2020-03-02 Alabama Birmingham None 29 Yes 0.2 29.44 70579 2020-05-14 Florida Pensacola None 7 Yes 0.2 46.67 73366 2020-01-31 Florida Melbourne G71-142-05 11 Yes 0.2 43.71 74204 2020-06-01 Florida Clearwater None 19 Yes 0.2 42.16 74889 2020-06-01 Florida Brooksville C95-444-50 26 Yes 0.2 45.33
Contoh 5 – Filter nilai num_items antara 10-19 dan order_amt < $50 :
Python
filter_3 = data["num_items"].isin(range(10, 20)) & (data["order_amt"] < 50) # data[filter_3] print(data[filter_3].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 28 2020-04-28 Florida Gainesville Q80-771-31 13 None NaN 45.35 40 2020-01-13 Florida Saint Augustine O45-092-88 17 None NaN 24.27 63 2019-10-27 Florida Hialeah B70-879-96 10 Yes 0.19 32.38 94 2019-12-23 Florida West Palm Beach G74-909-31 12 None NaN 22.60 129 2020-10-06 Alabama Mobile None 15 None NaN 29.49 ... ... ... ... ... ... ... ... ... 75501 2019-10-01 Georgia Decatur None 14 None NaN 22.71 75538 2020-06-28 Florida Fort Lauderdale L86-855-25 12 None NaN 43.00 75566 2020-10-10 Alabama Montgomery None 13 None NaN 25.74 75591 2020-09-10 Georgia Atlanta I01-339-59 16 None NaN 29.67 75606 2020-06-06 Florida West Palm Beach None 11 None NaN 44.94
Contoh 6 – Filter nilai num_items bukan [3, 5 atau 7] dan order_amt > $100 :
Python
filter_4 = (~data["num_items"].isin([3, 5, 7])) & (data["order_amt"] > 100) # data[filter_4] print(data[filter_4].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 2 2020-02-20 Georgia Atlanta R04-941-15 22 None NaN 106.82 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 5 2020-02-16 Florida Miami None 26 None NaN 104.69 ... ... ... ... ... ... ... ... ... 75613 2020-10-16 Alabama Birmingham None 29 Yes 0.13 158.22 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Filter Data Text dan Objek
Untuk data yang berisi text atau objek dapat dibuat difilter menggunakan operator serta fungsi sebagai berikut:
- sama dengan :
== - tidak sama dengan :
!= - diawali dengan :
str.startswith - diakhiri dengan :
str.endswith - mengandung text :
str.contains - ada di dalam list :
isin - negasi :
~ - regex :
str.contains(regex=True)
Contoh 7 – Filter kolom state=“Florida" :
Python
filter_5 = data["state"] == "Florida" # data[filter_5] print(data[filter_5].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 5 2020-02-16 Florida Miami None 26 None NaN 104.69 6 2020-04-24 Florida Daytona Beach O68-957-14 6 None NaN 23.50 ... ... ... ... ... ... ... ... ... 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75612 2020-07-17 Florida Miami None 21 None NaN 198.88 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27
Contoh 8 – Filter kolom state adalah "Florida" dan city bukan "Miami" :
Python
filter_6 = (data["state"] == "Florida") & (data["city"] != "Miami") # data[filter_6] print(data[filter_6].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 None NaN 72.93 6 2020-04-24 Florida Daytona Beach O68-957-14 6 None NaN 23.50 7 2019-11-07 Florida Homestead None 18 None NaN 94.76 10 2020-02-07 Florida Fort Lauderdale A23-124-53 7 Yes 0.42 172.93 ... ... ... ... ... ... ... ... ... 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 None NaN 104.27
Contoh 9 – Filter kolom city diawali kata "Fort" :
Python
filter_7 = data["city"].str.startswith("Fort")
data[filter_7]
print(data[filter_7].to_string(max_rows=10))Output
tr_date state city membership_id num_items coupon discount_amt order_amt 10 2020-02-07 Florida Fort Lauderdale A23-124-53 7 Yes 0.42 172.93 19 2019-10-11 Florida Fort Lauderdale O72-081-20 29 None NaN 187.34 42 2019-10-25 Florida Fort Pierce J63-913-89 24 None NaN 93.12 49 2020-07-06 Florida Fort Myers B45-088-52 5 None NaN 216.23 59 2019-11-07 Florida Fort Lauderdale None 15 None NaN 151.35 ... ... ... ... ... ... ... ... ... 75547 2020-06-01 Florida Fort Pierce None 8 None NaN 176.79 75552 2020-02-12 Florida Fort Lauderdale None 6 None NaN 212.13 75562 2020-01-07 Florida Fort Lauderdale F20-296-85 16 None NaN 124.53 75575 2020-03-10 Florida Fort Lauderdale None 28 None NaN 86.91 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75
Contoh 10 – Filter kolom state adalah "Florida" dan kolom city mengandung kata "land" :
Python
filter_8 = (data["state"] == "Florida") & data["city"].str.contains("land")
data[filter_8]
print(data[filter_8].to_string(max_rows=10))Output
tr_date state city membership_id num_items coupon discount_amt order_amt 16 2020-10-30 Florida Orlando None 25 None NaN 51.31 145 2019-11-21 Florida Lakeland None 22 Yes 0.50 84.69 150 2020-02-14 Florida Orlando N28-332-99 16 None NaN 55.73 157 2020-01-24 Florida Orlando None 5 None NaN 149.51 166 2019-10-03 Florida Orlando R30-922-39 14 None NaN 183.37 ... ... ... ... ... ... ... ... ... 75456 2019-10-07 Florida Lakeland None 24 Yes 0.44 191.45 75564 2020-10-22 Florida Orlando Z46-564-21 25 Yes 0.25 157.63 75601 2020-02-17 Florida Orlando E75-924-94 5 Yes 0.16 202.56 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95
Contoh 11 – Filter kolom state adalah "Florida", "Georgia" atau "Alabama" dan kolom coupon bernilai "Yes" :
Python
selected_state = ["Florida", "Georgia", "Alabama"] filter_9_1 = data["state"].isin(selected_state) filter_9_2 = data["coupon"] == "Yes" # data[filter_9_1 & filter_9_2] print(data[filter_9_1 & filter_9_2].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 8 2020-03-06 Georgia Lawrenceville H90-561-44 28 Yes 0.17 105.71 9 2020-10-01 Georgia Atlanta None 6 Yes 0.36 202.52 10 2020-02-07 Florida Fort Lauderdale A23-124-53 7 Yes 0.42 172.93 13 2020-04-22 Alabama Birmingham None 22 Yes 0.41 176.76 ... ... ... ... ... ... ... ... ... 75600 2020-01-19 Florida Jacksonville None 5 Yes 0.22 122.68 75601 2020-02-17 Florida Orlando E75-924-94 5 Yes 0.16 202.56 75605 2020-06-21 Georgia Columbus None 17 Yes 0.34 155.56 75610 2020-01-16 Alabama Mobile E29-461-87 31 Yes 0.14 102.49 75613 2020-10-16 Alabama Birmingham None 29 Yes 0.13 158.22
Contoh 12 – Filter data membership_id dengan patern [X-Z]?9-9??-[1-5]? :
Python
filter_10_1 = ~data["membership_id"].isna()
filter_10_2 = data["membership_id"].str.contains("[X-Z]\d9-9\d\d-[1-5]\d", regex=True)
# data[filter_10_1 & filter_10_2]
print(data[filter_10_1 & filter_10_2].to_string(max_rows=10))Output
tr_date state city membership_id num_items coupon discount_amt order_amt 50 2020-06-10 Alabama Birmingham Y09-907-14 28 No 0.00 128.69 4106 2020-01-22 Florida Boynton Beach Y79-959-18 33 No 0.00 179.57 5349 2020-02-19 Alabama Mobile Y39-904-45 24 No 0.00 225.17 11146 2020-03-25 Florida Palm Bay Z29-955-38 30 Yes 0.19 191.54 19880 2020-08-21 Alabama Mobile X49-948-24 20 No 0.00 194.33 ... ... ... ... ... ... ... ... ... 52524 2020-03-29 Florida Bradenton Z89-902-27 6 Yes 0.46 104.35 61588 2019-10-17 Florida Daytona Beach Y79-918-38 16 No 0.00 83.69 70711 2019-12-24 Alabama Birmingham Y69-936-49 19 No 0.00 185.18 71649 2020-09-20 Mississippi Jackson X09-955-14 3 No 0.00 138.51 72698 2020-01-26 Florida Ocala Z49-998-39 29 No 0.00 174.92
Filter Data DateTime
Kolom-kolom bertipe datetime dapat difilter menggunakan operator sama sepeti pada data numerik.
Contoh 13 – Filter berdasarkan tanggal tertentu :
Python
filter_11 = data["tr_date"] == "2020-01-01" # data[filter_11] print(data[filter_11].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 577 2020-01-01 Florida Jacksonville Y90-746-84 3 None NaN 163.18 1424 2020-01-01 Georgia Duluth N54-069-70 33 None NaN 213.16 1792 2020-01-01 South Carolina Columbia None 3 None NaN 120.63 1918 2020-01-01 Florida Vero Beach Z81-325-76 23 None NaN 193.26 1985 2020-01-01 Georgia Savannah None 28 None NaN 64.50 ... ... ... ... ... ... ... ... ... 71875 2020-01-01 Florida Saint Petersburg I10-625-73 9 None NaN 104.78 72554 2020-01-01 Florida Fort Lauderdale K35-525-05 8 None NaN 201.99 72737 2020-01-01 Florida Lehigh Acres T33-942-96 34 None NaN 22.86 73302 2020-01-01 Florida Cape Coral K41-060-07 32 None NaN 97.19 73881 2020-01-01 Alabama Montgomery L82-292-59 20 None NaN 214.39
Contoh 14 – Filter menurut tahun tertentu dan bulan tertentu :
Untuk mendapatkan nilai tahun, bulan, hari pada data datetime dapat melalui method dt.
Python
# Tahun 2019 Bulan Juli-Des filter_12 = (data["tr_date"].dt.year == 2019) & (data["tr_date"].dt.month > 6) # data[filter_12] print(data[filter_12].to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 7 2019-11-07 Florida Homestead None 18 None NaN 94.76 18 2019-11-27 Florida Jacksonville None 9 None NaN 125.20 19 2019-10-11 Florida Fort Lauderdale O72-081-20 29 None NaN 187.34 20 2019-10-31 Florida West Palm Beach K73-446-05 19 None NaN 208.82 25 2019-11-16 Florida Miami Beach T87-267-53 33 None NaN 61.08 ... ... ... ... ... ... ... ... ... 75599 2019-12-25 Florida Ocala V99-965-64 4 Yes 0.29 193.25 75608 2019-10-10 Florida Orlando Z52-600-31 13 None NaN 232.22 75609 2019-10-22 Florida Orlando None 22 None NaN 114.95 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 None NaN 230.75 75619 2019-11-02 Georgia Athens None 15 None NaN 179.96
Contoh 15 – Filter antara 2 rentang tanggal :
Python
start = pd.to_datetime("2020-06-01")
end = pd.to_datetime("2020-09-30")
# inclusive
# both : start dan end termasuk dalam range
# neither : start dan end tidak termasuk dalam range
# left : start masuk dalam range dan end tidak
# right : start tidak masuk dalam range dan end masuk
filter_13 = data["tr_date"].isin(pd.date_range(start, end, inclusive="both"))
data[filter_13]
print(data[filter_13].to_string(max_rows=10))Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 None NaN 213.68 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 14 2020-06-21 Alabama Birmingham J25-217-44 33 None NaN 47.86 15 2020-08-13 Florida Tampa None 26 None NaN 42.29 21 2020-07-27 Georgia Atlanta None 6 Yes 0.28 67.64 ... ... ... ... ... ... ... ... ... 75611 2020-08-18 Alabama Anniston Q10-224-44 24 None NaN 216.49 75612 2020-07-17 Florida Miami None 21 None NaN 198.88 75614 2020-08-26 Alabama Birmingham U38-420-51 5 None NaN 98.44 75616 2020-08-02 Florida Panama City P63-789-70 16 None NaN 76.02 75618 2020-08-21 Alabama Gadsden E21-954-64 14 None NaN 161.50
Mengelola Data Hasil Filter
Data yang sudah difilter dapat dilakukan proses lebih lanjut, misalnya:
Menganti Nilai NaN dengan 0 pada kolom discount_amt dan nilai None pada kolom coupon menjadi No :
Python
# filter baris yang memiliki nilai discount_amt = NaN filter_13 = data["discount_amt"].isnull() # ganti nilainya dengan 0 data.loc[filter_13, "discount_amt"] = 0 # filter baris yang memiliki nilai coupon = None filter_14 = data["coupon"].isna() # ganti nilainya dengan No data.loc[filter_13, "coupon"] = "No" # mencetak data yang sudah di-update print(data.to_string(max_rows=10))
Output
tr_date state city membership_id num_items coupon discount_amt order_amt 0 2020-06-24 Florida Miami A23-472-61 22 No 0.00 213.68 1 2020-05-15 South Carolina Greenville None 13 Yes 0.33 155.59 2 2020-02-20 Georgia Atlanta R04-941-15 22 No 0.00 106.82 3 2020-07-22 Florida Sarasota U46-940-80 17 Yes 0.25 169.14 4 2020-01-31 Florida Pensacola None 8 No 0.00 72.93 ... ... ... ... ... ... ... ... ... 75615 2019-11-30 Florida Fort Pierce G23-376-16 12 No 0.00 230.75 75616 2020-08-02 Florida Panama City P63-789-70 16 No 0.00 76.02 75617 2020-01-11 Florida Panama City E79-788-42 13 No 0.00 104.27 75618 2020-08-21 Alabama Gadsden E21-954-64 14 No 0.00 161.50 75619 2019-11-02 Georgia Athens None 15 No 0.00 179.96
Filter data untuk tahun 2020 untuk wilayah Florida dan Georgia dan simpan ke dalam file CSV :
Python
# filter baris yang memiliki tr_date : tahun 2020
filter_15 = (data["tr_date"].dt.year == 2020) & data["state"].isin(["Florida", "Georgia"])
# simpan hasil filter pada variabel data_2020
data_2020 = data[filter_15]
# simpan data tahun 2020 ke dalam file csv
data_2020.to_csv("Florida_Georgia_2020.csv", index=False)Tutorial Pandas





