Filter Data pada DataFrame Pandas

Filter Data Pandas Dataframe

Dalam tahapan preprocessing data ataupun sebagai bagian dari analisis, seringkali dilakukan proses filter data-data sesuai kebutuhan. Tutorial ini kita akan membahas teknik-teknik untuk memfilter data menggunakan pustaka pandas. Versi pandas yang digunakan adalah versi 2.0.2.

Python

# pip install pandas

import pandas as pd

print(pd.__version__)

# 2.0.2

Penyiapan Data

Silahkan menggunakan data apa saja sepanjang dapat di impor ke dalam dataframe pandas. Data yang digunakan pada tutorial ini adalah data sintetis atau buatan yang berisi informasi mengenai transaksi suatu retail. Format data yang digunakan adalah file parquet. Jika ingin menggunakan data yang sama, klik di sini.

Python

# membaca data (format parquet)
data = pd.read_parquet("retail_transaction.parquet")

# menampilkan informasi data
data.info()

Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75620 entries, 0 to 75619
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   tr_date        75620 non-null  datetime64[ns]
 1   state          75620 non-null  object        
 2   city           75620 non-null  object        
 3   membership_id  43923 non-null  object        
 4   num_items      75620 non-null  int64         
 5   coupon         20942 non-null  object        
 6   discount_amt   20942 non-null  float64       
 7   order_amt      75620 non-null  float64       
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 4.6+ MB

Note : Pada tutorial ini, untuk menampilkan output pada layar akan menggunakan fungsi print dan Dataframe dikonversi terlebih dahulu dengan method to_string dengan maksimal data ditampilkan sebanyak 10 baris.

Contoh menampilkan variabel data :

Python

# mencetak variabel data

# Jika menggunakan jupyter notebook akan menampilkan format tabel rapi
# data 

# print(data)

# yang digunakan sepanjang tutorial
print(data.to_string(max_rows=10))

Output

         tr_date           state         city membership_id  num_items coupon  discount_amt  order_amt
0     2020-06-24         Florida        Miami    A23-472-61         22   None           NaN     213.68
1     2020-05-15  South Carolina   Greenville          None         13    Yes          0.33     155.59
2     2020-02-20         Georgia      Atlanta    R04-941-15         22   None           NaN     106.82
3     2020-07-22         Florida     Sarasota    U46-940-80         17    Yes          0.25     169.14
4     2020-01-31         Florida    Pensacola          None          8   None           NaN      72.93
...          ...             ...          ...           ...        ...    ...           ...        ...
75615 2019-11-30         Florida  Fort Pierce    G23-376-16         12   None           NaN     230.75
75616 2020-08-02         Florida  Panama City    P63-789-70         16   None           NaN      76.02
75617 2020-01-11         Florida  Panama City    E79-788-42         13   None           NaN     104.27
75618 2020-08-21         Alabama      Gadsden    E21-954-64         14   None           NaN     161.50
75619 2019-11-02         Georgia       Athens          None         15   None           NaN     179.96

Filter Null, NaN, None

Nilai null, NaN, None dapat difilter menggunakan fungsi isna atau isnull. Kedua fungsi ini pada pandas akan memberikan hasil yang sama.

Contoh 1 – Filter data dengan kolom membership_id bernilai None:

Python

# menggunakan fungsi isna
print(data[data["membership_id"].isna()].to_string(max_rows=10))

# menggunakan fungsi isnull
print(data[data["membership_id"].isnull()].to_string(max_rows=10))

Output

         tr_date           state             city membership_id  num_items coupon  discount_amt  order_amt
1     2020-05-15  South Carolina       Greenville          None         13    Yes          0.33     155.59
4     2020-01-31         Florida        Pensacola          None          8   None           NaN      72.93
5     2020-02-16         Florida            Miami          None         26   None           NaN     104.69
7     2019-11-07         Florida        Homestead          None         18   None           NaN      94.76
9     2020-10-01         Georgia          Atlanta          None          6    Yes          0.36     202.52
...          ...             ...              ...           ...        ...    ...           ...        ...
75606 2020-06-06         Florida  West Palm Beach          None         11   None           NaN      44.94
75609 2019-10-22         Florida          Orlando          None         22   None           NaN     114.95
75612 2020-07-17         Florida            Miami          None         21   None           NaN     198.88
75613 2020-10-16         Alabama       Birmingham          None         29    Yes          0.13     158.22
75619 2019-11-02         Georgia           Athens          None         15   None           NaN     179.96

Contoh 2 – Filter data dengan kolom discount_amt bernilai NaN :

Python

# menggunakan fungsi isna
print(data[data["discount_amt"].isna()].to_string(max_rows=10))

Output

         tr_date    state           city membership_id  num_items coupon  discount_amt  order_amt
0     2020-06-24  Florida          Miami    A23-472-61         22   None           NaN     213.68
2     2020-02-20  Georgia        Atlanta    R04-941-15         22   None           NaN     106.82
4     2020-01-31  Florida      Pensacola          None          8   None           NaN      72.93
5     2020-02-16  Florida          Miami          None         26   None           NaN     104.69
6     2020-04-24  Florida  Daytona Beach    O68-957-14          6   None           NaN      23.50
...          ...      ...            ...           ...        ...    ...           ...        ...
75615 2019-11-30  Florida    Fort Pierce    G23-376-16         12   None           NaN     230.75
75616 2020-08-02  Florida    Panama City    P63-789-70         16   None           NaN      76.02
75617 2020-01-11  Florida    Panama City    E79-788-42         13   None           NaN     104.27
75618 2020-08-21  Alabama        Gadsden    E21-954-64         14   None           NaN     161.50
75619 2019-11-02  Georgia         Athens          None         15   None           NaN     179.96

Filter Data Numerik

Filter pada data numerik dapat menggunakan operator perbandingan:

  • sama dengan : ==
  • tidak sama dengan : !=
  • lebih besar dari : >
  • lebih besar atau sama dengan : >=
  • lebih kecil dari : <
  • lebih kecil atau sama dengan : <=
  • ada di dalam list atau range: .isin(...)
  • berada di antara batas-batas: .between(...)
  • gabungan kondisi : (...) & (...) (harus semua terpenuhi), (...) | (...) (setidaknya 1 saja terpenuhi)
  • negasi : ~

Contoh 3 – Filter data dimana kolom order_amt lebih besar atau sama dengan $200:

Agar sintaks lebih mudah dibaca, kita dapat memisahkan filter atau statemen kondisi pada variabel tersendiri. Hal ini sangat bermanfaat pada saat jumlah kondisi yang digunakan semakin banyak.

Python

filter_1 = data["order_amt"] >= 200

# data[filter_1]

print(data[filter_1].to_string(max_rows=10))

Output

         tr_date    state             city membership_id  num_items coupon  discount_amt  order_amt
0     2020-06-24  Florida            Miami    A23-472-61         22   None           NaN     213.68
9     2020-10-01  Georgia          Atlanta          None          6    Yes          0.36     202.52
20    2019-10-31  Florida  West Palm Beach    K73-446-05         19   None           NaN     208.82
24    2020-08-13  Georgia      Gainesville          None         31   None           NaN     216.70
26    2020-04-22  Florida    Pompano Beach          None         11   None           NaN     204.15
...          ...      ...              ...           ...        ...    ...           ...        ...
75601 2020-02-17  Florida          Orlando    E75-924-94          5    Yes          0.16     202.56
75602 2020-02-03  Florida       Lake Worth          None         34   None           NaN     220.64
75608 2019-10-10  Florida          Orlando    Z52-600-31         13   None           NaN     232.22
75611 2020-08-18  Alabama         Anniston    Q10-224-44         24   None           NaN     216.49
75615 2019-11-30  Florida      Fort Pierce    G23-376-16         12   None           NaN     230.75

Contoh 4 – Filter data dengan discount_amt sama dengan 0.2 dan order_amt lebih dari $100 :

Jika filter terdiri dari 2 atau lebih kondisi dan menggunakan operator perbandingan (==,!=, dst) pastikan untuk menutup setiap kondisi dalam tanda kurung (kondisi 1) & (kondisi 2) | ...

Python

# pastikan setiap kondisi dimasukkan dalam tanda (...)
filter_2 = (data["discount_amt"] == 0.2) & (data["order_amt"] < 50)

# data[filter_2]

print(data[filter_2].to_string(max_rows=10))

Output

         tr_date    state          city membership_id  num_items coupon  discount_amt  order_amt
472   2019-11-13  Florida  Jacksonville          None         29    Yes           0.2      24.73
762   2019-11-28  Georgia       Atlanta    S26-460-55          7    Yes           0.2      36.73
1007  2019-12-25  Georgia       Atlanta          None         29    Yes           0.2      24.76
2389  2019-11-11  Florida   Gainesville    O52-218-69         32    Yes           0.2      25.04
3639  2019-12-22  Alabama    Montgomery    U35-263-94          3    Yes           0.2      35.21
...          ...      ...           ...           ...        ...    ...           ...        ...
67271 2020-03-02  Alabama    Birmingham          None         29    Yes           0.2      29.44
70579 2020-05-14  Florida     Pensacola          None          7    Yes           0.2      46.67
73366 2020-01-31  Florida     Melbourne    G71-142-05         11    Yes           0.2      43.71
74204 2020-06-01  Florida    Clearwater          None         19    Yes           0.2      42.16
74889 2020-06-01  Florida   Brooksville    C95-444-50         26    Yes           0.2      45.33

Contoh 5 – Filter nilai num_items antara 10-19 dan order_amt < $50 :

Python

filter_3 = data["num_items"].isin(range(10, 20)) & (data["order_amt"] < 50)

# data[filter_3]

print(data[filter_3].to_string(max_rows=10))

Output

         tr_date    state             city membership_id  num_items coupon  discount_amt  order_amt
28    2020-04-28  Florida      Gainesville    Q80-771-31         13   None           NaN      45.35
40    2020-01-13  Florida  Saint Augustine    O45-092-88         17   None           NaN      24.27
63    2019-10-27  Florida          Hialeah    B70-879-96         10    Yes          0.19      32.38
94    2019-12-23  Florida  West Palm Beach    G74-909-31         12   None           NaN      22.60
129   2020-10-06  Alabama           Mobile          None         15   None           NaN      29.49
...          ...      ...              ...           ...        ...    ...           ...        ...
75501 2019-10-01  Georgia          Decatur          None         14   None           NaN      22.71
75538 2020-06-28  Florida  Fort Lauderdale    L86-855-25         12   None           NaN      43.00
75566 2020-10-10  Alabama       Montgomery          None         13   None           NaN      25.74
75591 2020-09-10  Georgia          Atlanta    I01-339-59         16   None           NaN      29.67
75606 2020-06-06  Florida  West Palm Beach          None         11   None           NaN      44.94

Contoh 6 – Filter nilai num_items bukan [3, 5 atau 7] dan order_amt > $100 :

Python

filter_4 = (~data["num_items"].isin([3, 5, 7])) & (data["order_amt"] > 100)

# data[filter_4]

print(data[filter_4].to_string(max_rows=10))

Output

         tr_date           state         city membership_id  num_items coupon  discount_amt  order_amt
0     2020-06-24         Florida        Miami    A23-472-61         22   None           NaN     213.68
1     2020-05-15  South Carolina   Greenville          None         13    Yes          0.33     155.59
2     2020-02-20         Georgia      Atlanta    R04-941-15         22   None           NaN     106.82
3     2020-07-22         Florida     Sarasota    U46-940-80         17    Yes          0.25     169.14
5     2020-02-16         Florida        Miami          None         26   None           NaN     104.69
...          ...             ...          ...           ...        ...    ...           ...        ...
75613 2020-10-16         Alabama   Birmingham          None         29    Yes          0.13     158.22
75615 2019-11-30         Florida  Fort Pierce    G23-376-16         12   None           NaN     230.75
75617 2020-01-11         Florida  Panama City    E79-788-42         13   None           NaN     104.27
75618 2020-08-21         Alabama      Gadsden    E21-954-64         14   None           NaN     161.50
75619 2019-11-02         Georgia       Athens          None         15   None           NaN     179.96

Filter Data Text dan Objek

Untuk data yang berisi text atau objek dapat dibuat difilter menggunakan operator serta fungsi sebagai berikut:

  • sama dengan :==
  • tidak sama dengan : !=
  • diawali dengan : str.startswith
  • diakhiri dengan : str.endswith
  • mengandung text : str.contains
  • ada di dalam list : isin
  • negasi : ~
  • regex : str.contains (regex=True)

Contoh 7 – Filter kolom state=Florida" :

Python

filter_5 = data["state"] == "Florida"

# data[filter_5]

print(data[filter_5].to_string(max_rows=10))

Output

         tr_date    state           city membership_id  num_items coupon  discount_amt  order_amt
0     2020-06-24  Florida          Miami    A23-472-61         22   None           NaN     213.68
3     2020-07-22  Florida       Sarasota    U46-940-80         17    Yes          0.25     169.14
4     2020-01-31  Florida      Pensacola          None          8   None           NaN      72.93
5     2020-02-16  Florida          Miami          None         26   None           NaN     104.69
6     2020-04-24  Florida  Daytona Beach    O68-957-14          6   None           NaN      23.50
...          ...      ...            ...           ...        ...    ...           ...        ...
75609 2019-10-22  Florida        Orlando          None         22   None           NaN     114.95
75612 2020-07-17  Florida          Miami          None         21   None           NaN     198.88
75615 2019-11-30  Florida    Fort Pierce    G23-376-16         12   None           NaN     230.75
75616 2020-08-02  Florida    Panama City    P63-789-70         16   None           NaN      76.02
75617 2020-01-11  Florida    Panama City    E79-788-42         13   None           NaN     104.27

Contoh 8 – Filter kolom state adalah "Florida" dan city bukan "Miami" :

Python

filter_6 = (data["state"] == "Florida") & (data["city"] != "Miami")

# data[filter_6]

print(data[filter_6].to_string(max_rows=10))

Output

         tr_date    state             city membership_id  num_items coupon  discount_amt  order_amt
3     2020-07-22  Florida         Sarasota    U46-940-80         17    Yes          0.25     169.14
4     2020-01-31  Florida        Pensacola          None          8   None           NaN      72.93
6     2020-04-24  Florida    Daytona Beach    O68-957-14          6   None           NaN      23.50
7     2019-11-07  Florida        Homestead          None         18   None           NaN      94.76
10    2020-02-07  Florida  Fort Lauderdale    A23-124-53          7    Yes          0.42     172.93
...          ...      ...              ...           ...        ...    ...           ...        ...
75608 2019-10-10  Florida          Orlando    Z52-600-31         13   None           NaN     232.22
75609 2019-10-22  Florida          Orlando          None         22   None           NaN     114.95
75615 2019-11-30  Florida      Fort Pierce    G23-376-16         12   None           NaN     230.75
75616 2020-08-02  Florida      Panama City    P63-789-70         16   None           NaN      76.02
75617 2020-01-11  Florida      Panama City    E79-788-42         13   None           NaN     104.27

Contoh 9 – Filter kolom city diawali kata "Fort" :

Python

filter_7 = data["city"].str.startswith("Fort")

data[filter_7]

print(data[filter_7].to_string(max_rows=10))

Output

         tr_date    state             city membership_id  num_items coupon  discount_amt  order_amt
10    2020-02-07  Florida  Fort Lauderdale    A23-124-53          7    Yes          0.42     172.93
19    2019-10-11  Florida  Fort Lauderdale    O72-081-20         29   None           NaN     187.34
42    2019-10-25  Florida      Fort Pierce    J63-913-89         24   None           NaN      93.12
49    2020-07-06  Florida       Fort Myers    B45-088-52          5   None           NaN     216.23
59    2019-11-07  Florida  Fort Lauderdale          None         15   None           NaN     151.35
...          ...      ...              ...           ...        ...    ...           ...        ...
75547 2020-06-01  Florida      Fort Pierce          None          8   None           NaN     176.79
75552 2020-02-12  Florida  Fort Lauderdale          None          6   None           NaN     212.13
75562 2020-01-07  Florida  Fort Lauderdale    F20-296-85         16   None           NaN     124.53
75575 2020-03-10  Florida  Fort Lauderdale          None         28   None           NaN      86.91
75615 2019-11-30  Florida      Fort Pierce    G23-376-16         12   None           NaN     230.75

Contoh 10 – Filter kolom state adalah "Florida" dan kolom city mengandung kata "land" :

Python

filter_8 = (data["state"] == "Florida") & data["city"].str.contains("land")

data[filter_8]

print(data[filter_8].to_string(max_rows=10))

Output

         tr_date    state      city membership_id  num_items coupon  discount_amt  order_amt
16    2020-10-30  Florida   Orlando          None         25   None           NaN      51.31
145   2019-11-21  Florida  Lakeland          None         22    Yes          0.50      84.69
150   2020-02-14  Florida   Orlando    N28-332-99         16   None           NaN      55.73
157   2020-01-24  Florida   Orlando          None          5   None           NaN     149.51
166   2019-10-03  Florida   Orlando    R30-922-39         14   None           NaN     183.37
...          ...      ...       ...           ...        ...    ...           ...        ...
75456 2019-10-07  Florida  Lakeland          None         24    Yes          0.44     191.45
75564 2020-10-22  Florida   Orlando    Z46-564-21         25    Yes          0.25     157.63
75601 2020-02-17  Florida   Orlando    E75-924-94          5    Yes          0.16     202.56
75608 2019-10-10  Florida   Orlando    Z52-600-31         13   None           NaN     232.22
75609 2019-10-22  Florida   Orlando          None         22   None           NaN     114.95

Contoh 11 – Filter kolom state adalah "Florida", "Georgia" atau "Alabama" dan kolom coupon bernilai "Yes" :

Python

selected_state = ["Florida", "Georgia", "Alabama"]

filter_9_1 = data["state"].isin(selected_state)
filter_9_2 = data["coupon"] == "Yes"

# data[filter_9_1 & filter_9_2]

print(data[filter_9_1 & filter_9_2].to_string(max_rows=10))

Output

         tr_date    state             city membership_id  num_items coupon  discount_amt  order_amt
3     2020-07-22  Florida         Sarasota    U46-940-80         17    Yes          0.25     169.14
8     2020-03-06  Georgia    Lawrenceville    H90-561-44         28    Yes          0.17     105.71
9     2020-10-01  Georgia          Atlanta          None          6    Yes          0.36     202.52
10    2020-02-07  Florida  Fort Lauderdale    A23-124-53          7    Yes          0.42     172.93
13    2020-04-22  Alabama       Birmingham          None         22    Yes          0.41     176.76
...          ...      ...              ...           ...        ...    ...           ...        ...
75600 2020-01-19  Florida     Jacksonville          None          5    Yes          0.22     122.68
75601 2020-02-17  Florida          Orlando    E75-924-94          5    Yes          0.16     202.56
75605 2020-06-21  Georgia         Columbus          None         17    Yes          0.34     155.56
75610 2020-01-16  Alabama           Mobile    E29-461-87         31    Yes          0.14     102.49
75613 2020-10-16  Alabama       Birmingham          None         29    Yes          0.13     158.22

Contoh 12 – Filter data membership_id dengan patern [X-Z]?9-9??-[1-5]? :

Python

filter_10_1 = ~data["membership_id"].isna() 
filter_10_2 = data["membership_id"].str.contains("[X-Z]\d9-9\d\d-[1-5]\d", regex=True)

# data[filter_10_1 & filter_10_2]

print(data[filter_10_1 & filter_10_2].to_string(max_rows=10))

Output

         tr_date        state           city membership_id  num_items coupon  discount_amt  order_amt
50    2020-06-10      Alabama     Birmingham    Y09-907-14         28     No          0.00     128.69
4106  2020-01-22      Florida  Boynton Beach    Y79-959-18         33     No          0.00     179.57
5349  2020-02-19      Alabama         Mobile    Y39-904-45         24     No          0.00     225.17
11146 2020-03-25      Florida       Palm Bay    Z29-955-38         30    Yes          0.19     191.54
19880 2020-08-21      Alabama         Mobile    X49-948-24         20     No          0.00     194.33
...          ...          ...            ...           ...        ...    ...           ...        ...
52524 2020-03-29      Florida      Bradenton    Z89-902-27          6    Yes          0.46     104.35
61588 2019-10-17      Florida  Daytona Beach    Y79-918-38         16     No          0.00      83.69
70711 2019-12-24      Alabama     Birmingham    Y69-936-49         19     No          0.00     185.18
71649 2020-09-20  Mississippi        Jackson    X09-955-14          3     No          0.00     138.51
72698 2020-01-26      Florida          Ocala    Z49-998-39         29     No          0.00     174.92

Filter Data DateTime

Kolom-kolom bertipe datetime dapat difilter menggunakan operator sama sepeti pada data numerik.

Contoh 13 – Filter berdasarkan tanggal tertentu :

Python

filter_11 = data["tr_date"] == "2020-01-01"

# data[filter_11]

print(data[filter_11].to_string(max_rows=10))

Output

         tr_date           state              city membership_id  num_items coupon  discount_amt  order_amt
577   2020-01-01         Florida      Jacksonville    Y90-746-84          3   None           NaN     163.18
1424  2020-01-01         Georgia            Duluth    N54-069-70         33   None           NaN     213.16
1792  2020-01-01  South Carolina          Columbia          None          3   None           NaN     120.63
1918  2020-01-01         Florida        Vero Beach    Z81-325-76         23   None           NaN     193.26
1985  2020-01-01         Georgia          Savannah          None         28   None           NaN      64.50
...          ...             ...               ...           ...        ...    ...           ...        ...
71875 2020-01-01         Florida  Saint Petersburg    I10-625-73          9   None           NaN     104.78
72554 2020-01-01         Florida   Fort Lauderdale    K35-525-05          8   None           NaN     201.99
72737 2020-01-01         Florida      Lehigh Acres    T33-942-96         34   None           NaN      22.86
73302 2020-01-01         Florida        Cape Coral    K41-060-07         32   None           NaN      97.19
73881 2020-01-01         Alabama        Montgomery    L82-292-59         20   None           NaN     214.39

Contoh 14 – Filter menurut tahun tertentu dan bulan tertentu :

Untuk mendapatkan nilai tahun, bulan, hari pada data datetime dapat melalui method dt.

Python

# Tahun 2019 Bulan Juli-Des
filter_12 = (data["tr_date"].dt.year == 2019) & (data["tr_date"].dt.month > 6)

# data[filter_12]

print(data[filter_12].to_string(max_rows=10))

Output

         tr_date    state             city membership_id  num_items coupon  discount_amt  order_amt
7     2019-11-07  Florida        Homestead          None         18   None           NaN      94.76
18    2019-11-27  Florida     Jacksonville          None          9   None           NaN     125.20
19    2019-10-11  Florida  Fort Lauderdale    O72-081-20         29   None           NaN     187.34
20    2019-10-31  Florida  West Palm Beach    K73-446-05         19   None           NaN     208.82
25    2019-11-16  Florida      Miami Beach    T87-267-53         33   None           NaN      61.08
...          ...      ...              ...           ...        ...    ...           ...        ...
75599 2019-12-25  Florida            Ocala    V99-965-64          4    Yes          0.29     193.25
75608 2019-10-10  Florida          Orlando    Z52-600-31         13   None           NaN     232.22
75609 2019-10-22  Florida          Orlando          None         22   None           NaN     114.95
75615 2019-11-30  Florida      Fort Pierce    G23-376-16         12   None           NaN     230.75
75619 2019-11-02  Georgia           Athens          None         15   None           NaN     179.96

Contoh 15 – Filter antara 2 rentang tanggal :

Python

start = pd.to_datetime("2020-06-01")
end = pd.to_datetime("2020-09-30")

# inclusive
# both : start dan end termasuk dalam range
# neither : start dan end tidak termasuk dalam range
# left : start masuk dalam range dan end tidak
# right : start tidak masuk dalam range dan end masuk
filter_13 = data["tr_date"].isin(pd.date_range(start, end, inclusive="both"))

data[filter_13]
print(data[filter_13].to_string(max_rows=10))

Output

         tr_date    state         city membership_id  num_items coupon  discount_amt  order_amt
0     2020-06-24  Florida        Miami    A23-472-61         22   None           NaN     213.68
3     2020-07-22  Florida     Sarasota    U46-940-80         17    Yes          0.25     169.14
14    2020-06-21  Alabama   Birmingham    J25-217-44         33   None           NaN      47.86
15    2020-08-13  Florida        Tampa          None         26   None           NaN      42.29
21    2020-07-27  Georgia      Atlanta          None          6    Yes          0.28      67.64
...          ...      ...          ...           ...        ...    ...           ...        ...
75611 2020-08-18  Alabama     Anniston    Q10-224-44         24   None           NaN     216.49
75612 2020-07-17  Florida        Miami          None         21   None           NaN     198.88
75614 2020-08-26  Alabama   Birmingham    U38-420-51          5   None           NaN      98.44
75616 2020-08-02  Florida  Panama City    P63-789-70         16   None           NaN      76.02
75618 2020-08-21  Alabama      Gadsden    E21-954-64         14   None           NaN     161.50

Mengelola Data Hasil Filter

Data yang sudah difilter dapat dilakukan proses lebih lanjut, misalnya:

Menganti Nilai NaN dengan 0 pada kolom discount_amt dan nilai None pada kolom coupon menjadi No :

Python

# filter baris yang memiliki nilai discount_amt = NaN
filter_13 = data["discount_amt"].isnull()

# ganti nilainya dengan 0
data.loc[filter_13, "discount_amt"] = 0

# filter baris yang memiliki nilai coupon = None
filter_14 = data["coupon"].isna()

# ganti nilainya dengan No
data.loc[filter_13, "coupon"] = "No"


# mencetak data yang sudah di-update
print(data.to_string(max_rows=10))

Output

         tr_date           state         city membership_id  num_items coupon  discount_amt  order_amt
0     2020-06-24         Florida        Miami    A23-472-61         22     No          0.00     213.68
1     2020-05-15  South Carolina   Greenville          None         13    Yes          0.33     155.59
2     2020-02-20         Georgia      Atlanta    R04-941-15         22     No          0.00     106.82
3     2020-07-22         Florida     Sarasota    U46-940-80         17    Yes          0.25     169.14
4     2020-01-31         Florida    Pensacola          None          8     No          0.00      72.93
...          ...             ...          ...           ...        ...    ...           ...        ...
75615 2019-11-30         Florida  Fort Pierce    G23-376-16         12     No          0.00     230.75
75616 2020-08-02         Florida  Panama City    P63-789-70         16     No          0.00      76.02
75617 2020-01-11         Florida  Panama City    E79-788-42         13     No          0.00     104.27
75618 2020-08-21         Alabama      Gadsden    E21-954-64         14     No          0.00     161.50
75619 2019-11-02         Georgia       Athens          None         15     No          0.00     179.96

Filter data untuk tahun 2020 untuk wilayah Florida dan Georgia dan simpan ke dalam file CSV :

Python

# filter baris yang memiliki tr_date : tahun 2020
filter_15 = (data["tr_date"].dt.year == 2020) & data["state"].isin(["Florida", "Georgia"])

# simpan hasil filter pada variabel data_2020
data_2020 = data[filter_15]

# simpan data tahun 2020 ke dalam file csv
data_2020.to_csv("Florida_Georgia_2020.csv", index=False)

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *

Daftar Isi