Scraping Tokopedia Product Data with Python
Hello everyone. In this post we will look at how to scrape data from the Tokopedia website. Put simply, we will write Python code that extracts data from a source (a website, a database, an enterprise application, or a legacy system) and then saves it to a file in a tabular or spreadsheet format.
Before you start, make sure the following libraries are available:
- pandas
$py -m pip install pandas
- requests
$py -m pip install requests
- math (part of the Python standard library, so it needs no installation)
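As a quick sanity check after installing, the short sketch below reports which of the modules cannot be found by the import system. The helper name `missing_modules` is made up for this illustration and is not part of any library.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means all three modules are ready to be imported.
print(missing_modules(["pandas", "requests", "math"]))
```

If a name shows up in the printed list, rerun the matching pip command above.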
The next step is to find the Tokopedia API endpoint that we will use to scrape the data we are looking for. To find it, open the browser's Inspect Element feature (the developer tools) by pressing F12; it looks like the image below.
Then, in the Network tab, select Fetch/XHR. If nothing shows up as in the image below, press
Ctrl + R
to refresh the page. Next, select SearchProductQueryV4, choose Copy as cURL (bash), and paste what you copied into Visual Studio Code.
The next step is to write the program, starting by importing the libraries.
import requests
import pandas as pd
import math
Copy the cURL command you obtained earlier. It cannot be used as-is, so you need to edit it first: remove every -H flag, remove the trailing backslashes (\), turn the curl URL into a variable, and remove --data-raw $ and --compressed. For the details, see the edited code with its additions further down. The copied command looks like this:
curl 'https://gql.tokopedia.com/graphql/SearchProductQueryV4' \
-H 'authority: gql.tokopedia.com' \
-H 'accept: */*' \
-H 'accept-language: id-ID,id;q=0.9,en-US;q=0.8,en;q=0.7' \
-H 'content-type: application/json' \
-H 'cookie: _gcl_au=1.1.1413001030.1672228217; _UUID_NONLOGIN_=c1e6d871a598c87993e83a1501ab6f32; DID=c17aa6dcf617cbed0e6fba4639e43a825b415d046ba7852e650167030893d15c7b82e1ff1b4981e31e45c710a91c3b9c; DID_JS=YzE3YWE2ZGNmNjE3Y2JlZDBlNmZiYTQ2MzllNDNhODI1YjQxNWQwNDZiYTc4NTJlNjUwMTY3MDMwODkzZDE1YzdiODJlMWZmMWI0OTgxZTMxZTQ1YzcxMGE5MWMzYjlj47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=; _UUID_CAS_=cf7fbfcb-eb64-4d6c-950c-4f9fa2b9d118; _CASE_=2c75331e33756d656560637b75361e33756d677b753b353b756d751d363c36252336770722243623757b75341e33756d6660617b753b383930756d75757b753b3623756d75757b75271438756d75757b75201e33756d66656566676460627b75241e33756d66666264676260647b7524032e2732756d75653f757b75203f24756d750c2c0b75203625323f38222432083e330b756d66656566676460627b0b75243225213e343208232e27320b756d0b75653f0b757b0b750808232e273239363a320b756d0b75003625323f38222432240b752a7b2c0b75203625323f38222432083e330b756d677b0b75243225213e343208232e27320b756d0b7566623a0b757b0b750808232e273239363a320b756d0b75003625323f38222432240b752a0a757b753b022733756d75656765657a66657a656f03666f6d62666d66657c67606d6767752a; __auc=1457b782185589295917cd3559a; _gcl_aw=GCL.1673789178.CjwKCAiA5Y6eBhAbEiwA_2ZWITpDT3tQihrnx6f12R1Z_SS5v8I4QOajIsLsCgKIHBmbr7AQhXs29hoCq7MQAvD_BwE; _gac_UA-126956641-6=1.1673789184.CjwKCAiA5Y6eBhAbEiwA_2ZWITpDT3tQihrnx6f12R1Z_SS5v8I4QOajIsLsCgKIHBmbr7AQhXs29hoCq7MQAvD_BwE; hfv_banner=true; _gac_UA-9801603-1=1.1673790632.CjwKCAiA5Y6eBhAbEiwA_2ZWITpDT3tQihrnx6f12R1Z_SS5v8I4QOajIsLsCgKIHBmbr7AQhXs29hoCq7MQAvD_BwE; bm_sz=407F36BCEED7397B2796CCBAFF2620BC~YAAQjawwF83rwqSFAQAA3Ot9wBKaKZdbB9Y7jhId4dnxST3HQMpCcM8fw9GzoBRlw3XS7cxHSRdesUmndvzw5MrWXkF8CCBrNf55zkb0irB+X0GSiJ7jlrfQhPJm5pyOJweP7MljCNkGhvoXsFdeALV6hogaciQ8eZszuw5MjLFJqK+8R5TF197k8snHi35W6asNfHIFQFC7QjCJf1Xdscwq6pRiDBDzRMEom0ZJLFA5QH6mxXxyDvweAaxknlCJpuQNulK9uYX3Z9flS2ub/yej6tx838qGRtOrbPZjUFGb35aHKfA=~3618871~3551289; 
bm_mi=D1ECD6D69DF4FCF2447CA87DD3B36DCC~YAAQjawwF1TtwqSFAQAAT/h9wBKECteQpow38LeC7AF6lJBbNnWwoXOIabtjZ0aYYwP82+DDjKvq/3Iy8aSOF1VpWZ7n5lo43QM/3wcZAfCXDBqOAhKW0DHYl1f+HtfY0BhaSPowXHaRps5Bps7shyzI4xCCJlJ/EjoDPnRNyG+pOO0eR81EEPvKTmnxorPFpI+meB92180YpAbIzuCRR2wlRjXGLrN6hPcn2K639lyRg7OysVGeD0wKRwA6C9qCJOSw+UeOzsBLzN2yasv24UHDzu0KhAMG1aUAX1M7m+Is1/9XgCuF4N2em+EeBT4=~1; _SID_Tokopedia_=FkOk2TYitSctm40z6gGU5d0rvFbmBfnuTekFzj5sfhrNCACYiHTFqsOhAnULdaIF0gKZAObdOYW1rkMnfFmgEAKYkkABG4SwICQ8y9Sn23jxpfLCfE3uUwaGvCYdVDlQ; _gid=GA1.2.954641500.1673971765; ak_bmsc=406E089203B03D88D47767E1E217F9FD~000000000000000000000000000000~YAAQjawwF078wqSFAQAAsXt+wBIT3sGullRUNTQOxWYfvUfnJHjOSUmYmeOSkNZFVgu5iBocwkqC9vouEj5QJpUHI3yfwqLTKiPWk6hPIewTMeRS1zhYRKjWw6m8qAiFKxWwz81KTZULKsQMkgSu7OEyh54SjCA5Kjqq4msj5Fkt8sI7Rxz20JisZ0kTm+ifNDcnGAuwXwEuKv/OwR+zaZOs4V/ZlN057HaxgjYJMJdAQbNyGxaS/IcWjOLyFAJUysP9PuN9U861QbpUoAByFFgk+hbcOPedDfvU78IbfG8QGtI9DLUK9EiM41s+RiNACl8TzDqBYZKkkcmkTPiGo56Ss5iThx1KeRCPHxlzKNy2YQD11hYqkrkcE8Ev8qQoTdZ1uTA8sF+EDUXxsHOd95D12RkWR9j2u5J5TCrJHUVBSElUv0iH1b3e8+5VXsjpzjGwfGhdvYHplzDtdL0kPX611vA7NQamdCiWM2JQUf/lCs799UHZ38bw/3Lx3faUHBHp; AMP_TOKEN=%24NOT_FOUND; _abck=9753AC2451958445CFC616A8A0ACC69F~0~YAAQjawwF7Caw6SFAQAA8z6LwAkFDeCLlOgT11GetPYNLFZ8pAJVzwWYLl1s3cjd6hMuDGbAqmTXgRWFLvkoFiGerB8edHuVSOGZtUrTDK/LF6Ydj4hlOGJn4h5QZAjak06hXhYdeISENSoLneTNBlyxtjNOxDU9jB+AiAIZKtNkmKVp+H4oPVXY6xZ7kPuTlFP/A+hHPlsvCiPtnzM8AGXiTPdMBzGIaA8yePHjNoVPAVzX0YujY4NYNLO79XEnwVkcms5SrCmfpJPMzagoVgKBx3SQUCB3LemfpFFKy/zjxvMy/fcNjifEXO1XWqckmO7sI8jE3eCe/KFJdbArlmt/GQmWz3zQm7DhDiAG0cUM0UMRPWyKEFO54DhWx1MLxnHgMne8MH2Spk4s7U5B6432CfpAxLOFqxYG~-1~-1~-1; __asc=acf830bd185c0a8e39129c5d96b; _dc_gtm_UA-126956641-6=1; _dc_gtm_UA-9801603-1=1; _ga_70947XW48P=GS1.1.1673971765.5.1.1673974635.60.0.0; _ga=GA1.1.1251187054.1672228218' \
-H 'origin: https://www.tokopedia.com' \
-H 'referer: https://www.tokopedia.com/search?st=product&q=keybord%20RGB' \
-H 'sec-ch-ua: "Not_A Brand";v="99", "Google Chrome";v="109", "Chromium";v="109"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-site' \
-H 'tkpd-userid: 0' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36' \
-H 'x-device: desktop-0.0' \
-H 'x-source: tokopedia-lite' \
-H 'x-tkpd-lite-service: zeus' \
-H 'x-version: bacb7c4' \
--data-raw $'[{"operationName":"SearchProductQueryV4","variables":{"params":"device=desktop&ob=23&page=1&q=keybord%20RGB&related=true&rows=60&safe_search=false&scheme=https&shipping=&source=search&st=product&start=0&topads_bucket=true&unique_id=c1e6d871a598c87993e83a1501ab6f32&user_addressId=&user_cityId=176&user_districtId=2274&user_id=&user_lat=&user_long=&user_postCode=&user_warehouseId=12210375&variants="},"query":"query SearchProductQueryV4($params: String\u0021) {\\n ace_search_product_v4(params: $params) {\\n header {\\n totalData\\n totalDataText\\n processTime\\n responseCode\\n errorMessage\\n additionalParams\\n keywordProcess\\n componentId\\n __typename\\n }\\n data {\\n banner {\\n position\\n text\\n imageUrl\\n url\\n componentId\\n trackingOption\\n __typename\\n }\\n backendFilters\\n isQuerySafe\\n ticker {\\n text\\n query\\n typeId\\n componentId\\n trackingOption\\n __typename\\n }\\n redirection {\\n redirectUrl\\n departmentId\\n __typename\\n }\\n related {\\n position\\n trackingOption\\n relatedKeyword\\n otherRelated {\\n keyword\\n url\\n product {\\n id\\n name\\n price\\n imageUrl\\n rating\\n countReview\\n url\\n priceStr\\n wishlist\\n shop {\\n city\\n isOfficial\\n isPowerBadge\\n __typename\\n }\\n ads {\\n adsId: id\\n productClickUrl\\n productWishlistUrl\\n shopClickUrl\\n productViewUrl\\n __typename\\n }\\n badges {\\n title\\n imageUrl\\n show\\n __typename\\n }\\n ratingAverage\\n labelGroups {\\n position\\n type\\n title\\n url\\n __typename\\n }\\n componentId\\n __typename\\n }\\n componentId\\n __typename\\n }\\n __typename\\n }\\n suggestion {\\n currentKeyword\\n suggestion\\n suggestionCount\\n instead\\n insteadCount\\n query\\n text\\n componentId\\n trackingOption\\n __typename\\n }\\n products {\\n id\\n name\\n ads {\\n adsId: id\\n productClickUrl\\n productWishlistUrl\\n productViewUrl\\n __typename\\n }\\n badges {\\n title\\n imageUrl\\n show\\n __typename\\n }\\n category: departmentId\\n 
categoryBreadcrumb\\n categoryId\\n categoryName\\n countReview\\n customVideoURL\\n discountPercentage\\n gaKey\\n imageUrl\\n labelGroups {\\n position\\n title\\n type\\n url\\n __typename\\n }\\n originalPrice\\n price\\n priceRange\\n rating\\n ratingAverage\\n shop {\\n shopId: id\\n name\\n url\\n city\\n isOfficial\\n isPowerBadge\\n __typename\\n }\\n url\\n wishlist\\n sourceEngine: source_engine\\n __typename\\n }\\n violation {\\n headerText\\n descriptionText\\n imageURL\\n ctaURL\\n ctaApplink\\n buttonText\\n buttonType\\n __typename\\n }\\n __typename\\n }\\n __typename\\n }\\n}\\n"}]' \
--compressed
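The manual header clean-up described above can also be done with a few lines of Python. The following is only a sketch: the function name `curl_headers_to_dict` is made up for this illustration, the sample is a shortened version of the command above, and it assumes every header is written as -H 'name: value' with no single quote inside the value.

```python
import re


def curl_headers_to_dict(curl_text):
    """Collect every -H 'name: value' argument of a cURL command into a dict."""
    headers = {}
    for match in re.finditer(r"-H\s+'([^']*)'", curl_text):
        # Split on the first colon only, so URLs inside the value stay intact.
        name, _, value = match.group(1).partition(":")
        headers[name.strip()] = value.strip()
    return headers


# Shortened sample taken from the copied command (cookie and other headers left out).
sample = r"""curl 'https://gql.tokopedia.com/graphql/SearchProductQueryV4' \
  -H 'accept: */*' \
  -H 'content-type: application/json' \
  -H 'origin: https://www.tokopedia.com' \
  --compressed"""

print(curl_headers_to_dict(sample))
```

The resulting dictionary has the same shape as the `header` variable used in the modified code.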
The modified code is shown below.
url_target = 'https://gql.tokopedia.com/graphql/SearchProductQueryV4'
header = {'authority': 'gql.tokopedia.com',
'accept': '*/*',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'no-cache',
'content-type': 'application/json',
'cookie': '_gcl_au=1.1.742344446.1668161227; DID=af3dc19da02ce29f6a0b5b42a0c49298cde2ceff9735aed91a4ce0ac9fa5c43bef01c3977563b91e5eb1ab63e1cd5577; DID_JS=YWYzZGMxOWRhMDJjZTI5ZjZhMGI1YjQyYTBjNDkyOThjZGUyY2VmZjk3MzVhZWQ5MWE0Y2UwYWM5ZmE1YzQzYmVmMDFjMzk3NzU2M2I5MWU1ZWIxYWI2M2UxY2Q1NTc347DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=; _UUID_NONLOGIN_=f419ede0ae33a45c36cd442f6b801712; _jxx=66c12850-e572-11ec-bda2-4f63525eb407; _jx=66c12850-e572-11ec-bda2-4f63525eb407; _UUID_CAS_=a6e9168d-d9ef-4275-a19a-a74ec4995d41; _CASE_=7d24624f62243c343431322a24674f62243c362a246a646a243c244c676d67747267265673756772242a24654f62243c3731302a246a696861243c24242a246a6772243c24242a24764569243c24242a24714f62243c37343437363531332a24754f62243c37373335363331352a2475527f7663243c24346e242a24716e75243c245d7d5a24716774636e69737563596f625a243c37343437363531332a5a24756374706f656359727f76635a243c5a24346e5a242a5a245959727f766368676b635a243c5a24516774636e69737563755a247b2a7d5a24716774636e69737563596f625a243c362a5a24756374706f656359727f76635a243c5a2437336b5a242a5a245959727f766368676b635a243c5a24516774636e69737563755a247b5b242a246a537662243c24343634342b37372b37375237313c36313c363f2d36313c3636247b; __auc=5794a37d18466286071c95daa13; hfv_banner=true; _gid=GA1.2.1854659675.1668649242; bm_sz=05EEAD4804A6F8E6BAB6767459CA6E76~YAAQzOwZuM4gTIKEAQAAT2jAiRH7AJio4on1JpPx7bHwuPi3ZUgJYjnOJiMEx32J9WxatoY7KLMZTLbgltZT8sZtRLY9O0dAliIfxxlamkfymIUR7cfTyQ3VYMvwBWUeGSJuDXpG+4GJydoef7VURIFsW5/Iu8NbCxMqkptk4e9dO0l4OGRgq/ogA1ohBfK7kUP7DBTDVoyjZCd6U88sRRUCDAsqNuXLlnkAhHudR7oZ+IL95uuK0cD3jFJ8Pu8NqntBPXRLPRvWA/inqRdfD9UNELFQ6z1X8Ha5XImYQTntFnmXGnA=~3617079~3356225; bm_mi=91038649E3C89AF53C6A2BD5516B728D~YAAQzOwZuAcjTIKEAQAApX/AiRHtd8wP/Z3ssr0KtY6iTHfx8CVHSATWfqPKx8WU+hm1IK6/tBzIc8p+a0Pp/Hcgmhycfg03PskA91NMaQSxkuG/cfj8EnnWOrKi4oHPSdxtfpaPKPP9hnbCY7m3rtxGnQNeL2n+D6magGRnkNLU/foQTl3PmvmXRpK524dAYJzwCujJieOqlYFD+KGEYdLC1I0t6Afe93KMbhYZJ/igzOTDMGMoJ/uxBUPWxRaLFPL1t/r+db8wVQ/h36DhuOGNYpStr6pWtG6D6MwzEL9NA9wR74h39jBO6ghCePI3hQ==~1; 
_abck=B2D8A04A236B2AA069C87B3A5DE321BE~0~YAAQzOwZuKQjTIKEAQAAWYjAiQjIRiB3dF+bimVc5Mm8SWSKjbyMy+u8Qe3wn/wtzCTAyg7+cJY2MzeF1y7whvmc4xzG2Zz2wTF/42HhRGBX9pzP617M3lpbPMDdFFM46BpUOgt50moIwObnZmuLp1S9cGWRSslLIJqHI1BeE/AHcqZpWnxqEB8E+hQj+scUWx2Yyw+Z1VeC2aHeIHBmMxOgU/PXKnXL95SRGQwRGHXeYEOsLCzBPEEQntvNxJ9T+vtg4l9+hl+Iiv7GoXrYryD/4iWYDgG0VxA7TCuFnyF2I+w3dwvi2osDA6wCTcF99k2yuYhe1lVlEN8GxRBwVvkuuE76fw1KgHnnbyNm0NZgTZ35DsfSExPLVXcqXV/rcrhRRos943/KPBdYo0p2IowoUmqTYi89J+0=~-1~-1~-1; _SID_Tokopedia_=cyVbwbapm6sAFa2KyDu5FzA0d-3pXu96LerPKwMyvB6_HxtpdgrJq0RCB0tEAcQAn5S0hDVMmUS3FLWzrv_w7DlaFkdsUS-Ti6vDgSfqFcGrNmgl4JkMGDlTYVL-_mHL; _jxxs=1668758411-66c12850-e572-11ec-bda2-4f63525eb407; _jxs=1668758411-66c12850-e572-11ec-bda2-4f63525eb407; __asc=5e24a02c18489c09f429400be92; ak_bmsc=4E453508B75C49E6359C52BCB9E15374~000000000000000000000000000000~YAAQzOwZuIglTIKEAQAAlajAiRFZTo7ufYKAGvp8UEIf5jwbVBLx7+dcB2RH/CIjZsgCAi4V2rXa30kvDmVuF/KwhpONRAO5GfqiccJy4zSZxKwGOVoOxjEKtiiQdstgCQEx+wJNOGPrp+54yUUuMSwu6V1iXiheFfuPhMB5EDl6RjUx01r47Pd5F3yB/Le6R51T0HoMMKLoEO3f9yyCNt7WFE+2lZy5iq8YmdzRqdoXxWcuZfwp10TrOWySO5tiRZ7vM+7nGQJDX3uUT37lG5N2btaKwQVYNpR5p03faqopU6iPyZmFmf5oXygnLdKlqJIrXCpU5WtfFfdsDcDfWt1FtRV1H7i4aEgN7cqQRwYSVUq7fBToWdsF7dXdNfGIsXBVP8ToORdXgnTVpGVjwO84sDjiPI1y4dFlysZ5egzFKnlEnaO33MfjU1MH1r6/F2kfOBWQGyjNyZz++Gjq7ivVSA1NVz+ScoSGESc9Tn4g2FIvqx2NOiSs6LJ+4BV6jDkrHJg=; _dc_gtm_UA-126956641-6=1; _ga_70947XW48P=GS1.1.1668758410.12.1.1668758574.57.0.0; AMP_TOKEN=%24NOT_FOUND; _ga=GA1.2.824419592.1668161229; _dc_gtm_UA-9801603-1=1',
'origin': 'https://www.tokopedia.com',
'pragma': 'no-cache',
'referer': 'https://www.tokopedia.com/search?st=product&q=baju%20anak%20perempuan&srp_component_id=02.01.00.00&srp_page_id=&srp_page_title=&navsource=',
'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Linux"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-site',
'tkpd-userid': '0',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
'x-device': 'desktop-0.0',
'x-source': 'tokopedia-lite',
'x-tkpd-lite-service': 'zeus',
'x-version': '1fbf287'}
def cek_jumlah_data(kata_kunci):
    # Request the first page only, to read how many products match the keyword
    init_query = f'[{{"operationName":"SearchProductQueryV4","variables":{{"params":"device=desktop&navsource=&ob=23&page=1&q={kata_kunci}&related=true&rows=60&safe_search=false&scheme=https&shipping=&source=search&srp_component_id=01.07.00.00&srp_page_id=&srp_page_title=&st=product&start=0&topads_bucket=true&unique_id=1fdabea77fbaf5f1954bdbc40d4a9337&user_addressId=113228966&user_cityId=171&user_districtId=2233&user_id=7773903&user_lat=-6.377643399999999&user_long=106.7621449&user_postCode=16516&user_warehouseId=0&variants="}},"query":"query SearchProductQueryV4($params: String!) {{\\n ace_search_product_v4(params: $params) {{\\n header {{\\n totalData\\n totalDataText\\n processTime\\n responseCode\\n errorMessage\\n additionalParams\\n keywordProcess\\n componentId\\n __typename\\n }}\\n data {{\\n banner {{\\n position\\n text\\n imageUrl\\n url\\n componentId\\n trackingOption\\n __typename\\n }}\\n backendFilters\\n isQuerySafe\\n ticker {{\\n text\\n query\\n typeId\\n componentId\\n trackingOption\\n __typename\\n }}\\n redirection {{\\n redirectUrl\\n departmentId\\n __typename\\n }}\\n related {{\\n position\\n trackingOption\\n relatedKeyword\\n otherRelated {{\\n keyword\\n url\\n product {{\\n id\\n name\\n price\\n imageUrl\\n rating\\n countReview\\n url\\n priceStr\\n wishlist\\n shop {{\\n city\\n isOfficial\\n isPowerBadge\\n __typename\\n }}\\n ads {{\\n adsId: id\\n productClickUrl\\n productWishlistUrl\\n shopClickUrl\\n productViewUrl\\n __typename\\n }}\\n badges {{\\n title\\n imageUrl\\n show\\n __typename\\n }}\\n ratingAverage\\n labelGroups {{\\n position\\n type\\n title\\n url\\n __typename\\n }}\\n componentId\\n __typename\\n }}\\n componentId\\n __typename\\n }}\\n __typename\\n }}\\n suggestion {{\\n currentKeyword\\n suggestion\\n suggestionCount\\n instead\\n insteadCount\\n query\\n text\\n componentId\\n trackingOption\\n __typename\\n }}\\n products {{\\n id\\n name\\n ads {{\\n adsId: id\\n productClickUrl\\n productWishlistUrl\\n productViewUrl\\n __typename\\n }}\\n badges {{\\n title\\n imageUrl\\n show\\n __typename\\n }}\\n category: departmentId\\n categoryBreadcrumb\\n categoryId\\n categoryName\\n countReview\\n customVideoURL\\n discountPercentage\\n gaKey\\n imageUrl\\n labelGroups {{\\n position\\n title\\n type\\n url\\n __typename\\n }}\\n originalPrice\\n price\\n priceRange\\n rating\\n ratingAverage\\n shop {{\\n shopId: id\\n name\\n url\\n city\\n isOfficial\\n isPowerBadge\\n __typename\\n }}\\n url\\n wishlist\\n sourceEngine: source_engine\\n __typename\\n }}\\n violation {{\\n headerText\\n descriptionText\\n imageURL\\n ctaURL\\n ctaApplink\\n buttonText\\n buttonType\\n __typename\\n }}\\n __typename\\n }}\\n __typename\\n }}\\n}}\\n"}}]'
    response = requests.post(url_target, headers=header, data=init_query)
    # To scrape all of the data, keep the lines below enabled
    jumlah_data = response.json()[0]['data']['ace_search_product_v4']['header']['totalData']
    # One extra page, because range(1, jumlah_page) excludes its upper bound
    jumlah_page = math.ceil(jumlah_data/60) + 1
    return jumlah_data, jumlah_page

def scrape_tokeped(kata_kunci):
    print("Starting to scrape data from Tokopedia....")
    jml_data, jml_page = cek_jumlah_data(kata_kunci)
    hasil = []
    # page is the 1-based page number; data is the offset of that page's first row
    for page, data in zip(range(1, jml_page), range(0, jml_data, 60)):
        print(page)
        query = f'[{{"operationName":"SearchProductQueryV4","variables":{{"params":"device=desktop&navsource=&ob=23&page={page}&q={kata_kunci}&related=true&rows=60&safe_search=false&scheme=https&shipping=&source=search&srp_component_id=01.07.00.00&srp_page_id=&srp_page_title=&st=product&start={data}&topads_bucket=true&unique_id=3220fd80a9a96a8eb398771a986004aa&user_addressId=&user_cityId=176&user_districtId=2274&user_id=&user_lat=&user_long=&user_postCode=&user_warehouseId=12210375&variants="}},"query":"query SearchProductQueryV4($params: String!) {{\\n ace_search_product_v4(params: $params) {{\\n header {{\\n totalData\\n totalDataText\\n processTime\\n responseCode\\n errorMessage\\n additionalParams\\n keywordProcess\\n componentId\\n __typename\\n }}\\n data {{\\n banner {{\\n position\\n text\\n imageUrl\\n url\\n componentId\\n trackingOption\\n __typename\\n }}\\n backendFilters\\n isQuerySafe\\n ticker {{\\n text\\n query\\n typeId\\n componentId\\n trackingOption\\n __typename\\n }}\\n redirection {{\\n redirectUrl\\n departmentId\\n __typename\\n }}\\n related {{\\n position\\n trackingOption\\n relatedKeyword\\n otherRelated {{\\n keyword\\n url\\n product {{\\n id\\n name\\n price\\n imageUrl\\n rating\\n countReview\\n url\\n priceStr\\n wishlist\\n shop {{\\n city\\n isOfficial\\n isPowerBadge\\n __typename\\n }}\\n ads {{\\n adsId: id\\n productClickUrl\\n productWishlistUrl\\n shopClickUrl\\n productViewUrl\\n __typename\\n }}\\n badges {{\\n title\\n imageUrl\\n show\\n __typename\\n }}\\n ratingAverage\\n labelGroups {{\\n position\\n type\\n title\\n url\\n __typename\\n }}\\n componentId\\n __typename\\n }}\\n componentId\\n __typename\\n }}\\n __typename\\n }}\\n suggestion {{\\n currentKeyword\\n suggestion\\n suggestionCount\\n instead\\n insteadCount\\n query\\n text\\n componentId\\n trackingOption\\n __typename\\n }}\\n products {{\\n id\\n name\\n ads {{\\n adsId: id\\n productClickUrl\\n productWishlistUrl\\n productViewUrl\\n __typename\\n }}\\n badges {{\\n title\\n imageUrl\\n show\\n __typename\\n }}\\n category: departmentId\\n categoryBreadcrumb\\n categoryId\\n categoryName\\n countReview\\n customVideoURL\\n discountPercentage\\n gaKey\\n imageUrl\\n labelGroups {{\\n position\\n title\\n type\\n url\\n __typename\\n }}\\n originalPrice\\n price\\n priceRange\\n rating\\n ratingAverage\\n shop {{\\n shopId: id\\n name\\n url\\n city\\n isOfficial\\n isPowerBadge\\n __typename\\n }}\\n url\\n wishlist\\n sourceEngine: source_engine\\n __typename\\n }}\\n violation {{\\n headerText\\n descriptionText\\n imageURL\\n ctaURL\\n ctaApplink\\n buttonText\\n buttonType\\n __typename\\n }}\\n __typename\\n }}\\n __typename\\n }}\\n}}\\n"}}]'
        response = requests.post(url_target, headers=header, data=query)
        products = response.json()[0]['data']['ace_search_product_v4']['data']['products']
        hasil.extend(products)
    # Combine the products from every page into one table and save it as CSV
    dtFrame = pd.DataFrame.from_dict(hasil)
    dtFrame.to_csv('data_tokped_2.csv', encoding='utf-8')
    print("Done ...")

keyword = "keybord RGB"
scrape_tokeped(keyword)
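The script builds the request body with an f-string, which is why every literal brace in it has to be doubled and why a keyword containing a space ends up in the body unencoded. As an alternative, the sketch below builds the per-page params string with urllib.parse.urlencode and the body with json.dumps. It makes no network request. The helper names `page_params` and `build_payload` are made up for this illustration, only a subset of the original parameters is included, and whether the endpoint accepts that reduced set is an assumption that has not been verified.

```python
import json
import math
from urllib.parse import quote, urlencode

ROWS_PER_PAGE = 60  # same "rows" value as in the script above


def page_params(keyword, total_data, rows=ROWS_PER_PAGE):
    """Yield one URL-encoded params string per result page."""
    total_pages = math.ceil(total_data / rows)
    for page in range(1, total_pages + 1):
        yield urlencode(
            {
                "device": "desktop",
                "ob": 23,
                "page": page,
                "q": keyword,
                "rows": rows,
                "source": "search",
                "st": "product",
                "start": (page - 1) * rows,  # offset of the page's first row
            },
            quote_via=quote,  # encode spaces as %20, as in the copied request
        )


def build_payload(params, query_text):
    """Serialise the request body; json.dumps takes care of quotes and braces."""
    return json.dumps(
        [
            {
                "operationName": "SearchProductQueryV4",
                "variables": {"params": params},
                "query": query_text,  # the GraphQL query copied from the browser
            }
        ]
    )


# 150 is a made-up total, used only to show the page and offset arithmetic.
for params in page_params("keybord RGB", total_data=150):
    print(params)
```

With this approach the GraphQL query can be kept as an ordinary string with single braces, and the page number and start offset are derived from one loop variable instead of two zipped ranges.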
In the script above, we scrape using the keyword "keybord RGB", and the results are then saved as a CSV file (data_tokped_2.csv).
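Each product returned by the query contains nested fields such as shop, and a nested dictionary written straight to CSV ends up as one text cell. The sketch below shows one way to flatten such records into plain columns first. The function `flatten` and the two sample products are made up for this illustration; they are not real Tokopedia data.

```python
import csv
import io


def flatten(record, parent_key="", sep="_"):
    """Turn nested dictionaries into a single level, e.g. shop.city -> shop_city."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical sample records shaped like the "products" entries.
products = [
    {"id": 1, "name": "Keyboard A", "price": "Rp100.000",
     "shop": {"name": "Toko A", "city": "Jakarta"}},
    {"id": 2, "name": "Keyboard B", "price": "Rp250.000",
     "shop": {"name": "Toko B", "city": "Bandung"}},
]

rows = [flatten(product) for product in products]

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The flattened `rows` list can also be passed to pd.DataFrame(rows) and saved with to_csv, as the script above does.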