I'm trying to scrape an Amazon product page (https://www.amazon.com/dp/B0B6TR2GTJ) with the following code:
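
    import requests
    from pyquery import PyQuery as pq  # used below to parse the HTML

    url = "https://www.amazon.com/dp/B0B6TR2GTJ"

    # Browser-like headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'Accept-Language': 'en-US, en;q=0.5'
    }

    r = requests.get(url, headers=headers)
    print(r.status_code)
    print("-------------------")

    # Parse the response and print the page title
    doc = pq(r.text)
    print(doc("title"))
    print("-------------------")
    print(r.text)

The result is below (the request is being detected as a bot). I have tried many different header variations and always get the same result.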

    503
    -------------------
    <title>Sorry! Something went wrong!</title>
    -------------------
    <!--
    To discuss automated access to Amazon data please contact [email protected].
    For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
    -->
    <!doctype html>
    ......

I'm still a beginner at web scraping. Could someone more experienced help me out? Many thanks.
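
For reference, here is a minimal sketch of how a script could at least recognize this block page instead of treating it as a real product page. The helper name `is_block_page` and the marker list are my own assumptions, taken only from the output above. Amazon may change this text at any time, so treat it as a heuristic rather than a reliable check.

```python
# Marker strings copied from the blocked response shown above.
# Assumption: these phrases only appear on the anti-bot page.
BLOCK_MARKERS = (
    "Sorry! Something went wrong!",
    "To discuss automated access to Amazon data",
)


def is_block_page(body: str) -> bool:
    """Return True if the HTML body contains one of the known block-page markers."""
    return any(marker in body for marker in BLOCK_MARKERS)
```

With the code from the question, this could be used as `if r.status_code != 200 or is_block_page(r.text): ...` to stop early and log the failure rather than parsing the error page. It does not get around the block. It only makes the failure explicit.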
