- Preparing the library and testing
- Installing the requests library
- Starting IDLE
- Testing on the Baidu homepage and printing the result
- The 7 main methods of the Requests library (overview)
- requests.get(url)
- The 2 important objects of the Requests library
- Attributes of the Response object
- A practical example
- A general code framework for crawling web pages (see the sketch right after this outline)
- Exceptions of the Requests library
- A method is provided for dealing with exceptions
- The 7 main methods of the Requests library
- Understanding the HTTP protocol
- HTTP operations on resources
- Understanding the difference between PATCH and PUT
- The HTTP protocol and the Requests library
- requests.post()
- requests.put()
- The seven methods (in detail)
- requests.request(method, url, **kwargs)
- method: the request method, one of the seven
- **kwargs: parameters controlling the access, all optional
- params
- data
- json
- headers
- cookies
- files
- timeout: the timeout duration
- proxies
- allow_redirects, stream, verify, cert
- Summary
- requests.get(url, params=None, **kwargs)
- requests.head(url, **kwargs)
- requests.post(url, data=None, json=None, **kwargs)
- requests.put(url, data=None, **kwargs)
- requests.patch(url, data=None, **kwargs)
- requests.delete(url, **kwargs)
- Summary
- Examples
- Crawling a JD.com product page
- Crawling an Amazon product page
- Successful access: 200
- Access failure: 500
- Submitting search keywords to Baidu/360
- Crawling and saving web images
- Code reliability and stability matter most
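The general code framework mentioned in the outline, together with the exception-handling method that requests provides (raise_for_status), can be sketched as follows; the function name and the test URL are only illustrative:

```python
import requests

def get_html_text(url):
    """Fetch a page and return its text, or an error message on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()              # raises requests.HTTPError if the status code is not 200
        r.encoding = r.apparent_encoding  # use the encoding inferred from the page content
        return r.text
    except requests.RequestException:
        return "crawl failed"

if __name__ == "__main__":
    print(get_html_text("http://www.baidu.com"))
```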
Install the library with pip install requests, start IDLE, and test it by fetching the Baidu homepage and printing the result.
You can see that the Baidu homepage has been fetched.
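A minimal sketch of this first test (the URL is the Baidu homepage used throughout these notes):

```python
import requests

r = requests.get("http://www.baidu.com")   # fetch the Baidu homepage
print(r.status_code)                       # 200 means the request succeeded
r.encoding = r.apparent_encoding           # switch to the encoding inferred from the content
print(r.text)                              # print the HTML that was fetched
```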
Opening the source code of the requests library shows that the get method is just a wrapper around the request method.
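In other words, the following two calls behave the same (a quick sketch):

```python
import requests

r1 = requests.get("http://www.baidu.com")
# requests.get(url, **kwargs) simply forwards to requests.request('get', url, ...)
r2 = requests.request("get", "http://www.baidu.com")
print(r1.status_code == r2.status_code)   # both issue an ordinary GET request
```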
r.apparent_encoding: the encoding inferred from the content of the page.
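The difference between r.encoding and r.apparent_encoding can be checked directly in IDLE (a sketch; the exact values depend on what the server returns):

```python
import requests

r = requests.get("http://www.baidu.com")
print(r.encoding)                  # guessed from the HTTP headers, often 'ISO-8859-1'
print(r.apparent_encoding)         # inferred from the page content itself, usually 'utf-8' here
r.encoding = r.apparent_encoding   # after this, r.text displays Chinese characters correctly
```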
The user requests a URL, and the server sends back a response.
Resources are managed through these six operations (GET, HEAD, POST, PUT, PATCH, DELETE), and each operation is independent of the others.
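As a small illustration of one of these operations, a HEAD request asks only for the headers of a resource; the sketch below uses httpbin.org as a stand-in test server (an assumption, not part of the original notes):

```python
import requests

r = requests.head("http://httpbin.org/get")
print(r.headers)   # the response headers are returned
print(r.text)      # empty string: a HEAD response carries no body
```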
With the get method we crawl content from the server, and with the other methods we can also send content to the server.
requests.head(url, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.delete(url, **kwargs)
These last six methods frequently use certain access-control parameters, so those parameters are made explicit in the function signatures, while the less commonly used ones are left to the optional **kwargs fields.
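A sketch of how the explicit data and json parameters of post differ; httpbin.org is used here as a stand-in test server (an assumption, not from the original notes):

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}

# data=: the dict is sent as form fields and shows up under 'form' in httpbin's echo
r1 = requests.post("http://httpbin.org/post", data=payload)
print(r1.json()['form'])

# json=: the dict is serialized to JSON and shows up under 'json' instead
r2 = requests.post("http://httpbin.org/post", json=payload)
print(r2.json()['json'])
```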
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import requests
>>> kv = {'wd':'Python'}
>>> r = request.get("http://www.baidu.com/s",params = kv)
Traceback (most recent call last):
  File "", line 1, in
    r = request.get("http://www.baidu.com/s",params = kv)
NameError: name 'request' is not defined
>>> r = requests.get("http://www.baidu.com/s",params = kv)
>>> r.status_code
200
>>>
The request succeeded.
Continuing in the same IDLE session:

>>> r.requests.url
Traceback (most recent call last):
  File "", line 1, in
    r.requests.url
AttributeError: 'Response' object has no attribute 'requests'
>>> r.request.url
'https://wappass.baidu.com/static/captcha/tuxing.html?&logid=8036147062640333629&ak=c27bbc89afca0463650ac9bde68ebe06&backurl=https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3DPython&signature=2fa70a21294522eebadb45a7c1695212&timestamp=1632972573'
>>> len(r.text)
1545
>>> r.text
(garbled output omitted: instead of search results, Baidu returned its security-verification / captcha page, whose decoded text reads "百度安全验证 / 返回首页 / 问题反馈")
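Rewritten as a standalone script, the keyword submission looks roughly like this (a sketch: 'wd' is Baidu's query parameter, for 360 search the key would be 'q' instead, and whether you get real results or a verification page depends on the site's anti-crawling checks):

```python
import requests

keyword = "Python"
try:
    kv = {'wd': keyword}                    # Baidu passes the search keyword in the 'wd' parameter
    r = requests.get("http://www.baidu.com/s", params=kv, timeout=30)
    print(r.request.url)                    # the URL that was actually requested
    r.raise_for_status()                    # raise an exception on a non-2xx status code
    print(len(r.text))                      # size of the page that came back
except requests.RequestException:
    print("crawl failed")
```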
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import requests
>>> path = "E:/tp/abc.jpg"
>>> url = "http://www.sinaimg.cn/dy/slidenews/1_img/2016_43/63957_743512_229766.jpg"
>>> r = requests.get(url)
>>> r.status_code
200
>>> with open(path,'wb') as f:
        f.write(r.content)

156992
>>> f.close()
>>>
No matter how the code is run, it must not crash: the reliability and stability of the code matter most.
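A more defensive version of the image download, wrapped in the same exception-handling framework (a sketch reusing the path and URL from the transcript above):

```python
import os
import requests

url = "http://www.sinaimg.cn/dy/slidenews/1_img/2016_43/63957_743512_229766.jpg"
root = "E:/tp/"
path = root + url.split('/')[-1]        # name the local file after the last URL segment

try:
    if not os.path.exists(root):        # create the target directory if needed
        os.mkdir(root)
    if not os.path.exists(path):        # skip the download if the file already exists
        r = requests.get(url, timeout=30)
        r.raise_for_status()            # turn HTTP error codes into exceptions
        with open(path, 'wb') as f:
            f.write(r.content)          # r.content holds the binary image data
        print("file saved")
    else:
        print("file already exists")
except Exception:
    print("crawl failed")
```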