
      Submitting forms, JSON strings, and files with requests POST requests

      2019-07-18 159 likes python中国网

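        HTTP does not mandate which encoding the body of a POST request must use. The server reads the Content-Type field of the request headers to determine the encoding and then parses the body accordingly. The encodings covered here are:

       - application/x-www-form-urlencoded # submits the data as a form; the most common and familiar
      
       - application/json # submits the data as a JSON string
      
       - multipart/form-data # uploads files
      

        Below, requests is used to send a POST request with each of these three encodings.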
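
As a quick way to see the three encodings side by side, the sketch below builds each request without sending it and reads back the Content-Type that requests chose. The helper name `prepared_content_type` and the sample values are assumptions made for illustration; only `requests.Request(...).prepare()` comes from the library.

```python
import requests


def prepared_content_type(**kwargs):
    """Build (but do not send) a POST request and return its Content-Type."""
    req = requests.Request("POST", "http://httpbin.org/post", **kwargs)
    return req.prepare().headers.get("Content-Type")


# data= with a dict -> form encoding
form_type = prepared_content_type(data={"wd": "example"})
# json= with a dict -> JSON encoding
json_type = prepared_content_type(json={"wd": "example"})
# files= -> multipart encoding (the tuple is a hypothetical in-memory file)
file_type = prepared_content_type(files={"file": ("demo.txt", b"hello")})

print(form_type)
print(json_type)
print(file_type)
```

Nothing goes over the network here, so the snippet is safe to run offline. The multipart value also carries a boundary string that changes on every run.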

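        1. Submitting a form

        Form submission with requests typically shows up in site logins, where it carries the username and password. Taking http://httpbin.org/post as the example: to send a form-encoded POST request with requests, build a dict of the request parameters and pass it to the data parameter of requests.post(). (httpbin.org echoes back the content of the request it receives; the "Content-Type": "application/x-www-form-urlencoded" in the output confirms that the data was submitted as a form.) The code is as follows: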

      # -*- coding: utf-8 -*-
      import requests
      
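      def get_html(url, key_value, retry=2):
          """POST key_value as form data; retry up to `retry` times on failure."""
          try:
              r = requests.post(url=url, headers=headers, data=key_value, timeout=5)
          except requests.RequestException as e:
              print(e)
              if retry > 0:
                  # Pass key_value along and return the result of the retry
                  return get_html(url, key_value, retry - 1)
              return None
          else:
              return r.text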
      
      
      if __name__ == "__main__":
          # Custom request headers
          headers = {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
          }
          url = 'http://httpbin.org/post'
          kw = {'wd': 'www.bdd33.com'}
          html = get_html(url, kw)
          print(html)
      
      D:python3installpython.exe D:/python/py3script/test.py
      {
        "args": {}, 
        "data": "", 
        "files": {}, 
        "form": {
          "wd": "www.bdd33.com"
        }, 
        "headers": {
          "Accept": "*/*", 
          "Accept-Encoding": "gzip, deflate", 
          "Content-Length": "19", 
          "Content-Type": "application/x-www-form-urlencoded", 
          "Host": "httpbin.org", 
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
        }, 
        "json": null, 
        "origin": "223.72.81.198, 223.72.81.198", 
        "url": "https://httpbin.org/post"
      }
      
      
      Process finished with exit code 0
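
To make the form encoding itself visible, the sketch below prepares the request without sending it and prints the body that requests would put on the wire. The `note` field is a hypothetical extra parameter, added only to show how a space is escaped.

```python
import requests

# "note" is a made-up field used to show escaping of special characters
payload = {"wd": "www.bdd33.com", "note": "a b"}

prepared = requests.Request(
    "POST", "http://httpbin.org/post", data=payload
).prepare()

body = prepared.body                                # key=value pairs joined with &
content_length = prepared.headers["Content-Length"]

print(body)
print(content_length)
```

The dict is flattened into `key=value` pairs joined by `&`, and the space becomes `+`. This is the string the server decodes back into the "form" object shown in the httpbin output.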
      
      
      

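        2. Submitting a JSON string

        Submitting a JSON string (shown as the request payload when you inspect the traffic in a browser) is mainly used in AJAX requests that load data dynamically.

        There are two ways to do it. You can encode the dict yourself with json.dumps() and pass the resulting string via the data parameter, or you can pass the dict directly via the json parameter, in which case requests encodes it automatically and you do not need to declare the Content-Type in the request headers yourself. The json parameter was added in version 2.4.2. The code is as follows: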

      # -*- coding: utf-8 -*-
      import requests
      import json
      
      
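      def get_html(url, key_value, retry=2):
          """POST an already-encoded body (e.g. a json.dumps() string) via data=."""
          try:
              r = requests.post(url=url, headers=headers, data=key_value, timeout=5)
          except requests.RequestException as e:
              print(e)
              if retry > 0:
                  # Pass key_value along and return the result of the retry
                  return get_html(url, key_value, retry - 1)
              return None
          else:
              return r.text


      def get_html_json(url, key_value, retry=2):
          """POST a dict via json=; requests serializes it to JSON automatically."""
          try:
              r = requests.post(url=url, headers=headers, json=key_value, timeout=5)
          except requests.RequestException as e:
              print(e)
              if retry > 0:
                  # Pass key_value along and return the result of the retry
                  return get_html_json(url, key_value, retry - 1)
              return None
          else:
              return r.text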
      
      
      if __name__ == "__main__":
          # Custom request headers
          headers = {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
              'Content-Type':'application/json; charset=UTF-8',
          }
          url = 'https://api.xxxx.com/xxx/xxx'
          kw = {'domain': 'www.bdd33.com'}
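          # Option 1: encode the dict yourself with json.dumps() and send it via data=
          html = get_html(url, json.dumps(kw))
          # Option 2: pass the dict via json= and let requests encode it
          html_json = get_html_json(url, kw)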
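
The difference between the two approaches is easiest to see in the headers. The sketch below prepares three requests without sending them; the URL `https://api.example.com/items` is a hypothetical placeholder and is never contacted.

```python
import json

import requests

url = "https://api.example.com/items"  # hypothetical endpoint, never contacted
kw = {"domain": "www.bdd33.com"}

# Option 1: encode manually; the Content-Type has to be declared by hand
manual = requests.Request(
    "POST", url,
    data=json.dumps(kw),
    headers={"Content-Type": "application/json"},
).prepare()

# Option 2: pass the dict via json=; requests encodes it and sets the header
auto = requests.Request("POST", url, json=kw).prepare()

# A pre-encoded string sent via data= with no explicit header gets no Content-Type
bare = requests.Request("POST", url, data=json.dumps(kw)).prepare()

print(manual.headers.get("Content-Type"))
print(auto.headers.get("Content-Type"))
print(bare.headers.get("Content-Type"))
```

Both options produce the same JSON document. The `bare` case shows why the manual route needs the explicit header: without it, the server has no declared encoding to go by.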
      

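        3. Uploading a file

        File uploads are rarely needed in crawlers. The Content-Type is multipart/form-data; to send a multipart POST request, just pass a file to the files parameter of requests.post(). Again taking http://httpbin.org/post as the example, the code is as follows: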

      # -*- coding: utf-8 -*-
      import requests
      
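      def get_html(url, files, retry=2):
          """POST files as multipart/form-data; retry up to `retry` times on failure."""
          try:
              # files= (not data=) makes requests build a multipart/form-data body
              r = requests.post(url=url, headers=headers, files=files, timeout=5)
          except requests.RequestException as e:
              print(e)
              if retry > 0:
                  # Rewind the file objects before sending them again
                  for f in files.values():
                      f.seek(0)
                  return get_html(url, files, retry - 1)
              return None
          else:
              return r.text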
      
      
      if __name__ == "__main__":
          # Custom request headers
          headers = {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
          }
          url = 'http://httpbin.org/post'
          # Open in binary mode; the with block closes the file afterwards
          with open('ajax.png', 'rb') as f:
              files = {'file': f}
              html = get_html(url, files)
          print(html)
      
      D:python3installpython.exe D:/python/py3script/test.py
      {
        "args": {}, 
        "data": "", 
        "files": {
          "file": "data:application/octet-stream;base64,...too long, omitted..."
        }, 
        "form": {}, 
        "headers": {
          "Accept": "*/*", 
          "Accept-Encoding": "gzip, deflate", 
          "Content-Length": "68870", 
          "Content-Type": "multipart/form-data; boundary=66f5b203f18f79960ac438c59af481b0", 
          "Host": "httpbin.org", 
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
        }, 
        "json": null, 
        "origin": "223.72.72.67, 223.72.72.67", 
        "url": "https://httpbin.org/post"
      }
      
      
      Process finished with exit code 0
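
The files parameter also accepts a tuple, which lets you set the file name and content type explicitly and upload in-memory data. The sketch below is illustrative only: the file name `report.csv`, its content, and the `description` field are assumptions, and the request is prepared but not sent.

```python
import requests

# Hypothetical in-memory upload: (filename, content, content type)
files = {"file": ("report.csv", b"id,name\n1,demo\n", "text/csv")}
# Ordinary form fields can travel in the same multipart body via data=
fields = {"description": "monthly report"}

prepared = requests.Request(
    "POST", "http://httpbin.org/post", files=files, data=fields
).prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body  # bytes: one part per field, separated by the boundary

print(content_type)
print(body.decode("utf-8"))
```

Note that the Content-Type header is left for requests to set. The boundary is generated when the body is built, so a hand-written multipart header would not match the body.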
      
      

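        Warning

        Opening files in binary mode is recommended. Requests may try to provide the Content-Length header for you, and when it does, the value is set to the number of bytes in the file. Opening the file in text mode can therefore lead to errors.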
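
The gap between characters and bytes behind this warning can be reproduced without requests at all. The sketch below writes a small non-ASCII sample (an assumed example string) to a temporary file and compares what binary mode and text mode report.

```python
import os
import tempfile

text = "héllo wörld\n"  # assumed sample containing two non-ASCII characters

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # newline="" keeps the line ending unchanged on every platform
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    size_on_disk = os.path.getsize(path)
    with open(path, "rb") as f:
        byte_count = len(f.read())   # what actually goes over the wire
    with open(path, "r", encoding="utf-8", newline="") as f:
        char_count = len(f.read())   # decoded characters, not bytes

print(size_on_disk, byte_count, char_count)
```

Binary mode agrees with the size on disk, while text mode counts decoded characters, so a length taken from text-mode data can disagree with the bytes that are actually sent.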
