Sample code for downloading and merging videos with Python

Updated: April 16, 2022 08:57:54 Author: The Demon King does not cry
This article explains in detail how to download and merge videos with Python and walks through each implementation step. Interested readers can follow along.

Modules used

requests >>> pip install requests (third-party module for sending HTTP requests)

re (built-in regular expression module, used to match and extract data)


Development environment

Python 3.8 interpreter

PyCharm 2021.2 (recommended)

To install a module, press Win + R, type cmd, press Enter, and run pip install followed by the module name. If red error text appears, the network connection has probably timed out; switch to a domestic mirror of the package index and try again.

Case implementation

1. Define the requirements

Decide what to collect, and first work out where the video comes from.

Use the browser developer tools to capture and analyze the network traffic and find where the video data is served from. Here the video content is delivered in m3u8 format.

When a site serves video in m3u8 format, there is a dedicated playlist file that lists all of the ts video segments.
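To make the playlist idea concrete, here is a minimal sketch of pulling segment addresses out of m3u8 text. The sample playlist, the example.com address, and the helper name extract_segment_urls are made up for illustration. Real playlists may carry more tags, and a segment line may be a relative path or a full URL.

```python
from urllib.parse import urljoin

# Hypothetical playlist text; a real one is downloaded from the m3u8 link.
SAMPLE_M3U8 = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg-000.ts
#EXTINF:10.0,
seg-001.ts
#EXTINF:4.5,
seg-002.ts
#EXT-X-ENDLIST
"""


def extract_segment_urls(playlist_text, playlist_url):
    """Return absolute URLs for every segment line in an m3u8 playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments, not segments.
        if not line or line.startswith('#'):
            continue
        # urljoin resolves relative paths and leaves absolute URLs unchanged.
        urls.append(urljoin(playlist_url, line))
    return urls


segments = extract_segment_urls(SAMPLE_M3U8, 'https://example.com/video/index.m3u8')
print(segments)
```

Skipping every line that starts with '#' keeps only the segment entries, in playback order, which is the order they must be saved in.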

2. Code implementation steps

  • Send the request
  • Get the data
  • Parse the data
  • Save the data

1. Send a request to the URL of the video playback page

2. Get the data: read the response data returned by the server

3. Parse the data: extract the content we need, namely the video title and the m3u8 link

4. Send a request to the m3u8 link

5. Get the data: read the response data returned by the server

6. Parse the data: extract the URLs of all ts files (the video segments)

7. Save the data: save every segment, then merge them into one complete video
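Step 3 is the least obvious one, because the m3u8 link sits inside JSON that is itself stored as a string inside another JSON object. The sketch below runs that extraction on a made-up page, assuming the page embeds its data as a window.videoInfo assignment. The sample HTML, the "Example Site" title suffix, and the helper name parse_page are hypothetical, and a real page may be laid out differently.

```python
import json
import re

# Build a hypothetical page: the player config is a JSON string nested
# inside the outer page-info JSON, so it must be decoded twice.
inner = json.dumps({
    'adaptationSet': [
        {'representation': [
            {'backupUrl': ['https://example.com/video/index.m3u8']}
        ]}
    ]
})
page_info = json.dumps({'currentVideoInfo': {'ksPlayJson': inner}})
SAMPLE_HTML = (
    '<html><head><title >Demo clip - Example Site</title></head>'
    '<script>window.pageInfo = window.videoInfo = ' + page_info + ';</script>'
    '</html>'
)


def parse_page(html):
    """Return (title, m3u8_url) extracted from the page source."""
    title_match = re.search(r'<title\s*>(.*?) - Example Site</title>', html)
    info_match = re.search(r'window\.videoInfo = (.*?);', html)
    if title_match is None or info_match is None:
        raise ValueError('page layout not recognised')
    outer = json.loads(info_match.group(1))
    # Second decode: ksPlayJson holds JSON text, not a dictionary.
    player = json.loads(outer['currentVideoInfo']['ksPlayJson'])
    m3u8_url = player['adaptationSet'][0]['representation'][0]['backupUrl'][0]
    return title_match.group(1), m3u8_url


title, m3u8_url = parse_page(SAMPLE_HTML)
print(title, m3u8_url)
```

The non-greedy pattern stops at the first semicolon, so it only works while the embedded JSON contains none. Checking both matches before indexing turns a changed page layout into a clear error instead of an IndexError.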

Implementation code

import requests  # data request module: pip install requests
import re  # regular expression module (built-in)
import json
import pprint  # pretty-print module

for page in range(1, 17):
    print(f'--------------------Collecting page {page}--------------------')
    list_url = ''
    data = {
        'quickViewId': 'ac-space-video-list',
        'reqID': page + 1,
        'ajaxpipe': '1',
        'type': 'video',
        'order': 'newest',
        'page': page,
        'pageSize': '20',
        't': '1649944573765',
    }
    headers = {
        # 'cookies': 'Your cookie',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36'
    }
    # A GET request passes query parameters with params; a POST request passes a body with data
    response = requests.get(url=list_url, params=data, headers=headers)
    # print(response.text)
    id_list = re.findall('a href=.*?ac(.*?)"', response.text)
    for index in id_list:
        # Strip the escaping backslash left at the end of each captured id
        video_id = index.replace('\\', '')
        # 1. Send a request to the video playback page URL,
        #    using Python code to imitate a browser
        url = f'{video_id}'
        # The request headers disguise the Python code so the server does not
        # identify it as a crawler. If the User-Agent alone does not return
        # data, add a cookie: it carries the user information the site uses
        # to check whether you are logged in.
        headers = {
            # 'cookies': 'your cookie',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36'
        }
        # Send the request with the headers and keep the result in response
        response = requests.get(url=url, headers=headers)
        # 2. Get the data
        # print(response.text)
        # 3. Parse the data: re.findall returns a list, so take element [0].
        #    How the data is extracted is not important; any method that
        #    gets the data is fine.
        # Capture the title text that precedes the ' - AcFun' site suffix
        title = re.findall('<title >(.*?) - AcFun', response.text)[0]
        video_info = re.findall('window.pageInfo = window.videoInfo = (.*?);', response.text)[0]
        # print(video_info)
        # video_info is a string; json.loads converts it to a dictionary
        json_data = json.loads(video_info)
        # pprint.pprint(json_data)
        # Dictionary lookup: use the key (left of the colon) to get the value (right of the colon)
        m3u8_url = json.loads(json_data['currentVideoInfo']['ksPlayJson'])['adaptationSet'][0]['representation'][0]['backupUrl'][0]
        # print(title)
        # print(m3u8_url)
        # 4-5. Request the m3u8 link and read the text of the response body
        m3u8_data = requests.get(url=m3u8_url, headers=headers).text
        # 6. Remove the '#E...' tag lines, then split() the rest into a list of ts names
        m3u8_data = re.sub('#E.*', '', m3u8_data).split()
        # print(m3u8_data)
        for ts in m3u8_data:
            ts_url = '' + ts
            ts_content = requests.get(url=ts_url, headers=headers).content
            # 7. mode='ab': a appends, b writes binary data. The video folder must already exist.
            with open('video\\' + title + '.mp4', mode='ab') as f:
                f.write(ts_content)
        print('Video save complete: ', title)
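The save step relies on append mode, which writes each segment onto the end of the same file. Here is a minimal offline sketch of that idea, with two additions that are assumptions rather than part of the code above: the output folder is created if it is missing, and characters that Windows rejects in file names are replaced. The helper names safe_filename and append_segments are hypothetical, and the byte strings stand in for downloaded ts data.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(title):
    """Replace characters that Windows does not allow in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return cleaned or 'untitled'


def append_segments(chunks, out_dir, title):
    """Append each bytes chunk, in order, to <out_dir>/<title>.mp4."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (safe_filename(title) + '.mp4')
    # 'ab' appends binary data, so each chunk lands after the previous one.
    with open(target, mode='ab') as f:
        for chunk in chunks:
            f.write(chunk)
    return target


# Demo with stand-in bytes instead of real downloaded segments.
with tempfile.TemporaryDirectory() as tmp:
    path = append_segments([b'seg0', b'seg1'], tmp, 'What is m3u8?')
    demo_name = path.name
    demo_bytes = path.read_bytes()

print(demo_name, demo_bytes)
```

Because the file is opened in append mode, running the download twice for the same title doubles its contents, so delete an unfinished file before retrying. The merged file is still transport-stream data saved under an .mp4 name.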


