[Use Case] How to Scrape UC News Content with an IP Proxy
With a good IP proxy you can handle many networking tasks; large-scale web scraping, in particular, typically relies on proxy IPs. In this post, IP海 walks through a tutorial for scraping articles from a news site.
IP海 uses the UC news site as the example:
This site has no elaborate anti-scraping measures, so we can fetch and parse the pages directly.
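Although the title mentions IP proxies, the script below fetches pages directly. To route its traffic through a proxy, urllib's ProxyHandler can be installed once before any requests are made. A minimal sketch, where the proxy address is a placeholder to be replaced with an IP:port issued by your proxy provider:

```python
from urllib import request

# Placeholder address -- substitute the IP:port issued by your proxy provider.
PROXY = '127.0.0.1:8888'

# Route both HTTP and HTTPS traffic through the proxy.
proxy_handler = request.ProxyHandler({'http': PROXY, 'https': PROXY})
opener = request.build_opener(proxy_handler)

# Install it globally so every later request.urlopen() call uses the proxy.
request.install_opener(opener)
```

Once the opener is installed, the rest of the script needs no changes: every subsequent request.urlopen() call picks up the proxy automatically.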
from bs4 import BeautifulSoup
from urllib import request

def download(title, url):
    req = request.Request(url)
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    soup = BeautifulSoup(html, 'lxml')
    # The article body lives in a div with class 'sm-article-content'.
    tag = soup.find('div', class_='sm-article-content')
    if tag is None:
        return 0
    # Strip characters that Windows forbids in file names.
    title = title.replace(':', '')
    title = title.replace('"', '')
    title = title.replace('|', '')
    title = title.replace('/', '')
    title = title.replace('\\', '')
    title = title.replace('*', '')
    title = title.replace('<', '')
    title = title.replace('>', '')
    title = title.replace('?', '')
    with open('D:\\code\\python\\spider_news\\UC_news\\society\\' + title + '.txt',
              'w', encoding='utf-8') as file_object:
        file_object.write('\n')
        file_object.write(title)
        file_object.write('\n')
        file_object.write('News URL: ')
        file_object.write(url)
        file_object.write('\n')
        file_object.write(tag.get_text())
    # print('Scraping...')
if __name__ == '__main__':
    # NOTE: the loop runs seven times but the URL never changes with i;
    # to crawl beyond the first page, the URL would need pagination.
    for i in range(0, 7):
        url = 'https://news.uc.cn/c_shehui/'
        # headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.91 Safari/537.36",
        #            "cookie": "sn=3957284397500558579; _uc_pramas=%7B%22fr%22%3A%22pc%22%7D"}
        # req = request.Request(url, headers=headers)
        response = request.urlopen(url)
        html = response.read().decode('utf-8')
        soup = BeautifulSoup(html, 'lxml')
        # print(soup.prettify())
        # Each headline sits in a div with class 'txt-area-title'.
        tags = soup.find_all('div', class_='txt-area-title')
        for x in tags:
            news_url = 'https://news.uc.cn' + x.a.get('href')
            print(x.a.string, news_url)
            download(x.a.string, news_url)
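As a side note, the nine chained replace() calls in download() can be collapsed into one regular-expression substitution. A small sketch, where sanitize is a hypothetical helper name:

```python
import re

def sanitize(title):
    # Remove every character Windows forbids in file names in one pass.
    return re.sub(r'[:"|/\\*<>?]', '', title)
```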
With that, the news scraper is complete. Run it and check the output directory to confirm the articles were saved successfully.
Copyright notice: this article is an original work by IP海 (iphai.cn); reproduction without permission is prohibited.