urllib,requests,BeautifulSoup,Selenium違い

投稿者: WP_MIWA_KY | 2022-04-10

0件のコメント

参考HP：https://vaaaaaanquish.hatenablog.com/entry/2017/06/25/202924
参考HP：https://myafu-python.com/syntax/library/webscraping-library/

目次

比較まとめ

urllib.request()とrequestsライブラリは別物なので注意

urllib

python標準ライブラリ
使用頻度が高いのは urllib.request()
APIは全くだめ

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://***')
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.h1)

# <h1>An Interesting Title</h1>

requests

requests
SSL、ベーシック認証、ダイジェスト認証など
ログイン後の画面にアクセスするときも利用可（Seleniumも可）
たぶんBeautifulSoupにhtml情報を渡す際は「’.content’」で渡す必要がある？
urllib.requestはそのまま渡せる

import requests

params = {'firstname': 'Yamada', 'lastname': 'Taro'}
r = requests.post('http://*****')
print(r.text)

beautiful soup

selenium

JS等の動的なWebであってもうまく取得できる
ブラウザ経由で取得できる

Iconic One Theme | Powered by Wordpress