Hello, this is YOSHITAKA (@YOSHITA19704216).
This time I typed in the Python code along with the video and got it working, so I'm leaving it here as a record.
I'm not an expert yet, so I can't explain the details, but feel free to use it as copy-paste material.
- You will learn a practical way to download images with Python.
Contents
All the required information and example code
The site I used as a reference
I used this video as a reference:
https://www.youtube.com/watch?v=GXBCPEBAlVk&t=1054s
```python
from bs4 import BeautifulSoup
import lxml
import requests

headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=%E3%81%8C%E3%81%A3%E3%81%8D%E3%83%BC&ei=UTF-8&save=0"
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
print(soup.find("a")["href"])
```
———
```python
from bs4 import BeautifulSoup
import lxml
import requests

headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=%E3%81%8C%E3%81%A3%E3%81%8D%E3%83%BC&ei=UTF-8&save=0"
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
print(soup.find_all("img")[0])
```
———
```python
from bs4 import BeautifulSoup
import lxml
import requests

headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=%E3%81%8C%E3%81%A3%E3%81%8D%E3%83%BC&ei=UTF-8&save=0"
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
print(soup.find_all("img")[0]["src"])
```
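The behavior of find versus find_all is easier to see on a small static snippet than on the live page (using the stdlib html.parser here so the example runs even without lxml installed; the tag names and attributes are made up for illustration):

```python
from bs4 import BeautifulSoup

html = '<div><img src="a.jpg" alt="x"><img src="b.jpg" alt="y"></div>'
soup = BeautifulSoup(html, "html.parser")

imgs = soup.find_all("img")      # every <img> tag, as a list
print(len(imgs))                 # 2
print(imgs[0]["src"])            # a.jpg
print(soup.find("img")["alt"])   # x: find returns only the first match
```

find returns a single tag (or None), while find_all always returns a list, which is why the article's code indexes with [0] before reading an attribute.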
———
Download the images locally
Count the images
```python
from bs4 import BeautifulSoup
import lxml
import requests

headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=%E3%81%8C%E3%81%A3%E3%81%8D%E3%83%BC&ei=UTF-8&save=0"
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
imgs = soup.find_all("img")
print(len(imgs))
```
I encoded and decoded the URL.
(On a Mac, the address bar converts it automatically.)
https://search.yahoo.co.jp/image/search?p=%E3%82%AC%E3%83%83%E3%82%AD%E3%83%BC&oq=&ei=UTF-8&save=0
https://search.yahoo.co.jp/image/search?p=ガッキー&oq=&ei=UTF-8&save=0
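That percent-encoding can be reproduced with the standard library, which is a handy way to check that the two URLs above really are the same:

```python
from urllib.parse import quote, unquote

# UTF-8 percent-encode the search word
print(quote("ガッキー"))  # %E3%82%AC%E3%83%83%E3%82%AD%E3%83%BC

# and decode it back
print(unquote("%E3%82%AC%E3%83%83%E3%82%AD%E3%83%BC"))  # ガッキー
```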
———
```python
from bs4 import BeautifulSoup
import lxml
import requests

headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=ガッキー&oq=&ei=UTF-8&save=0"
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
imgs = soup.find_all(alt="「ガッキー」の画像検索結果")
print(len(imgs))
```
———
Get the image URLs
```python
from bs4 import BeautifulSoup
import lxml
import requests

headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=ガッキー&oq=&ei=UTF-8&save=0"
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
imgs = soup.find_all(alt="「ガッキー」の画像検索結果")
print(len(imgs))
for i in range(len(imgs)):
    print(imgs[i]["src"])
```
———
Download the images
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request

headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=ガッキー&oq=&ei=UTF-8&b=1"
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
imgs = soup.find_all(alt="「ガッキー」の画像検索結果")
print(len(imgs))
for i in range(len(imgs)):
    filepath = "{}.jpg".format(i)
    urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
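urlretrieve simply copies whatever the URL points at into the local file you name. A self-contained way to see that without touching the network is a file:// URL (Unix-style absolute paths assumed; the file names here are made up for the demo):

```python
import os
import tempfile
import urllib.request

tmp = tempfile.mkdtemp()

# create a small dummy "image" to act as the remote resource
src_path = os.path.join(tmp, "src.jpg")
with open(src_path, "wb") as f:
    f.write(b"\xff\xd8\xff")  # the first bytes of a JPEG, as dummy data

# urlretrieve(url, filepath) saves the resource at url to filepath
dest = os.path.join(tmp, "0.jpg")
urllib.request.urlretrieve("file://" + src_path, dest)
print(os.path.getsize(dest))  # 3
```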
———
Download page by page
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request

num = 0
headers = {"user-Agent": "hoge"}
URL = "https://search.yahoo.co.jp/image/search?p=ガッキー&oq=&ei=UTF-8&b={}".format(1 + 20*num)
resp = requests.get(URL, timeout=1, headers=headers)
soup = BeautifulSoup(resp.text, "lxml")
imgs = soup.find_all(alt="「ガッキー」の画像検索結果")
print(len(imgs))
for i in range(len(imgs)):
    filepath = "{}.jpg".format(i)
    urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request

for num in range(3):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p=ガッキー&oq=&ei=UTF-8&b={}".format(1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「ガッキー」の画像検索結果")
    print(len(imgs))
    for i in range(len(imgs)):
        filepath = "{}.jpg".format(i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request

for num in range(3):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p=ガッキー&oq=&ei=UTF-8&b={}".format(1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「ガッキー」の画像検索結果")
    print(len(imgs))
    for i in range(len(imgs)):
        filepath = "./date/practice/{0}-{1}.jpg".format(num, i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
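The b parameter appears to be the 1-based offset of the first result on a page, and the code assumes 20 thumbnails per page, so 1 + 20*num steps through pages. A tiny helper (page_url is a hypothetical name, not part of the original script) makes the pattern visible:

```python
def page_url(query_word, num):
    # b = 1, 21, 41, ... : offset of the first result on page num,
    # assuming 20 results per page
    return "https://search.yahoo.co.jp/image/search?p={0}&oq=&ei=UTF-8&b={1}".format(
        query_word, 1 + 20 * num)

for num in range(3):
    print(page_url("ガッキー", num))
```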
———
Pass in your own search word
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request
import sys

query_word = sys.argv[1]
for num in range(3):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p={0}&oq=&ei=UTF-8&b={1}".format(query_word, 1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「{}」の画像検索結果".format(query_word))
    print(len(imgs))
    for i in range(len(imgs)):
        filepath = "./date/practice/{0}-{1}.jpg".format(num, i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
———
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request
import sys

query_word = sys.argv[1]
for num in range(3):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p={0}&oq=&ei=UTF-8&b={1}".format(query_word, 1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「{}」の画像検索結果".format(query_word))
    print(len(imgs))
    for i in range(len(imgs)):
        filepath = "./date/{0}/{1}-{2}.jpg".format(query_word, num, i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
———
Create the folder if it does not exist
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request
import sys

query_word = sys.argv[1]
for num in range(3):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p={0}&oq=&ei=UTF-8&b={1}".format(query_word, 1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「{}」の画像検索結果".format(query_word))
    print(len(imgs))
    for i in range(len(imgs)):
        dir_name = "./date/{0}".format(query_word)
        filepath = "./date/{0}/{1}-{2}.jpg".format(query_word, num, i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
———
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request
import sys, os

query_word = sys.argv[1]
for num in range(3):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p={0}&oq=&ei=UTF-8&b={1}".format(query_word, 1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「{}」の画像検索結果".format(query_word))
    print(len(imgs))
    for i in range(len(imgs)):
        dir_name = "./date/{0}".format(query_word)
        if not os.path.exists(dir_name):
            os.makedirs(dir_name)
        filepath = dir_name + "/{0}-{1}.jpg".format(num, i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
———
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request
import sys, os

query_word = sys.argv[1]
for num in range(2):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p={0}&oq=&ei=UTF-8&b={1}".format(query_word, 1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「{}」の画像検索結果".format(query_word))
    for i in range(len(imgs)):
        dir_name = "./date/{0}".format(query_word)
        if not os.path.exists(dir_name):
            os.makedirs(dir_name)
        filepath = dir_name + "/{0}-{1}.jpg".format(num, i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
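The exists-check-then-makedirs pattern works, but since Python 3.2 os.makedirs can do both steps in one call with exist_ok=True. A quick demonstration in a temporary directory:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
dir_name = os.path.join(tmp, "date", "ガッキー")

# exist_ok=True: create the whole directory tree,
# and silently do nothing if it already exists
os.makedirs(dir_name, exist_ok=True)
os.makedirs(dir_name, exist_ok=True)  # second call is a harmless no-op

print(os.path.isdir(dir_name))  # True
```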
———
The finished script
```python
from bs4 import BeautifulSoup
import lxml
import requests
import urllib.request
import sys, os

query_word = sys.argv[1]
max_num = int(sys.argv[2])
for num in range(max_num):
    headers = {"user-Agent": "hoge"}
    URL = "https://search.yahoo.co.jp/image/search?p={0}&oq=&ei=UTF-8&b={1}".format(query_word, 1 + 20*num)
    resp = requests.get(URL, timeout=1, headers=headers)
    soup = BeautifulSoup(resp.text, "lxml")
    imgs = soup.find_all(alt="「{}」の画像検索結果".format(query_word))
    for i in range(len(imgs)):
        dir_name = "./date/{0}".format(query_word)
        if not os.path.exists(dir_name):
            os.makedirs(dir_name)
        filepath = dir_name + "/{0}-{1}.jpg".format(num, i)
        urllib.request.urlretrieve(imgs[i]["src"], filepath)
```
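Assuming the finished script is saved as download.py (a hypothetical filename), it would be run as python download.py ガッキー 3. The two command-line arguments map to sys.argv like this:

```python
import sys

# simulate the command line: python download.py ガッキー 3
sys.argv = ["download.py", "ガッキー", "3"]

query_word = sys.argv[1]     # the search word
max_num = int(sys.argv[2])   # how many result pages to fetch
print(query_word, max_num)   # ガッキー 3
```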
Summary
This post ended up as a memo to myself.
I'm still learning, so I'm keeping these snippets where copy-pasting them can jog my memory.
I hope this article gives you a reason to start learning Python as well.