[python] How can I read the contents of a URL with Python?

Question 1

The following works when I paste it into a browser:

http://www.somesite.com/details.pl?urn=2344

But when I try to read the URL with Python, nothing happens:

 link = 'http://www.somesite.com/details.pl?urn=2344'
 f = urllib.urlopen(link)
 myfile = f.readline()
 print myfile

Do I need to encode the URL, or is there something I am not seeing?

Question 2

To answer your question:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)

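You need to call read(), not readline(): readline() returns only the first line of the response, while read() returns the whole body.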
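
To see the difference without touching the network, the sketch below uses an in-memory io.BytesIO object as a stand-in for the file-like response that urlopen returns. The sample HTML bytes are made up for illustration.

```python
import io

# Stand-in for a urlopen() response: any file-like object behaves the same way.
fake_response = io.BytesIO(b"<html>\n<body>hello</body>\n</html>\n")

first_line = fake_response.readline()  # stops at the first newline
fake_response.seek(0)                  # rewind so read() starts from the top
everything = fake_response.read()      # consumes the whole body

print(first_line)
print(everything)
```

If the first line of a page happens to be empty, readline() can look as if "nothing happens", which is one plausible reading of the symptom in the question.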

Edit (2018-06-25): Since Python 3, the legacy urllib.urlopen() has been replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).

If you are using Python 3, see the answers by Martin Thoma or innm in this question:
https://stackoverflow.com/a/28040508/158111 (Python 2/3 compat)
https://stackoverflow.com/a/45886824/158111 (Python 3)

Or just get this library here: http://docs.python-requests.org/en/latest/ and seriously use it 🙂

import requests

link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)

Question 3

For python3 users, to save time, use the following code:

from urllib.request import urlopen

link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

f = urlopen(link)
myfile = f.read()
print(myfile)

I know there are other threads about the error NameError: urlopen is not defined, but I thought this might save some time.
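
As a quick illustration of where that error tends to come from on Python 3, the sketch below checks which module actually exposes urlopen. It assumes a Python 3 interpreter and makes no network request.

```python
import urllib
import urllib.request

# On Python 3 the top-level package has no urlopen attribute;
# the function lives in the urllib.request submodule instead.
print(hasattr(urllib, "urlopen"))
print(hasattr(urllib.request, "urlopen"))

# Importing the name explicitly is what makes a bare urlopen(...) call work.
from urllib.request import urlopen
print(callable(urlopen))
```

Calling a bare urlopen(...) without that last import is the usual way to end up with the NameError.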

Question 4

A solution that works with both Python 2.X and Python 3.X makes use of the Python 2 and 3 compatibility library six:

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)

Question 5

None of these answers are very good for Python 3 (tested on the latest version at the time of this post).

This is how you do it...

import urllib.error
import urllib.request

try:
    # The with block closes the connection once the body has been read.
    with urllib.request.urlopen('http://www.python.org/') as f:
        print(f.read().decode('utf-8'))
except urllib.error.URLError as e:
    print(e.reason)

The above is for content that is encoded as 'utf-8'. If you remove .decode('utf-8'), read() returns the raw bytes; Python does not guess the encoding for you.
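
When the encoding is not known in advance, one option is to use the charset the server declares in its Content-Type header and fall back to UTF-8 otherwise. This is a minimal sketch, not the only way to do it: the helper name read_text and its fallback parameter are hypothetical, and the data: URL is used only so the example runs without network access.

```python
from urllib.request import urlopen


def read_text(url, fallback="utf-8"):
    """Fetch url and decode the body using the declared charset, if any."""
    with urlopen(url) as response:
        # get_content_charset() returns None when no charset is declared.
        charset = response.headers.get_content_charset() or fallback
        return response.read().decode(charset)


# A data: URL stands in for a real page here (assumption for offline use).
print(read_text("data:text/plain;charset=utf-8,hello"))
```

The same call should work with an http or https URL, though pages that declare a wrong charset, or none at all, may still need manual handling.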

Documentation:
https://docs.python.org/3/library/urllib.request.html#module-urllib.request

Question 6

You can read a website's HTML content as follows:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)

Question 7

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2,
# when the server knows where the request is coming from.

import sys
from contextlib import closing

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is needed because the Python 2 response is not a context manager.
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()

print(data)

# When the server does not know where the request is coming from.
# Works on python 3.

import urllib.request

user_agent = \
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)
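
To check what will be sent before making a request, the Request object can be inspected offline. The sketch below assumes Python 3; build_request is a hypothetical helper, and the URL and User-Agent string are placeholders.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, data=None, headers={"User-Agent": user_agent})


request = build_request("https://www.example.com/", "my-script/0.1")

# Request normalizes header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
# With data=None the request is sent as a GET.
print(request.get_method())
```

Passing the resulting object to urllib.request.urlopen(request) would then perform the actual fetch.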

Question 8

The URL should be a string:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.readline()
print myfile