Dev Notes
Crawling Practice Source Code Using Python (20/7/15)
Intern App Development : 20.07.06~08.31/Crawling : Python
hayoung.dev 2020. 8. 7. 15:47
In [1] : from bs4 import BeautifulSoup
In [2] : # Raw string (r'...') so backslashes in the Windows path are not treated as escapes
reading_file = open(r'C:\Users\HM4\Desktop\장하영\covid19healthbot.cdc.gov.html', 'r', encoding='UTF8')
In [3] : # Reopen the saved page, parse it with the built-in HTML parser, and pretty-print it
reading_file = open(r'C:\Users\HM4\Desktop\장하영\covid19healthbot.cdc.gov.html', 'r', encoding='UTF8')
soup = BeautifulSoup(reading_file, 'html.parser')
print(soup.prettify())
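The cells above open the saved page and hand the file object to BeautifulSoup. As a minimal sketch of the same step, the version below wraps `open` in a `with` block so the file is closed automatically. The `load_soup` helper, the `SAMPLE_HTML` text, and the temporary file are all assumptions made for illustration; they stand in for the real saved page so the snippet runs on any machine.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical stand-in for the saved page; the real file is much larger.
SAMPLE_HTML = """<!DOCTYPE html>
<!-- saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us -->
<html><head><title>demo</title></head>
<body><div id="webchat"><p>Hello</p></div></body></html>
"""


def load_soup(path):
    """Open an HTML file as UTF-8 and parse it; the file is closed on exit."""
    with open(path, "r", encoding="utf-8") as fh:
        return BeautifulSoup(fh, "html.parser")


# Write the sample to a temporary file so the sketch does not depend on a local path.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.html")
    with open(sample_path, "w", encoding="utf-8") as fh:
        fh.write(SAMPLE_HTML)
    soup = load_soup(sample_path)

print(soup.title.string)
print(soup.find(id="webchat").p.get_text())
```

With the real file, `load_soup` would be called with the path used in the cells above, written as a raw string.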
In [4] : list(soup.children)
Out [4] : ['html',
'\n',
' saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us ',
'\n',
<html><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<script src="./covid19healthbot.cdc.gov_files/jquery.min.js.다운로드"></script>
<script src="./covid19healthbot.cdc.gov_files/webchat-es5.gzip.js.다운로드"></script><style data-glamor="" type="text/css"></style><meta content="version=2.0.2" name="react-film"/><meta content="version=6.0.0" name="web-speech-cognitive-services"/><meta content="4.8.1" name="botframework-directlinespeech:version"/><meta content="full-es5" name="botframework-webchat:bundle:variant"/><meta content="4.8.1" name="botframework-webchat:bundle:version"/><meta content="4.8.1" name="botframework-webchat:core:version"/><meta content="4.8.1" name="botframework-webchat:ui:version"/>
<script src="./covid19healthbot.cdc.gov_files/index.js.다운로드"></script>
<link href="./covid19healthbot.cdc.gov_files/style.css" rel="stylesheet"/>
</head>
<body>
<div class="en-us" id="webchat" role="main"><div class="css-1t62idy css-990gl9" role="complementary" style="outline: 0px;" tabindex="-1"><div aria-labelledby="webchat__toaster__header__tg8rl" aria-live="polite" aria-relevant="additions text" class="css-sph49o css-6wwnjx webchat__toaster" role="log"><ul aria-labelledby="webchat__toaster__header__tg8rl" class="webchat__toaster__list" id="webchat__toaster__list__puvsu" role="region"></ul></div><div class="css-gtdio3 css-mfy564" dir="ltr" role="log"><div class="css-y1c0xs css-ca0rlf"><div aria-hidden="true" class="css-mfy564"></div><ul aria-atomic="false" aria-live="polite" aria-relevant="additions" class="css-dhu3ty css-7c9av6" role="list"><li aria-label=" " class="css-1qyo5rb" role="listitem"></li><li aria-label=" " class="css-1qyo5rb" role="listitem"><div class="css-hls04x css-10xzw44 webchat__stacked_indented_content" role="group"><span aria-label=" " class="css-9ohtah">Bot CDC said, The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care. This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19. This system is intended only for adults who are 18 years and older and currently located in the United States.
This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.
ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
. Sent at July 16 at 10:00 AM.</span><div class="webchat__stackedLayout__avatar"><div aria-hidden="true" class="css-1aivo0e webchat__defaultAvatar css-2bf20l"><div class="css-yb0hx9 webchat__initialsAvatar css-f25c5w"><div class="webchat__initialsAvatar__initials">CDC</div></div><div class="css-nzg3w0 webchat__imageAvatar css-12jrzs"><div class="css-1tdb3h1 webchat__imageAvatar__image" style="height: 100%; width: 100%;"><img alt="" src="./covid19healthbot.cdc.gov_files/cdcLogo.svg"/></div></div></div></div><div class="webchat__stackedLayout__content"><div aria-hidden="true" class="webchat__row message"><div class="css-1j843a5 css-ageddn bubble"><div class="webchat__bubble__content"><span aria-label=" " class="css-9ohtah">The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care. This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19. This system is intended only for adults who are 18 years and older and currently located in the United States.
This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.
ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
</span><div aria-hidden="true" class="markdown css-1b7yvbl"><p>The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care. This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19. This system is intended only for adults who are 18 years and older and currently located in the United States.</p>
<p>This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.</p>
<p>ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎</p>
</div></div></div><div class="filler"></div></div><div aria-label=" " class="webchat__row attachment"><span aria-label=" " class="css-9ohtah">Bot sent</span><div class="css-1j843a5 css-ageddn attachment bubble"><div class="webchat__bubble__content"><div class="css-19keqwu"><div class="ac-container ac-adaptiveCard" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 15px; margin: 0px;" tabindex="0"><div class="ac-container" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 0px; margin: 0px;"></div><div class="ac-horizontal-separator" style="height: 8px; overflow: hidden;"></div><div><div style="overflow: hidden;"><div class="ac-actionSet" style="display: flex; flex-direction: column; align-items: stretch;"><button aria-label="I agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I agree</div></button><div style="height: 8px;"></div><button aria-label="I don't agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I don't agree</div></button></div></div><div></div></div></div></div></div></div></div><div class="webchat__row"><span aria-hidden="true" class="css-1s8geyi"><span aria-label=" " class="css-9ohtah">Sent at July 16 at 10:00 AM</span><span aria-hidden="true">2 minutes ago</span></span><div aria-hidden="true" class="filler"></div></div></div><div aria-hidden="true" class="filler"></div></div></li></ul></div></div><span aria-label=" " class="css-9ohtah">Connectivity Status: Connected</span></div></div>
<script>requestChatBot();</script>
<script src="./covid19healthbot.cdc.gov_files/topic_levels.js.다운로드"></script>
<script src="./covid19healthbot.cdc.gov_files/analytics_cdcgov.js.다운로드"></script>
<script>
s.pageName = "Coronavirus Assessment Tool";
s.channel = "Coronavirus";
siteCatalyst.setLevel1("ncird");
siteCatalyst.setLevel2("coronavirus");
siteCatalyst.setLevel3("multi");
siteCatalyst.setLevel4("dvd");
siteCatalyst.setLevel5("2019-nCoV");
if ('' !== CDC.partnerUrl) {
s.referrer = CDC.partnerUrl;
s.prop8 = 'Widget';
}
if ('' === CDC.language) {
s.prop5 = 'eng';
} else if ('es' === CDC.language || 0 === CDC.language.indexOf('es-')) {
s.prop5 = 'spa';
} else if ('ko' === CDC.language || 0 === CDC.language.indexOf('ko-')) {
s.prop5 = 'kor';
} else if ('vi' === CDC.language || 0 === CDC.language.indexOf('vi-')) {
s.prop5 = 'vie';
} else if ('zh' === CDC.language || 0 === CDC.language.indexOf('zh-')) {
s.prop5 = 'chi';
} else {
s.prop5 = 'eng';
}
// Update the level variables here.
updateVariables(s);
var s_code = s.t();
if (s_code) { document.write(s_code); }
</script>
</body></html>]
In [5] : html = list(soup.children)[2]
html
Out [5] : ' saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us '
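Index 2 returns the "saved from url" comment rather than the `<html>` element, because the top-level children also include the doctype, the comment, and the newline strings between them. The sketch below prints the type of each top-level node and then picks the first real tag by type instead of by position. `SAMPLE_HTML` is a shortened, hand-written stand-in for the saved page, not the actual file.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, Tag

# Shortened stand-in with the same top-level layout: doctype, comment, <html>.
SAMPLE_HTML = (
    "<!DOCTYPE html>\n"
    "<!-- saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us -->\n"
    "<html><head></head><body><p>Hello</p></body></html>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
children = list(soup.children)

# Show the class of every top-level node.
for index, node in enumerate(children):
    print(index, type(node).__name__)

# The node at index 2 is the comment, not the <html> element.
print(isinstance(children[2], Comment))

# Pick the first real element instead of relying on a fixed index.
html_tag = next(node for node in children if isinstance(node, Tag))
print(html_tag.name)
```

Filtering by `Tag` keeps working if the saved file gains or loses a leading comment or blank line.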
In [6] : soup.body
Out [6] : <body>
<div class="en-us" id="webchat" role="main"><div class="css-1t62idy css-990gl9" role="complementary" style="outline: 0px;" tabindex="-1"><div aria-labelledby="webchat__toaster__header__tg8rl" aria-live="polite" aria-relevant="additions text" class="css-sph49o css-6wwnjx webchat__toaster" role="log"><ul aria-labelledby="webchat__toaster__header__tg8rl" class="webchat__toaster__list" id="webchat__toaster__list__puvsu" role="region"></ul></div><div class="css-gtdio3 css-mfy564" dir="ltr" role="log"><div class="css-y1c0xs css-ca0rlf"><div aria-hidden="true" class="css-mfy564"></div><ul aria-atomic="false" aria-live="polite" aria-relevant="additions" class="css-dhu3ty css-7c9av6" role="list"><li aria-label=" " class="css-1qyo5rb" role="listitem"></li><li aria-label=" " class="css-1qyo5rb" role="listitem"><div class="css-hls04x css-10xzw44 webchat__stacked_indented_content" role="group"><span aria-label=" " class="css-9ohtah">Bot CDC said, The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care. This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19. This system is intended only for adults who are 18 years and older and currently located in the United States.
This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.
ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
. Sent at July 16 at 10:00 AM.</span><div class="webchat__stackedLayout__avatar"><div aria-hidden="true" class="css-1aivo0e webchat__defaultAvatar css-2bf20l"><div class="css-yb0hx9 webchat__initialsAvatar css-f25c5w"><div class="webchat__initialsAvatar__initials">CDC</div></div><div class="css-nzg3w0 webchat__imageAvatar css-12jrzs"><div class="css-1tdb3h1 webchat__imageAvatar__image" style="height: 100%; width: 100%;"><img alt="" src="./covid19healthbot.cdc.gov_files/cdcLogo.svg"/></div></div></div></div><div class="webchat__stackedLayout__content"><div aria-hidden="true" class="webchat__row message"><div class="css-1j843a5 css-ageddn bubble"><div class="webchat__bubble__content"><span aria-label=" " class="css-9ohtah">The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care. This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19. This system is intended only for adults who are 18 years and older and currently located in the United States.
This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.
ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
</span><div aria-hidden="true" class="markdown css-1b7yvbl"><p>The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care. This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19. This system is intended only for adults who are 18 years and older and currently located in the United States.</p>
<p>This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.</p>
<p>ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎</p>
</div></div></div><div class="filler"></div></div><div aria-label=" " class="webchat__row attachment"><span aria-label=" " class="css-9ohtah">Bot sent</span><div class="css-1j843a5 css-ageddn attachment bubble"><div class="webchat__bubble__content"><div class="css-19keqwu"><div class="ac-container ac-adaptiveCard" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 15px; margin: 0px;" tabindex="0"><div class="ac-container" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 0px; margin: 0px;"></div><div class="ac-horizontal-separator" style="height: 8px; overflow: hidden;"></div><div><div style="overflow: hidden;"><div class="ac-actionSet" style="display: flex; flex-direction: column; align-items: stretch;"><button aria-label="I agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I agree</div></button><div style="height: 8px;"></div><button aria-label="I don't agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I don't agree</div></button></div></div><div></div></div></div></div></div></div></div><div class="webchat__row"><span aria-hidden="true" class="css-1s8geyi"><span aria-label=" " class="css-9ohtah">Sent at July 16 at 10:00 AM</span><span aria-hidden="true">2 minutes ago</span></span><div aria-hidden="true" class="filler"></div></div></div><div aria-hidden="true" class="filler"></div></div></li></ul></div></div><span aria-label=" " class="css-9ohtah">Connectivity Status: Connected</span></div></div>
<script>requestChatBot();</script>
<script src="./covid19healthbot.cdc.gov_files/topic_levels.js.다운로드"></script>
<script src="./covid19healthbot.cdc.gov_files/analytics_cdcgov.js.다운로드"></script>
<script>
s.pageName = "Coronavirus Assessment Tool";
s.channel = "Coronavirus";
siteCatalyst.setLevel1("ncird");
siteCatalyst.setLevel2("coronavirus");
siteCatalyst.setLevel3("multi");
siteCatalyst.setLevel4("dvd");
siteCatalyst.setLevel5("2019-nCoV");
if ('' !== CDC.partnerUrl) {
s.referrer = CDC.partnerUrl;
s.prop8 = 'Widget';
}
if ('' === CDC.language) {
s.prop5 = 'eng';
} else if ('es' === CDC.language || 0 === CDC.language.indexOf('es-')) {
s.prop5 = 'spa';
} else if ('ko' === CDC.language || 0 === CDC.language.indexOf('ko-')) {
s.prop5 = 'kor';
} else if ('vi' === CDC.language || 0 === CDC.language.indexOf('vi-')) {
s.prop5 = 'vie';
} else if ('zh' === CDC.language || 0 === CDC.language.indexOf('zh-')) {
s.prop5 = 'chi';
} else {
s.prop5 = 'eng';
}
// Update the level variables here.
updateVariables(s);
var s_code = s.t();
if (s_code) { document.write(s_code); }
</script>
</body>
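The body printed above holds the bot message inside a `div` with the `markdown` class and the two answer choices inside `button` elements with the `ac-pushButton` class. A possible next step, sketched below, is to pull out just that text. `SAMPLE_HTML` is a trimmed, hand-written stand-in for the markup (sentences shortened, most attributes dropped), and the variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed, hand-written stand-in for the chat markup shown in the output above.
SAMPLE_HTML = """
<body>
<div class="en-us" id="webchat" role="main">
  <div aria-hidden="true" class="markdown css-1b7yvbl">
    <p>The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.</p>
    <p>This project was made possible through a partnership with the CDC Foundation.</p>
  </div>
  <button aria-label="I agree" class="ac-pushButton style-default" type="button"><div>I agree</div></button>
  <button aria-label="I don't agree" class="ac-pushButton style-default" type="button"><div>I don't agree</div></button>
</div>
</body>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Paragraphs inside the rendered-markdown container hold the bot message.
markdown_div = soup.find("div", class_="markdown")
paragraphs = [p.get_text(strip=True) for p in markdown_div.find_all("p")]

# Button captions are the choices offered to the user.
buttons = soup.find_all("button", class_="ac-pushButton")
choices = [button.get_text(strip=True) for button in buttons]

for text in paragraphs:
    print(text)
print(choices)
```

Matching on `markdown` and `ac-pushButton` rather than on the generated `css-...` class names is a deliberate choice here, since the generated names look likely to change between builds of the page.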