
Dev Notes


Intern App Development (20.07.06~08.31) / Crawling: Python

Web Crawling Practice with Python: Source Code (20/7/15)

hayoung.dev 2020. 8. 7. 15:47
In [1] : from bs4 import BeautifulSoup

In [2] : reading_file = open(r'C:\Users\HM4\Desktop\장하영\covid19healthbot.cdc.gov.html', 'r', encoding='utf-8')

In [3] : # Open the saved page, parse it with the built-in html.parser, and pretty-print the tree
with open(r'C:\Users\HM4\Desktop\장하영\covid19healthbot.cdc.gov.html', 'r', encoding='utf-8') as reading_file:
    soup = BeautifulSoup(reading_file, 'html.parser')
print(soup.prettify())
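The saved-page path is full of backslashes. In an ordinary Python string, `\U` starts a `\UXXXXXXXX` Unicode escape (so `'C:\Users'` is a SyntaxError, which is why that separator must be doubled), while sequences such as `\H` or `\D` are unrecognized escapes that Python keeps literally but warns about (DeprecationWarning since 3.6, SyntaxWarning from 3.12). A minimal sketch of three equivalent spellings, reusing the author's path purely as sample data; `PureWindowsPath` is standard library and does no filesystem access, so it runs on any OS.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are taken literally, no escape processing at all.
raw = r"C:\Users\HM4\Desktop\장하영\covid19healthbot.cdc.gov.html"

# Ordinary string: every backslash doubled explicitly.
escaped = "C:\\Users\\HM4\\Desktop\\장하영\\covid19healthbot.cdc.gov.html"

# Built from parts: no backslashes in any literal; "/" is normalized to "\".
built = PureWindowsPath("C:/", "Users", "HM4", "Desktop", "장하영",
                        "covid19healthbot.cdc.gov.html")

print(raw == escaped, str(built) == raw, built.name, built.suffix)
```

Any of the three can be passed to `open(...)` on Windows; the raw-string form is usually the least error-prone when pasting a path from Explorer.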

In [4] : list(soup.children)  # direct children of the document: doctype, newlines, the "saved from url" comment, and <html>
Out [4] : ['html',
 '\n',
 ' saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us ',
 '\n',
 <html><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
 <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
 <script src="./covid19healthbot.cdc.gov_files/jquery.min.js.다운로드"></script>
 <script src="./covid19healthbot.cdc.gov_files/webchat-es5.gzip.js.다운로드"></script><style data-glamor="" type="text/css"></style><meta content="version=2.0.2" name="react-film"/><meta content="version=6.0.0" name="web-speech-cognitive-services"/><meta content="4.8.1" name="botframework-directlinespeech:version"/><meta content="full-es5" name="botframework-webchat:bundle:variant"/><meta content="4.8.1" name="botframework-webchat:bundle:version"/><meta content="4.8.1" name="botframework-webchat:core:version"/><meta content="4.8.1" name="botframework-webchat:ui:version"/>
 <script src="./covid19healthbot.cdc.gov_files/index.js.다운로드"></script>
 <link href="./covid19healthbot.cdc.gov_files/style.css" rel="stylesheet"/>
 </head>
 <body>
 <div class="en-us" id="webchat" role="main"><div class="css-1t62idy css-990gl9" role="complementary" style="outline: 0px;" tabindex="-1"><div aria-labelledby="webchat__toaster__header__tg8rl" aria-live="polite" aria-relevant="additions text" class="css-sph49o css-6wwnjx webchat__toaster" role="log"><ul aria-labelledby="webchat__toaster__header__tg8rl" class="webchat__toaster__list" id="webchat__toaster__list__puvsu" role="region"></ul></div><div class="css-gtdio3 css-mfy564" dir="ltr" role="log"><div class="css-y1c0xs css-ca0rlf"><div aria-hidden="true" class="css-mfy564"></div><ul aria-atomic="false" aria-live="polite" aria-relevant="additions" class="css-dhu3ty css-7c9av6" role="list"><li aria-label=" " class="css-1qyo5rb" role="listitem"></li><li aria-label=" " class="css-1qyo5rb" role="listitem"><div class="css-hls04x css-10xzw44 webchat__stacked_indented_content" role="group"><span aria-label=" " class="css-9ohtah">Bot CDC said, The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.  This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19.   This system is intended only for adults who are 18 years and older and currently located in the United States. 
 
 This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.  
 
 ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
 . Sent at July 16 at 10:00 AM.</span><div class="webchat__stackedLayout__avatar"><div aria-hidden="true" class="css-1aivo0e webchat__defaultAvatar css-2bf20l"><div class="css-yb0hx9 webchat__initialsAvatar css-f25c5w"><div class="webchat__initialsAvatar__initials">CDC</div></div><div class="css-nzg3w0 webchat__imageAvatar css-12jrzs"><div class="css-1tdb3h1 webchat__imageAvatar__image" style="height: 100%; width: 100%;"><img alt="" src="./covid19healthbot.cdc.gov_files/cdcLogo.svg"/></div></div></div></div><div class="webchat__stackedLayout__content"><div aria-hidden="true" class="webchat__row message"><div class="css-1j843a5 css-ageddn bubble"><div class="webchat__bubble__content"><span aria-label=" " class="css-9ohtah">The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.  This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19.   This system is intended only for adults who are 18 years and older and currently located in the United States. 
 
 This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.  
 
 ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
 </span><div aria-hidden="true" class="markdown css-1b7yvbl"><p>The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.  This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19.   This system is intended only for adults who are 18 years and older and currently located in the United States.</p>
 <p>This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.</p>
 <p>ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎</p>
 </div></div></div><div class="filler"></div></div><div aria-label=" " class="webchat__row attachment"><span aria-label=" " class="css-9ohtah">Bot sent</span><div class="css-1j843a5 css-ageddn attachment bubble"><div class="webchat__bubble__content"><div class="css-19keqwu"><div class="ac-container ac-adaptiveCard" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 15px; margin: 0px;" tabindex="0"><div class="ac-container" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 0px; margin: 0px;"></div><div class="ac-horizontal-separator" style="height: 8px; overflow: hidden;"></div><div><div style="overflow: hidden;"><div class="ac-actionSet" style="display: flex; flex-direction: column; align-items: stretch;"><button aria-label="I agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I agree</div></button><div style="height: 8px;"></div><button aria-label="I don't agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I don't agree</div></button></div></div><div></div></div></div></div></div></div></div><div class="webchat__row"><span aria-hidden="true" class="css-1s8geyi"><span aria-label=" " class="css-9ohtah">Sent at July 16 at 10:00 AM</span><span aria-hidden="true">2 minutes ago</span></span><div aria-hidden="true" class="filler"></div></div></div><div aria-hidden="true" class="filler"></div></div></li></ul></div></div><span aria-label=" " class="css-9ohtah">Connectivity Status: Connected</span></div></div>
 <script>requestChatBot();</script>
 <script src="./covid19healthbot.cdc.gov_files/topic_levels.js.다운로드"></script>
 <script src="./covid19healthbot.cdc.gov_files/analytics_cdcgov.js.다운로드"></script>
 <script>
 
 		s.pageName = "Coronavirus Assessment Tool";
 		s.channel = "Coronavirus";
 		siteCatalyst.setLevel1("ncird");
 		siteCatalyst.setLevel2("coronavirus");
 		siteCatalyst.setLevel3("multi");
 		siteCatalyst.setLevel4("dvd");
 		siteCatalyst.setLevel5("2019-nCoV");
 
 		if ('' !== CDC.partnerUrl) {
 			s.referrer = CDC.partnerUrl;
 			s.prop8 = 'Widget';
 		}
 
 		if ('' === CDC.language) {
 			s.prop5 = 'eng';
 		} else if ('es' === CDC.language || 0 === CDC.language.indexOf('es-')) {
 			s.prop5 = 'spa';
 		} else if ('ko' === CDC.language || 0 === CDC.language.indexOf('ko-')) {
 			s.prop5 = 'kor';
 		} else if ('vi' === CDC.language || 0 === CDC.language.indexOf('vi-')) {
 			s.prop5 = 'vie';
 		} else if ('zh' === CDC.language || 0 === CDC.language.indexOf('zh-')) {
 			s.prop5 = 'chi';
 		} else {
 			s.prop5 = 'eng';
 		}
 
 		// Update the level variables here.
 		updateVariables(s);
 
 		var s_code = s.t();
 		if (s_code) { document.write(s_code); }
 
 	</script>
 </body></html>]
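The list above mixes several node types: the leading `'html'` is the doctype, the `'\n'` entries are whitespace strings, the `saved from url` line is an HTML comment, and only the last entry is a real tag. A minimal sketch that labels each top-level child by its BeautifulSoup class; the `SAMPLE` string is a hypothetical stand-in that copies only the saved page's top-level layout, and `describe_children` is an illustrative helper, not part of `bs4`. `Doctype` and `Comment` are subclasses of `NavigableString`, so they are checked first.

```python
from bs4 import BeautifulSoup, Comment, Doctype, NavigableString, Tag

# Hypothetical stand-in with the same top-level layout as the saved page.
SAMPLE = (
    "<!DOCTYPE html>\n"
    "<!-- saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us -->\n"
    "<html><head></head><body><p>hi</p></body></html>"
)

def describe_children(soup):
    """Return (kind, value) pairs for each direct child of the soup."""
    rows = []
    for child in soup.children:
        if isinstance(child, Doctype):      # must precede NavigableString
            kind = "doctype"
        elif isinstance(child, Comment):    # must precede NavigableString
            kind = "comment"
        elif isinstance(child, Tag):
            kind = "tag"
        elif isinstance(child, NavigableString):
            kind = "text" if child.strip() else "whitespace"
        else:
            kind = type(child).__name__
        value = child.name if isinstance(child, Tag) else str(child)
        rows.append((kind, value))
    return rows

soup = BeautifulSoup(SAMPLE, "html.parser")
rows = describe_children(soup)
for kind, value in rows:
    print(kind, repr(value))
```

Indexing `list(soup.children)` by position works for one saved file, but checking types like this is more robust if the whitespace between nodes changes.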
 
In [5] : html = list(soup.children)[2]  # index 2 is the "saved from url" comment; the <html> element itself is index 4
html
Out [5] : ' saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us '
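Instead of relying on position `[2]`, the comment can be located by type and its URL pulled out with a regular expression. A minimal sketch, assuming the comment keeps the `saved from url=(NNNN)URL` shape seen in Out [5]; for this page the number in parentheses equals the URL's length (48 characters), and the sketch treats that as an assumption rather than a guarantee. `saved_from_url` and `SAMPLE` are hypothetical names for illustration; `find(string=lambda ...: isinstance(..., Comment))` is the comment-search idiom from the BeautifulSoup documentation.

```python
import re

from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in with the same top-level layout as the saved page.
SAMPLE = (
    "<!DOCTYPE html>\n"
    "<!-- saved from url=(0048)https://covid19healthbot.cdc.gov/?language=en-us -->\n"
    "<html><head></head><body></body></html>"
)

def saved_from_url(soup):
    """Return the URL from a '<!-- saved from url=(NNNN)URL -->' comment, or None."""
    comment = soup.find(
        string=lambda s: isinstance(s, Comment) and "saved from url=" in s
    )
    if comment is None:
        return None
    match = re.search(r"saved from url=\((\d+)\)(\S+)", comment)
    return match.group(2) if match else None

soup = BeautifulSoup(SAMPLE, "html.parser")
url = saved_from_url(soup)
print(url, len(url))
```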

In [6] : soup.body  # shortcut for soup.find('body')
Out [6] : <body>
<div class="en-us" id="webchat" role="main"><div class="css-1t62idy css-990gl9" role="complementary" style="outline: 0px;" tabindex="-1"><div aria-labelledby="webchat__toaster__header__tg8rl" aria-live="polite" aria-relevant="additions text" class="css-sph49o css-6wwnjx webchat__toaster" role="log"><ul aria-labelledby="webchat__toaster__header__tg8rl" class="webchat__toaster__list" id="webchat__toaster__list__puvsu" role="region"></ul></div><div class="css-gtdio3 css-mfy564" dir="ltr" role="log"><div class="css-y1c0xs css-ca0rlf"><div aria-hidden="true" class="css-mfy564"></div><ul aria-atomic="false" aria-live="polite" aria-relevant="additions" class="css-dhu3ty css-7c9av6" role="list"><li aria-label=" " class="css-1qyo5rb" role="listitem"></li><li aria-label=" " class="css-1qyo5rb" role="listitem"><div class="css-hls04x css-10xzw44 webchat__stacked_indented_content" role="group"><span aria-label=" " class="css-9ohtah">Bot CDC said, The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.  This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19.   This system is intended only for adults who are 18 years and older and currently located in the United States. 

This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.  

ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
. Sent at July 16 at 10:00 AM.</span><div class="webchat__stackedLayout__avatar"><div aria-hidden="true" class="css-1aivo0e webchat__defaultAvatar css-2bf20l"><div class="css-yb0hx9 webchat__initialsAvatar css-f25c5w"><div class="webchat__initialsAvatar__initials">CDC</div></div><div class="css-nzg3w0 webchat__imageAvatar css-12jrzs"><div class="css-1tdb3h1 webchat__imageAvatar__image" style="height: 100%; width: 100%;"><img alt="" src="./covid19healthbot.cdc.gov_files/cdcLogo.svg"/></div></div></div></div><div class="webchat__stackedLayout__content"><div aria-hidden="true" class="webchat__row message"><div class="css-1j843a5 css-ageddn bubble"><div class="webchat__bubble__content"><span aria-label=" " class="css-9ohtah">The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.  This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19.   This system is intended only for adults who are 18 years and older and currently located in the United States. 

This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.  

ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎
</span><div aria-hidden="true" class="markdown css-1b7yvbl"><p>The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.  This system is not intended for the diagnosis or treatment of disease or other conditions, including COVID-19.   This system is intended only for adults who are 18 years and older and currently located in the United States.</p>
<p>This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft’s Azure platform. CDC’s collaboration with a non-federal organization does not imply an endorsement of any one particular service, product, or enterprise.</p>
<p>ᵥₑᵣ₆₂ ₍₂₀₂₀₋₀₇₋₁₅₎</p>
</div></div></div><div class="filler"></div></div><div aria-label=" " class="webchat__row attachment"><span aria-label=" " class="css-9ohtah">Bot sent</span><div class="css-1j843a5 css-ageddn attachment bubble"><div class="webchat__bubble__content"><div class="css-19keqwu"><div class="ac-container ac-adaptiveCard" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 15px; margin: 0px;" tabindex="0"><div class="ac-container" style="display: flex; flex-direction: column; justify-content: flex-start; box-sizing: border-box; flex: 0 0 auto; padding: 0px; margin: 0px;"></div><div class="ac-horizontal-separator" style="height: 8px; overflow: hidden;"></div><div><div style="overflow: hidden;"><div class="ac-actionSet" style="display: flex; flex-direction: column; align-items: stretch;"><button aria-label="I agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I agree</div></button><div style="height: 8px;"></div><button aria-label="I don't agree" class="ac-pushButton style-default" style="display: flex; align-items: center; justify-content: center; flex: 0 1 auto;" type="button"><div style="overflow: hidden; text-overflow: ellipsis; white-space: nowrap;">I don't agree</div></button></div></div><div></div></div></div></div></div></div></div><div class="webchat__row"><span aria-hidden="true" class="css-1s8geyi"><span aria-label=" " class="css-9ohtah">Sent at July 16 at 10:00 AM</span><span aria-hidden="true">2 minutes ago</span></span><div aria-hidden="true" class="filler"></div></div></div><div aria-hidden="true" class="filler"></div></div></li></ul></div></div><span aria-label=" " class="css-9ohtah">Connectivity Status: Connected</span></div></div>
<script>requestChatBot();</script>
<script src="./covid19healthbot.cdc.gov_files/topic_levels.js.다운로드"></script>
<script src="./covid19healthbot.cdc.gov_files/analytics_cdcgov.js.다운로드"></script>
<script>

		s.pageName = "Coronavirus Assessment Tool";
		s.channel = "Coronavirus";
		siteCatalyst.setLevel1("ncird");
		siteCatalyst.setLevel2("coronavirus");
		siteCatalyst.setLevel3("multi");
		siteCatalyst.setLevel4("dvd");
		siteCatalyst.setLevel5("2019-nCoV");

		if ('' !== CDC.partnerUrl) {
			s.referrer = CDC.partnerUrl;
			s.prop8 = 'Widget';
		}

		if ('' === CDC.language) {
			s.prop5 = 'eng';
		} else if ('es' === CDC.language || 0 === CDC.language.indexOf('es-')) {
			s.prop5 = 'spa';
		} else if ('ko' === CDC.language || 0 === CDC.language.indexOf('ko-')) {
			s.prop5 = 'kor';
		} else if ('vi' === CDC.language || 0 === CDC.language.indexOf('vi-')) {
			s.prop5 = 'vie';
		} else if ('zh' === CDC.language || 0 === CDC.language.indexOf('zh-')) {
			s.prop5 = 'chi';
		} else {
			s.prop5 = 'eng';
		}

		// Update the level variables here.
		updateVariables(s);

		var s_code = s.t();
		if (s_code) { document.write(s_code); }

	</script>
</body>
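Most of the `<body>` above is layout markup; the readable content sits in a few places: the `<p>` tags inside `div.markdown`, the `ac-pushButton` buttons ("I agree" / "I don't agree"), and the "Connectivity Status" span. A minimal sketch of pulling those out with CSS selectors (`Tag.select`) and `get_text(strip=True)`. `BODY` is a hand-trimmed fragment of the output above, keeping its class names and text, and the choice of selectors is an assumption about which parts matter. The `css-…` class names look auto-generated, so the status line is matched by its text instead.

```python
from bs4 import BeautifulSoup

# Hand-trimmed fragment of the saved <body>; class names and text copied from it.
BODY = """
<body><div class="en-us" id="webchat" role="main">
<div aria-hidden="true" class="markdown css-1b7yvbl">
<p>The purpose of the Coronavirus Self-Checker is to help you make decisions about seeking appropriate medical care.</p>
<p>This project was made possible through a partnership with the CDC Foundation and is enabled by Microsoft's Azure platform.</p>
</div>
<div class="ac-actionSet">
<button aria-label="I agree" class="ac-pushButton style-default" type="button"><div>I agree</div></button>
<button aria-label="I don't agree" class="ac-pushButton style-default" type="button"><div>I don't agree</div></button>
</div>
<span aria-label=" " class="css-9ohtah">Connectivity Status: Connected</span>
</div></body>
"""

body = BeautifulSoup(BODY, "html.parser").body

# Paragraphs of the bot's first message (the rendered markdown copy).
paragraphs = [p.get_text(strip=True) for p in body.select("div.markdown > p")]

# Answer choices on the adaptive card: visible text and aria-label.
choices = [b.get_text(strip=True) for b in body.select("button.ac-pushButton")]
labels = [b.get("aria-label") for b in body.select("button.ac-pushButton")]

# Status line: the span's class looks generated, so match on its text.
status = next(
    s.get_text(strip=True)
    for s in body.find_all("span")
    if s.get_text(strip=True).startswith("Connectivity Status")
)

print(paragraphs, choices, labels, status, sep="\n")
```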
