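Python Libraries in Detail: Networking (2)
Yesterday I tried using the HTMLParser class to parse web pages, but the results were less than ideal. Either way, I will write down the process first, in the hope that later readers can build on it to solve the problems I ran into.
I wrote 2 solutions, and of course each of them only works for one specific website. Here I mainly describe parsing the BBC home page, www.bbc.co.uk, and the NetEase home page, www.163.com.
For the BBC:
This one is much simpler, probably because the page's encoding is fairly standard.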
import html.parser
import urllib.request

class parseHtml(html.parser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a {} start tag".format(tag))
    def handle_endtag(self, tag):
        print("Encountered a {} end tag".format(tag))
    def handle_charref(self,name):
        print("charref")
    def handle_entityref(self,name):
        print("entityref")
    def handle_data(self,data):
        print("data")
    def handle_comment(self,data):
        print("comment")
    def handle_decl(self,decl):
        print("decl")
    def handle_pi(self,data):
        print("pi")
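#Start reading from here. The subclass above is simple: it just overrides all of the parent class's handler methods
#Save the BBC page in binary write mode. This was covered in the previous post (http://blog.csdn.net/xiadasong007/archive/2009/09/03/4516683.aspx), so I will not repeat it
file=open("bbc.html",'wb') #it's 'wb',not 'w'
url=urllib.request.urlopen("http://www.bbc.co.uk/")
while(1):
    line=url.readline()
    if len(line)==0:
        break
    file.write(line)
file.close() #close the file so everything is written before it is reopened for reading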
#Create a parser object
pht=parseHtml()
#For this site I open the file as 'utf-8', otherwise an error occurs. Other sites may not need this. utf-8 is a Unicode encoding
file=open("bbc.html",encoding='utf-8',mode='r')
#Process the page: feed
while(1):
    line=
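The listing above breaks off at the feed loop. As a rough illustration of how `feed` is normally driven, and not a reconstruction of the author's code, here is a minimal sketch. The `EventCollector` class and the inline HTML string are assumptions made for the example. It decodes bytes to `str` first, because `HTMLParser.feed` expects text, and then feeds the document line by line.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text nodes
            self.events.append(("data", data.strip()))


# Stand-in for the bytes a download would return (assumed sample input).
raw = "<html><body><p>Hello</p></body></html>".encode("utf-8")

collector = EventCollector()
# feed() takes str, so decode first; it may be called repeatedly with chunks.
for line in raw.decode("utf-8").splitlines(keepends=True):
    collector.feed(line)
collector.close()  # process any text still buffered in the parser

print(collector.events)
```

Collecting events in a list rather than printing them makes it easier to check what the parser actually saw for a given page.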
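Related documents:
The road to learning Python is long, and only reading without practicing gets rather dull.
I found a web page with a few exercises on it and used them for practice in my spare time, which turned out to be quite fun.
It is here: http://www.cnblogs.com/belaliu/archive/2006/11/25/572140.html
Note: the code posted after each exercise is not necessarily the best solution.
Most of them are fairly easy to solve. The somewhat harder one is exercise 4, which removes the spaces inside a string.
I looked up solutions online, and there are these ......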
filename=raw_input('enter file name:')
f=open(filename,'rb')
f.seek(0,0)
index=0
for i in range(0,16):
    print "%3s" % hex(i) ,
print
for i in range(0,16):
    print "%-3s" % "#" ,
print
while True:
    temp=f.read(1)
    if len(temp) == 0:
        break
    else:
        print "%3s" % temp.encode('hex'),
......
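The code parses an arithmetic expression in three steps:
1. Convert the arithmetic expression (a string) into a list: the parseElement method
2. Convert the list form of the expression into a postfix expression: changeToSuffix
3. Compute the result of the postfix expression
For convenience I simply wrote a single parseElement here, but as that method grew I ended up tying myself in knots. Imagine extracting the values from an expression that mixes increment, bitwise, logical, and arithmetic ......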
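The three steps listed above can be sketched as follows. This is a minimal illustration and not the original author's code: the function names `parse_element`, `change_to_suffix`, and `evaluate_suffix` are assumed analogues of the methods mentioned, the infix-to-postfix step uses the standard shunting-yard approach, and only non-negative numbers, `+ - * /`, and parentheses are handled (no unary minus, no increment, bitwise, or logical operators).

```python
import operator
import re

# Operator table: symbol -> (precedence, function). All are left-associative.
OPERATORS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def parse_element(expression):
    """Step 1: split the expression string into a list of tokens."""
    return re.findall(r"\d+\.?\d*|[-+*/()]", expression)


def change_to_suffix(tokens):
    """Step 2: convert infix tokens to postfix order (shunting-yard)."""
    output, stack = [], []
    for token in tokens:
        if token in OPERATORS:
            # Pop operators of higher or equal precedence before pushing.
            while (stack and stack[-1] != "("
                   and OPERATORS[stack[-1]][0] >= OPERATORS[token][0]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(token)  # operand
    while stack:
        output.append(stack.pop())
    return output


def evaluate_suffix(postfix):
    """Step 3: evaluate the postfix token list using a value stack."""
    stack = []
    for token in postfix:
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token][1](left, right))
        else:
            stack.append(float(token))
    return stack[0]


tokens = parse_element("1 + 2 * (3 - 1)")
postfix = change_to_suffix(tokens)
result = evaluate_suffix(postfix)
print(postfix, result)
```

Keeping tokenizing, reordering, and evaluation in separate functions means each step can be tested on its own, which helps when the tokenizer grows to cover more operator kinds.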
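1. Event-driven
Mouse movement is one example of an event and its callback. Suppose the mouse pointer is resting somewhere on your GUI program. If the mouse is moved to another part of the program, something must cause the pointer to move on the screen so that the change of position is shown. The system has to process these mouse-move events in order to display (and carry out) the mouse's movement across the window. Once you release the mouse, there are no more events to proc ......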
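To make the event and callback idea concrete, here is a toy sketch in plain Python. It does not use any real GUI toolkit: the `EventDispatcher` class, the `"mouse_move"` event name, and the coordinates are all assumptions made for illustration. A callback is registered for an event name, and the dispatcher calls it each time that event is reported.

```python
class EventDispatcher:
    """Hypothetical minimal dispatcher: maps event names to callbacks."""

    def __init__(self):
        self._handlers = {}

    def bind(self, event_name, callback):
        # Register a callback to run whenever event_name is dispatched.
        self._handlers.setdefault(event_name, []).append(callback)

    def dispatch(self, event_name, **details):
        # Call every callback bound to this event; unknown events are ignored.
        for callback in self._handlers.get(event_name, []):
            callback(**details)


positions = []


def on_mouse_move(x, y):
    # Callback: record each pointer position that is reported.
    positions.append((x, y))


dispatcher = EventDispatcher()
dispatcher.bind("mouse_move", on_mouse_move)

# Simulate the system reporting three mouse-move events.
for x, y in [(10, 10), (12, 11), (15, 13)]:
    dispatcher.dispatch("mouse_move", x=x, y=y)

print(positions)
```

When no more events are dispatched, the callback simply stops being called, which mirrors the point in the text about the mouse coming to rest.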
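To do a good job, you must first sharpen your tools!
Which tool is best for developing in Python? If you have only just started learning Python, IDLE is enough. Debugging in it is not especially convenient, but it is adequate for a beginner, and you can use print for simple debugging. I do not recommend developing in Notepad: people who do not know better will think you are brilliant, while those who do know... you are only making yourself suffer. Using Editplus seems ......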