HTML Complete Reference
Complete List of HTML Tags
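<a></a>
Hyperlink
<a href="URL"></a>
Creates a hyperlink, where URL is the address of the link target
<a href="mailto:EMAIL"></a>
Creates a link that automatically sends an email
<a name="name"></a>
Creates a bookmark inside the document
<a href="#name"></a>
Creates a link that points to a bookmark inside the document
Notes on other link attributes:
target="..." determines where the linked resource is displayed (a user-defined name, _blank, _parent, _self, _top)
rel="..." the forward link type (relationship from this document to the target)
rev="..." the reverse link type (relationship from the target back to this document)
accesskey="..." specifies the hotkey for the element
shape="..." lets us define a client-side image map using predefined shapes (default, rect, circle, poly)
coords="..." defines the dimensions of the shape in pixels or as percentage lengths
tabindex="..." sets the order in which elements receive focus (pressing the Tab key moves focus to the next element)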
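The link forms listed above (plain URL, mailto, named bookmark, link to a bookmark) can be told apart mechanically. The following is a minimal sketch using Python's standard html.parser module; the class name LinkCollector, the kind labels, and the sample markup are illustrative assumptions, and "_self" is assumed as the fallback when no target attribute is given.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects <a> tags and sorts them into the forms listed above."""

    def __init__(self):
        super().__init__()
        self.links = []      # (kind, href, target) tuples
        self.bookmarks = []  # values of <a name="...">

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        if attr.get("name"):
            self.bookmarks.append(attr["name"])
        href = attr.get("href")
        if href is None:
            return
        if href.startswith("mailto:"):
            kind = "email"
        elif href.startswith("#"):
            kind = "bookmark-link"
        else:
            kind = "url"
        # Assumption: a missing target is treated as "_self".
        self.links.append((kind, href, attr.get("target", "_self")))


# Hypothetical sample markup covering each form.
sample = (
    '<a name="top"></a>'
    '<a href="http://example.com/" target="_blank">site</a>'
    '<a href="mailto:someone@example.com">mail</a>'
    '<a href="#top">back to top</a>'
)
collector = LinkCollector()
collector.feed(sample)
collector.close()
```

After feeding the sample, collector.bookmarks holds the bookmark names and collector.links holds one tuple per href.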
-----------------------------------------------------------------------------------------
<ADDRESS></ADDRESS>
Address markup
<b></b>
Bold text
<BASEFONT></BASEFONT>
Base font markup
<big></big>
Larger text
<BLOCKQUOTE></BLOCKQUOTE>
Indented to the right
-----------------------------------------------------------------------------------------
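<body></body>
Document body
<body bgcolor="">
Sets the background color, using a color name or an RGB hexadecimal value
<body background="">
Sets the background image
<bgsound src=""> Sets the background music (IE only; bgsound is a separate element, not a body attribute)
<body bgproperties="fixed">
Fixes the background image in place (IE only)
<body text="">
Sets the text color, using a color name or an RGB hexadecimal value
<body link=""> Sets the link color, using a color name or an RGB hexadecimal value
<body vlink="">
Sets the color of visited links, using a color name or an RGB hexadecimal value
<body alink="">
Sets the color of a link while it is being clicked, using a color name or an RGB hexadecimal value
<body topmargin="">
Sets the top margin of the page
<body leftmargin=""> Sets the left margin of the page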
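The body attributes above are presentational, and each has a CSS counterpart. As a rough sketch (not part of the original list), the Python helper below turns a dictionary of such attributes into equivalent CSS rules. The function name body_attrs_to_css, the mapping table, and the assumption that margin values are plain pixel numbers are all illustrative.

```python
# Assumed mapping from presentational <body> attributes to CSS:
# attribute -> (selector, property, value template)
BODY_ATTR_TO_CSS = {
    "bgcolor": ("body", "background-color", "{}"),
    "background": ("body", "background-image", "url({})"),
    "text": ("body", "color", "{}"),
    "link": ("a:link", "color", "{}"),
    "vlink": ("a:visited", "color", "{}"),
    "alink": ("a:active", "color", "{}"),
    "topmargin": ("body", "margin-top", "{}px"),
    "leftmargin": ("body", "margin-left", "{}px"),
}


def body_attrs_to_css(attrs):
    """Return CSS text equivalent to the given <body> attributes."""
    rules = {}  # selector -> list of declarations, in first-seen order
    for name, value in attrs.items():
        key = name.lower()
        if key == "bgproperties" and value.lower() == "fixed":
            rules.setdefault("body", []).append("background-attachment: fixed")
            continue
        if key not in BODY_ATTR_TO_CSS:
            continue  # attributes without a CSS counterpart are skipped
        selector, prop, template = BODY_ATTR_TO_CSS[key]
        rules.setdefault(selector, []).append(prop + ": " + template.format(value))
    lines = []
    for selector, decls in rules.items():
        lines.append(selector + " { " + "; ".join(decls) + "; }")
    return "\n".join(lines)


css = body_attrs_to_css({"bgcolor": "#ffffff", "link": "blue", "topmargin": "0"})
print(css)
```

Declarations for the same selector are grouped, so the two body attributes in the usage line end up in a single body rule.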
----------------------------------
Related documents:
1. AVI format
The code snippet is as follows:
<object id="video" width="400" height="200" border="0" classid="clsid:CFCDAA03-8BE4-11cf-B84B-0020AFBBCCFA">
<param name="ShowDisplay" value="0">
<param name="ShowControls" value="1">
<param name="AutoStart" value="1">
<param name="Auto ......
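1 All .java|.jsp|.html|.xml source files are saved to disk in UTF-8 encoding.
For example, when editing a file in Eclipse, select the file, open the right-click menu and choose Properties, then set the text file encoding to Other and select UTF-8; you can also go to
Eclipse > Preferences > General > Content Types and set the default encoding for each file type, so that from then on all text files use a uniform ......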
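Whether existing source files already follow the UTF-8 rule can be checked outside the IDE. The sketch below is one possible check in Python, not something the original describes: it tries to decode each file as UTF-8 and reports the ones that fail. The function names is_utf8 and find_non_utf8 are hypothetical; the suffix list is taken from the rule above.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(root, suffixes=(".java", ".jsp", ".html", ".xml")):
    """List files under root with a matching suffix that are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if not is_utf8(path.read_bytes()):
                bad.append(path)
    return bad
```

Note that pure ASCII files pass this check as well, since ASCII is a subset of UTF-8; a file saved in GBK with Chinese text will normally fail it.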
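As everyone knows, when you insert text into a table cell, Chinese text is no problem: it wraps automatically once it reaches the specified width. English text or digits are another matter, because English words are separated by spaces; if you enter one long English string with no spaces in it, the table is pushed out of shape. In fact, the table's style has a property that forces English text to wrap: word-break. When it is set to break-all, ......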
import urllib
from HTMLParser import HTMLParser
class TitleParser(HTMLParser):
    def __init__(self):
        self.title = ''
        self.divcontent = ''
        self.readingtitle = 0
        self.readingdiv = 0
        HTMLParser.__init__(self)
    def handle_starttag(self, tag, attrs):
......
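The fragment above is cut off and uses the Python 2 module name HTMLParser. As a minimal Python 3 sketch of the same idea, the class below collects only the text of the title element. The handler bodies are an assumption about what such a parser typically does, not a reconstruction of the truncated original, and the div handling is left out.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside <title>...</title>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.readingtitle = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.readingtitle = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.readingtitle:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self.readingtitle = False


# Hypothetical input document for illustration.
parser = TitleParser()
parser.feed("<html><head><title>HTML tags</title></head><body><p>text</p></body></html>")
parser.close()
print(parser.title)
```

The flag readingtitle plays the same role as in the fragment: it is switched on at the opening tag and off at the closing tag, so handle_data only keeps text seen in between.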
HTML (HyperText Markup Language)
Basic format example code:
Open any program that can edit text files (for example, Notepad, which ships with Windows) and enter the following code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
......