Converting HTML Unicode encoding
Strings such as "&#24038;&#36793;" that begin with &# are HTML Unicode-encoded text (numeric character references). They can be decoded as follows:
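import re
# Python 3. Matches decimal (&#24038;) and hexadecimal (&#x5DE6;) references.
_entity = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')
def to_str(s):
    # Replace each numeric reference with the character it encodes.
    return _entity.sub(lambda m: chr(int(m.group(1), 16) if m.group(1) else int(m.group(2))), s)
s = "&#24038;&#36793;"
print(to_str(s))  # 左边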
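For comparison, here is a minimal sketch of the same conversion using only the Python 3 standard library, in both directions. The helper names `decode_refs` and `encode_refs` are hypothetical and chosen for illustration. Note that `html.unescape` also decodes named entities such as `&amp;`, so it is broader than a regular expression that only targets `&#...;` references.

```python
import html


def decode_refs(text):
    """Decode numeric (and named) character references with the standard library."""
    return html.unescape(text)


def encode_refs(text):
    """Replace every non-ASCII character with a decimal reference such as &#24038;."""
    # The "xmlcharrefreplace" error handler emits &#NNNN; for unencodable characters.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(decode_refs("&#24038;&#36793;"))   # 左边
    print(decode_refs("&#x5DE6;&#x8FB9;"))   # 左边
    print(encode_refs("左边"))                # &#24038;&#36793;
```

If only `&#...;` references should be touched and named entities must stay as they are, a hand-written regular expression may still be the better fit.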