Converting HTML Unicode encoding
Text that starts with &#, such as "&#24038;&#36793;", is HTML Unicode encoded, that is, written as numeric character references. It can be decoded as follows:
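import re
s = "&#24038;&#36793;"  # numeric character references for "左边"
# Group 1 is the optional "x" that marks a hexadecimal reference; group 2 holds the digits.
_ = re.compile('&#(x)?([0-9a-fA-F]+);')
# chr() maps each code point to its character (Python 3), so the result is already a str.
to_str = lambda s: _.sub(lambda result: chr(int(result.group(2), 16 if result.group(1) == 'x' else 10)), s)
print(to_str(s))  # 左边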
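A more defensive variant for Python 3 might look like the sketch below. It matches hexadecimal and decimal references separately, accepts an uppercase X, and leaves out-of-range references untouched. The names NUMERIC_REF and decode_numeric_refs are illustrative only. The standard library's html.unescape is used as a cross-check.

```python
import html
import re

# Matches decimal (&#24038;) and hexadecimal (&#x5de6;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they denote."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits, 10)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: keep the reference as written.
            return match.group(0)
    return NUMERIC_REF.sub(replace, text)


sample = "&#24038;&#36793;"
decoded = decode_numeric_refs(sample)
print(decoded == "\u5de6\u8fb9")          # True: U+5DE6 U+8FB9 is "左边"
print(decoded == html.unescape(sample))   # True: same result as the standard library
```

When named entities such as &amp;lt; also need decoding, html.unescape on its own is usually the simpler choice.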
Related documents:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Test Page</title>
</head>
<body>
<table border="1px">
......
<html>
<head>
<title>text-font</title>
</head>
<body>
************************<font size="7" color="red">Headings and Sections</font>*************************<br>
Normal text
<h1>Level 1 heading</h1>
<h2 align=righ ......
HTML Character Entities
Some characters have a special meaning in HTML. For example, the less-than sign < marks the start of an HTML tag, so it does not appear in the page as finally rendered. What if we want the page to actually display a less-than sign?
This is where HTML character entities come in.
A character entity consists of three parts: the first part ......
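As a small illustration of the less-than problem described in this excerpt, Python's standard html module can produce the entity form of a string and decode it again. This is only a sketch; the sample string is made up for the example.

```python
import html

raw = "1 < 2 & 3 > 2"

# Escape the characters that HTML treats as markup so they display literally.
escaped = html.escape(raw, quote=False)
print(escaped)                         # 1 &lt; 2 &amp; 3 &gt; 2

# Decoding the entities restores the original text.
print(html.unescape(escaped) == raw)   # True

# The same character can also be written as a numeric reference.
print(html.unescape("&#60;") == "<")   # True
```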
To parse HTML or XML in Objective-C, the system provides two built-in options: libxml and NSXMLParser. However, both require writing a lot of code yourself to process the fetched content, and neither is very intuitive.
A better option is the hpple library, a lightweight wrapper framework that handles this well. It uses XPath to locate and parse HTML or XML.
Installation steps:
- Add libx ......
Suppose you have a string of the form str = "&bbbLAA";
To extract "L" from it, you can do the following:
//sDataStr = "&bbbLAA";
//sLeftQuote = ""&bbb";
//sRightQuote = "&AA";
Calling this method returns the L field.
function abCutString( sDataStr, sLeftQuote, sRightQuote)
{
var sReturnVal = '';
var nStart ......
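The excerpt is cut off at this point. The idea it describes, returning the text between a left and a right delimiter, can be sketched in Python as follows. The function cut_between and the sample string are hypothetical and are not the original abCutString implementation.

```python
def cut_between(data, left, right):
    """Return the text between the first `left` and the next `right`.

    Returns an empty string if either delimiter is missing.
    """
    start = data.find(left)
    if start == -1:
        return ""
    start += len(left)          # skip past the left delimiter
    end = data.find(right, start)
    if end == -1:
        return ""
    return data[start:end]


# Hypothetical input in which "L" sits between the two delimiters.
print(cut_between("&bbbL&AA", "&bbb", "&AA"))  # L
```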