
Converting HTML Unicode (numeric character reference) encoding

Text written as "&#24038;&#36793;", where each character starts with &#, uses HTML Unicode encoding (numeric character references). It can be decoded as follows:
s = "&#24038;&#36793;"   # decodes to "左边" ("left side")
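import re

# Decimal (&#24038;) or hexadecimal (&#x5DE6;) numeric character reference
_ncr = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')

def to_str(s):
    # Hex digits land in group 1, decimal digits in group 2; Python 3 str is already Unicode
    return _ncr.sub(lambda m: chr(int(m.group(1), 16) if m.group(1) else int(m.group(2))), s)

print(to_str(s))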
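The reverse direction, turning text back into numeric character references, can be sketched the same way. This is a minimal example, assuming only non-ASCII characters need encoding; `to_ncr` is a hypothetical helper name, not part of the original snippet.

```python
def to_ncr(s: str) -> str:
    """Hypothetical helper: encode every non-ASCII character as &#N; (decimal)."""
    # Characters below U+0080 are kept as-is; all others become &#<code point>;
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in s)


print(to_ncr("左边"))  # → &#24038;&#36793;
```

Note that this sketch leaves ASCII markup characters such as `<` untouched; escaping those is a separate step.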


Related articles:

A Brief History of HTML

HTML is the common language of the Web: simple tags enclosed in angle brackets make up the Web as we know it today. In 1991, Tim Berners-Lee
wrote a document called "HTML Tags" that listed about 20 HTML tags for marking up web pages. He borrowed SGML's
markup format directly, and that became the HTML markup format we see today. This article traces the brief history of HTML, the Web's markup language.
......

Displaying special characters in HTML


HTML Character Entities
Some characters have a special meaning in HTML. For example, the less-than sign < marks the start of an HTML tag, so it never appears in the rendered page. What if we actually want a less-than sign to show up on the page?
This is where HTML character entities come in.
A character entity consists of three parts: the first part ......
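To see entities in action, Python's standard `html` module can replace the special characters with entities and turn them back. A minimal sketch; the sample string is made up for illustration.

```python
import html

text = "if a < b && c > d"
escaped = html.escape(text)  # replaces & < > " ' with character entities
print(escaped)  # → if a &lt; b &amp;&amp; c &gt; d

# unescape() reverses the conversion
print(html.unescape(escaped) == text)  # → True
```

A browser shows `&lt;` as a literal `<` instead of treating it as the start of a tag.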

Fetching the HTML source of a URL

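        // WebClient downloads the raw bytes of the page at the given URL
        System.Net.WebClient wc = new System.Net.WebClient();
        Byte[] pageData = wc.DownloadData("http://www");
        // Encoding.Default is platform-dependent; decode with the page's real charset if it differs
        string s = System.Text.Encoding.Default.GetString(pageData); ......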
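Decoding with a fixed encoding can garble pages served in another charset. A rough Python counterpart is sketched below, assuming the server declares the charset in its Content-Type header and falling back to UTF-8 otherwise (the fallback is an assumption). `fetch_html` and `decode_page` are hypothetical names.

```python
from typing import Optional
from urllib.request import urlopen


def decode_page(data: bytes, charset: Optional[str] = None) -> str:
    """Decode page bytes; fall back to UTF-8 when no charset was declared (assumption)."""
    return data.decode(charset or "utf-8", errors="replace")


def fetch_html(url: str) -> str:
    """Download a page and decode it using the charset from the response headers."""
    with urlopen(url) as resp:
        # resp.headers is an email.message.Message subclass
        return decode_page(resp.read(), resp.headers.get_content_charset())
```

Keeping the decoding step separate from the network call makes it easy to test without a connection.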

Extracting the text between two delimiters in HTML

Suppose a string has the form str = "&bbbLAA";
and you want to extract "L". You can do it like this:
//sDataStr = "&bbbLAA";
//sLeftQuote = "&bbb";
//sRightQuote = "&AA";
Calling this function returns the "L" part.
function abCutString( sDataStr, sLeftQuote, sRightQuote)
{
 var sReturnVal = '';
 var nStart ......
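The body of abCutString is cut off above, so the following is not a reconstruction of it. It is a hypothetical Python sketch of the same idea, assuming the function returns the text after the first left delimiter up to the next right delimiter, and an empty string when either is missing. The example uses "AA" as the right delimiter because "&AA" does not occur in the sample string.

```python
def cut_string(data: str, left: str, right: str) -> str:
    """Hypothetical helper: text between the first `left` and the following `right`."""
    start = data.find(left)
    if start == -1:
        return ""
    start += len(left)  # skip past the left delimiter
    end = data.find(right, start)  # look for the right delimiter only after it
    if end == -1:
        return ""
    return data[start:end]


print(cut_string("&bbbLAA", "&bbb", "AA"))  # → L
```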

A fairly accurate regular expression for finding HTML tags

Dim objReg,objMatches,objMatch
Set objReg=new RegExp
objReg.Global=True
objReg.IgnoreCase=True
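' "<", then quoted values or characters other than quotes and ">", matched lazily up to ">"
objReg.Pattern="<('[^']*'|""[^""]*""|[^'"">])*?>"
Set objMatches=objReg.Execute(strSource)  ' strSource: the string to search
For Each objMatch In objMatches
    ' objMatch.Value holds the HTML tag that was found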
Next
Set objMatches=Nothing
Set objRe ......
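The same pattern carries over to Python's `re` module. In VBScript `""` inside a string literal stands for one quote character, so the actual regex is `<('[^']*'|"[^"]*"|[^'">])*?>`. This sketch makes the group non-capturing so that `findall` returns whole tags; `find_tags` and the sample input are illustrative.

```python
import re

# "<", then any mix of '...' values, "..." values, or characters other than
# quotes and ">", matched lazily up to the first ">" outside quotes
TAG_RE = re.compile(r"""<(?:'[^']*'|"[^"]*"|[^'">])*?>""", re.IGNORECASE)


def find_tags(html_text):
    """Return every tag found in html_text, in order of appearance."""
    return TAG_RE.findall(html_text)


print(find_tags('<a href="x>y">link</a>'))  # → ['<a href="x>y">', '</a>']
```

Treating quoted attribute values as a unit is what keeps the `>` inside `"x>y"` from ending the tag early.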
© 2009 ej38.com All Rights Reserved. About E健网 | Contact Us | Site Map | 赣ICP备09004571号