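Using the Java HTML Parser Library
A recent project required me to study the Java HTML Parser class library. These notes record the key points of using it.
Main classes:
1. Parser class
The main parser class. It loads HTML source and parses it.
2. Node interface
Represents a syntactic unit produced during parsing. Take the following HTML fragment as an example: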
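<span>      ---- Tag node
text        ---- Text node
</span>
The text and the tags are separate node elements. The text "text" is a child node of the span tag.
3. NodeFilter
The node filter interface, used to select a particular kind of node from a Parser or a NodeList.
4. NodeList
A data structure that represents a collection of Node objects.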
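To make the relationship between nodes, filters and node collections concrete, here is a minimal self-contained sketch. It does not use the library: SimpleNode, SimpleNodes and NodeModelDemo are hypothetical stand-ins invented for illustration, and java.util.function.Predicate plays the role of a node filter. It only shows that a text node is a child of its tag node and that a filter picks matching nodes out of a collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified node: either a tag (with children) or a text node.
class SimpleNode {
    final String name;   // tag name, or null for a text node
    final String text;   // text content, or null for a tag node
    final List<SimpleNode> children = new ArrayList<>();

    private SimpleNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    static SimpleNode ofTag(String name, SimpleNode... kids) {
        SimpleNode n = new SimpleNode(name, null);
        n.children.addAll(Arrays.asList(kids));
        return n;
    }

    static SimpleNode ofText(String text) {
        return new SimpleNode(null, text);
    }

    boolean isText() {
        return text != null;
    }
}

// Hypothetical helper standing in for filtering a node collection.
class SimpleNodes {
    // Depth-first collection of every node accepted by the filter.
    static List<SimpleNode> collect(List<SimpleNode> nodes, Predicate<SimpleNode> filter) {
        List<SimpleNode> out = new ArrayList<>();
        for (SimpleNode n : nodes) {
            if (filter.test(n)) {
                out.add(n);
            }
            out.addAll(collect(n.children, filter));
        }
        return out;
    }
}

class NodeModelDemo {
    public static void main(String[] args) {
        // Models <span>text</span>: one tag node with one text child.
        SimpleNode span = SimpleNode.ofTag("span", SimpleNode.ofText("text"));
        List<SimpleNode> roots = Arrays.asList(span);

        List<SimpleNode> texts = SimpleNodes.collect(roots, SimpleNode::isText);
        System.out.println(texts.size() + " text node: " + texts.get(0).text);

        List<SimpleNode> tags = SimpleNodes.collect(roots, n -> !n.isText());
        System.out.println(tags.size() + " tag node: " + tags.get(0).name);
    }
}
```

The real library's Node, NodeFilter and NodeList carry much more (positions, attributes, end tags); the sketch keeps only the parent/child and filter ideas.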
Points that need special attention:
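Parser and NodeList both provide a method named extractAllNodesThatMatch(NodeFilter filter) that returns the nodes matching a condition, but the two are implemented differently.
Parser implements it on top of the parsing machinery, using an iterator. After each call you need to call reset(); otherwise the result of the next call is affected.
NodeList simply loops over its internal array and tests each node, so successive calls do not interfere with one another. It is also more efficient than the Parser version, so it is the recommended choice.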
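The difference can be illustrated with a small self-contained model. This is only a sketch under assumptions, not the library's code: OneShotSource, ArrayBackedList and ExtractionDemo are hypothetical names, and plain strings stand in for nodes. OneShotSource keeps an iterator position between calls, the way an iterator-driven extraction does, while ArrayBackedList loops over a stored array every time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for an iterator-driven source:
// it walks its input once and remembers where it stopped.
class OneShotSource {
    private final List<String> items;
    private Iterator<String> cursor;

    OneShotSource(List<String> items) {
        this.items = new ArrayList<>(items);
        this.cursor = this.items.iterator();
    }

    // Consumes whatever is left of the cursor.
    List<String> extract(Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        while (cursor.hasNext()) {
            String item = cursor.next();
            if (filter.test(item)) {
                out.add(item);
            }
        }
        return out;
    }

    // Rewinds to the start, analogous to calling reset() between extractions.
    void reset() {
        cursor = items.iterator();
    }
}

// Hypothetical stand-in for an array-backed collection:
// every call loops over the full array, so calls are independent.
class ArrayBackedList {
    private final String[] items;

    ArrayBackedList(List<String> items) {
        this.items = items.toArray(new String[0]);
    }

    List<String> extract(Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String item : items) {
            if (filter.test(item)) {
                out.add(item);
            }
        }
        return out;
    }
}

class ExtractionDemo {
    public static void main(String[] args) {
        List<String> tags = Arrays.asList("div", "span", "div");

        OneShotSource source = new OneShotSource(tags);
        System.out.println(source.extract("div"::equals));  // [div, div]
        System.out.println(source.extract("span"::equals)); // [] because the cursor is exhausted
        source.reset();
        System.out.println(source.extract("span"::equals)); // [span]

        ArrayBackedList list = new ArrayBackedList(tags);
        System.out.println(list.extract("div"::equals));    // [div, div]
        System.out.println(list.extract("span"::equals));   // [span]
    }
}
```

In this model the second extraction on OneShotSource comes back empty until reset() is called, while ArrayBackedList gives the right answer on every call with no extra step.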
Code example:
Implementing getElementById functionality
<code>
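// Imports assume the org.htmlparser (HTML Parser) library.
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Tag;

// Accepts start tags whose "id" attribute equals the given id.
public class NodeIDFilter implements NodeFilter {
    private final String id;

    public NodeIDFilter(String id)
    {
        this.id = id;
    }

    public boolean accept(Node node) {
        // Only tags carry attributes; other node types are rejected.
        if (node instanceof Tag)
        {
            Tag tag = (Tag) node;
            // Ignore end tags such as </span>.
            if (!tag.isEndTag())
            {
                String s = tag.getAttribute("id");
                // s is null when the tag has no id attribute.
                if (s != null)
                    return s.equals(this.id);
            }
        }
        return false;
    }
}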
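// Imports assume the org.htmlparser (HTML Parser) library.
import org.htmlparser.Node;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class MHTMLParser
{
    ....
    // Returns the first node whose id attribute equals id, or null if there is none.
    // mNodeList is presumably a NodeList field declared in the elided part above.
    protected Node getElementById(String id) throws ParserException
    {
        // No parser reset() is needed here: the search runs on the NodeList.
        if (this.mNodeList == null || this.mNodeList.size() == 0) return null;
        NodeIDFilter nodef = new NodeIDFilter(id);
        // true = also search child nodes recursively.
        NodeList nl = this.mNodeList.extractAllNodesThatMatch(nodef, true);
        if (nl.size() != 0)
        {
            return nl.elementAt(0);
        }
        return null;
    }
}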
</code>