Indexing HTML Documents with Lucene
深未来技术
1. Most web documents are in HTML format.
2. This example uses the following HTML document:
<html>
<head>
<title>
Laptop power supplies are available in First class only
</title>
</head>
<body>
<h1>code,write,fly</h1>
</body>
</html>
3. Using JTidy
JTidy is the Java port of Tidy, written by Andy Quick.
import java.io.InputStream;
import org.apache.lucene.document.Field;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

// DocumentHandler and DocumentHandlerException are assumed to be defined elsewhere in the project.
public class JTidyHTMLHandler implements DocumentHandler {

    // Takes an InputStream that represents the HTML document.
    public org.apache.lucene.document.Document getDocument(InputStream is)
            throws DocumentHandlerException {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        // Parse the InputStream that represents the HTML document.
        org.w3c.dom.Document root = tidy.parseDOM(is, null);
        Element rawDoc = root.getDocumentElement();
        org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
        String title = getTitle(rawDoc); // get the title
        String body = getBody(rawDoc);   // get all elements between <body> and </body>
        // Only add fields that actually have content.
        if ((title != null) && (!title.equals(""))) {
            doc.add(Field.Text("title", title));
        }
        if ((body != null) && (!body.equals(""))) {
            doc.add(Field.Text("body", body));
        }
        return doc;
    }

    protected String getTitle(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }
        String title = "";
        NodeList children = rawDoc.getElementsB
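
The handler above depends on helper methods that walk the parsed DOM tree to pull out the title and the body text. The following is only a minimal sketch of that kind of traversal, not the original author's code. It uses the XML parser bundled with the JDK as a stand-in for `Tidy.parseDOM`, so it accepts only well-formed markup, and it does not touch Lucene at all. The class name `DomTextExtractor` and the methods `extractTitle`, `extractBody`, `textOfFirst` and `collectText` are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class: extracts title and body text from a DOM tree.
class DomTextExtractor {

    // The sample page from this article, written as well-formed markup.
    static final String SAMPLE =
        "<html><head><title> Laptop power supplies are available in First class only </title></head>"
        + "<body><h1>code,write,fly</h1></body></html>";

    // Assumption: the JDK parser stands in for JTidy, so the input must be well-formed.
    static Element parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    // Trimmed text of the <title> element, or "" if there is none.
    static String extractTitle(String markup) {
        return textOfFirst(parse(markup), "title");
    }

    // Trimmed text of everything between <body> and </body>, or "" if there is none.
    static String extractBody(String markup) {
        return textOfFirst(parse(markup), "body");
    }

    // Finds the first descendant element with the given tag name and returns its text.
    static String textOfFirst(Element root, String tagName) {
        if (root == null) {
            return "";
        }
        NodeList matches = root.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return "";
        }
        return collectText(matches.item(0)).trim();
    }

    // Recursively concatenates text nodes; nested elements are padded with
    // spaces so that text from adjacent elements does not run together.
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(' ').append(collectText(child)).append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("title: " + extractTitle(SAMPLE));
        System.out.println("body:  " + extractBody(SAMPLE));
    }
}
```

In a handler like the one above, the two extracted strings would be the values passed to the "title" and "body" fields, and the empty-string result for a missing element matches the handler's check that skips empty fields.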