
Lucene: Indexing HTML Documents  深未来技术


1. Most web documents are in HTML format.
2. This example uses the following HTML document:
<html>
   <head>
      <title>
         Laptop power supplies are available in First class only
       </title>
   </head>
    <body>
       <h1>code,write,fly</h1>
   </body>
</html>
3. Using JTidy
JTidy is the Java version of Tidy, written by Andy Quick.
import java.io.InputStream;
import org.apache.lucene.document.Field;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

public class JTidyHTMLHandler implements DocumentHandler {
    // Takes an InputStream representing the HTML document
    public org.apache.lucene.document.Document getDocument(InputStream is)
            throws DocumentHandlerException {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        // Parse the InputStream representing the HTML document
        org.w3c.dom.Document root = tidy.parseDOM(is, null);
        Element rawDoc = root.getDocumentElement();

        org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
        String title = getTitle(rawDoc); // get the title
        String body = getBody(rawDoc);   // get all elements between <body> and </body>
        if ((title != null) && (!title.equals(""))) {
            doc.add(Field.Text("title", title));
        }
        if ((body != null) && (!body.equals(""))) {
            doc.add(Field.Text("body", body));
        }
        return doc;
    }

    protected String getTitle(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }

        String title = "";
        NodeList children = rawDoc.getElementsB
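
The handler above breaks off mid-method, so as a rough illustration of the title/body lookup it describes, here is a minimal sketch that uses only the JDK's DOM parser in place of JTidy. It assumes the markup is already well-formed (cleaning up markup that is not is JTidy's job), and the class name `HtmlFieldExtractor` and both method names are hypothetical, not taken from the original article.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class HtmlFieldExtractor {

    // Assumption: the input is well-formed, so the JDK parser can stand in
    // for JTidy's parseDOM. Parse failures are rethrown unchecked.
    static Element parseRoot(InputStream is) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(is);
            return dom.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("markup could not be parsed", e);
        }
    }

    // Returns null for a null root, "" when no such element exists,
    // otherwise the trimmed text content of the first matching element.
    static String firstElementText(Element root, String tagName) {
        if (root == null) {
            return null;
        }
        NodeList nodes = root.getElementsByTagName(tagName);
        if (nodes.getLength() == 0) {
            return "";
        }
        return nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title> Laptop power supplies are available"
                + " in First class only </title></head>"
                + "<body><h1>code,write,fly</h1></body></html>";
        Element root = parseRoot(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println("title: " + firstElementText(root, "title"));
        System.out.println("body:  " + firstElementText(root, "body"));
    }
}
```

The two strings returned here correspond to the values the handler would pass to `Field.Text("title", ...)` and `Field.Text("body", ...)`; real-world HTML is rarely well-formed, which is why the article routes the stream through JTidy first.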


Related documents:

HTML relative paths: how to write parent and child directories


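How to refer to a parent directory
../ refers to the parent of the directory containing the source file, ../../ refers to the directory two levels up, and so on.
Suppose the path of info.html is: c:\Inetpub\wwwroot\sites\blabla\info.html
Suppose the path of index.html is: c:\Inetpub\wwwroot\sites\index.html
The code for a hyperlink to index.html inside info.html should be written like this: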
<a href ......
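
To see how `../` segments resolve, the sketch below applies `java.net.URI.resolve` to a hypothetical page URL (`http://example.com/docs/guide/page.html`). The class and method names are made up for illustration; browsers resolve a relative `href` against the containing page in much the same way.

```java
import java.net.URI;

class RelativeLinkDemo {

    // Resolves a relative href against the URL of the page that contains it.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, not taken from the original text.
        String page = "http://example.com/docs/guide/page.html";
        System.out.println(resolve(page, "other.html"));       // same directory
        System.out.println(resolve(page, "../other.html"));    // one level up
        System.out.println(resolve(page, "../../other.html")); // two levels up
    }
}
```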

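How to access frame objects in an HTML page

If you use the WebBrowser's document to get at an IFrame object, you will always end up with an access denied error. Why?
I looked around, and reportedly it is because cross-domain access is not allowed...
Then one day I tried something: the dispatch returned in the OnDocumentComplete event is the iframe's IHTMLWindow2 object. From there you can operate on every object, and after that you can do whatever you like. I don't know whether, if you save this I ......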

A small example of using JS in HTML to show a hidden div that imitates a dialog box

<html>
<head>
    <script>  
  function   locking(){  
  document.all.ly.style.display="block";  
  document.all.ly.style.width=document.body.clientWidth;  
  document.all.ly.style.height ......

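[HTML Editor] An HTML editor written in C#: the principles

Author: 光脚丫思考  Date: 12/23/2009 1:51:00 PM
At first I thought an HTML editor had to be something deeply arcane, and that casually putting one together would not be easy. Later I learned a little about the WebBrowser control, though only superficially. All I knew was that this control lets you build something like a web browser into your own program; it never occurred to me that an HTML editor could also be implemented with this control ......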

Simple HTML: fonts and colors (reposted)


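Font tags
Heading fonts (Header)
<h#> ... </h#> #=1, 2, 3, 4, 5, 6
<h1>The weather is really nice today!</h1> The weather is really nice today!
<h2>The weather is really nice today!</h2> The weather is really nice today!
<h3>The weather is really nice today!</h3> The weather is really nice today!
<h4>The weather is really nice today!</h4> The weather is really nice today!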
&l ......
© 2009 ej38.com All Rights Reserved. 赣ICP备09004571号