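How to remove redundant code when converting Word to HTML
I have tens of thousands of HTML files that were converted from Word, but these HTML files have grown from 100-odd KB as .doc files to several MB, or even tens of MB.
It turns out the conversion to HTML generated a large amount of redundant code. Is there any way to strip out this junk?
I need program code.
I had run out of points a moment ago, but now I have some again, so I can add points.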
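// Requires: using System.Collections.Specialized; using System.Text.RegularExpressions;
/// <summary>
/// Cleans up the redundant HTML generated by Word.
/// </summary>
/// <param name="html">The HTML markup exported from Word.</param>
/// <returns>The markup with every pattern below removed.</returns>
public static string CleanWordHtml(string html)
{
    StringCollection sc = new StringCollection();
    // get rid of unnecessary tag spans (comments and title)
    sc.Add(@"<!--(\w|\W)+?-->");
    sc.Add(@"<title>(\w|\W)+?</title>");
    // Get rid of classes and styles
    sc.Add(@"\s?class=\w+");
    sc.Add(@"\s+style='[^']+'");
    // Get rid of unnecessary tags
    //sc.Add(@"
    // (The post is cut off at the line above; the remaining patterns are not shown.)
    // Assumed completion, not part of the original post: apply each pattern and return.
    foreach (string pattern in sc)
    {
        html = Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }
    return html;
}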
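Since the question is about tens of thousands of files, here is a rough sketch of how the cleanup might be run over a whole folder. Everything named here is made up for illustration: the class `WordHtmlBatch`, the methods `Clean` and `CleanDirectory`, and the `.clean.html` output suffix are not from the post. It uses only the four patterns visible above (the tag-stripping pattern is cut off in the post and is not reconstructed), and it assumes the caller knows the encoding the files were saved in.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class WordHtmlBatch
{
    // Only the four patterns visible in the post; the truncated
    // "unnecessary tags" pattern is deliberately left out.
    private static readonly string[] Patterns =
    {
        @"<!--(\w|\W)+?-->",          // comments
        @"<title>(\w|\W)+?</title>",  // title element
        @"\s?class=\w+",              // unquoted class attributes
        @"\s+style='[^']+'",          // single-quoted style attributes
    };

    // Removes every match of every pattern, ignoring case.
    public static string Clean(string html)
    {
        foreach (string pattern in Patterns)
        {
            html = Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
        }
        return html;
    }

    // Cleans every *.html file under root and writes the result next to the
    // original as "<name>.clean.html". Returns the number of files processed.
    public static int CleanDirectory(string root, Encoding encoding)
    {
        int count = 0;
        foreach (string path in Directory.EnumerateFiles(root, "*.html", SearchOption.AllDirectories))
        {
            // Skip files produced by this method (now or in an earlier run).
            if (path.EndsWith(".clean.html", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }
            string cleaned = Clean(File.ReadAllText(path, encoding));
            File.WriteAllText(Path.ChangeExtension(path, ".clean.html"), cleaned, encoding);
            count++;
        }
        return count;
    }
}
```

A call might look like `WordHtmlBatch.CleanDirectory(@"D:\converted", Encoding.Default)`, where the folder is a placeholder. Writing to a separate file keeps the originals intact, so the output can be spot-checked before anything is overwritten.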
Related Q&A:
My Java applet does not display after I embed it in HTML, yet it displays fine in appletviewer. Why? It also displays after I run the page through HTML Converter. Why is that? Any expert advice would be appreciated!
HTML code:
<HTML>
<HEAD>
<TITLE>TEST.HTML< ......
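I am building a website in ASP and want to generate HTML pages. Once they are generated, how do I call them?
For example: the page I currently call is http://192.168.0.100/jdasp/x.asp?cnmai=1795 , and it generates the file x1795.html,
how do I call x ......
Phones can open .html websites, so why still build a WAP site? What is the benefit of browsing WAP sites on a phone?
The WAP site our company built is just HTML.
Following.
Many low-end phones can still only display WML, and WML was designed specifically as a page-display language for phones ......
The function below can read an HTM file saved from a web page, but it cannot read the web page directly. Why?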
BOOL GetSourceHtml(CString theUrl,CString Filename)
{
CInternetSess ......
How do I get both a superscript and a subscript in HTML?
Reference:
HTML special tags: superscript, subscript, underline, strikethrough, etc. http://www.cnblogs.com/7788/archive/2009/08/25/1553757.html