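Common Escape Characters in XML and HTML
XML and HTML both give certain characters a special meaning, so those characters cannot always be written literally. Where you need one of them as plain text, use its escape sequence (entity reference) instead.
If you use a character such as "<" in the text of an XML document, the parser reports an error, because it treats the "<" as the start of a new element.
So you should not write code like this: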
<message>if salary < 1000 then</message>
To avoid this, replace the character "<" with "&lt;", like this:
<message>if salary &lt; 1000 then</message>
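To see this behavior directly, you can give both versions to a real XML parser. The sketch below uses Python's standard-library `xml.etree.ElementTree`, which is just a convenient choice for illustration and is not used in the original text. `try_parse` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def try_parse(doc: str) -> str:
    """Return the text of the root element, or a short error note if the XML is malformed."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


bad = "<message>if salary < 1000 then</message>"
good = "<message>if salary &lt; 1000 then</message>"

print(try_parse(bad))   # reports a ParseError: the bare "<" looks like the start of a tag
print(try_parse(good))  # if salary < 1000 then
```

The parser decodes the entity on the way in, so the application still sees a literal "<" in the message text.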
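Common XML escape characters:
Character   Escape    Description
&           &amp;     ampersand
<           &lt;      less-than sign
>           &gt;      greater-than sign
"           &quot;    double quotation mark
'           &apos;    apostrophe (single quote)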
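The five entities above are easy to apply mechanically. The minimal Python sketch below uses the standard library's `xml.sax.saxutils.escape`/`unescape`. Those functions handle `&`, `<` and `>` by default, and the extra mapping adds the two quote entities. `xml_escape` and `xml_unescape` are hypothetical helper names.

```python
from xml.sax.saxutils import escape, unescape

# escape() always handles &, < and >; this mapping adds the two quote entities
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def xml_escape(text: str) -> str:
    """Replace all five predefined XML special characters with their entities."""
    return escape(text, QUOTE_ENTITIES)


def xml_unescape(text: str) -> str:
    """Reverse xml_escape(); saxutils decodes &amp; last so '&amp;lt;' round-trips."""
    return unescape(text, {v: k for k, v in QUOTE_ENTITIES.items()})


s = """Tom & Jerry said "a < b" and 'b > a'"""
encoded = xml_escape(s)
print(encoded)  # Tom &amp; Jerry said &quot;a &lt; b&quot; and &apos;b &gt; a&apos;
assert xml_unescape(encoded) == s
```

The ordering matters. `&` must be escaped first, because escaping it afterwards would turn an already-produced `&lt;` into `&amp;lt;`. The library functions take care of that ordering for you.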
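In HTML, the characters <, > and & have special meanings: the first two delimit tags, and & begins an escape sequence. They therefore cannot be used literally in text. When you need any of these three characters, use its escape sequence instead.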
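Common HTML escape characters:
Character   Escape    Description
&           &amp;     ampersand
<           &lt;      less-than sign
>           &gt;      greater-than sign
"           &quot;    double quotation mark
(space)     &nbsp;    space (non-breaking)
©           &copy;    copyright sign
®           &reg;     registered trademark sign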
T ......
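To check how the HTML entities behave in practice, Python's standard `html` module is a convenient tool. It is an illustration choice only. `html.escape` rewrites `&`, `<` and `>`, and with the default `quote=True` it also escapes both quote characters, emitting `&#x27;` for the single quote. `html.unescape` understands the named entities listed in the table.

```python
import html

# html.escape() rewrites &, < and >, and (with the default quote=True) both quote characters too
snippet = '<a href="x">Tom & Jerry</a>'
escaped = html.escape(snippet)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# html.unescape() knows named entities such as &copy;, &reg; and &nbsp;
decoded = html.unescape("&copy; 2010 &reg; A&nbsp;B")
print(repr(decoded))  # '© 2010 ® A\xa0B'
assert decoded[-2] == "\u00a0"  # &nbsp; is U+00A0, not an ordinary space
```

Note the last line: `&nbsp;` decodes to the non-breaking space U+00A0. It is a different character from the ordinary space, so code that splits or trims on `" "` will not treat it as whitespace.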
Author: Cui Kang
Published: May 13, 2010, 10:14 PM
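As Web 2.0 technology keeps developing, front-end web optimization is attracting more and more attention. JavaScript and CSS optimization in particular have long been hot topics with a relatively rich set of tools, while HTML optimization has been somewhat neglected. Recently Miller (chenminliang), an engineer on Baidu's Pan-User-Experience team, wrote an article that stresses the importance of HTML optimization and describes related techniques.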
MillerÊ×ÏȾÙÀý˵Ã÷ÁËHTMLÓÅ»¯ÉÔÏÔºöÂÔµÄÊÂʵ£º
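Steve Souders' excellent book "Even Faster Web Sites" covers a great many effective front-end optimization techniques, such as JavaScript loading, CSS selectors, image optimization, gzip and iframe issues. HTML optimization is the one topic it does not discuss in detail.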
He stresses that HTML optimization may look minor but should not be ignored:
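HTML is an indispensable part of the front end, and it is the part that actually presents the "front end" to the user. JavaScript files often weigh in at a dozen KB or more, whereas HTML usually makes up only a small share of a page's resources, and Gzip shrinks it further. In practice, though, most pages still have considerable room for compression, and even after Gzip their size can be reduced appreciably...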
In the article, Miller summarizes a range of HTML optimization methods and divides them into two categories. The first is green rules, which apply to all kinds of pages ......
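// Requires: using System; using System.IO; (method of a System.Web.UI.Page subclass)
protected override void OnPreInit(EventArgs e)
{
    base.OnPreInit(e);
    // Physical path of the static HomePage.htm file
    string path = Server.MapPath("HomePage.htm");
    if (File.Exists(path))
    {
        DateTime lastUpdatedTime = File.GetLastWriteTime(path);
        // Treat the file as fresh if it was written within the last two hours
        if ((DateTime.Now - lastUpdatedTime) <= TimeSpan.FromHours(2))
        {
            ......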
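The handler above checks whether HomePage.htm exists and whether it was written within the last two hours. This is presumably so that a cached static copy can be reused. What the handler does once the file is fresh is cut off in the source and is not reproduced here. Below is a minimal Python sketch of just the freshness check. `static_copy_is_fresh` and `MAX_AGE_SECONDS` are hypothetical names, and the check uses the file's modification time, as `File.GetLastWriteTime` does.

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 2 * 60 * 60  # two hours, the same window as the C# handler


def static_copy_is_fresh(path: str, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Return True if `path` exists and was last modified within `max_age` seconds."""
    if not os.path.exists(path):
        return False
    age = time.time() - os.path.getmtime(path)
    return age <= max_age


# Demo: a newly written file is fresh; backdating its mtime by three hours makes it stale
with tempfile.NamedTemporaryFile(suffix=".htm", delete=False) as f:
    demo_path = f.name
fresh_now = static_copy_is_fresh(demo_path)
three_hours_ago = time.time() - 3 * 60 * 60
os.utime(demo_path, (three_hours_ago, three_hours_ago))
stale_later = not static_copy_is_fresh(demo_path)
os.remove(demo_path)
print(fresh_now, stale_later)  # True True
```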
Ridge border (10px wide, red)
Code:
<fieldset style="border:10px ridge #FF0000; padding:2px; width:500px">
<legend>Group box</legend>
</fieldset>
Groove border
Code:
<fieldset style="border:10px groove #FF0000; padding:2px; width:500px">
<legend>Group box</legend>
</fieldset>
Inset border
Code:
<fieldset style="border:10px inset #FF0000; padding:2px; width:500px">
<legend>Group box</legend>
</fieldset>
Outset border
Code:
<fieldset style="border:10px outset #FF0000; padding:2px; width:500px">
<legend>Group box</legend>
</fieldset>
Solid border
Code:
<fieldset style="border:10px solid #FF0000; padding:2px; width:500px">
<legend>Group box</legend>
</fieldset>
......
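Original post: http://bbs.chinaunix.net/viewthread.php?tid=1316204
A couple of days ago I was looking into analyzing web pages with the HTML::TreeBuilder module. I came across an article about it, translated it while I was at it, and am posting it here to share. My writing is not great and my English is limited, so please bear with me.
Original article: http://www.perl.com/pub/a/2006/01/19/analyzing_html.html?page=1
Some background: the author teaches a web page editing course and gives students assignments in which they build pages with Nvu, each with specific requirements. Grading and annotating the students' work was a chore, so he decided to analyze it with a Perl program.
Perl's regular expressions are already excellent at text processing, and Perl also has a dedicated module for parsing web pages, HTML::TreeBuilder. It provides an HTML parser that builds a tree of elements from a web page, and creating a tree from a page and filling in its contents is very easy: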
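use HTML::TreeBuilder;

# Create a new tree
my $tree = HTML::TreeBuilder->new;
# Populate the tree from an HTML file (parse_file signals end of input itself)
$tree->parse_file($file_name);

# Alternatively, build a tree from the contents of a variable;
# after parse() you must call eof() to signal the end of input
my $tree2 = HTML::TreeBuilder->new;
$tree2->parse($value);
$tree2->eof;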
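To make the idea of "building a tree of elements" from parser events concrete, here is a minimal Python sketch that uses the standard library's `html.parser` in place of HTML::TreeBuilder. This is a swapped-in technique for illustration, not the Perl module's API, and it is far less forgiving than HTML::TreeBuilder, which also repairs missing tags. `Node`, `SimpleTreeBuilder` and `build_tree` are hypothetical names. `feed()` and `close()` play roughly the roles of `parse()` and `eof()`.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML, so they never become "open" parents
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the tree: tag name, attributes, parent, and ordered children."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # child Node objects and text strings, in document order


class SimpleTreeBuilder(HTMLParser):
    """Turns the parser's start-tag/end-tag/text events into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # subsequent content belongs inside this element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name; ignore stray end tags
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between tags
            self.current.children.append(data)


def build_tree(markup):
    builder = SimpleTreeBuilder()
    builder.feed(markup)   # may be called repeatedly with chunks, like parse()
    builder.close()        # flush buffered input, the counterpart of eof()
    return builder.root


root = build_tree("<html><body><p class='x'>Hello<br>world</p></body></html>")
p = root.children[0].children[0].children[0]
print(p.tag, p.attrs, p.children[0])  # p {'class': 'x'} Hello
```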
Each node of the tree is an HTML::Element object, which provides many methods for accessing and manipulating the nodes. When you have finished with the tree, you can destroy it and free ......
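Original post: http://www.php-oa.com/2009/09/24/perl-html-tree-builder-xpath.html
Reposted here for further study.
Perl has an enormous number of powerful modules, so we don't need to reinvent the wheel. HTML::TreeBuilder::XPath is one of them: it lets you parse a web site as if it were XML and query it with XPath. I won't go into its usage in detail. Instead, here is an example that retrieves my site's ranking from alexa.com. Running it produces output like this: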
#perl test.pl
Your site's rank is: 199,954
HTML::TreeBuilder::XPath example
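#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder::XPath;
use Data::Dumper;

# Alexa site-info page for the site whose rank we want
my $url = "http://www.alexa.com/siteinfo/www.php-oa.com";
my $html = get($url) or die "Could not fetch $url\n";

# Build an XPath-capable tree from the downloaded HTML
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;
#$tree->dump;

my $srt;
# Every <div> under <body> whose class attribute matches /data down/
# (=~ is a regex-match extension, not standard XPath)
my $items = $tree->findnodes('/html/body/descendant::div[@class[.=~/data down/]]');
for my $item ($items->get_nodelist()) {
    eval {
        # Second item in the matching div's content list
        $srt = $item->content->[1];
    };
......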
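The core of the script above is the selection step: find every `<div>` below `<body>` whose class matches `/data down/`, then take its second child. Below is a network-free Python sketch of that logic. It uses the standard library's `xml.etree.ElementTree`, which only supports a limited subset of XPath and has no regex predicate, so the class test is done with `re` instead. The sample markup and the value in it are hypothetical. They are not the real Alexa page, whose layout this sketch does not attempt to reproduce. `divs_with_class_matching` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for the downloaded page.
# ElementTree is an XML parser and would reject most real-world HTML.
SAMPLE = """<html><body>
<div class="box"><span>ignored</span></div>
<div class="data down"><span>Rank</span><strong>12,345</strong></div>
</body></html>"""


def divs_with_class_matching(root: ET.Element, pattern: str):
    """Yield each <div> below <body> whose class attribute matches the regex `pattern`."""
    regex = re.compile(pattern)
    for div in root.findall("body//div"):
        if regex.search(div.get("class", "")):
            yield div


root = ET.fromstring(SAMPLE)
for div in divs_with_class_matching(root, r"data down"):
    # Second child element, roughly what $item->content->[1] picks out in the Perl script
    print(list(div)[1].text)  # 12,345
```

The two approaches differ in how they treat text. HTML::Element's content list can contain text nodes as well as elements, whereas `list(div)` yields only child elements. The "second child" indexes therefore agree only when no text sits between the tags, as in this sample.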