Parsing Web Page Content with Perl's HTML::TreeBuilder::XPath

Original article: http://www.php-oa.com/2009/09/24/perl-html-tree-builder-xpath.html
Reposted here to study at leisure.
Perl has a huge number of powerful modules, so we no longer need to reinvent the wheel. HTML::TreeBuilder::XPath is one of them: it lets you query a web page much as you would an XML document. I won't go into detail on usage; the example below fetches my site's ranking from alexa.com. It prints the following result:
#perl test.pl
Your site's rank is: 199,954
An HTML::TreeBuilder::XPath example
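#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder::XPath;

# Fetch the Alexa site-info page for the site to look up.
my $url  = "http://www.alexa.com/siteinfo/www.php-oa.com";
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;

# Build the parse tree from the downloaded HTML.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;
#$tree->dump;

# Select every div under body whose class attribute matches /data down/.
# The =~ test is a Perl-style regex match, an extension to standard XPath
# that this example relies on.
my $items = $tree->findnodes('/html/body/descendant::div[@class[.=~/data down/]]');
for my $item ( $items->get_nodelist() ) {
    # The rank is taken from the second child of the matched div; skip the div if it has none.
    my $rank = eval { $item->content->[1] };
    next unless defined $rank;
    print "Your site's rank is: $rank\n";
}
$tree->delete;    # free the tree when done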
The trickiest part of using the module is the XPath syntax itself. A brief introduction follows.
A brief introduction to XPath syntax
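XPath describes a path inside an XML document in much the same way as a path in a directory tree, with "/" separating one level from the next. The leading "/" stands for the document's root node (note that this is not the outermost tag, but the document itself). For an HTML file, for example, the outermost element is "/html".
Likewise, ".." and "." refer to the parent node and the current node respectively.
An XPath expression does not necessarily return a single node; it returns every node that matches. For example, in an HTML document "/html/head/script" selects all the script nodes inside head.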
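As a small sketch of the points above, the following script parses an inline HTML string and shows that one path can match several nodes and that ".." steps up to the parent. The sample markup and the file names a.js and b.js are made up for illustration, and the script assumes HTML::TreeBuilder::XPath is installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page; a.js and b.js are placeholder names.
my $html = <<'HTML';
<html>
  <head>
    <title>demo</title>
    <script src="a.js"></script>
    <script src="b.js"></script>
  </head>
  <body><p>hello</p></body>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# In list context findnodes returns every node that matches the path.
my @scripts      = $tree->findnodes('/html/head/script');
my $script_count = scalar @scripts;
print "matched $script_count script nodes\n";
print "  src = ", $_->attr('src'), "\n" for @scripts;

# ".." selects the parent of the context node, "." the node itself.
my ($parent) = $scripts[0]->findnodes('..');
my ($self)   = $scripts[0]->findnodes('.');
my $parent_tag = $parent->tag;
my $self_tag   = $self->tag;
print "parent tag: $parent_tag\n";
print "self tag:   $self_tag\n";

$tree->delete;    # free the tree when done
```

Calling findnodes on an element rather than on the tree evaluates a relative path with that element as the context node, which is why ".." here resolves against the first script element.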
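To narrow down the match, you often need to add filter conditions. A filter is written by enclosing the condition in "[" and "]". For example, in an HTML document "/html/body/div[@id='main']" selects the div node inside body whose id is main.
Here @id refers to the id attribute; in the same way you can use @name, @value, @href, @src, @class, and so on.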
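The filter and attribute syntax can be tried out with a short script like the one below. It is only a sketch: the sample markup, ids, classes and link targets are invented for the example, and it assumes HTML::TreeBuilder::XPath is installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page; the ids, classes and links are placeholders.
my $html = <<'HTML';
<html>
  <body>
    <div id="nav"><a href="/about" class="menu">About</a></div>
    <div id="main"><a href="/home" class="menu">Home</a></div>
  </body>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# [@id='main'] keeps only the div whose id attribute equals "main".
my @main       = $tree->findnodes(q{/html/body/div[@id='main']});
my $main_count = scalar @main;
my $main_id    = $main[0]->attr('id');

# A path may also end in an attribute step such as @href;
# findvalue gives the matched value as text.
my $href = $tree->findvalue(q{/html/body/div[@id='main']/a/@href});

print "divs matched: $main_count (id=$main_id)\n";
print "link target:  $href\n";

$tree->delete;    # free the tree when done
```

Without the filter, "/html/body/div" would match both div elements; the predicate is what reduces the result to the one of interest.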
The text() function returns the text contained in a node. For example, in <div>hello<p>world</p></div>, using "div[

