[Nutch] Single-machine search on Linux with the nutch 1.0 web front end

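Nutch's crawler and its searcher are effectively two separate parts: the crawl can run as M/R (MapReduce) jobs, but search is not an M/R job. There are two ways to search. The first is to keep the crawl data (also called the index data) on a local disk and search it there. The second is to search the crawl data in HDFS directly.
This article shows how to use the nutch-1.0 web front end to search local crawl data: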
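(1) Nutch search can run independently of the Hadoop cluster. Copy the crawled data to any machine, install Tomcat on that machine, run the web front end that ships with Nutch, and configure it accordingly; search then works.
(2) Put the data directory produced by the command bin/nutch crawl -dir data -depth 3 -topN 5 in a directory on the local disk. (For a distributed crawl, the command "bin/hadoop dfs -copyToLocal data <local directory>" copies the crawl data from HDFS to a local directory.) For example, copy the generated data directory to /home/nutch/nutchinstall/crawltest/. To be safe, make sure the directory path contains no spaces, since a space may cause problems.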
Note:
The data directory is the one the crawl generates. It contains these subdirectories: crawldb, index, indexes, linkdb, segments
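Before pointing the web front end at a copied crawl directory, it can help to confirm that the copy is complete. The following is a minimal Python sketch, not part of Nutch: the function name check_crawl_dir is hypothetical, and it only tests for the five subdirectories listed above and for a space in the path.

```python
import os
import tempfile

# Subdirectories the article lists under the crawl output directory.
EXPECTED_SUBDIRS = ("crawldb", "index", "indexes", "linkdb", "segments")


def check_crawl_dir(path):
    """Return a list of problems found with a local crawl directory."""
    problems = []
    # The article warns that a space in the path may break searching.
    if " " in os.path.abspath(path):
        problems.append("path contains a space")
    for name in EXPECTED_SUBDIRS:
        if not os.path.isdir(os.path.join(path, name)):
            problems.append("missing subdirectory: " + name)
    return problems


if __name__ == "__main__":
    # Build a throwaway directory that mimics the layout, then check it.
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data")
        for name in EXPECTED_SUBDIRS:
            os.makedirs(os.path.join(data, name))
        print(check_crawl_dir(data))
```

On a real machine the call would be check_crawl_dir("/home/nutch/nutchinstall/crawltest/data"); an empty list means none of the checked problems were found.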
(3) Install Tomcat. Make sure the installation path contains no spaces. This is important: on Windows, a space in the path caused the search to always return 0 results.
(4) Copy the web front end nutch-1.0.war from the Nutch home directory to /usr/program/apache-tomcat-6.0.18/webapps/ (the Apache Tomcat installation directory is /usr/program/apache-tomcat-6.0.18).
(5) Enter http://localhost:8080/nutch-1.0 in a browser; nutch-1.0.war is unpacked automatically.
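The copy in step (4) is a plain file copy, so it can be scripted. The sketch below is one way to do it in Python, under the assumption that Tomcat keeps its web applications in a webapps directory under its home, as in the paths above. The function name deploy_war is hypothetical, and the script does not start Tomcat or unpack the archive.

```python
import os
import shutil
import tempfile


def deploy_war(war_path, tomcat_home):
    """Copy a WAR file into <tomcat_home>/webapps and return the new path."""
    webapps = os.path.join(tomcat_home, "webapps")
    if not os.path.isdir(webapps):
        raise FileNotFoundError("no webapps directory under " + tomcat_home)
    # copy2 keeps the file's timestamps and returns the destination path.
    return shutil.copy2(war_path, webapps)


if __name__ == "__main__":
    # Demonstrate on a temporary stand-in for the Tomcat directory.
    with tempfile.TemporaryDirectory() as tmp:
        tomcat_home = os.path.join(tmp, "apache-tomcat-6.0.18")
        os.makedirs(os.path.join(tomcat_home, "webapps"))
        war = os.path.join(tmp, "nutch-1.0.war")
        with open(war, "wb") as f:
            f.write(b"placeholder")
        print(deploy_war(war, tomcat_home))
```

With the article's layout the call would be deploy_war("nutch-1.0.war", "/usr/program/apache-tomcat-6.0.18"), run from the Nutch home directory.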
(6) Configure the nutch-site.xml file of the web front end. After changing it you must restart Tomcat (run /usr/program/apache-tomcat-6.0.18/bin/shutdown.sh, then startup.sh).
nutch-site.xml is in the directory /usr/program/apache-tomcat-6.0.18/webapps/nutch-1.0/WEB-INF/classes/.
Configure it as follows:
<property>
  <name>http.agent.name</name>   <!-- required; without it the search returns no results -->
  <value>nutch-1.0</value>
  <description>HTTP 'User-Agent' request header.</description>
</property>
<property>
  <name>http.robots.agents</name>
  <value>nutch-1.0,*</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should
  put the value of http.agent.name as the first agent name, and keep the
  default * at the end of the li
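
Because the article stresses that http.agent.name must be set, a quick check of the deployed nutch-site.xml can catch a missing value before Tomcat is restarted. This is a minimal Python sketch using only the standard library. The function names are hypothetical, and the <configuration> wrapper in the sample is an assumption based on the usual Hadoop/Nutch file layout, since only <property> fragments appear in the listing.

```python
import xml.etree.ElementTree as ET

# Sample input. The <configuration> root is assumed; the two properties
# and their values are the ones shown in the article.
SAMPLE = """<configuration>
  <property>
    <name>http.agent.name</name>
    <value>nutch-1.0</value>
  </property>
  <property>
    <name>http.robots.agents</name>
    <value>nutch-1.0,*</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Map each <property> name to its value (empty string if no value)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def missing_agent_name(props):
    """True when http.agent.name is absent or blank."""
    return not props.get("http.agent.name")


if __name__ == "__main__":
    props = read_properties(SAMPLE)
    print(props)
    print("agent name missing:", missing_agent_name(props))
```

To check a real file, read its text with open(path, encoding="utf-8").read() and pass the result to read_properties.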


© 2009 ej38.com All Rights Reserved.