Talking Python (4): Welcome, Little Sparrow
С°×ÊǸö΢ÈíÃÔ£¬ËûµÄżÏñÊDZȶû´óÊ壬ÔÒòµ±È»ÊǵØÇòÈ˶¼ÖªµÀÀ²¡£´ó¶þµÄʱºò£¬ËûµÄ“ê¡Ñ§¼Æ»®”ÔøÒ»¶ÈµÃ³Ñ£¬ÔÒòÊÇËû¹Ò¿ÆÌ«¶à¡£µ±È»£¬´óÈýÐÂѧÆÚ¿ªÊ¼µÄʱºò£¬Ãæ¶Ô¹«ÖÚÖÊÒÉ£¬Ð¡°×Õ¾ÔÚÒÎ×ÓÉÏ£¬Ïñ¼«ÁË¡¶´óÄÚÃÜ̽ÁãÁã·¢¡·ÀïµÄÎ÷ÃÅ´µÑ©£º“ÊÀ½çÊ׸»±È²»Ò»¶¨Óжà³öÉ«£¬ÕâÖ»²»¹ýÊÇÄãÃÇÕâЩÐǶ·ÊÐÃñÒ»ÏáÇéÔ¸µÄÏë·¨°ÕÁË¡£”
ÿÌìС°×¶¼»áÔÚËÞÉá×ªÓÆ£¬ºÃÏñºÜ“¹Â¶À”£¬×ìÀïÄîÄîÓдʣº“Õâ¸öÊÀ½çÕýÔÚ·¢Éú×Å·Ì츲µØµÄ±ä»¯£¬¶øÎÒÃÇÈ´ÏñÃ«Â¿ËÆµÄÉú»î¡£”×îºó£¬Ëû×Ü»áÀ´Ò»¾ä£º“ÎÒÒª³ÉÁ¢µÚ¶þ¸ö¹È¸è£¡”
Õâ½Ú¿Î£¬ÎÒÃǾͻáÁ˽âËÑË÷ÒýÇæ£¬»¹»á±àдһ¸öСÐ͵ÄÍøÂçÅÀ³æ¡£
What parts make up a search engine?
Ê×ÏȸÐлÕâÕÅͼµÄÔ×÷Õߣ¬Ö÷Òª»¹ÊÇÒª¸ÐлCountry¡£Í¨¹ýÕâÕÅͼ£¬ÎÒÃÇ¿ÉÒÔ¿´µ½£ºÊ×ÏÈ£¬ÍøÂçÖ©Öë×¥È¡ÍøÒ³£¬½«ÍøÒ³ÄÚÈݼ°Á´½Ó´æµ½Êý¾Ý¿âÖС£È»ºóÓÉË÷ÒýÄ£¿é½¨Á¢¹Ø¼ü´Êµ½ÍøÖ·µÄË÷Òý£¬¹©¼ìË÷Ä£¿é²éѯ¡£¼ìË÷Ä£¿éÊǸù¾ÝÄãÊäÈëµÄÄÚÈÝ´ÓË÷ÒýÊý¾Ý¿âÌáÈ¡Êý¾Ý¡£Ö÷Ҫģ¿é½éÉÜÈçÏ£º
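Page-fetching module: consists of Crawler and Crawler control. Crawler fetches web pages and analyzes their links, returning the page and its URLs; Crawler control controls and schedules Crawler.
Page storage module: Page cache, which stores the page content fetched by Crawler.
Index module: builds an index from keywords to links and pages.
Retrieval module: breaks the query into terms suitable for lookup.
User interface: accepts user input and passes it to the retrieval module.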
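The five modules above can be wired together in a toy, offline sketch. It is not Sparrow's code: FAKE_WEB, crawl, build_index and search are hypothetical names, a hard-coded dict stands in for the network, and plain dicts play the roles of the page cache and the index.

```python
import queue
import re

# Hypothetical stand-in for the network: URL -> HTML.
FAKE_WEB = {
    "http://example.com/": '<a href="http://example.com/py">Python</a> search engine',
    "http://example.com/py": 'Python crawler <a href="http://example.com/">home</a>',
}


def crawl(seed):
    """Crawler control + Crawler: visit pages breadth-first starting from seed.

    Returns the page cache, a dict mapping URL -> raw page text.
    """
    cache = {}                  # Page cache
    todo = queue.Queue()        # URLs waiting to be crawled
    todo.put(seed)
    while not todo.empty():
        url = todo.get()
        if url in cache or url not in FAKE_WEB:
            continue            # already crawled, or unreachable
        page = FAKE_WEB[url]    # "fetch" the page
        cache[url] = page
        for link in re.findall(r'href="([^"]+)"', page):
            todo.put(link)      # schedule every discovered link
    return cache


def build_index(cache):
    """Index module: map each lowercase word to the set of URLs containing it."""
    index = {}
    for url, page in cache.items():
        text = re.sub(r"<[^>]+>", " ", page)  # strip tags, keep visible text
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Retrieval module: split the query into words, return URLs containing all of them."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A user interface would simply hand the typed query to search: `search(build_index(crawl("http://example.com/")), "Python crawler")` returns `{'http://example.com/py'}`. The queue gives breadth-first scheduling; a real crawler would also cap the crawl depth and fetch pages over the network instead of from a dict.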
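In the upcoming lessons we will use the Python we have learned to build a small search engine. It is called Sparrow, after the saying "the sparrow may be small, but it has all its vital organs", that is, small but complete. Our "sparrow" will fly higher and higher as our knowledge grows, and it might even turn into a phoenix. For now, of course, it has not yet taken off.
Let's begin our search-engine journey!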
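The first module we will study is the page-fetching module (Crawler), also known as a web spider (Spider).
This module is implemented by the Crawler class. On initialization the class receives a url from the Crawler control module; when it finishes, it returns the page content (page) and the URLs that appear in the page (link). The source code is as follows: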
import urllib.request  # fetches web page content
import urllib.parse  # parses URLs
import re  # regular expressions
import queue  # queue operations
class Crawler(object):  # web crawler
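The class body is not shown here. As a rough idea of what a class matching the description could look like (constructed with a url, returning the page and its links), here is a hypothetical sketch built on the same urllib.request, urllib.parse and re modules. The method names fetch, extract_links and crawl, and the UTF-8 fallback, are assumptions, not the author's code.

```python
import re
import urllib.parse
import urllib.request


class Crawler:
    """Hypothetical sketch: fetch one URL and return (page, links)."""

    # Naive pattern for quoted href attributes; skips fragment-only links
    # like href="#top". A sturdier crawler would use html.parser instead.
    LINK_RE = re.compile(r"""href\s*=\s*["']([^"'#]+)""", re.IGNORECASE)

    def __init__(self, url):
        self.url = url  # URL handed over by Crawler control

    def fetch(self, timeout=10):
        """Download the page and decode it, assuming UTF-8 if no charset is sent."""
        with urllib.request.urlopen(self.url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")

    def extract_links(self, page):
        """Return absolute http(s) links found in page, in order, without duplicates."""
        links = []
        for href in self.LINK_RE.findall(page):
            absolute = urllib.parse.urljoin(self.url, href.strip())
            scheme = urllib.parse.urlparse(absolute).scheme
            if scheme in ("http", "https") and absolute not in links:
                links.append(absolute)
        return links

    def crawl(self):
        """Fetch the page and return it together with its outgoing links."""
        page = self.fetch()
        return page, self.extract_links(page)
```

Usage (needs network access): `page, links = Crawler("http://example.com/").crawl()`. Relative links are resolved against the page's own URL with urljoin, and non-web schemes such as mailto: are dropped. Scheduling the returned links, for example with the imported queue module, would be Crawler control's job rather than this class's.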