程序 python 抓取新浪读书频道小说
二进制文件下载地址:
SinaGetBook
效果如图:
代码:
#!/usr/bin/env python
#coding=utf-8
#!/usr/bin/env python
#coding=utf-8
import traceback
import sys
import wx
import re
import urllib
import wx.richtext as rt
import wx.lib.buttonpanel as bp
import Casing
import Debug
def trace_back():
try:
return traceback.print_exc()
except:
return ''
class Window(wx.Frame):
def __init__(self):
sys.setdefaultencoding("utf-8")
wx.Frame.__init__(self,None,-1,u'新浪网图书频道抓取工具',pos=wx.Point(0, 0),size=(800,620))
l1 = wx.StaticText(self, -1, u"目录URL:")
self.t1 = wx.TextCtrl(self, -1, "http://vip.book.sina.com.cn/book/?book=27633", size=(500, -1))
l2 = wx.StaticText(self, -1, u"内容URL前缀:")
self.t2 = wx.TextCtrl(self, -1, "http://vip.book.sina.com.cn/book/", size=(500, -1))
l3 = wx.StaticText(self, -1, u"替换的内容:")
self.t3 = wx.TextCtrl(self, -1,
u"阅读‘刘猛’的其他作品: \n"
u"http://vip.book.sina.com.cn/book/?book=39011《狼牙》作者新作:冰是睡着的水\n"
u"http://vip.book.sina.com.cn/book/?book=41217刘猛展示狙击手神秘生活:刺客\n"
u"http://vip.book.sina.com.cn/book/?book=38884中国特种部队生存实录:狼牙\n"
u"http://vip.book.sina.com.cn/book/?book=43226刘猛最新力作:如临大敌",
size=(500, 100), style=wx.TE_MULTILINE|wx.TE_PROCESS_ENTER)
self.t3.SetInsertionPoint(0)
l4 = wx.StaticText(self, -1, u"内容")
#self.t4 = wx.TextCtrl(self, -1,"",
# size=(600, 400), style=wx.TE_MULTILINE|wx.TE_PROCESS_ENTER)
self.t4 = rt.RichTextCtrl(self,-1,"",size=(600, 400), style=wx.VSCROLL|wx.HSCROLL|wx.NO_BORDER);
#self.t4.SetInsertionPoint(0)
self.b = wx.Button(self, -1, u"开始抓取")
self.Bind(wx.E
相关文档:
filename=raw_input('enter file name:')
f=open(filename,'rb')
f.seek(0,0)
index=0
for i in range(0,16):
print "%3s" % hex(i) ,
print
for i in range(0,16):
print "%-3s" % "#" ,
print
while True:
temp=f.read(1)
if len(temp) == 0:
break
else:
print "%3s" % temp.encode('hex'),
......
正则表达式
具体的参考手册,这里记下一些小问题:
1、re对象的方法
match Match a regular expression pattern to the beginning of a string.
search re.search(pattern, string, flags) flags:re.I re.M re.X re.S re.L re.U
sub Substitute oc ......
ZoundryDocument
Python skin is known for its color variations and for its elasticity; it is
the warmest leather of the season and ideal for the manufacture of many luxury
goods. Sometimes natural patterns can be hidden when they're done in black, but
the finish here has a bit of a shine to it ......
1. 事件驱动
一个事件及其回调的例子是鼠标移动。我们假设鼠标指针停在您GUI 程序的某处。如果鼠标被移到了程序的别处,一定是有什么东西引起了屏幕上指针的移动,从而表现这种位置的转移。系统必须处理这些鼠标移动事件才能展现(并实现)鼠标在窗口上的移动。一旦您释放了鼠标,就不再会有事件需要处 ......
昨天试了下用HTMLParser类来解析网页,可发现结果并不理想。不管怎么说,先写下过程,希望后来人能在此基础上解决我所遇到的问题。
写了2套解决方案,当然这2套只能对特定网站有效。我这里主要说明下对BBC主页www.bbc.co.uk和对网易www.163.com的解析。
对于BBC:
这套要简单得多,可能是该网页的编码比较标准吧
import ......