Posted by: community member
2 answers
Community member
import re

# Matches occurrences such as id=45717
id_regex = re.compile(r'id=\d+')

# Read the whole input file at once
with open("aaa.txt", 'r', encoding='utf-8') as reader:
    txt = reader.read()

data = id_regex.findall(txt)

# Write one match per line
with open("id.txt", 'w', encoding='utf-8') as writer:
    writer.write('\n'.join(data))
Community member
#!/usr/bin/env python
#coding=utf-8
import re
string = '''
<ul id="iyy_speak">
<li><a href="player.php?type=2&id=46819" target="_blank" >中文内容</a><span class="yy_man"><a href="player.php?type=2&id=46819" target="_blank">试听</a></span></li>
<li><a href="player.php?type=2&id=46818" target="_blank" >中文内容2</a><span class="yy_man"><a href="player.php?type=2&id=46818" target="_blank">试听</a></span></li>
<li><a href="player.php?type=2&id=45717" target="_blank" >中文内容3</a><span class="yy_man"><a href="player.php?type=2&id=45717" target="_blank">试听</a></span></li>
</ul>
'''
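# Group 1 captures the numeric id, group 2 the link text
match = re.findall(r'<li><a href="player\.php\?type=2&id=(\d+)".*?>(.*?)<', string)
for x, y in match:
    # Python 3: str is already Unicode, so no decode() call is needed
    print(x, y)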
Something like this? It feels a bit odd... but it will do.
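Follow-up question: It is indeed a bit odd. What I need is to fetch the page content and extract the ID and the Chinese text. Each page has 12 rows, so there are 12 entries to extract per page, and then the results need to be saved.
Follow-up answer: Fetch the page with urllib2 (urllib.request in Python 3), store the page content in string, and then run this regex over it. The regex itself should be fine.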
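The fetch, extract, and save steps described in the follow-up could be sketched roughly as below. This is a minimal sketch, not a tested scraper for the real site. It assumes Python 3, that the page is UTF-8 encoded, and that every row has the same `<li><a href="player.php?type=2&id=...">` shape as the sample. The function names, the example URL, and the tab-separated output format are all hypothetical.

```python
import re
import urllib.request

# Same pattern as in the answer above: group 1 is the numeric id,
# group 2 is the link text that follows the first <a ...> tag.
ENTRY_RE = re.compile(r'<li><a href="player\.php\?type=2&id=(\d+)".*?>(.*?)<')


def fetch_page(url, encoding="utf-8"):
    """Download a page and decode it (the encoding is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


def extract_entries(html):
    """Return a list of (id, text) tuples, one per matching <li> row."""
    return ENTRY_RE.findall(html)


def save_entries(entries, path):
    """Write one tab-separated "id<TAB>text" line per entry."""
    with open(path, "w", encoding="utf-8") as out:
        for entry_id, text in entries:
            out.write(f"{entry_id}\t{text}\n")


# Example use (hypothetical URL, replace with the real listing page):
#     page = fetch_page("http://example.com/list.php")
#     save_entries(extract_entries(page), "id.txt")
```

A single `findall` call returns every row on the page, so the 12 rows do not need 12 separate searches. If the site serves a different encoding (GBK is common on older Chinese sites), pass it as the `encoding` argument.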