ASP

XMLHTTP批量抓取远程资料

可以在此基础上结合正则表达式做成更好的效果,希望大家能分享一下Xmlhttp的session共享技术

<html> 
<head> 
<title>AUTOGET</title> 
<meta http-equiv='Content-Type' content='text/html; charset=gb2312'> 
</head> 
<body bgcolor='#FFFFFF' style='font-family:Arial;font-size:12px'> 
<% 
"================================================= 
"FileName: Getit.Asp 
"Intro : Auto Get Data From Remote WebSite 
"createAt: 2002-02 Lastupdate:2004-09 
"DB Table : data 
"Table Field: 
" UID -> Long -> Keep ID Of the pages 
" UContent -> Text -> Keep Content Of the Pages(HTML) 
"================================================= 

Server.ScriptTimeout=5000 

"on error resume next 
Set conn = Server.createObject('ADODB.Connection') 
conn.open 'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' &amp; Server.MapPath('getit.mdb') 
Set rs = Server.createObject('ADODB.Recordset') 
sql='select * from data' 
rs.open sql,conn,1,3 

Dim comeFrom,myErr,myCount 

"======================================================== 
comeFrom='http://www.xxx.com/U.asp?ID=' 
myErr1='该资料不存在' 
myErr2='该资料已隐藏' 
"======================================================== 

"*************************************************************** 
" 只需要更改这里 i 的始点intMin和终点intMax,设定步长intStep 
" 每次区间设置成5万左右。估计要两个多小时。期间不需要人工干预 
"**************************************************************** 
intMin=0 
intMax=10000 
"设定步长 
intStep=100 

"========================================================== 
"以下代码不要更改 
"========================================================== 
Call GetPart (intMin) 
Response.write '已经转换完成' &amp; intMin &amp; '~~' &amp; intMax &amp; '之间的数据' 
rs.close 
Set rs=Nothing 
conn.Close 
set conn=nothing 
%> 
</body> 
</html> 
<%
"使用XMLHTTP抓取地址并进次内容处理 
Function GetBody(Url) 
Dim objXML 
On Error Resume Next 
Set objXML = createObject('Microsoft.XMLHTTP') 
With objXML 
.Open 'Get', Url, False, '', '' 
.Send 
GetBody = .ResponseBody 
End With 
GetBody=BytesToBstr(GetBody,'GB2312') 
Set objXML = Nothing 
End Function 
"使用Adodb.Stream处理二进制数据 
Function BytesToBstr(strBody,CodeBase) 
dim objStream 
set objStream = Server.createObject('Adodb.Stream') 
objStream.Type = 1 
objStream.Mode =3 
objStream.Open 
objStream.Write strBody 
objStream.Position = 0 
objStream.Type = 2 
objStream.Charset = CodeBase 
BytesToBstr = objStream.ReadText 
objStream.Close 
set objStream = nothing 
End Function 
"主函数 
Function GetPart(iStart) 
Dim iGo 
time1=timer() 
myCount=0 
For iGo=iStart To iStart+intStep 
If iGo<=intMax Then 
Response.Execute comeFrom &amp; iGo 
"进行简单的数据处理 
content = GetBody(comeFrom &amp; iGo ) 
content = Replace(content,chr(34),''') 
If instr(content,myErr1) or instr(content,myErr2) Then 
"跳过错误信息 
Else 
"写入数据库 
rs.AddNew 
rs('UID')=iGo 
"******************************** 
rs('UContent')=Replace(content,''',chr(34)) 
"********************************* 
rs.update 
myCount=myCount+1 
Response.Write iGo &amp; '<BR>' 
Response.Flush 
End If 
Else 
Response.write '<font color=red>成功抓取'&amp;myCount&amp;'条记录,' 
time2=timer() 
Response.write '耗时:' &amp; Int(FormatNumber((time2-time1)*1000000,3)) &amp; ' 秒</font><BR>' 
Response.Flush 
Exit Function 
End If 
Next 
Response.write '<font color=red>成功抓取'&amp;myCount&amp;'条记录,' 
time2=timer() 
Response.write '耗时:' &amp; CInt(FormatNumber((time2-time1),3)) &amp; ' 秒</font><BR>' 
Response.Flush 
"递归 
GetPart(iGo+1) 
End Function%>
分类: ASP

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

这个站点使用 Akismet 来减少垃圾评论。了解你的评论数据如何被处理

相关文章

开始在上面输入您的搜索词,然后按回车进行搜索。按ESC取消。

返回顶部