0%

在VBA里实现的screen scripting

在VBA里写过了次爬虫,在其他语言里大概爬虫都很成熟了,姑且记一下.



为了储存HTMLDocument格式的object,先在reference里加入microsoft html object library.

1
2
3
4
5
6
Dim aWB As Control
Dim vFrame As HTMLDocument
Dim vBuffer Ass String
Dim vTimeLimit As Date
Dim aTimeOut As Integer
Dim vURL As String

在这里是用access里的web control作为object打开网站,也可以直接打开html的网站.

1
2
3
4
5
Set aWB = Me!WB ' the web control object
aWB.Object.Silent = True ' 让 web control不要报错
vURL = “www.google.ca”
aWB.ControlSource = "=(""" & vURL & """)"
aWB.Refresh

screen scripting的时候需要等网页两次.
一次是网页本体是否有load出来.
第二次是网页是否有load出所有的object.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Public Sub WaitForBrowserReady(ByRef aWebObj As Object, Optional ByVal aTimeOut As Long = 10)
On Error GoTo ERR_Handler:

Dim vTimeLimit As Date

If aTimeOut > 0 Then
vTimeLimit = DateAdd("s", aTimeOut, VBA.Now)

Do While aWebObj.Busy Or aWebObj.ReadyState <> 4
If DateDiff("s", vTimeLimit, VBA.Now) > 0 Then
Err.Raise ' Add Err message here
End If
DoEvents
Loop
Else
Do While aWebControl.Object.Busy Or aWebControl.Object.ReadyState <> 4
DoEvents
Loop
End If

EXIT_CLEANUP:
Exit Sub
ERR_CleanUp:
On Error GoTo 0
Err.Raise ' Add Err message here
ERR_Handler:
' Store Err message
Resume ERR_CleanUp:
End Sub

等待page是否有load完全部就比较tricky,需要去某个page寻找“最后打印”出来的object,然后测是否有读出来.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
Public Sub WaitForPageReady(ByRef aWebObj As Object, ByVal aWebStr As String, Optional ByVal aTimeOut As Long = 10)
Dim vFrame As HTMLDocument
Dim vElement As Object
Dim vBuffer As Variant
Dim vTimeLimit As Date
vTimeLimit = DateAdd("s", aTimeOut, VBA.Now)

On Error Resume Next

Select Case aWebStr
Case "Page1" ' depends on different kind of page, the "last load object" is different
Do While vBuffer <> "theTarget"
Set vFrame = aWeb.Object.Document
Set vElement = vFrame.Document.GetElementsByClassName("some class name here")
vBuffer = vElement.innerText ' get the compare string
If DateDiff("s", vTimeLimit, VBA.Now) > 0 Then
Err.Raise ' Add Err message here
End If
DoEvents
Loop
End Select

EXIT_CleanUp:
On Error GoTo 0
Err.Clear
Exit Sub
ERR_CleanUp:
On Error GoTo 0
Err.Raise ' Add Err message here
ERR_Handler:
' Store Err message
Resume ERR_CleanUp:
End Sub

在pass这两个function后就可以开始scripting网页上的内容了.
注意,在access 网页object的时候,除了class name,还有frame的区别.
需要在“可以读取的frame”内,access 相应的object,不然读不到.
辨别方式:在inspect,select element in the page to inspect看看是否有不同的frame.

在这里查看属于哪一个frame.

注意vFrame.Document.GetElementsByClassName -> 根据不同需求,可以用getElementById等其他function.
Mentor的代码是自己搓了个类似WaitForBrowserReady的Function.
如果一定时间内getElement function没有get到target的内容,也time out 给error然后退出.



总结是爬虫跟其他代码区别,它是一种对着动态来写码(?),脑子里一定要分好如果有什么condition,代码是否能有对应的select case能跑得到.
然后就finger crossed(不是).



General Screen Scripting

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
Public Sub CreateSession(Option ByVal aProfileName As String = "")
Dim xErrNum As Long, xErrMsg As String

Dim vSessionCount As Long
Dim vDone As Boolean
Dim vI As Long
Dim vTimeOut As Date
Dim vSessionString As Strnig, vSessionChar As String
Dim vResult As Long

' sanitize input argument
aProfileName = RTrim(LTrim("" & aProfileName))

' ensure a profile was requested
If aProfileName = "" Then
Err.Raise EX_VALUE_NULL, ERR_SOURCE, "Unable to open session"
End If

' set the config file path and name
If aProfileName = "applicationA" Then
aProfileName = Environ("AppData") & "\thePath\ofTheApplication.exe"
Else
aProfileName = Environ("AppData") & "\thePath\" & aProfileName
End If

' free handle if it already exists
If mHasHandle = True Then
ReleaseHandle
End If

' use the connection manager to test for any open mainframe windows
Set theConnMgr = CreateObject("theShell")
theConnMgr.theClass.Refresh
vSessionCount = theConnMgr.theClass.Count

' create new session interface object
Set thePS = theConnMgr.autECLPS ' autPS
Set theWin = theConnMgr.window ' autECLSession
Set theOIA = theConnMgr.autECLOIA ' AutOIA

' find used session tags
vSessionString = ""
If vSessionCount > 0 Then
For vI = 1 To vSessionCount
vSessionString = vSessionString & theConnMgr.autECLConnList(vI).Name
Next
vSessionString = UCase(vSessionString)
End If

' find available session tags
If vSessionString = "" Then
vSessionChar = "Z"
Else
For vI = 90 To 65 Step -1 ' check for existing sessions from Z to A and use next available
If InStr(1, vSessionString, Chr(vI)) < 1 Then
vSessionChar = Chr(vI)
Exit For
End If
Next
End If

' use the connection manager to open a new session
mSessionID = Asc(vSessionChar) - 64
vResult = theConnMgr.OpenSession(SESSION_TYPE, mSessionID, aProfileName, SESSION_TIMEOUT, SESSION_PAINT_CNT)
If vResult Then
MsgBox "Error connecting to the system", 0
Exit Sub
End If

' wait for session to launch
vTimeOut = DateAdd("s", LAUNCH_TIMEOUT, VBA.Now)
vDone = False
Do Until vDone = True
' session manager never updates the 'count' after creation, so create it each time through the loop
theConnMgr.autECLConnList.Refresh

' attemp to find the created session
If theConnMgr.autECLConnList.Count > 0 Then
For vI = 1 To theConnMgr.autECLConnList.Count
If theConnMgr.autECLConnList(vI).Name = vSessionChar Then
mHandle = theConnMgr.autECLConnList(vI).Handle
thePS.SetConnectionByHandle mHandle
theOIA.SetConnectionByHandle mHandle
mHasHandle = True
vDone = True
Exit For
End If
Next
End If

' timeout if the session wasn't found in the given time frame
If vDone = False And vTimeOut < VBA.Now Then
Err.Raise EX_TIMEOUT, ERR_SOURCE, "Unable to get session handle within timeout limit"
End If

DoEvents
Loop

' give the mainframe up until the timeout limit to respond
If theOIA.WaitForInputReady(INTERFACE_TIMEOUT) = False Then
Err.Raise EX_TIMEOUT, ERR_SOURCE, "Session failed to respond within" & INTERFACE_TIMEOUT / 1000 & " second timeout limit"
End If

Exit Sub

ERR_CLEANUP:
On Error Resume Next
ReleaseHandle
Err.Clear
On Error GoTo 0
Err.Raise xErrNum, ERR_SOURCE, ErrMsg
ERR_HANDLER:
xErrNum = Err.Number
xErrMsg = Err.Description
Resume ERR_CLEANUP:
End Sub
1
2
3
4
Public Sub ReleaseHandle()
Set theConnMgr = Nothing ' and others
' set to 0 or "" or False
End Sub
1
2
3
4
5
6
Public Sub CloseSession()
If mHasHandle = True And mSessionID > 0 Then
theConnMgr.CloseSession SESSION_TYPE, mSessionID
End If
' and error cleaner too
End Sub