PawnScraper






A powerful scraper plugin that provides an interface for utilising HTML parsers and CSS selectors in Pawn.
Installing
Thanks to Southclaws, plugin installation is now much easier with sampctl:
sampctl p install Sreyas-Sreelal/pawn-scraper
OR
Download the suitable binary files for your operating system from releases
Add them to your plugins folder
Add PawnScraper to server.cfg, or PawnScraper.so (for Linux)
Add pawnscraper.inc to your includes folder
Building
API
ParseHtmlDocument(document[])
Params
Returns
Example Usage
new Html:doc = ParseHtmlDocument("\
<!DOCTYPE html>\
<meta charset=\"utf-8\">\
<title>Hello, world!</title>\
<h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
DeleteHtml(doc);
ResponseParseHtml(Response:id)
HttpGet(url[],Header:headerid=INVALID_HEADER)
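As a rough illustration of how HttpGet and ResponseParseHtml might be combined, the sketch below fetches a page and prints the text of its first title element. It uses only functions and constants that appear elsewhere in this README. The buffer size, the URL, and the use of main() as the entry point are assumptions made for the example, and HttpGet is assumed to block until the request finishes, since a separate threaded variant exists.

```pawn
#include <a_samp>
#include <pawnscraper>

main()
{
    // Assumption: this call blocks until the request completes.
    new Response:response = HttpGet("https://sa-mp.com");
    if(response == INVALID_HTTP_RESPONSE){
        print("HTTP ERROR");
        return;
    }

    // Turn the response body into a document that selectors can query.
    new Html:html = ResponseParseHtml(response);
    if(html == INVALID_HTML_DOC){
        print("Could not parse the response as HTML");
        DeleteResponse(response);
        return;
    }

    // Read the text of the first <title> element (idx 0).
    new Selector:selector = ParseSelector("title");
    if(selector != INVALID_SELECTOR){
        new title[128]; // arbitrary buffer size chosen for this sketch
        if(GetNthElementText(html,selector,0,title)){
            printf("Page title: %s",title);
        }
        DeleteSelector(selector);
    }

    // Release everything that was created above.
    DeleteHtml(html);
    DeleteResponse(response);
}
```

Every object that was created successfully is deleted on each exit path, which mirrors the cleanup pattern used in the other examples.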
HttpGetThreaded(playerid,callback[],url[],Header:headerid=INVALID_HEADER)
Params
playerid - id of the player
callback[] - name of the callback function to handle the response.
url[] - URL of the website
headerid - id of the header object created using CreateHeader
Example Usage
HttpGetThreaded(0,"MyHandler","https://sa-mp.com");
//********
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
    ASSERT(responseid != INVALID_HTTP_RESPONSE);
    DeleteResponse(responseid);
}
ParseSelector(string[])
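A minimal sketch of ParseSelector with a class selector, assuming it accepts ordinary CSS selector strings as the other examples suggest. The HTML snippet, the selector li.item, and the buffer size are made up for illustration. All functions used are the ones listed in this README.

```pawn
#include <a_samp>
#include <pawnscraper>

main()
{
    // Hypothetical document used only for this sketch.
    new Html:doc = ParseHtmlDocument("<ul><li class=\"item\">one</li><li class=\"item\">two</li></ul>");
    if(doc == INVALID_HTML_DOC){
        return;
    }

    // Check the result before use, as the other examples do.
    new Selector:selector = ParseSelector("li.item");
    if(selector == INVALID_SELECTOR){
        print("Invalid CSS selector");
        DeleteHtml(doc);
        return;
    }

    // Walk over every match; the call returns 0 once idx runs past the last one.
    new text[32], i;
    while(GetNthElementText(doc,selector,i,text)){
        printf("item %d: %s",i,text);
        ++i;
    }

    DeleteSelector(selector);
    DeleteHtml(doc);
}
```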
CreateHeader(...)
Params
Returns
Example Usage
new Header:header = CreateHeader(
    "User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
);
ASSERT(header != INVALID_HEADER);
new Response:response = HttpGet("https://sa-mp.com/",header);
ASSERT(response != INVALID_HTTP_RESPONSE);
ASSERT(DeleteHeader(header) == 1);
GetNthElementName(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
Params
docid - Html instance id
selectorid - CSS selector instance id
idx - the nth occurrence of the element in the document (starts from 0)
string[] - array in which the element name is stored
size - size of string
Returns
1 if successful
0 if failed
Example Usage
new Html:doc = ParseHtmlDocument("\
    <!DOCTYPE html>\
    <meta charset=\"utf-8\">\
    <title>Hello, world!</title>\
    <h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
new Selector:selector = ParseSelector("i");
ASSERT(selector != INVALID_SELECTOR);
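// element_name must be an array; its size here is an arbitrary choice.
new i = -1, element_name[32];
// ++i advances to the next match; the loop ends when no element is left.
while(GetNthElementName(doc,selector,++i,element_name) != 0){
    ASSERT(strcmp(element_name,"i") == 0);
}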
DeleteSelector(selector);
DeleteHtml(doc);
GetNthElementText(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
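Params
docid - Html instance id
selectorid - CSS selector instance id
idx - the nth occurrence of the element in the document (starts from 0)
string[] - array in which the element text is stored
size - size of string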
Returns
1 if successful
0 if failed
Example Usage
new Html:doc = ParseHtmlDocument("\
    <!DOCTYPE html>\
    <meta charset=\"utf-8\">\
    <title>Hello, world!</title>\
    <h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
new Selector:selector = ParseSelector("h1.foo");
ASSERT(selector != INVALID_SELECTOR);
// element_text must be an array; its size here is an arbitrary choice.
new element_text[32];
ASSERT(GetNthElementText(doc,selector,0,element_text) == 1);
new check = strcmp(element_text,"Hello, world!");
ASSERT(check == 0);
DeleteSelector(selector);
DeleteHtml(doc);
GetNthElementAttrVal(Html:docid,Selector:selectorid,idx,attribute[],string[],size = sizeof(string))
Params
docid - Html instance id
selectorid - CSS selector instance id
idx - the nth occurrence of the element in the document (starts from 0)
attribute[] - the attribute of the element
string[] - array in which the attribute value is stored
size - size of string
Returns
1 if successful
0 if failed
Example Usage
new Html:doc = ParseHtmlDocument("\
<!DOCTYPE html>\
<meta charset=\"utf-8\">\
<title>Hello, world!</title>\
<h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
new Selector:selector = ParseSelector("h1");
ASSERT(selector != INVALID_SELECTOR);
// element_attribute must be an array; its size here is an arbitrary choice.
new element_attribute[32];
ASSERT(GetNthElementAttrVal(doc,selector,0,"class",element_attribute) == 1);
new check = strcmp(element_attribute,"foo");
ASSERT(check == 0);
DeleteSelector(selector);
DeleteHtml(doc);
DeleteHtml(Html:id)
Params
Returns
1 if successful
0 if failed
DeleteSelector(Selector:id)
Params
Returns
1 if successful
0 if failed
DeleteResponse(Response:id)
Params
Returns
1 if successful
0 if failed
DeleteHeader(Header:id)
Params
Returns
1 if successful
0 if failed
Example Usage
A small example to fetch all links in wiki.sa-mp.com
new Response:response = HttpGet("https://wiki.sa-mp.com");
if(response == INVALID_HTTP_RESPONSE){
    printf("HTTP ERROR");
    return;
}
new Html:html = ResponseParseHtml(response);
if(html == INVALID_HTML_DOC){
    DeleteResponse(response);
    return;
}
new Selector:selector = ParseSelector("a");
if(selector == INVALID_SELECTOR){
    DeleteResponse(response);
    DeleteHtml(html);
    return;
}
// str must be an array; its size here is an arbitrary choice.
new str[256],i;
while(GetNthElementAttrVal(html,selector,i,"href",str)){
    printf("%s",str);
    ++i; // move on to the next link
}
// delete the created objects after use
DeleteHtml(html);
DeleteResponse(response);
DeleteSelector(selector);
The same example with a threaded HTTP call would be:
HttpGetThreaded(0,"MyHandler","https://wiki.sa-mp.com");
//...
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
    if(responseid == INVALID_HTTP_RESPONSE){
        printf("HTTP ERROR");
        return 0;
    }
    new Html:html = ResponseParseHtml(responseid);
    if(html == INVALID_HTML_DOC){
        DeleteResponse(responseid);
        return 0;
    }
    new Selector:selector = ParseSelector("a");
    if(selector == INVALID_SELECTOR){
        DeleteResponse(responseid);
        DeleteHtml(html);
        return 0;
    }
    // str must be an array; its size here is an arbitrary choice.
    new str[256],i;
    while(GetNthElementAttrVal(html,selector,i,"href",str)){
        printf("%s",str);
        ++i; // move on to the next link
    }
    // delete the created objects after use
    DeleteHtml(html);
    DeleteResponse(responseid);
    DeleteSelector(selector);
    return 1;
}
More examples can be found in examples
Repository
https://github.com/Sreyas-Sreelal/pawn-scraper
Note
The plugin is at an early stage, and more tests and features need to be added. I'm open to any kind of contribution; just open a pull request if you have anything to improve or new features to add.
Special thanks