irt.Org

Q1318 Is there a way to screen-scrape the information on an HTML page using JavaScript?


In Internet Explorer, load the page into a hidden iframe and read its document:

<iframe frameborder=0 width="0" height="0" marginheight=0 marginwidth=0 NAME="iframe" scrolling=no src="page_to_be_scrapped.htm"></iframe>

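<script language="JavaScript" type="text/javascript"><!--
// Wait for window.onload, which fires only after the iframe's page has
// loaded; running immediately could find the frame still empty.
window.onload = function () {
    if (window.frames.length > 0) {
        alert(window.frames['iframe'].document.body.innerHTML);
    }
};
//--></script>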
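Browsers only let a page read a frame's document when both come from the same origin; if the framed page is on another site, the access attempt is blocked with a security error. A minimal sketch of that origin comparison, assuming the standard URL class found in modern browsers and Node.js; isSameOrigin is a hypothetical helper name, not part of the original answer.

```javascript
// Sketch: does a target URL share the page's origin (scheme, host, port)?
// isSameOrigin is a hypothetical helper; URL is the standard WHATWG class.
function isSameOrigin(target, base) {
  const baseUrl = new URL(base);
  // Relative targets (e.g. a bare file name in an iframe src) resolve
  // against the base page's URL.
  const targetUrl = new URL(target, baseUrl);
  // file: and other host-less schemes report the opaque origin "null";
  // treat those conservatively as cross-origin.
  if (baseUrl.origin === "null" || targetUrl.origin === "null") {
    return false;
  }
  return targetUrl.origin === baseUrl.origin;
}
```

Comparing parsed origins avoids the trap of a plain substring test: http://evil.example/?www.example.com contains the host name www.example.com but is not same-origin with it.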

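In Netscape Navigator, go to http://jshelper.pharlap.com and follow the instructions for its server-side assists. The example below loads httpget.js with the URL of the page to fetch (here http://www.nytimes.com/) and writes out the text found between <NYT_HEADLINE> and </NYT_HEADLINE> markers: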

<html>
<head>
<title></title>
<script language="JavaScript" src="http://jshelper.pharlap.com/netutils/httpget.js?http://www.nytimes.com/"></script>

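<script language="JavaScript" type="text/javascript"><!--
// FileContents is set by httpget.js above to the source of the fetched page.
function scrapeHeadlines() {
    var searchStart = "<NYT_HEADLINE>";
    var searchEnd = "</NYT_HEADLINE>";
    // Piece 0 is everything before the first start marker, so skip it.
    var aNews = FileContents.split(searchStart);
    for (var i = 1; i < aNews.length; i++) {
        // Keep only the text up to the matching end marker.
        var aHeadlineOnly = aNews[i].split(searchEnd);
        document.write(aHeadlineOnly[0] + "<br>");
    }
}
//--></script>
</head>

<body>

<b><u>The headlines are:</u></b><br><br>

<script language="JavaScript" type="text/javascript"><!--
// Call scrapeHeadlines() while the page is still being parsed: calling
// document.write() from onLoad would replace the whole page, heading included.
scrapeHeadlines();
//--></script>

</body>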
</html>
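The split-based extraction in scrapeHeadlines() does not depend on Navigator, so it can be pulled out into a pure function that returns the matches instead of writing them into the page. A minimal sketch in modern JavaScript; extractBetween is a hypothetical name, and in the example above its input would be FileContents.

```javascript
// Sketch: return every piece of text between startMarker and endMarker.
// extractBetween is a hypothetical helper; like the original, it assumes
// the markers are not nested.
function extractBetween(text, startMarker, endMarker) {
  const results = [];
  // Piece 0 is whatever precedes the first start marker, so it is skipped.
  const pieces = text.split(startMarker);
  for (let i = 1; i < pieces.length; i++) {
    // Keep only the text up to the first end marker in each piece.
    results.push(pieces[i].split(endMarker)[0]);
  }
  return results;
}
```

For example, extractBetween(FileContents, "<NYT_HEADLINE>", "</NYT_HEADLINE>") would give the same headlines as an array, which can then be written out, joined, or filtered as needed.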

Alternatively, use a signed script and LiveConnect to read the page through Java's networking classes:

<script language="JavaScript" type="text/javascript"><!--
function fetchURL(url) {
    // Ask for extra privileges only when the URL is not on the page's own
    // host (or, for a local file: page, uses a different protocol).
    if ((location.host == '' && url.indexOf(location.protocol) == -1)  ||
       url.indexOf(location.host) == -1) {
        netscape.security.PrivilegeManager.enablePrivilege('UniversalConnect');
    }
    // Open the URL through Java via LiveConnect and read it line by line.
    var dest = new java.net.URL(url);
    var dis = new java.io.DataInputStream(dest.openStream());
    var res = '';
    var line;
    while ((line = dis.readLine()) != null) {
        res += line;
        res += java.lang.System.getProperty('line.separator');
    }
    dis.close();
    return res;
}

alert(fetchURL(location.href));
//--></script>

The script must be signed, or otherwise trusted, to fetch URLs from any host other than the one it was loaded from.
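Current browsers no longer offer LiveConnect or PrivilegeManager; the nearest equivalent is the Fetch API, where a cross-origin response can only be read if the other server allows it with CORS headers. A minimal sketch, assuming a modern browser or Node.js 18+ where fetch() is global; fetchText and its fetchImpl parameter are hypothetical, the latter only so the function can be tried with a stub.

```javascript
// Sketch: a Fetch API counterpart to fetchURL(). fetchText and fetchImpl
// are hypothetical names; fetchImpl defaults to the global fetch().
async function fetchText(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // fetch() rejects only on network failure; HTTP errors must be checked.
    throw new Error(`HTTP ${response.status} while fetching ${url}`);
  }
  // Resolves with the whole body as text, with line endings left as sent.
  return response.text();
}

// Usage in a page:
// fetchText(location.href).then(function (text) { alert(text); });
```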

©2018 Martin Webb