The Java library "JTidy" can parse a webpage read from a URL into a DOM (Document Object Model) tree, even when the page's HTML is not well-formed.
First, create a new Tidy object and open an InputStream to the webpage. Then call the method parseDOM() to parse the page into a DOM document.
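import java.io.InputStream;
import java.net.URL;
import org.w3c.tidy.Tidy;

Tidy tidy = new Tidy();
String urlString = "URL of the desired webpage";
URL url = new URL(urlString);
// Open a connection to the page and get its raw byte stream
InputStream in = url.openConnection().getInputStream();
// Parse the page into a W3C DOM tree; null means no tidied copy is written out
org.w3c.dom.Document document = tidy.parseDOM(in, null);
in.close();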
The JTidy API documentation is available online at http://jtidy.sourceforge.net/apidocs/index.html
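The value returned by parseDOM() is an ordinary org.w3c.dom.Document, so standard DOM calls can be used to pull data out of the page. The sketch below assumes JTidy's DOM implementation supports getElementsByTagName() and getAttribute() (JTidy implements a subset of the W3C DOM interfaces). To stay self-contained and runnable without JTidy, it builds the Document from a small, already well-formed snippet with the JDK's own XML parser; the class DomLinks and the helpers collectLinks and parseWellFormed are hypothetical names for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomLinks {

    // Hypothetical helper: collects the href attribute of every <a> element
    // in a DOM document, such as the one returned by JTidy's parseDOM().
    static List<String> collectLinks(Document document) {
        List<String> links = new ArrayList<>();
        NodeList anchors = document.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            String href = a.getAttribute("href");
            // getAttribute returns "" (not null) when the attribute is missing
            if (!href.isEmpty()) {
                links.add(href);
            }
        }
        return links;
    }

    // Hypothetical stand-in for JTidy: parses an already well-formed snippet with
    // the JDK's XML parser. Real-world HTML is often not well-formed; that is what JTidy is for.
    static Document parseWellFormed(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed(
            "<html><body><a href=\"/a\">A</a><a name=\"x\">X</a><a href=\"/b\">B</a></body></html>");
        System.out.println(collectLinks(doc)); // prints [/a, /b]
    }
}
```

With JTidy, the Document passed to collectLinks would simply be the result of tidy.parseDOM(in, null) instead of parseWellFormed(...).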
Java's standard library can also read webpages from URLs without any additional library.
First, store the URL of the webpage in a String. Create an instance of the built-in class java.net.URL, passing the string to its constructor. Call its method openConnection() to get a URLConnection object. Use that object's input stream to create a BufferedReader, and read the webpage line by line.
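import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

String urlString = "URL of the desired webpage";
URL url = new URL(urlString);
URLConnection uc = url.openConnection();
// StringBuilder avoids re-copying the whole text on every line (as String + String would)
StringBuilder htmlText = new StringBuilder();
// Assumes the page is UTF-8; try-with-resources closes the reader even if reading fails
try (BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream(), StandardCharsets.UTF_8))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        // readLine() strips the line terminator, so add it back to keep the page's lines apart
        htmlText.append(inputLine).append('\n');
    }
}
String html = htmlText.toString();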
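The same steps can be wrapped in a reusable method. The sketch below is one possible packaging, not a fixed API: the class PageReader and the method readPage are hypothetical names, the page is assumed to be UTF-8, and the 10-second timeouts are illustrative values. Because java.net.URL also handles file: URLs, the method can be tried locally without network access.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PageReader {

    // Hypothetical helper: reads the resource behind urlString into one String.
    // Assumes UTF-8; real pages may declare another charset in their Content-Type header.
    static String readPage(String urlString) throws IOException {
        URLConnection uc = URI.create(urlString).toURL().openConnection();
        uc.setConnectTimeout(10_000); // give up if connecting takes longer than 10 s
        uc.setReadTimeout(10_000);    // give up if no data arrives for 10 s while reading
        StringBuilder htmlText = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(uc.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                // readLine() drops the line terminator, so restore it
                htmlText.append(inputLine).append('\n');
            }
        }
        return htmlText.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small local page and read it back through a file: URL
        Path page = Files.createTempFile("page", ".html");
        Files.writeString(page, "<html>\n<body>Hello</body>\n</html>");
        System.out.print(readPage(page.toUri().toString()));
        Files.delete(page);
    }
}
```

URI.create(urlString).toURL() produces the same URL object as new URL(urlString), whose constructors are deprecated since Java 20. The timeouts matter for real servers: URLConnection's default timeout of 0 means an unresponsive server can block the read indefinitely.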