Retrieve the content of a web page

This question already has an answer here:

  • How do you Programmatically Download a Webpage in Java 10 answers
  • How to use java.net.URLConnection to fire and handle HTTP requests 12 answers

I'd like to fetch a web page and save its content as a string. Is there a library to do that? I want to use the string in a program I am building. It's for websites that don't necessarily provide an RSS feed.


I think you need this:

// Needs java.net.URL, java.net.URLConnection, java.io.InputStream,
// and org.apache.commons.io.IOUtils (Apache Commons IO).
URL url = new URL("http://www.google.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
// Do not use con.getContentEncoding() here: that is the Content-Encoding header, not the charset.
// The charset is part of con.getContentType(), which returns something like
// "text/html; charset=UTF-8", so that value must be parsed to extract the actual encoding.
String encoding = null;
encoding = encoding == null ? "UTF-8" : encoding; // fall back to UTF-8
String body = IOUtils.toString(in, encoding);
System.out.println(body);
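
If you'd rather not hard-code the encoding, here is a rough sketch of the parsing step the comment above describes, using only the JDK. The class name PageFetcher and the methods charsetFrom and fetch are made up for this example, and falling back to UTF-8 is an assumption, not something the server guarantees. It also assumes Java 9 or later for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
final class PageFetcher {

    // Extracts the charset from a Content-Type value such as
    // "text/html; charset=UTF-8". Falls back to UTF-8 (an assumption)
    // when the header is missing, has no charset parameter, or names
    // a charset this JVM does not know.
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for the "charset=" prefix.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Downloads the page at the given address and decodes it with the
    // charset announced in the Content-Type header.
    static String fetch(String address) throws IOException {
        URLConnection con = new URL(address).openConnection();
        Charset charset = charsetFrom(con.getContentType());
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), charset); // Java 9+
        }
    }
}
```

You would then call something like String body = PageFetcher.fetch("http://www.google.com/"); and handle the IOException. Note that this only looks at the HTTP header; pages that declare their encoding solely in an HTML meta tag would still be decoded as UTF-8.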