I used TagSoup some years ago, but last week I came across ‘jsoup’. It also parses ‘real world HTML’, and comes with a really neat API for downloading a document and selecting parts of it.
See for yourself:
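import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
String url = "https://javaspecialists.teachable.com/p/refactoring2j8";
Document doc = Jsoup.connect(url).get(); // fetches and parses the page; throws IOException
Elements items = doc.select("a.item"); // CSS selector: every <a> with class "item"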
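Once you have the selected elements, you will usually want to walk over them and pull out text and attributes. Here is a minimal sketch of that step. It parses an inline HTML string instead of going to the network, so the markup, the `https://example.com/` base URI, the class name `JsoupSelectSketch` and the helper `describeItems` are all made up for illustration and are not taken from the page above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupSelectSketch {

    // Hypothetical helper: returns "link text -> absolute URL"
    // for every <a class="item"> found in the given HTML.
    static List<String> describeItems(String html, String baseUri) {
        // The base URI lets jsoup resolve relative href values.
        Document doc = Jsoup.parse(html, baseUri);
        Elements items = doc.select("a.item");

        List<String> result = new ArrayList<>();
        for (Element item : items) {
            result.add(item.text() + " -> " + item.absUrl("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample markup: only the first link has class "item".
        String html = "<ul>"
                + "<li><a class=\"item\" href=\"/p/one\">First</a></li>"
                + "<li><a href=\"/p/two\">Second</a></li>"
                + "</ul>";

        for (String line : describeItems(html, "https://example.com/")) {
            System.out.println(line);
        }
    }
}
```

With a live page you would swap `Jsoup.parse(html, baseUri)` for `Jsoup.connect(url).get()`, which sets the base URI from the URL you fetched.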