How to wait until a parallel stream has finished running?


I have a Parser class that extracts links from HTML. I use Jsoup to get the links and then want to add them to a List, using the Stream API to do so. Look at the following code fragment:

public static List<Link> getLinksFromURL(String URL) {
    List<Link> links = new ArrayList<>();
    Elements elements = selectElements(URL, "a[href]");
    elements.parallelStream().forEach((Parser) -> addLink(links, elements.attr("href")));
    return links;
}

It works, but there is a problem. When I tested this method, I noticed that the stream seems to need some time to finish, so my test class can get wrong data. For example, I check the List size. While debugging, everything is fine. But in a normal run I get only a few elements instead of all the links, as if the stream hasn't finished executing yet. It's a simple test, but sometimes size() is 3 or less:

@Test
public void getLinksFromURLTest() {
    Assert.assertTrue(Parser.getLinksFromURL("testLinks").size()==4);
}

So my question is: how can I wait until the stream has finished executing? I need to get all the links :)

Small note: I get the HTML from a local page served by a Spark server that I launch for the tests.

P.S.: If anything is unclear, please let me know and I'll add explanations.

I'd be very thankful for your help. Good luck, everyone! :)

Update: addLink method

private static void addLink(List<Link> links, String URL) {
    if (!URL.isEmpty() && isLink(URL) && !hasSameLink(links, URL)){
        links.add(new Link(URL));
    }
}


You're making a mistake in your forEach() call. For each element, you're calling Elements.attr() on the elements variable instead of Element.attr() on the current item. You probably meant to do something like this:

forEach(element -> addLink(links, element.attr("href")))

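Regardless, your code is not thread-safe. You can't have multiple threads writing to a regular ArrayList without some form of synchronization (which would likely kill the benefit of parallelization). That's also the likely reason why you're seeing inconsistent results in your testing: forEach() on a parallel stream does not return until every element has been processed, so there is nothing to wait for, and the missing links are most likely lost in the unsynchronized add() calls. You should either use a thread-safe collection or just stick with sequential iteration.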
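To illustrate the thread-safe collection option, here is a minimal sketch that leaves Jsoup out entirely. The class name SynchronizedListDemo and the method fill() are made up for this example, and plain integers stand in for your links. It wraps the ArrayList with Collections.synchronizedList() and fills it from a parallel stream:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class; not part of the question's code.
public class SynchronizedListDemo {

    // Adds the numbers 0..count-1 to a list from a parallel stream.
    static List<Integer> fill(int count) {
        // synchronizedList guards each individual call with a lock,
        // so concurrent add() calls cannot corrupt the backing ArrayList.
        List<Integer> numbers = Collections.synchronizedList(new ArrayList<>());

        // forEach is a terminal operation: it returns only after
        // every element has been processed, so no extra waiting is needed.
        IntStream.range(0, count).parallel().forEach(i -> numbers.add(i));
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(fill(10_000).size()); // prints 10000
    }
}
```

Keep in mind that this only makes each single add() safe. Your addLink() first checks hasSameLink() and then adds, and that check-then-add sequence is still not atomic, so duplicates could slip through unless the whole method is synchronized. The order of the elements in the list is also not guaranteed.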

Alternatively, you may be able to convert all of your logic to a stream pipeline and use a collector instead:

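return elements.parallelStream()
        .map(element -> element.attr("href")) // extract the URL string from each Element first
        .filter(url -> !url.isEmpty())
        .filter(url -> isLink(url))
        .distinct()                            // takes over the job of hasSameLink()
        .map(Link::new)
        .collect(Collectors.toList());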
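If you want to try the collector approach without Jsoup or a running server, the following self-contained sketch may help. Everything in it is an assumption made for illustration: the Link class and isLink() are simplified stand-ins for your own, and a plain list of strings replaces the href values taken from Elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the question's code.
public class LinkPipelineDemo {

    // Simplified stand-in for the question's Link class.
    static final class Link {
        final String url;
        Link(String url) { this.url = url; }
    }

    // Simplified stand-in for the question's isLink() check (assumption).
    static boolean isLink(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }

    // Same pipeline shape as above, starting from href strings.
    static List<Link> toLinks(List<String> hrefs) {
        return hrefs.parallelStream()
                .filter(url -> !url.isEmpty())
                .filter(LinkPipelineDemo::isLink)
                .distinct()                     // drops repeated URLs
                .map(Link::new)
                .collect(Collectors.toList()); // the collector merges results safely
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://a.example", "", "http://a.example",
                "mailto:someone@example.com", "https://b.example");
        System.out.println(toLinks(hrefs).size()); // prints 2
    }
}
```

No shared list is mutated here, so no synchronization is needed: each thread accumulates its own partial result and the collector combines them once the stream completes.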