Regex to split URLs in text that has no separators


Apologies for yet another regex question!

I have some input text which, rather unhelpfully, has multiple URLs (and only URLs) all on one line with no separators\n

This example contains just two URLs, but there could be more.

I'm trying to separate the URLs into a list using Python.

I've searched for solutions and tried a few, but can't get any of them to work exactly, as they greedily consume all the following URLs.

I realise that's probably because https://... could legally appear in the query part of a URL, but in my case I'm willing to assume it can't, and to assume that when it occurs it's the start of the next URL.

I also tried (http[s]://.*?), but without the ? it matches the whole of the text, and with the ? it matches just the https://

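You need to use a positive lookahead assertion, so that the lazy .*? stops just before the next https?:// (or before whitespace or the end of the string) without consuming it.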

>>> import re
>>> s = "\n"
>>> re.findall(r'https?://.*?(?=https?://|$|\s)', s)
['', '']
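
As a self-contained sketch of the same idea: the input string below is made up for illustration (two hypothetical example.com / example.org URLs run together), not the asker's real data. It also runs the two attempts from the question alongside, to show why they return either everything or only the scheme.

```python
import re

# Hypothetical input: two made-up URLs on one line with no separator.
text = "https://example.com/a?x=1https://example.org/b\n"

# Lazy match from each scheme up to (but not including) the next scheme,
# a whitespace character, or the end of the string.
pattern = re.compile(r"https?://.*?(?=https?://|\s|$)")
urls = pattern.findall(text)
print(urls)  # ['https://example.com/a?x=1', 'https://example.org/b']

# For comparison, the two attempts described in the question:
greedy = re.findall(r"https?://.*", text)  # one match covering both URLs
lazy = re.findall(r"https?://.*?", text)   # lazy with nothing after it: only the scheme
print(greedy)
print(lazy)
```

The lookahead gives the lazy quantifier something to stop at, and because a lookahead consumes nothing, the next match can start at that same https://. This still relies on the assumption stated in the question, that https?:// never appears inside a URL's query string.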