What is the preferred way to cut off random characters at the end of a string in Python?
I am trying to simplify a list of URLs for some analysis, so I need to cut off everything that comes after the file extension ".php".
Since the characters that follow ".php" are different for each URL, strip() doesn't work. I thought about using a regex or slicing out a substring, but what would be the most efficient way to solve this task?
Let's say I have the following URLs:
And I want the output to be:
Thanks for your advice!
There are two ways to accomplish what you want.
If you know how the string ends:
In your example, if you know that the part you want to keep is always followed by ".php?", then all you need to do is:
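As a minimal sketch of this first case, assuming every URL contains ".php" and that everything after its first occurrence can be discarded: the helper name cut_after_php and the example.com URLs are hypothetical, made up for illustration, and are not the URLs from the question.

```python
# Hypothetical example URLs; they stand in for the question's own list.
urls = [
    "http://example.com/index.php?id=42&lang=en",
    "http://example.com/shop/item.php?ref=abc",
]


def cut_after_php(url, marker=".php"):
    """Return url truncated right after the first occurrence of marker."""
    head, sep, _tail = url.partition(marker)
    # If marker is absent, partition() returns sep == "" and head is the
    # whole string, so the URL comes back unchanged.
    return head + sep


for url in urls:
    print(cut_after_php(url))
```

str.partition is used here rather than a regex because it does a single plain substring search and handles the "marker not found" case without an extra check.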
If you don't know how the string ends:
In this case you can use urlparse and keep everything except the query string.
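    from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

    for url in urls:
        p = urlparse(url)
        # Rebuild the URL from scheme, host and path only
        print(p.scheme + "://" + p.netloc + p.path)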
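A variant of the same idea, sketched for Python 3 under the assumption that only the query string and fragment should be dropped: it lets the standard library reassemble the URL instead of concatenating the pieces by hand. The function name strip_query and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit


def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    # SplitResult is a named tuple, so _replace() returns a modified copy.
    return parts._replace(query="", fragment="").geturl()


# Hypothetical URL, for illustration only.
print(strip_query("http://example.com/a/page.php?x=1&y=2#top"))
```

Because geturl() puts the "://" separator back itself, this avoids forgetting it when joining scheme and host manually.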