How to cut off random characters at the end of a string with regex / strip() in Python?


What is the preferred way to cut off random characters at the end of a string in Python?

I am trying to simplify a list of URLs to do some analysis, so I need to cut off everything that comes after the file extension ".php".

Since the characters that follow ".php" are different for each URL, strip() doesn't work. I thought about regex and substrings (slicing), but what would be the most efficient way to solve this task?

Example:

Let's say I have the following URLs:

example.com/index.php?random_var=random-19wdwka
example.org/index.php?another_var=random-2js9m2msl

And I want the output to be:

example.com/index.php
example.org/index.php
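The two approaches mentioned in the question, a regular expression and slicing at a found index, might look roughly like this. It is only a sketch that assumes the first occurrence of ".php" in each URL is the file extension; the helper names cut_after_php_regex and cut_after_php_slice are hypothetical.

```python
import re

urls = [
    "example.com/index.php?random_var=random-19wdwka",
    "example.org/index.php?another_var=random-2js9m2msl",
]

# Regex: non-greedily match up to and including the first ".php".
PHP_PREFIX = re.compile(r"^(.*?\.php)")

def cut_after_php_regex(url):
    """Hypothetical helper: keep everything up to the first '.php'."""
    m = PHP_PREFIX.match(url)
    # If there is no ".php" at all, return the URL unchanged.
    return m.group(1) if m else url

def cut_after_php_slice(url):
    """Hypothetical helper: same result using str.find and slicing."""
    i = url.find(".php")
    # str.find returns -1 when the marker is missing.
    return url[:i + len(".php")] if i != -1 else url

for url in urls:
    print(cut_after_php_regex(url), cut_after_php_slice(url))
```

For a fixed marker like ".php", str.find plus slicing avoids the regex machinery; the regex version is easier to extend if the pattern becomes more complex (for example, several extensions).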

Thanks for your advice!


There are two ways to accomplish what you want.

If you know how the string ends:

In your example, if you know that everything you want to remove starts at the "?" that follows ".php", all you need to do is:

my_string.split('?')[0]
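Applied to the whole list from the question, the same idea might look like this. It is a sketch assuming the query string always begins at the first "?"; str.partition is shown as an equivalent alternative.

```python
urls = [
    "example.com/index.php?random_var=random-19wdwka",
    "example.org/index.php?another_var=random-2js9m2msl",
    "example.net/index.php",  # no "?" at all
]

# split("?", 1) stops after the first "?"; [0] is everything before it.
# A URL without "?" is returned unchanged.
cleaned = [url.split("?", 1)[0] for url in urls]

# str.partition gives the same result without building a list of pieces.
cleaned_alt = [url.partition("?")[0] for url in urls]

print(cleaned)
```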

If you don't know how the string ends:

In this case you can use urlparse and keep everything except the query string (the parameters after "?").

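from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

urls = ["example.com/index.php?random_var=random-19wdwka",
        "example.org/index.php?another_var=random-2js9m2msl"]

for url in urls:
    p = urlparse(url)
    # Keep scheme, netloc and path; drop the query string and fragment
    print(p._replace(query="", fragment="").geturl())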
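For URLs that include a scheme such as "https://", simply concatenating scheme, netloc and path would lose the "://" separator. A minimal sketch using urlsplit and urlunsplit from Python 3's urllib.parse rebuilds the URL with the separators intact; the function name strip_query_and_fragment is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query_and_fragment(url):
    """Hypothetical helper: drop everything after the path."""
    # urlsplit returns (scheme, netloc, path, query, fragment).
    parts = urlsplit(url)
    # Rebuild with an empty query and fragment; urlunsplit re-inserts "://".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(strip_query_and_fragment("https://example.com/index.php?random_var=random-19wdwka#top"))
print(strip_query_and_fragment("example.org/index.php?another_var=random-2js9m2msl"))
```

Without a scheme, urlsplit puts the host into the path (netloc is empty), but the rebuilt string is still the part before "?".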