How can I separate this into two strings?

I am new to Python and not sure what I should be looking for, but I assure you I have done my research and still came up with a rather ugly, 20-line block of code for this simple issue.

I am processing a traversal URL in my app, which is based on the Pyramid framework.

Now, the URL can be these: (url = None)

  0. url = ""
  1. url = "/"
  2. url = "/block_1"
  3. url = "/block_1/"
  4. url = "/block_1/block_2"
  5. url = "/block_1/block_2/"

The url can contain nothing. In this case, I want my function to return False, None, or an empty list or tuple (it does not matter which). (Matches options 0 and 1.)

Block_1: This is a single-word string made up of the letters a to Z. It cannot and should not contain any special characters. In fact, whatever is fetched as block_1 should be in a dictionary (or a list), and if it is not found, an error should be raised and returned. If block_1 is not there or not found, the function, as stated above, should return False, None, or an empty list or tuple. (Matches options 2 and 3.)

Block_2: Block_2 can be anything. For simplicity, it can contain any characters of any language along with special characters such as ()[]. Excuse me if I'm mistaken, but I think what I want is basically for it to match [\pL\pN].*, with one exception: its last character cannot be a slash, neither "\" nor "/". Preferably, I would like it to be a to Z (including all languages' alphabets and their accented characters) along with some other characters from a list (which I define specially, as mentioned above: () and []). If block_2 is not given, it should have the value None, and if it's not matched, the function should return False. (Matches the last two options listed above.)

My code starts with the following, which is rather primitive, for which I apologise:

def parse_url(url, the_dict):
    if not url:
        return False
    # check that the first character is a /
    if url[0] == '/':
        # then look for a second / that would start block_2
        slash_2 = url.find('/', 1)
        if slash_2 == -1:
            block_1, block_2 = url[1:].lower(), None
        else:  # if it's there, split at it and process what's remaining
            block_1 = url[1:slash_2].lower()
            block_2 = url[slash_2 + 1:]
            # remove a trailing slash at the end of block_2, if any
            if block_2.endswith('/'):
                block_2 = block_2[:-1]
            # an empty remainder (as in "/block_1/") means no block_2
            block_2 = block_2 or None
        # check that block_1 is in the dictionary
        if block_1 in the_dict:
            return block_1, block_2
    return False  # otherwise return False, which could also be () or [] or None

I'm sorry if my code is terrible and overcomplicated. I would love nothing more than a more elegant and better way to do this.

So how can I do it? What should I do to get rid of these jammed lines of code?

Thank you.


split('/') should definitely be used and that should help you parse the URL.
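
As a rough sketch of how split('/') could cover the six URL shapes from the question: the names parse_blocks and known_blocks are made up for illustration and are not part of Pyramid. It also assumes that everything after block_1 counts as block_2 (so inner slashes are kept), and that returning None is an acceptable "nothing matched" value.

```python
def parse_blocks(url, known_blocks):
    """Return (block_1, block_2), or None when nothing valid is found.

    Hypothetical helper for illustration only.
    """
    # None, "" and "/" all carry no blocks; strip() also removes
    # any leading and trailing slashes in one go.
    path = (url or "").strip("/")
    if not path:
        return None

    # maxsplit=1 keeps everything after block_1 together as block_2.
    parts = path.split("/", 1)
    block_1 = parts[0].lower()
    if block_1 not in known_blocks:
        return None

    # A missing second part means block_2 was not given.
    block_2 = parts[1] if len(parts) > 1 else None
    return block_1, block_2


known_blocks = {"block_1"}
print(parse_blocks("/block_1/block_2/", known_blocks))
```

Validating the characters allowed in block_2 is left out here; that check would go just before the final return.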

If that is not sufficient, urlparse can be used to parse the URL first, and its path can then be split:

urlparse.urlparse(path)

In [31]: url = 'http://stackoverflow.com/questions/12809298/how-can-i-separate-this-into-two-strings/12809315#12809315'

In [32]: urlparse.urlparse(url)
Out[32]: ParseResult(scheme='http', netloc='stackoverflow.com', path='/questions/12809298/how-can-i-separate-this-into-two-strings/12809315', params='', query='', fragment='12809315')

In [33]: a = urlparse.urlparse(url)

In [34]: a.path
Out[34]: '/questions/12809298/how-can-i-separate-this-into-two-strings/12809315'

In [35]: a.path.split('/')
Out[35]:
['',
 'questions',
 '12809298',
 'how-can-i-separate-this-into-two-strings',
 '12809315']
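
The session above uses the Python 2 urlparse module. On Python 3 the same functions live in urllib.parse, so a minimal equivalent might look like the sketch below. Filtering out the empty strings is my own addition, to get rid of the leading '' that split('/') produces.

```python
from urllib.parse import urlsplit

url = ("http://stackoverflow.com/questions/12809298/"
       "how-can-i-separate-this-into-two-strings/12809315#12809315")

# urlsplit separates scheme, host, path, query and fragment.
path = urlsplit(url).path

# Drop the empty strings caused by leading or trailing slashes.
segments = [s for s in path.split("/") if s]
print(segments)
# ['questions', '12809298', 'how-can-i-separate-this-into-two-strings', '12809315']
```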