Split a list of possibly nested function expressions in Python

PostgreSQL allows indexes to be created on expressions, e.g., CREATE INDEX ON films ((lower(title))). It also has a pg_get_expr() information function that translates the internal format of the expression into text, i.e., lower(title) in the example above. The expressions can get quite hairy at times. Here are some examples (in Python):

sample_exprs = [
    'lower(c2)',
    'lower(c2), lower(c3)',
    "btrim(c3, 'x'::text), lower(c2)",
    "date_part('month'::text, dt), date_part('day'::text, dt)",
    '"substring"(c2, "position"(c2, \'_begin\'::text)), "substring"(c2, "position"(c2, \'_end\'::text))',
    "(((c2)::text || ', '::text) || c3), ((c3 || ' '::text) || (c2)::text)",
    'f1(f2(arga, f3()), arg1), f4(arg2, f5(argb, argc)), f6(arg3)']

The last item isn't really from Postgres but is just an extreme example of what my code ought to handle.

I wrote a Python function to split the textual lists into the component expressions. For example, that last item is broken down into:

 f1(f2(arga, f3()), arg1)
 f4(arg2, f5(argb, argc))
 f6(arg3)

I experimented with str methods like find() and count() and also considered regexes, but in the end I wrote the function the way I would have written it in C (essentially counting open and close parens to find where to break the text). Here's the function:

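def split_exprs(idx_exprs):
    """Split a comma-separated list of expressions at balanced parentheses."""
    keyexprs = []
    nopen = nclose = beg = curr = 0
    for c in idx_exprs:
        curr += 1
        if c == '(':
            nopen += 1
        elif c == ')':
            nclose += 1
            # The parentheses balance again: one complete expression ends here.
            if nopen > 0 and nopen == nclose:
                # Skip the ', ' separator left over from the previous expression.
                if idx_exprs[beg] == ',':
                    beg += 1
                if idx_exprs[beg] == ' ':
                    beg += 1
                keyexprs.append(idx_exprs[beg:curr])
                beg = curr
                nopen = nclose = 0
    return keyexprs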
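
To compare candidate implementations, a small check like the following may help. It is only a sketch: the names EXPECTED and check_splitter are made up for illustration, and the expected splits are written out by hand for a few of the samples above.

```python
# Hand-written expected splits for a few of the sample inputs.
# EXPECTED and check_splitter are hypothetical names used only in this sketch.
EXPECTED = {
    'lower(c2)': ['lower(c2)'],
    'lower(c2), lower(c3)': ['lower(c2)', 'lower(c3)'],
    "btrim(c3, 'x'::text), lower(c2)": ["btrim(c3, 'x'::text)", 'lower(c2)'],
    'f1(f2(arga, f3()), arg1), f4(arg2, f5(argb, argc)), f6(arg3)': [
        'f1(f2(arga, f3()), arg1)',
        'f4(arg2, f5(argb, argc))',
        'f6(arg3)',
    ],
}


def check_splitter(split):
    """Return the inputs for which split(text) differs from the expected list."""
    return [text for text, want in EXPECTED.items() if split(text) != want]
```

Calling check_splitter(split_exprs) returns an empty list when every sample is split as expected, and otherwise lists the inputs that were split differently.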

The question is whether there is a more Pythonic or elegant way to do this, or a way to solve it with regexes.


Here is my version. I think it is more Pythonic and less cluttered, and it works on a stream of characters, though I don't see any advantage in that :)

def split_fns(fns):
    level = 0
    stack = [[]]
    for ch in fns:
        if level == 0 and ch in [' ',',']:
            continue
        stack[-1].append(ch)

        if ch == "(":
            level += 1
        elif ch == ")":
            level -= 1
            if level == 0:
                stack.append([])

    return ["".join(t) for t in stack if t]