How to parse a string into variables?

I know how to parse a string into variables in the manner of this SO question, e.g.

ABCDE-123456

becomes:

var1=ABCDE
var2=123456

via, say, cut. I can do that in one script, no problem.
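For comparison, the single-script version might look like the following minimal sketch. It uses the example string from above. The parameter-expansion lines are an alternative added for illustration, not something the question uses.

```bash
string="ABCDE-123456"

# cut: -d sets the delimiter, -f selects the field
var1=$(printf '%s\n' "$string" | cut -d- -f1)
var2=$(printf '%s\n' "$string" | cut -d- -f2)

# The same split with bash parameter expansion (no subprocesses):
#   ${string%%-*} drops the longest suffix that starts with "-"
#   ${string#*-}  drops the shortest prefix that ends with "-"
var1_pe=${string%%-*}
var2_pe=${string#*-}

echo "$var1 $var2"          # prints: ABCDE 123456
echo "$var1_pe $var2_pe"    # prints: ABCDE 123456
```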

But I have a few dozen scripts which parse strings/arguments all in the same fashion (same arguments & variables, i.e. same parsing strategy). And sometimes I need to make a change or add a variable to the parsing mechanism.

Of course, I could go through every one of my dozens of scripts and change the parsing manually (even if just by copy & paste), but that would be tedious and error-prone.

Is there a modular way to parse strings/arguments like this?

I thought of writing a script that parses the string/args into variables and then exports them, but export does not work from child to parent (only from parent to child).
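The child-to-parent limitation is why shared shell code is usually kept in one file and sourced with "." (bash also accepts "source"), which runs the file in the current shell instead of a child process. A minimal sketch, assuming a throwaway temp file and a hypothetical function name, split_dash:

```bash
unset var1 var2

# Hypothetical shared file that defines a parsing function
lib=$(mktemp)
cat > "$lib" <<'EOF'
split_dash () {
    var1=${1%%-*}   # text before the first "-"
    var2=${1#*-}    # text after the first "-"
}
EOF

# In a subshell (a child process), the assignments are lost when it exits
( . "$lib"; split_dash ABCDE-123456 )
echo "after child:  var1='${var1-}'"              # prints: after child:  var1=''

# Sourced into the current shell, the function sets this shell's variables
. "$lib"
split_dash ABCDE-123456
echo "after source: var1='$var1' var2='$var2'"    # prints: after source: var1='ABCDE' var2='123456'

rm -f "$lib"
```

Each of the dozens of scripts would then source the same file, so a change to the parsing only has to be made once.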


Something like this might work:

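parse_it () {
    # Helper variables are local so they don't leak into the caller
    local sep string names
    sep=${SEP--}    # separator: "-" unless SEP is set for the call
    string=$1       # the string to parse
    names=${@:2}    # remaining arguments: names of the variables to set

    # $names is deliberately unquoted; -r keeps backslashes in the input literal
    IFS="$sep" read -r $names <<< "$string"
}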

$ parse_it ABCDE-123456 var1 var2
$ echo "$var1"
ABCDE
$ echo "$var2"
123456
$ SEP=: parse_it "foo:bar:baz" id1 id2 id3
$ echo "$id2"
bar

The first argument is the string to parse; the remaining arguments are the names of the variables that read should set. (Leaving $names unquoted is intentional: the shell splits it into separate words, one per variable name. Valid variable names contain only letters, digits, and underscores, so with the default IFS, leaving $names unquoted carries no risk of unwanted word splitting or pathname expansion.) The separator defaults to "-" and can be overridden through the SEP environment variable, as in the second example.
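Because the names are handed straight to read, read's usual field rules apply: if there are more fields than names, the last name receives the rest of the string, separators included, and names with no matching field are set to the empty string. A quick check, repeating the function so the block runs on its own:

```bash
# Same approach as parse_it above, repeated so this block is self-contained
parse_it () {
    local sep string names
    sep=${SEP--}
    string=$1
    names=${@:2}
    IFS="$sep" read -r $names <<< "$string"
}

# More fields than names: the last name gets the unsplit remainder
parse_it "a-b-c-d" first rest
echo "$first | $rest"     # prints: a | b-c-d

# Fewer fields than names: the extra name is set to the empty string
parse_it "x-y" p q r
echo "$p | $q | [$r]"     # prints: x | y | []
```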

For more complex parsing, you may want to use a custom regular expression (bash 4.2 or later is required for the -g flag to declare):

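parse_it () {
    # Helper variables are local; the matched values are set globally below
    local reg_ex="$1" string="$2" i=1 name
    shift 2
    # Return a non-zero status if the string doesn't match the pattern
    [[ $string =~ $reg_ex ]] || return
    for name; do
        # Assign capture group i to the next variable name, at global scope
        declare -g "$name=${BASH_REMATCH[i++]}"
    done
}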

$ parse_it '(.*)-(.*):(.*)' "abc-123:xyz" id1 id2 id3
$ echo "$id2"
123
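Because the function returns a non-zero status when the string doesn't match, a caller can use it to validate input as well as split it. A sketch, assuming bash 4.2+ and an illustrative pattern and variable names (code, num, code2, num2), with the function repeated so the block runs on its own:

```bash
# Regex-based parse_it from above, repeated so this block is self-contained
parse_it () {
    local reg_ex="$1" string="$2" i=1 name
    shift 2
    [[ $string =~ $reg_ex ]] || return
    for name; do
        declare -g "$name=${BASH_REMATCH[i++]}"
    done
}

# Matching input: both capture groups are assigned
if parse_it '^([A-Z]+)-([0-9]+)$' "ABCDE-123456" code num; then
    echo "code=$code num=$num"    # prints: code=ABCDE num=123456
fi

# Non-matching input: the function fails and sets nothing
if ! parse_it '^([A-Z]+)-([0-9]+)$' "not-valid" code2 num2; then
    echo "no match" >&2
fi
```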