What is the best way to parse a non-flat file format in Java?


I am attempting to parse a nested file format in Java.

The file format looks like this:

head [

    A [
        property value
        property2 value
        property3 [
            ... down the rabbit hole ...

    ... more As ...

    B [
        .. just the same as A

    ... more Bs ...

What is the best/easiest technique to parse this into my program?

  • Finite State Machine?

  • Manually read it word by word and keep track of what part of the structure I am in?

  • Write a grammar...?

As a side note, I have no control over the format. I mention it because I knew someone would suggest changing it!

If the grammar is indeed nested like this, writing a very simple top-down parser would be a trivial task: you have very few tokens to recognize, and the nested structure repeats itself very conveniently for a textbook recursive-descent parser.
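To make that concrete, here is a minimal sketch of such a recursive-descent parser. It rests on a few assumptions the question does not confirm: that every `[` is eventually closed by a matching `]` (the sample elides the closing side), that tokens are separated by whitespace, and that a property value is a single token. The names `NestedParser`, `Node`, `parse` and `parseEntry` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assumes blocks are closed with ']' and values are single tokens.
final class NestedParser {

    /** One named entry: a leaf holding a value, or a block holding child entries. */
    static final class Node {
        final String name;
        final String value;                          // null when the entry is a block
        final List<Node> children = new ArrayList<>();

        Node(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    private NestedParser(String input) {
        // Pad the brackets with spaces so "A[" and "]]" still split into separate tokens.
        String padded = input.replace("[", " [ ").replace("]", " ] ");
        this.tokens = Arrays.asList(padded.trim().split("\\s+"));
    }

    /** Parses the whole input as a single top-level entry, e.g. "head [ ... ]". */
    static Node parse(String input) {
        NestedParser parser = new NestedParser(input);
        Node root = parser.parseEntry();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalStateException("Unexpected trailing token at index " + parser.pos);
        }
        return root;
    }

    // Grammar assumed here:  entry := NAME ( '[' entry* ']' | VALUE )
    private Node parseEntry() {
        String name = next();
        if (name.equals("[") || name.equals("]")) {
            throw new IllegalStateException("Expected a name but found '" + name + "'");
        }
        String second = next();
        if (!second.equals("[")) {
            return new Node(name, second);           // plain "property value" pair
        }
        Node block = new Node(name, null);
        while (!peek().equals("]")) {
            block.children.add(parseEntry());        // the recursion mirrors the nesting
        }
        next();                                      // consume the closing ']'
        return block;
    }

    private String peek() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("Unexpected end of input");
        }
        return tokens.get(pos);
    }

    private String next() {
        String token = peek();
        pos++;
        return token;
    }
}
```

Calling `NestedParser.parse(text)` on the file contents returns the `head` node, and each nested block shows up as a child `Node`. If the real format terminates blocks differently, or allows values containing spaces, only the tokenizer in the constructor and the loop in `parseEntry` would need to change.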

I would not even bother with ANTLR or another parser generator for something this simple, because the learning curve would eat the potential benefits for the project*.

* Potential benefits for you from learning a parser generator are hard to overestimate: if you can spend a day or two learning to build parsers with ANTLR, your view of structured text files will change forever.