For a larger project, I'm currently writing a Stanford polygon file (PLY) parser. The example on GitHub Gist can currently parse ASCII-format PLY files into a data abstraction, Mesh. It also contains a description of the actual grammar, for those inclined.
However, the format definition (PLY - Polygon File Format) also includes two binary formats (little-endian and big-endian). Since those two formats are much more common (and more storage-efficient), I would like to be able to parse those files with pyparsing as well.
I'd be grateful for some advice on how to do that, if it is possible at all.
The idea of the binary PLY files is that the header portion consists of an ASCII description of the file's data, and the body contains the data itself. An example (data in brackets are hex bytes):
    ply
    format binary_little_endian 1.0
    element vertex 1
    property float x
    property float y
    property float z
    property uchar red
    property uchar green
    property uchar blue
    property uchar alpha
    end_header
    [84 72 F1 C1 D8 FD 9F C1 00 00 00 00 3B 45 CB FF]
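To make the layout concrete, here is a minimal sketch that decodes the 16 body bytes of the example with Python's standard struct module. It assumes the fields are packed back to back with no padding, in the order the header declares them: three little-endian 32-bit floats followed by four unsigned bytes.

```python
import struct

# The 16 body bytes from the example above, written as one hex string.
body = bytes.fromhex("8472F1C1D8FD9FC1000000003B45CBFF")

# "<" selects little-endian with no padding, "3f" reads x, y, z as
# 32-bit floats, and "4B" reads red, green, blue, alpha as unsigned bytes.
x, y, z, red, green, blue, alpha = struct.unpack("<3f4B", body)

print(x, y, z)
print(red, green, blue, alpha)
```

Under that assumption the colour channels come out as 59, 69, 203 and 255, and z is 0.0.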
My first approach was to just load the input file in binary mode (using bytes instead of str) and adapt the parser accordingly, but this somehow throws pyparsing off track. Also, I don't really know how to tell pyparsing how to grok byte groups.
    File "components.py", line 338, in create
      mesh = PlyParser.create().load(mesh_path)
    File "model_parser.py", line 120, in create
      property_position = aggregate_property("position", b"x", b"y", b"z")
    File "model_parser.py", line 113, in aggregate_property
      aggregates.append(pp.Group(property_simple_prefix + keyword_or(*keywords)("name")))
    File "model_parser.py", line 87, in keyword_or
      return pp.Or(pp.CaselessKeyword(literal) for literal in keywords)
    File "pyparsing.py", line 3418, in __init__
      super(Or,self).__init__(exprs, savelist)
    File "pyparsing.py", line 3222, in __init__
      exprs = list(exprs)
    File "model_parser.py", line 87, in <genexpr>
      return pp.Or(pp.CaselessKeyword(literal) for literal in keywords)
    File "pyparsing.py", line 2496, in __init__
      super(CaselessKeyword,self).__init__( matchString, identChars, caseless=True )
    File "pyparsing.py", line 2422, in __init__
      self.matchLen = len(matchString)
    TypeError: object of type 'int' has no len()
What you might want to try is to open the file as text, use pyparsing to parse the header, and capture the end position of the end_header token. Use the structure information extracted from the header to build a Python struct reader that will process the binary content. Then reopen the file as binary, seek to that position, and use the struct reader to load the binary content. This is probably simpler than twisting pyparsing to handle both text and binary.
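A rough sketch of that idea follows, under several assumptions. To stay dependency-free it parses the header with plain string splitting where a pyparsing grammar would go, and it works on a single bytes object (split at end_header) instead of opening the file twice. The function name read_binary_ply_vertices and the two lookup tables are made up for illustration. It handles only scalar properties of a vertex element that comes first in the body, and it expects Unix line endings in the header.

```python
import struct

# PLY scalar type names mapped to struct format characters.
PLY_TO_STRUCT = {
    "char": "b", "uchar": "B", "short": "h", "ushort": "H",
    "int": "i", "uint": "I", "float": "f", "double": "d",
}
# struct byte-order prefixes for the two binary PLY formats.
ENDIAN = {"binary_little_endian": "<", "binary_big_endian": ">"}


def read_binary_ply_vertices(data):
    """Return the vertex records of a binary PLY file as a list of dicts."""
    marker = b"end_header\n"
    header_end = data.index(marker) + len(marker)
    header = data[:header_end].decode("ascii")

    byte_order, count, names, fmt = None, 0, [], ""
    in_vertex = False
    for line in header.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "format":
            # Raises KeyError for "ascii": this sketch is binary-only.
            byte_order = ENDIAN[tokens[1]]
        elif tokens[0] == "element":
            in_vertex = tokens[1] == "vertex"
            if in_vertex:
                count = int(tokens[2])
        elif tokens[0] == "property" and in_vertex:
            # "property list ..." is not supported and raises KeyError.
            fmt += PLY_TO_STRUCT[tokens[1]]
            names.append(tokens[2])

    # One compiled struct describes a single vertex record.
    record = struct.Struct(byte_order + fmt)
    vertices = []
    for i in range(count):
        values = record.unpack_from(data, header_end + i * record.size)
        vertices.append(dict(zip(names, values)))
    return vertices
```

With a real file you would pass in the result of open(path, "rb").read(). List properties such as face indices have a variable record length, so they would need a per-record read loop rather than one fixed struct format.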