I'm looking for an elegant way to extract nested data from a MATLAB data structure


Other than the brute-force technique of nested FOR loops, is there a more elegant way in MATLAB to extract the X and Y data from the sample data structure shown below? I haven't been able to devise an elegant way of doing this using bsxfun, arrayfun, or structfun.

% Create an example of the input structure that I need to parse
for i = 1:100
    setName = ['n' num2str(i)];
    for j = 1:randi(10,1)
        repName = ['n' num2str(j)];
        data.sets.(setName).replicates.(repName).X = i + randn();
        data.sets.(setName).replicates.(repName).Y = i + randn();
    end
end

clearvars -except data

% Brute force technique using nested FOR Loops to extract X & Y from this
% nested structure for easy plotting. Is there a better way to extract the
% X & Y values created above without using FOR loops?

n = 1;
setNames = fieldnames(data.sets);
for i = 1:length(setNames)
    replicateNames = fieldnames(data.sets.(setNames{i}).replicates);
    for j = 1:length(replicateNames)
        X(n) = data.sets.(setNames{i}).replicates.(replicateNames{j}).X;
        Y(n) = data.sets.(setNames{i}).replicates.(replicateNames{j}).Y;
        n = n+1;
    end
end


MATLAB works best with arrays/matrices (numeric arrays, struct arrays, cell arrays, object arrays, etc.). The language offers constructs to slice and index into arrays easily.

So the idiomatic way in MATLAB would have been to create a non-scalar structure array, as opposed to a deeply nested structure.

For example, let's first convert the nested structure into a 2D array of structures, where the first dimension denotes the "replicates" and the second dimension denotes the "sets":

ds = struct('X',[], 'Y',[]);
sets = fieldnames(data.sets);
for i=1:numel(sets)
    reps = fieldnames(data.sets.(sets{i}).replicates);
    for j=1:numel(reps)
        ds(j,i) = data.sets.(sets{i}).replicates.(reps{j});
    end
end

The result is a 10-by-100 structure array whose elements each have two fields, X and Y (elements with no corresponding replicate are left with empty fields):

>> ds
ds =
10x100 struct array with fields:
    X
    Y

Accessing data.sets.n99.replicates.n9 in the original structure would be equivalent to ds(9,99) in the new structure.

>> data.sets.n99.replicates.n9
ans =
    X: 100.3616
    Y: 98.8023

>> ds(9,99)
ans =
    X: 100.3616
    Y: 98.8023

This new struct has the benefit that it can easily be accessed using array-indexing notation and comma-separated lists. So we can extract the X and Y vectors like you did simply as:

XX = [ds.X];    % or XX = cat(2, ds.X)
YY = [ds.Y];
scatter(XX, YY, 1)
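
The same indexing also works on slices of the array. Here is a rough sketch on a much smaller example (3 sets with 2, 3 and 1 replicates; the sizes, the values and the names nReps, x2, present and repCount are made up for illustration) that pulls out one set and counts the replicates per set:

```matlab
% Small nested structure with the same layout as in the question.
% Sizes and values are illustrative assumptions, not the original data.
data = struct();
nReps = [2 3 1];
for i = 1:numel(nReps)
    for j = 1:nReps(i)
        s = sprintf('n%d', i);
        r = sprintf('n%d', j);
        data.sets.(s).replicates.(r).X = i + 0.1*j;
        data.sets.(s).replicates.(r).Y = i - 0.1*j;
    end
end

% Convert to a replicates-by-sets struct array, as described above.
ds = struct('X', [], 'Y', []);
sets = fieldnames(data.sets);
for i = 1:numel(sets)
    reps = fieldnames(data.sets.(sets{i}).replicates);
    for j = 1:numel(reps)
        ds(j,i) = data.sets.(sets{i}).replicates.(reps{j});
    end
end

% X values of set 2 only: index one column, then concatenate the
% resulting comma-separated list.
x2 = [ds(:,2).X];

% Elements that were never assigned have X == [], so a mask of the
% non-empty ones gives the number of replicates in each set.
present  = reshape(~cellfun(@isempty, {ds.X}), size(ds));
repCount = sum(present, 1);
```

Because the empty fields contribute nothing when concatenated, [ds.X] over the whole array still returns exactly one value per replicate.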

So if you had control over building the struct, I would design it as described above to begin with. Otherwise the double for-loop in your code with the dynamic field names is the best way to extract the values from it.
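
As a minimal sketch of what "designing it that way to begin with" could look like, assuming the data is generated as in the question (the names nSets and maxReps and the fixed seed are my own additions):

```matlab
rng(0);                 % fixed seed, only so the sketch is repeatable
nSets   = 100;
maxReps = 10;

% Preallocate a replicates-by-sets struct array with empty fields.
ds = repmat(struct('X', [], 'Y', []), maxReps, nSets);

for i = 1:nSets
    for j = 1:randi(maxReps)      % random number of replicates per set
        ds(j,i).X = i + randn();
        ds(j,i).Y = i + randn();
    end
end

% Extraction is then a one-liner per field.
XX = [ds.X];
YY = [ds.Y];
```

Filling the array still takes loops here, but every later access is plain indexing, with no dynamic field names.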

You could probably write a bunch of nested structfun calls, but that won't be the most readable code. Here is what I came up with to flatten the nested structure:

D = structfun(@(n) ...
        structfun(@(nn) [nn.X nn.Y], n.replicates, 'UniformOutput',false), ...
        data.sets, 'UniformOutput',false);

The resulting structure can be accessed with fewer levels of nesting:

>> D.n99.n9
ans =
  100.3616   98.8023

Slightly better than the original one, but still not easily traversed without some for-loops.
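
If the goal is only to avoid writing explicit FOR loops, one more option is to go through struct2cell. This is a sketch on a small made-up example (the sizes, the values and the names setCells, leafCells and leaves are illustrative); cellfun still iterates internally, so I would not expect it to be faster than the loops:

```matlab
% Small nested structure with the same layout as in the question.
% Sizes and values are illustrative assumptions, not the original data.
data = struct();
nReps = [2 3 1];
for i = 1:numel(nReps)
    for j = 1:nReps(i)
        s = sprintf('n%d', i);
        r = sprintf('n%d', j);
        data.sets.(s).replicates.(r).X = i + 0.1*j;
        data.sets.(s).replicates.(r).Y = i - 0.1*j;
    end
end

% One cell per set, each holding a struct with a 'replicates' field.
setCells = struct2cell(data.sets);

% For every set, turn its replicates into a column cell of X/Y structs.
leafCells = cellfun(@(s) struct2cell(s.replicates), setCells, ...
                    'UniformOutput', false);

% Stack all replicates into one N-by-1 cell, then into a 1-by-N struct array.
leafCells = vertcat(leafCells{:});
leaves    = [leafCells{:}];

% Same result as the nested loops, in the same order.
X = [leaves.X];
Y = [leaves.Y];
```

This relies on every replicate having the same fields in the same order; otherwise the struct concatenation fails.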