Creating a model dataset from multiple datasets with different columns

advertisements

Currently, I have several sets of business unit data that I'd like to put into a standard template format. Some business unit data contains columns that others don't. I would like to check if certain columns exist and then to create them if they don't. I understand that techniques to achieve similar functionality have been discussed earlier, here and here. However, I was wondering if a better method exists.

My current code is:

data Source_Data4;
set Interm.Source_Data3;
if 0 then do;
    a="";
    b="";
end;
run;


If you have multiple datasets with different columns that you want to use a template for, an excellent way to do this is something like this:

data want;
if 0 then set template;
set have2;
run;

This is far easier to code than a bunch of retain/length statements. It accomplishes the identical results as the retain solution (it defines the PDV), with one exception; it will define lengths and formats of variables based on template (while retain does not affect length or format). This may be desirable or may not be, depending on your use case. It is very helpful when combining multiple datasets, as it provides a single point at which length/format differences can be tested for; once this step occurs, you can be confident that your various datasets are all identical in variable length/format.

Creating this dataset can be done a number of ways. One simple way is:

data template;
if 0 then set have;
if 0 then set have2;
stop;
run;

That will create a blank dataset with have1 order followed by any new variables from have2. If that's not desired, you may want to add a RETAIN statement prior to the if 0's that draws from a data dictionary.