On our training set, we performed feature selection (e.g. CfsSubsetEval with GreedyStepwise) and then classified the instances with a classifier (e.g. J48). We have saved the model that Weka created.
Now we want to classify new [unlabeled] instances, which still have the original number of attributes that the training set had before it underwent feature selection. Are we right in assuming that we should apply the same feature selection to this set of new [unlabeled] instances, so that the training and test sets are compatible and we can evaluate the new set with the saved model? If yes, how can we filter the test set?
Thank you for helping!
Yes. The training and test sets must have the same number of attributes, in the same order, and each attribute must correspond to the same thing. So before classification, remove from your test set exactly the attributes that were removed from the training set.
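
To illustrate the idea, here is a minimal sketch in plain Java (no Weka classes) that keeps only the attributes retained at training time and drops the rest from a new instance. The class name `AttributeProjection`, the method `project`, and the attribute names and values are all hypothetical, and the `class` column holding `?` for an unlabeled instance is an assumption made for the example. In Weka itself, the equivalent step would normally be done with a filter such as `weka.filters.unsupervised.attribute.Remove` rather than by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AttributeProjection {

    /**
     * Returns copies of the rows containing only the attributes listed in
     * keptAttributes, in that order. Fails if a kept attribute is missing.
     */
    public static List<String[]> project(List<String> header,
                                         List<String[]> rows,
                                         List<String> keptAttributes) {
        // Look up the column index of every attribute that survived selection.
        int[] idx = new int[keptAttributes.size()];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = header.indexOf(keptAttributes.get(i));
            if (idx[i] < 0) {
                throw new IllegalArgumentException(
                        "Missing attribute: " + keptAttributes.get(i));
            }
        }
        // Copy only those columns from each row.
        List<String[]> reduced = new ArrayList<>();
        for (String[] row : rows) {
            String[] out = new String[idx.length];
            for (int j = 0; j < idx.length; j++) {
                out[j] = row[idx[j]];
            }
            reduced.add(out);
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Hypothetical original header of both the training and the test set.
        List<String> header = Arrays.asList("a", "b", "c", "class");
        // Hypothetical result of feature selection on the training set.
        List<String> kept = Arrays.asList("a", "c", "class");
        // One unlabeled test instance with all original attributes.
        List<String[]> test = new ArrayList<>();
        test.add(new String[] {"1", "2", "3", "?"});

        for (String[] row : project(header, test, kept)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The point of the sketch is that the list of kept attributes comes from the training run and is only reused on the new data. Feature selection is not run again on the test set, since a fresh run could pick a different subset.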