Batch create folders based on part of the file name and move files into this folder


I have 1.6 million(!) PDF files in a single folder. The files are all named similar to this:

LAST_FIRST_7-24-1936 Diagnostic - Topography 11-18-10_1.pdf
LAST_FIRST_7-24-1936 Glasses RX 6-1-11_3.pdf

I need to create a folder based on the first part of the file and then move that file and all other files with that same first part of the file name into that folder. In this case the folder would be named "LAST_FIRST_7-24-1936". The folder will always be named the same as the first part of the file up until the space.

I would like to create a batch file that will do this. With my rudimentary programming knowledge, I came up with this logical process:

1 Take the first file and name it var1
2 Remove everything after the space in var1 and name it var2
3 Create a folder named var2
4 Move the file var1 into the folder var2
5 If there are more files, go back to step 1; otherwise end

I don't know what the proper syntax would be for this.

I did find this link: "Need a script to create folders based on file names, and auto move files". I made this batch file based on it:

pushd "D:\Data\Medinfo PDFs"
for %%F in (*.pdf) do (
  2>nul md "%%~nF"
  >nul move /y "%%~nF*.*" "%%~nF"
)
popd

But it doesn't let me create the folder name from just part of the file name. If I can figure that part out, I think it would work. I know I need to create a variable for the folder name, but I don't know how to edit the file name variable to remove everything after the space. Any help would be appreciated. I'm not opposed to doing this in PowerShell or something else, as long as it works natively in Windows Server 2008 R2.

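@ECHO OFF
SETLOCAL
SET "sourcedir=c:\sourcedir"
PUSHD "%sourcedir%"
REM Split each name at the first space: %%a = folder name, %%b = rest of the file name
FOR /f "tokens=1*" %%a IN (
 'dir /b /a-d "*_*_*-*-* *.*"'
 ) DO (
 ECHO MD %%a
 ECHO MOVE "%%a %%b" .\%%a\
)
POPD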

This should accomplish the required task - or at least show the required instructions.

If you are satisfied with the commands issued, set sourcedir to your required root directory and remove the two echo keywords to activate.

The "directory already exists" message generated by the MD on attempting to re-create an existing directory may be suppressed by appending 2>nul to the MD line.

Similarly, the report that one file has been moved may be suppressed by appending >nul to the MOVE line.

2>nul suppresses error messages (trying to create an existing directory is an error) whereas the 'files moved' message is an ordinary output message, hence the difference.
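
With both redirections appended and the two ECHO keywords removed, the activated script might look like the sketch below. It still uses the placeholder path c:\sourcedir. The `|| EXIT /B 1` guard after PUSHD is an addition of this sketch, not part of the script above; it is there so that the loop never runs in the wrong directory if the path does not exist.

```batch
@ECHO OFF
SETLOCAL
SET "sourcedir=c:\sourcedir"
REM Stop here if the source directory cannot be entered (guard added in this sketch)
PUSHD "%sourcedir%" || EXIT /B 1
FOR /f "tokens=1*" %%a IN (
 'dir /b /a-d "*_*_*-*-* *.*"'
 ) DO (
 REM 2>nul hides the error raised when the directory already exists
 MD %%a 2>nul
 REM >nul hides the ordinary "1 file(s) moved." report
 MOVE "%%a %%b" .\%%a\ >nul
)
POPD
```

Running the ECHO version first on a small copy of the data is still a sensible check before letting this loose on the full folder.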

Addendum - how it works.

First, the PUSHD sets the current directory to the target.
The DIR command output is tokenised by the FOR/F. The tokens=1* clause assigns the first token (1) to the nominated metavariable (%%a) and, implicitly, the rest of the line (*) to %%b, which is simply the next metavariable alphabetically. Token * means everything after the highest token number explicitly mentioned. No delims clause is used, so the default delimiters for FOR/F (SPACE and TAB) are used.

The DIR is targeted on a mask of *_*_*-*-* *.*, so only files matching that mask - where * means any number of any characters - will be located. Because the mask is "quoted", the space is included in the mask. Without the quotes, two separate masks would be specified. The /b option produces a list in bare format, that is, names only, no headers or summary. The /a-d option suppresses any directory names that may have fitted the mask.
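
Before moving anything, it may be worth checking what the mask actually selects. The commands below are a small sketch, meant to be typed at a prompt whose current directory is the source folder. They only list and count; nothing is created or moved.

```batch
REM Page through the names the mask selects
dir /b /a-d "*_*_*-*-* *.*" | more

REM Count the names the mask selects (find /c /v "" counts the lines it receives)
dir /b /a-d "*_*_*-*-* *.*" | find /c /v ""
```

If the count is lower than the number of PDFs in the folder, some file names do not fit the mask and would be left behind by the script.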

Hence, for LAST_FIRST_7-24-1936 Diagnostic - Topography 11-18-10_1.pdf, the dir lists LAST_FIRST_7-24-1936 Diagnostic - Topography 11-18-10_1.pdf and FOR/F tokenises as LAST_FIRST_7-24-1936 to %%a and Diagnostic - Topography 11-18-10_1.pdf to %%b using the SPACE as a delimiter.
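
The split can be tried on its own, without touching any files, by feeding FOR/F a quoted string instead of a command. This is a minimal sketch using the sample name from the question; the labels "Folder:" and "Rest:" are only there for illustration.

```batch
@ECHO OFF
REM A "quoted string" in the IN clause is tokenised directly, no DIR needed
FOR /f "tokens=1*" %%a IN ("LAST_FIRST_7-24-1936 Diagnostic - Topography 11-18-10_1.pdf") DO (
 ECHO Folder: %%a
 ECHO Rest:   %%b
)
```

Saved as a batch file and run, this should show LAST_FIRST_7-24-1936 after "Folder:" and Diagnostic - Topography 11-18-10_1.pdf after "Rest:". Only the first space splits the name; the later spaces stay inside %%b. When typing the FOR directly at the prompt rather than in a batch file, use %a and %b instead of %%a and %%b.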

The filename can then be reconstructed by re-inserting the space between %%a and %%b. Any filename containing a separator needs to be quoted to group the characters and signal that they are not separated elements. The target of the move is terminated with \ to specify "This is a directory name."

The POPD restores the original logged directory.