I have text files that are approximately 6 MB in size. Some lines contain the NULL character (Chr(0)), which I would like to remove. I have two methods to do this. The first checks each character with Asc() = 0, but it takes approximately 50 s to complete. The second uses InStr(line, Chr(0)) and is fast (about 4 s), but it removes vital information from the lines that contain the NULL characters.
First line of text file as example:
First method (works but VERY slow)
    Function normalise(textFile)
        Set fso = CreateObject("Scripting.FileSystemObject")
        writeTo = fso.BuildPath(tempFolder, saveTo & ("\Output.arc"))
        Set objOutFile = fso.CreateTextFile(writeTo)
        Set objFile = fso.OpenTextFile(textFile, 1)
        ' Read the file one character at a time
        Do Until objFile.AtEndOfStream
            strCharacters = objFile.Read(1)
            If Asc(strCharacters) = 0 Then
                ' Drop the NULL and remember that one was seen
                objOutFile.Write ""
                nul = True
            Else
                ' The first character after a run of NULLs starts a new line
                If nul = True Then
                    objOutFile.Write(vbLf & strCharacters)
                Else
                    objOutFile.Write(strCharacters)
                End If
                nul = False
            End If
        Loop
        objOutFile.Close
    End Function
The output looks like this:
@@MMCIBN.000 7 076059 7653 1375686349 2528 780608 10700 \ _NC_ACT.DIR\CFG_RESET.INI
Second method code:
    filename = WScript.Arguments(0)
    Set fso = CreateObject("Scripting.FileSystemObject")
    sDate = Year(Now()) & Right("0" & Month(now()), 2) & Right("00" & Day(Now()), 2)
    file = fso.BuildPath(fso.GetFile(filename).ParentFolder.Path, saveTo & "Output " & sDate & ".arc")
    Set objOutFile = fso.CreateTextFile(file)
    Set f = fso.OpenTextFile(filename)
    Do Until f.AtEndOfStream
        line = f.ReadLine
        If (InStr(line, Chr(0)) > 0) Then
            line = Left(line, InStr(line, Chr(0)) - 1) & Right(line, InStr(line, Chr(0)) + 1)
        End If
        objOutFile.WriteLine line
    Loop
    f.Close
but then the output is:
Can someone please guide me on how to remove the NULLs quickly without losing information? I have thought about using the second method to find which line numbers need updating and then feeding those to the first method to speed things up, but quite honestly I have no idea where to even start doing this! Thanks in advance...
It looks like the first method is just replacing each NULL with a newline. If that's all you need, you can just do this:
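A minimal sketch of that idea, assuming the built-in Replace function is applied to the whole file contents in one go (a 6 MB file should fit in memory comfortably). ReplaceNulls is a hypothetical helper name used only for illustration, and the demo runs on an in-memory string rather than on the real file:

```vbscript
' Hypothetical helper (not from the original scripts):
' replaces every single NULL character with a line break.
Function ReplaceNulls(strText)
    ReplaceNulls = Replace(strText, Chr(0), vbCrLf)
End Function

' Demo on an in-memory string containing one NULL between two fields
WScript.Echo ReplaceNulls("FIELD1" & Chr(0) & "FIELD2")
```

With the names from the question, the call would presumably look like objOutFile.Write ReplaceNulls(fso.OpenTextFile(textFile, 1).ReadAll()). Note that this turns every individual NULL into its own line break, so a run of several NULLs produces several empty lines.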
OK, sounds like you need to replace each set of NULLs with a newline. Let's try this instead:
    strText = fso.OpenTextFile(textFile, 1).ReadAll()

    With New RegExp
        .Pattern = "\x00+"
        .Global = True
        strText = .Replace(strText, vbCrLf)
    End With

    objOutFile.Write strText
I think the Read/ReadAll methods of the TextStream class are having trouble dealing with the mix of text and binary data. Let's use an ADO Stream object to read the data instead.
    ' Read the "text" file using a Stream object...
    Const adTypeText = 2

    With CreateObject("ADODB.Stream")
        .Type = adTypeText
        .Open
        .LoadFromFile textFile
        .Charset = "us-ascii"
        strText = .ReadText()
    End With

    ' Now do our regex replacement...
    With New RegExp
        .Pattern = "\x00+"
        .Global = True
        strText = .Replace(strText, vbCrLf)
    End With

    ' Now write using a standard TextStream...
    With fso.CreateTextFile(file)
        .Write strText
        .Close
    End With