Delete NULL characters from a text file using VBScript


I have text files that are approximately 6 MB in size. Some lines contain the NULL character (Chr(0)), which I would like to remove. I have tried two methods. The first checks each character with Asc() = 0, but it takes approximately 50 s to complete. The second uses InStr(line, Chr(0)) and is fast (~4 s), but it removes vital information from the lines that contain the NULL characters.

First line of text file as example:


First method (works but VERY slow)

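Function normalise(textFile)
    ' tempFolder and saveTo are assumed to be defined elsewhere in the script
    Set fso = CreateObject("Scripting.FileSystemObject")
    writeTo = fso.BuildPath(tempFolder, saveTo & "\Output.arc")
    Set objOutFile = fso.CreateTextFile(writeTo)
    Set objFile = fso.OpenTextFile(textFile, 1)

    nul = False
    ' Read one character at a time (one Read call per byte, hence the slowness)
    Do Until objFile.AtEndOfStream
        strCharacters = objFile.Read(1)
        If Asc(strCharacters) = 0 Then
            ' Drop the NUL but remember that we are inside a run of NULs
            nul = True
        Else
            If nul Then
                ' First character after a run of NULs: start a new line
                objOutFile.Write vbLf & strCharacters
            Else
                objOutFile.Write strCharacters
            End If
            nul = False
        End If
    Loop

    objFile.Close
    objOutFile.Close
End Function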

The output looks like this:


Second method (fast, but loses data):

filename = WScript.Arguments(0)

Set fso = CreateObject("Scripting.FileSystemObject")

' Output file name gets a yyyymmdd date stamp
sDate = Year(Now()) & Right("0" & Month(Now()), 2) & Right("00" & Day(Now()), 2)
file = fso.BuildPath(fso.GetFile(filename).ParentFolder.Path, saveTo & "Output " & sDate & ".arc")
Set objOutFile = fso.CreateTextFile(file)
Set f = fso.OpenTextFile(filename)

Do Until f.AtEndOfStream
    line = f.ReadLine

    ' Try to cut the first NUL out of the line
    If InStr(line, Chr(0)) > 0 Then
        line = Left(line, InStr(line, Chr(0)) - 1) & Right(line, InStr(line, Chr(0)) + 1)
    End If

    objOutFile.WriteLine line
Loop

f.Close
objOutFile.Close

But then the output is:


Can someone please show me how to remove the NULLs quickly without losing information? I have thought of using the second method to find which line numbers need updating and then feeding those to the first method to speed things up, but quite honestly I have no idea where to even start! Thanks in advance...
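A note on why the second method loses data: `Right(line, n)` returns the last n characters of the line, not the text after position n, so `Right(line, InStr(line, Chr(0)) + 1)` keeps a slice whose size depends on where the NUL sits. For `"ABC" & Chr(0) & "DEF"` it yields `"C" & Chr(0) & "DEF"`, which duplicates a character and keeps the NUL; on long lines with an early NUL it drops most of the line. `Mid(line, pos + 1)` returns everything after position `pos`, and the code also only handles the first NUL on each line. A minimal sketch of both fixes (the helper names `CutFirstNul` and `StripNuls` are hypothetical); note that these only remove NULs and do not turn them into line breaks:

```vbscript
' Hypothetical helper: remove the first NUL the way the second method
' intended, using Mid (everything from a position to the end) instead
' of Right (a count of characters taken from the end).
Function CutFirstNul(line)
    Dim pos
    pos = InStr(line, Chr(0))
    If pos > 0 Then
        CutFirstNul = Left(line, pos - 1) & Mid(line, pos + 1)
    Else
        CutFirstNul = line
    End If
End Function

' Hypothetical helper: remove every NUL in the line with one Replace call.
Function StripNuls(line)
    StripNuls = Replace(line, Chr(0), "")
End Function
```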

It looks like the first method is just replacing each NULL with a newline. If that's all you need, you can just do this:

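A minimal sketch, assuming the file fits comfortably in memory (6 MB does): read it in one call with `ReadAll` and let the built-in `Replace` swap every `Chr(0)` for a line break. One `Replace` over the whole string avoids the per-character `Read(1)` loop that makes the first method slow. The function name `NulsToNewlines` is hypothetical:

```vbscript
' Hypothetical helper: replace every NUL (Chr(0)) with a CRLF line break.
' Consecutive NULs each become their own line break.
Function NulsToNewlines(strText)
    NulsToNewlines = Replace(strText, Chr(0), vbCrLf)
End Function
```

With the question's variables this would be used as `objOutFile.Write NulsToNewlines(fso.OpenTextFile(textFile, 1).ReadAll())`. Because every NUL is replaced individually, a run of several NULs produces several line breaks (blank lines) in the output.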

OK, it sounds like you need to replace each run of NULLs with a single newline. Let's try this instead:

strText = fso.OpenTextFile(textFile, 1).ReadAll()

With New RegExp
    .Pattern = "\x00+"
    .Global = True
    strText = .Replace(strText, vbCrLf)
End With

objOutFile.Write strText
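The same regular-expression replacement wrapped as a self-contained function (the name `NulRunsToNewline` is hypothetical), which makes the difference from a plain `Replace` easy to check: the `+` in `\x00+` matches a whole run of NULs, so each run becomes exactly one line break instead of one per NUL.

```vbscript
' Hypothetical helper: collapse each run of one or more NULs into one CRLF.
Function NulRunsToNewline(strText)
    Dim re
    Set re = New RegExp
    re.Pattern = "\x00+"   ' one or more Chr(0) characters in a row
    re.Global = True       ' replace every run, not just the first one
    NulRunsToNewline = re.Replace(strText, vbCrLf)
End Function
```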

Update 2:

I think the Read/ReadAll methods of the TextStream class are having trouble dealing with the mix of text and binary data. Let's use an ADO Stream object to read the data instead.

' Read the "text" file using a Stream object...
Const adTypeText = 2

With CreateObject("ADODB.Stream")
    .Type = adTypeText
    .Charset = "us-ascii"
    .Open                   ' the stream must be open before LoadFromFile
    .LoadFromFile textFile
    strText = .ReadText()
    .Close
End With

' Now do our regex replacement...
With New RegExp
    .Pattern = "\x00+"
    .Global = True
    strText = .Replace(strText, vbCrLf)
End With

' Now write using a standard TextStream...
With fso.CreateTextFile(file)
    .Write strText
    .Close
End With
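Putting the three steps together into one reusable routine, as a sketch (the name `CleanNulFile` and its parameters are hypothetical). It assumes the same `us-ascii` charset as above and overwrites the output file if it already exists:

```vbscript
Const adTypeText = 2

' Hypothetical helper: read inPath with ADODB.Stream, collapse each run of
' NULs into a single CRLF, and write the result to outPath.
Sub CleanNulFile(inPath, outPath)
    Dim fso, stm, re, strText

    ' Read the whole file as text (the stream must be opened before LoadFromFile)
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = "us-ascii"
    stm.Open
    stm.LoadFromFile inPath
    strText = stm.ReadText()
    stm.Close

    ' Collapse every run of NULs into one line break
    Set re = New RegExp
    re.Pattern = "\x00+"
    re.Global = True
    strText = re.Replace(strText, vbCrLf)

    ' Write with an ordinary TextStream (True = overwrite an existing file)
    Set fso = CreateObject("Scripting.FileSystemObject")
    With fso.CreateTextFile(outPath, True)
        .Write strText
        .Close
    End With
End Sub
```

For example, `CleanNulFile WScript.Arguments(0), file`, with `file` built as in the question's second script. Keeping the read, the replacement and the write in one place means each file is read once and written once, with no per-character or per-line calls.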