SQL replacement query for variable content

advertisements

I am looking for a SQL query to clean up a hacked SQL Server database. I have some basic SQL knowledge, but I have no idea how to solve the below.

For one of our websites we have a SQL Server database that was recently hacked into. Thousands of records were filled with hidden divs, containing all sorts of dodgy references. Our ISP says the content of the database is not their responsibility and they have no knowledge how to help us clean up the database. There are no clean backups available. It is way too much work to go through all records manually.

So I am now desperately trying to find a SQL query to remove these blocks of hidden text from the database.

Two useful bits of information:

  1. All the spammy content is contained within div tags. The information between the tags is different in every instance, but they all open and close with the div tag.

  2. Our original data will have some HTML-content, but will never contain div tags. So if we can find a way to remove everything from the starting div up to and including the closing div, then we would be sorted.

Any assistance here is much appreciated. Thanks for your time.


Try this out; will only work if your assumptions are correct. Another assumption hacker did not add nested DIVs. And yes, TEST this thoroughly before running the update. And back up your data before running the update.

CREATE TABLE #temp(id INT IDENTITY, html VARCHAR(MAX));

INSERT #temp(html)
VALUES('<p>Some text</p><strong>other text</strong><div>added by hacker</div>')
,('<p>Some text</p><strong>other text<div>added by hacker within html tag</div></strong>')
,('<p>Some text</p><div>some other text added by <a href="http://google.com">hack</a></div><strong>other text</strong>');

SELECT html
,  CHARINDEX('<div',html) AS startPos
, CHARINDEX('</div>',html) AS endPos
, (CHARINDEX('</div>',html)+6)-(CHARINDEX('<div',html)) AS stringLenToRemove
, SUBSTRING(html, CHARINDEX('<div',html), (CHARINDEX('</div>',html)+6)-(CHARINDEX('<div',html))) AS HtmlAddedByHack
,REPLACE(html,SUBSTRING(html, CHARINDEX('<div',html), (CHARINDEX('</div>',html)+6)-(CHARINDEX('<div',html))), '') AS sanitizedHtml
FROM #temp;

--UPDATE #temp
--SET html = REPLACE(html,SUBSTRING(html, CHARINDEX('<div',html), (CHARINDEX('</div>',html)+6)-(CHARINDEX('<div',html))), '');

--SELECT  *
--FROM    #temp;