How to manage parsing a large XML file and saving it to a database

I have a fairly large XML file (larger than 2 MB) that I'm parsing and storing in an SQLite database. The initial parse and store works fine. My question is about updating the database when I parse the XML file again to pick up changes, additions, or deletions. My initial thought is to wipe the data in the database and insert everything again, rather than parsing the data, checking whether each item is already in the database, and updating it if so. Is one approach better than the other? Is there a performance hit either way? I'd appreciate any thoughts on the matter.


Yes, re-inserting is probably a bad idea. How complicated is the XML structure, and how many tables would you have to query to check whether a single item from that structure already exists?

If it's complex, you might be able to compute a checksum or hash over the attributes and values that uniquely identify a record, and store it in an extra table in the database. To find modified entries, you then just compute the hash/checksum for each parsed item and look it up in that one table. That may even make the lookup faster, depending on how expensive the hash calculation is.
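
To make the hash-table idea concrete, here is a minimal Python sketch. It is one possible variation rather than a definitive implementation, and it rests on several assumptions that are not in the question: each record is an `<item>` element with a stable `id` attribute and a `<name>` child, and the tables `item` and `item_hash` and the functions `record_hash` and `sync` are hypothetical names chosen for illustration. The hash here covers a record's content and is stored per `id`, so an unchanged record can be skipped after a single dictionary lookup.

```python
import hashlib
import sqlite3
import xml.etree.ElementTree as ET


def record_hash(elem):
    """Hash an element's attributes and the text of its direct children."""
    parts = [f"{k}={v}" for k, v in sorted(elem.attrib.items())]
    parts += [f"{child.tag}={(child.text or '').strip()}" for child in elem]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def sync(conn, xml_text):
    """Apply inserts, updates and deletes so the database matches the XML."""
    # Hypothetical schema: one data table plus one extra table of hashes.
    conn.execute("CREATE TABLE IF NOT EXISTS item (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_hash (id TEXT PRIMARY KEY, hash TEXT NOT NULL)"
    )

    # Load all stored hashes once; lookups then never touch the data table.
    stored = dict(conn.execute("SELECT id, hash FROM item_hash"))
    seen = set()
    stats = {"inserted": 0, "updated": 0, "deleted": 0, "unchanged": 0}

    for elem in ET.fromstring(xml_text).iter("item"):
        item_id = elem.get("id")
        seen.add(item_id)
        digest = record_hash(elem)
        if stored.get(item_id) == digest:
            stats["unchanged"] += 1  # same content as last run, skip the write
            continue
        stats["updated" if item_id in stored else "inserted"] += 1
        conn.execute(
            "INSERT OR REPLACE INTO item (id, name) VALUES (?, ?)",
            (item_id, elem.findtext("name")),
        )
        conn.execute(
            "INSERT OR REPLACE INTO item_hash (id, hash) VALUES (?, ?)",
            (item_id, digest),
        )

    # Anything stored but absent from the new file was deleted upstream.
    for gone in set(stored) - seen:
        conn.execute("DELETE FROM item WHERE id = ?", (gone,))
        conn.execute("DELETE FROM item_hash WHERE id = ?", (gone,))
        stats["deleted"] += 1

    conn.commit()
    return stats
```

Calling `sync(conn, xml_text)` after each parse returns a count of inserted, updated, deleted and unchanged records. Whether this beats wiping and re-inserting depends on how much of the file actually changes between runs, so it is worth timing both on real data.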