How to parse big XML files in Python
Mar 05, 2008
Parsing big XML files in Python (some 60 MB is already big for me) has been a bit painful until now. I used to import minidom and sometimes SAX.
The problem with minidom is that it loads the whole XML file into memory as a tree. Unless you have a 16 GB machine, go get a coffee, as you won't be able to do anything else until the CPU finishes processing the file. If you try to do it with SAX, you have to do the work of detecting every element start and end yourself. Quite crappy.
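To give an idea of the bookkeeping SAX asks for, here is a minimal sketch using the standard library's xml.sax module. The element names (coords, coord, lat, lon) and the CoordHandler class are made up for illustration; they are not taken from a real file.

```python
import xml.sax


class CoordHandler(xml.sax.ContentHandler):
    """Collects each <coord> element (hypothetical name) into a dict."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # dict for the <coord> being read, if any
        self._field = None    # name of the child element being read, if any
        self._text = []       # text chunks of the current child element

    def startElement(self, name, attrs):
        if name == "coord":
            self._current = {}
        elif self._current is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # SAX may deliver the text of one element in several chunks
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "coord":
            self.records.append(self._current)
            self._current = None
        elif self._field == name:
            self._current[name] = "".join(self._text)
            self._field = None


handler = CoordHandler()
# Made-up sample document, parsed from memory to keep the sketch self-contained
sample = b"<coords><coord><lat>43.1</lat><lon>-2.5</lon></coord></coords>"
xml.sax.parseString(sample, handler)
print(handler.records)  # [{'lat': '43.1', 'lon': '-2.5'}]
```

Memory use stays low because nothing is kept except what the handler stores, but all the state tracking is on you.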
Today I learned a better solution from Erral: use the lxml library. Here is an example that shows how we can convert an XML file into a list of dicts:
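from lxml import etree

# Parse the whole file and get the root element
coords = etree.parse("/path/to/your/xml/file").getroot()
coords_list = []
for coord in coords:
    # One dict per child of the root: tag name -> text content
    this = {}
    for child in coord:  # iterating an element yields its children
        this[child.tag] = child.text
    coords_list.append(this)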
Quite straightforward, isn't it?
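Note that etree.parse still builds the whole tree in memory. If the file is too big even for that, iterparse can hand you the elements one by one as they are read. What follows is only a sketch under some assumptions: the iter_records helper, the coord tag name and the sample data are made up for illustration, and the import falls back to the standard library's ElementTree (which has a similar iterparse) when lxml is not installed.

```python
import io

try:
    from lxml import etree  # the library used in this post
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API


def iter_records(source, record_tag):
    """Yield one dict per <record_tag> element as the parser reaches its end tag."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        yield {child.tag: child.text for child in elem}
        # Free the children and text of the record we have already converted
        elem.clear()


# Made-up sample data; a real call would pass the path of the XML file instead
sample = io.BytesIO(
    b"<coords>"
    b"<coord><lat>43.1</lat><lon>-2.5</lon></coord>"
    b"<coord><lat>43.2</lat><lon>-2.6</lon></coord>"
    b"</coords>"
)
coords_list = list(iter_records(sample, "coord"))
print(coords_list)
# [{'lat': '43.1', 'lon': '-2.5'}, {'lat': '43.2', 'lon': '-2.6'}]
```

The emptied elements still stay attached to the root, so memory use is much lower but not constant. Also, calling list() on the generator puts all the dicts in memory again; for really big files it makes more sense to process each record inside a for loop.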
How about immediate parsing of an unknown size stream read in blocks ? (not necessarily XML)