How to parse big XML files in Python
Mar 05, 2008
Parsing big XML files in Python (some 60 MB is already big for me) was a bit painful until now. I used to import minidom and sometimes sax.
The problem with minidom is that it loads the whole XML file into memory. Unless you have a 16 GB machine, go get a coffee, because you won't be able to do anything else until the CPU finishes processing the file. If you try to do it with SAX, you have to detect every element start and end yourself and keep track of where you are in the document. Quite crappy.
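To give an idea of the bookkeeping SAX asks for, here is a rough sketch using the standard library's xml.sax. The element names (coords, coord, lat, lon), the CoordHandler class and the parse_with_sax helper are made up for illustration; they are not from any real file of mine. The handler has to count nesting depth by hand just to know which element it is in.

```python
import xml.sax

# Hypothetical sample data; the tag names are assumptions for illustration.
SAMPLE = (
    b"<coords>"
    b"<coord><lat>43.3</lat><lon>-2.0</lon></coord>"
    b"<coord><lat>43.2</lat><lon>-2.9</lon></coord>"
    b"</coords>"
)


class CoordHandler(xml.sax.ContentHandler):
    """Collects each child of the root element into a dict of tag -> text."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # current nesting level (root element is 1)
        self.records = []    # finished dicts, one per child of the root
        self.current = None  # dict being filled for the open record
        self.text = []       # text chunks of the open field

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:      # a record element just opened
            self.current = {}
        elif self.depth == 3:    # a field inside the record just opened
            self.text = []

    def characters(self, content):
        # SAX may deliver the text of one element in several chunks
        if self.depth == 3:
            self.text.append(content)

    def endElement(self, name):
        if self.depth == 3:      # a field closed: store its text
            self.current[name] = "".join(self.text)
        elif self.depth == 2:    # a record closed: keep the dict
            self.records.append(self.current)
        self.depth -= 1


def parse_with_sax(xml_bytes):
    """Parse XML bytes and return a list of dicts, one per record."""
    handler = CoordHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.records


records = parse_with_sax(SAMPLE)
```

It works and it does not hold the whole document in memory, but that is a lot of state to track for such a simple job.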
Today I learned a better solution from Erral: use the lxml library. Here is an example showing how to convert an XML file into a list of dicts:
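    from lxml import etree

    # Parse the whole document and take its root element
    coords = etree.parse("/path/to/your/xml/file").getroot()

    coords_list = []
    for coord in coords:
        # One dict per child of the root, mapping tag name to text
        this = {}
        # Iterate the element directly; getchildren() is deprecated in lxml
        for child in coord:
            this[child.tag] = child.text
        coords_list.append(this)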
Quite straightforward, isn't it?
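One caveat: etree.parse still builds the whole tree in memory, even if it does so far more cheaply than minidom. If that ever becomes a problem, a possible variation (not what Erral showed me, just a sketch) is to use iterparse and throw away each record once it has been converted. The iter_records helper and the sample tag names below are hypothetical, and the fallback import to the standard library's ElementTree is only there so the sketch also runs without lxml installed.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree is close enough for this sketch
    import xml.etree.ElementTree as etree

# Hypothetical sample data; the tag names are assumptions for illustration.
SAMPLE = (
    b"<coords>"
    b"<coord><lat>43.3</lat><lon>-2.0</lon></coord>"
    b"<coord><lat>43.2</lat><lon>-2.9</lon></coord>"
    b"</coords>"
)


def iter_records(source):
    """Yield one dict per child of the root, parsing incrementally.

    `source` is a file path or a binary file-like object.
    """
    depth = 0
    for event, elem in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:
            # A direct child of the root has just been fully parsed
            yield {child.tag: child.text for child in elem}
            # Drop its children and text so they can be garbage collected
            elem.clear()


coords_list = list(iter_records(io.BytesIO(SAMPLE)))
```

With a real file you would pass the path, or a file opened in binary mode, instead of the BytesIO object. The cleared elements still hang off the root as empty shells, so memory use grows slowly rather than staying perfectly flat.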
How about immediate parsing of an unknown size stream read in blocks ? (not necessarily XML)