
XML Data Processing

Learn to read and process XML data in Python


What is XML?

XML stands for eXtensible Markup Language. It's a text format for storing structured data using tags.

What XML looks like:

code.txt
<person>
  <name>John</name>
  <age>25</age>
  <city>New York</city>
</person>

Where XML is used:

  • Configuration files
  • Data exchange between systems
  • Web services (SOAP APIs)
  • Office documents (.docx and .xlsx files are ZIP archives of XML files)
  • RSS feeds

XML vs JSON:

  • XML: More verbose, uses tags, common in older and enterprise systems
  • JSON: Simpler, lighter, the usual choice for modern APIs
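To make the verbosity difference concrete, here is a small sketch that parses the <person> record from above and re-serializes the same data with Python's built-in json module. The flat tag-to-text conversion is only an illustration and assumes every child holds plain text.

```python
import json
import xml.etree.ElementTree as ET

# The same record as the <person> example, written as compact XML.
xml_text = "<person><name>John</name><age>25</age><city>New York</city></person>"

root = ET.fromstring(xml_text)

# Build a plain dict from the child elements (tag -> text).
record = {child.tag: child.text for child in root}

# Serialize the same data as JSON for comparison.
json_text = json.dumps(record)

print(xml_text)
print(json_text)
print("XML length:", len(xml_text), "JSON length:", len(json_text))
```

Note that "25" stays a string in both outputs: XML text has no number type, so any type conversion is up to your code.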

The xml.etree.ElementTree Module

Python's standard library includes XML support, so there is nothing extra to install.

code.py
import xml.etree.ElementTree as ET

What this does: Imports the ElementTree module under the shorter name ET.

Reading XML String

code.py
import xml.etree.ElementTree as ET

xml_string = """
<person>
    <name>John</name>
    <age>25</age>
    <city>New York</city>
</person>
"""

root = ET.fromstring(xml_string)

print("Tag:", root.tag)
print("Name:", root.find("name").text)
print("Age:", root.find("age").text)

What this does:

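  • fromstring() parses an XML string and returns its root element
  • root is the top-level element (<person> in this example)
  • find() returns the first matching child element
  • .text gets the content between an element's opening and closing tags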
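Two details that the bullets above don't show are worth checking early: all .text values come back as strings, and an element that only contains other elements has whitespace as its own .text. The sketch below uses a shortened version of the <person> string to show both.

```python
import xml.etree.ElementTree as ET

xml_string = """
<person>
    <name>John</name>
    <age>25</age>
</person>
"""

root = ET.fromstring(xml_string)

# The root's own .text is only the newline and spaces before <name>.
print(repr(root.text))

name = root.find("name").text
age = root.find("age").text

# .text is always a str, so convert numbers yourself before doing math.
age_number = int(age)
print("Name:", name, "Age next year:", age_number + 1)
```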

Reading XML File

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("data.xml")
root = tree.getroot()

print("Root tag:", root.tag)

for child in root:
    print("Child:", child.tag, "Value:", child.text)

What this does:

  • parse() reads an XML file and returns an ElementTree object
  • getroot() returns the top-level element
  • The for loop visits each direct child of the root
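If you don't have a data.xml file at hand, ET.parse() also accepts a file-like object. This sketch uses io.StringIO as a stand-in for the file; the <data>, <item>, and <count> tags are hypothetical sample content, not a required layout.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for data.xml: ET.parse() accepts a filename or a file-like object.
sample = io.StringIO("""<data>
    <item>Apple</item>
    <item>Banana</item>
    <count>2</count>
</data>""")

tree = ET.parse(sample)
root = tree.getroot()

# Collect (tag, text) pairs for every direct child of the root.
children = [(child.tag, child.text) for child in root]

for tag, text in children:
    print("Child:", tag, "Value:", text)
```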

Finding Elements

Find First Match

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("students.xml")
root = tree.getroot()

first_student = root.find("student")
name = first_student.find("name").text
print("First student:", name)

What find() does: Returns the first direct child element that matches the tag name, or None if nothing matches.
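A quick way to see both behaviours is a sketch with two students, where the students.xml contents are hypothetical and inlined so the code runs without a file. find() picks the first <student>, and a tag that doesn't exist gives None.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for students.xml, inlined so the sketch runs on its own.
root = ET.fromstring("""<students>
    <student><name>John</name><grade>A</grade></student>
    <student><name>Priya</name><grade>B</grade></student>
</students>""")

# find() stops at the first direct child named "student".
first_student = root.find("student")
first_name = first_student.find("name").text

# With no matching child, find() returns None instead of raising an error.
missing = root.find("teacher")

print("First student:", first_name)
print("Teacher element:", missing)
```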

Find All Matches

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("students.xml")
root = tree.getroot()

students = root.findall("student")

for student in students:
    name = student.find("name").text
    grade = student.find("grade").text
    print("Student:", name, "Grade:", grade)

What findall() does: Returns a list of all direct child elements that match the tag name.
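Because findall() returns an ordinary list, you can use len() and list comprehensions on it, and a search with no matches gives an empty list rather than an error. The student data here is hypothetical and inlined for a self-contained run.

```python
import xml.etree.ElementTree as ET

# Hypothetical students.xml contents.
root = ET.fromstring("""<students>
    <student><name>John</name><grade>A</grade></student>
    <student><name>Priya</name><grade>B</grade></student>
</students>""")

students = root.findall("student")

# findall() returns a regular list, so comprehensions work directly.
names = [s.find("name").text for s in students]

# No matches gives an empty list, so a for loop over it simply does nothing.
teachers = root.findall("teacher")

print(len(students), names, teachers)
```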

Reading XML Attributes

Elements can also carry attributes: name="value" pairs written inside the opening tag.

XML with attributes:

code.txt
<student id="1" status="active">
    <name>John</name>
</student>

Reading attributes:

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("students.xml")
root = tree.getroot()

for student in root.findall("student"):
    student_id = student.get("id")
    status = student.get("status")
    name = student.find("name").text

    print("ID:", student_id)
    print("Status:", status)
    print("Name:", name)
    print()

What .get() does: Returns the value of the named attribute, or None if the element doesn't have that attribute.
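When an attribute is optional, .get() takes a second argument as a default, and .attrib gives you every attribute as a dict. In this sketch the second student deliberately has no status attribute; the data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second student has no status attribute.
root = ET.fromstring("""<students>
    <student id="1" status="active"><name>John</name></student>
    <student id="2"><name>Priya</name></student>
</students>""")

rows = []
for student in root.findall("student"):
    # .get() returns the default when the attribute is missing.
    status = student.get("status", "unknown")
    # .attrib is a plain dict of all attributes on the element.
    rows.append((student.attrib, status))

for attrib, status in rows:
    print(attrib, status)
```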

Nested XML

XML elements can be nested inside each other to any depth.

Example XML:

code.txt
<school>
    <classroom>
        <student>
            <name>John</name>
            <subjects>
                <subject>Math</subject>
                <subject>Science</subject>
            </subjects>
        </student>
    </classroom>
</school>

Reading nested data:

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("school.xml")
root = tree.getroot()

classroom = root.find("classroom")
student = classroom.find("student")
name = student.find("name").text

print("Student:", name)
print("Subjects:")

subjects = student.find("subjects")
for subject in subjects.findall("subject"):
    print("-", subject.text)

What this does: Calls find() once per level (from school to classroom to student to subjects) to reach the nested data.
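When you don't care about the levels in between, Element.iter() walks the whole subtree and yields every element with a given tag, however deep it sits. This sketch inlines the school.xml example so it runs on its own.

```python
import xml.etree.ElementTree as ET

# The school.xml example, inlined.
root = ET.fromstring("""<school>
    <classroom>
        <student>
            <name>John</name>
            <subjects>
                <subject>Math</subject>
                <subject>Science</subject>
            </subjects>
        </student>
    </classroom>
</school>""")

# iter() searches the entire subtree, so no chain of find() calls is needed.
subjects = [subject.text for subject in root.iter("subject")]

# The same walk works for any tag at any depth.
names = [name.text for name in root.iter("name")]

print(names, subjects)
```

The trade-off is that iter() loses track of which student a subject belongs to, so the step-by-step find() approach is still the better fit when you need that relationship.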

Using XPath

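XPath is a query language for selecting elements by their path through the tree. ElementTree's find(), findall(), and findtext() accept a limited subset of it.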

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("students.xml")
root = tree.getroot()

names = root.findall(".//name")

for name in names:
    print(name.text)

What .// means: Selects matching elements at any depth below the current element, not just direct children.

More XPath examples:

code.py
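root.findall("./student")  # direct <student> children of root

root.findall("./student/name")  # <name> elements inside each direct <student>

root.findall(".//student[@status='active']")  # <student> elements at any depth with status="active"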

XPath patterns:

  • . current element
  • .. parent element (find() cannot go above the element you called it on)
  • .// all descendants, at any depth
  • * any child element
  • [@attr='value'] filter by attribute
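ElementTree's XPath subset also includes a few more predicates: filtering on a child's text, and positional filters such as [1] and [last()]. The sketch below tries each one on hypothetical inline student data.

```python
import xml.etree.ElementTree as ET

# Hypothetical student data.
root = ET.fromstring("""<students>
    <student id="1" status="active"><name>John</name><grade>A</grade></student>
    <student id="2" status="inactive"><name>Priya</name><grade>B</grade></student>
    <student id="3" status="active"><name>Ravi</name><grade>A</grade></student>
</students>""")

# [@attr='value']: filter on an attribute.
active = [s.get("id") for s in root.findall("student[@status='active']")]

# [tag='text']: filter on the text of a child element.
grade_a = [s.find("name").text for s in root.findall("student[grade='A']")]

# [1] and [last()]: positional filters (1-based, as in XPath).
first = root.find("student[1]").get("id")
last = root.find("student[last()]").get("id")

# *: any child element, whatever its tag.
tags = {child.tag for child in root.findall("student/*")}

print(active, grade_a, first, last, sorted(tags))
```

Anything beyond this subset, such as arbitrary XPath functions, is not supported by ElementTree.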

Creating XML

You can also build an XML document in Python and save it to a file.

code.py
import xml.etree.ElementTree as ET

root = ET.Element("students")

student1 = ET.SubElement(root, "student")
student1.set("id", "1")

name1 = ET.SubElement(student1, "name")
name1.text = "John"

age1 = ET.SubElement(student1, "age")
age1.text = "20"

tree = ET.ElementTree(root)
ET.indent(tree, space="    ")  # pretty-print with 4-space indentation (Python 3.9+)
tree.write("output.xml", encoding="utf-8", xml_declaration=True)

print("XML file created")

What this creates:

code.txt
<?xml version='1.0' encoding='utf-8'?>
<students>
    <student id="1">
        <name>John</name>
        <age>20</age>
    </student>
</students>
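If you want the XML as a string instead of a file (for example to send it to another system), ET.tostring() returns it directly. This sketch rebuilds the same student and also shows that SubElement() can set attributes when the element is created.

```python
import xml.etree.ElementTree as ET

root = ET.Element("students")

# SubElement() can set attributes at creation time through attrib=.
student = ET.SubElement(root, "student", attrib={"id": "1"})
ET.SubElement(student, "name").text = "John"
ET.SubElement(student, "age").text = "20"  # text must be a str, so use str() on numbers

# encoding="unicode" makes tostring() return a str instead of bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Without ET.indent() the output is a single line, which is fine for machines but harder to read.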

Practice Example

The scenario: Process a product catalog stored in an XML file.

Example XML (products.xml):

code.txt
<catalog>
    <product id="1" category="Electronics">
        <name>Laptop</name>
        <price>999.99</price>
        <stock>5</stock>
    </product>
    <product id="2" category="Electronics">
        <name>Phone</name>
        <price>599.99</price>
        <stock>10</stock>
    </product>
    <product id="3" category="Accessories">
        <name>Mouse</name>
        <price>25.99</price>
        <stock>50</stock>
    </product>
</catalog>

Python program:

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("products.xml")
root = tree.getroot()

print("Product Catalog")
print("=" * 40)

total_value = 0
product_count = 0

for product in root.findall("product"):
    product_id = product.get("id")
    category = product.get("category")
    name = product.find("name").text
    price = float(product.find("price").text)
    stock = int(product.find("stock").text)

    value = price * stock
    total_value += value
    product_count += 1

    print("Product ID:", product_id)
    print("Name:", name)
    print("Category:", category)
    print("Price:", price)
    print("Stock:", stock)
    # round() only for display: float arithmetic can leave tiny rounding errors
    print("Value:", round(value, 2))
    print()

print("=" * 40)
print("Total products:", product_count)
print("Total inventory value:", round(total_value, 2))

electronics = root.findall(".//product[@category='Electronics']")
print("Electronics count:", len(electronics))

What this program does:

  1. Parses XML file
  2. Loops through all products
  3. Extracts attributes and child elements
  4. Calculates inventory value
  5. Uses XPath to filter by category
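A natural extension is to total the stock value per category. This sketch inlines the products.xml data, groups with collections.defaultdict, and uses decimal.Decimal instead of float so money amounts stay exact; both choices are assumptions for illustration, not part of the program above. It also uses findtext(), which returns a child's text directly.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal

# Same catalog as products.xml, inlined so the sketch is self-contained.
root = ET.fromstring("""<catalog>
    <product id="1" category="Electronics">
        <name>Laptop</name><price>999.99</price><stock>5</stock>
    </product>
    <product id="2" category="Electronics">
        <name>Phone</name><price>599.99</price><stock>10</stock>
    </product>
    <product id="3" category="Accessories">
        <name>Mouse</name><price>25.99</price><stock>50</stock>
    </product>
</catalog>""")

# category -> total stock value; Decimal() starts each total at 0.
value_by_category = defaultdict(Decimal)

for product in root.findall("product"):
    price = Decimal(product.findtext("price"))
    stock = int(product.findtext("stock"))
    value_by_category[product.get("category")] += price * stock

for category, value in value_by_category.items():
    print(category, value)
```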

Converting XML to Dictionary

code.py
import xml.etree.ElementTree as ET

tree = ET.parse("student.xml")
root = tree.getroot()

student_dict = {}
for child in root:
    student_dict[child.tag] = child.text

print(student_dict)

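What this creates: {'name': 'John', 'age': '25', 'city': 'New York'} (assuming student.xml has the same flat structure as the <person> example). All values are strings, and because the loop only reads direct children, nested elements and attributes are lost.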
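For nested XML, a recursive function can keep the structure. The helper below, element_to_dict, is hypothetical (not part of ElementTree). It is one possible convention: attributes become keys, repeated tags become lists, and text inside an element that also has attributes or children is stored under a "#text" key.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Hypothetical helper: convert an Element into nested dicts/lists."""
    children = list(elem)
    text = (elem.text or "").strip()

    # Leaf without attributes: just return its text.
    if not children and not elem.attrib:
        return text

    # Start from the attributes, then add one key per child tag.
    result = dict(elem.attrib)
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag: collect every occurrence into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value

    # Keep text that sits directly inside an element with attributes/children.
    if text:
        result["#text"] = text
    return result

root = ET.fromstring("""<student id="1">
    <name>John</name>
    <subjects>
        <subject>Math</subject>
        <subject>Science</subject>
    </subjects>
</student>""")

print(element_to_dict(root))
```

One caveat of this convention: a tag that appears once gives a single value, but twice gives a list, so code reading the result may need to handle both shapes.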

Key Points to Remember

XML uses elements to structure data. Each element has an opening and a closing tag, or a single self-closing tag such as <city/>.

ET.parse() reads XML files, ET.fromstring() reads XML strings.

find() gets first match, findall() gets all matches. Use .text to get content, .get() for attributes.

XPath patterns such as .// find elements anywhere in the tree. ElementTree supports only a subset of XPath.

XML is more verbose than JSON but still widely used in enterprise systems.

Common Mistakes

Mistake 1: Forgetting .text

code.py
name = root.find("name")  # This is an Element object, not the text
name = root.find("name").text  # This is the actual text inside <name>

Mistake 2: Using find() when you need every match

code.py
students = root.find("student")  # Only gets first one
students = root.findall("student")  # Gets all

Mistake 3: Not checking if element exists

code.py
name = root.find("name").text  # AttributeError if there is no <name>: find() returned None

Better:

code.py
name_element = root.find("name")
if name_element is not None:
    name = name_element.text
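For the common case of reading one child's text, Element.findtext() does the None check for you: it returns the default when no element matches. This sketch uses a hypothetical <email> tag to show the missing case, and an empty <city> to show an edge case.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<person><name>John</name><city></city></person>")

# findtext() returns the first match's text, or the default if nothing matches.
name = root.findtext("name", default="Unknown")
email = root.findtext("email", default="Unknown")  # hypothetical tag, not present

# A matching element with no text gives "" rather than the default.
city = root.findtext("city", default="Unknown")

print(name, email, repr(city))
```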

What's Next?

You now know the basics of XML. Next, in Introduction to APIs, you'll learn how to connect your Python programs to web services and get data from the internet.