LLM Output Parser: Effortless JSON/XML Extraction

Published: March 01, 2025

LLMs often return structured data buried inside unstructured text. Instead of writing custom regex or manual parsing, you can now use LLM Output Parser to instantly extract the most relevant JSON/XML structures with just one function call.

This is the first release of LLM Output Parser, a lightweight yet powerful Python package for extracting structured JSON and XML from unstructured text generated by Large Language Models!

👉 Check it out on GitHub: https://github.com/KameniAlexNea/llm-output-parser

A robust utility for extracting and parsing structured data (JSON and XML) from unstructured text outputs generated by Large Language Models (LLMs).

Key Features

✅ Extracts JSON and XML from raw text, Markdown code blocks ( json ..., `xml ... ```), and mixed content.
✅ Handles complex formats including nested structures and multiple distinct objects within the same text.
✅ Converts extracted XML structures into JSON-compatible Python dictionaries.
✅ Intelligently selects the most comprehensive or relevant structure when multiple are found.
✅ Provides robust error handling and recovery strategies.

Installation

Install from PyPI:

pip install llm-output-parser

Or install from source:

git clone https://github.com/KameniAlexNea/llm-output-parser.git
cd llm-output-parser
pip install -e .

Usage

JSON Parsing

from llm_output_parser import parse_json

# Parse JSON from an LLM response
llm_response = """
Here's the data you requested:
# json data
{
    "name": "John Doe",
    "age": 30,
    "skills": ["Python", "Machine Learning", "NLP"]
}

Let me know if you need anything else!
"""

data = parse_json(llm_response)
print(data["name"])  # Output: John Doe
print(data["skills"])  # Output: ['Python', 'Machine Learning', 'NLP']

XML Parsing

from llm_output_parser import parse_xml

# Parse XML from an LLM response and convert to JSON
llm_response = """
Here's the user data in XML format:

# xml data
<user id="123">
    <name>Jane Smith</name>
    <email>jane@example.com</email>
    <roles>
        <role>admin</role>
        <role>editor</role>
    </roles>
</user>

Let me know if you need any other information.
"""

data = parse_xml(llm_response)
print(data["@id"]) # Output: 123
print(data["name"]) # Output: Jane Smith
print(data["roles"]["role"]) # Output: ['admin', 'editor']

Handling Complex Cases

The library can handle various complex scenarios:

JSON Within Text

text = 'The user profile is: {"name": "John", "email": "john@example.com"}'
data = parse_json(text)
# Output: {"name": "John", "email": "john@example.com"}

XML Within Text

text = 'The configuration is: <config><server>localhost</server><port>8080</port></config>'
data = parse_xml(text)
# Output: {"server": "localhost", "port": "8080"}

Multiple JSON/XML Objects

When multiple valid objects are present, the parser returns the most comprehensive one (typically the largest or most nested).

# For JSON
text = '''
Small object: {"id": 123}

Larger object:
# json
{
    "user": {
        "id": 123,
        "name": "John",
        "email": "john@example.com",
        "preferences": {
            "theme": "dark",
            "notifications": true
        }
    }
}

'''
data = parse_json(text)  # Returns the larger, more complex object

# For XML
text = '''
Simple: <item>value</item>

Complex:
# xml
<product category="electronics">
    <name>Smartphone</name>
    <price currency="USD">999.99</price>
    <features>
        <feature>5G</feature>
        <feature>High-res camera</feature>
    </features>
</product>

'''
data = parse_xml(text)  # Returns the more complex XML converted to JSON

XML to JSON Conversion Details

When parsing XML, the library converts it to a JSON-compatible dictionary with the following conventions:

XML attributes are prefixed with @ (e.g., <item id="123"> becomes {"@id": "123"})
Text content of elements that also have attributes or child elements is stored under the #text key.
Simple elements containing only text become key-value pairs.
Repeated elements at the same level are automatically grouped into arrays.

Example:

xml_str = '''
<library>
    <book category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
    </book>
    <book category="non-fiction">
        <title>Sapiens</title>
        <author>Yuval Noah Harari</author>
    </book>
</library>
'''
data = parse_xml(xml_str)
# Results in:
# {
#     "book": [
#         {
#             "@category": "fiction",
#             "title": "The Great Gatsby",
#             "author": "F. Scott Fitzgerald"
#         },
#         {
#             "@category": "non-fiction",
#             "title": "Sapiens",
#             "author": "Yuval Noah Harari"
#         }
#     ]
# }

Error Handling

If no valid JSON or XML structure can be found in the input string, a ValueError is raised:

try:
    data = parse_json("This string contains no JSON.")
except ValueError as e:
    print(f"Error: {e}")
    # Output: Error: Failed to parse JSON from the input string.

try:
    data = parse_xml("This string contains no XML.")
except ValueError as e:
    print(f"Error: {e}")
    # Output: Error: Failed to parse XML from the input string.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request on GitHub.

Fork the repository.
Create your feature branch (git checkout -b feature/amazing-feature).
Commit your changes (git commit -m 'Add some amazing feature').
Push to the branch (git push origin feature/amazing-feature).
Open a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file in the GitHub repository for details.

Share on

Twitter Facebook LinkedIn

Alex KAMENI