LLM Output Parser: Effortless JSON/XML Extraction
Published:
LLMs often return structured data buried inside unstructured text. Instead of writing custom regex or manual parsing, you can now use LLM Output Parser to instantly extract the most relevant JSON/XML structures with just one function call.
This is the first release of LLM Output Parser, a lightweight yet powerful Python package for extracting structured JSON and XML from unstructured text generated by Large Language Models!
👉 Check it out on GitHub: https://github.com/KameniAlexNea/llm-output-parser
A robust utility for extracting and parsing structured data (JSON and XML) from unstructured text outputs generated by Large Language Models (LLMs).
Key Features
- ✅ Extracts JSON and XML from raw text, Markdown code blocks (
json ...
, `
xml ...
```), and mixed content. - ✅ Handles complex formats including nested structures and multiple distinct objects within the same text.
- ✅ Converts extracted XML structures into JSON-compatible Python dictionaries.
- ✅ Intelligently selects the most comprehensive or relevant structure when multiple are found.
- ✅ Provides robust error handling and recovery strategies.
Installation
Install from PyPI:
pip install llm-output-parser
Or install from source:
git clone https://github.com/KameniAlexNea/llm-output-parser.git
cd llm-output-parser
pip install -e .
Usage
JSON Parsing
from llm_output_parser import parse_json
# Parse JSON from an LLM response
llm_response = """
Here's the data you requested:
# json data
{
"name": "John Doe",
"age": 30,
"skills": ["Python", "Machine Learning", "NLP"]
}
Let me know if you need anything else!
"""
data = parse_json(llm_response)
print(data["name"]) # Output: John Doe
print(data["skills"]) # Output: ['Python', 'Machine Learning', 'NLP']
XML Parsing
from llm_output_parser import parse_xml
# Parse XML from an LLM response and convert to JSON
llm_response = """
Here's the user data in XML format:
# xml data
<user id="123">
<name>Jane Smith</name>
<email>jane@example.com</email>
<roles>
<role>admin</role>
<role>editor</role>
</roles>
</user>
Let me know if you need any other information.
"""
data = parse_xml(llm_response)
print(data["@id"]) # Output: 123
print(data["name"]) # Output: Jane Smith
print(data["roles"]["role"]) # Output: ['admin', 'editor']
Handling Complex Cases
The library can handle various complex scenarios:
JSON Within Text
text = 'The user profile is: {"name": "John", "email": "john@example.com"}'
data = parse_json(text)
# Output: {"name": "John", "email": "john@example.com"}
XML Within Text
text = 'The configuration is: <config><server>localhost</server><port>8080</port></config>'
data = parse_xml(text)
# Output: {"server": "localhost", "port": "8080"}
Multiple JSON/XML Objects
When multiple valid objects are present, the parser returns the most comprehensive one (typically the largest or most nested).
# For JSON
text = '''
Small object: {"id": 123}
Larger object:
# json
{
"user": {
"id": 123,
"name": "John",
"email": "john@example.com",
"preferences": {
"theme": "dark",
"notifications": true
}
}
}
'''
data = parse_json(text) # Returns the larger, more complex object
# For XML
text = '''
Simple: <item>value</item>
Complex:
# xml
<product category="electronics">
<name>Smartphone</name>
<price currency="USD">999.99</price>
<features>
<feature>5G</feature>
<feature>High-res camera</feature>
</features>
</product>
'''
data = parse_xml(text) # Returns the more complex XML converted to JSON
XML to JSON Conversion Details
When parsing XML, the library converts it to a JSON-compatible dictionary with the following conventions:
- XML attributes are prefixed with
@
(e.g.,<item id="123">
becomes{"@id": "123"}
) - Text content of elements that also have attributes or child elements is stored under the
#text
key. - Simple elements containing only text become key-value pairs.
- Repeated elements at the same level are automatically grouped into arrays.
Example:
xml_str = '''
<library>
<book category="fiction">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
</book>
<book category="non-fiction">
<title>Sapiens</title>
<author>Yuval Noah Harari</author>
</book>
</library>
'''
data = parse_xml(xml_str)
# Results in:
# {
# "book": [
# {
# "@category": "fiction",
# "title": "The Great Gatsby",
# "author": "F. Scott Fitzgerald"
# },
# {
# "@category": "non-fiction",
# "title": "Sapiens",
# "author": "Yuval Noah Harari"
# }
# ]
# }
Error Handling
If no valid JSON or XML structure can be found in the input string, a ValueError
is raised:
try:
data = parse_json("This string contains no JSON.")
except ValueError as e:
print(f"Error: {e}")
# Output: Error: Failed to parse JSON from the input string.
try:
data = parse_xml("This string contains no XML.")
except ValueError as e:
print(f"Error: {e}")
# Output: Error: Failed to parse XML from the input string.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request on GitHub.
- Fork the repository.
- Create your feature branch (
git checkout -b feature/amazing-feature
). - Commit your changes (
git commit -m 'Add some amazing feature'
). - Push to the branch (
git push origin feature/amazing-feature
). - Open a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE
file in the GitHub repository for details.