legislative-flattener
cri-matthew/legislative-flattener-skill
Converts hierarchical legislative text from Word documents into a flat list of requirements. Use when processing regulatory documents, compliance frameworks, or legal text that needs to be extracted into individual, numbered requirements for analysis or mapping.
SKILL.md
name: legislative-flattener
description: Converts hierarchical legislative text from Word documents into a flat list of requirements. Use when processing regulatory documents, compliance frameworks, or legal text that needs to be extracted into individual, numbered requirements for analysis or mapping.
allowed-tools: Read, Write, Bash, Grep, Glob
Legislative Text Flattener
This skill processes Word documents (DOCX) containing legislative or regulatory text and converts hierarchical structures into a flat, numbered list of discrete requirements or provisions.
When to Use This Skill
Activate this skill when the user wants to:
- Flatten hierarchical legislative or regulatory text
- Extract requirements from compliance frameworks
- Convert nested sections/subsections into a linear list
- Prepare legislative text for requirement mapping
- Process Word documents containing legal or regulatory content
Quick Start
Install required dependencies:
pip install python-docx openpyxl
Basic usage (outputs formatted XLSX by default):
python flattener_utility.py input.docx output.xlsx
Or specify format explicitly:
python flattener_utility.py input.docx output.xlsx --format xlsx
python flattener_utility.py input.docx output.csv --format csv
python flattener_utility.py input.docx output.json --format json
Processing Instructions
1. Input Analysis
First, understand the input document structure:
- Read the DOCX file (use python-docx library or convert to text)
- Identify the hierarchical structure (sections, subsections, paragraphs, sub-paragraphs)
- Detect numbering schemes (1.2.3, (a)(b)(c), Article-Section-Paragraph, etc.)
- Note any special formatting or emphasis (bold, italics, SHALL/MUST keywords)
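One way to approach the numbering-detection step is a small table of regular expressions tried in order. The sketch below is illustrative only: the scheme names, their order, and the `detect_numbering` function are assumptions, not part of flattener_utility.py. Labels made only of the letters i, v and x, such as (i) or (v), are treated as roman numerals here even though they could also be alphabetic labels.

```python
import re

# Hypothetical scheme table; earlier entries win when several could match.
NUMBERING_PATTERNS = [
    ("decimal", re.compile(r"^(\d+(?:\.\d+)+)\s")),   # 1.2.3 or 500.02
    ("paren_number", re.compile(r"^\((\d+)\)\s")),    # (1)
    ("paren_roman", re.compile(r"^\(([ivx]+)\)\s")),  # (i), (ii), (iv)
    ("paren_lower", re.compile(r"^\(([a-z])\)\s")),   # (a)
    ("paren_upper", re.compile(r"^\(([A-Z])\)\s")),   # (A)
]


def detect_numbering(text):
    """Return (scheme, label) for the leading number of a paragraph, or None."""
    stripped = text.strip()
    for scheme, pattern in NUMBERING_PATTERNS:
        match = pattern.match(stripped)
        if match:
            return scheme, match.group(1)
    return None


if __name__ == "__main__":
    for line in ["500.02 Cybersecurity program", "(a) identifying services;", "Heading"]:
        print(line, "->", detect_numbering(line))
```

A real document may need extra patterns (for example "Article 5" or "Section 2"), and resolving the (i) ambiguity generally requires looking at the preceding sibling label.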
2. Extraction Process
Extract content while preserving context:
- Each discrete requirement or provision becomes a separate item
- Maintain parent context for nested items
- Preserve the original numbering/reference system
- Extract complete sentences or logical requirement units
- Identify normative language (shall, must, should, may)
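Identifying normative language can be sketched with simple keyword matching. The functions below reuse the names `extract_normative_level` and `classify_requirement` from the workflow in this document, but their bodies are assumptions made for illustration. In particular, the keyword precedence and the "Unclassified" fallback are choices made here, and a plain keyword match will misread "May" when it is used as a month.

```python
import re

# Checked in this order, so a sentence with both "must" and "may" is reported as MUST.
NORMATIVE_KEYWORDS = ["SHALL", "MUST", "SHOULD", "MAY"]


def extract_normative_level(text):
    """Return the first normative keyword found, or INFORMATIVE if none is present."""
    for keyword in NORMATIVE_KEYWORDS:
        if re.search(rf"\b{keyword}\b", text, re.IGNORECASE):
            return keyword
    return "INFORMATIVE"


def classify_requirement(text):
    """Map a sentence to Mandate, Prohibition, Permission, or Definition."""
    lowered = text.lower()
    if re.search(r"\b(shall|must|may)\s+not\b", lowered):
        return "Prohibition"
    if re.search(r"\b(shall|must)\b", lowered):
        return "Mandate"
    if re.search(r"\bmay\b", lowered):
        return "Permission"
    if re.search(r"\bmeans\b", lowered):
        return "Definition"
    return "Unclassified"  # assumption: left for manual review


if __name__ == "__main__":
    sample = "An Operator must not disclose the information"
    print(extract_normative_level(sample), classify_requirement(sample))
```

Items that come back as "Unclassified" are natural candidates for the manual-review flag mentioned in the notes.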
3. Flattening Strategy
Convert hierarchy to flat structure:
- Assign sequential flat numbering (1, 2, 3...)
- Include original reference in metadata (e.g., "Section 500.02(b)(3)")
- Preserve full context path for each item
- Handle multi-level lists and nested requirements
- Maintain logical groupings where appropriate
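The flattening strategy can be illustrated with a stack that tracks the current position in the hierarchy. This is a minimal sketch under stated assumptions: the input is already a list of `(level, label, text)` tuples, level 0 is the outermost, levels never skip, and headings carry `text=None`. The name `flatten_outline` is hypothetical.

```python
def flatten_outline(items):
    """Turn (level, label, text) tuples into flat records with context paths."""
    stack = []      # labels from the outermost level down to the current item
    flattened = []
    for level, label, text in items:
        del stack[level:]      # drop the previous sibling and anything deeper
        stack.append(label)
        if text is not None:   # headings only contribute context
            flattened.append({
                "flat_id": len(flattened) + 1,
                "context_path": " > ".join(stack),
                "text": text,
            })
    return flattened


if __name__ == "__main__":
    outline = [
        (0, "Part 500", None),
        (1, "Section 500.02", None),
        (2, "Paragraph (a)", "Each Covered Entity shall maintain a cybersecurity program."),
        (2, "Paragraph (b)", "The cybersecurity program shall be based on the Risk Assessment."),
    ]
    for record in flatten_outline(outline):
        print(record["flat_id"], record["context_path"])
```

Truncating the stack before appending keeps sibling items from leaking into each other's context path.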
4. Output Format
Generate a structured flat list with these fields for each requirement:
Flat ID: [Sequential number]
Original Reference: [Original section/subsection identifier]
Context Path: [Full hierarchical path, e.g., "Article 5 > Section 2 > Paragraph b"]
Requirement Type: [Mandate/Prohibition/Permission/Definition]
Normative Level: [SHALL/MUST/SHOULD/MAY/INFORMATIVE]
Text: [Full requirement text]
Keywords: [Extracted key concepts or compliance areas]
---
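The field list above maps naturally onto a small record type. The dataclass below is a sketch, not the structure used by flattener_utility.py. The class name `FlatRequirement` and the `to_block` method are assumptions, while the attribute names follow the dictionary keys used in the workflow in this document.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FlatRequirement:
    flat_id: int
    original_ref: str
    context_path: str
    requirement_type: str
    normative_level: str
    text: str
    keywords: List[str] = field(default_factory=list)

    def to_block(self):
        """Render the record in the 'Field: value' layout, ending with '---'."""
        return "\n".join([
            f"Flat ID: {self.flat_id}",
            f"Original Reference: {self.original_ref}",
            f"Context Path: {self.context_path}",
            f"Requirement Type: {self.requirement_type}",
            f"Normative Level: {self.normative_level}",
            f"Text: {self.text}",
            f"Keywords: {', '.join(self.keywords)}",
            "---",
        ])


if __name__ == "__main__":
    req = FlatRequirement(
        1, "Section 500.02(a)", "Part 500 > Section 500.02 > Paragraph (a)",
        "Mandate", "SHALL", "Each Covered Entity shall maintain a cybersecurity program.",
        ["cybersecurity program"],
    )
    print(req.to_block())
    print(asdict(req))
```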
5. Implementation Approach
Use this workflow:
# Install required library if needed
# pip install python-docx
from docx import Document
import re

# NOTE: detect_level, extract_reference, update_context, is_requirement,
# classify_requirement, extract_normative_level, clean_text and
# extract_keywords are placeholders; implement them to match the
# numbering scheme and styles of the document being processed.

def flatten_legislative_text(docx_path):
    """
    Flattens hierarchical legislative text from DOCX.
    Returns a list of flattened requirements.
    """
    doc = Document(docx_path)
    flattened = []
    flat_id = 1
    context_stack = []

    for paragraph in doc.paragraphs:
        # Skip empty paragraphs
        if not paragraph.text.strip():
            continue

        # Detect hierarchy level (by style or numbering)
        level = detect_level(paragraph)
        original_ref = extract_reference(paragraph)

        # Update context stack
        update_context(context_stack, level, paragraph.text)

        # Extract if it's a requirement (not just a heading)
        if is_requirement(paragraph):
            item = {
                'flat_id': flat_id,
                'original_ref': original_ref,
                'context_path': ' > '.join(context_stack),
                'requirement_type': classify_requirement(paragraph.text),
                'normative_level': extract_normative_level(paragraph.text),
                'text': clean_text(paragraph.text),
                'keywords': extract_keywords(paragraph.text)
            }
            flattened.append(item)
            flat_id += 1

    return flattened
6. Output Options
Provide results in user's preferred format:
- XLSX (Excel): Formatted Excel workbook with styled headers, auto-filter, frozen panes, and optimized column widths (default and recommended)
- CSV: Simple tabular format with all fields as columns
- JSON: Structured data for programmatic use
- Markdown: Human-readable with clear section breaks
- Database insert: SQL statements or database-ready format
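For the CSV and JSON options, the standard library is enough. The sketch below is not the exporter shipped in flattener_utility.py. The function names `to_csv` and `to_json`, the column order, and the choice to join keywords with "; " are assumptions made for illustration.

```python
import csv
import io
import json

# Assumed column order, matching the dictionary keys used in this document.
FIELDS = [
    "flat_id", "original_ref", "context_path",
    "requirement_type", "normative_level", "text", "keywords",
]


def to_csv(requirements):
    """Serialize requirement dicts to CSV text, one row per requirement."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for req in requirements:
        row = dict(req)
        # A list does not fit a single cell, so keywords are joined into one string.
        row["keywords"] = "; ".join(req.get("keywords", []))
        writer.writerow(row)
    return buffer.getvalue()


def to_json(requirements):
    """Serialize requirement dicts to indented JSON, keeping non-ASCII text."""
    return json.dumps(requirements, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    sample = [{
        "flat_id": 1, "original_ref": "Section 500.02(a)",
        "context_path": "Part 500 > Section 500.02 > Paragraph (a)",
        "requirement_type": "Mandate", "normative_level": "SHALL",
        "text": "Each Covered Entity shall maintain a cybersecurity program.",
        "keywords": ["cybersecurity program", "confidentiality"],
    }]
    print(to_csv(sample))
    print(to_json(sample))
```

Returning strings rather than writing files keeps the functions easy to test; a caller can write the result to disk with UTF-8 encoding.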
7. Quality Checks
Validate the flattening:
- Ensure no requirements are lost or duplicated
- Verify context paths are complete and accurate
- Check that normative language is correctly identified
- Confirm original references are preserved
- Review that similar requirements are consistently formatted
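Several of these checks can be automated. The following sketch assumes requirements are dictionaries with the keys used in this document. The function name `validate_requirements` and the optional `expected_refs` argument (a list of references known to exist in the source) are hypothetical.

```python
def validate_requirements(requirements, expected_refs=None):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    # Flat IDs should run 1, 2, 3, ... with no gaps or repeats.
    ids = [req["flat_id"] for req in requirements]
    if ids != list(range(1, len(ids) + 1)):
        problems.append("flat_id values are not sequential starting at 1")

    seen = set()
    for req in requirements:
        ref = req["original_ref"]
        if ref in seen:
            problems.append(f"duplicate original_ref: {ref}")
        seen.add(ref)
        if not req["text"].strip():
            problems.append(f"empty text for {ref}")
        if not req["context_path"].strip():
            problems.append(f"missing context path for {ref}")

    # Optionally confirm that nothing from a known reference list was lost.
    for ref in expected_refs or []:
        if ref not in seen:
            problems.append(f"missing requirement: {ref}")

    return problems


if __name__ == "__main__":
    records = [
        {"flat_id": 1, "original_ref": "8A.3.1(1)", "context_path": "Rule 8A.3.1", "text": "An Operator must..."},
        {"flat_id": 2, "original_ref": "8A.3.1(1)", "context_path": "Rule 8A.3.1", "text": ""},
    ]
    for problem in validate_requirements(records, expected_refs=["8A.3.1(2)(a)"]):
        print(problem)
```

Checks on whether normative language was identified correctly still need a human reviewer or a labeled sample to compare against.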
Example Output (Markdown Format)
## Flattened Requirements
### Requirement 1
- **Flat ID**: 1
- **Original Reference**: Section 500.02(a)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (a)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: Each Covered Entity shall maintain a cybersecurity program designed to protect the confidentiality, integrity and availability of the Covered Entity's Information Systems.
- **Keywords**: cybersecurity program, confidentiality, integrity, availability, Information Systems
---
### Requirement 2
- **Flat ID**: 2
- **Original Reference**: Section 500.02(b)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (b)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: The cybersecurity program shall be based on the Covered Entity's Risk Assessment and designed to perform the following core cybersecurity functions...
- **Keywords**: Risk Assessment, core cybersecurity functions
---
Notes
- XLSX output is recommended for most use cases as it provides:
  - Professional formatting with styled headers
  - Auto-filter for easy data exploration
  - Frozen header row for scrolling large datasets
  - Optimized column widths for readability
  - Text wrapping for long requirement text
- Handle tables within documents by processing each cell as potential requirement text
- Preserve cross-references between sections where they exist
- Flag ambiguous or incomplete requirements for manual review
- Support batch processing of multiple documents
- Maintain a processing log of any items that couldn't be automatically classified
XLSX Features
The XLSX export includes:
- Header styling: Bold white text on blue background
- Auto-filter: Filter requirements by any column
- Frozen panes: Header row stays visible when scrolling
- Column widths: Optimized for each field type (10-60 characters)
- Text wrapping: Long text automatically wraps for readability
- Borders: Clean grid lines for professional appearance
Supporting Files
See examples.md for sample input/output pairs and template.md for customizable output templates.
README
Legislative Flattener - Claude Code Skill
A powerful Claude Code skill for converting hierarchical legislative text from Word documents into flat, numbered requirement lists.
Perfect for compliance analysis, regulatory mapping, and legislative documentation processing.
🎯 Features
- 🔄 Parent-Child Flattening: Intelligently combines parent requirements with sub-items into standalone requirements
- 📊 Word Metadata Support: Reads hidden Word numbering to auto-generate hierarchical references like (1)(a), (1)(b)(i)
- 🎨 Multiple Export Formats: Excel (XLSX), CSV, JSON, and Markdown
- 🔍 Smart Filtering: Automatically excludes definition sections (X.1.1, X.1.2), notes, and applicability statements
- 📝 Context Tracking: Maintains full document structure with hierarchical paths
- ✨ Normative Analysis: Identifies requirement types (Mandate, Prohibition, Permission) and normative levels (SHALL, MUST, SHOULD, MAY)
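As a rough illustration of the Smart Filtering feature, the sketch below excludes items by reference pattern and by a "Note" prefix. It is an assumption about how such a filter could look, not the logic in flattener_utility.py: the regular expressions and the `is_excluded` function are hypothetical, and applicability statements are not handled here because the source does not say how they are recognized.

```python
import re

# Assumed pattern: references such as 8A.1.1 or 8B.1.2, optionally followed by (1), (a), ...
DEFINITION_REF = re.compile(r"^\w+\.1\.[12](?:\(|$)")
# Assumed pattern: paragraphs that begin with the word "Note".
NOTE_PREFIX = re.compile(r"^\s*note\b", re.IGNORECASE)


def is_excluded(original_ref, text):
    """Return True when an item looks like a definition section or a note."""
    return bool(DEFINITION_REF.match(original_ref) or NOTE_PREFIX.match(text))


if __name__ == "__main__":
    items = [
        ("8A.1.2(1)", "In this Chapter, Operator means..."),
        ("8A.3.1(1)", "An Operator must have adequate arrangements..."),
        ("8A.3.1(2)", "Note: There is no penalty for this subrule."),
    ]
    kept = [ref for ref, text in items if not is_excluded(ref, text)]
    print(kept)
```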
📋 Prerequisites
- Python 3.8 or higher
- Claude Code (optional, for skill integration)
🚀 Quick Start
Install as Claude Code Skill (Recommended)
# Clone the repository
git clone https://github.com/yourusername/legislative-flattener-skill.git
# Copy to Claude Code skills directory
mkdir -p ~/.claude/skills
cp -r legislative-flattener-skill ~/.claude/skills/legislative-flattener
# Install Python dependencies
cd ~/.claude/skills/legislative-flattener
pip install -r requirements.txt
Now the skill is available in all your Claude Code projects! You can invoke it by typing /legislative-flattener or mentioning legislative document processing.
Standalone Installation
# Clone the repository
git clone https://github.com/yourusername/legislative-flattener-skill.git
cd legislative-flattener-skill
# Install dependencies
pip install -r requirements.txt
# Run directly
python flattener_utility.py input.docx output.xlsx
💻 Usage
Command Line
# Basic usage - defaults to Excel output
python flattener_utility.py input.docx output.xlsx
# Specify output format
python flattener_utility.py input.docx output.csv --format csv
python flattener_utility.py input.docx output.json --format json
python flattener_utility.py input.docx output.md --format markdown
Python API
from flattener_utility import LegislativeFlattener
# Create flattener instance
flattener = LegislativeFlattener()
# Process document
requirements = flattener.flatten_from_docx('input.docx')
print(f"Found {len(requirements)} requirements")
# Export to desired format
flattener.export_to_xlsx(requirements, 'output.xlsx')
flattener.export_to_json(requirements, 'output.json')
flattener.export_to_csv(requirements, 'output.csv')
flattener.export_to_markdown(requirements, 'output.md')
With Claude Code
Simply open Claude Code in any project and:
- Type: analyze ASIC_8A.docx with legislative-flattener
- Or: @legislative-flattener process my regulatory document
Claude will automatically use the skill to process your legislative documents!
📖 Example
Input Document Structure
Rule 8A.3.1 - Critical Business Services
(1) An Operator must have adequate arrangements...
(2) Without limiting subrule (1), arrangements must include:
    (a) identifying Critical Business Services;
    (b) assessing and managing risks;
    (c) ensuring sufficient capacity;
Output (Flattened to Excel)
| Flat ID | Original Ref | Context Path | Type | Normative Level | Text |
|---|---|---|---|---|---|
| 1 | 8A.3.1(1) | Rule 8A.3.1 | Mandate | MUST | An Operator must have adequate arrangements... |
| 2 | 8A.3.1(2)(a) | Rule 8A.3.1 | Mandate | MUST | Without limiting subrule (1), arrangements must include identifying Critical Business Services |
| 3 | 8A.3.1(2)(b) | Rule 8A.3.1 | Mandate | MUST | Without limiting subrule (1), arrangements must include assessing and managing risks |
| 4 | 8A.3.1(2)(c) | Rule 8A.3.1 | Mandate | MUST | Without limiting subrule (1), arrangements must include ensuring sufficient capacity |
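The parent-child combination shown in rows 2 to 4 can be sketched as follows. This is an illustrative reconstruction, not the code in flattener_utility.py: the function names, the label regex, and the rule for trimming trailing punctuation and connectives ("; and", "; or") are assumptions.

```python
import re

LABEL = re.compile(r"^\(([0-9A-Za-z]+)\)\s*")            # leading "(2)" or "(a)"
TRAILING = re.compile(r"\s*[;.]\s*(?:and|or)?\s*$")       # trailing ";", ".", "; and", "; or"


def split_label(line):
    """Split '(a) some text' into ('(a)', 'some text'); the label is '' if absent."""
    stripped = line.strip()
    match = LABEL.match(stripped)
    if not match:
        return "", stripped
    return f"({match.group(1)})", stripped[match.end():]


def flatten_subrule(rule, parent_line, child_lines):
    """Combine a parent stem with each sub-item into (reference, text) pairs."""
    parent_label, parent_text = split_label(parent_line)
    stem = parent_text.rstrip().rstrip(":").rstrip()
    results = []
    for line in child_lines:
        child_label, child_text = split_label(line)
        child_text = TRAILING.sub("", child_text)
        results.append((f"{rule}{parent_label}{child_label}", f"{stem} {child_text}"))
    return results


if __name__ == "__main__":
    rows = flatten_subrule(
        "8A.3.1",
        "(2) Without limiting subrule (1), arrangements must include:",
        [
            "(a) identifying Critical Business Services;",
            "(b) assessing and managing risks;",
            "(c) ensuring sufficient capacity;",
        ],
    )
    for ref, text in rows:
        print(ref, "|", text)
```

Each output row reads as a standalone requirement because the parent stem is repeated in front of every sub-item.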
🎨 Output Formats
Excel (XLSX) - Default & Recommended
- ✨ Professional formatting with colored headers
- 🔍 Auto-filter on all columns
- 📌 Frozen header row
- 📏 Optimized column widths
- 📝 Text wrapping for readability
- 🎯 Clean borders and grid lines
JSON
- 🔧 Structured data for programmatic use
- 🌐 UTF-8 encoded
- 📦 Easy integration with other tools
CSV
- 📊 Universal spreadsheet compatibility
- 🚀 Lightweight and portable
Markdown
- 📄 Human-readable documentation
- 🔖 Version control friendly
- ✍️ Perfect for reports
🧪 Tested With
Successfully processed:
- ✅ ASIC Chapter 8A: 89 requirements captured (5/7 rules perfect match)
- ✅ ASIC Chapter 8B: 83 requirements captured (5/6 rules within ±2)
- ✅ Multi-level hierarchies (up to 4 levels deep: (1)(a)(i)(A))
- ✅ Complex Word numbering schemes with multiple NumIDs
- ✅ Mixed explicit and implicit numbering
📚 Skill Files
This repository contains a complete Claude Code skill:
- SKILL.md: Skill definition and metadata for Claude Code
- flattener_utility.py: Core Python implementation
- examples.md: Sample input/output examples
- template.md: Output template documentation
- requirements.txt: Python dependencies
- README.md: This file
🔧 How It Works
- Document Analysis: Reads Word document structure, styles, and numbering metadata
- Hierarchy Detection: Identifies parent-child relationships using Word styles (MIR Body Text, MIR Subpara)
- Numbering Generation: Auto-generates hierarchical references from Word's hidden numbering levels
- Text Flattening: Combines parent requirements with sub-items to create standalone requirements
- Smart Filtering: Excludes definitions (X.1.1, X.1.2), notes, and applicability statements
- Classification: Analyzes normative language to classify requirements and extract keywords
- Export: Generates formatted output in your preferred format
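The numbering-generation step can be pictured as a set of per-level counters. The sketch below assumes the nesting level of each paragraph has already been read from Word's numbering metadata (the `w:ilvl` value) and that levels map to the formats (1), (a), (i), (A) in that order. The function names and the fixed format table are assumptions for illustration, and documents with more than 26 lettered items at one level are not handled.

```python
from string import ascii_lowercase, ascii_uppercase


def to_roman(n):
    """Lowercase roman numeral for small positive integers."""
    numerals = [(50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out


# Assumed label format per nesting level: (1) > (a) > (i) > (A)
FORMATTERS = [
    str,
    lambda n: ascii_lowercase[n - 1],
    to_roman,
    lambda n: ascii_uppercase[n - 1],
]


def generate_references(levels):
    """Turn a sequence of nesting levels into references like '(2)(b)(i)'."""
    counters = [0] * len(FORMATTERS)
    refs = []
    for level in levels:
        if any(counters[i] == 0 for i in range(level)):
            raise ValueError(f"level {level} appears before its parent level")
        counters[level] += 1
        for deeper in range(level + 1, len(counters)):
            counters[deeper] = 0   # a new item restarts numbering beneath it
        refs.append("".join(f"({FORMATTERS[i](counters[i])})" for i in range(level + 1)))
    return refs


if __name__ == "__main__":
    print(generate_references([0, 0, 1, 1, 2, 2, 1, 0]))
```

Resetting the deeper counters whenever a shallower item appears is what makes (2)(c) follow (2)(b)(ii) rather than continuing the roman sequence.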
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built for Claude Code skill integration
- Designed for ASIC regulatory compliance analysis
- Inspired by the need for better legislative document processing in compliance workflows
📞 Support
- Issues: Report bugs or request features
- Discussions: Ask questions or share ideas
- Documentation: See SKILL.md for detailed skill documentation
🗺️ Roadmap
- Support for PDF input files
- Configurable filtering rules via config file
- Web interface for non-technical users
- Batch processing for multiple documents
- Custom output templates
- Export to Word, HTML, and database formats
- Requirement comparison and diff tools
- Integration with compliance management systems
Built with ❤️ for the compliance and regulatory community
Star ⭐ this repository if you find it helpful!