Converts hierarchical legislative text from Word documents into a flat list of requirements. Use when processing regulatory documents, compliance frameworks, or legal text that needs to be extracted into individual, numbered requirements for analysis or mapping.

SKILL.md


name: legislative-flattener
description: Converts hierarchical legislative text from Word documents into a flat list of requirements. Use when processing regulatory documents, compliance frameworks, or legal text that needs to be extracted into individual, numbered requirements for analysis or mapping.
allowed-tools: Read, Write, Bash, Grep, Glob

Legislative Text Flattener

This skill processes Word documents (DOCX) containing legislative or regulatory text and converts hierarchical structures into a flat, numbered list of discrete requirements or provisions.

When to Use This Skill

Activate this skill when the user wants to:

  • Flatten hierarchical legislative or regulatory text
  • Extract requirements from compliance frameworks
  • Convert nested sections/subsections into a linear list
  • Prepare legislative text for requirement mapping
  • Process Word documents containing legal or regulatory content

Quick Start

Install required dependencies:

pip install python-docx openpyxl

Basic usage (outputs formatted XLSX by default):

python flattener_utility.py input.docx output.xlsx

Or specify format explicitly:

python flattener_utility.py input.docx output.xlsx --format xlsx
python flattener_utility.py input.docx output.csv --format csv
python flattener_utility.py input.docx output.json --format json

Processing Instructions

1. Input Analysis

First, understand the input document structure:

  • Read the DOCX file (use python-docx library or convert to text)
  • Identify the hierarchical structure (sections, subsections, paragraphs, sub-paragraphs)
  • Detect numbering schemes (1.2.3, (a)(b)(c), Article-Section-Paragraph, etc.)
  • Note any special formatting or emphasis (bold, italics, SHALL/MUST keywords)
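
The numbering-scheme detection step can be sketched with a few regular expressions over paragraph text. This is a minimal illustration, not the logic in flattener_utility.py: the pattern list and the name `detect_numbering` are assumptions, and real documents will likely need extra patterns.

```python
import re

# Hypothetical patterns for common legal numbering schemes; extend per document.
NUMBERING_PATTERNS = [
    # Dotted decimals such as "1.2.3"
    ("decimal", re.compile(r"^\s*(\d+(?:\.\d+)+)\.?\s+")),
    # Chained parenthesized markers such as "(1)(a)(i)"
    ("parenthesized", re.compile(r"^\s*((?:\([0-9A-Za-z]+\))+)\s*")),
    # Named units such as "Article 5", "Section 500.02", "Rule 8A.3.1"
    ("named", re.compile(r"^\s*((?:Article|Section|Rule)\s+[\w.]+)", re.IGNORECASE)),
]


def detect_numbering(text):
    """Return (scheme, marker) for the first pattern that matches, else (None, None)."""
    for scheme, pattern in NUMBERING_PATTERNS:
        match = pattern.match(text)
        if match:
            return scheme, match.group(1)
    return None, None
```

Patterns are tried in order, so the more specific schemes should come first if two of them can match the same prefix.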

2. Extraction Process

Extract content while preserving context:

  • Each discrete requirement or provision becomes a separate item
  • Maintain parent context for nested items
  • Preserve the original numbering/reference system
  • Extract complete sentences or logical requirement units
  • Identify normative language (shall, must, should, may)
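
Identifying normative language can be approximated with keyword matching. The sketch below assumes the strongest keyword found should win and that text with no keyword is INFORMATIVE; the function name mirrors the helper used later in this document, but the body is an illustration only.

```python
import re

# Ordered from strongest to weakest; the first keyword found decides the level.
# Caveat: a bare word match also hits the month "May", so review MAY results.
_NORMATIVE_KEYWORDS = [
    ("SHALL", re.compile(r"\bshall\b", re.IGNORECASE)),
    ("MUST", re.compile(r"\bmust\b", re.IGNORECASE)),
    ("SHOULD", re.compile(r"\bshould\b", re.IGNORECASE)),
    ("MAY", re.compile(r"\bmay\b", re.IGNORECASE)),
]


def extract_normative_level(text):
    """Return SHALL, MUST, SHOULD, MAY, or INFORMATIVE for a piece of text."""
    for level, pattern in _NORMATIVE_KEYWORDS:
        if pattern.search(text):
            return level
    return "INFORMATIVE"
```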

3. Flattening Strategy

Convert hierarchy to flat structure:

  • Assign sequential flat numbering (1, 2, 3...)
  • Include original reference in metadata (e.g., "Section 500.02(b)(3)")
  • Preserve full context path for each item
  • Handle multi-level lists and nested requirements
  • Maintain logical groupings where appropriate
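
One simple way to keep the full context path is a stack that is truncated to the current depth before the new label is pushed. The sketch assumes levels are zero-based integers (0 is the top of the hierarchy) and that a short label, not the full paragraph text, is stored at each level.

```python
def update_context(context_stack, level, label):
    """Truncate the stack to `level` entries, push `label`, and return the path.

    The list is modified in place so the caller can reuse it across paragraphs.
    """
    del context_stack[level:]      # drop siblings and anything deeper
    context_stack.append(label)    # the new item becomes the deepest entry
    return " > ".join(context_stack)


stack = []
update_context(stack, 0, "Part 500")
update_context(stack, 1, "Section 500.02")
print(update_context(stack, 2, "Paragraph (a)"))
print(update_context(stack, 2, "Paragraph (b)"))  # replaces Paragraph (a)
```

If a document skips a level (for example, jumping from level 0 straight to level 2), this version simply appends at the next free position, so such jumps are worth flagging for review.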

4. Output Format

Generate a structured flat list with these fields for each requirement:

Flat ID: [Sequential number]
Original Reference: [Original section/subsection identifier]
Context Path: [Full hierarchical path, e.g., "Article 5 > Section 2 > Paragraph b"]
Requirement Type: [Mandate/Prohibition/Permission/Definition]
Normative Level: [SHALL/MUST/SHOULD/MAY/INFORMATIVE]
Text: [Full requirement text]
Keywords: [Extracted key concepts or compliance areas]
---
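
The record above maps naturally onto a small data class. This is a sketch under the assumption that one object per requirement is convenient; the class name `FlatRequirement` and the `render` method are illustrative and are not taken from flattener_utility.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlatRequirement:
    flat_id: int
    original_ref: str
    context_path: str
    requirement_type: str   # Mandate / Prohibition / Permission / Definition
    normative_level: str    # SHALL / MUST / SHOULD / MAY / INFORMATIVE
    text: str
    keywords: List[str] = field(default_factory=list)

    def render(self):
        """Return the record in the plain-text layout listed above."""
        return "\n".join([
            f"Flat ID: {self.flat_id}",
            f"Original Reference: {self.original_ref}",
            f"Context Path: {self.context_path}",
            f"Requirement Type: {self.requirement_type}",
            f"Normative Level: {self.normative_level}",
            f"Text: {self.text}",
            f"Keywords: {', '.join(self.keywords)}",
            "---",
        ])
```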

5. Implementation Approach

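Use this workflow. The helpers it calls (detect_level, extract_reference, update_context, is_requirement, classify_requirement, extract_normative_level, clean_text, extract_keywords) are placeholders to implement for the document at hand: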

# Install required library if needed
# pip install python-docx

from docx import Document
import re

def flatten_legislative_text(docx_path):
    """
    Flattens hierarchical legislative text from DOCX.
    Returns a list of flattened requirements.
    """
    doc = Document(docx_path)
    flattened = []
    flat_id = 1
    context_stack = []

    for paragraph in doc.paragraphs:
        # Skip empty paragraphs
        if not paragraph.text.strip():
            continue

        # Detect hierarchy level (by style or numbering)
        level = detect_level(paragraph)
        original_ref = extract_reference(paragraph)

        # Update context stack
        update_context(context_stack, level, paragraph.text)

        # Extract if it's a requirement (not just a heading)
        if is_requirement(paragraph):
            item = {
                'flat_id': flat_id,
                'original_ref': original_ref,
                'context_path': ' > '.join(context_stack),
                'requirement_type': classify_requirement(paragraph.text),
                'normative_level': extract_normative_level(paragraph.text),
                'text': clean_text(paragraph.text),
                'keywords': extract_keywords(paragraph.text)
            }
            flattened.append(item)
            flat_id += 1

    return flattened
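
The workflow above leaves its helpers undefined. As a hedged illustration, two of them could look like the following. Both operate on plain strings, the heuristics are assumptions rather than the shipped implementation, and "Unclassified" is an extra label introduced here so that unclear items can be flagged for manual review.

```python
import re

# Leading markers to strip: "(2)(a) ", "1.2.3 ", or "1. "
_LEADING_MARKER = re.compile(r"^(?:(?:\([0-9A-Za-z]+\))+|\d+(?:\.\d+)+\.?|\d+\.)\s+")


def clean_text(text):
    """Collapse runs of whitespace and remove one leading numbering marker."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return _LEADING_MARKER.sub("", collapsed)


def classify_requirement(text):
    """Guess Mandate / Prohibition / Permission / Definition from keywords."""
    lowered = text.lower()
    # Negated modal verbs are checked first so "shall not" is not a Mandate.
    if re.search(r"\b(?:shall|must|may)\s+not\b|\bprohibited\b", lowered):
        return "Prohibition"
    if re.search(r"\bmeans\b|\bis defined as\b", lowered):
        return "Definition"
    if re.search(r"\b(?:shall|must|should)\b", lowered):
        return "Mandate"
    if re.search(r"\bmay\b", lowered):
        return "Permission"
    return "Unclassified"  # not one of the four types; flag for review
```

Keyword heuristics like these will misfire on some sentences (for example, "means" used in its ordinary sense), so their results are best treated as a first pass.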

6. Output Options

Provide results in the user's preferred format:

  • XLSX (Excel): Formatted Excel workbook with styled headers, auto-filter, frozen panes, and optimized column widths (default and recommended)
  • CSV: Simple tabular format with all fields as columns
  • JSON: Structured data for programmatic use
  • Markdown: Human-readable with clear section breaks
  • Database insert: SQL statements or database-ready format
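
For the CSV and JSON options, the standard library is enough. The sketch below assumes each requirement is a dict with the field names used in the workflow above; `to_csv` and `to_json` are hypothetical helper names, not the exporter methods of flattener_utility.py.

```python
import csv
import io
import json

FIELDS = ["flat_id", "original_ref", "context_path",
          "requirement_type", "normative_level", "text", "keywords"]


def to_csv(requirements):
    """Serialize requirement dicts to CSV text, one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for req in requirements:
        row = dict(req)
        # A list does not fit in one CSV cell, so join keywords with "; ".
        row["keywords"] = "; ".join(req.get("keywords", []))
        writer.writerow(row)
    return buffer.getvalue()


def to_json(requirements):
    """Serialize requirement dicts to pretty-printed JSON, keeping non-ASCII text."""
    return json.dumps(requirements, ensure_ascii=False, indent=2)
```

Both functions return strings, so writing them to disk is left to the caller (for example, with `open(path, "w", encoding="utf-8", newline="")`).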

7. Quality Checks

Validate the flattening:

  • Ensure no requirements are lost or duplicated
  • Verify context paths are complete and accurate
  • Check that normative language is correctly identified
  • Confirm original references are preserved
  • Review that similar requirements are consistently formatted
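
Some of these checks can be automated. The sketch below assumes a list of expected references (`source_refs`) has been collected independently from the source document; the function name and message wording are illustrative.

```python
def check_flattened(requirements, source_refs):
    """Return a list of problem descriptions; an empty list means all checks passed."""
    problems = []
    seen_refs = [req["original_ref"] for req in requirements]

    # Lost requirements: expected references that never appear in the output.
    for ref in source_refs:
        if ref not in seen_refs:
            problems.append(f"missing: {ref}")

    # Duplicated requirements: the same reference emitted more than once.
    for ref in sorted(set(seen_refs)):
        if seen_refs.count(ref) > 1:
            problems.append(f"duplicate: {ref}")

    # Flat IDs should run 1, 2, 3, ... with no gaps.
    flat_ids = [req["flat_id"] for req in requirements]
    if flat_ids != list(range(1, len(flat_ids) + 1)):
        problems.append("flat IDs are not sequential from 1")

    # Every item needs a non-empty context path.
    for req in requirements:
        if not req["context_path"].strip():
            problems.append(f"empty context path: {req['original_ref']}")

    return problems
```

Checks such as whether normative language was classified correctly still need a human reviewer or a labelled sample to compare against.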

Example Output (Markdown Format)

## Flattened Requirements

### Requirement 1
- **Flat ID**: 1
- **Original Reference**: Section 500.02(a)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (a)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: Each Covered Entity shall maintain a cybersecurity program designed to protect the confidentiality, integrity and availability of the Covered Entity's Information Systems.
- **Keywords**: cybersecurity program, confidentiality, integrity, availability, Information Systems

---

### Requirement 2
- **Flat ID**: 2
- **Original Reference**: Section 500.02(b)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (b)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: The cybersecurity program shall be based on the Covered Entity's Risk Assessment and designed to perform the following core cybersecurity functions...
- **Keywords**: Risk Assessment, core cybersecurity functions

---

Notes

  • XLSX output is recommended for most use cases as it provides:
    • Professional formatting with styled headers
    • Auto-filter for easy data exploration
    • Frozen header row for scrolling large datasets
    • Optimized column widths for readability
    • Text wrapping for long requirement text
  • Handle tables within documents by processing each cell as potential requirement text
  • Preserve cross-references between sections where they exist
  • Flag ambiguous or incomplete requirements for manual review
  • Support batch processing of multiple documents
  • Maintain a processing log of any items that couldn't be automatically classified

XLSX Features

The XLSX export includes:

  • Header styling: Bold white text on blue background
  • Auto-filter: Filter requirements by any column
  • Frozen panes: Header row stays visible when scrolling
  • Column widths: Optimized for each field type (10-60 characters)
  • Text wrapping: Long text automatically wraps for readability
  • Borders: Clean grid lines for professional appearance

Supporting Files

See examples.md for sample input/output pairs and template.md for customizable output templates.

README

Legislative Flattener - Claude Code Skill

A Claude Code skill that converts hierarchical legislative text from Word documents into flat, numbered requirement lists.

License: MIT | Python 3.8+ | Claude Code

Perfect for compliance analysis, regulatory mapping, and legislative documentation processing.

🎯 Features

  • 🔄 Parent-Child Flattening: Combines each parent requirement's lead-in text with its sub-items so that every sub-item reads as a standalone requirement
  • 📊 Word Metadata Support: Reads hidden Word numbering to auto-generate hierarchical references like (1)(a), (1)(b)(i)
  • 🎨 Multiple Export Formats: Excel (XLSX), CSV, JSON, and Markdown
  • 🔍 Smart Filtering: Automatically excludes definition sections (X.1.1, X.1.2), notes, and applicability statements
  • 📝 Context Tracking: Maintains full document structure with hierarchical paths
  • ✨ Normative Analysis: Identifies requirement types (Mandate, Prohibition, Permission) and normative levels (SHALL, MUST, SHOULD, MAY)

📋 Prerequisites

  • Python 3.8 or higher
  • Claude Code (optional, for skill integration)

🚀 Quick Start

Install as Claude Code Skill (Recommended)

# Clone the repository
git clone https://github.com/yourusername/legislative-flattener-skill.git

# Copy to Claude Code skills directory
mkdir -p ~/.claude/skills
cp -r legislative-flattener-skill ~/.claude/skills/legislative-flattener

# Install Python dependencies
cd ~/.claude/skills/legislative-flattener
pip install -r requirements.txt

Now the skill is available in all your Claude Code projects! You can invoke it by typing /legislative-flattener or mentioning legislative document processing.

Standalone Installation

# Clone the repository
git clone https://github.com/yourusername/legislative-flattener-skill.git
cd legislative-flattener-skill

# Install dependencies
pip install -r requirements.txt

# Run directly
python flattener_utility.py input.docx output.xlsx

💻 Usage

Command Line

# Basic usage - defaults to Excel output
python flattener_utility.py input.docx output.xlsx

# Specify output format
python flattener_utility.py input.docx output.csv --format csv
python flattener_utility.py input.docx output.json --format json
python flattener_utility.py input.docx output.md --format markdown

Python API

from flattener_utility import LegislativeFlattener

# Create flattener instance
flattener = LegislativeFlattener()

# Process document
requirements = flattener.flatten_from_docx('input.docx')

print(f"Found {len(requirements)} requirements")

# Export to desired format
flattener.export_to_xlsx(requirements, 'output.xlsx')
flattener.export_to_json(requirements, 'output.json')
flattener.export_to_csv(requirements, 'output.csv')
flattener.export_to_markdown(requirements, 'output.md')

With Claude Code

Simply open Claude Code in any project and:

  1. Type: analyze ASIC_8A.docx with legislative-flattener
  2. Or: @legislative-flattener process my regulatory document

Claude will automatically use the skill to process your legislative documents!

📖 Example

Input Document Structure

Rule 8A.3.1 - Critical Business Services

(1) An Operator must have adequate arrangements...

(2) Without limiting subrule (1), arrangements must include:
    (a) identifying Critical Business Services;
    (b) assessing and managing risks;
    (c) ensuring sufficient capacity;

Output (Flattened to Excel)

| Flat ID | Original Ref | Context Path | Type | Normative Level | Text |
|---|---|---|---|---|---|
| 1 | 8A.3.1(1) | Rule 8A.3.1 | Mandate | MUST | An Operator must have adequate arrangements... |
| 2 | 8A.3.1(2)(a) | Rule 8A.3.1 | Mandate | MUST | Without limiting subrule (1), arrangements must include identifying Critical Business Services |
| 3 | 8A.3.1(2)(b) | Rule 8A.3.1 | Mandate | MUST | Without limiting subrule (1), arrangements must include assessing and managing risks |
| 4 | 8A.3.1(2)(c) | Rule 8A.3.1 | Mandate | MUST | Without limiting subrule (1), arrangements must include ensuring sufficient capacity |
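
The parent-child combination in this example can be sketched as follows. The function name and the `(marker, text)` tuple shape are assumptions made for illustration; the logic drops the parent's trailing colon and each child's trailing semicolon (with an optional "and" or "or") before joining them.

```python
import re

# A trailing ";" optionally followed by a list connective, e.g. "...; and"
_TRAILING_SEPARATOR = re.compile(r";\s*(?:and|or)?\s*$")


def flatten_parent_child(parent_ref, parent_text, children):
    """Combine a parent lead-in with each child item.

    children is a list of (marker, text) tuples such as ("(a)", "identifying ...;").
    Returns a list of (reference, standalone_text) tuples.
    """
    stem = parent_text.strip().rstrip(":").rstrip()
    flattened = []
    for marker, text in children:
        item = _TRAILING_SEPARATOR.sub("", text.strip())
        flattened.append((parent_ref + marker, f"{stem} {item}"))
    return flattened


rows = flatten_parent_child(
    "8A.3.1(2)",
    "Without limiting subrule (1), arrangements must include:",
    [("(a)", "identifying Critical Business Services;"),
     ("(b)", "assessing and managing risks;"),
     ("(c)", "ensuring sufficient capacity;")],
)
for ref, text in rows:
    print(ref, "|", text)
```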

🎨 Output Formats

Excel (XLSX) - Default & Recommended

  • ✨ Professional formatting with colored headers
  • 🔍 Auto-filter on all columns
  • 📌 Frozen header row
  • 📏 Optimized column widths
  • 📝 Text wrapping for readability
  • 🎯 Clean borders and grid lines

JSON

  • 🔧 Structured data for programmatic use
  • 🌐 UTF-8 encoded
  • 📦 Easy integration with other tools

CSV

  • 📊 Universal spreadsheet compatibility
  • 🚀 Lightweight and portable

Markdown

  • 📄 Human-readable documentation
  • 🔖 Version control friendly
  • ✍️ Perfect for reports

🧪 Tested With

Successfully processed:

  • ASIC Chapter 8A: 89 requirements captured (5/7 rules perfect match)
  • ASIC Chapter 8B: 83 requirements captured (5/6 rules within ±2)
  • ✅ Multi-level hierarchies (up to 4 levels deep: (1)(a)(i)(A))
  • ✅ Complex Word numbering schemes with multiple NumIDs
  • ✅ Mixed explicit and implicit numbering

📚 Skill Files

This repository contains a complete Claude Code skill:

  • SKILL.md: Skill definition and metadata for Claude Code
  • flattener_utility.py: Core Python implementation
  • examples.md: Sample input/output examples
  • template.md: Output template documentation
  • requirements.txt: Python dependencies
  • README.md: This file

🔧 How It Works

  1. Document Analysis: Reads Word document structure, styles, and numbering metadata
  2. Hierarchy Detection: Identifies parent-child relationships using Word styles (MIR Body Text, MIR Subpara)
  3. Numbering Generation: Auto-generates hierarchical references from Word's hidden numbering levels
  4. Text Flattening: Combines parent requirements with sub-items to create standalone requirements
  5. Smart Filtering: Excludes definitions (X.1.1, X.1.2), notes, and applicability statements
  6. Classification: Analyzes normative language to classify requirements and extract keywords
  7. Export: Generates formatted output in your preferred format
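
Step 3 can be illustrated with per-level counters. The sketch assumes the numbering level of each paragraph (0 for the outermost list) has already been read from the document, and that the four levels are formatted as (1)(a)(i)(A); the function names are hypothetical. Letters past "z" repeat (aa, bb, ...), which is my understanding of how Word renders letter lists and should be checked against the actual document.

```python
def to_roman(n):
    """Lower-case Roman numeral for a positive integer (1 -> i, 4 -> iv)."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    parts = []
    for value, symbol in numerals:
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)


def to_alpha(n):
    """1 -> a, 26 -> z, 27 -> aa, 28 -> bb (the letter repeats after z)."""
    repeat, index = divmod(n - 1, 26)
    return chr(ord("a") + index) * (repeat + 1)


# One formatter per level: (1), (a), (i), (A)
FORMATTERS = [str, to_alpha, to_roman, lambda n: to_alpha(n).upper()]


def generate_references(levels):
    """Turn a sequence of numbering levels into references like "(2)(b)(i)"."""
    counters = [0] * len(FORMATTERS)
    references = []
    for level in levels:
        counters[level] += 1
        # A new item at this level restarts every deeper level.
        for deeper in range(level + 1, len(counters)):
            counters[deeper] = 0
        references.append(
            "".join(f"({FORMATTERS[i](counters[i])})" for i in range(level + 1))
        )
    return references


print(generate_references([0, 0, 1, 1, 2, 0]))
```

A sub-item that appears before any top-level item would get a zero in its parent position, so such input is worth validating first.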

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built for Claude Code skill integration
  • Designed for ASIC regulatory compliance analysis
  • Inspired by the need for better legislative document processing in compliance workflows

🗺️ Roadmap

  • Support for PDF input files
  • Configurable filtering rules via config file
  • Web interface for non-technical users
  • Batch processing for multiple documents
  • Custom output templates
  • Export to Word, HTML, and database formats
  • Requirement comparison and diff tools
  • Integration with compliance management systems

Built with ❤️ for the compliance and regulatory community

Star ⭐ this repository if you find it helpful!