legislative-flattener

cri-matthew/legislative-flattener-skill

Converts hierarchical legislative text from Word documents into a flat list of requirements. Use when processing regulatory documents, compliance frameworks, or legal text that needs to be extracted into individual, numbered requirements for analysis or mapping.

SKILL.md

name: legislative-flattener
description: Converts hierarchical legislative text from Word documents into a flat list of requirements. Use when processing regulatory documents, compliance frameworks, or legal text that needs to be extracted into individual, numbered requirements for analysis or mapping.
allowed-tools: Read, Write, Bash, Grep, Glob

Legislative Text Flattener

This skill processes Word documents (DOCX) containing legislative or regulatory text and converts hierarchical structures into a flat, numbered list of discrete requirements or provisions.

When to Use This Skill

Activate this skill when the user wants to:

Flatten hierarchical legislative or regulatory text
Extract requirements from compliance frameworks
Convert nested sections/subsections into a linear list
Prepare legislative text for requirement mapping
Process Word documents containing legal or regulatory content

Quick Start

Install required dependencies:

pip install python-docx openpyxl

Basic usage (outputs formatted XLSX by default):

python flattener_utility.py input.docx output.xlsx

Or specify format explicitly:

python flattener_utility.py input.docx output.xlsx --format xlsx
python flattener_utility.py input.docx output.csv --format csv
python flattener_utility.py input.docx output.json --format json

Processing Instructions

1. Input Analysis

First, understand the input document structure:

Read the DOCX file (use the python-docx library or convert to text)
Identify the hierarchical structure (sections, subsections, paragraphs, sub-paragraphs)
Detect numbering schemes (1.2.3, (a)(b)(c), Article-Section-Paragraph, etc.)
Note any special formatting or emphasis (bold, italics, SHALL/MUST keywords)
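
As a rough illustration of the numbering-scheme detection step, the sketch below matches a paragraph's leading label against a few regular expressions. The pattern list and the `detect_numbering` name are assumptions made for this example, not part of `flattener_utility.py`; real documents will likely need more patterns (Roman-numeral articles, single numbers such as `1.`, and so on).

```python
import re

# Hypothetical patterns for the schemes named above. Order matters:
# keyword headings are tried before bare decimal or parenthesized labels.
NUMBERING_PATTERNS = [
    # "Article 5", "Section 500.02", "Rule 8A.3.1" (label must contain a digit)
    ("heading", re.compile(r"^(?:Article|Section|Rule|Part)\s+([\w.]*\d[\w.]*)", re.IGNORECASE)),
    # "1.2.3", "500.02"
    ("decimal", re.compile(r"^(\d+(?:\.\d+)+)\b")),
    # "(a)", "(2)(b)", "(1)(a)(i)"
    ("parenthesized", re.compile(r"^((?:\([0-9A-Za-z]+\))+)")),
]


def detect_numbering(text):
    """Return (scheme, label) for the paragraph's leading label, or (None, None)."""
    stripped = text.strip()
    for scheme, pattern in NUMBERING_PATTERNS:
        match = pattern.match(stripped)
        if match:
            return scheme, match.group(1)
    return None, None
```

For example, `detect_numbering("Rule 8A.3.1 - Critical Business Services")` yields the `heading` scheme with label `8A.3.1`, while a paragraph with no leading label yields `(None, None)`.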

2. Extraction Process

Extract content while preserving context:

Each discrete requirement or provision becomes a separate item
Maintain parent context for nested items
Preserve the original numbering/reference system
Extract complete sentences or logical requirement units
Identify normative language (shall, must, should, may)
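
One simple way to identify normative language is keyword matching. The sketch below is a heuristic under stated assumptions: the function names mirror the helpers called in the workflow in section 5, but the bodies, the keyword precedence (SHALL before MUST before SHOULD before MAY), and the `Unclassified` fallback are illustrative choices rather than the skill's actual logic.

```python
import re

# "shall not", "must not", etc. are treated as prohibitions (assumption).
_PROHIBITION = re.compile(r"\b(?:shall|must|should|may)\s+not\b", re.IGNORECASE)

# Checked in order, so a sentence containing both "shall" and "may" is SHALL.
_LEVELS = [
    ("SHALL", re.compile(r"\bshall\b", re.IGNORECASE)),
    ("MUST", re.compile(r"\bmust\b", re.IGNORECASE)),
    ("SHOULD", re.compile(r"\bshould\b", re.IGNORECASE)),
    ("MAY", re.compile(r"\bmay\b", re.IGNORECASE)),
]


def extract_normative_level(text):
    """Return the first matching normative keyword, or INFORMATIVE."""
    for level, pattern in _LEVELS:
        if pattern.search(text):
            return level
    return "INFORMATIVE"


def classify_requirement(text):
    """Map a sentence to Mandate, Prohibition, Permission, or Definition."""
    if _PROHIBITION.search(text):
        return "Prohibition"
    level = extract_normative_level(text)
    if level in ("SHALL", "MUST", "SHOULD"):
        return "Mandate"
    if level == "MAY":
        return "Permission"
    if re.search(r"\bmeans\b", text, re.IGNORECASE):
        return "Definition"
    # Hypothetical extra value: leave the item for manual review.
    return "Unclassified"
```

Keyword matching misreads some sentences (for instance "may" used as the month), so items classified this way are still worth a manual spot check.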

3. Flattening Strategy

Convert hierarchy to flat structure:

Assign sequential flat numbering (1, 2, 3...)
Include original reference in metadata (e.g., "Section 500.02(b)(3)")
Preserve full context path for each item
Handle multi-level lists and nested requirements
Maintain logical groupings where appropriate
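
Preserving the full context path usually comes down to keeping a stack of the headings currently in scope. The following is a minimal sketch; `push_context` is a hypothetical helper that stores `(level, label)` pairs, which is a slightly different shape from the plain list of strings used in the workflow in section 5.

```python
def push_context(context_stack, level, label):
    """Record `label` at `level` (1 = outermost) and return the context path.

    Entries at the same or a deeper level are dropped first, so moving from
    Paragraph (a) to Paragraph (b), or on to a new Section, replaces stale
    context instead of appending to it.
    """
    while context_stack and context_stack[-1][0] >= level:
        context_stack.pop()
    context_stack.append((level, label))
    return " > ".join(text for _, text in context_stack)


stack = []
push_context(stack, 1, "Part 500")
push_context(stack, 2, "Section 500.02")
path = push_context(stack, 3, "Paragraph (a)")
# path is "Part 500 > Section 500.02 > Paragraph (a)"
```

Storing the level with each label keeps the stack correct even when a document skips a level, for example a heading followed directly by a sub-paragraph.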

4. Output Format

Generate a structured flat list with these fields for each requirement:

Flat ID: [Sequential number]
Original Reference: [Original section/subsection identifier]
Context Path: [Full hierarchical path, e.g., "Article 5 > Section 2 > Paragraph b"]
Requirement Type: [Mandate/Prohibition/Permission/Definition]
Normative Level: [SHALL/MUST/SHOULD/MAY/INFORMATIVE]
Text: [Full requirement text]
Keywords: [Extracted key concepts or compliance areas]
---
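
The record layout above could be modeled as a small data class. This is only a sketch: the `Requirement` class and `render_record` function are names invented for this example, with field names taken from the dictionary keys used in the workflow in section 5.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    flat_id: int
    original_ref: str
    context_path: str
    requirement_type: str
    normative_level: str
    text: str
    keywords: List[str] = field(default_factory=list)


def render_record(req):
    """Render one requirement in the field-per-line layout shown above."""
    lines = [
        f"Flat ID: {req.flat_id}",
        f"Original Reference: {req.original_ref}",
        f"Context Path: {req.context_path}",
        f"Requirement Type: {req.requirement_type}",
        f"Normative Level: {req.normative_level}",
        f"Text: {req.text}",
        f"Keywords: {', '.join(req.keywords)}",
        "---",
    ]
    return "\n".join(lines)
```

A typed record makes it harder to drop a field silently, and `dataclasses.asdict(req)` converts it back to a plain dictionary when one is needed for export.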

5. Implementation Approach

Use this workflow:

# Install required library if needed
# pip install python-docx

from docx import Document
import re

def flatten_legislative_text(docx_path):
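    """
    Flattens hierarchical legislative text from DOCX.
    Returns a list of flattened requirements.

    The helpers called below (detect_level, extract_reference, update_context,
    is_requirement, classify_requirement, extract_normative_level, clean_text,
    extract_keywords) are not defined in this snippet and must be supplied.
    """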
    doc = Document(docx_path)
    flattened = []
    flat_id = 1
    context_stack = []

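    # doc.paragraphs covers body paragraphs only; text inside tables
    # has to be read separately via doc.tables.
    for paragraph in doc.paragraphs: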
        # Skip empty paragraphs
        if not paragraph.text.strip():
            continue

        # Detect hierarchy level (by style or numbering)
        level = detect_level(paragraph)
        original_ref = extract_reference(paragraph)

        # Update context stack
        update_context(context_stack, level, paragraph.text)

        # Extract if it's a requirement (not just a heading)
        if is_requirement(paragraph):
            item = {
                'flat_id': flat_id,
                'original_ref': original_ref,
                'context_path': ' > '.join(context_stack),
                'requirement_type': classify_requirement(paragraph.text),
                'normative_level': extract_normative_level(paragraph.text),
                'text': clean_text(paragraph.text),
                'keywords': extract_keywords(paragraph.text)
            }
            flattened.append(item)
            flat_id += 1

    return flattened
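
The workflow above relies on helpers that are not shown. As a starting point, here are minimal stand-ins for three of them. The function names come from the workflow, but every body below is an assumption: simple heuristics written for illustration, not the logic in `flattener_utility.py`.

```python
import re

_NORMATIVE = re.compile(r"\b(?:shall|must|should|may)\b", re.IGNORECASE)
# Leading labels such as "(a)", "(2)(b)", "1.2.3" or "4." followed by whitespace.
_LEADING_LABEL = re.compile(r"^\s*(?:(?:\([0-9A-Za-z]+\))+|\d+(?:\.\d+)*\.?)\s+")
# Runs of two or more capitalized words, e.g. "Covered Entity".
_DEFINED_TERM = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b")


def is_requirement(paragraph):
    """Heuristic: a paragraph is a requirement if it uses a normative keyword."""
    return bool(_NORMATIVE.search(paragraph.text))


def clean_text(text):
    """Strip the leading label and collapse runs of whitespace."""
    without_label = _LEADING_LABEL.sub("", text)
    return re.sub(r"\s+", " ", without_label).strip()


def extract_keywords(text):
    """Heuristic: treat capitalized multi-word phrases as defined terms."""
    # Skip the sentence-initial word so "Each Covered Entity" gives "Covered Entity".
    body = clean_text(text).split(" ", 1)[-1]
    keywords = []
    for term in _DEFINED_TERM.findall(body):
        if term not in keywords:  # de-duplicate, keep first-seen order
            keywords.append(term)
    return keywords
```

`is_requirement` only reads the `.text` attribute, so it works with python-docx paragraphs as well as with simple test objects.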

6. Output Options

Provide results in the user's preferred format:

XLSX (Excel): Formatted Excel workbook with styled headers, auto-filter, frozen panes, and optimized column widths (default and recommended)
CSV: Simple tabular format with all fields as columns
JSON: Structured data for programmatic use
Markdown: Human-readable with clear section breaks
Database insert: SQL statements or database-ready format
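
For the CSV and JSON options, the standard library is enough. The sketch below assumes each requirement is a dictionary with the keys used in the workflow in section 5; `to_csv` and `to_json` are hypothetical names, and joining keywords with `; ` is a choice made for this example.

```python
import csv
import io
import json

FIELDS = [
    "flat_id", "original_ref", "context_path", "requirement_type",
    "normative_level", "text", "keywords",
]


def to_csv(requirements):
    """Serialize requirement dicts to CSV text, one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for req in requirements:
        row = dict(req)
        # A list does not fit in one cell, so flatten keywords to a string.
        row["keywords"] = "; ".join(req.get("keywords", []))
        writer.writerow(row)
    return buffer.getvalue()


def to_json(requirements):
    """Serialize requirement dicts to JSON, keeping non-ASCII text readable."""
    return json.dumps(requirements, indent=2, ensure_ascii=False)
```

When writing the CSV text to disk, open the file with `newline=""` and `encoding="utf-8"` so that line endings and non-ASCII characters survive the round trip.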

7. Quality Checks

Validate the flattening:

Ensure no requirements are lost or duplicated
Verify context paths are complete and accurate
Check that normative language is correctly identified
Confirm original references are preserved
Review that similar requirements are consistently formatted
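
Some of these checks can be automated. The function below is a sketch that assumes requirements are dictionaries with the keys from the workflow in section 5; `validate_flattened` and the exact set of checks are illustrative, and checks such as context-path accuracy still need a human reviewer.

```python
from collections import Counter

REQUIRED_FIELDS = ("flat_id", "original_ref", "context_path", "normative_level", "text")


def validate_flattened(requirements):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []

    # Flat IDs should run 1, 2, 3, ... with no gaps or repeats.
    ids = [req.get("flat_id") for req in requirements]
    if ids != list(range(1, len(requirements) + 1)):
        problems.append("flat_id values are not sequential starting at 1")

    # The same original reference appearing twice suggests a duplicate.
    ref_counts = Counter(req.get("original_ref") for req in requirements)
    for ref, count in ref_counts.items():
        if ref and count > 1:
            problems.append(f"original_ref {ref!r} appears {count} times")

    # Every item needs its core fields filled in.
    for req in requirements:
        for name in REQUIRED_FIELDS:
            if not req.get(name):
                problems.append(f"item {req.get('flat_id')}: missing {name}")

    return problems
```

Running this after every flattening pass gives a quick signal before the slower manual review.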

Example Output (Markdown Format)

## Flattened Requirements

### Requirement 1
- **Flat ID**: 1
- **Original Reference**: Section 500.02(a)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (a)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: Each Covered Entity shall maintain a cybersecurity program designed to protect the confidentiality, integrity and availability of the Covered Entity's Information Systems.
- **Keywords**: cybersecurity program, confidentiality, integrity, availability, Information Systems

---

### Requirement 2
- **Flat ID**: 2
- **Original Reference**: Section 500.02(b)
- **Context Path**: Part 500 > Section 500.02 > Paragraph (b)
- **Requirement Type**: Mandate
- **Normative Level**: SHALL
- **Text**: The cybersecurity program shall be based on the Covered Entity's Risk Assessment and designed to perform the following core cybersecurity functions...
- **Keywords**: Risk Assessment, core cybersecurity functions

---

Notes

XLSX output is recommended for most use cases as it provides:
- Professional formatting with styled headers
- Auto-filter for easy data exploration
- Frozen header row for scrolling large datasets
- Optimized column widths for readability
- Text wrapping for long requirement text
Handle tables within documents by processing each cell as potential requirement text
Preserve cross-references between sections where they exist
Flag ambiguous or incomplete requirements for manual review
Support batch processing of multiple documents
Maintain a processing log of any items that couldn't be automatically classified

XLSX Features

The XLSX export includes:

Header styling: Bold white text on blue background
Auto-filter: Filter requirements by any column
Frozen panes: Header row stays visible when scrolling
Column widths: Optimized for each field type (10-60 characters)
Text wrapping: Long text automatically wraps for readability
Borders: Clean grid lines for professional appearance

Supporting Files

See examples.md for sample input/output pairs and template.md for customizable output templates.

README

Legislative Flattener - Claude Code Skill

A Claude Code skill that converts hierarchical legislative text from Word documents into flat, numbered requirement lists.

Perfect for compliance analysis, regulatory mapping, and legislative documentation processing.

🎯 Features

🔄 Parent-Child Flattening: Intelligently combines parent requirements with sub-items into standalone requirements
📊 Word Metadata Support: Reads hidden Word numbering to auto-generate hierarchical references like (1)(a), (1)(b)(i)
🎨 Multiple Export Formats: Excel (XLSX), CSV, JSON, and Markdown
🔍 Smart Filtering: Automatically excludes definition sections (X.1.1, X.1.2), notes, and applicability statements
📝 Context Tracking: Maintains full document structure with hierarchical paths
✨ Normative Analysis: Identifies requirement types (Mandate, Prohibition, Permission) and normative levels (SHALL, MUST, SHOULD, MAY)

📋 Prerequisites

Python 3.8 or higher
Claude Code (optional, for skill integration)

🚀 Quick Start

Install as Claude Code Skill (Recommended)

# Clone the repository
git clone https://github.com/cri-matthew/legislative-flattener-skill.git

# Copy to Claude Code skills directory
mkdir -p ~/.claude/skills
cp -r legislative-flattener-skill ~/.claude/skills/legislative-flattener

# Install Python dependencies
cd ~/.claude/skills/legislative-flattener
pip install -r requirements.txt

Now the skill is available in all your Claude Code projects! You can invoke it by typing /legislative-flattener or mentioning legislative document processing.

Standalone Installation

# Clone the repository
git clone https://github.com/cri-matthew/legislative-flattener-skill.git
cd legislative-flattener-skill

# Install dependencies
pip install -r requirements.txt

# Run directly
python flattener_utility.py input.docx output.xlsx

💻 Usage

Command Line

# Basic usage - defaults to Excel output
python flattener_utility.py input.docx output.xlsx

# Specify output format
python flattener_utility.py input.docx output.csv --format csv
python flattener_utility.py input.docx output.json --format json
python flattener_utility.py input.docx output.md --format markdown

Python API

from flattener_utility import LegislativeFlattener

# Create flattener instance
flattener = LegislativeFlattener()

# Process document
requirements = flattener.flatten_from_docx('input.docx')

print(f"Found {len(requirements)} requirements")

# Export to desired format
flattener.export_to_xlsx(requirements, 'output.xlsx')
flattener.export_to_json(requirements, 'output.json')
flattener.export_to_csv(requirements, 'output.csv')
flattener.export_to_markdown(requirements, 'output.md')

With Claude Code

Simply open Claude Code in any project and:

Type: analyze ASIC_8A.docx with legislative-flattener
Or: @legislative-flattener process my regulatory document

Claude will automatically use the skill to process your legislative documents!

📖 Example

Input Document Structure

Rule 8A.3.1 - Critical Business Services

(1) An Operator must have adequate arrangements...

(2) Without limiting subrule (1), arrangements must include:
    (a) identifying Critical Business Services;
    (b) assessing and managing risks;
    (c) ensuring sufficient capacity;

Output (Flattened to Excel)

Flat ID	Original Ref	Context Path	Type	Normative Level	Text
1	8A.3.1(1)	Rule 8A.3.1	Mandate	MUST	An Operator must have adequate arrangements...
2	8A.3.1(2)(a)	Rule 8A.3.1	Mandate	MUST	Without limiting subrule (1), arrangements must include identifying Critical Business Services
3	8A.3.1(2)(b)	Rule 8A.3.1	Mandate	MUST	Without limiting subrule (1), arrangements must include assessing and managing risks
4	8A.3.1(2)(c)	Rule 8A.3.1	Mandate	MUST	Without limiting subrule (1), arrangements must include ensuring sufficient capacity
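
The parent-child combination in rows 2 to 4 can be sketched as follows. `flatten_parent_children` is a hypothetical function written for this example, not the skill's implementation: it drops the parent's trailing colon, strips each child's trailing `;` (and any `; and` / `; or`), and joins the two.

```python
import re

# Trailing ";" optionally followed by "and" / "or" at the end of a list item.
_ITEM_TAIL = re.compile(r"\s*;\s*(?:and|or)?\s*$")


def flatten_parent_children(rule_ref, parent_label, parent_text, children):
    """Combine a lead-in sentence with each sub-item into standalone requirements.

    children is a list of (label, text) pairs such as ("(a)", "identifying ...;").
    """
    stem = parent_text.strip().rstrip(":").rstrip()
    items = []
    for child_label, child_text in children:
        body = _ITEM_TAIL.sub("", child_text.strip())
        items.append({
            "original_ref": f"{rule_ref}{parent_label}{child_label}",
            "text": f"{stem} {body}",
        })
    return items


rows = flatten_parent_children(
    "8A.3.1",
    "(2)",
    "Without limiting subrule (1), arrangements must include:",
    [
        ("(a)", "identifying Critical Business Services;"),
        ("(b)", "assessing and managing risks;"),
        ("(c)", "ensuring sufficient capacity;"),
    ],
)
```

Here `rows[0]` has `original_ref` `8A.3.1(2)(a)` and the text "Without limiting subrule (1), arrangements must include identifying Critical Business Services", matching row 2 of the table.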

🎨 Output Formats

Excel (XLSX) - Default & Recommended

✨ Professional formatting with colored headers
🔍 Auto-filter on all columns
📌 Frozen header row
📏 Optimized column widths
📝 Text wrapping for readability
🎯 Clean borders and grid lines

JSON

🔧 Structured data for programmatic use
🌐 UTF-8 encoded
📦 Easy integration with other tools

CSV

📊 Universal spreadsheet compatibility
🚀 Lightweight and portable

Markdown

📄 Human-readable documentation
🔖 Version control friendly
✍️ Perfect for reports

🧪 Tested With

Successfully processed:

✅ ASIC Chapter 8A: 89 requirements captured (5 of 7 rules matched exactly)
✅ ASIC Chapter 8B: 83 requirements captured (5 of 6 rules within ±2 requirements)
✅ Multi-level hierarchies (up to 4 levels deep: (1)(a)(i)(A))
✅ Complex Word numbering schemes with multiple NumIDs
✅ Mixed explicit and implicit numbering

📚 Skill Files

This repository contains a complete Claude Code skill:

SKILL.md: Skill definition and metadata for Claude Code
flattener_utility.py: Core Python implementation
examples.md: Sample input/output examples
template.md: Output template documentation
requirements.txt: Python dependencies
README.md: This file

🔧 How It Works

Document Analysis: Reads Word document structure, styles, and numbering metadata
Hierarchy Detection: Identifies parent-child relationships using Word styles (MIR Body Text, MIR Subpara)
Numbering Generation: Auto-generates hierarchical references from Word's hidden numbering levels
Text Flattening: Combines parent requirements with sub-items to create standalone requirements
Smart Filtering: Excludes definitions (X.1.1, X.1.2), notes, and applicability statements
Classification: Analyzes normative language to classify requirements and extract keywords
Export: Generates formatted output in your preferred format
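
To illustrate the numbering-generation step, the sketch below turns a sequence of list levels into references such as `(1)(a)(i)(A)`. It is a simplified model under assumptions: levels are 0-based integers, and the format per level (decimal, lower alpha, lower Roman, upper alpha) is fixed here, whereas a real implementation would read each level's format from the document's numbering definitions. All function names are hypothetical.

```python
def to_alpha(n):
    """1 -> a, 26 -> z, 27 -> aa."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("a") + rem) + out
    return out


def to_roman(n):
    """1 -> i, 4 -> iv, 9 -> ix."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out


# Assumed format per level: (1), (a), (i), (A).
FORMATTERS = [str, to_alpha, to_roman, lambda n: to_alpha(n).upper()]


def generate_references(levels):
    """Turn a sequence of 0-based list levels into hierarchical references."""
    counters = []
    refs = []
    for level in levels:
        if not 0 <= level < len(FORMATTERS):
            raise ValueError(f"unsupported level: {level}")
        del counters[level + 1:]          # deeper levels restart
        while len(counters) <= level:     # pad if a level was skipped
            counters.append(0)
        counters[level] += 1
        refs.append("".join(
            f"({FORMATTERS[i](count)})" for i, count in enumerate(counters) if count
        ))
    return refs
```

For the level sequence `[0, 1, 1, 2, 3, 0]` this produces `(1)`, `(1)(a)`, `(1)(b)`, `(1)(b)(i)`, `(1)(b)(i)(A)`, `(2)`.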

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built for Claude Code skill integration
Designed for ASIC regulatory compliance analysis
Inspired by the need for better legislative document processing in compliance workflows

📞 Support

Issues: Report bugs or request features
Discussions: Ask questions or share ideas
Documentation: See SKILL.md for detailed skill documentation

🗺️ Roadmap

Support for PDF input files
Configurable filtering rules via config file
Web interface for non-technical users
Batch processing for multiple documents
Custom output templates
Export to Word, HTML, and database formats
Requirement comparison and diff tools
Integration with compliance management systems

Built with ❤️ for the compliance and regulatory community

Star ⭐ this repository if you find it helpful!

Installation

Option 1: Use slash command in Claude Code

/install-skill https://github.com/cri-matthew/legislative-flattener-skill

Option 2: Clone to skills directory

# Global (all projects)

git clone https://github.com/cri-matthew/legislative-flattener-skill ~/.claude/skills/legislative-flattener-skill

# Project-specific

git clone https://github.com/cri-matthew/legislative-flattener-skill .claude/skills/legislative-flattener-skill

Add MCP server to .cursor/mcp.json:

{
  "mcpServers": {
    "skillz": {
      "command": "npx",
      "args": ["-y", "skillz-mcp", "https://github.com/cri-matthew/legislative-flattener-skill"]
    }
  }
}

Restart Cursor after adding the configuration.

Gemini CLI, Option 1: Use the extensions install command

gemini extensions install https://github.com/cri-matthew/legislative-flattener-skill

Gemini CLI, Option 2: Clone to extensions directory

git clone https://github.com/cri-matthew/legislative-flattener-skill ~/.gemini/extensions/legislative-flattener-skill