D365 Knowledge Graph Construction

Transform your Dynamics 365 CRM into a knowledge graph for advanced analytics and relationship discovery

🎯 Problem

Because of how query languages work, LLMs are limited in their ability to extract meaning and relationships from traditional record-based systems. Knowledge graphs have been shown to improve accuracy and reduce hallucination on multi-part questions that require connecting the dots across systems.

https://neo4j.com/blog/genai/knowledge-graph-llm-multi-hop-reasoning/

This automated ETL pipeline transforms D365 data into a Neo4j knowledge graph, enabling graph analytics, relationship discovery, and AI-powered insights that are difficult to obtain with traditional relational queries.

🎯 Project Goals

This project builds a comprehensive knowledge graph from Microsoft Dynamics 365 data using Neo4j Aura DB. The system extracts, transforms, and loads D365 entities and their relationships into a graph database, enabling advanced analytics, relationship discovery, and intelligent querying capabilities.

Primary Objectives

Data Integration: Extract data from Dynamics 365 Dataverse
Graph Transformation: Convert relational D365 data into a property graph model
Relationship Preservation: Maintain all existing D365 relationships in the knowledge graph
Text and Description Mining: Extract relationships from free text in notes and emails, and link them to the corresponding entities
Real-time Synchronization: Keep the knowledge graph updated with D365 changes
Scalable Architecture: Build a robust, maintainable, and scalable ETL pipeline

🚀 Core Functionality

1. Data Extraction ✅ COMPLETE

Async D365 Client: High-performance async client with MSAL authentication
Rate Limiting: Automatic rate limiting (6000 requests/minute) with backoff
Multi-Entity Extractors: 11 specialized extractors, one per supported D365 entity
File-Based Storage: Parquet compression with JSON fallback
Pagination Handling: Microsoft-recommended @odata.nextLink approach
Batch Processing: Configurable batch sizes for optimal performance
Quality Scoring: Entity-specific validation and data quality metrics
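
The pagination approach listed above can be illustrated with a minimal sketch. It assumes a caller-supplied `fetch_page` function (hypothetical, standing in for the project's async D365 client) that returns the decoded JSON of one response. The `value` and `@odata.nextLink` keys are part of the Dataverse Web API response format; everything else here is illustrative.

```python
from typing import Callable, Dict, Iterator, Optional

Page = Dict[str, object]


def iter_records(first_url: str, fetch_page: Callable[[str], Page]) -> Iterator[dict]:
    """Yield every record, following @odata.nextLink until it is absent."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        # The rows of the current page are returned under "value".
        for record in page.get("value", []):
            yield record
        # The server supplies the next page URL; it is used as-is, not rebuilt.
        url = page.get("@odata.nextLink")


# In-memory stand-in for the HTTP call, so the sketch runs without a network.
FAKE_PAGES = {
    "page1": {"value": [{"name": "A"}, {"name": "B"}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"name": "C"}]},
}

names = [r["name"] for r in iter_records("page1", FAKE_PAGES.__getitem__)]
print(names)  # ['A', 'B', 'C']
```

Passing the fetch function in as a parameter keeps the paging loop independent of authentication and rate limiting, which would live in the real client.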

2. Graph Transformation ✅ COMPLETE

Schema Mapping: Automated D365 entity to Neo4j node mapping
Field Transformation: 50+ business rules for data standardization
Relationship Building: Intelligent relationship discovery and mapping
Data Validation: Comprehensive validation with quality scoring
Business Rules Engine: Entity-specific transformations and formatting
Multi-Entity Support: Unified transformation pipeline for all 11 entities
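
As a rough illustration of relationship building, the sketch below separates a raw Dataverse record into node properties and lookup references. It relies on two Web API conventions: lookup columns are returned as `_<name>_value`, and, when annotations are requested, the `@Microsoft.Dynamics.CRM.lookuplogicalname` annotation names the target table of a polymorphic lookup. The function name `split_record` and the output shape are assumptions for illustration, not the project's actual transformer.

```python
from typing import Dict, List, Tuple

LOOKUP_TYPE_SUFFIX = "@Microsoft.Dynamics.CRM.lookuplogicalname"


def split_record(record: Dict[str, object]) -> Tuple[Dict[str, object], List[dict]]:
    """Separate plain properties from lookup references."""
    properties: Dict[str, object] = {}
    relationships: List[dict] = []
    for key, value in record.items():
        if "@" in key:
            continue  # annotations are read together with their lookup below
        if key.startswith("_") and key.endswith("_value"):
            if value is None:
                continue  # empty lookup: nothing to link
            relationships.append({
                "field": key[1:-len("_value")],
                "target_id": value,
                # For polymorphic lookups this annotation names the target table.
                "target_entity": record.get(key + LOOKUP_TYPE_SUFFIX),
            })
        else:
            properties[key] = value
    return properties, relationships


contact = {
    "contactid": "c1",
    "fullname": "Example Person",
    "_parentcustomerid_value": "a1",
    "_parentcustomerid_value" + LOOKUP_TYPE_SUFFIX: "account",
    "_ownerid_value": None,
}
props, rels = split_record(contact)
print(props)
print(rels)
```

A later step could map each `field` to a relationship type and use `target_entity` to pick the node label on the other end.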

3. Neo4j Loading ✅ COMPLETE

Connection Management: Async Neo4j driver with connection pooling
Batch Loading: UNWIND-based batch operations for optimal performance
Node Creation: MERGE operations for idempotency (prevents duplicates)
Relationship Loading: Polymorphic relationship handling
File Support: Direct loading from Parquet, JSON, JSONL files
Index Management: Automatic constraint and index creation
Data Validation: Pre-load validation with quality checks
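
The UNWIND-based batch loading and MERGE idempotency described above can be sketched as follows. This is a minimal example under stated assumptions: the helper names `chunked` and `build_merge_query` are hypothetical, and the generated Cypher is one common pattern rather than the query the loader necessarily emits.

```python
from typing import Iterator, List


def chunked(rows: List[dict], size: int) -> Iterator[List[dict]]:
    """Split rows into batches of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_merge_query(label: str, key: str) -> str:
    """Build an UNWIND + MERGE statement for one node label."""
    # Labels and property keys cannot be passed as Cypher parameters,
    # so they are validated before being interpolated into the query text.
    if not (label.isidentifier() and key.isidentifier()):
        raise ValueError("label and key must be plain identifiers")
    return (
        "UNWIND $rows AS row\n"
        f"MERGE (n:{label} {{{key}: row.{key}}})\n"
        "SET n += row"
    )


rows = [{"accountid": str(i), "name": f"Account {i}"} for i in range(5)]
query = build_merge_query("Account", "accountid")
print(query)
for batch in chunked(rows, 2):
    # With the official Neo4j driver this would be roughly:
    #   session.run(query, rows=batch)
    print(len(batch))
```

Because MERGE matches on the key property, re-running the same batch updates the existing nodes instead of creating duplicates.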

4. Supported D365 Entities (11 Total)

Core Business Entities:
- Account (Companies/Organizations)
- Contact (Individual Persons)
- Lead (Sales Prospects)

Sales Transaction Entities:
- Opportunity (Sales Deals)
- Order (Sales Orders)
- Invoice (Financial Invoices)

Communication Activity Entities:
- Email (Email Communications)
- PhoneCall (Phone Call Activities)
- Appointment (Scheduled Meetings)
- ActivityParty (Activity Participants - Senders/Recipients) ✨ NEW

Content Entities:
- Note/Annotation (Notes, Attachments, Comments)

Extensibility: Framework supports custom D365 entities

5. Knowledge Graph Operations

Batch loading of historical data
Real-time incremental updates
Data quality validation
Relationship integrity checks
Graph optimization and indexing
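
One way incremental updates could be driven is by filtering on the `modifiedon` column with an OData `$filter`. The sketch below is an assumption about the mechanism, not a description of the project's incremental mode. The organization URL is a placeholder, and `incremental_url` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import List
from urllib.parse import quote


def incremental_url(base_url: str, entity_set: str, since: datetime, select: List[str]) -> str:
    """Build a Web API query for rows modified after `since`."""
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = {
        "$select": ",".join(select),
        "$filter": f"modifiedon gt {stamp}",
    }
    # Encode spaces but keep commas and colons readable in the query string.
    encoded = "&".join(f"{k}={quote(v, safe=',$:')}" for k, v in query.items())
    return f"{base_url.rstrip('/')}/{entity_set}?{encoded}"


url = incremental_url(
    "https://example.crm.dynamics.com/api/data/v9.2",  # placeholder organization
    "accounts",
    datetime(2025, 12, 1, tzinfo=timezone.utc),
    ["accountid", "name", "modifiedon"],
)
print(url)
```

The timestamp of the last successful run would need to be stored between runs; where that state lives is left open here.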

📁 Project Structure

D365KGConstruct/
│
├── README.md                 # Project overview and documentation
├── requirements.txt          # Python dependencies
├── .env.example             # Environment variables template
├── config.yaml              # Configuration settings
│
├── documents/               # Documentation folder
│   ├── architecture.md      # System architecture and design
│   ├── api-reference.md     # API documentation
│   └── deployment-guide.md  # Deployment instructions
│
├── src/                     # Source code
│   ├── __init__.py
│   ├── extractors/          # D365 data extraction modules
│   ├── transformers/        # Data transformation logic
│   ├── loaders/            # Neo4j loading modules
│   ├── models/             # Data models and schemas
│   ├── utils/              # Utility functions
│   └── orchestration/      # Pipeline orchestration
│
├── tasks/                   # Manual task tracking
│   └── TODO.md             # Task list and progress tracking
│
├── ai-dev-tasks/           # AI-assisted development tasks
│   ├── prompts/            # AI prompts for code generation
│   └── generated/          # AI-generated code artifacts
│
├── tests/                  # Test suite
│   ├── unit/              # Unit tests
│   ├── integration/       # Integration tests
│   └── fixtures/          # Test data and mocks
│
├── scripts/               # Utility scripts
│   ├── setup_neo4j.py    # Neo4j initialization script
│   ├── validate_graph.py # Graph validation utilities
│   └── run_etl.py       # Main ETL execution script
│
└── docker/               # Docker configuration
    ├── Dockerfile       # Container definition
    └── docker-compose.yml # Multi-container setup

🛠️ Technology Stack

Graph Database: Neo4j Aura DB
D365 Integration: Dataverse Web API / Azure SDK
Data Processing: Pandas, NumPy
Graph Driver: py2neo / neo4j-python-driver
Containerization: Docker
CI/CD: GitHub Actions / Azure DevOps

🚦 Quick Start

Clone the Repository

git clone <repository-url>
cd D365KGConstruct

Set Up Environment

Windows Users (Recommended):

# Option 1: Quick setup (use if starting fresh)
reset_venv.bat
venv\Scripts\activate
pip install -r requirements.txt

# Option 2: Manual setup
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

If Virtual Environment is Broken:

# This fixes "Unable to create process" errors
reset_venv.bat
venv\Scripts\activate
pip install -r requirements-minimal.txt

Linux/Mac Users:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Minimal Installation (Core Features Only):

pip install -r requirements-minimal.txt
python -m src.cli --help
# Note: System works with minimal packages using JSON storage

For Annotation Enrichment (Optional):

# After activating venv
install_enrichment.bat   # Windows
# Or manually:
pip install neo4j-graphrag openai beautifulsoup4 html2text

Configure Credentials

cp .env.example .env
# Edit .env with your D365 and Neo4j credentials

Verify Connectivity

# Test all connections (D365 + Neo4j)
python -m src.cli test-connection

# Test specific connections
python -m src.cli test-connection --service neo4j    # Neo4j AuraDB only
python -m src.cli test-connection --service d365     # D365 Dataverse only

Initialize Neo4j Schema
```
python -m src.cli init
```

Extract Data from D365 to Files

# Multi-Entity Extraction (All 11 entities in one run)
python -m src.cli extract --mode=full                # Extract ALL entities
python -m src.cli extract --mode=incremental         # Incremental multi-entity

# Single Entity Extraction
python -m src.cli extract --entity=account --mode=full       # Accounts only
python -m src.cli extract --entity=contact --mode=full       # Contacts only
python -m src.cli extract --entity=lead --mode=full          # Leads only
python -m src.cli extract --entity=opportunity --mode=full   # Opportunities only
python -m src.cli extract --entity=salesorder --mode=full    # Orders only
python -m src.cli extract --entity=invoice --mode=full       # Invoices only
python -m src.cli extract --entity=email --mode=full         # Emails only
python -m src.cli extract --entity=phonecall --mode=full     # Phone calls only
python -m src.cli extract --entity=appointment --mode=full   # Appointments only
python -m src.cli extract --entity=activityparty --mode=full # Activity participants only
python -m src.cli extract --entity=annotation --mode=full    # Notes/attachments only
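
The `--entity` values above are table logical names, while the Dataverse Web API addresses tables by their entity set names, which are not always formed by appending "s" (the Recent Updates section mentions a bug of this kind for `activityparty`). A minimal sketch of an explicit lookup table follows; the dictionary and function names are hypothetical, and the entity set names are the standard Dataverse ones.

```python
# Logical name -> Web API entity set name for the 11 supported entities.
ENTITY_SET_NAMES = {
    "account": "accounts",
    "contact": "contacts",
    "lead": "leads",
    "opportunity": "opportunities",
    "salesorder": "salesorders",
    "invoice": "invoices",
    "email": "emails",
    "phonecall": "phonecalls",
    "appointment": "appointments",
    "activityparty": "activityparties",
    "annotation": "annotations",
}


def entity_set_name(logical_name: str) -> str:
    """Return the entity set name, failing loudly for unknown entities."""
    try:
        return ENTITY_SET_NAMES[logical_name.lower()]
    except KeyError:
        raise ValueError(f"unknown entity: {logical_name!r}") from None


print(entity_set_name("activityparty"))  # activityparties
print(entity_set_name("opportunity"))    # opportunities
```

An explicit table avoids guessing plural forms; for custom entities the set name can be read from the table's metadata instead.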

# Custom output directory
python -m src.cli extract --output-dir=custom/path --mode=full

# Output structure for multi-entity extraction:
# output/extract/{run_id}/
# ├── account/batch_001.parquet
# ├── contact/batch_001.parquet
# ├── lead/batch_001.parquet
# ├── opportunity/batch_001.parquet
# ├── salesorder/batch_001.parquet
# ├── invoice/batch_001.parquet
# ├── email/batch_001.parquet
# ├── phonecall/batch_001.parquet
# ├── appointment/batch_001.parquet
# ├── activityparty/batch_001.parquet  # Links activities to contacts
# └── annotation/batch_001.parquet
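
Given the directory layout above, a loader needs to find the batch files for each entity. The sketch below shows one way to walk a run directory using only the standard library. The function name `discover_batches` is hypothetical, and the suffix list follows the formats this README names (Parquet, JSON, JSONL).

```python
from pathlib import Path
from typing import Dict, List

SUPPORTED_SUFFIXES = {".parquet", ".json", ".jsonl"}


def discover_batches(run_dir: Path) -> Dict[str, List[Path]]:
    """Map each entity sub-directory to its sorted list of batch files."""
    batches: Dict[str, List[Path]] = {}
    for entity_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        files = sorted(
            f for f in entity_dir.iterdir()
            if f.is_file() and f.suffix in SUPPORTED_SUFFIXES
        )
        if files:  # skip entity folders that contain no supported files
            batches[entity_dir.name] = files
    return batches
```

Calling `discover_batches(Path("output/extract") / run_id)` would return a dictionary keyed by entity name, for example `account` and `contact`, with the batch files in name order.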

Load Extracted Data to Neo4j

# Load all entities from extraction directory
python -m src.cli load --source output/extract/{run_id}

# Load specific entity only
python -m src.cli load --source output/extract/{run_id} --entity=account

# Load annotations WITH entity extraction from text (using LLMs)
python -m src.cli load --source output/extract/{run_id} --entity=annotation --enrich

# Test annotation enrichment with sample (cost-effective)
python -m src.cli load --source output/extract/{run_id} --entity=annotation --enrich --enrich-sample=10

# Clear existing graph before loading (careful!)
python -m src.cli load --source output/extract/{run_id} --clear-first

# Custom batch size for loading
python -m src.cli load --source output/extract/{run_id} --batch-size=500

# Create indexes and constraints only (no data loading)
python -m src.cli load --source output/extract/{run_id} --indexes-only

# Supported file formats: Parquet, JSON, JSONL
# The loader automatically detects and processes all formats

Run Complete ETL Pipeline

# Full pipeline: Extract → Transform → Load
python -m src.cli run --mode=full         # Complete ETL (initial load)
python -m src.cli run --mode=incremental  # Incremental ETL pipeline
python -m src.cli run --dry-run           # Dry run without making changes

# Pipeline with options
python -m src.cli run --mode=full --clear-graph      # Clear graph before loading
python -m src.cli run --mode=full --entity=account   # Single entity pipeline

# Extract annotations AND enrich with LLM entity extraction
python -m src.cli run --mode=full --entity=annotation --enrich

# Test annotation enrichment with sample (10 annotations)
python -m src.cli run --mode=full --entity=annotation --enrich --enrich-sample=10

# Skip specific phases
python -m src.cli run --skip-extract      # Use existing extracted files
python -m src.cli run --skip-transform    # Skip transformation phase
python -m src.cli run --skip-load         # Extract and transform only

Activity Text Enrichment with LLM ✨ NEW

# The --enrich flag enables LLM-based entity extraction from activity text
# Works on: Annotations (notetext), PhoneCalls (description), Appointments (description)
# Extracts: Entities (Person, Company, Product, etc.) and relationships (WORKS_FOR, FOUNDED_BY, etc.)

# Enrich ALL activities (annotations + phonecalls + appointments)
python -m src.cli load --source output/extract/{run_id} --enrich

# Enrich specific activity type only
python -m src.cli load --source output/extract/{run_id} --entity=phonecall --enrich
python -m src.cli load --source output/extract/{run_id} --entity=appointment --enrich
python -m src.cli load --source output/extract/{run_id} --entity=annotation --enrich

# Test with sample before full processing (cost-effective, 10 records per entity type)
python -m src.cli load --source output/extract/{run_id} --enrich --enrich-sample=10

# Full ETL with enrichment in one command
python -m src.cli run --mode=full --enrich

# Requirements:
# - Set OPENAI_API_KEY environment variable
# - Install: pip install neo4j-graphrag openai beautifulsoup4 html2text
# - Configure: config/annotation_kg_schema.yaml (optional)

# Output (written directly to Neo4j by SimpleKGPipeline):
# - :Company, :Person, :Product, :Technology, :Location nodes (LLM-extracted)
# - Relationships: WORKS_FOR, FOUNDED_BY, PARTNERED_WITH, etc.
# - Super-labels automatically applied: :BusinessEntity, :PersonEntity

# Example: Enrich phone call that mentions "Ivan Komashinsky purchased My Course (ABC111)"
# Creates: Person node "Ivan Komashinsky", Product node "My Course (ABC111)"
# Relationship: (Ivan)-[:PURCHASED]->(My Course)
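
To make the shape of the enrichment output concrete, the sketch below expresses the example above as a triple and turns it into a parameterized Cypher statement. According to this README the real output is written by SimpleKGPipeline, so this is not how the project writes data; the `Triple` class and `triple_to_cypher` function are hypothetical and only illustrate the resulting graph pattern.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Triple:
    subject_label: str
    subject_name: str
    relation: str
    object_label: str
    object_name: str


def triple_to_cypher(t: Triple) -> Tuple[str, Dict[str, str]]:
    """Return a MERGE statement and its parameters for one extracted triple."""
    # Labels and relationship types cannot be parameters, so validate them.
    for ident in (t.subject_label, t.relation, t.object_label):
        if not ident.isidentifier():
            raise ValueError(f"not a plain identifier: {ident!r}")
    query = (
        f"MERGE (s:{t.subject_label} {{name: $subject}})\n"
        f"MERGE (o:{t.object_label} {{name: $object}})\n"
        f"MERGE (s)-[:{t.relation}]->(o)"
    )
    return query, {"subject": t.subject_name, "object": t.object_name}


triple = Triple("Person", "Ivan Komashinsky", "PURCHASED", "Product", "My Course (ABC111)")
query, params = triple_to_cypher(triple)
print(query)
print(params)
```

Names are passed as parameters rather than interpolated, so text extracted by the LLM cannot alter the query.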

Apply Ontology Super-Labels ✨ NEW

# Apply BusinessEntity and PersonEntity super-labels to existing nodes
# This enables querying across entity type synonyms (Company/Account, Person/Contact)
python -m src.cli apply-ontology

# What it does:
# - Adds :BusinessEntity label to both :Account (D365) and :Company (LLM) nodes
# - Adds :PersonEntity label to both :Contact (D365) and :Person (LLM) nodes
# - Sets source property to track data origin (D365 vs LLM)
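
The three steps above could be expressed as one Cypher statement per source label. The sketch below generates such statements from a mapping. The statements are an assumption about what `apply-ontology` might run, and the exact `source` values ("D365", "LLM") are illustrative.

```python
from typing import Dict, List

# Super-label -> {member label: data origin}, following the list above.
SUPER_LABELS: Dict[str, Dict[str, str]] = {
    "BusinessEntity": {"Account": "D365", "Company": "LLM"},
    "PersonEntity": {"Contact": "D365", "Person": "LLM"},
}


def ontology_statements(super_labels: Dict[str, Dict[str, str]] = SUPER_LABELS) -> List[str]:
    """Build one label-adding Cypher statement per member label."""
    statements = []
    for super_label, members in super_labels.items():
        for label, source in members.items():
            statements.append(
                f"MATCH (n:{label}) "
                # coalesce keeps an existing source value instead of overwriting it.
                f"SET n:{super_label}, n.source = coalesce(n.source, '{source}')"
            )
    return statements


for statement in ontology_statements():
    print(statement)
```

Adding a label that a node already has is a no-op in Neo4j, so statements of this form are safe to re-run.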

# Example Cypher queries after applying ontology:
// Find a business entity by name (matches both D365 Accounts and LLM-extracted Companies)
MATCH (n:BusinessEntity {name: "Coho Winery"})
OPTIONAL MATCH (n)-[r]-(related)
RETURN n, r, related

// Find all person entities (both D365 Contacts and LLM-extracted Persons)
MATCH (n:PersonEntity) RETURN n.name, n.source LIMIT 10

# See docs/ontology_and_entity_resolution.md for full documentation

QUICK RUN

python -m src.cli init
python -m src.cli extract --mode=full   
python -m src.cli load --source .\output\extract\full_2025-12-18_180701\ --enrich --clear-first

📈 Use Cases

360-Degree Customer View: Visualize all customer interactions and relationships
Sales Intelligence: Discover hidden patterns in sales data
Relationship Analytics: Analyze complex business relationships
Impact Analysis: Understand cascading effects of business changes
Recommendation Engine: Build AI-powered recommendations based on graph patterns
Fraud Detection: Identify suspicious patterns through graph algorithms
Master Data Management: Maintain a single source of truth for entity relationships

🤝 Contributing

Please refer to CONTRIBUTING.md for guidelines on how to contribute to this project.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support

For questions, issues, or suggestions:

Create an issue in the GitHub repository
Contact the development team at [team-email]
Refer to the documentation for detailed guides

🏁 Project Status

Current Version: 0.3.0 (Beta) - Phase 3 Complete ✅

✅ Phase 1: Foundation (100% Complete)

Project structure setup
Architecture design
Neo4j Aura DB connectivity
D365 OAuth authentication setup

✅ Phase 2: Core Development (100% Complete)

✅ Phase 3: Loading Module (100% Complete)

Neo4j connection manager with async driver
Node creation logic with MERGE operations
Relationship creation logic (polymorphic support)
Batch processing with UNWIND queries
Index and constraint management
File-based loading (Parquet/JSON/JSONL)
Data validation and quality checks
Complete CLI integration

📋 Upcoming Phases

Pipeline orchestration (Airflow/Prefect)
Enhanced incremental sync mechanism
Testing suite completion
Performance optimization
Production deployment
Relationship loading enhancement
Advanced graph analytics

Overall Progress: 90% Complete

🔧 Recent Updates

December 19, 2025 - Multi-Entity Enrichment & Ontology

Extended: LLM enrichment now supports PhoneCall and Appointment entities (not just Annotations)
Added: extract_from_activity() method for generic activity text enrichment
Enhanced: Single --enrich flag automatically processes all activity types
Added: apply-ontology CLI command for semantic entity alignment
Feature: Automatic super-label application (BusinessEntity, PersonEntity)
Fixed: Critical bug in super-label application (wrong query execution mode)
Enhanced: execute_write_query now supports returning data with return_data parameter
Result: Extract entities from phone call descriptions, appointment notes, and annotations
Result: Query across entity type synonyms (Company/Account, Person/Contact)
Documentation: Complete ontology guide in docs/ontology_and_entity_resolution.md

December 11, 2025 - ActivityParty Extraction Fix

Fixed: Entity name pluralization bug (activityparty → activityparties)
Fixed: Invalid field configuration for ActivityParty entity
Fixed: Null handling in ActivityParty transformation
Added: 11th entity support (ActivityParty) for activity-to-participant relationships
Result: Successfully extracts sender/recipient/organizer relationships for emails, calls, and appointments

Last Updated: December 19, 2025

awong789/dynamics365-knowledge-graph
