Transform your Dynamics 365 CRM into a knowledge graph for advanced analytics and relationship discovery
Due to the nature of their query languages, traditional record-based systems limit an LLM's ability to extract meaning and relationships. Knowledge graphs have been shown to improve accuracy and reduce hallucination on multi-part questions that require connecting dots across systems:
https://neo4j.com/blog/genai/knowledge-graph-llm-multi-hop-reasoning/
This automated ETL pipeline transforms D365 into a Neo4j knowledge graph, unlocking powerful graph analytics, relationship discovery, and AI-powered insights that are impractical with traditional relational queries.
This project builds a comprehensive knowledge graph from Microsoft Dynamics 365 data using Neo4j Aura DB. The system extracts, transforms, and loads D365 entities and their relationships into a graph database, enabling advanced analytics, relationship discovery, and intelligent querying capabilities.
- Data Integration: Extract data from Dynamics 365 Dataverse
- Graph Transformation: Convert relational D365 data into a property graph model
- Relationship Preservation: Maintain all existing D365 relationships in the knowledge graph
- Text Enrichment: Extract relationships from free text in notes and emails, and relate them to entities
- Real-time Synchronization: Keep the knowledge graph updated with D365 changes
- Scalable Architecture: Build a robust, maintainable, and scalable ETL pipeline
- Async D365 Client: High-performance async client with MSAL authentication
- Rate Limiting: Automatic rate limiting (6000 requests/minute) with backoff
- Multi-Entity Extractors: 11 specialized extractors covering all supported D365 entities
- File-Based Storage: Parquet compression with JSON fallback
- Pagination Handling: Microsoft-recommended @odata.nextLink approach
- Batch Processing: Configurable batch sizes for optimal performance
- Quality Scoring: Entity-specific validation and data quality metrics
- Schema Mapping: Automated D365 entity to Neo4j node mapping
- Field Transformation: 50+ business rules for data standardization
- Relationship Building: Intelligent relationship discovery and mapping
- Data Validation: Comprehensive validation with quality scoring
- Business Rules Engine: Entity-specific transformations and formatting
- Multi-Entity Support: Unified transformation pipeline for all 11 entities
- Connection Management: Async Neo4j driver with connection pooling
- Batch Loading: UNWIND-based batch operations for optimal performance
- Node Creation: MERGE operations for idempotency (prevents duplicates)
- Relationship Loading: Polymorphic relationship handling
- File Support: Direct loading from Parquet, JSON, JSONL files
- Index Management: Automatic constraint and index creation
- Data Validation: Pre-load validation with quality checks
- Core Business Entities:
- Account (Companies/Organizations)
- Contact (Individual Persons)
- Lead (Sales Prospects)
- Sales Transaction Entities:
- Opportunity (Sales Deals)
- Order (Sales Orders)
- Invoice (Financial Invoices)
- Communication Activity Entities:
- Email (Email Communications)
- PhoneCall (Phone Call Activities)
- Appointment (Scheduled Meetings)
- ActivityParty (Activity Participants - Senders/Recipients) ✨ NEW
- Content Entities:
- Note/Annotation (Notes, Attachments, Comments)
- Extensibility: Framework supports custom D365 entities
- Batch loading of historical data
- Real-time incremental updates
- Data quality validation
- Relationship integrity checks
- Graph optimization and indexing
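The automatic rate limiting mentioned above (6000 requests/minute with backoff) can be pictured as a sliding-window throttle. This is a minimal, self-contained sketch of the idea, not the project's actual implementation:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window throttle: allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 6000, window: float = 60.0):
        self.limit, self.window = limit, window
        self.calls: deque = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            # Back off until the oldest call leaves the window
            time.sleep(self.window - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `acquire()` before each request; when the window is full the caller simply sleeps until capacity frees up.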
```
D365KGConstruct/
│
├── README.md                  # Project overview and documentation
├── requirements.txt           # Python dependencies
├── .env.example               # Environment variables template
├── config.yaml                # Configuration settings
│
├── documents/                 # Documentation folder
│   ├── architecture.md        # System architecture and design
│   ├── api-reference.md       # API documentation
│   └── deployment-guide.md    # Deployment instructions
│
├── src/                       # Source code
│   ├── __init__.py
│   ├── extractors/            # D365 data extraction modules
│   ├── transformers/          # Data transformation logic
│   ├── loaders/               # Neo4j loading modules
│   ├── models/                # Data models and schemas
│   ├── utils/                 # Utility functions
│   └── orchestration/         # Pipeline orchestration
│
├── tasks/                     # Manual task tracking
│   └── TODO.md                # Task list and progress tracking
│
├── ai-dev-tasks/              # AI-assisted development tasks
│   ├── prompts/               # AI prompts for code generation
│   └── generated/             # AI-generated code artifacts
│
├── tests/                     # Test suite
│   ├── unit/                  # Unit tests
│   ├── integration/           # Integration tests
│   └── fixtures/              # Test data and mocks
│
├── scripts/                   # Utility scripts
│   ├── setup_neo4j.py         # Neo4j initialization script
│   ├── validate_graph.py      # Graph validation utilities
│   └── run_etl.py             # Main ETL execution script
│
└── docker/                    # Docker configuration
    ├── Dockerfile             # Container definition
    └── docker-compose.yml     # Multi-container setup
```
- Graph Database: Neo4j Aura DB
- D365 Integration: Dataverse Web API / Azure SDK
- Data Processing: Pandas, NumPy
- Graph Driver: py2neo / neo4j-python-driver
- Containerization: Docker
- CI/CD: GitHub Actions / Azure DevOps
- **Clone the Repository**

  ```bash
  git clone <repository-url>
  cd D365KGConstruct
  ```
- **Set Up Environment**

  Windows users (recommended):

  ```bash
  # Option 1: Quick setup (use if starting fresh)
  reset_venv.bat
  venv\Scripts\activate
  pip install -r requirements.txt

  # Option 2: Manual setup
  python -m venv venv
  venv\Scripts\activate
  pip install -r requirements.txt
  ```

  If the virtual environment is broken:

  ```bash
  # This fixes "Unable to create process" errors
  reset_venv.bat
  venv\Scripts\activate
  pip install -r requirements-minimal.txt
  ```

  Linux/Mac users:

  ```bash
  python -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

  Minimal installation (core features only):

  ```bash
  pip install -r requirements-minimal.txt
  python -m src.cli --help
  # Note: System works with minimal packages using JSON storage
  ```

  For annotation enrichment (optional):

  ```bash
  # After activating venv
  install_enrichment.bat  # Windows
  # Or manually:
  pip install neo4j-graphrag openai beautifulsoup4 html2text
  ```
- **Configure Credentials**

  ```bash
  cp .env.example .env
  # Edit .env with your D365 and Neo4j credentials
  ```

- **Verify Connectivity**

  ```bash
  # Test all connections (D365 + Neo4j)
  python -m src.cli test-connection

  # Test specific connections
  python -m src.cli test-connection --service neo4j  # Neo4j AuraDB only
  python -m src.cli test-connection --service d365   # D365 Dataverse only
  ```
- **Initialize Neo4j Schema**

  ```bash
  python -m src.cli init
  ```
- **Extract Data from D365 to Files**

  ```bash
  # Multi-Entity Extraction (all 11 entities in one run)
  python -m src.cli extract --mode=full          # Extract ALL entities
  python -m src.cli extract --mode=incremental   # Incremental multi-entity

  # Single Entity Extraction
  python -m src.cli extract --entity=account --mode=full        # Accounts only
  python -m src.cli extract --entity=contact --mode=full        # Contacts only
  python -m src.cli extract --entity=lead --mode=full           # Leads only
  python -m src.cli extract --entity=opportunity --mode=full    # Opportunities only
  python -m src.cli extract --entity=salesorder --mode=full     # Orders only
  python -m src.cli extract --entity=invoice --mode=full        # Invoices only
  python -m src.cli extract --entity=email --mode=full          # Emails only
  python -m src.cli extract --entity=phonecall --mode=full      # Phone calls only
  python -m src.cli extract --entity=appointment --mode=full    # Appointments only
  python -m src.cli extract --entity=activityparty --mode=full  # Activity participants only
  python -m src.cli extract --entity=annotation --mode=full     # Notes/attachments only

  # Custom output directory
  python -m src.cli extract --output-dir=custom/path --mode=full

  # Output structure for multi-entity extraction:
  # /output/extract/{run_id}/
  # ├── account/batch_001.parquet
  # ├── contact/batch_001.parquet
  # ├── lead/batch_001.parquet
  # ├── opportunity/batch_001.parquet
  # ├── salesorder/batch_001.parquet
  # ├── invoice/batch_001.parquet
  # ├── email/batch_001.parquet
  # ├── phonecall/batch_001.parquet
  # ├── appointment/batch_001.parquet
  # ├── activityparty/batch_001.parquet  # Links activities to contacts
  # └── annotation/batch_001.parquet
  ```
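Under the hood, extraction follows Dataverse's `@odata.nextLink` pagination: each response carries the next page's URL until the last page omits it. A minimal sketch of that loop, with a hypothetical `fetch` callback standing in for the authenticated client:

```python
from typing import Callable, Dict, Iterator

def paginate(fetch: Callable[[str], Dict], first_url: str) -> Iterator[dict]:
    """Yield records page by page, following @odata.nextLink until it disappears.

    `fetch` is assumed to take a URL and return the parsed JSON response body.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("value", [])            # Dataverse puts records under "value"
        url = page.get("@odata.nextLink")           # present only when more pages remain
```

Because the loop is driven entirely by the server-supplied link, it works regardless of page size or total record count.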
- **Load Extracted Data to Neo4j**

  ```bash
  # Load all entities from extraction directory
  python -m src.cli load --source output/extract/{run_id}

  # Load specific entity only
  python -m src.cli load --source output/extract/{run_id} --entity=account

  # Load annotations WITH entity extraction from text (using LLMs)
  python -m src.cli load --source output/extract/{run_id} --entity=annotation --enrich

  # Test annotation enrichment with sample (cost-effective)
  python -m src.cli load --source output/extract/{run_id} --entity=annotation --enrich --enrich-sample=10

  # Clear existing graph before loading (careful!)
  python -m src.cli load --source output/extract/{run_id} --clear-first

  # Custom batch size for loading
  python -m src.cli load --source output/extract/{run_id} --batch-size=500

  # Create indexes and constraints only (no data loading)
  python -m src.cli load --source output/extract/{run_id} --indexes-only

  # Supported file formats: Parquet, JSON, JSONL
  # The loader automatically detects and processes all formats
  ```
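The loader's UNWIND-based batching boils down to chunking rows and sending each chunk as one parameterized MERGE statement, which keeps transactions small and makes reloads idempotent. A sketch of the pattern; the Cypher text is illustrative, not the project's exact query:

```python
from typing import List

def chunks(rows: List[dict], size: int) -> List[List[dict]]:
    """Split rows so each UNWIND statement carries a bounded batch."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

# MERGE on the primary key makes repeated loads update rather than duplicate nodes.
MERGE_ACCOUNTS = """
UNWIND $rows AS row
MERGE (a:Account {accountid: row.accountid})
SET a += row
"""

# With the official neo4j-python-driver (assumed), each chunk is one round trip:
# with driver.session() as session:
#     for chunk in chunks(rows, 500):
#         session.run(MERGE_ACCOUNTS, rows=chunk)
```

Batch size trades memory per transaction against round trips; 500–1000 rows per UNWIND is a common starting point.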
- **Run Complete ETL Pipeline**

  ```bash
  # Full pipeline: Extract → Transform → Load
  python -m src.cli run --mode=full          # Complete ETL (initial load)
  python -m src.cli run --mode=incremental   # Incremental ETL pipeline
  python -m src.cli run --dry-run            # Dry run without making changes

  # Pipeline with options
  python -m src.cli run --mode=full --clear-graph     # Clear graph before loading
  python -m src.cli run --mode=full --entity=account  # Single entity pipeline

  # Extract annotations AND enrich with LLM entity extraction
  python -m src.cli run --mode=full --entity=annotation --enrich

  # Test annotation enrichment with sample (10 annotations)
  python -m src.cli run --mode=full --entity=annotation --enrich --enrich-sample=10

  # Skip specific phases
  python -m src.cli run --skip-extract    # Use existing extracted files
  python -m src.cli run --skip-transform  # Skip transformation phase
  python -m src.cli run --skip-load       # Extract and transform only
  ```
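The transform phase applies field-level business rules before loading. One illustrative rule (a hypothetical example, not one of the project's actual 50+ rules), normalizing phone numbers to a standard format:

```python
from typing import Optional

def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Standardize a phone field: strip punctuation, format US 10-digit numbers."""
    if not raw:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return digits or None  # keep non-standard numbers as bare digits
```

Rules like this run per entity type during transformation, so the same value arrives in Neo4j in one canonical form regardless of how it was typed into D365.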
- **Activity Text Enrichment with LLM** ✨ NEW

  ```bash
  # The --enrich flag enables LLM-based entity extraction from activity text
  # Works on: Annotations (notetext), PhoneCalls (description), Appointments (description)
  # Extracts: Entities (Person, Company, Product, etc.) and relationships (WORKS_FOR, FOUNDED_BY, etc.)

  # Enrich ALL activities (annotations + phonecalls + appointments)
  python -m src.cli load --source output/extract/{run_id} --enrich

  # Enrich specific activity type only
  python -m src.cli load --source output/extract/{run_id} --entity=phonecall --enrich
  python -m src.cli load --source output/extract/{run_id} --entity=appointment --enrich
  python -m src.cli load --source output/extract/{run_id} --entity=annotation --enrich

  # Test with sample before full processing (cost-effective, 10 records per entity type)
  python -m src.cli load --source output/extract/{run_id} --enrich --enrich-sample=10

  # Full ETL with enrichment in one command
  python -m src.cli run --mode=full --enrich

  # Requirements:
  # - Set OPENAI_API_KEY environment variable
  # - Install: pip install neo4j-graphrag openai beautifulsoup4 html2text
  # - Configure: config/annotation_kg_schema.yaml (optional)

  # Output (written directly to Neo4j by SimpleKGPipeline):
  # - :Company, :Person, :Product, :Technology, :Location nodes (LLM-extracted)
  # - Relationships: WORKS_FOR, FOUNDED_BY, PARTNERED_WITH, etc.
  # - Super-labels automatically applied: :BusinessEntity, :PersonEntity

  # Example: Enrich phone call that mentions "Ivan Komashinsky purchased My Course (ABC111)"
  # Creates: Person node "Ivan Komashinsky", Product node "My Course (ABC111)"
  # Relationship: (Ivan)-[:PURCHASED]->(My Course)
  ```
- **Apply Ontology Super-Labels** ✨ NEW

  ```bash
  # Apply BusinessEntity and PersonEntity super-labels to existing nodes
  # This enables querying across entity type synonyms (Company/Account, Person/Contact)
  python -m src.cli apply-ontology

  # What it does:
  # - Adds :BusinessEntity label to both :Account (D365) and :Company (LLM) nodes
  # - Adds :PersonEntity label to both :Contact (D365) and :Person (LLM) nodes
  # - Sets source property to track data origin (D365 vs LLM)
  ```

  Example queries after applying the ontology:

  ```cypher
  // Find all business entities (both D365 Accounts and LLM-extracted Companies)
  MATCH (n:BusinessEntity {name: "Coho Winery"})
  OPTIONAL MATCH (n)-[r]-(related)
  RETURN n, r, related

  // Find all person entities (both D365 Contacts and LLM-extracted Persons)
  MATCH (n:PersonEntity) RETURN n.name, n.source LIMIT 10
  ```

  See docs/ontology_and_entity_resolution.md for full documentation.
A typical end-to-end run:

```bash
python -m src.cli init
python -m src.cli extract --mode=full
python -m src.cli load --source .\output\extract\full_2025-12-18_180701\ --enrich --clear-first
```

- 360-Degree Customer View: Visualize all customer interactions and relationships
- Sales Intelligence: Discover hidden patterns in sales data
- Relationship Analytics: Analyze complex business relationships
- Impact Analysis: Understand cascading effects of business changes
- Recommendation Engine: Build AI-powered recommendations based on graph patterns
- Fraud Detection: Identify suspicious patterns through graph algorithms
- Master Data Management: Maintain a single source of truth for entity relationships
Please refer to CONTRIBUTING.md for guidelines on how to contribute to this project.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions, issues, or suggestions:
- Create an issue in the GitHub repository
- Contact the development team at [team-email]
- Refer to the documentation for detailed guides
Current Version: 0.3.0 (Beta) - Phase 3 Complete ✅
- Project structure setup
- Architecture design
- Neo4j Aura DB connectivity
- D365 OAuth authentication setup
- Data Extraction Module
- Async D365 client with MSAL authentication
- Multi-entity extraction (11 D365 entities including ActivityParty)
- File-based storage (Parquet/JSON)
- Pagination handling (@odata.nextLink)
- Rate limiting and error handling
- Entity name pluralization with English grammar rules
- Transformation Module
- Schema mapping (D365 → Neo4j)
- Field transformation and validation
- Relationship building
- Business rules engine (50+ rules)
- Data quality scoring
- Neo4j connection manager with async driver
- Node creation logic with MERGE operations
- Relationship creation logic (polymorphic support)
- Batch processing with UNWIND queries
- Index and constraint management
- File-based loading (Parquet/JSON/JSONL)
- Data validation and quality checks
- Complete CLI integration
- Pipeline orchestration (Airflow/Prefect)
- Enhanced incremental sync mechanism
- Testing suite completion
- Performance optimization
- Production deployment
- Relationship loading enhancement
- Advanced graph analytics
Overall Progress: 90% Complete
- Extended: LLM enrichment now supports PhoneCall and Appointment entities (not just Annotations)
- Added: `extract_from_activity()` method for generic activity text enrichment
- Enhanced: Single `--enrich` flag automatically processes all activity types
- Added: `apply-ontology` CLI command for semantic entity alignment
- Feature: Automatic super-label application (BusinessEntity, PersonEntity)
- Fixed: Critical bug in super-label application (wrong query execution mode)
- Enhanced: `execute_write_query` now supports returning data via a `return_data` parameter
- Result: Extract entities from phone call descriptions, appointment notes, and annotations
- Result: Query across entity type synonyms (Company/Account, Person/Contact)
- Documentation: Complete ontology guide in `docs/ontology_and_entity_resolution.md`
- Fixed: Entity name pluralization bug (`activityparty` → `activityparties`)
- Fixed: Invalid field configuration for ActivityParty entity
- Fixed: Null handling in ActivityParty transformation
- Added: 11th entity support (ActivityParty) for activity-to-participant relationships
- Result: Successfully extracts sender/recipient/organizer relationships for emails, calls, and appointments
Last Updated: December 19, 2025
