Specifications¶
Formal specifications of DataJoint's data model and behavior.
These documents define how DataJoint works at a detailed level. They serve as authoritative references for:
- Understanding exact behavior of operations
- Implementing compatible tools and extensions
- Debugging complex scenarios
How to Use These Specifications¶
If you're new to DataJoint: Start with the tutorials and how-to guides before diving into specifications. Specs are technical references, not learning materials.
If you're implementing features: Use specs as the authoritative source for behavior. Start with the prerequisites of your target specification (listed under Reading Order and in the tables below) and work up to the target.
If you're debugging: Specs clarify exact behavior when documentation or examples are ambiguous.
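The "start with dependencies and work up" advice can be sketched as a topological sort over the prerequisites listed in the tables under Specifications by Topic. This is only an illustration: the `PREREQUISITES` dictionary and the `reading_order` helper are hypothetical names made up for this example and are not part of DataJoint.

```python
from graphlib import TopologicalSorter

# Prerequisites copied from the "Specifications by Topic" tables on this page.
PREREQUISITES = {
    "Table Declaration": [],
    "Type System": [],
    "Primary Keys": ["Table Declaration"],
    "Master-Part Relationships": ["Table Declaration"],
    "Virtual Schemas": ["Table Declaration"],
    "Query Operators": ["Table Declaration", "Primary Keys"],
    "Semantic Matching": ["Query Operators"],
    "Fetch API": ["Query Operators"],
    "Diagram": ["Table Declaration"],
    "Codec API": ["Type System"],
    "<npy> Codec": ["Type System"],
    "Object Store Configuration": ["Type System"],
    "Data Manipulation": ["Table Declaration"],
    "AutoPopulate": ["Table Declaration", "Data Manipulation"],
    "Job Metadata": ["AutoPopulate"],
}


def reading_order(target):
    """Return the target spec preceded by all of its transitive prerequisites."""
    needed = {}
    stack = [target]
    while stack:
        spec = stack.pop()
        if spec not in needed:
            needed[spec] = PREREQUISITES[spec]
            stack.extend(PREREQUISITES[spec])
    # TopologicalSorter takes a mapping of node -> predecessors and
    # yields predecessors before the nodes that depend on them.
    return list(TopologicalSorter(needed).static_order())


print(reading_order("Job Metadata"))
# ['Table Declaration', 'Data Manipulation', 'AutoPopulate', 'Job Metadata']
```

When a target has several independent prerequisites, more than one valid order exists; any order that places each prerequisite before the specs that depend on it is fine.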
Reading Order¶
Start Here¶
- Database Backends — Supported databases (MySQL, PostgreSQL)
- Table Declaration — How to define tables
- Primary Keys — Key propagation rules
- Type System — Three-layer type architecture
Next: Choose based on your needs:
- Working with data? → Data Operations
- Building queries? → Query Algebra
- Using large data? → Object Storage
Query Algebra¶
Prerequisites: Table Declaration, Primary Keys
- Query Operators — Restrict, proj, join, aggr, union
- Semantic Matching — Attribute lineage
- Fetch API — Data retrieval
Data Operations¶
Prerequisites: Table Declaration
- Data Manipulation — Insert, update, delete
- AutoPopulate — Jobs 2.0 system
- Job Metadata — Hidden job tracking columns
Object Storage¶
Prerequisites: Type System
- Object Store Configuration — Store setup
- Codec API — Custom type implementation
- <npy> Codec — NumPy array storage
Advanced Topics¶
- Master-Part Relationships — Compositional modeling
- Virtual Schemas — Schema introspection without source
Document Structure¶
Each specification follows a consistent structure:
- Overview — What this specifies
- User Guide — Practical usage
- API Reference — Methods and signatures
- Concepts — Definitions and rules
- Implementation Details — Internal behavior
- Examples — Concrete code samples
- Best Practices — Recommendations
Specifications by Topic¶
Schema Definition¶
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Table Declaration | None | Define Tables | Relational Workflow Model |
| Master-Part Relationships | Table Declaration | Model Relationships | Data Pipelines |
| Virtual Schemas | Table Declaration | — | — |
Key concepts: Table tiers (Manual, Lookup, Imported, Computed, Part), foreign keys, dependency graphs, compositional modeling
Query Algebra¶
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Query Operators | Table Declaration, Primary Keys | Query Data | Query Algebra |
| Semantic Matching | Query Operators | Model Relationships | Query Algebra |
| Primary Keys | Table Declaration | Design Primary Keys | Entity Integrity |
| Fetch API | Query Operators | Fetch Results | — |
| Diagram | Table Declaration | Read Diagrams | — |
Key concepts: Restriction (&, -), projection (.proj()), join (*), aggregation (.aggr()), union, universal set (U()), attribute lineage, schema visualization
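For readers who want an intuition for the operators named above before opening the Query Operators spec, here is a plain-Python analogy on lists of dictionaries. It does not use DataJoint, and the helper names (`restrict`, `antirestrict`, `project`, `join`) are hypothetical. It also simplifies in two ways: it matches join attributes by name only, whereas the specs describe matching by attribute lineage (see Semantic Matching), and its `project` keeps only the attributes you name. Treat the specs, not this sketch, as the reference for exact behavior.

```python
def restrict(rows, condition):
    """Rough analogue of `table & condition`: keep rows matching every key/value."""
    return [r for r in rows if all(r.get(k) == v for k, v in condition.items())]


def antirestrict(rows, condition):
    """Rough analogue of `table - condition`: keep rows that do NOT match."""
    return [r for r in rows if not all(r.get(k) == v for k, v in condition.items())]


def project(rows, *attrs):
    """Rough analogue of `.proj(...)`: keep only the named attributes."""
    return [{a: r[a] for a in attrs} for r in rows]


def join(left, right):
    """Rough analogue of `left * right`: pair rows that agree on shared names."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                result.append({**l_row, **r_row})
    return result


# Made-up example data.
subjects = [
    {"subject_id": 1, "species": "mouse"},
    {"subject_id": 2, "species": "rat"},
]
sessions = [
    {"subject_id": 1, "session": 1},
    {"subject_id": 1, "session": 2},
    {"subject_id": 2, "session": 1},
]

mice = restrict(subjects, {"species": "mouse"})
mouse_sessions = project(join(mice, sessions), "subject_id", "session")
print(mouse_sessions)
# [{'subject_id': 1, 'session': 1}, {'subject_id': 1, 'session': 2}]
print(antirestrict(subjects, {"species": "mouse"}))
# [{'subject_id': 2, 'species': 'rat'}]
```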
Type System¶
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Type System | None | Choose a Storage Type | Type System |
| Codec API | Type System | Create Custom Codec | Custom Codecs |
| <npy> Codec | Type System | Use Object Storage | — |
Key concepts: Native types (MySQL), core types (portable), codec types (Python objects), in-table vs object storage, addressing schemes
Object Storage¶
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Object Store Configuration | Type System | Configure Object Storage | Data Pipelines (OAS) |
Key concepts: Hash-addressed storage (deduplication), schema-addressed storage (browsable paths), filepath storage (user-managed), store configuration, path generation
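Why hash addressing gives deduplication can be shown with a toy in-memory store in which each object's key is the hash of its bytes. This is a generic sketch of the idea, not DataJoint's implementation: the `HashAddressedStore` class is hypothetical, and the choice of SHA-256 is an assumption made for the example. The actual hashing and path generation rules are defined in the Object Store Configuration spec.

```python
import hashlib


class HashAddressedStore:
    """Toy in-memory store: each object's key is the hash of its content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # SHA-256 is an assumption for this sketch, not DataJoint's documented choice.
        digest = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self._objects.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._objects[digest]

    def __len__(self):
        return len(self._objects)


store = HashAddressedStore()
key_a = store.put(b"spike times")
key_b = store.put(b"spike times")  # same content, so same key and no new entry
key_c = store.put(b"other data")

print(key_a == key_b, len(store))
# True 2
```

The trade-off is that keys are opaque digests. That is the gap the schema-addressed (browsable paths) and filepath (user-managed) schemes listed above are described as filling.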
Data Operations¶
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Data Manipulation | Table Declaration | Insert Data | Normalization |
| AutoPopulate | Table Declaration, Data Manipulation | Run Computations, Distributed Computing | Computation Model |
| Job Metadata | AutoPopulate | Handle Errors | Computation Model |
Key concepts: Insert patterns, transactional integrity, workflow normalization, Jobs 2.0, job coordination, populate(), make() method, job states