A finite-state morphological analyzer that performs morphological decomposition and code-switching detection on mixed Bisaya-Tagalog text using a Python-based Non-Deterministic Finite Automaton (NFA).
- data_v2/: Lexicon data files (JSON): Prefix, Infix, Suffix, and Circumfix tables, and Root Lexicons.
- src/python/: Core logic, including the FSM implementation (bindings.py) and the Web Server (server.py).
- web/: Frontend assets (HTML, CSS, JS).
- docs/: Formal documentation and academic proposal.
- Python 3.x
- Install Python dependencies: pip install -r requirements.txt
- Run the Web Server: python src/python/server.py
Access the web interface at http://localhost:8000. Enter mixed Bisaya-Tagalog text to analyze morphology and detect code-switching.
The analyzer supports linguistic research by decomposing words with a finite-state morphotactic approach. The system consists of four interacting components controlled by a global automaton:
- PrefixFSM: Lexical automaton for prefix tokens.
- InfixFSM: Handles ε-transitions for morphemes inserted inside the root (infixes).
- RootLexicon: Stem lexicon lookup structure used to validate candidate roots.
- SuffixCircumfixFSM: Lexical automaton for suffixes and circumfix constraints.
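The interaction of these components can be illustrated with a minimal sketch. It is not the project's bindings.py: the function name `analyze`, the in-memory tables, and the infix placement rule (directly after the first letter of the root) are assumptions made for illustration, and plain suffixes are omitted because their tag name is not given here. Nondeterminism is emulated by trying every branch and keeping each one that ends at a valid root.

```python
from typing import List, Tuple

Morph = Tuple[str, str]  # (surface form, tag)

# Toy tables for illustration only; the real data lives in the JSON lexicon files.
PREFIXES = ("mag",)
INFIXES = ("in", "um")
CIRCUMFIXES = (("ka", "an"),)
ROOTS = {"sulat"}


def analyze(word: str) -> List[List[Morph]]:
    """Return every decomposition of `word` whose stem is found in ROOTS."""
    parses: List[List[Morph]] = []

    # Branch 1: the word is a bare root.
    if word in ROOTS:
        parses.append([(word, "ROOT")])

    # Branch 2: prefix followed by a root.
    for prefix in PREFIXES:
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in ROOTS:
            parses.append([(prefix, "PFX"), (stem, "ROOT")])

    # Branch 3: infix inserted after the first letter of the root (assumed rule).
    for infix in INFIXES:
        if word[1:1 + len(infix)] == infix:
            stem = word[0] + word[1 + len(infix):]
            if stem in ROOTS:
                parses.append([(infix, "INFX"), (stem, "ROOT")])

    # Branch 4: circumfix, where both halves must be present around the root.
    for pre, suf in CIRCUMFIXES:
        if word.startswith(pre) and word.endswith(suf):
            stem = word[len(pre):len(word) - len(suf)]
            if stem in ROOTS:
                parses.append([
                    (pre, "CIRCUMFIX_PREFIX"),
                    (stem, "ROOT"),
                    (suf, "CIRCUMFIX_SUFFIX"),
                ])

    return parses


def render(parse: List[Morph]) -> str:
    """Format one parse as morphemes joined with ' + ' and tagged in brackets."""
    return " + ".join(f"{form}[{tag}]" for form, tag in parse)
```

Returning a list of parses rather than a single result keeps ambiguous words visible to the caller, which is the practical benefit of a nondeterministic design.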
The system outputs a formal morphotactic parse string where morphemes are separated with + and annotated with tags derived from the FSM states.
Examples:
- magsulat -> mag[PFX] + sulat[ROOT]
- sinulat -> in[INFX] + sulat[ROOT]
- kasulatan -> ka[CIRCUMFIX_PREFIX] + sulat[ROOT] + an[CIRCUMFIX_SUFFIX]
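For downstream processing, a parse string in this format can be read back into structured data. The helper below is a hypothetical sketch, not part of the project: the name `parse_analysis` and the assumption that tags consist only of uppercase letters and underscores are inferred from the examples above. Whitespace around `->` and `+` is treated as optional.

```python
import re
from typing import List, Tuple

# One morpheme: a surface form followed by a bracketed tag, e.g. "sulat[ROOT]".
# Assumption: tags use only uppercase letters and underscores, as in the examples.
_MORPH = re.compile(r"^(?P<form>[^\[\]+\s]+)\[(?P<tag>[A-Z_]+)\]$")


def parse_analysis(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'word -> m1[TAG] + m2[TAG]' into (word, [(m1, TAG), (m2, TAG)])."""
    surface, _, rhs = line.partition("->")
    morphs: List[Tuple[str, str]] = []
    for chunk in rhs.split("+"):
        match = _MORPH.match(chunk.strip())
        if match is None:
            raise ValueError(f"malformed morpheme: {chunk!r}")
        morphs.append((match.group("form"), match.group("tag")))
    return surface.strip(), morphs
```

A line without `->`, or with a morpheme that lacks a bracketed tag, raises `ValueError` instead of returning a partial result.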