Conversation
|
@multiqc-bot fix linting |
|
Starting off with an automated review: /review |
SummaryGreat work on adding the fasterp module! The implementation closely follows the fastp module pattern, which is appropriate given fasterp's compatible interface. However, there are two critical issues blocking this PR:
Detailed ReviewCritical Issues ❌1. Missing Test Data FilesThe tests expect data files at Action Required: Add test data files to 2. Type Annotation Errors (mypy)The module has several missing type annotations that cause mypy failures on Python 3.14: Lines 59-66 - Missing type annotations for class attributes: # Current (fails mypy):
self.fasterp_data = dict()
self.fasterp_duplication_plotdata = dict()
# ... etc
# Should be:
self.fasterp_data: Dict[str, Dict[str, float]] = {}
self.fasterp_duplication_plotdata: Dict[str, Dict[int, float]] = {}
self.fasterp_insert_size_data: Dict[str, Dict[int, float]] = {}
self.fasterp_all_data: Dict[str, Any] = {}
self.fasterp_qual_plotdata: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_gc_content_data: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_n_content_data: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_overrepresented_sequences: Dict[str, Dict[str, Dict[str, int]]] = {}Lines 530-532 in # Current (fails mypy):
cnt_by_sample = defaultdict(Counter)
cnt_by_seq = Counter()
pct_by_seq = Counter()
# Should be:
cnt_by_sample: defaultdict[str, Counter[str]] = defaultdict(Counter)
cnt_by_seq: Counter[str] = Counter()
pct_by_seq: Counter[str] = Counter()Line 400 - Type error in Note: These same issues exist in the fastp module (multiqc/modules/fastp/fastp.py:70, 404, 534-536), so you're in good company! But since this is new code, it should meet current standards. Code Quality ✅Excellent Aspects:
Minor Issues & Suggestions 💡1. Module Ordering (multiqc/modules/fasterp/fasterp.py:81-83)The # Add software version
self.add_software_version(None)
# General Stats Table
self.fasterp_general_stats_table(self.fasterp_data)
# ... sections ...
# Save entire original parsed JSON
self.write_data_file(self.fasterp_all_data, "multiqc_fasterp")This is actually correct! However, the fastp module has these in a different order (write_data_file before sections), which may be worth noting for consistency. 2. Version Detection (Line 393-396)The code checks for both if "fastp_version" in parsed_json.get("summary", {}):
self.add_software_version(parsed_json["summary"]["fastp_version"], s_name)
elif "fasterp_version" in parsed_json.get("summary", {}):
self.add_software_version(parsed_json["summary"]["fasterp_version"], s_name)Question: Does fasterp actually output a 3. Config Option Documentation (Line 23)The docstring mentions config option: fasterp:
s_name_filenames: trueHowever, the code properly uses the modern # Recommended (modern approach):
use_filename_as_sample_name: ["fasterp"]
# Deprecated (still works but shows warning):
fasterp:
s_name_filenames: true4. Search Pattern Collision (multiqc/search_patterns.yaml:361-365)The search pattern for fasterp is identical to fastp: fasterp:
fn: "*.json"
contents: '"before_filtering": {'
num_lines: 50Potential Issue: Both modules will try to parse the same JSON files. This could lead to:
Suggestion: Can fasterp output be distinguished from fastp output? If fasterp includes a unique field (like 5. Test File OrganizationThe test file
Security Considerations ✅No security issues identified:
Performance Considerations ✅
Documentation 📝Good:
Suggestions:
Action ItemsBefore this PR can be merged:
Once these are addressed (especially items 1-2), this will be a solid contribution! The code is well-written and follows the project's conventions closely. Great work overall! Looking forward to seeing this merged. 🚀 |
|
@drbh - the review picked up the identical file search to fastp. Could you elaborate on this a little? It's not great having two modules that parse the same files. If fastp and fasterp have identical outputs then it's better to have a single module really (see also: Kraken and Bracken, for example). |
This PR adds the fasterp tool as a new module. Since fasterp exposes the same interface as fastp the module is very similar.
Please let me know if any changes are needed! Thanks!
tests
output
========================= test session starts ========================= platform darwin -- Python 3.11.1, pytest-7.4.4, pluggy-1.3.0 -- /Users/drbh/.pyenv/versions/3.11.1/bin/python3.11 cachedir: .pytest_cache rootdir: /Users/drbh/Projects/MultiQC configfile: pyproject.toml testpaths: tests, multiqc/modules plugins: syrupy-4.6.0, asyncio-0.23.7, anyio-3.7.1, typeguard-4.4.4 asyncio: mode=Mode.STRICT collected 526 items / 520 deselected / 6 selected tests/test_fasterp.py::test_fasterp_module PASSED [ 16%] tests/test_fasterp.py::test_fasterp_json_parsing PASSED [ 33%] tests/test_fasterp.py::test_fasterp_data_structure PASSED [ 50%] tests/test_fasterp.py::test_fasterp_vs_fastp_compatibility PASSED [ 66%] tests/test_modules_run.py::test_all_modules[fasterp-entry_point40] PASSED [ 83%] tests/test_modules_run.py::test_ignore_samples[fasterp-entry_point40] PASSED [100%] ========================== warnings summary =========================== ../../.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/config/__init__.py:1316 /Users/drbh/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/config/__init__.py:1316: PytestRemovedIn8Warning: The --strict option is deprecated, use --strict-markers instead. self.issue_config_time_warning( -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============ 6 passed, 520 deselected, 1 warning in 1.23s =============