Skip to content

feat: add fasterp module#3392

Open
drbh wants to merge 2 commits intoMultiQC:mainfrom
drbh:add-fasterp-module
Open

feat: add fasterp module#3392
drbh wants to merge 2 commits intoMultiQC:mainfrom
drbh:add-fasterp-module

Conversation

@drbh
Copy link

@drbh drbh commented Nov 24, 2025

This PR adds the fasterp tool as a new module. Since fasterp exposes the same interface as fastp the module is very similar.

Please let me know if any changes are needed! Thanks!

tests

pytest -k fasterp -vvv --strict

output

========================= test session starts =========================
platform darwin -- Python 3.11.1, pytest-7.4.4, pluggy-1.3.0 -- /Users/drbh/.pyenv/versions/3.11.1/bin/python3.11
cachedir: .pytest_cache
rootdir: /Users/drbh/Projects/MultiQC
configfile: pyproject.toml
testpaths: tests, multiqc/modules
plugins: syrupy-4.6.0, asyncio-0.23.7, anyio-3.7.1, typeguard-4.4.4
asyncio: mode=Mode.STRICT
collected 526 items / 520 deselected / 6 selected

tests/test_fasterp.py::test_fasterp_module PASSED               [ 16%]
tests/test_fasterp.py::test_fasterp_json_parsing PASSED         [ 33%]
tests/test_fasterp.py::test_fasterp_data_structure PASSED       [ 50%]
tests/test_fasterp.py::test_fasterp_vs_fastp_compatibility PASSED [ 66%]
tests/test_modules_run.py::test_all_modules[fasterp-entry_point40] PASSED [ 83%]
tests/test_modules_run.py::test_ignore_samples[fasterp-entry_point40] PASSED [100%]

========================== warnings summary ===========================
../../.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/config/__init__.py:1316
  /Users/drbh/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/config/__init__.py:1316: PytestRemovedIn8Warning: The --strict option is deprecated, use --strict-markers instead.
    self.issue_config_time_warning(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============ 6 passed, 520 deselected, 1 warning in 1.23s =============

@ewels
Copy link
Member

ewels commented Dec 5, 2025

@multiqc-bot fix linting

@ewels
Copy link
Member

ewels commented Dec 6, 2025

Starting off with an automated review:

/review

@claude
Copy link

claude bot commented Dec 6, 2025

Summary

Great work on adding the fasterp module! The implementation closely follows the fastp module pattern, which is appropriate given fasterp's compatible interface. However, there are two critical issues blocking this PR:

  1. Missing test data files - Tests are failing because test data directory doesn't exist
  2. Type annotation errors - mypy type checking is failing on Python 3.14
Detailed Review

Critical Issues ❌

1. Missing Test Data Files

The tests expect data files at test-data/data/modules/fasterp/, but this directory doesn't exist in the PR. All fasterp tests are failing:

AssertionError: Test data directory not found: /home/runner/work/MultiQC/MultiQC/test-data/data/modules/fasterp
AssertionError: Test JSON file not found: /home/runner/work/MultiQC/MultiQC/test-data/data/modules/fasterp/sample1.json

Action Required: Add test data files to test-data/data/modules/fasterp/ directory. You can use actual fasterp JSON output files (with sensitive data removed if needed).

2. Type Annotation Errors (mypy)

The module has several missing type annotations that cause mypy failures on Python 3.14:

Lines 59-66 - Missing type annotations for class attributes:

# Current (fails mypy):
self.fasterp_data = dict()
self.fasterp_duplication_plotdata = dict()
# ... etc

# Should be:
self.fasterp_data: Dict[str, Dict[str, float]] = {}
self.fasterp_duplication_plotdata: Dict[str, Dict[int, float]] = {}
self.fasterp_insert_size_data: Dict[str, Dict[int, float]] = {}
self.fasterp_all_data: Dict[str, Any] = {}
self.fasterp_qual_plotdata: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_gc_content_data: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_n_content_data: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_overrepresented_sequences: Dict[str, Dict[str, Dict[str, int]]] = {}

Lines 530-532 in fasterp_overrepresented_sequences_plot():

# Current (fails mypy):
cnt_by_sample = defaultdict(Counter)
cnt_by_seq = Counter()
pct_by_seq = Counter()

# Should be:
cnt_by_sample: defaultdict[str, Counter[str]] = defaultdict(Counter)
cnt_by_seq: Counter[str] = Counter()
pct_by_seq: Counter[str] = Counter()

Line 400 - Type error in general_stats_addcols:
The headers dict contains lambda functions in modify fields, which mypy sees as object. Consider casting or restructuring to satisfy type checker.

Note: These same issues exist in the fastp module (multiqc/modules/fastp/fastp.py:70, 404, 534-536), so you're in good company! But since this is new code, it should meet current standards.


Code Quality ✅

Excellent Aspects:

  1. Follows Repository Conventions

    • ✅ Raises ModuleNoSamplesFound (not UserWarning)
    • ✅ Calls self.add_software_version()
    • ✅ Calls self.write_data_file() at the end
    • ✅ Entry point added to pyproject.toml
    • ✅ Docstring on module class (not separate file)
    • ✅ Module info starts with capital letter
    • ✅ Uses double quotes throughout
    • ✅ Modern Python 3 syntax with f-strings
  2. Well-Structured Code

    • Clean separation of parsing, processing, and visualization
    • Proper error handling with try/except blocks
    • Appropriate use of logging for debug messages
    • Good code reuse from fastp module
  3. Comprehensive Test Coverage

    • Tests module initialization and data parsing
    • Tests data structure validation
    • Tests compatibility with fastp module
    • Tests integration with module runner

Minor Issues & Suggestions 💡

1. Module Ordering (multiqc/modules/fasterp/fasterp.py:81-83)

The add_software_version() call is placed before sections are added, but write_data_file() is correctly at the end. According to CLAUDE.md, write_data_file should be called at the very end after all sections. Currently:

# Add software version
self.add_software_version(None)

# General Stats Table
self.fasterp_general_stats_table(self.fasterp_data)

# ... sections ...

# Save entire original parsed JSON
self.write_data_file(self.fasterp_all_data, "multiqc_fasterp")

This is actually correct! However, the fastp module has these in a different order (write_data_file before sections), which may be worth noting for consistency.

2. Version Detection (Line 393-396)

The code checks for both fastp_version and fasterp_version:

if "fastp_version" in parsed_json.get("summary", {}):
    self.add_software_version(parsed_json["summary"]["fastp_version"], s_name)
elif "fasterp_version" in parsed_json.get("summary", {}):
    self.add_software_version(parsed_json["summary"]["fasterp_version"], s_name)

Question: Does fasterp actually output a fasterp_version field, or does it use fastp_version for compatibility? If it always uses fastp_version, the second condition may be unnecessary.

3. Config Option Documentation (Line 23)

The docstring mentions config option:

fasterp:
  s_name_filenames: true

However, the code properly uses the modern use_filename_as_sample_name option and marks the old one as deprecated (line 187-190). The docstring should be updated to reflect the recommended approach:

# Recommended (modern approach):
use_filename_as_sample_name: ["fasterp"]

# Deprecated (still works but shows warning):
fasterp:
  s_name_filenames: true

4. Search Pattern Collision (multiqc/search_patterns.yaml:361-365)

The search pattern for fasterp is identical to fastp:

fasterp:
  fn: "*.json"
  contents: '"before_filtering": {'
  num_lines: 50

Potential Issue: Both modules will try to parse the same JSON files. This could lead to:

  • Duplicate sample names if both modules run on fastp output
  • Performance overhead from double-parsing

Suggestion: Can fasterp output be distinguished from fastp output? If fasterp includes a unique field (like "fasterp_version" or "command": "fasterp..."), you could make the search pattern more specific to avoid false matches.

5. Test File Organization

The test file tests/test_fasterp.py is well-structured with good coverage. Consider adding a test for:

  • Sample name parsing from command line (various formats with spaces, etc.)
  • Handling of missing optional fields (duplication, insert_size, etc.)
  • Version detection logic

Security Considerations ✅

No security issues identified:

  • ✅ Uses json.load() safely
  • ✅ Proper error handling for malformed JSON
  • ✅ No code execution or unsafe string operations
  • ✅ Appropriate use of regex for parsing

Performance Considerations ✅

  • ✅ Efficient JSON parsing
  • ✅ Proper use of defaultdict and Counter
  • ✅ No obvious performance bottlenecks
  • ✅ Data structures are appropriate for the task

Documentation 📝

Good:

  • ✅ Clear module docstring
  • ✅ Comprehensive PR description with test output
  • ✅ Descriptive function names and comments

Suggestions:

  • Add a note in the docstring about fasterp being compatible with fastp
  • Document any known differences between fasterp and fastp output formats

Action Items

Before this PR can be merged:

  1. [REQUIRED] Add test data files to test-data/data/modules/fasterp/
  2. [REQUIRED] Fix mypy type annotation errors (lines 59-66, 530-532, 400)
  3. [RECOMMENDED] Update docstring to show modern config option
  4. [RECOMMENDED] Consider making search pattern more specific to avoid fastp/fasterp collision
  5. [OPTIONAL] Clarify version field handling

Once these are addressed (especially items 1-2), this will be a solid contribution! The code is well-written and follows the project's conventions closely.

Great work overall! Looking forward to seeing this merged. 🚀

@ewels
Copy link
Member

ewels commented Dec 7, 2025

@drbh - the review picked up the identical file search to fastp. Could you elaborate on this a little? It's not great having two modules that parse the same files.

If fastp and fasterp have identical outputs then it's better to have a single module really (see also: Kraken and Bracken, for example).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants