Conversation
There was a problem hiding this comment.
Copilot reviewed 6 out of 24 changed files in this pull request and generated no comments.
Files not reviewed (18)
- DESCRIPTION: Language not supported
- NAMESPACE: Language not supported
- R/cbind.R: Language not supported
- R/complete.R: Language not supported
- R/edit.setup.R: Language not supported
- R/futuremice.R: Language not supported
- R/initialize.imp.R: Language not supported
- R/internal.R: Language not supported
- R/mice.R: Language not supported
- R/parlmice.R: Language not supported
- R/sampler.R: Language not supported
- R/sampler.univ.R: Language not supported
- man/futuremice.Rd: Language not supported
- man/mice.Rd: Language not supported
- man/parlmice.Rd: Language not supported
- man/record.event.Rd: Language not supported
- tests/testthat/test-as.mids.R: Language not supported
- tests/testthat/test-mice.impute.norm.R: Language not supported
Comments suppressed due to low confidence (2)
_pkgdown.yml:131
- Verify that the new 'record.event' reference corresponds to an existing documentation page to avoid broken links.
- record.event
_pkgdown.yml:184
- Ensure that the 'developer-notes-complete' page is properly linked and exists in the documentation to support developer guidance.
- developer-notes-complete
|
Well, thanks Copilot for your informative review. |
|
We also need support for custom imputation methods (cf #550). |
Merge branch 'master' into future.apply # Conflicts: # DESCRIPTION # NEWS.md # R/edit.setup.R # R/mice.R # R/sampler.R # _pkgdown.yml
|
See alexanderrobitzsch/miceadds#30 for an example with |
Development note
|
|
|
||
| # Set up parallel backend if requested | ||
| if (parallel) { | ||
| if (!requireNamespace("future.apply", quietly = TRUE)) { |
There was a problem hiding this comment.
You could consider making future.apply a hard dependency. Then, if parallel = FALSE, you just specify a sequential backend, and if parallel = TRUE. The advantage of this would be that the code is cleaner and easier to maintain (there is just one loop instead of the if-else later on). The disadvantage is that a future::apply() with sequential backend may use the rng state differently, and may thus not reproduce the original results.
There was a problem hiding this comment.
This is an interesting idea that simplifies the main loop in sampler() at the price of an additional dependency. I would favor it if the algorithm is exactly reproducible. It needs not be backward compatible (impossible to maintain over versions).
There was a problem hiding this comment.
I think it should be reproducible at the same machine, but perhaps not across platforms. We can test this, I suppose. I think, but am not entirely certain, that mice will then also be reproducible regardless of whether the replicator uses parallel = TRUE. I can dive into this.
R/sampler.R
Outdated
| } | ||
| log_i <- if (exists("loggedEvents", inherits = FALSE)) get("loggedEvents", inherits = FALSE) else NULL | ||
| list(imp = imp_i, mean = mean_i, var = var_i, log = log_i) | ||
| }, future.seed = TRUE) |
There was a problem hiding this comment.
Add the dotdotdot here when closing the future_lapply() call, then users can specify future.packages, future.globals and eventually all other functionality that is available. This allows users to specify their own imputation functions, for example.
There was a problem hiding this comment.
We can support a user-specified list future.globals as mice dots argument. It should be strainghtforward to adapt the code.
| log_i <- if (exists("loggedEvents", inherits = FALSE)) get("loggedEvents", inherits = FALSE) else NULL | ||
| list(imp = imp_i, mean = mean_i, var = var_i, log = log_i) | ||
| }, | ||
| future.packages = future.packages, |
There was a problem hiding this comment.
Hi Stef, I think the current code does not allow users to have their self-written custom imputation functions as imputation function, unless they save these within a custom package. Perhaps you can allow users to specify a list with objects (functions/variables) for future.globals, and append these two functions to it (if non-null). Alternatively, if these run_imputation_cycle and update_chain_stats functions are properly added to the mice package, calling these through globals is not necessary, as the futures should have access to the entire mice package.
There was a problem hiding this comment.
Would that require current run_imputation_cycle() and update_chain_stats() to be exported functions?
There was a problem hiding this comment.
I don't think so. That would imply that many unexported functions that are called under the hood should be exported explicitly. It's more likely that all functions in a package are available in downstream futures.
There was a problem hiding this comment.
If this mostly concerns that functions specified in the $meth slot may not be available for the future workers, would it just make sense to scan $meth, see if the functions only exist in .GlobalEnv and throw a warning to the user?
|
To install the experimental version use |
This PR adds native support for parellel imputation to
mice().The
mice()function now supports parallel execution of imputations via the newparallel = TRUEargument. When enabled, instead of sequentially calculatingmimputations at a given iteration, themchains are distributed across available CPU cores using thefutureandfuture.applyframeworks.Parallel imputation may significantly reduce runtime, especially for large datasets and many imputations (
m), but does not pay-off for small datasets or few imputations.Parallel execution is implemented only in the
mice()function, and does not affect themice.impute.*()functions.To activate parallel execution:
parallel = FALSEfor backward compatibility.n.corespecifies the number of CPU cores to use. Ifn.coreis not specified (default) the actual number of cores used is calculated as minimum(number of available cores - 1, number of imputations).printFlag = TRUEprints iteration and imputation number only in sequential mode; parallel mode reports timing per iteration.mice()will automatically select a parallel backend (default ismultisession). To override, users may manually callplan(...)before runningmice().futureandfuture.applypackages must be installed to run parallel imputation. If not installed,mice()will throw an error and suggest installing the packages.parlmice()andfuturemice()are still functional, but now throw a warning that they will be deprecated in the future. Users are encouraged to use the newparallelargument inmice()instead.Notes and questions:
parallelandn.coreargument formice(). To keep the mice API clean, I suggest setting alternative future plans outsidemice(). Does that sound like a good idea?seedsare tricky for parallel environments. It is not yet clear whether we can borrow the standardseedargument frommice()to get reproducible solutions. Do we need to add a few tests for this?mice()?parallelmode,isTRUE(printFlag)generates the time per iteration. Is that appropriate, or better report something else?