Conversation
|
Thanks for your PR @DmitriyValetov I have made a few minor comments and changes for your review. The notebook is welcome, however I would like to place the notebook in a separate I would also like to see:
I can help with number 2 if you'd like. |
|
I will give it a try. So we need two tests: one for the correlating step and one for a correlated problem with analytical results? |
Yes, we can also re-use the example given in your notebook as a high-level check to ensure expected covariance and mean values are being generated (or at least approximately so): https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_almost_equal.html You can add the tests to this file I think. |
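A minimal sketch of what such a high-level check could look like. The test name, the target mean and covariance, and the use of NumPy's `multivariate_normal` as a stand-in for the sampler under test are all assumptions for illustration, not code from this PR:

```python
import numpy as np
from numpy.testing import assert_almost_equal


def test_correlated_sample_moments():
    # Hypothetical target moments; a real test would reuse the notebook's values.
    mean = np.array([0.0, 1.0, 2.0])
    cov = np.array([
        [1.0, 0.5, 0.2],
        [0.5, 2.0, 0.3],
        [0.2, 0.3, 1.5],
    ])

    # Stand-in for the correlated sampler under test (assumption).
    rng = np.random.default_rng(42)
    sample = rng.multivariate_normal(mean, cov, size=50_000)

    # decimal=1 only checks approximate agreement, which is all a
    # finite sample can promise.
    assert_almost_equal(sample.mean(axis=0), mean, decimal=1)
    assert_almost_equal(np.cov(sample, rowvar=False), cov, decimal=1)


test_correlated_sample_moments()
```

The loose tolerance is deliberate: the sample moments only converge to the target values as the sample size grows.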
|
Added a test for adding covariance to the Saltelli sample. I have also come across several papers like this one: https://www.sciencedirect.com/science/article/abs/pii/S1364815215300153 I found an implementation of the independent Sobol version here: https://gitlab.com/CEMRACS17/shapley-effects - this library has errors (and has been dead for several years), but the independent Sobol part is OK. I have used this implementation, with small changes, in a library at my job. |
|
What about adding the Sobol independent and full indices (they are specialized for problems with correlated inputs), plus the Shapley effects code from that library, shapley-effects (adapted, of course)? The library is dead, but it has relatively recent issues that nobody is answering, so there is a need for it. I also need it sometimes. I can try to convert it to the sample-analyze format and add it to SALib. |
|
Yes, sounds good. We just need to be mindful of any licensing issues but you seem to have considered this already. Thanks for contributing! |
|
Just adding a reference to a related issue (#193) |
|
Shapley & Sobol methods added with tests and examples. |
|
Well, Shapley is a little unstable. |
|
Hi @DmitriyValetov. Thank you for implementing the covarianced sampling methods in SALib, it's a really useful addition to the library! I have been using the Shapley method for a model with correlated inputs. One of my input variables is categorical, and I was wondering if you are aware of any issues with using categorical input variables to calculate Shapley effects? I couldn't find anything in the literature. I've included the categorical distribution by adding an OpenTURNS user-defined distribution to distrs.py, with the weights set to the probability of each category (https://openturns.github.io/openturns/latest/user_manual/_generated/openturns.UserDefined.html), and this seems to give sensible answers. Do you know if there's a better way of doing this, or does this seem like a sensible approach? Thank you! |
|
Hi @jameswoodcock. I have never analyzed data with categorical features using SALib or similar methods. But there is another way, which also uses the Shapley method, though with a different approach. Have you heard of the Shap package (https://github.com/slundberg/shap)? It is usually used to interpret boosting methods. So you can analyze the data as is this way:
My friend from bioinformatics usually goes this way. Also, if this model-based approach is convenient for you, give https://github.com/oegedijk/explainerdashboard a try. |
|
@DmitriyValetov thank you for your reply. I hadn't heard about the Shap package before, it looks really useful. I'll give it a try! |
|
Hey all, is there a reason why this hasn't been merged into A paper was recently published that builds off of the aforementioned https://www.sciencedirect.com/science/article/abs/pii/S1364815215300153 |
|
Hi @mschrader15 The current implementation has dependencies on a few big external packages. We're currently looking into how best to reduce/remove these dependencies so that SALib continues to be relatively self-contained. I'm not able to do this very quickly as I, and other maintainers here, don't have much time available currently. But if you're willing to look under the hood we'd welcome any contribution. |
|
Totally understand. I really just wanted to make sure that it wasn't due to implementation errors. I forked and merged on my own branch for a time critical use case (no offense meant btw, @DmitriyValetov, this was a big effort and really helpful to my research!). I've contacted the authors of https://www.sciencedirect.com/science/article/pii/S0307904X21002122 to see if they would be interested in sharing code / supporting integration into SALib. If so, I'll organize a PR (eventually) |
…to covarianced_sampling
|
Regarding the dependency on OpenTURNS: I had a look, and it seems to me that the main usage is copulas to sample distributions. Copulas are also present in statsmodels (they will not get into SciPy, since that was rejected, which motivated the work in statsmodels), but it would add an extra dependency, as SALib does not have it either. I am not sure it would really be in scope for the library to pull copulas in here. Otherwise we would need another way to sample multivariate distributions. I am checking, but what we added to SciPy to sample arbitrary distributions is just 1D. I would maybe suggest doing this in 2 steps. First, add Shapley by itself, as it does not seem to really require copulas (the normal copula is just the classical multivariate normal distribution). Then think about what to do with the correlated version of Sobol'. (A first step could be to only support multivariate normal distributions.) |
|
Regarding copulas, @tupui, do you think the Copulas package would be a sufficient alternative? It seems like it would be a more lightweight dependency than statsmodels. Otherwise, I agree that a first step would be to support multivariate normal distributions. |
It would be enough in terms of features sure, but it's still an additional dependency. In practice, I am wondering if we could not just go with 1 or 2 Copulas and if that's the case, we might as well just add these ourselves (I can). Another possibility is to have clear instructions on how to use external libraries to generate a correlated sample. It could be argued that sampling is not really the responsibility of SALib. |
|
FYI, the Copulas package is no longer an option, as they changed their license to an incompatible one. |
|
Apologies @DmitriyValetov , the contribution here is really valuable, I got waylaid with finishing my PhD. If you don't mind I will try splitting off your contribution into smaller Pull Requests starting with the Shapley method. |
|
Hi everyone, has the covarianced sampling method been merged into main now? I did not find the method in the docs. |
|
@kka1996 this merge request is still open, so no, the method is not yet available. |
Good day!
I use SALib daily and sometimes need to generate correlated samples.
I know that there are methods aimed at problems with correlated
parameters, but to make coding comparison calculations simpler
I propose these changes:
Sobol sequence sampling, correlated by multiplying with the Cholesky decomposition of the covariance matrix.
A notebook example demonstrating this functionality, added to the examples directory.
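
For readers unfamiliar with the Cholesky approach, here is a minimal sketch of the idea. It is not the code from this PR: the function name `correlate_samples` is hypothetical, and pseudo-random standard-normal draws stand in for a transformed Sobol/Saltelli sample to keep the example dependent on NumPy only.

```python
import numpy as np


def correlate_samples(z, mean, cov):
    """Map independent, unit-variance columns of z to samples with the
    requested mean vector and covariance matrix (hypothetical helper)."""
    # cov = L @ L.T with L lower-triangular; cov must be positive definite.
    L = np.linalg.cholesky(cov)
    # Each row z_i becomes mean + L @ z_i, which has covariance L @ L.T = cov.
    return mean + z @ L.T


rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])

# Assumption: independent standard-normal inputs replace the Sobol-based sample.
z = rng.standard_normal((100_000, 2))
x = correlate_samples(z, mean, cov)

print(np.cov(x, rowvar=False))  # close to cov, up to sampling noise
```

The transform relies on the inputs being uncorrelated with unit variance. A uniform low-discrepancy sample would first need to be mapped to such a distribution (for example through an inverse normal CDF) before the multiplication is applied.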