Skip to content

Checking algorithm parameters for validity #182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

bcajes
Copy link

@bcajes bcajes commented Apr 10, 2017

Adding some checks for parameters of various algorithms within GraphFrames #76 . TODO: check to see if SVD++ parameter requirements can be made more specific.

@codecov-io
Copy link

codecov-io commented Apr 10, 2017

Codecov Report

Merging #182 into master will increase coverage by 0.66%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
+ Coverage   88.14%   88.81%   +0.66%     
==========================================
  Files          17       17              
  Lines         717      733      +16     
  Branches       56       56              
==========================================
+ Hits          632      651      +19     
+ Misses         85       82       -3
Impacted Files Coverage Δ
src/main/scala/org/graphframes/lib/PageRank.scala 96% <100%> (+0.76%) ⬆️
...main/scala/org/graphframes/lib/ShortestPaths.scala 95.83% <100%> (+0.18%) ⬆️
.../graphframes/lib/StronglyConnectedComponents.scala 100% <100%> (ø) ⬆️
...n/scala/org/graphframes/lib/LabelPropagation.scala 100% <100%> (ø) ⬆️
...c/main/scala/org/graphframes/lib/SVDPlusPlus.scala 87.5% <100%> (+7.5%) ⬆️
...in/spark-1.x/org/apache/spark/sql/SQLHelpers.scala 0% <0%> (-100%) ⬇️
...cala/org/graphframes/lib/ConnectedComponents.scala 95.69% <0%> (+1.07%) ⬆️
...in/spark-2.x/org/apache/spark/sql/SQLHelpers.scala 100% <0%> (+100%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02cda44...2f446c8. Read the comment docs.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are great to have and think perhaps as a follow up we should:

  • add documentation of the acceptable values, so they can be more discoverable
  • add default values and documentation of those

what do you think?

@@ -45,7 +45,7 @@ class ShortestPaths private[graphframes] (private val graph: GraphFrame) extends
* The list of landmark vertex ids. Shortest paths will be computed to each landmark.
*/
def landmarks(value: Seq[Any]): this.type = {
// TODO(tjh) do some initial checks here, without running queries.
require(value.nonEmpty, "expecting at least one landmark")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also check for the type within the Seq?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracing through this implementation, and correct me if I'm wrong, it looks like graphframe vertex IDs can be of any type allowable within a spark Dataframe. I can add a pattern match check for all of those types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right - so perhaps not a good idea to duplicate the check here, if Spark DataFrame is already checking it?

resetProb = Some(value)
this
}

def tol(value: Double): this.type = {
require(value >= 0, s"Tolerance must be no less than 0, but got ${value}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave this as tol for consistency with other checks?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

@@ -80,16 +80,20 @@ class PageRank private[graphframes] (

/** Reset probability "alpha" */
def resetProbability(value: Double): this.type = {
require(value >= 0 && value <= 1, s"Random reset probability must belong" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave this as resetProbability?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

@bcajes
Copy link
Author

bcajes commented Apr 20, 2017

@felixcheung Agreed some documentation and default values would be helpful. I can add most of them. Though, I'm not familiar enough with the gamma parameters of SVDPlusPlus. So, if you or someone else can shed light or point me towards existing documentation for these parameters, that would be much appreciated. Thanks!

@felixcheung
Copy link
Member

unfortunately there isn't much information on SVDPlusPlus - let's skip that one for now

@SemyonSinchenko
Copy link
Collaborator

@rjurney @SauronShepherd I like this PR. It may be slightly incomplete, but it is still a good starting point. What do you think about it? If you are OK with it, I can try to resolve conflicts and we can merge it.

@SauronShepherd
Copy link
Contributor

It's not a bad idea, but maybe we should stop for a moment and rethink all these config-related issues/new proposals.

@rjurney
Copy link
Collaborator

rjurney commented Apr 3, 2025

If it helps and makes sense, I like the PR. If we want a more complete approach, we can do that too but let’s not hold up contributions that aren’t sweeping as it precludes most people from contributing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants