`docs/src/04-user-guide/02-basic-operations.md`: 34 changes (32 additions, 2 deletions)
@@ -1,8 +1,10 @@
# Basic Graph Operations

## Basics

GraphFrames provides several simple graph queries, such as node degree. Because a GraphFrame represents a graph as a pair of vertex and edge DataFrames, you can also run arbitrary DataFrame queries directly on those DataFrames, which are exposed as the `vertices` and `edges` fields of the GraphFrame.
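As a rough mental model of the degree queries, here is a plain-Python sketch (no Spark required) that counts in-, out-, and total degree from a hypothetical `(src, dst)` edge list. In GraphFrames itself, the corresponding results are the `inDegrees`, `outDegrees`, and `degrees` DataFrames, each with an `id` column plus an `inDegree`, `outDegree`, or `degree` column; the `degree_tables` helper below is illustrative only and not part of the library.

```python
from collections import Counter

# Hypothetical edge list standing in for the `src`/`dst` columns of `g.edges`.
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("a", "c")]

def degree_tables(edges):
    """Count in-, out-, and total degree per vertex ID from (src, dst) pairs.

    Like a group-by over the edge DataFrame, only vertices that appear in
    at least one edge get an entry.
    """
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    degree = out_degree + in_degree  # each edge counts once at each endpoint
    return dict(in_degree), dict(out_degree), dict(degree)

in_deg, out_deg, deg = degree_tables(edges)
print(in_deg)   # → {'b': 2, 'c': 2}
print(out_deg)  # → {'a': 2, 'b': 1, 'c': 1}
print(deg)      # → {'a': 2, 'b': 3, 'c': 3}
```

The total degree always sums to twice the number of edges, which is a quick sanity check on any degree computation.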

### Python API

```python
from graphframes.examples import Graphs
@@ -52,7 +54,7 @@ g.vertices.groupBy().min("age").show()
numFollows = g.edges.filter("relationship = 'follow'").count()
```

### Scala API

```scala
import org.graphframes.{examples,GraphFrame}
@@ -102,3 +104,31 @@ g.vertices.groupBy().min("age").show()
// This queries the edge DataFrame.
val numFollows = g.edges.filter("relationship = 'follow'").count()
```

## Filtering edges or vertices

GraphFrames provides an API for filtering edges and vertices based on their attributes.

**NOTE:** *This API is intended for simple filtering. For more complex use cases, use the [`PropertyGraphFrame` model](/04-user-guide/11-property-graphs.md) instead: it manages the logical schema of the whole graph and provides a more powerful API for selecting any subgraph based on the required properties and filters.*

### Python API

```python
from pyspark.sql import functions as F
from graphframes.examples import Graphs

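g = Graphs(spark).friends()  # Get the example graph

# Each call returns a new GraphFrame; `g` itself is left unchanged.
alice_graph = g.filterVertices(F.col("name") == F.lit("Alice"))
follow_graph = g.filterEdges(F.col("relationship") == F.lit("follow"))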
```

### Scala API

```scala
import org.apache.spark.sql.functions._
import org.graphframes.{examples,GraphFrame}

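val g: GraphFrame = examples.Graphs.friends // Get the example graph

// Each call returns a new GraphFrame; `g` itself is left unchanged.
val aliceGraph: GraphFrame = g.filterVertices(col("name") === lit("Alice"))
val followGraph: GraphFrame = g.filterEdges(col("relationship") === lit("follow"))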
```
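The two methods differ in what happens to the other half of the graph: per the GraphFrames API documentation, `filterVertices` also removes edges whose source or destination vertex was dropped, while `filterEdges` keeps every vertex (so it can leave isolated vertices behind). The sketch below models those semantics with plain Python lists on hypothetical toy data; `filter_vertices` and `filter_edges` are illustrative helpers, not the GraphFrames API.

```python
# Plain-Python model of the filtering semantics (not the GraphFrames API).
# Hypothetical toy data: vertices have an "id", edges have "src" and "dst".
vertices = [
    {"id": "a", "name": "Alice"},
    {"id": "b", "name": "Bob"},
    {"id": "c", "name": "Charlie"},
]
edges = [
    {"src": "a", "dst": "b", "relationship": "friend"},
    {"src": "b", "dst": "c", "relationship": "follow"},
    {"src": "c", "dst": "b", "relationship": "follow"},
]

def filter_vertices(vertices, edges, predicate):
    """Keep matching vertices and drop every edge touching a removed vertex."""
    kept = [v for v in vertices if predicate(v)]
    kept_ids = {v["id"] for v in kept}
    kept_edges = [e for e in edges if e["src"] in kept_ids and e["dst"] in kept_ids]
    return kept, kept_edges

def filter_edges(vertices, edges, predicate):
    """Keep matching edges; the vertex set is returned unchanged."""
    return list(vertices), [e for e in edges if predicate(e)]

# Removing Alice also removes her outgoing a -> b edge.
v1, e1 = filter_vertices(vertices, edges, lambda v: v["name"] != "Alice")
# Keeping only "follow" edges leaves vertex "a" in place with no edges.
v2, e2 = filter_edges(vertices, edges, lambda e: e["relationship"] == "follow")
```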
`docs/src/04-user-guide/14-special-columns.md`: 28 changes (28 additions, 0 deletions)
@@ -0,0 +1,28 @@
# Reserved Columns

GraphFrames uses the following reserved column names internally:

- `id` for vertex IDs
- `src` for edge source IDs
- `dst` for edge destination IDs
- `attr` for vertex attributes during the GraphX conversion
- `new_id` for indexed Long IDs for vertices
- `new_src` for indexed Long IDs for edge sources
- `new_dst` for indexed Long IDs for edge destinations
- `graphx_attr` for vertex attributes during the GraphX conversion
- `weight` for edge weights
- `MSG` for messages in `AggregateMessages`
- `_pregel_msg` for Pregel messages
- `_pregel_is_active` for the Pregel vertex active status
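Because these names have fixed meanings, it can help to check input schemas before building a GraphFrame. The sketch below is a hypothetical helper (`check_graph_columns` is not part of GraphFrames) that works on plain lists of column names, such as a PySpark DataFrame's `columns` attribute: it verifies the required `id`, `src`, and `dst` columns and flags any of the internal names listed above. As an assumption, `weight` is left out of the flagged set, since edge weights may be a column you supply on purpose.

```python
REQUIRED_VERTEX = {"id"}
REQUIRED_EDGE = {"src", "dst"}
# Internal names from the list above that user data should not reuse.
# Assumption: `weight` is omitted because it may be user-supplied.
INTERNAL = {
    "attr", "new_id", "new_src", "new_dst", "graphx_attr",
    "MSG", "_pregel_msg", "_pregel_is_active",
}

def check_graph_columns(vertex_cols, edge_cols):
    """Return human-readable problems with the given column lists (empty if none)."""
    problems = []
    for name in sorted(REQUIRED_VERTEX - set(vertex_cols)):
        problems.append(f"vertices: missing required column {name!r}")
    for name in sorted(REQUIRED_EDGE - set(edge_cols)):
        problems.append(f"edges: missing required column {name!r}")
    for label, cols in (("vertices", vertex_cols), ("edges", edge_cols)):
        for name in sorted(INTERNAL & set(cols)):
            problems.append(f"{label}: column {name!r} is reserved for internal use")
    return problems

print(check_graph_columns(["id", "name"], ["src", "dst", "relationship"]))  # → []
```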

## Algorithm-Specific Columns

- `component` for the result of connected components
- `label` for the result of label propagation
- `distances` for the result of shortest paths
- `pagerank` for the result of PageRank
- `count` for the result of triangle counting
- `column1` to `column4` for SVD++ reserved columns
- `outDegree` for the result of out-degree
- `inDegree` for the result of in-degree
- `degree` for the result of degree
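Algorithm results generally keep your existing vertex attributes alongside these output columns, so an input attribute with the same name (for example, a `label` attribute before running label propagation) can be ambiguous. One defensive option, sketched here in plain Python, is to plan renames for clashing columns before building the GraphFrame. The `OUTPUT_COLUMNS` keys and the `rename_plan` helper are hypothetical labels for illustration, not GraphFrames method names.

```python
# Output columns per algorithm, taken from the list above.
# The dictionary keys are hypothetical labels, not GraphFrames method names.
OUTPUT_COLUMNS = {
    "connected_components": ["component"],
    "label_propagation": ["label"],
    "shortest_paths": ["distances"],
    "pagerank": ["pagerank"],
    "triangle_count": ["count"],
    "svd_plus_plus": ["column1", "column2", "column3", "column4"],
    "out_degree": ["outDegree"],
    "in_degree": ["inDegree"],
    "degree": ["degree"],
}

def rename_plan(vertex_cols, algorithm, suffix="_orig"):
    """Map each vertex column that clashes with the algorithm's output to a new name.

    The suffix is appended repeatedly until the new name is unused.
    """
    outputs = OUTPUT_COLUMNS[algorithm]
    taken = set(vertex_cols) | set(outputs)
    plan = {}
    for col in vertex_cols:
        if col in outputs:
            new_name = col + suffix
            while new_name in taken:
                new_name += suffix
            taken.add(new_name)
            plan[col] = new_name
    return plan

print(rename_plan(["id", "component"], "connected_components"))
# → {'component': 'component_orig'}
```

In PySpark, such a plan could be applied with `DataFrame.withColumnRenamed(old, new)` for each entry before constructing the GraphFrame.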