Skip to content

DHB Integration#1246

Open
HazelCC wants to merge 105 commits intomasterfrom
dhb_integration
Open

DHB Integration#1246
HazelCC wants to merge 105 commits intomasterfrom
dhb_integration

Conversation

@HazelCC
Copy link
Copy Markdown

@HazelCC HazelCC commented Aug 20, 2024

This PR integrate the data structure Dynamic Hashed Blocks (DHB) into Networkit. This improves NetworKit for dynamic graph operations.

TLTR:

For a detailed description of DHB, refer to the paper: A Fast Data Structure for Dynamic Graphs Based on Hash-Indexed Adjacency Blocks (2022)

Why?

Experiments have shown that DHB outperforms competing dynamic graph structures in edge insertions, updates, deletions, and traversal operations. This integration allows users to leverage DHB to gain performance for dynamic graph operations compared to the (vanilla) Graph class.

How?

This integration extends the functionalities of the default Graph class by supporting (almost) all methods implemented in Graph. When NetworKit is build using DHB, the necessary resources will be pulled in as a sub-module, and the associated code branches will be compiled. If the respective NetworKit modules are not compiled using the DHB integration, DHBGraph will fall back using the implementation of the default Graph class.

Measurement:

Experiments run on commodity hardware (MacOS, M1, 8 GB RAM) have been run to get some baseline measurements on the performance of default Graph class vs. the DHBGraph class. For edge insertions, sequential weight update, and sequential edge removal, DHBGraph outperforms Graph class in an unweighted graph setting.

image(2)
image(1)
image

TODOs:

  • Review the passing by value semantics for all lambdas passed to a method such as f(L handle). Instead, we might want to pass all lambdas by rvalue-reference as in f(L&& handle).
  • Review which type of omp parallel for scheduler (guided?) we use.
  • Review possibilities to use a concurrent/parallel version of functions that iterate over edges/vertices. For example, we might make removeNode to use the concurrent version of forNeighbors.
  • Write more tests for edgeIterator and nodeIterator. (@fabratu)
  • Write tests to check the consistency of unweighted undirected DHBGraph objects. An unweighted undirected graph should store one edge in both direction. (@HazelCC)
  • Improve documentation. (@fabratu)
  • Fix GCC build issues. @c-bebop
  • Build DHB on Clang and fix issues @c-bebop

@HazelCC HazelCC requested review from bernlu, c-bebop and fabratu August 20, 2024 14:48
@HazelCC HazelCC self-assigned this Aug 20, 2024
@HazelCC HazelCC marked this pull request as draft August 21, 2024 15:14
Copy link
Copy Markdown
Member

@fabratu fabratu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to keep the PR manageable, I suggest to fix all remaining issues concerning the integration in this PR and move optimization and additional thoughts to another PR.

Concerning building / testing: It seems that DHB is not correctly added as a submodule or there is some ill-defined logic in CMakeLists.txt. CI does fetch all submodules, however DHB is not found.

@c-bebop
Copy link
Copy Markdown

c-bebop commented Oct 23, 2024

@HazelCC Could you clarify and elaborate what exactly you mean with the point: Write tests to check the consistency of unweighted DHBGraph objects.?

@HazelCC
Copy link
Copy Markdown
Author

HazelCC commented Oct 29, 2024

@HazelCC Could you clarify and elaborate what exactly you mean with the point: Write tests to check the consistency of unweighted DHBGraph objects.?

An undirected graph object should store directed edges in both directions. Previously there were issues where logical bugs causing inconsistency for undirected graph, therefore it's good to have more tests addressing this.

@c-bebop
Copy link
Copy Markdown

c-bebop commented Oct 29, 2024

@HazelCC Could you clarify and elaborate what exactly you mean with the point: Write tests to check the consistency of unweighted DHBGraph objects.?

An undirected graph object should store directed edges in both directions. Previously there were issues where logical bugs causing inconsistency for undirected graph, therefore it's good to have more tests addressing this.

For now, we have referenced you as taking over for this task. Is that fine?

@HazelCC
Copy link
Copy Markdown
Author

HazelCC commented Oct 30, 2024

@HazelCC Could you clarify and elaborate what exactly you mean with the point: Write tests to check the consistency of unweighted DHBGraph objects.?

An undirected graph object should store directed edges in both directions. Previously there were issues where logical bugs causing inconsistency for undirected graph, therefore it's good to have more tests addressing this.

For now, we have referenced you as taking over for this task. Is that fine?

For sure. I'd happy to do that.

@c-bebop c-bebop self-assigned this Mar 26, 2025
c-bebop
c-bebop previously approved these changes Apr 29, 2025
Copy link
Copy Markdown

@c-bebop c-bebop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good, now just some small fixes.

Copy link
Copy Markdown

@bernlu bernlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably discuss some of these comments, some are quite detailed and maybe out of scope for this PR.

An additional general comment: I think it would be nice to add time complexity to the documentation for (almost) all functions.

Comment on lines +21 to +24
/**
* A weighted edge used for the graph constructor with
* initializer list syntax.
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what this comment is supposed to tell me. The 'initializer list' part is obvious from reading the code. A short explanation of this struct would be more useful, or maybe just no comment at all?

struct WeightedEdge : Edge {
edgeweight weight;

// Needed by cython
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this function should never be used in c++? Then the comment should clarify this and tell the user to never call this constructor.

Comment on lines +37 to +40
// forward declaration to randomization/CurveballImpl.hpp
namespace CurveballDetails {
class CurveballMaterialization;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This declaration seems to be unused.

Comment on lines +188 to +193
if (!directed) {
is_inserted_out = m_dhb_graph.insert(u, v, edge_data, do_update);
is_inserted_in = m_dhb_graph.insert(v, u, edge_data, do_update);
} else {
is_inserted = m_dhb_graph.insert(u, v, edge_data, do_update);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorder for readability

Suggested change
if (!directed) {
is_inserted_out = m_dhb_graph.insert(u, v, edge_data, do_update);
is_inserted_in = m_dhb_graph.insert(v, u, edge_data, do_update);
} else {
is_inserted = m_dhb_graph.insert(u, v, edge_data, do_update);
}
if (directed) {
is_inserted = m_dhb_graph.insert(u, v, edge_data, do_update);
} else {
is_inserted_out = m_dhb_graph.insert(u, v, edge_data, do_update);
is_inserted_in = m_dhb_graph.insert(v, u, edge_data, do_update);
}

edgesIndexed(edgesIndexed), // edges are not indexed by default
nodeAttributeMap(this), edgeAttributeMap(this) {

m_dhb_graph = dhb::Matrix<EdgeData>(n);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this into initializer list

Comment on lines +315 to +320
void DHBGraph::removeAllEdges() {
m_dhb_graph = dhb::Matrix<EdgeData>(m_dhb_graph.vertices_count());
m = 0;
storedNumberOfSelfLoops = 0;
omega = 0;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function should invalidate all edgeAttributes as well.

Comment on lines +1130 to +1131
* BEWARE: Running time is \Theta(1) for DHB implementation but \Theta(deg(u)) for fallback
* NetworKit graph DS.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this fallback nk graph is outdated information?

*
* @param u Node.
* @param i index; should be in [0, degreeOut(u))
* @return @a edge weight to the i-th (outgoing) neighbor of @a u, or @c +inf if no such
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function actually returns the default edge weight and not inf when the neighbor does not exist.

* @param u Node.
* @param i index; should be in [0, degreeOut(u))
* @return pair: i-th (outgoing) neighbor of @a u and the corresponding
* edge weight, or @c defaultEdgeWeight if unweighted.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add that none is returned when the neighbor does not exist.

*
* This function is not supported for DHB. Please use getIthNeighborWithWeight(node u, index i).
*/
std::pair<node, edgeweight> getIthNeighborWithWeight(Unsafe, node u, index i) const;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this function declared twice, but once we claim that it is not supported while the other one is implemented?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For @c-bebop : Look whether all Unsafe methods are marked as unsupported / unimplemented.

@bernlu bernlu requested a review from Copilot June 24, 2025 17:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR integrates the Dynamic Hashed Blocks (DHB) data structure into NetworKit to accelerate dynamic graph operations. Key changes include:

  • Adding a new DHBGraph implementation and test (DHBGraphGTest).
  • Updating source files such as EdgeIterators.cpp to add DHB-specific specializations.
  • Modifying multiple CMakeLists and CI configuration files to incorporate DHB as a submodule and update linking and build settings.

Reviewed Changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
networkit/cpp/graph/test/CMakeLists.txt Added DHBGraphGTest to the test suite.
networkit/cpp/graph/EdgeIterators.cpp Added DHBGraph-specific template specializations for edge iterators.
networkit/cpp/graph/CMakeLists.txt Included DHBGraph.cpp in source files for compilation.
include/networkit/graph/Graph.hpp Removed duplicate edge type definitions (moved to Edge.hpp).
include/networkit/graph/Edge.hpp Introduced new definitions for Edge and WeightedEdge types.
(Other files) Updated build scripts, submodules (.gitmodules) and CI configuration accordingly.
Comments suppressed due to low confidence (1)

networkit/cpp/graph/EdgeIterators.cpp:22

  • The DHBGraph specialization of validEdge omits the Unsafe parameter found in the Graph version's getIthNeighbor call; please confirm that this API difference is intentional and that it correctly reflects the DHBGraph interface design.
}

Comment on lines +283 to +287
target_link_libraries(networkit PRIVATE
dhb
OpenMP::OpenMP_CXX
PUBLIC
tlx)
Copy link

Copilot AI Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review the linking order and the use of PRIVATE vs. PUBLIC for the added 'dhb' dependency to ensure that transitive dependencies are handled correctly.

Suggested change
target_link_libraries(networkit PRIVATE
dhb
OpenMP::OpenMP_CXX
PUBLIC
tlx)
target_link_libraries(networkit PUBLIC
dhb
tlx
PRIVATE
OpenMP::OpenMP_CXX)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct @c-bebop

@c-bebop
Copy link
Copy Markdown

c-bebop commented Sep 2, 2025

There are still some review comments but.. are we merging this one?

@c-bebop
Copy link
Copy Markdown

c-bebop commented Dec 5, 2025

I see that there have been some review remarks by @bernlu . We are working on this feature for more than 1 year now. The merge should have happened this spring. I am sure I can address some of the comments. I would like us to focus on a merge ASAP as I need this feature for my work and I would be happy to have this merged into master! 🙏

@fabratu
Copy link
Copy Markdown
Member

fabratu commented Dec 10, 2025

True, not so easy this one - formost having a final decision. :) If it's possible for you, address the comments you see fit, and I will handle the rest.

* TODO:
* - [ ] Review the passing by value semantics for all lambdas passed
* to a method such as f(L handle). Instead, we might want to pass
* all lambdas by rvalue-reference as in f(L&& handle).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark the DHBGraph class as experimental. It suffices to mark it in the documentation for C++ by adding @note to users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants