Show HN: Graph Processing with Python and GraphBLAS (github.com/michelp)
122 points by michelpp on July 17, 2019 | 19 comments


Hi Michel - You've been busy with this and the Postgres lib. Good work. The Gremlin User Group has been working on a new universal graph DSL and compiler that can compile down to any of the graph backends. GraphBLAS has been part of the discussion since the start. An early draft of mm-ADT was just released last week...

mm-ADT: A Multi-Model Abstract Data Type http://rredux.com/mm-adt/


Thanks espeed! I'll give this a read.


This looks really interesting, potentially makes GraphBLAS much more accessible for exploratory work. A few questions for the author:

Does this work in blocking or non-blocking mode? Naively I imagine there might be more opportunity for the GraphBLAS implementation to optimize execution in non-blocking mode.

Is there a way to efficiently store and load matrices to and from files? Ideally in a such a way that the data is just mmap'ed or copied directly into memory on load?

Does this only work with SuiteSparse or could it potentially work with a GPU implementation like https://github.com/gunrock/graphblast too?


> Does this work in blocking or non-blocking mode? Naively I imagine there might be more opportunity for the GraphBLAS implementation to optimize execution in non-blocking mode.

Non-blocking mode.

> Is there a way to efficiently store and load matrices to and from files? Ideally in a such a way that the data is just mmap'ed or copied directly into memory on load?

The Matrix.from_mm and Matrix.to_mm methods will read and write files in the standard Matrix Market format. They are wrappers around the underlying LAGraph functions.
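For reference, Matrix Market coordinate files are plain text. This is a toy sketch of the format with a hand-rolled parser, purely to illustrate what from_mm/to_mm read and write (the real methods go through LAGraph, not code like this):

```python
# Minimal sketch of the Matrix Market coordinate format: a header line,
# optional '%' comment lines, a "rows cols nnz" line, then one
# "row col value" triple per stored entry (1-based indices).

MM_TEXT = """%%MatrixMarket matrix coordinate real general
% 3x3 sparse adjacency matrix with 3 edges
3 3 3
1 2 1.0
2 3 2.0
3 1 0.5
"""

def parse_mm(text):
    """Parse a coordinate-format Matrix Market string into
    ((nrows, ncols), {(row, col): value}) with 0-based indices."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("%")]
    nrows, ncols, nnz = (int(x) for x in lines[0].split())
    entries = {}
    for ln in lines[1:1 + nnz]:
        i, j, v = ln.split()
        entries[(int(i) - 1, int(j) - 1)] = float(v)  # 1-based -> 0-based
    return (nrows, ncols), entries

shape, entries = parse_mm(MM_TEXT)
```

Because the format only stores the nonzero triples, file size scales with edge count rather than the full n-by-n matrix, which is what makes it practical for large sparse graphs.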

> Does this only work with SuiteSparse or could it potentially work with a GPU implementation like https://github.com/gunrock/graphblast too?

It uses a lot of the GxB "extensions" that are, to my knowledge, only present in SuiteSparse. Investigating other implementations has been on my radar but not a high priority; certainly very open to PRs that help with interchangeability!


The GraphBLAS GPU code is in the works [1]. For storage, see the RedisGraph 1.0 implementation released last year [2], and 'michelp has a Python Postgres implementation in development [3].

[1] Sparse versus dense in GraphBLAS: sometimes dense is better http://aldenmath.com/sparse-verses-dense-in-graphblas-someti...

[2] RedisGraph: A Graph Database Module for Redis http://graphblas.org/?title=Graph_BLAS_Forum#Graph_analysis_...

[3] Graph Processing with Postgres and GraphBLAS https://news.ycombinator.com/item?id=19379800


For the bioinformatics datasets I work with, it is not cost-efficient to load everything into a database. Some of these datasets (e.g. GWAS, Genome-Wide Association Studies, which are essentially sparse matrices) might be interesting to explore with graph queries. I guess my ideal would be a GraphBLAS equivalent of Spark SQL queries working across files in cloud storage / NAS.


There are several teams working on distributed GraphBLAS (the GraphBLAS/D4M model was designed to run on supercomputers [0]). Kepner's team at MIT is one [1].

NB: D4M was the original name before it was changed to GraphBLAS and became a standard.

[0] GraphBLAS: Building Blocks For High Performance Graph Analytics https://crd.lbl.gov/news-and-publications/news/2017/graphbla...

[1] A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M Databases https://arxiv.org/abs/1902.00846

http://www.mit.edu/~kepner/


This is very cool. I wonder if the GraphBLAS and Dask teams should collaborate.

Dask has a production-grade distributed computing system (that is cloud-compatible with Kubernetes, YARN, EMR, Dataproc, etc.).


RAPIDS has picked up Dask for the multi-GPU aspects of cuDF (think Spark/pandas on GPUs), and since cuGraph (https://github.com/rapidsai/cugraph) is currently single-GPU for going fast on ~billion-row datasets, I'm guessing dask+cugraph will be happening for the next 100-1000X, if not already.

Graph partitioning is a weird world, so will be interesting to see!


Interesting - I'd be curious to see a comparison between GraphBLAS (which I had not heard of until just now) and, for example, graph-tool's (https://graph-tool.skewed.de/) underlying algorithms (the Boost Graph Library).


It seems that they serve very different purposes. GraphBLAS is mostly focused on processing real-valued functions defined on the vertices and edges of large-scale sparse graphs, and is optimized for this use case (think graphs with billions of vertices). The Boost Graph Library is not tailored to numeric functions and will probably be less efficient for this use case.


Thanks! That's a helpful description.


There are a ton of previous threads from the last year with detailed descriptions and discussions...

Search GraphBLAS https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&pr...


GraphBLAS is also the engine behind RedisGraph. Sparse adjacency matrices are interesting from a graph database perspective in particular because they are typically faster and more compact than the mainstream index-free adjacency systems (e.g. Neo4j).
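To make the adjacency-matrix idea concrete: one step of a graph traversal is just a matrix-vector product over the Boolean semiring (OR as addition, AND as multiplication). A toy dense pure-Python sketch, not how RedisGraph or SuiteSparse actually store things:

```python
# One BFS step as a Boolean matrix-vector product: a vertex j is
# reachable in one hop if ANY frontier vertex i has an edge i -> j.
# Toy dense sketch; real engines keep A sparse.

A = [  # adjacency matrix: A[i][j] = True means edge i -> j
    [False, True,  False],
    [False, False, True],
    [True,  False, False],
]

def bfs_step(A, frontier):
    """Vertices reachable in exactly one hop from `frontier`."""
    n = len(A)
    return [any(frontier[i] and A[i][j] for i in range(n))
            for j in range(n)]

frontier = [True, False, False]   # start at vertex 0
frontier = bfs_step(A, frontier)  # one hop: the frontier moves to vertex 1
```

With a sparse matrix, each step only touches stored edges, which is why this formulation can beat pointer-chasing over an index-free adjacency structure.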


Very interesting, didn't know about GraphBLAS!

Does anybody here know the advantages relative to scipy.sparse? Does scipy.sparse use GraphBLAS internally?


pygraphblas author here. scipy.sparse and GraphBLAS are superficially similar in that both provide a sparse matrix type, but scipy.sparse is oriented more toward mathematical matrix computation and provides a wonderful, huge bag of tools for, say, solving linear systems and computing LU decompositions; see scipy.sparse.linalg.

GraphBLAS provides a different bag of tools oriented toward solving graph problems, primarily around semirings and custom graph element operations. I don't think scipy.sparse, for example, provides a means for plugging in different semiring operations for computing shortest paths or max flow. On the other hand, GraphBLAS does not provide decomposition or solving algorithms like scipy.sparse does.
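The semiring point can be made concrete: swapping ordinary (+, ×) for (min, +) turns repeated matrix "multiplication" into all-pairs shortest paths. A small pure-Python sketch of the idea, not the pygraphblas API:

```python
import math

# All-pairs shortest paths via the (min, +) "tropical" semiring:
# matrix "multiplication" uses min where ordinary addition would be,
# and + where ordinary multiplication would be.
INF = math.inf

def min_plus_multiply(A, B):
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# Edge weights; INF = no edge, 0 on the diagonal.
D = [
    [0,   1,   INF],
    [INF, 0,   1],
    [4,   INF, 0],
]

# Repeatedly squaring D converges to all-pairs shortest path costs
# (ceil(log2(n)) squarings suffice for an n-vertex graph).
paths = D
for _ in range(2):
    paths = min_plus_multiply(paths, paths)
```

The appeal of the GraphBLAS formulation is that the same multiply kernel serves BFS, shortest paths, and many other algorithms just by swapping the semiring.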

It is my goal to make scipy.sparse matrices, dense numpy matrices, and networkX graphs usable interchangeably with pygraphblas in the most reasonably efficient way. That way we get all the tools in both bags. Would love to chat with more scipy developers to see where there can be unification.

UPDATE: it looks like there is some support for graph algorithms in scipy.sparse.csgraph; it would be interesting to see how these compare to GraphBLAS-based approaches!
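For comparison, the csgraph module mentioned above does ship classic graph algorithms over scipy sparse matrices, just with fixed numeric semantics rather than pluggable semirings. A quick sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# A small weighted directed graph as a sparse adjacency matrix;
# zero entries are treated as "no edge".
graph = csr_matrix(np.array([
    [0, 1, 4],
    [0, 0, 1],
    [0, 0, 0],
]))

# Dijkstra from every source: dist[i, j] = cost of the cheapest i -> j path.
dist = shortest_path(graph, method="D", directed=True)
```

Here the two-hop path 0 -> 1 -> 2 (cost 2) beats the direct 0 -> 2 edge (cost 4), and unreachable pairs come back as infinity.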


Tim Davis also wrote the underlying code for much of scipy.sparse (his code is underneath almost all sparse matrix libs [1]). See...

[1] Tim Davis Research http://faculty.cse.tamu.edu/davis/research.html


Oh, what a man! He's also the author of the legendary CHOLMOD solver, probably one of the most used pieces of numerical software ever, just after the FFT.


very cool



