We wanted to load a dense graph data set involving real-world data. We have been using the Freebase film data for our development and testing. We feel this data is highly interconnected and makes a good use case for storing in a graph database.

The first problem we faced was that Neo4j doesn't accept data in RDF format directly. The loader for Neo4j accepts data in CSV format, which is essentially what SQL tables have. In our 21 million dataset, we have 50 distinct types of entities and 132 types of relationships between these entities. If we were to try and convert it to CSV format, we would end up with 100s of CSV files: one file for each type of entity, and one file per relationship between two types of entities. While this is okay for relational data, it doesn't work for graph data sets, where each entity can be of multiple types and relationships between entities are fluid.

So, we looked into the next best option to load graph data into Neo4j. We wrote a small program, similar to the Dgraphloader, which reads N-Quads, batches them and tries to load them concurrently into Neo4j. This program used Bolt, a new protocol by Neo4j. It is the fastest way we could find to load RDF data into Neo4j. In the video below, you can see a comparison of loading 1.1 million N-Quads on Dgraph and Neo4j.

Note that we only used 20 concurrent connections and batched 200 N-Quads for each request, because Neo4j doesn't work well if we increase either the number of connections or N-Quads per request beyond this. In fact, that's a sure way to make Neo4j data corrupt and hang the system. For Dgraph, we typically send 1000 N-Quads per request and have 500 concurrent connections.

With the golden data set of 1.1 million N-Quads, Dgraph outperformed Neo4j 46.7k to 280 N-Quads per second. In fact, the Neo4j loader process never finished (we killed it after a considerable wait).
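
To make the batching scheme concrete, here is a minimal sketch of a loader of that shape: it reads N-Quads, groups them into fixed-size batches, and hands the batches to a bounded pool of workers. This is not the program used for the benchmark. The names `batches`, `load` and `send_batch` are hypothetical, and `send_batch` stands in for whatever call actually writes a batch to the database (for example over Bolt). It is assumed to return the number of N-Quads it wrote.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of up to `size` N-Quad lines, skipping blanks and comments."""
    stripped = (line.strip() for line in lines)
    quads = (line for line in stripped if line and not line.startswith("#"))
    while True:
        batch = list(islice(quads, size))
        if not batch:
            return
        yield batch


def load(
    lines: Iterable[str],
    send_batch: Callable[[List[str]], int],  # hypothetical sender, see lead-in
    batch_size: int = 200,   # N-Quads per request (the Neo4j setting in the text)
    connections: int = 20,   # concurrent workers (the Neo4j setting in the text)
) -> int:
    """Send every batch through at most `connections` workers.

    Returns the total number of N-Quads reported as written by `send_batch`.
    Note that `pool.map` queues all batches up front, so the whole input is
    read into memory before the last batch is sent.
    """
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return sum(pool.map(send_batch, batches(lines, batch_size)))
```

With a stand-in sender that only counts, `load(open("films.rdf"), send_batch=len)` would return the number of N-Quads in a hypothetical `films.rdf` file. Passing `batch_size=1000, connections=500` mirrors the settings quoted for Dgraph.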