Graphs are everywhere. They are used in social networks, the world wide web, biological networks, semantic web, product recommendation engines, mapping services, blockchains, and Bitcoin flow analyses. Furthermore, they’re used to define the flow of computation of software programs, to represent communication networks in distributed systems, and to represent data relationships in large organizations.
These are the most important graph applications:
This article dives into these 9 applications. At the end of the article, you will find awesome resources to download graph data sets. Plus, I will answer the question about the largest graphs in the world.
You can also watch the explainer video from Finxter founder Chris if you like.
How does that look like? A digitized social network describes each user as a graph vertex and each friendship relation as an edge between two users. The hundreds of billions of friendship relations in the Facebook social network together build a graph data structure of massive scale.
Each time you use Facebook Graph Search, Facebook runs an algorithm on this graph data structure.
Each time a client of Facebook wants to learn about who is an influencer in a certain field, they run a graph algorithm (simple or complex). Such a graph algorithm can determine your position in the network. It can use the graph structure to predict the flow of information (“how many people would buy the new MacBook, if you recommended it?”).
The first important application scenario for graphs is social network analysis. And this topic is so important that you would not even have to look further to find a solid motivation for graph data.
But, of course, we will do exactly this.
The web is a huge collection of documents pointing to each other via hyperlinks. In other words, the web is another massive graph data set.
You can model the web as a graph by treating each web page as a graph vertex and each hyperlink as a graph edge.
Many business models of large companies (such as Google) revolve around analyzing the massive web graph.
Think about it: what does the Google search really do? Given your information need, Google needs to present you with the most relevant piece of content from this huge web graph. If they do a good job (they do), they satisfy your information need. Being satisfied with the service causes you to reuse it again and again.
The problem is that it’s still difficult for machines to differentiate between good and bad content. 99% of the web content you would consider as trash. How does the Google bot know which content has high quality?
However, not all links are created equal. Which link source carries more value: Wikipedia.org or franks-cute-cat-videos.wordpress.com? Simply put, if Wikipedia links to your content, it’s more likely that it has high quality.
This idea is the basis of Google’s famous PageRank algorithm. It is an iterative graph algorithm that determines the rank of a web page (higher is better) based on the summed ranks of web pages linking to them. How do we know their rank? We look into the summed ranks of web pages linking to them. As you can see, this is an iterative procedure that refines the ranks of all the web pages one step at a time. After converging, the PageRank algorithm gives a pretty accurate measure of how trustworthy and renowned a web page is.
Ok, let’s go back to the question: what are application scenarios or use cases of graphs in computer science? The business models of the largest companies in the world, with billions of dollars yearly profits, center around efficient processing and information retrieval from large graphs.
Let’s move on to another application domain of graph theory: biological networks. The (biological) environment is actually one of the largest sources of real-world graphs. Let’s explore some biological networks in the following collection.
Brain networks. Neuron A connects to neuron B via the synapsis (A,B). A single human brain contains 100 billion neurons (source). The neurons of the brain of a child are interconnected by more than 1 quadrillion (1,000,000,000,000,000) synapses (source).
Protein interaction networks. Protein A is connected to protein B if they have interacted within a certain time frame. “Protein-protein interactions (PPIs) are the physical contacts of high specificity established between two or more protein molecules as a result of biochemical events steered by electrostatic forces including the hydrophobic effect” (source). The number of such interactions can be huge. You may ask: why do we need these networks? The reason is that they are critical to study the efficiency of evolutionary processes. A recent study showed: “[…], it has been discovered that proteins with high degrees of connectedness are more likely to be essential for survival than proteins with lesser degrees” (source).
DNA-Protein interaction networks. DNA (or RNA) A is connected to Protein B if they have interacted within a certain time frame. These interaction networks are also called “Gene Regulatory Networks”. They are important points of study in the field of morphogenesis (= the process of creating body structures).
Food networks. This bloody biological network describes one of the most natural processes in the world. Species A is connected to species B if A eats B. The stability of these networks over time is an important area of research in ecology. The network is not trivial: there are more than 7.6 million species in the world. And many species eat hundreds of different species. Such food networks are full of valuable insights into why certain species die out.
There are myriads of similar biological interaction networks. If you want to learn more about biological networks, check out this excellent Wikipedia article.
The knowledge of the world is inherently graph-structured. Information A is connected to information B if A stands in relation to B in some specific way. Consider the following types of information:
There are hundreds of trillions such relations between two entities on the web. Among others, Google collects all these relations and merges them into a massive knowledge graph. This huge graph enables machines to automatically infer new knowledge from the graph. For example, Donald Trump lives in Texas; Many people living in Texas like steaks; Hence, it is likely that Donald Trump likes steaks as well.
Few people know that it’s possible to access this knowledge graph gathered by Google. Google provides an API that you can easily use and play around with.
Think about the opportunities of massive knowledge graphs that are shared among devices all over the world! The emerging collective intelligence makes applications smarter – even without high computing power. Every dumb device has access to the world’s wisdom. Now that sounds a bit scary, I know. But this article is only a compilation of graph applications and practical use cases. It’s neither more nor less.
In the last sections, you have learned that one of the largest company in the world centers around processing massive graph data. Next, you will learn about how the company owned by the richest man on the earth (Jeff Bezos’ net worth is 141 Billion US-$ in 2018, source) uses graphs to boost its revenue. Amazon. If you buy a product, Amazon recommends you buying similar products. These recommended products are based on what other users have already bought. For example, you buy a book about Python; Amazon recommends you to buy a book about Scrum.
How do these recommender systems work? At the heart of these systems are huge bipartite graphs.
Here is a quick reminder of what bipartite graphs are:
“In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V such that every edge connects a vertex in U to one in V. Vertex sets U and V are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles” (source).
Amazon knows for each user, which products he has bought (and liked). It’s a bipartite graph. On the left side are all the users. On the right side are all the products. If a user has bought (and liked) a product, there is a connection from the user to the product. The graph is bipartite as there can not be a connection between two users or two products. (Neither can a user buy another user nor can a product buy another product.)
The graph problem Amazon has to solve now is the following. Given the existing user-product graph. What are some likely edges from users and products? We can do this by clustering users together that bought the same products. It is likely that users will buy similar products in the future as well. Those are the recommendations provided by Amazon. While the algorithms are much more sophisticated, this remains the main idea.
We have already seen that artificial neural networks are huge graphs connecting neurons via artificial synapses. There are many different types of neural networks. The main difference between these types is the architecture of the graphs. If you need visual examples, check out the following awesome cheat sheet.
You are using an important graph application every couple of days. Navigation has become the killer application in mobile scenarios. Apps like Maze, Google Maps, Apple Maps, and Uber are installed on every smartphone. Do you have studied a subject related to computer science? Then you know that navigational problems are inherently modeled as graph problems. Think about the traveling salesman problem, shortest path problems, Hammington paths, etc. Do you want to know the fastest route from point A to point B? Your navigation app makes a graph problem out of it. What is the underlying graph? Create a vertex for each location. And create a connection between two locations A and B if there is a direct road between locations A and B. Finally, you annotate the road with the traveling time from point A to point B.
Do you possess Bitcoins? How did you get them? In any case, you created a Bitcoin wallet and transferred funds into your own wallet. The transaction of moving Bitcoins into your wallet was stored for all times in the Bitcoin Blockchain.
What is the Blockchain?
It is “an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way” (source).
In other words, the Blockchain is a large graph. The vertices are blocks, each storing many transactions. The edges connect subsequent blocks. The largest branch initiating from the first block (THE block-chain) is the currently valid state of historical transactions.
The Blockchain is an interesting graph that is often analyzed in the cryptocurrency space. Another insightful graph arises when you use Bitcoin wallets as vertices and transactions between wallets as edges. The resulting graph reflects the money flow between Bitcoin wallets. This graph is critical to learning about global money flow patterns.
Graph algorithms are also important in many other domains and use cases. For instance, the field of computer chip design relies on resource-efficient ways to place signals on a single chip. The numerous signals must be embedded onto the two-dimensional plane. Here is an interesting paper about chip design.
Here are three of the largest graphs that are currently available in the world:
Thanks for reading through this article! If you have any additional graph applications (or graph resources), please let me know by replying to any of our emails (register by clicking the button below and follow the steps).
Want to dive deeper into computer science with Python? Register for the “Coffee Break Python” newsletter! It’s a 100% **FREE** Python email course (BONUS: 5 Python cheat sheets for absolute beginners).
Be on the Right Side of Change 🚀
Learning Resources 🧑💻
⭐ Boost your skills. Join our free email newsletter (160k subs) with daily emails and 1000+ tutorials on AI, data science, Python, freelancing, and business!
Join the Finxter Academy and unlock access to premium courses 👑 to certify your skills in exponential technologies and prompt engineering.
New Finxter Tutorials:
Finxter Categories: