Python offers a wide range of libraries for tackling problems quickly across many research areas, and geospatial data analysis and graph theory are two where its tooling is especially strong. In this article, we conduct a simple analysis of world borders, exploring which countries share a border with which others. We start from a GeoJSON file containing polygons for all countries worldwide. The goal is to build a graph of those borders with NetworkX and use it to perform several analyses.
GeoJSON files enable the representation of various geographical areas and are widely used in geographical analysis and visualizations. The initial stage of our analysis involves reading the countries.geojson file and converting it into a GeoDataFrame using GeoPandas. This file has been sourced from the following GitHub repository and contains polygons representing different countries worldwide.
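A minimal sketch of this loading step. With the real file on disk, a single call to gpd.read_file is enough; since the file is not bundled here, the snippet builds an equivalent GeoDataFrame from a tiny in-memory GeoJSON FeatureCollection. The two square "countries", their names, and their codes are made-up placeholders, not real data.

```python
import geopandas as gpd

# With the real file available, loading is a single call:
#   gdf = gpd.read_file("countries.geojson")


def square(x0, y0, size=1.0):
    """Return GeoJSON MultiPolygon coordinates for one axis-aligned square."""
    ring = [
        [x0, y0],
        [x0 + size, y0],
        [x0 + size, y0 + size],
        [x0, y0 + size],
        [x0, y0],  # GeoJSON rings must be closed
    ]
    return [[ring]]


# Hypothetical stand-in for the contents of countries.geojson.
features = [
    {
        "type": "Feature",
        "properties": {"ADMIN": "Country A", "ISO_A3": "AAA", "ISO_A2": "AA"},
        "geometry": {"type": "MultiPolygon", "coordinates": square(0, 0)},
    },
    {
        "type": "Feature",
        "properties": {"ADMIN": "Country B", "ISO_A3": "BBB", "ISO_A2": "BB"},
        "geometry": {"type": "MultiPolygon", "coordinates": square(1, 0)},
    },
]

# Build the GeoDataFrame; EPSG:4326 is the usual lon/lat CRS for GeoJSON.
gdf = gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")
print(gdf.head())
```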
The GeoDataFrame contains the following columns:
ADMIN: Represents the administrative name of the geographical area, such as the country or region name.
ISO_A3: Stands for the ISO 3166-1 alpha-3 country code, a three-letter code uniquely identifying countries.
ISO_A2: Denotes the ISO 3166-1 alpha-2 country code, a two-letter code also used for country identification.
geometry: Contains the geometrical information that defines the shape of the geographical area, represented as MULTIPOLYGON data.
You can visualize all the multipolygons that make up the GeoDataFrame using the plot method.
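A short sketch of the plotting call, using two placeholder squares in place of the real country polygons. The styling arguments (figure size, colors, title) are our own choices for illustration.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import MultiPolygon, box

# Hypothetical stand-in for the GeoDataFrame read from countries.geojson.
gdf = gpd.GeoDataFrame(
    {"ADMIN": ["Country A", "Country B"]},
    geometry=[MultiPolygon([box(0, 0, 1, 1)]), MultiPolygon([box(1, 0, 2, 1)])],
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(8, 4))
# GeoDataFrame.plot draws every geometry onto the given Matplotlib axes.
gdf.plot(ax=ax, color="lightgray", edgecolor="black")
ax.set_title("Country polygons")
# In a script, call plt.show() here to display the figure.
```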
The multipolygons within the geometry column belong to the class shapely.geometry.multipolygon.MultiPolygon. These objects expose several attributes, one of which is centroid. The centroid attribute returns a POINT that represents the geometric center of the MULTIPOLYGON.
Subsequently, we can use this POINT to extract the latitude (its y coordinate) and longitude (its x coordinate) of each MULTIPOLYGON and store the results in two columns within the GeoDataFrame. We perform this calculation because we will later use these latitude and longitude values to visualize the nodes on the graph based on their real geographic positions.
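The centroid step can be sketched as follows, again on placeholder rectangles. The column names latitude and longitude are an assumption on our part; the original column names are not shown in the text.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box

# Hypothetical stand-in for the GeoDataFrame read from countries.geojson.
gdf = gpd.GeoDataFrame(
    {"ADMIN": ["Country A", "Country B"]},
    geometry=[MultiPolygon([box(0, 0, 2, 2)]), MultiPolygon([box(2, 0, 6, 2)])],
    crs="EPSG:4326",
)

# For lon/lat data, the centroid's x is the longitude and its y is the latitude.
gdf["latitude"] = gdf.geometry.apply(lambda geom: geom.centroid.y)
gdf["longitude"] = gdf.geometry.apply(lambda geom: geom.centroid.x)

print(gdf[["ADMIN", "latitude", "longitude"]])
```

The vectorized form gdf.geometry.centroid.y gives the same values, although GeoPandas warns that centroids computed in a geographic CRS may be imprecise. For placing nodes on a map, that level of precision is usually sufficient.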
Now it’s time to build the graph that will represent the borders between countries worldwide. In this graph, nodes represent countries, and an edge connects two nodes only when the corresponding countries share a border.
The function create_country_network processes the information within the GeoDataFrame and constructs a Graph representing country borders.
Initially, the function iterates through each row of the GeoDataFrame, where each row corresponds to a different country. Then, it creates a node for the country while adding latitude and longitude as attributes to the node.
If a geometry is not valid, the function repairs it using the buffer(0) method. A buffer with a distance of zero does not grow or shrink the shape, but it forces the geometry to be rebuilt, which typically resolves problems such as self-intersections or other geometric irregularities in the multipolygon representation.
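To make the repair step concrete, here is a small sketch on a deliberately broken polygon. The "bowtie" shape is our own example, not taken from the countries file.

```python
from shapely.geometry import Polygon

# A "bowtie": the ring crosses itself at (1, 1), so the polygon is invalid.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
print("valid before repair:", bowtie.is_valid)

# Same pattern as in the text: only touch geometries that need fixing.
repaired = bowtie if bowtie.is_valid else bowtie.buffer(0)
print("valid after repair:", repaired.is_valid)
```

Note that buffer(0) guarantees a valid result but can discard part of a badly self-intersecting shape. Recent Shapely versions also provide shapely.validation.make_valid as an alternative.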
After creating the nodes, the next step is to populate the network with the relevant edges. To do this, we iterate over pairs of countries: if the polygons representing the two countries intersect, they share a common border, and an edge is created between their nodes.
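Putting the steps described above together, a possible sketch of create_country_network looks like this. It is a reconstruction from the description, not the original code; the attribute names latitude and longitude and the toy GeoDataFrame are assumptions.

```python
from itertools import combinations

import geopandas as gpd
import networkx as nx
from shapely.geometry import MultiPolygon, box


def create_country_network(gdf):
    """Build an undirected graph: one node per country, one edge per shared border."""
    G = nx.Graph()
    geometries = {}

    for _, row in gdf.iterrows():
        geom = row["geometry"]
        if not geom.is_valid:
            geom = geom.buffer(0)  # repair invalid geometries before comparing
        geometries[row["ADMIN"]] = geom
        G.add_node(row["ADMIN"], latitude=row["latitude"], longitude=row["longitude"])

    # Check every unordered pair once; intersecting polygons imply a border.
    for a, b in combinations(geometries, 2):
        if geometries[a].intersects(geometries[b]):
            G.add_edge(a, b)
    return G


# Toy data (hypothetical): A touches B along one side, C is isolated.
gdf = gpd.GeoDataFrame(
    {"ADMIN": ["A", "B", "C"]},
    geometry=[
        MultiPolygon([box(0, 0, 1, 1)]),
        MultiPolygon([box(1, 0, 2, 1)]),
        MultiPolygon([box(5, 5, 6, 6)]),
    ],
    crs="EPSG:4326",
)
gdf["latitude"] = gdf.geometry.apply(lambda geom: geom.centroid.y)
gdf["longitude"] = gdf.geometry.apply(lambda geom: geom.centroid.x)

G = create_country_network(gdf)
print(sorted(G.edges()))
```

Comparing every pair is quadratic in the number of countries, which is acceptable for a few hundred polygons. A spatial index would be the natural next step for larger datasets.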
The next step involves visualizing the created network, where nodes represent countries worldwide, and edges signify the presence of borders between them.
The function plot_country_network_on_map is responsible for processing the nodes and edges of the graph G and displaying them on a map.
The positions of the nodes on the graph are determined by the latitude and longitude coordinates of the countries. Additionally, a map has been placed in the background to provide a clearer context for the created network. This map was generated using the boundary attribute from the GeoDataFrame, which provides the geometrical boundaries (the outlines) of the represented countries.
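A sketch of how such a function could be written, assuming the nodes carry latitude and longitude attributes as described earlier. The signature, the styling values, and the toy data are our assumptions.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import networkx as nx
from shapely.geometry import MultiPolygon, box


def plot_country_network_on_map(G, gdf):
    """Draw country outlines, then overlay the border graph at real coordinates."""
    fig, ax = plt.subplots(figsize=(12, 6))
    # Background map: only the outlines of the country polygons.
    gdf.boundary.plot(ax=ax, linewidth=0.5, color="gray")
    # Matplotlib expects (x, y), which here means (longitude, latitude).
    pos = {n: (d["longitude"], d["latitude"]) for n, d in G.nodes(data=True)}
    nx.draw(G, pos=pos, ax=ax, node_size=30, node_color="tab:red", edge_color="tab:blue")
    return ax


# Toy data (hypothetical): two adjacent squares and the matching two-node graph.
gdf = gpd.GeoDataFrame(
    {"ADMIN": ["A", "B"]},
    geometry=[MultiPolygon([box(0, 0, 1, 1)]), MultiPolygon([box(1, 0, 2, 1)])],
    crs="EPSG:4326",
)
G = nx.Graph()
G.add_node("A", latitude=0.5, longitude=0.5)
G.add_node("B", latitude=0.5, longitude=1.5)
G.add_edge("A", "B")

ax = plot_country_network_on_map(G, gdf)
# In a script, call plt.show() here to display the figure.
```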
One detail is worth noting: in the GeoJSON file used here, some islands appear as separate entries, as if they were independent countries, even though they administratively belong to another country. This is why you may see numerous points in maritime areas. Keep in mind that the graph reflects the information available in the GeoJSON file from which it was generated; a different file would produce a different graph.
The country border network we’ve created can swiftly assist us in addressing multiple questions. Below, we will outline three insights that can easily be derived by processing the information provided by the network. However, there are many other questions that this network can help us answer.
Insight 1: Examining Borders of a Chosen Nation
In this section, we will visually assess the neighbors of a specific country.
The plot_country_borders function enables quick visualization of the borders of a specific country. This function generates a subgraph of the country provided as input and its neighboring countries. It then proceeds to visualize these countries, making it easy to observe the neighboring countries of a specific nation. In this instance, the chosen country is Mexico, but we can easily adapt the input to visualize any other country.
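The subgraph idea can be sketched as follows. The small graph is hand-built for illustration rather than derived from the GeoJSON file, and the spring layout is an assumption; the original may well position nodes by latitude and longitude.

```python
import matplotlib.pyplot as plt
import networkx as nx


def plot_country_borders(G, country):
    """Draw `country` together with its direct neighbors and return that subgraph."""
    neighbors = list(G.neighbors(country))
    subgraph = G.subgraph([country] + neighbors)
    pos = nx.spring_layout(subgraph, seed=42)  # fixed seed for a repeatable layout
    fig, ax = plt.subplots(figsize=(6, 6))
    # Highlight the chosen country in a different color.
    colors = ["tab:red" if n == country else "tab:blue" for n in subgraph.nodes]
    nx.draw(subgraph, pos=pos, ax=ax, with_labels=True, node_color=colors)
    return subgraph


# Hand-built excerpt of the border graph (illustrative only).
G = nx.Graph()
G.add_edges_from([
    ("Mexico", "United States of America"),
    ("Mexico", "Guatemala"),
    ("Mexico", "Belize"),
    ("Guatemala", "Belize"),
    ("United States of America", "Canada"),
])

sub = plot_country_borders(G, "Mexico")
print(sorted(sub.nodes))
```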
As you can see in the generated image, Mexico shares its border with three countries: the United States, Belize, and Guatemala.
Insight 2: Top 10 Countries with the Most Borders
In this section, we will analyze which countries have the highest number of neighboring countries and display the results on the screen. To achieve this, we have implemented the calculate_top_border_countries function. This function assesses the number of neighbors for each node in the network and displays only those with the highest number of neighbors (top 10).
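Since the number of neighbors of a node is simply its degree, the ranking can be sketched in a few lines. The top_n parameter and the alphabetical tie-breaking are our additions, and the graph uses placeholder node names.

```python
import networkx as nx


def calculate_top_border_countries(G, top_n=10):
    """Return the `top_n` (country, neighbor count) pairs, highest count first."""
    degrees = dict(G.degree())
    # Sort by descending degree; break ties alphabetically for stable output.
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Placeholder graph: A has three neighbors, E has one.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")])

for country, n_borders in calculate_top_border_countries(G, top_n=3):
    print(f"{country}: {n_borders}")
```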
We must reiterate that the results obtained are dependent on the initial GeoJSON file. In this case, the Siachen Glacier is coded as a separate country, which is why it appears as sharing a border with China.
Insight 3: Exploring the Shortest Country-to-Country Routes
We conclude our analysis with a route assessment. In this case, we will evaluate the minimum number of borders one must cross when traveling from an origin country to a destination country.
The find_shortest_path_between_countries function calculates the shortest path between an origin country and a destination country. However, it’s important to note that this function provides only one of the possible shortest paths. This limitation arises from its use of the shortest_path function from NetworkX, which returns a single path even when several paths tie for the minimum length.
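A minimal sketch of such a function on a hand-built excerpt of the European border graph. Returning None for unreachable countries is our own design choice, and the graph below is typed in by hand rather than generated from the GeoJSON file.

```python
import networkx as nx


def find_shortest_path_between_countries(G, origin, destination):
    """Return one shortest path as a list of countries, or None if unreachable."""
    try:
        return nx.shortest_path(G, source=origin, target=destination)
    except nx.NetworkXNoPath:
        return None


# Hand-built excerpt of the border graph (illustrative only).
G = nx.Graph()
G.add_edges_from([
    ("Portugal", "Spain"),
    ("Spain", "France"),
    ("France", "Germany"),
    ("France", "Switzerland"),
    ("Switzerland", "Germany"),
    ("Germany", "Poland"),
])
G.add_node("Iceland")  # an island nation: no land borders, so no edges

path = find_shortest_path_between_countries(G, "Spain", "Poland")
print(path)
print("border crossings:", len(path) - 1)
```

The number of border crossings is the number of edges on the path, which is one less than the number of countries it visits.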
To retrieve more than one path between two countries, NetworkX offers alternatives. Within the find_shortest_path_between_countries function, one could use all_shortest_paths, which returns every path of minimum length, or all_simple_paths, which enumerates every path that does not repeat a node, including paths longer than the shortest one. Which option fits best depends on the specific requirements of the analysis.
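The difference between the two alternatives is easiest to see on a small placeholder graph in which two routes tie for the shortest length.

```python
import networkx as nx

# Placeholder graph: A reaches D through B or through C, and B and C are linked.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("B", "C")])

# Every path of minimum length between A and D.
shortest_paths = sorted(nx.all_shortest_paths(G, source="A", target="D"))

# Every path without repeated nodes, including longer detours.
simple_paths = sorted(nx.all_simple_paths(G, source="A", target="D"))

print("shortest:", shortest_paths)
print("simple:", simple_paths)
```

On a graph of all world borders, all_simple_paths can produce a very large number of paths, so its cutoff parameter is worth setting to limit the path length.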
We employed the function to find the shortest path between Spain and Poland, and the analysis revealed that the minimum number of border crossings required to travel from Spain to Poland is 3.
Python offers libraries spanning many domains of knowledge, and they can be combined easily within a data science project. In this instance, we used libraries dedicated to geospatial data analysis and graph analysis to build a graph representing the world’s borders. We then demonstrated use cases in which this graph quickly answers geographical questions.
Thanks for reading.
Amanda Iglesias