Data Source: Acquired from Codecademy, which says the data is “from the National Parks Service”.
National Parks are large natural or near natural areas set aside to protect large-scale ecological processes, along with the complement of species and ecosystems characteristic of the area, which also provide a foundation for environmentally and culturally compatible spiritual, scientific, educational, recreational and visitor opportunities.1 In the United States, there are sixty-three national parks (not to be confused with other kinds of parks).2 Former US President Barack Obama has been quoted as saying “This [National Parks] was America’s best idea”.3
A Data Scientist can find themselves dealing with data from different fields of study. What if one day, as a data scientist, you were given a list of endangered species and tasked with creating a prioritization order for choosing which species to focus on, if manpower and resources are limited? We will pretend we are in that very situation here and explore a possible approach.
We have been given data from four American National Parks: Yosemite, Yellowstone, Bryce, and Great Smoky Mountains.
Start by exploring the data we’ve been given. This interactive treemap groups species by taxonomic class and conservation status. Zoom in on a category by clicking its box, and zoom out by clicking the navigation bar at the top of the figure. Hover over any element of the treemap to reveal more information about it; on a touchscreen, tap and hold.
There are four different conservation status labels. In order to start assigning priority, we need to understand these labels.
Although Species of Concern dominates the group, we can narrow our focus to Endangered and Threatened species, as those are the only ones offered legal protection.
There are 14 species listed as endangered and 9 different species listed as threatened. How would we choose which to prioritize? One criterion we can use is a species’ level in the food pyramid, known as its trophic level. Here’s an explainer from Britannica.
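As a minimal sketch of that narrowing step, the snippet below filters a species table down to the two legally protected statuses and counts what is left. The rows and the `common_name` field are made up for illustration (only the statuses of the Grizzly Bear and the June Sucker come from this article), and the real project may well do this with a data-frame library instead.

```python
from collections import Counter

# Illustrative rows only; the real table has one row per species.
species = [
    {"common_name": "June Sucker", "conservation_status": "Endangered"},
    {"common_name": "Grizzly Bear", "conservation_status": "Threatened"},
    {"common_name": "Placeholder species A", "conservation_status": "Species of Concern"},
    {"common_name": "Placeholder species B", "conservation_status": "Species of Concern"},
]

# The two statuses the article keeps because they carry legal protection.
LEGALLY_PROTECTED = {"Endangered", "Threatened"}


def keep_protected(rows):
    """Return only the rows whose status is Endangered or Threatened."""
    return [row for row in rows if row["conservation_status"] in LEGALLY_PROTECTED]


shortlist = keep_protected(species)
counts = Counter(row["conservation_status"] for row in shortlist)
print(counts)
```

Run against the full data, the same count would give the number of endangered and threatened species to prioritize.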
Trophic level, step in a nutritive series, or food chain, of an ecosystem. The organisms of a chain are classified into these levels on the basis of their feeding behaviour. The first and lowest level contains the producers, green plants. The plants or their products are consumed by the second-level organisms—the herbivores, or plant eaters. At the third level, primary carnivores, or meat eaters, eat the herbivores; and at the fourth level, secondary carnivores eat the primary carnivores. These categories are not strictly defined, as many organisms feed on several trophic levels; for example, some carnivores also consume plant materials or carrion and are called omnivores, and some herbivores occasionally consume animal matter. A separate trophic level, the decomposers or transformers, consists of organisms such as bacteria and fungi that break down dead organisms and waste materials into nutrients usable by the producers.7
Trophic pyramid, Image, Encyclopædia Britannica, Access Date: May 9, 2021
Energy pyramid, Image, Encyclopædia Britannica, Access Date: May 9, 2021
Notice the energy values, in kcal, as you go from the bottom to the top. As a general trend, the available energy drops by about a factor of ten each time you ascend a level. This is because an organism stores only about 10% of the energy it eats as potential energy in its body. The rest is lost as heat or spent as kinetic energy to move around and breathe.
Another trend, emphasized more by the shape of the pyramid, is that there are fewer species towards the top. For our analysis this means that species at the top will be given more importance. If a species at the top goes extinct, there are fewer species, or possibly none, ready to take its place.
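To make the factor-of-ten trend concrete, here is a small sketch that applies a 10% transfer efficiency level by level. The starting value of 10,000 kcal and the function name `energy_by_level` are assumptions chosen for illustration, not figures read from the Britannica image, and real transfer efficiencies vary between ecosystems.

```python
def energy_by_level(base_kcal, levels, transfer_efficiency=0.10):
    """Energy available at each trophic level, starting from the producers.

    Each level keeps only `transfer_efficiency` of the energy of the level below.
    """
    return [base_kcal * transfer_efficiency ** step for step in range(levels)]


# Hypothetical starting energy at the producer level.
pyramid = energy_by_level(base_kcal=10_000, levels=4)

for level, kcal in enumerate(pyramid, start=1):
    print(f"level {level}: {kcal:,.0f} kcal")
```

Under these assumptions, the fourth level is left with about one thousandth of the energy captured by the producers.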
Why do we even require that all spaces on the pyramid be occupied? Every single place on the pyramid has an important function. The species at the top regulate the numbers of the species at the bottom. When a species near the bottom gets out of control it can eat up all the food supply, affecting all the other species in the whole ecosystem. Effects that ripple all the way from the top to the bottom of the pyramid are known as trophic cascades. A case study is the Gray Wolves of Yellowstone Park. In the 1920s, the Gray Wolves went locally extinct in the Park.
Image credit: Wikimedia Commons user Ccarroll17, Creative Commons Attribution-Share Alike 4.0 International (original file).
“Once the wolves were gone, elk populations began to rise. Over the next few years conditions of Yellowstone National Park declined drastically. A team of scientists visiting Yellowstone in 1929 and 1933 reported, “The range was in deplorable conditions when we first saw it, and its deterioration has been progressing steadily since then.” By this time many biologists were worried about eroding land and plants dying off. The elk were multiplying inside the park and deciduous, woody species such as aspen and cottonwood suffered from overgrazing. The park service started trapping and moving the elk and, when that was not effective, killing them. Elk population control methods continued for more than 30 years. Elk control prevented further degradation of the range, but didn’t improve its overall condition”8
The Gray Wolves were reintroduced into the park in the 1990s. As a result, the elk populations were reduced, the plants in the park have recovered, and the overall health of the Yellowstone ecosystem has greatly improved. The wolves do not feed on the elk to the point of extinction, but they keep the elk population in check.8
Now that we have established the concept of trophic levels, we can start identifying the levels of the species in our data.
After doing some internet research, we’ve found information on the diet and predators of each of our species, which helped us assign trophic levels. These assignments are a best effort. In reality, there are many nuances to consider, such as a species occupying different trophic levels throughout its life, like a tadpole that becomes a frog. We have two visualizations: a treemap and a trophic pyramid.
(Species information on mouse hover, tap and hold on mobile)
Since trophic_level and conservation_status are both ordinal categorical values, we can convert them into numerical values that we can use to calculate a priority score. We will assign values from 1 to 6 for trophic_level, and 1 to 2 for conservation_status. Next, we will normalize the values of trophic_level and conservation_status. We are assuming equal distance between trophic levels (which is technically not ideal). A further step for refinement would be to do additional research and consultation to apply appropriate weights/distances to the trophic levels, which is beyond the scope of this article.
Our formula is
\[\text{priority score} = \text{conservation status} \cdot \text{trophic level}\]
We are multiplying instead of adding because we assume that the factors involved are not independent of each other; rather, they can all combine in the same ecosystem to create a compounding effect.9 This is reflected by the fact that the loss of a single species or habitat can have massive, cascading effects throughout the whole system.
In the figure below, the size of each box represents our calculated priority score of each species. (Species information on mouse hover, tap and hold on mobile)
In our treemap, the Grizzly Bear has a higher score than the June Sucker, even though the Grizzly Bear is threatened and the June Sucker is endangered, which is what we want because we are considering trophic level as well as conservation status.
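A small worked example shows how a threatened species can outrank an endangered one. The trophic levels used below are placeholders picked for illustration, not the values assigned in our data, and the ranks are normalized by dividing by each scale's maximum, which is an assumption about the normalization step.

```python
def priority_score(status_rank, trophic_level, max_status=2, max_trophic=6):
    """Product of the normalized conservation status and trophic level."""
    return (status_rank / max_status) * (trophic_level / max_trophic)


# Placeholder trophic levels: a top predator versus a species low in the pyramid.
grizzly_bear = priority_score(status_rank=1, trophic_level=6)  # threatened
june_sucker = priority_score(status_rank=2, trophic_level=2)   # endangered

print(f"Grizzly Bear: {grizzly_bear:.2f}")
print(f"June Sucker:  {june_sucker:.2f}")
```

With ranks of 1 and 2 for conservation status, a threatened species scores higher than an endangered one only when its trophic level is more than twice as high.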
There is much room for improvement. More factors can be added, and relative weights can be assigned to each factor. Additional factors for each species could be:
A modified* weighted product formula for decision-making is
\[\text{priority score} = \prod_{j=1}^{n} a_{j}^{w_j}\]
Applying the formula to get a prioritization score would be
\[\text{priority score} = \text{conservation status}^{w_1} \cdot \text{trophic level}^{w_2} \cdot \text{environment transform}^{w_3} \cdot \text{pollination level}^{w_4} \cdot \text{disease incubation}^{w_5} \cdot \text{population estimate}^{w_6} \cdot \text{habitat area}^{w_7}\]
*The full weighted product formula has a root of k, where k is the sum of all relative weights. We do not need it if we are only concerned with relative scale and not with exact central location.
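The weighted product can be sketched in a few lines. The factor values and weights below are hypothetical, chosen only to show the mechanics, and the function names are our own. The second function adds the k-th root from the footnote, so we can check that dropping it does not change the ranking.

```python
import math


def weighted_product(factors, weights):
    """Compute the product of a_j ** w_j over every factor named in `weights`."""
    return math.prod(factors[name] ** weights[name] for name in weights)


def weighted_geometric_mean(factors, weights):
    """Full form: the k-th root of the weighted product, k = sum of weights."""
    k = sum(weights.values())
    return weighted_product(factors, weights) ** (1 / k)


# Hypothetical weights: trophic level counts twice as much as status.
weights = {"conservation_status": 1, "trophic_level": 2}

# Hypothetical normalized factor values for two species.
species_a = {"conservation_status": 1.0, "trophic_level": 0.5}
species_b = {"conservation_status": 0.5, "trophic_level": 1.0}

for name, factors in [("A", species_a), ("B", species_b)]:
    print(
        name,
        round(weighted_product(factors, weights), 3),
        round(weighted_geometric_mean(factors, weights), 3),
    )
```

Since the k-th root is a monotonic function for positive scores, both versions rank the species in the same order; only the scale of the scores differs.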
Github Repository of this Project