An interesting final project for my stats class
Ever wondered why it is Mount Baker and Mount Rainier, and not Baker Mountain or Rainier Peak? Are Peaks really peakier and Mounts more massive, or do summit titles follow too much whim and misnaming for any significant relationships to exist? When the final project in my statistics class involved conducting a regression analysis on a dataset of our choice, I knew exactly what my topic would be: "Are Summits Titled by Topography or Whim? A Multinomial Logistic Regression Study on Mountains, Mounts, and Peaks." I think I first pondered this question a few years ago, probably during one of those inevitable intellectual discussions climbers have during a cold overnight bivy or an endless snow slog.
Don't worry, I've tried to present the findings in a way that can be understood by the curious climber. I even use my results for interesting applications such as comparing the topography of different mountainous areas (such as Washington vs. Colorado), quantifying the "strength" of a summit's type (such as Prusik Peak being a stronger Peak than Puny Peak), and evaluating if the post-eruption Mount St. Helens is still a real Mount.
The following page gives a summary of some of the key findings. You can download my full paper if you are interested in more. Any outdoors enthusiast is bound to find these results interesting.
Enjoy (and learn)!
A Multinomial Logistic Regression Study on Mountains, Mounts, and Peaks
Click here to download full paper containing more details on the actual analysis!
Why is it Mount Baker and Mount Rainier, and not Baker Mountain or Rainier Peak? Are Peaks really peakier and Mounts more massive, or do summit titles follow too much whim and misnaming for any significant relationships to exist? In this project, I use multinomial logistic regression techniques to show that there are in fact statistically significant differences in the topography of Peaks, Mounts, and Mountains, and that the general characteristics of the three summit types can be quantified in terms of elevation, prominence, and isolation. Using these results, I look at interesting applications such as using the modeling techniques to compare the topography of different mountainous areas (such as Washington vs. Colorado), quantifying the "strength" of a summit's type (such as Prusik Peak being a stronger Peak than Puny Peak), and evaluating whether the post-eruption Mount St. Helens is still a real Mount.
(Not all of this is described on this page; see my full report if you are interested in more.)
The dataset used in this analysis lists the 2,027 Peaks, Mounts, and Mountains in Washington state, along with their respective elevations, prominences, and isolations. Although this dataset excludes summits that do not fall within these categories (such as Spires, Towers, numbered ridgeline highpoints, and names too creative to be restrained by titles, such as Himmelgeisterhorn, The Wart, and Beheaded Dog), it encompasses the vast majority (roughly 95%) of the significant summits in Washington. The data is freely available online at listsofjohn.com. (See my report for a more complete explanation of how I came to have 2,027 summits in my analysis.)
An interesting characteristic of the data is that there are several summits in Washington that have the same names. This is especially true for the lower profile Mountains that might be known only locally. Also, Mountain-namers have taken a greater liking to colors than have Peak-namers and Mount-namers – there are 17 Green Mountains, 9 Red Mountains, 6 Blue Mountains, 4 White Mountains, 3 Black Mountains, and 1 Purple Mountain in the state.
Methodology: Multinomial Logistic Regression
The goal of this project is to associate the summit class (i.e. Mountain, Mount, or Peak) with elevation, prominence, and isolation. The dependent variable is the summit class, which in this analysis is either "Peak," "Mount," or "Mountain." Since these names are categories with no meaningful order, this is a nominal variable. Three continuous independent variables are analyzed in terms of their relationship to the summit title:
- Elevation: the height of the summit above mean sea level.
- Prominence: the height of the summit above the highest saddle connecting it to a higher summit.
- Isolation: the distance from a given summit to the nearest higher land or summit.
Multinomial logistic regression handles the case where the dependent variable is nominal and has more than two categories. I won't go into the details here (see my attached report if you are interested), but I ended up with equations relating the topographical characteristics of a summit (i.e. elevation, prominence, and isolation) to the probability of the summit being each summit type (i.e. Mountain, Mount, or Peak). The following marginal plots were generated during the regression analysis and show how the probability of each summit type relates to the variables in the model (elevation, log(isolation), and log(isolation)*prominence).
Figure: These marginal probability curves represent the probabilities of summit type and their relationship to the variables in the model (which were elevation, log(isolation), and log(isolation)*prominence), holding the other two variables fixed at their median values. These are the curves as predicted by the model. The probabilities for the three summit types are scaled to sum to one. These curves have some physical interpretations. For example, the plot of probability vs. elevation (plot 1) shows that at the median values of prom and iso, Peaks are favored by higher elevations while Mountains are favored by lower elevations. Mounts have a low probability because most Mounts have significantly higher prom and iso, so not many would be found at the median values used to create the plot (i.e. a marginal plot with higher prom and iso would show Mounts as having the highest probability). The plot of probability vs. iso (plot 2) is not directly interpretable because of the prom.iso interaction, but it suggests that Peaks are favored at low isolations. The plot also suggests that increasing isolation favors Mountains over Mounts, but this is counteracted by the strong interaction term prom.iso (plot 3), which indicates that Mounts are characterized by a combination of high prominence and high isolation. Examples are Mounts such as Rainier, Baker, and Adams. The plot of probability vs. prom.iso (plot 3) also shows that Mountains have low prom. The shape of the curve for Peaks on this plot is interesting, suggesting that Peaks generally fall within a narrow band of relatively small isolation and prominence. This narrow band is likely associated with the tendency of Peaks to be found in clustered groups, such as the Picket Range in the North Cascades.
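For readers who like to see the shape of the model behind these curves, a multinomial logistic regression with three classes is usually written as two linear equations measured against a baseline class. The sketch below assumes Mountain is the baseline (this page does not say which class was used as the reference), and the symbols α and β are placeholder names for the fitted coefficients, which are in the full paper rather than here.

```latex
\begin{aligned}
\eta_{\text{Mt}}  &= \alpha_{\text{Mt}} + \beta_{1,\text{Mt}}\,\text{elev}
                    + \beta_{2,\text{Mt}}\,\log(\text{iso})
                    + \beta_{3,\text{Mt}}\,\log(\text{iso})\cdot\text{prom} \\
\eta_{\text{Pk}}  &= \alpha_{\text{Pk}} + \beta_{1,\text{Pk}}\,\text{elev}
                    + \beta_{2,\text{Pk}}\,\log(\text{iso})
                    + \beta_{3,\text{Pk}}\,\log(\text{iso})\cdot\text{prom} \\
\eta_{\text{Mtn}} &= 0 \qquad \text{(assumed baseline)} \\
\Pr(k) &= \frac{e^{\eta_k}}{e^{\eta_{\text{Mtn}}} + e^{\eta_{\text{Mt}}} + e^{\eta_{\text{Pk}}}},
          \qquad k \in \{\text{Mtn}, \text{Mt}, \text{Pk}\}
\end{aligned}
```

Dividing by the same denominator for every class is what makes the three probabilities sum to one, as noted in the figure caption.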
The results of this regression analysis suggest that there are in fact significant topographical differences between Mountains, Mounts, and Peaks. The trends for each summit type fit the general mental images one forms when picturing Mountains, Mounts, and Peaks, which adds intuitive support to my model. Even though no summit-naming convention exists and summit names are often just a result of what sounded good to the namer, there is a clear tendency, subconscious or otherwise, to call low-profile masses Mountains, large isolated massifs Mounts, and clustered peaky summits Peaks. Some inconsistencies might result from how a summit is perceived from different angles, such as a summit that appears as a rocky bump on one side and a sheer face on the other. Although whim and inconsistency are inherent to naming, these inconsistencies do not invalidate the generality we can derive from the model; they merely show that there is in fact a generality that can be used to either predict or classify a summit as a Mountain, Mount, or Peak.
The following figure shows the general characteristics of Mounts, Mountains, and Peaks evidenced by the data.
Application I: Quantifying the "Strength" of Summit Type
So, what can we do with the regression results, other than make general conclusions about how topography is related to summit type? An interesting application of the model is to define a factor (which I will call the "Strength" factor, S) that gives an indication of how strongly a summit fits into a particular summit class, based on the model equations. For example, Prusik Peak and Little Tahoma Peak are certainly more "peaky" than Eldorado Peak or Dome Peak, and Glacier Peak seems more of a Mount than a Peak. The Strength factor, S, quantifies the topographic differences between these peaks.
The multinomial logistic regression created two model equations. A summit's set of elev, prom, and iso can be input into the model equations to generate the probability that the summit is either a Peak, Mount, or Mountain. Mathematically, the Strength factor is based on the differences between the predicted probabilities for the three summit types. For more details, including a formula for the strength factor, see my attached paper.
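To make the "two equations, three probabilities" idea concrete, here is a minimal Python sketch. Everything numeric in it is an assumption for illustration: the coefficients are invented (the fitted ones are in the paper), Mountain is assumed to be the baseline class, and the units (feet for elevation and prominence, miles for isolation) and the name `summit_probabilities` are my own choices.

```python
import math

# HYPOTHETICAL coefficients, for illustration only; not the fitted values.
# Order: (intercept, elevation, log(isolation), log(isolation) * prominence)
# Assumed baseline class: Mountain, whose linear score is fixed at 0.
COEFS = {
    "Mount": (-4.0, 0.0002, 0.5, 0.0003),
    "Peak": (-2.0, 0.0005, -0.8, 0.0001),
}


def summit_probabilities(elev_ft, prom_ft, iso_mi, coefs=COEFS):
    """Return {summit type: probability} from two baseline-category equations."""
    log_iso = math.log(iso_mi)
    x = (1.0, elev_ft, log_iso, log_iso * prom_ft)

    # One linear score per non-baseline class; the baseline scores 0.
    scores = {"Mountain": 0.0}
    for kind, beta in coefs.items():
        scores[kind] = sum(b * v for b, v in zip(beta, x))

    # Exponentiate and normalize so the three probabilities sum to one.
    # Subtracting the largest score first avoids overflow in exp().
    top = max(scores.values())
    weights = {kind: math.exp(s - top) for kind, s in scores.items()}
    total = sum(weights.values())
    return {kind: w / total for kind, w in weights.items()}


probs = summit_probabilities(elev_ft=6000, prom_ft=1500, iso_mi=3.0)
for kind, p in probs.items():
    print(f"{kind}: {p:.2f}")
```

With real coefficients in place of the invented ones, the same function would turn any summit's elev, prom, and iso into its three predicted probabilities.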
The Strength factor, S, ranges from -1 to 1, where:
- A value close to 0 indicates a summit with a "weak" summit type (0 indicates equal probability of all summit types). These summits are expected to be ones that would be more visually ambiguous as to what summit type category they qualify for – for example, a prominent and lofty summit in the middle of a range that is not clearly a Mount or a Peak, or a relatively low-elevation but isolated summit that some would view more as a Mountain and others would view more as a Mount.
- An absolute value close to 1 indicates a summit with a "strong" summit type (1 indicates a summit that is a perfect representation of its modeled summit type); in general it seems that S≥0.33 can be considered a strong summit type, and S≥0.5 very strong. Strong summits are closer to the stereotypical view of their summit class, such as Mount Rainier in the class of Mounts.
- A negative number indicates the summit is mistitled for its "true" summit type (according to the model, at least). Strength factors closer to -1 indicate summits that show strong characteristics of a different summit type.
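The actual formula for S is in the paper and is not reproduced on this page. Purely as an illustration, here is one hypothetical definition that happens to satisfy the properties listed above: the probability of the summit's titled type minus the largest probability among the other two types. The function name `strength` and the example probabilities are made up, and this should not be read as the formula used in the analysis.

```python
def strength(probs, title):
    """HYPOTHETICAL strength factor: P(titled type) minus the best rival's probability.

    probs maps each summit type to its predicted probability (summing to one).
    The result lies between -1 and 1, is 0 when all three types are equally
    likely, and is negative when some other type is more probable than the title.
    """
    rival = max(p for kind, p in probs.items() if kind != title)
    return probs[title] - rival


# Made-up probabilities, for illustration only.
ambiguous = {"Mountain": 1 / 3, "Mount": 1 / 3, "Peak": 1 / 3}
clear_mount = {"Mountain": 0.05, "Mount": 0.90, "Peak": 0.05}

print(strength(ambiguous, "Mount"))    # a "weak" summit type
print(strength(clear_mount, "Mount"))  # a "strong" Mount
print(strength(clear_mount, "Peak"))   # negative: would be "mistitled" as a Peak
```

Under this stand-in definition, a summit is flagged as mistitled exactly when the model gives some other type a higher probability than the one in its name.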
The following table provides examples of well-known summits that are classified as "strong," "weak," and "mistitled" according to the model.
| "Strong" Mounts (S>~0.5) | "Strong" Peaks (S>~0.5) | "Strong" Mountains (S>~0.5) |
|---|---|---|
| Rainier, Mount | Old Guard Peak | Jackass Mountain |
| Baker, Mount | Sherman Peak | Green Mountain |
| Olympus, Mount | Inspiration Peak | Chuckanut Mountain |
| Adams, Mount | Little Tahoma Peak | Sumas Mountain |

| "Weak" Mounts (S<~0.2) | "Weak" Peaks (S<~0.2) | "Weak" Mountains (S<~0.2) |
|---|---|---|
| Spickard, Mount | Snowfield Peak | Winchester Mountain |
| Deception, Mount | Eldorado Peak | Snowqueen Mountain |
| Bonaparte, Mount | Tomyhoi Peak | McGregor Mountain |
| | Dome Peak | White Chuck Mountain |

| "Mistitled" Mounts (S<0) | Modeled type | "Mistitled" Peaks (S<0) | Modeled type | "Mistitled" Mountains (S<0) | Modeled type |
|---|---|---|---|---|---|
| Terror, Mount | Pk | Gilbert Peak | Mt | Goode Mountain | Mt |
| Redoubt, Mount | Pk | Lizard Head Peak | Mtn | Remmel Mountain | Mt |
| Larrabee, Mount | Pk | Puny Peak | Mtn | Silver Star Mountain | Pk |
| Pilchuck, Mount | Mtn | Lummi Peak | Mtn | South Hozomeen Mountain | Pk |
| Torment, Mount | Pk | Satus Peak | Mtn | Liberty Bell Mountain | Pk |
| Si, Mount | Mtn | Glacier Peak | Mt | Sahale Mountain | Pk |
Application II: Is Mount St. Helens still a Mount?
At 8:32 am on May 18, 1980, Mount Saint Helens became about 1,300 ft shorter. Its catastrophic eruption was the deadliest and most economically destructive volcanic event in the history of the United States. An interesting question, based on my regression results, is whether the post-eruption Mount Saint Helens is still a lofty Mount. Or has the loss of 1,300 ft of elevation (and prominence) turned it into a more mellow Saint Helens Mountain?
The dataset for the regression analysis listed Mount Saint Helens with its pre-eruption elevation, prominence, and isolation, as these were its statistics at the time the summit was named. According to the prediction equations generated by my regression analysis (which predict the probability that a summit is a Mount, Peak, or Mountain given its elevation, prominence, and isolation), the pre-eruption Saint Helens was indeed a Mount, with a 74% probability (and only a 6% probability of being a Mountain). Plugging the post-eruption statistics into the same equations shows that Saint Helens is still a Mount, but a more "mountainous" Mount, now with a 59% probability of being a Mount and an 18% probability of being a Mountain.
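A little arithmetic on the percentages quoted above shows how much the balance shifted. The sketch below uses only those four numbers; the Peak share is not stated on this page and is inferred here as the remainder, on the assumption that the three probabilities sum to one. The helper name `summarize` is mine.

```python
# Percentages quoted in the text for Saint Helens, written as probabilities.
pre = {"Mount": 0.74, "Mountain": 0.06}
post = {"Mount": 0.59, "Mountain": 0.18}


def summarize(p):
    """Return (implied Peak probability, Mount-to-Mountain ratio)."""
    peak = 1.0 - p["Mount"] - p["Mountain"]  # assumes the three sum to one
    ratio = p["Mount"] / p["Mountain"]
    return peak, ratio


for label, p in (("pre-eruption", pre), ("post-eruption", post)):
    peak, ratio = summarize(p)
    print(f"{label}: implied P(Peak) = {peak:.2f}, Mount-to-Mountain ratio = {ratio:.1f}")
```

On these figures, Mount goes from roughly 12 times as likely as Mountain before the eruption to roughly 3 times as likely after it, while the implied Peak share barely moves (about 20% to about 23%).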
So what scale of eruption would have been required for Saint Helens to become a Mountain? It turns out that for the probability of being a Mountain to exceed the probability of being a Mount, the eruption would have had to blast 1,200 more feet from the summit. This would have required orders of magnitude more energy.
And Rainier? Well, it turns out Rainier would remain a Mount until it blew off enough of its top to become shorter than one of the summits nearby. If Little Tahoma remained standing, then Rainier would have to lose only 3,300 ft to become Rainier Peak.
Regression can be fun and interesting!