Operation Clean up SummitPost

Page Type: Article

Object Title: Operation Clean up SummitPost

 

Page By: Josh Lewis

Created/Edited: May 2, 2012 / Jun 1, 2013

Object ID: 788157

Hits: 2471 

Page Score: 84.82%  - 19 Votes 


Do Your Part

For a long time, SummitPost has had periods where it runs slower than your average website. This is caused by all the great contributions, which make the database run slowly. SummitPost is one of the greatest mountaineering websites of all time, and I feel this is an issue where we can help make a difference. This article was created to show you some of the things you can do.



I feel as though SP is coming into a new era where we as members have to be responsible about how we post and about which unnecessary extras we keep on the site. Take a stand and join the operation.

Private Messages

Many active members, myself included, send lots of private messages; over the years these can add up to hundreds or even thousands. If you never look at them, it's time to remove them. But what about the private messages you want to keep? Simply archive them, which separates them from the main inbox and makes them easier to find. You can remove private messages in bulk by hitting the "select all" link at the bottom of the inbox page and then hitting the delete button (or archive if you want to save them). This may take a while depending on how many private messages you have built up. Even deleting just some of your inbox messages does SP a favor, so don't feel obligated to go crazy with it. But if you're willing, be sure to do the same with the Sent Box.

Select All in the Inbox
Select All

Sent Box

Save a PM you Liked
Archive the Private Messages you want to Keep

Page Versions

From my understanding, page versions are a huge part of the problem. There are over 35,000 pages posted here on SP (not including images), most of which have backup versions. These are created to help restore information after accidental removal of important text and in similar scenarios. There are 3 backups per page, so in theory there are 105,000 pages from backups alone! I'm not saying backups are a bad thing, but if you're the page owner and/or have a trusted admin for your page, then you should remove the backups. To do this, go to any of your pages and click the page menu link called "View History". Near the bottom left there is a link that says "All", which checks all the versions; then hit the button called "Delete Selected Versions". This will not harm your page in any way.

Page Version History
Deleting Former Page Versions

Be sure to do this after making all or most of your revisions to a page, or else the backups will appear again. Keep in mind that if a page has a lot of editors or an admin you don't fully trust, you should keep the backups for that page. Then do the same with more of your pages to help save space and reduce database load.

Resizing Images

Resizing pictures is another important thing you can do. Unfortunately it does very little for the database, but it does save space on the SummitPost server, which over time might gain a little speed if many people start doing this. Use your favorite resizing program; I use Picasa 3 and resize pictures with its export mode, which can be seen on YouTube. You can also find online image resizers, though resizing pictures one by one may take longer. Resizing your images will also speed up the upload process.

Other Things

There are a few other things we can do to help, like deleting pictures that are irrelevant or not very informative, and deleting old forum posts that are no longer helpful to SP. If you can't delete something yourself, ask the elves to delete it.

Deleting a Forum Post
Delete Forum Posts that no longer help the Community

When posting pictures, try to make them count. I take thousands of photographs and often have a hard time choosing between ones that are similar but not exactly the same. Some might post both pictures, but I discourage this, not only because of the space taken up but also for the sake of the audience who browse through the pictures.

Back in the "good old days" of SP you could post all kinds of stuff and not worry about it. SP has gained a lot more content since then and needs more care about what gets posted. It is more important than ever that posts are relevant to SP. I know it's going to sound a little strange, but here is one way to think of it: back in the day you could camp in the wilderness wherever and whenever you wanted. Nowadays there are many designated camping zones and camping limitations, set in place to help preserve the wilderness.

I, Josh Lewis, have done pretty much everything in this article, and I hope others can join in as well. Keep in mind that SP may still run slowly even with a lot of people participating, but every bit helps. It's worth a shot, so I say go for it!

About this Article: If you have anything to add or correct, please let me know. I hope that no one was offended in any way; I just want what's best for SummitPost.


Comments



Thanks!

Kiefer

Hasn't voted

Thanks for this short tutorial, Josh. Anything to get SP to run faster, I'm all for.
Appreciate the tips.
Posted May 2, 2012 12:49 am

Re: Thanks!

Josh Lewis

Hasn't voted

I just hope people take this seriously.
Posted May 21, 2012 7:32 pm

This is great, Josh!

Redwic

Voted 10/10

It shows a lot of maturity. Thanks for the tips!

In fact, I am starting to clean-up my Private Message space, right now! In doing so, here is a helpful suggestion...
-> Many of the Private Messages in my Inbox and Sentbox are parts of reply strings. If someone has a repeated PM string (reply/reply/reply) happening back-and-forth with another SP member, and if all of the prior PMs in the string are still in the latest PM, then just keep the latest one if you don't want to lose the information within the PM correspondence. Then you don't have a bunch of space being taken for stuff already written/saved.

Here is another suggestion:
-> When a person is creating a new Mountain page, that same person should not make a separate Route page for the standard approach. That separate Route page (and its corresponding backup pages) creates *a lot* of unnecessary space on SummitPost. (An exception might be if a particular route/trail is used for multiple peaks.)

Just include the standard approach information on the main Mountain page, and leave Route pages only for alternate route options (if not already listed on the main page). The same goes for those who add an Album or Trip Report when they create a Mountain page, when those pages basically have the same information and/or photos as the main page. It just duplicates information and at least triples SP space (due to backup pages). Several SP members still do these unnecessary multiple-add contributions on a regular basis, and it needs to be cut down to help free-up space on the overall website. Heck, even I have been guilty of doing it a few times, in the past. But now, especially because of this article, I will be more cognizant of the ramifications in the future.
Posted May 2, 2012 5:19 am

Re: This is great, Josh!

Sarah Simon

Hasn't voted

Redwic,

I need to respectfully and completely disagree with the "lump everything on one page" approach. SP was designed with a hierarchical structure (Area/Range >> Mountain/Rock >> Route...etc.) and it's a great way to keep things organized and easy to browse and locate. When we ask too much of a mountain page - cramming multiple peaks plus routes plus a mini trip report - into the page, then we are not leveraging the native organizational structure of SP.

(For the record, I have included route info for the simplest of routes in my pages under the "Getting There" section. Another contributor is always welcome to submit a more detailed description of the primary route, if desired. Typically in my pages, however, if the route description extends beyond a map plus mileage and elevation gain, I'll break it out into a route page.)

Sarah
Posted May 2, 2012 12:06 pm

Re: This is great, Josh!

Josh Lewis

Hasn't voted

Sarah, I don't think you fully understand what Redwic is getting at. Redwic and I have seen a few SP members post route pages on mountains that already have a description for the main route. In fact I've seen it go as far as the page owner makes a very simple mountain page and then creates a route page for the standard route even though they own the main mountain page. Many of them do it for the points. Granted there are a few scenarios which I consider making a "standard route" page for a mountain okay.

1. If you contact the page owner to add more standard route information to the main mountain page and they refuse, this is in my opinion enough cause to grant you a fair way of making a route page, as long as you provide more information on the route than the mountain page that is "low quality / less than what you would put".

2. If the mountain is world famous and/or has lots of information, then including the standard route might overwhelm the reader. I've posted about this on another site, where people said a route page is preferred.

3. If the mountain page owner provided zero or almost no information on the standard route, you could either contact the owner or create a route page for that route.

In regards to the other things you said, Redwic was not encouraging us to put our trip reports or cram multiple mountains onto one page.
Posted May 2, 2012 1:06 pm

Re: This is great, Josh!

Redwic

Voted 10/10

I don't want to lump everything into one page. I never wrote that, either. There are some SP members who make a Mountain page, but then use all of the exact same information and/or photos to make Albums and TRs. It is redundant. As for routes, it is redundant for the Mountain page creator to make a Route page for a standard approach if the description and photos for that standard approach can already be put on the main page. Certainly, there are exceptions... such as some routes which require more in-depth descriptions and photos so people know exactly what to do and when to do them, or famous mountains that have many routes.

Again, the purpose of this article, as well as my original response to it, is/was to show there are ways which SummitPost space can be cleaned up. Whether or not you choose to assist with those suggestions is your choice.

UPDATE: When I submitted this note, I did not realize that Josh had already just responded. He gets what I was trying to say.
Posted May 2, 2012 1:11 pm

Re: This is great, Josh!

Sarah Simon

Hasn't voted

Thanks to both of you (Redwic and Josh) for your dedication to this site.

I agree to disagree.

Contributors who wish to do a thorough job of submitting material within the hierarchical structure baked into the SP system should not be discouraged from doing so. They most assuredly should not be implicated in "point hoarding" for leveraging the native SP hierarchy to share detailed beta with the community.

I recall one contributor, in particular, who does not contribute often, but when he does, he submits a "full suite" of material (Mountain/Rock, Route & Trip Report). Each type of contribution he submits adds value to the site. There is little/no material overlap between the contributions. I can't fathom faulting him for this, and am grateful he spends the extra effort submitting the unique components to share his experience with the community.
Posted May 2, 2012 1:42 pm

The amount of data is not the issue

rgg

Hasn't voted

If a modern IT system is slow, the cause is rarely the amount of data in the database, so, without further details about the Summitpost database, or better yet, measurements of certain actions, I don't believe this is the case with Summitpost.

The strongest argument for this is that it's not always slow, just occasionally. If the amount of data was to blame, it would be consistently slow.

My other argument is based on how databases work. Suppose you have one million items of something, say with an id, and an index to go with it. You want to retrieve something. Now, think of the index as a book of one million pages. Flip it open in the middle. Is the id on that page lower than what you are looking for? Then the item must be in the upper half, and now you only have half a million pages to search. So, open the book at 3/4 and repeat.

Starting at one million, you have to halve the search space 20 times to get exactly to the right page. And here is the thing: for two million, it only takes one more time, so 21 times: only 5% slower!
Looking at it another way: if the number of items increases from a thousand to a million, the index search time only doubles!
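
To put numbers on the book analogy, here is a minimal Python sketch that counts how many times a search space must be halved before a single page is left. The function name `halving_steps` is made up for this illustration, and nothing here is taken from the actual SummitPost database; real indexes are usually tree structures that split into more than two parts per step, but the slow growth is the same idea.

```python
def halving_steps(n_items: int) -> int:
    """Count how often n_items must be halved until one item remains (worst case)."""
    steps = 0
    remaining = n_items
    while remaining > 1:
        remaining = (remaining + 1) // 2  # keep the larger half to model the worst case
        steps += 1
    return steps


# Hypothetical table sizes taken from the argument above.
for n in (1_000, 1_000_000, 2_000_000):
    print(f"{n:>9} items: {halving_steps(n)} halvings")
```

Going from a thousand to a million items doubles the count from 10 to 20, and doubling again to two million adds a single step.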

Of course, the index page is not yet the item you were looking for, but it will have an address on a hard drive, and with that, the item can be retrieved in milliseconds.
So you might say, perhaps Summitpost doesn't have good indexes? Well, indexes exist or they don't, but they don't disappear temporarily. So, the fact that Summitpost is not slow all the time means that there are suitable indexes.

In general, if a database is slow occasionally, while it isn't even used heavily, I would look at other things than the amount of data, starting with locking issues. If something is being stored in the database, there will be a lock. Preferably for a very short time, because a lock can mean that certain parts of the database are inaccessible for certain actions. The biggest problems can occur if there are several simultaneous processes that need several parts of the database. Say, two processes want to insert something in two tables, A and B. Here is a nice scenario:

Process 1 starts with table A and gets a lock on (part of) table A until it's finished.
Process 2 starts with table B and also gets a lock, on (part of) table B until it's finished.

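To finish, process 1 still needs table B and process 2 still needs table A, so each waits for a lock the other will not release. Now we have a deadlock.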
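
The scenario can be imitated in a few lines of Python, using two threads as the "processes" and two plain locks as the "tables". This is only a sketch under those assumptions, not how any particular database works; the names `lock_a`, `lock_b` and `process` are invented here, and a short timeout is used so the demonstration ends instead of hanging forever.

```python
import threading

lock_a = threading.Lock()  # stands in for the lock on (part of) table A
lock_b = threading.Lock()  # stands in for the lock on (part of) table B
in_step = threading.Barrier(2)  # keeps the two "processes" in lockstep
got_second_lock = {}


def process(name, first, second):
    with first:
        in_step.wait()  # both processes now hold their first lock
        # A real deadlock would wait forever; the timeout lets the demo finish.
        got_second_lock[name] = second.acquire(timeout=0.2)
        in_step.wait()  # keep holding the first lock until both attempts end
        if got_second_lock[name]:
            second.release()


p1 = threading.Thread(target=process, args=("process 1", lock_a, lock_b))
p2 = threading.Thread(target=process, args=("process 2", lock_b, lock_a))
p1.start()
p2.start()
p1.join()
p2.join()
print(got_second_lock)  # neither process obtained its second lock
```

Both entries come out as False: each process holds what the other needs, and neither can move on until something gives up.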

But, you say, if I only want to upload a picture, I'm only using one table, right? Yes, you are, but you're also using an index on that table, which will be locked too. And perhaps you're attaching that picture to a mountain at the same time, so you're locking something else that keeps track of these attachments. And so on ...

A modern database system detects deadlocks and tries to resolve them, for example by simply killing one of the two processes. The program using the database can employ a scheme of repeated attempts to get the process to go ahead even if it fails the first time. And no matter how good a database is, such a scheme can slow things down.
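
A scheme of repeated attempts might look roughly like the Python sketch below. Everything in it is hypothetical: `DeadlockVictim`, `run_with_retries` and `upload_picture` are names invented for this illustration, and the pretend upload simply fails twice before succeeding. The waiting between attempts is exactly where the slowdown comes from.

```python
import time


class DeadlockVictim(Exception):
    """Hypothetical error: the database killed this attempt to break a deadlock."""


def run_with_retries(action, max_attempts=3, base_delay=0.01):
    """Call action(); if it was killed as a deadlock victim, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except DeadlockVictim:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            time.sleep(base_delay * attempt)  # each retry waits a little longer


attempts = []


def upload_picture():
    """Pretend upload that is killed twice before it goes through."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockVictim()
    return "saved"


result = run_with_retries(upload_picture)
print(result, "after", len(attempts), "attempts")
```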

Apart from deadlocks, some programming patterns can slow things down by acquiring locks when it isn't necessary, making other processes wait. For example, a bad choice would be to acquire a lock upon opening a window in which to create or edit an item. The lock could persist until the change is submitted, the window is closed or the connection is lost - which may take a long time. Contrary to deadlocks, the database system cannot easily detect and resolve a problem like this. At best, after a certain amount of time, the database can kill such a process.
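
The difference between the two patterns can be sketched in Python with a single lock standing in for the database lock on one item. This is an assumption-laden toy, not a description of SummitPost's code; `item_lock` and `other_user_can_save` are names made up for the example.

```python
import threading

item_lock = threading.Lock()  # hypothetical stand-in for a database lock on one item


def other_user_can_save() -> bool:
    """Try to take the lock without waiting, the way a second user's save would."""
    acquired = item_lock.acquire(blocking=False)
    if acquired:
        item_lock.release()
    return acquired


# Pattern 1: the lock is taken as soon as the edit window opens.
item_lock.acquire()
blocked_during_edit = not other_user_can_save()  # the other user has to wait
item_lock.release()  # only released on submit, close or lost connection

# Pattern 2: edit without a lock, then lock only for the brief save.
draft = "updated route description"  # typing happens with no lock held
free_during_edit = other_user_can_save()
with item_lock:
    saved_text = draft  # the lock covers just this short write

print(blocked_during_edit, free_during_edit)
```

In the first pattern the second user is locked out for as long as the window stays open; in the second, the lock is only held for the moment of saving.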
Posted May 2, 2012 6:48 am

Re: The amount of data is not the issue

Josh Lewis

Hasn't voted

Thanks, Rgg, for your detailed post. This is something you should send to Matt to look into. Perhaps this could lead to a solution if something is found... but on another note, the amount of data on SP is certainly large, just by looking at the statistics. If it wasn't the amount of data, then why would Mbpost.com be running at such a speedy rate? I'm not trying to knock down the locking theory you mentioned, but what I'm saying is that at the very least the amount of data affects this situation. A little while back SP had a complete database migration, and if I remember right SP is now running on a modern server.
Posted May 2, 2012 1:16 pm

entries in the DB, not size

MoapaPk

Hasn't voted

I'd guess MBpost had far fewer entries in the DB of pointers.

The individual entries we delete are usually trivial in size, compared to the photos stored on SP servers. However, we are trying to remove entries in the grand DB. I've done my part, but I'd guess it is a trivial part compared to all the little-used links the DB must hold. Lots of stuff left from people who have gone missing, or people who just don't care.
Posted May 2, 2012 6:02 pm

Re: entries in the DB, not size

Josh Lewis

Hasn't voted

If I read you right, you're proving my point more. Let's say everyone did what this article mentions; I know for a fact SummitPost would be faster. A while back Matt cleared out inactive members, and for a while SP sped up quite a bit.
Posted May 21, 2012 7:36 pm
