This morning we ran into a really weird issue with SharePoint 2013. Out of nowhere, our intranet SharePoint portal would not load. It just timed out. After two minutes, you would see the "Sorry, something went wrong" text and a correlation ID. When I looked up that ID in the logs folder, I saw that /default.aspx was timing out.
My first thought was that it was a misbehaving web part, so I tried to load the web part maintenance page by appending ?contents=1 to the URL. That also timed out. I did some research and found you can extend the timeout by modifying the executionTimeout attribute on the httpRuntime tag in the web.config. I set it to 24 hours. Nothing was going to time out now!
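For anyone who wants to script that change, here is a minimal sketch in Python. The sample web.config below is a made-up, stripped-down stand-in (a real SharePoint web.config has far more in it), and set_execution_timeout is just a helper name invented for this example. The one thing to watch is that executionTimeout is given in seconds, so 24 hours is 86400. It is also worth checking that the compilation element has debug="false", since ASP.NET only applies this timeout in that case.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down web.config used only for illustration.
SAMPLE_WEB_CONFIG = """<configuration>
  <system.web>
    <httpRuntime maxRequestLength="51200" />
  </system.web>
</configuration>"""


def set_execution_timeout(config_xml: str, seconds: int) -> str:
    """Return config_xml with httpRuntime's executionTimeout set to `seconds`."""
    root = ET.fromstring(config_xml)
    # Grab the first <httpRuntime> element anywhere in the document.
    runtime = next(root.iter("httpRuntime"), None)
    if runtime is None:
        raise ValueError("no <httpRuntime> element found")
    runtime.set("executionTimeout", str(seconds))
    return ET.tostring(root, encoding="unicode")


# 24 hours expressed in seconds.
updated = set_execution_timeout(SAMPLE_WEB_CONFIG, 24 * 60 * 60)
print(updated)
```

Remember to put the value back once the problem is fixed; a 24-hour request timeout is a debugging aid, not a setting to leave in production.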
Well, after reloading the page and waiting about five minutes, the page did finally finish loading. But the Quick Launch just said "error." I couldn't find any information on the internet or in the logs about what was going on, but at least now I had it narrowed down to (probably) the Quick Launch. We were also able to access all the sub-sites that had different Quick Launches. The evidence was growing.
I tried to go into the site settings navigation section. It took nearly an hour and I just gave up. I wondered if I could peek into the Quick Launch with a SQL query. It was at this time that my colleague informed me that the other day he noticed entries duplicating in the Quick Launch any time one user published a particular document. He didn't know why and since it was just a visual issue and not pressing, he left it alone. Could this be a clue?
Well, I poked around in the SharePoint contentdb for the site. Surprisingly, the Quick Launch is actually somewhat simple to query. You can SELECT * FROM the NavNodes table and there you go. When I did this, there were over 3500 rows. That's a lot for a navigation bar, but if you think about it, it's not an obscenely high number. Then again, who knows what sort of joins SharePoint performs between this table and others in the normal course of operation. I scrolled through the results and one entry kept repeating. Yup, the same one my colleague had seen the other day. Only instead of repeating twice or four times, it was repeating thousands of times.
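Scrolling through thousands of rows works, but a GROUP BY makes the repeats jump out. Here is a rough sketch of the idea, run against an in-memory SQLite table so it can be tried anywhere. The table here is a simplified stand-in for NavNodes: the column list is an assumption for illustration (the real table has more columns), and the rows are made up. The same SELECT should translate to SQL Server with little or no change.

```python
import sqlite3

# In-memory stand-in for the content database; NOT a real SharePoint schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NavNodes ("
    "Eid INTEGER PRIMARY KEY, EidParent INTEGER, Name TEXT, Url TEXT)"
)

# Made-up sample data: two normal links plus one entry repeated many times.
rows = [
    (1, 0, "Home", "/default.aspx"),
    (2, 0, "Documents", "/Docs"),
]
rows += [(100 + i, 0, "Quarterly Report", "/Docs/report.docx") for i in range(3000)]
conn.executemany("INSERT INTO NavNodes VALUES (?, ?, ?, ?)", rows)

# Count how many times each (Name, Url) pair appears and keep only repeats.
DUPLICATE_QUERY = """
SELECT Name, Url, COUNT(*) AS Copies
FROM NavNodes
GROUP BY Name, Url
HAVING COUNT(*) > 1
ORDER BY Copies DESC
"""

duplicates = conn.execute(DUPLICATE_QUERY).fetchall()
for name, url, copies in duplicates:
    print(f"{name} ({url}) appears {copies} times")
```

Because this only reads from the table, it is a much safer first step than anything that modifies the database.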
This sure seemed suspect. An entry repeated over and over in the Quick Launch, and the Quick Launch generating an error after timing out the page? That could be the culprit. But how do we get rid of these bad entries? Deleting directly from the database is not a Microsoft-supported scenario (at least not without their technicians performing the deletion). Time was of the essence as the site was down. So I took a backup of the contentdb, threw caution to the wind, and deleted all the rows that matched the repeated entry's criteria.
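The backup-then-delete step looks roughly like the sketch below. This is not the exact statement that was run: it uses an in-memory SQLite table as a stand-in, the columns and rows are invented for illustration, and copying the table into NavNodes_Backup (a name made up here) stands in for taking a full backup of the contentdb. The usual warning applies: editing a SharePoint content database by hand is unsupported.

```python
import sqlite3

# In-memory stand-in for the content database; NOT a real SharePoint schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NavNodes (Eid INTEGER PRIMARY KEY, Name TEXT, Url TEXT)")

# Made-up sample data: one normal link plus one entry repeated many times.
rows = [(1, "Home", "/default.aspx")]
rows += [(100 + i, "Quarterly Report", "/Docs/report.docx") for i in range(3000)]
conn.executemany("INSERT INTO NavNodes VALUES (?, ?, ?)", rows)

# Step 1: keep a copy of everything before touching it.
conn.execute("CREATE TABLE NavNodes_Backup AS SELECT * FROM NavNodes")
backed_up = conn.execute("SELECT COUNT(*) FROM NavNodes_Backup").fetchone()[0]

# Step 2: delete every row matching the repeated entry's criteria.
bad_name, bad_url = "Quarterly Report", "/Docs/report.docx"
deleted = conn.execute(
    "DELETE FROM NavNodes WHERE Name = ? AND Url = ?", (bad_name, bad_url)
).rowcount
conn.commit()

# Step 3: confirm what is left.
remaining = conn.execute("SELECT COUNT(*) FROM NavNodes").fetchone()[0]
print(f"backed up {backed_up} rows, deleted {deleted}, {remaining} remaining")
```

Checking the deleted and remaining counts against what the earlier SELECT showed is a cheap sanity check before reloading the site.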
It was time to load the website. About five to ten tense seconds later, the site loaded. Even better, every reload after that took no time at all! We fixed it!
One strange issue after the page loaded was that some of the repeated entries were still in the Quick Launch, but only about 30 this time. They were easy to remove from the Quick Launch using the UI. I suspect they were left over either from a cache or from some other table involved in constructing the Quick Launch (maybe the most recently visited pages list?).
Anyway, now everything seems to be back to normal. Carry on, SharePoint, you crazy beast.