
Hell of Sitecore aliases pipeline breaking the site with 500 error

Hello Friends,

I believe this blog post is important for everyone, because this issue can seriously affect the working of your headless website. I will share what we faced and how we resolved it.

Issue we started facing

Our site started throwing "Key cannot be null or empty" with a YSOD (yellow screen of death).

Side effect

Because of this 500 error, our site pages were intermittently showing the custom 500 error page, and our MAU (Monthly Active Users) drop rate increased.

Sitecore KB

There is already a Sitecore KB article about this error, but the patch provided there is confusing and very large. It is an upgrade patch that bundles a lot of other changes, which could bring other issues along with it, and I did not want to introduce those into our stable CMS.

Known Issues - Retrieving the child items of resource items is not thread-safe

Observation

Although the surfaced exception looked similar and showed the same error and behavior described in that article, we looked closely at the inner exception and stack trace, where we noticed the AliasResolver frames below:

System.ArgumentNullException:
   at System.Collections.Concurrent.ConcurrentDictionary`2.TryGetValue (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Sitecore.Caching.Generics.Cache`1+InnerBox.DoGetEntry (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Caching.Generics.Cache`1.GetValue (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Caching.Generics.Cache`1.ContainsKey (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.Sql.SqlDataProvider.EnsureChildrenPrefetched (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.Sql.SqlDataProvider.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.CompositeDataProvider+<DoGetChildIDs>d__94.MoveNext (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Common.EnumerableExtensions.ForEach (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.CompositeDataProvider.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.DataProvider.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataSource.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.GetChildrenCommand.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Engines.EngineCommand`2.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemProvider.GetChildren (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemProvider.GetChildren (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.ResolvePathCommand.ResolvePath (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.ResolvePathCommand.ResolvePath (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.ResolvePathCommand.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Engines.EngineCommand`2.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemProvider.GetItem (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemManager.GetItem (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.AliasResolver.get_Item (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.AliasResolver.Exists (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.HttpRequest.AliasResolver.Process (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at n/a (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.CorePipeline.Run (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.DefaultCorePipelineManager.Run (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.DefaultCorePipelineManager.Run (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Web.RequestEventsHandler.OnPostAuthenticateRequest (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Web.HttpApplication.ExecuteStepImpl (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Web.HttpApplication.ExecuteStep (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)

I knew what Sitecore's alias resolver does and why it exists, but I did not know why that processor was executing at all when we do not have any aliases defined in Sitecore, so I was a little surprised. Even if it runs, it should just exit, because there are no aliases defined.

So I dug further to find out what is happening behind the scenes. Here are the findings:

1. What is the "AliasResolver"?

The AliasResolver is a standard processor located in the <httpRequestBegin> pipeline. Its primary job is to look at the incoming URL path and determine if it matches a predefined Sitecore Alias (configured under /sitecore/system/Aliases).

If a match is found, it maps that pretty/short URL to the actual content item path in the tree and sets it as Context.Item.


2. Why is it giving "Value Cannot Be Null (Parameter Name: Key)"?

The Layout Service multi-threading: When your headless/JSS application hammers the /sitecore/api/layout/render/jss endpoint, concurrent requests cross paths in the ASP.NET (System.Web) request pipeline, which is where the stack trace above shows the AliasResolver running.

Our observation also revealed that this error only occurs when the request is for "/sitecore/api/layout/render/jss".



3. We are close, but what is the issue and how do we resolve it?

The shared resources cache: To resolve aliases, the AliasResolver checks the cache or queries the child collection of the aliases root. Since Sitecore 10.1 (and therefore in 10.2), several system items (including templates and system settings) live in read-only resource files (.dat files on disk) to speed up performance.

The dictionary race condition: When multiple concurrent Layout Service threads attempt to resolve items or read the children of these resource-backed items at exactly the same time, a race condition occurs within an internal collection (such as PrefetchData or a Dictionary).

The crash: One thread corrupts the internal array or returns a null value where a key string or ID was strictly expected. When a concurrent thread picks it up, the code throws a low-level .NET ArgumentNullException (Value cannot be null. Parameter name: key) or an IndexOutOfRangeException, which bubbles up through the AliasResolver and results in a 500 Internal Server Error.
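To make the top frame of the stack trace concrete, here is a minimal standalone sketch in plain .NET, with no Sitecore involved. It only shows that ConcurrentDictionary.TryGetValue throws an ArgumentNullException with the parameter name "key" when it is handed a null key. It does not reproduce the race itself or Sitecore's internal cache. NullKeyDemo and LookupParamName are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Concurrent;

public static class NullKeyDemo
{
    // Looks up a key and returns the parameter name carried by the
    // ArgumentNullException, or null when the lookup completes normally.
    public static string LookupParamName(ConcurrentDictionary<string, int> cache, string key)
    {
        try
        {
            int ignored;
            cache.TryGetValue(key, out ignored);
            return null;
        }
        catch (ArgumentNullException ex)
        {
            return ex.ParamName;
        }
    }

    public static void Main()
    {
        var cache = new ConcurrentDictionary<string, int>();
        cache["/sitecore/system/Aliases"] = 1;

        // A normal key: no exception, whether or not the key is present.
        Console.WriteLine(LookupParamName(cache, "/sitecore/system/Aliases") ?? "(no exception)");

        // A null key: the same exception type the stack trace ends in.
        Console.WriteLine(LookupParamName(cache, null)); // prints "key"
    }
}
```

So whatever happens further up in the cache layer, the exception in our trace means a null key reached the dictionary lookup.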

Solution

There are three solutions to this issue, and which one applies depends on what kind of issue you are running into. The dictionary race condition can occur without aliases too, so check your stack trace before you apply any of the approaches below.

Approach - 1: Write a Custom Resolver

If you are using aliases in your application, you can create a custom AliasResolver processor that returns immediately, skipping alias resolution, when the current request is directed at the Layout Service endpoint.

using Sitecore.Pipelines.HttpRequest;
using System;

namespace YourNamespace.Pipelines.HttpRequest
{
    public class CustomAliasResolver : AliasResolver
    {
        public override void Process(HttpRequestArgs args)
        {
            // Abort immediately if this is a JSS Layout Service call
            if (args.Url.FilePath.StartsWith("/sitecore/api/layout/render/jss", StringComparison.OrdinalIgnoreCase))
            {
                return;
            }

            // Otherwise, fall back to standard Sitecore Alias resolution
            base.Process(args);
        }
    }
}

Then patch it in via configuration, replacing the default AliasResolver with your custom class:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <processor type="Sitecore.Pipelines.HttpRequest.AliasResolver, Sitecore.Kernel">
          <patch:attribute name="type">YourNamespace.Pipelines.HttpRequest.CustomAliasResolver, YourAssemblyName</patch:attribute>
        </processor>
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>
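The path check in the custom resolver above can also be pulled out into a small helper that is easy to unit test without a running Sitecore instance. This is only a sketch: LayoutServiceRequestFilter and IsLayoutServiceRequest are hypothetical names, and the prefix is the endpoint we saw failing on our site, so adjust it if your Layout Service route differs. Unlike the inline check, it also treats a null or empty path as "not a Layout Service request" instead of throwing.

```csharp
using System;

public static class LayoutServiceRequestFilter
{
    // Endpoint prefix taken from the post; an assumption for other setups.
    public const string LayoutServicePrefix = "/sitecore/api/layout/render/jss";

    // Returns true when the request path targets the JSS Layout Service.
    // The comparison is case-insensitive, matching the inline check.
    public static bool IsLayoutServiceRequest(string filePath)
    {
        return !string.IsNullOrEmpty(filePath)
            && filePath.StartsWith(LayoutServicePrefix, StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside Process, the condition would then read LayoutServiceRequestFilter.IsLayoutServiceRequest(args.Url.FilePath), with the rest of the resolver unchanged.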

Approach - 2: If you are not using aliases, delete the AliasResolver processor completely using a patch

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <processor type="Sitecore.Pipelines.HttpRequest.AliasResolver, Sitecore.Kernel">
          <patch:delete />
        </processor>
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>

Approach - 3: Upgrade to a newer Sitecore version or apply the patch

Our scenario was different, as we were getting a clear alias pipeline stack trace. But if you observe the same error on either a content management or a content delivery instance, with the stack trace given at the link below, please apply the patch from the Sitecore KB article:

https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB1001823

I observed the site for 24 hours after applying our fix and saw no 500 errors. The customer was happy, the drop rate decreased, the site started functioning normally, and MAU increased.

BTW, I have raised a feature request with Sitecore to change the pipeline so that it only runs the resource-file lookup, and deals with the race condition, when aliases are actually used. If they are not used, the site should just work without the upgrade or patch.
