Hello Friends,
I believe this blog post is important for everyone running a headless Sitecore site, because the issue it describes can seriously affect how your site works. I will share what we faced and how we resolved it.
Issue we started facing
Our site started throwing "Key cannot be null or empty" errors with a YSOD (Yellow Screen of Death).
Side effect
Because of this 500 error, our site pages intermittently showed the custom 500 error page, and our MAU (Monthly Active Users) drop rate increased.
Sitecore KB
There is already a Sitecore KB article about this error, but the patch it provides is confusing and very large. It could introduce other issues, because that upgrade patch also contains many other changes that I did not want to bring into our stable CMS.
Known Issues - Retrieving the child items of resource items is not thread-safe
Observation
Although the surfaced exception looked similar and showed the same error and behavior described in that article, we looked closely at the inner exception and stack trace and noticed the following; note the AliasResolver frames in the middle of the trace:
System.ArgumentNullException:
   at System.Collections.Concurrent.ConcurrentDictionary`2.TryGetValue (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Sitecore.Caching.Generics.Cache`1+InnerBox.DoGetEntry (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Caching.Generics.Cache`1.GetValue (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Caching.Generics.Cache`1.ContainsKey (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.Sql.SqlDataProvider.EnsureChildrenPrefetched (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.Sql.SqlDataProvider.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.CompositeDataProvider+<DoGetChildIDs>d__94.MoveNext (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Common.EnumerableExtensions.ForEach (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.CompositeDataProvider.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataProviders.DataProvider.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.DataSource.GetChildIDs (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.GetChildrenCommand.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Engines.EngineCommand`2.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemProvider.GetChildren (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemProvider.GetChildren (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.ResolvePathCommand.ResolvePath (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.ResolvePathCommand.ResolvePath (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Nexus.Data.DataCommands.ResolvePathCommand.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Engines.EngineCommand`2.Execute (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemProvider.GetItem (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.Managers.ItemManager.GetItem (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.AliasResolver.get_Item (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Data.AliasResolver.Exists (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.HttpRequest.AliasResolver.Process (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at n/a (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.CorePipeline.Run (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.DefaultCorePipelineManager.Run (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Pipelines.DefaultCorePipelineManager.Run (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at Sitecore.Web.RequestEventsHandler.OnPostAuthenticateRequest (Sitecore.Kernel, Version=17.0.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Web.HttpApplication.ExecuteStepImpl (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Web.HttpApplication.ExecuteStep (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
I knew what Sitecore's alias resolver does and why it exists, but I did not know why that processor was even executing when we have no aliases defined in Sitecore, so I was a little surprised. Even if it runs, it should just exit, because there are no aliases defined in Sitecore.
So I dug further to understand what was happening behind the scenes. Here are the findings:
1. What is the "AliasResolver"?
The AliasResolver is a standard processor located in the <httpRequestBegin> pipeline. Its primary job is to look at the incoming URL path and determine if it matches a predefined Sitecore Alias (configured under /sitecore/system/Aliases).
If a match is found, it maps that pretty/short URL to the actual content item path in the tree and sets it as Context.Item.
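To make the lookup concrete, here is a deliberately simplified model of what alias resolution does: match the request path against a table of aliases and return the target item path. This is a sketch under assumptions, not Sitecore's implementation. AliasModel, ResolveAlias, the case-insensitive matching and the sample alias are all hypothetical; real Sitecore aliases are items under /sitecore/system/Aliases, read through the item API and its caches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of alias resolution (not Sitecore code).
// In Sitecore, aliases are items under /sitecore/system/Aliases and are read
// through the item API; here they are a plain in-memory table.
public static class AliasModel
{
    // Alias path -> target content item path. Filled once, only read afterwards.
    // The sample entry and case-insensitive matching are assumptions for this sketch.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["/promo"] = "/sitecore/content/Home/Campaigns/Summer-Sale"
        };

    // Returns the target item path when the request path is an alias, or null
    // when normal item resolution should continue.
    public static string ResolveAlias(string requestPath)
    {
        if (string.IsNullOrEmpty(requestPath))
        {
            return null;
        }

        // Treat "/promo" and "/promo/" as the same alias.
        string normalized = requestPath.TrimEnd('/');
        string target;
        return Aliases.TryGetValue(normalized, out target) ? target : null;
    }
}
```

Because this table is fully built before any request reads it, concurrent lookups are safe: Dictionary<TKey, TValue> supports multiple concurrent readers as long as nothing modifies it. The failure described in this post comes from the opposite situation, a shared cache being populated while other threads read it.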
2. Why does it throw "Value cannot be null. Parameter name: key"?
The Layout Service multi-threading: when your headless/JSS application hammers the /sitecore/api/layout/render/jss endpoint, many concurrent requests run through the ASP.NET request pipeline at the same time (the System.Web pipeline, as the stack trace shows).
Our observation also revealed that this error only occurs for requests to "/sitecore/api/layout/render/jss".
3. We are close, but what exactly is the issue, and how do we resolve it?
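The shared resources cache: to decide whether a URL is an alias, the AliasResolver first loads the aliases root item and reads its children through Sitecore's item and data caches. The stack trace suggests this happens on every request, even when no aliases are defined: Sitecore.Data.AliasResolver.Exists calls get_Item, which resolves the item path via ResolvePathCommand and GetChildren, all the way down to SqlDataProvider.EnsureChildrenPrefetched. Starting with Sitecore 10.1, several system items (including templates and system settings) are shipped as read-only resource files (.dat files on disk) to improve performance, so resolving this path can touch the children of resource-backed items.
The dictionary race condition: when multiple concurrent Layout Service threads resolve items or read the children of these resource-backed items at exactly the same time, a race condition occurs inside an internal collection (such as the prefetch data or a cache dictionary).
The crash: one thread corrupts the internal state, so a concurrent thread ends up with a null value where a key string or ID was strictly expected. When that null reaches the cache lookup (Cache`1.ContainsKey calling ConcurrentDictionary.TryGetValue, the top frames of the trace), .NET throws a low-level ArgumentNullException: Value cannot be null. Parameter name: key (an IndexOutOfRangeException is also possible). The exception bubbles up through the AliasResolver and the request fails with a 500 Internal Server Error.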
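The top frame of the trace can be reproduced in isolation. The sketch below does not recreate the Sitecore race itself, which lives in internal caching code; it only shows the symptom: ConcurrentDictionary.TryGetValue rejects a null key with an ArgumentNullException whose ParamName is "key", which is what surfaces once the race has produced a null cache key. NullKeyDemo and LookupAndReportParamName are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Concurrent;

// Reproduces only the symptom at the top of the stack trace, not the race.
public static class NullKeyDemo
{
    // Stand-in for a shared cache; the real one is Sitecore's Cache`1.
    private static readonly ConcurrentDictionary<string, string> SharedCache =
        new ConcurrentDictionary<string, string>();

    // Looks up a key the way a cache ContainsKey check would, and returns the
    // ParamName of the ArgumentNullException if the key is rejected, or null
    // if the lookup completed normally (hit or miss).
    public static string LookupAndReportParamName(string key)
    {
        try
        {
            string value;
            SharedCache.TryGetValue(key, out value);
            return null;
        }
        catch (ArgumentNullException ex)
        {
            // This is the "Value cannot be null. Parameter name: key" error.
            return ex.ParamName;
        }
    }
}
```

Calling LookupAndReportParamName(null) returns "key", while any non-null key simply results in a hit or a miss. The real fix therefore has to stop the race from producing a null key, or keep hot requests away from the code path that races.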
Solution
There are three solutions to this issue, and the right one depends on what you are running into. The dictionary race condition can also occur without aliases, so check your stack trace before applying any of the approaches below.
If you are using aliases in your application, you can create a custom AliasResolver processor that exits immediately when the current request is directed at the Layout Service endpoint.
Approach - 1: Write a Custom Resolver
using Sitecore.Pipelines.HttpRequest;
using System;

namespace YourNamespace.Pipelines.HttpRequest
{
    public class CustomAliasResolver : AliasResolver
    {
        public override void Process(HttpRequestArgs args)
        {
            // Abort immediately if this is a JSS Layout Service call
            if (args.Url.FilePath.StartsWith("/sitecore/api/layout/render/jss", StringComparison.OrdinalIgnoreCase))
            {
                return;
            }

            // Otherwise, fall back to standard Sitecore Alias resolution
            base.Process(args);
        }
    }
}
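The guard in CustomAliasResolver can also be pulled out into a plain method, so it can be unit-tested without a running Sitecore instance. A minimal sketch; LayoutServiceRequestFilter and IsLayoutServiceRequest are hypothetical names, not part of Sitecore:

```csharp
using System;

// Hypothetical helper (not a Sitecore API): the Layout Service guard from
// CustomAliasResolver as a plain static method, testable without Sitecore.
public static class LayoutServiceRequestFilter
{
    // The JSS Layout Service route this post guards against.
    private const string LayoutServicePrefix = "/sitecore/api/layout/render/jss";

    // True when the request file path targets the Layout Service endpoint.
    // Null or empty paths return false instead of throwing.
    public static bool IsLayoutServiceRequest(string filePath)
    {
        return !string.IsNullOrEmpty(filePath)
            && filePath.StartsWith(LayoutServicePrefix, StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside CustomAliasResolver.Process the check would then read `if (LayoutServiceRequestFilter.IsLayoutServiceRequest(args.Url.FilePath)) return;`. The null/empty check is purely defensive. Like the original guard, this is a prefix match, so every path that begins with the Layout Service route skips alias resolution.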
Then patch it in via configuration, replacing the default AliasResolver with your new class:
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <processor type="Sitecore.Pipelines.HttpRequest.AliasResolver, Sitecore.Kernel">
          <patch:attribute name="type">YourNamespace.Pipelines.HttpRequest.CustomAliasResolver, YourAssemblyName</patch:attribute>
        </processor>
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>
Approach - 2: Delete the AliasResolver processor completely with a config patch, if you are not using aliases
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <processor type="Sitecore.Pipelines.HttpRequest.AliasResolver, Sitecore.Kernel">
          <patch:delete />
        </processor>
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>
Approach - 3: Upgrade to a newer Sitecore version or apply the patch
Our scenario was different, because our stack trace clearly went through the alias pipeline. But if you see the same error on either the Content Management or the Content Delivery instance, with the stack trace given in the KB article below, apply the patch from that Sitecore KB:
https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB1001823
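The choice between the three approaches comes down to what the stack trace shows and whether you use aliases. As a sketch of that rule of thumb (a simplification based on this post, not an official Sitecore diagnostic), StackTraceTriage and SuggestedFix below are hypothetical names that look for the AliasResolver.Process frame seen in our trace:

```csharp
using System;

// Hypothetical triage helper encoding this post's rule of thumb; it only
// inspects text and does not call Sitecore.
public enum SuggestedFix
{
    SkipAliasResolverForLayoutService, // Approach 1: aliases are in use
    RemoveAliasResolver,               // Approach 2: no aliases are defined
    ApplySitecorePatch                 // Approach 3: KB patch or upgrade
}

public static class StackTraceTriage
{
    // Frame showing that the exception came through the alias processor.
    private const string AliasProcessorFrame =
        "Sitecore.Pipelines.HttpRequest.AliasResolver.Process";

    public static SuggestedFix Suggest(string stackTrace, bool aliasesInUse)
    {
        if (stackTrace == null)
        {
            throw new ArgumentNullException(nameof(stackTrace));
        }

        // Without the alias frame, the resource-item race from the KB article
        // is the more likely cause, and the pipeline tweaks are unlikely to help.
        if (stackTrace.IndexOf(AliasProcessorFrame, StringComparison.Ordinal) < 0)
        {
            return SuggestedFix.ApplySitecorePatch;
        }

        return aliasesInUse
            ? SuggestedFix.SkipAliasResolverForLayoutService
            : SuggestedFix.RemoveAliasResolver;
    }
}
```

A plain substring match is enough here because .NET stack traces print fully qualified method names; treat the result as a starting point for your own investigation, not a verdict.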
I monitored the site for 24 hours after applying our fix: no 500 errors and a happy customer. The drop rate decreased, the site started functioning normally, and MAU increased.
BTW, I have raised a feature request to change the pipeline so that it handles the race condition on resource files when aliases are used, and, when aliases are not used, simply works without requiring an upgrade or patch.
