Tuesday, March 28, 2017

Yet Another ADFS Looping Issue

I recently applied the Dynamics 365 Update to a CRM 2016 Service Pack 1 environment that was set up for IFD and ran into some unexpected behavior. This environment had about 15 organizations set up in it. Logging on through the external URLs (e.g. orgname.domain.com) worked without issue, but when using the internal URL for any organization (e.g. crm.domain.com/orgname), CRM and ADFS would redirect back and forth in a continuous loop. Sometimes it would error out; sometimes it would just keep looping forever.

I started off by checking Event Viewer on both the CRM and ADFS servers. I noticed that on the occasions it errored out after a few loops instead of looping nonstop, ADFS logged an event similar to one I had seen before, caused by a bug in Update 0.1 for CRM 2016 (http://blog.gagepennisi.com/2016/04/crm-2016-update-01-bug-with-adfs-for.html):

MSIS7042: The same client browser session has made '6' requests in the last '17' seconds. Contact your administrator for details.

So immediately I thought “Great, here we go again.” Meanwhile, there was nothing in Event Viewer on the CRM server.

Hours of trying different troubleshooting tactics from similar situations (including this one with the same looping behavior, but on the external URL: http://blog.gagepennisi.com/2016/01/adfs-logon-page-loop-issue.html) yielded nothing. I rebuilt the Relying Party Trusts, reconfigured IFD, created a self-signed certificate to rule out a certificate issue, and so on.

Rather than continuing to guess and check, I realized I needed more information about the problem, so I started with a Fiddler trace (a great web debugging tool, if you haven’t heard of it: http://www.telerik.com/fiddler). Unfortunately, in this case all it did was confirm what I was seeing in the browser: a constant redirect between D365 and ADFS.


After that, I ran a platform trace on the CRM server to see if CRM would give me any insight, but nothing really stood out to me. Almost at my wit’s end, I roped in my colleague Dan Francis to get another set of eyes on it. After reviewing the platform trace, he noticed one little line that seemed a bit odd:

>Multi-org sharable cache loading system and non-system metadata with build number 7.0.0.3543 and language 1033

CRM 2016 is version 8.0 (8.1 with Service Pack 1) and D365 is 8.2, so seeing any reference to 7.0.x.xxxx (CRM 2015) was not in line with what we expected. We checked Deployment Manager and realized there was an old, disabled organization from CRM 2015 that had never been upgraded (because it was disabled at the time of the original CRM 2016 upgrade). After some deliberation, we decided to remove that organization from the deployment and, boom, the internal URL began working as expected!
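If you want to check your own platform trace for the same symptom, a small script can pull the build number out of each cache-load line and flag any that don't match the deployment. This is a minimal sketch, assuming the trace has been saved as plain text, that the line format matches the one quoted above, and that the expected deployment version is 8.2 (D365). The function name stale_cache_builds is hypothetical, made up for illustration.

```python
import re

# Matches cache-load lines like the one from our platform trace, e.g.
# "Multi-org sharable cache loading system and non-system metadata with build number 7.0.0.3543 and language 1033"
CACHE_LINE = re.compile(
    r"Multi-org sharable cache loading .*?build number (\d+)\.(\d+)\.(\d+)\.(\d+)"
)


def stale_cache_builds(trace_text, expected=(8, 2)):
    """Return build numbers from cache-load lines whose major.minor differs from `expected`.

    `expected` defaults to (8, 2), assumed here to be the D365 on-premises version.
    """
    stale = []
    for match in CACHE_LINE.finditer(trace_text):
        build = tuple(int(part) for part in match.groups())
        if build[:2] != expected:
            stale.append(".".join(str(part) for part in build))
    return stale


sample = (
    "Multi-org sharable cache loading system and non-system metadata "
    "with build number 7.0.0.3543 and language 1033\n"
)
print(stale_cache_builds(sample))  # ['7.0.0.3543']
```

Pointing it at a full trace file (read with open(...).read()) would list every out-of-place build it finds; an empty list means no cache-load line referenced an older version.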

Basically, what we were able to surmise from this is that when the internal URL is used, all organizations, whether enabled or disabled, are checked for their version, and the lowest build number is used to load this “multi-org sharable cache.” Apparently D365 doesn’t play nicely with older versions hanging around. Moral of the story: update/upgrade all of your orgs, or just remove them.
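To make that surmise concrete, here is a toy Python model of it. This is not how CRM is actually implemented (we never confirmed the internals); it just encodes our hypothesis that every organization counts regardless of whether it is enabled, and that the lowest build wins. The org names and the 8.2.0.0 build are hypothetical placeholders; 7.0.0.3543 is the build from our trace.

```python
from dataclasses import dataclass
from typing import List, Tuple

Version = Tuple[int, int, int, int]


@dataclass
class Org:
    name: str
    version: Version
    enabled: bool


def cache_build(orgs: List[Org]) -> Version:
    # Hypothesis only: the enabled flag is ignored and the lowest build across ALL orgs is used.
    return min(org.version for org in orgs)


def orgs_to_fix(orgs: List[Org], deployment: Tuple[int, int]) -> List[str]:
    # Any org whose major.minor lags the deployment should be upgraded or removed.
    return [org.name for org in orgs if org.version[:2] < deployment]


orgs = [
    Org("contoso", (8, 2, 0, 0), enabled=True),     # hypothetical upgraded org, placeholder build
    Org("oldorg", (7, 0, 0, 3543), enabled=False),  # hypothetical disabled CRM 2015 org
]
print(cache_build(orgs))          # (7, 0, 0, 3543)
print(orgs_to_fix(orgs, (8, 2)))  # ['oldorg']
```

With the disabled 2015 org in the list, the model picks its 7.0 build; drop that org (or upgrade it) and the result comes back to 8.2, which lines up with the internal URL working again once we removed the org.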

Big thanks to Dan the Man!