FireWolf Pl.

A Place of Freedom

@FireWolf 1 month ago

10/3
19:58
macOS Mojave

Coffee Lake Intel UHD Graphics 630 on macOS Mojave: A compromise solution to the kernel panic due to division by zero in the framebuffer driver

Hi folks! Long time no see.

Finally, I have some time to write this post…

>> Introduction

So recently I was working on installing macOS Mojave on my laptop.
It is equipped with a Core i7 8750H processor and a gorgeous 4K display.

As usual, the first thing to do right after the installation is to get the integrated graphics card working.

Some of you may already know that DVMT pre-allocated memory is an “annoying” issue on non-Apple laptops, because Apple has raised the limit as of Broadwell.
(Read the post I published three years ago if you want to know more about this issue.)

Thankfully, the DVMT pre-allocated memory is set to 64 MB by default in BIOS.
This looks reasonable to me since 32 MB DVMT is not enough to power the builtin 4K display.
Besides, I needed to patch the `CoreDisplay` framework to unlock the pixel clock limitation.
So I thought this should be relatively easy, but it turned out that I was “too young and naive”.

>> The story

A common `ig-platform-id` used for Intel UHD Graphics 630 is `0x3E9B0000`, so initially I tried to inject it and the system rebooted without showing any error after the `gIOConsoleUser: gIOScreenLockState` line. So probably there was something wrong with the graphics driver.

In order to figure out what happened, I booted the system while injecting an invalid `ig-platform-id` to disable the framebuffer driver, and the desktop successfully showed up with an alert saying something like “your computer automatically rebooted due to a problem”.

This is an obvious sign of a kernel panic, and I got a detailed report as follows.

 

Anonymous UUID: A6C52317-1B2D-E00D-241C-DBCE7C091990

Sat Sep 29 13:17:09 2018

*** Panic Report ***
panic(cpu 8 caller 0xffffff80004d781d): Kernel trap at 0xffffff7f837537d0, type 0=divide error, registers:
CR0: 0x0000000080010033, CR2: 0xffffff81f645e000, CR3: 0x00000004512ce05c, CR4: 0x00000000003626e0
RAX: 0x017d68fdc0000000, RBX: 0x017d68fdc0000000, RCX: 0x0100000100000000, RDX: 0x0000000000000000
RSP: 0xffffff81e8ba3540, RBP: 0xffffff81e8ba3570, RSI: 0xffffff81e8ba3388, RDI: 0xffffff81c816c000
R8: 0x00000003169e9807, R9: 0x0000000000000000, R10: 0xffffff81c8178d00, R11: 0x0000000000000000
R12: 0x000000001fc8bfd0, R13: 0x0000000000000000, R14: 0xffffff81e8ba3588, R15: 0x0000000009a7ec80
RFL: 0x0000000000010246, RIP: 0xffffff7f837537d0, CS: 0x0000000000000008, SS: 0x0000000000000010
Fault CR2: 0xffffff81f645e000, Error code: 0x0000000000000000, Fault CPU: 0x8, PL: 0, VF: 0

Backtrace (CPU 8), Frame : Return Address
0xffffff81e8ba3010 : 0xffffff80003aba9d 
0xffffff81e8ba3060 : 0xffffff80004e5bd3 
0xffffff81e8ba30a0 : 0xffffff80004d75fa 
0xffffff81e8ba3110 : 0xffffff8000358ca0 
0xffffff81e8ba3130 : 0xffffff80003ab4b7 
0xffffff81e8ba3250 : 0xffffff80003ab303 
0xffffff81e8ba32c0 : 0xffffff80004d781d 
0xffffff81e8ba3430 : 0xffffff8000358ca0 
0xffffff81e8ba3450 : 0xffffff7f837537d0 
0xffffff81e8ba3570 : 0xffffff7f837514bb 
0xffffff81e8ba3990 : 0xffffff7f837296e5 
0xffffff81e8ba3a00 : 0xffffff7f814dd7c6 
0xffffff81e8ba3a40 : 0xffffff7f814dd67b 
0xffffff81e8ba3a90 : 0xffffff8000a83f68 
0xffffff81e8ba3ae0 : 0xffffff7f814e3c79 
0xffffff81e8ba3b30 : 0xffffff8000a8d3ef 
0xffffff81e8ba3c70 : 0xffffff8000492234 
0xffffff81e8ba3d80 : 0xffffff80003b118d 
0xffffff81e8ba3dd0 : 0xffffff800038bb45 
0xffffff81e8ba3e50 : 0xffffff80003a04fe 
0xffffff81e8ba3ef0 : 0xffffff80004bed4b 
0xffffff81e8ba3fa0 : 0xffffff8000359486 
    Kernel Extensions in backtrace:
        com.apple.iokit.IOGraphicsFamily(530.9)@0xffffff7f814c1000->0xffffff7f8150bfff
            dependency: com.apple.iokit.IOPCIFamily(2.9)@0xffffff7f80c95000
        com.apple.driver.AppleIntelCFLGraphicsFramebuffer(12.0.2)@0xffffff7f83711000->0xffffff7f83912fff
            dependency: com.apple.iokit.IOPCIFamily(2.9)@0xffffff7f80c95000
            dependency: com.apple.iokit.IOACPIFamily(1.4)@0xffffff7f80d10000
            dependency: com.apple.iokit.IOAcceleratorFamily2(400.25)@0xffffff7f82ef0000
            dependency: com.apple.iokit.IOReportFamily(47)@0xffffff7f80ddb000
            dependency: com.apple.AppleGraphicsDeviceControl(3.22.18)@0xffffff7f817d8000
            dependency: com.apple.iokit.IOGraphicsFamily(530.9)@0xffffff7f814c1000

We can see that the kernel panic happens because of a divide-by-zero error. The kernel traps at `0xffffff7f837537d0`, which is within the address range of `AppleIntelCFLGraphicsFramebuffer`.

The memory for `AppleIntelCFLGraphicsFramebuffer` starts at `0xffffff7f83711000`, so we can easily calculate the offset that triggers the panic. So the offset in my case (macOS 10.14.1 beta 1) is `0x427D0`.
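The subtraction can be double-checked with a few lines of Python. This is only a sketch: the three addresses are copied from the panic report above, and the helper name `offset_in_kext` is made up for this example.

```python
# Addresses copied from the panic report above.
FAULT_RIP = 0xFFFFFF7F837537D0  # RIP at the time of the divide error
KEXT_BASE = 0xFFFFFF7F83711000  # start of AppleIntelCFLGraphicsFramebuffer
KEXT_END = 0xFFFFFF7F83912FFF   # end of AppleIntelCFLGraphicsFramebuffer


def offset_in_kext(address, base, end):
    """Return address - base after checking that the address lies inside the kext."""
    if not base <= address <= end:
        raise ValueError("address is outside the kext's address range")
    return address - base


offset = offset_in_kext(FAULT_RIP, KEXT_BASE, KEXT_END)
print(hex(offset))  # 0x427d0
```

The range check is there because the same subtraction on an address from a different kext would silently give a meaningless offset.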

After disassembling the binary, we can find that the panic is triggered inside a function named `AppleIntelFramebufferController::SetupDPTimings(AppleIntelFramebuffer*, AppleIntelDisplayPath*, AppleIntelFramebufferController::CRTCParams*)`.

The internal display is connected via embedded DisplayPort (eDP), so apparently something went wrong while setting up the display.

loc_427cb:
00000000000427cb xor edx, edx
00000000000427cd mov rax, rbx
00000000000427d0 div r13               // Trigger the kernel panic
00000000000427d3 mov rsi, rax
00000000000427d6 cmp rsi, 0x1000000
00000000000427dd jb loc_427e8
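For readers less familiar with x86, here is a rough Python model of the 64-bit unsigned `div` instruction. It is a sketch, not the driver's code: the function name `div64` is invented for illustration. The instruction divides the 128-bit value `RDX:RAX` by its operand, which is why the listing clears `edx` first, and it raises a divide error when the operand is zero or the quotient does not fit in 64 bits.

```python
def div64(rdx, rax, divisor):
    """Model of x86-64 `div r64`: returns (quotient, remainder) of RDX:RAX / divisor."""
    if divisor == 0:
        raise ZeroDivisionError("#DE: divisor is zero")
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, divisor)
    if quotient >= 1 << 64:
        raise OverflowError("#DE: quotient does not fit in 64 bits")
    return quotient, remainder


# Register values at the time of the panic: RDX was cleared by `xor edx, edx`,
# RAX was loaded from RBX, and R13 (the divisor) was zero.
try:
    div64(0, 0x017D68FDC0000000, 0)
except ZeroDivisionError as error:
    print("divide error:", error)
```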

>> Diagnose the issue

Now let’s do the research!

From the panic report, we can see that `%r13` is zero at the moment of panic.
In order to figure out the reason, we need to analyze this function and know what’s going on.

So here is the related code snippet I translated.
(I have omitted some debugging or irrelevant code.)

int AppleIntelFramebufferController::SetupDPTimings(AppleIntelFramebufferController* this,
                                                    AppleIntelFramebuffer* framebuffer, 
                                                    AppleIntelDisplayPath* displayPath, 
                                                    AppleIntelFramebufferController::CRTCParams* params)
{
    // Parameters are passed in registers
    // %rdi = arg0 (Hidden `this` pointer)
    // %rsi = arg1 (framebuffer)               // <- Focus on this register
    // %rdx = arg2 (displayPath)
    // %rcx = arg3 (parameters)

    ...<some code at here>...

    rdx = framebuffer->field_0x25bc; // 0x42786: movl 0x25bc(%rsi), %edx
    r13 = rdx * 8;                   // 0x4278c: leal (,%rdx,8), %r13d
    r13 = r13 * r15;                 // 0x42794: imulq %r15, %r13

    if (rdx == 0 || r15 == 0)
    {
        // Format arguments, in order: active lanes, link symbol clock, framebuffer index, pixel clock
        kprintf("[IGFB][ERROR ] fActiveNumberOfLanes = %d, linkSymbolClock = %llu.. "
                "One of them is 0 for FB%x - will lead to DivideByZero panic. "
                "PixelClock = %llu!!!!!\n", rdx, r15, framebuffer->field_0x01dc, r12);
    }

    rsi = rbx / r13;

    ...<some code at here>...
}

So basically, `%r13` is the product of `%rdx`, 8 and `%r15`, and Apple adds a warning message saying that a divide-by-zero panic will happen if either `%rdx` or `%r15` is zero. Besides, this message also tells us that `%rdx` holds the number of active lanes and `%r15` holds the link symbol clock (`LS_CLK`).

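Take another look at the panic report above: `%r13` is zero while `%r15` is `0x9a7ec80`, which is non-zero, so the number of active lanes must be the zero factor. (`%rdx` is also zero in the report, but it is cleared by `xor edx, edx` right before the `div`, so it no longer holds the lane count at that point.) Equivalently, the field at `0x25bc` in an instance of `AppleIntelFramebuffer` is zero. So somehow Apple’s graphics driver does not detect the number of active lanes correctly.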
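The register values in the panic report can be checked against this reading of the code. The sketch below is my own illustration, not driver code: `divisor` is a made-up helper that mirrors the `leal` and `imulq` instructions at `0x4278c` and `0x42794`, and the meaning of `R15` and `R12` is taken from the driver's own error message.

```python
# Register values copied from the panic report.
R13 = 0x0000000000000000  # divisor used by `div r13`
R15 = 0x0000000009A7EC80  # link symbol clock, per the driver's error message
R12 = 0x000000001FC8BFD0  # pixel clock, per the driver's error message


def divisor(active_lanes, link_symbol_clock):
    """Mirror the instructions at 0x4278c and 0x42794: (lanes * 8) * clock."""
    return (active_lanes * 8) * link_symbol_clock


print(R15)  # 162000000
print(R12)  # 533250000

# R15 is non-zero, so a lane count of 0 is the only input that gives R13 == 0.
for lanes in (0, 1, 2, 4):
    print(lanes, divisor(lanes, R15))
```

Only the first row of the loop reproduces the zero divisor seen in the report.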

Unfortunately, there is not much information about this field in `SetupDPTimings()`, so we need to analyze its callers.

Let’s keep reversing the binary and we can find that it has a caller named `AppleIntelFramebufferController::SetupClocks(AppleIntelFramebuffer*, AppleIntelDisplayPath*, IODetailedTimingInformationV2 const*, AppleIntelFramebufferController::CRTCParams*)`. It is a really huge function, but there are two crucial blocks at `0x4126b` and `0x41652`.

Again, let’s first take a look at the code I translated.

int AppleIntelFramebufferController::SetupClocks(AppleIntelFramebufferController* this,
                                                 AppleIntelFramebuffer* framebuffer, 
                                                 AppleIntelDisplayPath* displayPath, 
                                                 IODetailedTimingInformationV2 const* timingInfoV2, 
                                                 AppleIntelFramebufferController::CRTCParams* params)
{
    ......
    
    // 0x4126b
    // %rbx holds the reference to `framebuffer` now
    // %r15 holds the reference to `displayPath` now
    if (framebuffer->field_0x24ae != 0)                           // 0x4126b: cmpb    $0x0, 0x24ae(%rbx)
    {
        // This branch is not executed
        ......
        
        framebuffer->field_0x25bc = framebuffer->field_0x249d;    // 0x4127f: movzbl  0x249d(%rbx), %eax
                                                                  // 0x41286: movl    %eax, 0x25bc(%rbx)
        ......
    }
    else
    {
        ......
        
        framebuffer->field_0x25bc = framebuffer->field_0x2389;    // 0x4131b: movzbl  0x2389(%rbx), %eax
                                                                  // 0x41322: movl    %eax, 0x25bc(%rbx)
        ......
        
        // 0x4145b
        int retValue = ___idvarGetParam(0xf);                     // prototype: int ___idvarGetParam(int)
        
        if (retValue != 0)
        {
            if (displayPath->[email protected] == 3)
            {
                AppleIntelFramebufferController::SetupOptimalLaneCount(this, 0xf, timingInfoV2);
            }
            else
            {
                arg1->[email protected] = min(retValue, arg1->[email protected]);
            }
        }
        else
        {
            AppleIntelFramebufferController::SetupOptimalLaneCount(this, 0xf, timingInfoV2);
        }
        
        // 0x41491
        if (framebuffer->field_0x25bc == 0)
        {
            kprintf("[IGFB][ERROR  ] fActiveNumberOfLanes 0 for FB%d.. "
                    "ERROR.. Will lead to DivideByZero panic later!!!!\n");
        }
        
        ......
        
    }
}

 

As you can see from the above code snippet, the field at `0x25bc` can be set in several places. After debugging a little bit, I could confirm that the “else” body got executed. So let’s focus on the block at `0x4145b` right now.

There are so many branches. Are you tired of the assembly now? Yes, I am. It was 1AM in the “morning” and I really wanted to solve this graphics issue quickly.

So I stopped figuring out what is going on in `___idvarGetParam()` and other related functions after I found that `SetupOptimalLaneCount()` would be invoked anyway. I can tell you that `SetupOptimalLaneCount()` does some calculations on the number of lanes and saves the result to the field at `0x25bc` and is only called by `SetupClocks()`.

>> Find the solution

Now I think you should realize that `SetupOptimalLaneCount()` somehow sets the number of active lanes to 0. So a quick and dirty “fix” is to set a fixed number of lanes in this function.

// Modified version
int AppleIntelFramebufferController::SetupOptimalLaneCount(AppleIntelFramebufferController* this,
                                                           AppleIntelFramebuffer* framebuffer,
                                                           IODetailedTimingInformationV2 const* timingInfoV2)
{
    // %rdi = this, %rsi = framebuffer, %rdx = timingInfoV2
    // Set 4 for 4K internal display, 2 for 1080p or below
    framebuffer->field_0x25bc = 4;
    // Equivalently:
    // *(int32_t*)( ((int8_t*) framebuffer) + 0x25bc ) = 4;

    // The patched function always returns 0
    return 0;
}

Clover KextsToPatch (Plain):

macOS Mojave 10.14, 10.14.1:

Comment: Set the number of active lanes to 4 (for laptop with 4K display) (by FireWolf)
Find: 8B96C025 00008A8E 95230000 0FB686
Repl: B8040000 008986BC 25000031 C05DC3
Comment: Set the number of active lanes to 2 (for laptop with 1080p or below display) (by FireWolf)
Find: 8B96C025 00008A8E 95230000 0FB686
Repl: B8020000 008986BC 25000031 C05DC3
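To see what the replacement bytes actually do, they can be decoded by hand. The sketch below assumes the standard x86-64 encodings (`B8 imm32` is `mov eax, imm32`, `89 86 disp32` is `mov [rsi + disp32], eax`, followed by `xor eax, eax`, `pop rbp` and `ret`); the helper name `decode_lane_patch` is made up for this example.

```python
import struct

REPL_4_LANES = "B8040000 008986BC 25000031 C05DC3"
REPL_2_LANES = "B8020000 008986BC 25000031 C05DC3"


def decode_lane_patch(pattern):
    """Return (lane_count, field_offset) encoded in a replacement pattern."""
    code = bytes.fromhex(pattern.replace(" ", ""))
    # B8 imm32      -> mov eax, imm32
    # 89 86 disp32  -> mov dword ptr [rsi + disp32], eax
    # 31 C0, 5D, C3 -> xor eax, eax; pop rbp; ret
    if code[0] != 0xB8 or code[5:7] != b"\x89\x86" or code[11:] != b"\x31\xc0\x5d\xc3":
        raise ValueError("unexpected instruction bytes")
    lane_count = struct.unpack_from("<I", code, 1)[0]
    field_offset = struct.unpack_from("<I", code, 7)[0]
    return lane_count, field_offset


for pattern in (REPL_4_LANES, REPL_2_LANES):
    lanes, offset = decode_lane_patch(pattern)
    print(lanes, hex(offset))
# 4 0x25bc
# 2 0x25bc
```

Both variants store the constant into the field at `0x25bc` through `%rsi` and return 0, which matches the modified pseudo code above.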

Clover KextsToPatch (Plist):

macOS Mojave 10.14, 10.14.1:

<key>KextsToPatch</key>
       <array>   
           <dict>
               <key>Comment</key>
               <string>Set the number of active lanes to 4 (for laptop with 4K display) (by FireWolf)</string>
               <key>Disabled</key>
               <false/>
               <key>Find</key>
               <data>
               i5bAJQAAio6VIwAAD7aG
               </data>
               <key>InfoPlistPatch</key>
               <false/>
               <key>Name</key>
               <string>AppleIntelCFLGraphicsFramebuffer</string>
               <key>Replace</key>
               <data>
               uAQAAACJhrwlAAAxwF3D
               </data>
           </dict>
           <dict>
               <key>Comment</key>
               <string>Set the number of active lanes to 2 (for laptop with 1080p or below display) (by FireWolf)</string>
               <key>Disabled</key>
               <false/>
               <key>Find</key>
               <data>
               i5bAJQAAio6VIwAAD7aG
               </data>
               <key>InfoPlistPatch</key>
               <false/>
               <key>Name</key>
               <string>AppleIntelCFLGraphicsFramebuffer</string>
               <key>Replace</key>
               <data>
               uAIAAACJhrwlAAAxwF3D
               </data>
           </dict>
       </array>
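The `<data>` values in the plist are simply the Base64 encoding of the plain hex patterns. If you change the lane count or need to double-check a value, a short Python sketch like the following does the conversion (the function name `hex_to_plist_data` is my own, not part of Clover):

```python
import base64


def hex_to_plist_data(pattern):
    """Convert a spaced hex pattern into the Base64 text used inside <data>."""
    raw = bytes.fromhex(pattern.replace(" ", ""))
    return base64.b64encode(raw).decode("ascii")


FIND = "8B96C025 00008A8E 95230000 0FB686"
REPL_4_LANES = "B8040000 008986BC 25000031 C05DC3"
REPL_2_LANES = "B8020000 008986BC 25000031 C05DC3"

print(hex_to_plist_data(FIND))          # i5bAJQAAio6VIwAAD7aG
print(hex_to_plist_data(REPL_4_LANES))  # uAQAAACJhrwlAAAxwF3D
print(hex_to_plist_data(REPL_2_LANES))  # uAIAAACJhrwlAAAxwF3D
```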

I once read a good article about the lane count (I still cannot find it again, but please check Section 2.5 in the data sheet for 8th Gen Intel Core Processor Families, specifically Table 2-34 Display Resolution and Link Rate Support on page 51 and Table 2-37 Supported Resolutions for HBR2 (5.4 Gbps) by Link Width on page 52). The number of lanes can be 1, 2 or 4; the link rate (RBR, HBR or HBR2) is a separate, per-lane setting. I tried 2 lanes, but I got a black screen, so probably that was not enough for the built-in 4K display. When I set it to 4, the system finally booted with graphics acceleration and, of course, without the divide-by-zero kernel panic.
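A back-of-the-envelope calculation makes the black screen with 2 lanes plausible. This is only a sketch under stated assumptions: the pixel clock is the `R12` value from the panic report, while 24 bits per pixel, an HBR2 link (5.4 Gbps per lane) and 8b/10b coding are my assumptions rather than values read from the driver. The helper names are invented for this example.

```python
# Pixel clock taken from R12 in the panic report (printed by the driver's
# error message as "PixelClock").
PIXEL_CLOCK_HZ = 0x1FC8BFD0  # 533,250,000 Hz

# Assumptions, not read from the driver:
BITS_PER_PIXEL = 24                   # 8 bits per color channel
HBR2_BITS_PER_SECOND = 5_400_000_000  # raw link rate per lane
HBR_BITS_PER_SECOND = 2_700_000_000   # raw link rate per lane


def payload_rate(lanes, link_rate):
    """Usable bits per second after 8b/10b coding (8 data bits per 10 sent)."""
    return lanes * link_rate * 8 // 10


def minimum_lanes(pixel_clock_hz, bits_per_pixel, link_rate):
    """Smallest lane count in (1, 2, 4) that can carry the stream, or None."""
    required = pixel_clock_hz * bits_per_pixel
    for lanes in (1, 2, 4):
        if payload_rate(lanes, link_rate) >= required:
            return lanes
    return None


print(minimum_lanes(PIXEL_CLOCK_HZ, BITS_PER_PIXEL, HBR2_BITS_PER_SECOND))  # 4
print(minimum_lanes(PIXEL_CLOCK_HZ, BITS_PER_PIXEL, HBR_BITS_PER_SECOND))   # None
```

Under these assumptions the panel needs about 12.8 Gbps, two HBR2 lanes carry only 8.64 Gbps, and four lanes carry 17.28 Gbps.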

>> This is not the end

Although I managed to light up the internal display and get the graphics acceleration working, this is not the end of this story. As I mentioned above, this is a quick and dirty fix. We still don’t know whether this issue is caused by a bug in Apple’s driver or by unlocking the pixel clock. In the latter case, there must be something that causes Apple’s driver to fail to detect the correct number of active lanes. Does Apple do this on purpose? Or does this issue only happen on laptops with high-resolution displays? We don’t know, but I hope this post can be a stepping-stone for further research. If you are interested in this issue, I have posted some notes, including my findings and thoughts, at the end of this post, and I hope they give you a good start. 🙂

>> Notes, Updates & Thoughts

The framebuffer driver is from macOS Mojave 10.14.1 beta 1.

1. In `SetupOptimalLaneCount()`, the field at `0x25bc` is determined by whether the values of fields at `0x2595` and `0x25c0` are zero or not.

2. The field at `0x25c0` is assigned the value of the field at `0x2387` in `SetupClocks()` (@0x41315).

3. The field at `0x25c0` is further used to configure the link symbol clock field (`@0x25d0`) in the `allocatePLL()` function.

4. This issue is caused by the driver failing to read DPCD info. Unfortunately, I am busy with my course work and therefore don’t have sufficient time to investigate this issue further at the moment.

5. I have identified the real issue, and I will update it with a new post. A new patch is coming. Stay tuned.

>> Resources

Stock Framebuffer Kexts:

10.14 (18A391): AppleIntelCFLGraphicsFramebuffer_10.14.0_18A391.zip (39 downloads)

10.14.1 Beta 1 (18B45d): AppleIntelCFLGraphicsFramebuffer_10.14.1b1_18B45d.zip (22 downloads)

>> Changelog

1. [2018.10.05] Added binary patches for macOS 10.14 (18A391). The previous patch I posted only applies to macOS 10.14.1 beta 1.

2. [2018.10.05] Added stock framebuffer drivers.

3. [2018.10.06] Added a reference to the data sheet for 8th Gen Intel Core Processors.

4. [2018.10.30] Added the general patch that works for both 10.14 and 10.14.1.

5. [2018.11.05] Updated the translated pseudo code and fixed some errors related to function parameters. (My bad, I forgot the implicit `this` argument in C++.)

This is probably the last update of this post, as I will write a new post to talk about the real issue and provide a new patch to solve this.


    1. FireWolf Write

      Hi Andgie,

      The kernel panic happens at `0xffffff7f837537d0` and the memory for `AppleIntelCFLGraphicsFramebuffer` starts at `0xffffff7f83711000`, so `0x7f837537d0` – `0x7f83711000` = `0x427D0`.

      Cheers,
      FireWolf
