Copy-on-Write 机制 | Lantern's 小站

简介

如果有多个调用者同时请求相同的资源(如内存或磁盘上的数据存储)，他们会共同获得相同的指针指向相同的资源，直到某个调用者试图修改资源的内容时，系统才会真正复制一份专用副本（private copy）给该调用者，而其他调用者所见到的最初的资源仍然保持不变。这过程对其他的调用者都是透明的（transparently）。此作法主要的优点是如果调用者没有修改该资源，就不会有副本（private copy）被创建，因此多个调用者只是读取操作时可以共享同一份资源。

fork 中的 COW

fork

fork 系统调用完成后，那么当父进程或者子进程尝试写共享物理页时，内核将拷贝物理页面

写共享内存页

当父进程 A 或子进程 B 任何一方对这些已共享的物理页面执行写操作时，都会产生页面出错异常（page_fault int14）中断，会将flags & FAULT_FLAG_WRITE，然后通过do_page_fault() -> handle_mm_fault() -> handle_pte_fault()调用链解决这个异常。

static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
{
    ...
    if (vmf->flags & FAULT_FLAG_WRITE) {
        if (!pte_write(entry))
            return do_wp_page(vmf);
        entry = pte_mkdirty(entry);
    }
    ...
}

pte_write 会根据 pte_flags(pte) & _PAGE_RW 判断页是否有写保护，这个标记是之前 fork 时 clear 掉的，所以会接着调用do_wp_page

/*
 * This routine handles present pages, when users try to write
 * to a shared page. It is done by copying the page to a new address
 * and decrementing the shared-page counter for the old page.
 * 当用户试图写入共享页面时，此例程处理当前页面。将页面复制到一个新地址并减少旧页面的共享页面计数器。
 * ...
 */
static vm_fault_t do_wp_page(struct vm_fault *vmf)
    __releases(vmf->ptl)
{
    ...
    return wp_page_copy(vmf);
}

/*
 * Handle the case of a page which we actually need to copy to a new page.
 *
 * Called with mmap_sem locked and the old page referenced, but
 * without the ptl held.
 *
 * High level logic flow:
 *
 * - Allocate a page, copy the content of the old page to the new one.
 * - Handle book keeping and accounting - cgroups, mmu-notifiers, etc.
 * - Take the PTL. If the pte changed, bail out and release the allocated page
 * - If the pte is still the way we remember it, update the page table and all
 *   relevant references. This includes dropping the reference the page-table
 *   held to the old page, as well as updating the rmap.
 * - In any case, unlock the PTL and drop the reference we took to the old page.
 */
static vm_fault_t wp_page_copy(struct vm_fault *vmf)