The supervisor memory-management fence instruction `SFENCE.VMA` is used to synchronize updates to in-memory memory-management data structures with current execution. Instruction execution causes implicit reads and writes to these data structures; however, these implicit references are ordinarily not ordered with respect to explicit loads and stores. Executing an SFENCE.VMA instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before certain implicit references by subsequent instructions in that hart to the memory-management data structures. The specific set of operations ordered by SFENCE.VMA is determined by _xs1_ and _xs2_, as described below. SFENCE.VMA is also used to invalidate entries in the address-translation cache associated with a hart (see <<sv32algorithm>>). Further details on the behavior of this instruction are described in <<virt-control>> and <<pmp-vmem>>.

[NOTE]
====
The SFENCE.VMA is used to flush any local hardware caches related to address translation. It is specified as a fence rather than a TLB flush to provide cleaner semantics with respect to which instructions are affected by the flush operation and to support a wider variety of dynamic caching structures and memory-management schemes. SFENCE.VMA is also used by higher privilege levels to synchronize page table writes and the address translation hardware.
====

SFENCE.VMA orders only the local hart's implicit references to the memory-management data structures.

[NOTE]
====
Consequently, other harts must be notified separately when the memory-management data structures have been modified. One approach is to use 1) a local data fence to ensure local writes are visible globally, then 2) an interprocessor interrupt to the other thread, then 3) a local SFENCE.VMA in the interrupt handler of the remote thread, and finally 4) signal back to the originating thread that the operation is complete. This is, of course, the RISC-V analog to a TLB shootdown.
====

For the common case that the translation data structures have only been modified for a single address mapping (i.e., one page or superpage), _xs1_ can specify a virtual address within that mapping to effect a translation fence for that mapping only. Furthermore, for the common case that the translation data structures have only been modified for a single address-space identifier, _xs2_ can specify the address space. The behavior of SFENCE.VMA depends on _xs1_ and _xs2_ as follows:

* If __xs1__=`x0` and __xs2__=`x0`, the fence orders all reads and writes made to any level of the page tables, for all address spaces. The fence also invalidates all address-translation cache entries, for all address spaces.

* If __xs1__=`x0` and __xs2__≠``x0``, the fence orders all reads and writes made to any level of the page tables, but only for the address space identified by integer register _xs2_. Accesses to _global_ mappings (see <<translation>>) are not ordered. The fence also invalidates all address-translation cache entries matching the address space identified by integer register _xs2_, except for entries containing global mappings.

* If __xs1__≠``x0`` and __xs2__=`x0`, the fence orders only reads and writes made to leaf page table entries corresponding to the virtual address in __xs1__, for all address spaces. The fence also invalidates all address-translation cache entries that contain leaf page table entries corresponding to the virtual address in _xs1_, for all address spaces.

* If __xs1__≠``x0`` and __xs2__≠``x0``, the fence orders only reads and writes made to leaf page table entries corresponding to the virtual address in _xs1_, for the address space identified by integer register _xs2_. Accesses to global mappings are not ordered.
The fence also invalidates all address-translation cache entries that contain leaf page table entries corresponding to the virtual address in _xs1_ and that match the address space identified by integer register _xs2_, except for entries containing global mappings.

If the value held in _xs1_ is not a valid virtual address, then the SFENCE.VMA instruction has no effect. No exception is raised in this case.

When __xs2__≠``x0``, bits SXLEN-1:ASIDMAX of the value held in _xs2_ are reserved for future standard use. Until their use is defined by a standard extension, they should be zeroed by software and ignored by current implementations. Furthermore, if ASIDLEN<ASIDMAX, the implementation shall ignore bits ASIDMAX-1:ASIDLEN of the value held in _xs2_.

[NOTE]
====
It is always legal to over-fence, e.g., by fencing only based on a subset of the bits in _xs1_ and/or _xs2_, and/or by simply treating all SFENCE.VMA instructions as having _xs1_=`x0` and/or _xs2_=`x0`. For example, simpler implementations can ignore the virtual address in _xs1_ and the ASID value in _xs2_ and always perform a global fence. The choice not to raise an exception when an invalid virtual address is held in _xs1_ facilitates this type of simplification.
====

An implicit read of the memory-management data structures may return any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address. The ordering implied by SFENCE.VMA does not place implicit reads and writes to the memory-management data structures into the global memory order in a way that interacts cleanly with the standard RVWMO ordering rules. In particular, even though an SFENCE.VMA orders prior explicit accesses before subsequent implicit accesses, and those implicit accesses are ordered before their associated explicit accesses, SFENCE.VMA does not necessarily place prior explicit accesses before subsequent explicit accesses in the global memory order.
These implicit loads also need not otherwise obey normal program order semantics with respect to prior loads or stores to the same address.

[NOTE]
====
A consequence of this specification is that an implementation may use any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address. In particular, if a leaf PTE is modified but a subsuming SFENCE.VMA is not executed, either the old translation or the new translation will be used, but the choice is unpredictable. The behavior is otherwise well-defined.

In a conventional TLB design, it is possible for multiple entries to match a single address if, for example, a page is upgraded to a superpage without first clearing the original non-leaf PTE's valid bit and executing an SFENCE.VMA with __xs1__=`x0`. In this case, a similar remark applies: it is unpredictable whether the old non-leaf PTE or the new leaf PTE is used, but the behavior is otherwise well defined.

Another consequence of this specification is that it is generally unsafe to update a PTE using a set of stores of a width less than the width of the PTE, as it is legal for the implementation to read the PTE at any time, including when only some of the partial stores have taken effect.

***

This specification permits the caching of PTEs whose V (Valid) bit is clear. Operating systems must be written to cope with this possibility, but implementers are reminded that eagerly caching invalid PTEs will reduce performance by causing additional page faults.
====

Implementations must only perform implicit reads of the translation data structures pointed to by the current contents of the `satp` register or a subsequent valid (V=1) translation data structure entry, and must only raise exceptions for implicit accesses that are generated as a result of instruction execution, not those that are performed speculatively.
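
As a non-normative illustration of the invalidation rules for the four _xs1_/_xs2_ combinations listed earlier, the following C sketch models an address-translation cache as a plain array of entries. Every name here (`tlb_entry`, `model_sfence_vma`, the `has_va` and `has_asid` flags) is hypothetical and does not come from this specification. The sketch assumes 4 KiB base pages, models only the invalidation side of the instruction (not the ordering of implicit references), and does not model the case where _xs1_ holds an invalid virtual address. It invalidates exactly the matching entries; a real implementation may legally over-fence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one cached leaf translation. */
typedef struct {
    bool     valid;
    bool     global;    /* mapping is global (G bit set) */
    uint16_t asid;      /* address space the entry was filled under */
    uint64_t vpn_base;  /* first virtual page number covered */
    uint64_t vpn_count; /* pages covered: 1 for a page, more for a superpage */
} tlb_entry;

/* Model of the cache invalidation performed by SFENCE.VMA.
 * has_va == false models xs1 = x0; has_asid == false models xs2 = x0. */
static void model_sfence_vma(tlb_entry *tlb, size_t n,
                             bool has_va, uint64_t va,
                             bool has_asid, uint16_t asid)
{
    uint64_t vpn = va >> 12; /* assumption: 4 KiB base pages */

    for (size_t i = 0; i < n; i++) {
        tlb_entry *e = &tlb[i];

        if (!e->valid)
            continue;
        /* With an ASID operand, global entries and other ASIDs are kept. */
        if (has_asid && (e->global || e->asid != asid))
            continue;
        /* With an address operand, only entries covering that page match. */
        if (has_va && (vpn < e->vpn_base || vpn - e->vpn_base >= e->vpn_count))
            continue;
        e->valid = false;
    }
}
```

In this model, a call with both flags false clears every entry, which is also the behavior a simple implementation could adopt for all four combinations.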
Changes to the `sstatus` fields SUM and MXR take effect immediately, without the need to execute an SFENCE.VMA instruction. Changing `satp`.MODE from Bare to other modes and vice versa also takes effect immediately, without the need to execute an SFENCE.VMA instruction. Likewise, changes to `satp`.ASID take effect immediately.

[TIP]
====
The following common situations typically require executing an SFENCE.VMA instruction:

* When software recycles an ASID (i.e., reassociates it with a different page table), it should _first_ change `satp` to point to the new page table using the recycled ASID, _then_ execute SFENCE.VMA with __xs1__=`x0` and _xs2_ set to the recycled ASID. Alternatively, software can execute the same SFENCE.VMA instruction while a different ASID is loaded into `satp`, provided the next time `satp` is loaded with the recycled ASID, it is simultaneously loaded with the new page table.

* If the implementation does not provide ASIDs, or software chooses to always use ASID 0, then after every `satp` write, software should execute SFENCE.VMA with __xs1__=`x0`. In the common case that no global translations have been modified, _xs2_ should be set to a register other than `x0` but which contains the value zero, so that global translations are not flushed.

* If software modifies a non-leaf PTE, it should execute SFENCE.VMA with __xs1__=`x0`. If any PTE along the traversal path had its G bit set, _xs2_ must be `x0`; otherwise, _xs2_ should be set to the ASID for which the translation is being modified.

* If software modifies a leaf PTE, it should execute SFENCE.VMA with _xs1_ set to a virtual address within the page. If any PTE along the traversal path had its G bit set, _xs2_ must be `x0`; otherwise, _xs2_ should be set to the ASID for which the translation is being modified.

* For the special cases of increasing the permissions on a leaf PTE and changing an invalid PTE to a valid leaf, software may choose to execute the SFENCE.VMA lazily.
After modifying the PTE but before executing SFENCE.VMA, either the new or old permissions will be used. In the latter case, a page-fault exception might occur, at which point software should execute SFENCE.VMA in accordance with the previous bullet point.
====

If a hart employs an address-translation cache, that cache must appear to be private to that hart. In particular, the meaning of an ASID is local to a hart; software may choose to use the same ASID to refer to different address spaces on different harts.

[NOTE]
====
A future extension could redefine ASIDs to be global across the SEE, enabling such options as shared translation caches and hardware support for broadcast TLB shootdown. However, as OSes have evolved to significantly reduce the scope of TLB shootdowns using novel ASID-management techniques, we expect the local-ASID scheme to remain attractive for its simplicity and possibly better scalability.
====

For implementations that make `satp`.MODE read-only zero (always Bare), attempts to execute an SFENCE.VMA instruction might raise an illegal-instruction exception.
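
The software guidance in the tip above (which operands to use after which kind of page-table change) can be summarized as a small decision helper. The following C sketch is non-normative and rests on stated assumptions: the type and function names (`change_kind`, `sfence_args`, `choose_sfence`) are hypothetical, the caller is assumed to know whether any PTE along the traversal path has its G bit set, and the helper only selects operands without executing any instruction. The case of a `satp` write without ASIDs is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a page-table change. */
typedef enum {
    CHANGE_LEAF_PTE,
    CHANGE_NONLEAF_PTE,
    CHANGE_ASID_RECYCLED
} change_kind;

/* Operand choice for SFENCE.VMA.
 * use_va == false models xs1 = x0; use_asid == false models xs2 = x0. */
typedef struct {
    bool     use_va;
    uint64_t va;
    bool     use_asid;
    uint16_t asid;
} sfence_args;

static sfence_args choose_sfence(change_kind kind, uint64_t va,
                                 uint16_t asid, bool global_on_path)
{
    sfence_args a = { false, 0, false, 0 };

    switch (kind) {
    case CHANGE_LEAF_PTE:
        /* Fence only the page containing va. */
        a.use_va = true;
        a.va = va;
        if (!global_on_path) {
            a.use_asid = true;
            a.asid = asid;
        }
        break;
    case CHANGE_NONLEAF_PTE:
        /* A non-leaf change can affect many pages: xs1 = x0. */
        if (!global_on_path) {
            a.use_asid = true;
            a.asid = asid;
        }
        break;
    case CHANGE_ASID_RECYCLED:
        /* Fence every non-global entry of the recycled ASID. */
        a.use_asid = true;
        a.asid = asid;
        break;
    }
    return a;
}
```

When `global_on_path` is true the helper leaves `use_asid` false, matching the rule that _xs2_ must be `x0` if any PTE along the traversal path had its G bit set.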
sfence.vma xs1, xs2
Type: