Memory Copy Forward-only, reads and writes unprivileged, reads non-temporal.
These instructions perform a memory copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPTRN, then CPYFMTRN, and then CPYFETRN.
CPYFPTRN performs some preconditioning of the arguments suitable for using the CPYFMTRN instruction, and performs an implementation defined amount of the memory copy. CPYFMTRN performs an implementation defined amount of the memory copy. CPYFETRN performs the last part of the memory copy.
Because each instruction can copy an implementation defined amount of the data, an implementation can optimize how much of the copy each instruction performs.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a memory copy only where there is no overlap between the source and destination locations, or where the source address is greater than the destination address.
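The suitability condition above can be written as a small address check. The following is a minimal sketch in C; the helper name `forward_copy_is_suitable` is hypothetical, and it assumes that neither range wraps around the top of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the architecture): returns true when a
 * forward-only copy of n bytes from src to dst falls into one of the two
 * cases the text names as suitable. */
bool forward_copy_is_suitable(uintptr_t dst, uintptr_t src, size_t n)
{
    /* Assumption: neither range wraps around the top of the address space. */
    assert(n <= UINTPTR_MAX - src && n <= UINTPTR_MAX - dst);

    if (src > dst)
        return true;           /* source address above destination address */
    return (dst - src) >= n;   /* src <= dst: suitable only without overlap */
}
```

A caller that cannot guarantee either case would need a copy that also handles the backward direction, such as the C library `memmove`.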
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is implementation defined.
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPTRN, option A (which results in encoding PSTATE.C = 0):
After execution of CPYFPTRN, option B (which results in encoding PSTATE.C = 1):
For CPYFMTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
For CPYFMTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
For CPYFETRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
For CPYFETRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
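The operation pseudocode for the prologue stage indicates how the register values differ between the two options. The following C sketch models only that preconditioning. It assumes the prologue itself copies zero bytes (the real amount is implementation defined), and the struct and function names are illustrative, not architectural.

```c
#include <assert.h>
#include <stdint.h>

/* Register state after the prologue; field names are illustrative. */
struct cpyf_state {
    uint64_t xd;    /* destination address register */
    uint64_t xs;    /* source address register */
    uint64_t xn;    /* size register */
    unsigned nzcv;  /* PSTATE.<N,Z,C,V> packed as a 4-bit value */
};

/* Sketch of the prologue's argument preconditioning, assuming the prologue
 * copies zero bytes of data. */
struct cpyf_state cpyf_prologue(uint64_t xd, uint64_t xs, uint64_t xn, int option_a)
{
    struct cpyf_state st;

    /* A size with bit 63 set is saturated to 0x7FFFFFFFFFFFFFFF. */
    if (xn >> 63)
        xn = UINT64_C(0x7FFFFFFFFFFFFFFF);

    if (option_a) {
        st.xd = xd + xn;           /* addresses are offset by the size */
        st.xs = xs + xn;
        st.xn = (uint64_t)0 - xn;  /* size becomes a negative count */
        st.nzcv = 0x0;             /* '0000': PSTATE.C = 0 */
        /* Address plus negative size recovers the next byte to copy. */
        assert(st.xd + st.xn == xd && st.xs + st.xn == xs);
    } else {
        st.xd = xd;                /* option B leaves the addresses as-is */
        st.xs = xs;
        st.xn = xn;
        st.nzcv = 0x2;             /* '0010': PSTATE.C = 1 */
    }
    return st;
}
```

In this model, software can tell the two forms apart only through PSTATE.C, which matches the way the option is described as being encoded.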
Bits  | 31:30 | 29:24  | 23:22 | 21 | 20:16 | 15:12      | 11:10 | 9:5 | 4:0 |
Field | sz    | 011001 | op1   | 0  | Rs    | 1011 (op2) | 01    | Rn  | Rd  |
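As an illustration of the encoding layout, the following C sketch assembles the 32-bit instruction word from its fields. The function and enum names are hypothetical. The values sz = 00 and the op1 stage numbering are taken from the decode pseudocode, and op2 = 1011 is the fixed value shown in bits 15:12 of the encoding.

```c
#include <assert.h>
#include <stdint.h>

/* op1 values, as listed in the decode pseudocode. */
enum cpyf_stage { CPYF_PROLOGUE = 0, CPYF_MAIN = 1, CPYF_EPILOGUE = 2 };

/* Hypothetical encoder for CPYFPTRN / CPYFMTRN / CPYFETRN. */
uint32_t encode_cpyf_trn(enum cpyf_stage stage, unsigned rd, unsigned rs, unsigned rn)
{
    assert(rd < 32 && rs < 32 && rn < 32);

    return (0x0u << 30)               /* sz = 00 */
         | (0x19u << 24)              /* bits 29:24 = 011001 */
         | ((uint32_t)stage << 22)    /* op1 selects the stage */
         | (0x0u << 21)               /* bit 21 = 0 */
         | ((uint32_t)rs << 16)       /* Rs: source address register */
         | (0xBu << 12)               /* op2 = 1011 */
         | (0x1u << 10)               /* bits 11:10 = 01 */
         | ((uint32_t)rn << 5)        /* Rn: size register */
         | (uint32_t)rd;              /* Rd: destination address register */
}
```

For example, with Rd = 0, Rs = 1, and Rn = 2, the three stages encode as 0x1901B440, 0x1941B440, and 0x1981B440.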
if !HaveFeatMOPS() || sz != '00' then UNDEFINED;
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
boolean rnontemporal = options<3> == '1';
boolean wnontemporal = options<2> == '1';
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
CheckMOPSEnabled();
if s == n || s == d || n == d || d == 31 || s == 31 || n == 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_MOPSOVERLAP31);
    assert c IN {Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
For information about the constrained unpredictable behavior of this instruction, see Architectural Constraints on UNPREDICTABLE behaviors, and particularly Memory Copy and Memory Set CPY*.
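The decode step can be mirrored in C as a rough sketch. The struct and function names below are hypothetical, the feature and enable checks (HaveFeatMOPS, CheckMOPSEnabled) are omitted, and the constrained unpredictable case is only flagged rather than resolved to UNDEFINED or NOP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields; names are illustrative. */
struct cpyf_decoded {
    unsigned d, s, n;                  /* register numbers from Rd, Rs, Rn */
    unsigned stage;                    /* op1: 0 prologue, 1 main, 2 epilogue */
    bool rnontemporal, wnontemporal;   /* options<3>, options<2> */
    bool runpriv, wunpriv;             /* options<1>, options<0> */
    bool constrained_unpredictable;    /* register overlap or register 31 */
};

/* Returns false when the word is not in this encoding class. */
bool decode_cpyf(uint32_t insn, struct cpyf_decoded *out)
{
    assert(out != NULL);

    /* Fixed bits: 29:24 = 011001, bit 21 = 0, bits 11:10 = 01. */
    if ((insn & 0x3F200C00u) != 0x19000400u)
        return false;
    if ((insn >> 30) != 0)             /* sz != '00' is UNDEFINED */
        return false;

    unsigned op1 = (insn >> 22) & 0x3u;
    if (op1 == 3)                      /* a different instruction class */
        return false;

    unsigned options = (insn >> 12) & 0xFu;   /* op2 */
    out->d = insn & 0x1Fu;
    out->n = (insn >> 5) & 0x1Fu;
    out->s = (insn >> 16) & 0x1Fu;
    out->stage = op1;
    out->rnontemporal = (options >> 3) & 1u;
    out->wnontemporal = (options >> 2) & 1u;
    out->runpriv = (options >> 1) & 1u;
    out->wunpriv = options & 1u;
    out->constrained_unpredictable =
        out->s == out->n || out->s == out->d || out->n == out->d ||
        out->d == 31 || out->s == 31 || out->n == 31;
    return true;
}
```

For the TRN variants, op2 is 1011, so a decoded word reports non-temporal reads, ordinary writes, and unprivileged treatment requested for both reads and writes.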
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d, 64];
bits(64) fromaddress = X[s, 64];
bits(64) cpysize = X[n, 64];
bits(4) nzcv = PSTATE.<N,Z,C,V>;
bits(8*N) readdata;
integer B;
boolean implements_option_a = CPYFOptionA();
boolean rprivileged = if options<1> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0;
boolean wprivileged = if options<0> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0;
AccessDescriptor raccdesc = CreateAccDescMOPS(MemOp_LOAD, rprivileged, rnontemporal);
AccessDescriptor waccdesc = CreateAccDescMOPS(MemOp_STORE, wprivileged, wnontemporal);
if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;
    if implements_option_a then
        nzcv = '0000';
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
    else
        nzcv = '0010';
else
    CheckMemCpyParams(stage, implements_option_a, nzcv, options, d, s, n, toaddress, fromaddress, cpysize);
bits(64) stagecpysize = MemCpyStageSize(stage, toaddress, fromaddress, cpysize);
if implements_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, raccdesc];
        Mem[toaddress+cpysize, B, waccdesc] = readdata<B*8-1:0>;
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
        if stage != MOPSStage_Prologue then
            X[n, 64] = cpysize;
else
    while UInt(stagecpysize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= UInt(stagecpysize);
        readdata<B*8-1:0> = Mem[fromaddress, B, raccdesc];
        Mem[toaddress, B, waccdesc] = readdata<B*8-1:0>;
        fromaddress = fromaddress + B;
        toaddress = toaddress + B;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
        if stage != MOPSStage_Prologue then
            X[n, 64] = cpysize;
            X[d, 64] = toaddress;
            X[s, 64] = fromaddress;
if stage == MOPSStage_Prologue then
    X[n, 64] = cpysize;
    X[d, 64] = toaddress;
    X[s, 64] = fromaddress;
    PSTATE.<N,Z,C,V> = nzcv;
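The two copy loops in the pseudocode can be modeled in C over a plain byte array. This is a simplified sketch under several assumptions: addresses are offsets into `mem`, a single call performs the whole remaining copy instead of a per-stage amount, access descriptors and faults are ignored, and `size_choice` is a hypothetical stand-in for the implementation defined `CPYSizeChoice`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register values already in the form the prologue leaves them in. */
struct cpyf_regs { uint64_t xd, xs, xn; };

/* Hypothetical stand-in for CPYSizeChoice(): at most max_block bytes. */
static uint64_t size_choice(uint64_t remaining, uint64_t max_block)
{
    return remaining < max_block ? remaining : max_block;
}

/* Option A: xd and xs hold the end addresses, xn holds minus the bytes left. */
void cpyf_copy_option_a(uint8_t *mem, struct cpyf_regs *r, uint64_t max_block)
{
    assert(max_block > 0);
    while (r->xn != 0) {
        uint64_t b = size_choice((uint64_t)0 - r->xn, max_block);
        /* The block is read in full before it is written, like memmove. */
        memmove(mem + (size_t)(r->xd + r->xn),
                mem + (size_t)(r->xs + r->xn), (size_t)b);
        r->xn += b;                 /* only the size register changes */
    }
}

/* Option B: xd and xs hold the next addresses, xn holds the bytes left. */
void cpyf_copy_option_b(uint8_t *mem, struct cpyf_regs *r, uint64_t max_block)
{
    assert(max_block > 0);
    while (r->xn > 0) {
        uint64_t b = size_choice(r->xn, max_block);
        memmove(mem + (size_t)r->xd, mem + (size_t)r->xs, (size_t)b);
        r->xs += b;                 /* all three registers advance */
        r->xd += b;
        r->xn -= b;
    }
}
```

In this model both loops leave the same final register values: the addresses point just past the copied data and the size register is zero. The difference is only in the intermediate state, where option A updates the size register alone and option B updates all three registers.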
Internal version only: isa v33.64, AdvSIMD v29.12, pseudocode v2023-06_rel, sve v2023-06_rel ; Build timestamp: 2023-07-04T19:42
Copyright © 2010-2023 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.