86BUGS.LST revision 1.0 By Harald Feldmann

This file lists undocumented and buggy instructions of the Intel 80x86 family of processors. Some of the information was obtained from the book "Programmer's Technical Reference: The Processor and Coprocessor" by Robert L. Hummel, Ziff-Davis Press, ISBN 1-56276-016-5, which is highly recommended. Note that Intel does not support these special features and may decide to drop opcode variants and instructions in future products.

Undocumented instructions and undocumented features of Intel and IIT processors:


    The AAD instruction normally performs the following action:
	- unpacked BCD in AX	  example (AX = 0104h)
	- AL = AH * 10d + AL		  (AL = 0eh )
	- AH = 00			  (AH = 00h )
    The normal opcode decodes as follows: d5,0a
    The encoding is an opcode byte plus an operand byte. By
    replacing the second byte with any number in the range 00 -
    ff we can build our own AAD instruction for other number
    bases in that range. For example, by coding d5,10 we
    achieve an instruction that performs: AL = AH * 16d + AL.
    Note: the variant is not supported on all 80x86-compatible
    CPUs, notably the NEC V-series, because some hard-code the
    multiplier as 0Ah.
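
    The arithmetic above can be checked with a small Python
    model. This is only a sketch of the described behaviour, not
    an emulator: the function name aad is made up for
    illustration, flags are not modelled, and the truncation of
    the result to 8 bits is assumed because AL is 8 bits wide.

```python
def aad(ax: int, base: int = 0x0A) -> int:
    """Model of AAD with an arbitrary second opcode byte (d5,base)."""
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    al = (ah * base + al) & 0xFF   # assumed: result truncated to 8 bits
    return al                      # AH is cleared, so AX equals AL


print(hex(aad(0x0104)))        # d5,0a -> 0xe
print(hex(aad(0x0104, 0x10)))  # d5,10 -> 0x14
```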


    The AAM instruction normally performs the following action:
	- binary number in AL
	- AH = AL / 10d
	- AL = AL MOD 10d
    Thus creating an unpacked BCD in AX.
    The normal opcode decodes as follows: d4,0a
    The encoding is an opcode byte plus an operand byte. By
    replacing the second byte with any number in the range 00 -
    ff we can build our own AAM instruction for other number
    bases in that range. For example, by coding d4,07 we
    achieve an instruction that performs: AH = AL / 07d, AL = AL
    MOD 07d.
    The AAD and AAM opcode variants have been found in Future
    Domain SCSI controller ROMs.
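
    A matching Python sketch of the AAM arithmetic follows. The
    function name aam is hypothetical and flags are not
    modelled. The handling of a zero operand is an assumption
    that goes beyond the text above: the model simply raises
    ZeroDivisionError, since a division by zero has no result.

```python
def aam(al: int, base: int = 0x0A) -> int:
    """Model of AAM with an arbitrary second opcode byte (d4,base)."""
    al &= 0xFF
    if base == 0:
        # Assumption: a zero operand cannot produce a quotient.
        raise ZeroDivisionError("AAM with operand 00 divides by zero")
    ah, al = divmod(al, base)      # AH = AL / base, AL = AL MOD base
    return (ah << 8) | al


print(hex(aam(0x0E)))      # d4,0a -> 0x104
print(hex(aam(20, 0x07)))  # d4,07 -> 0x206
```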

LOADALL: OPCODE: 0f,05 (i80286) & 0f,07 (i80386 & i80486)

    Load _ALL_ processor registers. Does exactly as the name
    suggests; separate versions exist for the i80286 and the
    i80386. The i80286 LOADALL instruction reads a block of 102
    bytes into the chip, starting at address 000800 hex. The
    i80286 LOADALL takes 195 clocks to execute.
    The sequence is as follows (Hex address, Bytes, Register):
	0800: 6 N/A
	0806: 2 MSW (Machine Status Word)
	0808: 14 N/A
	0816: 2 TR (Task Register)
	0818: 2 FLAGS (Flags)
	081a: 2 IP (Instruction Pointer)
	081c: 2 LDT (Local Descriptor Table)
	081e: 2 DS (Data Segment)
	0820: 2 SS (Stack Segment)
	0822: 2 CS (Code Segment)
	0824: 2 ES (Extra Segment)
	0826: 2 DI (Destination Index)
	0828: 2 SI (Source Index)
	082a: 2 BP (Base Pointer)
	082c: 2 SP (Stack Pointer)
	082e: 2 BX (BX register)
	0830: 2 DX (DX register)
	0832: 2 CX (CX register)
	0834: 2 AX (AX register)
	0836: 6 ES cache (ES descriptor _cache_)
	083c: 6 CS cache (CS descriptor _cache_)
	0842: 6 SS cache (SS descriptor _cache_)
	0848: 6 DS cache (DS descriptor _cache_)
	084e: 6 GDTR (Global Descriptor Table Register)
	0854: 6 LDT cache (Local Descriptor Table _cache_)
	085a: 6 IDTR (Interrupt Descriptor Table Register)
	0860: 6 TSS cache (Task State Segment _cache_)
    Descriptor cache entries are internal copies of the
    original registers (the LDT cache is normally a copy of the
    last regularly _loaded_ LDT). Note that after executing
    LOADALL, the chip will use the _cache_ registers without
    re-checking the caches against the regular registers. That
    means that cache and register do not have to be the same.
    Caches are updated when the original register is loaded
    again. Both will then contain the same value.
    Descriptor caches layout:
    3 bytes	   24 bit physical address of segment
    1 byte	   access rights byte, mapped as the access rights
		   byte in a regular descriptor. The present
		   bit now represents a valid bit. If this bit
		   is cleared (zero) the segment is invalid and
		   accessing it will trigger exception 0dh. The
		   DPL (Descriptor Privilege Level) fields of
		   the CS and SS descriptor caches determine
		   the CPL (Current Privilege Level).
    2 bytes	   16 bit segment limit.
    This layout is the same for the GDTR and IDTR registers,
    except that the access rights byte must be zero.
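
    The table and cache layout above can be cross-checked with a
    short Python sketch that recomputes every address from the
    field sizes. The names FIELDS, offsets and pack_cache_286
    are invented for this example, and little-endian byte order
    within each field is an assumption.

```python
import struct

BASE = 0x0800
FIELDS = [  # (size in bytes, name); None marks the N/A areas
    (6, None), (2, "MSW"), (14, None), (2, "TR"), (2, "FLAGS"),
    (2, "IP"), (2, "LDT"), (2, "DS"), (2, "SS"), (2, "CS"),
    (2, "ES"), (2, "DI"), (2, "SI"), (2, "BP"), (2, "SP"),
    (2, "BX"), (2, "DX"), (2, "CX"), (2, "AX"),
    (6, "ES cache"), (6, "CS cache"), (6, "SS cache"),
    (6, "DS cache"), (6, "GDTR"), (6, "LDT cache"), (6, "IDTR"),
    (6, "TSS cache"),
]


def offsets(fields, base=BASE):
    """Return ({name: (address, size)}, total block length)."""
    table, addr = {}, base
    for size, name in fields:
        if name is not None:
            table[name] = (addr, size)
        addr += size
    return table, addr - base


def pack_cache_286(base24: int, access: int, limit: int) -> bytes:
    """3-byte base, 1-byte access rights, 2-byte limit (byte order assumed)."""
    base_bytes = struct.pack("<I", base24 & 0xFFFFFF)[:3]
    return base_bytes + struct.pack("<BH", access & 0xFF, limit & 0xFFFF)


OFFSETS_286, TOTAL_286 = offsets(FIELDS)
print(TOTAL_286)                       # 102
print(hex(OFFSETS_286["AX"][0]))       # 0x834
```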
    i80386 LOADALL:
    The i80386 variant loads 204 (dec) bytes from the address at
    ES:EDI and resumes execution in the specified state.
    No timing information available.
    relative offset: Bytes:   Registers:
	0000:	     4	      CR0
	0004:	     4	      EFLAGS
	0008:	     4	      EIP
	000c:	     4	      EDI
	0010:	     4	      ESI
	0014:	     4	      EBP
	0018:	     4	      ESP
	001c:	     4	      EBX
	0020:	     4	      EDX
	0024:	     4	      ECX
	0028:	     4	      EAX
	002c:	     4	      DR6
	0030:	     4	      DR7
	0034:	     4	      TR
	0038:	     4	      LDT
	003c:	     4	      GS (zero extended)
	0040:	     4	      FS (zero extended)
	0044:	     4	      DS (zero extended)
	0048:	     4	      SS (zero extended)
	004c:	     4	      CS (zero extended)
	0050:	     4	      ES (zero extended)
	0054:	    12	      TSS descriptor cache
	0060:	    12	      IDT descriptor cache
	006c:	    12	      GDT descriptor cache
	0078:	    12	      LDT descriptor cache
	0084:	    12	      GS descriptor cache
	0090:	    12	      FS descriptor cache
	009c:	    12	      DS descriptor cache
	00a8:	    12	      SS descriptor cache
	00b4:	    12	      CS descriptor cache
	00c0:	    12	      ES descriptor cache
    Descriptor caches layout:
    1 byte	   zero
    1 byte	   access rights byte, same as i80286
    2 bytes	   zero
    4 bytes	   32 bit physical base address of segment
    4 bytes	   32 bit segment limit
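
    As a consistency check on the i80386 layout, the following
    Python sketch assembles a 204-byte image from the two tables
    above. The helper names pack_cache_386 and build_image are
    hypothetical, the access rights value 93h in the usage line
    is only a placeholder, and little-endian byte order is
    assumed.

```python
import struct

REGS_32 = ["CR0", "EFLAGS", "EIP", "EDI", "ESI", "EBP", "ESP", "EBX",
           "EDX", "ECX", "EAX", "DR6", "DR7", "TR", "LDT",
           "GS", "FS", "DS", "SS", "CS", "ES"]
CACHES = ["TSS", "IDT", "GDT", "LDT", "GS", "FS", "DS", "SS", "CS", "ES"]


def pack_cache_386(access: int, base: int, limit: int) -> bytes:
    """zero, access rights, 2 zero bytes, 32-bit base, 32-bit limit."""
    return struct.pack("<BBHII", 0, access & 0xFF, 0,
                       base & 0xFFFFFFFF, limit & 0xFFFFFFFF)


def build_image(regs: dict, caches: dict) -> bytes:
    """Concatenate the fields in table order; missing entries are zero."""
    image = b"".join(struct.pack("<I", regs.get(n, 0) & 0xFFFFFFFF)
                     for n in REGS_32)
    image += b"".join(caches.get(n, bytes(12)) for n in CACHES)
    return image


image = build_image({"EAX": 0x12345678},
                    {"ES": pack_cache_386(0x93, 0, 0xFFFFFFFF)})
print(len(image))  # 204
```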


    This instruction is likely to be an alias for the LOADALL on
    the i80286. It is not documented and is even marked as
    unused in the 'Programmer's technical reference'. Still it
    executes on the i80286. >> info wanted <<


    This instruction copies the Carry Flag to the AL register.
    If the Carry Flag is set (CY), AL becomes ffh. If the Carry
    Flag is cleared, AL becomes 00.
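
    The behaviour is simple enough to state as a Python
    one-liner. The comparison with SBB AL,AL is not taken from
    the text above; it is added on the understanding that the
    documented SBB leaves the same value in AL (it also changes
    flags, which this sketch ignores). Both function names are
    made up for illustration.

```python
def carry_to_al(carry: bool) -> int:
    """AL after the instruction: ffh if CF is set, else 00."""
    return 0xFF if carry else 0x00


def sbb_al_al(al: int, carry: bool) -> int:
    """AL after SBB AL,AL: AL - AL - CF, truncated to 8 bits."""
    return (al - al - int(carry)) & 0xFF


print(hex(carry_to_al(True)), hex(carry_to_al(False)))  # 0xff 0x0
```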

Floating Point special instructions:


    The FMUL4X4 instruction is available only on the IIT
    (Integrated Information Technology Inc.) math processors.
    Takes 242 clocks.
    The instruction performs a 4x4 matrix multiply in one
    instruction using four banks of 8 floating point registers.
    The operands must be loaded to a specific bank in a specific
    order. The equation solved can be represented by:
    Xn = (A00 * Xo) + (A01 * Yo) + (A02 * Zo) + (A03 * Vo)
    Yn = (A10 * Xo) + (A11 * Yo) + (A12 * Zo) + (A13 * Vo)
    Zn = (A20 * Xo) + (A21 * Yo) + (A22 * Zo) + (A23 * Vo)
    Vn = (A30 * Xo) + (A31 * Yo) + (A32 * Zo) + (A33 * Vo)
    Where Xo stands for the original X value and Xn for the
    result. Operands must be loaded to the following registers
    in the specified banks in the specified order.
    Register   Before FMUL4X4              After FMUL4X4
               bank 0   bank 1   bank 2    bank 0
    ST(0)      Xo       A33      A31       Xn
    ST(1)      Yo       A23      A21       Yn
    ST(2)      Zo       A13      A11       Zn
    ST(3)      Vo       A03      A01       Vn
    ST(4)               A32      A30       ?
    ST(5)               A22      A20       ?
    ST(6)               A12      A10       ?
    ST(7)               A02      A00       ?
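
    A Python sketch of what the instruction computes and of the
    register layout in the table may help when preparing
    operands. It assumes the conventional matrix-vector product
    (each result combines all four inputs) and reads Arc as row
    r, column c; both function names are hypothetical.

```python
def fmul4x4(a, v):
    """a: 4x4 nested list a[row][col]; v: [Xo, Yo, Zo, Vo].

    Returns [Xn, Yn, Zn, Vn] as a plain matrix-vector product.
    """
    return [sum(a[r][c] * v[c] for c in range(4)) for r in range(4)]


def bank_layout(a):
    """Return (bank1, bank2), each a list indexed by ST(i), per the table."""
    bank1 = [a[3 - i][3] for i in range(4)] + [a[3 - i][2] for i in range(4)]
    bank2 = [a[3 - i][1] for i in range(4)] + [a[3 - i][0] for i in range(4)]
    return bank1, bank2


# Usage: a translation by (5, 6, 7) applied to the point (1, 2, 3, 1).
A = [[1, 0, 0, 5],
     [0, 1, 0, 6],
     [0, 0, 1, 7],
     [0, 0, 0, 1]]
print(fmul4x4(A, [1, 2, 3, 1]))  # [6, 8, 10, 1]
```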
    All four banks can be selected by using the bank-switching
    instructions, but only banks 0, 1 and 2 make sense, since
    bank 3 is an internal scratchpad. Each bank can contain 8
    floating point values and may be re-used with normal
    instructions. Each bank acts like an independent i80287,
    except after a bank switch in between, in which case the
    initial status is not maintained.
    Pseudo-multichip operation can be performed in each bank
    and even in multiple banks at the same time (although only
    one instruction will operate on one register at any given
    time), provided that the active register and top register
    are not changed after switching from bank to bank.
	FINIT                        ; reset control word
	FSBP1                        ; select bank 1
	FLD DWORD PTR es:[si]        ; first original
	FLD DWORD PTR es:[si+4]      ; second original
	FLD DWORD PTR es:[si+8]      ; third original
	FSTCW WORD PTR [bx]          ; save FPU control status
	FSBP2                        ; NOTE! you will see three
	                             ; active registers in this
	                             ; bank when using a debugger
	FINIT                        ; nothing visible
	FLD DWORD PTR [si]           ; new value
	FLD DWORD PTR [si+4]         ; second new value
	FADD ST,ST(1)                ; two values visible
	FSTP DWORD PTR [si+8]        ; one value visible
	FSBP1                        ; one original visible
	FLDCW WORD PTR [bx]          ; restore FPU status to the
	                             ; one active in bank 1,
	                             ; causing original three
	                             ; values to be visible
	                             ; again in correct order
	; ... simply continue with what you wanted to do with
	; those numbers from es:[si], they are still there.
	FLD DWORD PTR [si+8]         ; for instance...
    This feature of the IIT chips can be used to perform complex
    operations in registers when many components remain the
    same across a large dataset: save an intermediate result
    to ONE memory location, bank-switch to the next series of
    operands, load that ONE operand and continue the
    calculation with the next set of operands already in that
    bank. This does require another read into the new bank but
    may save time and memory space compared to memory-based
    operands or multiple-pass algorithms with multiple arrays of
    intermediate results.



    Selects the original bank. (default) (6 clocks)


    Selects bank 1 from FMUL4X4 instruction diagram (6 clocks)


    Selects bank 2 from FMUL4X4 instruction diagram (6 clocks)


    Selects the scratchpad bank3 used by the FMUL4X4 internally.
    Not very useful but funny to look at... How-to: load
    any value into bank 0,1 or 2 until you have a full 8
    registers, then execute this bankswitch. Using a
    debugger like CodeView you are now able to inspect the
    bank3 registers. (most likely to take 6 clocks)


    Apparently the IIT 2c87 recognises and executes some
    i80387 trigonometric functions. UNDOCUMENTED
    FSIN (sine) and FCOS (cosine) have been tested and function
    according to the Intel 80387 specifications. FSINCOS
    (available on the Intel 80287XL, 80387 and up) does not.

FSIN: OPCODE: d9,fe IIT 2c87+ (also Intel 80387+) UNDOCUMENTED

    Calculates the sine of the input in radians in ST(0). After
    calculation, ST(0) contains the sine. Takes approximately
    120 clocks.

FCOS: OPCODE: d9,ff IIT 2c87+ (also Intel 80387+) UNDOCUMENTED

    Calculates the cosine of the input in radians in ST(0).
    After calculation, ST(0) contains the cosine. Takes
    approximately 120 clocks.

Instructions by mnemonic

mnemonic: opcode: processor: remark & remedy:

AAA i80286 & i80386 & i80486


INS i80286 &

		i80386 &

INVD i80486

MOV to SS n/a early 8088

			    Some early 8088s would not properly
			    disable interrupts after a move to
			    the SS register. The workaround is
			    to explicitly disable interrupts,
			    update SS and SP and then re-enable
			    interrupts.
			    Typically this would occur in a
			    situation where one would relocate
			    a stack in memory, more than 64 KB
			    from the original one, updating
			    both SS and SP as in:
			      MOV SS,AX  ; would disable interrupts
			                 ; automatically during
			                 ; this and the next instruction
			      MOV SP,DX  ; interrupts disabled
			      ...        ; interrupts enabled.

multiple prefixes with REPx n/a 8088 & 8086

			    These processors would not properly
			    restart at the first prefix byte
			    after an interrupt when more than
			    one prefix is used, e.g. LOCK REP
			    MOVSW CS:[BX]. A workaround is to
			    test after the instruction for
			    CX==0:
			      here: LOCK REP MOVSW CS:[BX]
			            OR CX,CX
			            JNZ here
			    Because of the CS override, the REP
			    and LOCK prefixes would not be
			    recognised as part of the
			    instruction and the REP MOVSW would
			    be aborted. This also seems to be
			    the case for a REP MOVSW CS:[BX].
			    Note that this also implies that
			    REPZ and REPNZ are affected, in
			    SCASW for instance.