Programming docs for the Delfina Flipper Edition Soundcard

This document is freely distributable as long as it is not changed, and one
of the sources www.jschoenfeld.com, www.jschonfeld.de, www.siliconsonic.com
or www.siliconsonic.de is mentioned.

******************************************************************************
*This file replaces the old file Inside_Flipper.txt. If you have one of the  *
*old versions, please delete it from your harddrive, as the new Delfina      *
*Flipper V2.0 will behave differently from what's mentioned in the Flipper   *
*V1.0 docs.                                                                  *
******************************************************************************

This file evolves while development is making progress. I decided to release
it to the public to get rid of all the eMails asking "when is it finished??".

history:

May 21st, 2002:   Had to change the order of full duplex transfers: It's now
                  "first read, then write", and full duplex can now be used
                  with or without the prefetch mode.
May 20th, 2002:   Improved waitstate logic. Final tests of fifo state
                  machines to be done tomorrow...
May 19th, 2002:   Changed polarity of IRQ enable bits. The $40 register is
                  set to 0 with a reset, so all IRQs are inhibited until a
                  driver enables them. This ensures safe system startup.
                  Corrected memory map of 64K area.
May 18th, 2002:   The sample PCB arrived! Corrected a mistake in the AFifo
                  state machine and the DFifo state machine (same mistake in
                  both Fifos). DSP Glue logic updated.
May 6th, 2002:    CODEC glue logic completed. Now all timing requirements
                  are met. PCB design is making slow progress. Need a better
                  CAD software!
May 5th, 2002:    Corrected an error in DSP glue logic, did the IRQ logic
                  for DSP and Midi. Improved Fifo/waitstate logic: Any 16/32
                  bit access can now be done, regardless of the Fifo state.
                  If there's no space, the CPU will wait automatically.
                  Finalized Schematic entry for Prototype 2 board.
May 4th, 2002:    Updated config-MACH, introduced status register, Prepared
                  prototype PCB (packages, schematic entry)
May 2nd, 2002:    Reduced SPDIF sync&count to 10 macrocells. Less than half
                  of what it was before :-)
May 1st, 2002:    Prepared Clockport interface, did the DSP glue logic,
                  extended DSP memory map.
April 30th, 2002: Finalized the Zorro interface. A combinatoric state
                  machine takes advantage of the errors in the RBM OnBoard
                  Ex and the Apollo 2030 accelerator: Both make a premature
                  end of cycle. In other words: Both will be faster than an
                  average Zorro.
                  Added "full duplex burst transfer".
                  Updated "Precautions" section: "Read Word" can now be done
                  any time without checking for Fifos being empty.
April 28th, 2002: RBM OnBoard Zorro expansion SUCKS! I've spent the whole
                  day debugging this tinker-style piece of crap. What did
                  they think when implementing /XRDY only halfways, and /OVR
                  not at all? How is a hardware designer supposed to
                  implement waitstates in that daughterboard?
                  Well, found a solution:-) Zorro interface now works in ALL
                  Amigas without slowing down. Long fight. Phew.
                  Good thing with all those faults in the OnBoard Ex logic:
                  It will be even faster than any other Zorro. I don't know
                  what kind of pot you have to smoke to implement a 2-cycle
                  zorro access, but it's possible with the OnBoard ;-)
April 27th, 2002: Finally got the new DFifo to fit in a MACH210 with
                  extended features: The Prefetch option for the
                  DataReadFifo can now be switched on and off.
April 26th, 2002: Optimized Data fifo: A Byte-wide write access now garbles
                  the data write Fifo for burst and 16-bit wide transfers.
                  This saves a lot of logic, and hardly affects existing
                  software.
April 25th, 2002: Added code examples for burst transfer
                  The logic re-design is making quick progress. Everything
                  looks like a major reduction of hardware.
April 24th, 2002: Added DSP56K memory map.
April 23rd, 2002: Initial release for total re-design of the hardware.
                  The first prototype with seven MACH210 and two MACH211 was
                  simply too power-consuming and too expensive.


General
-------
Delfina Flipper Edition uses two config spaces if used in a Zorro slot. One
is 64K, the other 128K. The 64K area must be located between $e80000 and
$ef0000, while the 128K area can be located anywhere in the 16MB Zorro space.

The 64K area is compatible with the Delfina Plus by Petsoff Limited
Partnership. Existing Delfina software will not notice any difference,
because the delfina.library will be just the same on the user's side.

Using the Delfina library is recommended. This document is for educational
purposes only, and is not meant to encourage programmers to hack the hardware
from the Amiga side. It carries a lot of valuable information for DSP
programmers, and can give a basic idea of how the Amiga communicates with the
DSP.


Hardware description
--------------------
Vendor number:   4626 ($1212)
product numbers: 8 and 9
Serial numbers:  0 on both config spaces.

The board does not support the "shutup_forever" option, because the two
config spaces are mandatory.

Memory map of 64K area:

$0000-$000f: DSP Host port in 8-bit mode (even addresses)
             caution: While fifo logic is working, byte accesses are NOT
             passed over to the DSP. You MUST check for Fifos being empty
             before using these addresses. If you read or write anyway, the
             logic attempts the access regardless of Fifo state, so you may
             read nonsense, or write something other than the data you
             wanted to write.
$0010-$001f: Mirror of DSP host port (even addresses)
$0020-$003f: DSP host port in 16-bit mode. A write to this area causes two
             byte accesses to registers 6 and 7 of the DSP host port.
$0040      : for write: Control register. The control register cannot be
             read, therefore you should keep a shadow of it somewhere in
             memory.
             bit 7: DSP&State machine reset. Setting this bit to 0 will
                    reset the DSP and all State machines of the Delfina
                    Flipper. Write a 1 to get everything running again.
                    State after reset of the Amiga is 0 (DSP halted).
             bit 6: Prefetch mode. A 1 in this bit will enable the Fifo
                    state machine to prefetch data from the DSP, so it's
                    available without waitstates on the next 16-bit read.
             bit 5: GPIO IRQ enable: A 1 in this bit will enable the DSP to
                    issue an INT2 with GPIO pin ?? (Teemu, please make a
                    decision!)
             bit 4: HREQ IRQ enable: A 1 in this bit will enable HREQ pin
                    from the DSP to be mapped to INT2 line of the Zorro
                    slot. A 0 in this bit disables HREQ->INT2 interrupts
                    (should not be enabled when 16-bit or burst transfers
                    are used!)
             bit 3: Midi reset. A 1 in this bit will reset the MIDI UART.
             bit 2: SPDif input PLL frequency pre-select:
                    0=44.1/48 kHz 1=32 kHz
             bits 1 and 0 are unused and should be set to 0 with any write.
$0040      : for read: Status register
             bit 4: Address fifo empty
             bit 5: data write fifo empty
             bit 6: data read fifo empty

Precautions for using 16-bit and 32-bit burst mode:
-> If you intend to read data with the 8-bit mode, it's recommended to
   switch the prefetch mode off (only applies to host port registers 6
   and 7).
-> If you have the HREQ INT bit enabled, 16-bit and burstmode transfers
   should not be used at all, because they use the HREQ line of the DSP host
   port for synchronisation with the DSP. A lot of unexpected IRQs would be
   the result.

The 128K area is described in the "burst mode transfer" section further down.
It has no specific memory map, because the address lines are not used in a
conventional way. If you are not familiar with hardware issues, just take the
example sources provided in this file.

DSP 56002 memory map:

X Memory:
$0000-$3fff        24 bit wide memory
$4000-$7fff        mirror of the above
$8000-$ffe7        unused
$ffe8,$ffe9,$ffeb  Host interface. See the DSP56K User's manual, chapter 5.3
                   (Motorola order code DSP56002UM/AD)

Y Memory:
$0000-$3fff        24 bit wide memory
$4000-$7fff        mirror of the above
$8000-$ffbf        unused
$ffc0-$ffdf        reserved
$ffe0-$ffff        Crystal CS 4231 CODEC

Note: DSP Memory cannot be expanded.
      Logic for this would have to be faster than light: At 74 MHz, one
      access takes only 13.5 ns. With setup and hold times, 12 ns speed for
      the memory chips is the slowest we can use. Just accept that the
      memory is soldered to the board, and that 96K is enough. Period.

If memory is getting tight, use the mirrors to utilize the whole physical
memory: The DSP has 256 words of on-chip RAM each for X and Y space (512
words together), and another 512 words of data ROM (256 words each for X and
Y space). If these are switched on, the external memory between $0000 and
$01ff can only be accessed through the mirror in $4000-$41ff. This mirror is
always active, so you can leave the internal RAM and ROM switched on forever
without losing the memory space!


Burst transfer writes through the 128K area
-------------------------------------------
The hardware transfers 32 bits of data in one Zorro write access by misusing
the address lines 1-16 (not 0-15) for the high word of a long word while
transferring the low word on the data bus bits 0-15. You enable this highly
efficient transfer mode by writing to the 128K config space of the Delfina.
Any access to this area will be understood as a longword-write to the host
port of the DSP. First, the data word taken from the address lines is
written, then the data word on the data lines. The order of registers of the
host port is 6-7-6-7.

Delfina has a 32-bit Fifo, so one longword-write is always possible without
waitstates. If the DSP gets busy after the first two byte-accesses to the
host port, the hardware will take care of the waitstates. All this happens
"in the background" - waitstates on the Zorro bus will only occur if the
32-bit Fifo has not been emptied by the time of the next longword-write to
the 128K area or the next access to the 16-bit wide space in the 64K area
(the Fifo is shared between the two areas in order to prevent mix-ups).
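
As a rough illustration of the ordering described above, this C sketch models
how one longword write to the 128K area is split into four byte accesses on
the host port. It runs on any machine and touches no hardware. The names
hp_access and burst_write_sequence are made up for this sketch. It also
ASSUMES that the upper byte of each word goes to register 6 and the lower
byte to register 7; the text above only gives the register order 6-7-6-7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte access on the DSP host port (hypothetical helper type). */
struct hp_access {
    uint8_t reg;   /* host port register number: 6 or 7 */
    uint8_t value; /* byte written to that register */
};

/*
 * Model of one longword write to the 128K area.
 *   addr_word: word carried on address lines 1-16 (high word, sent first)
 *   data_word: word carried on data lines 0-15   (low word, sent second)
 * ASSUMPTION: within each word the upper byte goes to register 6 and the
 * lower byte to register 7.
 */
void burst_write_sequence(uint16_t addr_word, uint16_t data_word,
                          struct hp_access out[4])
{
    assert(out != NULL);

    out[0].reg = 6; out[0].value = (uint8_t)(addr_word >> 8);
    out[1].reg = 7; out[1].value = (uint8_t)(addr_word & 0xff);
    out[2].reg = 6; out[2].value = (uint8_t)(data_word >> 8);
    out[3].reg = 7; out[3].value = (uint8_t)(data_word & 0xff);
}
```

Under that assumption, calling burst_write_sequence($1234, $5678, seq) fills
seq with the accesses 6<-$12, 7<-$34, 6<-$56, 7<-$78.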

To make it possible to use the instructions 'move.w dn,0(an,dn.w*2)' or
'movem.w (an)+,', the highbit of the address is inverted. Both instructions
sign-extend the data words, which inverts bit 16 of the effective address
when the base pointer points exactly to the middle of the 128K space; the
inversion in hardware compensates for this.

The following loop copies 128 longwords (512 bytes) using this burstmode
transfer. It is unrolled and interleaved for 4 longwords in order to speed it
up. This may or may not increase performance - it's currently untested and
for educational purposes only. If you have suggestions on how to speed it up
or make it shorter, write to me, my eMail address is at the end of this file.

-------- cut ----------
;uses regs d0-d4/a0-a1
        mc68020
        movea.l STRUCT_hardwarebase(a5),a0  ;load the baseaddress of the
                                            ;128K block into a0 from wherever
                                            ;you keep it.
        movea.l STRUCT_databuffer(a5),a1    ;load the address of the buffer
                                            ;to send from wherever you keep
                                            ;it.
        adda.l  #$00010000,a0               ;add 2^16 to base because we are
                                            ;using sign extended words for
                                            ;relative addressing.
        move.w  #$001F,d4                   ;loop 32 times 4 longs for the
                                            ;512 bytes
.loop:
        move.l  (a1)+,d0                    ;get longword 1
        move.l  d0,d1                       ;copy 1
        move.l  (a1)+,d2                    ;get longword 2
        swap    d0                          ;get highword 1 in low
        move.l  d2,d3                       ;copy 2
        move.w  d1,(a0,d0.w*2)              ;write 32 bits to Delfina (1)
        swap    d2                          ;get highword 2 in low
        move.w  d3,(a0,d2.w*2)              ;write 32 bits to Delfina (2)
        move.l  (a1)+,d0                    ;get longword 3
        move.l  d0,d1                       ;copy 3
        move.l  (a1)+,d2                    ;get longword 4
        swap    d0                          ;get highword 3 in low
        move.l  d2,d3                       ;copy 4
        move.w  d1,(a0,d0.w*2)              ;write 32 bits to Delfina (3)
        swap    d2                          ;get highword 4 in low
        move.w  d3,(a0,d2.w*2)              ;write 32 bits to Delfina (4)
        dbra    d4,.loop
------------ cut ------------

The rolled up loop has 42 (38? don't have the table at hand!) bytes and
should easily fit the instruction cache of a 68020 or higher. To use this
burstmode transfer in an efficient way, the 020 addressing modes are
mandatory.
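
To make the address arithmetic of the loop above easier to follow, here is a
small C model that runs on any machine and touches no hardware. It computes
which byte offset inside the 128K area a write with a given high word ends up
at, and what the card would recover from address lines 1-16. The function
names and constants are invented for this sketch, and the decode step is only
my reading of "the highbit of the address is inverted", not a confirmed
description of the logic.

```c
#include <assert.h>
#include <stdint.h>

#define BURST_AREA_SIZE 0x20000u /* 128K config space */
#define BURST_MIDDLE    0x10000u /* base + $10000, as loaded into a0 above */

/*
 * Byte offset inside the 128K area hit by 'move.w dX,(a0,dY.w*2)' when
 * a0 = base + $10000 and the low word of dY holds high_word. The CPU
 * sign-extends the index word before scaling it by 2. The uint16_t to
 * int16_t cast relies on two's complement wrap-around.
 */
uint32_t burst_offset(uint16_t high_word)
{
    int32_t index  = (int16_t)high_word;
    int32_t offset = (int32_t)BURST_MIDDLE + index * 2;

    assert(offset >= 0 && (uint32_t)offset < BURST_AREA_SIZE);
    return (uint32_t)offset;
}

/*
 * ASSUMED hardware view: take address lines 1-16 and invert the top one
 * to get the original high word back.
 */
uint16_t burst_decode(uint32_t offset)
{
    assert(offset < BURST_AREA_SIZE);
    return (uint16_t)((offset >> 1) ^ 0x8000u);
}
```

With this model, a high word of $0000 lands at offset $10000 (the middle of
the area), $7fff at $1fffe, $8000 at $00000 and $ffff at $0fffe, so every
possible high word stays inside the 128K window.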
On 68000 CPUs, it will presumably be slower to use the burstmode compared to
the 16-bit transfers in the 64K area.

Since the code example above does not use 'movem' instructions, it should
also be the most efficient way to do it on an 060 processor, but there may be
a possibility to speed things up on a 020/030/040 with the use of the 'movem'
instruction:

;note: register roles are swapped here compared to the loop above:
;a0 = data buffer, a1 = base of the 128K area plus $10000
        move.w  #$001F,d4           ;loop 32 times 4 longs
.loop:
        movem.w (a0)+,d0-d3         ;get longwords 1+2 as 4 sign extended words
        move.w  d1,(a1,d0.w*2)      ;write 32 bits to Delfina (1)
        move.w  d3,(a1,d2.w*2)      ;write 32 bits to Delfina (2)
        movem.w (a0)+,d0-d3         ;get longwords 3+4
        move.w  d1,(a1,d0.w*2)      ;write 32 bits to Delfina (3)
        move.w  d3,(a1,d2.w*2)      ;write 32 bits to Delfina (4)
        dbra    d4,.loop

Again, this is currently untested and subject to be updated with your
suggestions.


Full Duplex burst transfers through the 128K area
-------------------------------------------------
A simple

        move.w  (a0,d1.w*2),d0

with a0 being the base address of the 128K area plus $10000 (as in the
section above) will first read data from host port registers 6&7 and store
them in d0, then write the contents of d1 to the same host port registers
(everything word-wise).

Sample sources for full duplex copyloops will follow.


If you have any questions, feel free to write me an eMail:
jens@jschoenfeld.de

--EOF