Ever since Yamaha distributed the audio styles for Genos, I've been meaning to take a look inside of an audio style file. Here's a little preliminary information.
An audio style file is an IFF-like container just like a Standard MIDI File (SMF). In fact, an audio style file has the same internal organization as a regular style file which we know to be a Type 0 SMF with extra chunks.
An audio style file has the following chunks (in order):
Type Purpose
---- ------------------------------------
MThd SMF header chunk
MTrk SMF track chunk
CASM Yamaha CASM chunk
AASM Audio assembly (descriptor) chunk
AFil Audio file (waveform) chunk
OTSc Yamaha OTS chunk
The AASM and AFil chunks are new, additional chunks beyond the known MIDI, CASM and OTS chunks. All chunks have a four byte chunk identifier and a four byte chunk size. The chunk size does not include the identifier or chunk size bytes, as usual.
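To make the chunk layout concrete, here is a minimal Java sketch that lists the top-level chunks of a style file held in memory. It assumes the chunk size is stored big-endian, as in a Standard MIDI File; the class and method names (ChunkLister, listChunks) are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ChunkLister {
    /** One top-level chunk: identifier, file offset of the identifier, data size. */
    static final class Chunk {
        final String id;
        final int offset;
        final int size;
        Chunk(String id, int offset, int size) {
            this.id = id;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Walks the chunks in order: 4-byte ID, 4-byte size, then 'size' bytes of data. */
    static List<Chunk> listChunks(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file); // ByteBuffer reads big-endian by default
        List<Chunk> chunks = new ArrayList<>();
        while (buf.remaining() >= 8) {
            int offset = buf.position();
            byte[] idBytes = new byte[4];
            buf.get(idBytes);
            String id = new String(idBytes, StandardCharsets.US_ASCII);
            int size = buf.getInt(); // excludes the 8 header bytes
            if (size < 0 || size > buf.remaining()) {
                throw new IllegalArgumentException("Bad size for chunk " + id + " at " + offset);
            }
            chunks.add(new Chunk(id, offset, size));
            buf.position(buf.position() + size); // skip the chunk data
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        for (Chunk c : listChunks(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(c.id + " " + c.offset + " " + c.size);
        }
    }
}
```

Run against an audio style file, this should print one line per chunk (MThd, MTrk, CASM, AASM, AFil, OTSc) if the layout described above holds.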
The AASM chunk is relatively small, about 2,500 bytes. It consists of 15 variable length ASEG subchunks. The ASEG subchunk has a four byte subchunk size. Each ASEG corresponds to a style section; that's why there are fifteen of them.
An ASEG subchunk has three parts:
Type Purpose
---- ------------------------------------
Adec Identifies the style section
Atab Identifies the audio file; other functions unknown
AMix Function unknown
The Adec part is variable length, having an explicit four byte size. The Atab and AMix parts appear to be fixed length (101 and 28 bytes, respectively) and do not have an explicit size field.
The Adec part is ASCII text and is a style section name like "Main A" or "Fill In DD". That is the only information in Adec.
I don't know what the Atab or AMix parts do. The Atab part contains an ASCII string which identifies the audio file associated with the style section. This string is clearly visible in a dump. (Example below.) All of the Atab and AMix parts in the test audio file have the same values except for the audio file names.
File Offset: 36965
Subchunk type: 'ASEG'
Subchunk size: 151
Section name: Main D
Atab type: 'Atab'
0 0 0 97 0 32 32 32 | 00 00 00 61 00 20 20 20 | ...a.
32 32 32 32 32 41 56 48 | 20 20 20 20 20 29 38 30 | )80
115 67 97 110 97 100 105 97 | 73 43 61 6E 61 64 69 61 | sCanadia
110 82 111 99 107 95 77 97 | 6E 52 6F 63 6B 5F 4D 61 | nRock_Ma
105 110 32 68 0 0 0 0 | 69 6E 20 44 00 00 00 00 | in D....
0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
1 15 -1 7 -1 -1 -1 -1 | 01 0F FF 07 FF FF FF FF | ........
0 0 0 127 0 0 0 0 | 00 00 00 7F 00 00 00 00 | ........
127 0 0 0 0 0 127 0 | 7F 00 00 00 00 00 7F 00 | ........
0 0 0 0 127 0 0 0 | 00 00 00 00 7F 00 00 00 | ........
0 0 0 0 0 0 0 0 | 00 00 00 00 00 00 00 00 | ........
AMix type: 'AMix'
0 0 0 24 7 -128 0 -1 | 00 00 00 18 07 80 00 FF | ........
88 4 4 2 24 8 0 -80 | 58 04 04 02 18 08 00 B0 | X.......
7 71 0 10 64 0 91 0 | 07 47 00 0A 40 00 5B 00 | .G..@.[.
0 -1 47 0 0 0 0 0 | 00 FF 2F 00 00 00 00 00 | ../.....
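As a sanity check on the ASEG layout, the part lengths can be added up and compared with the subchunk size in the dump. This is only a sketch: it assumes each part starts with a four byte identifier ('Adec', 'Atab', 'AMix'), that only Adec carries a size field, and it uses the 101 and 28 byte lengths observed in the one test file. The class name AsegSize is made up.

```java
final class AsegSize {
    static final int ID_BYTES = 4;     // four-character identifier such as "Adec"
    static final int SIZE_BYTES = 4;   // explicit size field (Adec only, by assumption)
    static final int ATAB_BYTES = 101; // length observed in the test file
    static final int AMIX_BYTES = 28;  // length observed in the test file

    /** Expected ASEG subchunk size for a given style section name. */
    static int asegSize(String sectionName) {
        int adec = ID_BYTES + SIZE_BYTES + sectionName.length();
        int atab = ID_BYTES + ATAB_BYTES;
        int amix = ID_BYTES + AMIX_BYTES;
        return adec + atab + amix;
    }

    public static void main(String[] args) {
        // "Main D": 14 + 105 + 32 = 151, matching "Subchunk size: 151" in the dump
        System.out.println(asegSize("Main D"));
    }
}
```

One thing that stands out in the dump: the first four bytes of the Atab and AMix parts read as 97 and 24, which is exactly four less than 101 and 28. That would be consistent with a leading big-endian size field, though one test file is not enough to say for sure.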
I'm still working on the AFil chunk. It has substructure, too. The AFil chunk has the following (sub)chunk types:
ADSg
ANdc
AWav
WAVE
Afmt
Sfmt
SPnt
Sdec
Adat
Atmp
ADSg seems to be the container chunk. Like ASEG, there are fifteen of everything. The ANdc subchunk contains the audio file name which matches up with the name in the ASEG. AWav looks to be a container, too. (I need to verify this hunch.)
As you might guess, the AFil chunk is pretty big because it contains waveform data.
The audio "file" format is WAV-like, but it is not exactly WAV (Microsoft RIFF). I was able to play back the audio by importing the audio style file as a raw (untyped) audio file. The audio format seems to be 44,100 Hz, 16-bit stereo. No compression or encryption. It shouldn't be too hard to dump the audio. Or, maybe replace the audio, thereby making it possible to create a new audio style.
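Dumping the audio could look roughly like the Java sketch below, which wraps a block of raw sample bytes in a real WAV file using javax.sound.sampled. The 44,100 Hz, 16-bit stereo format comes from the observation above; signed little-endian samples are an assumption (flip the last constructor argument if the result is noise). The class and method names (PcmToWav, writeWav) are made up, and locating the sample bytes inside the style file is left to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class PcmToWav {
    // 44,100 Hz, 16-bit, stereo, signed, little-endian (byte order is a guess)
    static final AudioFormat FORMAT = new AudioFormat(44100f, 16, 2, true, false);

    /** Writes raw PCM bytes to 'out' as a standard WAV file. */
    static void writeWav(byte[] pcm, File out) throws IOException {
        long frames = pcm.length / FORMAT.getFrameSize(); // 4 bytes per stereo frame
        try (AudioInputStream in =
                 new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        }
    }

    /** Playing time in seconds for a given number of raw PCM bytes. */
    static double seconds(long pcmBytes) {
        return pcmBytes / (double) FORMAT.getFrameSize() / FORMAT.getSampleRate();
    }
}
```

Going the other way (replacing the audio) would additionally require patching every enclosing chunk size, so it is a bigger job than the dump.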
I've got some Java code that I will share as soon as I get the AFil chunk worked out.
All the best -- pj
Here are just a few quick observations while working out the AFil chunk.
An AFil chunk consists of 15 ADSg subchunks. The following table shows the offset and length information for the first ADSg in the example AFil:
Type Offset  Length   Purpose
---- ------- -------- ------------------------------------
AFil 37287   15261858
ADSg 37295   1219275  Container for an audio file
ANdc 37303   50       File name
AWav 37361   1219209  Container for audio waveform
WAVE 37369   n/a      Marker (no subchunk size)
Afmt 37373   16       Audio format information
Sfmt 37397   217      Container for section information
Sdec 37608   6        Section name, e.g., Main A
Adat 37622   1218300  Waveform data
AInf 1255930 640      Container for audio information
BPnt 1255938 136
OPnt 1256082 240
APnt 1256330 232
ATmp 1256570 0        Empty, subchunk size is 0
ADSg 1256578
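The offsets in this table can be cross-checked with one rule: whatever follows a subchunk starts at its offset plus 8 header bytes plus its length. The Java sketch below applies that rule to the numbers above (the class name OffsetCheck is made up). The WAVE marker is the exception, since it is only 4 bytes with no size: 37369 + 4 = 37373.

```java
final class OffsetCheck {
    static final int HEADER_BYTES = 8; // 4-byte identifier + 4-byte size

    /** Offset of whatever follows a subchunk at 'offset' with data length 'size'. */
    static long next(long offset, long size) {
        return offset + HEADER_BYTES + size;
    }

    public static void main(String[] args) {
        System.out.println(next(37303, 50));        // ANdc ends where AWav starts: 37361
        System.out.println(next(37373, 16));        // Afmt ends where Sfmt starts: 37397
        System.out.println(next(37397, 217));       // Sfmt ends where Adat starts: 37622
        System.out.println(next(37608, 6));         // Sdec also ends at 37622
        System.out.println(next(37622, 1218300));   // Adat ends where AInf starts: 1255930
        System.out.println(next(1256570, 0));       // ATmp ends at the next ADSg: 1256578
        System.out.println(next(37295, 1219275));   // the first ADSg ends there too: 1256578
    }
}
```

Note that Sfmt (37397, length 217) and Sdec (37608, length 6) both end at 37622, so going by the arithmetic Sdec sits at the tail end of Sfmt rather than after it.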
The container relationships are important because the containers and subchunks are nested:
AFil contains ADSg
ADSg contains ANdc, AWav
AWav contains WAVE, Afmt, Sfmt, Adat, AInf
Sfmt contains Sdec (going by the offsets in the table above, Sdec falls inside Sfmt)
AInf contains BPnt, OPnt, APnt, ATmp
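A recursive walk is one way to handle the nesting. The Java sketch below descends into AFil, ADSg, AWav and AInf, treats WAVE as a bare 4-byte marker, and skips over everything else. It assumes big-endian sizes, and it leaves Sfmt opaque because the table does not account for the bytes between the Sfmt header and Sdec. The names NestedWalker and walk are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;

final class NestedWalker {
    // Subchunks whose data is itself a sequence of subchunks
    static final Set<String> CONTAINERS = Set.of("AFil", "ADSg", "AWav", "AInf");

    /** Appends one indented line per subchunk found between the current position and 'end'. */
    static void walk(ByteBuffer buf, int end, int depth, List<String> out) {
        while (buf.position() + 4 <= end) {
            int offset = buf.position();
            byte[] idBytes = new byte[4];
            buf.get(idBytes);
            String id = new String(idBytes, StandardCharsets.US_ASCII);
            String indent = "  ".repeat(depth);
            if (id.equals("WAVE")) {             // marker only: no size field follows
                out.add(indent + id + " " + offset);
                continue;
            }
            if (end - buf.position() < 4) {
                throw new IllegalArgumentException("Truncated header at " + offset);
            }
            int size = buf.getInt();             // big-endian by default
            int dataEnd = buf.position() + size;
            if (size < 0 || dataEnd > end) {
                throw new IllegalArgumentException("Bad size for " + id + " at " + offset);
            }
            out.add(indent + id + " " + offset + " " + size);
            if (CONTAINERS.contains(id)) {
                walk(buf, dataEnd, depth + 1, out); // descend into the container
            }
            buf.position(dataEnd);               // continue after this subchunk
        }
    }
}
```

To use it on a real file, wrap the file bytes in a ByteBuffer, position it at the AFil offset, and pass the end of the AFil chunk as 'end'.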
Thanks to all of this nesting, it's gonna take a little while to extend the Java code.
I appreciate everyone taking a look and making suggestions and comments, especially about the meaning of this stuff.
Take care -- pj
Now that you know a little bit about what's inside of an audio style file, here is a brief overview of what the Audio Phraser program generates.
Audio Phraser generates an MThd MIDI file header chunk, a single MTrk chunk (Type 0), an ASEG chunk for each audio waveform, an AFil chunk (containing an ADSg subchunk for each audio file) and a CASM chunk.
The MIDI tempo and time signature are the same as those set in Audio Phraser. The MIDI song title is set to "Audio Phraser".
The MIDI track contains the usual markers at the beginning: SFF2 and SInt. A single SysEx message is generated after SInt: General MIDI System ON (F0 7E 7F 09 01 F7). The key signature is set to C/Am, followed by:
- SMPTE Offset
- Sequencer-specific MIDI meta event: ff 7f 04 43 00 01 00 00
Oddly, MIDI channel 4 has four whack-looking NOTE OFF events:
NOTE OFF G#9
NOTE OFF G5
NOTE OFF C0
NOTE OFF C0
A bug? The remaining markers indicate the start of the style sections. The section length corresponds to the length of the audio waveform for the section. Thus, if the audio waveform for "Main A" is 2 bars, then the MIDI section for "Main A" is 2 bars long.
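The link between section length and waveform length can be put in numbers. The Java sketch below estimates how many bytes of raw audio a section of a given length should occupy, assuming the 44,100 Hz, 16-bit stereo format seen earlier and a tempo counted in quarter-note beats. The tempo and bar values in the example are hypothetical, not taken from any real style, and the class name SectionLength is made up.

```java
final class SectionLength {
    static final int SAMPLE_RATE = 44100;  // Hz, as observed for the audio data
    static final int BYTES_PER_FRAME = 4;  // 16-bit samples x 2 channels

    /** Bytes of raw audio needed to fill 'bars' bars at 'bpm' beats per minute. */
    static long expectedBytes(int bars, int beatsPerBar, double bpm) {
        double seconds = bars * beatsPerBar * 60.0 / bpm;
        long frames = Math.round(seconds * SAMPLE_RATE);
        return frames * BYTES_PER_FRAME;
    }

    public static void main(String[] args) {
        // Hypothetical example: 2 bars of 4/4 at 120 BPM is 4 seconds of audio
        System.out.println(expectedBytes(2, 4, 120.0)); // 705600
    }
}
```

Comparing this estimate against the actual waveform data length for a section would be a quick way to test whether the section length really tracks the waveform length.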
The CASM chunk is minimal and sets NTR/NTT for MIDI channel 9 (Subrhythm). NTR is "Root Fixed" and NTT is "Bypass/Bass Off". No NTR/NTT is given for channel 10 (rhythm/drums).
Audio Phraser does not generate an OTSc (One Touch Settings) chunk.
Audio Phraser creates an AWI file for each waveform that it imports into an audio style file. The AWI file most likely holds the results of Audio Phraser's analysis (i.e., beat detection and so forth). It would be interesting and informative to compare the contents of an AWI file against the ASEG and AInf chunks in the resulting audio style file. I'm guessing that the AWI file is the "prototype" for the ASEG and AInf chunks.