From 577fbc034604516b7834a5f64e53334dfdc53847 Mon Sep 17 00:00:00 2001 From: vaclav Date: Thu, 10 Aug 2023 18:15:25 +0200 Subject: [PATCH 01/10] revision of Readme.txt file --- readme.txt | 170 ++++++++++++++++++++++++++++------------------------- 1 file changed, 90 insertions(+), 80 deletions(-) diff --git a/readme.txt b/readme.txt index ab6bc2fa75..9f2eea6ed4 100644 --- a/readme.txt +++ b/readme.txt @@ -31,12 +31,13 @@ *******************************************************************************************************/ -These files represent a pre-release of a codec candidate to the IVAS +These files represent a codec candidate to the IVAS Extension to the 3GPP EVS Codec floating-point C simulation. All code is -written in ANSI-C. The system is implemented as two separate programs: +written in C. The system is implemented as three separate programs: IVAS_cod Encoder IVAS_dec Decoder + IVAS_rend Renderer For encoding using the coder program, the input is a binary audio file (*.8k, *.16k, *.32k, *.48k) and the output is a binary @@ -62,7 +63,8 @@ such as an HP (HP-UX) or a Sun, then binary files will need to be modified by swapping the byte order in the files. The input and output files (*.8k, *.16k, *.32k, *.48k) are 16-bit signed -binary files with 8/16/32/48 kHz sampling rate with no headers. +binary files with 8/16/32/48 kHz sampling rate with no headers. Alternatively, +the input and output files are WAV files. The Encoder produces bitstream files in either ITU G.192 or MIME file storage format. @@ -126,10 +128,9 @@ should have the following structure: |-- lib_debug |-- lib_dec |-- lib_enc + |-- lib_lc3plus |-- lib_rend |-- lib_util - |-- scripts - |-- tests |-- readme.txt The package includes a Makefile for gcc, which has been verified on @@ -140,9 +141,10 @@ in the c-code directory. The package also includes a solution-file for Microsoft Visual Studio 2017 (x86). 
To compile the code, please open "Workspace_msvc\Workspace_msvc.sln" and build -"encoder" for the encoder and "decoder" for the decoder executable. The resulting -encoder/decoder/renderer executables are named "IVAS_cod.exe", "IVAS_dec.exe", -and "IVAS_rend.exe". All reside in the c-code directory. +"encoder" for the encoder, "decoder" for the decoder, and "renderer" for the +renderer executable. The resulting encoder/decoder/renderer executables are +"IVAS_cod.exe", "IVAS_dec.exe", and "IVAS_rend.exe". All reside in the c-code +main directory. RUNNING THE SOFTWARE @@ -167,9 +169,9 @@ R : Bitrate in bps, (24400, 32000, 48000, 64000, 80000, 96000,128000) for 2 ISM, 3 ISM and 4 ISM also 160000, 192000, 256000 for 3 ISM and 4 ISM also 384000 - for 4 ISM also 512000 - for IVAS SBA, MASA, MC R=(13200, 16400, 24400, 32000, 48000, 64000, 80000, - 96000, 128000, 160000, 192000, 256000, 384000, 512000) + for 4 ISM also 512000 + for IVAS SBA, MASA, MC, ISM-MASA, and ISM-SBA R=(13200, 16400, 24400, 32000, + 48000, 64000, 80000, 96000, 128000, 160000, 192000, 256000, 384000, 512000) Alternatively, R can be a bitrate switching file which consists of R values indicating the bitrate for each frame in bps. These values are stored in binary format using 4 bytes per value @@ -201,27 +203,24 @@ EVS mono is default, for IVAS choose one of the following: -stereo, -ism, -sba, where InputConf specifies the channel configuration: 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4 Loudspeaker positions are assumed to have azimuth and elevation as per ISO/IEC 23091-3:2018 Table 3. Channel order is as per ISO/IEC 23008-3:2015 Table 95. - See readme.txt for details. + See below for details. 
-dtx D : Activate DTX mode, D = (0, 3-100) is the SID update rate - where 0 = adaptive, 3-100 = fixed in number of frames, - default is deactivated + where 0 = adaptive, 3-100 = fixed in number of frames, default is deactivated -dtx : Activate DTX mode with a SID update rate of 8 frames - Note: DTX is supported in EVS, stereo, ISM, SBA up to 80kbps and MASA up to 128kbps --rf p o : Activate channel-aware mode for WB and SWB signal at 13.2kbps, + Note: DTX is supported in EVS, stereo, ISM, MASA, and SBA up to 80kbps +-rf p o : Activate channel-aware mode in EVS for WB and SWB signal at 13.2kbps, where FEC indicator, p: LO or HI, and FEC offset, o: 2, 3, 5, or 7 in number of frames. Alternatively p and o can be replaced by a rf configuration file with each line - contains the values of p and o separated by a space, - default is deactivated + contains the values of p and o separated by a space, default is deactivated -max_band B : Activate bandwidth limitation, B = (NB, WB, SWB or FB) alternatively, B can be a text file where each line contains "nb_frames B" -no_delay_cmp : Turn off delay compensation --stereo_dmx_evs : Activate stereo downmix function for EVS. +-stereo_dmx_evs : Stereo downmix function for EVS -mime : Mime output bitstream file format The encoder produces TS26.445 Annex.2.6 Mime Storage Format, (not RFC4867 Mime Format). 
default output bitstream file format is G.192 -bypass mode : SBA PCA by-pass, mode = (1, 2), 1 = PCA off, 2 = signal adaptive, default is 1 --q : Quiet mode, no frame counters - default is deactivated +-q : Quiet mode, limit printouts to terminal, default is deactivated The usage of the "IVAS_dec" program is as follows: @@ -233,7 +232,8 @@ Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file Mandatory parameters: --------------------- OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA, - HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT + HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, + BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT By default, channel order and loudspeaker positions are equal to the encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker layout file. See below for details. @@ -261,7 +261,7 @@ Options: Format files, the magic word in the mime file is used to determine which of the two supported formats is in use. 
default bitstream file format is G.192 --hrtf File : HRTF filter File used in ISm format and BINAURAL output configuration +-hrtf File : HRTF filter File used in BINAURAL rendering -T File : Head rotation specified by external trajectory File -otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec' or 'ref_vec_lev' (only for binaural rendering) @@ -269,11 +269,11 @@ Options: works only in combination with '-otr ref' mode -rvf File : Reference vector specified by external trajectory file works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes --render_config File : Renderer configuration option File +-render_config File : Renderer configuration option with parameters specified in File +-om File : MD output file for BINAURAL_SPLIT_PCM output -non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90, left or l or 90->left, right or r or -90->right, center or c or 0->middle --q : Quiet mode, no frame counter - default is deactivated +-q : Quiet mode, limit printouts to terminal, default is deactivated The usage of the "IVAS_rend" program is as follows: @@ -282,34 +282,36 @@ The usage of the "IVAS_rend" program is as follows: Usage: IVAS_rend [options] Valid options: - --input_file, -i Path to the input file (WAV, raw PCM or scene description file) - --input_format, -if Audio format of input file (e.g. 5_1 or HOA3 or META, use -l for a list) - --input_metadata, -im Space-separated list of path to metadata files for ISM or MASA inputs or BINAURAL_SPLIT_PCM input mode - --output_file, -o Path to the output file - --output_format, -of Output format to render. 
- Alternatively, can be a custom loudspeaker layout file - --sample_rate, -fs Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs - --trajectory_file, -tf Head rotation trajectory file for simulation of head tracking (only for binaural outputs) - --output_metadata, -om coded metadata file for BINAURAL_SPLIT_PCM output mode - --post_rend_bfi_file, -prbfi Split rendering option: bfi file - --reference_rotation_file, -rf Reference rotation trajectory file for simulation of head tracking (only for binaural outputs) - --custom_hrtf, -hrtf Custom HRTF file for binaural rendering (only for binaural outputs) - --render_config, -rc Binaural renderer configuration file (only for binaural outputs) - --non_diegetic_pan, -ndp Panning mono non diegetic sound to stereo -90<= pan <= 90 - left or l or 90->left, right or r or -90->right, center or c or 0 ->middle - - --tracking_type, -otr Head orientation tracking type: 'none', 'ref', 'avg' or `ref_vec` or `ref_vec_lev` (only for binaural outputs) - --lfe_position, -lp Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear (like --gain, -g) and azimuth, elevation are in degrees. - If specified, overrides the default behavior which attempts to map input to output LFE channel(s) - --lfe_matrix, -lm LFE panning matrix. File (CSV table) containing a matrix of dimensions [ num_input_lfe x num_output_channels ] with elements specifying linear routing gain (like --gain, -g). 
- If specified, overrides the output LFE position option and the default behavior which attempts to map input to output LFE channel(s) - --no_delay_cmp, -ndc [flag] Turn off delay compensation - --quiet, -q [flag] Limit printouts to terminal - --gain, -g Input gain (linear, not in dB) to be applied to input audio file - --list, -l List supported audio formats - --reference_vector_file, -rvf Reference vector trajectory file for simulation of head tracking (only for binaural outputs) - --exterior_orientation_file, -exof External orientation trajectory file for simulation of external orientations - --sync_md_delay, -smd Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes for TDRenderer (13ms -> 10ms -> 2subframes) +-i File : Input audio File (WAV, raw PCM or scene description file) +-if Format : Audio Format of input file (e.g. 5_1 or HOA3 or META, use -l for a list) +-im Files : Metadata files for ISM (one file per object) or MASA inputs or BINAURAL_SPLIT_PCM input mode +-o File : Output audio File +-of Format : Audio Format of output file + Alternatively, it can be a custom loudspeaker layout file +-fs : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs +-tf File : Head rotation trajectory file for simulation of head tracking (only for binaural outputs) +-om File : Coded metadata File for BINAURAL_SPLIT_PCM output mode +-prbfi File : Split rendering option: bfi File +-rf File : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs) +-rvf File : Reference vector trajectory File for simulation of head tracking (only for binaural outputs) +-hrtf File : Custom HRTF File for binaural rendering (only for binaural outputs) +-rc File : Binaural renderer configuration File (only for binaural outputs) +-ndp P : Panning mono non-diegetic sound to stereo -90<= P <= 90 + left or l or 90->left, right or r or -90->right, center or c or 0 ->middle +-otr tracking_type : Head orientation 
tracking type: 'none', 'ref', 'avg' or `ref_vec` or `ref_vec_lev` (only for binaural outputs) +-lp Position : Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear + (like --gain, -g) and azimuth, elevation are in degrees. + If specified, overrides the default behavior which attempts to map input to output LFE channel(s) +-lm File : LFE panning matrix File (CSV table) containing a matrix of dimensions [ num_input_lfe x + num_output_channels ] with elements specifying linear routing gain (like --gain, -g). + If specified, overrides the output LFE position option and the default behavior which attempts to map + input to output LFE channel(s) +-ndc : Turn off delay compensation +-q : Quiet mode, limit printouts to terminal, default is deactivated +-g : Input gain (linear, not in dB) to be applied to input audio file +-l : List supported audio formats +-exof : External orientation trajectory file for simulation of external orientations +-smd : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes. MULTICHANNEL LOUDSPEAKER INPUT / OUTPUT CONFIGURATIONS @@ -344,10 +346,10 @@ An example custom loudspeaker layout file is available: ls_setup_16ch_8+4+4.txt RUNNING THE SELF TEST ===================== -A codec verification script is available in scripts/self_test.py. The -script demonstrates how to use the software at several operating points and -compares the output to a reference version/implementation. Please note: -In order to keep the run-time short it does not cover all operating +A codec verification script is available at https://forge.3gpp.org/rep/ivas-codec-pc/ivas-codec/ +in scripts/self_test.py. The script demonstrates how to use the software at several operating points +and compares the output to a reference version/implementation. +Please note: In order to keep the run-time short it does not cover all operating points or complete coverage. 
Documentation on the self_test.py can be found as a part of scripts/README.md. @@ -385,13 +387,29 @@ stvST32c.wav - 2 channels, 32000 Hz, 659200 samples per channel, clean spe stvST32n.wav - 2 channels, 32000 Hz, 620800 samples per channel, noisy speech stvST48c.wav - 2 channels, 48000 Hz, 988800 samples per channel, clean speech/audio stvST48n.wav - 2 channels, 48000 Hz, 931200 samples per channel, noisy speech -stv1MASA1TC48c.wav - 1 channel (1 MASA transport channel), 48000 Hz, 48000 Hz, 144000 samples -stv1MASA1TC48n.wav - 1 channel (1 MASA transport channel), 48000 Hz, 48000 Hz, 963840 samples -stv1MASA2TC48c.wav - 2 channels (2 MASA transport channel), 48000 Hz, 48000 Hz, 288000 samples per channel -stv1MASA2TC48n.wav - 2 channels (2 MASA transport channel), 48000 Hz, 48000 Hz, 963840 samples per channel -stv2MASA1TC48c.wav - 1 channel (1 MASA transport channel), 48000 Hz, 48000 Hz, 288000 -stv2MASA2TC48c.wav - 2 channels (2 MASA transport channel), 48000 Hz, 48000 Hz, 144000 samples per channel - +stv1MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 144000 samples +stv1MASA1TC48n.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 963840 samples +stv1MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 288000 samples per channel +stv1MASA2TC48n.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 963840 samples per channel +stv2MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 288000 +stv2MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 144000 samples per channel +stvOMASA_1ISM_1MASA2TC48c.wav - 3 channels (1 discrete audio object and 1 MASA 2 transport channels), 48000 Hz +stvOMASA_1ISM_2MASA1TC32c.wav - 2 channels (1 discrete audio object and 2 MASA 1 transport channel), 32000 Hz +stvOMASA_1ISM_2MASA2TC48c.wav - 3 channels (1 discrete audio object and 2 MASA 2 transport channels), 48000 Hz 
+stvOMASA_2ISM_1MASA1TC16c.wav - 3 channels (2 discrete audio object and 1 MASA 1 transport channel), 48000 Hz +stvOMASA_2ISM_1MASA2TC48c.wav - 4 channels (2 discrete audio object and 1 MASA 2 transport channels), 16000 Hz +stvOMASA_2ISM_2MASA2TC48c.wav - 4 channels (2 discrete audio object and 2 MASA 2 transport channels), 48000 Hz +stvOMASA_3ISM_1MASA1TC32c.wav - 4 channels (3 discrete audio object and 1 MASA 1 transport channel), 32000 Hz +stvOMASA_3ISM_1MASA2TC16c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 16000 Hz +stvOMASA_3ISM_1MASA2TC32c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 32000 Hz +stvOMASA_3ISM_1MASA2TC48c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 32000 Hz +stvOMASA_3ISM_2MASA1TC48c.wav - 4 channels (3 discrete audio object and 2 MASA 1 transport channel), 48000 Hz +stvOMASA_3ISM_2MASA2TC32c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 32000 Hz +stvOMASA_3ISM_2MASA2TC48c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 48000 Hz +stvOMASA_4ISM_1MASA1TC48c.wav - 5 channels (4 discrete audio object and 1 MASA 1 transport channel), 48000 Hz +stvOMASA_4ISM_1MASA2TC48c.wav - 6 channels (4 discrete audio object and 1 MASA 2 transport channels), 48000 Hz +stvOMASA_4ISM_2MASA1TC48c.wav - 5 channels (4 discrete audio object and 2 MASA 1 transport channel), 48000 Hz +stvOMASA_4ISM_2MASA2TC48c.wav - 6 channels (4 discrete audio object and 2 MASA 2 transport channels), 48000 Hz For the MASA operation modes, in addition the following metadata files located in /scripts/testv/ folder are required: @@ -451,7 +469,7 @@ The metadata reader accepts 1-8 values specified per line. If a value is not spe value is assumed. 
For the HRTF filter File option, external HRTF filter Files are available in folder -/scripts/binauralRenderer_interface/binaural_renderers_hrtf_data : +/scripts/binauralRenderer_interface/binaural_renderers_hrtf_data: ivas_binaural_16kHz.bin ivas_binaural_32kHz.bin @@ -466,21 +484,13 @@ headrot_case01_3000_q.csv headrot_case02_3000_q.csv headrot_case03_3000_q.csv -For Reference vector specified by external trajectory file, example files are available at -/scripts/trajectories folder. - - -For the Renderer configuration option operation modes, external configuration files are available: - -rend_config_hospital_patientroom.cfg -config_recreation.cfg -config_renderer.cfg +For Reference vector specified by external trajectory file, example files are available in folder +/scripts/trajectories. - ADDITIONAL SCRIPTS - ================== +For the Renderer configuration option operation modes, external configuration files are available, e.g.: -Additional scripts for item generation and codec testing are available -in the directories scripts and tests. Please refer to scripts/README.md, resp. -tests/README.md for additional documentation. 
+rend_rend_config_hospital_patientroom.cfg +rend_config_recreation.cfg +rend_config_renderer.cfg -- GitLab From 21e19fffc1e7b3ad5c7dd759cc501f8a24238130 Mon Sep 17 00:00:00 2001 From: vaclav Date: Fri, 11 Aug 2023 07:23:42 +0200 Subject: [PATCH 02/10] add description for orientation and config features --- readme.txt | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/readme.txt b/readme.txt index 9f2eea6ed4..b1efb09ec5 100644 --- a/readme.txt +++ b/readme.txt @@ -411,6 +411,8 @@ stvOMASA_4ISM_1MASA2TC48c.wav - 6 channels (4 discrete audio object and 1 MASA 2 stvOMASA_4ISM_2MASA1TC48c.wav - 5 channels (4 discrete audio object and 2 MASA 1 transport channel), 48000 Hz stvOMASA_4ISM_2MASA2TC48c.wav - 6 channels (4 discrete audio object and 2 MASA 2 transport channels), 48000 Hz +MASA metadata file +------------------ For the MASA operation modes, in addition the following metadata files located in /scripts/testv/ folder are required: @@ -421,12 +423,17 @@ stv1MASA2TC48n.met stv2MASA1TC48c.met stv2MASA2TC48c.met +The detailed syntax of MASA metadata files can be found in 3GPP TS 26.258. + It is strongly recommended to align these files to the corresponding PCM audio files. The MASA metadata files can be generated with the latest version of the IVAS MASA C Reference Software, which was made available at https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_118-e/Docs/S4-220443.zip + +Object based audio metadata file +-------------------------------- For the ISM operation modes, in addition the following metadata files located at /scripts/testv/ folder are required: @@ -468,6 +475,9 @@ with the following meaning: The metadata reader accepts 1-8 values specified per line. If a value is not specified, the default value is assumed. 
+ +HRTF filter file +---------------- For the HRTF filter File option, external HRTF filter Files are available in folder /scripts/binauralRenderer_interface/binaural_renderers_hrtf_data: @@ -475,6 +485,14 @@ ivas_binaural_16kHz.bin ivas_binaural_32kHz.bin ivas_binaural_48kHz.bin +The HRTF filter file has a specific container format with a header and a sequence of entries. The +detailed syntax can be found in 3GPP TS 26.258. + + +Head rotation trajectory file +----------------------------- + +[TBD] For the Head rotation operation modes, external trajectory files are available: @@ -484,11 +502,48 @@ headrot_case01_3000_q.csv headrot_case02_3000_q.csv headrot_case03_3000_q.csv + +Reference rotation/vector file +------------------------------ +The external reference orientation of the orientation tracking feature can either be provided as a +rotation (Quaternion or Euler angles) or as a pair of 3-dimensional positions (listener position +and acoustic reference position). + +The Reference Rotation format is identical to Head rotation trajectory file. + +The Reference Vector file format describes a pair of x/y/z positions, one for the listener and one +for the acoustic reference. The acoustic reference direction is defined by the vector from the +listener towards the acoustic reference position. The reference vector file is a CSV file with +comma as separator. Each line must contain a listener and an acoustic reference position in the +following order: + x axis position of the listener. + y axis position of the listener. + z axis position of the listener. + x axis position of the acoustic reference. + y axis position of the acoustic reference. + z axis position of the acoustic reference. + For Reference vector specified by external trajectory file, example files are available in folder /scripts/trajectories. 
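As an illustration of the Reference Vector file layout described above, the following C sketch parses one
line of six comma-separated values and derives the acoustic reference direction as the vector from the
listener towards the acoustic reference position. It is not taken from the IVAS sources: the names vec3_t,
parse_ref_vec_line and ref_direction are hypothetical, and the sketch assumes that each line holds plain
floating-point values with no header row.

```c
#include <stdio.h>

/* Hypothetical 3-D position type, used only for this illustration. */
typedef struct
{
    float x, y, z;
} vec3_t;

/* Parse one CSV line "lx,ly,lz,rx,ry,rz" (listener position first, then the
 * acoustic reference position). Returns 0 on success, -1 if fewer than six
 * values could be read. */
static int parse_ref_vec_line( const char *line, vec3_t *listener, vec3_t *reference )
{
    int n = sscanf( line, " %f , %f , %f , %f , %f , %f",
                    &listener->x, &listener->y, &listener->z,
                    &reference->x, &reference->y, &reference->z );

    return ( n == 6 ) ? 0 : -1;
}

/* Direction from the listener towards the acoustic reference (not normalized). */
static vec3_t ref_direction( vec3_t listener, vec3_t reference )
{
    vec3_t d;

    d.x = reference.x - listener.x;
    d.y = reference.y - listener.y;
    d.z = reference.z - listener.z;

    return d;
}
```

A caller would read the file line by line (for example with fgets), pass each line to parse_ref_vec_line,
and treat a return value of -1 as a malformed entry. How a malformed line should be handled is a design
choice left open here.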
+External orientation file +------------------------- + +[TBD] + + +Renderer config file +-------------------- +The renderer configuration file provides metadata for controlling the rendering process. This metadata +includes acoustics environment parameters and source directivity. The data can be provided using +binary bitstream or a text file. + +The renderer configuration text file can additionally be used to configure the pre-rendering step +of the split binaural renderer. All split renderer parameters are optional. + +The detailed syntax can be found in 3GPP TS 26.258. -For the Renderer configuration option operation modes, external configuration files are available, e.g.: +Example renderer configuration files are available, e.g.: rend_rend_config_hospital_patientroom.cfg rend_config_recreation.cfg -- GitLab From 672171d2870213cdaef2bd2b9a2f6d3dacc14b4f Mon Sep 17 00:00:00 2001 From: vaclav Date: Fri, 11 Aug 2023 07:27:42 +0200 Subject: [PATCH 03/10] file -> File --- readme.txt | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/readme.txt b/readme.txt index b1efb09ec5..a176a68fd1 100644 --- a/readme.txt +++ b/readme.txt @@ -265,12 +265,12 @@ Options: -T File : Head rotation specified by external trajectory File -otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec' or 'ref_vec_lev' (only for binaural rendering) --rf File : Reference rotation specified by external trajectory file +-rf File : Reference rotation specified by external trajectory File works only in combination with '-otr ref' mode --rvf File : Reference vector specified by external trajectory file +-rvf File : Reference vector specified by external trajectory File works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes -render_config File : Renderer configuration option with parameters specified in File --om File : MD output file for BINAURAL_SPLIT_PCM output +-om File : MD output File for BINAURAL_SPLIT_PCM output 
-non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90, left or l or 90->left, right or r or -90->right, center or c or 0->middle -q : Quiet mode, limit printouts to terminal, default is deactivated @@ -287,9 +287,9 @@ Valid options: -im Files : Metadata files for ISM (one file per object) or MASA inputs or BINAURAL_SPLIT_PCM input mode -o File : Output audio File -of Format : Audio Format of output file - Alternatively, it can be a custom loudspeaker layout file + Alternatively, it can be a custom loudspeaker layout File -fs : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs --tf File : Head rotation trajectory file for simulation of head tracking (only for binaural outputs) +-tf File : Head rotation trajectory File for simulation of head tracking (only for binaural outputs) -om File : Coded metadata File for BINAURAL_SPLIT_PCM output mode -prbfi File : Split rendering option: bfi File -rf File : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs) @@ -310,7 +310,7 @@ Valid options: -q : Quiet mode, limit printouts to terminal, default is deactivated -g : Input gain (linear, not in dB) to be applied to input audio file -l : List supported audio formats --exof : External orientation trajectory file for simulation of external orientations +-exof : External orientation trajectory File for simulation of external orientations -smd : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes. 
-- GitLab From f46d0901c6d9897644a5acd767ff3251ecb880af Mon Sep 17 00:00:00 2001 From: vaclav Date: Mon, 14 Aug 2023 09:33:48 +0200 Subject: [PATCH 04/10] - address comments - remove split rendering description --- readme.txt | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/readme.txt b/readme.txt index a176a68fd1..c7b881c1b8 100644 --- a/readme.txt +++ b/readme.txt @@ -31,9 +31,9 @@ *******************************************************************************************************/ -These files represent a codec candidate to the IVAS -Extension to the 3GPP EVS Codec floating-point C simulation. All code is -written in C. The system is implemented as three separate programs: +These files represent the 3GPP EVS Codec Extension for Immersive Voice and +Audio Services (IVAS) floating-point C simulation. All code is written +in C. The system is implemented as three separate programs: IVAS_cod Encoder IVAS_dec Decoder @@ -62,8 +62,8 @@ If the software is to be run on some other platform than PC, such as an HP (HP-UX) or a Sun, then binary files will need to be modified by swapping the byte order in the files. -The input and output files (*.8k, *.16k, *.32k, *.48k) are 16-bit signed -binary files with 8/16/32/48 kHz sampling rate with no headers. Alternatively, +The input and output files (*.8k, *.16k, *.32k, *.48k) are 16-bit integer +PCM files with 8/16/32/48 kHz sampling rate with no headers. Alternatively, the input and output files are WAV files. 
The Encoder produces bitstream files in either ITU G.192 or MIME file @@ -128,7 +128,6 @@ should have the following structure: |-- lib_debug |-- lib_dec |-- lib_enc - |-- lib_lc3plus |-- lib_rend |-- lib_util |-- readme.txt @@ -232,8 +231,7 @@ Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file Mandatory parameters: --------------------- OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA, - HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, - BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT + HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, EXT By default, channel order and loudspeaker positions are equal to the encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker layout file. See below for details. @@ -270,7 +268,6 @@ Options: -rvf File : Reference vector specified by external trajectory File works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes -render_config File : Renderer configuration option with parameters specified in File --om File : MD output File for BINAURAL_SPLIT_PCM output -non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90, left or l or 90->left, right or r or -90->right, center or c or 0->middle -q : Quiet mode, limit printouts to terminal, default is deactivated @@ -284,14 +281,12 @@ Usage: IVAS_rend [options] Valid options: -i File : Input audio File (WAV, raw PCM or scene description file) -if Format : Audio Format of input file (e.g. 
5_1 or HOA3 or META, use -l for a list) --im Files : Metadata files for ISM (one file per object) or MASA inputs or BINAURAL_SPLIT_PCM input mode +-im Files : Metadata files for ISM (one file per object) or MASA inputs -o File : Output audio File -of Format : Audio Format of output file Alternatively, it can be a custom loudspeaker layout File -fs : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs -tf File : Head rotation trajectory File for simulation of head tracking (only for binaural outputs) --om File : Coded metadata File for BINAURAL_SPLIT_PCM output mode --prbfi File : Split rendering option: bfi File -rf File : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs) -rvf File : Reference vector trajectory File for simulation of head tracking (only for binaural outputs) -hrtf File : Custom HRTF File for binaural rendering (only for binaural outputs) @@ -538,9 +533,6 @@ The renderer configuration file provides metadata for controlling the rendering includes acoustics environment parameters and source directivity. The data can be provided using binary bitstream or a text file. -The renderer configuration text file can additionally be used to configure the pre-rendering step -of the split binaural renderer. All split renderer parameters are optional. - The detailed syntax can be found in 3GPP TS 26.258. 
Example renderer configuration files are available, e.g.: -- GitLab From 88034bfc5a47007f91631a46156562cfd068ba2a Mon Sep 17 00:00:00 2001 From: vaclav Date: Mon, 14 Aug 2023 10:49:18 +0200 Subject: [PATCH 05/10] - provide [TBD] information - introduce Readme_split_rendering.txt --- readme.txt | 48 +++- readme_split_rendering.txt | 543 +++++++++++++++++++++++++++++++++++++ 2 files changed, 578 insertions(+), 13 deletions(-) create mode 100644 readme_split_rendering.txt diff --git a/readme.txt b/readme.txt index c7b881c1b8..cc4ed7d9c3 100644 --- a/readme.txt +++ b/readme.txt @@ -230,15 +230,15 @@ Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file Mandatory parameters: --------------------- -OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA, - HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, EXT - By default, channel order and loudspeaker positions are equal to the - encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker - layout file. See below for details. - Parameter is only used when decoding IVAS bitstream. -Fs : Output sampling rate in kHz (8, 16, 32 or 48) -bitstream_file : Input bitstream filename or RTP packet filename (in VOIP mode) -output_file : Output audio filename +OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA, + HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, EXT + By default, channel order and loudspeaker positions are equal to the + encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker + layout file. See below for details. + Parameter is only used when decoding IVAS bitstream. 
+Fs : Output sampling rate in kHz (8, 16, 32 or 48) +bitstream_file : Input bitstream filename or RTP packet filename (in VOIP mode) +output_file : Output audio filename Options: -------- @@ -276,9 +276,10 @@ Options: The usage of the "IVAS_rend" program is as follows: --------------------------------------------------- -Usage: IVAS_rend [options] +Usage: IVAS_rend [Options] -Valid options: +Options: +-------- -i File : Input audio File (WAV, raw PCM or scene description file) -if Format : Audio Format of input file (e.g. 5_1 or HOA3 or META, use -l for a list) -im Files : Metadata files for ISM (one file per object) or MASA inputs @@ -487,7 +488,22 @@ detailed syntax can be found in 3GPP TS 26.258. Head rotation trajectory file ----------------------------- -[TBD] +Input data representing the current rotation of the listener's head can be provided to the decoder +in an ASCII formatted file comprising four columns separated by commas. These columns contain +floating-point numbers representing either a quaternion or Euler angles. The distinction between +these two input formats is made by a magic number in the first column. If this value is set to -3.0, +it is assumed that the remaining three columns contain three Euler angles. Otherwise, all four +columns are interpreted as a Quaternion. The input is expected to have one line for each subframe of 5 ms. + +In the case of Quaternion-based input, the columns are the w, x, y, z components of a unit quaternion. +Proper normalization to 1 shall be maintained in the input. The coordinate system is defined such that +the x-axis points from the left to the right ear, the y axis points into the direction of view, and the +z axis points from bottom to top. The origin is in the center of the head. + +In the case of Euler angle input, the first column contains the magic number -3.0, and the next three +columns are the Euler angles yaw, pitch, and roll. The rotations are applied in the order yaw-pitch-roll. 
+The yaw angle rotates around the z axis, the pitch angle rotates around the new y axis, and the roll angle +rotates around the new x axis. For the Head rotation operation modes, external trajectory files are available: @@ -521,10 +537,16 @@ following order: For Reference vector specified by external trajectory file, example files are available in folder /scripts/trajectories. + External orientation file ------------------------- +The external orientation file provides orientation information for any non-listener dependent orientations. +The orientations shall be given as floating point quaternions to the decoder/renderer in (w, x, y, z) order. +Additional information may be given as HeadRotIndicator, ExtOriIndicator, ExtIntrpFlag and ExtIntrpNFrames. +Each entry line represents a sub-frame entry, where the sub-frame resolution is 5ms. In the processing, the +quaternions are inverted to act as a rotation instead of orientation. -[TBD] +The detailed syntax can be found in 3GPP TS 26.258. Renderer config file diff --git a/readme_split_rendering.txt b/readme_split_rendering.txt new file mode 100644 index 0000000000..c7b881c1b8 --- /dev/null +++ b/readme_split_rendering.txt @@ -0,0 +1,543 @@ +/****************************************************************************************************** + + (C) 2022-2023 IVAS codec Public Collaboration with portions copyright Dolby International AB, Ericsson AB, + Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Huawei Technologies Co. LTD., + Koninklijke Philips N.V., Nippon Telegraph and Telephone Corporation, Nokia Technologies Oy, Orange, + Panasonic Holdings Corporation, Qualcomm Technologies, Inc., VoiceAge Corporation, and other + contributors to this repository. All Rights Reserved. + + This software is protected by copyright law and by international treaties.
+ The IVAS codec Public Collaboration consisting of Dolby International AB, Ericsson AB, + Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Huawei Technologies Co. LTD., + Koninklijke Philips N.V., Nippon Telegraph and Telephone Corporation, Nokia Technologies Oy, Orange, + Panasonic Holdings Corporation, Qualcomm Technologies, Inc., VoiceAge Corporation, and other + contributors to this repository retain full ownership rights in their respective contributions in + the software. This notice grants no license of any kind, including but not limited to patent + license, nor is any license granted by implication, estoppel or otherwise. + + Contributors are required to enter into the IVAS codec Public Collaboration agreement before making + contributions. + + This software is provided "AS IS", without any express or implied warranties. The software is in the + development stage. It is intended exclusively for experts who have experience with such software and + solely for the purpose of inspection. All implied warranties of non-infringement, merchantability + and fitness for a particular purpose are hereby disclaimed and excluded. + + Any dispute, controversy or claim arising under or in relation to providing this software shall be + submitted to and settled by the final, binding jurisdiction of the courts of Munich, Germany in + accordance with the laws of the Federal Republic of Germany excluding its conflict of law rules and + the United Nations Convention on Contracts on the International Sales of Goods. + +*******************************************************************************************************/ + + +These files represent the 3GPP EVS Codec Extension for Immersive Voice and +Audio Services (IVAS) floating-point C simulation. All code is writtten +in C. 
The system is implemented as three separate programs: + + IVAS_cod Encoder + IVAS_dec Decoder + IVAS_rend Renderer + +For encoding using the coder program, the input is a binary +audio file (*.8k, *.16k, *.32k, *.48k) and the output is a binary +encoded parameter file (*.192). For decoding using the decoder program, +the input is a binary parameter file (*.192) and the output is a binary +synthesized audio file (*.8k, *.16k, *.32k, *.48k). For certain audio +formats (ISM, MASA), there are additional metadata files required. Audio +channels are interleaved in the input and output audio file. + + + FILE FORMATS: + ============= + +The file format of the supplied binary data (*.8k, *.16k, *.32k, *.48k, +*.192) is 16-bit binary data which is read and written in 16 bit words. +The data is therefore platform DEPENDENT. +The files contain only data, i.e., there is no header. +The test files included in this package are "PC" format, meaning that the +least signification byte of the 16-bit word comes first in the files. + +If the software is to be run on some other platform than PC, +such as an HP (HP-UX) or a Sun, then binary files will need to be modified +by swapping the byte order in the files. + +The input and output files (*.8k, *.16k, *.32k, *.48k) are 16-bit integer +PCM files with 8/16/32/48 kHz sampling rate with no headers. Alternatively, +the input and output files are WAV files. + +The Encoder produces bitstream files in either ITU G.192 or MIME file +storage format. + +Using ITU G.192 format: + +For every 20 ms input audio frame, the encoded bitstream contains the +following data: + + Word16 SyncWord + Word16 DataLen + Word16 1st Databit + Word16 2nd DataBit + . + . + . + Word16 Nth DataBit + + +The SyncWord from the encoder is always 0x6b21. If decoder receives +SyncWord as 0x6b20 it indicates that the current frame was received in +error (bad frame). + +The DataLen parameter gives the number of audio data bits in the +frame. 
For example using DTX, DataLen for NO_DATA frames is zero. + +Each bit is presented as follows: Bit 0 = 0x007f, Bit 1 = 0x0081. + +Using MIME file storage format: + +The MIME file storage format is a byte based format which is +appropriate for media file storage or as format for email/MMS +attachments. + +Encoder: With the "-mime" option, the encoder always produces EVS-mime +storage format specified in TS26.445 Annex.2.6. The AMRWB-mime(RFC4867) +storage format is not supported by the encoder. + +Decoder: With the "-mime" option, the decoder can parse both EVS-mime +format storage files and AMRWB-mime (RFC4867) storage format files. +The decoder automatically distinguishes between the two +mime storage formats by reading the initial Magic Word in the bitstream +file. The EVS-mime storage format is described in TS 26.445, Annex +A.2.6. The AMRWB-mime storage format is described in RFC-4867. + + + INSTALLING THE SOFTWARE + ======================= + +Installing the software on the PC: + +First unpack the compressed folder into your directory. After that you +should have the following structure: + +. +`-- c-code + |-- Makefile + |-- Workspace_msvc + |-- apps + |-- lib_com + |-- lib_debug + |-- lib_dec + |-- lib_enc + |-- lib_rend + |-- lib_util + |-- readme.txt + +The package includes a Makefile for gcc, which has been verified on +32-bit Linux systems. The code can be compiled by entering the directory +"c-code" and typing the command: make. The resulting encoder/decoder/renderer +executables are named "IVAS_cod", "IVAS_dec", and "IVAS_rend". All reside +in the c-code directory. + +The package also includes a solution-file for Microsoft Visual Studio 2017 (x86). +To compile the code, please open "Workspace_msvc\Workspace_msvc.sln" and build +"encoder" for the encoder, "decoder" for the decoder, and "renderer" for the +renderer executable. The resulting encoder/decoder/renderer executables are +"IVAS_cod.exe", "IVAS_dec.exe", and "IVAS_rend.exe". 
All reside in the c-code +main directory. + + + RUNNING THE SOFTWARE + ==================== + +The usage of the "IVAS_cod" program is as follows: +-------------------------------------------------- + +Usage: IVAS_cod.exe [Options] R Fs input_file bitstream_file + +Mandatory parameters: +--------------------- +R : Bitrate in bps, + for EVS native modes R = (5900*, 7200, 8000, 9600, 13200, 16400, + 24400, 32000, 48000, 64000, 96000, 128000) + *VBR mode (average bitrate), + for AMR-WB IO modes R = (6600, 8850, 12650, 14250, 15850, 18250, + 19850, 23050, 23850) + for IVAS stereo R = (13200, 16400, 24400, 32000, 48000, 64000, 80000, + 96000, 128000, 160000, 192000, 256000) + for IVAS ISM R = 13200 for 1 ISM, 16400 for 1 ISM and 2 ISM, + (24400, 32000, 48000, 64000, 80000, 96000,128000) + for 2 ISM, 3 ISM and 4 ISM also 160000, 192000, 256000 + for 3 ISM and 4 ISM also 384000 + for 4 ISM also 512000 + for IVAS SBA, MASA, MC, ISM-MASA, and ISM-SBA R=(13200, 16400, 24400, 32000, + 48000, 64000, 80000, 96000, 128000, 160000, 192000, 256000, 384000, 512000) + Alternatively, R can be a bitrate switching file which consists of R values + indicating the bitrate for each frame in bps. 
These values are stored in + binary format using 4 bytes per value +Fs : Input sampling rate in kHz, Fs = (8, 16, 32 or 48) +input_file : Input audio filename +bitstream_file : Output bitstream filename + +Options: +-------- +EVS mono is default, for IVAS choose one of the following: -stereo, -ism, -sba, -masa, -mc +-stereo : Stereo format +-ism [+]Ch Files : ISM format + where Ch specifies the number of ISMs (1-4) + where positive (+) indicates extended metadata (only 64 kbps and up) + and Files specify input files containing metadata, one file per object + (use NULL for no input metadata) +-sba +/-Order : Scene Based Audio input format (Ambisonics ACN/SN3D), + where Order specifies the Ambisionics order (1-3), + where positive (+) means full 3D and negative (-) only 2D/planar components to be coded +-masa Ch File : MASA format + where Ch specifies the number of MASA input/transport channels (1 or 2): + and File specifies input file containing parametric MASA metadata +-ism_masa IsmCh MasaCh IsmFiles MasaFile : MASA and ISM format + where IsmCh specifies the number of ISMs (1-4),\n" ); + MasaCh specifies the number of MASA input/transport channels (1-2), + IsmFiles specify input files containing metadata, one file per object, + and MasaFile specifies input file containing parametric MASA metadata +-mc InputConf : Multi-channel format + where InputConf specifies the channel configuration: 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4 + Loudspeaker positions are assumed to have azimuth and elevation as per + ISO/IEC 23091-3:2018 Table 3. Channel order is as per ISO/IEC 23008-3:2015 Table 95. + See below for details. 
+-dtx D : Activate DTX mode, D = (0, 3-100) is the SID update rate + where 0 = adaptive, 3-100 = fixed in number of frames, default is deactivated +-dtx : Activate DTX mode with a SID update rate of 8 frames + Note: DTX is supported in EVS, stereo, ISM, MASA, and SBA up to 80kbps +-rf p o : Activate channel-aware mode in EVS for WB and SWB signal at 13.2kbps, + where FEC indicator, p: LO or HI, and FEC offset, o: 2, 3, 5, or 7 in number of frames. + Alternatively p and o can be replaced by a rf configuration file with each line + contains the values of p and o separated by a space, default is deactivated +-max_band B : Activate bandwidth limitation, B = (NB, WB, SWB or FB) + alternatively, B can be a text file where each line contains "nb_frames B" +-no_delay_cmp : Turn off delay compensation +-stereo_dmx_evs : Stereo downmix function for EVS +-mime : Mime output bitstream file format + The encoder produces TS26.445 Annex.2.6 Mime Storage Format, (not RFC4867 Mime Format). + default output bitstream file format is G.192 +-bypass mode : SBA PCA by-pass, mode = (1, 2), 1 = PCA off, 2 = signal adaptive, default is 1 +-q : Quiet mode, limit printouts to terminal, default is deactivated + + +The usage of the "IVAS_dec" program is as follows: +-------------------------------------------------- + +Usage for EVS: IVAS_dec.exe [Options] Fs bitstream_file output_file +Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file + +Mandatory parameters: +--------------------- +OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA, + HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, EXT + By default, channel order and loudspeaker positions are equal to the + encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker + layout file. See below for details. + Parameter is only used when decoding IVAS bitstream. 
+Fs : Output sampling rate in kHz (8, 16, 32 or 48) +bitstream_file : Input bitstream filename or RTP packet filename (in VOIP mode) +output_file : Output audio filename + +Options: +-------- +-VOIP : VoIP mode: RTP in G192 +-VOIP_hf_only=0 : VoIP mode: EVS RTP Payload Format hf_only=0 in rtpdump +-VOIP_hf_only=1 : VoIP mode: EVS RTP Payload Format hf_only=1 in rtpdump + The decoder may read rtpdump files containing TS26.445 Annex A.2.2 + EVS RTP Payload Format. The SDP parameter hf_only is required. + Reading RFC4867 AMR/AMR-WB RTP payload format is not supported. +-Tracefile TF : VoIP mode: Generate trace file named TF +-fec_cfg_file : Optimal channel aware configuration computed by the JBM + as described in Section 6.3.1 of TS26.448. The output is + written into a .txt file. Each line contains the FER indicator + (HI|LO) and optimal FEC offset. +-no_delay_cmp : Turn off delay compensation +-mime : Mime bitstream file format + The decoder may read both TS26.445 Annex.2.6 and RFC4867 Mime Storage + Format files, the magic word in the mime file is used to determine + which of the two supported formats is in use. 
+ default bitstream file format is G.192 +-hrtf File : HRTF filter File used in BINAURAL rendering +-T File : Head rotation specified by external trajectory File +-otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec' + or 'ref_vec_lev' (only for binaural rendering) +-rf File : Reference rotation specified by external trajectory File + works only in combination with '-otr ref' mode +-rvf File : Reference vector specified by external trajectory File + works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes +-render_config File : Renderer configuration option with parameters specified in File +-non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90, + left or l or 90->left, right or r or -90->right, center or c or 0->middle +-q : Quiet mode, limit printouts to terminal, default is deactivated + + +The usage of the "IVAS_rend" program is as follows: +--------------------------------------------------- + +Usage: IVAS_rend [options] + +Valid options: +-i File : Input audio File (WAV, raw PCM or scene description file) +-if Format : Audio Format of input file (e.g. 
5_1 or HOA3 or META, use -l for a list) +-im Files : Metadata files for ISM (one file per object) or MASA inputs +-o File : Output audio File +-of Format : Audio Format of output file + Alternatively, it can be a custom loudspeaker layout File +-fs : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs +-tf File : Head rotation trajectory File for simulation of head tracking (only for binaural outputs) +-rf File : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs) +-rvf File : Reference vector trajectory File for simulation of head tracking (only for binaural outputs) +-hrtf File : Custom HRTF File for binaural rendering (only for binaural outputs) +-rc File : Binaural renderer configuration File (only for binaural outputs) +-ndp P : Panning mono non-diegetic sound to stereo -90<= P <= 90 + left or l or 90->left, right or r or -90->right, center or c or 0 ->middle +-otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg' or `ref_vec` or `ref_vec_lev` (only for binaural outputs) +-lp Position : Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear + (like --gain, -g) and azimuth, elevation are in degrees. + If specified, overrides the default behavior which attempts to map input to output LFE channel(s) +-lm File : LFE panning matrix File (CSV table) containing a matrix of dimensions [ num_input_lfe x + num_output_channels ] with elements specifying linear routing gain (like --gain, -g). 
+ If specified, overrides the output LFE position option and the default behavior which attempts to map + input to output LFE channel(s) +-ndc : Turn off delay compensation +-q : Quiet mode, limit printouts to terminal, default is deactivated +-g : Input gain (linear, not in dB) to be applied to input audio file +-l : List supported audio formats +-exof : External orientation trajectory File for simulation of external orientations +-smd : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes. + + + MULTICHANNEL LOUDSPEAKER INPUT / OUTPUT CONFIGURATIONS + ====================================================== +The loudspeaker positions for each MC layouts are assumed to have the following azimuth and elevation +(as per ISO/IEC 23091-3:2018 Table 3), 4th channel is LFE: + 5_1 -> CICP6: azi | 30| -30| 0| 0| 110|-110| + ele | 0| 0| 0| 0| 0| 0| + 7_1 -> CICP12: azi | 30| -30| 0| 0| 110|-110| 135|-135| + ele | 0| 0| 0| 0| 0| 0| 0| 0| + 5_1_2 -> CICP14: azi | 30| -30| 0| 0| 110|-110| 30| -30| + ele | 0| 0| 0| 0| 0| 0| 35| 35| + 5_1_4 -> CICP16: azi | 30| -30| 0| 0| 110|-110| 30| -30| 110|-110| + ele | 0| 0| 0| 0| 0| 0| 35| 35| 35| 35| + 7_1_4 -> CICP19: azi | 30| -30| 0| 0| 135|-135| 90| -90| 30| -30| 135|-135| + ele | 0| 0| 0| 0| 0| 0| 0| 0| 35| 35| 35| 35| +Position is not considered for the LFE channel. Channel order is as per ISO/IEC 23008-3:2015 Table 95. + +Additionally, at the decoder, OutputConf can be a custom loudspeaker layout file with the format: + azi0, azi1, ... aziN-1 + ele0, ele1, ... eleN-1 + LFE0 [optional] +Where the first two rows are comma separated azimuth and elevation positions of the N loudspeakers. +The output channel ordering is 0, 1, ... N-1. The third row contains an index "LFE0" (zero based) +specifying the output channel to which the LFE input will be routed if present. If the third row is +omitted, the LFE input is downmixed to all channels with a factor of 1/N. Position is not considered for +the LFE channel. 
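The custom loudspeaker layout format described above can be read with a few lines of script. The following is only an illustrative sketch in Python and is not part of the IVAS package: the function names parse_ls_layout and lfe_downmix_gain are hypothetical, and the handling of blank lines is an assumption. It reads the two mandatory rows (azimuths, elevations) and the optional third row (zero-based LFE output index), and returns the 1/N downmix factor that applies when the third row is omitted.

```python
def parse_ls_layout(text):
    """Parse a custom loudspeaker layout given as text.

    Row 1: comma-separated azimuths, row 2: comma-separated elevations,
    optional row 3: zero-based index of the output channel receiving the LFE.
    Blank lines are skipped (an assumption of this sketch).
    """
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if len(rows) not in (2, 3):
        raise ValueError("expected two or three non-empty rows")
    azimuths = [float(v) for v in rows[0].split(",")]
    elevations = [float(v) for v in rows[1].split(",")]
    if len(azimuths) != len(elevations):
        raise ValueError("azimuth and elevation rows differ in length")
    lfe_index = int(rows[2]) if len(rows) == 3 else None
    return azimuths, elevations, lfe_index


def lfe_downmix_gain(num_speakers):
    """Factor applied to the LFE input on every channel when no LFE row is given."""
    return 1.0 / num_speakers
```

For example, a two-row file with four loudspeakers yields no LFE index, so the LFE input would be spread over all four channels with a factor of 0.25.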
+An example custom loudspeaker layout file is available: ls_setup_16ch_8+4+4.txt + + + + RUNNING THE SELF TEST + ===================== + +A codec verification script is available at https://forge.3gpp.org/rep/ivas-codec-pc/ivas-codec/ +in scripts/self_test.py. The script demonstrates how to use the software at several operating points +and compares the output to a reference version/implementation. +Please note: In order to keep the run-time short it does not cover all operating +points or complete coverage. + +Documentation on the self_test.py can be found as a part of scripts/README.md. + +Note: Running the self_test.py requires the input vectors in the folder scripts/testv. + +stv1ISM48s.wav - 1 channel (1 audio object), 48000 Hz, 1440000 samples +stv2ISM48s.wav - 2 channels (discrete audio objects), 48000 Hz, 1440000 samples per channel +stv2OA32c.wav - 9 channels (2nd order Ambisonics ACN/SN3D), 32000 Hz +stv2OA48c.wav - 9 channels (2nd order Ambisonics ACN/SN3D), 48000 Hz +stv3ISM48s.wav - 3 channels (discrete audio objects), 48000 Hz, 1440000 samples per channel +stv3OA32c.wav - 16 channels (3rd order Ambisonics ACN/SN3D), 32000 Hz, 288939 samples per channel +stv3OA48c.wav - 16 channels (3rd order Ambisonics ACN/SN3D), 48000 Hz, 433408 samples per channel +stv4ISM48s.wav - 4 channel (discrete audio objects), 48000 Hz, 1440000 samples per channel +stv4ISM48n.wav - 4 channel (discrete audio objects), 48000 Hz, noisy speech +stv8c.wav - 1 channel, 8000 Hz, clean speech/audio +stv8n.wav - 1 channel, 8000 Hz, noisy speech +stv16c.wav - 1 channel, 16000 Hz, 610307 samples, clean speech +stv16n.wav - 1 channel, 16000 Hz, 257024 samples, noisy speech +stv32c.wav - 1 channel, 32000 Hz, 1220613 samples, clean speech/audio +stv32n.wav - 1 channel, 32000 Hz, 514048 samples, noisy speech +stv48c.wav - 1 channel, 48000 Hz, 960000 samples, clean speech/audio +stv48n.wav - 1 channel, 48000 Hz, 931200 samples, noisy clean speech +stv51MC48c.wav - 6 channels (5.1 1..6 where 
4th channel is LFE), 960000 samples per channel, 48000 Hz +stv512MC48c.wav - 8 channels (5.1+2 1..8 where 4th channel is LFE), 144000 samples per channel, 48000 Hz +stv514MC48c.wav - 10 channels (7.1+2 1..10 where 4th channel is LFE), 144000 samples per channel, 48000 Hz +stv71MC48c.wav - 8 channels (7.1 1..8 where 4th channel is LFE), 144000 samples per channel, 48000 Hz +stv714MC48c.wav - 12 channels (7.1+4 1..12 where 4th channel is LFE), 144000 samples per channel, 48000 Hz +stvFOA16c.wav - 4 channels (1st order Ambisonics ACN/SN3D), 16000 Hz, +stvFOA32c.wav - 4 channels (1st order Ambisonics ACN/SN3D), 32000 Hz, 288939 samples per channel +stvFOA48c.wav - 4 channels (1st order Ambisonics ACN/SN3D), 48000 Hz, 433408 samples per channel +stvST16c.wav - 2 channels, 16000 Hz, 329601 samples per channel, clean speech/audio +stvST16n.wav - 2 channels, 16000 Hz, 310401 samples per channel, noisy speech +stvST32c.wav - 2 channels, 32000 Hz, 659200 samples per channel, clean speech/audio +stvST32n.wav - 2 channels, 32000 Hz, 620800 samples per channel, noisy speech +stvST48c.wav - 2 channels, 48000 Hz, 988800 samples per channel, clean speech/audio +stvST48n.wav - 2 channels, 48000 Hz, 931200 samples per channel, noisy speech +stv1MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 144000 samples +stv1MASA1TC48n.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 963840 samples +stv1MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 288000 samples per channel +stv1MASA2TC48n.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 963840 samples per channel +stv2MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 288000 +stv2MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 144000 samples per channel +stvOMASA_1ISM_1MASA2TC48c.wav - 3 channels (1 discrete audio object and 1 MASA 2 transport channels), 48000 Hz 
+stvOMASA_1ISM_2MASA1TC32c.wav - 2 channels (1 discrete audio object and 2 MASA 1 transport channel), 32000 Hz +stvOMASA_1ISM_2MASA2TC48c.wav - 3 channels (1 discrete audio object and 2 MASA 2 transport channels), 48000 Hz +stvOMASA_2ISM_1MASA1TC16c.wav - 3 channels (2 discrete audio object and 1 MASA 1 transport channel), 48000 Hz +stvOMASA_2ISM_1MASA2TC48c.wav - 4 channels (2 discrete audio object and 1 MASA 2 transport channels), 16000 Hz +stvOMASA_2ISM_2MASA2TC48c.wav - 4 channels (2 discrete audio object and 2 MASA 2 transport channels), 48000 Hz +stvOMASA_3ISM_1MASA1TC32c.wav - 4 channels (3 discrete audio object and 1 MASA 1 transport channel), 32000 Hz +stvOMASA_3ISM_1MASA2TC16c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 16000 Hz +stvOMASA_3ISM_1MASA2TC32c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 32000 Hz +stvOMASA_3ISM_1MASA2TC48c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 32000 Hz +stvOMASA_3ISM_2MASA1TC48c.wav - 4 channels (3 discrete audio object and 2 MASA 1 transport channel), 48000 Hz +stvOMASA_3ISM_2MASA2TC32c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 32000 Hz +stvOMASA_3ISM_2MASA2TC48c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 48000 Hz +stvOMASA_4ISM_1MASA1TC48c.wav - 5 channels (4 discrete audio object and 1 MASA 1 transport channel), 48000 Hz +stvOMASA_4ISM_1MASA2TC48c.wav - 6 channels (4 discrete audio object and 1 MASA 2 transport channels), 48000 Hz +stvOMASA_4ISM_2MASA1TC48c.wav - 5 channels (4 discrete audio object and 2 MASA 1 transport channel), 48000 Hz +stvOMASA_4ISM_2MASA2TC48c.wav - 6 channels (4 discrete audio object and 2 MASA 2 transport channels), 48000 Hz + +MASA metadata file +------------------ +For the MASA operation modes, in addition the following metadata files +located in /scripts/testv/ folder are required: + +stv1MASA1TC48c.met +stv1MASA1TC48n.met 
+stv1MASA2TC48c.met +stv1MASA2TC48n.met +stv2MASA1TC48c.met +stv2MASA2TC48c.met + +The detailed syntax of MASA metadata files can be found in 3GPP TS 26.258. + +It is strongly recommended to align these files to the corresponding +PCM audio files. The MASA metadata files can be generated with the +latest version of the IVAS MASA C Reference Software, which was made +available at +https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_118-e/Docs/S4-220443.zip + + +Object based audio metadata file +-------------------------------- +For the ISM operation modes, in addition the following metadata files +located at /scripts/testv/ folder are required: + +stvISM1.csv +stvISM2.csv +stvISM3.csv +stvISM4.csv + +These are comma separated files (csv) which indicate the per object position +in the format: +azimuth, elevation, radius, spread, gain, yaw, pitch, non-diegetic + +Example metadata line with default values: +0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0 + +with the following meaning: +| Parameter | format, value range | meaning +--------------------------------------------------------------------------------------------------- +| azimuth | float, [-180,180] | azimuth or panning; positive indicates left; default: 0 +--------------------------------------------------------------------------------------------------- +| elevation | float, [-90,90] | elevation; positive indicates up; default: 0 +--------------------------------------------------------------------------------------------------- +| radius | float, [0, 15.75] | radius (extended metadata); default: 1 +--------------------------------------------------------------------------------------------------- +| spread | float, [0,360] | spread in angles from 0...360 deg; default: 0 +--------------------------------------------------------------------------------------------------- +| gain | float, [0,1] | gain; default: 1 +--------------------------------------------------------------------------------------------------- +| yaw | 
float, [-180,180] | yaw (extended metadata); positive indicates left; default: 0 +--------------------------------------------------------------------------------------------------- +| pitch | float, [-90,90] | pitch (extended metadata); positive indicates up; default: 0 +--------------------------------------------------------------------------------------------------- +| non-diegetic | float*, [0 1] | Flag for activation of non-diegetic rendering; default: 0 +| | if Flag is set to 1, panning gain is specified by azimuth. +| | Value between [-90,90], 90 left, -90 right, 0 center +--------------------------------------------------------------------------------------------------- +*Read as float value for convenience, but used as an integer flag internally. + +The metadata reader accepts 1-8 values specified per line. If a value is not specified, the default +value is assumed. + + +HRTF filter file +---------------- +For the HRTF filter File option, external HRTF filter Files are available in folder +/scripts/binauralRenderer_interface/binaural_renderers_hrtf_data: + +ivas_binaural_16kHz.bin +ivas_binaural_32kHz.bin +ivas_binaural_48kHz.bin + +The HRTF filter file has a specific container format with a header and a sequence of entries. The +detailed syntax can be found in 3GPP TS 26.258. + + +Head rotation trajectory file +----------------------------- + +[TBD] + +For the Head rotation operation modes, external trajectory files are available: + +headrot.csv +headrot_case00_3000_q.csv +headrot_case01_3000_q.csv +headrot_case02_3000_q.csv +headrot_case03_3000_q.csv + + +Reference rotation/vector file +------------------------------ +The external reference orientation of the orientation tracking feature can either be provided as a +rotation (Quaternion or Euler angles) or as a pair of 3-dimensional positions (listener position +and acoustic reference position). + +The Reference Rotation format is identical to Head rotation trajectory file. 
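As a rough illustration of the shared rotation format, the sketch below parses one line of a trajectory file as described for the head rotation trajectory file in readme.txt: four comma-separated values, where a first column of -3.0 marks Euler angles (yaw, pitch, roll) and anything else is read as a w, x, y, z unit quaternion. This is a Python sketch and not code from the IVAS package. The function name parse_rotation_line and the normalization tolerance of 1e-3 are assumptions made for the example.

```python
import math

EULER_MAGIC = -3.0      # first-column value that signals Euler angle input
NORM_TOLERANCE = 1e-3   # assumed tolerance, not taken from the specification


def parse_rotation_line(line):
    """Parse one trajectory line (one line per 5 ms subframe)."""
    values = [float(v) for v in line.split(",")]
    if len(values) != 4:
        raise ValueError("expected four comma-separated values")
    if values[0] == EULER_MAGIC:
        yaw, pitch, roll = values[1:]
        return ("euler", (yaw, pitch, roll))
    w, x, y, z = values
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if abs(norm - 1.0) > NORM_TOLERANCE:
        raise ValueError("quaternion is not normalized to 1")
    return ("quaternion", (w, x, y, z))
```

Rejecting non-normalized quaternions is a design choice of this sketch; the readme only states that normalization shall be maintained in the input.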
+ +The Reference Vector file format describes a pair of x/y/z positions, one for the listener and one +for the acoustic reference. The acoustic reference direction is defined by the vector from the +listener towards the acoustic reference position. The reference vector file is a CSV file with +comma as separator. Each line must contain a listener and an acoustic reference position in the +following order: + x axis position of the listener. + y axis position of the listener. + z axis position of the listener. + x axis position of the acoustic reference. + y axis position of the acoustic reference. + z axis position of the acoustic reference. + +For Reference vector specified by external trajectory file, example files are available in folder +/scripts/trajectories. + +External orientation file +------------------------- + +[TBD] + + +Renderer config file +-------------------- +The renderer configuration file provides metadata for controlling the rendering process. This metadata +includes acoustics environment parameters and source directivity. The data can be provided using +binary bitstream or a text file. + +The detailed syntax can be found in 3GPP TS 26.258. + +Example renderer configuration files are available, e.g.: + +rend_rend_config_hospital_patientroom.cfg +rend_config_recreation.cfg +rend_config_renderer.cfg + -- GitLab From 133d0e551fcd3d6fa6b55b58f3d882dd1fafb735 Mon Sep 17 00:00:00 2001 From: vaclav Date: Mon, 14 Aug 2023 10:54:30 +0200 Subject: [PATCH 06/10] typo --- readme_split_rendering.txt | 511 +++---------------------------------- 1 file changed, 41 insertions(+), 470 deletions(-) diff --git a/readme_split_rendering.txt b/readme_split_rendering.txt index c7b881c1b8..ca8f3e7a48 100644 --- a/readme_split_rendering.txt +++ b/readme_split_rendering.txt @@ -30,94 +30,21 @@ *******************************************************************************************************/ +For the IVAS Readme.txt, please refer to Readme.txt. 
-These files represent the 3GPP EVS Codec Extension for Immersive Voice and -Audio Services (IVAS) floating-point C simulation. All code is writtten -in C. The system is implemented as three separate programs: +This readme_split_rendering.txt describes a usage of the binaural split +rendering feature in the IVAS codec. This feature is implemented as part of +the following two separate programs: - IVAS_cod Encoder IVAS_dec Decoder IVAS_rend Renderer - -For encoding using the coder program, the input is a binary -audio file (*.8k, *.16k, *.32k, *.48k) and the output is a binary -encoded parameter file (*.192). For decoding using the decoder program, -the input is a binary parameter file (*.192) and the output is a binary -synthesized audio file (*.8k, *.16k, *.32k, *.48k). For certain audio -formats (ISM, MASA), there are additional metadata files required. Audio -channels are interleaved in the input and output audio file. - - - FILE FORMATS: - ============= - -The file format of the supplied binary data (*.8k, *.16k, *.32k, *.48k, -*.192) is 16-bit binary data which is read and written in 16 bit words. -The data is therefore platform DEPENDENT. -The files contain only data, i.e., there is no header. -The test files included in this package are "PC" format, meaning that the -least signification byte of the 16-bit word comes first in the files. - -If the software is to be run on some other platform than PC, -such as an HP (HP-UX) or a Sun, then binary files will need to be modified -by swapping the byte order in the files. - -The input and output files (*.8k, *.16k, *.32k, *.48k) are 16-bit integer -PCM files with 8/16/32/48 kHz sampling rate with no headers. Alternatively, -the input and output files are WAV files. - -The Encoder produces bitstream files in either ITU G.192 or MIME file -storage format. 
- -Using ITU G.192 format: - -For every 20 ms input audio frame, the encoded bitstream contains the -following data: - - Word16 SyncWord - Word16 DataLen - Word16 1st Databit - Word16 2nd DataBit - . - . - . - Word16 Nth DataBit - - -The SyncWord from the encoder is always 0x6b21. If decoder receives -SyncWord as 0x6b20 it indicates that the current frame was received in -error (bad frame). - -The DataLen parameter gives the number of audio data bits in the -frame. For example using DTX, DataLen for NO_DATA frames is zero. - -Each bit is presented as follows: Bit 0 = 0x007f, Bit 1 = 0x0081. - -Using MIME file storage format: - -The MIME file storage format is a byte based format which is -appropriate for media file storage or as format for email/MMS -attachments. - -Encoder: With the "-mime" option, the encoder always produces EVS-mime -storage format specified in TS26.445 Annex.2.6. The AMRWB-mime(RFC4867) -storage format is not supported by the encoder. - -Decoder: With the "-mime" option, the decoder can parse both EVS-mime -format storage files and AMRWB-mime (RFC4867) storage format files. -The decoder automatically distinguishes between the two -mime storage formats by reading the initial Magic Word in the bitstream -file. The EVS-mime storage format is described in TS 26.445, Annex -A.2.6. The AMRWB-mime storage format is described in RFC-4867. + INSTALLING THE SOFTWARE ======================= -Installing the software on the PC: - -First unpack the compressed folder into your directory. After that you -should have the following structure: +Same as described in Readme.txt while the structure looks as follows: . `-- c-code @@ -128,416 +55,60 @@ should have the following structure: |-- lib_debug |-- lib_dec |-- lib_enc + |-- lib_lc3plus |-- lib_rend |-- lib_util |-- readme.txt + |-- readme_split_rendering.txt -The package includes a Makefile for gcc, which has been verified on -32-bit Linux systems. 
The code can be compiled by entering the directory -"c-code" and typing the command: make. The resulting encoder/decoder/renderer -executables are named "IVAS_cod", "IVAS_dec", and "IVAS_rend". All reside -in the c-code directory. - -The package also includes a solution-file for Microsoft Visual Studio 2017 (x86). -To compile the code, please open "Workspace_msvc\Workspace_msvc.sln" and build -"encoder" for the encoder, "decoder" for the decoder, and "renderer" for the -renderer executable. The resulting encoder/decoder/renderer executables are -"IVAS_cod.exe", "IVAS_dec.exe", and "IVAS_rend.exe". All reside in the c-code -main directory. RUNNING THE SOFTWARE ==================== -The usage of the "IVAS_cod" program is as follows: --------------------------------------------------- - -Usage: IVAS_cod.exe [Options] R Fs input_file bitstream_file - -Mandatory parameters: ---------------------- -R : Bitrate in bps, - for EVS native modes R = (5900*, 7200, 8000, 9600, 13200, 16400, - 24400, 32000, 48000, 64000, 96000, 128000) - *VBR mode (average bitrate), - for AMR-WB IO modes R = (6600, 8850, 12650, 14250, 15850, 18250, - 19850, 23050, 23850) - for IVAS stereo R = (13200, 16400, 24400, 32000, 48000, 64000, 80000, - 96000, 128000, 160000, 192000, 256000) - for IVAS ISM R = 13200 for 1 ISM, 16400 for 1 ISM and 2 ISM, - (24400, 32000, 48000, 64000, 80000, 96000,128000) - for 2 ISM, 3 ISM and 4 ISM also 160000, 192000, 256000 - for 3 ISM and 4 ISM also 384000 - for 4 ISM also 512000 - for IVAS SBA, MASA, MC, ISM-MASA, and ISM-SBA R=(13200, 16400, 24400, 32000, - 48000, 64000, 80000, 96000, 128000, 160000, 192000, 256000, 384000, 512000) - Alternatively, R can be a bitrate switching file which consists of R values - indicating the bitrate for each frame in bps. 
These values are stored in - binary format using 4 bytes per value -Fs : Input sampling rate in kHz, Fs = (8, 16, 32 or 48) -input_file : Input audio filename -bitstream_file : Output bitstream filename - -Options: --------- -EVS mono is default, for IVAS choose one of the following: -stereo, -ism, -sba, -masa, -mc --stereo : Stereo format --ism [+]Ch Files : ISM format - where Ch specifies the number of ISMs (1-4) - where positive (+) indicates extended metadata (only 64 kbps and up) - and Files specify input files containing metadata, one file per object - (use NULL for no input metadata) --sba +/-Order : Scene Based Audio input format (Ambisonics ACN/SN3D), - where Order specifies the Ambisionics order (1-3), - where positive (+) means full 3D and negative (-) only 2D/planar components to be coded --masa Ch File : MASA format - where Ch specifies the number of MASA input/transport channels (1 or 2): - and File specifies input file containing parametric MASA metadata --ism_masa IsmCh MasaCh IsmFiles MasaFile : MASA and ISM format - where IsmCh specifies the number of ISMs (1-4),\n" ); - MasaCh specifies the number of MASA input/transport channels (1-2), - IsmFiles specify input files containing metadata, one file per object, - and MasaFile specifies input file containing parametric MASA metadata --mc InputConf : Multi-channel format - where InputConf specifies the channel configuration: 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4 - Loudspeaker positions are assumed to have azimuth and elevation as per - ISO/IEC 23091-3:2018 Table 3. Channel order is as per ISO/IEC 23008-3:2015 Table 95. - See below for details. 
--dtx D : Activate DTX mode, D = (0, 3-100) is the SID update rate - where 0 = adaptive, 3-100 = fixed in number of frames, default is deactivated --dtx : Activate DTX mode with a SID update rate of 8 frames - Note: DTX is supported in EVS, stereo, ISM, MASA, and SBA up to 80kbps --rf p o : Activate channel-aware mode in EVS for WB and SWB signal at 13.2kbps, - where FEC indicator, p: LO or HI, and FEC offset, o: 2, 3, 5, or 7 in number of frames. - Alternatively p and o can be replaced by a rf configuration file with each line - contains the values of p and o separated by a space, default is deactivated --max_band B : Activate bandwidth limitation, B = (NB, WB, SWB or FB) - alternatively, B can be a text file where each line contains "nb_frames B" --no_delay_cmp : Turn off delay compensation --stereo_dmx_evs : Stereo downmix function for EVS --mime : Mime output bitstream file format - The encoder produces TS26.445 Annex.2.6 Mime Storage Format, (not RFC4867 Mime Format). - default output bitstream file format is G.192 --bypass mode : SBA PCA by-pass, mode = (1, 2), 1 = PCA off, 2 = signal adaptive, default is 1 --q : Quiet mode, limit printouts to terminal, default is deactivated - - -The usage of the "IVAS_dec" program is as follows: --------------------------------------------------- - -Usage for EVS: IVAS_dec.exe [Options] Fs bitstream_file output_file -Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file +The usage of the "IVAS_cod" program: +------------------------------------ -Mandatory parameters: ---------------------- -OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA, - HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, EXT - By default, channel order and loudspeaker positions are equal to the - encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker - layout file. See below for details. - Parameter is only used when decoding IVAS bitstream. 
-Fs : Output sampling rate in kHz (8, 16, 32 or 48) -bitstream_file : Input bitstream filename or RTP packet filename (in VOIP mode) -output_file : Output audio filename - -Options: --------- --VOIP : VoIP mode: RTP in G192 --VOIP_hf_only=0 : VoIP mode: EVS RTP Payload Format hf_only=0 in rtpdump --VOIP_hf_only=1 : VoIP mode: EVS RTP Payload Format hf_only=1 in rtpdump - The decoder may read rtpdump files containing TS26.445 Annex A.2.2 - EVS RTP Payload Format. The SDP parameter hf_only is required. - Reading RFC4867 AMR/AMR-WB RTP payload format is not supported. --Tracefile TF : VoIP mode: Generate trace file named TF --fec_cfg_file : Optimal channel aware configuration computed by the JBM - as described in Section 6.3.1 of TS26.448. The output is - written into a .txt file. Each line contains the FER indicator - (HI|LO) and optimal FEC offset. --no_delay_cmp : Turn off delay compensation --mime : Mime bitstream file format - The decoder may read both TS26.445 Annex.2.6 and RFC4867 Mime Storage - Format files, the magic word in the mime file is used to determine - which of the two supported formats is in use. 
- default bitstream file format is G.192 --hrtf File : HRTF filter File used in BINAURAL rendering --T File : Head rotation specified by external trajectory File --otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec' - or 'ref_vec_lev' (only for binaural rendering) --rf File : Reference rotation specified by external trajectory File - works only in combination with '-otr ref' mode --rvf File : Reference vector specified by external trajectory File - works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes --render_config File : Renderer configuration option with parameters specified in File --non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90, - left or l or 90->left, right or r or -90->right, center or c or 0->middle --q : Quiet mode, limit printouts to terminal, default is deactivated - - -The usage of the "IVAS_rend" program is as follows: ---------------------------------------------------- +Same ss described in Readme.txt. -Usage: IVAS_rend [options] -Valid options: --i File : Input audio File (WAV, raw PCM or scene description file) --if Format : Audio Format of input file (e.g. 
5_1 or HOA3 or META, use -l for a list) --im Files : Metadata files for ISM (one file per object) or MASA inputs --o File : Output audio File --of Format : Audio Format of output file - Alternatively, it can be a custom loudspeaker layout File --fs : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs --tf File : Head rotation trajectory File for simulation of head tracking (only for binaural outputs) --rf File : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs) --rvf File : Reference vector trajectory File for simulation of head tracking (only for binaural outputs) --hrtf File : Custom HRTF File for binaural rendering (only for binaural outputs) --rc File : Binaural renderer configuration File (only for binaural outputs) --ndp P : Panning mono non-diegetic sound to stereo -90<= P <= 90 - left or l or 90->left, right or r or -90->right, center or c or 0 ->middle --otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg' or `ref_vec` or `ref_vec_lev` (only for binaural outputs) --lp Position : Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear - (like --gain, -g) and azimuth, elevation are in degrees. - If specified, overrides the default behavior which attempts to map input to output LFE channel(s) --lm File : LFE panning matrix File (CSV table) containing a matrix of dimensions [ num_input_lfe x - num_output_channels ] with elements specifying linear routing gain (like --gain, -g). 
- If specified, overrides the output LFE position option and the default behavior which attempts to map - input to output LFE channel(s) --ndc : Turn off delay compensation --q : Quiet mode, limit printouts to terminal, default is deactivated --g : Input gain (linear, not in dB) to be applied to input audio file --l : List supported audio formats --exof : External orientation trajectory File for simulation of external orientations --smd : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes. - - - MULTICHANNEL LOUDSPEAKER INPUT / OUTPUT CONFIGURATIONS - ====================================================== -The loudspeaker positions for each MC layouts are assumed to have the following azimuth and elevation -(as per ISO/IEC 23091-3:2018 Table 3), 4th channel is LFE: - 5_1 -> CICP6: azi | 30| -30| 0| 0| 110|-110| - ele | 0| 0| 0| 0| 0| 0| - 7_1 -> CICP12: azi | 30| -30| 0| 0| 110|-110| 135|-135| - ele | 0| 0| 0| 0| 0| 0| 0| 0| - 5_1_2 -> CICP14: azi | 30| -30| 0| 0| 110|-110| 30| -30| - ele | 0| 0| 0| 0| 0| 0| 35| 35| - 5_1_4 -> CICP16: azi | 30| -30| 0| 0| 110|-110| 30| -30| 110|-110| - ele | 0| 0| 0| 0| 0| 0| 35| 35| 35| 35| - 7_1_4 -> CICP19: azi | 30| -30| 0| 0| 135|-135| 90| -90| 30| -30| 135|-135| - ele | 0| 0| 0| 0| 0| 0| 0| 0| 35| 35| 35| 35| -Position is not considered for the LFE channel. Channel order is as per ISO/IEC 23008-3:2015 Table 95. - -Additionally, at the decoder, OutputConf can be a custom loudspeaker layout file with the format: - azi0, azi1, ... aziN-1 - ele0, ele1, ... eleN-1 - LFE0 [optional] -Where the first two rows are comma separated azimuth and elevation positions of the N loudspeakers. -The output channel ordering is 0, 1, ... N-1. The third row contains an index "LFE0" (zero based) -specifying the output channel to which the LFE input will be routed if present. If the third row is -omitted, the LFE input is downmixed to all channels with a factor of 1/N. Position is not considered for -the LFE channel. 
-An example custom loudspeaker layout file is available: ls_setup_16ch_8+4+4.txt - +The usage of the "IVAS_dec" program: +------------------------------------ - RUNNING THE SELF TEST - ===================== +Same ss described in Readme.txt while more command-line options are avilable: + +Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file -A codec verification script is available at https://forge.3gpp.org/rep/ivas-codec-pc/ivas-codec/ -in scripts/self_test.py. The script demonstrates how to use the software at several operating points -and compares the output to a reference version/implementation. -Please note: In order to keep the run-time short it does not cover all operating -points or complete coverage. - -Documentation on the self_test.py can be found as a part of scripts/README.md. - -Note: Running the self_test.py requires the input vectors in the folder scripts/testv. - -stv1ISM48s.wav - 1 channel (1 audio object), 48000 Hz, 1440000 samples -stv2ISM48s.wav - 2 channels (discrete audio objects), 48000 Hz, 1440000 samples per channel -stv2OA32c.wav - 9 channels (2nd order Ambisonics ACN/SN3D), 32000 Hz -stv2OA48c.wav - 9 channels (2nd order Ambisonics ACN/SN3D), 48000 Hz -stv3ISM48s.wav - 3 channels (discrete audio objects), 48000 Hz, 1440000 samples per channel -stv3OA32c.wav - 16 channels (3rd order Ambisonics ACN/SN3D), 32000 Hz, 288939 samples per channel -stv3OA48c.wav - 16 channels (3rd order Ambisonics ACN/SN3D), 48000 Hz, 433408 samples per channel -stv4ISM48s.wav - 4 channel (discrete audio objects), 48000 Hz, 1440000 samples per channel -stv4ISM48n.wav - 4 channel (discrete audio objects), 48000 Hz, noisy speech -stv8c.wav - 1 channel, 8000 Hz, clean speech/audio -stv8n.wav - 1 channel, 8000 Hz, noisy speech -stv16c.wav - 1 channel, 16000 Hz, 610307 samples, clean speech -stv16n.wav - 1 channel, 16000 Hz, 257024 samples, noisy speech -stv32c.wav - 1 channel, 32000 Hz, 1220613 samples, clean speech/audio -stv32n.wav - 1 
channel, 32000 Hz, 514048 samples, noisy speech -stv48c.wav - 1 channel, 48000 Hz, 960000 samples, clean speech/audio -stv48n.wav - 1 channel, 48000 Hz, 931200 samples, noisy clean speech -stv51MC48c.wav - 6 channels (5.1 1..6 where 4th channel is LFE), 960000 samples per channel, 48000 Hz -stv512MC48c.wav - 8 channels (5.1+2 1..8 where 4th channel is LFE), 144000 samples per channel, 48000 Hz -stv514MC48c.wav - 10 channels (7.1+2 1..10 where 4th channel is LFE), 144000 samples per channel, 48000 Hz -stv71MC48c.wav - 8 channels (7.1 1..8 where 4th channel is LFE), 144000 samples per channel, 48000 Hz -stv714MC48c.wav - 12 channels (7.1+4 1..12 where 4th channel is LFE), 144000 samples per channel, 48000 Hz -stvFOA16c.wav - 4 channels (1st order Ambisonics ACN/SN3D), 16000 Hz, -stvFOA32c.wav - 4 channels (1st order Ambisonics ACN/SN3D), 32000 Hz, 288939 samples per channel -stvFOA48c.wav - 4 channels (1st order Ambisonics ACN/SN3D), 48000 Hz, 433408 samples per channel -stvST16c.wav - 2 channels, 16000 Hz, 329601 samples per channel, clean speech/audio -stvST16n.wav - 2 channels, 16000 Hz, 310401 samples per channel, noisy speech -stvST32c.wav - 2 channels, 32000 Hz, 659200 samples per channel, clean speech/audio -stvST32n.wav - 2 channels, 32000 Hz, 620800 samples per channel, noisy speech -stvST48c.wav - 2 channels, 48000 Hz, 988800 samples per channel, clean speech/audio -stvST48n.wav - 2 channels, 48000 Hz, 931200 samples per channel, noisy speech -stv1MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 144000 samples -stv1MASA1TC48n.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 963840 samples -stv1MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 288000 samples per channel -stv1MASA2TC48n.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 963840 samples per channel -stv2MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 48000 Hz, 288000 -stv2MASA2TC48c.wav 
- 2 channels (2 MASA 2 transport channels), 48000 Hz, 48000 Hz, 144000 samples per channel -stvOMASA_1ISM_1MASA2TC48c.wav - 3 channels (1 discrete audio object and 1 MASA 2 transport channels), 48000 Hz -stvOMASA_1ISM_2MASA1TC32c.wav - 2 channels (1 discrete audio object and 2 MASA 1 transport channel), 32000 Hz -stvOMASA_1ISM_2MASA2TC48c.wav - 3 channels (1 discrete audio object and 2 MASA 2 transport channels), 48000 Hz -stvOMASA_2ISM_1MASA1TC16c.wav - 3 channels (2 discrete audio object and 1 MASA 1 transport channel), 48000 Hz -stvOMASA_2ISM_1MASA2TC48c.wav - 4 channels (2 discrete audio object and 1 MASA 2 transport channels), 16000 Hz -stvOMASA_2ISM_2MASA2TC48c.wav - 4 channels (2 discrete audio object and 2 MASA 2 transport channels), 48000 Hz -stvOMASA_3ISM_1MASA1TC32c.wav - 4 channels (3 discrete audio object and 1 MASA 1 transport channel), 32000 Hz -stvOMASA_3ISM_1MASA2TC16c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 16000 Hz -stvOMASA_3ISM_1MASA2TC32c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 32000 Hz -stvOMASA_3ISM_1MASA2TC48c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 32000 Hz -stvOMASA_3ISM_2MASA1TC48c.wav - 4 channels (3 discrete audio object and 2 MASA 1 transport channel), 48000 Hz -stvOMASA_3ISM_2MASA2TC32c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 32000 Hz -stvOMASA_3ISM_2MASA2TC48c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 48000 Hz -stvOMASA_4ISM_1MASA1TC48c.wav - 5 channels (4 discrete audio object and 1 MASA 1 transport channel), 48000 Hz -stvOMASA_4ISM_1MASA2TC48c.wav - 6 channels (4 discrete audio object and 1 MASA 2 transport channels), 48000 Hz -stvOMASA_4ISM_2MASA1TC48c.wav - 5 channels (4 discrete audio object and 2 MASA 1 transport channel), 48000 Hz -stvOMASA_4ISM_2MASA2TC48c.wav - 6 channels (4 discrete audio object and 2 MASA 2 transport channels), 48000 Hz - -MASA 
metadata file ------------------- -For the MASA operation modes, in addition the following metadata files -located in /scripts/testv/ folder are required: - -stv1MASA1TC48c.met -stv1MASA1TC48n.met -stv1MASA2TC48c.met -stv1MASA2TC48n.met -stv2MASA1TC48c.met -stv2MASA2TC48c.met - -The detailed syntax of MASA metadata files can be found in 3GPP TS 26.258. - -It is strongly recommended to align these files to the corresponding -PCM audio files. The MASA metadata files can be generated with the -latest version of the IVAS MASA C Reference Software, which was made -available at -https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_118-e/Docs/S4-220443.zip - - -Object based audio metadata file --------------------------------- -For the ISM operation modes, in addition the following metadata files -located at /scripts/testv/ folder are required: - -stvISM1.csv -stvISM2.csv -stvISM3.csv -stvISM4.csv - -These are comma separated files (csv) which indicate the per object position -in the format: -azimuth, elevation, radius, spread, gain, yaw, pitch, non-diegetic - -Example metadata line with default values: -0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0 - -with the following meaning: -| Parameter | format, value range | meaning ---------------------------------------------------------------------------------------------------- -| azimuth | float, [-180,180] | azimuth or panning; positive indicates left; default: 0 ---------------------------------------------------------------------------------------------------- -| elevation | float, [-90,90] | elevation; positive indicates up; default: 0 ---------------------------------------------------------------------------------------------------- -| radius | float, [0, 15.75] | radius (extended metadata); default: 1 ---------------------------------------------------------------------------------------------------- -| spread | float, [0,360] | spread in angles from 0...360 deg; default: 0 
---------------------------------------------------------------------------------------------------- -| gain | float, [0,1] | gain; default: 1 ---------------------------------------------------------------------------------------------------- -| yaw | float, [-180,180] | yaw (extended metadata); positive indicates left; default: 0 ---------------------------------------------------------------------------------------------------- -| pitch | float, [-90,90] | pitch (extended metadata); positive indicates up; default: 0 ---------------------------------------------------------------------------------------------------- -| non-diegetic | float*, [0 1] | Flag for activation of non-diegetic rendering; default: 0 -| | if Flag is set to 1, panning gain is specified by azimuth. -| | Value between [-90,90], 90 left, -90 right, 0 center ---------------------------------------------------------------------------------------------------- -*Read as float value for convenience, but used as an integer flag internally. - -The metadata reader accepts 1-8 values specified per line. If a value is not specified, the default -value is assumed. - - -HRTF filter file ----------------- -For the HRTF filter File option, external HRTF filter Files are available in folder -/scripts/binauralRenderer_interface/binaural_renderers_hrtf_data: - -ivas_binaural_16kHz.bin -ivas_binaural_32kHz.bin -ivas_binaural_48kHz.bin - -The HRTF filter file has a specific container format with a header and a sequence of entries. The -detailed syntax can be found in 3GPP TS 26.258. 
- - -Head rotation trajectory file ------------------------------ - -[TBD] - -For the Head rotation operation modes, external trajectory files are available: - -headrot.csv -headrot_case00_3000_q.csv -headrot_case01_3000_q.csv -headrot_case02_3000_q.csv -headrot_case03_3000_q.csv - - -Reference rotation/vector file ------------------------------- -The external reference orientation of the orientation tracking feature can either be provided as a -rotation (Quaternion or Euler angles) or as a pair of 3-dimensional positions (listener position -and acoustic reference position). - -The Reference Rotation format is identical to Head rotation trajectory file. - -The Reference Vector file format describes a pair of x/y/z positions, one for the listener and one -for the acoustic reference. The acoustic reference direction is defined by the vector from the -listener towards the acoustic reference position. The reference vector file is a CSV file with -comma as separator. Each line must contain a listener and an acoustic reference position in the -following order: - x axis position of the listener. - y axis position of the listener. - z axis position of the listener. - x axis position of the acoustic reference. - y axis position of the acoustic reference. - z axis position of the acoustic reference. +Additional options: +------------------- +OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA, + HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, + BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT +-om File : Coded metadata File for BINAURAL_SPLIT_PCM output mode -For Reference vector specified by external trajectory file, example files are available in folder -/scripts/trajectories. 
- -External orientation file -------------------------- -[TBD] +The usage of the "IVAS_rend" program: +------------------------------------- -Renderer config file --------------------- -The renderer configuration file provides metadata for controlling the rendering process. This metadata -includes acoustics environment parameters and source directivity. The data can be provided using -binary bitstream or a text file. - -The detailed syntax can be found in 3GPP TS 26.258. +Same ss described in Readme.txt while more command-line options are avilable: -Example renderer configuration files are available, e.g.: +Usage: IVAS_rend [options] + +Additional options: +------------------- +-om File : Coded metadata File for BINAURAL_SPLIT_PCM output mode +-prbfi File : Split rendering option: bfi File + + + + + RUNNING THE SELF TEST + ===================== -rend_rend_config_hospital_patientroom.cfg -rend_config_recreation.cfg -rend_config_renderer.cfg +Same ss described in Readme.txt except of renderer configuration text file. which +can additionally be used to configure the pre-rendering step of the split binaural +renderer. All split renderer parameters are optional. +The detailed syntax can be found in 3GPP TS 26.258. 
\ No newline at end of file -- GitLab From cba4fb81f03b5bd7e27f69902e69029412d5b595 Mon Sep 17 00:00:00 2001 From: vaclav Date: Mon, 14 Aug 2023 10:58:55 +0200 Subject: [PATCH 07/10] typo --- readme_split_rendering.txt | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/readme_split_rendering.txt b/readme_split_rendering.txt index ca8f3e7a48..9869a9bbdb 100644 --- a/readme_split_rendering.txt +++ b/readme_split_rendering.txt @@ -55,11 +55,11 @@ Same as described in Readme.txt while the structure looks as follows: |-- lib_debug |-- lib_dec |-- lib_enc - |-- lib_lc3plus + |-- lib_lc3plus |-- lib_rend |-- lib_util |-- readme.txt - |-- readme_split_rendering.txt + |-- readme_split_rendering.txt @@ -69,14 +69,14 @@ Same as described in Readme.txt while the structure looks as follows: The usage of the "IVAS_cod" program: ------------------------------------ -Same ss described in Readme.txt. +Same as described in Readme.txt. The usage of the "IVAS_dec" program: ------------------------------------ -Same ss described in Readme.txt while more command-line options are avilable: +Same as described in Readme.txt while more command-line options are available. Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file @@ -92,7 +92,7 @@ OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4 The usage of the "IVAS_rend" program: ------------------------------------- -Same ss described in Readme.txt while more command-line options are avilable: +Same as described in Readme.txt while more command-line options are available. Usage: IVAS_rend [options] @@ -107,8 +107,8 @@ Additional options: RUNNING THE SELF TEST ===================== -Same ss described in Readme.txt except of renderer configuration text file. which +Same as described in Readme.txt except for the renderer configuration text file, which can additionally be used to configure the pre-rendering step of the split binaural renderer. 
All split renderer parameters are optional. -The detailed syntax can be found in 3GPP TS 26.258. \ No newline at end of file +The detailed syntax of the renderer configuration text file can be found in 3GPP TS 26.258. -- GitLab From 2df181729dd940a529b69292acab09ddc0e985be Mon Sep 17 00:00:00 2001 From: vaclav Date: Mon, 14 Aug 2023 12:11:31 +0200 Subject: [PATCH 08/10] add C language version --- readme.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/readme.txt b/readme.txt index cc4ed7d9c3..4ce1fa3a29 100644 --- a/readme.txt +++ b/readme.txt @@ -33,7 +33,7 @@ These files represent the 3GPP EVS Codec Extension for Immersive Voice and Audio Services (IVAS) floating-point C simulation. All code is writtten -in C. The system is implemented as three separate programs: +in ISO/IEC C99. The system is implemented as three separate programs: IVAS_cod Encoder IVAS_dec Decoder -- GitLab From 05630555adc70a5a537868e6944eaa54f6cc2e67 Mon Sep 17 00:00:00 2001 From: vaclav Date: Mon, 14 Aug 2023 13:10:10 +0200 Subject: [PATCH 09/10] - update for '-exof' - add '-dpid ID' and '-aeid ID' --- readme.txt | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/readme.txt b/readme.txt index 4ce1fa3a29..c990e516be 100644 --- a/readme.txt +++ b/readme.txt @@ -270,6 +270,10 @@ Options: -render_config File : Renderer configuration option with parameters specified in File -non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90, left or l or 90->left, right or r or -90->right, center or c or 0->middle +-exof File : External orientation trajectory File for simulation of external orientations +-dpid ID : Directivity pattern ID(s) (space-separated list of up to 4 numbers can be + specified) for binaural output configuration +-aeid ID : Acoustic environment ID (number >= 0) for BINAURAL_ROOM_REVERB output config. 
-q : Quiet mode, limit printouts to terminal, default is deactivated @@ -298,16 +302,18 @@ Options: -lp Position : Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear (like --gain, -g) and azimuth, elevation are in degrees. If specified, overrides the default behavior which attempts to map input to output LFE channel(s) --lm File : LFE panning matrix File (CSV table) containing a matrix of dimensions [ num_input_lfe x - num_output_channels ] with elements specifying linear routing gain (like --gain, -g). - If specified, overrides the output LFE position option and the default behavior which attempts to map - input to output LFE channel(s) +-lm File : LFE panning matrix File (CSV table) containing a matrix of dimensions + [ num_input_lfe x num_output_channels ] with elements specifying linear routing gain (like --gain, -g). + If specified, overrides the output LFE position option and the default behavior which attempts to map input to output LFE channel(s) -ndc : Turn off delay compensation --q : Quiet mode, limit printouts to terminal, default is deactivated -g : Input gain (linear, not in dB) to be applied to input audio file -l : List supported audio formats --exof : External orientation trajectory File for simulation of external orientations +-exof File : External orientation trajectory File for simulation of external orientations +-dpid ID : Directivity pattern ID(s) (space-separated list of up to 4 numbers can be + specified) for binaural output configuration +-aeid ID : Acoustic environment ID (number >= 0) for BINAURAL_ROOM_REVERB output config. -smd : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes. 
+-q : Quiet mode, limit printouts to terminal, default is deactivated MULTICHANNEL LOUDSPEAKER INPUT / OUTPUT CONFIGURATIONS -- GitLab From fb6e685f7852187309b46ca1eaf7fe480c322809 Mon Sep 17 00:00:00 2001 From: vaclav Date: Mon, 14 Aug 2023 13:15:19 +0200 Subject: [PATCH 10/10] Update -im description for SR --- readme_split_rendering.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/readme_split_rendering.txt b/readme_split_rendering.txt index 9869a9bbdb..3cd9dab418 100644 --- a/readme_split_rendering.txt +++ b/readme_split_rendering.txt @@ -99,6 +99,8 @@ Usage: IVAS_rend [options] Additional options: ------------------- -om File : Coded metadata File for BINAURAL_SPLIT_PCM output mode +-im Files : Metadata files for ISM (one file per object), MASA inputs, or + BINAURAL_SPLIT_PCM input mode -prbfi File : Split rendering option: bfi File -- GitLab