Programmer Guide/SPU Reference/ASEG1
Revision as of 17:31, 18 November 2010
ASEG1 - automatic segmentation
Usage:
ASEG1 DT TABP TABC TABS TABM TABO P0 P1 P2 P3
Inputs:
DT | time step |
TABP | name of parameter definition table. Note: all tables are extended shell tables |
TABC | name of channel definition table |
TABS | name of segment definition table |
TABM | name of channel-to-segment mapping table |
TABO | name of output table |
P0..P3 | segmentation parameter inputs (numbers or vectors) |
Outputs:
None. The output data (segment names and addresses) are appended to the output table TABO.
Function:
The segmentation parameters P0-P3 can be of any type (e.g. RMS, band-RMS, F0), but they must be extracted time-synchronously from the signal. It is assumed that the parameters processed in evaluation cycle t (t = 0, 1, ...) are extracted from the signal time interval [t·DT, (t+1)·DT]. Each parameter value (each number and each vector element) must be defined by an entry of the table TABP. E.g. if P0 is a vector of three band-RMS values and P1 is the F0 value, the table TABP must consist of four entries (entries 0..2 for P0[0], P0[1], P0[2] and entry 3 for P1).
Table 1: TABP: parameter definitions (2 fields, 1 entry / parameter track)
{|
|-
|Field
|Content / Description
|-
|0
|defines the parameter track type (0 or 1):
0: the parameter track is a continuous function (e.g. RMS); for each frame a value is computed
1: the parameter track is not a continuous function (e.g. F0, Formants); for some frames no parameter value can be computed (missing values)
|-
|1
|missing value indicator (any number):
for non-continuous parameters this field defines the number that is used to indicate a missing parameter value in a frame (e.g. for F0 tracks the missing value indicator is set to 0)
|}
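The relation between the parameter inputs and the TABP entries can be sketched in Python as follows. This is only an illustration of the layout described above, not STx code: the function names and the tuple representation of a TABP entry (track type, missing value indicator) are assumptions made for the example.

```python
from typing import List, Sequence, Tuple, Union

Param = Union[float, Sequence[float]]   # a number or a vector
TabpEntry = Tuple[int, float]           # (track type, missing value indicator)

def frame_interval(t: int, dt: float) -> Tuple[float, float]:
    """Signal time interval assumed for evaluation cycle t."""
    return (t * dt, (t + 1) * dt)

def flatten_parameters(params: Sequence[Param]) -> List[float]:
    """Concatenate numbers and vector elements in input order (P0, P1, ...)."""
    flat: List[float] = []
    for p in params:
        if isinstance(p, (int, float)):
            flat.append(float(p))
        else:
            flat.extend(float(v) for v in p)
    return flat

def is_missing(value: float, entry: TabpEntry) -> bool:
    """True if value is the missing value indicator of a non-continuous track."""
    track_type, missing_indicator = entry
    return track_type == 1 and value == missing_indicator

# TABP for the example in the text: three band-RMS tracks (continuous)
# and one F0 track (non-continuous, missing value indicator 0).
tabp = [(0, 0.0), (0, 0.0), (0, 0.0), (1, 0.0)]

# One frame: P0 = band-RMS vector, P1 = F0 (0 means "no F0 in this frame").
frame = flatten_parameters([[0.2, 0.5, 0.1], 0.0])
missing = [is_missing(v, e) for v, e in zip(frame, tabp)]
```

Here `frame` has four values, one per TABP entry, and only the F0 value is flagged as missing.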
The first segmentation step is implemented by a simple feature detection. For this purpose each parameter is assigned to one or more segmentation channel(s). A channel is used to detect a signal/segment feature based on the values of the assigned parameter. Two detection modes are defined:
p(t) | value of assigned parameter in the frame t |
s(t) | status of the channel in the frame t |
Interval method:
s(t) = 'on' | if minimum <= p(t) <= maximum |
s(t) = 'off' | otherwise |
Threshold method:
s(t) = 'on' | if s(t-1) = 'off' and p(t) >= on-threshold |
s(t) = 'off' | if s(t-1) = 'on' and p(t) <= off-threshold |
s(t) = s(t-1) | otherwise |
Table 2: TABC: channel definitions (5 fields, 1 entry / channel, multiple channels / parameter are possible)
{|
|-
|Field
|Content / Description
|-
|0
|index of the assigned parameter (0 to number_of_parameters-1)
|-
|1
|selects the type of condition (0 or 1)
0: detect/filter an interval of parameter values
1: split parameter values using a threshold
|-
|2
|lower interval boundary (minimum) or on-threshold
|-
|3
|upper interval boundary (maximum) or off-threshold
|-
|4
|selects the processing used for frames with missing values (0, 1 or 2)
0: set channel to 'off'
1: set channel to 'on'
2: do not change channel (stay)
|}
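The two detection modes and the handling of missing values can be sketched in Python as follows. This is a minimal illustration, not the STx implementation. It assumes that a channel starts in the 'off' state, that the missing-value check is done before the interval or threshold condition, and that TABC and TABP entries are given as plain tuples; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

# TABC entry: (parameter index, condition type,
#              minimum or on-threshold, maximum or off-threshold, missing mode)
Channel = Tuple[int, int, float, float, int]
TabpEntry = Tuple[int, float]   # (track type, missing value indicator)

def update_channel(prev_on: bool, value: float,
                   ch: Channel, tabp: Sequence[TabpEntry]) -> bool:
    """Return the channel status s(t) from s(t-1) and p(t)."""
    param_index, condition, field2, field3, missing_mode = ch
    track_type, missing_indicator = tabp[param_index]
    if track_type == 1 and value == missing_indicator:
        if missing_mode == 0:
            return False          # set channel to 'off'
        if missing_mode == 1:
            return True           # set channel to 'on'
        return prev_on            # stay
    if condition == 0:            # interval method
        return field2 <= value <= field3
    # threshold method
    if not prev_on and value >= field2:
        return True
    if prev_on and value <= field3:
        return False
    return prev_on

def run_channel(track: Sequence[float], ch: Channel,
                tabp: Sequence[TabpEntry]) -> List[bool]:
    """Channel status for every frame of one parameter track."""
    status = False                # assumption: channels start 'off'
    out = []
    for value in track:
        status = update_channel(status, value, ch, tabp)
        out.append(status)
    return out

rms = [0.1, 0.6, 0.3, 0.15, 0.4]
tabp = [(0, 0.0)]
threshold = run_channel(rms, (0, 1, 0.5, 0.2, 0), tabp)
interval = run_channel(rms, (0, 0, 0.2, 0.5, 0), tabp)
```

With the threshold method the channel stays 'on' at 0.3 (above the off-threshold 0.2) and stays 'off' at 0.4 (below the on-threshold 0.5), whereas the interval method evaluates each frame independently.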
In the second segmentation step the channels (features) are combined. A mapping (mask) is used to define the feature configuration of a segment.
m[i,j] | mapping (mask) for segment i to channel j |
m[i,j] = 0 | channel j must be 'off' |
m[i,j] = 1 | channel j must be 'on' |
m[i,j] = 2 | channel j is ignored |
If a channel configuration <c[0](t)..c[n-1](t)> (n = number of channels) matches the conditions defined by the mapping <m[i,0]..m[i,n-1]>, the segment status s[i](t) is set to 'on', otherwise it is set to 'off'. If multiple mappings are defined for a segment, the results of the matches are logically ORed (only one mapping has to match).
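The matching rule can be sketched in Python as follows. The sketch assumes that a mapping is available as a sequence of the values 0, 1 and 2 (one per channel) and that the channel states are booleans; the function names are hypothetical.

```python
from typing import Sequence

def mapping_matches(mask: Sequence[int], channels: Sequence[bool]) -> bool:
    """True if every channel satisfies its mask value (0 = off, 1 = on, 2 = ignored)."""
    return all(m == 2 or (m == 1) == on for m, on in zip(mask, channels))

def segment_status(masks: Sequence[Sequence[int]], channels: Sequence[bool]) -> bool:
    """Segment is 'on' if at least one of its mappings matches (logical OR)."""
    return any(mapping_matches(mask, channels) for mask in masks)

channels = [True, False, True]            # channel states in one frame
first = mapping_matches([0, 2, 2], channels)    # channel 0 must be 'off': no match
second = mapping_matches([1, 0, 2], channels)   # channel 0 'on', channel 1 'off': match
status = segment_status([[0, 2, 2], [1, 0, 2]], channels)
```

Because the results are ORed, the segment in this example is 'on' even though its first mapping does not match.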
Table 3: TABS: segment definitions (5 fields, 1 segment / entry)
{|
|-
|Field
|Content / Description
|-
|0
|minimum segment 'on' duration (>0)
|-
|1
|maximum segment 'off' (pause) duration inside the segment (>=0)
|-
|2
|pre-offset = time subtracted from the detected beginning (>=0)
|-
|3
|post-offset = time added to the detected end (>=0)
|-
|4
|number of channel/segment mappings used for the segment (>0)
|}
The mappings for all segments are stored in the mapping table TABM. This table has a single field and contains one entry for each mapping of a segment, in segment order. E.g. if the 1st segment uses 3 mappings and the 2nd segment uses 1 mapping, entries 0, 1 and 2 of TABM are the mappings for the 1st segment and entry 3 is the mapping for the 2nd segment.
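The assignment of TABM entries to segments follows from the mapping counts in field 4 of TABS. A minimal Python sketch, assuming TABS entries are given as 5-element tuples in the field order of Table 3 (the function name and the sample values are hypothetical):

```python
from typing import List, Sequence, Tuple

# TABS entry: (min 'on' duration, max 'off' duration, pre-offset, post-offset, mapping count)
TabsEntry = Tuple[float, float, float, float, int]

def mapping_ranges(tabs: Sequence[TabsEntry]) -> List[range]:
    """TABM entry indices belonging to each segment definition."""
    ranges = []
    start = 0
    for entry in tabs:
        count = entry[4]
        ranges.append(range(start, start + count))
        start += count
    return ranges

# Example from the text: 1st segment uses 3 mappings, 2nd segment uses 1.
tabs = [(0.05, 0.02, 0.0, 0.0, 3), (0.10, 0.00, 0.0, 0.0, 1)]
ranges = mapping_ranges(tabs)
```

For this example `ranges[0]` covers TABM entries 0 to 2 and `ranges[1]` covers entry 3.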
In the third step, the durations and offsets defined for a segment are applied:
If the stream of 'off' frames (pause) between two streams of 'on' frames (signal) is shorter than the maximum segment 'off' duration defined for the segment, the two 'on' streams are connected.
If the duration of the 'on' stream is shorter than the minimum segment 'on' duration, the 'on' stream is ignored (not saved).
Before the segment is saved, its begin and end times are corrected: begin-time = detected-begin-time - pre-offset and end-time = detected-end-time + post-offset.
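The three rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the STx implementation: the rules are applied in the order listed (connect, then check the minimum duration, then apply the offsets), frame t is taken to cover [t·DT, (t+1)·DT], all durations are in the same unit as DT, and the begin time is not clamped to 0. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def on_runs(status: Sequence[bool], dt: float) -> List[Tuple[float, float]]:
    """(begin, end) times of consecutive 'on' frames."""
    runs = []
    start = None
    for t, on in enumerate(status):
        if on and start is None:
            start = t
        elif not on and start is not None:
            runs.append((start * dt, t * dt))
            start = None
    if start is not None:
        runs.append((start * dt, len(status) * dt))
    return runs

def postprocess(status: Sequence[bool], dt: float, min_on: float,
                max_off: float, pre: float, post: float) -> List[Tuple[float, float]]:
    """Connect runs over short pauses, drop short runs, apply the offsets."""
    merged: List[Tuple[float, float]] = []
    for begin, end in on_runs(status, dt):
        if merged and begin - merged[-1][1] < max_off:
            merged[-1] = (merged[-1][0], end)    # pause is short: connect
        else:
            merged.append((begin, end))
    return [(b - pre, e + post) for b, e in merged if e - b >= min_on]

status = [bool(x) for x in [0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]]
segments = postprocess(status, dt=1.0, min_on=2.0, max_off=2.0, pre=0.5, post=0.25)
```

In this example the one-frame pause at t = 3 is bridged, the isolated 'on' frame at t = 10 is discarded as too short, and the remaining segment [1, 7] is widened to [0.5, 7.25].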
The checked and corrected segments are saved into the output table TABO. For each segment, the index of the segment definition entry (for identification) and the begin and end times are saved.
Table 4: TABO: output table (3 fields, 1 entry / segment)
{|
|-
|Field
|Content / Description
|-
|0
|identification = index of segment definition entry
|-
|1
|begin time
|-
|2
|end time
|}