Explanation of audio stat using sox


I have a bunch of audio files and need to split each of them on silence using sox. However, I realized that some files have a noisy background and some don't, so I can't use a single set of parameters to iterate over the files and do the split. I'm trying to figure out how to separate the files by whether they have a noisy background. Here is what I got from sox input1.flac -n stat and sox input2.flac -n stat:

    Samples read:          18207744
    Length (seconds):    568.992000
    Scaled by:         2147483647.0
    Maximum amplitude:     0.999969
    Minimum amplitude:    -1.000000
    Midline amplitude:    -0.000015
    Mean    norm:          0.031888
    Mean    amplitude:    -0.000361
    RMS     amplitude:     0.053763
    Maximum delta:         0.858917
    Minimum delta:         0.000000
    Mean    delta:         0.018609
    RMS     delta:         0.039249
    Rough   frequency:         1859
    Volume adjustment:        1.000

and

    Samples read:         198976896
    Length (seconds):   6218.028000
    Scaled by:         2147483647.0
    Maximum amplitude:     0.999969
    Minimum amplitude:    -1.000000
    Midline amplitude:    -0.000015
    Mean    norm:          0.156168
    Mean    amplitude:    -0.000010
    RMS     amplitude:     0.211787
    Maximum delta:         1.999969
    Minimum delta:         0.000000
    Mean    delta:         0.091605
    RMS     delta:         0.123462
    Rough   frequency:         1484
    Volume adjustment:        1.000

The former does not contain a noisy background and the latter does. I suspect I can use the mean or the maximum delta, because of the big gap between the two files. Can someone explain the meaning of these stats to me, or at least show me where I can look it up myself? (I tried the official documentation, but it doesn't explain them.) Many thanks.

I cannot explain the meaning of those stats. I've tried to figure them out myself on many occasions, but they don't appear to be documented anywhere. I'd use the stats effect instead, as its output is easier to understand.
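One thing that can be said is that the amplitude lines in the stat output are linear values with full scale at 1.0, so they can be put on the same footing as the dB figures that stats prints by taking 20*log10 of them. A minimal sketch, assuming awk is available; the function names amp_to_db and stat_rms_amplitude are made up for this example and are not part of sox:

```shell
#!/bin/sh
# Hypothetical helpers (not part of sox) for comparing `stat` figures in dB.

# Convert a linear amplitude (full scale = 1.0) to dB relative to full scale.
# awk's log() is the natural logarithm, so divide by log(10).
amp_to_db() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", 20 * log(a) / log(10) }'
}

# Read `sox FILE -n stat 2>&1` output on stdin and print the value of the
# "RMS     amplitude:" line. The match ignores case and allows any run of
# spaces between the two words of the label.
stat_rms_amplitude() {
    awk 'tolower($0) ~ /^rms +amplitude:/ { print $NF; exit }'
}
```

It could be used as amp_to_db "$(sox input1.flac -n stat 2>&1 | stat_rms_amplitude)". For the two RMS amplitudes quoted in the question, 0.053763 and 0.211787, this comes out at roughly -25.4 dB and -13.5 dB. That only says the second file is louder overall, not that it is noisier.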

As a measure to differentiate between more and less noisy audio, I'd try using the difference between the highest and lowest sound levels. The quietest parts are never quieter than the background noise alone, so if there is little difference between the two, the audio is either noisy or loud all the time, like a compressed pop song. You can take the difference between the maximum and minimum RMS values, or between the peak and the minimum RMS. The RMS window length should be kept short, and if the audio has fade-in or fade-out sections, those should be trimmed away first, though I didn't include that in the code.

    audio="input1.flac"
    width=0.01   # RMS window length in seconds

    # sox prints the stats report on stderr, hence 2>&1.
    # sed strips everything except the number from the matching line.
    peak=$(sox "$audio" -n channels 1 stats -w $width 2>&1 |
      grep "Pk lev dB" |
      sed 's/[^0-9.-]*//g')
    rmsmax=$(sox "$audio" -n channels 1 stats -w $width 2>&1 |
      grep "RMS Pk dB" |
      sed 's/[^0-9.-]*//g')
    rmsmin=$(sox "$audio" -n channels 1 stats -w $width 2>&1 |
      grep "RMS Tr dB" |
      sed 's/[^0-9.-]*//g')

    # Differences in dB; the parentheses keep bc happy with negative values.
    rmsdif=$(echo "($rmsmax)-($rmsmin)" | bc -l)
    pkmindif=$(echo "($peak)-($rmsmin)" | bc -l)

    echo "
      max rms: $rmsmax
      min rms: $rmsmin
      diff rms: $rmsdif
      peak-min: $pkmindif
    "
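The same measurement can be made with a single sox pass per file, which matters for recordings as long as the second one in the question. The following is only a sketch under stated assumptions: the function names are made up for this example, and the threshold passed to is_noisy is a hypothetical value that would have to be tuned on files known to be clean or noisy.

```shell
#!/bin/sh
# Hypothetical helpers (not part of sox): run sox once, then parse the report.

# Print the stats report for a file as mono. Window length defaults to 0.01 s.
sox_stats_text() {
    sox "$1" -n channels 1 stats -w "${2:-0.01}" 2>&1
}

# Read a stats report on stdin and print the last column of the line that
# starts with the given label, e.g. "RMS Pk dB".
stats_field() {
    awk -v key="$1" 'index($0, key) == 1 { print $NF; exit }'
}

# Print RMS peak minus RMS trough, in dB, for a stats report given as $1.
rms_range() {
    max=$(printf '%s\n' "$1" | stats_field "RMS Pk dB")
    min=$(printf '%s\n' "$1" | stats_field "RMS Tr dB")
    awk -v a="$max" -v b="$min" 'BEGIN { printf "%.2f\n", a - b }'
}

# Succeed if the range ($1, dB) is below the threshold ($2, dB).
# The threshold is an assumption to be tuned, not a value from sox.
is_noisy() {
    awk -v r="$1" -v t="$2" 'BEGIN { exit !(r < t) }'
}
```

A loop over the files could then call range=$(rms_range "$(sox_stats_text "$f")") and branch on is_noisy "$range" 30, where 30 is only a placeholder. A file containing digital silence reports an RMS trough of -inf, which this sketch does not handle.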
