Proceedings of the 12th Scientific Conference on Information Technology and its Applications in Various Fields (CITA 2023)


                     Two activation functions commonly used with such networks are the Softmax function
                     and the rectified linear unit (ReLU). In this research, we choose Softmax as the
                     activation function because it is well suited to classification, especially
                     multi-class classification: it relates the inputs to one another by mapping them
                     to a probability distribution whose values sum to one.
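                     \sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K          (1)

                     where z_i is the i-th of the K inputs to the function.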
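
                     As an illustration of why Softmax suits multi-class classification, the
                     following minimal sketch turns a vector of raw scores into probabilities
                     that sum to one. The function name, the score values, and the pairing of
                     scores with the four sound labels are assumptions made for the example,
                     not the authors' implementation.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities that sum to one."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw network outputs, one per label (values are made up).
labels = ["scream", "glass breaking", "gunshot", "environment"]
scores = [2.0, 1.0, 0.1, -1.0]

probs = softmax(scores)
predicted = labels[probs.index(max(probs))]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
print("predicted:", predicted)
```

                     The label with the largest score also receives the largest probability, so
                     the predicted class is simply the index of the maximum.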


                     2.3   Spectrogram

                     Since the above deep learning model, which uses CNN algorithms, only accepts images
                     as input, sound is presented to it by visualizing it as a spectrogram.
                       The spectrogram is used to visualize the duration, frequency, and intensity of an
                     audio signal given as a waveform. It is computed with the short-time Fourier
                     transform (STFT), which determines the sinusoidal frequency and phase content of
                     a signal as it changes over time.
                       A spectrogram represents a sound in the time-frequency domain with three
                     parameters: the x-axis is time, the y-axis is frequency, and the magnitude is shown
                     by color, with a bolder color indicating a higher magnitude.
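
                     The computation described above can be sketched as follows: the signal is
                     cut into overlapping frames, each frame is windowed, and the magnitude of
                     its discrete Fourier transform gives one time column of the spectrogram.
                     The frame length, hop size, Hann window, sample rate, and test tone are
                     all assumptions chosen for the example; the paper does not state its
                     settings. The sketch uses only the standard library for portability.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return spec[t][k]: magnitude of frequency bin k in time frame t."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    spec = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        spec.append(row)
    return spec


# Hypothetical test signal: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000
frame_len = 64
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(512)]

spec = stft_magnitude(signal, frame_len=frame_len, hop=32)

# The strongest bin in each frame, converted to Hz, should match the tone.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
peak_hz = [k * sample_rate / frame_len for k in peak_bins]
print(len(spec), "frames x", len(spec[0]), "bins; peak at", peak_hz[0], "Hz")
```

                     Plotting spec with time on the x-axis, frequency on the y-axis, and
                     magnitude as color yields the spectrogram image. In practice a library
                     routine such as scipy.signal.spectrogram would normally replace the
                     explicit loops.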





















                                Fig. 4. Sound of a single gunshot displayed as a waveform (above)
                                                   and as a spectrogram (below)



                     3     Implementation and Analysis


                     3.1   Training process

                     Dataset: Collecting data is a time-consuming part of training a model.
                     The task was to collect samples for four labels: scream, glass breaking, gunshot,
                     and environment. The first label, environment, is the easiest sound to collect in
                     real life. It contains about 100 samples of normal activities such as talking,
                     laughing, and singing, as well as quiet background noise. Another label is scream




                     CITA 2023                                                   ISBN: 978-604-80-8083-9