Proceedings of the 12th Scientific Conference - Information Technology and its Applications (CITA 2023)



                     Python to preprocess the data. After processing, the data set has 20,551 reviews,
                     with 2,268,646 words in total and a vocabulary size of 32,687.
                       The 20,551 reviews were collected from 3-5 star hotels in Vietnam. The longest
                     review has 10,335 words; the shortest has 1 word.
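
                     The statistics above (number of reviews, total words, vocabulary size, longest and
                     shortest review) can be computed along the following lines. This is only a minimal
                     sketch: the paper does not state which tokenizer was used, so lowercasing and
                     whitespace splitting are assumptions, and `corpus_stats` and the sample reviews are
                     hypothetical names and data introduced for illustration.

```python
from collections import Counter


def corpus_stats(reviews):
    """Return basic size statistics for a non-empty list of review strings."""
    # Assumption: tokens are lowercased, whitespace-separated words.
    tokenized = [text.lower().split() for text in reviews]
    lengths = [len(tokens) for tokens in tokenized]
    vocab = Counter(word for tokens in tokenized for word in tokens)
    return {
        "reviews": len(reviews),
        "total_words": sum(lengths),
        "vocab_size": len(vocab),
        "longest": max(lengths),
        "shortest": min(lengths),
    }


# Hypothetical sample data, not taken from the paper's data set.
sample = [
    "Great hotel and friendly staff",
    "Clean",
    "The room was clean and the food was great",
]
stats = corpus_stats(sample)
print(stats)
```

                     A different tokenizer (for example one that strips punctuation) would give slightly
                     different word and vocabulary counts.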
                       In the downloaded data, 5-star reviews are the most common, with more than
                     16,000 reviews, and 2-star reviews are the least common. Figure 3 below shows the
                     number of reviews for each rating from 1 to 5 stars.































                                 Fig. 3. Distribution of reviews with ratings in the collected data set
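
                     A per-rating count such as the one plotted in Fig. 3 could be obtained as sketched
                     below. The function name `rating_distribution` and the sample ratings are
                     hypothetical; the sketch assumes each review carries an integer rating from 1 to 5.

```python
from collections import Counter


def rating_distribution(ratings):
    """Count reviews per star rating, including ratings with zero reviews."""
    counts = Counter(ratings)
    return {star: counts.get(star, 0) for star in range(1, 6)}


# Hypothetical sample ratings, not taken from the paper's data set.
sample_ratings = [5, 5, 4, 5, 3, 1, 5, 2, 4]
dist = rating_distribution(sample_ratings)
most_common_star = max(dist, key=dist.get)
print(dist, most_common_star)
```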


                     Certain words are mentioned many times in the positive reviews. They express
                     the aspects that customers care about. The aspects that customers mention
                     include: food, room, staff, services, bathr
                     positive reviews, the positive words stand out clearly, including: great, excellent,
                     friendly, clean, beautiful. This shows that these aspects are rated positively and that
                     the customers' attitude towards them is good. Figure 4 is a word cloud of the most
                     frequently mentioned aspects and polarity words.
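
                     The word frequencies behind such a word cloud could be gathered roughly as follows.
                     This is a sketch under stated assumptions, not the authors' procedure: treating
                     reviews rated 4 stars or more as positive, the small stop-word list, and the names
                     `top_words` and `STOPWORDS` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list.
STOPWORDS = {"the", "a", "and", "was", "were", "is", "very", "with"}


def top_words(reviews, ratings, min_rating=4, k=5):
    """Return the k most frequent non-stop-words in reviews rated >= min_rating."""
    counts = Counter()
    for text, stars in zip(reviews, ratings):
        if stars >= min_rating:  # assumption: 4-5 stars counts as positive
            words = re.findall(r"[a-z']+", text.lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)


# Hypothetical sample data, not taken from the paper's data set.
reviews = [
    "The staff was friendly and the room was clean",
    "Excellent food, friendly staff",
    "Dirty room and rude staff",
]
ratings = [5, 4, 1]
top = top_words(reviews, ratings, k=2)
print(top)
```

                     The resulting (word, count) pairs can then be passed to any word-cloud renderer.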

























                     CITA 2023                                                   ISBN: 978-604-80-8083-9