Proceedings of the 12th Scientific Conference on Information Technology and Applications in Various Fields (CITA 2023)
Python to preprocess the data. After preprocessing, the dataset contains 20,551 reviews, with 2,268,646 words in total and a vocabulary of 32,687 distinct words.
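The corpus statistics above (number of reviews, total words, vocabulary size) can be computed with a word counter. The sketch below is a minimal illustration, not the authors' preprocessing script: it assumes that reviews are already loaded as a list of strings and that a "word" is a lowercased run of `\w` characters. The helper names `tokenize` and `corpus_stats` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and treat each run of word characters as a token.
    return re.findall(r"\w+", text.lower())


def corpus_stats(reviews: list[str]) -> tuple[int, int, int]:
    """Return (number of reviews, total word count, vocabulary size)."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    # Total words = sum of all token counts; vocabulary = number of distinct tokens.
    return len(reviews), sum(counts.values()), len(counts)


sample = ["Great room, friendly staff.", "The room was clean and the staff friendly."]
print(corpus_stats(sample))
```

The tokenization rule directly changes both the word total and the vocabulary size, so figures produced this way are only comparable to the paper's if the same rule was used.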
All 20,551 reviews were collected from 3- to 5-star hotels in Vietnam. The longest review contains 10,335 words and the shortest contains 1 word.
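The length extremes can be found by measuring each review. As a consistency check, the reported totals imply a mean of about 110 words per review (2,268,646 / 20,551 ≈ 110.4). The sketch below assumes whitespace-separated tokens count as words, which may differ from the authors' rule. `review_length_stats` is a hypothetical helper.

```python
def review_length_stats(reviews: list[str]) -> dict[str, float]:
    """Longest, shortest and mean review length, in whitespace-separated words."""
    if not reviews:
        raise ValueError("need at least one review")
    lengths = [len(review.split()) for review in reviews]
    return {
        "longest": max(lengths),
        "shortest": min(lengths),
        "mean": sum(lengths) / len(lengths),
    }


stats = review_length_stats(["Excellent", "Clean room and friendly staff"])
print(stats)

# Mean length implied by the totals reported in the text.
reported_mean = 2_268_646 / 20_551
print(f"{reported_mean:.1f}")
```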
In the downloaded data, 5-star reviews are the most numerous, at more than 16,000, while 2-star reviews are the fewest. Figure 3 shows the number of reviews for each rating from 1 to 5 stars.
Fig. 3. Distribution of reviews by rating in the collected dataset
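A per-rating tally like the one plotted in Fig. 3 can be produced with a counter over the star ratings. This is a minimal sketch that assumes the ratings are available as integers from 1 to 5. The toy `ratings` list is illustrative and does not reproduce the paper's data. Ratings with no reviews are reported as zero so that every bar appears.

```python
from collections import Counter


def rating_distribution(ratings: list[int]) -> dict[int, int]:
    """Count reviews per star rating, listing every rating from 1 to 5 (zero if absent)."""
    counts = Counter(ratings)
    # Counter returns 0 for missing keys, so absent ratings still get an entry.
    return {star: counts[star] for star in range(1, 6)}


ratings = [5, 5, 4, 5, 3, 1, 5, 4]  # illustrative values only
dist = rating_distribution(ratings)
most = max(dist, key=dist.get)   # rating with the most reviews
least = min(dist, key=dist.get)  # rating with the fewest reviews
print(dist, most, least)
```

To draw the bar chart, the dictionary can be passed to matplotlib, for example `plt.bar(list(dist.keys()), list(dist.values()))`.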
Some words are mentioned many times in the positive reviews, and they reveal the aspects customers care about. The aspects customers mention include food, room, staff, services, and bathroom. In the positive reviews, positive words such as great, excellent, friendly, clean, and beautiful stand out clearly. This shows that these aspects are rated positively and that customers' attitudes toward them are favorable. Figure 4 is a word cloud of the most frequently mentioned aspects and polarity words.
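The frequent-word list behind a word cloud such as Figure 4 can be built by counting tokens in the positive reviews after removing stopwords. This sketch rests on several assumptions that are not taken from the paper. It treats reviews rated 4 or 5 stars as "positive". It uses a tiny, hypothetical stopword list. It represents reviews as `(text, rating)` pairs. `top_positive_words` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "were", "very", "to", "of", "in"}


def top_positive_words(reviews, n=10, min_rating=4):
    """Most frequent non-stopword tokens in reviews rated >= min_rating.

    `reviews` is an iterable of (text, rating) pairs; treating ratings of
    4 and above as "positive" is an assumption.
    """
    counts = Counter()
    for text, rating in reviews:
        if rating >= min_rating:
            tokens = re.findall(r"\w+", text.lower())
            counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


data = [
    ("Great food and friendly staff", 5),
    ("The room was clean and the staff very friendly", 4),
    ("Dirty bathroom", 1),  # excluded: below min_rating
]
print(top_positive_words(data, n=3))
```

To render the cloud itself, the counts can be passed as a dictionary to `WordCloud().generate_from_frequencies(...)` from the third-party `wordcloud` package.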
CITA 2023 ISBN: 978-604-80-8083-9