Collecting reliable training data to extract tourism strolling behaviour from smartphone GPS logs during walking

Ai, Hisatoshi; Kaji, Hideki

https://doi.org/10.5194/ica-abs-1-3-2019

Articles | Volume 1

© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.

15 Jul 2019

Keywords: Tourism strolling behaviour, training data collection, smartphone GPS logs

Abstract. A smartphone can be a useful device for delivering tourism information to users. Many earlier studies have discussed methods for determining whether a user is away from the area of their daily life, which would imply that the user is a potential tourist, or for selecting appropriate content to deliver based on the user’s preferences and circumstances. However, few studies have attempted to find the right timing for such tourism recommendations. The ultimate goal of this study is to develop a method that extracts tourism strolling behaviour through real-time analysis of GPS log data collected from smartphones. We assume that users will be more inclined to visit a recommended spot if they receive information about it while strolling. Such a method would be useful for developing tourism spot recommendation applications or for adding a recommendation function to existing map or navigation systems.

We have developed a web application (Ai and Kaji, 2017; 2019) that monitors location information from a smartphone’s GPS and analyses walking speed, so that it notifies the user about nearby tourism spots only when the user is considered inclined to visit them. Figure 1 shows the interface of the web application. In the initial step, the user is asked to enter a unique ID for data collection and to choose a target region. After the user taps the “Start logging” button, the web application starts collecting location information and analysing walking speed. Location data are collected once per second and saved in the browser’s local storage. If the user’s walking speed shows a certain pattern, the web application determines that the user is strolling and shows tourism information on the screen. The user then evaluates the timing of the notification by tapping either the “Good” or the “Bad” button. For this research, we added a “Request” button, with which the user can express interest in receiving a tourism recommendation while the web application is showing nothing and neither the “Good” nor the “Bad” button is active. These evaluations are also saved in local storage. Tapping the “Send to server” button sends the data in local storage to our server, and tapping the “End logging” button stops the monitoring of location information.
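The logging and notification flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation, which runs in the browser. Speed is estimated from consecutive once-per-second GPS fixes with the haversine formula. Because the abstract does not specify the “certain pattern” of walking speed, the function `looks_like_strolling` and its thresholds (a 60-sample window and a mean speed between 0.3 and 1.0 m/s) are hypothetical placeholders; the names `Fix` and `StrollLogger` are likewise hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Fix:
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def looks_like_strolling(speeds: List[float], window: int = 60,
                         min_mps: float = 0.3, max_mps: float = 1.0) -> bool:
    """HYPOTHETICAL rule standing in for the app's unpublished speed pattern:
    the mean speed over the last `window` samples lies in [min_mps, max_mps]."""
    if len(speeds) < window:
        return False
    mean = sum(speeds[-window:]) / window
    return min_mps <= mean <= max_mps


@dataclass
class StrollLogger:
    fixes: List[Fix] = field(default_factory=list)
    speeds: List[float] = field(default_factory=list)  # m/s between consecutive fixes
    evaluations: List[Tuple[float, str]] = field(default_factory=list)
    notifying: bool = False  # True while tourism information is on screen

    def add_fix(self, fix: Fix) -> None:
        """Record a ~1 Hz GPS fix, update the speed series, re-check strolling."""
        if self.fixes:
            prev = self.fixes[-1]
            dt = fix.t - prev.t
            if dt > 0:  # skip speed for duplicate or out-of-order timestamps
                self.speeds.append(haversine_m(prev, fix) / dt)
        self.fixes.append(fix)
        self.notifying = looks_like_strolling(self.speeds)

    def tap(self, t: float, label: str) -> None:
        """Store an evaluation tap: Good/Bad only while notifying, Request only otherwise."""
        allowed = {"Good", "Bad"} if self.notifying else {"Request"}
        if label not in allowed:
            raise ValueError(f"{label!r} button is not active at t={t}")
        self.evaluations.append((t, label))
```

In this sketch the button rule mirrors the description above: “Good” and “Bad” are accepted only while tourism information is shown, and “Request” only while nothing is shown. Uploading to the server is omitted.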

A proof-of-concept experiment was conducted in the field to collect training data, namely the evaluation taps, for improving the algorithm that extracts tourism strolling behaviour. We chose two target areas, Kawagoe and Yokohama; both cities are located in the Tokyo metropolitan area and have several tourism spots in their downtown areas. Participants were asked to walk from a railway station to a designated meeting place over two hours. On the way, they were asked to pass through a shopping mall starting at the station and to visit well-known tourism destination zones located between the shopping mall and the meeting place. Table 1 shows the experiment dates and participant IDs. An ID beginning with a single letter, K or Y, means that the participant joined the experiment only in Kawagoe or only in Yokohama, respectively. An ID beginning with two letters means that the participant joined the experiment in both areas, and the order of the letters corresponds to the order of participation; e.g. KY means first in Kawagoe and then in Yokohama.
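The participant ID convention can be expressed as a small parser. This is a hypothetical helper for illustration only. Based on the IDs quoted in this abstract (e.g. K05, Y07, KY19), it assumes that an ID consists of one or two area letters followed by a participant number.

```python
import re
from typing import List

AREAS = {"K": "Kawagoe", "Y": "Yokohama"}
# One or two area letters followed by a participant number, e.g. K05, Y07, KY19.
ID_PATTERN = re.compile(r"([KY]{1,2})(\d+)")


def participation_order(participant_id: str) -> List[str]:
    """Return the areas a participant joined, in the order they took part."""
    m = ID_PATTERN.fullmatch(participant_id)
    if m is None:
        raise ValueError(f"unrecognised participant ID: {participant_id!r}")
    letters = m.group(1)
    if len(set(letters)) != len(letters):
        # e.g. "KK01" would claim two sessions in the same area
        raise ValueError(f"repeated area letter in ID: {participant_id!r}")
    return [AREAS[c] for c in letters]
```

For example, `participation_order("KY03")` yields `["Kawagoe", "Yokohama"]`, reflecting participation first in Kawagoe and then in Yokohama.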

In this experiment, we collected two sets of training data. One comprises evaluation taps from participants; the other contains hand-written marks on a paper map showing where participants thought they had been strolling. Evaluation taps can be entered in real time and on-site; however, “Good” or “Bad” evaluations can only be obtained while the web application identifies the participant as strolling. Even though participants can also tap the “Request” button, taps cannot provide training data covering the entire span of the experiment. Marks on a paper map, on the other hand, can cover the entire span of the experiment because they rely on each participant’s memory. However, their accuracy is potentially limited in two respects. 1) Do participants accurately remember where they were strolling throughout a two-hour walk? 2) Can participants correctly match a location where they recall strolling to the corresponding location on the map? Both limitations can be overcome with smartphones, since they allow training data to be collected in real time and on-site. However, as mentioned above, smartphone-based collection is still limited in when data can be obtained.

In this presentation, we discuss potential differences between the two training data sets: evaluation taps on a smartphone and marks on a paper map.

Table 2 summarises the training data. Rows marked G, B, and R show how many taps were made on the “Good,” “Bad,” and “Request” buttons, respectively. The “Match” rows to the right of G, B, and R show how many of those taps do not conflict with the marks on the paper map, i.e. the number of “Good” or “Request” taps inside the marks and the number of “Bad” taps outside them. The row labelled “Rate” shows the overall proportion of taps that matched the marks on the paper map, i.e. the total match count divided by the total number of taps. Data for K05 are missing because of an error in transferring the data to our server. Only a few participants, such as Y07 and KY19, achieved a high rate even though they made many taps. We also found that for some participants who joined in both areas, such as KY03 and KY10, the rate differed between the two areas even though the evaluations came from the same person. This suggests that the evaluations are ambiguous, and that a methodology for efficiently collecting reliable training data is key to improving the timing of recommendations.
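The Match and Rate definitions can be illustrated with a short Python sketch. The abstract does not say how hand-written marks were compared with tap locations, so this sketch assumes, hypothetically, that each mark has been digitised as a polygon in planar map coordinates and tests tap locations with an even-odd ray-casting point-in-polygon check. The names `Tap`, `point_in_polygon`, and `summarise` are illustrative only, and the rate is returned as `None` when there are no taps to evaluate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Polygon = Sequence[Tuple[float, float]]  # hypothetical digitised mark: list of (x, y) vertices


@dataclass
class Tap:
    label: str  # "Good", "Bad" or "Request"
    x: float    # tap location in planar map coordinates (assumed)
    y: float


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Even-odd ray casting: count crossings of a ray cast towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def summarise(taps: List[Tap], marks: List[Polygon]
              ) -> Tuple[Dict[str, int], Dict[str, int], Optional[float]]:
    """Return per-button tap counts, per-button match counts, and the overall rate."""
    counts = {"Good": 0, "Bad": 0, "Request": 0}
    matches = {"Good": 0, "Bad": 0, "Request": 0}
    for tap in taps:
        counts[tap.label] += 1
        inside = any(point_in_polygon(tap.x, tap.y, m) for m in marks)
        # Good/Request agree with the map inside a mark; Bad agrees outside.
        if inside == (tap.label != "Bad"):
            matches[tap.label] += 1
    total = sum(counts.values())
    rate = sum(matches.values()) / total if total else None
    return counts, matches, rate
```

With a single square mark, a “Good” tap inside it and a “Bad” tap outside it both count as matches, while a “Request” tap outside and a “Bad” tap inside do not, giving a rate of 0.5 for those four taps.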