From the short, automated routes your robot vacuum cleaner makes as it navigates your apartment to extended periods of international travel, trajectory data plays a bigger role in life today than you might expect. Whether it’s as short as a minute-long ride on a shared bicycle or as long as 10-year records at a base station, mining people’s movement patterns, and those of vehicles too, can help urban governance officials reduce traffic, ensure public safety, respond to emergencies, and make cities smarter and more efficient.
In an effort to improve on traditional visualization approaches, the Alibaba team recently proposed a novel vector field generation algorithm designed to improve analysis of key data on human and vehicle flows in cities. This article explores the new method in detail alongside previous visualization approaches, and concludes with findings from a case study demonstrating its ability to reduce visual overlap and reveal trends in the movement of people through urban settings.
Trajectory data describes changes over time in the spatial positions and attributes of objects moving in the space-time dimension. When people and objects move, their locations and other attributes are recorded at regular intervals by the devices they carry, such as mobile phones, sensors, and so on. These recorded data points can be taken to form sample points which can then be arranged in chronological order to constitute the trajectory of a moving object.
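As a minimal sketch of that construction, sample points can be grouped by moving object and sorted by time. The `Sample` type, its field names, and the example values are illustrative assumptions rather than part of the original work.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    """One recorded sample point (hypothetical layout)."""
    object_id: str
    t: float  # timestamp in seconds
    x: float
    y: float


def build_trajectories(samples):
    """Group samples by object and sort each group chronologically."""
    groups = defaultdict(list)
    for s in samples:
        groups[s.object_id].append(s)
    return {oid: sorted(pts, key=lambda s: s.t) for oid, pts in groups.items()}


# Samples may arrive out of order and interleaved across objects.
samples = [
    Sample("a", 20.0, 1.0, 1.0),
    Sample("b", 5.0, 4.0, 0.0),
    Sample("a", 10.0, 0.0, 0.0),
]
trajectories = build_trajectories(samples)
```

Each value in `trajectories` is then the time-ordered list of sample points for one moving object.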
One inconvenience in this process, however, is that there are many sources of trajectory data, including but not limited to mobile phone signal data, vehicle GPS data, Wi-Fi sniffing data, and check-in data. What’s more, new data is constantly being generated. As a result, there are typically huge volumes of such data, which can lead to serious visual clutter and rendering pressure. Transforming trajectory data into vector field data, with the help of visual analysis technology, retains the main features of the trajectory data while greatly reducing its volume. This in turn allows us to better observe movement patterns in crowds.
When using visual analysis technology to analyze and mine trajectory data, it is important to visualize the trajectory on an interactive interface, as this provides users with space for observation and exploration. Trajectory expression models that have received attention from experts in the field include the expression method, the flying line method, and the path connection method. The last of these is the most intuitive form of trajectory presentation: it connects all of the sample points in each object’s trajectory data in chronological order, and then uses other visual channels such as color, width, and line shape to encode the object’s attributes. It also most clearly demonstrates the spatial position of the moving object’s path, which is why it is used as the means of visualization in DataV’s base plane map and in the linear thermal layer subcomponent of the 3D map.
The flying line method, while similar to the path connection method, has one major divergence: it simulates the movement of a moving object in the form of an animation. Typically, a moving line is encoded using a line segment with an arrow, which moves chronologically between the sample points. This method clearly reflects the direction of a moving object, and has a more advanced visual effect. The flying line layer within the DataV map components, as well as the arc layer, trajectory layer, and road network trajectory layer in the 3D map, all use this visualization method. The lightning effect in the lightning graph, which made its impressive debut during Alibaba’s 11/11 Global Shopping Festival last year, was also made possible using this method.
The above visualizations are classic models for visualizing trajectory data, but they are not perfect. When applied to massive amounts of data, the seemingly straightforward visualizations become blurred and cluttered with obstructions and overlap. If not dealt with properly, this may then affect the user’s ability to observe and explore them. In addition, increases in data volume put pressure on the quality of graphics. To cope with this pressure, the performance of hardware devices has to constantly be improved, thereby raising the threshold of trajectory analysis. As such, for massive amounts of trajectory data, a more effective visualization method is required to gain insight into the movement patterns of moving objects in cities.
Following wide-ranging research on the topic, the Alibaba team has now proposed a vector field generation algorithm for massive trajectory data that can transform trajectory data in a specific time segment into vector field data, enabling expression and characterization of both ‘human flows’ and ‘vehicle flows’. The defining characteristic of this method is that it does not directly visualize massive amounts of trajectory data, but rather aggregates it, extracts its main features, converts it into vector field data, and then selects an appropriate visualization method to present it. Since the vector field data retains the main features of the trajectory data and greatly reduces the volume of the data, it clearly and intuitively reflects the movement patterns of moving objects in urban areas while eliminating visual obstructions and reducing the pressure on graphical quality.
The following sections discuss the algorithm’s component steps in sequence, as displayed in the diagram below.
Trajectory data is composed of multiple sample points, including the position, time, and other attributes of movement points. The Alibaba team first calculated the position of the movement points based on all of the sample points. They then calculated the direction and size of the entering and leaving vectors of each movement point according to the inflow and outflow between every two sample points in the trajectory data, wherein the size included the number of trajectories and the speed of the moving object.
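One way to picture this step is the sketch below. It assumes that each movement point is simply a sample position (for example a base station location), that each consecutive pair of samples contributes one exit vector at the first point and one entry vector at the second, and that speed is averaged over the trajectories sharing that pair. The function name and data layout are hypothetical, and the team's actual calculation may differ.

```python
import math
from collections import defaultdict


def entry_exit_vectors(trajectories):
    """trajectories: list of chronologically sorted [(t, (x, y)), ...] lists.

    Returns {point: {"entry": [...], "exit": [...]}}. Each vector records its
    direction in degrees, its length, its trajectory count, and mean speed.
    """
    # (from_point, to_point) -> [trajectory count, sum of speeds]
    flows = defaultdict(lambda: [0, 0.0])
    for samples in trajectories:
        for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
            if p0 == p1 or t1 <= t0:
                continue  # no movement, or unusable timestamps
            flow = flows[(p0, p1)]
            flow[0] += 1
            flow[1] += math.dist(p0, p1) / (t1 - t0)

    vectors = defaultdict(lambda: {"entry": [], "exit": []})
    for (p0, p1), (count, speed_sum) in flows.items():
        angle = math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0])) % 360.0
        vec = {
            "angle": angle,
            "distance": math.dist(p0, p1),
            "count": count,
            "speed": speed_sum / count,
        }
        vectors[p0]["exit"].append(dict(vec))   # leaving p0
        vectors[p1]["entry"].append(dict(vec))  # arriving at p1
    return vectors


# Two objects travel the same segment at different speeds.
vectors = entry_exit_vectors([
    [(0, (0.0, 0.0)), (10, (10.0, 0.0))],
    [(0, (0.0, 0.0)), (5, (10.0, 0.0))],
])
```

Here the point `(0.0, 0.0)` ends up with a single exit vector carrying two trajectories, and `(10.0, 0.0)` with the matching entry vector.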
All of the entry and exit vectors of the movement points obtained in the previous step are filtered according to a user-defined trajectory threshold, yielding the entry and exit master vectors of each movement point.
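A filter of this kind might look like the following sketch. The dictionary layout, the function name, and the choice of an inclusive comparison are assumptions made for illustration.

```python
def filter_master_vectors(vectors, min_trajectories):
    """Keep only vectors carried by at least `min_trajectories` trajectories."""
    return [v for v in vectors if v["count"] >= min_trajectories]


exit_vectors = [
    {"angle": 0.0, "count": 120},
    {"angle": 95.0, "count": 3},  # sparse flow, treated as noise
    {"angle": 180.0, "count": 45},
]
masters = filter_master_vectors(exit_vectors, min_trajectories=10)
```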
In this step, the entry and exit master vectors of all movement points obtained in the previous step are classified and aggregated according to the user-defined vector field directions. At most one entry master vector and one exit master vector are generated for each direction. At the same time, the average speed, average moving distance, and average difference angle of the vectors in each direction are computed.
Next, an n*m grid is tiled over the user-specified area, and the master vector of each movement point in each direction is diffused into the n*m grid according to certain conditions and rules. During diffusion, the vectors diffused into the grid keep the same direction and speed as the master vector, but carry a lower number of trajectories. A grid cell is affected by the radiation of a master vector only if the following conditions are met:
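The sketch below illustrates one possible reading of this aggregation for the vectors of a single movement point and a single kind (entry or exit). It assumes the field directions are evenly spaced around the circle, that each vector is assigned to its nearest direction, and that the averages are weighted by trajectory count. None of these choices is confirmed by the article, and all names are hypothetical.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def aggregate_by_direction(vectors, n_directions=4):
    """Merge vectors into at most one master vector per field direction."""
    sector = 360.0 / n_directions
    bins = {}
    for v in vectors:
        idx = int(round(v["angle"] / sector)) % n_directions  # nearest direction
        bins.setdefault(idx, []).append(v)

    result = {}
    for idx, group in bins.items():
        center = idx * sector
        total = sum(v["count"] for v in group)
        result[idx] = {
            "angle": center,
            "count": total,
            "speed": sum(v["speed"] * v["count"] for v in group) / total,
            "avg_distance": sum(v["distance"] * v["count"] for v in group) / total,
            "avg_diff_angle": sum(
                angular_diff(v["angle"], center) * v["count"] for v in group
            ) / total,
        }
    return result


# Two exit vectors on either side of the 0-degree direction.
masters = aggregate_by_direction([
    {"angle": 10.0, "count": 1, "speed": 2.0, "distance": 100.0},
    {"angle": 350.0, "count": 3, "speed": 4.0, "distance": 200.0},
])
```

Both inputs fall into direction 0, so `masters` holds a single aggregated vector along with the averages that the later diffusion step relies on.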
· The distance between the center of the diffused grid and the movement point is not greater than the average moving distance of the movement point.
· If the diffusion vector is an entry vector, the angle between the vector formed by the center of the diffusion grid and the movement point and the diffusion master vector should be between [180 - average difference angle, 180 + average difference angle]. If the vector is an exit vector, such an angle should be between [-average difference angle, +average difference angle].
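The two conditions above can be sketched as a simple membership test over grid cell centers. This sketch assumes planar coordinates, angles in degrees, a vector drawn from the movement point to the cell center, and that the cell containing the movement point itself is always affected. The function names and the dictionary layout of the master vector are hypothetical.

```python
import math


def cell_centers(bounds, n, m):
    """Yield (row, col, (x, y)) for an n-row by m-column grid over bounds."""
    min_x, min_y, max_x, max_y = bounds
    w = (max_x - min_x) / m
    h = (max_y - min_y) / n
    for row in range(n):
        for col in range(m):
            yield row, col, (min_x + (col + 0.5) * w, min_y + (row + 0.5) * h)


def is_affected(cell_center, point, master, kind):
    """Check the distance and angle conditions for one grid cell."""
    dx = cell_center[0] - point[0]
    dy = cell_center[1] - point[1]
    dist = math.hypot(dx, dy)
    if dist > master["avg_distance"]:
        return False  # condition 1: too far from the movement point
    if dist == 0.0:
        return True  # assumption: the point's own cell always counts
    bearing = math.degrees(math.atan2(dy, dx))
    # Absolute angle between point->cell and the master vector, in [0, 180].
    offset = abs((bearing - master["angle"] + 180.0) % 360.0 - 180.0)
    if kind == "exit":
        return offset <= master["avg_diff_angle"]
    return offset >= 180.0 - master["avg_diff_angle"]  # entry: cells behind


def diffuse(point, master, kind, bounds, n, m):
    """Return the (row, col) cells reached by one master vector."""
    return [
        (row, col)
        for row, col, center in cell_centers(bounds, n, m)
        if is_affected(center, point, master, kind)
    ]


master = {"angle": 0.0, "avg_distance": 2.0, "avg_diff_angle": 10.0}
reached = diffuse((0.5, 0.5), master, "exit", (0, 0, 4, 4), 4, 4)
```

An exit vector therefore spreads into the cells ahead of the movement point, while an entry vector spreads into the cells behind it, which is where the inflow came from. How the trajectory count decreases during diffusion is not specified in the text, so it is left out here.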
In the previous step, the same grid may be affected by the radiation of multiple vectors, resulting in multiple diffusion vectors. Therefore, in this step, the aggregate vectors in each direction in each grid (including the vector field direction, the speed of movement, and the number of trajectories) must be calculated to obtain the final vector field data.
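A hedged sketch of this per-cell aggregation follows. It assumes that trajectory counts are summed and that speed is averaged with the counts as weights; the article does not state the exact formula, and the tuple layout is hypothetical.

```python
from collections import defaultdict


def aggregate_grid(diffused):
    """diffused: iterable of (row, col, direction_index, speed, count).

    Returns {(row, col, direction_index): {"count": ..., "speed": ...}}.
    """
    # (row, col, direction) -> [total count, sum of speed * count]
    acc = defaultdict(lambda: [0.0, 0.0])
    for row, col, direction, speed, count in diffused:
        slot = acc[(row, col, direction)]
        slot[0] += count
        slot[1] += speed * count
    return {
        key: {"count": total, "speed": weighted / total}
        for key, (total, weighted) in acc.items()
        if total > 0
    }


# Cell (0, 0) receives two diffusion vectors in direction 1.
field = aggregate_grid([
    (0, 0, 1, 2.0, 1.0),
    (0, 0, 1, 4.0, 3.0),
    (0, 1, 1, 5.0, 2.0),
])
```

Each key of `field` then identifies one cell and one direction, which matches the shape of the final vector field data described above.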
This method requires three user-defined parameters: the threshold for trajectories, the directions of the vector field, and the number of grids. The threshold for trajectories is mainly used to filter movement point vectors so as to preserve the main trajectories and prevent ‘noise’ from interfering with the accuracy of the result. Vector fields are calculated separately for different directions so that movements in opposite directions do not cancel each other out, thereby retaining more detail and producing a more accurate final result. When defining the number of grids, it is necessary to balance the computational pressure caused by too many grids against the coarse result caused by too few.
To make the method more adaptive, the Alibaba team did not use fixed angles and distances in the diffusion of vectors. Instead, they used the average moving distance and average difference angle in all directions, allowing the diffusion to adapt to different vector distributions and the result to be more reasonable.
The figure below shows the visualization of a city’s mobile phone signal data between 8:00AM and 8:10AM on August 14, 2017.
In this case, the Alibaba team used particle flow to represent the vector field data. The number of particles in the grid represents the number of trajectories (i.e. the number of moving objects); the moving direction of the particles represents the moving direction of the moving objects; and the color and velocity of the particles both represent the velocity of the moving objects (the higher the speed, the closer the color is to blue, and the lower the speed, the closer the color is to red). Meanwhile, controls have been provided for adjusting parameters for interactive query, such as the vector field direction, trajectory thresholds, and number of grids. In this way, pressure on graphical quality is significantly reduced and visual overlaps are removed, making people’s movements through the city readily observable.
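The color encoding described above could be approximated with a linear red-to-blue ramp. This is only an illustrative sketch: the function name, the `max_speed` normalization, and the linear interpolation are assumptions, since the article does not specify the actual color scale.

```python
def speed_to_rgb(speed, max_speed):
    """Map a speed to an (r, g, b) color: red when slow, blue when fast.

    Assumes max_speed > 0; speeds outside [0, max_speed] are clamped.
    """
    t = min(max(speed / max_speed, 0.0), 1.0)
    return (round(255 * (1 - t)), 0, round(255 * t))


slow = speed_to_rgb(0.0, 10.0)
fast = speed_to_rgb(10.0, 10.0)
```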
(Original article by Guan Huihua关会华)