The Yomiuri Shimbun
Artificial intelligence is enjoying something of a boom. The “intellect” of AI is supported by huge volumes of diverse “big data.” But what is big data, and what should it be used for? Makoto Shirota, who researches information technology and has detailed knowledge of big data, discussed these questions in an interview with The Yomiuri Shimbun.
The Yomiuri Shimbun: The phrase big data is frequently used when speaking about AI. Nevertheless, even people from IT companies admit they don’t really know what it is.
Shirota: In reality, the term big data does not have a clear definition. In Japan, the term came into use around 2012. At the time, it was used loosely to mean large volumes of data that were difficult to manage using standard technology.
Q: Has that changed?
A: It has come to refer to various huge sets of data, such as textual information on the internet, information from surveillance cameras and sensors, and customers’ purchase histories. Previously, these types of data were discarded. Then the idea arose of storing this data instead of discarding it. As a result, the current trend is to analyze this data in an attempt to discover some sort of treasure.
Q: Does this mean that people realized the significance of data?
A: It is becoming clearer that analyzing huge sets of data makes it possible to see big-picture trends. Ideas that used to be somewhat intuitive can now be clearly seen through numbers.
Q: Did it begin in the United States?
A: At an early stage, U.S. companies such as Google and Amazon discovered the value that data produces.
Q: Has Japan lagged?
A: Yes. Many companies believed it wouldn’t be cost effective to spend large sums of money to store data that may or may not be useful. However, the trend changed with advances in data analysis technology and the ability to store large volumes of data at a low cost.
Q: How is big data being used?
A: First, it is being used for market analysis. When a customer presents a rewards card when making a purchase at a convenience store, the card is linked to data about their purchase history. For example, the card run mainly by Lawson has more than 80 million members and over 100 partner companies. By analyzing this big data, it is possible to efficiently prepare product selections based on gender, age, region and other characteristics. Sharing the data among partner companies will increase its value further.
Amazon uses big data to drive sales by observing what other products people purchase in addition to a specific product. Amazon is a typical company whose business is based on the power of big data.
Q: Does this apply to fields outside of shopping?
A: In Japan, big data is used to analyze where the next flu outbreak might occur based on Yahoo search terms, in order to come up with countermeasures.
It is also frequently used to predict elections. The outcomes of elections are projected based on the number of searches for candidates and political parties, as well as the terms frequently searched in conjunction with them.
Q: Are these projections accurate?
A: They are quite accurate.
Q: Do candidates perform this type of analysis themselves?
A: They couldn’t, even if they wanted to. Only places possessing search data, like Yahoo and Google, are able to do so.
Q: In other words, data is like a “gold mine,” and there is a large gap between those who have it and those who don’t.
A: That’s correct. Essentially, you cannot do anything if you do not have data. Those who first control it put themselves in a stronger position. However, there is the question of whether this is OK.
Q: Can you elaborate?
A: Recently, people have started to discuss whether something like the Antimonopoly Law is necessary for big data in Japan. The idea has been raised that it would not be good for certain companies to monopolize data that, to a certain extent, benefits the public.
Q: Is there anything big data cannot do?
A: Its results depend heavily on the quality of the data. If the quality of the collected data is not very good, no matter how hard you try to analyze it, you will not produce useful results.
Q: How is quality determined?
A: It depends on whether the data contains the necessary categories. Let’s say that you would like to analyze the best-selling products in a certain region of Shinjuku, Tokyo. If you expand the sample of the data to all of Shinjuku due to concerns about privacy and the protection of personal data, the data will be useless. What is important is how you can gather as much of the necessary information as possible.
Q: What is needed to utilize big data?
A: There are some managers who believe their companies can become profitable just by analyzing big data. However, what is important is how you connect this analysis to your next actions.
Many companies are troubled by the question of how much analysis is acceptable. In one instance, a supermarket in the United States created an uproar by sending maternity products to a female high school student after assuming she was pregnant based on her purchase history. Consumers find it unpleasant to be targeted too much, because it feels like their privacy is being invaded. This is a delicate issue, and how to handle it is a significant challenge.
Q: Can big data be used to predict the future?
A: It is frequently said that it may be used for stock market prediction. Securities firms in the United States do so, but it can only be used for short-term forecasting of about one month, at best. Only past data can be used as a basis for predictions. If major events or accidents occur unexpectedly, it becomes impossible to predict the future.
Q: Big data, AI and the internet of things (technology that connects things to other things) serve as the pillars of the fourth industrial revolution that the government is currently advancing. Will a new pillar emerge in the future?
A: I don’t expect that to happen for a while. Rather, policies that are clearly lacking but would advance the revolution, such as nurturing human resources for analyzing data, will become important.
(This interview was conducted by Yomiuri Shimbun Senior Writer Keiko Chino.)
■ Makoto Shirota / Senior researcher at Nomura Research Institute
Shirota graduated from the School of Engineering at Hokkaido University. He investigates the latest technologies and researches their impact on companies and society. He is the author of “Big Data no Shogeki” (The impact of big data), “Personal Data no Shogeki” (The impact of personal data) and other works. Shirota is 46 years old.