How the neural network was trained
Posted: Thu Jan 23, 2025 8:18 am
Problem with biometrics
During the discussion, Sergey Shcherbakov noted that while the system was being developed, the team ran into a problem with identifying buyers and analyzing their faces, and this line of work had to be abandoned for legal reasons.
"We had a lot of ideas for building correlations around a person, around his face. For example, if a person stole something and was not detected, or was detected only after the fact, he would be met right at the entrance and told: 'You owe us, how will you pay?' But here we ran into a gray area: biometric data. As long as we work with a top view, 'the top of the head and shoulders', this is neither personal nor biometric data, and we cannot identify a person. But as soon as we started working with faces, people from the legal department began to frown and say: let's not do this. And all our ideas, the physiognomic ones, the ones about how someone behaves at the moment of theft, went to hell," the developer of the video analytics system admitted.
At the same time, he noted that a method for predicting buyers' behavior from the goods in their baskets may emerge in the future. "It is possible to distinguish very well between people who forgot to scan an item and those who deliberately steal. And there is a hypothesis that these people can be identified by their set of purchases: roughly speaking, if we see beer, chips and something else in the basket, this is an area of increased interest, and if we see diapers, then fine, they go through in the general queue. But this is a hypothesis for future research," Shcherbakov added.
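The basket hypothesis can be pictured with a toy scoring rule. This is only an illustrative sketch: the category weights, the threshold, and the function names are all hypothetical and do not come from the speakers, who described the idea as a direction for future research rather than a working method.

```python
# Toy illustration of the basket hypothesis described above.
# All weights and the threshold are hypothetical; the source names only
# "beer, chips" (increased interest) and "diapers" (ordinary) as examples.
HYPOTHETICAL_WEIGHTS = {
    "beer": 1.0,
    "chips": 1.0,
    "diapers": -1.0,
}


def basket_interest_score(basket):
    """Sum the hypothetical weights of the items; unknown items count as 0."""
    return sum(HYPOTHETICAL_WEIGHTS.get(item, 0.0) for item in basket)


def needs_closer_look(basket, threshold=1.5):
    """Flag a basket for extra attention when its score reaches the threshold."""
    return basket_interest_score(basket) >= threshold


if __name__ == "__main__":
    for basket in (["beer", "chips", "bread"], ["diapers", "milk"]):
        print(basket, basket_interest_score(basket), needs_closer_look(basket))
```

A real system would presumably learn such weights from labeled data instead of hard-coding them, and a flag of this kind would only prioritize review, not establish that a theft occurred.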
Answering a question from Alexander Kraynov, Director of Artificial Intelligence Technologies Development at Yandex, about the data on which the system's neural network was trained, Sergey Shcherbakov said that the developers processed about 70 TB of video recordings capturing the behavior of about 500 thousand buyers over three months. Some of the tasks, however, were solved using open data.
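To put the quoted figures in perspective, a quick back-of-the-envelope calculation follows. It assumes decimal terabytes (1 TB = 10^12 bytes) and treats "three months" as roughly 90 days; neither detail is stated in the source, so the results are order-of-magnitude estimates only.

```python
# Back-of-the-envelope scale of the training data quoted above.
# Assumptions (not stated in the source): decimal units, 90 days of recording.
TOTAL_BYTES = 70 * 10**12   # about 70 TB of video
BUYERS = 500_000            # about 500 thousand buyers
DAYS = 90                   # "three months", approximated


def per_buyer_megabytes(total_bytes, buyers):
    """Average amount of video per recorded buyer, in megabytes."""
    return total_bytes / buyers / 10**6


def per_day_gigabytes(total_bytes, days):
    """Average amount of video collected per day, in gigabytes."""
    return total_bytes / days / 10**9


mb_per_buyer = per_buyer_megabytes(TOTAL_BYTES, BUYERS)
gb_per_day = per_day_gigabytes(TOTAL_BYTES, DAYS)

if __name__ == "__main__":
    print(f"about {mb_per_buyer:.0f} MB of video per buyer")
    print(f"about {gb_per_day:.0f} GB of video per day")
```

Under these assumptions the data works out to about 140 MB of video per buyer and a little under 800 GB per day.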
"We trained on customer videos, but to speed up the process we used open data sets on products: one of the tasks is to identify a specific sausage or a specific yogurt, and there are open data sets on products. In addition, we used image classifiers that are essentially trained on open data sets. This is a combined scenario," noted Sergey Shcherbakov.
At the same time, he emphasized that it would be difficult to release the full training sample publicly, because it contains private data. "This is an open and debatable issue, because how private the tops of heads and shoulders really are is unclear. But in any case, this is the customer's data, and without their approval we will not upload it," said Sergey Shcherbakov.
Alexander Kraynov noted that very few publicly available data sets capture user behavior of any kind. "I find this interesting, because when it comes to complex stories, when some atypical user behavior is recorded, there is very little such data," he said.
He called on developers to share data sets and make them publicly available to help other developers. "I would like to urge you to make some information publicly available, even if with restrictions, even if you clean out personal information or some competitive advantage, because it will help others. And it is also very important, in addition to the data itself, to share cases on how to extract value from data, especially from open data. This moves our industry forward," concluded Alexander Kraynov.