DeepNFTValue is an interesting project: it applies ML to complicated questions like NFT valuation. Over the past year, I have been thinking about applying ML to active Uniswap LP management, which is what Kosen lab is working on.
ML is a highly specialized field, so I discussed DeepNFTValue with Dr. Gills.
Dr. Gills is a programmer and researcher who has been wandering the world of Web2 for quite a while. Gills is currently a Ph.D. student working on applied Machine Learning (ML).
Although Gills entered the world of Crypto and Web3 somewhat by chance, through friends, Gills is naturally excited about decentralized AI and ML, as a researcher who feels the shackles of a highly centralized field.
So, Gills would like to introduce some innovative AI, ML, and data analytics projects in, or moving toward, Web3, from a rather technical and conventional perspective.
As an opener, this issue will introduce DeepNFTValue, an application of machine learning in NFT pricing.
DeepNFTValue currently focuses on CryptoPunks-related NFT price prediction and valuation. Last week, the NFT project
The project aims at effectively valuing NFT, providing investment suggestions, and predicting price movements. According to the company founder’s
To address these issues, DeepNFTValue uses ML models to value mainstream NFTs in the market, drawing on integrated and processed data from multiple channels, to help users invest.
The application highlights the following features:
On the technical side, according to the project developers’ disclosure, the ML model used for valuation and prediction is not a single model but an ensemble of multiple small models, or expert systems.
The individual expert systems, specifically, are currently mentioned only on the
In the blog post, the developer also answered some related questions and speculations, such as:
I know I have introduced some technical terms that may be unfamiliar to some readers, so here is a brief introduction to them.
Let’s first talk about the Deep Neural Network (DNN), the most popular machine learning architecture nowadays. We can intuitively regard it as a black box with pre-defined input and output shapes. Through a series of linear and nonlinear mathematical operations, a DNN learns to turn inputs into the outputs the developers expect.
As the undisputed top model family in the machine learning community today, it has plenty of learning resources, so I won’t go over it here. The diagram below shows a typical DNN architecture, for interested readers.
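To make the “black box with fixed input and output shapes” idea concrete, here is a minimal sketch in PyTorch. The layer sizes and the NFT-style features are invented for illustration; they are not anything DeepNFTValue has disclosed.

```python
import torch
import torch.nn as nn

# A tiny DNN: alternating linear layers and nonlinear activations.
# Input and output shapes are fixed up front, like a black box.
class TinyValuer(nn.Module):
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),  # linear operation
            nn.ReLU(),                  # nonlinear operation
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),           # single output: a predicted price
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Hypothetical input: 8 numeric features describing one NFT
# (trait rarity, last sale price, collection floor, and so on).
model = TinyValuer(n_features=8)
features = torch.randn(1, 8)       # batch of 1, 8 features
predicted_price = model(features)  # shape (1, 1)
print(predicted_price.item())
```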
Ensemble, on the other hand, is relatively new to readers who have not been exposed to ML. As the name suggests, it refers to a family of algorithms that combine multiple similar machine learning sub-models into one model at the output side and produce a single prediction from their combined results.
In the field of data science, this method is often employed to improve performance, especially when the amount of data is very large and requires some pre-processing based on its specific properties.
It is important to note that researchers sometimes confuse ensemble models with fusion models. If the data sources are very different from each other and the architecture and output of each sub-system are well defined, the result is generally a fusion model.
Ensemble models are more of a concept from the field of data analysis and processing, with dedicated algorithms such as boosting (e.g., AdaBoost) and bagging.
I think the biggest difference between ensemble and fusion is that the sub-models of a fusion model are generally responsible for different tasks, and the physical meanings of their outputs are very different.
The sub-models of an ensemble, by contrast, share a similar architecture (as shown in the figure), and their training data are different subsets of the same dataset.
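As a concrete illustration of the ensemble idea, the sketch below trains several identical small models on different bootstrap subsets of the same dataset and averages their outputs, which is essentially bagging. The dataset and models here are toy placeholders, not the project’s actual sub-systems.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy dataset: 500 samples, 8 features, noisy linear target.
X = rng.normal(size=(500, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=500)

# Bagging: every sub-model has the SAME architecture but is trained
# on a different bootstrap subset of the same dataset.
sub_models = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
    sub_models.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

# The ensemble combines the sub-models at the output side:
# here, a simple average of their predictions.
X_new = rng.normal(size=(5, 8))
ensemble_pred = np.mean([m.predict(X_new) for m in sub_models], axis=0)
print(ensemble_pred)
```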
Prediction based on big data was already a very mature technical branch in the Web2 era. From simple regression models to monstrous DNNs, the models in question are not only powerful but also ubiquitous.
But even in Web2, similar prediction models have not progressed well for stock forecasting, which is the closest analogue to what this project works on.
This is mainly due to the stock market’s inherent instability and extremely high randomness. Once the application scenario is transposed to the Crypto and NFT sides, these two characteristics will only be further amplified.
Greater volatility and more frequent unexpected events show up as outliers in the data and have a detrimental effect on the predictions.
I am very interested in the technology used by DeepNFTValue. There are actually similar ideas in the field of stock forecasting. For example,
I speculate that the sub-models used in DeepNFTValue are somewhat more complex than a plain ensemble: different models are responsible for different tasks (data integration, scoring, judgment, etc.). The architecture at the infrastructure level is thus probably closer to fusion.
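If that speculation holds, the overall pipeline might look something like the sketch below: heterogeneous sub-models responsible for different tasks, whose outputs mean different things, combined by a final fusion stage. Every function and feature name here is my own hypothetical placeholder, not the project’s disclosed design.

```python
import numpy as np

# A speculative fusion-style pipeline: unlike an ensemble of identical
# sub-models, each stage is a different kind of model with a different job.

def integrate_data(raw_sales: list) -> np.ndarray:
    """Data-integration sub-model: turn raw multi-channel records
    into one fixed-length feature vector per NFT."""
    return np.array([[s["last_price"], s["rarity"], s["floor"]] for s in raw_sales])

def rarity_scorer(features: np.ndarray) -> np.ndarray:
    """Scoring sub-model: outputs a dimensionless rarity score."""
    return features[:, 1] * 10.0

def price_regressor(features: np.ndarray) -> np.ndarray:
    """Regression sub-model: outputs a raw price estimate (in ETH)."""
    return 0.8 * features[:, 0] + 0.2 * features[:, 2]

def fuse(score: np.ndarray, price: np.ndarray) -> np.ndarray:
    """Fusion stage: combines outputs whose physical meanings differ
    (a score vs. a price) into the final valuation."""
    return price * (1.0 + 0.01 * score)

raw = [{"last_price": 60.0, "rarity": 3.2, "floor": 58.0}]
feats = integrate_data(raw)
print(fuse(rarity_scorer(feats), price_regressor(feats)))  # hypothetical valuation
```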
As the recent series of black swan events shows, Web3’s ML models must themselves account for such contingencies, and those contingencies need to be given considerable weight during training.
Of course, even if the above predictions hold true, many questions about this project remain. For example, how the data will be collected, the training and update cycles of the model, the degree of separation between the data and the model itself, etc. (more on this topic in
There are two other issues that concern me from a traditional machine learning perspective. They are not limited to this project but apply to all ML projects in Web3, and they will be recurring concerns in future analysis issues.
Security. Although the black-box nature of DNNs makes them hard to interpret (and interpretability is a pressing problem in the ML community nowadays), it is not difficult in theory to get a DNN model to produce the output you want.
With enough data, fake labels of one’s own creation, and access to the model, an attacker can manipulate the results. Adversarial manipulation of models is already a relatively mature area of research. Of course, an ensemble design can mitigate this problem to some extent (multiple subsystems with similar tasks dilute the effect), but it still sounds a bit dangerous.
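To show why the “fake label” concern is not exotic, here is a toy poisoning sketch: an attacker who can inject fabricated training points with labels of their own choosing can pull a simple regression model’s valuation of a target item upward. This is a deliberately simplified illustration on made-up data, not an attack on DeepNFTValue’s actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Clean training data: one feature (say, a rarity score) -> price.
X_clean = rng.uniform(0, 10, size=(200, 1))
y_clean = 5.0 * X_clean[:, 0] + rng.normal(scale=1.0, size=200)

target = np.array([[7.0]])  # the item whose valuation the attacker wants inflated

honest = LinearRegression().fit(X_clean, y_clean)
print("honest valuation:", honest.predict(target)[0])

# Poisoning: inject points near the target's feature value
# with wildly inflated, fabricated labels.
X_poison = np.full((40, 1), 7.0)
y_poison = np.full(40, 500.0)

poisoned = LinearRegression().fit(
    np.vstack([X_clean, X_poison]),
    np.concatenate([y_clean, y_poison]),
)
print("poisoned valuation:", poisoned.predict(target)[0])
```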
Transparency. Open-sourcing and reproducibility have always been two headaches for researchers. For a project involving DNNs, even if the code is completely open-sourced, that does not mean the system is transparent enough, and researchers are not always able to reproduce the results.
This is likely not because the developers are deliberately hiding something, but because DNN training and debugging are fragile: the results are affected by too many factors (hardware, software versions, data splits, and so on).
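As one small example of why this is hard, below is the kind of seed pinning developers typically do to make training repeatable. Even with every seed fixed as shown, results can still drift across GPUs, CUDA/cuDNN versions, and data splits; it is a mitigation, not a guarantee.

```python
import os
import random

import numpy as np
import torch

def set_all_seeds(seed: int = 42) -> None:
    """Pin the common sources of randomness in a typical PyTorch project."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism where the backend supports it.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_all_seeds(42)
```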
In Crypto and Web3, I think people will only care more about these issues, so I’m looking forward to more information and white papers from this project. As a prototype-like instance, I’m curious to see how they, as senior developers and researchers, deal with these problems and challenges.
I believe DeepNFTValue is just the beginning of machine learning applications for blockchain and Web3. It doesn’t have fancy ideas from a technological perspective (for now) and has many questions pending. But its orientation, the user experience, and related operations are very much worth acknowledging and learning from.
If you readers find a project like this one that meets the following criteria,