Friday, November 2, 2018

A marketplace for machine learning models? ONNX, PMML or PFA?

A simple search for "marketplace for machine learning models" reveals that a company called Algorithmia has built such a thing and has over 5000 models in its marketplace, most of which are machine learning related. However, anyone who wants to use or buy a model has to trust a third party, namely Algorithmia, to run that model. Also, the cost of using a model is based on the number of requests. Hence the buyer does not own the model.

There is another option, which to my knowledge does not exist yet: selling the machine learning model directly to the buyer. This option is proposed here: https://arxiv.org/pdf/1805.11450.pdf. Although the paper is very technical, the main idea is to sell the model itself, which decouples the process of buying a model from the process of running it.

Since models are trained on data and later scored on new data, the seller and the buyer of a model must have access to similar datasets. This could be achieved by using open data. It is also the reason why most of the models on Algorithmia are based on common data such as text and images. The data scientist trains a model on open data to solve a specific problem and then uploads the model to a marketplace. The buyer could then upload open data to the marketplace and let the model score it to see how well the model performs, so the buyer does not have to trust the seller's claims about model accuracy. Based on the performance of different models, the buyer can choose the model with the best accuracy and then decide where to run it, either in the cloud or on premises.

This raises the question: what actually is a machine learning model? You could say that a machine learning model is a specification of how to produce an output given some input. The next question is: how can a model be transported from seller to buyer?
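
As a rough illustration of the two points above (a model as a specification, and a buyer comparing models on open data), here is a minimal Python sketch. The JSON layout, the function names (`export_model`, `load_model`, `rank_models`) and the tiny dataset are all made up for this example; a real marketplace would presumably use an actual format such as ONNX, PMML or PFA together with a matching runtime.

```python
import json

# Toy "model format": a JSON document describing a linear classifier.
# This is a hypothetical stand-in for illustration, NOT ONNX, PMML or PFA.

def export_model(name, weights, bias):
    """Seller side: serialize a trained model into a portable JSON string."""
    return json.dumps({
        "name": name,
        "type": "linear_classifier",
        "weights": weights,
        "bias": bias,
    })

def load_model(spec_json):
    """Marketplace/buyer side: turn the specification back into a predict function."""
    spec = json.loads(spec_json)
    if spec["type"] != "linear_classifier":
        raise ValueError(f"unsupported model type: {spec['type']}")
    weights, bias = spec["weights"], spec["bias"]

    def predict(features):
        # The "specification of how to produce an output given some input".
        score = sum(w * x for w, x in zip(weights, features)) + bias
        return 1 if score > 0 else 0

    return spec["name"], predict

def accuracy(predict, dataset):
    """Fraction of (features, label) pairs the model classifies correctly."""
    correct = sum(1 for features, label in dataset if predict(features) == label)
    return correct / len(dataset)

def rank_models(spec_jsons, dataset):
    """Score every uploaded model on the same data, best accuracy first."""
    results = []
    for spec_json in spec_jsons:
        name, predict = load_model(spec_json)
        results.append((name, accuracy(predict, dataset)))
    return sorted(results, key=lambda result: result[1], reverse=True)

# Made-up open dataset: the label is 1 when the first feature exceeds the second.
OPEN_DATA = [
    ([2.0, 1.0], 1),
    ([0.5, 1.5], 0),
    ([3.0, 0.0], 1),
    ([2.0, 4.0], 0),
]

# Two hypothetical sellers upload their models as JSON strings.
uploads = [
    export_model("seller_a", [1.0, -1.0], 0.0),
    export_model("seller_b", [1.0, 0.0], -1.0),
]

ranking = rank_models(uploads, OPEN_DATA)
for name, acc in ranking:
    print(f"{name}: accuracy {acc:.2f}")
```

The point of the sketch is that the buyer only ever handles the serialized specification: it can be scored on the marketplace and later loaded wherever the buyer wants to run it, without involving the seller.
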
Transporting a model could be achieved using a model format such as ONNX, PMML or PFA (Portable Format for Analytics). The next question is which of these model formats will be used. This is the wrong question to ask. Why? Just as there are many document formats (doc, docx, pdf, odf etc.), there can be many model formats. The analogy to document formats makes one thing clear: there are producers of documents and consumers of documents. Producers can nowadays choose from a set of marketplaces where to sell a document [just google: marketplace sell pdf]. Similarly, the buyer can choose where to buy. This is only possible because creating a document is decoupled from viewing it. It should therefore be possible to do the same with machine learning models, using model formats.

One company comes to mind where a lot of data scientists already compete to produce the best model: Kaggle. Kaggle or similar websites might thus be candidates for offering such a marketplace for "open-data-ml-models". The only thing they would have to do is offer a service that lets data scientists export their "kernels" to a model format such as ONNX, PMML or PFA, and then let companies find the model that best solves their problem.

Edit: I wrote a request to Kaggle but did not get a response, so I decided to start an experiment to see how the idea evolves: mlmarketplace, buy and sell onnx, pmml, pfa

What is your opinion on this topic? Comments are welcome! Please share this article if you like the idea.