In the past several months, I have seen several talks, papers and discussions about the machine learning systems. The term “Machine Learning Systems” is too big to be discussed in a short blog post, and therefore in this post, I will try to focus on a single topic: Why it is important to have a system that supports publish, discover, deploy and monitor machine learning.
Does SOTA Reflect Real World?. Currently, most hypotheses about how to improve deep learning models are based on empirical experiments. These experiments are conducted on top of several well-established dataset, imagenet for example. However, it is still a question that even though the SOTA models are working well on these dataset, does it really works on the real scenario?. Some models have been tested on rather large dataset, and we could probably assume it works on general image data. For example, a model trained on imagenet will probably works well on some new data that have similar distributions with imagenet. However, the scenario in real life might be different: for example the different light intensity in images, some blur or missing parts of the image, some broken sentences, etc. Most deep learning models have not been tested under real world scenario after they were published.
If there is such a system, that users, who have a real world scenario, can easily leverage the SOTA models, then things might be a bit different. Companies, individuals, governments could give some feedback to deep learning researchers and tell them when the models works and when they will not work. The benefit will be three-fold:
- For companies, individuals, governments and other users, such system will help them integrate SOTA models to solve their own issues.
- For researchers, they could test their models in real life, and get some insight and feedback from the users. They will know the advantages and drawbacks of these models better.
- When several models are used and tested in a same scenario, like a challenge, we can easily know which works better and which is worse. Such information could significantly help us design better models.
The Gap between Production and ML Research. In the research environment, we usually have rather clean settings: we have a dataset, and we write some scripts that read the input and calculate the accuracies or some other metrics. However, in production environment, things become more complex and somehow tedious: we need to accept the user input from some client software, and we are supposed to give the predictions in a given time limits. Besides these, different researchers use different frameworks and software libraries, but when deploying them we do not want to meet any conflicts between them. Hence we will have to isolate the environment for each machine learning models. Beyond these things, there are many stuffs that need to be resolved when we need to deploy the machine learning models.
Large companies could afford to hire some “deep learning engineers” to focus on deployments, however, small companies and individuals cannot. What make things worse is that these deep learning engineers have to identify the frameworks, libraries, operating systems and so on from the machine learning models, which takes them extra time and efforts. On the other hand, the deep learning researchers know their models better than anyone else. Hence inspired by a popular concept “DevOps”, we are inspired to design a system that allows deep learning researchers to directly deploy their models into production with ease. Even though the system might be less efficient, it provides an interface that bridges building the model and deploying the model.
In the last part, I want to describe how we think about AID and why it could be different from other systems.
There are already some systems that support deployments of machine learning models. However, they can hardly say that they are a full lifecycle manager for machine learning. For example, how can they be used to interpret the models? How can they be used to compress and deploy the models into mobiles? How users could provide feedbacks to the models? These questions still need to be solved by developers, but are there anything in common that can be solved in a clean, decent, and unified way?