Deep Learning (DL) has achieved significant progress over the last decade. In industry, we have seen breakthroughs such as DeepMind AlphaGo, Amazon Alexa, and OpenAI GPT-3. In academia, artificial neural network architectures have advanced continually, from the initial AlexNet to Inception, ResNet, and the recent Vision Transformer (ViT).
While DL has shown its strength in understanding unstructured data (i.e., images, videos, audio, and text data), the data tooling has not yet caught up (or been commoditised). Building in-house AI solutions requires not only a tremendous investment in both time and hiring, but also a fundamental transformation of the team culture. As a result, only big tech companies have the luxury of forming a multidisciplinary team to build exclusive functionalities and components that process their unstructured visual data to distill business insights or deliver vision applications.
To date, there is no simple tool that lets anyone easily tap into the value of unstructured visual data.
According to projections from IDC, 80% of worldwide data will be unstructured by 2025, yet the current data industry has paid little attention to unstructured data. The journey of unstructured data mostly ends at some storage component, without any further processing to extract value. This status quo is counterintuitive.
At Instill AI, we are dedicated to making Vision AI accessible to everyone. To fix the broken value chain in the modern data stack for unstructured visual data, we propose integrating a new component, named visual data preparation (see Why Instill AI exists). The rest of this post will focus on the components of the modern data stack, the stakeholders, and the role and goal of visual data preparation.
Modern data stack
First and foremost, it helps to understand the status quo of the modern data stack, i.e., what is on the table if we would like to build a data pipeline to enhance the business. The modern data stack cares mostly about structured and semi-structured data. The data journey begins at the Data Source, then travels through ETL/ELT, the Data Warehouse, the Feature Store and MLOps to ultimately deliver value to the business.
Data Source can be any source where the raw data originally comes from. For example, sales/marketing data from Salesforce, Mailchimp, etc., or structured/semi-structured data files such as CSV, JSON or XML from a Data Lake.
ETL/ELT is a tool for moving raw data from the source to the destination with the necessary data cleansing. The moving process involves Extract (E), Load (L) and Transform (T). The difference between ETL and ELT is mainly where the transformation happens. ETL transforms data on a separate processing server, while ELT transforms data within the data warehouse itself (the destination).
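To make the ETL/ELT distinction concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for the Data Warehouse, and the records, table names and function names are hypothetical, chosen only for illustration. The ETL path cleans the rows in application code before loading; the ELT path loads the raw rows first and cleans them with SQL inside the database.

```python
import sqlite3

# Hypothetical raw records, standing in for rows extracted from a Data Source.
RAW_ROWS = [("  Alice ", "42.50"), ("BOB", "17.00"), (" carol", "8.25")]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load the cleaned rows."""
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in RAW_ROWS]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT lower(trim(customer)) AS customer, "
        "CAST(amount AS REAL) AS amount FROM sales_raw"
    )


def fetch_sorted(conn: sqlite3.Connection, table: str) -> list:
    """Read a cleaned table back, ordered by customer name."""
    query = f"SELECT customer, amount FROM {table} ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = fetch_sorted(conn, "sales_etl")
elt_rows = fetch_sorted(conn, "sales_elt")
```

Both paths end with the same cleaned table; what differs is where the cleaning runs, and in the ELT case the raw table stays available in the warehouse for later re-transformation.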
Data Warehouse can be a standalone SQL/NoSQL/key-value/time-series database such as MySQL, MongoDB, Redis, etc., or a cloud-based database solution such as Google BigQuery, Amazon Redshift, etc. It is the hub for data transportation during the data journey.
Feature Store is a data management layer for storing commonly used or representative features shared by Data Scientists and Data Engineers. When Data Scientists develop features for a machine learning model, the features can be continually added to the Feature Store to be managed and retrieved later, making the collaboration of feature engineering between the two roles more efficient.
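As a rough illustration of the idea, the sketch below models a Feature Store as a small in-memory class. The class name, its methods and the example feature names are all hypothetical and do not correspond to any particular product; a real Feature Store would add persistence, versioning and access control.

```python
from typing import Any, Dict, List


class InMemoryFeatureStore:
    """Toy feature store: named features keyed by entity ID (hypothetical API)."""

    def __init__(self) -> None:
        # feature name -> {entity ID -> feature value}
        self._features: Dict[str, Dict[str, Any]] = {}

    def register(self, feature_name: str, values: Dict[str, Any]) -> None:
        """Add or update one feature for many entities."""
        self._features.setdefault(feature_name, {}).update(values)

    def get_vector(self, entity_id: str, feature_names: List[str]) -> List[Any]:
        """Return the requested features for one entity, in the order asked.

        Missing values come back as None.
        """
        return [
            self._features.get(name, {}).get(entity_id) for name in feature_names
        ]


store = InMemoryFeatureStore()
# A Data Scientist publishes features once...
store.register("avg_order_value", {"cust_1": 35.0, "cust_2": 12.5})
store.register("orders_last_30d", {"cust_1": 4})
# ...and anyone on the team can retrieve them later by name.
vector = store.get_vector("cust_1", ["avg_order_value", "orders_last_30d"])
```

The point of the shared layer is that a feature is defined once and looked up by name, rather than recomputed separately by each role.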
MLOps tools are used for developing machine learning (ML) models. They include the actual code of the machine learning algorithm and the functionality for model training, model evaluation, model deployment, and post-production model monitoring.
Data Engineers integrate data pipelines and provide clean data sets to end users (i.e., Data Scientists). They also apply software engineering best practices like version control and continuous integration to the codebase.
Data Scientists design and construct new processes for data modelling and production using prototypes, algorithms, predictive models, and custom analysis. Data Scientists are typically aligned with a line of business and remain focused on the goals of that particular business unit or a specific project.
The outputs of the modern data stack are consumed by Data Analysts, who examine large data sets to identify trends, develop charts, and create visual presentations to help business decision makers make more strategic decisions.
It is worth mentioning that there is also a newly emerging role, Analytics Engineers, who sit between Data Engineers and Data Analysts and deliver lean, transformed datasets to end users with effective data tooling (e.g., ETL/ELT). While a Data Analyst spends their time analysing data, an Analytics Engineer spends their time transforming, testing, deploying, and documenting data.
The trend of tooling development in the data industry is to make each traditional role more versatile, so that people in that role can work independently and cover a wider range of their daily tasks. The same trend applies to the development of visual data processing.
Modern data stack with Vision AI
To tap into the value of unstructured visual data, having experts and tools for the typical modern data stack is not enough. There are two ways to approach this. Organisations can either leverage off-the-shelf Vision AI as a service or build an AI team for in-house Vision AI development and deployment.
Components and stakeholders
Vision AI as a Service provides an API for inference on pre-trained models. A Data Engineer can simply call the API to convert unstructured visual data to structured data. There are many cloud-based solutions on the market now, such as Google Vision AI and Amazon Rekognition.
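The conversion from unstructured to structured data can be sketched as follows. This is only an illustration: `call_vision_service` is a local stub standing in for a network call, and the response shape (`labels`, `name`, `score`) is an assumption made for this example, not the format of any specific vendor's API.

```python
from typing import Dict, List


def call_vision_service(image_bytes: bytes) -> Dict:
    """Stand-in for an HTTP call to a hosted Vision AI API.

    The response shape is hypothetical; a real service defines its own.
    """
    return {
        "labels": [
            {"name": "dog", "score": 0.97},
            {"name": "grass", "score": 0.81},
            {"name": "frisbee", "score": 0.32},
        ]
    }


def to_structured_rows(
    image_id: str, response: Dict, min_score: float = 0.5
) -> List[Dict]:
    """Flatten one prediction response into warehouse-ready rows,
    keeping only labels at or above the confidence threshold."""
    return [
        {"image_id": image_id, "label": item["name"], "score": item["score"]}
        for item in response["labels"]
        if item["score"] >= min_score
    ]


rows = to_structured_rows("img_001", call_vision_service(b"raw image bytes"))
```

Each resulting row has a fixed schema (`image_id`, `label`, `score`), so it can be loaded into a Data Warehouse like any other structured record.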
Data Engineers can use Vision AI as a service to connect and process unstructured visual data without any knowledge of building DL or computer vision models.
Data Scientists shed light on the high-level project goals and pin down what insights are to be collected from the visual data, so Data Engineers can survey suitable Vision AI APIs to integrate with the data pipeline.
The main issues with off-the-shelf Vision AI APIs are inflexibility and poor performance. Pre-trained models are likely to underperform in a customer's production environment due to domain differences, and the use cases often simply don't fit (e.g., the desired categories are not defined in the pre-trained image classification model).
MLOps for Vision AI provides MLOps tooling specifically for Vision AI model development. AI Engineers and AI Researchers in a small AI team can adopt computer-vision-specific MLOps platforms such as Roboflow, Clarifai and V7 Labs, or employ general-purpose MLOps solutions such as Google Vertex AI and Amazon SageMaker, to label image data and develop Vision AI models from scratch.
AI Engineers are in charge of building data infrastructure and preparing data for AI Researchers. In addition to making POC-level models production-ready, they use modern MLOps tools to collect data, train and evaluate models, and deploy models in production. Furthermore, they monitor the online model performance in production day to day. When the model performance drops (due to domain drift or any unexpected reason), they inform AI Researchers, who analyse the potential causes, and they bring updated models online by iterating the model lifecycle.
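One simple way to detect such a performance drop is to compare a rolling accuracy against an offline baseline. The sketch below is a minimal illustration under strong assumptions: it presumes that ground-truth labels for a sample of production predictions become available, and the class name, window size and tolerance are hypothetical choices rather than a recommended configuration.

```python
from collections import deque
from typing import Any, Deque


class AccuracyMonitor:
    """Flags degradation when rolling accuracy falls below baseline - tolerance."""

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline
        self.tolerance = tolerance
        # Only the most recent `window` outcomes are kept.
        self._outcomes: Deque[bool] = deque(maxlen=window)

    def record(self, prediction: Any, ground_truth: Any) -> None:
        """Store whether one production prediction matched its label."""
        self._outcomes.append(prediction == ground_truth)

    def rolling_accuracy(self) -> float:
        """Accuracy over the current window; the baseline if nothing is recorded."""
        if not self._outcomes:
            return self.baseline
        return sum(self._outcomes) / len(self._outcomes)

    def has_degraded(self) -> bool:
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.9, window=10, tolerance=0.05)
for _ in range(10):
    monitor.record("cat", "cat")  # model agrees with the labels
healthy = monitor.has_degraded()
for _ in range(5):
    monitor.record("cat", "dog")  # simulated drift: predictions start missing
degraded = monitor.has_degraded()
```

When `has_degraded()` turns true, that is the signal to start another iteration of the model lifecycle described above.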
AI Researchers are the experts in DL and computer vision. They provide guidelines to AI Engineers about what data to collect, use DL frameworks such as TensorFlow or PyTorch to design and train Vision AI models that meet the project requirements, and write research reports to benchmark the trained models on collected datasets. These models are called POC models as they are developed in a lab environment. They have only proved their value offline, as benchmarked by AI Researchers. However, they are not optimised for production and there is no guarantee that they will work as well online as they do offline.
These roles are equipped with different skill sets and expertise. Essentially, Data Engineers are skilful in manipulating structured data using the tooling in the modern data stack, but they do not know much about DL and are not in a position to devise and prototype DL models. AI Engineers are good at engineering DL models but lack expertise in data engineering. Although both Data Scientists and AI Researchers adopt MLOps practices, they use different tools and work separately in their own “comfort zone”: one with structured data and the other with unstructured data. All of these factors disconnect the roles and create silos between them, and new tooling is needed to bridge the gap.
Modern data stack with visual data preparation
As discussed above and mentioned in What is missing in Why Instill AI exists, we are in the era of emerging MLOps tools and Vision AI solutions. On one hand, they make tapping the value of visual data possible, but on the other hand, they provide different proprietary frameworks, making it difficult for AI practitioners to piece them together into a custom end-to-end solution and integrate it with the existing stack. The boundary of the AI/ML tech stack also aggravates the team silo.
How can we solve these issues and seamlessly bring Vision AI into the modern data stack? The answer is to introduce visual data preparation.
How does visual data preparation break the tech silo?
Visual data preparation and MLOps for Vision AI have a lot in common. For example, both process visual data. Unlike MLOps for Vision AI, which focuses on maintaining the Vision AI models, visual data preparation zooms out to take a wider look at the end-to-end visual data processing pipeline:
- Ingest unstructured visual data from data sources such as IoT devices or a Data Lake;
- Depending on what insights are to be derived, transform visual data into meaningful structured data representations with the corresponding online Vision AI models;
- Load the structured data into a Data Warehouse, where end users can access and analyse it further with a Feature Store, or send it directly to Vision AI applications that rely on actionable insights from the visual data.
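The three steps above can be sketched as a small Python pipeline. This is an illustrative outline only, not Instill AI's implementation: the image source is hard-coded, `dummy_model` is a placeholder for an online Vision AI model, an in-memory SQLite database stands in for the Data Warehouse, and every function and table name is hypothetical.

```python
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

ImageRecord = Tuple[str, bytes]        # (image ID, raw image bytes)
DetectionRow = Tuple[str, str, float]  # (image ID, label, score)


def ingest() -> Iterator[ImageRecord]:
    """Ingest: yield images from a source (hard-coded here for illustration)."""
    yield ("cam1_0001", b"\x00\x01")
    yield ("cam1_0002", b"\x00\x01\x02\x03")


def dummy_model(image_bytes: bytes) -> List[Dict]:
    """Placeholder model: pretends to detect one object per two bytes of input."""
    return [{"label": "object", "score": 0.9} for _ in range(len(image_bytes) // 2)]


def transform(
    records: Iterable[ImageRecord], model: Callable[[bytes], List[Dict]]
) -> Iterator[DetectionRow]:
    """Transform: run the model and flatten its output into structured rows."""
    for image_id, image_bytes in records:
        for detection in model(image_bytes):
            yield (image_id, detection["label"], detection["score"])


def load(rows: Iterable[DetectionRow], conn: sqlite3.Connection) -> int:
    """Load: write rows to the warehouse table and return its total row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detections "
        "(image_id TEXT, label TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO detections VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM detections").fetchone()[0]


conn = sqlite3.connect(":memory:")
total = load(transform(ingest(), dummy_model), conn)
per_image = conn.execute(
    "SELECT image_id, COUNT(*) FROM detections GROUP BY image_id ORDER BY image_id"
).fetchall()
```

Because `transform` takes the model as a parameter, swapping in a different Vision AI model changes only one argument, while the ingest and load steps stay untouched.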
The goal of visual data preparation is not to focus on one single step but to streamline the whole process by providing:
- Seamless data access: rich and robust integration with various data sources/destinations
- Faster POC to Production for Vision AI: support for deploying models from different DL frameworks to accelerate time-to-value
It unleashes the power of Vision AI in the data stack by connecting the dots and breaking the barriers. To achieve all of this, we propose standardising visual data preparation and building tools within an open and maintainable framework, making it possible for communities to benefit and participate.
How does visual data preparation break the team silo?
Good tooling helps break the team silo. With easy-to-use visual data preparation tools:
- AI Engineers can have automatic model optimisation, simplified and managed model serving, and tools for production model monitoring.
- AI Researchers can have easier access to visual data for production experimentation and benchmarking.
- Data Engineers can have low-code tools for integrating with various data sources and destinations, and easier visual data pipeline management.
- Data Scientists can have richer insights from unstructured visual data to uncover unknown patterns and produce better analysis with no-code UI.
Like the Feature Store, which makes the collaboration between Data Engineers and Data Scientists easier, visual data preparation eliminates team silos by streamlining data processing across different roles with a standardised framework.
Unstructured visual data needs more love, considering its huge volume and untapped value. Visual data preparation pushes the modern data stack a step further to seamlessly integrate Vision AI, so that modern tooling can process image and video data and extract its value more effectively.
Standardising visual data preparation is an epoch-making attempt. We are thrilled to popularise visual data preparation and build an open platform that encourages all sorts of integration and collaboration across different roles. We’d love to hear your feedback and exchange ideas. Please join our community to get involved.
Have a nice day!
Instill Cloud is currently in private alpha, working very closely with early users to build the most effective tool for visual data preparation. Sign in here if you would like to have a free trial.