“Scale is growing. Grow with us.”

Founded: June 2016, San Francisco, US

Category: Artificial intelligence/Machine learning

Primary office: San Francisco, USA

Core technical team: San Francisco, USA

Status: Private

Employees: 97, plus 30,000 contractors around the globe who assist with object identification and labeling.

Amount raised: USD $298.1 million (8 rounds, as of Sept 2020)


  • Trusted by world-class companies, Scale delivers high-quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more.
  • Scale AI believes that the transition from traditional software to AI is one of the most important shifts of our time. Its mission is to make that transition happen faster, across every industry.
  • One of the biggest bottlenecks for real-world AI applications is access to labeled data. Scale AI’s first product is a developer-friendly data labeling API for AI applications such as self-driving cars, mapping, AR/VR, robotics, drones, and retail.
  • Scale holds that building infrastructure is more durable than building applications; by providing that infrastructure, it aims to amplify the impact of machine learning as a whole, “bending the curve of technology, and therefore the curve of humanity.”
  • Scale AI products for image annotation, semantic segmentation, 3D point cloud annotation, and LiDAR and RADAR annotation are used by industry leaders, and the company claims world-class accuracy for them.
  • Impactful partnership: Scale AI positions itself as the standard solution for quality, cost, and scalability, taking the pain out of annotating data and creating high-quality datasets.
  • Other metrics: Scale’s platform and products were developed by ML engineers for ML engineers to deliver large volumes of unbiased, highly accurate training data at speed.


  • Valuation: USD$1 billion (Aug 2019)
  • Revenue: USD$5 million (2019)
  • Customers include Toyota, Voyage, Embark, Lyft, OpenAI, Skydio, Skip, Sea Machines, Standard Cognition, Pinterest, SAP, Samsung, Nuro, DoorDash, NVIDIA, Honda, Airbnb, Valeo, and Aptiv.
  • 121,983 monthly web visitors, according to SimilarWeb


  • Release of what Scale describes as the world’s most advanced LiDAR dataset for commercial use (May 2020)
  • Launch of Scale Document, a new product that provides an endpoint for the secure processing of documents (April 2020)
  • Product merger: Sensor Fusion Cuboids and Sensor Fusion Segmentation have been combined into a single product, Scale 3D Sensor Fusion
  • Scale Image covers 3D cuboids, bounding boxes, image categorization, lines and splines, polygons, and semantic segmentation
  • The natural language products are now offered under the Scale Text name, and Scale Audio is currently in private beta
  • Open-sourcing of what was billed as the world’s first autonomous vehicle (AV) dataset for wintry environments (Feb 2020)


Scale offers data annotation products that support an increasing range of data inputs and annotation types for computer vision and natural language processing (NLP) applications.

  • Scale 3D Sensor Fusion: an advanced annotation platform for 3D sensor, LiDAR, and RADAR data
  • Scale Image: comprehensive annotation for images
  • Scale Text: sophisticated annotation for text-based data
  • Scale Document: secure processing of documents
  • Scale Video: scalable annotation for video data
  • Scale Audio: audio annotation (in private beta)
  • Scale Nucleus: described by Scale as a new way to develop ML models, moving away from the concept of a single dataset toward collections of scenarios


  • Partnerships and individual investors include Dropbox founder Drew Houston, a Twitch founder, OpenAI, and Quora
  • Community: Toyota, Voyage, Embark, Lyft, OpenAI, Skydio, Skip, Sea Machines, Standard Cognition, Pinterest, SAP, Samsung, Nuro, DoorDash, NVIDIA, Honda, Airbnb, Valeo, and Aptiv.


  • Big data, cloud, AI/ML
  • Highlights the close relationship in machine learning between human contractors and algorithms: this “human insight” can help minimize labeling bias and give customers more precise and accurate data.
  • Technology: Developer of a platform designed to accelerate the development of AI applications. The company says its software can label data faster and more cheaply than current alternatives.
  • Offers solutions to industries such as self-driving cars, drones, robotics, AR/VR, and retail

Distinct AI Features


Data labeling is not only practically important but also philosophically important to the field. Machine learning is a form of metaprogramming: the developer doesn’t write the program directly, but instead writes a program that itself writes the program. The developer provides a rough framework for what the program should look like (usually a neural network) and what its goal should be (usually a labeled dataset), and the training process produces a program that is unintelligible to humans but can perform better than any program a human could write by hand.
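To make the metaprogramming idea concrete, here is a minimal sketch in Python, assuming a toy two-feature dataset and a single perceptron standing in for the neural network. The developer writes only the model form (`predict`) and supplies labeled examples; the training loop chooses the weights. The data, the function names, and the choice of a perceptron are illustrative assumptions, not anything Scale uses.

```python
from typing import List, Sequence, Tuple

# Hypothetical labeled dataset (the "goal"): points labeled 1 when x + y > 1.
DATASET: List[Tuple[Tuple[float, float], int]] = [
    ((0.0, 0.0), 0),
    ((0.2, 0.3), 0),
    ((0.9, 0.8), 1),
    ((1.0, 0.5), 1),
    ((0.1, 0.6), 0),
    ((0.7, 0.9), 1),
]


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """The developer's "framework": a linear threshold unit whose parameters are left open."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(data, epochs: int = 100, lr: float = 1.0):
    """The training loop, not the developer, fills in the parameters from labeled data."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in data:
            error = label - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every labeled example is now classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(DATASET)
print("learned weights:", weights, "bias:", bias)
```

Nobody writes the final weights by hand: change the labels and the same code learns a different program, which is one way to see why label quality matters so much.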

  AI use

  • Scale aims to accelerate the development of AI by democratizing access to high-quality training data. Companies such as Alphabet, Voyage, nuTonomy, Embark, and Drive.ai use Scale’s API, for autonomous vehicles and other use cases, to turn raw information into human-labeled training data that reliably powers their AI applications.
  • Scale combines high-quality human task work, smart tools, statistical confidence checks, and machine learning to return precise data at scale: machine-learning-powered pre-labeling and active tooling produce initial labels, which then pass through varying levels and types of human review.
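A workflow in which machine pre-labels are checked by humans at varying levels of scrutiny can be sketched as follows. This is a hypothetical illustration, not Scale’s actual pipeline: the `Task` fields, the 0.9 confidence threshold, the two review tiers, and the two-thirds majority rule are all assumptions chosen to show how a statistical confidence check might route work.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical task record: one item with a machine-generated pre-label.
    item_id: str
    model_label: str
    model_confidence: float  # model's confidence in its pre-label, in [0, 1]


def route(task: Task, threshold: float = 0.9) -> str:
    """Pick a review tier: confident pre-labels get a lighter check (assumed policy)."""
    return "light_review" if task.model_confidence >= threshold else "full_review"


def consensus(labels: List[str], min_agreement: float = 2 / 3) -> Optional[str]:
    """Majority vote among human reviewers; None signals that the item needs escalation."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


def finalize(task: Task, reviewer_labels: List[str], threshold: float = 0.9) -> Optional[str]:
    """Combine the pre-label with human review according to the routed tier."""
    if route(task, threshold) == "light_review":
        # One reviewer confirms or corrects the pre-label; with no review, keep it.
        return reviewer_labels[0] if reviewer_labels else task.model_label
    # Low-confidence items require agreement among several reviewers.
    return consensus(reviewer_labels)


if __name__ == "__main__":
    confident = Task("img-001", "car", 0.97)
    uncertain = Task("img-002", "car", 0.55)
    print(route(confident), finalize(confident, ["car"]))
    print(route(uncertain), finalize(uncertain, ["truck", "truck", "car"]))
```

Lowering the threshold sends fewer items to full review, trading review cost against the risk of accepting a wrong pre-label.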

Rate of return on customers’ investment to make AI work


  • Immediate: Addressed the need for large volumes of annotated data in the self-driving industry, where a commitment to safety is non-negotiable.

Long term:

  • The ability to go to Scale for multimodal annotation gives the customer advantages in the automated driving space, including in scenarios it cannot foresee today.

Standard Cognition

  • Immediate: Scaling ground-truth data with Scale AI and classifying large volumes of images to develop an autonomous checkout system.
  • Long term: Shaping the future of retail by developing an autonomous checkout platform for brick-and-mortar retailers that could change how people shop.


  • PandaSet: an open-source dataset for training machine learning models for autonomous driving, with 48,000 camera images, 16,000 LiDAR sweeps, and more than 100 scenes of 8 s each.
  • ImageNet: a repository of 14 million labeled images in more than 20,000 categories.

Quantum Computing

  • A carefully trained machine learning algorithm can process very large datasets with great efficiency. One widely used class of machine learning model is the convolutional neural network (CNN), an extremely powerful tool for image recognition and classification problems. Proponents argue that quantum computers could help develop AI-based digital assistants with true contextual awareness and the ability to fully understand interactions with customers, because quantum computers may be able to sort through a vast number of possibilities within a fraction of a second to arrive at a probable solution.



  • Dataset: Labeled data is the key bottleneck to the growth of the machine learning industry. In fact, labeled data is even more essential than algorithms. ImageNet is a repository of 14 million labeled images in more than 20,000 categories.
  • Innovation and reputation


  • Detailed labeling of companies’ existing data via point cloud segmentation in the self-driving car industry: using 3D maps of the environment around a vehicle to encode what each point corresponds to (for example, a pedestrian, stop sign, window, shrub, or stroller).
  • The team is also encoding the behavior of drivers, pedestrians, and cyclists with technology including “gaze detection,” which aims to indicate whether a driver might yield or a pedestrian plan to cross the street.
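Per-point segmentation labels of the kind described above are commonly stored as a label array parallel to the point array. The sketch below assumes that layout, a hypothetical integer class vocabulary built from the examples in the text, and plain Python lists rather than any particular point cloud library or Scale data format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z); units and frame are assumptions

# Hypothetical class vocabulary drawn from the examples in the text.
CLASSES: Dict[int, str] = {
    0: "unlabeled",
    1: "pedestrian",
    2: "stop_sign",
    3: "window",
    4: "shrub",
    5: "stroller",
}


@dataclass
class SegmentedSweep:
    points: List[Point]
    labels: List[int]  # labels[i] is the class id of points[i]

    def __post_init__(self) -> None:
        # Segmentation means every point carries exactly one known class id.
        if len(self.points) != len(self.labels):
            raise ValueError("every point needs exactly one label")
        unknown = set(self.labels) - set(CLASSES)
        if unknown:
            raise ValueError(f"unknown class ids: {sorted(unknown)}")

    def points_of(self, class_name: str) -> List[Point]:
        """Return all points labeled with the given class name."""
        ids = {i for i, name in CLASSES.items() if name == class_name}
        return [p for p, label in zip(self.points, self.labels) if label in ids]

    def class_histogram(self) -> Counter:
        """Count how many points fall into each class."""
        return Counter(CLASSES[label] for label in self.labels)


if __name__ == "__main__":
    sweep = SegmentedSweep(
        points=[(1.0, 0.0, 0.0), (2.0, 0.5, 0.1), (0.0, 1.0, 0.0)],
        labels=[1, 4, 1],
    )
    print(sweep.points_of("pedestrian"))
    print(sweep.class_histogram())
```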


  • Building the Future of Autonomous Vehicles
  • Collecting and open sourcing labeled data
  • Labeling data for companies, allowing them to identify blind spots and biases
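One simple way labeled data can expose blind spots is to compare how often each expected class actually appears. The sketch below is a hypothetical illustration using a minimum-share threshold (5% here, an arbitrary assumption); it flags classes that are rare or absent, and is not a description of how Scale or its customers audit their datasets.

```python
from collections import Counter
from typing import Iterable, List


def find_underrepresented(
    labels: Iterable[str],
    expected_classes: Iterable[str],
    min_share: float = 0.05,
) -> List[str]:
    """Return expected classes whose share of all labels falls below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    flagged = []
    for cls in expected_classes:
        # Classes that never appear get a share of 0.0 and are always flagged.
        share = counts[cls] / total if total else 0.0
        if share < min_share:
            flagged.append(cls)
    return sorted(flagged)


if __name__ == "__main__":
    labels = ["car"] * 90 + ["pedestrian"] * 8 + ["stroller"] * 2
    expected = ["car", "pedestrian", "stroller", "cyclist"]
    print(find_underrepresented(labels, expected))  # → ['cyclist', 'stroller']
```

Here strollers make up 2% of the labels and cyclists never appear, so both are flagged as potential blind spots; a real audit would also look at context, such as lighting or weather, not just class counts.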