
The 5 guiding principles to choose the right supplier for your annotation tasks

Labeled and annotated data are the beating heart of every autonomous system. Machine Learning and AI have enabled tremendous learning capabilities for robots and software that rely on vision, and 3D and image labeling still plays a fundamental role in their success.

Because annotations matter so much, there is a huge demand in the market for labeled data. Lately, some major players in the Autonomous Driving landscape have started releasing their own annotated datasets to the public with the goal of fostering advances in the field. We at deepen.ai have also provided access to part of the KITTI 3D dataset with semantic segmentation. While public datasets can significantly help the research community, private companies (but also researchers) are always hungry to label a ton of newly collected data.

There are various choices to make when deciding on a labeling strategy. One of them is to outsource the annotation job to external suppliers: you send your dataset to a third party, along with labeling specifications, and you receive back your annotated dataset. On one hand, this comes with many advantages and, depending on the case, it can definitely be the optimal decision. On the other, there are some critical points that need to be carefully handled when outsourcing labeling tasks to third-party organizations.

Getting a labeling job done is like going on a journey: first, you need to make sure you are buying the right service for your needs. Once you have figured that out, the journey starts with sharing and checking the dataset and laying down specifications. Subsequently, pre-labeling starts and quality checks follow, before the outcome can successfully be shipped to you.

I put together a detailed checklist of the crucial aspects along this journey that you should focus on when selecting a labeling supplier. They apply to both LiDAR and image labeling:

  • 1. Buy the right drill for the right hole
  • 2. Data integrity, data integrity… Data integrity
  • 3. Don’t score on your own hoop
  • 4. Give your house keys to people, not software
  • 5. Keep the helm of your labels’ ship

Let’s dive into the details of each one. 

1. Buy the right drill for the right hole 

The first aspect I want to focus on is getting the very basics right. I am sure you are familiar with this kind of situation: your green thumb has recently sparked and you want to start hanging beautiful plants from your ceiling. You go to the tool shop to buy an impact drill to make the holes, but once you get back home you realize your concrete is not suitable for the drill bits you have, and the power range of your brand-new drill is nowhere near enough to make a hole in your steel-reinforced ceiling. Of course, you can go back to the shop better prepared, with your receipt in hand, and get it right. But you have already wasted time, and probably lost your temper.

The same can happen with your annotation vendor if you don’t check beforehand that your technical specifications match those of the tool and/or team your vendor uses. Multiple sensors, different sampling frequencies, and so on not only pose a threat to data integrity but can sometimes also determine whether a third-party labeler is really able to ingest and process your data successfully. What you want to avoid is wasting time and resources buying a tool or service that is not adequate for your requirements.

2. Data integrity, data integrity… data integrity

Data integrity is a critically important aspect of your data collection and annotation pipeline.

Today’s most complex computer-vision-based systems, such as autonomous vehicles, rely on a multitude of different sensors that collect data from the world; multiple cameras and LiDARs are an example of that. When many sensors work together, you run into sensor-fusion-related challenges.

Usually, problems with raw sensor-fused data originate from inaccurate calibration and synchronization. These two terms refer to inconsistencies in either the space or the time domain. Cameras might be slightly misplaced, so their frames do not accurately match the LiDAR frames, for example. Or the different sensors work at different sampling frequencies and the fused frames suffer from time shifts and/or duplication issues.

Regardless of the case, all of these problems represent a big obstacle for annotation and labeling. It is not possible to accomplish accurate annotation on miscalibrated or unsynchronized frames. Of course, when you deliver your raw dataset to an annotation vendor, the vendor should be able to spot these kinds of issues within your dataset. Unfortunately, this can be very inconvenient for you to deal with: not only can it produce communication friction, but it can also take significant time and resources for you to go over your data and fix data integrity issues.

That’s why your labeling supplier should be able to fix calibration and sync issues itself, in addition to giving you guidance on how to avoid running into the same kind of issues in the future. Proper and accurate data integrity checks can be a huge painkiller for your annotation needs, so don’t take this too lightly and make sure your vendor has it solved.
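To make this concrete, here is a minimal sketch, in Python, of the kind of synchronization sanity check you or your vendor might run before labeling starts. The timestamp lists and the 50 ms tolerance are illustrative assumptions, not a specific tool’s API.

```python
# Hypothetical pre-labeling sanity check: pair each LiDAR sweep with the
# nearest camera frame and flag pairs whose time offset is too large.
# Timestamp sources and the 50 ms tolerance are illustrative assumptions.

def check_sync(camera_timestamps, lidar_timestamps, max_offset_s=0.05):
    """Return (lidar_index, offset) for every sweep without a close camera frame."""
    issues = []
    for i, t_lidar in enumerate(lidar_timestamps):
        t_cam = min(camera_timestamps, key=lambda t: abs(t - t_lidar))
        offset = abs(t_cam - t_lidar)
        if offset > max_offset_s:
            issues.append((i, offset))
    return issues

# Example with made-up timestamps (in seconds):
cam = [0.00, 0.10, 0.20, 0.30]
lidar = [0.01, 0.11, 0.27, 0.42]
print(check_sync(cam, lidar))  # -> [(3, ~0.12)]: sweep 3 has no camera frame within 50 ms
```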

3. Don’t score on your own hoop 

The majority of companies that need to annotate datasets are probably working on cutting-edge, computer-vision-based autonomous technologies. This means they are most likely developing innovative and unique tech that can translate into a sustained competitive advantage. Needless to say, confidentiality is a fundamental requirement in this case and a key aspect to consider in the supplier decision.

As you may know, annotation tasks are predominantly accomplished by humans who manually label and classify objects in the individual frames. However, automated pre-labeling can represent a great advantage, depending on the case. Automated pre-labeling means that the labeling software comes up with the majority of the labels itself, and humans just have to vet the machine’s output and correct or refine mistakes. The efficiency and efficacy of pre-labeling can have a significant impact on both the quality of your delivered annotations and the price you will end up paying for the service. Pre-labeling usually relies on ML algorithms itself, whose efficacy and efficiency improve the more data they are fed.
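To make the workflow concrete, here is a minimal sketch of the pre-label-then-vet loop described above. The Label structure, the model’s predict method and the reviewer interface are hypothetical stand-ins, not any specific vendor’s API.

```python
# A minimal sketch of automated pre-labeling followed by human vetting.
# The data structures and interfaces below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Label:
    frame_id: int
    category: str
    box: tuple        # e.g. (x, y, w, h) in pixels for 2D, or a 3D cuboid
    source: str       # "model" for pre-labels, "human" once vetted

def prelabel(frames, model):
    """Let the model propose labels; every proposal is marked as machine-made."""
    return [
        Label(frame["id"], category, box, source="model")
        for frame in frames
        for category, box in model.predict(frame)
    ]

def vet(proposals, reviewer):
    """A human accepts, corrects, or rejects each proposal; nothing ships unreviewed."""
    return [reviewer.review(p) for p in proposals]
```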

While, of course, efficient automated pre-labeling is desirable for you, what you want to avoid is your data being used to make pre-labeling better for customers who come after you. In this sense, you could indirectly benefit the competition by using a specific supplier for your annotation needs. So, golden rule: make sure the service you are using treats customers’ data in silos and does not exploit your proprietary dataset to train its automated labeling tech outside the scope of your own project.

4. Give your house keys to people, not software

As mentioned before, annotation tasks are predominantly human-driven and partially software-assisted. Either way, annotating a dataset usually takes multiple steps. Automated pre-labeling can be the first of them, followed by one or, usually, multiple rounds of quality checks.

Quality is a fundamental aspect of computer vision labeling. Every machine learning engineer knows how the quality of the data determines the quality of the output of a machine learning model. Moreover, in the field of Autonomous Vehicles, quality and accuracy are vitally important. As safety is the number one priority in AV development, training ML models on bad or inaccurate data is a recipe for disaster. To avoid that, everything starts with accurate, high-quality labels.

What you want to make sure of is that quality checks are always carried out by humans with the annotation tool in their hands and under their control. It can certainly be convenient to auto-generate labels, and sometimes checks can and should be automated. However, the only real way to ensure pixel- or point-level accuracy is a human-driven vetting process for each label and each frame. The accuracy of software cannot beat a trained human with a high attention to detail.

On top of that, it is crucial to have consistent checks for the whole dataset, not just for a sub-sample of it. The entire burden of quality checking should be on your annotation supplier, not on you. Keep that in mind and raise this point during your next conversations with your vendor.
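One simple way to verify that point when a delivery comes back is a coverage check like the sketch below. The record fields (frame_id, reviewed_by) are assumptions for illustration; the idea is just to confirm that every frame, not a sample, carries a human review trail.

```python
# Hedged sketch of a QA coverage check: confirm every frame in the delivered
# dataset has at least one human-reviewed label. Field names are assumptions.

def review_coverage(delivered_labels, all_frame_ids):
    """Return the fraction of frames with a human review record,
    plus the list of frames that slipped through without one."""
    reviewed = {lbl["frame_id"] for lbl in delivered_labels if lbl.get("reviewed_by")}
    missing = [fid for fid in all_frame_ids if fid not in reviewed]
    coverage = 1 - len(missing) / len(all_frame_ids)
    return coverage, missing

coverage, missing = review_coverage(
    delivered_labels=[{"frame_id": 0, "reviewed_by": "annotator_12"},
                      {"frame_id": 1, "reviewed_by": None}],
    all_frame_ids=[0, 1, 2],
)
print(round(coverage, 2), missing)  # -> 0.33 [1, 2]: send these frames back for review
```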

5. Keep the helm of your labels’ ship

Right next to quality, the other thing you really care about when considering outsourcing your annotation tasks is friction. While 3D or image labeling can seem a trivial task, each case is different, and specifying beforehand how to label every corner case is often far from obvious. There’s a pedestrian with an umbrella in hand: should the umbrella be included in the bounding box or not? A car is 90% occluded by a giant tree: how should it be treated? These are just simple examples of practical issues that can easily lead to tedious back-and-forths with your supplier and, ultimately, delays and hidden costs.

Usually, the company you choose for your labeling task should be able to guide you through the development of a comprehensive, unambiguous and highly effective specification document. However, no matter how good this spec doc is, chances are that some small things here and there will need to be slightly adjusted or modified. And that’s where you want a frictionless experience.

How can you avoid the annoying and time-wasting back and forth to get those small fixes done? Simply by doing it yourself. Having access to the labeling tool your supplier uses is a fundamental piece of the puzzle. Not only will this help you monitor the status of your labels and make small, quick fixes, it will also be very beneficial in establishing and communicating your annotation specifications. There are a ton of cases in which an example is more eloquent than a thousand words. Accessing the tool and editing the labels yourself will make you far more effective at developing specifications, making quick fixes, and having a positive, frictionless experience with your supplier.

Following these 5 guidelines should help you make the right decision when it comes to choosing the best annotation supplier for your needs. Keep in mind that, at the end of the day, what you are really looking for is high-quality output and a frictionless experience.