Distance Finding in Mobile Autonomous Systems

By: Damien Clarke
Lead Consultant, Data Exploitation

7th November 2018

You might already be familiar with FiveAI, a driverless car startup based in Cambridge, whose recent work has been making headlines; but for those who aren't, allow me to bring you up to speed. FiveAI's vision is to bring a shared driverless taxi service to London by next year, and they have already started gathering data on London's streets with their iconic blue-branded cars.

A key component in the development of mobile autonomous systems is the ability to produce a 3D map of the local environment which can be used for route planning and collision avoidance (e.g. sense and avoid). There are various sensors which can be used to achieve this and each one has specific advantages and disadvantages.

Stereo Vision

The first approach (and one that many animals, including humans, use) is to combine images from a pair of cameras placed at slightly different positions to enable depth perception. This is achieved by determining the horizontal disparity between the same object in both cameras. Nearby objects produce a large disparity in position between the two cameras whereas far objects have a small disparity.
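
As a rough sketch of the geometry (purely illustrative, and assuming a calibrated, rectified camera pair with the focal length given in pixels and the baseline in metres), depth follows directly from disparity:

    # Minimal sketch: depth from stereo disparity (assumes rectified cameras).
    def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
        """Depth in metres from the horizontal pixel disparity of a matched point."""
        if disparity_px <= 0:
            return float("inf")  # zero disparity: point is effectively at infinity
        return focal_length_px * baseline_m / disparity_px

    # Example: 700 px focal length, 12 cm baseline, 35 px disparity -> 2.4 m
    print(depth_from_disparity(35, 700, 0.12))

Note that a one-pixel error in disparity corresponds to a much larger change in depth for distant objects than for nearby ones, which is why depth resolution degrades with range.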

This technique can also be used with a single camera if it is moving as the video is effectively a series of images taken at different positions. This is known as Structure from Motion and is commonly used with airborne cameras, such as those used on small consumer drones.

The primary advantage of this technique is that cameras are small and inexpensive. At close range, good depth resolution can be achieved and fused with the image content itself to produce a 3D colour image. A large number of cameras with overlapping fields of view can potentially produce a 360° panoramic 3D map of the environment around a vehicle.

The main limitation of this approach is that it will only work when suitable images can be produced, and therefore adverse environmental conditions (e.g. dust, fog or rain) will prevent the production of a 3D map. Operation at night is potentially possible with illumination or the use of thermal imagers rather than standard cameras. Poor camera dynamic range can also be a problem, as bright lights (e.g. headlights or the sun) will cause glare. In addition, the processing required to locate features within both images and match them is complex, which adds computational burden when using this technique to produce a 3D map.

Lidar

An alternative optical approach to stereo vision is a scanning laser range finder, also known as lidar. This approach uses a laser to send a pulse towards a surface and a sensor to record how long it takes for the reflection to return. The measurement of the time of flight can then be used to determine the range. To produce a 3D map of a scene, this beam must then be scanned in azimuth and elevation. To reduce the amount of scanning, some lidar sensors use multiple beams at different elevation angles and then only scan in azimuth.
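
As an illustrative sketch (the constant and the factor of two are standard; the function name is ours), the time-of-flight calculation itself is a one-liner:

    # Minimal sketch: lidar range from a time-of-flight measurement.
    SPEED_OF_LIGHT_M_S = 299_792_458.0

    def lidar_range_m(round_trip_time_s):
        """Range in metres; the factor of two accounts for the out-and-back path."""
        return SPEED_OF_LIGHT_M_S * round_trip_time_s / 2.0

    # Example: a reflection returning after 200 ns corresponds to ~30 m
    print(lidar_range_m(200e-9))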

Lidar has very good depth resolution and, due to the narrow beam, can also produce very good lateral resolution. In general, the technology for emitting and sensing light is entirely solid state; however, at present many lidar systems still use mechanical methods to scan the beam across the scene. Fully solid-state systems would be small and cheap, though this promise has not yet been fully realised in commercial lidar systems, which are often large and expensive.

As simple lidar sensors only record the time for the first reflection to return, a drawback of some lidar systems is that they will only detect the nearest object in a specific direction. This is problematic when the environment is dusty or foggy, as the first reflection may not be from a solid object and the resulting 3D map will be degraded. More sophisticated (and costly) systems measure the entire reflection over time, which allows a full range profile to be measured through the obscurant. Direct sunlight can also cause problems, as the high level of background illumination can make it difficult to detect weak reflections. Similarly, if a surface has low reflectivity (e.g. it is black) then it may not be detected by the lidar. This can be a problem for autonomous vehicles, as black car bodywork will only be detected at closer range than more reflective vehicles.

Radar

Radar is similar to lidar but uses microwaves rather than light (typically 24 or 77 GHz for automotive use). Lidar was in fact inspired by radar (hence the early term "laser radar") and only became possible once lasers were invented. The exact mechanism by which the distance is measured varies slightly between different radar systems; however, the concept is the same: a signal is emitted, the time taken for a reflection to return is measured, and this is then converted into a range profile. While panoramic mechanically scanned radars are available, it is more common to use an antenna array and calculate the angle of arrival of a reflection from the difference in signal across the array.
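
As a hedged sketch of the array idea (a two-element, narrowband approximation; real automotive radars use larger arrays and more sophisticated processing, and all names here are illustrative), the angle of arrival can be recovered from the phase difference between elements:

    # Minimal sketch: angle of arrival from the phase difference across two antennas.
    import math

    def angle_of_arrival_deg(phase_diff_rad, wavelength_m, spacing_m):
        """Narrowband two-element estimate; clamped to avoid domain errors."""
        sin_theta = phase_diff_rad * wavelength_m / (2.0 * math.pi * spacing_m)
        return math.degrees(math.asin(max(-1.0, min(1.0, sin_theta))))

    # Example: half-wavelength spacing at 77 GHz, 90 degrees of phase difference -> ~30 degrees off boresight
    wavelength = 299_792_458.0 / 77e9      # roughly 3.9 mm
    print(angle_of_arrival_deg(math.pi / 2, wavelength, wavelength / 2))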

One advantage of radar is the ability to measure speed directly via the Doppler shift, without complex processing. Therefore, objects moving relative to a mainly static scene are generally easy to detect. Poor environmental conditions (e.g. fog, rain and snow) have little impact on the performance of the radar, which provides a useful all-weather capability for autonomous vehicles. Single-chip radars with integrated processing capabilities are also available for use as small and inexpensive sensor solutions.
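
As a sketch of the Doppler relationship (assuming a 77 GHz carrier; function and parameter names are illustrative):

    # Minimal sketch: radial speed from the measured Doppler shift.
    SPEED_OF_LIGHT_M_S = 299_792_458.0

    def radial_speed_m_s(doppler_shift_hz, carrier_hz=77e9):
        """The factor of two arises because the two-way path doubles the shift."""
        return doppler_shift_hz * SPEED_OF_LIGHT_M_S / (2.0 * carrier_hz)

    # Example: a 5.1 kHz Doppler shift at 77 GHz corresponds to ~10 m/s (about 36 km/h)
    print(radial_speed_m_s(5.1e3))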

A disadvantage of radar is the limited lateral resolution. While the depth resolution can be good, the angular resolution is significantly lower than for optical sensors. However, this is partially mitigated if an object can be uniquely separated from other objects and clutter by its range or velocity value.

Ultrasonic

The final sensor used for range finding on autonomous vehicles is the ultrasonic sensor, which emits high-frequency sound beyond the range of human hearing. Bats are, of course, well-known users of this approach. Ultrasonic sensors are very similar to lidar sensors; however, as the speed of sound in air is vastly slower than the speed of light, it is much easier to measure the time taken for a reflection to return from a surface.
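
The same time-of-flight sketch as for lidar, with the speed of sound substituted, makes the point about timescales (values and names are illustrative):

    # Minimal sketch: ultrasonic range from an echo time.
    SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 degrees C

    def ultrasonic_range_m(round_trip_time_s):
        """Same principle as lidar, but echoes take milliseconds rather than nanoseconds."""
        return SPEED_OF_SOUND_M_S * round_trip_time_s / 2.0

    # Example: an echo after 11.7 ms corresponds to ~2 m (light would cover the same path in ~13 ns)
    print(ultrasonic_range_m(11.7e-3))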

Ultrasonic sensors work well regardless of light level or environmental conditions and are very small and inexpensive. This makes the technology ideal for ultra-short range collision avoidance sensors on small or slow moving vehicles which can be placed in many locations to provide wide area coverage.

The main disadvantage of ultrasonic sensors is their extremely short range as they can only produce distance measurements for surfaces up to a few metres away. For this reason, it is also uncommon for an ultrasonic sensor to be used to explicitly form a 3D map.

Data Fusion

In practice, to achieve a robust and effective sensor solution for autonomous vehicles it is necessary to combine different sensors and perform sensor fusion. As yet there is no standard sensor suite, and research is still ongoing to determine the optimum combination with acceptable performance across all weather conditions.

As an example, Tesla's latest models, which are claimed to be suitable for autonomous operation, have eight cameras (with varying fields of view) and twelve ultrasonic sensors to enable panoramic sensing, while a single forward-looking radar measures the range and speed of objects up to 160 m away.

The combination of cameras with radar is a common sensor choice as it provides good lateral and range resolution under various weather conditions for a relatively low price. It remains to be seen whether or not it is sufficient for safe autonomous operation without the addition of lidar.
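
As a deliberately simplified sketch of the camera-plus-radar idea (real fusion pipelines use probabilistic tracking such as Kalman filters rather than a single bearing threshold; all names here are illustrative):

    # Minimal sketch: pair camera detections (good bearing) with radar returns (range and speed).
    def fuse_camera_radar(camera_bearings_deg, radar_returns, max_sep_deg=2.0):
        """radar_returns is a list of (bearing_deg, range_m, speed_m_s) tuples.
        Returns fused (bearing_deg, range_m, speed_m_s) detections."""
        fused = []
        for cam_bearing in camera_bearings_deg:
            best = min(radar_returns, key=lambda r: abs(r[0] - cam_bearing), default=None)
            if best is not None and abs(best[0] - cam_bearing) <= max_sep_deg:
                fused.append((cam_bearing, best[1], best[2]))
        return fused

    # Example: the camera sees a pedestrian at 10.2 deg; the radar has a return at 11.0 deg, 42 m, 1.4 m/s
    print(fuse_camera_radar([10.2], [(11.0, 42.0, 1.4), (-35.0, 8.0, 0.0)]))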


Neural Networks: Can Androids Recognise Electric Sheep?

By: Damien Clarke
Senior Consultant, Data Exploitation

20th September 2017

In 2010, Lt. Gen. David Deptula, the US Air Force deputy chief of staff for intelligence, was quoted as saying:

“We’re going to find ourselves, in the not too distant future, swimming in sensors and drowning in data.”

Since then, this flood of data has shown no signs of slowing. In fact, it is accelerating, as greater volumes of data are generated every day. This is just as true in the civilian context as in the military one.

For organisations with access to these large volumes of data, it would be profitable to employ data exploitation techniques to convert the raw data into useful information. This can sometimes be achieved by developing custom data processing techniques for specific situations. However, in many cases, it is better to use machine learning techniques to allow computers to learn from data without being explicitly programmed. At Plextek, we’re passionate about developing and implementing the right data exploitation techniques for the application and are working to ensure that humanity stays afloat (and dry) in Deptula’s prediction.

There is a wide range of potential machine learning techniques to choose from, but one approach is to copy nature and mimic biological brains. This was inspired by the fact that one of the primary purposes of a brain is to process sensory inputs and extract useful information for future exploitation. A biological brain can be produced in software form by modelling a large set of connected neurons. This is an artificial neural network.

How does an artificial neural network work?

The basic building block of a neural network is a single neuron. A neuron transforms a set of one or more input values into a single output by applying a mathematical function to the weighted sum of input values. This output value is then passed to one or more connected neurons to be used as a subsequent input value.
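
As a minimal sketch in Python (the sigmoid activation is just one common choice; names are illustrative, not taken from any particular library):

    # Minimal sketch: a single artificial neuron.
    import math

    def neuron(inputs, weights, bias=0.0):
        """Weighted sum of inputs plus a bias, passed through a sigmoid activation."""
        weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-weighted_sum))

    # Example: three inputs combined with three weights into a single output
    print(neuron([0.5, -1.2, 0.8], [0.9, 0.3, -0.5]))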

The neural network as a whole can, therefore, be defined by three sets of parameters:

  The weight assigned to each input value for each neuron.

  The function which converts the weighted sum of input values into the output value.

  The pattern of connections between neurons.

A simple example neural network consists of three layers. The first layer contains the input values which represent the data being analysed. This layer is then connected to a hidden layer of neurons. The hidden layer then connects to the third and final layer which contains the output neurons whose values represent the processed data. This design allows a complicated relationship between inputs and outputs.
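
A toy version of that three-layer structure might look like this (the weights are random placeholders here; in practice they come from training, as described below):

    # Minimal sketch: input layer -> hidden layer -> output layer.
    import numpy as np

    def layer(x, weights, biases):
        """One fully connected layer: weighted sums followed by a sigmoid."""
        return 1.0 / (1.0 + np.exp(-(weights @ x + biases)))

    rng = np.random.default_rng(0)
    hidden_w, hidden_b = rng.normal(size=(3, 2)), np.zeros(3)   # 2 inputs -> 3 hidden neurons
    output_w, output_b = rng.normal(size=(1, 3)), np.zeros(1)   # 3 hidden neurons -> 1 output

    x = np.array([0.2, 0.7])                                    # the input layer: the data being analysed
    output = layer(layer(x, hidden_w, hidden_b), output_w, output_b)
    print(output)                                               # the processed data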



How is a neural network trained?

Just like biological brains, simply creating a neural network is not sufficient to extract information from raw data. It is also necessary to train the network by exposing it to data for which the desired outputs are already known. This process is used to define the weights assigned to each connection throughout the entire network.

As the size and complexity of the neural network increases, the number of weights that must be defined for optimum performance increases significantly. This training process, therefore, requires a large and representative set of labelled data; otherwise, the neural network may not work successfully on future input data. Also, this training process is computationally challenging and may take significant processing time to perform. GPU acceleration can be used to mitigate this; however, the process may still take days for very large data sets.  
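
As a toy sketch of the principle only (a single sigmoid neuron learning a labelled AND-style task by gradient descent; real systems use deep learning frameworks, far more data and GPU acceleration):

    # Minimal sketch: learning weights from labelled data by gradient descent.
    import numpy as np

    x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # inputs
    y = np.array([0., 0., 0., 1.])                           # known (labelled) outputs

    rng = np.random.default_rng(1)
    w, b, lr = rng.normal(size=2), 0.0, 0.5                  # weights, bias, learning rate

    for _ in range(5000):                                    # repeated exposure to the labelled data
        pred = 1.0 / (1.0 + np.exp(-(x @ w + b)))            # forward pass through the neuron
        error = pred - y
        w -= lr * (x.T @ error) / len(y)                     # adjust weights to reduce the error
        b -= lr * error.mean()

    print(np.round(pred, 2))                                 # approaches [0, 0, 0, 1]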

Conversely, if large volumes of suitable training data are available, it is possible to create a more complex neural network to improve performance. This can be achieved by increasing the number of hidden layers and therefore the total number of connections within the network. This use of complex neural networks with many layers and connections is called deep learning.



What can neural networks be used for?

With a sufficiently large neural network and suitable training data, it is possible to learn complex non-linear relationships between input and output values. This can reveal insights into data which are not possible when using simple linear mathematical models.

While neural networks are suitable as general-purpose problem solvers, they are particularly suited to tasks where an understanding of the underlying relationships in the raw data is neither available nor necessarily required, and where sufficient data is available for training.

An important example of this capability is the recognition of objects in images. This is achieved through the use of a neural network which has been trained on a large volume of photos of known objects (e.g. ImageNet). While the training process can take a long time, subsequent object recognition is much faster and can potentially be performed in real time. Due to the large volume of training data and the complexity of the neural network used, the resulting object recognition performance is close to human-level performance. This can be used in a military context to recognise different vehicles (e.g. a tank) or in a civilian context to see if computers can distinguish between different animals (Do Androids Dream of Electric Sheep?).
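
As one possible sketch using a publicly available pretrained network (this uses the torchvision library; the exact API and weight names vary between versions, and "sheep.jpg" is a hypothetical input photo):

    # Minimal sketch: classify a photo with an ImageNet-pretrained network.
    import torch
    from PIL import Image
    from torchvision import models, transforms

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # pretrained on ImageNet
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    image = Image.open("sheep.jpg").convert("RGB")   # hypothetical input photo
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        scores = model(batch)
    print(scores.argmax().item())                    # index of the most likely ImageNet class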



Neural networks are not just limited to processing photos and the same approach can be applied to a wide range of sensor and non-sensor data. The most important requirement is that a suitable volume of labelled training data is available to train the network before it can be used on unknown data.
