Research in the field of machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers – especially in, but not limited to, artificial intelligence – and explain why they matter.
This month, engineers at Meta detailed two recent innovations from the depths of the company’s research labs: an AI system that compresses audio files and an algorithm that can speed up protein-folding AI performance by 60x. Elsewhere, scientists at MIT revealed that they are using spatial acoustic information to help machines better perceive their environments, simulating how a listener would hear a sound from any point in a room.
Meta’s compression work isn’t exactly breaking new ground. Last year, Google announced Lyra, a neural audio codec trained to compress speech at low bitrates. But Meta claims its system is the first to work for CD-quality, stereo audio, making it useful for commercial applications like voice calls.
An architectural drawing of Meta’s AI audio compression model. Image credits: Meta
Using AI, Meta’s compression system, called Encodec, can compress and decompress audio in real time on a single CPU core at rates of around 1.5 kbps to 12 kbps. Compared to MP3 at 64 kbps, Encodec can achieve roughly 10x compression without a perceptible loss in quality.
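To put those figures in perspective, here is a back-of-the-envelope sketch in Python. It is not Meta’s code, and the function names are made up for illustration. It treats the "approximately 10x" figure as exactly 10 (an assumption) and works out the bitrate that implies against a 64 kbps MP3, plus how much data one minute of audio would take at each rate.

```python
# Back-of-the-envelope arithmetic only; this is not Meta's code.
MP3_KBPS = 64.0           # MP3 baseline quoted in the article
COMPRESSION_FACTOR = 10   # "approximately 10x", treated as exact here (assumption)


def implied_kbps(baseline_kbps: float, factor: float) -> float:
    """Bitrate implied by compressing a baseline stream by `factor`."""
    return baseline_kbps / factor


def kilobytes_per_minute(kbps: float) -> float:
    """Data for one minute of audio: kilobits/s * 60 s / 8 bits per byte."""
    return kbps * 60 / 8


encodec_kbps = implied_kbps(MP3_KBPS, COMPRESSION_FACTOR)

print(f"Implied Encodec bitrate: {encodec_kbps:.1f} kbps")
print(f"MP3, one minute: {kilobytes_per_minute(MP3_KBPS):.0f} KB")
print(f"Encodec, one minute: {kilobytes_per_minute(encodec_kbps):.0f} KB")
```

Under that assumption the implied bitrate is about 6.4 kbps, which falls inside the 1.5 kbps to 12 kbps range Meta quotes.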
The researchers behind Encodec say that human evaluators preferred the quality of audio processed by Encodec over Lyra-processed audio, suggesting that Encodec could eventually be used to deliver better-quality audio in situations where bandwidth is constrained or at a premium.
As for Meta’s protein folding work, it has less immediate commercial potential. But it could lay the groundwork for important scientific research in the field of biology.
Protein structures predicted by Meta’s system. Image credits: Meta
Meta says its AI system, ESMFold, predicted the structures of around 600 million proteins from bacteria, viruses and other microbes that have not yet been characterized. That’s nearly triple the 220 million structures Alphabet-backed DeepMind managed to predict earlier this year, which covered nearly every protein from known organisms in DNA databases.
Meta’s system is not as accurate as DeepMind’s. Of the ~600 million structures it generated, only a third were “high quality.” But it is 60 times faster at predicting structures, which allows it to scale structure prediction to much larger databases of proteins.
Not to be outdone, the company’s AI division also this month detailed a system designed for mathematical reasoning. Researchers at the company say that their “neural problem solver” learned from a dataset of successful mathematical proofs to generalize to new, different kinds of problems.
Meta is not the first to build such a system. OpenAI announced its own in February, built on the Lean proof assistant. Separately, DeepMind has experimented with systems that can solve challenging mathematical problems in the study of symmetries and knots. But Meta claims its neural problem solver is capable of solving five times more International Math Olympiad problems than any previous AI system and bests other systems on widely used math benchmarks.
Meta notes that math-solving AI could benefit the fields of software verification, cryptography, and even aerospace.
Turning our attention to MIT’s work, research scientists there developed a machine learning model that can capture how sounds in a room will propagate through the space. By modeling the acoustics, the system can learn the geometry of a room from sound recordings, which can then be used to build visual renderings of the room.
The researchers say the technology could be applied to virtual and augmented reality software or robots that have to navigate complex environments. In the future, they plan to improve the system so that it can generalize to new and larger scenes, such as entire buildings or even entire towns and cities.
In Berkeley’s robotics department, two separate teams are accelerating the rate at which a quadrupedal robot can learn to walk and do other tricks. One team combined the best of many recent advances in reinforcement learning to take a robot from a blank slate to robust walking on uncertain terrain in just 20 minutes of real time.
“Perhaps surprisingly, we find that with some careful design decisions in terms of the task setup and algorithm implementation, it is possible for a quadrupedal robot to learn to walk from scratch with deep RL in under 20 minutes, across a range of different environments. Critically, this does not require novel algorithmic components or any other unexpected innovation,” the researchers write.
Instead, they select and combine several modern approaches and get amazing results. You can read the paper here.
Robot dog demo from EECS Professor Pieter Abbeel’s lab in Berkeley, California in 2022. (Photo courtesy of Philipp Wu/Berkeley Engineering)
Another locomotion learning project, from (TechCrunch’s pal) Pieter Abbeel’s lab, is described as “training an imagination.” The team gave the robot the ability to try to predict how its actions will work out, and although it starts out quite helpless, it quickly gains more knowledge about the world and how it works. This leads to a better prediction process, which leads to better knowledge, and so on in a feedback loop until the robot is walking in less than an hour. It learns just as quickly to recover from being pushed or otherwise “perturbed,” as the lingo has it. Their work is documented here.
Work with potentially more immediate applications came earlier this month from Los Alamos National Laboratory, where researchers developed a machine learning technique to predict the friction that occurs during earthquakes, providing a way to forecast the quakes themselves. Using a language model, the team says, they were able to analyze the statistical features of seismic signals emitted from a fault in a laboratory earthquake machine to project the timing of the next quake.
“The model is not constrained by physics, but it predicts the physics, the real behavior of the system,” said Chris Johnson, one of the research leads on the project. “Now we make a future prediction from past data, which goes beyond describing the instantaneous state of the system.”
Image credits: Dreamstime
It is challenging to apply the technique in the real world, the researchers say, because it is not clear whether there is enough data to train the forecasting system. But all the same, they are optimistic about the applications, which could include anticipating damage to bridges and other structures.
Last week came a note of caution from MIT researchers, who warned that neural networks used to simulate real neural networks should be carefully examined for training bias.
Neural networks are of course based on the way our own neurons process and signal information, reinforcing certain connections and combinations of nodes. But this does not mean that the synthetic and real ones work the same way. In fact, the MIT team found, neural network-based simulations of grid cells (part of the nervous system) only produced similar activity when they were carefully constrained to do so by their creators. If allowed to govern themselves, as real cells do, they did not produce the desired behavior.
This does not mean that deep learning models are useless in this field – far from it, they are very valuable. However, as Professor Ila Fiete said in the school’s news post: “They can be a powerful tool, but one has to be very careful in interpreting them and deciding whether they really make de-novo predictions, or even shed light on what it is that the brain is optimizing.”