RIPAC (microprocessor)

RIPAC (microprocessor)

RIPAC was a VLSI single-chip microprocessor designed for automatic recognition of the connected speech, one of the first of this use. The project of the microprocessor RIPAC started in 1984. RIPAC was aimed to provide efficient real-time speech recognition services to the italian telephone system provided by SIP. The microprocessor was presented in September 1986 at The Hague (Netherlands) at EUSPICO conference. It was composed of 70.000 transistors and structured as Harvard architecture. The name RIPAC is the acronym for "Riconoscimento del PArlato Connesso", that means "Recognition of the connected speech" in Italian. The microprocessor was designed by the Italian companies CSELT and ELSAG and was produced by SGS: a combination of Hidden Markov Model and Dynamic Time Warping algorithms was used for processing speech signals. It was able to do real-time speech recognition of Italian and many languages with a good affordability. The chip, issued by U.S. Patent No. 4,907,278, worked at first run.

AIXI

AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000 and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence. AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs. == Etymology == According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by ξ {\displaystyle \xi } (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations. == Definition == AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment μ {\displaystyle \mu } . The interaction proceeds in time steps, from t = 1 {\displaystyle t=1} to t = m {\displaystyle t=m} , where m ∈ N {\displaystyle m\in \mathbb {N} } is the lifespan of the AIXI agent. At time step t, the agent chooses an action a t ∈ A {\displaystyle a_{t}\in {\mathcal {A}}} (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" e t ∈ E = O × R {\displaystyle e_{t}\in {\mathcal {E}}={\mathcal {O}}\times \mathbb {R} } , which consists of an "observation" o t ∈ O {\displaystyle o_{t}\in {\mathcal {O}}} (e.g., a camera image) and a reward r t ∈ R {\displaystyle r_{t}\in \mathbb {R} } , distributed according to the conditional probability μ ( o t r t | a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t ) {\displaystyle \mu (o_{t}r_{t}|a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t})} , where a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t {\displaystyle a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t}} is the "history" of actions, observations and rewards. The environment μ {\displaystyle \mu } is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that μ {\displaystyle \mu } is computable, that is, the observations and rewards received by the agent from the environment μ {\displaystyle \mu } can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent. The only goal of the AIXI agent is to maximize ∑ t = 1 m r t {\displaystyle \sum _{t=1}^{m}r_{t}} , that is, the sum of rewards from time step 1 to m. The AIXI agent is associated with a stochastic policy π : ( A × E ) ∗ → A {\displaystyle \pi :({\mathcal {A}}\times {\mathcal {E}})^{}\rightarrow {\mathcal {A}}} , which is the function it uses to choose actions at every time step, where A {\displaystyle {\mathcal {A}}} is the space of all possible actions that AIXI can take and E {\displaystyle {\mathcal {E}}} is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) μ {\displaystyle \mu } can also be thought of as a stochastic policy (which is a function): μ : ( A × E ) ∗ × A → E {\displaystyle \mu :({\mathcal {A}}\times {\mathcal {E}})^{}\times {\mathcal {A}}\rightarrow {\mathcal {E}}} , where the ∗ {\displaystyle } is the Kleene star operation. In general, at time step t {\displaystyle t} (which ranges from 1 to m), AIXI, having previously executed actions a 1 … a t − 1 {\displaystyle a_{1}\dots a_{t-1}} (which is often abbreviated in the literature as a < t {\displaystyle a_{

Nouvelle AI

Nouvelle artificial intelligence (Nouvelle AI) is an approach to artificial intelligence pioneered in the 1980s by Rodney Brooks, who was then part of MIT artificial intelligence laboratory. Nouvelle AI differs from classical AI by aiming to produce robots with intelligence levels similar to insects. Researchers believe that intelligence can emerge organically from simple behaviors as these intelligences interacted with the "real world", instead of using the constructed worlds which symbolic AIs typically needed to have programmed into them. == Motivation == The differences between nouvelle AI and symbolic AI are apparent in early robots Shakey and Freddy. These robots contained an internal model (or "representation") of their micro-worlds consisting of symbolic descriptions. As a result, this structure of symbols had to be renewed as the robot moved or the world changed. Shakey's planning programs assessed the program structure and broke it down into the necessary steps to complete the desired action. This level of computation required a large amount time to process, so Shakey typically performed its tasks very slowly. Symbolic AI researchers had long been plagued by the problem of updating, searching, and otherwise manipulating the symbolic worlds inside their AIs. A nouvelle system refers continuously to its sensors rather than to an internal model of the world. It processes the external world information it needs from the senses when it is required. As Brooks puts it, "the world is its own best model--always exactly up to date and complete in every detail." A central idea of nouvelle AI is that simple behaviors combine to form more complex behaviors over time. For example, simple behaviors can include elements like "move forward" and "avoid obstacles." A robot using nouvelle AI with simple behaviors like collision avoidance and moving toward a moving object could possibly come together to produce a more complex behavior like chasing a moving object. === The frame problem === The frame problem describes an issue with using first-order logic (FOL) to express facts about a robot in the world. Representing the state of a robot with traditional FOL requires the use of many axioms (symbolic language) to imply that things about an environment do not change arbitrarily. Nouvelle AI seeks to sidestep the frame problem by dispensing with filling the AI or robot with volumes of symbolic language and instead letting more complex behaviors emerge by combining simpler behavioral elements. === Embodiment === The goal of traditional AI was to build intelligences without bodies, which would only have been able to interact with the world via keyboard, screen, or printer. However, nouvelle AI attempts to build embodied intelligence situated in the real world. Brooks quotes approvingly from the brief sketches that Turing gave in 1948 and 1950 of the "situated" approach. Turing wrote of equipping a machine "with the best sense organs that money can buy" and teaching it "to understand and speak English" by a process that would "follow the normal teaching of a child." This approach was contrasted to the others where they focused on abstract activities such as playing chess. == Brooks' robots == === Insectoid robots === Brooks focused on building robots that acted like simple insects while simultaneously working to remove some traditional AI characteristics. He created insect-like robots, named Allen and Herbert after cognitive science and AI pioneers Allen Newell and Herbert A. Simon. Brooks's insectoid robots contained no internal models of the world. Herbert, for example, discarded a high volume of the information received from its sensors and never stored information for more than two seconds. ==== Allen ==== Allen had a ring of twelve ultrasonic sonars as its primary sensors and three independent behavior-producing modules. These modules were programmed to avoid both stationary and moving objects. With only this module activated, Allen stayed in the middle of a room until an object approached and then it ran away while avoiding obstacles in its way. ==== Herbert ==== Herbert used infrared sensors to avoid obstacles and a laser system to collect 3D data over a distance of about 12 feet. Herbert also carried a number of simple sensors in its "hand." The robot's testing ground was the real world environment of the busy offices and workspaces of the MIT AI lab where it searched for empty soda cans and carried them away, a seemingly goal-oriented activity that emerged as a result of 15 simple behavior units combining. As a parallel, Simon noted that an ant's complicated path is due to the structure of its environment rather than the depth of its thought processes. ==== Other insectoid robots ==== Other robots by Brooks' team were Genghis and Squirt. Genghis had six legs and was able to walk over rough terrain and follow a human. Squirt's behavior modules had it stay in dark corners until it heard a noise, then it would begin to follow the source of the noise. Brooks agreed that the level of nouvelle AI had come near the complexity of a real insect, which raised a question about whether or not insect level-behavior was and is a reasonable goal for nouvelle AI. === Humanoid robots === Brooks' own recent work has taken the opposite direction to that proposed by Von Neumann in the quotations "theorists who select the human nervous system as their model are unrealistically picking 'the most complicated object under the sun,' and that there is little advantage in selecting instead the ant, since any nervous system at all exhibits exceptional complexity." ==== Cog ==== In the 1990s, Brooks decided to pursue the goal of human-level intelligence and, with Lynn Andrea Stein, built a humanoid robot called Cog. Cog is a robot with an extensive collection of sensors, a face, and arms (among other features) that allow it to interact with the world and gather information and experience so as to assemble intelligence organically in the manner described above by Turing. The team believed that Cog would be able to learn and able to find a correlation between the sensory information it received and its actions, and to learn common sense knowledge on its own. As of 2003, all development of the project had ceased.

Progress in artificial intelligence

Progress in artificial intelligence (AI) refers to the advances, milestones, and breakthroughs that have been achieved in the field of artificial intelligence over time. AI is a branch of computer science that aims to create machines and systems capable of performing tasks that typically require human intelligence. AI applications have been used in a wide range of fields including medical diagnosis, finance, robotics, law, video games, agriculture, and scientific discovery. The society as a whole is looking for artificial intelligence to be on a key factor in the upcming years because of its potential. However, many AI applications are not perceived as AI: "A lot of cutting-edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." "Many thousands of AI applications are deeply embedded in the infrastructure of every industry." In the late 1990s and early 2000s, AI technology became widely used as elements of larger systems, but the field was rarely credited for these successes at the time. Kaplan and Haenlein structure artificial intelligence along three evolutionary stages: Artificial narrow intelligence – AI capable only of specific tasks; Artificial general intelligence – AI with ability in several areas, and able to autonomously solve problems they were never even designed for; Artificial superintelligence – AI capable of general tasks, including scientific creativity, social skills, and general wisdom. To allow comparison with human performance, artificial intelligence can be evaluated on constrained and well-defined problems. Such tests have been termed subject-matter expert Turing tests. Also, smaller problems provide more achievable goals and there are an ever-increasing number of positive results. In 2023, humans still substantially outperformed both GPT-4 and other models tested on the ConceptARC benchmark. Those models scored 60% on most, and 77% on one category, while humans scored 91% on all and 97% on one category. However, later research in 2025 showed that human-generated output grids were only accurate 73% of the time, while AI models available that year managed to score above 77%. == History == Increasing, promoting or constraining AI progress has often be done via controlling or increasing the amount of compute. == Current performance in specific areas == There are many useful abilities that can be described as showing some form of intelligence. This gives better insight into the comparative success of artificial intelligence in different areas. AI, like electricity or the steam engine, is a general-purpose technology. There is no consensus on how to characterize which tasks AI tends to excel at. Some versions of Moravec's paradox observe that humans are more likely to outperform machines in areas such as physical dexterity that have been the direct target of natural selection. While projects such as AlphaZero have succeeded in generating their own knowledge from scratch, many other machine learning projects require large training datasets. Researcher Andrew Ng has suggested, as a "highly imperfect rule of thumb", that "almost anything a typical human can do with less than one second of mental thought, we can probably now or in the near future automate using AI." Games provide a high-profile benchmark for assessing rates of progress; many games have a large professional player base and a well-established competitive rating system. AlphaGo brought the era of classical board-game benchmarks to a close when Artificial Intelligence proved their competitive edge over humans in 2016. Deep Mind's AlphaGo AI software program defeated the world's best professional Go Player Lee Sedol. Games of imperfect knowledge provide new challenges to AI in the area of game theory; the most prominent milestone in this area was brought to a close by Libratus' poker victory in 2017. E-sports continue to provide additional benchmarks; Facebook AI, Deepmind, and others have engaged with the popular StarCraft franchise of videogames. Broad classes of outcome for an AI test may be given as: optimal: it is not possible to perform better (note: some of these entries were solved by humans) super-human: performs better than all humans high-human: performs better than most humans par-human: performs similarly to most humans sub-human: performs worse than most humans === Optimal === Tic-tac-toe Connect Four: 1988 Checkers (aka 8x8 draughts): Weakly solved (2007) Rubik's Cube: Mostly solved (2010) Heads-up limit hold'em poker: Statistically optimal in the sense that "a human lifetime of play is not sufficient to establish with statistical significance that the strategy is not an exact solution" (2015) === Super-human === Othello (aka reversi): c. 1997 Scrabble: 2006 Backgammon: c. 1995–2002 Chess: Supercomputer (c. 1997); Personal computer (c. 2006); Mobile phone (c. 2009); Computer defeats human + computer (c. 2017) Jeopardy!: Question answering, although the machine did not use speech recognition (2011) Arimaa: 2015 Shogi: c. 2017 Go: 2017 Heads-up no-limit hold'em poker: 2017 Six-player no-limit hold'em poker: 2019 Gran Turismo Sport: 2022 === High-human === Crosswords: c. 2012 Freeciv: 2016 Dota 2: 2018 Bridge card-playing: According to a 2009 review, "the best programs are attaining expert status as (bridge) card players", excluding bidding. StarCraft II: 2019 Mahjong: 2019 Stratego: 2022 No-Press Diplomacy: 2022 Hanabi: 2022 Natural language processing === Par-human === Optical character recognition for ISO 1073-1:1976 and similar special characters. Classification of images Handwriting recognition Facial recognition Visual question answering SQuAD 2.0 English reading-comprehension benchmark (2019) SuperGLUE English-language understanding benchmark (2020) Some school science exams (2019) Some tasks based on Raven's Progressive Matrices Many Atari 2600 games (2015) === Sub-human === Optical character recognition for printed text (nearing par-human for Latin-script typewritten text) Object recognition Various robotics tasks that may require advances in robot hardware as well as AI, including: Stable bipedal locomotion: Bipedal robots can walk, but are less stable than human walkers (as of 2017) Humanoid soccer Speech recognition: "nearly equal to human performance" (2017) Explainability. Current medical systems can diagnose certain medical conditions well, but cannot explain to users why they made the diagnosis. Many tests of fluid intelligence (2020) Bongard visual cognition problems, such as the Bongard-LOGO benchmark (2020) Visual Commonsense Reasoning (VCR) benchmark (as of 2020) Stock market prediction: Financial data collection and processing using Machine Learning algorithms Angry Birds video game, as of 2020 Various tasks that are difficult to solve without contextual knowledge, including: Translation Word-sense disambiguation == Proposed tests of artificial intelligence == In his famous Turing test, Alan Turing picked language, the defining feature of human beings, for its basis. The Turing test is now considered too exploitable to be a meaningful benchmark. The Feigenbaum test, proposed by the inventor of expert systems, tests a machine's knowledge and expertise about a specific subject. A paper by Jim Gray of Microsoft in 2003 suggested extending the Turing test to speech understanding, speaking and recognizing objects and behavior. Proposed "universal intelligence" tests aim to compare how well machines, humans, and even non-human animals perform on problem sets that are generic as possible. At an extreme, the test suite can contain every possible problem, weighted by Kolmogorov complexity; however, these problem sets tend to be dominated by impoverished pattern-matching exercises where a tuned AI can easily exceed human performance levels. == Exams == According to OpenAI, in 2023 GPT-4 achieved high scores on several standardized and professional examinations, including around the 90th percentile on the Uniform Bar Exam, the 89th percentile on the mathematics section of the SAT, the 93rd percentile on SAT Reading and Writing, the 54th percentile on the analytical writing section of the GRE, the 88th percentile on GRE quantitative reasoning, and the 99th percentile on GRE verbal reasoning. OpenAI also reported that GPT-4 scored in the 99th to 100th percentile on the 2020 USA Biology Olympiad semifinal exam and earned top scores on several AP exams. Independent researchers found in 2023 that ChatGPT based on GPT-3.5 performed "at or near the passing threshold" on all three parts of the United States Medical Licensing Examination (USMLE), suggesting that large language models could reach passing-level performance on some medical knowledge assessments even without domain-specific fine-tuning. GPT-3.5 was also reported to attain a low but passing grade on examinations for four law school courses at the University of Minnes

List of artificial intelligence journals

This is a list of notable peer-reviewed academic journals that publish research in the field of artificial intelligence (AI), including areas such as machine learning, computer vision, natural language processing, robotics, and intelligent systems. == General artificial intelligence == Artificial Intelligence (journal) – Elsevier Journal of Artificial Intelligence Research (JAIR) – AI Access Foundation Knowledge-Based Systems – Elsevier == Machine learning == Data Mining and Knowledge Discovery – Springer Machine Learning (journal) – Springer Journal of Machine Learning Research – Microtome Pattern Recognition (journal) – Elsevier Neural Networks (journal) – Elsevier Neural Computation (journal) – MIT Press Neurocomputing (journal) - Elsevier == Deep learning and neural computation == IEEE Transactions on Evolutionary Computation – IEEE IEEE Transactions on Neural Networks and Learning Systems – IEEE Nature Machine Intelligence – Springer Nature == Computer vision == International Journal of Computer Vision – Springer IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE Machine Vision and Applications – Springer == Natural language processing == Computational Linguistics (journal) – MIT Press Natural Language Processing Transactions of the Association for Computational Linguistics – ACL == Robotics and intelligent systems == IEEE Transactions on Robotics – IEEE Autonomous Robots – Springer Journal of Intelligent & Robotic Systems – Springer == Interdisciplinary and ethics in AI == AI & Society – Springer Artificial Life – MIT Press Philosophy & Technology – Springer Minds and Machines – Springer

BigDog

BigDog is a dynamically stable quadruped military robot platform that was created in 2005 by Boston Dynamics with the Harvard University Concord Field Station. It was funded by the U.S. Defense Advanced Research Projects Agency (DARPA), but the project was shelved after the BigDog's gas engine was deemed too loud for combat. == History == BigDog was funded by the Defense Advanced Research Projects Agency (DARPA) in the hopes that it would be able to serve as a mechanic pack mule to accompany soldiers in terrain too rough for conventional vehicles. Instead of wheels or treads, BigDog uses four legs for movement, allowing it to move across surfaces that would be difficult for wheels. The legs contain a variety of sensors, including joint position and ground contact. BigDog also features a laser gyroscope and a stereo vision system. BigDog is 3 feet (0.91 m) long, stands 2.5 feet (0.76 m) tall, and weighs 240 pounds (110 kg), making it about the size of a small mule. It is capable of traversing difficult terrain, running at four miles per hour (6.4 km/h), carrying 340 pounds (150 kg), and climbing a 35 degree incline. Locomotion is controlled by an onboard computer that receives input from the robot's various sensors. Navigation and balance are also managed by the control system. BigDog's walking pattern is controlled through four legs, each equipped with four low-friction hydraulic cylinder actuators that power the joints. BigDog's locomotion behaviors can vary greatly. It can stand up, sit down, walk with a crawling gait that lifts one leg at a time, walk with a trotting gait lifting diagonal legs, or trot with a running gait. The travel speed of BigDog varies from a 0.62 mph (1 km/h) crawl to a 3.3 mph (5.3 km/h) trot. The BigDog project was headed by Dr. Martin Buehler, who received the Joseph Engelberger Award from the Robotics Industries Association in 2012 for the work. Dr. Buehler while previously a professor at McGill University, headed the robotics lab there, developing four-legged walking and running robots. Built onto the actuators are sensors for joint position and force, and movement is ultimately controlled through an onboard computer which manages the sensors. Approximately 50 sensors are located on BigDog. These measure the attitude and acceleration of the body, motion, and force of joint actuators as well as engine speed, temperature and hydraulic pressure inside the robot's internal engine. Low-level control, such as position and force of the joints, and high-level control such as velocity and altitude during locomotion, are both controlled through the onboard computer. BigDog was featured in episodes of Web Junk 20 and Hungry Beast, and in articles in New Scientist, Popular Science, Popular Mechanics, and The Wall Street Journal. In September 2011 Boston Dynamics released video footage of a new generation of BigDog known as AlphaDog. The footage shows AlphaDog's ability to walk on rough terrain and recover its balance when kicked from the side. The refined equivalent has been designed by Boston Dynamics to exceed the BigDog in terms of capabilities and use to dismounted soldiers. In February 2012, with further DARPA support, the militarized Legged Squad Support System (LS3) variant of BigDog demonstrated its capabilities during a hike over a rough terrain. Starting in the summer of 2012, DARPA planned to complete the overall development of the system and refine its key capabilities in 18 months, ensuring its worth to dismounted warfighters before it is rolled out to squads operating in-theatre. BigDog must be able to demonstrate its ability to complete a 20-mile (32 km) trail in 24 hours, without refuelling, while carrying a 325-pound (150 kg) load. A refinement of its vision sensors will also be conducted. At the end of February 2013, Boston Dynamics released video footage of a modified BigDog with an arm. The arm could pick up objects and throw them. The robot is relying on its legs and torso to help power the motions of the arm. It is believed that it can lift weights around 55 pounds (25 kg). This work was funded by the United States Army Research Laboratory and paved the way for integrating manipulators with quadrupeds as found on Spot, the spiritual successor of BigDog. === Discontinuation === At the end of December 2013, the BigDog project was discontinued. Despite hopes that it would one day work like a pack mule for US soldiers in the field, the gasoline-powered engine was deemed too noisy for use in combat, and it could be heard from hundreds of meters away. A similar project for an all-electric robot named Spot in 2016 was much quieter, but could only carry 45 pounds (20 kg). Both projects are no longer in progress, but the Spot was only released in 2020. == Hardware == BigDog is powered by a small two-stroke, one-cylinder, 15-brake-horsepower (11 kW) engine operating at 9,000 RPM. The engine drives a hydraulic pump, which in turn drives the hydraulic leg actuators. Each leg has four actuators (two for the hip joint, and two each for the knee and ankle joints), for a total of 16. Each actuator unit consists of a hydraulic cylinder, servo valve, position sensor, and force sensor. Onboard computing power is a ruggedized PC/104 board stack with two computers, one running a Pentium M processor running QNX (used for sensor data processing) and another running a Core Duo processor (used for visual data processing). == Gallery ==

Model compression

Model compression is a machine learning technique for reducing the size of trained models. Large models can achieve high accuracy, but often at the cost of significant resource requirements. Compression techniques aim to compress models without significant performance reduction. Smaller models require less storage space, and consume less memory and compute during inference. Compressed models enable deployment on resource-constrained devices such as smartphones, embedded systems, edge computing devices, and consumer electronics computers. Efficient inference is also valuable for large corporations that serve large model inference over an API, allowing them to reduce computational costs and improve response times for users. Model compression is not to be confused with knowledge distillation, in which a smaller "student" model is trained to imitate the input-output behavior of a larger "teacher" model (as opposed to using the "teacher"'s trained parameters or the "teacher"'s training targets). == Techniques == Several techniques are employed for model compression. === Pruning === Pruning sparsifies a large model by setting some parameters to exactly zero. This effectively reduces the number of parameters. This allows the use of sparse matrix operations, which are faster than dense matrix operations. Pruning criteria can be based on magnitudes of parameters, the statistical pattern of neural activations, Hessian values, etc. === Quantization === Quantization reduces the numerical precision of weights and activations. For example, instead of storing weights as 32-bit floating-point numbers, they can be represented using 8-bit integers. Low-precision parameters take up less space, and takes less compute to perform arithmetic with. It is also possible to quantize some parameters more aggressively than others, so for example, a less important parameter can have 8-bit precision while another, more important parameter, can have 16-bit precision. Inference with such models requires mixed-precision arithmetic. Quantized models can also be used during training (rather than after training). PyTorch implements automatic mixed-precision (AMP), which performs autocasting, gradient scaling, and loss scaling. === Low-rank factorization === Weight matrices can be approximated by low-rank matrices. Let W {\displaystyle W} be a weight matrix of shape m × n {\displaystyle m\times n} . A low-rank approximation is W ≈ U V T {\displaystyle W\approx UV^{T}} , where U {\displaystyle U} and V {\displaystyle V} are matrices of shapes m × k , n × k {\displaystyle m\times k,n\times k} . When k {\displaystyle k} is small, this both reduces the number of parameters needed to represent W {\displaystyle W} approximately, and accelerates matrix multiplication by W {\displaystyle W} . Low-rank approximations can be found by singular value decomposition (SVD). The choice of rank for each weight matrix is a hyperparameter, and jointly optimized as a mixed discrete-continuous optimization problem. The rank of weight matrices may also be pruned after training, taking into account the effect of activation functions like ReLU on the implicit rank of the weight matrices. == Training == Model compression may be decoupled from training, that is, a model is first trained without regard for how it might be compressed, then it is compressed. However, it may also be combined with training. The "train big, then compress" method trains a large model for a small number of training steps (less than it would be if it were trained to convergence), then heavily compress the model. It is found that at the same compute budget, this method results in a better model than lightly compressed, small models. In Deep Compression, the compression has three steps. First loop (pruning): prune all weights lower than a threshold, then finetune the network, then prune again, etc. Second loop (quantization): cluster weights, then enforce weight sharing among all weights in each cluster, then finetune the network, then cluster again, etc. Third step: Use Huffman coding to losslessly compress the model. The SqueezeNet paper reported that Deep Compression achieved a compression ratio of 35 on AlexNet, and a ratio of ~10 on SqueezeNets.