Highway network

Highway network

In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks. The advantage of the Highway Network over other deep learning architectures is its ability to overcome or partially prevent the vanishing gradient problem, thus improving its optimization. Gating mechanisms are used to facilitate information flow across the many layers ("information highways"). Highway Networks have found use in text sequence labeling and speech recognition tasks. In 2014, the state of the art was training deep neural networks with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train such networks: the Highway Network (published in May), and the residual neural network, or ResNet (December). ResNet behaves like an open-gated Highway Net. == Model == The model has two gates in addition to the H ( W H , x ) {\displaystyle H(W_{H},x)} gate: the transform gate T ( W T , x ) {\displaystyle T(W_{T},x)} and the carry gate C ( W C , x ) {\displaystyle C(W_{C},x)} . The latter two gates are non-linear transfer functions (specifically sigmoid by convention). The function H {\displaystyle H} can be any desired transfer function. The carry gate is defined as: C ( W C , x ) = 1 − T ( W T , x ) {\displaystyle C(W_{C},x)=1-T(W_{T},x)} while the transform gate is just a gate with a sigmoid transfer function. == Structure == The structure of a hidden layer in the Highway Network follows the equation: y = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ C ( x , W C ) = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ ( 1 − T ( x , W T ) ) {\displaystyle {\begin{aligned}y=H(x,W_{H})\cdot T(x,W_{T})+x\cdot C(x,W_{C})\\=H(x,W_{H})\cdot T(x,W_{T})+x\cdot (1-T(x,W_{T}))\end{aligned}}} == Related work == Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and attributed to it the reason why deep learning did not work well. To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute y t + 1 = F ( x t ) + x t {\textstyle y_{t+1}=F(x_{t})+x_{t}} . During backpropagation through time, this becomes the residual formula y = F ( x ) + x {\textstyle y=F(x)+x} for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span t. A later LSTM version published in 2000 modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, thus being opened, addressing the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM. The Highway Network of May 2015 applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers". It is like a 2000 LSTM with forget gates unfolded in time, while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM. If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks. The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961) and Lang & Witbrock (1988) which has the form x ↦ F ( x ) + A x {\displaystyle x\mapsto F(x)+Ax} . Here the randomly initialized weight matrix A does not have to be the identity mapping. Every residual connection is a skip connection, but almost all skip connections are not residual connections. The original Highway Network paper not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20, 50, and 100 layers networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20 layers counterpart (on the MNIST dataset, Figure 1 in ). No improvement on test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1 in ). The ResNet paper, however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3 in ). This is also why the forget gates of the 2000 LSTM were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are dominantly identity mappings.

Neural computation

Neural computation is the information processing performed by networks of neurons. Neural computation is affiliated with the philosophical tradition of computationalism, which advances the thesis that neural computation explains cognition. Warren McCulloch and Walter Pitts were the first to propose an account of neural activity as being computational in their seminal 1943 paper "A Logical Calculus of the Ideas Immanent in Nervous Activity." There are three general branches of computationalism, including classicism, connectionism, and computational neuroscience. All three branches agree that cognition is computation, however, they disagree on what sorts of computations constitute cognition. The classicism tradition believes that computation in the brain is digital, analogous to digital computing. Both connectionism and computational neuroscience do not require that the computations that realize cognition are necessarily digital computations. However, the two branches greatly disagree upon which sorts of experimental data should be used to construct explanatory models of cognitive phenomena. Connectionists rely upon behavioral evidence to construct models to explain cognitive phenomena, whereas computational neuroscience leverages neuroanatomical and neurophysiological information to construct mathematical models that explain cognition. When comparing the three main traditions of the computational theory of mind, as well as the different possible forms of computation in the brain, it is helpful to define what we mean by computation in a general sense. Computation is the processing of information, otherwise known as variables or entities, according to a set of rules. A rule in this sense is simply an instruction for executing a manipulation on the current state of the variable, in order to produce a specified output. In other words, a rule dictates which output to produce given a certain input to the computing system. A computing system is a mechanism whose components must be functionally organized to process the information in accordance with the established set of rules. The types of information processed by a computing system determine which type of computations it performs. Traditionally in cognitive science, there have been two proposed types of computation related to neural activity, digital and analog, with the vast majority of theoretical work incorporating a digital understanding of cognition. Computing systems that perform digital computation are functionally organized to execute operations on strings of digits with respect to the type and location of the digit on the string. It has been argued that neural spike train signaling implements some form of digital computation, since neural spikes may be considered as discrete units or digits, like 0 or 1—the neuron either fires an action potential or it does not. Accordingly, neural spike trains could be seen as strings of digits. Alternatively, analog computing systems perform manipulations on non-discrete, irreducibly continuous variables, that is, entities that vary continuously as a function of time. These sorts of operations are characterized by systems of differential equations. Neural computation can be studied by, for example, building models of neural computation. Work on artificial neural networks has been somewhat inspired by knowledge of neural computation.

Oculus Quill

Quill is a painting and animation software for virtual reality. It runs on Microsoft Windows with Oculus Rift headsets. It is used to create 3D paintings and animated cartoons. Quill was released on November 29, 2016, on the Oculus Store. Theater Elsewhere(formerly Quill Theater), an application for viewing creations made in Quill, was later made available following the release of the Oculus Quest. In September 2021, Facebook, now known as Meta Platforms, and the owner of Oculus, sold Quill to its original creator, who continues to develop and support the app. == Development == Quill was originally developed by Oculus Story Studio as an internal tool for the creative needs of the studio's project Dear Angelica directed by Saschka Unseld along with its art-director Wesley Allsbrook. == Controls == The software works on Oculus Rift utilizing its 6DoF motion controllers. Users can paint in 3D space using their hands naturally, and animate those paintings with keyframes. They can also capture videos and photos of their creations. == Reception == Dear Angelica, a VR story fully painted in Quill, was nominated for an Emmy Award in 2017.

Aseprite

Aseprite ( ace-prite) is a proprietary, source-available image editor designed primarily for pixel art drawing and animation. It runs on Windows, macOS, and Linux, and features different tools for image and animation editing such as layers, frames, tilemap support, command-line interface, Lua scripting, among others. It is developed by Igara Studio S.A. and led by the developers David, Gaspar, and Martín Capello. Aseprite can be downloaded as freeware, (albeit it does not have the ability to save sprites) or purchased on Steam or Itch.io. Aseprite source code and binaries are distributed under EULA, educational, and Steam proprietary licenses. == History == Aseprite, formerly known as Allegro Sprite Editor, had its first release in 2001 as a free software project under the GPLv2 license. This license was kept until August 2016 with version v1.1.8, when the developers switched to a EULA, thus making the software proprietary. On the 1st of September 2016, the main developer, David Capello, wrote a post on the Aseprite Devblog explaining this change. The EULA permits others to download the Aseprite source code, compile it, and use it for personal purposes, but forbids its redistribution to third parties. After the license change, LibreSprite, a free and open source version of it, was created. Both before and after the license change, Aseprite was sold online, on Steam, itch.io, and the project's website. The project's code repository was hosted on Google Code until August 2014, when it was migrated to GitHub, where it remains hosted to date. As of October 2022, its repository has had 68 contributors and around 19 thousand stars. From 2014 to 2021, Aseprite had 66 different releases. Aseprite was used in the development of several notable games such as TowerFall (2013), Celeste (2018), Minit (2018), Wargroove (2019), Loop Hero (2021), Eastward (2021), Unpacking (2021), Haiku the Robot (2022) and Pizza Tower (2023). == Design and features == The main design purpose of Aseprite is to create animated 2D pixel-art sprites. Some of its features include: Layers and frames, with layer grouping and animation tagging Pixel-art specific transformations and tools (pixel-perfect modes, custom brushes, etc.) Animation real-time preview and onion skinning Tilemap and tileset modes Color palette managing, including 65 default palettes Color profiles and modes (RGBA, indexed and grayscale) Non-square pixels Command line interface (CLI) and Lua scripting Aseprite uses its own binary file type to store data, which is typically saved with .ase or .aseprite extensions. Different third-party projects were developed to support parsing of .ase files in programming languages including C#, Python and JavaScript, and in game engines such as Unity and Godot. Images and animations can be exported to different file formats including PNG, GIF, FLC, FLI, JPEG, PCX, TGA, ICO, SVG, and bitmap (BMP).

Medical imaging

Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology). Medical imaging seeks to reveal internal structures hidden by the skin and bones, as well as to diagnose and treat disease. Medical imaging also establishes a database of normal anatomy and physiology to make it possible to identify abnormalities. Although imaging of removed organs and tissues can be performed for medical reasons, such procedures are usually considered part of pathology instead of medical imaging. Measurement and recording techniques that are not primarily designed to produce images, such as electroencephalography (EEG), magnetoencephalography (MEG), electrocardiography (ECG), and others, represent other technologies that produce data susceptible to representation as a parameter graph versus time or maps that contain data about the measurement locations. In a limited comparison, these technologies can be considered forms of medical imaging in another discipline of medical instrumentation. As of 2010, 5 billion medical imaging studies had been conducted worldwide. Radiation exposure from medical imaging in 2006 made up about 50% of total ionizing radiation exposure in the United States. Medical imaging equipment is manufactured using technology from the semiconductor industry, including CMOS integrated circuit chips, power semiconductor devices, sensors such as image sensors (particularly CMOS sensors) and biosensors, and processors such as microcontrollers, microprocessors, digital signal processors, media processors and system-on-chip devices. As of 2015, annual shipments of medical imaging chips amount to 46 million units and $1.1 billion. The term "noninvasive" is used to denote a procedure where no instrument is introduced into a patient's body, which is the case for most imaging techniques used. == History == In 1972, engineer Godfrey Hounsfield from the British company EMI invented the X-ray computed tomography device for head diagnosis, which is commonly referred to as computed tomography (CT). The CT nucleus method is based on the projecting X-rays through a section of the human head, which are then processed by computer to reconstruct the cross-sectional image, known as image reconstruction. In 1975, EMI successfully developed a CT device for the entire body, enabling the clear acquisition of tomographic images of various parts of the human body. This revolutionary diagnostic technique earned Hounsfield and physicist Allan Cormack the Nobel Prize in Physiology or Medicine in 1979. Digital image processing technology for medical applications was inducted into the Space Foundation's Space Technology Hall of Fame in 1994. By 2010, over 5 billion medical imaging studies had been conducted worldwide. Radiation exposure from medical imaging in 2006 accounted for about 50% of total ionizing radiation exposure in the United States. Medical imaging equipment is manufactured using technology from the semiconductor industry, including CMOS integrated circuit chips, power semiconductor devices, sensors such as image sensors (particularly CMOS sensors) and biosensors, as well as processors like microcontrollers, microprocessors, digital signal processors, media processors and system-on-chip devices. As of 2015, annual shipments of medical imaging chips reached 46 million units, generating a market value of $1.1 billion. == Types == In the clinical context, "invisible light" medical imaging is generally equated to radiology or "clinical imaging". "Visible light" medical imaging involves digital video or still pictures that can be seen without special equipment. Dermatology and wound care are two modalities that use visible light imagery. Interpretation of medical images is generally undertaken by a physician specialising in radiology known as a radiologist; however, this may be undertaken by any healthcare professional who is trained and certified in radiological clinical evaluation. Increasingly interpretation is being undertaken by non-physicians, for example radiographers frequently train in interpretation as part of expanded practice. Diagnostic radiography designates the technical aspects of medical imaging and in particular the acquisition of medical images. The radiographer (also known as a radiologic technologist) is usually responsible for acquiring medical images of diagnostic quality; although other professionals may train in this area, notably some radiological interventions performed by radiologists are done so without a radiographer. As a field of scientific investigation, medical imaging constitutes a sub-discipline of biomedical engineering, medical physics or medicine depending on the context: Research and development in the area of instrumentation, image acquisition (e.g., radiography), modeling and quantification are usually the preserve of biomedical engineering, medical physics, and computer science; Research into the application and interpretation of medical images is usually the preserve of radiology and the medical sub-discipline relevant to medical condition or area of medical science (neuroscience, cardiology, psychiatry, psychology, etc.) under investigation. Many of the techniques developed for medical imaging also have scientific and industrial applications. === Radiography === Two forms of radiographic images are in use in medical imaging. Projection radiography and fluoroscopy, with the latter being useful for catheter guidance. These 2D techniques are still in wide use despite the advance of 3D tomography due to the low cost, high resolution, and depending on the application, lower radiation dosages with 2D technique. This imaging modality uses a wide beam of X-rays for image acquisition and is the first imaging technique available in modern medicine. Fluoroscopy produces real-time images of internal structures of the body in a similar fashion to radiography, but employs a constant input of X-rays, at a lower dose rate. Contrast media, such as barium, iodine, and air are used to visualize internal organs as they work. Fluoroscopy is also used in image-guided procedures when constant feedback during a procedure is required. An image receptor is required to convert the radiation into an image after it has passed through the area of interest. Early on, this was a fluorescing screen, which gave way to an Image Amplifier (IA) which was a large vacuum tube that had the receiving end coated with cesium iodide, and a mirror at the opposite end. Eventually the mirror was replaced with a TV camera. Projectional radiographs, more commonly known as X-rays, are often used to determine the type and extent of a fracture as well as for detecting pathological changes in the lungs. With the use of radio-opaque contrast media, such as barium, they can also be used to visualize the structure of the stomach and intestines – this can help diagnose ulcers or certain types of colon cancer. === Magnetic resonance imaging === A magnetic resonance imaging instrument (MRI scanner), or "nuclear magnetic resonance (NMR) imaging" scanner as it was originally known, uses powerful magnets to polarize and excite hydrogen nuclei (i.e., single protons) of water molecules in human tissue, producing a detectable signal that is spatially encoded, resulting in images of the body. The MRI machine emits a radio frequency (RF) pulse at the resonant frequency of the hydrogen atoms on water molecules. Radio frequency antennas ("RF coils") send the pulse to the area of the body to be examined. The RF pulse is absorbed by protons, causing their direction with respect to the primary magnetic field to change. When the RF pulse is turned off, the protons "relax" back to alignment with the primary magnet and emit radio waves in the process. This radio-frequency emission from the hydrogen atoms on water is what is detected and reconstructed into an image. The resonant frequency of a spinning magnetic dipole (of which protons are one example) is called the Larmor frequency and is determined by the strength of the main magnetic field and the chemical environment of the nuclei of interest. MRI uses three electromagnetic fields: a very strong (typically 1.5 to 3 teslas) static magnetic field to polarize the hydrogen nuclei, called the primary field; gradient fields that can be modified to vary in space and time (on the order of 1 kHz) for spatial encoding, often simply called gradients; and a spatially homogeneous radio-frequency (RF) field for manipulation of the hydrogen nuclei to produce measurable signals, collected through an RF antenna. Like CT, MRI traditionally creates a two-dimensional image of a thin "slice" of the body and is therefore considered a tomographic imaging technique. Modern MRI instruments are capable of producing images in the form of 3D blocks, which may be considered a generalization of the single-slice

Local ternary patterns

Local ternary patterns (LTP) are an extension of local binary patterns (LBP). Unlike LBP, it does not threshold the pixels into 0 and 1, rather it uses a threshold constant to threshold pixels into three values. Considering k as the threshold constant, c as the value of the center pixel, a neighboring pixel p, the result of threshold is: { 1 , if p > c + k 0 , if p > c − k and p < c + k − 1 if p < c − k {\displaystyle {\begin{cases}1,&{\text{if }}p>c+k\\0,&{\text{if }}p>c-k{\text{ and }}p

Scene statistics

Scene statistics is a discipline within the field of perception. It is concerned with the statistical regularities related to scenes. It is based on the premise that a perceptual system is designed to interpret scenes. Biological perceptual systems have evolved in response to physical properties of natural environments. Therefore natural scenes receive a great deal of attention. Natural scene statistics are useful for defining the behavior of an ideal observer in a natural task, typically by incorporating signal detection theory, information theory or estimation theory. == Within-domain versus across-domain == Geisler (2008) distinguishes between four kinds of domains: (1) Physical environments (2) Images/Scenes (3) Neural responses and (4) Behavior. Within the domain of images/scenes one can study the characteristics of information related to redundancy and efficient coding. Across-domain statistics determine how an autonomous system should make inferences about its environment, process information and control its behavior. To study these statistics it is necessary to sample or register information in multiple domains simultaneously. == Applications == === Prediction of picture and video quality === One of the most successful applications of Natural Scenes Statistics Models has been perceptual picture and video quality prediction. For example, the Visual Information Fidelity (VIF) algorithm, which is used to measure the degree of distortion of pictures and videos, is used extensively by the image and video processing communities to assess perceptual quality. This is often after processing, such as compression, which can degrade the appearance of a visual signal. The premise is that the scene statistics are changed by distortion and that the visual system is sensitive to the changes in the scene statistics. VIF is heavily used in the streaming television industry. Other popular picture quality models that use natural scene statistics include BRISQUE and NIQE, both of which are no-reference since they do not require any reference picture to measure quality against.