AI Video Generator Tools

AI Video Generator Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Hi uTandem

    Hi uTandem

    Hi uTandem, also known as uTandem, is a free language exchange mobile app. It helps people to connect with other language learners in order to carry out face-to-face language exchange sessions and also offers learners lists of businesses in the field of language learning or language exchange. == Use == Hi uTandem is built around the concept of language exchange, which is a method of language learning based on mutual oral linguistic exchange between partners. Ideally, each partner is a native speaker of the language they are helping their counterpart to learn. The app designed for users to chat with other users and translate messages, find suitable language partners and to locate language schools, bars, cafés and language exchange groups around them. == Team and development == Hi uTandem was released in January, 2016. The initial idea was conceived by Alberto Rodríguez as part of a team of eight Spanish youngsters. Hi uTandem belongs to the company Velvor Tech S.L., founded by the same members and registered in Ronda (Spain). == Reception == Hi uTandem was listed on the Top 4 Apps to Learn Languages list by ElPlural.com and since its launch it has been featured in numerous online and physical sources, including 20 minutos, Europapress, ABC Andalucía and Telefónica's Think Big Blog.

    Read more →
  • Automatic scorer

    Automatic scorer

    An automatic scorer is the computerized scoring system to keep track of scoring in ten-pin bowling. It was introduced en masse in bowling alleys in the 1970s and combined with mechanical pinsetters to detect overturned pins. By eliminating the need for manual score-keeping, these systems have introduced new bowlers into the game who otherwise would not participate because they had to count the score themselves, as many do not understand the mathematical formula involved in bowler scoring. At first, people were skeptical about whether a computer could keep an accurate score. In the twenty-first century, automatic scorers are used in most bowling centers around the world. The three manufacturers of these specialty computers have been Brunswick Bowling, AMF Bowling (later QubicaAMF), and RCA. == History == Automatic equipment is considered a cornerstone of the modern bowling center. The traditional bowling center of the early 20th century was advanced in automation when the pinsetter person ("pin boy"), who set back up by hand the bowled down pins, was replaced by a machine that automatically replaced the pins in their proper play positions. This machine came out in the 1950s. A detection system was developed from the pinsetter mechanism in the 1960s that could tell which pins had been knocked down, and that information could be transferred to a digital computer. Automatic electronic scoring was first conceived by Robert Reynolds, who was described by a newspaper story at the time as "a West Coast electronics calculator expert." He worked with the technical staff of Brunswick Bowling to develop it. The goal was realized in the late 1960s when a specialized computer was designed for the purpose of automatic scorekeeping for bowling. The field test for the automatic scorer took place at Village Lanes bowling center, Chicago in 1967. The scoring machine received approval for official use by the American Bowling Congress in August of that year. They were first used in national official league gaming on October 10, 1967. In November, Brunswick announced that they were accepting orders for the new digital computer, which cost around $3,000 per bowling lane. Bowling centers that installed these new automatic scoring devices in the 1970s charged a ten cents extra per line of scoring for the convenience. == Description == Each Automatic Scorer computer unit kept score for four lanes. It had two bowler identification panels serving two lanes each. The bowler pushed it into his named position when his turn came up so the computer knew who was bowling and score accordingly. After the bowler rolled the bowling ball down the lane and knocked down pins, the pinsetter detected which pins were down and relayed this information back to the computer for scoring. The result was then printed on a scoresheet and projected overhead onto a large screen for all to see. The Automatic Scorer digital computer was mathematically accurate, however the detection system at the pinsetter mechanism sometimes reported the wrong number of pins knocked down. The computer could be corrected manually for any errors in the system; similarly, human errors, such as neglecting to move the bowler identification mechanism, could be corrected for by manual action. The scorer could take into account bowlers' handicaps and could adjust for late-arriving bowlers. The automatic scorer is directly connected to the foul detection unit. As a result, foul line violations are automatically scored. Brunswick had put ten years of research and development into the Automatic Scorer, and by 1972 there were over 500 of these computers installed in bowling centers around the world. AMF Bowling, competitor to Brunswick, entered into the automatic scorer computer field during the 1970s and their systems were installed into their brand of bowling centers. By 1974, RCA was also making these computers for automatic scoring. == Reception and further developments == The purposes of the computerized scoring were to avoid errors by human scorers and to prevent cheating. It had the side benefit of speeding up the progress of the game and introducing new bowlers to the game. Score-keeping for bowling is based on a formula that many new to bowling were not familiar with and thought difficult to learn. These casual bowlers unfamiliar with the formula thought the scores given by the computers were confusing. Some bowlers were not comfortable with automatic scorers when they were introduced in the 1970s, so kept score using the traditional method on paper score sheets. The introduction of this device increased the popularity of the sport. Automatic scorers came to be considered a normal part of modern bowling installations worldwide, with owners and managers saying that bowlers expect such equipment to be present in bowling establishments and that business increased following their introduction. Brunswick introduced a color television style automatic scorer in 1983. Bowling center owners could use these style automatic scorers for advertising, management, videos, and live television. By the 2010s, these types of electronic visual displays could show bowler avatars and social media connections to publicize the bowlers' scores. Some are capable of being extended entertainment systems of games for children and adults. Some scoring systems support variations on traditional bowling, such as different kinds of bingo games where certain pins have to be knocked down at certain times or practice regimes where certain spares have to be accomplished. By this point, QubicaAMF Worldwide, an outgrowth of AMF, was one of the leading providers of bowling scoring equipment.

    Read more →
  • Adobe After Effects

    Adobe After Effects

    Adobe After Effects is a digital effects, motion graphics, and compositing application developed by Adobe Inc.; it is used for animation and in the post-production process of film making, video games and television production. Among other things, After Effects can be used for keying, tracking, compositing, and animation. It also functions as a very basic non-linear editor, audio editor, and media transcoder. In 2019, the program won an Academy Award for scientific and technical achievement. == History == After Effects was originally created by David Herbstman, David Simons, Daniel Wilk, David M. Cotter, and Russell Belfer at the Company of Science and Art in Providence, Rhode Island. The first two versions of the software, 1.0 (January 1993) and 1.1, were released there by the company. CoSA with After Effects was acquired by Aldus Corporation in July 1993, which in turn was acquired by Adobe in 1994. Adobe acquired PageMaker as well. Adobe's first new release of After Effects was version 3.0. == Third-party integrations == After Effects functionality can be extended through a variety of third-party integrations. The most common integrations are: plug-ins, scripts, and extensions. === Plug-ins === Plug-ins are predominantly written in C or C++ and extend the functionality of After Effects, allowing for more advanced features such as particle systems, physics engines, 3D effects, and the ability to bridge the gap between After Effects and another. === Scripts === After Effects Scripts are a series of commands written in both JavaScript and the ExtendScript language. After Effects Scripts, unlike plug-ins, can only access the core functionality of After Effects. Scripts are often developed to automate repetitive tasks, to simplify complex After Effects features, or to perform complex calculations that would otherwise take a long time to complete. Scripts can also use some functionality not directly exposed through the graphical user interface. === Extensions === After Effects Extensions offer the ability to extend After Effects functionality through modern web development technologies like HTML5, and Node.js, without the need for C++. After Effects Extensions make use of Adobe's Common Extensibility Platform or CEP Panels, which means they can be built to interact with other Adobe CC apps.

    Read more →
  • FuseBase

    FuseBase

    FuseBase (previously Nimbus Note and Nimbus Platform) is a B2B SaaS platform. It is among the first to support the Model Context Protocol (MCP), an open standard enabling seamless integration of AI agents with external tools, systems, and data sources. == History == The platform was founded in 2014 as Nimbus Note, the platform started as a cross-platform note-taking and information management tool. As it evolved into Nimbus Platform, it added project management and client portal capabilities. In 2023, the company rebranded as FuseBase, pivoting to connect and automate both internal and external collaboration through AI Agents and cutting-edge protocol adoption like MCP. At the same time, FuseBase was named Product of the Year on Product Hunt. == Technical overview == The platform integrates the Model Context Protocol (MCP), an open-source framework created by Anthropic. MCP allows AI models to securely access and interact with external data, tools, and systems. This enables FuseBase AI Agents to gather relevant context, perform actions, and provide more advanced automation.

    Read more →
  • Telligent Community

    Telligent Community

    Telligent Community is a community and collaboration software platform developed by Telligent Systems and was first released in 2004. Telligent Community is built on the Telligent Evolution platform, with a variety of core applications running on top of it such as blogs, forums, media galleries, and wikis. Additional applications from third parties using the API's and REST stack can be installed or integrated with the platform. Telligent Community is built with ASP.NET, C#, and Microsoft SQL Server. It is available as downloadable software that can be installed on a web server or via hosting providers. The current version is Verint Community 12.0 which was released February 2012. The product used to be named Community Server before being rebranded as part of the 5.0 release. == History == Telligent Systems was founded by Rob Howard in 2004, who was previously part of Microsoft's ASP.NET team. Telligent introduced its first product, Community Server, in the fall of 2004. Community Server was one of the first integrated community platforms that brought together blogs, photo galleries, wikis, forums, user profiles and more. Community Server was based on the merger of three then-widely used open source ASP.NET projects: the ASP.NET Forums, nGallery photo gallery, and .Text blog engine. The people behind those projects (Scott Watermasysk, Jason Alexander, and Rob Howard) joined together as Telligent Systems and along with several other software developers created Community Server 1.0. Between 2004 and 2009 Community Server steadily grew in scope, features, and capabilities. In 2008 Telligent Systems released a second version of Community Server that targeted as an Enterprise Social Software platform used to create and manage internal employee communities and intranets. Originally branded as Community Server Evolution this was later renamed Telligent Enterprise. Telligent also announced a new Enterprise Reporting platform at its first Community Server Developers Conference in 2008, which was later renamed Harvest. It was one of the first analytics suites for enterprise collaboration software, and provides social analytics including sentiment analysis, social fingerprints, and buzz analysis on social networking sites such as Twitter. Telligent rebranded all of its products on June 23, 2009 at the Enterprise 2.0 conference when it launched its new Evolution platform product suite. Community Server became known as Telligent Community, Community Server Evolution became known as Telligent Enterprise and the underlying platform that both run on is now referred to as Telligent Evolution. The Social Analytics suite was renamed Telligent Analytics.

    Read more →
  • Structural similarity index measure

    Structural similarity index measure

    The structural similarity index measure (SSIM) is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. It is also used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference. SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. This distinguishes from other techniques such as mean squared error (MSE) or peak signal-to-noise ratio (PSNR) that instead estimate absolute errors. Structural information is the idea that the pixels have strong inter-dependencies especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image. == History == The predecessor of SSIM was called Universal Quality Index (UQI), or Wang–Bovik index, which was developed by Zhou Wang and Alan Bovik in 2001. This evolved, through their collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the IEEE Transactions on Image Processing. In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings. The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University. Further variants of the model have been developed in the Image and Visual Computing Laboratory at University of Waterloo and have been commercially marketed. SSIM subsequently found strong adoption in the image processing community and in the television and social media industries. The 2004 SSIM paper has been cited over 50,000 times according to Google Scholar, making it one of the highest cited papers in the image processing and video engineering fields. It was recognized with the IEEE Signal Processing Society Best Paper Award for 2009. It also received the IEEE Signal Processing Society Sustained Impact Award for 2016, indicative of a paper having an unusually high impact for at least 10 years following its publication. Because of its high adoption by the television industry, the authors of the original SSIM paper were each accorded a Primetime Engineering Emmy Award in 2015 by the Television Academy. == Algorithm == The SSIM index is calculated between two windows of pixel values x {\displaystyle x} and y {\displaystyle y} of common size, from corresponding locations in two images to be compared. These SSIM values can be aggregated across the full images by averaging or other variations. === Special-case formula === In one simple special case, further explained in the next section, the SSIM measure between x {\displaystyle x} and y {\displaystyle y} is: SSIM ( x , y ) = ( 2 μ x μ y + c 1 ) ( 2 σ x y + c 2 ) ( μ x 2 + μ y 2 + c 1 ) ( σ x 2 + σ y 2 + c 2 ) {\displaystyle {\hbox{SSIM}}(x,y)={\frac {(2\mu _{x}\mu _{y}+c_{1})(2\sigma _{xy}+c_{2})}{(\mu _{x}^{2}+\mu _{y}^{2}+c_{1})(\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2})}}} with: μ x {\displaystyle \mu _{x}} the pixel sample mean of x {\displaystyle x} ; μ y {\displaystyle \mu _{y}} the pixel sample mean of y {\displaystyle y} ; σ x 2 {\displaystyle \sigma _{x}^{2}} the sample variance of x {\displaystyle x} ; σ y 2 {\displaystyle \sigma _{y}^{2}} the sample variance of y {\displaystyle y} ; σ x y {\displaystyle \sigma _{xy}} the sample covariance of x {\displaystyle x} and y {\displaystyle y} ; c 1 = ( k 1 L ) 2 {\displaystyle c_{1}=(k_{1}L)^{2}} , c 2 = ( k 2 L ) 2 {\displaystyle c_{2}=(k_{2}L)^{2}} two variables to stabilize the division with weak denominator; L {\displaystyle L} the dynamic range of the pixel-values (typically this is 2 # b i t s p e r p i x e l − 1 {\displaystyle 2^{\#bits\ per\ pixel}-1} ); k 1 = 0.01 {\displaystyle k_{1}=0.01} and k 2 = 0.03 {\displaystyle k_{2}=0.03} by default. === General formula and components === The SSIM formula is based on three comparison measurements between the samples of x {\displaystyle x} and y {\displaystyle y} : luminance ( l {\displaystyle l} ), contrast ( c {\displaystyle c} ), and structure ( s {\displaystyle s} ). The individual comparison functions are: l ( x , y ) = 2 μ x μ y + c 1 μ x 2 + μ y 2 + c 1 {\displaystyle l(x,y)={\frac {2\mu _{x}\mu _{y}+c_{1}}{\mu _{x}^{2}+\mu _{y}^{2}+c_{1}}}} c ( x , y ) = 2 σ x σ y + c 2 σ x 2 + σ y 2 + c 2 {\displaystyle c(x,y)={\frac {2\sigma _{x}\sigma _{y}+c_{2}}{\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2}}}} s ( x , y ) = σ x y + c 3 σ x σ y + c 3 {\displaystyle s(x,y)={\frac {\sigma _{xy}+c_{3}}{\sigma _{x}\sigma _{y}+c_{3}}}} The SSIM for each block is then a weighted combination of those comparative measures: SSIM ( x , y ) = l ( x , y ) α ⋅ c ( x , y ) β ⋅ s ( x , y ) γ {\displaystyle {\text{SSIM}}(x,y)=l(x,y)^{\alpha }\cdot c(x,y)^{\beta }\cdot s(x,y)^{\gamma }} Choosing the third denominator stabilizing constant as: c 3 = c 2 / 2 {\displaystyle c_{3}=c_{2}/2} leads to a simplification when combining the c and s components with equal exponents ( β = γ {\displaystyle \beta =\gamma } ), as the numerator of c is then twice the denominator of s, leading to a cancellation leaving just a 2. Setting the weights (exponents) α , β , γ {\displaystyle \alpha ,\beta ,\gamma } to 1, the formula can then be reduced to the special case shown above. === Mathematical properties === SSIM satisfies the identity of indiscernibles, and symmetry properties, but not the triangle inequality or non-negativity, and thus is not a distance function. However, under certain conditions, SSIM may be converted to a normalized root MSE measure, which is a distance function. The square of such a function is not convex, but is locally convex and quasiconvex, making SSIM a feasible target for optimization. === Application of the formula === In order to evaluate the image quality, this formula is usually applied only on luma, although it may also be applied on color (e.g., RGB) values or chromatic (e.g. YCbCr) values. The resultant SSIM index is a decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation. For an image, it is typically calculated using a sliding Gaussian window of size 11×11 or a block window of size 8×8. The window can be displaced pixel-by-pixel on the image to create an SSIM quality map of the image. In the case of video quality assessment, the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation. === Variants === ==== Multi-scale SSIM ==== A more advanced form of SSIM, called Multiscale SSIM (MS-SSIM) is conducted over multiple scales through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system. It has been shown to perform equally well or better than SSIM on different subjective image and video databases. ==== Multi-component SSIM ==== Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye can see differences more precisely on textured or edge regions than on smooth regions. The resulting metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and smooth regions. The proposed weighting is 0.5 for edges, 0.25 for the textured and smooth regions. The authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are closer to subjective ratings. This suggests that edge regions play a dominant role in image quality perception. The authors of 3-SSIM have also extended the model into four-component SSIM (4-SSIM). The edge types are further subdivided into preserved and changed edges by their distortion status. The proposed weighting is 0.25 for all four components. ==== Structural dissimilarity ==== Structural dissimilarity (DSSIM) may be derived from SSIM, though it does not constitute a distance function as the triangle inequality is not necessarily satisfied. DSSIM ( x , y ) = 1 − SSIM ( x , y ) 2 {\displaystyle {\hbox{DSSIM}}(x,y)={\frac {1-{\hbox{SSIM}}(x,y)}{2}}} ==== Video quality metrics and temporal variants ==== It is worth noting that the original vers

    Read more →
  • Phase stretch transform

    Phase stretch transform

    Phase stretch transform (PST) is a computational approach to signal and image processing. One of its utilities is for feature detection and classification. PST is related to time stretch dispersive Fourier transform. It transforms the image by emulating propagation through a diffractive medium with engineered 3D dispersive property (refractive index). The operation relies on symmetry of the dispersion profile and can be understood in terms of dispersive eigenfunctions or stretch modes. PST performs similar functionality as phase-contrast microscopy, but on digital images. PST can be applied to digital images and temporal (time series) data. It is a physics-based feature engineering algorithm. == Operation principle == Here the principle is described in the context of feature enhancement in digital images. The image is first filtered with a spatial kernel followed by application of a nonlinear frequency-dependent phase. The output of the transform is the phase in the spatial domain. The main step is the 2-D phase function which is typically applied in the frequency domain. The amount of phase applied to the image is frequency dependent, with higher amount of phase applied to higher frequency features of the image. Since sharp transitions, such as edges and corners, contain higher frequencies, PST emphasizes the edge information. Features can be further enhanced by applying thresholding and morphological operations. PST is a pure phase operation whereas conventional edge detection algorithms operate on amplitude. == Physical and mathematical foundations of phase stretch transform == Photonic time stretch technique can be understood by considering the propagation of an optical pulse through a dispersive fiber. By disregarding the loss and non-linearity in fiber, the non-linear Schrödinger equation governing the optical pulse propagation in fiber upon integration reduces to: E o ( z , t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( 0 , ω ) ⋅ e − i β 2 z ω 2 2 ⋅ e i ω t d ω {\displaystyle E_{o}(z,t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(0,\omega )\cdot e^{\frac {-i\beta _{2}z\omega ^{2}}{2}}\cdot e^{i\omega {t}}\,d\omega } (1) where β 2 {\displaystyle \beta _{2}} = GVD parameter, z is propagation distance, E o ( z , t ) {\displaystyle E_{o}(z,t)} is the reshaped output pulse at distance z and time t. The response of this dispersive element in the time-stretch system can be approximated as a phase propagator as presented in H ( ω ) = e i φ ( ω ) = e i ∑ m = 0 ∞ φ m ( ω ) = ∏ m = 0 ∞ H m ( ω ) {\displaystyle H(\omega )=e^{i\varphi (\omega )}=e^{i\sum _{m=0}^{\infty }\varphi _{m}(\omega )}=\prod _{m=0}^{\infty }H_{m}(\omega )} (2) Therefore, Eq. 1 can be written as following for a pulse that propagates through the time-stretch system and is reshaped into a temporal signal with a complex envelope given by E o ( t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( ω ) ⋅ H ( ω ) ⋅ e i ω t d ω {\displaystyle E_{o}(t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(\omega )\cdot H(\omega )\cdot e^{i\omega t}\,d\omega } (3) The time stretch operation is formulated as generalized phase and amplitude operations, S { E i ( t ) } = ∫ − ∞ + ∞ F { E i ( t ) } ⋅ e i φ ( ω ) ⋅ L ~ ( ω ) ⋅ e i ω t d ω {\displaystyle \mathbb {S} \{E_{i}(t)\}=\int _{-\infty }^{+\infty }{\mathcal {F}}\{E_{i}(t)\}\cdot e^{i\varphi (\omega )}\cdot {\tilde {L}}(\omega )\cdot e^{i\omega {t}}d\omega } (4) where e i φ ( ω ) {\displaystyle e^{i\varphi (\omega )}} is the phase filter and L ~ ( ω ) {\displaystyle {\tilde {L}}(\omega )} is the amplitude filter. Next the operator is converted to discrete domain, S { E i [ n ] } = 1 N ∑ u = 0 N − 1 F F T { E i ( n ) } ⋅ K ~ ( u ) ⋅ L ~ ( u ) ⋅ e i 2 π N u n {\displaystyle \mathbb {S} \{E_{i}[n]\}={\frac {1}{N}}\sum _{u=0}^{N-1}FFT\{E_{i}(n)\}\cdot {\tilde {K}}(u)\cdot {\tilde {L}}(u)\cdot e^{i{\frac {2\pi }{N}}un}} (5) where u {\displaystyle u} is the discrete frequency, K ~ ( u ) {\displaystyle {\tilde {K}}(u)} is the phase filter, L ~ ( u ) {\displaystyle {\tilde {L}}(u)} is the amplitude filter and FFT is fast Fourier transform. The stretch operator S { } {\displaystyle \mathbb {S} \{\}} for a digital image is then S { E i [ n , m ] } = 1 M N ∑ v = 0 N − 1 ∑ u = 0 M − 1 F F T 2 { E i ( n , m ) } ⋅ K ~ ( u , v ) ⋅ L ~ ( u , v ) ⋅ e i 2 π M u m ⋅ e i 2 π N v n {\displaystyle \mathbb {S} \{E_{i}[n,m]\}={\frac {1}{MN}}\sum _{v=0}^{N-1}\sum _{u=0}^{M-1}FFT^{2}\{E_{i}(n,m)\}\cdot {\tilde {K}}(u,v)\cdot {\tilde {L}}(u,v)\cdot e^{i{\frac {2\pi }{M}}um}\cdot e^{i{\frac {2\pi }{N}}vn}} (6) In the above equations, E i [ n , m ] {\displaystyle E_{i}[n,m]} is the input image, n {\displaystyle n} and m {\displaystyle m} are the spatial variables, F F T 2 {\displaystyle FFT^{2}} is the two-dimensional fast Fourier transform, and u {\displaystyle u} and v {\displaystyle v} are spatial frequency variables. The function K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} is the warped phase kernel and the function L ~ ( u , v ) {\displaystyle {\tilde {L}}(u,v)} is a localization kernel implemented in frequency domain. PST operator is defined as the phase of the Warped Stretch Transform output as follows P S T { E i [ n , m ] } ≜ ∡ { S { E i [ x , y ] } } {\displaystyle PST\{E_{i}[n,m]\}\triangleq \measuredangle \{\mathbb {S} \{E_{i}[x,y]\}\}} (7) where ∡ { } {\displaystyle \measuredangle \{\}} is the angle operator. == PST kernel implementation == The warped phase kernel K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} can be described by a nonlinear frequency dependent phase K ~ ( u , v ) = e i φ ( u , v ) {\displaystyle {\tilde {K}}(u,v)=e^{i\varphi (u,v)}} While arbitrary phase kernels can be considered for PST operation, here we study the phase kernels for which the kernel phase derivative is a linear or sublinear function with respect to frequency variables. A simple example for such phase derivative profiles is the inverse tangent function. Consider the phase profile in the polar coordinate system φ ( u , v ) = φ polar ( r , θ ) = φ polar ( r ) {\displaystyle \varphi (u,v)=\varphi _{\text{polar}}(r,\theta )=\varphi _{\text{polar}}(r)} From d φ ( r ) d r = tan − 1 ⁡ ( r ) {\displaystyle {\frac {d\varphi (r)}{dr}}=\tan ^{-1}(r)} we have φ ( r ) = r tan − 1 ⁡ ( r ) − 1 2 log ⁡ ( r 2 + 1 ) {\displaystyle \varphi (r)=r\tan ^{-1}(r)-{\frac {1}{2}}\log(r^{2}+1)} Therefore, the PST kernel is implemented as φ ( r ) = S ⋅ ( W r ) ⋅ tan − 1 ⁡ ( W r ) − 1 2 log ⁡ ( 1 + ( W r ) 2 ) ( W r max ) ⋅ tan − 1 ⁡ ( W r max ) − 1 2 log ⁡ ( 1 + ( W r max ) 2 ) {\displaystyle \varphi (r)=S\cdot {\frac {(Wr)\cdot \tan ^{-1}(Wr)-{\frac {1}{2}}\log(1+(Wr)^{2})}{(Wr_{\max })\cdot \tan ^{-1}(Wr_{\max })-{\frac {1}{2}}\log(1+(Wr_{\max })^{2})}}} where S {\displaystyle S} and W {\displaystyle W} are real-valued numbers related to the strength and warp of the phase profile == Applications == PST has been used for edge detection in biological and biomedical images as well as synthetic-aperture radar (SAR) image processing, as well as detail and feature enhancement for digital images. PST has also been applied to improve the point spread function for single molecule imaging in order to achieve super-resolution. The transform exhibits intrinsic superior properties compared to conventional edge detectors for feature detection in low contrast visually impaired images. The PST function can also be performed on 1-D temporal waveforms in the analog domain to reveal transitions and anomalies in real time. == Open source code release == On February 9, 2016, a UCLA Engineering research group has made public the computer code for PST algorithm that helps computers process images at high speeds and "see" them in ways that human eyes cannot. The researchers say the code could eventually be used in face, fingerprint, and iris recognition systems for high-tech security, as well as in self-driving cars' navigation systems or for inspecting industrial products. The Matlab implementation for PST can also be downloaded from Matlab Files Exchange. However, it is provided for research purposes only, and a license must be obtained for any commercial applications. The software is protected under a US patent. The code was then significantly refactored and improved to support GPU acceleration. In May 2022, it became one algorithm in PhyCV: the first physics-inspired computer vision library.

    Read more →
  • EditDV

    EditDV

    EditDV was a video editing software released by Radius, Inc. in late 1997 as an evolution of their earlier Radius Edit product. EditDV was one of the first products providing professional-quality editing of the then new DV format at a relatively affordable cost ($999 including Radius FireWire capture card) and was named "The Best Video Tool of 1998". Originally EditDV was available for Macintosh only but in February 2000 EditDV 2.0 for Windows was released. With version 3.0 EditDV's name was changed to CineStream. == Features == Originally bundled with a FireWire card, EditDV 1.5 got updated into a less expensive software only package for use with the newer PowerMac G3 that came with a FireWire interface. Later, a scaled down version named EditDV 1.6.1 Unplugged was released as a freeware version next to EditDV 2.0. Unlike many other applications at the time which transcoded video to M-JPEG for editing, EditDV provided lossless native editing of the DV format. Only transitions (such as dissolves or wipes), effects (such as rotating or scaling the video, adjusting the audio level, or adding titles) and filters (such as changing the brightness or color balance) needed to be rendered. This also had the disadvantage to not work with analogue video capture. EditDV was built on top of QuickTime and supported QuickTime filters as well as its own built-in effects and transitions. Effects could be animated using keyframes. EditDV 2.0 worked natively with Quicktime MOV format. For Microsoft Windows users, where the standard was AVI, this required the use of a provided external conversion tool afterwards when AVI was wanted. The user interface had a Project window for organising clips into bins, a Sequence window with a multi-track timeline for arranging clips into a program using three-point editing, and Source and Program monitor windows. A finished program could either be exported as a QuickTime movie or written back to DV tape using the "print to video" command. Version 3.0, then renamed CineStream, shifted towards web designers who wanted to add video streaming interactivity to a website. The new feature called EventStream allowed setting clickable hot spots to link to another location, either to another page with a URL or to another video. This feature distinguished CineStream from the rest of the competition. == Products == The EditDV product family included a number of related products, all sharing a similar name: EditDV Video editing software (Mac and Windows) SoftDV A QuickTime software codec for playing DV media, included as part of EditDV (Mac and Windows) MotoDV PCI-based FireWire interface with DV capture software (Mac and Windows) PhotoDV Software to capture high-quality stills from a DV tape using MotoDV hardware (Mac and Windows) RotoDV Software for rotoscoping (painting over video), released in Sept 1999 (Macintosh only) == Name changes and eventual demise == In 1999, the company Radius Inc. changed its name to Digital Origin. In 2000, Digital Origin Inc (and EditDV) was bought by Media 100. In early 2001, Media 100 released an updated version of EditDV under the new name CineStream 3.0. Later that year (October 2001) Media 100 was bought by Autodesk's Discreet Division. CineStream for Macintosh required classic Mac OS. It was never ported to Mac OS X and faced increasing competition on that platform from Apple's own Final Cut Pro application. Development of EditDV/Cinestream was officially discontinued in 2002.

    Read more →
  • Color histogram

    Color histogram

    In image processing and photography, a color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges that span the image's color space (the set of all possible colors). A color histogram can be built for any kind of color space, although the term is more often used for three-dimensional spaces such as RGB or HSV. For monochromatic images, the term intensity histogram may be used instead. For multi-spectral images, where each pixel is represented by an arbitrary number of measurements (for example, beyond the three measurements in RGB), a color histogram is N-dimensional, with N being the number of measurements taken. Each measurement has its own wavelength range of the light spectrum, some of which may be outside the visible spectrum. If the set of possible color values is sufficiently small, each of those colors may be placed on a range by itself; then the histogram is merely the count of pixels that have each possible color. Most often, the space is divided into an appropriate number of ranges, often arranged as a regular grid, each containing many similar color values. A color histogram may also be represented and displayed as a smooth function defined over the color space that approximates the pixel counts. Like other kinds of histograms, a color histogram is a statistic that can be viewed as an approximation of an underlying continuous distribution of color values. == Overview == Color histograms are flexible constructs that can be built from images in various color spaces, whether RGB, rg chromaticity or any other color space of any dimension. A histogram of an image is produced first by discretization of the colors in the image into a number of bins, and counting the number of image pixels in each bin. For example, a red–blue chromaticity histogram can be formed by first normalizing color pixel values by dividing RGB values by R+G+B, then quantizing the normalized R and B coordinates into N bins each. A two-dimensional histogram of red–blue chromaticity divided into four bins (N=4) may yield a histogram similar to this table: A histogram can be N-dimensional. Although harder to display, a three-dimensional color histogram for the above example could be thought of as four separate red–blue histograms, where each of the four histograms contains the red–blue values for a bin of green (0–63, 64–127, 128–191, and 192–255). The histogram provides a compact summarization of the distribution of data in an image. A color histogram of an image is relatively invariant with translation and rotation about the viewing axis, and varies only slowly with the angle of view. By comparing histogram signatures of two images and matching the color content of one image with the other, a color histogram is particularly well suited for the problem of recognizing an object of unknown position and rotation within a scene. Importantly, translation of an RGB image into the illumination invariant rg-chromaticity space allows the histogram to operate well in varying light levels. 1. What is a histogram? A histogram is a graphical representation of the number of pixels in an image. In a more simple way to explain, a histogram is a bar graph, whose X-axis represents the tonal scale (black at the left and white at the right), and Y-axis represents the number of pixels in an image in a certain area of the tonal scale. For example, the graph of a luminance histogram shows the number of pixels for each brightness level (from black to white), and when there are more pixels, the peak at the certain luminance level is higher. 2. What is a color histogram? A color histogram of an image represents the distribution of the composition of colors in the image. It shows different types of colors appeared and the number of pixels in each type of the colors appeared. The relation between a color histogram and a luminance histogram is that a color histogram can be also expressed as “three luminance histograms”, each of which shows the brightness distribution of each individual red/green/blue color channel. == Characteristics of a color histogram == A color histogram focuses only on the proportion of the number of different types of colors, regardless of the spatial location of the colors. The values of a color histogram are from statistics. They show the statistical distribution of colors and the essential tone of an image. In general, as the color distributions of the foreground and background in an image are different, there might be a bimodal distribution in the histogram. For the luminance histogram alone, there is no perfect histogram and in general, the histogram can tell whether it is over-exposure or not, but there are times when you might think the image is over exposed by viewing the histogram; however, in reality it is not. == Principles of the formation of a color histogram == The formation of a color histogram is rather simple. From the definition above, we can simply count the number of pixels for each 256 scales in each of the 3 RGB channel, and plot them on 3 individual bar graphs. In general, a color histogram is based on a certain color space, such as RGB or HSV. When we compute the pixels of different colors in an image, if the color space is large, then we can first divide the color space into certain numbers of small intervals. Each of the intervals is called a bin. This process is called color quantization. Then, by counting the number of pixels in each of the bins, we get a color histogram of the image. The concrete steps of the principles can be viewed in Example 1. == Examples == === Example 1 === Given the following image of a cat (an original version and a version that has been reduced to 256 colors for easy histogram purposes), the following data represents a color histogram in the RGB color space, using four bins. Bin 0 corresponds to intensities 0–63 Bin 1 is 64–127 Bin 2 is 128–191 and Bin 3 is 192–255. === Example 2 === Application in camera: Nowadays, some cameras have the ability to show the 3 color histograms when we take photos. We can examine clips (spikes on either the black or white side of the scale) in each of the 3 RGB color histograms. If we find one or more clipping on a channel of the 3 RGB channels, then this would result in a loss of detail for that color. To illustrate this, consider this example: We know that each of the three R, G, B channels has a range of values from 0 to 255 (8 bit). So consider a photo that has a luminance range of 0–255. Assume the photo we take is made of 4 blocks that are adjacent to each other and we set the luminance scale for each of the 4 blocks of original photo to be 10, 100, 205, 245. Thus, the image looks like the topmost figure on the right. Then, we overexpose the photo a little, say, the luminance scale of each block is increased by 10. Thus, the luminance scale for each of the 4 blocks of new photo is 20, 110, 215, 255. Then, the image looks like the second figure on the right. There is not much difference between both figures, all we can see is that the whole image becomes brighter (the contrast for each of the blocks remain the same). Now, we overexpose the original photo again, this time the luminance scale of each block is increased by 50. Thus, the luminance scale for each of the 4 blocks of the new photo is 60, 150, 255, 255. The new image now looks like the third figure on the right. Note that the scale for the last block is 255 instead of 295, for 255 is the top scale and thus the last block has clipped. When this happens, we lose the contrast of the last 2 blocks, and thus we cannot recover the image no matter how we adjust it. To conclude, when taking photos with a camera that displays histograms, always keep the brightest tone in the image below the largest scale 255 on the histogram in order to avoid losing details. == Drawbacks and other approaches == The main drawback of histograms for classification is that the representation is dependent on the color of the object being studied, ignoring its shape and texture. Color histograms can potentially be identical for two images with different object content which happens to share color information. Conversely, without spatial or shape information, similar objects of different color may be indistinguishable based solely on color histogram comparisons. There is no way to distinguish a red and white cup from a red and white plate. Put it another way: histogram-based algorithms have no concept of a generic 'cup', and a model of a red and white cup is no use when given an otherwise identical blue and white cup. Another problem is that color histograms have high sensitivity to noisy interference such as lighting intensity changes and quantization errors. High dimensionality (bins) color histograms are also another issue. Some color histogram feature spaces often occupy more than one hundred di

    Read more →
  • Alexis Spectral Data

    Alexis Spectral Data

    Alexis Spectral Data is a software developed for colour matching processes that calculates from available spectral data the colour numbers used by computers to display colours on screen. It displays the colour for each spectral reflectance curve and records the calculated trichromatic values and colour numbers along with the spectral curves. This eliminates the need to scan the samples separately with a truecolour Scanner while creating the database. The spectral data can be introduced manually as a series of reflectance values at wavelengths measured in different standard illuminants with an arbitrary but fixed increment that must be kept for each spectral curve throughout the creation of the whole database. Therefore, older UV-VIS Spectrophotometers that can't be interfaced with computers can also be used for creating the database needed for colour matching. Alexis Spectral Data determines the whiteness degree in a less time-consuming method, which permits storage and easier handling of the obtained data. Alexis Spectral Data can export the trichromatic values, calculated from the spectral curves, to Alexis Analyser, software that handles only trichromatic data. The earliest information about the development of this software comes from a paper published by a student at the University Politehnica Bucharest in 1993. The software runs on Windows based computers but not on other operating systems.

    Read more →
  • Bump (application)

    Bump (application)

    Bump was an iOS and Android mobile app that enabled smartphone users to transfer contact information, photos and files between devices. In 2011, it was #8 on Apple's list of all-time most popular free iPhone apps, and by February 2013 it had been downloaded 125 million times. Its developer, Bump Technologies, shut down the service and discontinued the app on January 31, 2014, after being acquired by Google for Google Photos and Android Camera. == Features == Bump sent contact information, photos and files to another device over the internet. Before activating the transfer, each user confirmed what they want to send to the other user. To initiate a transfer, two people physically bumped their phones together. A screen appeared on both users' smartphone displays, allowing them to confirm what they want to send to each other. When two users bumped their phones, software on the phones send a variety of sensor data to an algorithm running on Bump servers, which included the location of the phone, accelerometer readings, IP address, and other sensor readings. The algorithm figured out which two phones felt the same physical bump and then transfers the information between those phones. Bump did not use Near Field Communication. February 2012 release of Bump 3.0 for iOS, the company streamlined the app to focus on its most frequently used features: contact and photo sharing. Bump 3.0 for Android maintained the features eliminated from the iOS version but moved them behind swipeable layers. In May 2012, a Bump update enabled users to transfer photos from their phone to their computer via a web service. To initiate a transfer, the user goes to the Bump website on their computer and bumps the smartphone on the computer keyboard's space bar. By December 2012, various Bump updates for iOS and Android had added the abilities to share video, audio, and any files. Users swipe to access those features. In February 2013, an update to the Bump iOS and Android apps enabled users to transfer photos, videos, contacts and other files from a computer to a smartphone and vice versa via a web service. To perform the transfer, users went to the Bump website on their computer and bump the smartphone on the computer keyboard's space bar. == History == The underlying idea of a synchronous gesture like bumping two devices for content transfer or pairing them was first conceived by Ken Hinkley of Microsoft Research in 2003. This idea was presented at a user interface and technology conference that same year. The paper proposed the use of accelerometers and a bumping gesture of two devices to enable communication, screen sharing and content transfer between them. Similar to this original concept, the idea for Bump app was conceived by David Lieb, a former employee of Texas Instruments, while he was attending the University of Chicago Booth School of Business for his MBA. While going through the orientation and meeting process of business school, he became frustrated by constantly entering contact information into his iPhone and felt that the process could be improved. His fellow Texas Instruments employees Andy Huibers and Jake Mintz, who was a classmate of Lieb's at the University of Chicago's MBA program, joined Lieb to form Bump Technologies. Bump Technologies launched in 2008 and is located in Mountain View, CA. Early funding for the project was provided by startup incubator Y Combinator, Sequoia Capital and other angel investors. It gained attention at the CTIA international wireless conference, due to its accessibility and novelty factor. In October 2009, Bump received $3.4m in Series A funding followed in January 2011 with a $16m series B financing round led by Andreessen Horowitz. Silicon Valley venture capitalist Marc Andreessen sits on the company's board. The Bump app debuted in the Apple iOS App Store in March 2009 and was “one of the apps that helped to define the iPhone” (Harry McCracken, Technologizer). It soon became the billionth download on Apple's App Store. An Android version launched in November 2009. By the time Bump 3.0 for iOS was released in February 2012, the app had been installed 77 million times, with users sharing more than 2 million photos daily. As of February 2013, there had been 125 million Bump app downloads. == Other apps created by Bump Technologies == Bump Technologies worked with PayPal in March 2010 to create a PayPal iPhone application. The application, which allows two users to automatically activate an Internet transfer of money between their accounts, found widespread adoption. A similar version was released for Android in August 2010. The Bump capability in PayPal's apps was removed in March 2012. At that time, Bump Technologies released Bump Pay, an iOS app that lets users transfer money via PayPal by physically bumping two smartphones together. The tool was originally created for the Bump team to use when splitting up restaurant bills. The payment feature was not added to the Bump app because the company “wanted to make it as simple as possible so people understand how this works,” Lieb told ABC News. Bump Pay was the first app from the company's Bump Labs initiative. A goal of Bump Labs is to test new app ideas that may not fit within the main Bump app. ING Direct added a feature to its iPhone app in 2011 that lets users transfer money to each other using Bump's technology. The feature was later added to its Android app, now called Capital One 360. In July 2012, Bump Technologies released Flock, an iPhone photo sharing app. An Android version was released in December 2012. Using geolocation data embedded in photos and a user's Facebook connections, Flock finds pictures the user takes while out with friends and family and puts everyone's photos from that event into a single shared album. Users receive a push notification after the event, asking if they want to share their photos with friends who were there in the moment. The app will also scan previous photos in the iPhone camera roll and uncover photos that have yet to be shared. If location services were enabled at the time a photo was taken, Flock allows users to create an album of photos from the past with the friends who were there with them. == Acquisition by Google == On September 16, 2013, Bump Technologies announced that it had been acquired by Google. On December 31, 2013, they broke the news that both Bump and Flock would be discontinued so that the team could focus on new projects at Google. The apps were removed from the App Store and Google Play on January 31, 2014. The company subsequently deleted all user data and shut down their servers, thus rendering existing installations of the apps inoperable.

    Read more →
  • Template matching

    Template matching

    Template matching is a technique in digital image processing for finding small parts of an image which match a template image. It can be used for quality control in manufacturing, navigation of mobile robots, or edge detection in images. The main challenges in a template matching task are detection of occlusion, when a sought-after object is partly hidden in an image; detection of non-rigid transformations, when an object is distorted or imaged from different angles; sensitivity to illumination and background changes; background clutter; and scale changes. == Feature-based approach == The feature-based approach to template matching relies on the extraction of image features, such as shapes, textures, and colors, that match the target image or frame. This approach is usually achieved using neural networks and deep-learning classifiers such as VGG, AlexNet, and ResNet.Convolutional neural networks (CNNs), which many modern classifiers are based on, process an image by passing it through different hidden layers, producing a vector at each layer with classification information about the image. These vectors are extracted from the network and used as the features of the image. Feature extraction using deep neural networks, like CNNs, has proven extremely effective has become the standard in state-of-the-art template matching algorithms. This feature-based approach is often more robust than the template-based approach described below. As such, it has become the state-of-the-art method for template matching, as it can match templates with non-rigid and out-of-plane transformations, as well as high background clutter and illumination changes. == Template-based approach == For templates without strong features, or for when the bulk of a template image constitutes the matching image as a whole, a template-based approach may be effective. Since template-based matching may require sampling of a large number of data points, it is often desirable to reduce the number of sampling points by reducing the resolution of search and template images by the same factor before performing the operation on the resultant downsized images. This pre-processing method creates a multi-scale, or pyramid, representation of images, providing a reduced search window of data points within a search image so that the template does not have to be compared with every viable data point. Pyramid representations are a method of dimensionality reduction, a common aim of machine learning on data sets that suffer the curse of dimensionality. == Common challenges == In instances where the template may not provide a direct match, it may be useful to implement eigenspaces to create templates that detail the matching object under a number of different conditions, such as varying perspectives, illuminations, color contrasts, or object poses. For example, if an algorithm is looking for a face, its template eigenspaces may consist of images (i.e., templates) of faces in different positions to the camera, in different lighting conditions, or with different expressions (i.e., poses). It is also possible for a matching image to be obscured or occluded by an object. In these cases, it is unreasonable to provide a multitude of templates to cover each possible occlusion. For example, the search object may be a playing card, and in some of the search images, the card is obscured by the fingers of someone holding the card, or by another card on top of it, or by some other object in front of the camera. In cases where the object is malleable or poseable, motion becomes an additional problem, and problems involving both motion and occlusion become ambiguous. In these cases, one possible solution is to divide the template image into multiple sub-images and perform matching on each subdivision. == Deformable templates in computational anatomy == Template matching is a central tool in computational anatomy (CA). In this field, a deformable template model is used to model the space of human anatomies and their orbits under the group of diffeomorphisms, functions which smoothly deform an object. Template matching arises as an approach to finding the unknown diffeomorphism that acts on a template image to match the target image. Template matching algorithms in CA have come to be called large deformation diffeomorphic metric mappings (LDDMMs). Currently, there are LDDMM template matching algorithms for matching anatomical landmark points, curves, surfaces, volumes. == Template-based matching explained using cross correlation or sum of absolute differences == A basic method of template matching sometimes called "Linear Spatial Filtering" uses an image patch (i.e., the "template image" or "filter mask") tailored to a specific feature of search images to detect. This technique can be easily performed on grey images or edge images, where the additional variable of color is either not present or not relevant. Cross correlation techniques compare the similarities of the search and template images. Their outputs should be highest at places where the image structure matches the template structure, i.e., where large search image values get multiplied by large template image values. This method is normally implemented by first picking out a part of a search image to use as a template. Let S ( x , y ) {\displaystyle S(x,y)} represent the value of a search image pixel, where ( x , y ) {\displaystyle (x,y)} represents the coordinates of the pixel in the search image. For simplicity, assume pixel values are scalar, as in a greyscale image. Similarly, let T ( x t , y t ) {\textstyle T(x_{t},y_{t})} represent the value of a template pixel, where ( x t , y t ) {\textstyle (x_{t},y_{t})} represents the coordinates of the pixel in the template image. To apply the filter, simply move the center (or origin) of the template image over each point in the search image and calculate the sum of products, similar to a dot product, between the pixel values in the search and template images over the whole area spanned by the template. More formally, if ( 0 , 0 ) {\displaystyle (0,0)} is the center (or origin) of the template image, then the cross correlation T ⋆ S {\displaystyle T\star S} at each point ( x , y ) {\displaystyle (x,y)} in the search image can be computed as: ( T ⋆ S ) ( x , y ) = ∑ ( x t , y t ) ∈ T T ( x t , y t ) ⋅ S ( x t + x , y t + y ) {\displaystyle (T\star S)(x,y)=\sum _{(x_{t},y_{t})\in T}T(x_{t},y_{t})\cdot S(x_{t}+x,y_{t}+y)} For convenience, T {\displaystyle T} denotes both the pixel values of the template image as well as its domain, the bounds of the template. Note that all possible positions of the template with respect to the search image are considered. Since cross correlation values are greatest when the values of the search and template pixels align, the best matching position ( x m , y m ) {\displaystyle (x_{m},y_{m})} corresponds to the maximum value of T ⋆ S {\displaystyle T\star S} over S {\displaystyle S} . Another way to handle translation problems on images using template matching is to compare the intensities of the pixels, using the sum of absolute differences (SAD) measure. To formulate this, let I S ( x s , y s ) {\displaystyle I_{S}(x_{s},y_{s})} and I T ( x t , y t ) {\displaystyle I_{T}(x_{t},y_{t})} denote the light intensity of pixels in the search and template images with coordinates ( x s , y s ) {\displaystyle (x_{s},y_{s})} and ( x t , y t ) {\displaystyle (x_{t},y_{t})} , respectively. Then by moving the center (or origin) of the template to a point ( x , y ) {\displaystyle (x,y)} in the search image, as before, the sum of absolute differences between the template and search pixel intensities at that point is: S A D ( x , y ) = ∑ ( x t , y t ) ∈ T | I T ( x t , y t ) − I S ( x t + x , y t + y ) | {\displaystyle SAD(x,y)=\sum _{(x_{t},y_{t})\in T}\left\vert I_{T}(x_{t},y_{t})-I_{S}(x_{t}+x,y_{t}+y)\right\vert } With this measure, the lowest SAD gives the best position for the template, rather than the greatest as with cross correlation. SAD tends to be relatively simple to implement and understand, but it also tends to be relatively slow to execute. A simple C++ implementation of SAD template matching is given below. == Implementation == In this simple implementation, it is assumed that the above described method is applied on grey images: This is why Grey is used as pixel intensity. The final position in this implementation gives the top left location for where the template image best matches the search image. One way to perform template matching on color images is to decompose the pixels into their color components and measure the quality of match between the color template and search image using the sum of the SAD computed for each color separately. == Speeding up the process == In the past, this type of spatial filtering was normally only used in dedicated hardware solutions because of the computational complexity of the operation, however we can lessen this complexity b

    Read more →
  • AI washing

    AI washing

    AI washing is a deceptive marketing tactic that consists of promoting a product or a service by overstating the role of artificial intelligence (AI) and the integration of it. Companies often involve in the practice to mislead customers to boost their offerings, and to secure funding from investors. The practice raises concerns regarding transparency, and legal issues. == Definition == AI washing is a deceptive marketing practice. It involves promoting a product or a service by overstating the role of artificial intelligence (AI) and its integration in the design and manufacture of the same. The practice raises concerns regarding transparency, compliance with security regulations, and consumer trust in the AI industry potentially hampering legitimate advancements in AI. The term was first defined by the AI Now Institute, a research institute based at New York University in 2019. The term is derived from greenwashing, another deceptive marketing technique that misrepresents a product's environmental impact in a similar manner. AI washing might involve a company claiming to have used AI in the development or enhancement of its products or services without its actual involvement, or using buzzwords such as "smart" or "AI-powered" without the product actually offering it or making use of it. A company may overstate the usage of AI or misuse the term, which is also construed as AI washing. In 2026, The Washington Post defined AI washing as "a trend for bosses to blame layoffs on the productive capabilities of AI and its ability to replace workers, even when job cuts may have little to do with the technology". == Usage and effects == AI washing can lead to deception of customers and misleading of investors. It is also an illegal and unethical practice that lacks transparency regarding disclosing the details of a product or a service. Companies get involved in such a practice often in response to competition who might have used AI in their offerings. It might also be used as a ploy to secure funding and investment, assuming that it will attract them towards it. AI washing has been compared to dot-com bubble, when businesses appended "dot-com" to the end of the business name to boost their valuation. In September 2023, Coca-Cola released a new product called Coca-Cola Y3000, and the company stated that the Y3000 flavor had been "co-created with human and artificial intelligence". The company was accused of AI washing due to no proof of AI involvement in the creation of the product, and critics believed that AI was used as a way to grab consumer attention more than it was used in the actual product creation. In 2026, mass tech layoffs were attributed to AI washing from AI innovation instead of balance sheet restructuring. == Mitigation == Companies are expected to be transparent and clearer in communicating the usage of AI in their products or services. Consumers can mitigate the same by requesting for hard evidence from the companies regarding the usage of AI tools. Customers should evaluate the product or service as a whole rather than being swayed by the usage of AI. Informed decision making and purchasing can keep them from falling for such marketing gimmicks. The United States Securities and Exchange Commission (SEC) imposes penalties for companies indulging in such practices. In March 2024, the SEC imposed the first civil penalties on two companies for misleading statements about their use of AI, and in July 2024, it charged a corporate executive from a supposed AI hiring startup with fraud for the usage of buzzwords related to AI.

    Read more →
  • Scan line

    Scan line

    A scan line (also scanline) is one line, or row, in a raster scanning pattern, such as a line of video on a cathode-ray tube (CRT) display of a television set or computer monitor. On CRT screens the horizontal scan lines are visually discernible, even when viewed from a distance, as alternating colored lines and black lines, especially when a progressive scan signal with below maximum vertical resolution is displayed. This is sometimes used today as a visual effect in computer graphics. The term is used, by analogy, for a single row of pixels in a raster graphics image. Scan lines are important in representations of image data, because many image file formats have special rules for data at the end of a scan line. For example, there may be a rule that each scan line starts on a particular boundary (such as a byte or word; see for example BMP file format). This means that even otherwise compatible raster data may need to be analyzed at the level of scan lines in order to convert between formats.

    Read more →
  • Cybernetics

    Cybernetics

    Cybernetics is the transdisciplinary study of circular causal processes such as feedback and recursion, where the effects of a system's actions (its outputs) return as inputs to that system, influencing subsequent actions. It is concerned with general principles that are relevant across multiple contexts, including engineering, ecological, economic, biological, cognitive and social systems and also in practical activities such as designing, learning, and managing. Cybernetics' transdisciplinary character means that it intersects with a number of other fields, resulting in a wide influence and diverse interpretations. The field is named after an example of circular causal feedback—that of steering a ship (the ancient Greek κυβερνήτης (kybernḗtēs) refers to the person who steers a ship). In steering a ship, the position of the rudder is adjusted in continual response to the effect it is observed as having, forming a feedback loop through which a steady course can be maintained in a changing environment, responding to disturbances from cross winds and tide. Cybernetics has its origins in exchanges between numerous disciplines during the 1940s. Initial developments were consolidated through meetings such as the Macy conferences and the Ratio Club. Early focuses included purposeful behaviour, neural networks, heterarchy, information theory, and self-organising systems. As cybernetics developed, it became broader in scope to include work in design, family therapy, management and organisation, pedagogy, sociology, the creative arts and the counterculture. == Definitions == Cybernetics has been defined in a variety of ways, reflecting "the richness of its conceptual base". One of the best known definitions is that of the American scientist Norbert Wiener, who characterised cybernetics as concerned with "control and communication in the animal and the machine". Another early definition is that of the Macy cybernetics conferences, where cybernetics was understood as the study of "circular causal and feedback mechanisms in biological and social systems". Margaret Mead emphasised the role of cybernetics as "a form of cross-disciplinary thought which made it possible for members of many disciplines to communicate with each other easily in a language which all could understand". Other definitions include: "the art of governing or the science of government" (André-Marie Ampère); "the art of steersmanship" (Ross Ashby); "the study of systems of any nature which are capable of receiving, storing, and processing information so as to use it for control" (Andrey Kolmogorov); and "a branch of mathematics dealing with problems of control, recursiveness, and information, focuses on forms and the patterns that connect" (Gregory Bateson). == Etymology == The Ancient Greek term κυβερνητικός (kubernētikos, '(good at) steering') appears in Plato's Republic and Alcibiades, where the metaphor of a steersman is used to signify the governance of people. The French word cybernétique was also used in 1834 by the physicist André-Marie Ampère to denote the sciences of government in his classification system of human knowledge. According to Norbert Wiener, the word cybernetics was coined by a research group involving himself and Arturo Rosenblueth in the summer of 1947. It has been attested in print since at least 1948 through Wiener's book Cybernetics: Or Control and Communication in the Animal and the Machine. In the book, Wiener states: After much consideration, we have come to the conclusion that all the existing terminology has too heavy a bias to one side or another to serve the future development of the field as well as it should; and as happens so often to scientists, we have been forced to coin at least one artificial neo-Greek expression to fill the gap. We have decided to call the entire field of control and communication theory, whether in the machine or in the animal, by the name Cybernetics, which we form from the Greek κυβερνήτης or steersman. Moreover, Wiener explains, the term was chosen to recognize James Clerk Maxwell's 1868 publication on feedback mechanisms involving governors, noting that the term governor is also derived from κυβερνήτης (kubernḗtēs) via a Latin corruption gubernator. Finally, Wiener motivates the choice by steering engines of a ship being "one of the earliest and best-developed forms of feedback mechanisms". == History == === First wave === The initial focus of cybernetics was on parallels between regulatory feedback processes in biological and technological systems. Two foundational articles were published in 1943: "Behavior, Purpose and Teleology" by Arturo Rosenblueth, Norbert Wiener, and Julian Bigelow – based on the research on living organisms that Rosenblueth did in Mexico – and the paper "A Logical Calculus of the Ideas Immanent in Nervous Activity" by Warren McCulloch and Walter Pitts. The foundations of cybernetics were then developed through a series of transdisciplinary conferences funded by the Josiah Macy, Jr. Foundation, between 1946 and 1953. The conferences were chaired by McCulloch and had participants that included Ross Ashby, Gregory Bateson, Heinz von Foerster, Margaret Mead, John von Neumann, and Norbert Wiener. In the UK, similar focuses were explored by the Ratio Club, an informal dining club of young psychiatrists, psychologists, physiologists, mathematicians and engineers that met between 1949 and 1958. Wiener introduced the neologism cybernetics to denote the study of "teleological mechanisms" and popularized it through the book Cybernetics: Or Control and Communication in the Animal and the Machine. During the 1950s, cybernetics was developed as a primarily technical discipline, such as in Qian Xuesen's 1954 "Engineering Cybernetics". The text was quickly translated into multiple languages and became a foundational text on automation. In the Soviet Union, Cybernetics was initially considered with suspicion but became accepted from the mid to late 1950s. By the 1960s and 1970s, however, cybernetics' transdisciplinarity fragmented, with technical focuses separating into separate fields. Artificial intelligence (AI) was founded as a distinct discipline at the Dartmouth workshop in 1956, differentiating itself from the broader cybernetics field. After some uneasy coexistence, AI gained funding and prominence. Consequently, cybernetic sciences such as the study of artificial neural networks were downplayed. Similarly, computer science became defined as a distinct academic discipline in the 1950s and early 1960s. === Second wave === The second wave of cybernetics came to prominence from the 1960s onwards, with its focus shifting away from technology toward social, ecological, and philosophical concerns. It was still grounded in biology, notably Maturana and Varela's autopoiesis, and built on earlier work on self-organising systems and the presence of anthropologists Mead and Bateson in the Macy meetings. The Biological Computer Laboratory, founded in 1958 and active until the mid-1970s under the direction of Heinz von Foerster at the University of Illinois at Urbana–Champaign, was a major incubator of this trend in cybernetics research. Focuses of the second wave of cybernetics included management cybernetics, such as Stafford Beer's biologically inspired viable system model; work in family therapy, drawing on Bateson; social systems, such as in the work of Niklas Luhmann; epistemology and pedagogy, such as in the development of radical constructivism. Cybernetics' core theme of circular causality was developed beyond goal-oriented processes to concerns with reflexivity and recursion, notably in Mead's invocation at the inaugural meeting of the American Society for Cybernetics (ASC) to apply cybernetics to the activities of the ASC itself. This focus on reflexivity was especially prominent in the development of second-order cybernetics (or the cybernetics of cybernetics), developed and promoted by Heinz von Foerster, which focused on questions of observation, cognition, epistemology, and ethics. The 1960s onwards also saw cybernetics begin to develop exchanges with the creative arts, design, and architecture, notably with the Cybernetic Serendipity exhibition (ICA, London, 1968), curated by Jasia Reichardt, and the unrealised Fun Palace project (London, unrealised, 1964 onwards), where Gordon Pask was consultant to architect Cedric Price and theatre director Joan Littlewood. In 1962, Qian Xuesen recruited Song Jian and Guan Zhaozhi to establish China's first cybernetics laboratory with him. Following the Sino-Soviet split, cybernetics was deemed disreputable in China. The field was again favored in the 1970s and 1980s following Deng Xiaoping's emphasis on modernisation. === Third wave === From the 1990s onwards, there has been a renewed interest in cybernetics from a number of directions. Early cybernetic work on artificial neural networks has been returned to as a paradigm in machine learning and artifi

    Read more →