AI Coding Interview Questions

AI Coding Interview Questions — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apps to analyse COVID-19 sounds

    Apps to analyse COVID-19 sounds

    Apps to analyse COVID-19 sounds are mobile software applications designed to collect respiratory sounds and aid diagnosis in response to the COVID-19 pandemic. Numerous applications are in development, with different institutions and companies taking various approaches to privacy and data collection. Current efforts are aimed at gathering data. In a later stage, it is possible that sound apps will have the capacity (and ethical approvals) to provide information back to users. In order to develop and train signal analysis approaches, large datasets are required. == History == The COVID-19 outbreak was announced as a global pandemic by the World Health Organization in March 2020 and has affected a growing number of people globally. In this context, advanced artificial intelligence techniques are being considered as tools in aiding our response to global health crisis. Other COVID-19 apps which offer solutions for user tracking have been developed. At the same time a number of approaches which tries to use respiratory sounds and artificial intelligence to understand if the disease can be diagnosed have been proposed. A few studies are available as preprints (i.e. not yet peer-reviewed) documents. == Methodologies == The potential for using speech and sound analysis by artificial intelligence to help in this scenario, by surveying which types of related or contextually significant phenomena can be automatically assessed from speech or sound has been recently overviewed. These include the automatic recognition and monitoring of breathing, dry and wet coughing or sneezing sounds, speech under cold, eating behaviour, sleepiness, or pain. Additionally, the potential use-cases of intelligent speech analysis for COVID-19 diagnosed patients has also been presented. In particular, by analysing speech recordings from these patients, an audio-only-based model to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety, is constructed. This work shows promise in estimating the severity of illness. Machine learning methods have been explored to recognize and diagnose coughs from different diseases. These included a low complexity, automated recognition and diagnostic tool for screening respiratory infections that utilizes convolutional neural networks (CNNs) to detect cough within environment audio and diagnose three potential illnesses (i.e. bronchitis, bronchiolitis and pertussis) based on their unique cough audio features. A large-scale crowdsourced dataset of respiratory sounds has been collected to aid diagnosis of COVID-19: coughs and breathing sounds are sufficient to distinguish users affected by COVID-19 versus those affected by asthma or healthy controls. Behind these studies is the ambition that automated systems to screen for respiratory diseases based on voice, raw cough or other sound data would have positive medical applications in both clinical and public health arenas. == List of apps to analyse COVID-19 sounds ==

    Read more →
  • Semi-automation

    Semi-automation

    Semi-automation is a process or procedure that is performed by the combined activities of man and machine with both human and machine steps typically orchestrated by a centralized computer controller. Within manufacturing, production processes may be fully manual, semi-automated, or fully automated. In this case, semi-automation may vary in its degree of manual and automated steps. Semi-automated manufacturing processes are typically orchestrated by a computer controller which sends messages to the worker at the time in which he/she should perform a step. The controller typically waits for feedback that the human performed step has been completed via either a human-machine interface or via electronic sensors distributed within the process. Controllers within semi-automated processes may either directly control machinery or send signals to machinery distributed within the process. Centralized computer controllers within semi-automated processes orchestrate processes by instructing the worker, providing electronic communication and control to process equipment, tools, or machines, as well as perform data management to record and ensure that the process meets established process criteria. Many manufacturers choose not to fully automate a process, and instead implement semi-automation due to the complexity of the task, or the number of products produced is too low to justify the investment in full automation. Other processes may not be fully automated because it may reduce the flexibility to easily adapt the processes to reflect production needs.

    Read more →
  • Mix automation

    Mix automation

    In music recording, mix automation allows the mixing console to remember the mixing engineer's dynamic adjustment of faders during a musical piece in the post-production editing process. A timecode is necessary for the synchronization of automation. Modern mixing consoles and digital audio workstations use comprehensive mix automation. The need for automated mixing originated from the late 1970s transition form 8-track to 16-track and then 24-track multitrack recording, as mixing could be laborious and require multiple people and hands, and the results could be almost impossible to reproduce. With 48-track recording - synchronized twin 24-track recorders (for a net 46 audio tracks, with one on each machine for SMPTE timecode) - came larger recording and mixing consoles with even more channel faders to manage during mixdown. Manufacturers, such as Neve Electronics (now AMS Neve) and Solid State Logic (SSL), both English companies, developed systems that enabled one engineer to oversee every detail of a complex mix, although the computers required to power these desks remained a rarity into the late 1970s. According to record producer Roy Thomas Baker, Queen's 1975 single "Bohemian Rhapsody" was one of the first mixes to be done with automation. == Types == Voltage Controlled Automation fader levels are regulated by voltage-controlled amplifiers (VCA). VCAs control the audio level and not the actual fader. Moving Fader Automation a motor is attached to the fader, which then can be controlled by the console, digital audio workstation (DAW), or user. Software Controlled Automation the software can be internal to the console, or external as part of a DAW. The virtual fader can be adjusted in the software by the user. MIDI Automation the communications protocol MIDI can be used to send messages to the console to control automation. == Modes == Auto Write used the first time automation is created or when writing over existing automation Auto Touch writes automation data only while a fader is touched/faders return to any previously automated position after release Auto Latch starts writing automation data when a fader is touched/stays in position after release Auto Read digital Audio Workstation performs the written automation Auto Off automation is temporarily disabled All of these include the mute button. If mute is pressed during writing of automation, the audio track will be muted during playback of that automation. Depending on software, other parameters such as panning, sends, and plug-in controls can be automated as well. In some cases, automation can be written using a digital potentiometer instead of a fader.

    Read more →
  • Group of Governmental Experts on Lethal Autonomous Weapons Systems

    Group of Governmental Experts on Lethal Autonomous Weapons Systems

    The Group of Governmental Experts on Lethal Autonomous Weapons Systems, commonly known as the GGE on LAWS, refers to a group of governmental experts established under the framework of the Convention on Certain Conventional Weapons (CCW), a United Nations arms control framework. The group examines legal, ethical, societal and moral questions that arise from the increased use of autonomous robots to carry weapons and to be programmed to engage in combat in various situations that might arise, including battles between countries, or in patrolling border areas or sensitive areas, or other similar roles. As of 18 March 2025, the Convention on Certain Conventional Weapons had 128 High Contracting Parties. In the Geneva Conventions, the term "High Contracting Parties" refers to the states that have joined the conventions and are therefore bound to uphold them. Among the countries that have joined are states with tense relations or ongoing armed conflict with one another, including Russia and Ukraine, Israel and the State of Palestine, and Pakistan and Afghanistan. == Background == In 2013, the Meeting of State Parties to the Convention on Certain Conventional Weapons agreed on a mandate on lethal autonomous weapon systems and tasked its chairperson with convening an informal Meeting of Experts to discuss issues related to emerging technologies in the area of LAWS. Those informal Meetings of Experts were then held in 2014, 2015 and 2016, and their reports fed into subsequent meetings of the High Contracting Parties. At the Fifth CCW Review Conference in 2016, the High Contracting Parties decided to establish an open-ended Group of Governmental Experts on emerging technologies in the area of LAWS, building on the earlier expert meetings. Since then, the group has been reconvened annually. In 2023, the Meeting of the High Contracting Parties to the CCW decided that the GGE on LAWS would continue its work in 2024 and 2025. The group was tasked with developing, by consensus, elements of a possible instrument, without predetermining its form, as well as other measures addressing lethal autonomous weapon systems, drawing on existing CCW protocols, earlier recommendations, state proposals, and legal, military, and technological expertise. == 2024 == In 2024, the GGE met twice, and the group was chaired by Robert in den Bosch, the Netherlands' disarmament ambassador. The 2024 Meeting of the High Contracting Parties decided that the group would meet for 10 days in 2025, in two five-day sessions, and reaffirmed its mandate to continue work by consensus on possible elements of an instrument and other measures addressing lethal autonomous weapon systems. == 2025 == At its first 2025 session, held in Geneva from 3 to 7 March 2025, the Group of Governmental Experts on Lethal Autonomous Weapon Systems discussed revisions to the chair's rolling text. The text was structured into five sections, or "boxes", though delegates held differing views on whether headings were useful or appropriate. Broadly, the discussions covered the characterization of lethal autonomous weapon systems, the application of international humanitarian law, possible prohibitions and regulations, legal review, and questions of accountability and responsibility. At its second session, held from 1 to 5 September 2025, delegations continued work on the chair's rolling text, which set out elements of a possible instrument and was organized into five thematic "boxes". == 2026 == === Developments before the 2026 session === A few weeks before the meeting, autonomous weapons drew renewed attention when the United States pressured Anthropic to revise the terms of use for its AI model Claude. Anthropic prohibited the model's use for mass domestic surveillance and for fully autonomous weapons operating without human oversight, while reports also emerged that OpenAI had reached an agreement with the U.S. Department of War for the use of its AI models, reportedly stipulating that they would not independently direct autonomous weapons where human control was required. The U.S. military nevertheless continued to use Claude during its war on Iran, and there was increasing alarm about the use of AI-assisted semi-autonomous weapons in conflicts including those in Ukraine, Sudan, Gaza, and Iran. Before the start of the sessions, Robert in den Bosch, as chair, warned that progress was urgent because technological developments were moving quickly. At the same time, although states agreed that international humanitarian law applied to LAWS, specific internationally binding standards governing such systems remained largely absent. A key divide before the session was that Russia and the United States opposed new legally binding instruments, while other states argued that new rules were necessary. According to Robert in den Bosch, the talks could lead to new rules, amendments to an existing convention, or a new treaty. === First session === From 2 to 6 March 2026, the group held its penultimate session under the group's three-year mandate. Delegations discussed the chair's rolling draft text, circulated in December 2025, on elements of a possible instrument or other measures concerning lethal autonomous weapon systems. In revised text circulated by the chair on 5 March 2026, a lethal autonomous weapon system was characterized as "a functionally integrated combination of one or more weapons and technological components, that can identify, select, and engage a target, without intervention by a human operator in the execution of these tasks". The text was divided into five boxes to structure discussion. During the session, delegates conducted a first reading of the draft text, and the chair later circulated revised language for several sections. Informal consultations were also held. According to campaign groups and participating observers, support grew during the week for moving to negotiations on the basis of the rolling text, with more than 70 states said to support that step by the end of the session, though some participants warned that attempts to bridge differences risked blurring the group's core purpose. The International Committee of the Red Cross argued that the text should not only restate existing international humanitarian law, but also clarify how those rules apply to autonomous weapons and set out additional measures tailored to the specific challenges such systems raise. Stop Killer Robots likewise emphasized the need to preserve meaningful human judgment and control over increasingly autonomous systems. During the discussions, the U.S. delegation opposed the term "human control" and reportedly proposed the alternative phrase "good faith human judgment and care". Other delegations rejected that wording as too weak, while many states continued to insist that meaningful human control over weapon systems remained essential.

    Read more →
  • Symbolic artificial intelligence

    Symbolic artificial intelligence

    In artificial intelligence, symbolic artificial intelligence (also known as classical artificial intelligence or logic-based artificial intelligence) is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic, and search. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems (in particular, expert systems), symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to important ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems. Symbolic AI was the dominant paradigm of AI research from the mid-1950s until the mid-1990s. Researchers in the 1960s and the 1970s were convinced that symbolic approaches would eventually succeed in creating a machine with artificial general intelligence and considered this the ultimate goal of their field. An early boom, with early successes such as the Logic Theorist and Samuel's Checkers Playing Program, led to unrealistic expectations and promises and was followed by the first AI Winter as funding dried up. A second boom (1969–1986) occurred with the rise of expert systems, their promise of capturing corporate expertise, and an enthusiastic corporate embrace. That boom, and some early successes, e.g., with XCON at DEC, was followed again by later disappointment. Problems with difficulties in knowledge acquisition, maintaining large knowledge bases, and brittleness in handling out-of-domain problems arose. Another, second, AI Winter (1988–2011) followed. Subsequently, AI researchers focused on addressing underlying problems in handling uncertainty and in knowledge acquisition. Uncertainty was addressed with formal methods such as hidden Markov models, Bayesian reasoning, and statistical relational learning. Symbolic machine learning addressed the knowledge acquisition problem with contributions including Version Space, Valiant's PAC learning, Quinlan's ID3 decision-tree learning, case-based learning, and inductive logic programming to learn relations. Neural networks, a subsymbolic approach, had been pursued from early days and reemerged strongly in 2012. Early examples are Rosenblatt's perceptron learning work, the backpropagation work of Rumelhart, Hinton and Williams, and work in convolutional neural networks by LeCun et al. in 1989. However, neural networks were not viewed as successful until about 2012: "Until Big Data became commonplace, the general consensus in the Al community was that the so-called neural-network approach was hopeless. Systems just didn't work that well, compared to other methods. ... A revolution came in 2012, when a number of people, including a team of researchers working with Hinton, worked out a way to use the power of GPUs to enormously increase the power of neural networks." Over the next several years, deep learning had spectacular success in handling vision, speech recognition, speech synthesis, image generation, and machine translation, though symbolic approaches continue to be useful in a few domains such as computer algebra systems and proof assistants. == History == A short history of symbolic AI to the present day follows below. Time periods and titles are drawn from Henry Kautz's 2020 AAAI Robert S. Engelmore Memorial Lecture and the longer Wikipedia article on the History of AI, with dates and titles differing slightly for increased clarity. === The first AI summer: irrational exuberance, 1948–1966 === Success at early attempts in AI occurred in three main areas: artificial neural networks, knowledge representation, and heuristic search, contributing to high expectations. This section summarizes Kautz's reprise of early AI history. ==== Approaches inspired by human or animal cognition or behavior ==== Cybernetic approaches attempted to replicate the feedback loops between animals and their environments. A robotic turtle, with sensors, motors for driving and steering, and seven vacuum tubes for control, based on a preprogrammed neural net, was built as early as 1948. This work can be seen as an early precursor to later work in neural networks, reinforcement learning, and situated robotics. An important early symbolic AI program was the Logic theorist, written by Allen Newell, Herbert Simon and Cliff Shaw in 1955–56, as it was able to prove 38 elementary theorems from Whitehead and Russell's Principia Mathematica. Newell, Simon, and Shaw later generalized this work to create a domain-independent problem solver, GPS (General Problem Solver). GPS solved problems represented with formal operators via state-space search using means-ends analysis. During the 1960s, symbolic approaches achieved great success at simulating intelligent behavior in structured environments such as game-playing, symbolic mathematics, and theorem-proving. AI research was concentrated in four institutions in the 1960s: Carnegie Mellon University, Stanford, MIT and (later) University of Edinburgh. Each one developed its own style of research. Earlier approaches based on cybernetics or artificial neural networks were abandoned or pushed into the background. Herbert Simon and Allen Newell studied human problem-solving skills and attempted to formalize them, and their work laid the foundations of the field of artificial intelligence, as well as cognitive science, operations research and management science. Their research team used the results of psychological experiments to develop programs that simulated the techniques that people used to solve problems. This tradition, centered at Carnegie Mellon University would eventually culminate in the development of the Soar architecture in the middle 1980s. ==== Heuristic search ==== In addition to the highly specialized domain-specific kinds of knowledge that we will see later used in expert systems, early symbolic AI researchers discovered another more general application of knowledge. These were called heuristics, rules of thumb that guide a search in promising directions: "How can non-enumerative search be practical when the underlying problem is exponentially hard? The approach advocated by Simon and Newell is to employ heuristics: fast algorithms that may fail on some inputs or output suboptimal solutions." Another important advance was to find a way to apply these heuristics that guarantees a solution will be found, if there is one, not withstanding the occasional fallibility of heuristics: "The A algorithm provided a general frame for complete and optimal heuristically guided search. A is used as a subroutine within practically every AI algorithm today but is still no magic bullet; its guarantee of completeness is bought at the cost of worst-case exponential time. ==== Early work on knowledge representation and reasoning ==== Early work covered both applications of formal reasoning emphasizing first-order logic, along with attempts to handle common-sense reasoning in a less formal manner. ===== Modeling formal reasoning with logic: the "neats" ===== Unlike Simon and Newell, John McCarthy felt that machines did not need to simulate the exact mechanisms of human thought, but could instead try to find the essence of abstract reasoning and problem-solving with logic, regardless of whether people used the same algorithms. His laboratory at Stanford (SAIL) focused on using formal logic to solve a wide variety of problems, including knowledge representation, planning and learning. Logic was also the focus of the work at the University of Edinburgh and elsewhere in Europe which led to the development of the programming language Prolog and the science of logic programming. ===== Modeling implicit common-sense knowledge with frames and scripts: the "scruffies" ===== Researchers at MIT (such as Marvin Minsky and Seymour Papert) found that solving difficult problems in vision and natural language processing required ad hoc solutions—they argued that no simple and general principle (like logic) would capture all the aspects of intelligent behavior. Roger Schank described their "anti-logic" approaches as "scruffy" (as opposed to the "neat" paradigms at CMU and Stanford). Commonsense knowledge bases (such as Doug Lenat's Cyc) are an example of "scruffy" AI, since they must be built by hand, one complicated concept at a time. === The first AI winter: crushed dreams, 1967–1977 === The first AI winter was a shock: During the first AI summer, many people thought that machine intelligence could be achieved in just a few years. The Defense Advance Research Projects Agency (DARPA) launched programs to support AI research to use AI to solve problems of national security; in particular, to automate the translation of Russian to English for inte

    Read more →
  • Frankenstein complex

    Frankenstein complex

    The Frankenstein complex is a term coined by Isaac Asimov in his robot series, referring to the fear of mechanical men. == History == Some of Asimov's science fiction short stories and novels predict that this suspicion will become strongest and most widespread in respect of "mechanical men" that most-closely resemble human beings (see android), but it is also present on a lower level against robots that are plainly electromechanical automatons. The "Frankenstein complex" is similar in many respects to Masahiro Mori's uncanny valley hypothesis. The name, "Frankenstein complex", is derived from the name of Victor Frankenstein in the 1818 novel Frankenstein; or, The Modern Prometheus by Mary Shelley. In Shelley's story, Frankenstein created an intelligent, somewhat superhuman being, but he finds that his creation is horrifying to behold and abandons it. This ultimately leads to Victor's death at the conclusion of a vendetta between himself and his creation. In much of his fiction, Asimov depicts the general attitude of the public towards robots as negative, with ordinary people fearing that robots will either replace them or dominate them, although dominance would not be allowed under the specifications of the Three Laws of Robotics, the first of which is: "A robot may not harm a human being or, through inaction, allow a human being to come to harm." However, Asimov's fictitious earthly public is not fully persuaded by this, and remains largely suspicious and fearful of robots. I, Robot's short story "Little Lost Robot" is about this "fear of robots". In Asimov's robot novels, the Frankenstein complex is a major problem for roboticists and robot manufacturers. They do all they can to reassure the public that robots are harmless, even though this sometimes involves hiding the truth because they think that the public would misunderstand it. The fear by the public and the response of the manufacturers is an example of the theme of paternalism, the dread of paternalism, and the conflicts that arise from it in Asimov's fiction. The same theme occurs in many later works of fiction featuring robots, although it is rarely referred to as such.

    Read more →
  • Desktop video

    Desktop video

    Desktop video refers to a phenomenon lasting from the mid-1980s to the early 1990s when the graphics capabilities of personal computers such as the Amiga, Macintosh II, and specially-upgraded IBM PC compatibles had advanced to the point where individuals and local broadcasters could use them for analog non-linear editing and vision mixing in video production. Despite the use of computers, desktop video should not be confused with digital video since the video data remained analog, and it uses items like a VCR and a camcorder to record the video. Full-screen, full-motion video's vast storage requirements meant that the promise of digital encoding would not be realized on desktop computers for at least another decade. == Description == There were multiple models of genlock cards available to synchronize the content; the Newtek Video Toaster was commonly used in Amiga in countries that used NTSC (PAL-M in Brazil), while PCs had Truevision and Matrox Illuminator cards and Mac systems had the SuperMac Video Spigot and Radius VideoVision cards. Apple later introduced the Macintosh Quadra 840AV and Centris 660AV systems to specifically address this market. Desktop video was a parallel development to desktop publishing and enabled many small production houses and local TV stations to produce their own original content for the first time. Along with the advent of public-access cable channels, desktop video meant that television advertising became affordable for local businesses such as retailers, restaurants, real estate agents, contractors and auto dealers. As with the phrase desktop publishing, use of the term died out as the technologies to which it referred become the norm for any kind of video production.

    Read more →
  • Color picker

    Color picker

    A color picker (also color chooser or color tool) is a graphical user interface widget, usually found within graphics software or online, used to select colors and, in some cases, to create color schemes (the color picker might be more sophisticated than the palette included with the program). Operating systems such as Microsoft Windows or macOS have a system color picker, which can be used by third-party programs (e.g., Adobe Photoshop). == History == The concept of color pickers dates back to the early days of computer graphics and digital design. Early versions were rudimentary, often featuring basic color palettes and limited functionality. One of the first drawing programs to include a color picker was SketchPad (also referred to as LisaSketch), designed by Bill Atkinson in 1983 to showcase LisaGraf's capabilities. It used a black and white pattern system, using dithering to create the illusion of color depth. With the increased popularity of personal computers with color graphics, there soon came software similar to SketchPad that supported more than two colors, like Broderbund's Dazzle Draw for the Apple II or Electronic Arts' Deluxe Paint. However, the color pickers present in those programs relied on indexed colors. Color pickers, resembling ones used in modern software with support for direct, 24-bit color, appeared soon after the release of the Macintosh II, with the release of programs like Adobe Photoshop and Corel Painter. As the increase of color depth allowed the choice of significantly more colors, the shape and form of color pickers started to diverge. For example, Adobe Photoshop used a hue-saturation color wheel with a slider for brightness in version 0.63, later on switching to a rectangular design accompanied by a hue slider. Corel Painter pioneered the triangular saturation and brightness picker with a hue ring around it, aiming to better represent the continuity of the hue spectrum and the relationship between saturation and brightness. == Purpose == A color picker is used to select and adjust color values. In graphic design and image editing, users typically choose colors via an interface with a visual representation of a color—organized with quasi-perceptually-relevant hue, saturation and lightness dimensions (HSL) – instead of keying in alphanumeric text values. Because color appearance depends on comparison of neighboring colors (see color vision), many interfaces attempt to clarify the relationships between colors. == Interface == Color tools can vary in their interface. Some may use sliders, buttons, text boxes for color values, or direct manipulation. Often a two-dimensional square is used to create a range of color values (such as lightness and saturation) that can be clicked on or selected in some other manner. Drag and drop, color droppers, and various other forms of interfaces are commonly used as well. Usually, color values are also displayed numerically, so they can be precisely remembered and keyed-in later, such as three values of 0-255 representing red, green, and blue, respectively. === Eyedropper === The eyedropper is a tool present in most color pickers and graphics software that allows a user to read a color at a specific point in an image, or position on a display. This enables the color to be transferred to other applications particularly quickly. Modern implementations of eyedropper tools are also available as browser extensions, allowing users to pick colors directly from web pages, such as in Google Chrome and Microsoft Edge. == Working == A color picker has two main parts, first a color slider and second a color canvas. The color slider has a linear or radial gradient of the seven rainbow colors i.e. Violet, Indigo, Blue, Green, Yellow, Orange and Red. It allows one to choose any of the seven primary colors. The color value chosen from the color slider instantly reflects in the color canvas. The color canvas is a mixture of two linear color gradients. First a linear gradient of the current chosen color and second a linear gradient of the black color. This mixture of color gradients lets one choose a lighter and darker version of the current chosen color from the color slider.

    Read more →
  • Voice activity detection

    Voice activity detection

    Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speaker diarization, speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech section of an audio session: it can avoid unnecessary coding/transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving on computation and on network bandwidth. VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language. It was first investigated for use on time-assignment speech interpolation (TASI) systems. == Algorithm overview == The typical design of a VAD algorithm is as follows: There may first be a noise reduction stage, e.g. via spectral subtraction. Then some features or quantities are calculated from a section of the input signal. A classification rule is applied to classify the section as speech or non-speech – often this classification rule finds when a value exceeds a certain threshold. There may be some feedback in this sequence, in which the VAD decision is used to improve the noise estimate in the noise reduction stage, or to adaptively vary the threshold(s). These feedback operations improve the VAD performance in non-stationary noise (i.e. when the noise varies a lot). A representative set of recently published VAD methods formulates the decision rule on a frame by frame basis using instantaneous measures of the divergence distance between speech and noise. The different measures which are used in VAD methods include spectral slope, correlation coefficients, log likelihood ratio, cepstral, weighted cepstral, and modified distance measures. Independently from the choice of VAD algorithm, a compromise must be made between having voice detected as noise, or noise detected as voice (between false positive and false negative). A VAD operating in a mobile phone must be able to detect speech in the presence of a range of very diverse types of acoustic background noise. In these difficult detection conditions it is often preferable that a VAD should fail-safe, indicating speech detected when the decision is in doubt, to lower the chance of losing speech segments. The biggest difficulty in the detection of speech in this environment is the very low signal-to-noise ratios (SNRs) that are encountered. It may be impossible to distinguish between speech and noise using simple level detection techniques when parts of the speech utterance are buried below the noise. == Applications == VAD is an integral part of different speech communication systems such as audio conferencing, echo cancellation, speech recognition, speech encoding, speaker recognition and hands-free telephony. In the field of multimedia applications, VAD allows simultaneous voice and data applications. Similarly, in Universal Mobile Telecommunications Systems (UMTS), it controls and reduces the average bit rate and enhances overall coding quality of speech. In cellular radio systems (for instance GSM and CDMA systems) based on Discontinuous Transmission (DTX) mode, VAD is essential for enhancing system capacity by reducing co-channel interference and power consumption in portable digital devices. In speech processing applications, voice activity detection plays an important role since non-speech frames are often discarded. For a wide range of applications such as digital mobile radio, Digital Simultaneous Voice and Data (DSVD) or speech storage, it is desirable to provide a discontinuous transmission of speech-coding parameters. Advantages can include lower average power consumption in mobile handsets, higher average bit rate for simultaneous services like data transmission, or a higher capacity on storage chips. However, the improvement depends mainly on the percentage of pauses during speech and the reliability of the VAD used to detect these intervals. On the one hand, it is advantageous to have a low percentage of speech activity. On the other hand, clipping, that is the loss of milliseconds of active speech, should be minimized to preserve quality. This is the crucial problem for a VAD algorithm under heavy noise conditions. === Use in telemarketing === One controversial application of VAD is in conjunction with predictive dialers used by telemarketing firms. In order to maximize agent productivity, telemarketing firms set up predictive dialers to call more numbers than they have agents available, knowing most calls will end up in either "Ring – No Answer" or answering machines. When a person answers, they typically speak briefly ("Hello", "Good evening", etc.) and then there is a brief period of silence. Answering machine messages are usually 3–15 seconds of continuous speech. By setting VAD parameters correctly, dialers can determine whether a person or a machine answered the call and, if it's a person, transfer the call to an available agent. If it detects an answering machine message, the dialer hangs up. Often, even when the system correctly detects a person answering the call, no agent may be available, resulting in a "silent call". Call screening with a multi-second message like "please say who you are, and I may pick up the phone" will frustrate such automated calls. == Performance evaluation == To evaluate a VAD, its output using test recordings is compared with those of an "ideal" VAD – created by hand-annotating the presence or absence of voice in the recordings. The performance of a VAD is commonly evaluated on the basis of the following four parameters: FEC (Front End Clipping): clipping introduced in passing from noise to speech activity; MSC (Mid Speech Clipping): clipping due to speech misclassified as noise; OVER: noise interpreted as speech due to the VAD flag remaining active in passing from speech activity to noise; NDS (Noise Detected as Speech): noise interpreted as speech within a silence period. Although the method described above provides useful objective information concerning the performance of a VAD, it is only an approximate measure of the subjective effect. For example, the effects of speech signal clipping can at times be hidden by the presence of background noise, depending on the model chosen for the comfort noise synthesis, so some of the clipping measured with objective tests is in reality not audible. It is therefore important to carry out subjective tests on VADs, the main aim of which is to ensure that the clipping perceived is acceptable. In VoIP applications, front-end clipping can be reduced by rewinding to shortly before the detection and sending very slightly delayed data. This kind of test requires a certain number of listeners to judge recordings containing the processing results of the VADs being tested, giving marks to several speech sequences on the following features: Quality; Comprehension difficulty; Audibility of clipping. These marks are then used to calculate average results for each of the features listed above, thus providing a global estimate of the behavior of the VAD being tested. To conclude, whereas objective methods are very useful in an initial stage to evaluate the quality of a VAD, subjective methods are more significant. As they require the participation of several people for a few days, increasing cost, they are generally only used when a proposal is about to be standardized. == Implementations == One early standard VAD is that developed by British Telecom for use in the Pan-European digital cellular mobile telephone service in 1991. It uses inverse filtering trained on non-speech segments to filter out background noise, so that it can then more reliably use a simple power-threshold to decide if a voice is present. The G.729 standard calculates the following features for its VAD: line spectral frequencies, full-band energy, low-band energy (<1 kHz), and zero-crossing rate. It applies a simple classification using a fixed decision boundary in the space defined by these features, and then applies smoothing and adaptive correction to improve the estimate. The GSM standard includes two VAD options developed by ETSI. Option 1 computes the SNR in nine bands and applies a threshold to these values. Option 2 calculates different parameters: channel power, voice metrics, and noise power. It then thresholds the voice metrics using a threshold that varies according to the estimated SNR. The Speex audio compression library uses a procedure named Improved Minima Controlled Recursive Averaging, which uses a smoothed representation of spectral pow

    Read more →
  • BeHafizh

    BeHafizh

    BeHafizh is a mobile application to assist in the effort to memorize Qur'anic verses. The software runs on the Android operating system. This application was made by a team from Gadjah Mada University (UGM) consisting of Farid Amin Ridwanto, Rian Adam Rajagede and Alfian Try Putranto in order to participate in the National Student Musabaqoh Tilawatil Quran (MTQ) held at University of Indonesia (UI) on 1- August 8, 2015. This application then won a gold medal in the branch of Computer Application Design in the competition. == Features == === Audio Player === Audio player, paragraph can be played repeatedly, with pause, and can be done on a certain range of Quranic verses. === Memorization Test === Memorization testing continues users to improve their memorization. Memorization Recorders improves user's ability to recite Quran. === Colour indicators === === Achievements === === Reminders ===

    Read more →
  • BeReal

    BeReal

    BeReal (stylized on the app logo as BeReal.) is a French social-networking app released in 2020, developed by Alexis Barreyat and Kévin Perreau. Currently, it is owned by Voodoo. Its main feature is a daily notification that encourages users to share photos of themselves in their day-to-day life, on any randomly selected two-minute window every day. Critics noted its emphasis on authenticity, which some felt crossed the line into the mundane. The primary reference of its name relates to its focus on users uploading unpolished photos, with it being a pun of the term B-reel. According to the app's description on Apple's App Store, BeReal encourages its users to "show their friends who they really are, for once," by removing filters and opportunities to stage or edit photos. After a couple of years of relative obscurity, it rapidly gained popularity in early and mid-2022 growing from 21.6 million to 73.5 million users between July and August, before experiencing a decrease in use in 2023 and continuing to decline to 23 million users at the beginning of 2024. == History == The app was developed by Alexis Barreyat, a former employee at GoPro, and Kévin Perreau, a graduate from 42 in Paris. Initially released in 2020, it first gained widespread popularity in early 2022. It first spread widely on college campuses, partially due to a paid ambassador program. In late August 2022, the application had over 10 million active daily users and 21.6 million active monthly users. As of February 2023, the app has grown to 13 million active daily users and 47.8 million active monthly users. In June 2021, BeReal received a $30 million funding round led by Andreessen Horowitz and Accel. In May 2022, BeReal secured $85 million in a funding round led by Yuri Milner's DST Global, increasing its valuation to about $600 million. On July 25, 2022, BeReal topped Apple's free app list in the iOS App Store, and remained until September 2022. BeReal also received Apple's iPhone App of the Year in 2022. By late spring 2023, the app's momentum was waning, as daily users dropped to about 6 million, from 15 million in October 2022. In August 2024, there was a resurgence after a campaign at the Paris Olympics 2024, with the app reportedly gaining 1000 users. In June 2024, BeReal was acquired by the French company Voodoo for a reported €500 million. Alexis Barreyat is set to step down after a transition period. == Features == Once per day, BeReal notifies all users that a two-minute window to post is open. It asks users to create a post (known eponymously as a "BeReal") which, using mandatory simultaneous photos and now short videos from both the front and back cameras, provides a visual depiction of what they are doing at that moment, with an option to caption their post. The given window varies from day to day, and is not known to users before the notification is received. Once the daily notification is sent, users lose the ability to see others' BeReals from the previous day. Furthermore, users cannot see any of the current day's BeReals until they upload their own. On-time BeReals show the time it was uploaded, meanwhile, late BeReals uploaded after the two-minute window shows how late the BeReal was taken, but the user has to long-press the BeReal to reveal the time it was uploaded. Other users can also see how many attempts the poster took to take the BeReal, as well as their location when the BeReal was taken. Users only get one chance to delete their BeReal and post another one, and they used to not be able to post more than one at any time. However, in 2023, a feature was added that allowed users to post up to two extra BeReals on days when they posted their first BeReal within the 2-minute window. In July 2024, the number of bonus BeReals was increased to 5. [1] BeReal also features a "Discovery" section, wherein users are given the option to share to a much wider, public audience. This feature, however, is limited, as users are not able to interact with the posts through commenting—unlike the "My Friends" feature. In August 2023, in an attempt to make BeReal more social, another feature was added so that users are now able to see their friends of friends' BeReal. The app reportedly uses HiveAI to automate its image moderation process. However, there is also a report function that allows users to report a photo or another user if they are posting inappropriate content. === Comparison to other platforms === Because of its daily cycle of engagement, it has been compared to Wordle, which gained popularity earlier in 2022. It also supports a platform similar to Snapchat with a theme of impermanence and brevity. BeReal has been described as designed to compete with Instagram while simultaneously de-emphasising social media addiction and overuse. The app does not allow any photo filters or other editing, and has no follower counts. Marketing material from the company said that the app "can be addictive" and that "BeReal won't make you famous." Jacob Arnott, managing director of social agency We the People, describes BeReal as "an anti-Instagram" due to its raw and unedited nature. The app's foundation on friends rather than followers resembles Facebook's platform of adding friends, which comprise the content of a user's feed. This also resembles Instagram's "close friends" story feature. Further, rather than "liking" posts, BeReal uses "RealMojis" which involves taking a photo to interact with other posts. With the popularity of BeReal, other providers have launched similar features. In July 2022, Instagram launched a "Dual Camera" feature similar to BeReal, and in August 2022 it began testing a feature called "IG Candid Challenges", where users are prompted to post once a day within two minutes. As of September 2022, TikTok has also launched a feature called TikTok Now, following the same concept. In December 2022, similar to Spotify's "Wrapped," BeReal launched a feature involving a video of a compilation of users' BeReal posts of 2022. == User characteristics == BeReal is considered to be targeted towards Generation Z users, and attempts to minimise "social media fatigue", a feeling of numbness and disconnection from reality caused by constant interaction with an idealised version of others. This is a "core generational value" that this demographic holds compared to Millennials. Further, BeReal's users have been particularly strong across universities and university-aged students, and the majority of users are in the United States, the United Kingdom, and Germany. In 2022, the majority of users were female, with 43.2% of users falling within the age range of 16 to 25 and 55.1% of users being 26 to 44 years old. BeReal, the platform encourages users to share their real time moments by sending a daily notification that gives a least two minutes to post a unedited photo using bot the front and back camera, although users can post later and retake photos from when the notification happens, this action are still visible to friends, reinforcing transparency and genuine in the moment sharing. == Reception == Jason Koebler, a writer for Vice, wrote that in contrast to Instagram, which presents an unattainable view of people's lives, BeReal instead "makes everyone look extremely boring". Niklas Myhr, a professor of social media at Chapman University, argued that depth of engagement may determine whether the app is a passing trend or has "staying power". Kelsey Weekman, a reporter for BuzzFeed News, noted that the app's unwillingness to "glamorise the banality of life" made it feel "humbling" in its emphasis on authenticity. Niloufar Haidari for The Guardian comments similarly that where the app succeeds in being "drab" in perhaps a positive way, it fails in potentially "un-inspiring" users. Likewise, Dr. Brad Ridout, a behavioral psychologist at the University of Sydney, emphasizes that the "boring" experience is what the creators are targeting for the app and, in response to Instagram's platform of flawlessness, that "perfection is the enemy of happiness". === Criticisms === Some people regularly post after the two-minute notification expires, leading to some criticism of the app, as the ability to post late undermines its aims of authenticity. In addition, BeReal's daily two-minute window has been argued to contribute to social media fatigue and a need for self-exposure, as well as constant access to phones.

    Read more →
  • Level-set method

    Level-set method

    The Level-set method (LSM) is a conceptual framework for using level sets as a tool for numerical analysis of surfaces and shapes. LSM can perform numerical computations involving curves and surfaces on a fixed Cartesian grid without having to parameterize these objects. LSM makes it easier to perform computations on shapes with sharp corners and shapes that change topology (such as by splitting in two or developing holes). These characteristics make LSM effective for modeling objects that vary in time, such as an airbag inflating or a drop of oil floating in water. == Overview == The figure on the right illustrates several ideas about LSM. In the upper left corner is a bounded region with a well-behaved boundary. Below it, the red surface is the graph of a level set function φ {\displaystyle \varphi } determining this shape, and the flat blue region represents the X-Y plane. The boundary of the shape is then the zero-level set of φ {\displaystyle \varphi } , while the shape itself is the set of points in the plane for which φ {\displaystyle \varphi } is positive (interior of the shape) or zero (at the boundary). In the top row, the shape's topology changes as it is split in two. It is challenging to describe this transformation numerically by parameterizing the boundary of the shape and following its evolution. An algorithm can be used to detect the moment the shape splits in two and then construct parameterizations for the two newly obtained curves. On the bottom row, however, the plane at which the level set function is sampled is translated upwards, on which the shape's change in topology is described. It is less challenging to work with a shape through its level-set function rather than with itself directly, in which a method would need to consider all the possible deformations the shape might undergo. Thus, in two dimensions, the level-set method amounts to representing a closed curve Γ {\displaystyle \Gamma } (such as the shape boundary in our example) using an auxiliary function φ {\displaystyle \varphi } , called the level-set function. The curve Γ {\displaystyle \Gamma } is represented as the zero-level set of φ {\displaystyle \varphi } by Γ = { ( x , y ) ∣ φ ( x , y ) = 0 } , {\displaystyle \Gamma =\{(x,y)\mid \varphi (x,y)=0\},} and the level-set method manipulates Γ {\displaystyle \Gamma } implicitly through the function φ {\displaystyle \varphi } . This function φ {\displaystyle \varphi } is assumed to take positive values inside the region delimited by the curve Γ {\displaystyle \Gamma } and negative values outside. == The level-set equation == If the curve Γ {\displaystyle \Gamma } moves in the normal direction with a speed v {\displaystyle v} , then by chain rule and implicit differentiation, it can be determined that the level-set function φ {\displaystyle \varphi } satisfies the level-set equation ∂ φ ∂ t = v | ∇ φ | . {\displaystyle {\frac {\partial \varphi }{\partial t}}=v|\nabla \varphi |.} Here, | ⋅ | {\displaystyle |\cdot |} is the Euclidean norm (denoted customarily by single bars in partial differential equations), and t {\displaystyle t} is time. This is a partial differential equation, in particular a Hamilton–Jacobi equation, and can be solved numerically, for example, by using finite differences on a Cartesian grid. However, the numerical solution of the level set equation may require advanced techniques. Simple finite difference methods fail quickly. Upwinding methods such as the Godunov method are considered better; however, the level set method does not guarantee preservation of the volume and shape of the set level in an advection field that maintains shape and size, for example, a uniform or rotational velocity field. Instead, the shape of the level set may become distorted, and the level set may disappear over a few time steps. Therefore, high-order finite difference schemes, such as high-order essentially non-oscillatory (ENO) schemes, are often required, and even then, the feasibility of long-term simulations is questionable. More advanced methods have been developed to overcome this; for example, combinations of the leveling method with tracking marker particles suggested by the velocity field. == Example == Consider a unit circle in R 2 {\textstyle \mathbb {R} ^{2}} , shrinking in on itself at a constant rate, i.e. each point on the boundary of the circle moves along its inwards pointing normally at some fixed speed. The circle will shrink and eventually collapse down to a point. If an initial distance field is constructed (i.e. a function whose value is the signed Euclidean distance to the boundary, positive interior, negative exterior) on the initial circle, the normalized gradient of this field will be the circle normal. If the field has a constant value subtracted from it in time, the zero level (which was the initial boundary) of the new fields will also be circular and will similarly collapse to a point. This is due to this being effectively the temporal integration of the Eikonal equation with a fixed front velocity. == Applications == In mathematical modeling of combustion, LSM is used to describe the instantaneous flame surface, known as the G equation. Level-set data structures have been developed to facilitate the use of the level-set method in computer applications. Computational fluid dynamics Trajectory planning Optimization Image processing Computational biophysics Discrete complex dynamics (visualization of the parameter plane and the dynamic plane) == History == The level-set method was developed in 1979 by Alain Dervieux, and subsequently popularized by Stanley Osher and James Sethian. It has since become popular in many disciplines, such as image processing, computer graphics, computational geometry, optimization, computational fluid dynamics, and computational biology.

    Read more →
  • Pray.com

    Pray.com

    Pray.com is a Christian social networking service and mobile application designed to facilitate religious communities. Launched in 2016, it was founded by Steve Gatena, Michael Lynn, Ryan Beck and Matthew Potter. The platform offers features for social networking, daily prayers, sermons, biblical content, and podcasts. The COVID-19 pandemic significantly increased Pray.com's user base, with downloads surging by 955%. During this period, the platform collaborated with churches to support virtual ministry services as in-person gatherings were restricted. The Federal Election Commission issued an opinion in 2021 that allows the platform to feature members of the United States Congress. Pray.com serves as a specialized social media platform for religious groups. Congregations can establish their own groups where members and leaders can participate in discussions, livestream services, and manage donations. Additionally, users can join "prayer communities" to post and respond to prayer requests. For those who subscribe to premium services, the platform provides access to biblically-inspired meditations and bedtime stories, and Bible stories for children. Pray.com also produces Radio drama-style productions with notable actors such as Kristen Bell and Blair Underwood narrating biblical stories. == History == === Funding and development === Pray.com has secured significant funding to support its development and growth. In 2017, the platform raised $2 million in seed funding from Science Inc., Greylock Partners, and Spark Capital. This was followed by a Series A funding round in March 2018, in which the company secured an additional $14 million from TPG Growth, Science Inc., and Greylock Partners. Founder Steve Gatena has highlighted difficulties in securing funding, noting some venture capitalists' negative attitudes towards faith-based technology. === Clinical studies === There have been clinical studies on Pray.com. In one study, the app was found to be acceptable and easy to use among racial and ethnic minority groups, with participants reporting improved mental health and well-being. Greater app use was associated with better outcomes, though low and variable usage suggests the need for further research to fully understand its impact. Another study examined Pray.com's impact on mental health by assigning 192 participants to use the app freely, use its meditative prayer function, or not use it at all. Over two months, participants reported overall improvements in mental health and well-being. Although no significant differences were found between groups, greater app usage correlated with better mental health outcomes. This suggests that religiously based mobile apps may help improve mental health and well-being. Another study of pray.com had similar findings. === National Day of Prayer === Pray first hosted a National Day of Prayer event in 2020 when it streamed to nearly one million viewers on Facebook. In 2021, Pray hosted a virtual event for the National Day of Prayer in the United States. The event featured remarks from public figures including United States President Joe Biden and former Vice President Mike Pence. President Biden spoke of his faith and prayed for an end to the COVID-19 pandemic. Biden remarked: "It means the world to me to know that there are people across the country who include Jill and me in their prayers. And I hope you know that you and your families are in our prayers as well. Today I am praying for the end of this great COVID crisis." The event featured musical performances from Gary Valenciano, Brooke Ligertwood from the Christian band Hillsong Worship, Lecrae, Heather Headley and Michael Neale. Other notable speakers included Ronnie Floyd, Ed Young, Mark Driscoll, and Samuel Rodriguez. Pray.com partnered with Sirius XM, DirecTV and Facebook to stream the event across multiple platforms. Pray.com was featured as a pop-up channel on Sirius XM, channel 154, to host the prayer event and celebrate people of all faith. === Partnerships and sponsorships === In 2024, Pray.com partnered with Sting Ray Robb as the primary sponsor for his No. 41 Chevrolet in the 2024 NTT IndyCar Series. The partnership, highlighting Robb's Christian faith, aims to engage younger audiences with faith-based content. The car, featuring Pray.com's branding, was set to debut at the Firestone Grand Prix of St. Petersburg. A partnership with Palantir Technologies for use of its AI systems was also announced in 2024. === Censorship in China === The app was removed from Apple's App Store in China as part of the country's broader efforts to restrict access to religious content. The app was targeted due to China's stringent regulations on religious material, particularly content distributed through digital platforms. The removal aligns with China's ongoing campaign to control online religious expression and maintain state-approved religious activities.

    Read more →
  • Separable filter

    Separable filter

    A separable filter in image processing can be written as product of two more simple filters. Typically a 2-dimensional convolution operation is separated into two 1-dimensional filters. This reduces the computational costs on an N × M {\displaystyle N\times M} image with a m × n {\displaystyle m\times n} filter from O ( M ⋅ N ⋅ m ⋅ n ) {\displaystyle {\mathcal {O}}(M\cdot N\cdot m\cdot n)} down to O ( M ⋅ N ⋅ ( m + n ) ) {\displaystyle {\mathcal {O}}(M\cdot N\cdot (m+n))} . == Examples == 1. A two-dimensional smoothing filter: 1 3 [ 1 1 1 ] ∗ 1 3 [ 1 1 1 ] = 1 9 [ 1 1 1 1 1 1 1 1 1 ] {\displaystyle {\frac {1}{3}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}{\frac {1}{3}}{\begin{bmatrix}1&1&1\end{bmatrix}}={\frac {1}{9}}{\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}}} 2. Another two-dimensional smoothing filter with stronger weight in the middle: 1 4 [ 1 2 1 ] ∗ 1 4 [ 1 2 1 ] = 1 16 [ 1 2 1 2 4 2 1 2 1 ] {\displaystyle {\frac {1}{4}}{\begin{bmatrix}1\\2\\1\end{bmatrix}}{\frac {1}{4}}{\begin{bmatrix}1&2&1\end{bmatrix}}={\frac {1}{16}}{\begin{bmatrix}1&2&1\\2&4&2\\1&2&1\end{bmatrix}}} 3. The Sobel operator, used commonly for edge detection: [ 1 2 1 ] ∗ [ 1 0 − 1 ] = [ 1 0 − 1 2 0 − 2 1 0 − 1 ] {\displaystyle {\begin{bmatrix}1\\2\\1\end{bmatrix}}{\begin{bmatrix}1&0&-1\end{bmatrix}}={\begin{bmatrix}1&0&-1\\2&0&-2\\1&0&-1\end{bmatrix}}} This works also for the Prewitt operator. In the examples, there is a cost of 3 multiply–accumulate operations for each vector which gives six total (horizontal and vertical). This is compared to the nine operations for the full 3x3 matrix. Another notable example of a separable filter is the Gaussian blur whose performance can be greatly improved the bigger the convolution window becomes.

    Read more →
  • Shape analysis (digital geometry)

    Shape analysis (digital geometry)

    This article describes shape analysis to analyze and process geometric shapes. == Description == Shape analysis is the (mostly) automatic analysis of geometric shapes, for example using a computer to detect similarly shaped objects in a database or parts that fit together. For a computer to automatically analyze and process geometric shapes, the objects have to be represented in a digital form. Most commonly a boundary representation is used to describe the object with its boundary (usually the outer shell, see also 3D model). However, other volume based representations (e.g. constructive solid geometry) or point based representations (point clouds) can be used to represent shape. Once the objects are given, either by modeling (computer-aided design), by scanning (3D scanner) or by extracting shape from 2D or 3D images, they have to be simplified before a comparison can be achieved. The simplified representation is often called a shape descriptor (or fingerprint, signature). These simplified representations try to carry most of the important information, while being easier to handle, to store and to compare than the shapes directly. A complete shape descriptor is a representation that can be used to completely reconstruct the original object (for example the medial axis transform). == Application fields == Shape analysis is used in many application fields: archeology for example, to find similar objects or missing parts architecture for example, to identify objects that spatially fit into a specific space medical imaging to understand shape changes related to illness or aid surgical planning virtual environments or on the 3D model market to identify objects for copyright purposes security applications such as face recognition entertainment industry (movies, games) to construct and process geometric models or animations computer-aided design and computer-aided manufacturing to process and to compare designs of mechanical parts or design objects. == Shape descriptors == Shape descriptors can be classified by their invariance with respect to the transformations allowed in the associated shape definition. Many descriptors are invariant with respect to congruency, meaning that congruent shapes (shapes that could be translated, rotated and mirrored) will have the same descriptor (for example moment or spherical harmonic based descriptors or Procrustes analysis operating on point clouds). Another class of shape descriptors (called intrinsic shape descriptors) is invariant with respect to isometry. These descriptors do not change with different isometric embeddings of the shape. Their advantage is that they can be applied nicely to deformable objects (e.g. a person in different body postures) as these deformations do not involve much stretching but are in fact near-isometric. Such descriptors are commonly based on geodesic distances measures along the surface of an object or on other isometry invariant characteristics such as the Laplace–Beltrami spectrum (see also spectral shape analysis). There are other shape descriptors, such as graph-based descriptors like the medial axis or the Reeb graph that capture geometric and/or topological information and simplify the shape representation but can not be as easily compared as descriptors that represent shape as a vector of numbers. From this discussion it becomes clear, that different shape descriptors target different aspects of shape and can be used for a specific application. Therefore, depending on the application, it is necessary to analyze how well a descriptor captures the features of interest.

    Read more →