AI App How To Use

AI App How To Use — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • DeepSeek (chatbot)

    DeepSeek (chatbot)

    DeepSeek is a generative artificial intelligence chatbot developed by the Chinese company DeepSeek. Released on 20 January 2025, DeepSeek-R1 surpassed ChatGPT as the most downloaded freeware app on the iOS App Store in the United States by 27 January. DeepSeek's success against larger and more established rivals has been described as "upending AI" and initiating "a global AI space race". DeepSeek's compliance with Chinese government censorship policies and its data collection practices have also raised concerns over privacy and information control in the model, prompting regulatory scrutiny in multiple countries. However, it has also been praised for its open weights and infrastructure code, energy efficiency and contributions to open-source artificial intelligence. == History == On 10 January 2025, DeepSeek released the chatbot, based on the DeepSeek-R1 model, for iOS and Android. By 27 January, DeepSeek-R1 surpassed ChatGPT as the most-downloaded freeware app on the iOS App Store in the United States, which resulted in an 18% drop in Nvidia's share price. And after a "large-scale" cyberattack on the same day disrupted the proper functioning of its servers, DeepSeek had limited its new user registration to phone numbers from mainland China, email addresses, or Google account logins. On 3 April 2025, in collaboration with researchers at Tsinghua University, DeepSeek published a paper unveiling a new model that combines the techniques generative reward modeling (GRM) and self-principled critique tuning (SPCT). The resulting model is referred to as DeepSeek-GRM. The goal of using these techniques is to foster more effective inference-time scaling within their LLM and chatbot services. Notably, DeepSeek has said that these new models will be released and made open source. On 30 April 2025, Deepseek released its math-focused Artificial Intelligence Model named "DeepSeek-Prover-V2-671B". This model is useful for formal theorem proving and mathematical reasoning. On 24 April 2026, DeepSeek released DeepSeek V4 and V4-Pro. == Usage == DeepSeek can answer questions, solve logic problems, and write computer programs on par with other chatbots, according to benchmark tests used by American AI companies. Users can access the chatbot for free through the official DeepSeek website or mobile application, without limitation on the number of queries. DeepSeek only supports user-signup via a global email service, e.g. Gmail, Google or Yahoo. DeepSeek also offers access to the R1 and V3 models that power the chatbot via an API with a usage-based pricing model. This modality is primarily targeted towards developers and businesses. As of February 2025, API usage is priced at approximately $0.28 per million input tokens and $0.42 per million output tokens, making it less expensive than some competing services. Its web version is completely free, with 500 messages per hour cap limit to prevent bots from spamming. == Operation == DeepSeek-V3 uses significantly fewer resources compared to its peers. For example, whereas the world's leading AI companies train their chatbots with supercomputers using as many as 16,000 graphics processing units (GPUs), DeepSeek claims to have needed only about 2,000 GPUs—namely, the H800 series chips from Nvidia. It was trained in around 55 days at a cost of US$5.58 million, which is roughly one-tenth of what tech giant Meta spent building its latest AI technology. == Reactions == DeepSeek's success against larger and more established rivals has been described as "upending AI", constituting "the first shot at what is emerging as a global AI space race", and ushering in "a new era of AI brinkmanship". === Challenge to US AI dominance === DeepSeek's competitive performance at relatively minimal cost has been recognized as potentially challenging the global dominance of American AI models. Various publications and news media, such as The Hill and The Guardian, have described the release of the R1 chatbot as a "Sputnik moment" for American AI, echoing Marc Andreessen's view. OpenAI wrote a letter to the Office of Science and Technology Policy (OSTP), in March 2025, citing issues concerning a possibility that Deepseek could manipulate responses to cause harm. === Chinese perspective === DeepSeek's founder Liang Wenfeng has been compared to OpenAI CEO Sam Altman, with CNN calling him the Sam Altman of China and an evangelist for AI. Chinese state media widely praised DeepSeek as a national asset. On 20 January 2025, Chinese Premier Li Qiang invited Wenfeng to his symposium with experts and asked him to provide opinions and suggestions on a draft for comments of the annual 2024 government work report. On 20 February 2025, Wenfeng met with General Secretary of the Chinese Communist Party Xi Jinping, who encouraged party and state leaders to experiment with DeepSeek. Government officials responded to Xi's approval of the chatbot by reportedly using it to draft legal judgements, propose medical treatment plans, and analyze surveillance videos to search for missing persons. === Performance and success === Leading figures in the American AI sector had mixed reactions to DeepSeek's performance and success. Microsoft CEO Satya Nadella and OpenAI CEO Altman—whose companies are involved in the United States government-backed "Stargate Project" to develop American AI infrastructure—both called DeepSeek "super impressive". Various companies including Amazon Web Services, Toyota, and Stripe are seeking to use the model in their program. When American President Donald Trump announced The Stargate Project, he referred to DeepSeek as a wake-up call and a positive development. Other leaders in the AI field, however—including Scale AI CEO Alexandr Wang, Anthropic cofounder and CEO Dario Amodei, and Elon Musk—have expressed skepticism of the app's performance or of the sustainability of its success. Wang in particularly referred to DeepSeek-V3 as "earth-shattering" and DeepSeek-R1 as "top performing, or roughly on par with the best American models", but speculated that China may possess more AI-powering Nvidia H100 GPUs than thought. === Stock market implications === DeepSeek's optimization of limited resources has highlighted potential limits of United States sanctions on China's AI development, including export restrictions on advanced AI chips to China. The success of the company's AI models consequently "sparked market turmoil" and caused shares in major global technology companies to plunge on 27 January 2025: Nvidia's stock fell by as much as 17–18%, as did the stock of rival Broadcom. Other tech firms also sank, including Microsoft (down 2.5%), Google's owner Alphabet (down over 4%), and Dutch chip equipment maker ASML (down over 7%). A global sell-off of technology stocks on Nasdaq, prompted by the release of the R1 model, led to record losses of about $593 billion in the market capitalizations of AI and computer hardware companies; and by the next day a total of $1 trillion of value was wiped from American stocks. == Concerns == === Distillation === DeepSeek has been reported to sometimes claim that it is ChatGPT. OpenAI said that DeepSeek may have "inappropriately" used outputs from its model as training data in a process called distillation. However, there is currently no method to prove this conclusively. === Censorship === DeepSeek's compliance with Chinese government censorship policies and its data collection practices have raised concerns over information control in the model, prompting regulatory scrutiny in multiple countries. Reports indicate that it applies content moderation in accordance with the government's "public opinion guidance" regulations, limiting responses on topics such as the Tiananmen Square massacre and Taiwan's political status. DeepSeek models that have been uncensored also display a bias towards Chinese government viewpoints on controversial topics such as Xi Jinping's human rights record and Taiwan's political status. However, users who have downloaded the models and hosted them on their own devices and servers have reported successfully removing this censorship. Some sources have observed that the official application programming interface (API) version of R1, which runs from servers located in mainland China, uses censorship mechanisms for topics considered politically sensitive for the government of China. For example, the model may initially generate answers to questions about the 1989 Tiananmen Square massacre, persecution of Uyghurs, comparisons between Xi Jinping and Winnie the Pooh, and human rights in China, but a censorship mechanism deletes the uncensored response afterwards and replaces it with a message such as:"Sorry, that's beyond my current scope. Let's talk about something else." The post hoc censorship mechanisms and restrictions added on top of the model's output can be removed in the open-source version of the R1 model. If the "core Socialist values" defined by the Chinese Internet regul

    Read more →
  • Resolution enhancement technology

    Resolution enhancement technology

    Resolution enhancement technology (RET) is a form of image processing technology used to manipulate dot characteristics popular among laser printer and inkjet printer manufacturers. Closely related RET techniques are also used in VLSI photolithography manufacturing technology, in particular in relation to 90 nanometre technology. Resolution refers to the sharpness of image detail, smoothness of curved lines, and the faithful reproduction of an image. In both cases, RET uses pre-compensation of the image in order to try to mitigate the effects of the printing process. Among the major issues in RET in VLSI technology are the fundamental properties of a wave: amplitude, phase, and direction.

    Read more →
  • Showbox.com

    Showbox.com

    Showbox is an online video streaming platform that enables users to stream and download many videos, commonly movies and TV shows, for free. == History == The company opened the platforms to users who registered from its beta in late 2015. The platform was officially launched in February 2016, enabling any visitor to sign up and create videos online. In April 2016, Showbox was featured on the Product Hunt website, coming to the top of the website's lists for that day and week with over 1400 upvotes from the Product Hunt community. Also in April 2016, Showbox partnered with YouTube's leading multi-channel networks, including Fullscreen, BroadbandTV, StyleHaul, AwesomenessTV, and BuzzMyVideos, to enable their communities of creators to access the platform. In June 2016, the company launched Showbox For Brands, a business-oriented video creation platform, enabling companies to create video content in-house and with their communities and influencers. In March 2017, the company launched Showbox Engage, a use case of its B2B product launched in 2016, enabling companies to launch user-generated content campaigns with their communities. In April 2017, Showbox and the United Nations announced a partnership around the 70th anniversary of the declaration of human rights, with an annual, ongoing global campaign in 135 languages, inviting people worldwide to create their part of the declaration in a video from anywhere around the world. In November 2017, Showbox partnered with the Ad:tech and Digital Marketing World Forum conferences (DMWF) in New York to provide their users and communities with a User Generated Content video solution. == Technology == Showbox's video creation technology includes an online green screen feature, proprietary computer vision algorithms, deep learning technology to support the automatic creation of videos in the cloud, and advanced video composition, including special effects. == Coverage and awards == In March 2015, Showbox was nominated as one of the 10 Israeli startups to take over our TV screens this year. In July 2016, Showbox won the Publicis90 award as part of Publicis' "global initiative to foster digital entrepreneurship". In March 2017, Showbox was chosen as one of The Culture Trip's 10 startups to watch for in 2017.

    Read more →
  • Human–robot interaction

    Human–robot interaction

    Human–robot interaction (HRI) is the study of interactions between humans and robots. Human–robot interaction is a multidisciplinary field with contributions from human–computer interaction, artificial intelligence, robotics, natural language processing, design, psychology and philosophy. A subfield known as physical human–robot interaction (pHRI) has tended to focus on device design to enable people to safely interact with robotic systems. == Origins == Human–robot interaction has been a topic of both science fiction and academic speculation even before any robots existed. Because much of active HRI development depends on natural language processing, many aspects of HRI are continuations of human communications, a field of research which is much older than robotics. The origin of HRI as a discrete problem was stated by 20th-century author Isaac Asimov in 1941, in his novel I, Robot. Asimov coined Three Laws of Robotics, namely: A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws. These three laws provide an overview of the goals engineers and researchers hold for safety in the HRI field, although the fields of robot ethics and machine ethics are more complex than these three principles. However, generally human–robot interaction prioritizes the safety of humans that interact with potentially dangerous robotics equipment. Solutions to this problem range from the philosophical approach of treating robots as ethical agents (individuals with moral agency), to the practical approach of creating safety zones. These safety zones use technologies such as lidar to detect human presence or physical barriers to protect humans by preventing any contact between machine and operator. Although initially robots in the human–robot interaction field required some human intervention to function, research has expanded this to the extent that fully autonomous systems are now far more common than in the early 2000s. Autonomous systems include from simultaneous localization and mapping systems which provide intelligent robot movement to natural-language processing and natural-language generation systems which allow for natural, human-esque interaction which meet well-defined psychological benchmarks. Anthropomorphic robots (machines which imitate human body structure) are better described by the biomimetics field, but overlap with HRI in many research applications. Examples of robots which demonstrate this trend include Willow Garage's PR2 robot, the NASA Robonaut, and Honda ASIMO. However, robots in the human–robot interaction field are not limited to human-like robots: Paro and Kismet are both robots designed to elicit emotional response from humans, and so fall into the category of human–robot interaction. Goals in HRI range from industrial manufacturing through Cobots, medical technology through rehabilitation, autism intervention, and elder care devices, entertainment, human augmentation, and human convenience. Future research therefore covers a wide range of fields, much of which focuses on assistive robotics, robot-assisted search-and-rescue, and space exploration. == The goal of friendly human–robot interactions == Robots are artificial agents with capacities of perception and action in the physical world often referred by researchers as workspace. Their use has been generalized in factories but nowadays they tend to be found in the most technologically advanced societies in such critical domains as search and rescue, military battle, mine and bomb detection, scientific exploration, law enforcement, entertainment and hospital care. These new domains of applications imply a closer interaction with the user, sharing the workspace but also goals in terms of task achievement. The subfield of physical human–robot interaction (pHRI) has largely focused on device design to enable people to safely interact with robotic systems but is increasingly developing algorithmic approaches in an attempt to support fluent and expressive interactions between humans and robotic systems. With the advance in AI, the research is focusing on one part towards the safest physical interaction but also on a socially correct interaction, dependent on cultural criteria. The goal is to build an intuitive, and easy communication with the robot through speech, gestures, and facial expressions. Kerstin Dautenhahn refers to friendly Human–robot interaction as "Robotiquette" defining it as the "social rules for robot behaviour (a 'robotiquette') that is comfortable and acceptable to humans" The robot has to adapt itself to our way of expressing desires and orders and not the contrary. But every day environments such as homes have much more complex social rules than those implied by factories or even military environments. Thus, the robot needs perceiving and understanding capacities to build dynamic models of its surroundings. It needs to categorize objects, recognize and locate humans and further recognize their emotions. The need for dynamic capacities pushes forward every sub-field of robotics. Furthermore, by understanding and perceiving social cues, robots can enable collaborative scenarios with humans. For example, with the rapid rise of personal fabrication machines such as desktop 3D printers, laser cutters, etc., entering our homes, scenarios may arise where robots can collaboratively share control, co-ordinate and achieve tasks together. Industrial robots have already been integrated into industrial assembly lines and are collaboratively working with humans. The social impact of such robots have been studied and has indicated that workers still treat robots and social entities, rely on social cues to understand and work together. On the other end of HRI research the cognitive modelling of the "relationship" between human and the robots benefits the psychologists and robotic researchers the user study are often of interests on both sides. This research endeavours part of human society. For effective human – humanoid robot interaction numerous communication skills and related features should be implemented in the design of such artificial agents/systems. == General HRI research == HRI research spans a wide range of fields, some general to the nature of HRI. === Methods for perceiving humans === Methods for perceiving humans in the environment are based on sensor information. Research on sensing components and software led by Microsoft provide useful results for extracting the human kinematics (see Kinect). An example of older technique is to use colour information for example the fact that for light skinned people the hands are lighter than the clothes worn. In any case a human modelled a priori can then be fitted to the sensor data. The robot builds or has (depending on the level of autonomy the robot has) a 3D mapping of its surroundings to which is assigned the humans locations. Most methods intend to build a 3D model through vision of the environment. The proprioception sensors permit the robot to have information over its own state. This information is relative to a reference. Theories of proxemics may be used to perceive and plan around a person's personal space. A speech recognition system is used to interpret human desires or commands. By combining the information inferred by proprioception, sensor and speech the human position and state (standing, seated). In this matter, natural-language processing is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural-language data. For instance, neural-network architectures and learning algorithms that can be applied to various natural-language processing tasks including part-of-speech tagging, chunking, named-entity recognition, and semantic role labeling. === Methods for motion planning === Motion planning in dynamic environments is a challenge that can at the moment only be achieved for robots with 3 to 10 degrees of freedom. Humanoid robots or even 2 armed robots, which can have up to 40 degrees of freedom, are unsuited for dynamic environments with today's technology. However lower-dimensional robots can use the potential field method to compute trajectories which avoid collisions with humans. === Cognitive models and theory of mind === Humans exhibit negative social and emotional responses as well as decreased trust toward some robots that closely, but imperfectly, resemble humans; this phenomenon has been termed the "Uncanny Valley". However recent research in telepresence robots has established that mimicking human body postures and expressive gestures has made the robots likeable and engaging in a remote setting. Further, the presence o

    Read more →
  • Actionstep

    Actionstep

    Actionstep is a cloud-based legal practice management software for law firms and compliance-focused businesses. Actionstep is built to be a comprehensive practice management software with features for workflow automation as well as automatic document generation == History == Actionstep was created by Ted Jordan, CEO of Actionstep, in 2004. It was first used commercially in 2005 by a New Zealand construction franchise as well as a law firm. Actionstep soon expanded into central government and a wider range of small business users (mainly in New Zealand and Australia). After a few years the expanse of their legal client base prompted the company to add key legal specific features to the product with the aim of further expanding their legal market. Through Actionstep's tenure as a practice management software they have gradually expanded from their headquarters in New Zealand and offices located in the United Kingdom and the United States of America. In October 2020, private equity firm Serent Capital Partners purchased 84.25% stake in Actionstep. In April 2022, the company announced unlimited annual leave to its staff == Product == The premise of Actionstep is that it saves companies from having to purchase software tailored to their work flow and instead allows companies to modify the program without additional coding.{{Citation needed}} The founder and CEO Ted Jordan used cloud technology to allow the software to be continuously updated without the need to purchase or redesign new software. This theoretically allows businesses to remain current all the time and cut external I.T. costs.{{Citation needed}} Actionstep also integrates with software from other companies, such as Xero accounting, Microsoft Office & Office 365, Gmail, Google Drive, Dropbox, NetDocuments, QuickBooks, LawPay, BundleDocs, Box, HotDocs, Infotrack, GlobalX, PEXA, JOSEF and Zapier. Actionstep contains workflow automation features aimed at increasing office efficiency. These automated processes include automatic task assignment, information collection, document generation & automation, cataloguing, and matter generation. == Awards == Actionstep was named First International Best of SaaS Showplace Award Winner in 2009. Actionstep has also been a finalist in the ComputerWorld Excellence Awards (2007), and the Vero Excellence in Business Support (2010).

    Read more →
  • COVID-19 apps

    COVID-19 apps

    COVID-19 apps include mobile-software applications for digital contact-tracing—i.e. the process of identifying persons ("contacts") who may have been in contact with an infected individual—deployed during the COVID-19 pandemic. Numerous tracing applications have been developed or proposed, with official government support in some territories and jurisdictions. Several frameworks for building contact-tracing apps have been developed. Privacy concerns have been raised, especially about systems that are based on tracking the geographical location of app users. Less overtly intrusive alternatives include the co-option of Bluetooth signals to log a user's proximity to other cellphones. (Bluetooth technology has form in tracking cell-phones' locations.)) On 10 April 2020, Google and Apple jointly announced that they would integrate functionality to support such Bluetooth-based apps directly into their Android and iOS operating systems. India's COVID-19 tracking app Aarogya Setu became the world's fastest growing application—beating Pokémon Go—with 50 million users in the first 13 days of its release. == Rationale == Contact tracing is an important tool in infectious disease control, but as the number of cases rises time constraints make it more challenging to effectively control transmission. Digital contact tracing, especially if widely deployed, may be more effective than traditional methods of contact tracing. In a March 2020 model by the University of Oxford Big Data Institute's Christophe Fraser's team, a coronavirus outbreak in a city of one million people is halted if 80% of all smartphone users take part in a tracking system; in the model, the elderly are still expected to self-isolate en masse, but individuals who are neither symptomatic nor elderly are exempt from isolation unless they receive an alert that they are at risk of carrying the disease. Some proponents advocate for legislation exempting certain COVID-19 apps from general privacy restrictions. == Issues == === Uptake === Ross Anderson, professor of security engineering at Cambridge University, listed a number of potential practical problems with app-based systems, including false positives and the potential lack of effectiveness if takeup of the app is limited to only a small fraction of the population. In Singapore, only one person in three had downloaded the TraceTogether app by the end of June 2020, despite legal requirements for most workers; the app was also underused, as it required users to keep it open at all times on iOS. A team at the University of Oxford simulated the effect of a contact tracing app on a city of 1 million. They estimated that if the app was used in conjunction with the shielding of over-70s, then 56% of the population would have to be using the app for it to suppress the virus. This would be equivalent to 80% of smartphone users in the United Kingdom. They found that the app could still slow the spread of the virus if fewer people downloaded it, with one infection being prevented for every one or two users. In August 2020, the American Civil Liberties Union (ACLU) argued that there were disparities in smartphone use between demographics and minority groups, and that "even the most comprehensive, all-seeing contact tracing system is of little use without social and medical systems in place to help those who may have the virus — including access to medical care, testing, and support for those who are quarantined." === App store restrictions === Addressing concerns about the spread of misleading or harmful apps, Apple, Google and Amazon set limits on which types of organizations could add coronavirus-related apps to its App Store, limiting them to only "official" or otherwise reputable organizations. === Ethical principles of mass surveillance using COVID-19 contact tracing apps === The advent of COVID-19 contact tracing apps has led to concerns around privacy, the rights of app users, and governmental authority. The European Convention on Human Rights, the International Covenant on Civil and Political Rights (ICCPR) and the United Nations and the Siracusa Principles have outlined 4 principles to consider when looking at the ethical principles of mass surveillance with COVID-19 contact tracing apps. These are necessity, proportionality, scientific validity, and time boundedness. Necessity is defined as the idea that governments should only interfere with a person's rights when deemed essential for public health interests. The potential risks associated with infringements of personal privacy must be outweighed by the possibility of reducing significant harm to others. Potential benefits of contact-tracing apps that may be considered include allowing for blanket population-level quarantine measures to be lifted sooner and the minimization of people under quarantine. Hence, some contend that contact-tracing apps are justified as they may be less intrusive than blanket quarantine measures. Furthermore, the delay of an effective contact-tracing app with significant health and economic benefits may be considered unethical. Proportionality refers to the concept that a contact tracing app's potential negative impact on a person's rights should be justifiable by the severity of the health risks that are being addressed. Apps must use the most privacy-preserving options available to achieve their goals, and the selected option should not only be a logical option for achieving the goal but also an effective one. Scientific validity evaluates whether an app is effective, timely and accurate. Traditional manual contact-tracing procedures are not efficient enough for the COVID-19 pandemic, and do not consider asymptomatic transmission. Contact-tracing apps, on the other hand, can be effective COVID-19 contact-tracing tools that reduce R value to less than 1, leading to sustained epidemic suppression. However, for apps to be effective, there needs to be a minimum 56-60% uptake in the population. Apps should be continually modified to reflect current knowledge on the diseases being monitored. Some argue that contact-tracing apps should be considered societal experimental trials where results and adverse effects are evaluated according to the stringent guidelines of social experiments. Analyses should be conducted by independent research bodies and published for wide dissemination. Despite the current urgency of our pandemic situation, we should still adhere to the standard rigors of scientific evaluation. Time boundedness describe the need for establishing legal and technical sunset clauses so that they are only allowed to operate as long as necessary to address the pandemic situation. Apps should be withdrawn as soon as possible after the end of the pandemic. If the end of the pandemic cannot be predicted, the use of apps should be regularly reviewed and decisions about continued use should be made at each review. Collected data should only be retained by public health authorities for research purposes with clear stipulations on how long the data will be held for and who will be responsible for security, oversight, and ownership. === Privacy, discrimination and marginalisation concerns === The American Civil Liberties Union (ACLU) has published a set of principles for technology-assisted contact tracing and Amnesty International and over 100 other organizations issued a statement calling for limits on this kind of surveillance. The organisations declared eight conditions on governmental projects: surveillance would have to be "lawful, necessary and proportionate"; extensions of monitoring and surveillance would have to have sunset clauses; the use of data would have to be limited to COVID-19 purposes; data security and anonymity would have to be protected and shown to be protected based on evidence; digital surveillance would have to address the risk of exacerbating discrimination and marginalisation; any sharing of data with third parties would have to be defined in law; there would have to be safeguards against abuse and the rights of citizens to respond to abuses; "meaningful participation" by all "relevant stakeholders" would be required, including that of public health experts and marginalised groups. The German Chaos Computer Club (CCC) and Reporters Without Borders also issued checklists. The Exposure Notification service intends to address the problem of persistent surveillance by removing the tracing mechanism from their device operating systems once it is no longer needed. On 20 April 2020, it was reported that over 300 academics had signed a statement favouring decentralised proximity tracing applications over centralised models, given the difficulty in precluding centralised options being used "to enable unwarranted discrimination and surveillance." In a centralised model, a central database records the ID codes of meetings between users. In a decentralised model, this information is recorded on individual phones, with the role of the central

    Read more →
  • Super-resolution imaging

    Super-resolution imaging

    Super-resolution imaging (SR) is a class of techniques that improve the resolution of an imaging system. In optical SR the diffraction limit of systems is transcended, while in geometrical SR the resolution of digital imaging sensors is enhanced. In some radar and sonar imaging applications (e.g. magnetic resonance imaging (MRI), high-resolution computed tomography), subspace decomposition-based methods (e.g. MUSIC) and compressed sensing-based algorithms (e.g., SAMV) are employed to achieve SR over standard periodogram algorithm. Super-resolution imaging techniques are used in general image processing and in super-resolution microscopy. == Super-resolution principles == Several concepts are fundamental to super-resolution imaging: Diffraction limit: the capacity of an optical instrument to reproduce the details of an object in an image has limits that are imposed by laws of physics: the diffraction equations in the wave theory of light, or the uncertainty principle for photons in quantum mechanics. Information transfer can never be increased beyond this boundary, but packets outside the limits can be cleverly swapped for (or multiplexed with) some inside it. Super-resolution microscopy does not so much “break” as “circumvent” the diffraction limit. New procedures probing electro-magnetic disturbances at the molecular level (in the so-called near field) remain fully consistent with Maxwell's equations. Spatial frequency domain: A succinct expression of the diffraction limit is given in the spatial frequency domain. In Fourier optics light distributions are expressed as superpositions of a series of grating light patterns in a range of fringe widths - these widths represent the spatial frequencies. It is generally taught that diffraction theory stipulates an upper limit, the cut-off spatial-frequency, beyond which pattern elements fail to be transferred into the optical image, i.e., are not resolved. But in fact what is set by diffraction theory is the width of the passband, not a fixed upper limit. No laws of physics are broken when a spatial frequency band beyond the cut-off spatial frequency is swapped for one inside it: this has long been implemented in dark-field microscopy. Nor are information-theoretical rules broken when superimposing several bands, disentangling them in the received image needs assumptions of object invariance during multiple exposures, i.e., the substitution of one kind of uncertainty for another. Information: When the term super-resolution is used in techniques based on the inference of object details using a statistical treatment of the image within standard resolution limits (for example, averaging multiple exposures), it involves an exchange of one kind of information (extracting signal from noise) for another (the assumption that the target has remained invariant). Recent breakthroughs incorporate quantum-transformer hybrids into super-resolution, such as QUIET‑SR, a 2025 model that employs shifted quantum window attention within a transformer to enhance image detail while respecting diffraction and information-theory limits Similarly, frequency-integrated transformers (e.g., FIT) enrich super-resolution by explicitly combining spatial and frequency-domain information via FFT-based attention, improving reconstruction across scales Resolution and localization: True resolution involves the distinction of whether a target, e.g. a star or a spectral line, is single or double, ordinarily requiring separable peaks in the image. When a target is known to be single, its location can be determined with higher precision than the image width by finding the centroid (center of gravity) of its image light distribution. The word ultra-resolution had been proposed for this process but it did not catch on, and the high-precision localization procedure is typically referred to as super-resolution. == Techniques == === Optical or diffractive super-resolution === Substituting spatial-frequency bands: Though the bandwidth allowable by diffraction is fixed, it can be positioned anywhere in the spatial-frequency spectrum. Dark-field illumination in microscopy is an example. See also aperture synthesis. ==== Multiplexing spatial-frequency bands ==== An image is formed using the normal passband of the optical device. Then, some known light structure (for example, a set of light fringes) is superimposed on the target. The image now contains components resulting from the combination of the target and the superimposed light structure, e.g. moiré fringes, and carries information about target detail which simple unstructured illumination does not. The “superresolved” components, however, need disentangling to be revealed. For an example, see structured illumination (figure to left). ==== Multiple parameter use within traditional diffraction limit ==== If a target has no special polarization or wavelength properties, two polarization states or non-overlapping wavelength regions can be used to encode target details, one in a spatial-frequency band inside the cut-off limit the other beyond it. Both would use normal passband transmission but are then separately decoded to reconstitute target structure with extended resolution. ==== Probing near-field electromagnetic disturbance ==== Super-resolution microscopy is generally discussed within the realm of conventional optical imagery. However, modern technology allows the probing of electromagnetic disturbance within molecular distances of the source, which has superior resolution properties. See also evanescent waves and the development of the new super lens. === Geometrical or image-processing super-resolution === ==== Multi-exposure image noise reduction ==== When an image is degraded by noise, the resolution may be improved by averaging multiple exposures. See example on the right. ==== Single-frame deblurring ==== Known defects in a given imaging situation, such as defocus or aberrations, can sometimes be mitigated in whole or in part by suitable spatial-frequency filtering of even a single image. Such procedures all stay within the diffraction-mandated passband, and do not extend it. ==== Sub-pixel image localization ==== The location of a single source can be determined by computing the "center of gravity" (centroid) of the light distribution extending over several adjacent pixels (see figure on the left). Provided that there is enough light, this can be achieved with arbitrary precision, very much better than pixel width of the detecting apparatus and the resolution limit for the decision of whether the source is single or double. This technique, which requires the presupposition that all the light comes from a single source, is at the basis of what has become known as super-resolution microscopy, e.g. stochastic optical reconstruction microscopy (STORM), where fluorescent probes attached to molecules give nanoscale distance information. It is also the mechanism underlying visual hyperacuity. ==== Bayesian induction beyond traditional diffraction limit ==== Some object features, though beyond the diffraction limit, may be known to be associated with other object features that are within the limits and hence contained in the image. Then conclusions can be drawn, using statistical methods, from the available image data about the presence of the full object. The classical example is Toraldo di Francia's proposition of judging whether an image is that of a single or double star by determining whether its width exceeds the spread from a single star. This can be achieved at separations well below the classical resolution bounds, and requires the prior limitation to the choice "single or double?" The approach can take the form of extrapolating the image in the frequency domain, by assuming that the object is an analytic function, and that we can exactly know the function values in some interval. This method is severely limited by the ever-present noise in digital imaging systems, but it can work for radar, astronomy, microscopy or magnetic resonance imaging. More recently, a fast single image super-resolution algorithm based on a closed-form solution to ℓ 2 − ℓ 2 {\displaystyle \ell _{2}-\ell _{2}} problems has been proposed and demonstrated to accelerate most of the existing Bayesian super-resolution methods significantly. == Aliasing == Geometrical SR reconstruction algorithms are possible if and only if the input low resolution images have been under-sampled and therefore contain aliasing. Because of this aliasing, the high-frequency content of the desired reconstruction image is embedded in the low-frequency content of each of the observed images. Given a sufficient number of observation images, and if the set of observations vary in their phase (i.e. if the images of the scene are shifted by a sub-pixel amount), then the phase information can be used to separate the aliased high-frequency content from the true low-frequency content, and the full-resolution image can be accurate

    Read more →
  • Reverse correlation technique

    Reverse correlation technique

    The reverse correlation technique is a data driven study method used primarily in psychological and neurophysiological research. This method earned its name from its origins in neurophysiology, where cross-correlations between white noise stimuli and sparsely occurring neuronal spikes could be computed quicker when only computing it for segments preceding the spikes. The term has since been adopted in psychological experiments that usually do not analyze the temporal dimension, but also present noise to human participants. In contrast to the original meaning, the term is here thought to reflect that the standard psychological practice of presenting stimuli of defined categories to the participants is "reversed": Instead, the participant's mental representations of categories are estimated from interactions of the presented noise and the behavioral responses. It is used to create composite pictures of individual and/or group mental representations of various items (e.g. faces, bodies, and the self) that depict characteristics of said items (e.g. trustworthiness and self-body image). This technique is helpful when evaluating the mental representations of those with and without mental illnesses. == Terms == This technique utilizes spike-triggered average to explain what areas of signal and noise in an image are valuable for the given research question. Signal is information used to produce objects of value that help explain and connect the world around us. Noise is commonly referred to as unwanted signal that obscures the information that the signal is trying to present. Most importantly for reverse correlation studies, noise is randomly varying information. To determine the areas of importance using reverse correlation, noise is applied to a base image and then evaluated by observers. A base image is any image void of noise that relates to the research question. A base image that has noise superimposed on top is the stimuli that is presented to and evaluated by participants. Each time a new set of stimuli is presented to a participant, this is known as a trial. After a participant has responded to hundreds to thousands of trials, a researcher is ready to create a classification image. A classification image (abbreviated as "CI" in some studies) is a single image that represents the average noise patterns in the images selected by participants. A classification image can also be computed for groups by averaging the individuals’ classification images. These classification images are what researchers use to interpret the data and draw conclusions. As a whole, the reverse correlation method is a process that results in a composite image (from an individual or group) that can be used to estimate and interpret mental representations. == Basic study layout == The reverse correlation method is typically executed as an in-lab computer experiment. This method follows four broad steps. Each of the following steps are described in greater detail below. After creating a research question and determining that the reverse correlation method is the most suitable technique to answer the question, a researcher must (1) design randomly varying stimuli. After the stimuli have been prepared, a researcher should (2) collect data from participants who will see and respond to approximately 300 -1,000 trials. Each trial will either consist of one or two images (side by side) derived from the same base image with noise superimposed on top. Participant responses will depend on the chosen study design; if a researcher presents only one image at a time, participants rate the image on a 4pt scale, but when two images are shown, the participant is asked to choose which best aligns with the given category (e.g. choose the image that looks the most aggressive). Once all of the data is collected, the researcher will (3) compute classification images for each participant and using those images compute group classification images. Finally, with the classification images available, the researcher will (4) evaluate the images and draw conclusions about their results. === Step 1: making stimuli === When designing the stimuli for a reverse correlation study, the two primary factors that one should consider are (1) the base image and (2) the noise that will be used. While not all bases are images per se, the majority are and for this reason the base is typically referred to as a base image. The base image should represent whatever the research question is addressing. For example, if you are interested in peoples’ mental representations of Chinese people, it would not make sense to use a base image of a Spanish or Caucasian person. Again, if you are interested in the mental representations of male vocal patterns, it would make the most sense to use a base vocal pattern that has been produced by a male. Having a base is important because it provides a kind of anchor for participants to work from. When there is no base image, the number of trials that are required increases dramatically, thus making it harder to collect data. While there are studies that have excluded a base image, (e.g. the S study), for more elaborate and nuanced research questions, it is important to have a base image that is a fair representation of what participants are being asked to categorize. Photographs of faces are generally the most popular base image. Although the reverse correlation method is capable of investigating a wide variety of research questions, the most common application of the method is for evaluating faces on a single trait. Reverse correlation studies that address evaluations of the face are sometimes referred to as being a face space reverse correlation model (FSRCM). Thankfully, there are existing databases for face images of varying demographics and emotion that work well as base images. The reverse correlation method can also be used to help researchers identify what areas of an image (e.g. the areas on the face) have diagnostic value. In order to identify these areas of value, researchers start by minimizing the space a participant can pull information from. By imposing a “mask” on an image (e.g. blur an image while leaving random areas un-blurred), this reduces the information individuals might see, and forces them to focus on certain areas. Then, if/when participants are able to correctly identify an image with a trait repeatedly, we can draw conclusions about what areas have diagnostic value. While faces and visual stimuli are the most popular, this is not the only stimuli that can be used in a reverse correlation study. This method was originally designed for auditory stimuli which allows researchers to investigate how perceivers interpret auditory information and create trait based attributions to different sound patterns. For example, by segmenting a vocal recording of a single word (total sound time 426 ms) into six segments (71 ms each), and varying each segment's pitch using Gaussian distributions, researchers were able to uncover what vocal patterns people associated with certain traits. Specifically, this study investigated how listeners rated sound clips of the word “really” as sounding more interrogative (i.e. like the more common reverse correlation studies this study had participants listen to two sound clips per trial, choose which fit the category the best, and then created an average of the pitch contours). Beyond face and auditory perception, research utilizing the reverse correlation method has expanded to investigate how individuals see three-dimensional objects in images with noise (but no signal). After selecting your base image, regardless of what the image is, it is helpful to apply a Gaussian blur to smooth noise in the image. While noise will be applied later, it is helpful to reduce existing noise in the photo before applying your chosen noise. There are three primary choices when it comes to noise: white noise, sine-wave noise, and Gabor noise. The latter two of these constrain the configurations that the noise can have, and because of this white noise is usually the most commonly used. Regardless of the type of noise that is chosen, it is crucial that the noise randomly varies. === Step 2: data collection === Once the stimuli for the study has been developed, the researcher must make a few decisions before actually collecting the data. The researcher must come to a conclusion on how many stimuli will be presented at a time and how many trials the participants will see. In terms of stimuli presentation, a researcher can choose from either a 2-Image Forced Choice (2IFC) or a 4-Alternative Forced Choice (4AFC). The 2IFC presents two images at once (side by side) and requires participants to choose between the two on a specified category (e.g. which image looks the most like a male). Typically the noise from the left image is the mathematical inverse of the noise from the right image. This method was developed to better answer questions that could n

    Read more →
  • Lessac Technologies

    Lessac Technologies

    Lessac Technologies, Inc. (LTI) is an American firm which develops voice synthesis software, licenses technology and sells synthesized novels as MP3 files. The firm currently has seven patents granted and three more pending for its automated methods of converting digital text into human-sounding speech, more accurately recognizing human speech and outputting the text representing the words and phrases of said speech, along with recognizing the speaker's emotional state. The LTI technology is partly based on the work of the late Arthur Lessac, a Professor of Theater at the State University of New York and the creator of Lessac Kinesensic Training, and LTI has licensed exclusive rights to exploit Arthur Lessac's copyrighted works in the fields of speech synthesis and speech recognition. Based on the view that music is speech and speech is music, Lessac's work and books focused on body and speech energies and how they go together. Arthur Lessac's textual annotation system, which was originally developed to assist actors, singers, and orators in marking up scripts to prepare for performance, is adapted in LTI's speech synthesis system as the basic representation of the speech to be synthesized (Lessemes), in contrast to many other systems which use a phonetic representation. LTI's software has two major components: (1) a linguistic front-end that converts plain text to a sequence of prosodic and phonosensory graphic symbols (Lessemes) based on Arthur Lessac's annotation system, which specify the speech units to be synthesized; (2) a signal-processing back-end that takes the Lessemes as acoustic data and produces human-sounding synthesized speech as output, using unit selection and concatenation. LTI's text-to-speech system came in second in the world-wide Blizzard Challenge 2011 and 2012. The first-place team in 2011 also employed LTI's "front-end" technology, but with its own back-end. The Blizzard Challenge, conducted by the Language Technologies Institute of Carnegie Mellon University, was devised as a way to evaluate speech synthesis techniques by having different research groups build voices from the same voice-actor recordings, and comparing the results through listening tests. LTI was founded in 2000 by H. Donald Wilson (chairman), a lawyer, LexisNexis entrepreneur and business associate of Arthur Lessac; and Gary A. Marple (chief inventor), after Marple suggested that Arthur Lessac's kinesensic voice training might be applicable to computational linguistics. After Wilson's death in 2006, his nephew John Reichenbach became the firm's CEO.

    Read more →
  • Pixelmator Pro

    Pixelmator Pro

    Pixelmator Pro is a photo, video, and vector graphic editor developed by Apple for macOS and iPadOS as part of its Pixelmator and pro apps platforms and as a part of their Apple Creator Studio suite of applications. Pixelmator Pro relies heavily on technologies from Apple platforms such as Metal, CoreML, Core Image, AVFoundation, GCD, and SwiftUI. == Features == GPU accelerated with Metal 50+ standard image editing tools Layer-based image editor Video editing support Vector graphic support (including SVG support) AI-powered editing features such as background removal ML Super Resolution and Smart Replace Supports a variety of media formats (JPEG, RAW, Apple ProRAW, PSD, PNG, GIF, MP4, HEIF, etc) == Reception == Pixelmator Pro was generally well-received by reviewers who praised its deep use of machine learning, fully macOS-native design, and relatively affordable one-time purchase compared to subscription software such as Adobe Photoshop. Some reviewers criticized that some features are hard to find or hard to use. It was awarded Apple's Mac App of the Year in 2018. Pixelmator Pro does not have support for panorama stitching. == Acquisition by Apple == On November 1, 2024, the Pixelmator Team announced that they were to be acquired by Apple, subject to regulatory approval. Their site promises "There will be no material changes to the Pixelmator Pro, Pixelmator for iOS, and Photomator apps at this time." The acquisition was completed in February 2025. On January 13, 2026, Apple announced that a new version of Pixelmator Pro with AI features would be included in its new Apple Creator Studio subscription, the app would be brought to the iPad and the Mac app would be redesigned with Liquid Glass. == Version history == == Applescript == In 2020 Pixelmator Pro added the ability to leverage Apple's automation language 'AppleScript' to automate many tasks in version 1.8 (Lynx). This enabled simple and advanced automation activities such as image resize, crop, color adjustments, format change, moving layers around, and more advanced actions like removing background, Gaussian blur, text replacement, shadows, color replacement, etc.

    Read more →
  • Conduit (company)

    Conduit (company)

    Conduit Ltd. is an international software company. From its founding in 2005 to 2013, its most well-known product was the Conduit toolbar, which was widely-described as malware. In 2013, it spun off its toolbar business; today, its main product is a mobile development platform that allows users to create native and web mobile applications for smartphones. == Products == From 2005 to 2013, the company's most well-known product was the Conduit toolbar, which is flagged by most antivirus software as potentially unwanted and adware. Conduit's toolbar software is often downloaded by malware packages from other publishers. The company spun off the toolbar division that manages the Conduit toolbar in 2013. Today, the company's main product is a mobile development platform that allows users to create native and web mobile applications for smartphones. App creation for its App Gallery is free, but it charges a monthly subscription fee to place apps on the App Store or Google Play. == History == Conduit was founded in 2005 by Shilo, Dror Erez, and Gaby Bilcyzk. Between years 2005 and 2013, it ran a successful but controversial toolbar platform business. Conduit was part of the so-called Download Valley companies monetizing free software and downloads by bundling adware. The toolbars were criticized by some as being very difficult to uninstall. The toolbar software was referred to as a "potentially unwanted program" by some in the computer industry because it could be used to change browser settings. The company had more than 400 employees in 2013. In September same year, Conduit spun off its entire website toolbar business division, which combined with Perion Network. After the deal, Conduit shareholders owned 81% of Perion's existing shares and both Perion and Conduit remained independent companies. The substantial size of the Conduit user base allowed Perion to immediately surpass AOL in U.S. searches. In 2015, Conduit announced it would purchase Keeprz, a mobile customer loyalty platform, for $45 million.

    Read more →
  • Sensory, Inc.

    Sensory, Inc.

    Sensory, Inc. is an American company which develops software AI technologies for speech, sound and vision. It is based in Santa Clara, California. Sensory’s technologies have shipped in over three billion products from hundreds of leading consumer electronics manufacturers including AT&T, Hasbro, Huawei, Google, Amazon, Samsung, LG, Mattel, Motorola, Plantronics, GoPro, Sony, Tencent, Garmin, LG, Microsoft, Lenovo, and more. Sensory has over 60 issued patents covering speech recognition in consumer electronics, biometric authentication, sensor/speech combinations, wake word technology, and more. == History == Sensory, Inc. was founded in 1994, originally as Sensory Circuits, by Forrest Mozer, Mike Mozer and Todd Mozer. The three had also co-founded ESS Technology years earlier. In 1999 Sensory acquired Fluent Speech Technologies, which was formed and started by a group of professors out of the Oregon Graduate Institute (formerly OGI, now OHSU). Fluent Speech Technologies developed high performance embedded speech engines, the technology from this acquisition is now the core technology used throughout Sensory's chip and software line. === Company timeline === 1994 – Founded 1995 – Introduces the RSC 164 - first commercially successful speech recognition IC 1998 – Introduces first speaker verification IC 2000 – Acquires Oregon based Fluent-Speech Technologies 2002 – Acquires Texas Instruments line of speech output ICs (the SC series) 2007 – Introduces first Voice User Interface for Bluetooth silicon (CSR BC-5) - BlueGenie 2008 - Sensory and BlueAnt partner on the V1 - Revolutionary new Bluetooth headset with a voice user interface. First wearable to use a voice user interface for control and best-reviewed speech recognition product in history 2009 – Introduced world's smallest text to speech system (TTS) and Truly HandsfreeTM Triggers/ wake words. 2010 – Introduced the NLP-5x – First Natural Language Voice Processor and TrulyHandsfree wake words in SDKs for Android, iOS, Linux, and Windows. NLP5x used the first generation of TrulyHandsfree wake words with low power and enhanced accuracy. 2011 – Sensory partners with Google and Microsoft to enable TrulyHandsfree as a front end to Goog411 and Bing411 2012 – Partnered with Tensilica to offer ultra-low power TrulyHandsfree wake words; introduced Speaker Verification and Speaker Identification for mobile phones and other consumer electronics. 2012 - TrulyHandsfree released into Samsung's Galaxy S2 for "Hey Galaxy" wake word 2013 – TrulyHandsfree wake words migrated to many new platforms and began shipping as MotoVoice in the Google-owned MotoX. Sensory's TrulyHandsfree in mobile takes off with the Galaxy S3 and S4 and Galaxy Note and is licensed into wearables like Google Glass. 2014 – Announced new initiative in Vision; added LG and Motorola as customers; received the 2014 Global Mobile Award for Best Mobile Technology Breakthrough at the GSMA Mobile World Congress in Barcelona, Spain (judges commented, "A big advance for the wearables market, this offers many benefits for consumers, increasing uptake and usage of many mobile apps, driving revenue for operators and content providers.") 2015-2018 - Licensed Google, Amazon, MSFT, Baidu, Huawei, ZTE, and many others with TrulyHandsfree wake words. Sensory develops first wake words for OK Google, Hey Siri, and Hey Cortana. 2019 - Sensory launched two new solutions: SoundID, sound identification, and TrulyNatural, embedded large vocabulary speech recognition. Sensory also acquired Vocalize.ai, an independent testing lab. 2020 - Sensory introduced VoiceHub, which allows the automated generation of wake words. 2021 - Sensory expands VoiceHub with speech recognition and NLU capabilities. The company initiated a new cloud platform, SensoryCloud.ai. 2022-Sensory rolls out SensoryCloud.ai with speech to text, text to speech, face & voice biometrics 2024- Sensory Automotive & TrulyNatural Speech-to-text On-Device launched == Technology and products == Sensory originally developed both hardware (Integrated Circuit - IC or "chip") and software platforms but migrated to software only around 2005 and added cloud and hybrid computing capabilities in 2021. Sensory's RSC-164 IC (Integrated Circuit or "chip") was used on NASA's Mars Polar Lander in the Mars Microphone on the Lander. Speech Synthesis SC-6x chips – acquired some speech synthesis technology from Texas Instruments. Sensory’s embedded AI solutions include the following: TrulyHandsfree (THF) - wake word detection and phrase spotting. TrulyNatural (TNL) - large vocabulary continuous speech recognition with NLU. TrulySecure (TS) - face and voice biometrics. TrulySecureSpeakerVerification (TSSV) - speaker and sound identification. VoiceHub - Online portal for creating custom wake words and speech recognition models with NLU. Sensory Automotive- Sensory Automotive is a full voice and vision suite of AI technologies that operate efficiently in the car without connecting to a network. The cloud initiative, SensoryCloud.ai, is targeting Speech To Text (STT), Text To Speech (TTS), Wake Word verification, face and voice recognition, and sound identification.

    Read more →
  • Shepp–Logan phantom

    Shepp–Logan phantom

    The Shepp–Logan phantom is a standard test image created by Larry Shepp and Benjamin F. Logan for their 1974 paper "The Fourier Reconstruction of a Head Section". It serves as the model of a human head in the development and testing of image reconstruction algorithms. == Definition == The function describing the phantom is defined as the sum of 10 ellipses inside a 2×2 square:

    Read more →
  • Landweber iteration

    Landweber iteration

    The Landweber iteration or Landweber algorithm is an algorithm to solve ill-posed linear inverse problems, and it has been extended to solve non-linear problems that involve constraints. The method was first proposed in the 1950s by Louis Landweber, and it can be now viewed as a special case of many other more general methods. == Basic algorithm == The original Landweber algorithm attempts to recover a signal x from (noisy) measurements y. The linear version assumes that y = A x {\displaystyle y=Ax} for a linear operator A. When the problem is in finite dimensions, A is just a matrix. When A is nonsingular, then an explicit solution is x = A − 1 y {\displaystyle x=A^{-1}y} . However, if A is ill-conditioned, the explicit solution is a poor choice since it is sensitive to any noise in the data y. If A is singular, this explicit solution doesn't even exist. The Landweber algorithm is an attempt to regularize the problem, and is one of the alternatives to Tikhonov regularization. We may view the Landweber algorithm as solving: min x ‖ A x − y ‖ 2 2 / 2 {\displaystyle \min _{x}\|Ax-y\|_{2}^{2}/2} using an iterative method. The algorithm is given by the update x k + 1 = x k − ω A ∗ ( A x k − y ) . {\displaystyle x_{k+1}=x_{k}-\omega A^{}(Ax_{k}-y).} where the relaxation factor ω {\displaystyle \omega } satisfies 0 < ω < 2 / σ 1 2 {\displaystyle 0<\omega <2/\sigma _{1}^{2}} . Here σ 1 {\displaystyle \sigma _{1}} is the largest singular value of A {\displaystyle A} . If we write f ( x ) = ‖ A x − y ‖ 2 2 / 2 {\displaystyle f(x)=\|Ax-y\|_{2}^{2}/2} , then the update can be written in terms of the gradient x k + 1 = x k − ω ∇ f ( x k ) {\displaystyle x_{k+1}=x_{k}-\omega \nabla f(x_{k})} and hence the algorithm is a special case of gradient descent. For ill-posed problems, the iterative method needs to be stopped at a suitable iteration index, because it semi-converges. This means that the iterates approach a regularized solution during the first iterations, but become unstable in further iterations. The reciprocal of the iteration index 1 / k {\displaystyle 1/k} acts as a regularization parameter. A suitable parameter is found, when the mismatch ‖ A x k − y ‖ 2 2 {\displaystyle \|Ax_{k}-y\|_{2}^{2}} approaches the noise level. Using the Landweber iteration as a regularization algorithm has been discussed in the literature. == Nonlinear extension == In general, the updates generated by x k + 1 = x k − τ ∇ f ( x k ) {\displaystyle x_{k+1}=x_{k}-\tau \nabla f(x_{k})} will generate a sequence f ( x k ) {\displaystyle f(x_{k})} that converges to a minimizer of f whenever f is convex and the stepsize τ {\displaystyle \tau } is chosen such that 0 < τ < 2 / ( ‖ ∇ f ‖ 2 ) {\displaystyle 0<\tau <2/(\|\nabla f\|^{2})} where ‖ ⋅ ‖ {\displaystyle \|\cdot \|} is the spectral norm. Since this is special type of gradient descent, there currently is not much benefit to analyzing it on its own as the nonlinear Landweber, but such analysis was performed historically by many communities not aware of unifying frameworks. The nonlinear Landweber problem has been studied in many papers in many communities; see, for example. == Extension to constrained problems == If f is a convex function and C is a convex set, then the problem min x ∈ C f ( x ) {\displaystyle \min _{x\in C}f(x)} can be solved by the constrained, nonlinear Landweber iteration, given by: x k + 1 = P C ( x k − τ ∇ f ( x k ) ) {\displaystyle x_{k+1}={\mathcal {P}}_{C}(x_{k}-\tau \nabla f(x_{k}))} where P {\displaystyle {\mathcal {P}}} is the projection onto the set C. Convergence is guaranteed when 0 < τ < 2 / ( ‖ A ‖ 2 ) {\displaystyle 0<\tau <2/(\|A\|^{2})} . This is again a special case of projected gradient descent (which is a special case of the forward–backward algorithm) as discussed in. == Applications == Since the method has been around since the 1950s, it has been adopted and rediscovered by many scientific communities, especially those studying ill-posed problems. In X-ray computed tomography it is called simultaneous iterative reconstruction technique (SIRT). It has also been used in the computer vision community and the signal restoration community. It is also used in image processing, since many image problems, such as deconvolution, are ill-posed. Variants of this method have been used also in sparse approximation problems and compressed sensing settings.

    Read more →
  • Non-native speech database

    Non-native speech database

    A non-native speech database is a speech database of non-native pronunciations of English. Such databases are used in the development of: multilingual automatic speech recognition systems, text to speech systems, pronunciation trainers, and second language learning systems. == List == The actual table with information about the different databases is shown in Table 2. === Legend === In the table of non-native databases some abbreviations for language names are used. They are listed in Table 1. Table 2 gives the following information about each corpus: The name of the corpus, the institution where the corpus can be obtained, or at least further information should be available, the language which was actually spoken by the speakers, the number of speakers, the native language of the speakers, the total amount of non-native utterances the corpus contains, the duration in hours of the non-native part, the date of the first public reference to this corpus, some free text highlighting special aspects of this database and a reference to another publication. The reference in the last field is in most cases to the paper which is especially devoted to describe this corpus by the original collectors. In some cases it was not possible to identify such a paper. In these cases a paper is referenced which is using this corpus is. Some entries are left blank and others are marked with unknown. The difference here is that blank entries refer to attributes where the value is just not known. Unknown entries, however, indicate that no information about this attribute is available in the database itself. As an example, in the Jupiter weather database no information about the origin of the speakers is given. Therefore this data would be less useful for verifying accent detection or similar issues. Where possible, the name is a standard name of the corpus, for some of the smaller corpora, however, there was no established name and hence an identifier had to be created. In such cases, a combination of the institution and the collector of the database is used. In the case where the databases contain native and non-native speech, only attributes of the non-native part of the corpus are listed. Most of the corpora are collections of read speech. If the corpus instead consists either partly or completely of spontaneous utterances, this is mentioned in the Specials column.

    Read more →