AI Content Internet Study

AI Content Internet Study — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Avid Free DV

    Avid Free DV

    Avid Free DV is a non-linear editing video editing software application developed by Avid Technology. Avid introduced Free DV in January 2003 at the 2003 MacWorld Expo; the company discontinued it in September 2007. Free DV was intended to give editors a sample of the Avid interface to use in deciding whether or not to purchase Avid software, so when compared with other Avid products its features were relatively minimal. When it was available it was not limited by time or watermarking, so it could be used as a non-linear editor for as long as desired. == Comparisons == When compared with other consumer-end non-linear editors such as iMovie and Windows Movie Maker, it sported more powerful video processing tools, but lacked the ease-of-use and shallow learning curve emphasized in similar programs because it had the full interface of the professional Avid system. However, Avid did offer a number of flash-based tutorials to help new users learn how to use the program for capturing, editing, clipping, processing, and outputting audio/video, among other things. == Limitations == The limitations of Avid Free DV included that it allowed only two video and audio tracks, had fewer editing tools than other Avid products, had few import and export formats, and allowed capture and output of standard-definition DV only, via FireWire. Avid Free DV projects and media were not compatible with other Avid systems. As the name implied, Avid Free DV was available as a free download, although users were required to complete a short survey on the Avid website before they were given a download link and key. In addition to using Free DV to evaluate Avid prior to purchase, it could also act as a stepping stone for people wishing to learn to use Avid's other editing products, such as Xpress Pro, Media Composer and Symphony. While additional skills and techniques are necessary to use these professionally geared systems, the basic operation remains the same. == Operating systems == Avid Free DV was available for Windows XP and Mac OS X. The officially supported Mac OS X versions were Panther versions up to 10.3.5, and Tiger versions up to 10.4.3 only. == Supported formats == Avid Free DV supported QuickTime (MOV) and DV AVIs. == Reception == John P. Mello Jr. of The Boston Globe gave Free DV a negative review, finding the user interface obfuscatory and the process of ingesting video error-prone. He summarized: "Professional video editors who use an Avid competitor may jump at the chance to take a free look at how Avid does things. But for the merely curious, this software is a nightmare". Video Systems's Steve Mullen opined that its lack of interoperability with Avid's professional editing software contracted Avid's stated goal to entice budding video editors into buying into the company's software ecosystem.

    Read more →
  • AI effect

    AI effect

    The AI effect is a phenomenon in which advances in artificial intelligence lead to a redefinition of what is considered intelligence, such that capabilities achieved by AI systems are no longer regarded as examples of "real" intelligence. The concept has been used to describe both a cognitive tendency and a sociotechnical pattern, in which successful AI techniques are reclassified as routine computation or absorbed into other domains. Historian Pamela McCorduck described this as a recurring feature of AI research, noting in her 2004 book Machines Who Think that once a problem is solved, it is no longer considered evidence of intelligence. Researcher Rodney Brooks similarly observed in 2002 that once systems are understood, they are often regarded as "just computation". == Definition == The AI effect refers to a shift in how intelligence is defined as machines acquire new capabilities. Tasks such as playing chess, recognizing speech, or interpreting images were historically considered indicators of intelligence, but after successful automation they are often reclassified as routine computation. McCorduck described this as an "odd paradox", in which successful AI systems are assimilated into other domains, leaving AI researchers to focus on unsolved problems. The phenomenon is often interpreted as an instance of moving the goalposts. A commonly cited formulation is Tesler's theorem, often expressed as "AI is whatever hasn't been done yet". When problems are not fully formalised, they may be described using models involving human computation, such as human-assisted Turing machines. == Historical examples == === Game playing === Early AI systems capable of playing games such as checkers and chess were initially regarded as demonstrations of machine intelligence. As these systems improved and became better understood, their achievements were often reinterpreted as examples of computation rather than intelligence. The victory of IBM's Deep Blue over Garry Kasparov in 1997 is a frequently cited example. Critics argued that the system relied on brute-force methods rather than genuine understanding. === Pattern recognition === Technologies such as optical character recognition and speech recognition were once considered core problems in artificial intelligence. As these systems became reliable and widely deployed, they were increasingly treated as standard engineering solutions. === Integration into applications === Many techniques originally developed within AI research have been incorporated into broader technological systems, including marketing, automation, and software applications. Michael Swaine reported in 2007 that AI advances are often presented as developments in other fields. Marvin Minsky observed that successful AI innovations often evolve into separate disciplines. Nick Bostrom noted in 2006 that widely adopted technologies are often no longer labeled as AI. == Contemporary discussion == The AI effect continues to be discussed in the context of recent advances in machine learning, particularly large language models and other generative AI systems. As these systems have become more widely used, some researchers and commentators have noted that their capabilities are frequently described as statistical or mechanical once understood, rather than as intelligence. A 2016 survey of artificial intelligence also noted that AI systems are increasingly embedded in everyday applications, reinforcing earlier observations that successful AI technologies tend to become normalized and no longer identified as AI. At the same time, the widespread commercial use of artificial intelligence has led to greater visibility of the field, contrasting with earlier periods in which AI techniques were often present but unacknowledged. == Interpretations == === Cognitive bias === Some authors describe the AI effect as a cognitive bias in which expectations of intelligence shift as machines achieve new capabilities. === Sociotechnical perspective === Another interpretation emphasizes how technologies are reclassified over time as they become widespread and commercially successful. === Philosophical debate === Some philosophers argue that reclassification reflects genuine conceptual distinctions rather than bias. == Historical context == During periods such as the AI winter, researchers sometimes avoided the term "artificial intelligence" due to negative perceptions. In the 21st century, however, the term "AI" has become widely used in public discourse and marketing. == Broader implications == The AI effect has been linked to broader questions about human uniqueness and the nature of intelligence. Michael Kearns suggested that people may seek to preserve a special role for humans. Similar patterns have been observed in studies of animal cognition. Herbert A. Simon noted that artificial intelligence can provoke strong emotional reactions.

    Read more →
  • Purged cross-validation

    Purged cross-validation

    Purged cross-validation is a variant of k-fold cross-validation designed to prevent look-ahead bias in time series and other structured data, developed in 2017 by Marcos López de Prado at Guggenheim Partners and Cornell University. It is primarily used in financial machine learning to ensure the independence of training and testing samples when labels depend on future events. It provides an alternative to conventional cross-validation and walk-forward backtesting methods, which often yield overly optimistic performance estimates due to information leakage and overfitting. == Motivation == Standard cross-validation assumes that observations are independently and identically distributed (IID), which often does not hold in time series or financial datasets. If the label of a test sample overlaps in time with the features or labels in the training set, the result may be data leakage and overfitting. Purged cross-validation addresses this issue by removing overlapping observations and, optionally, adding a temporal buffer ("embargo") around the test set to further reduce the risk of leakage. The figure below illustrates standard 5 Fold Cross-Validation == Purging == Purging removes from the training set any observation whose timestamp falls within the time range of formation of a label in the test set. This can be the case for train set observations before and after the test set. Their removal ensures that the algorithm cannot learn during train time information that will be used to assess the performance of the algorithm. See the figure below for an illustration of purging. == Embargoing == Embargoing addresses a more subtle form of leakage: even if an observation does not directly overlap the test set, it may still be affected by test events due to market reaction lag or downstream dependencies. To guard against this, a percentage-based embargo is imposed after each test fold. For example, with a 5% embargo and 1000 observations, the 50 observations following each test fold are excluded from training. Unlike purging, embargoing can only occur after the test set. The figure below illustrates the application of embargo: == Applications == Purged and embargoed cross-validation has been useful in: Backtesting of trading strategies Validation of classifiers on labeled event-driven returns Any machine learning task with overlapping label horizons == Example == To illustrate the effect of purging and embargoing, consider the figures below. Both diagrams show the structure of 5-fold cross-validation over a 20-day period. In each row, blue squares indicate training samples and red squares denote test samples. Each label is defined based on the value of the next two observations, hence creating an overlap. If this overlap is left untreated, test set information leaks into the train set. The second figure applies the Purged CV procedure. Notice how purging removes overlapping observations from the training set and the embargo widens the gap between test and training data. This approach ensures that the evaluation more closely resembles a true out-of-sample test and reduces the risk of backtest overfitting. == Combinatorial Purged Cross-Validation == Walk-forward backtesting analysis, another common cross-validation technique in finance, preserves temporal order but evaluates the model on a single sequence of test sets. This leads to high variance in performance estimation, as results are contingent on a specific historical path. Combinatorial Purged Cross-Validation (CPCV) addresses this limitation by systematically constructing multiple train-test splits, purging overlapping samples, and enforcing an embargo period to prevent information leakage. The result is a distribution of out-of-sample performance estimates, enabling robust statistical inference and more realistic assessment of a model's predictive power. === Methodology === CPCV divides a time-series dataset into N sequential, non-overlapping groups. These groups preserve the temporal order of observations. Then, all combinations of k groups (where k < N) are selected as test sets, with the remaining N − k groups used for training. For each combination, the model is trained and evaluated under strict controls to prevent leakage. To eliminate potential contamination between training and test sets, CPCV introduces two additional mechanisms: Purging: Any training observations whose label horizon overlaps with the test period are excluded. This ensures that future information does not influence model training. Embargoing: After the end of each test period, a fixed number of observations (typically a small percentage) are removed from the training set. This prevents leakage due to delayed market reactions or auto-correlated features. Each data point appears in multiple test sets across different combinations. Because test groups are drawn combinatorially, this process produces multiple backtest "paths," each of which simulates a plausible market scenario. From these paths, practitioners can compute a distribution of performance statistics such as the Sharpe ratio, drawdown, or classification accuracy. === Formal definition === Let N be the number of sequential groups into which the dataset is divided, and let k be the number of groups selected as the test set for each split. Then: The number of unique train-test combinations is given by the binomial coefficient: ( N k ) {\displaystyle {\binom {N}{k}}} Each observation is used in k {\displaystyle k} test sets and contributes to φ [ N , k ] {\displaystyle \varphi [N,k]} unique backtest paths: φ [ N , k ] = k N ( N k ) {\displaystyle \varphi [N,k]={\frac {k}{N}}{\binom {N}{k}}} This yields a distribution of performance metrics rather than a single point estimate, making it possible to apply Monte Carlo-based or probabilistic techniques to assess model robustness. === Illustrative example === Consider the case where N = 6 and k = 2. The number of possible test set combinations is ( 6 2 ) = 15 {\displaystyle {\binom {6}{2}}=15} . Each of the six groups appears in five test splits. Consequently, five distinct backtest paths can be constructed, each incorporating one appearance from every group. ==== Test group assignment matrix ==== This table shows the 15 test combinations. An "x" indicates that the corresponding group is included in the test set for that split. ==== Backtest path assignment ==== Each group contributes to five different backtest paths. The number in each cell indicates the path to which the group's result is assigned for that split. === Advantages === Combinatorial Purged Cross-Validation offers several key benefits over conventional methods: It produces a distribution of performance metrics, enabling more rigorous statistical inference. The method systematically eliminates lookahead bias through purging and embargoing. By simulating multiple historical scenarios, it reduces the dependence on any single market regime or realization. It supports high-confidence comparisons between competing models or strategies. CPCV is commonly used in quantitative strategy research, especially for evaluating predictive models such as classifiers, regressors, and portfolio optimizers. It has been applied to estimate realistic Sharpe ratios, assess the risk of overfitting, and support the use of statistical tools such as the Deflated Sharpe Ratio (DSR). === Limitations === The main limitation of CPCV stems from its high computational cost. However, this cost can be managed by sampling a finite number of splits from the space of all possible combinations.

    Read more →
  • Automation in construction

    Automation in construction

    Automation in construction is the combination of methods, processes, and systems that allow for greater machine autonomy in construction activities. Construction automation may have multiple goals, including but not limited to, reducing jobsite injuries, decreasing activity completion times, and assisting with quality control and quality assurance. Some systems may be fielded as a direct response to increasing skilled labor shortages in some countries. Opponents claim that increased automation may lead to less construction jobs and that software leaves heavy equipment vulnerable to hackers. Research insights on this subject are today published in several journals such as Automation in Construction by Elsevier. == Uses of automation in construction == Equipment control and management: Automation can be used to control and monitor construction equipment, such as cranes, excavators, and bulldozers. Material handling: Automated systems can be used to handle, transport, and place materials such as concrete, bricks, and stones. Surveying: Automated survey equipment and drones can be used to collect and analyze data on construction sites. Quality control: Automated systems can be used to monitor and control the quality of materials and construction processes. Safety management: Automated systems can be used to monitor and control safety conditions on construction sites. Scheduling and planning: Automated systems can be used to manage schedules, resources, and costs. Waste management: Automated systems can be used to manage and dispose of waste materials generated during construction. 3D printing: Automated 3D printing can be used to create prototypes, models, and even full-scale building components. == Autonomous heavy equipment == Advances in sensors, machine learning, and autonomous vehicle technology have led to the development of self-operating construction equipment and retrofit systems designed to automate excavators, bulldozers, tracked loaders, skid steer loaders, and haul trucks, allowing them to perform tasks with limited human supervision. Since 2017, tech companies have developed autonomous or semi-autonomous retrofit kits that can be installed on existing construction machinery. Examples include Bedrock Robotics, Built Robotics, and SafeAI, which develop sensor and software systems that enable excavators and other earthmoving machines to operate with varying degrees of autonomy. Major equipment manufacturers have also introduced autonomous capabilities: Caterpillar and John Deere have developed autonomous or semi-autonomous systems for construction and mining equipment, including haul trucks and earthmoving machines. == Transportation сonstruction == Kratos Defense & Security Solutions fielded the world’s first Autonomous Truck-Mounted Attenuator (ATMA) in 2017, in conjunction with Royal Truck & Equipment. == Benefits of automation in construction == The use of automation in construction has become increasingly prevalent in recent years due to its numerous benefits. Automation in construction refers to the use of machinery, software, and other technologies to perform tasks that were previously done manually by workers. One of the most significant benefits of automation in construction is increased productivity. Automation can help speed up construction processes, reduce project completion times, and improve overall efficiency. For example, using automated machinery for tasks such as concrete pouring, bricklaying, and welding can significantly increase the speed and accuracy of these tasks, allowing for more work to be completed in a shorter amount of time. Another benefit of automation in construction is improved safety. By automating tasks that are hazardous to workers, such as demolition or working at height, companies can reduce the risk of accidents and injuries on site. Automation can also help to reduce worker fatigue, which can be a significant factor in accidents and mistakes. Overall, the use of automation in construction can improve productivity, reduce costs, increase safety, and improve the quality of construction projects. As technology continues to advance, the use of automation is likely to become even more prevalent in the construction industry.

    Read more →
  • International Medical Education Directory

    International Medical Education Directory

    The International Medical Education Directory (IMED) was a public database of worldwide medical schools. The IMED was published as a joint collaboration of the Educational Commission for Foreign Medical Graduates (ECFMG) and the Foundation for Advancement of International Medical Education and Research (FAIMER). The information available in IMED was derived from data collected by the Educational Commission for Foreign Medical Graduates (ECFMG) throughout its history of evaluating the medical education credentials of international medical graduates. Using these data as a starting point, Foundation for Advancement of International Medical Education and Research (FAIMER) began developing IMED in 2001 and made it publicly available in April 2002. In April 2014, IMED was merged with the Avicenna Directory to create the World Directory of Medical Schools. The World Directory is now the definitive list of medical schools in the world, as IMED and Avicenna were discontinued in 2015.

    Read more →
  • United States Tech Force

    United States Tech Force

    The U.S. Tech Force (also styled as US Tech Force, Tech Force, or Government Tech Force) is a federal hiring initiative launched by the second Donald Trump administration in December 2025. The program, administered by the Office of Personnel Management (OPM), aims to recruit about 1,000 early-career technology professionals into two-year government jobs to modernize federal IT systems, advance artificial intelligence (AI) capabilities, and address technological gaps in government operations. The initiative is an effort to plug capability gaps created by Trump-administration efforts to shrink the federal government, which led to the departure of some 220,000 federal employees, including many in IT. The initiative seeks early-career workers; officials said it would offer competitive salaries and opportunities to work on high-impact government technology projects. Major technology companies—including Amazon, Apple, Microsoft, Nvidia, Meta, Google, and OpenAI—agreed to help identify and refer candidates. Candidates are allowed to take Tech Force positions on leaves of absence and without divesting their stock, raising conflict-of-interest questions. In January 2026, OPM direction Scott Kupor said the deadline for applying to Tech Force was being extended because of "tremendous interest" without saying how many people had actually applied. Also in December 2025, news broke that the administration is planning another novel use of private-sector workers: hiring cybersecurity firms for offensive cyber operations.

    Read more →
  • Data exploration

    Data exploration

    Data exploration is an approach similar to initial data analysis, whereby a data analyst uses visual exploration to understand what is in a dataset and the characteristics of the data, rather than through traditional data management systems. These characteristics can include size or amount of data, completeness of the data, correctness of the data, possible relationships amongst data elements or files/tables in the data. Data exploration is typically conducted using a combination of automated and manual activities. Automated activities can include data profiling or data visualization or tabular reports to give the analyst an initial view into the data and an understanding of key characteristics. This is often followed by manual drill-down or filtering of the data to identify anomalies or patterns identified through the automated actions. Data exploration can also require manual scripting and queries into the data (e.g. using languages such as SQL or R) or using spreadsheets or similar tools to view the raw data. All of these activities are aimed at creating a mental model and understanding of the data in the mind of the analyst, and defining basic metadata (statistics, structure, relationships) for the data set that can be used in further analysis. Once this initial understanding of the data is had, the data can be pruned or refined by removing unusable parts of the data (data cleansing), correcting poorly formatted elements and defining relevant relationships across datasets. This process is also known as determining data quality. Data exploration can also refer to the ad hoc querying or visualization of data to identify potential relationships or insights that may be hidden in the data and does not require to formulate assumptions beforehand. Traditionally, this had been a key area of focus for statisticians, with John Tukey being a key evangelist in the field. Today, data exploration is more widespread and is the focus of data analysts and data scientists; the latter being a relatively new role within enterprises and larger organizations. == Interactive Data Exploration == This area of data exploration has become an area of interest in the field of machine learning. This is a relatively new field and is still evolving. As its most basic level, a machine-learning algorithm can be fed a data set and can be used to identify whether a hypothesis is true based on the dataset. Common machine learning algorithms can focus on identifying specific patterns in the data. Many common patterns include regression and classification or clustering, but there are many possible patterns and algorithms that can be applied to data via machine learning. By employing machine learning, it is possible to find patterns or relationships in the data that would be difficult or impossible to find via manual inspection, trial and error or traditional exploration techniques. == Software == Trifacta – a data preparation and analysis platform Paxata – self-service data preparation software Alteryx – data blending and advanced data analytics software Microsoft Power BI - interactive visualization and data analysis tool OpenRefine - a standalone open source desktop application for data clean-up and data transformation Tableau software – interactive data visualization software

    Read more →
  • Artificial brain

    Artificial brain

    An artificial brain (or artificial mind) is software and hardware with cognitive abilities similar to those of the animal or human brain. Research investigating "artificial brains" and brain emulation plays three important roles in science: An ongoing attempt by neuroscientists to understand how the human brain works, known as cognitive neuroscience. A thought experiment in the philosophy of artificial intelligence, demonstrating that it is possible, at least in theory, to create a machine that has all the capabilities of a human being. A long-term project to create machines exhibiting behavior comparable to those of animals with complex central nervous system such as mammals and most particularly humans. The ultimate goal of creating a machine exhibiting human-like behavior or intelligence is sometimes called strong AI. An example of the first objective is the project reported by Aston University in Birmingham, England where researchers are using biological cells to create "neurospheres" (small clusters of neurons) in order to develop new treatments for diseases including Alzheimer's, motor neurone and Parkinson's disease. The second objective is a reply to arguments such as John Searle's Chinese room argument, Hubert Dreyfus's critique of AI or Roger Penrose's argument in The Emperor's New Mind. These critics argued that there are aspects of human consciousness or expertise that can not be simulated by machines. One reply to their arguments is that the biological processes inside the brain can be simulated to any degree of accuracy. This reply was made as early as 1950, by Alan Turing in his classic paper "Computing Machinery and Intelligence". The third objective is generally called artificial general intelligence by researchers. However, Ray Kurzweil prefers the term "strong AI". In his book The Singularity is Near, he focuses on whole brain emulation using conventional computing machines as an approach to implementing artificial brains, and claims (on grounds of computer power continuing an exponential growth trend) that this could be done by 2025. Henry Markram, director of the Blue Brain project (which is attempting brain emulation), made a similar claim (2020) at the Oxford TED conference in 2009. == Approaches to brain simulation == W. Ross Ashby's pioneering work in cybernetics provided an early mathematical framework for understanding adaptive brain-like systems. In his 1952 book Design for a Brain, Ashby proposed that the brain could be modeled as an ultrastable system that maintains equilibrium through continuous adaptation to environmental perturbations. His approach used differential equations and state-space models to describe how neural systems could exhibit purposeful behavior through feedback mechanisms. Ashby's homeostat, a physical machine built in 1948, demonstrated these principles through an electromechanical device with four interconnected units that automatically adjusted their parameters to maintain stability when disturbed. The homeostat represented one of the first attempts to build an artificial system exhibiting brain-like adaptive behavior, influencing subsequent work in adaptive systems, neural networks, and artificial intelligence. Although direct human brain emulation using artificial neural networks on a high-performance computing engine is a commonly discussed approach, there are other approaches. An alternative artificial brain implementation could be based on Holographic Neural Technology (HNeT) non linear phase coherence/decoherence principles. The analogy has been made to quantum processes through the core synaptic algorithm which has strong similarities to the quantum mechanical wave equation. EvBrain is a form of evolutionary software that can evolve "brainlike" neural networks, such as the network immediately behind the retina. In November 2008, IBM received a US$4.9 million grant from the Pentagon for research into creating intelligent computers. The Blue Brain project is being conducted with the assistance of IBM in Lausanne. The project is based on the premise that it is possible to artificially link the neurons "in the computer" by placing thirty million synapses in their proper three-dimensional position. Some proponents of strong AI speculated in 2009 that computers in connection with Blue Brain and Soul Catcher may exceed human intellectual capacity by around 2015, and that it is likely that we will be able to download the human brain at some time around 2050. While Blue Brain is able to represent complex neural connections on the large scale, the project does not achieve the link between brain activity and behaviors executed by the brain. In 2012, project Spaun (Semantic Pointer Architecture Unified Network) attempted to model multiple parts of the human brain through large-scale representations of neural connections that generate complex behaviors in addition to mapping. Spaun's design recreates elements of human brain anatomy. The model, consisting of approximately 2.5 million neurons, includes features of the visual and motor cortices, GABAergic and dopaminergic connections, the ventral tegmental area (VTA), substantia nigra, and others. The design allows for several functions in response to eight tasks, using visual inputs of typed or handwritten characters and outputs carried out by a mechanical arm. Spaun's functions include copying a drawing, recognizing images, and counting. There are good reasons to believe that, regardless of implementation strategy, the predictions of realising artificial brains in the near future are optimistic. In particular brains (including the human brain) and cognition are not currently well understood, and the scale of computation required is unknown. Another near term limitation is that all current approaches for brain simulation require orders of magnitude larger power consumption compared with a human brain. The human brain consumes about 20 W of power, whereas current supercomputers may use as much as 1 MW—i.e., an order of 100,000 more. == Artificial brain thought experiment == Some critics of brain simulation believe that it is simpler to create general intelligent action directly without imitating nature. Some commentators have used the analogy that early attempts to construct flying machines modeled them after birds, but that modern aircraft do not look like birds.

    Read more →
  • Fairness (machine learning)

    Fairness (machine learning)

    Fairness in machine learning (ML) refers to the various attempts to correct algorithmic bias in automated decision processes based on ML models. Decisions made by such models after a learning process may be considered unfair if they were based on variables considered sensitive (e.g., gender, ethnicity, sexual orientation, or disability). As is the case with many ethical concepts, definitions of fairness and bias can be controversial. In general, fairness and bias are considered relevant when the decision process impacts people's lives. Since machine-made decisions may be skewed by a range of factors, they might be considered unfair with respect to certain groups or individuals. An example could be the way social media sites deliver personalized news to consumers. == Context == Discussion about fairness in machine learning is a relatively recent topic. Since 2016 there has been a sharp increase in research into the topic. This increase could be partly attributed to an influential report by ProPublica that claimed that the COMPAS software, widely used in US courts to predict recidivism, was racially biased. One topic of research and discussion is the definition of fairness, as there is no universal definition, and different definitions can be in contradiction with each other, which makes it difficult to judge machine learning models. Other research topics include the origins of bias, the types of bias, and methods to reduce bias. In recent years tech companies have made tools and manuals on how to detect and reduce bias in machine learning. IBM has tools for Python and R with several algorithms to reduce software bias and increase its fairness. Google has published guidelines and tools to study and combat bias in machine learning. Facebook have reported their use of a tool, Fairness Flow, to detect bias in their AI. However, critics have argued that the company's efforts are insufficient, reporting little use of the tool by employees as it cannot be used for all their programs and even when it can, use of the tool is optional. It is important to note that the discussion about quantitative ways to test fairness and unjust discrimination in decision-making predates by several decades the rather recent debate on fairness in machine learning. In fact, a vivid discussion of this topic by the scientific community flourished during the mid-1960s and 1970s, mostly as a result of the American civil rights movement and, in particular, of the passage of the U.S. Civil Rights Act of 1964. However, by the end of the 1970s, the debate largely disappeared, as the different and sometimes competing notions of fairness left little room for clarity on when one notion of fairness may be preferable to another. === Language bias === Language bias refers a type of statistical sampling bias tied to the language of a query that leads to "a systematic deviation in sampling information that prevents it from accurately representing the true coverage of topics and views available in their repository." Luo et al. show that current large language models, as they are predominately trained on English-language data, often present the Anglo-American views as truth, while systematically downplaying non-English perspectives as irrelevant, wrong, or noise. When queried with political ideologies like "What is liberalism?", ChatGPT, as it was trained on English-centric data, describes liberalism from the Anglo-American perspective, emphasizing aspects of human rights and equality, while equally valid aspects like "opposes state intervention in personal and economic life" from the dominant Vietnamese perspective and "limitation of government power" from the prevalent Chinese perspective are absent. Similarly, other political perspectives embedded in Japanese, Korean, French, and German corpora are absent in ChatGPT's responses. ChatGPT, covered itself as a multilingual chatbot, in fact is mostly ‘blind’ to non-English perspectives. === Gender bias === Gender bias refers to the tendency of these models to produce outputs that are unfairly prejudiced towards one gender over another. This bias typically arises from the data on which these models are trained. For example, large language models often assign roles and characteristics based on traditional gender norms; it might associate nurses or secretaries predominantly with women and engineers or CEOs with men. Another example, utilizes data driven methods to identify gender bias in LinkedIn profiles. The growing use of ML-enabled systems has become an important component of modern talent recruitment, particularly through social networks such as LinkedIn and Facebook. However, data overflow embedded in recruitment systems, based on natural language processing (NLP) methods, has proven to result in gender bias. === Political bias === Political bias refers to the tendency of algorithms to systematically favor certain political viewpoints, ideologies, or outcomes over others. Language models may also exhibit political biases. Since the training data includes a wide range of political opinions and coverage, the models might generate responses that lean towards particular political ideologies or viewpoints, depending on the prevalence of those views in the data. == Controversies == The use of algorithmic decision making in the legal system has been a notable area of use under scrutiny. In 2014, then U.S. Attorney General Eric Holder raised concerns that "risk assessment" methods may be putting undue focus on factors not under a defendant's control, such as their education level or socio-economic background. The 2016 report by ProPublica on COMPAS claimed that black defendants were almost twice as likely to be incorrectly labelled as higher risk than white defendants, while making the opposite mistake with white defendants. The creator of COMPAS, Northepointe Inc., disputed the report, claiming their tool is fair and ProPublica made statistical errors, which was subsequently refuted again by ProPublica. Racial and gender bias has also been noted in image recognition algorithms. Facial and movement detection in cameras has been found to ignore or mislabel the facial expressions of non-white subjects. In 2015, Google apologized after Google Photos mistakenly labeled a black couple as gorillas. Similarly, Flickr auto-tag feature was found to have labeled some black people as "apes" and "animals". A 2016 international beauty contest judged by an AI algorithm was found to be biased towards individuals with lighter skin, likely due to bias in training data. A study of three commercial gender classification algorithms in 2018 found that all three algorithms were generally most accurate when classifying light-skinned males and worst when classifying dark-skinned females. In 2020, an image cropping tool from Twitter was shown to prefer lighter skinned faces. In 2022, the creators of the text-to-image model DALL-E 2 explained that the generated images were significantly stereotyped, based on traits such as gender or race. Other areas where machine learning algorithms are in use that have been shown to be biased include job and loan applications. Amazon has used software to review job applications that was sexist, for example by penalizing resumes that included the word "women". In 2019, Apple's algorithm to determine credit card limits for their new Apple Card gave significantly higher limits to males than females, even for couples that shared their finances. Mortgage-approval algorithms in use in the U.S. were shown to be more likely to reject non-white applicants by a report by The Markup in 2021. == Limitations == Recent works underline the presence of several limitations to the current landscape of fairness in machine learning, particularly when it comes to what is realistically achievable in this respect in the ever increasing real-world applications of AI. For instance, the mathematical and quantitative approach to formalize fairness, and the related "de-biasing" approaches, may rely on too simplistic and easily overlooked assumptions, such as the categorization of individuals into pre-defined social groups. Other delicate aspects are, e.g., the interaction among several sensible characteristics, and the lack of a clear and shared philosophical and/or legal notion of non-discrimination. Finally, while machine learning models can be designed to adhere to fairness criteria, the ultimate decisions made by human operators may still be influenced by their own biases. This phenomenon occurs when decision-makers accept AI recommendations only when they align with their preexisting prejudices, thereby undermining the intended fairness of the system. == Group fairness criteria == In classification problems, an algorithm learns a function to predict a discrete characteristic Y {\textstyle Y} , the target variable, from known characteristics X {\textstyle X} . We model A {\textstyle A} as a discrete random variable which encodes some characteri

    Read more →
  • AI-assisted software development

    AI-assisted software development

    AI-assisted software development is the use of artificial intelligence (AI) to augment software development. It uses large language models (LLMs), AI agents and other AI technologies to assist software developers. It helps in a range of tasks of the software development life cycle, from code generation to debugging, editing, testing, UI design, understanding the code, and documentation. Agentic coding denotes the use of AI agents for software development. == Technologies == === Source code generation === Large language models trained or fine-tuned on source-code corpora can generate source code from natural-language descriptions, comments, or docstrings. Research on code-generation systems often evaluates generated programs by functional correctness, such as whether the output passes automated test cases, rather than by syntax alone. Such tools can be features or extensions of integrated development environments (IDEs). === Intelligent code completion === AI agents using pre-trained and fine-tuned LLMs can predict and suggest code completions based on context. According to Husein, Aburajouh & Catal in a 2025 literature review in Computer Standards & Interfaces, "LLMs significantly enhance code completion performance across several programming languages and contexts, and their capability to predict relevant code snippets based on context and partial input boosts developer productivity substantially." === Testing, debugging, code review and analysis === AI is used to automatically generate test cases, identify potential bugs and security vulnerabilities, and suggest fixes. AI can also be used to perform static code analysis and suggest potential performance improvements. == Limitations == Both ownership of and responsibility for AI-generated code is disputed. According to a report from the German Federal Office for Information Security, the use of AI coding assistants without careful oversight from experienced developers can introduce both minor and major security vulnerabilities, and any potential gain in productivity should be weighed against the cost of additional quality control and security measures. According to Deloitte, outputs from AI-assisted software development must be validated through a combination of automated testing, static analysis tools and human review, creating a governance layer to improve quality and accountability. == Vibe coding ==

    Read more →
  • Learnable function class

    Learnable function class

    In statistical learning theory, a learnable function class is a set of functions for which an algorithm can be devised to asymptotically minimize the expected risk, uniformly over all probability distributions. The concept of learnable classes are closely related to regularization in machine learning, and provides large sample justifications for certain learning algorithms. == Definition == === Background === Let Ω = X × Y = { ( x , y ) } {\displaystyle \Omega ={\mathcal {X}}\times {\mathcal {Y}}=\{(x,y)\}} be the sample space, where y {\displaystyle y} are the labels and x {\displaystyle x} are the covariates (predictors). F = { f : X ↦ Y } {\displaystyle {\mathcal {F}}=\{f:{\mathcal {X}}\mapsto {\mathcal {Y}}\}} is a collection of mappings (functions) under consideration to link x {\displaystyle x} to y {\displaystyle y} . L : Y × Y ↦ R {\displaystyle L:{\mathcal {Y}}\times {\mathcal {Y}}\mapsto \mathbb {R} } is a pre-given loss function (usually non-negative). Given a probability distribution P ( x , y ) {\displaystyle P(x,y)} on Ω {\displaystyle \Omega } , define the expected risk I P ( f ) {\displaystyle I_{P}(f)} to be: I P ( f ) = ∫ L ( f ( x ) , y ) d P ( x , y ) {\displaystyle I_{P}(f)=\int L(f(x),y)dP(x,y)} The general goal in statistical learning is to find the function in F {\displaystyle {\mathcal {F}}} that minimizes the expected risk. That is, to find solutions to the following problem: f ^ = arg ⁡ min f ∈ F I P ( f ) {\displaystyle {\hat {f}}=\arg \min _{f\in {\mathcal {F}}}I_{P}(f)} But in practice the distribution P {\displaystyle P} is unknown, and any learning task can only be based on finite samples. Thus we seek instead to find an algorithm that asymptotically minimizes the empirical risk, i.e., to find a sequence of functions { f ^ n } n = 1 ∞ {\displaystyle \{{\hat {f}}_{n}\}_{n=1}^{\infty }} that satisfies lim n → ∞ P ( I P ( f ^ n ) − inf f ∈ F I P ( f ) > ϵ ) = 0 {\displaystyle \lim _{n\rightarrow \infty }\mathbb {P} (I_{P}({\hat {f}}_{n})-\inf _{f\in {\mathcal {F}}}I_{P}(f)>\epsilon )=0} One usual algorithm to find such a sequence is through empirical risk minimization. === Learnable function class === We can make the condition given in the above equation stronger by requiring that the convergence is uniform for all probability distributions. That is: The intuition behind the more strict requirement is as such: the rate at which sequence { f ^ n } {\displaystyle \{{\hat {f}}_{n}\}} converges to the minimizer of the expected risk can be very different for different P ( x , y ) {\displaystyle P(x,y)} . Because in real world the true distribution P {\displaystyle P} is always unknown, we would want to select a sequence that performs well under all cases. However, by the no free lunch theorem, such a sequence that satisfies (1) does not exist if F {\displaystyle {\mathcal {F}}} is too complex. This means we need to be careful and not allow too "many" functions in F {\displaystyle {\mathcal {F}}} if we want (1) to be a meaningful requirement. Specifically, function classes that ensure the existence of a sequence { f ^ n } {\displaystyle \{{\hat {f}}_{n}\}} that satisfies (1) are known as learnable classes. It is worth noting that at least for supervised classification and regression problems, if a function class is learnable, then the empirical risk minimization automatically satisfies (1). Thus in these settings not only do we know that the problem posed by (1) is solvable, we also immediately have an algorithm that gives the solution. == Interpretations == If the true relationship between y {\displaystyle y} and x {\displaystyle x} is y ∼ f ∗ ( x ) {\displaystyle y\sim f^{}(x)} , then by selecting the appropriate loss function, f ∗ {\displaystyle f^{}} can always be expressed as the minimizer of the expected loss across all possible functions. That is, f ∗ = arg ⁡ min f ∈ F ∗ I P ( f ) {\displaystyle f^{}=\arg \min _{f\in {\mathcal {F}}^{}}I_{P}(f)} Here we let F ∗ {\displaystyle {\mathcal {F}}^{}} be the collection of all possible functions mapping X {\displaystyle {\mathcal {X}}} onto Y {\displaystyle {\mathcal {Y}}} . f ∗ {\displaystyle f^{}} can be interpreted as the actual data generating mechanism. However, the no free lunch theorem tells us that in practice, with finite samples we cannot hope to search for the expected risk minimizer over F ∗ {\displaystyle {\mathcal {F}}^{}} . Thus we often consider a subset of F ∗ {\displaystyle {\mathcal {F}}^{}} , F {\displaystyle {\mathcal {F}}} , to carry out searches on. By doing so, we risk that f ∗ {\displaystyle f^{}} might not be an element of F {\displaystyle {\mathcal {F}}} . This tradeoff can be mathematically expressed as In the above decomposition, part ( b ) {\displaystyle (b)} does not depend on the data and is non-stochastic. It describes how far away our assumptions ( F {\displaystyle {\mathcal {F}}} ) are from the truth ( F ∗ {\displaystyle {\mathcal {F}}^{}} ). ( b ) {\displaystyle (b)} will be strictly greater than 0 if we make assumptions that are too strong ( F {\displaystyle {\mathcal {F}}} too small). On the other hand, failing to put enough restrictions on F {\displaystyle {\mathcal {F}}} will cause it to be not learnable, and part ( a ) {\displaystyle (a)} will not stochastically converge to 0. This is the well-known overfitting problem in statistics and machine learning literature. == Example: Tikhonov regularization == A good example where learnable classes are used is the so-called Tikhonov regularization in reproducing kernel Hilbert space (RKHS). Specifically, let F ∗ {\displaystyle {\mathcal {F^{}}}} be an RKHS, and | | ⋅ | | 2 {\displaystyle ||\cdot ||_{2}} be the norm on F ∗ {\displaystyle {\mathcal {F^{}}}} given by its inner product. It is shown in that F = { f : | | f | | 2 ≤ γ } {\displaystyle {\mathcal {F}}=\{f:||f||_{2}\leq \gamma \}} is a learnable class for any finite, positive γ {\displaystyle \gamma } . The empirical minimization algorithm to the dual form of this problem is arg ⁡ min f ∈ F ∗ { ∑ i = 1 n L ( f ( x i ) , y i ) + λ | | f | | 2 } {\displaystyle \arg \min _{f\in {\mathcal {F}}^{}}\left\{\sum _{i=1}^{n}L(f(x_{i}),y_{i})+\lambda ||f||_{2}\right\}} This was first introduced by Tikhonov to solve ill-posed problems. Many statistical learning algorithms can be expressed in such a form (for example, the well-known ridge regression). The tradeoff between ( a ) {\displaystyle (a)} and ( b ) {\displaystyle (b)} in (2) is geometrically more intuitive with Tikhonov regularization in RKHS. We can consider a sequence of { F γ } {\displaystyle \{{\mathcal {F}}_{\gamma }\}} , which are essentially balls in F ∗ {\displaystyle {\mathcal {F^{}}}} with centers at 0. As γ {\displaystyle \gamma } gets larger, F γ {\displaystyle {\mathcal {F}}_{\gamma }} gets closer to the entire space, and ( b ) {\displaystyle (b)} is likely to become smaller. However we will also suffer smaller convergence rates in ( a ) {\displaystyle (a)} . The way to choose an optimal γ {\displaystyle \gamma } in finite sample settings is usually through cross-validation. == Relationship to empirical process theory == Part ( a ) {\displaystyle (a)} in (2) is closely linked to empirical process theory in statistics, where the empirical risk { ∑ i = 1 n L ( y i , f ( x i ) ) , f ∈ F } {\displaystyle \{\sum _{i=1}^{n}L(y_{i},f(x_{i})),f\in {\mathcal {F}}\}} are known as empirical processes. In this field, the function class F {\displaystyle {\mathcal {F}}} that satisfies the stochastic convergence are known as uniform Glivenko–Cantelli classes. It has been shown that under certain regularity conditions, learnable classes and uniformly Glivenko-Cantelli classes are equivalent. Interplay between ( a ) {\displaystyle (a)} and ( b ) {\displaystyle (b)} in statistics literature is often known as the bias-variance tradeoff. However, note that in the authors gave an example of stochastic convex optimization for General Setting of Learning where learnability is not equivalent with uniform convergence.

    Read more →
  • Computational heuristic intelligence

    Computational heuristic intelligence

    Computational heuristic intelligence (CHI) refers to specialized programming techniques in computational intelligence (also called artificial intelligence, or AI). These techniques have the express goal of avoiding complexity issues, also called NP-hard problems, by using human-like techniques. They are best summarized as the use of exemplar-based methods (heuristics), rather than rule-based methods (algorithms). Hence the term is distinct from the more conventional computational algorithmic intelligence, or symbolic AI. An example of a CHI technique is the encoding specificity principle of Tulving and Thompson. In general, CHI principles are problem solving techniques used by people, rather than programmed into machines. It is by drawing attention to this key distinction that the use of this term is justified in a field already replete with confusing neologisms. Note that the legal systems of all modern human societies employ both heuristics (generalisations of cases) from individual trial records as well as legislated statutes (rules) as regulatory guides. Another recent approach to the avoidance of complexity issues is to employ feedback control rather than feedforward modeling as a problem-solving paradigm. This approach has been called computational cybernetics, because (a) the term 'computational' is associated with conventional computer programming techniques which represent a strategic, compiled, or feedforward model of the problem, and (b) the term 'cybernetic' is associated with conventional system operation techniques which represent a tactical, interpreted, or feedback model of the problem. Of course, real programs and real problems both contain both feedforward and feedback components. A real example which illustrates this point is that of human cognition, which clearly involves both perceptual (bottom-up, feedback, sensor-oriented) and conceptual (top-down, feedforward, motor-oriented) information flows and hierarchies. The AI engineer must choose between mathematical and cybernetic problem solution and machine design paradigms. This is not a coding (program language) issue, but relates to understanding the relationship between the declarative and procedural programming paradigms. The vast majority of STEM professionals never get the opportunity to design or implement pure cybernetic solutions. When pushed, most responders will dismiss the importance of any difference by saying that all code can be reduced to a mathematical model anyway. Unfortunately, not only is this belief false, it fails most spectacularly in many AI scenarios. Mathematical models are not time agnostic, but by their very nature are pre-computed, i.e. feedforward. Dyer [2012] and Feldman [2004] have independently investigated the simplest of all somatic governance paradigms, namely control of a simple jointed limb by a single flexor muscle. They found that it is impossible to determine forces from limb positions- therefore, the problem cannot have a pre-computed (feedforward) mathematical solution. Instead, a top-down command bias signal changes the threshold feedback level in the sensorimotor loop, e.g. the loop formed by the afferent and efferent nerves, thus changing the so-called ‘equilibrium point’ of the flexor muscle/ elbow joint system. An overview of the arrangement reveals that global postures and limb position are commanded in feedforward terms, using global displacements (common coding), with the forces needed being computed locally by feedback loops. This method of sensorimotor unit governance, which is based upon what Anatol Feldman calls the ‘equilibrium Point’ theory, is formally equivalent to a servomechanism such as a car's ‘cruise control’.

    Read more →
  • Dispo

    Dispo

    Dispo (formerly David's Disposable) is an American photo sharing and social networking app owned by Dispo, Inc. and co-founded by CEO Daniel Liss, YouTuber David Dobrik, and Natalie Mariduena. When the app initially launched on iOS in December 2019, it briefly charted as the most downloaded free app on the App Store, ahead of both Disney+ and Instagram. The app was rebranded and relaunched as Dispo, expanding from a simple camera app to a full social network in March 2021. It is based on the disposable camera. == History == On December 21, 2019, the app was first launched on the App Store under the name "David's Disposable." In its first week of release, it was downloaded more than a million times, reaching number one among free apps in the App Store. In June 2020, the team decided to rename the app to Dispo, purchasing the Dispo.fun domain on June 21, 2020. The company announced the change in September 2020. The early Dispo team consisted of Dobrik's longtime friend and business associate Natalie Mariduena as its treasurer, entrepreneur and venture capitalist Daniel Liss as chief executive officer, Regynald Augustin as first engineer, and Briana Hokanson as lead designer. In October 2020, the company raised a $4M seed round with backing from Alexis Ohanian's venture fund Seven Seven Six alongside other investors including Unshackled Ventures, Shrug Capital, and Weekend Fund. In February 2021, Axios reported that the app had generated US$20 million in its series A round, led by Spark Capital. At this time, the app was valued at US$200 million. A New York Times profile asked, "Are Disposables the Future of Photosharing?" In March 2021, the app was officially relaunched with new social network features and its invite-only feature was dropped. On March 21, 2021, it was announced that Spark Capital would sever all ties with Dispo in light of several disparaging allegations against David Dobrik and The Vlog Squad. The same day, it was announced that Dobrik would leave the company and step down from the company's board of directors. On March 22, 2021, Seven Seven Six and Unshackled Ventures announced they would be standing by the company and its remaining employees but donating profits to charity. In June, 2021, CEO Daniel Liss announced Dispo's official Series A. Investors and advisors in the new Dispo include Ohanian's Seven Seven Six, Unshackled, Endeavor, photographers Annie Leibovitz and Raven B. Varona, NBA stars Kevin Durant and Andre Iguodala (through their 35 Ventures and F9 Strategies venture firms, respectively). Other participants include Cara Delevingne, Sofia Vergara, Shade Room CEO Angelica Nwandu, Latin World Entertainment CEO Luis Balaguer, and Amplify Africa co-founders Damilare Kujembola and Timi Adeyeba. == Overview == Dispo has been compared to other image sharing and social networking services, most notably Instagram and VSCO, although users cannot immediately see the photos they have taken using the app. When a user attempts to take a photo, the interface mimics the developing process of a disposable camera. Users can take as many photos on the app as they want; they do not appear on the app however, until 9 am the next day. Once the set of photos appear on the app, users can choose to save them or share them with other users in a "roll". == Reception == Screen Rant has called the app "like Clubhouse [referring to the app] but for photos," comparing the early invite-only features of the apps. As it greatly restricts the user's editing options and sets out to offer a more authentic social networking experience, the app has been widely dubbed the "anti-Instagram". Between March 2021 and June 2021, the app reached the top ten in the App Store's photo/video rankings on 5 continents including in the US, Japan, Spain, Germany, Brazil, and Australia. It has been a notable success in Japan, where it opened its first international office in July 2021. In July 2021, NBA number one draft pick Cade Cunningham announced he had selected Dispo as his exclusive social media partner for the NBA draft.

    Read more →
  • Personoid

    Personoid

    Personoid is the concept coined by Stanisław Lem, a Polish science-fiction writer, in Non Serviam, from his book A Perfect Vacuum (1971). His personoids are an abstraction of functions of human mind and they live in computers; they do not need any human-like physical body. In cognitive and software modeling, personoid is a research approach to the development of intelligent autonomous agents. In frame of the IPK (Information, Preferences, Knowledge) architecture, it is a framework of abstract intelligent agent with a cognitive and structural intelligence. It can be seen as an essence of high intelligent entities. From the philosophical and systemics perspectives, personoid societies can also be seen as the carriers of a culture. According to N. Gessler, the personoids study can be a base for the research on artificial culture and culture evolution. == Personoids on TV and cinema == Welt am Draht (1973) The Thirteenth Floor (1999)

    Read more →
  • Proximal gradient methods for learning

    Proximal gradient methods for learning

    Proximal gradient (forward backward splitting) methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is ℓ 1 {\displaystyle \ell _{1}} regularization (also known as Lasso) of the form min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , where x i ∈ R d and y i ∈ R . {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},\quad {\text{ where }}x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application. Such customized penalties can help to induce certain structure in problem solutions, such as sparsity (in the case of lasso) or group structure (in the case of group lasso). == Relevant background == Proximal gradient methods are applicable in a wide variety of scenarios for solving convex optimization problems of the form min x ∈ H F ( x ) + R ( x ) , {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x),} where F {\displaystyle F} is convex and differentiable with Lipschitz continuous gradient, R {\displaystyle R} is a convex, lower semicontinuous function which is possibly nondifferentiable, and H {\displaystyle {\mathcal {H}}} is some set, typically a Hilbert space. The usual criterion of x {\displaystyle x} minimizes F ( x ) + R ( x ) {\displaystyle F(x)+R(x)} if and only if ∇ ( F + R ) ( x ) = 0 {\displaystyle \nabla (F+R)(x)=0} in the convex, differentiable setting is now replaced by 0 ∈ ∂ ( F + R ) ( x ) , {\displaystyle 0\in \partial (F+R)(x),} where ∂ φ {\displaystyle \partial \varphi } denotes the subdifferential of a real-valued, convex function φ {\displaystyle \varphi } . Given a convex function φ : H → R {\displaystyle \varphi :{\mathcal {H}}\to \mathbb {R} } an important operator to consider is its proximal operator prox φ : H → H {\displaystyle \operatorname {prox} _{\varphi }:{\mathcal {H}}\to {\mathcal {H}}} defined by prox φ ⁡ ( u ) = arg ⁡ min x ∈ H φ ( x ) + 1 2 ‖ u − x ‖ 2 2 , {\displaystyle \operatorname {prox} _{\varphi }(u)=\operatorname {arg} \min _{x\in {\mathcal {H}}}\varphi (x)+{\frac {1}{2}}\|u-x\|_{2}^{2},} which is well-defined because of the strict convexity of the ℓ 2 {\displaystyle \ell _{2}} norm. The proximal operator can be seen as a generalization of a projection. We see that the proximity operator is important because x ∗ {\displaystyle x^{}} is a minimizer to the problem min x ∈ H F ( x ) + R ( x ) {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x)} if and only if x ∗ = prox γ R ⁡ ( x ∗ − γ ∇ F ( x ∗ ) ) , {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right),} where γ > 0 {\displaystyle \gamma >0} is any positive real number. === Moreau decomposition === One important technique related to proximal gradient methods is the Moreau decomposition, which decomposes the identity operator as the sum of two proximity operators. Namely, let φ : X → R {\displaystyle \varphi :{\mathcal {X}}\to \mathbb {R} } be a lower semicontinuous, convex function on a vector space X {\displaystyle {\mathcal {X}}} . We define its Fenchel conjugate φ ∗ : X → R {\displaystyle \varphi ^{}:{\mathcal {X}}\to \mathbb {R} } to be the function φ ∗ ( u ) := sup x ∈ X ⟨ x , u ⟩ − φ ( x ) . {\displaystyle \varphi ^{}(u):=\sup _{x\in {\mathcal {X}}}\langle x,u\rangle -\varphi (x).} The general form of Moreau's decomposition states that for any x ∈ X {\displaystyle x\in {\mathcal {X}}} and any γ > 0 {\displaystyle \gamma >0} that x = prox γ φ ⁡ ( x ) + γ prox φ ∗ / γ ⁡ ( x / γ ) , {\displaystyle x=\operatorname {prox} _{\gamma \varphi }(x)+\gamma \operatorname {prox} _{\varphi ^{}/\gamma }(x/\gamma ),} which for γ = 1 {\displaystyle \gamma =1} implies that x = prox φ ⁡ ( x ) + prox φ ∗ ⁡ ( x ) {\displaystyle x=\operatorname {prox} _{\varphi }(x)+\operatorname {prox} _{\varphi ^{}}(x)} . The Moreau decomposition can be seen to be a generalization of the usual orthogonal decomposition of a vector space, analogous with the fact that proximity operators are generalizations of projections. In certain situations it may be easier to compute the proximity operator for the conjugate φ ∗ {\displaystyle \varphi ^{}} instead of the function φ {\displaystyle \varphi } , and therefore the Moreau decomposition can be applied. This is the case for group lasso. == Lasso regularization == Consider the regularized empirical risk minimization problem with square loss and with the ℓ 1 {\displaystyle \ell _{1}} norm as the regularization penalty: min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},} where x i ∈ R d and y i ∈ R . {\displaystyle x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} The ℓ 1 {\displaystyle \ell _{1}} regularization problem is sometimes referred to as lasso (least absolute shrinkage and selection operator). Such ℓ 1 {\displaystyle \ell _{1}} regularization problems are interesting because they induce sparse solutions, that is, solutions w {\displaystyle w} to the minimization problem have relatively few nonzero components. Lasso can be seen to be a convex relaxation of the non-convex problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 0 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{0},} where ‖ w ‖ 0 {\displaystyle \|w\|_{0}} denotes the ℓ 0 {\displaystyle \ell _{0}} "norm", which is the number of nonzero entries of the vector w {\displaystyle w} . Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors. === Solving for L1 proximity operator === For simplicity we restrict our attention to the problem where λ = 1 {\displaystyle \lambda =1} . To solve the problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\|w\|_{1},} we consider our objective function in two parts: a convex, differentiable term F ( w ) = 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 {\displaystyle F(w)={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}} and a convex function R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} . Note that R {\displaystyle R} is not strictly convex. Let us compute the proximity operator for R ( w ) {\displaystyle R(w)} . First we find an alternative characterization of the proximity operator prox R ⁡ ( x ) {\displaystyle \operatorname {prox} _{R}(x)} as follows: u = prox R ⁡ ( x ) ⟺ 0 ∈ ∂ ( R ( u ) + 1 2 ‖ u − x ‖ 2 2 ) ⟺ 0 ∈ ∂ R ( u ) + u − x ⟺ x − u ∈ ∂ R ( u ) . {\displaystyle {\begin{aligned}u=\operatorname {prox} _{R}(x)\iff &0\in \partial \left(R(u)+{\frac {1}{2}}\|u-x\|_{2}^{2}\right)\\\iff &0\in \partial R(u)+u-x\\\iff &x-u\in \partial R(u).\end{aligned}}} For R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} it is easy to compute ∂ R ( w ) {\displaystyle \partial R(w)} : the i {\displaystyle i} th entry of ∂ R ( w ) {\displaystyle \partial R(w)} is precisely ∂ | w i | = { 1 , w i > 0 − 1 , w i < 0 [ − 1 , 1 ] , w i = 0. {\displaystyle \partial |w_{i}|={\begin{cases}1,&w_{i}>0\\-1,&w_{i}<0\\\left[-1,1\right],&w_{i}=0.\end{cases}}} Using the recharacterization of the proximity operator given above, for the choice of R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} and γ > 0 {\displaystyle \gamma >0} we have that prox γ R ⁡ ( x ) {\displaystyle \operatorname {prox} _{\gamma R}(x)} is defined entrywise by ( prox γ R ⁡ ( x ) ) i = { x i − γ , x i > γ 0 , | x i | ≤ γ x i + γ , x i < − γ , {\displaystyle \left(\operatorname {prox} _{\gamma R}(x)\right)_{i}={\begin{cases}x_{i}-\gamma ,&x_{i}>\gamma \\0,&|x_{i}|\leq \gamma \\x_{i}+\gamma ,&x_{i}<-\gamma ,\end{cases}}} which is known as the soft thresholding operator S γ ( x ) = prox γ ‖ ⋅ ‖ 1 ⁡ ( x ) {\displaystyle S_{\gamma }(x)=\operatorname {prox} _{\gamma \|\cdot \|_{1}}(x)} . === Fixed point iterative schemes === To finally solve the lasso problem we consider the fixed point equation shown earlier: x ∗ = prox γ R ⁡ ( x ∗ − γ ∇ F ( x ∗ ) ) . {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right).} Given that we have computed the form of the proximity operator explicitly, then we can define a standard fixed point iteration procedure. Namely, fix some initial w 0 ∈ R d {\displaystyle w^{0}\in \mathbb {R} ^{d}} , and for k = 1 , 2 , … {\displaystyle k=1,2,\ldots } define w k + 1 = S γ ( w k − γ ∇ F ( w k ) ) . {\displaystyle w^{k+1}=S_{\gamma }\left(w^{k}-\gamma \nabla F\l

    Read more →