AI Analytics Masters

AI Analytics Masters — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Exercism

    Exercism

    Exercism is an online, open-source, free coding platform that offers code practice and mentorship on 77 different programming languages. == History == Software developer Katrina Owen created Exercism while she was teaching programming at Jumpstart Labs. The platform was developed as an internal tool to solve the problem of her own students not receiving feedback on the coding problems they were practicing. Katrina put the site publicly online and found that people were sharing it with their friends, practicing together and giving each other feedback. Within 12 months, the site had organically grown to see over 6,000 users had submitted code or feedback, and hundreds of volunteers contribute to the languages or tooling on the platform. In 2016, Jeremy Walker joined as co-founder and CEO. In July 2018, the site was relaunched with a new design and centered around a formal mentoring mode, at which point Katrina stepped back from day-to-day involvement. == Product == In the past, the website differed from other coding platforms by requiring students to download exercises through a command line client, solve the code on their own computers then submit the solution for feedback, at which point they can also view other's solutions to the same problem. Since its second relaunch in 2021, solutions can be edited and submitted through a web editor, though the command line client remains available. Exercism has tracks for 74 programming languages. Among the notable languages taught: ABAP, C, C#, C++, CoffeeScript, Delphi, Elm, Erlang, F#, Gleam, Go, Java, JavaScript, Julia, Kotlin, Objective-C, PHP, Python, Raku, Red, Ruby, Rust, Scala, Swift, and V (Vlang). In 2023, the site launched a "12 in 23" challenge for users to learn the basics of 12 different languages - one per month in 2023. == Open source == The Exercism codebase is open source. In April 2016, it consisted of 50 repositories including website code, API code, command-line code and, most of all, over 40 stand-alone repositories for different language tracks. As of February 2024 Exercism has 14,344 contributors, maintains 366 repositories, and 19,603 mentors.

    Read more →
  • They're Made Out of Meat

    They're Made Out of Meat

    "They're Made Out of Meat" is a short story by American writer Terry Bisson. It was originally published in OMNI. It consists entirely of dialogue between two characters. Bisson's website hosts a theatrical adaptation. A film adaptation won the Grand Prize at the Seattle Science Fiction Museum's 2006 film festival. The story was collected in the 1993 anthology Bears Discover Fire and Other Stories, and has circulated widely on the Internet, which Bisson found "flattering". It has been quoted in cognitive, cosmological, and philosophical scholarship. == Plot == The two characters are intelligent beings capable of traveling faster than light, on a mission to "contact, welcome and log in any and all sentient races or multibeings in this quadrant of the Universe." Bisson's stage directions represent them as "two lights moving like fireflies among the stars" on a projection screen. One of them tells the incredulous other about the recent discovery of carbon-based lifeforms "made up entirely of meat". After conversing briefly about it, they both deem such beings and communication with them too bizarre and agree to "erase the records and forget the whole thing", marking the Solar System "unoccupied". == Film adaptations == === They're Made out of Meat (2005) === In 2005, Stephen O'Regan wrote and directed a live film adaptation starring Tom Noonan and Ben Bailey. The film was made as a final project for the New York Film Academy. The main action takes place inside a diner full of teenagers in Staten Island, New York. The music for the film was scored by Bob Reynolds. === They're Made out of Meat (2010) === Jeff Frumess and Trevor Scott produced a version in 2010. They added the character of a homeless conspiracy theorist with an original score by musician Sam Belkin. The film was shot at Hartsdale station in Westchester County, New York. === Meat (2021) === Masha Maksimova developed a version in Cinemiracle format, a triple split-screen process, as a student project at the Berlin University of Applied Sciences in the communication design course. The dialogue is conducted by two telepathic humanoid aliens and the thoughts are visualised by found-footage collages.

    Read more →
  • Pommerman Challenge

    Pommerman Challenge

    The Pommerman Challenge is a multi-agent game to test autonomous artificial intelligence systems. == Game structure == Two-agent team compete against each other on an 11 x 11 board. Each agent can observe only part of the board, and the agents cannot communicate. The goal is to knock down the opponents. Agents place explosives to destroy walls and collect power-ups that appear from those walls, while avoiding death. Game objects can move unpredictably or be moved by an agent. == Play == The game involves real-time decision making. Agents must choose moves in about .1 seconds. == Algorithms == The real-time requirement limits the use of compute-heavy techniques such as Monte Carlo tree search. The branching factor at each move can be as large as 1,296, because all four agents act in each step, choosing among six possibilities. The agents choose by accounting for explosions, which have lifetimes of 10 steps. Explosions derail tree search techniques, as searches with less than 10 levels ignore explosions while deeper searches consider too many choices (given the branching factor). A hybrid approach uses a limited-depth tree search followed by exploring a deterministic/pessimistic scenario. Limiting the depth keeps the search tree small. The deterministic approach can predict far in the future, by omitting branching. "Good" actions are often those that perform well under pessimistic scenarios, particularly if safety is important. Identifying the worst sequence of positions for an object can suggest where to move it. After generating pessimistic scenarios, the agent quantifies the survivability of each move, notionally the number of positions in which the agent can then remain safely (without encountering other agents). == Competitions == 3 competitions were organized with slightly changing rules during 2018–2019. === Online - FFA === This round was a warm-up online event, where each competitor controlled only one agent. Results: 1st: Agent47Agent by Yichen Gong 2nd: aiKiller by Márton Görög === NeurIPS 2018 - Team === The first Pommerman competition with in-person finals. Results: 1st: hakozakijunctions by Toshihiro Takahashi 2nd: eisenach by Márton Görög 3rd: dypm by Takayuki Osogami The 3 best performing solutions used online tree search. === NeurIPS 2019 - Team Radio === The second competition with in-person finals improved communication between teammate agents. Results: 1st: Márton Görög 2nd: Paul Jasek 3rd: Yifan Zhang

    Read more →
  • WebCrow

    WebCrow

    The WebCrow is a research project carried out at the Information Engineering Department of the University of Siena with the purpose of automatically solving crosswords. == The Project == The scientific relevance of the project can be understood considering that cracking crosswords requires human-level knowledge. Unlike chess and related games and there is no closed world configuration space. A first nucleus of technology, such as search engines, information retrieval, and machine learning techniques enable computers to enfold with semantics real-life concepts. The project is based on a software system whose major assumption is to attack crosswords making use of the Web as its primary source of knowledge. WebCrow is very fast and often thrashes human challengers in competitions, especially on multi language crossword schemes. A distinct feature of the WebCrow software system is to combine properly natural language processing (NLP) techniques, the Google web search engine, and constraint satisfaction algorithms from artificial intelligence to acquire knowledge and to fill the schema. The most important component of WebCrow is the Web Search Module (WSM), which implements a domain specific web based question answering algorithm. The way WebCrow approaches crosswords solving is quite different with respect to humans: Whereas we tend to first answer clues we are sure of and then proceed filling the schema by exploiting the already answered clues as hints, WebCrow uses two clearly distinct stages. In the first one, it processes all the clues and tries to answer them all: For each clue it finds many possible candidates and sorts them according to complex ranking models mainly based on a probability criteria. In the second stage, WebCrow uses constraint satisfaction algorithms to fill the grid with the overall most likely combination of clue answers. In order to interact with Google, first of all, WebCrow needs to compose queries on the basis of the given clues. This is done by query expansion, whose purpose is to convert the clue into a query expressed by a simplified and more appropriate language for Google. The retrieved documents are parsed so as to extract a list of word candidates that are congruent with the crossword length constraints. Crosswords can hardly be faced by using encyclopedic knowledge only, since many clues are wordplays or are otherwise purposefully very ambiguous. This enigmatic component of crosswords is faced by a massive use of database of solved crosswords, and by automatic reasoning on a properly organized knowledge base of wired rules. Last but not the least, the final constraint satisfaction step is very effective to fill the correct candidate, even though, unlike humans, the system can not rely on very high confidence on the correctness of the answer. == Competitions == WebCrow speed and effectiveness has been tested many times in man-machine competitions on Italian, English and multi-language crosswords The outcome of the tests is that WebCrow can successfully compete with average human players on single language schemes and reaches expert level performance in multi-language crosswords. However, WebCrow has not reached expert level in single-language crosswords, yet. === ECAI-06 Competition === On August 30, 2006, at the European Conference on Artificial Intelligence (ECAI2006), 25 conference attendees and 53 internet connected crosswords lovers, competed with WebCrow in an official challenge organized within the conference program. The challenge consisted in 5 different crosswords (2 in Italian, 2 in English and one multi-language in Italian and English) and 15 minutes were assigned for each crossword. WebCrow ranked 21 out of 74 participants in the Italian competition, and won both the bilingual and English competitions. === Other Competitions === Several competitions have been held in Florence, Italy within the Creativity Festival in December 2006, and another official conference competition took place in Hyderabad, India in January 2007, within the International Conference of Artificial Intelligence, where it ranked second out of 25 participants.

    Read more →
  • Audio inpainting

    Audio inpainting

    Audio inpainting (also known as audio interpolation) is an audio restoration task which deals with the reconstruction of missing or corrupted portions of a digital audio signal. Inpainting techniques are employed when parts of the audio have been lost due to various factors such as transmission errors, data corruption or errors during recording. The goal of audio inpainting is to fill in the gaps (i.e., the missing portions) in the audio signal seamlessly, making the reconstructed portions indistinguishable from the original content and avoiding the introduction of audible distortions or alterations. Many techniques have been proposed to solve the audio inpainting problem and this is usually achieved by analyzing the temporal and spectral information surrounding each missing portion of the considered audio signal. Classic methods employ statistical models or digital signal processing algorithms to predict and synthesize the missing or damaged sections. Recent solutions, instead, take advantage of deep learning models, thanks to the growing trend of exploiting data-driven methods in the context of audio restoration. Depending on the extent of the lost information, the inpainting task can be divided in three categories. Short inpainting refers to the reconstruction of few milliseconds (approximately less than 10) of missing signal, that occurs in the case of short distortions such as clicks or clipping. In this case, the goal of the reconstruction is to recover the lost information exactly. In long inpainting instead, with gaps in the order of hundreds of milliseconds or even seconds, this goal becomes unrealistic, since restoration techniques cannot rely on local information. Therefore, besides providing a coherent reconstruction, the algorithms need to generate new information that has to be semantically compatible with the surrounding context (i.e., the audio signal surrounding the gaps). The case of medium duration gaps lays between short and long inpainting. It refers to the reconstruction of tens of millisecond of missing data, a scale where the non-stationary characteristic of audio already becomes important. == Definition == Consider a digital audio signal x {\displaystyle \mathbf {x} } . A corrupted version of x {\displaystyle \mathbf {x} } , which is the audio signal presenting missing gaps to be reconstructed, can be defined as x ~ = m ∘ x {\displaystyle \mathbf {\tilde {x}} =\mathbf {m} \circ \mathbf {x} } , where m {\displaystyle \mathbf {m} } is a binary mask encoding the reliable or missing samples of x {\displaystyle \mathbf {x} } , and ∘ {\displaystyle \circ } represents the element-wise product. Audio inpainting aims at finding x ^ {\displaystyle \mathbf {\hat {x}} } (i.e., the reconstruction), which is an estimation of x {\displaystyle \mathbf {x} } . This is an ill-posed inverse problem, which is characterized by a non-unique set of solutions. For this reason, similarly to the formulation used for the inpainting problem in other domains, the reconstructed audio signal can be found through an optimization problem that is formally expressed as x ^ ∗ = argmin X ^ L ( m ∘ x ^ , x ~ ) + R ( x ^ ) {\displaystyle \mathbf {\hat {x}} ^{}={\underset {\hat {\mathbf {X} }}{\text{argmin}}}~L(\mathbf {m} \circ \mathbf {\hat {x}} ,\mathbf {\tilde {x}} )+R(\mathbf {\hat {x}} )} . In particular, x ^ ∗ {\displaystyle \mathbf {\hat {x}} ^{}} is the optimal reconstructed audio signal and L {\displaystyle L} is a distance measure term that computes the reconstruction accuracy between the corrupted audio signal and the estimated one. For example, this term can be expressed with a mean squared error or similar metrics. Since L {\displaystyle L} is computed only on the reliable frames, there are many solutions that can minimize L ( m ∘ x ^ , x ~ ) {\displaystyle L(\mathbf {m} \circ \mathbf {\hat {x}} ,\mathbf {\tilde {x}} )} . It is thus necessary to add a constraint to the minimization, in order to restrict the results only to the valid solutions. This is expressed through the regularization term R {\displaystyle R} that is computed on the reconstructed audio signal x ^ {\displaystyle \mathbf {\hat {x}} } . This term encodes some kind of a-priori information on the audio data. For example, R {\displaystyle R} can express assumptions on the stationarity of the signal, on the sparsity of its representation or can be learned from data. == Techniques == There exist various techniques to perform audio inpainting. These can vary significantly, influenced by factors such as the specific application requirements, the length of the gaps and the available data. In the literature, these techniques are broadly divided in model-based techniques (sometimes also referred as signal processing techniques) and data-driven techniques. === Model-based techniques === Model-based techniques involve the exploitation of mathematical models or assumptions about the underlying structure of the audio signal. These models can be based on prior knowledge of the audio content or statistical properties observed in the data. By leveraging these models, missing or corrupted portions of the audio signal can be inferred or estimated. An example of a model-based techniques are autoregressive models. These methods interpolate or extrapolate the missing samples based on the neighboring values, by using mathematical functions to approximate the missing data. In particular, in autoregressive models the missing samples are completed through linear prediction. The autoregressive coefficients necessary for this prediction are learned from the surrounding audio data, specifically from the data adjacent to each gap. Some more recent techniques approach audio inpainting by representing audio signals as sparse linear combinations of a limited number of basis functions (as for example in the Short Time Fourier Transform). In this context, the aim is to find the sparse representation of the missing section of the signal that most accurately matches the surrounding, unaffected signal. The aforementioned methods exhibit optimal performance when applied to filling in relatively short gaps, lasting only a few tens of milliseconds, and thus they can be included in the context of short inpainting. However, these signal-processing techniques tend to struggle when dealing with longer gaps. The reason behind this limitation lies in the violation of the stationarity condition, as the signal often undergoes significant changes after the gap, making it substantially different from the signal preceding the gap. As a way to overcome these limitations, some approaches add strong assumptions also about the fundamental structure of the gap itself, exploiting sinusoidal modeling or similarity graphs to perform inpainting of longer missing portions of audio signals. === Data-driven techniques === Data-driven techniques rely on the analysis and exploitation of the available audio data. These techniques often employ deep learning algorithms that learn patterns and relationships directly from the provided data. They involve training models on large datasets of audio examples, allowing them to capture the statistical regularities present in the audio signals. Once trained, these models can be used to generate missing portions of the audio signal based on the learned representations, without being restricted by stationarity assumptions. Data-driven techniques also offer the advantage of adaptability and flexibility, as they can learn from diverse audio datasets and potentially handle complex inpainting scenarios. As of today, such techniques constitute the state-of-the-art of audio inpainting, being able to reconstruct gaps of hundreds of milliseconds or even seconds. These performances are made possible by the use of generative models that have the capability to generate novel content to fill in the missing portions. For example, generative adversarial networks, which are the state-of-the-art of generative models in many areas, rely on two competing neural networks trained simultaneously in a two-player minmax game: the generator produces new data from samples of a random variable, the discriminator attempts to distinguish between generated and real data. During the training, the generator's objective is to fool the discriminator, while the discriminator attempts to learn to better classify real and fake data. In GAN-based inpainting methods the generator acts as a context encoder and produces a plausible completion for the gap only given the available information surrounding it. The discriminator is used to train the generator and tests the consistency of the produced inpainted audio. Recently, also diffusion models have established themselves as the state-of-the-art of generative models in many fields, often beating even GAN-based solutions. For this reason they have also been used to solve the audio inpainting problem, obtaining valid results. These models generate new data instances by inverting the

    Read more →
  • International Olympiad in Artificial Intelligence

    International Olympiad in Artificial Intelligence

    The International Olympiad in Artificial Intelligence (IOAI) is an annual International Science Olympiad in the field of artificial intelligence (AI) for secondary education students under the age of 20. The first IOAI was held in Burgas, Bulgaria, in 2024. Each country or territory may send up to two teams, each consisting of up to four students supported by one leader. Participants are selected through a multi-stage National Olympiad in Artificial Intelligence (NOAI) and/or a Regional Olympiad such as the NAOAI or APOAI. Participants at the IOAI compete on an individual basis. As of 2025, there were 61 countries and territories participating in the IOAI. Three hundred students participated in IOAI 2025. As of 2026, 130 countries and territories are accredited for participation in the IOAI. == Competition Structure == The IOAI consists of three contests: the Individual Contest, the Team Challenge, and the GAITE contest. Medals are awarded based solely on the Individual Contest. === Individual Contest === The Individual Contest is the main competition of the IOAI in which contestants compete individually on separate computers and are not permitted to communicate during the contest. Medals are awarded solely on the basis of the total score from the two-day Individual Contest. The Individual Contest consists of two on-site contest days (six hours per day), preceded by an at-home practice round and an on-site practice session. In IOAI 2025, three at-home problems were released for preparation approximately one month before the on-site contest. Results from this at-home round do not affect final results. The first on-site contest day (Individual Contest 1) comprises three tasks as extensions and continuations of the at-home tasks, while the second day (Individual Contest 2) comprises two or three tasks which are novel and different from the at-home tasks. The Individual Contest tasks span various AI domains such as machine learning, natural language processing, and computer vision. The IOAI 2025 contest rules describe tasks as requiring typical machine-learning workflows, including writing code, fitting models on training data, and running inference on test data, using identical local machines and GPU resources (minimum 24 GB RAM). Tasks, datasets, and submissions are handled through a contest platform (Bohrium), including a web-based Jupyter notebook environment for GPU access. Internet access is restricted to a whitelist of documentation sites and an integrated compact large language model accessible within the platform. The use of external APIs are prohibited unless a task explicitly allows them. In IOAI 2025, each contest task was scored up to 100 points and could include multiple subtasks. Scores are normalized using a baseline solution and a maximum score derived from either a Scientific Committee solution or the best contestant submission. Contestants can view only their own scores during the contest; a live scoreboard may be available publicly outside the contest hall but is not permitted to be viewed by contestants during the contest. For non-English-speaking teams, the IOAI hold a translation session beginning three hours before each contest day in which team leaders review and may amend machine-translated task statements; translations must match the English original and are published after the contest. The IOAI committee also enforces quarantine restrictions during these translation sessions, where neither contestants or team leaders may not use cell phones, laptops, and other communication devices. === Team Challenge === The Team Challenge is a team-based component of the IOAI. The results of this part do not affect the distribution of medals. The IOAI 2025 rules describe it as a “creative and AI-oriented challenge” in which a team's contestants sit together and cooperate, with the format varying by year. In IOAI 2024, teams worked with existing AI image and video generation tools to produce a visual result. In IOAI 2025, teams were assigned to program a robot to complete various tasks. === GAITE Contest === The GAITE (Global AI Talent Empowerment) contest is a simplified version of the individual contest with a separate scoreboard, where participants may ask for hints. It is designed for countries and territories with limited International Science Olympiads history, and it awards alternative prizes instead of medals. == Awards Distribution == The top 50% of the participants in the individual contest receive gold, silver and bronze medals in ratio of 1:2:3, respectively. The top three individuals receive honorary trophies. As in other International Science Olympiads, if an individual is in the top 50% on one of the days, but does not receive a medal, they receive an honorary mention during the awards ceremony. The GAITE contest has similar cutoff logic, but receives a reward instead of a medal. The top three teams in the Team Challenge receive trophies. == National selection and regional competitions == National delegations are selected through country-level qualification processes referred to as National Olympiads in Artificial Intelligence (NOAI) or equivalent, which are widely known for their low success rates. Although the total number of participants worldwide is not published, available data indicate exceptionally competitive national pools; for example, Brazil reports over 716,000 competitors, while Russia reports more than 72,000. In addition, Regional Olympiads (for example, APOAI or NAOAI) provide continent-level competition and preparation platforms in most regions. === National Selection (National Olympiads in Artificial Intelligence) === Participating countries and territories select their students for the IOAI through a National Olympiad in Artificial Intelligence (NOAI) or an equivalent process. The names of these selection processes differ by country, but almost all of them (excluding newer countries participating in the GAITE contest) have in common that the process comprises multiple and/or extremely rigorous selection stages. United States / Canada – The USA–North America AI Olympiad (USAAIO) is a three-round process including an invitational in-person round and a subsequent selection camp, after which a national delegation is selected for IOAI. Russia – The Russian Olympiad in Artificial Intelligence is organized as a multi-stage process (training, qualification, main round, final). Organizers reported 72,316 registrations for the training round and 52,260 registrations for the qualifying round in one season, with tasks spanning mathematics, algorithms/programming, and machine learning; 977 students were disqualified following plagiarism checks. Japan – Japan's national selection consists of multiple stages, beginning with the Japan Olympiad in Artificial Intelligence (JOAI), a large-scale Kaggle-style competition. High-performing participants advance through additional assessment stages, including written solution reports and technical interviews. From this process, eight students are selected for the APOAI team, with four ultimately chosen to represent Japan at the IOAI. Brazil – Brazil's National Olympiad in Artificial Intelligence (ONIA) is conducted as a large competition which consists of progressive rounds of evaluation. It identifies 28 top students from over 716,000 competitors, four of which are selected for the IOAI. The competition is held in four phases across two cycles, including a two-step third phase and a final training-and-evaluation phase that selects a four-student national team. Singapore – Singapore's national Olympiad consists of two rounds: an online preliminary round (300 MCQs in 3 hours) selects the top 150 performers to advance to the final assessment, which includes both theory questions and Python programming tasks. Additional training and selection may follow the finals for top performers. Poland – The Polish AI Olympiad adopts a two-stage structure: an open online first stage (at-home tasks) and a second-stage competitive camp with 30 selected participants competing for a four-person IOAI team. France – The Olympiades Françaises d'Intelligence Artificielle (OFIA), organized by France-IOI, follow a three-stage structure consisting of an open online qualification round, a second selection round, and a multi-day national training camp and final in Paris. Bangladesh – The Bangladesh AI Olympiad (BdAIO) selects competitors in three rounds: the online preliminary round, the national finals, and the team selection camp. In 2025, 406 participants competed in the national finals. Norway – The Norwrgian AI Olympiad (NOKI) is a three-stage selection system; however, unlike other countries, its first two rounds are shared with the Norwegian Informatics Olympiad. The national Olympiad reports 1,180 participants in the first round. Hong Kong – The national Olympiad reported more than 800 preliminary-round entrants, narrowing through multiple rounds to 25 finalists, with a subsequent

    Read more →
  • Darwin among the Machines

    Darwin among the Machines

    "Darwin among the Machines" is a letter to the editor published in The Press newspaper on 13 June 1863 in Christchurch, New Zealand. The title, which was chosen by the author, references the work of Charles Darwin. Written by Samuel Butler but signed Cellarius, the letter raised the possibility that machines were a kind of "mechanical life" undergoing constant evolution, and that eventually machines might supplant humans as the dominant species. == Book of the Machines == Butler developed this and subsequent articles into The Book of the Machines, three chapters of Erewhon, published anonymously in 1872. The Erewhonian society Butler envisioned had long ago undergone a revolution that destroyed most mechanical inventions. The narrator of the story finds a book that details the reasons for this revolution, which he translates for the reader. Despite the initial popularity of Erewhon, Butler commented in the preface to the second edition that reviewers had "in some cases been inclined to treat the chapters on Machines as an attempt to reduce Mr. Darwin's theory to an absurdity." He protested that "few things would be more distasteful to me than any attempt to laugh at Mr. Darwin", but also added "I am surprised, however, that the book at which such an example of the specious misuse of analogy would seem most naturally levelled should have occurred to no reviewer; neither shall I mention the name of the book here, though I should fancy that the hint given will suffice", which may suggest that the chapter on Machines was in fact a satire intended to illustrate the "specious misuse of analogy", even if the target was not Darwin; Butler, fearing that he had offended Darwin, wrote him a letter explaining that the actual target was Joseph Butler's 1736 The Analogy of Religion, Natural and Revealed, to the Constitution and Course of Nature. The Victorian scholar Herbert Sussman has suggested that although Butler's exploration of machine evolution was intended to be whimsical, he may also have been genuinely interested in the notion that living organisms are a type of mechanism and was exploring this notion with his writings on machines, while the philosopher Louis Flaccus called it "a mixture of fun, satire, and thoughtful speculation." == Evolution of Global Intelligence == George Dyson applies Butler's original premise to the artificial life and intelligence of Alan Turing in Darwin Among the Machines: The Evolution of Global Intelligence (1998) ISBN 0-7382-0030-1, to suggest that the internet is a living, sentient being. Dyson's main claim is that the evolution of a conscious mind from today's technology is inevitable. It is not clear whether this will be a single mind or multiple minds, how smart that mind would be, and even if we will be able to communicate with it. He also clearly suggests that there are forms of intelligence on Earth that we are currently unable to understand. From the book: "What mind, if any, will become apprehensive of the great coiling of ideas now under way is not a meaningless question, but it is still too early in the game to expect an answer that is meaningful to us."

    Read more →
  • Deepfake

    Deepfake

    Deepfakes (a portmanteau of 'deep learning' and 'fake') are images, videos, or audio that have been edited or generated using artificial intelligence, AI-based tools or audio-video editing software. They may depict real or fictional people and are considered a form of synthetic media, that is media that is usually created by artificial intelligence systems by combining various media elements into a new media artifact. While the act of creating fake content is not new, deepfakes uniquely leverage machine learning and artificial intelligence techniques, including facial recognition algorithms and artificial neural networks such as variational autoencoders and generative adversarial networks (GANs). In turn, the field of image forensics has worked to develop techniques to detect manipulated images. Deepfakes have garnered widespread attention for their potential use in creating child sexual abuse material, celebrity pornographic videos, revenge porn, fake news, hoaxes, bullying, and financial fraud. Academics have raised concerns about the potential for deepfakes to promote disinformation and hate speech, as well as interfere with elections. In response, the information technology industry and governments have proposed recommendations and methods to detect and mitigate their use. Academic research has also delved deeper into the factors driving deepfake engagement online as well as potential countermeasures to malicious application of deepfakes. From traditional entertainment to gaming, deepfake technology has evolved to be increasingly convincing and available to the public, allowing for the disruption of the entertainment and media industries. == History == Photo manipulation was developed in the 19th century and soon applied to motion pictures. Technology steadily improved during the 20th century, and more quickly with the advent of digital video. Deepfake technology has been developed by researchers at academic institutions beginning in the 1990s, and later by amateurs in online communities. More recently, the methods have been adopted by industry. The development of generative adversarial networks (GANs) in the mid-2010s represented a key technical turning point in the evolution of deepfakes. GANs allowed for the creation of highly realistic fake images and videos by training competing neural networks, achieving a much improved visual fidelity over previous methods of creating the content using rules or by using autoencoders, and formed the basis for modern deepfake methods. === Academic research === Academic research related to deepfakes is split between the field of computer vision, a sub-field of computer science, which develops techniques for creating and identifying deepfakes, and humanities and social science approaches that study the social, ethical, aesthetic implications as well as journalistic and informational implications of deepfakes. As deepfakes have risen in prominence in popularity with innovations provided by AI tools, significant research has gone into detection methods and defining the factors driving engagement with deepfakes on the internet. Deepfakes have been shown to appear on social media platforms and other parts of the internet for purposes ranging from entertainment and education related to deepfakes to misinformation to elicit strong reactions. There are gaps in research related to the propagation of deepfakes on social media. Negativity and emotional response are the primary driving factors for users sharing deepfakes. === Social science and humanities approaches to deepfakes === In cinema studies, deepfakes illustrate how "the human face is emerging as a central object of ambivalence in the digital age". Video artists have used deepfakes to "playfully rewrite film history by retrofitting canonical cinema with new star performers". Film scholar Christopher Holliday analyses how altering the gender and race of performers in familiar movie scenes destabilizes gender classifications and categories. The concept of "queering" deepfakes is also discussed in Oliver M. Gingrich's discussion of media artworks that use deepfakes to reframe gender, including British artist Jake Elwes' Zizi: Queering the Dataset, an artwork that uses deepfakes of drag queens to intentionally play with gender. The aesthetic potentials of deepfakes are also beginning to be explored. Theatre historian John Fletcher notes that early demonstrations of deepfakes are presented as performances, and situates these in the context of theater, discussing "some of the more troubling paradigm shifts" that deepfakes represent as a performance genre. While most English-language academic studies of deepfakes focus on the Western anxieties about disinformation and pornography, digital anthropologist Gabriele de Seta has analyzed the Chinese reception of deepfakes, which are known as huanlian, which translates to "changing faces". The Chinese term does not contain the "fake" of the English deepfake, and de Seta argues that this cultural context may explain why the Chinese response has centered on practical regulatory measures to "fraud risks, image rights, economic profit, and ethical imbalances". === Computer science research on deepfakes === A landmark early project was the "Video Rewrite" program, published in 1997. The program modified existing video footage of a person speaking to depict that person mouthing the words from a different audio track. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face. Contemporary academic projects have focused on creating more realistic videos and improving deepfake techniques. The "Synthesizing Obama" program, published in 2017, modifies video footage of former president Barack Obama to depict him mouthing the words contained in a separate audio track. The project lists as a main research contribution to its photorealistic technique for synthesizing mouth shapes from audio. The "Face2Face" program, published in 2016, modifies video footage of a person's face to depict them mimicking another person's facial expressions. The project highlights its primary research contribution as the development of the first method for re-enacting facial expressions in real time using a camera that does not capture depth, enabling the technique to work with common consumer cameras. Researchers have also shown that deepfakes are expanding into other domains such as medical imagery. In this work, it was shown how an attacker can automatically inject or remove lung cancer in a patient's 3D CT scan. The result was so convincing that it fooled three radiologists and a state-of-the-art lung cancer detection AI. To demonstrate the threat, the authors successfully performed the attack on a hospital in a White hat penetration test. A survey of deepfakes, published in May 2020, provides a timeline of how the creation and detection of deepfakes have advanced over the last few years. The survey identifies that researchers have been focusing on resolving the following challenges of deepfake creation: Generalization. High-quality deepfakes are often achieved by training on hours of footage of the target. This challenge is to minimize the amount of training data and the time to train the model required to produce quality images and to enable the execution of trained models on new identities (unseen during training). Paired Training. Training a supervised model can produce high-quality results, but requires data pairing. This is the process of finding examples of inputs and their desired outputs for the model to learn from. Data pairing is laborious and impractical when training on multiple identities and facial behaviors. Some solutions include self-supervised training (using frames from the same video), the use of unpaired networks such as Cycle-GAN, or the manipulation of network embeddings. Identity leakage. This is where the identity of the driver (i.e., the actor controlling the face in a reenactment) is partially transferred to the generated face. Some solutions proposed include attention mechanisms, few-shot learning, disentanglement, boundary conversions, and skip connections. Occlusions. When part of the face is obstructed with a hand, hair, glasses, or any other item then artifacts can occur. A common occlusion is a closed mouth which hides the inside of the mouth and the teeth. Some solutions include image segmentation during training and in-painting. Temporal coherence. In videos containing deepfakes, artifacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal coherence losses to help improve realism. As the technology improves, the interference is diminishing. Overall, deepfakes are expected to have several implications in media and society, med

    Read more →
  • Deep Zoom

    Deep Zoom

    Deep Zoom is a technology developed by Microsoft for efficiently transmitting and viewing images. It allows users to pan around and zoom in on a large, high resolution image or a large collection of images. It reduces the time required for initial load by downloading only the region being viewed or only at the resolution it is displayed at. Subsequent regions are downloaded as the user pans to (or zooms into) them; animations are used to hide any jerkiness in the transition. The libraries are also available in other platforms including Java and Flash. == History == The Deep Zoom file format is very similar to the Google Maps image format where images are broken into tiles and then displayed as required. The tiling typically follows a quadtree pattern of increasing resolution of image (in other words twice the zoom and twice the resolution). The main difference is that with Google Maps the actual details on the image change from one zoom level to another, while with Deep Zoom the same image is displayed at each zoom level. Seadragon Software, formerly Sand Codex, first created the Seadragon technology and its implementation of what is now called Deep Zoom. This technology was then absorbed into the Microsoft Live Labs when Seadragon Software was acquired. Engineers from Seadragon now work with Microsoft to integrate their work into technology such as Silverlight and Photosynth. == Deep Zoom examples == The most famous implementation of Deep Zoom was probably the first: the memorabilia collection at the Hard Rock website. Conceived and designed by Duncan/Channon and built by Vertigo, it was demonstrated for the first time in March 2008 at the Microsoft MIX convention in Las Vegas. In 2010, Microsoft Live Labs partnered with the University of California, Berkeley to create ChronoZoom, a DeepZoom-powered time visualization tool that pushed the limits of DeepZoom, since it required zooming from the scale of 13 billion years down to a single day. The project has since graduated to development under Microsoft Research. Another example is the Deep Earth project. It is described by its creators as "a community project focused on creating a rich interactive mapping control using Silverlight2 Deep Zoom. Concentrating on Microsoft Virtual Earth imagery and data the project offers team members the opportunity to learn and share while creating something cool and useful." A paintings collection project http://galleryzoom.co.uk/ shows 1000 high resolution/sensor images individually indexed. (Using Deep Zoom Composer). Blaise Aguera y Arcas gave a demonstration of Seadragon and Photosynth at the 2007 TED conference. In November 2009, 352 Media Group, a Silverlight developer in the Microsoft Silverlight Partner Program, created an example of Deep Zoom using Microsoft Silverlight version 3. It is online at 352 Media Group's Web site. The Winston Churchill Deep Zoom Archived 2010-07-04 at the Wayback Machine mosaic, created by Silverlight developers Shoothill, features as both an online interactive deep zoom and a standalone deep zoom which forms part of the Churchill exhibit in the Churchill War Rooms in Whitehall. In 2010, Shoothill built the Sumatran Tiger Deep Zoom - the largest seen to date - for worldwide conservation charity Fauna and Flora International, featuring thousands of images of endangered species. An early example of Deep Zoom-like technology was implemented at The Department of Maori Affairs in New Zealand in 1997. The technology was used to display Maori land ownership. == Deep Zoom images == The file format used by Deep Zoom (as well as Photosynth and Seadragon Ajax) is XML based. Users can specify a single large image (dzi) or a collection of images (dzc). It also allows for "Sparse Images"; where some parts of the image have greater resolution than others, an example of which can be found on the Seadragon Ajax home page; The bike image displayed is a sparse image. Though used in the proprietary Deep Zoom, the dzi format is open and able to be used by anyone. === Deep Zoom image (dzi) === A DZI has two parts: a DZI file (with either a .dzi or .xml extension) and a subdirectory of image folders. Each folder in the image subdirectory is labeled with its level of resolution. Higher numbers correspond to a higher resolution level; inside each folder are the image tiles corresponding to that level of resolution, numbered consecutively in columns from top left to bottom right. === Deep Zoom collection (dzc) === A DZC is a collection of some number of DZIs linked and referenced by a DZC file (with either a .dzc or .xml extension). At a high level, a collection is a number of image thumbnails whose location is kept track of by the .dzc/.xml file, when zooming into an image, it accesses greater resolutions tiles. A DZC's structure is similar to that of a DZI; the .dzc/.xml file defines the collection and the subdirectory of folders maps to the DZI file structure, each with their set of .dzi/.xml and image tiles. The DZC is used in Microsoft's Pivot, but not in SeaDragon per se. === Sparse Images === Sparse images are a sub-classification of the DZI file type. A sparse image is normally a number of separate photographs with varying resolution levels that have been placed in a single DZI instead of a DZC. Sparse images have no different file structure than that of a DZI and differ only in that there is not a single "highest resolution" level for the entire DZI. == Software that uses Deep Zoom == Image Composite Editor - image stitching tool created by Microsoft Research Deep Zoom Composer - collage maker and simple panorama tool created by Microsoft. Images' resolution is maintained when exporting for web use (via Silverlight Deep Zoom or JavaScript using a third-party template). No longer available for download from Microsoft though it can be found on various other sources such as Internet Archive. == iPhone OS development == Microsoft Live Labs has created an application for the App Store called Seadragon Mobile. It is run over the internet and includes Deep Zoom on the following categories; art, history, maps, photos, Photosynth which anybody can upload to, space and technology & web.

    Read more →
  • Pop music automation

    Pop music automation

    Pop music automation is a field of study among musicians and computer scientists with a goal of producing successful pop music algorithmically. It is often based on the premise that pop music is especially formulaic, unchanging, and easy to compose. The idea of automating pop music composition is related to many ideas in algorithmic music, artificial intelligence (AI) and computational creativity. == History of automation in music == Algorithms (or, at the very least, formal sets of rules) have been used to compose music for centuries; the procedures used to plot voice-leading in counterpoint, for example, can often be reduced to algorithmic determinant. Now the term is usually reserved, however, for the use of formal procedures to make music without human intervention. Classical music automation software exists that generates music in the style of Mozart and Bach and jazz. Most notably, David Cope has written a software system called "Experiments in Musical Intelligence" (or "EMI") that is capable of analyzing and generalizing from existing music by a human composer to generate novel musical compositions in the same style. EMI's output is convincing enough to persuade human listeners that its music is human-generated to a high level of competence. Creativity research in jazz has focused on the process of improvisation and the cognitive demands that this places on a musical agent: reasoning about time, remembering and conceptualizing what has already been played, and planning ahead for what might be played next. Inevitably associated with pop music automation is pop music analysis. Projects in pop music automation may include, but are not limited to, ideas in melody creation and song development, vocal generation or improvement, automatic accompaniment and lyric composition. == Automatic accompaniment == Some systems exist that automatically choose chords to accompany a vocal melody in real-time. A user with no musical experience can create a song with instrumental accompaniment just by singing into a microphone. An example is a Microsoft Research project called Songsmith, which trains a Hidden Markov model using a music database and uses that model to select chords for new melodies. == Melody generation == Automatic melody generation is often done with a Markov chain, the states of the system become note or pitch values, and a probability vector for each note is constructed, completing a transition probability matrix (see below). An algorithm is constructed to produce an output note values based on the transition matrix weightings, which could be MIDI note values, frequency (Hz), or any other desirable metric. A second-order Markov chain can be introduced by considering the current state and also the previous state, as indicated in the second table. Higher, nth-order chains tend to "group" particular notes together, while 'breaking off' into other patterns and sequences occasionally. These higher-order chains tend to generate results with a sense of phrasal structure, rather than the 'aimless wandering' produced by a first-order system. == Lyric composition == Automated lyric creating software may take forms such as: Selecting words according to their rhythm The Tra-la-Lyrics system produces song lyrics, in Portuguese, for a given melody. This not only involves matching each word syllable with a note in the melody, but also matching the word's stress with the strong beats of the melody. Parsing existing pop music (e.g. for content or word choice) This involves natural language processing. Pablo Gervás has developed a noteworthy system called ASPERA that employs a case-based reasoning (CBR) approach to generating poetic formulations of a given input text via a composition of poetic fragments that are retrieved from a case-base of existing poems. Each poem fragment in the ASPERA case-base is annotated with a prose string that expresses the meaning of the fragment, and this prose string is used as the retrieval key for each fragment. Metrical rules are then used to combine these fragments into a well-formed poetic structure. Automatic analogy or story creation Programs like TALE-SPIN and The MINSTREL system represent a complex elaboration of this basis approach, distinguishing a range of character-level goals in the story from a range of author-level goals for the story. Systems like Bringsjord's BRUTUS can create stories with complex interpersonal themes like betrayal. On-line metaphor generation systems like 'Sardonicus' or 'Aristotle' can suggest lexical metaphors for a given descriptive goal (e.g., to describe a supermodel as skinny, the source terms “pencil”, “whip”, “whippet”, “rope”, “stick-insect” and “snake” are suggested). Free association of grouped words Using a language database (such as wordnet) one can create musings on a subject that may be weak grammatically but are still sensical. See such projects as the Flowerewolf automatic poetry generator or the Dada engine. == Software == === More or less free === BreathCube by xoxos. Simple lyrical vocal content is generated with simple music. CubeBreath by xoxos. Audio input is vocoded in tune with the music. Midi Internet Algorithmic Composition infno, infinite generator of electronic dance music and synth-pop. Algorithmic Trap, trap beat generator. === Commercial === Band in a box generates any element, potentially creates whole new songs from scratch. Musical Palette - Melody Composing Tool SongSmith: Automatic accompaniment for vocal melodies Ludwig 3.0 automatic accompaniment, writes arrangements for given instruments, plays its own songs for an infinitely long time. Automated Composing System creates music in many different styles

    Read more →
  • Minne Atairu

    Minne Atairu

    Minne Atairu is a Nigerian interdisciplinary artist, a recipient of the 2021 Global South Award Lumen Prize for Art and Technology. She generates synthetic Benin Bronzes through recombination of historical fragments, sculptures, texts, images, and sounds. == Early life and education == Atairu was born in Benin, Nigeria. She holds a bachelor's degree in art history from the University of Maiduguri in Maiduguri, Nigeria; a master's degree in museum studies from the George Washington University in Washington, D.C.; and a doctorate in art education from Teachers College, Columbia University in New York City. Her academic research integrates artificial intelligence, art/museum education and hip-hop based education. == Works == Atairu's artmaking involves using artificial intelligence (AI; such as StyleGAN, GPT-3) to make artwork. She uses tools such as Midjourney and Blender software to develop her works. === Mami Wata === Her first work is a Yoruba goddess called Mami Wata where she used Midjourney in generating the images. === To the Hand === For her 2023 installation To the Hand at The Shed arts center, she worked with Blender to convert text into 3D-printed sculptures made of corn starch or sugarcane infused with bronze. The rings of ground terra-cotta that surround the sculpture represent the walls and deep moats of Benin. == Publications == Atairu, Minne (February 1, 2024). "Reimagining Benin Bronzes using generative adversarial networks". AI & Society. 39 (1): 91–102. doi:10.1007/s00146-023-01761-7. ISSN 1435-5655.

    Read more →
  • IJCAI Computers and Thought Award

    IJCAI Computers and Thought Award

    The IJCAI Computers and Thought Award is presented every two years by the International Joint Conference on Artificial Intelligence (IJCAI), recognizing outstanding young scientists in artificial intelligence. It was originally funded with royalties received from the book Computers and Thought (edited by Edward Feigenbaum and Julian Feldman), and is currently funded by IJCAI. It is considered to be "the premier award for artificial intelligence researchers under the age of 35". == Laureates == Terry Winograd (1971) Patrick Winston (1973) Chuck Rieger (1975) Douglas Lenat (1977) David Marr (1979) Gerald Sussman (1981) Tom Mitchell (1983) Hector Levesque (1985) Johan de Kleer (1987) Henry Kautz (1989) Rodney Brooks (1991) Martha E. Pollack (1991) Hiroaki Kitano (1993) Sarit Kraus (1995) Stuart Russell (1995) Leslie Kaelbling (1997) Nicholas Jennings (1999) Daphne Koller (2001) Tuomas Sandholm (2003) Peter Stone (2007) Carlos Guestrin (2009) Andrew Ng (2009) Vincent Conitzer (2011) Malte Helmert (2011) Kristen Grauman (2013) Ariel D. Procaccia (2015) Percy Liang (2016) for his contributions to both the approach of semantic parsing for natural language understanding and better methods for learning latent-variable models, sometimes with weak supervision, in machine learning. Devi Parikh (2017) Stefano Ermon (2018) Guy Van den Broeck (2019) for his contributions to statistical and relational artificial intelligence, and the study of tractability in learning and reasoning. Piotr Skowron (2020) for his contributions to computational social choice, and to the theory of committee elections. Fei Fang (2021) for her contributions to integrating machine learning with game theory and the use of these novel techniques to tackle societal challenges such as more effective deployment of security resources, enhancing environmental sustainability, and reducing food insecurity. Bo Li (2022) for her contributions to uncovering the underlying connections among robustness, privacy, and generalization in AI, showing how different models are vulnerable to malicious attacks, and how to eliminate these vulnerabilities using mathematical tools that provide robustness guarantees for learning models and privacy protection. Pin-Yu Chen (2023) for his contributions to consolidating properties of trust, robustness and safety into rigorous algorithmic procedures and computable metrics for improving AI systems. Nisarg Shah (2024) for his contributions to AI and society, in particular foundational work on the theory of algorithmic fairness using principles from social choice theory. Aditya Grover (2025) for his foundational contributions uniting deep generative models, representation learning, and reinforcement learning, and for their applications in advancing scientific reasoning.

    Read more →
  • Network Abstraction Layer

    Network Abstraction Layer

    The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non conversational" (storage, broadcast, or streaming) applications. NAL has achieved a significant improvement in application flexibility relative to prior video coding standards. == Introduction == An increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements. The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.) interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question about how to handle this variety of applications and networks. To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. The NAL is designed in order to provide "network friendliness" to enable simple and effective customization of the use of VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as: RTP/IP for any kind of real-time wire-line and wireless Internet services. File formats, e.g., ISO MP4 for storage and MMS. H.32X for wireline and wireless conversational services. MPEG-2 systems for broadcasting services, etc. The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte stream, and packet formats uses of NAL units, parameter sets, and access units. A short description of these concepts is given below. == NAL units == The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream. == NAL Units in Byte-Stream Format Use == Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired. == NAL Units in Packet-Transport System Use == In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes. == VCL and Non-VCL NAL Units == NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the values of the samples in the video pictures. Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures). == Parameter Sets == A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets: Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.) Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.) The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss. In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself. == Access Units == A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture. The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to

    Read more →
  • HYPO CBR

    HYPO CBR

    HYPO is a computer program, an expert system, that models reasoning with cases and hypotheticals in the legal domain. It is the first of its kind and the most sophisticated of the case-based legal reasoners, which was designed by Kevin Ashley for his Ph.D dissertation in 1987 at the University of Massachusetts Amherst under the supervision of Edwina Rissland. HYPO's design represents a hybrid generalization/comparative evaluation method appropriate for a domain with a weak analytical theory and applies to tasks that rarely involve just one right answer. The domain covers US trade secret law, and is substantially a common law domain. Since Anglo-American common law operates under the doctrine of precedent, the definitive way of interpreting problems is of necessity and case-based. Thus, HYPO did not involve the analysis of a statute, as required by the Prolog program. Rissland and Ashley (1987) envisioned HYPO as employing the key tasks performed by lawyers when analyzing case law for precedence to generate arguments for the prosecution or the defence. HYPO was a successful example of a general category of legal expert systems (LESs), it applies artificial intelligence (A.I.) techniques to the domain of legal reasoning in patent law, implementing a case-based reasoning (CBR) system, in contrast to rule based systems like MYCIN, or mixed-paradigm systems integrating CBR with rule-based or model-based reasoning like IKBALS II. A legal case-based reasoning essentially reasons from prior tried cases, comparing the contextual information in the current input case with that of cases previously tried and entered into the system. As noted by Ashley and Rissland (1988) CBR is used to "... capture expertise in domains where rules are ill-defined, incomplete or inconsistent". The HYPO project set out to model the creation of hypotheticals in law, where no case matches well enough. HYPO uses hypotheticals for a variety of tasks necessary for good interpretation: "to redefine old situations in terms of new dimensions, to create new standard cases when an appropriate one doesn’t exist, to explore and test the limits of a concept, to refocus a case by excluding some issues and to organize or cluster cases". Hypotheticals can include facts that support two conflicting lines of reasoning. So, it makes and responds to arguments from competing viewpoints about who should win the dispute. HYPO use heuristics such as making a case weaker or stronger, making a case extreme, enabling a near-miss, disabling a near-hit to generate hypotheticals in the context of an argument by using the dimensions mechanism. Dimensions have a range of values, along which the supportive strength that may shift from one side to the other. What differentiated this expert system from others was its facility not only to return a primary to best-case response but to return near-best-fit responses also. == Components == Legal knowledge in HYPO is contained in: the case-knowledge-base (CKB) and the library of dimensions. The CKB contains HYPO's base of known cases that are highly structured objects and sub-objects both real and hypothetical in the area of trade secret law. Each case is represented as a hierarchical set of frames whose slots are important facets of the case (e.g. Plaintiff, defendant, secret knowledge, employer/employee data).Ashley’s HYPO system used a database of thirty cases in the area indexed by thirteen dimensions. A key mechanism in HYPO is a dimension i.e. a mechanism to allow retrieval from the CKB, in order to represent legal cases. Ashley's dimensions are composed of (i) prerequisites, which are a set of factual predicates that must be satisfied for the dimension to apply (ii) focal slots, which accommodate one or two of the dimension's prerequisites designated as being indicative of the case's strength along that dimension and (iii) range information, which tells how a change in focal slot value effects the strength of a party's case along a given dimension. Dimensions focus attention on important aspects of cases. In HYPO's domain of misappropriation of trade secrets the dimension called “secrets voluntary disclosed” captures the idea that the more disclosures the plaintiff has made of his/her putative secret, the less convincing is his/her argument that the defendant is responsible for letting the secret. HYPO, like any other CBR system has also the following components: Similarity/relevancy metrics: that is, standards by which to evaluate the closeness of cases, judge their relevancy to the instant case, and select “most on point” cases. Half-Order Theory of the Application Domain: that is, hierarchies and taxonomies of knowledge, especially regarding the application domain. Precedent-based argumentation abilities: that is, capabilities to generate and evaluate precedent-based arguments. Knowledge to generate hypotheticals: that is, the ability to generate hypothetical cases to deal with various circumstances, like testing the validity of an interpretation or argument by providing gedanken experiments such as test cases or to fill in a weak CKB. == Functions == HYPO's method of creating an argument and justifying a solution or position has several steps. HYPO begins its processing with the current fact situation (cfs) which is direct input by the user into HYPO's representation framework. Once the user inputs the case, HYPO begins its legal analysis. The cfc is analyzed for relevant factors. Based on these factors HYPO selects the relevant cases and produces a case-analysis-record that records which dimensions apply to the cfc and which nearly apply (i.e. are "near misses"). The combined list of applicable and near miss dimensions is called the D-list. At this point the fact gathered module may request additional information from the user in order to draw a legal conclusion. Once all the facts are in the case-positioner module it uses the case-analysis record to create the claim lattice. This is a technique that organizes the relevant retrieved cases from the point of view of the cfc and makes it easy for HYPO to ascertain the most-on point cases (mopc) and to least on-point-cases. HYPO's arguments are 3ply, leading to the construction of the skeleton of an argument: it makes a point for one side, drawing the analogy between the problem and the precedent, responds with an argument for the opponent side, endeavoring to differentiate the cited case and citing other cases as counterarguments. Then it makes a final rebuttal, attempting to differentiate the counterarguments. The claim lattice also enables the HYPO-generator module to produce legally hypotheticals. With its use of dimension-based heuristics, the HYPO-generator does a heuristic search of the space of all possible cases. Lastly, the Explanation module expands upon the argument skeleton and provides explanation and justification for the different lines of analysis and cases found by HYPO. == An intelligent legal tutoring system == Legal expert systems are specifically designed to teach an area of law and are useful for pedagogical purposes. Ashley's work was mainly concerned to build tools to help students understand legal reasoning. Explanation and argument are the bases of the case method used in many professional schools in the U.S., first introduced by the Dean of the Harvard Law School, Christopher Columbus Langdell in 1870. The case method focuses on close readings of cases and principles; it involves students in pointed Socratic dialogue and makes strong use of hypotheticals (hypos). Thus, CATO (Aleven 1997) was a research project to device and test an intelligent, case-based tutorial program for teaching law students how to argue with cases implementing the HYPO program. Within the tutor system, Ashley and Aleven (1991) proposed to leverage an understanding of legal reasoning against the standard case-based tutoring methodology. What makes this tutoring system stand out is the additional levels of abstraction involved in its results. The system presents exercises, including the facts of a problem and a set of on-line cases and instructions to make, or respond to, a legal argument about the problem. The student/user will have a set of tools to analyze the problem and fashion an answer comparing it to other cases. Instead of simply generating precedent cases, the system works to interpret student responses, comparing them against a list of possibilities and responding to student entries, for example, by citing counterexamples, and providing feedback on a student's problem solving activities with explanations of correctness or giving further hints as to what may be wrong with evaluating a student's ability to perform legal reasoning and argument, examples and follow-up assignments by employing HYPO's model of case-based structure. == HYPO’s progeny == The quality of HYPO's results speak for themselves, in that a number of sequent legal reasoning systems are either directly based upon H

    Read more →
  • Herbrand Award

    Herbrand Award

    The Herbrand Award for Distinguished Contributions to Automated Reasoning is an award given by the Conference on Automated Deduction (CADE), Inc., (although it predates the formal incorporation of CADE) to honour persons or groups for important contributions to the field of automated deduction. The award is named after the French scientist Jacques Herbrand and given at most once per CADE or International Joint Conference on Automated Reasoning (IJCAR). It comes with a prize of US$1,000. Anyone can be nominated, the award is awarded after a vote among CADE trustees and former recipients, usually with input from the CADE/IJCAR programme committee. == Recipients == Past award recipients are: === 1990s === Larry Wos (1992) Woody Bledsoe (1994) John Alan Robinson (1996) Wu Wenjun (1997) Gérard Huet (1998) Robert S. Boyer and J Strother Moore (1999) === 2000s === William W. McCune (2000) Donald W. Loveland (2001) Mark E. Stickel (2002). Peter B. Andrews (2003) Harald Ganzinger (2004) Martin Davis (2005) Wolfgang Bibel (2006) Alan Bundy (2007) Edmund M. Clarke (2008) Deepak Kapur (2009) === 2010s === David Plaisted (2010) Nachum Dershowitz (2011) Melvin Fitting (2012) C. Greg Nelson (2013) Robert L. Constable (2014) Andrei Voronkov (2015) Zohar Manna and Richard Waldinger (2016) Lawrence C. Paulson (2017) Bruno Buchberger (2018) Nikolaj Bjørner and Leonardo de Moura (2019) === 2020s === Franz Baader (2020) Tobias Nipkow (2021) Natarajan Shankar (2022) Moshe Vardi (2023) Armin Biere (2024) Aart Middeldorp (2025)

    Read more →