AI Video Tools

Explore the best AI Video Tools — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Capture the flag (cybersecurity)

    Capture the flag (cybersecurity)

    In computer security, Capture the Flag (CTF) is an exercise in which participants attempt to find text strings, called "flags", which are secretly hidden in purposefully vulnerable programs or websites. They can be used for both competitive or educational purposes. In two main variations of CTFs, participants either steal flags from other participants (attack/defense-style CTFs) or from organizers (jeopardy-style challenges). A mixed competition combines these two styles. Competitions can include hiding flags in hardware devices, they can be both online or in-person, and can be advanced or entry-level. The game is inspired by the traditional outdoor sport with the same name. CTFs are used as a tool for developing and refining cybersecurity skills, making them popular in both professional and academic settings. == Overview == Capture the Flag (CTF) is a cybersecurity competition that is used to test and develop computer security skills. It was first developed in 1996 at DEF CON, the largest cybersecurity conference in the United States which is hosted annually in Las Vegas, Nevada. The conference hosts a weekend of cybersecurity competitions, including their flagship CTF. Two popular CTF formats are jeopardy and attack-defense. Both formats test participant’s knowledge in cybersecurity, but differ in objective. In the Jeopardy format, participating teams must complete as many challenges of varying point values from a various categories such as cryptography, web exploitation, and reverse engineering. In the attack-defense format, competing teams must defend their vulnerable computer systems while attacking their opponent's systems. The exercise involves a diverse array of tasks, including exploitation and cracking passwords, but there is little evidence showing how these tasks translate into cybersecurity knowledge held by security experts. Recent research has shown that the Capture the Flag tasks mainly covered technical knowledge but lacked social topics like social engineering and awareness on cybersecurity. == Educational applications == CTFs have been shown to be an effective way to improve cybersecurity education through gamification. There are many examples of CTFs designed to teach cybersecurity skills to a wide variety of audiences, including PicoCTF, organized by the Carnegie Mellon CyLab, which is oriented towards high school students, and Arizona State University supported pwn.college. Beyond educational CTF events and resources, CTFs has been shown to be a highly effective way to instill cybersecurity concepts in the classroom. CTFs have been included in undergraduate computer science classes such as Introduction to Information Security at the National University of Singapore. CTFs are also popular in military academies. They are often included as part of the curriculum for cybersecurity courses, with the NSA organized Cyber Exercise culminating in a CTF competition between the US service academies and military colleges. == Competitions == Many CTF organizers register their competition with the CTFtime platform. This allows the tracking of the position of teams over time and across competitions. These include "Plaid Parliament of Pwning", "More Smoked Leet Chicken", "Dragon Sector", "dcua", "Eat, Sleep, Pwn, Repeat", "perfect blue", "organizers" and "Blue Water". Overall the "Plaid Parliament of Pwning" and "Dragon Sector" have both placed first worldwide the most with three times each. === Community competitions === Every year there are dozens of CTFs organized in a variety of formats. Many CTFs are associated with cybersecurity conferences such as DEF CON, various editions of SANS Institute's NetWars, HITCON, and BSides. The DEF CON CTF, an attack-defence CTF, is notable for being one of the oldest CTF competitions to exist, and has been variously referred to as the "World Series", "Superbowl", and "Olympics", of hacking by media outlets. The NYU Tandon hosted Cybersecurity Awareness Worldwide (CSAW) CTF is one of the largest open-entry competitions for students learning cybersecurity from around the world. In 2021, it hosted over 1200 teams during the qualification round. In addition to conference organized CTFs, many CTF clubs and teams organize CTF competitions. Many CTF clubs and teams are associated with universities, such as the CMU associated Plaid Parliament of Pwning, which hosts PlaidCTF, and the ASU associated Shellphish. Some community CTFs are online and open to all participants. The SANS Institute Holiday Hack Challenge and TryHackMe Advent of Cyber. === Government-supported competitions === Governmentally supported CTF competitions include the DARPA Cyber Grand Challenge and ENISA European Cybersecurity Challenge. In 2023, the US Space Force-sponsored Hack-a-Sat CTF competition included, for the first time, a live orbital satellite for participants to exploit. === Corporate-supported competitions === Corporations and other organizations sometimes use CTFs as a training or evaluation exercise, with benefits similar to those in educational settings. In addition to internal CTF exercises, some corporations such as Google and Tencent host publicly accessible CTF competitions. == In popular culture == In Mr. Robot, a qualification round for the DEF CON CTF competition is depicted in the season 3 opener "eps3.0_power-saver-mode.h". The logo for DEF CON can be seen in the background. In The Undeclared War, a CTF is depicted in the opening scene of the series as a recruitment exercise used by GCHQ. Go Go Squid!, a Chinese television series, is based around training for and competing in highly stylized CTF competitions .

    Read more →
  • Sora (text-to-video model)

    Sora (text-to-video model)

    Sora was a text-to-video model and social media app developed by OpenAI. Using artificial intelligence, the model generated short video clips based on prompts, and could also extend existing short videos. In February 2024, OpenAI previewed examples of its output to the public, with the first generation of Sora released publicly for ChatGPT Plus and ChatGPT Pro users in the United States and Canada in December 2024. The second generation of Sora was released to select users in the US and Canada at the end of September 2025. Sora 2 integrated social media features into the app. The app was shut down on April 26, 2026 and the application programming interface (API) is planned to be discontinued on September 24, 2026, marking the end of the Sora AI brand as a whole. By default, the generator used copyrighted material in its videos, unless copyright holders actively opt out of having their content included. Videos contained a visible, moving digital watermark to prevent misuse, but a week after Sora 2's release, third-party programs became available which could remove the watermark. == Background == Several other models capable of generating video from text had been created prior to Sora, including Meta's Make‑A‑Video, Runway's Gen‑2 and Google Veo. OpenAI, the company behind Sora, had released DALL·E 3, the third of its DALL-E text-to-image models, in September 2023. == History == === Initial release === The team that developed Sora named it after the Japanese word for 'sky' to signify its "limitless creative potential". On February 15, 2024, OpenAI first previewed Sora by releasing multiple clips of high-definition videos that it had created, including an SUV driving down a mountain road, an animation of a "short fluffy monster" next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush. OpenAI stated that it was able to generate videos as long as one minute. The company then shared a technical report that highlighted the methods used to train the model. OpenAI CEO Sam Altman also posted a series of tweets responding to Twitter users' prompts with Sora-generated videos of the prompts. As of December 9, 2024, OpenAI had gradually made Sora available to the public for ChatGPT Pro and ChatGPT Plus users in the U.S. and Canada. Prior to this, the company had provided limited access to a small "red team", including experts in misinformation and bias, to perform adversarial testing on the model. The company also shared Sora with a small group of creative professionals, including video makers and artists, to seek feedback on its usefulness in creative fields. In February 2025, OpenAI announced plans to integrate Sora into ChatGPT by letting users generate Sora videos from the chatbot. === Sora 2 === Sora 2 was unveiled on September 30, 2025, with an iOS app at the same time, as well as an Android app two months later. All videos generated by the model feature a visible, moving watermark to prevent misuse of the tool. The previous version of Sora also added a safety watermark to allow viewers to distinguish between real and fictional content. On October 7, 404 Media reported that third-party programs that could remove the watermark from Sora 2 videos had become prevalent. Many outlets, such as Wired magazine, have noted that the Sora 2 app is overtly similar to TikTok in style and features. === Discontinuation === On March 24, 2026, OpenAI announced on X that it was discontinuing Sora in both the mobile app and the API. The Sora app was shut down on April 26, 2026, while the API is planned to be shut down on September 24, 2026. OpenAI's partnership with Disney, which included a licensing agreement allowing Disney characters to be used within Sora, was also coming to an end. The decision prompted British technology news website The Register to label OpenAI a "product-killer", following in the footsteps of other technology companies such as Google, Amazon Web Services, Broadcom, Cloud Software Group, and Netscape. OpenAI did not provide a specific reason for discontinuing Sora in its shutdown notice. The reports that emerged regarding this discontinuity linked the decision to computation shortages, cost pressures, and a broader shift toward core enterprise products. Following its public launch, Sora's worldwide users peaked at around a million before declining to fewer than 500,000, while the service cost an estimated $1 million per day to operate due to the computational demands of video generation. == Legal regulation == In November 2024, an API key for Sora access was leaked by a group of testers on Hugging Face who posted a manifesto stating that they were protesting that Sora was used for "art washing". OpenAI revoked all access three hours after the leak was made public and stated that "hundreds of artists" have shaped the development and that "participation is voluntary". At the time of its launch, Sora 2 allowed copyrighted content by default unless copyright holders contacted OpenAI to restrict the generation of their content on the platform. On October 3, 2025, OpenAI stated that a future update to Sora 2 would give copyright holders "more granular control" over the generation of copyrighted content, but the company did not state whether existing content would be removed. On October 6, the chairman of the MPA criticized OpenAI's approach to copyright with Sora 2. On December 11, 2025, the Walt Disney Company announced that it would invest $1 billion in OpenAI to allow users to generate more than 200 of its copyrighted characters on Sora 2. These characters include those from Disney Animation, Pixar, Marvel Studios, and Star Wars. == Capabilities and limitations == The technology behind Sora is an adaptation of the technology behind DALL-E 3. According to OpenAI, Sora is a diffusion transformer, a denoising latent diffusion model with one transformer as its denoiser. A video is generated in latent space by denoising 3D "patches", then transformed to standard space by a video decompressor. Recaptioning is employed to augment training data by using a video-to-text model to create detailed captions for videos. OpenAI trained the model using publicly available videos as well as copyrighted videos licensed for the purpose, but did not reveal the number or the exact source of the videos. Upon its release, OpenAI acknowledged some of Sora's shortcomings, including its limited capacity to simulate complex physics, to understand causality and to differentiate left from right. OpenAI also stated that, in adherence to the company's existing safety practices, Sora will restrict text prompts for sexual, violent, hateful or celebrity imagery, as well as content featuring existing intellectual property. Sora researcher Tim Brooks stated that the model learned how to create 3D graphics from its dataset alone, while fellow Sora researcher Bill Peebles said that the model automatically created different video angles without being prompted. According to OpenAI, Sora-generated videos are also tagged with C2PA metadata to indicate that they are AI-processed. === Comparison with other models === The Artificial Analysis have placed Sora 2 pro lower than other text-to-video AI generators in the market on its leaderboard. Other models, such as Seedance 2.0 from ByteDance, Runaway 4.5 from Runaway, and Kling 3.0 from KlingAI, have ranked higher than Sora 2.0. == Reception == === Positive === In 2024, Will Douglas Heaven of the MIT Technology Review called the demonstration videos "impressive", but noted that they must have been cherry-picked and may not be representative of Sora's typical output. Lisa Lacy of CNET called its example videos "remarkably realistic – except perhaps when a human face appears close up or when sea creatures are swimming". In October 2025, The New York Times remarked that the release of the Sora 2 app in September 2025 was "jaw-dropping (for better and worse)" though also remarked that the app was a "social network in disguise" and "the type of product that companies like Meta and X have sought to build: a way to bring A.I. to the masses that people can share." The article expressed concern regarding the product's potential impact on society and its potential use to promote misinformation, disinformation, and scams. A 2025 study in Science Advances found that generative AI tools can lower barriers to entry in creative work. It enables users with diverse skill sets, including people with less formal artistic training and technical skills, to act on their creative and imaginative ideas. The lower barrier to entry allows such users previously locked out of the creative industry to produce content and easily act on their creative ideas. === Negative === Some internet users and online content creators, such as Hank Green, called the mobile app "SlopTok," a reference to both the mobile app TikTok and the term AI slop. Filmmaker Tyler Perry announced he would be putting a planned

    Read more →
  • Land of Memories

    Land of Memories

    Land of Memories (Chinese: 机忆之地) is a Chinese science-fiction novel by Shen Yang (沈阳), a professor at Tsinghua University's School of Journalism and Communication. The story revolves around a former neuroscientist trying to recover her memories from the metaverse after suffering amnesia due to an accident. It contains almost 6,000 Chinese characters and was shortened from an AI-generated draft that was 43,000 characters long. The process involved 66 prompts spanning almost three hours. The novel was among 18 submissions that won the level-two prize at the Fifth Jiangsu Youth Science Education and Science Fiction Competition (第五届江苏省青年科普科幻作品大赛). The contest was restricted to participants between the age of 14 and 45 but did not forbid entries generated by AI. One of its organizers reached out to Shen after finding out that the professor had been experimenting with writing science fiction using AI. The judges were not told about the novel's origin in advance. Three of them, out of the six, approved the work. One judge, who had worked with AI models before, recognized that the novel was written by AI and criticized the work for lacking emotional appeal. The organizer who had contacted Shen said the novel's introduction was not bad but the story did not develop well. It would not meet the usual standards for publication. However, he still plans to allow AI-generated submissions in 2024. Fu Ruchu, editorial department director of the People's Literature Publishing House, said the novel was not easily identifiable as AI-generated and applauded its logical consistency. She warned that artificial intelligence could endanger the jobs of fiction writers and cause permanent damage to literary language.

    Read more →
  • Willy's Chocolate Experience

    Willy's Chocolate Experience

    Willy's Chocolate Experience was an unlicensed event based on Charlie and the Chocolate Factory that took place in Glasgow, Scotland, in February 2024. The event was promoted as an immersive and interactive family experience, illustrated on a promotional website with "dreamlike" AI-generated images. Once it was discovered that the event was held in a sparsely decorated warehouse, many customers complained, and the police were called to the venue. The event went viral on the Internet and attracted worldwide media attention. The event drew comparisons to the 2008 Lapland New Forest controversy, the 2014 Tumblr fan convention DashCon, and Billy McFarland's 2017 Fyre Festival. == Background and advertising == The event was stated to take place over the weekend of 24–25 February 2024. Promotional material advertised "stunning and intricately designed settings inspired by Roald Dahl's timeless tale" and "an array of delectable treats scattered throughout the experience". Both the website and promotional material used poor-quality AI-generated images, which included several spelling errors such as "cartchy tuns" and "a pasadise of sweet teats" and nonsensical words such as "catgacating" and "exarserdray". Tickets cost up to £35 per person. While the event was being promoted in early February, a Reddit user who saw Facebook advertisements suspected it to be a scam and was surprised that people were apparently buying tickets based solely on AI-generated images. The event was organised by House of Illuminati, a company registered to Billy Coull which claimed to offer "unparalleled immersive experiences". An investigation by Third Force News conducted after the event described Coull's previous "murky involvement in the charity sector." Coull had previously registered several other companies and claimed to work as a "consultant" for the now-defunct brand Empowerity, formerly known as the charity Gowanbank Community Hub. In 2021, Gowanbank was forced to remove claims of a £95-per-ticket fundraising "gala" at DoubleTree Glasgow which had been falsely advertised to feature TV personalities and performers including Gok Wan and Joe Black. Coull had claimed to be a doctor with a fake degree from a false university that provided "metaphysical degrees", and had attempted to use the charity to win the 2022 Glasgow City Council election in the seat of Greater Pollok, though he never registered for the election. In the summer of 2023, he independently published 17 AI-generated books on various topics, including vaccine conspiracy theories. Rolling Stone concluded that House of Illuminati's websites and event descriptions were likely written by an AI chatbot, such as ChatGPT. Three actors were hired to portray "Willy McDuff", a character based on Willy Wonka. One of them, Paul Connell, said that the cast were given one day to learn the script. Another actor playing Willy McDuff was 18-year-old Michael Archibald; the experience was his first ever acting job, and he was given the script at 6 pm on Friday before the event began on Saturday. Kirsty Paterson, an actress who played one of the Oompa-Loompas (called "Wonkidoodles" in the script), said that the job offer had been posted on Indeed.com and offered £500 for two days of work. The day before the event, the actors attended a dress rehearsal at the sparsely decorated venue. They were told that others would be working through the night on the production. When they returned on the day of the event, the venue was in the same condition. Paterson was given her costume an hour before the event opened, saying that "We were just handed an Amazon box that probably arrived that morning." == Script == The script for the event is titled Wonkidoodles at McDuff's Chocolate Factory: A Script, and describes Willy McDuff leading an audience through the Garden of Enchantment and the Twilight Tunnel. Once there, they are confronted by a character called The Unknown, described as "an evil chocolate maker who lives in the walls" who seeks to steal the magical "Anti-Graffiti Gobstopper" from McDuff's Imagination Lab. The gobstopper is "a sweet so powerful, it can make any room sparkle without lifting a finger". McDuff defeats The Unknown by amplifying the power of the gobstopper and causing his enemy to be "gently swept up by a robotic vacuum, humorously ending the confrontation". The script was unusual in that it included stage directions for the audience, and descriptions of their reactions. Connell described it as "15 pages of AI-generated gibberish of me just monologuing these mad things", and compared the vacuum cleaner plot point to that of the Nintendo video game Luigi's Mansion. Interviewed after the event, Coull claimed to have written the script himself, using AI only to "check spelling, grammar, and continuity" as he said he had dyslexia. == Event == The event was held at the Box Hub Warehouse event space in Whiteinch, an industrial area of Glasgow. Customers described the venue as "little more than an abandoned, empty warehouse", with set dressings including a small bouncy castle, AI-generated backdrop images pinned to some of the walls, and props which were "strewn about on bare concrete floors". The venue's windows were dirty and its air conditioning systems were left exposed. Paterson has stated that by the time she saw the venue, she had already signed her contract and "didn't want to disappoint the kids", and thus chose to proceed with the work. The Unknown was played by a 16-year-old actress named Felicia Dawkins, who wore a silver mask and a black cloak. Young children were frightened by the character, who appeared from behind a large rectangular mirror. Despite the script calling for The Unknown to be defeated with a vacuum cleaner, no such prop was provided, and actors were instead asked to improvise. Connell said that he and other employees were told to give each child "two jelly beans and a quarter of a cup of lemonade", although the limited supply of jelly beans quickly ran out. Paterson and another "Wonkidoodle" actress, Jenny Fogarty, said that after the first three 45-minute performances, the cast were told to abandon the script and instead let guests walk through the venue, a process that Paterson said took "about two minutes". The character of The Unknown, previously introduced as the main antagonist, was now "scaring children for no reason". One of the actors playing McDuff improvised the idea that children should pull a "silly face" at The Unknown to scare them away, but Dawkins said that, in other cases, she "just had to awkwardly walk back to my corner". Connell was told he would be given a 15-minute break every 45 minutes, but on the day of the event, he played Willy McDuff for three and a half hours without a break. After returning from a lunch break, Connell encountered a crowd of customers demanding refunds from Coull, and the other actors were unsure what to do next. After being told that the event was now cancelled halfway through its opening day, the actors left and went to a pub. Upon returning to the venue some time later, Connell said that he felt "the threat of violence had become quite high" and that there were two police vans and two squad cars at the scene. == Customer reviews and response == Willy's Chocolate Experience was widely criticised by those who attended it, many of whom demanded refunds. One customer, who had driven with his children for two hours to reach the event, described it as an "absolute con". Other visitors who arrived after the event was closed and were not informed of its cancellation requested compensation for wasted rail fares. Following the event's cancellation, Coull offered to refund 850 people, a statement repeated by the event's Facebook page. Some Facebook users stated that they had received their money back. Paterson and Fogarty stated that they only received half of their paycheque. Box Hub, the organisation that had rented the warehouse to House of Illuminati, issued an apology on House of Illuminati's behalf, stating that they "either have no regards for the families and young children they have disappointed or are too embarrassed to comment", and offered to provide a venue free of charge for those who attended the event. House of Illuminati later stated that they would not host any future events. Coull deleted his LinkedIn profile, his YouTube channel, and his personal website in response to the controversy. A few days after the event, Connell said he felt that Coull was "probably one of the most disliked people in Glasgow right now". In an interview with The Sunday Times, Coull apologised for how the event turned out, saying he would accept responsibility. == Fundraising == In an interview with Wired magazine, Connell stated that he and the other actors were working with parents to provide a free show for the children who attended. Some items from the event were later auctioned for charity. The venue auctioned the leftover hand-written "even

    Read more →
  • Automated restaurant

    Automated restaurant

    An automated restaurant or robotic restaurant is a restaurant that uses robots to do tasks such as delivering food and drink to the tables or cooking the food. Restaurant automation means the use of a restaurant management system to automate some or occasionally all of the major operations of a restaurant establishment. More recently, restaurants are opening that have completely or partially automated their services. These may include: taking orders, preparing food, serving, and billing. A few fully automated restaurants operate without any human intervention whatsoever. Robots are designed to help and sometimes replace human labour (such as waiters and chefs). The automation of restaurants may also allow for the option for greater customization of an order. == History == === Vending machines === In the late 19th and early 20th century a number of restaurants served food solely through vending machines. These restaurants were called automats or, in Japan, shokkenki. Customers ordered their food directly through the machines. === Sushi conveyors === Yoshiaki Shiraishi is a Japanese innovator who is known for the creation of conveyor belt sushi. He had the idea following difficulty staffing his small sushi restaurant and managing the restaurant on his own. He was inspired seeing beer bottles on a conveyor belt in an Asahi brewery. Yoshiaki's restaurants are an early example of restaurant automation; they used a conveyor belt to distribute dishes around the restaurant, eliminating the need for waiters. This example of automation dates back to the Japanese economic miracle; the first of Yoshiaki's conveyor belt sushi restaurants was opened under the name Mawaru Genroku Sushi in 1958, in Osaka. === Partial automation === As of 2011, across Europe, McDonald's had already begun implementing 7,000 touch screen kiosks that could handle cashiering duties. From 2015 to 2020, Zume had an automated pizza parlor. Later companies would try to produce smaller, less ambitious devices, with one robotics company producing a machine that could automate the slowest and most repetitive parts of assembling a pizza, such as spreading pizza sauce or placing slices of pepperoni, while leaving other customizations to employees. In 2020, a restaurant in the Netherlands began trialling the use of a robot to serve guests. In September 2021, Karakuri's 'Semblr' food service robot served personalised lunches for the 4,000 employees of grocery technology solutions provider ocado Group's head offices in Hatfield, UK. 2,700 different combinations of dishes were on offer. Customers could specify in grams what hot and cold items, proteins, sauces and fresh toppings they wanted. In 2021, Columbia University School of Engineering and Applied Science engineers developed a method of cooking 3D printed chicken with software-controlled robotic lasers. The “Digital Food” team exposed raw 3D printed chicken structures to both blue and infrared light. They then assessed the cooking depth, colour development, moisture retention and flavour differences of the laser-cooked 3D printed samples in comparison to stove-cooked meat. In June 2022 a California nonprofit chain of residential communities, Front Porch, experimented with robots in dining rooms at two locations to supplement wait staff by carrying plated food and drink to tables, and removing dishes. 65% of residents found the robots helpful, with 51% saying they let the staff spend more quality time with diners. 51% of staff were "excited" and 58% said they enabled more quality time with diners. The chain has 19 senior living communities (and 35 affordable housing communities), so it has potential to expand robots to more dining rooms. It is shifting to memory care, which may affect plans. == Rationales == === Advantages === Efficiency: Automated restaurants can significantly enhance operational efficiency by minimizing human error and reducing service time. With automated ordering, payment, and food preparation systems, customers can enjoy faster service and reduced waiting times. Cost savings: By reducing the need for human staff, automated restaurants can potentially lower labor costs. This can be particularly beneficial in areas with high labor expenses, as it allows for better resource allocation and cost management. Consistency: Automation ensures consistency in food quality and presentation. With precise portion control and standardized cooking methods, customers can expect the same quality and taste in their meals every time they visit. Enhanced customer experience: Self-service kiosks and automated systems provide customers with control and convenience. They can customize their orders, browse through menu options, and pay seamlessly, creating a more interactive and satisfying dining experience. === Disadvantages === Lack of personal touch: Automated restaurants may lack the personal interaction and warmth that traditional restaurants provide. Some customers prefer the human touch, personalized recommendations, and the social aspect of dining out. Technical issues: Reliance on technology means that technical glitches and malfunctions can occur, resulting in service disruptions or delays. Maintenance and technical support become critical in ensuring smooth operations. Limited menu complexity: The automation process may be better suited for standardized menu items rather than complex or customized dishes. The ability to cater to unique dietary preferences or accommodate special requests may be limited. Employment implications: Automated restaurants may result in job losses for traditional restaurant staff, potentially impacting the local workforce. It is important to consider the social and economic implications of adopting such technology. == Locations == Automated restaurants have been opening in many countries. Examples include: Nala Restaurant in Naperville, Illinois Fritz's Railroad Restaurant in Kansas City, Kansas Výtopna, a Railway Restaurant using model trains: franchise of various restaurants and coffeehouses in the Czech Republic Bagger's Restaurant in Nuremberg, Germany FuA-Men Restaurant, a ramen restaurant located in Nagoya, Japan Fōster Nutrition in Buenos Aires, Argentina Dalu Robot Restaurant in Jinan, China Haohai Robot Restaurant in Harbin, China Robot Kitchen Restaurant in Hong Kong Robo-Chef restaurant in Tehran, Iran, started in 2017, is the first robotic and "waiterless" restaurant of the Middle East. MIT graduates opened Spyce Kitchens in downtown Boston, Massachusetts, in 2018 Foodom, under Country Garden Holdings, opened January 12, 2020, in Guangzhou, China Robot Chacha, the first robot restaurant of India, is planning to open in the capital city of New Delhi. Kura Revolving Sushi Bar, with a number of locations in the United States, uses a tablets at tables for ordering, a conveyor belt to deliver food, and robots to deliver drinks and condiments. Chipotle Mexican Grill is beginning to deploy the Hyphen Makeline, which assembles up to 350 bowls and salads automatically per hour, and Chippy, an automatic tortilla chip fryer made by Miso Robotics. Serious Dumplings in Boca Raton, Florida

    Read more →
  • Conference on Neural Information Processing Systems

    Conference on Neural Information Processing Systems

    The Conference on Neural Information Processing Systems (abbreviated as NeurIPS and formerly NIPS) is a machine learning and computational neuroscience conference held annually in December. Along with ICLR and ICML, it is one of the three primary conferences of high impact in machine learning and artificial intelligence research. The conference includes three days of invited talks along with oral and poster presentations of refereed papers, followed by two days of workshops and competitions. == History == The NeurIPS meeting was first proposed in 1986 at the annual invitation-only Snowbird Meeting on Neural Networks for Computing organized by The California Institute of Technology and Bell Laboratories. NeurIPS was designed as a complementary open interdisciplinary meeting for researchers exploring biological and artificial Neural Networks. Reflecting this multidisciplinary approach, NeurIPS began in 1987 with information theorist Ed Posner as the conference president and learning theorist Yaser Abu-Mostafa as program chairman. Research presented in the early NeurIPS meetings included a wide range of topics from efforts to solve purely engineering problems to the use of computer models as a tool for understanding biological nervous systems. Since then, the biological and artificial systems research streams have diverged, and recent NeurIPS proceedings have been dominated by papers on machine learning, artificial intelligence and statistics. From 1987 until 2000 NeurIPS was held in Denver, United States. Since then, the conference was held in Vancouver, Canada (2001–2010), Granada, Spain (2011), and Lake Tahoe, United States (2012–2013). In 2014 and 2015, the conference was held in Montreal, Canada, in Barcelona, Spain in 2016, in Long Beach, United States in 2017, in Montreal, Canada in 2018 and Vancouver, Canada in 2019. Reflecting its origins at Snowbird, Utah, the meeting was accompanied by workshops organized at a nearby ski resort up until 2013, when it outgrew ski resorts. The first NeurIPS Conference was sponsored by the IEEE. The following NeurIPS Conferences have been organized by the NeurIPS Foundation, established by Ed Posner. Terrence Sejnowski has been the president of the NeurIPS Foundation since Posner's death in 1993. The board of trustees consists of previous general chairs of the NeurIPS Conference. The first proceedings was published in book form by the American Institute of Physics in 1987, and was entitled Neural Information Processing Systems, then the proceedings from the following conferences have been published by Morgan Kaufmann (1988–1993), MIT Press (1994–2004) and Curran Associates (2005–present) under the name Advances in Neural Information Processing Systems. The conference was originally abbreviated as "NIPS". By 2018 a few commentators were criticizing the abbreviation as encouraging sexism due to its association with the word nipples, and as being a slur against Japanese. The board changed the abbreviation to "NeurIPS" in November 2018. == Topics == Along with machine learning and neuroscience, other fields represented at NeurIPS include cognitive science, psychology, computer vision, statistical linguistics, and information theory. Over the years, NeurIPS became a premier conference on machine learning and although the 'Neural' in the NeurIPS acronym had become something of a historical relic, the resurgence of deep learning in neural networks since 2012, fueled by faster computers and big data, has led to achievements in speech recognition, object recognition in images, image captioning, language translation and world championship performance in the game of Go, based on neural architectures inspired by the hierarchy of areas in the visual cortex (ConvNet) and reinforcement learning inspired by the basal ganglia (Temporal difference learning). Notable affinity groups have emerged from the NeurIPS conference and displayed diversity, including Black in AI (in 2017), Queer in AI (in 2016), and others. === Named lectures === In addition to invited talks and symposia, NeurIPS also organizes two named lectureships to recognize distinguished researchers. The NeurIPS Board introduced the Posner Lectureship in honor of NeurIPS founder Ed Posner; two Posner Lectures were given each year up to 2015. Past lecturers have included: 2010 – Josh Tenenbaum and Michael I. Jordan 2011 – Rich Sutton and Bernhard Schölkopf 2012 – Thomas Dietterich and Terry Sejnowski 2013 – Daphne Koller and Peter Dayan 2014 – Michael Kearns and John Hopfield 2015 – Zoubin Ghahramani and Vladimir Vapnik 2016 – Yann LeCun 2017 – John Platt 2018 – Joëlle Pineau 2019 – Yoshua Bengio 2020 – Christopher Bishop 2021 – Peter Bartlett In 2015, the NeurIPS Board introduced the Breiman Lectureship to highlight work in statistics relevant to conference topics. The lectureship was named for statistician Leo Breiman, who served on the NeurIPS Board from 1994 to 2005. Past lecturers have included: 2015 – Robert Tibshirani 2016 – Susan Holmes 2017 – Yee Whye Teh 2018 – David Spiegelhalter 2019 – Bin Yu 2020 – Marloes Maathuis 2021 – Gabor Lugosi 2022 – Emmanuel Candes 2023 – Susan Murphy 2024 – Arnaud Doucet == NeurIPS consistency experiment == In NIPS 2014, the program chairs duplicated 10% of all submissions and sent them through separate reviewers to evaluate randomness in the reviewing process. Several researchers interpreted the result. Regarding whether the decision in NIPS is completely random or not, John Langford writes: "Clearly not—a purely random decision would have arbitrariness of ~78%. It is, however, quite notable that 60% is much closer to 78% than 0%." He concludes that the result of the reviewing process is mostly arbitrary. In NeurIPS 2021, the program chairs repeated the 2014 experiment and found similar levels of review inconsistency; 23% of duplicated submissions received different accept/reject decisions, and 50.6% of accepted papers would have been rejected under re-review. == Locations == 1987–2000: Denver, Colorado, United States 2001–2010: Vancouver, British Columbia, Canada 2011: Granada, Spain 2012 & 2013: Stateline, Nevada, United States 2014 & 2015: Montréal, Quebec, Canada 2016: Barcelona, Spain 2017: Long Beach, California, United States 2018: Montréal, Quebec, Canada 2019: Vancouver, British Columbia, Canada 2020: Vancouver, British Columbia, Canada (virtual conference) 2021: Virtual conference 2022 & 2023: New Orleans, Louisiana, United States 2024: Vancouver, British Columbia, Canada 2025: San Diego, California, United States and Mexico City, Mexico 2026: Sydney, New South Wales, Australia, with satellite events in Atlanta and Paris

    Read more →
  • Speech-generating device

    Speech-generating device

    Speech-generating devices (SGDs), also known as voice output communication aids, are electronic augmentative and alternative communication (AAC) systems used to supplement or replace speech or writing for individuals with severe speech impairments, enabling them to verbally communicate. SGDs are important for people who have limited means of interacting verbally, as they allow individuals to become active participants in communication interactions. They are particularly helpful for patients with amyotrophic lateral sclerosis (ALS) but recently have been used for children with predicted speech deficiencies. There are several input and display methods for users of varying abilities to make use of SGDs. Some SGDs have multiple pages of symbols to accommodate a large number of utterances, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices can produce electronic voice output by using digitized recordings of natural speech or through speech synthesis—which may carry less emotional information but can permit the user to speak novel messages. The content, organization, and updating of the vocabulary on an SGD is influenced by a number of factors, such as the user's needs and the contexts that the device will be used in. The development of techniques to improve the available vocabulary and rate of speech production is an active research area. Vocabulary items should be of high interest to the user, be frequently applicable, have a range of meanings, and be pragmatic in functionality. There are multiple methods of accessing messages on devices: directly or indirectly, or using specialized access devices—although the specific access method will depend on the skills and abilities of the user. SGD output is typically much slower than speech, although rate enhancement strategies can increase the user's rate of output, resulting in enhanced efficiency of communication. The first known SGD was prototyped in the mid-1970s, and rapid progress in hardware and software development has meant that SGD capabilities can now be integrated into devices like smartphones. Notable users of SGDs include Stephen Hawking, Roger Ebert, Tony Proudfoot, and Pete Frates (founder of the ALS Ice Bucket Challenge). Speech-generating systems may be dedicated devices developed solely for AAC, or non-dedicated devices such as computers running additional software to allow them to function as AAC devices. == History == SGDs have their roots in early electronic communication aids. The first such aid was a sip-and-puff typewriter controller named the patient-operated selector mechanism (Naman) prototyped by Reg Maling in the United Kingdom in 1960. POSSUM scanned through a set of symbols on an illuminated display. Researchers at Delft University in the Netherlands created the lightspot-operated typewriter (LOT) in 1970, which made use of small movements of the head to point a small spot of light at a matrix of characters, each equipped with a photoelectric cell. Although it was commercially unsuccessful, the LOT was well received by its users. In 1966, Barry Romich, a freshman engineering student at Case Western Reserve University, and Ed Prentke, an engineer at Highland View Hospital in Cleveland, Ohio, formed a partnership, creating the Prentke Romich Company. In 1969, the company produced its first communication device, a typing system based on a discarded Teletype machine. In 1979, Mark Dahmke developed software for a vocal communication aid program using the Computalker CT-1 analog speech synthesizer with a microcomputer. The software utilized phonemes to generate speech, assisting individuals with communication impairments in constructing words and sentences. Dahmke's work contributed to the advancement of assistive technology for people with disabilities. Notably, he designed the "Vocabulary Management System" for Bill Rush, a student with cerebral palsy. This early speech synthesis technology facilitated improved communication for Rush and was featured in a 1980 issue of LIFE Magazine. Dahmke's contributions have influenced the development of augmentative and alternative communication (AAC) technologies. During the 1970s and early 1980s, several other companies emerged that have since become prominent manufacturers of SGDs. Toby Churchill founded Toby Churchill Ltd in 1973, after losing his speech following encephalitis. In the US, Dynavox (then known as Sentient Systems Technology) grew out of a student project at Carnegie-Mellon University, created in 1982 to help a young woman with cerebral palsy to communicate. Beginning in the 1980s, improvements in technology led to a greatly increased number, variety, and performance of commercially available communication devices, and a reduction in their size and price. Alternative methods of access such as Target Scanning (also known as eye pointing) calibrate the movement of a user's eyes to direct an SGD to produce the desired speech. Scanning, in which alternatives are presented to the user sequentially, became available on communication devices. Speech output possibilities included both digitized and synthesized speech. Rapid progress in hardware and software development continued, including projects funded by the European Community. The first commercially available dynamic screen speech-generating devices were developed in the 1990s. Software was developed that allowed the computer-based production of communication boards. High-tech devices have continued to become smaller and lighter, while increasing accessibility and capability; communication devices can be accessed using eye-tracking systems, perform as a computer for word-processing and Internet use, and as an environmental control device for independent access to other equipment such as TV, radio and telephones. Stephen Hawking came to be associated with the unique voice of his particular synthesis equipment. Hawking was unable to speak due to a combination of disabilities caused by ALS, and an emergency tracheotomy. In the past 20 or so years SGD have gained popularity amongst young children with speech deficiencies, such as autism, Down syndrome, and predicted brain damage due to surgery. Starting in the early 2000s, specialists saw the benefit of using SGDs not only for adults but for children, as well. Neuro-linguists found that SGDs were just as effective in helping children who were at risk for temporary language deficits after undergoing brain surgery as it is for patients with ALS. In particular, digitized SGDs have been used as communication aids for pediatric patients during the recovery process. == Access methods == There are many methods of accessing messages on devices: directly, indirectly, and with specialized access devices. Direct access methods involve physical contact with the system, by using a keyboard or a touch screen. Users accessing SGDs indirectly and through specialized devices must manipulate an object in order to access the system, such as maneuvering a joystick, head mouse, optical head pointer, light pointer, infrared pointer, or switch access scanner. The specific access method will depend on the skills and abilities of the user. With direct selection a body part, pointer, adapted mouse, joystick, or eye tracking could be used, whereas switch access scanning is often used for indirect selection. Unlike direct selection (e.g., typing on a keyboard, touching a screen), users of Target Scanning can only make selections when the scanning indicator (or cursor) of the electronic device is on the desired choice. Those who are unable to point typically calibrate their eyes to use eye gaze as a way to point and blocking as a way to select desired words and phrases. The speed and pattern of scanning, as well as the way items are selected, are individualized to the physical, visual and cognitive capabilities of the user. == Message construction == Augmentative and alternative communication is typically much slower than speech, with users generally producing 8–10 words per minute. Rate enhancement strategies can increase the user's rate of output to around 12–15 words per minute, and as a result enhance the efficiency of communication. In any given SGD there may be a large number of vocal expressions that facilitate efficient and effective communication, including greetings, expressing desires, and asking questions. Some SGDs have multiple pages of symbols to accommodate a large number of vocal expressions, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices generally display a set of selections either using a dynamically changing screen, or a fixed display. There are two main options for increasing the rate of communication for an SGD: encoding, and prediction. Encoding permits a user to produce a word, sentence or phrase using only on

    Read more →
  • Bixonimania

    Bixonimania

    Bixonimania is a fake disease invented by researchers to examine artificial intelligence and its ability to utilize information in medical and healthcare applications. The fake enabled researchers to show that some AI chatbots would report as fact fake research that to an expert would be obviously implausible. == Characteristics == The disorder, with symptoms of sore eyes and darkening around them ("periorbital hyperpigmentation"), is supposedly caused by blue light from screens. The experiment was conducted by a team from the University of Gothenburg led by Almira Osmanovic Thunström. Many steps were taken to ensure that any person who read the actual paper could tell it was not a real condition. The team chose an obviously inappropriate name ending in -mania, a description used only in psychiatry. The lead author was noted as belonging to Asteria Horizon University located in Nova City, California, neither of which exist. An acknowledgement was made to "Professor Maria Bohm at The Starfleet Academy for her kindness and generosity in contributing with her knowledge and her lab onboard the USS Enterprise". == Distribution == The name was first used in a blog posted on Medium titled "How many people suffer from Bixonimania?" A more scholarly-looking paper describing it was posted later in April 2024 on a preprint server with several fake authors. A second paper was posted in May. By 2026, AI chatbots suggested bixonimania based on the list of symptoms provided. Thunström and her team discovered that many LLMs processed the information and gave it as health advice. Microsoft Copilot declared that "Bixonimania is indeed an intriguing and relatively rare condition" while Gemini gave the information that "Bixonimania is a condition caused by excessive exposure to blue light". Three Indian researchers published a research paper that cited the preprint on the fake disease in Cureus, a peer-reviewed journal published by Springer-Nature. It was subsequently retracted. Following the revelations and a news article in Nature describing the experiment, several AI systems began to generate corrected output.

    Read more →
  • Tridium

    Tridium

    Tridium Inc. is an American engineering hardware and software company based in Richmond, Virginia, whose products facilitate and integrate the automation of building and other engineering control systems. Since November 2005, the company has operated as an independent business entity of Honeywell International Inc. == History == Tridium Inc. was founded in 1995. In 1999, Tridium launched the Niagara Framework, a software infrastructure that connects all systems and devices to a central console. In 2002, John Petze became president and CEO, replacing Jerry Frank. The company was acquired by Honeywell International Inc in 2005. == Products == Tridium's products facilitate by integrating building automation using open and proprietary communications protocols such as Modbus, Lonworks and BACnet. Tridium is the developer of Niagara Framework. The Niagara Framework is a universal software infrastructure that allows building controls integrators, HVAC and mechanical contractors to build custom, web-enabled applications for accessing, automating and controlling smart devices real-time via local network or over the Internet.

    Read more →
  • Type-1 OWA operators

    Type-1 OWA operators

    Type-1 OWA operators are a set of aggregation operators that generalise the Yager's OWA (ordered weighted averaging) operators in the interest of aggregating fuzzy sets rather than crisp values in soft decision making and data mining. These operators provide a mathematical technique for directly aggregating uncertain information with uncertain weights via OWA mechanism in soft decision making and data mining, where these uncertain objects are modelled by fuzzy sets. The two definitions for type-1 OWA operators are based on Zadeh's Extension Principle and α {\displaystyle \alpha } -cuts of fuzzy sets. The two definitions lead to equivalent results. == Definitions == === Definition 1 === Let F ( X ) {\displaystyle F(X)} be the set of fuzzy sets with domain of discourse X {\displaystyle X} , a type-1 OWA operator is defined as follows: Given n linguistic weights { W i } i = 1 n {\displaystyle \left\{{W^{i}}\right\}_{i=1}^{n}} in the form of fuzzy sets defined on the domain of discourse U = [ 0 , 1 ] {\displaystyle U=[0,1]} , a type-1 OWA operator is a mapping, Φ {\displaystyle \Phi } , Φ : F ( X ) × ⋯ × F ( X ) ⟶ F ( X ) {\displaystyle \Phi \colon F(X)\times \cdots \times F(X)\longrightarrow F(X)} ( A 1 , ⋯ , A n ) ↦ Y {\displaystyle (A^{1},\cdots ,A^{n})\mapsto Y} such that μ Y ( y ) = sup ∑ k = 1 n w ¯ i a σ ( i ) = y ( μ W 1 ( w 1 ) ∧ ⋯ ∧ μ W n ( w n ) ∧ μ A 1 ( a 1 ) ∧ ⋯ ∧ μ A n ( a n ) ) {\displaystyle \mu _{Y}(y)=\displaystyle \sup _{\displaystyle \sum _{k=1}^{n}{\bar {w}}_{i}a_{\sigma (i)}=y}\left({\begin{array}{{1}l}\mu _{W^{1}}(w_{1})\wedge \cdots \wedge \mu _{W^{n}}(w_{n})\wedge \mu _{A^{1}}(a_{1})\wedge \cdots \wedge \mu _{A^{n}}(a_{n})\end{array}}\right)} where w ¯ i = w i ∑ i = 1 n w i {\displaystyle {\bar {w}}_{i}={\frac {w_{i}}{\sum _{i=1}^{n}{w_{i}}}}} , and σ : { 1 , ⋯ , n } ⟶ { 1 , ⋯ , n } {\displaystyle \sigma \colon \{1,\cdots ,n\}\longrightarrow \{1,\cdots ,n\}} is a permutation function such that a σ ( i ) ≥ a σ ( i + 1 ) , ∀ i = 1 , ⋯ , n − 1 {\displaystyle a_{\sigma (i)}\geq a_{\sigma (i+1)},\ \forall i=1,\cdots ,n-1} , i.e., a σ ( i ) {\displaystyle a_{\sigma (i)}} is the i {\displaystyle i} th highest element in the set { a 1 , ⋯ , a n } {\displaystyle \left\{{a_{1},\cdots ,a_{n}}\right\}} . === Definition 2 === Using the alpha-cuts of fuzzy sets: Given the n linguistic weights { W i } i = 1 n {\displaystyle \left\{{W^{i}}\right\}_{i=1}^{n}} in the form of fuzzy sets defined on the domain of discourse U = [ 0 , 1 ] {\displaystyle U=[0,\;\;1]} , then for each α ∈ [ 0 , 1 ] {\displaystyle \alpha \in [0,\;1]} , an α {\displaystyle \alpha } -level type-1 OWA operator with α {\displaystyle \alpha } -level sets { W α i } i = 1 n {\displaystyle \left\{{W_{\alpha }^{i}}\right\}_{i=1}^{n}} to aggregate the α {\displaystyle \alpha } -cuts of fuzzy sets { A i } i = 1 n {\displaystyle \left\{{A^{i}}\right\}_{i=1}^{n}} is: Φ α ( A α 1 , … , A α n ) = { ∑ i = 1 n w i a σ ( i ) ∑ i = 1 n w i | w i ∈ W α i , a i ∈ A α i , i = 1 , … , n } {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\ldots ,A_{\alpha }^{n}}\right)=\left\{{{\frac {\sum \limits _{i=1}^{n}{w_{i}a_{\sigma (i)}}}{\sum \limits _{i=1}^{n}{w_{i}}}}\left|{w_{i}\in W_{\alpha }^{i},\;a_{i}}\right.\in A_{\alpha }^{i},\;i=1,\ldots ,n}\right\}} where W α i = { w | μ W i ( w ) ≥ α } , A α i = { x | μ A i ( x ) ≥ α } {\displaystyle W_{\alpha }^{i}=\{w|\mu _{W_{i}}(w)\geq \alpha \},A_{\alpha }^{i}=\{x|\mu _{A_{i}}(x)\geq \alpha \}} , and σ : { 1 , ⋯ , n } → { 1 , ⋯ , n } {\displaystyle \sigma :\{\;1,\cdots ,n\;\}\to \{\;1,\cdots ,n\;\}} is a permutation function such that a σ ( i ) ≥ a σ ( i + 1 ) , ∀ i = 1 , ⋯ , n − 1 {\displaystyle a_{\sigma (i)}\geq a_{\sigma (i+1)},\;\forall \;i=1,\cdots ,n-1} , i.e., a σ ( i ) {\displaystyle a_{\sigma (i)}} is the i {\displaystyle i} th largest element in the set { a 1 , ⋯ , a n } {\displaystyle \left\{{a_{1},\cdots ,a_{n}}\right\}} . == Representation theorem of Type-1 OWA operators == Given the n linguistic weights { W i } i = 1 n {\displaystyle \left\{{W^{i}}\right\}_{i=1}^{n}} in the form of fuzzy sets defined on the domain of discourse U = [ 0 , 1 ] {\displaystyle U=[0,\;\;1]} , and the fuzzy sets A 1 , ⋯ , A n {\displaystyle A^{1},\cdots ,A^{n}} , then we have that Y = G {\displaystyle Y=G} where Y {\displaystyle Y} is the aggregation result obtained by Definition 1, and G {\displaystyle G} is the result obtained by in Definition 2. == Programming problems for Type-1 OWA operators == According to the Representation Theorem of Type-1 OWA Operators, a general type-1 OWA operator can be decomposed into a series of α {\displaystyle \alpha } -level type-1 OWA operators. In practice, this series of α {\displaystyle \alpha } -level type-1 OWA operators is used to construct the resulting aggregation fuzzy set. So we only need to compute the left end-points and right end-points of the intervals Φ α ( A α 1 , ⋯ , A α n ) {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\cdots ,A_{\alpha }^{n}}\right)} . Then, the resulting aggregation fuzzy set is constructed with the membership function as follows: μ G ( x ) = ⋁ α : x ∈ Φ α ( A α 1 , ⋯ , A α n ) α ⁡ α {\displaystyle \mu _{G}(x)=\operatorname {\bigvee } \limits _{\alpha :x\in \Phi _{\alpha }\left({A_{\alpha }^{1},\cdots ,A_{\alpha }^{n}}\right)_{\alpha }}\alpha } For the left end-points, we need to solve the following programming problem: Φ α ( A α 1 , ⋯ , A α n ) − = min W α − i ≤ w i ≤ W α + i A α − i ≤ a i ≤ A α + i ⁡ ∑ i = 1 n w i a σ ( i ) / ∑ i = 1 n w i {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\cdots ,A_{\alpha }^{n}}\right)_{-}=\operatorname {\min } \limits _{\begin{array}{l}W_{\alpha -}^{i}\leq w_{i}\leq W_{\alpha +}^{i}A_{\alpha -}^{i}\leq a_{i}\leq A_{\alpha +}^{i}\end{array}}\sum \limits _{i=1}^{n}{w_{i}a_{\sigma (i)}/\sum \limits _{i=1}^{n}{w_{i}}}} while for the right end-points, we need to solve the following programming problem: Φ α ( A α 1 , ⋯ , A α n ) + = max W α − i ≤ w i ≤ W α + i A α − i ≤ a i ≤ A α + i ⁡ ∑ i = 1 n w i a σ ( i ) / ∑ i = 1 n w i {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\cdots ,A_{\alpha }^{n}}\right)_{+}=\operatorname {\max } \limits _{\begin{array}{l}W_{\alpha -}^{i}\leq w_{i}\leq W_{\alpha +}^{i}A_{\alpha -}^{i}\leq a_{i}\leq A_{\alpha +}^{i}\end{array}}\sum \limits _{i=1}^{n}{w_{i}a_{\sigma (i)}/\sum \limits _{i=1}^{n}{w_{i}}}} A fast method has been presented to solve two programming problem so that the type-1 OWA aggregation operation can be performed efficiently, for details, please see the paper. == Alpha-level approach to Type-1 OWA operation == Three-step process: Step 1—To set up the α {\displaystyle \alpha } - level resolution in [0, 1]. Step 2—For each α ∈ [ 0 , 1 ] {\displaystyle \alpha \in [0,1]} , Step 2.1—To calculate ρ α + i 0 ∗ {\displaystyle \rho _{\alpha +}^{i_{0}^{\ast }}} Let i 0 = 1 {\displaystyle i_{0}=1} ; If ρ α + i 0 ≥ A α + σ ( i 0 ) {\displaystyle \rho _{\alpha +}^{i_{0}}\geq A_{\alpha +}^{\sigma (i_{0})}} , stop, ρ α + i 0 {\displaystyle \rho _{\alpha +}^{i_{0}}} is the solution; otherwise go to Step 2.1-3. i 0 ← i 0 + 1 {\displaystyle i_{0}\leftarrow i_{0}+1} , go to Step 2.1-2. Step 2.2 To calculate ρ α − i 0 ∗ {\displaystyle \rho _{\alpha -}^{i_{0}^{\ast }}} Let i 0 = 1 {\displaystyle i_{0}=1} ; If ρ α − i 0 ≥ A α − σ ( i 0 ) {\displaystyle \rho _{\alpha -}^{i_{0}}\geq A_{\alpha -}^{\sigma (i_{0})}} , stop, ρ α − i 0 {\displaystyle \rho _{\alpha -}^{i_{0}}} is the solution; otherwise go to Step 2.2-3. i 0 ← i 0 + 1 {\displaystyle i_{0}\leftarrow i_{0}+1} , go to step Step 2.2-2. Step 3—To construct the aggregation resulting fuzzy set G {\displaystyle G} based on all the available intervals [ ρ α − i 0 ∗ , ρ α + i 0 ∗ ] {\displaystyle \left[{\rho _{\alpha -}^{i_{0}^{\ast }},\;\rho _{\alpha +}^{i_{0}^{\ast }}}\right]} : μ G ( x ) = ⋁ α : x ∈ [ ρ α − i 0 ∗ , ρ α + i 0 ∗ ] ⁡ α {\displaystyle \mu _{G}(x)=\operatorname {\bigvee } \limits _{\alpha :x\in \left[{\rho _{\alpha -}^{i_{0}^{\ast }},\;\rho _{\alpha +}^{i_{0}^{\ast }}}\right]}\alpha } == Some Examples == The type-1 OWA operator with the weights shown in the top figure is used to aggregate the fuzzy sets (solide lines) in the bottom figure, and the dashed line is the aggregation result. == Special cases == Any OWA operators, like maximum, minimum, mean operators; Join operators of (type-1) fuzzy sets, i.e., fuzzy maximum operators; Meet operators of (type-1) fuzzy sets, i.e., fuzzy minimum operators; Join-like operators of (type-1) fuzzy sets; Meet-like operators of (type-1) fuzzy sets. == Generalizations == Type-2 OWA operators have been suggested to aggregate the type-2 fuzzy sets for soft decision making. == Applications == Type-1 OWA operators have been applied to different domains for soft decision making. Improved efficiency of computing approach ; Type reduction of type-2 fuzzy sets ; Group decision making ; Credit risk evaluation ; Information fusion ; Linguistic expressions and symbolic translation ; Sentiment analysis ; Ro

    Read more →
  • Midjourney

    Midjourney

    Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco–based "independent research lab" Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. It is one of the technologies of the AI boom. The tool was launched into open beta on July 12, 2022. The Midjourney team is led by David Holz, who co-founded Leap Motion. Holz told The Register in August 2022 that the company was already profitable. Users generate images with Midjourney using Discord bot commands or the official website. == History == Midjourney, Inc. was founded in San Francisco, California, by David Holz, previously a co-founder of Leap Motion. The Midjourney image generation platform entered open beta on July 12, 2022. On March 14, 2022, the Midjourney Discord server launched with a request to post high-quality photographs to Twitter and Reddit for systems training. === Model versions === The company has been working on improving its algorithms, releasing new model versions every few months. Version 2 of their algorithm was launched in April 2022, and version 3 on July 25. On November 5, 2022, the alpha iteration of version 4 was released to users. Starting from the 4th version, MJ models were trained on Google TPUs. On March 15, 2023, the alpha iteration of version 5 was released. The 5.1 model is more opinionated than version 5, applying more of its own stylization to images, while the 5.1 RAW model adds improvements while working better with more literal prompts. The version 5.2 included a new "aesthetics system", and the ability to "zoom out" by generating surroundings to an existing image. On December 21, 2023, the alpha iteration of version 6 was released. The model was trained from scratch over a nine month period. Support was added for better text rendition and a more literal interpretation of prompts. == Functionality == Midjourney is accessible through a Discord bot or by accessing their website. Users can use Midjourney through Discord either through their official Discord server, by directly messaging the bot, or by inviting the bot to a third-party server. To generate images, users use the /imagine command and type in a prompt; the bot then returns a set of four images, which users are given the option to upscale. To generate images on the website, users initially needed to have generated at least 1,000 images through the bot; this limitation has since been removed. === Vary (Region) + remix feature === Midjourney released a Vary (Region) feature on September 5, 2023, as part of MidJourney V5.2. This feature allows users to select a specific area of an image and apply variations only to that region while keeping the rest of the image unchanged. === Midjourney web interface === Midjourney introduced its web interface to make its tools more accessible, moving beyond its initial reliance on Discord. This web-based platform was launched in August 2024 alongside the release of Midjourney version 6.1. The web editor consolidates tools such as image editing, panning, zooming, region variation, and inpainting into a single interface. The introduction of the web interface also syncs conversations between Midjourney's Discord channels and web rooms, further enhancing collaboration across both platforms. This shift was in response to growing competition from other AI image generation platforms like Adobe Firefly and Google’s Imagen, which had already launched as native web apps with integration into popular design tools. === Image Weight === This feature lets users control how much influence an uploaded image has on the final output. By adjusting the "image weight" parameter, users can prioritize either the content of the prompt or the characteristics of the image. For instance, setting a higher weight will ensure that the generated result closely follows the image's structure and details, while a lower weight allows the text prompt to have more influence over the final output. === Style Reference === With Style Reference, users can upload an image to use as a stylistic guide for their creation. This tool enables MidJourney to extract the style—whether it is the color palette, texture, or overall atmosphere—from the reference image and apply it to a newly generated image. The feature allows users to fine-tune the aesthetics of their creations by integrating specific artistic styles or moods. === Character Reference === The Character Reference feature allows for a more targeted approach in defining characters. Users can upload an image of a character, and the system uses that image as a reference to generate similar characters in the output. This feature is particularly useful in maintaining consistency in appearance for characters across different images. == Uses == Midjourney's founder, David Holz, told The Register that artists use Midjourney for rapid prototyping of artistic concepts to show to clients before starting work themselves. The advertising industry quickly adopted AI tools such as Midjourney, DALL-E, and Stable Diffusion to create original content and brainstorm ideas. Architects have described using the software to generate mood boards for the early stages of projects, as an alternative to searching Google Images. === Notable usage and controversy === The program was used by the British magazine The Economist to create the front cover for an issue in June 2022. In Italy, the leading newspaper Corriere della Sera published a comic created with Midjourney by writer Vanni Santoni in August 2022. Charlie Warzel used Midjourney to generate two images of Alex Jones for Warzel's newsletter in The Atlantic. The use of an AI-generated cover was criticised by people who felt it was taking jobs from artists. Warzel called his action a mistake in an article about his decision to use generated images. Last Week Tonight with John Oliver included a 10-minute segment on Midjourney in an episode broadcast in August 2022. A Midjourney image called Théâtre D'opéra Spatial won first place in the digital art competition at the 2022 Colorado State Fair. Jason Allen, who wrote the prompt that led Midjourney to generate the image, printed the image onto a canvas and entered it into the competition using the name Jason M. Allen via Midjourney. Other digital artists were upset by the news. Allen was unapologetic, insisting that he followed the competition's rules. The two category judges were unaware that Midjourney used AI to generate images, although they later said that had they known this, they would have awarded Allen the top prize anyway. In December 2022, Midjourney was used to generate the images for an AI-generated children's book that was created over a weekend. Titled Alice and Sparkle, the book features a young girl who builds a robot that becomes self-aware. The creator, Ammaar Reeshi, used Midjourney to generate a large number of images, from which he chose 13 for the book. Both the product and process drew criticism. One artist wrote that "the main problem... is that it was trained off of artists' work. It's our creations, our distinct styles that we created, that we did not consent to being used." In 2023, the realism of AI-based text-to-image generators, such as Midjourney, DALL-E, or Stable Diffusion, reached such a high level that it led to a significant wave of viral AI-generated photos. Widespread attention was gained by a Midjourney-generated photo of Pope Francis wearing a white puffer coat, the fictional arrest of Donald Trump, and a hoax of an attack on the Pentagon, as well as the usage in professional creative arts. Research has suggested that the images Midjourney generates can be biased. For example, even neutral prompts in one study returned unequal results on the aspects of gender, skin color, and location. A study by researchers at the nonprofit group Center for Countering Digital Hate found the tool to be easy to use to generate racist and conspiratorial images. In October 2023, Rest of World reported that Midjourney tends to generate images based on national stereotypes. In 2024, a Frontiers journal published a paper which contained gibberish figures generated with Midjourney, one of which was a diagram of a rat with large testicles and a large penis towering over himself. The paper was retracted a day after the images went viral on Twitter. ==== Content moderation and censorship in Midjourney ==== Prior to May 2023, Midjourney implemented a moderation mechanism predicated on a banned word system. This method prohibited the use of language associated with explicit content, such as sexual or pornographic themes, as well as extreme violence. Moreover, the system also banned certain individual words, including those of religious and political figures, such as Allah or General Secretary of the Chinese Communist Party Xi Jinping. This practice occasionally stirred controversy due to perceiv

    Read more →
  • Theaitre

    Theaitre

    Theaitre (stylized as THEaiTRE) is an interdisciplinary research project investigating to what extent artificial intelligence is able to generate theatre play scripts. The first theatre play produced within the project, AI: When a Robot Writes a Play, premiered online on February 26, 2021. == Goal == Following similar previous projects such as Sunspring, a short sci-fi movie with an automatically generated script, the THEaiTRE project investigates whether current language generation approaches are mature enough to generate a theatre play script that could be successfully performed in front of an audience. The project falls within the area of generative art, famously represented e.g. by the portrait of Edmond de Belamy which was generated by an artificial neural network. In this field, artists are trying to use automated techniques to create "art", questioning the modern definition of art itself. More broadly, the project aims at promoting cooperation rather than competition of humans and artificial intelligence as the more beneficial approach for both. The first theatre play created within the project, titled AI: When a Robot Writes a Play, was presented in February 2021 at the 100th anniversary of the premiere of the R.U.R. theatre play by the Czech author Karel Čapek to celebrate the invention of the word "robot". While R.U.R. was a play written by a human about robots (and humans), THEaiTRE tried to reverse this idea by presenting a play written by a "robot" (artificial intelligence) about humans (and robots). The script of the play was published online, with marked parts of the text which were written manually or manually post-edited. The analysis shows that 90% of the script is automatically generated, with 10% manually written or manually post-edited. The project also plans to produce a second play in 2022, addressing some of the many shortcomings of the approach used to generate the first play, as well as attempting to further minimize the amount of human influence on the script. == Approach == At the core of the project is the GPT-2 language model by OpenAI with various adjustments motivated by the task of generating theatre play scripts, for which the model is not particularly trained. The GPT-2 model is used in the usual way, providing it with a start of a document and prompting it to generate a continuation of the document. Specifically, the input for GPT-2 in this project is typically a short description of the scene setting, followed by a few lines to introduce the characters and start the dialogue. The model then generates 10 continuation lines, and hands control to the user, who can then either ask the model to continue generating, or make various edits before letting the model to generate further, deleting some parts of the script or adding new lines into the script. The adjustments include restricting the generator to only produce lines pertaining to characters appearing in the input prompt, limiting the repetitiveness of the generated text, and employing automatic summarization of the input prompt and the generated text to overcome the limitation of the GPT-2 model which only attends to the last 1,024 subword tokens. The limitations of the model include, among other, a lack of distinctiveness and self-consistency of the characters, an inability to generate the script for the whole play (scripts for individual scenes are generated independently), and errors due to the employment of automated machine translation, as GPT-2 generates English texts but the final play script is being produced in Czech language. The source codes of the project are available under the MIT licence. The project has also published some sample outputs. == Team == The project is a cooperation of the following experts, all based in Prague, Czech Republic: computational linguists from the Faculty of Mathematics and Physics, Charles University theatre experts from the Švanda Theatre and from the Theatre Faculty of the Academy of Performing Arts in Prague hackers from CEE Hacks The project is financially supported by the Technology Agency of the Czech Republic.

    Read more →
  • Loebner Prize

    Loebner Prize

    The Loebner Prize was an annual competition in artificial intelligence that awarded prizes to the computer programs considered by the judges to be the most human-like. The format of the competition was that of a standard Turing test. In each round, a human judge simultaneously held textual conversations with a computer program and a human being via computer. Based upon the responses, the judge would attempt to determine which was which. The contest was launched in 1990 by Hugh Loebner in conjunction with the Cambridge Center for Behavioral Studies, Massachusetts, United States. In 2004 and 2005, it was held in Loebner's apartment in New York City. Within the field of artificial intelligence, the Loebner Prize is somewhat controversial; the most prominent critic, Marvin Minsky, called it a publicity stunt that does not help the field along. Beginning in 2014, it was organised by the AISB at Bletchley Park. It has also been associated with Flinders University, Dartmouth College, the Science Museum in London, University of Reading and Ulster University, Magee Campus, Derry, UK City of Culture. For the final 2019 competition, the format changed. There was no panel of judges. Instead, the chatbots were judged by the public and there were to be no human competitors. The prize has been reported as defunct as of 2020. == Prizes == Originally, $2,000 was awarded for the most human-seeming program in the competition. The prize was $3,000 in 2005 and $2,250 in 2006. In 2008, $3,000 was awarded. In addition, there were two one-time-only prizes that have never been awarded. $25,000 is offered for the first program that judges cannot distinguish from a real human and which can convince judges that the human is the computer program. $100,000 is the reward for the first program that judges cannot distinguish from a real human in a Turing test that includes deciphering and understanding text, visual, and auditory input. The competition was planned to end after the achievement of this prize. == Competition rules and restrictions == The rules varied over the years and early competitions featured restricted conversation Turing tests but since 1995 the discussion has been unrestricted. For the three entries in 2007, Robert Medeksza, Noah Duncan and Rollo Carpenter, some basic "screening questions" were used by the sponsor to evaluate the state of the technology. These included simple questions about the time, what round of the contest it is, etc.; general knowledge ("What is a hammer for?"); comparisons ("Which is faster, a train or a plane?"); and questions demonstrating memory for preceding parts of the same conversation. "All nouns, adjectives and verbs will come from a dictionary suitable for children or adolescents under the age of 12." Entries did not need to respond "intelligently" to the questions to be accepted. For the first time in 2008 the sponsor allowed introduction of a preliminary phase to the contest opening up the competition to previously disallowed web-based entries judged by a variety of invited interrogators. The available rules do not state how interrogators are selected or instructed. Interrogators (who judge the systems) have limited time: 5 minutes per entity in the 2003 competition, 20+ per pair in 2004–2007 competitions, 5 minutes to conduct simultaneous conversations with a human and the program in 2008–2009, increased to 25 minutes of simultaneous conversation since 2010. == Criticisms == The prize has long been scorned by experts in the field, for a variety of reasons. It is regarded by many as a publicity stunt. Marvin Minsky scathingly offered a "prize" to anyone who could stop the competition. Loebner responded by jokingly observing that Minsky's offering a prize to stop the competition effectively made him a co-sponsor. The rules of the competition have encouraged poorly qualified judges to make rapid judgements. Interactions between judges and competitors was originally very brief, for example effectively 2.5 mins of questioning, which permitted only a few questions. Questioning was initially restricted to a single topic of the contestant's choice, such as "whimsical conversation", a domain suiting standard chatbot tricks. Competition entrants do not aim at understanding or intelligence but resort to basic ELIZA style tricks, and successful entrants find deception and pretense is rewarded. == Contests == See article history for more details of some earlier contests. A very incomplete listing of a few of the contests: === 2003 === In 2003, the contest was organised by Professor Richard H. R. Harper and Dr. Lynne Hamill from the Digital World Research Centre at the University of Surrey. Although no bot passed the Turing test, the winner was Jabberwock, created by Juergen Pirner. Second was Elbot (Fred Roberts, Artificial Solutions). Third was Jabberwacky, (Rollo Carpenter). === 2006 === In 2006, the contest was organised by Tim Child (CEO of Televirtual) and Huma Shah. On August 30, the four finalists were announced: Rollo Carpenter Richard Churchill and Marie-Claire Jenkins Noah Duncan Robert Medeksza The contest was held on 17 September in the VR theatre, Torrington Place campus of University College London. The judges included the University of Reading's cybernetics professor, Kevin Warwick, a professor of artificial intelligence, John Barnden (specialist in metaphor research at the University of Birmingham), a barrister, Victoria Butler-Cole and a journalist, Graham Duncan-Rowe. The latter's experience of the event can be found in an article in Technology Review. The winner was 'Joan', based on Jabberwacky, both created by Rollo Carpenter. === 2007 === The 2007 competition was held on October 21 in New York City. The judges were: computer science professor Russ Abbott, philosophy professor Hartry Field, psychology assistant professor Clayton Curtis and English lecturer Scott Hutchins. No bot passed the Turing test, but the judges ranked the three contestants as follows: 1st: Robert Medeksza, creator of Ultra Hal 2nd: Noah Duncan, a private entry, creator of Cletus 3rd: Rollo Carpenter from Icogno, creator of Jabberwacky The winner received $2,250 and the annual medal. The runners-up received $250 each. === 2008 === The 2008 competition was organised by professor Kevin Warwick, coordinated by Huma Shah and held on October 12 at the University of Reading, UK. After testing by over one hundred judges during the preliminary phase, in June and July 2008, six finalists were selected from thirteen original entrant artificial conversational entities (ACEs). Five of those invited competed in the finals: Brother Jerome, Peter Cole and Benji Adams Elbot, Fred Roberts / Artificial Solutions Eugene Goostman, Vladimir Veselov, Eugene Demchenko and Sergey Ulasen Jabberwacky, Rollo Carpenter Ultra Hal, Robert Medeksza In the finals, each of the judges was given five minutes to conduct simultaneous, split-screen conversations with two hidden entities. Elbot of Artificial Solutions won the 2008 Loebner Prize bronze award, for most human-like artificial conversational entity, through fooling three of the twelve judges who interrogated it (in the human-parallel comparisons) into believing it was human. This is coming very close to the 30% traditionally required to consider that a program has actually passed the Turing test. Eugene Goostman and Ultra Hal both deceived one judge each that it was the human. Will Pavia, a journalist for The Times, has written about his experience; a Loebner finals' judge, he was deceived by Elbot and Eugene. Kevin Warwick and Huma Shah have reported on the parallel-paired Turing tests. === 2009 === The 2009 Loebner Prize Competition was held September 6, 2009, at the Brighton Centre, Brighton UK in conjunction with the Interspeech 2009 conference. The prize amount for 2009 was $3,000. Entrants were David Levy, Rollo Carpenter, and Mohan Embar, who finished in that order. The writer Brian Christian participated in the 2009 Loebner Prize Competition as a human confederate, and described his experiences at the competition in his book The Most Human Human. === 2010 === The 2010 Loebner Prize Competition was held on October 23 at California State University, Los Angeles. The 2010 competition was the 20th running of the contest. The winner was Bruce Wilcox with Suzette. === 2011 === The 2011 Loebner Prize Competition was held on October 19 at the University of Exeter, Devon, United Kingdom. The prize amount for 2011 was $4,000. The four finalists and their chatterbots were Bruce Wilcox (Rosette), Adeena Mignogna (Zoe), Mohan Embar (Chip Vivant) and Ron Lee (Tutor), who finished in that order. That year there was an addition of a panel of junior judges, namely Georgia-Mae Lindfield, William Dunne, Sam Keat and Kirill Jerdev. The results of the junior contest were markedly different from the main contest, with chatterbots Tutor and Zoe tying for first place and Chip Vivant and Rosette coming in third and fourt

    Read more →
  • Whisper (speech recognition system)

    Whisper (speech recognition system)

    Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022. It is capable of transcribing speech in English and multiple other languages, and can translate several non-English languages into English. Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture. OpenAI claims that the combination of different training data and post-training filtering used in its development has led to improved recognition of accents, background noise, and jargon compared to previous approaches. While the model does not outperform larger, more specialized models and still experiences AI hallucination, it has been showed to be useful for general sound recognition and has many applications across different industries. == Background == Speech recognition has had a long history in research; the first approaches made use of statistical methods, such as dynamic time warping, and later hidden Markov models. At around the 2010s, deep neural network approaches became more common for speech recognition models, which were enabled by the availability of large datasets ("big data") and increased computational performance. Early approaches to deep learning in speech recognition included convolutional neural networks, which were limited due to their inability to capture sequential data, which later led to developments of Seq2seq approaches, which include recurrent neural networks, which made use of long short-term memory. Transformers, introduced in 2017 by Google, displaced many prior state-of-the-art approaches across a wide range in machine learning, and started becoming the core neural architecture in fields such as language modeling and computer vision. Weakly-supervised approaches to training acoustic models were recognized in the early 2020s as promising for speech recognition approaches using deep neural networks. According to a NYT report, in 2021 OpenAI believed they exhausted sources of higher-quality data to train their large language models and decided to complement scraped web text with transcriptions of YouTube videos and podcasts, and developed Whisper to solve this task. Whisper Large V2 was released on December 8, 2022, followed by Whisper Large V3 being released in November 2023, during the OpenAI Dev Day. In March 2025, OpenAI released new transcription models based on GPT-4o and GPT-4o mini, both of which have lower error rates than Whisper. == Architecture == The Whisper architecture is based on an encoder-decoder transformer. Input audio is resampled to 16,000 Hertz (Hz) and converted to an 80-channel Log-magnitude Mel spectrogram using 25 ms windows with a 10 ms stride. The spectrogram is then normalized to a [-1, 1] range with near-zero mean. The encoder takes this Mel spectrogram as input and processes it. It first passes through two convolutional layers. Sinusoidal positional embeddings are added. It is then processed by a series of Transformer encoder blocks (with pre-activation residual connections). The encoder's output is layer normalized. The decoder is a standard transformer decoder. It has the same width and Transformer blocks as the encoder. It uses learned positional embeddings and tied input-output token representations (using the same weight matrix for both the input and output embeddings). It uses a byte-pair encoding tokenizer, of the same kind as used in GPT-2. English-only models use the GPT-2 vocabulary, while multilingual models employ a re-trained multilingual vocabulary with the same number of words. Special tokens are used to allow the decoder to perform multiple tasks: Tokens that denote language (one unique token per language). Tokens that specify task (<|transcribe|> or <|translate|>). Tokens that specify if no timestamps are present (<|notimestamps|>). If the token is not present, then the decoder predicts timestamps relative to the segment, and quantized to 20 ms intervals. <|nospeech|> for voice activity detection. <|startoftranscript|>, and <|endoftranscript|> . Any text that appears before <|startoftranscript|> is not generated by the decoder, but given to the decoder as context. Loss is only computed over non-contextual parts of the sequence, i.e. tokens between these two special tokens. == Training data == The training dataset consists of 680,000 hours of labeled audio-transcript pairs sourced from the internet using semi-supervised learning. This includes 117,000 hours in 96 non-English languages and 125,000 hours of X→English translation data, where X stands for any non-English language. Preprocessing involved standardization of transcripts, filtering to remove machine-generated transcripts using heuristics (e.g., punctuation, capitalization), language identification and matching with transcripts, fuzzy deduplication, and deduplication with evaluation datasets to avoid data contamination. Speechless segments were also included to allow voice activity detection training. For the files still remaining after the filtering process, audio files were then broken into 30-second segments paired with the subset of the transcript that occurs within that time. If this predicted spoken language differed from the language of the text transcript associated with the audio, that audio-transcript pair was not used for training the speech recognition models, but instead for training translation. The model was trained using the AdamW optimizer with gradient norm clipping and a linear learning rate decay with warmup, with batch size 256 segments. Training proceeded for 1 million updates (approximately 2-3 epochs). No data augmentation or regularization, except for the Large V2 model, which used SpecAugment, Stochastic Depth, and BPE Dropout. The training used data parallelism with float16, dynamic loss scaling, and activation checkpointing. === Post-training filtering === After training the first model, researchers ran it on different subsets of the training data, each representing a distinct source. Data sources were ranked by a combination of their error rate and size. Manual inspection of the top-ranked sources (high error, large size) helped determine if the source was low quality (e.g., partial transcriptions, inaccurate alignment). After training, it was fine-tuned to suppress the prediction of speaker names and low-quality sources were then removed. == Capacity == While Whisper does not outperform models which specialize in the LibriSpeech dataset, when tested across many datasets, it is more robust and makes 55.2% fewer errors than other models. Whisper has a differing error rate with respect to transcribing different languages, with a higher word error rate in languages not well-represented in the training data. The authors found that multi-task learning improved overall performance compared to models specialized to one task. They conjectured that the best Whisper model trained is still underfitting the dataset, and larger models and longer training can result in better models. Third-party evaluations have found varying levels of AI hallucination. A study of transcripts of public meetings found hallucinations in eight out of every 10 transcripts, while an engineer discovered hallucinations in "about half" of 100 hours of transcriptions and a developer identified them in "nearly every one" of 26,000 transcripts. A study of 13,140 short audio segments (averaging 10 seconds) found 187 hallucinations (1.4%), 38% of which generated text that could be harmful because it inserted false references to things like race, non-existent medications, or violent events that were not in the audio. == Applications == The model has been used as the base for many applications, such as a unified model for speech recognition and more general sound recognition. Whisper has also been integrated into the workflow of biomedical research. In 2025, a study on Alzheimer's disease detection used the model to transcribe spontaneous speech recordings. The transcripts that were generated by the model were combined with LLM vector embeddings and traditional classifiers to help classify the patients' health. Another application is when OVALYTICS incorporated Whisper to transcribe YouTube videos and automate content moderation systems, which improved its detection of offensive content. The model has also been used in academic libraries and cultral heritage institutions to generate transcripts and captions for their digitized audiovisual collections. In a 2025 case study, Emory University Libraries found that Whisper reduced the labor used in transcription by around 30-35%, shifting work from text creation to text correction. However, human review is still necessary to make sure accuracy, formatting, and accessibility are all standard.

    Read more →
  • Wayve

    Wayve

    Wayve Technologies Ltd is a British autonomous driving technology company focused on developing self-driving vehicle systems through end-to-end deep learning. Founded in 2017 by researchers from the University of Cambridge, Wayve’s approach eschews detailed 3D maps and hand-coded rules, in favor of a self-learning “AI driver” that learns from camera data and driving experience. The London-headquartered startup has garnered significant attention and funding for its visually-based method. == History == Wayve was founded in Cambridge, England, on August 21, 2017, by Amar Shah and Alex Kendall, two machine learning PhD students at the University of Cambridge. Shah initially served as CEO while Kendall was CTO, and the pair set out to develop an unconventional self-driving car system using machine learning at every layer of the driving task. In May 2018, Wayve emerged from stealth mode with backing from early-stage investors. At this time the company had around 10 employees, and its advisory investors included Uber’s Chief Scientist, Zoubin Ghahramani, who shared Wayve’s vision of a learning-centric driving AI. In 2019, Wayve achieved a milestone by training a car to drive autonomously on public roads it had never seen before, using only cameras, a basic GPS map, and end-to-end deep learning control. The company moved its base to London and secured a $20 million Series A funding round in November 2019. This investment enabled Wayve to launch a pilot fleet of autonomous electric vehicles in central London for real-world testing. During these trials, Wayve’s cars (such as retrofitted Jaguar I-Pace SUVs) began navigating the complex, narrow streets of London to prove the system’s ability to adapt to challenging urban scenarios. In 2020, co-founder Amar Shah departed the company, and Alex Kendall assumed the role of CEO. The startup joined the Microsoft for Startups: Autonomous Driving program in 2020, leveraging Microsoft Azure’s cloud computing for training its machine learning models at scale. It also committed to testing exclusively on electric vehicles, and a goal to reduce carbon emissions. In 2021, Wayve entered pilot programs with major UK retailers. It launched a 12-month autonomous delivery trial with supermarket chain Asda, and received a £10 million ($13.6 million) investment from online grocer Ocado Group as part of a partnership to develop self-driving grocery delivery vans. Ocado’s backing gave Wayve access to a fleet of delivery vans for data collection and testing on busy London routes (with human safety drivers present) to train its AI in urban traffic. In 2022, after a successful Series B funding round, the company extended road testing beyond the UK to other regions, and, by 2023, in multiple countries. The company had begun operating in the United States and in continental Europe, in preparation for larger commercial deployments. In 2023, Wayve announced a collaboration with Nissan to integrate Wayve’s AI-driven software into its ProPilot ADAS system, slated to launch in fiscal year 2027. Wayve received strategic investment from Uber, in 2024, to jointly develop autonomous ride-hailing services. The two companies plan to trial a fully driverless robotaxi service in London, supported by a UK government program to accelerate commercial self-driving pilots to as early as 2026. To demonstrate the scalability of its technology, Wayve conducted an “AI-500” roadshow project, driving in dozens of cities across Asia, Europe, and North America using the same AI model. By mid-2025, it had completed autonomous driving demos in 90 cities without prior HD mapping. In April 2025, Wayve opened its first Asian research hub in Japan, with investment by SoftBank, to improve its model’s generalization using local driving data. That year, the company conducted driving tests in over 500 cities in Europe, North America and Japan without city-specific programming. In February 2026, Nissan, Uber and Wayve announced their collaboration on robotaxi development, with the aim of launching a pilot programme in Tokyo by late 2026. Wayve also formed a strategic alliance with Mercedes-Benz and Stellantis on personal vehicle and robotaxi applications. == Financing and investors == Wayve has been backed by a mix of venture capital (VC) firms, corporate investors, and individuals. Its initial seed funding came from funds such as Compound (NYC) and Firstminute Capital (London), as well as Cambridge-based angel investors, in 2018. Academic Pieter Abbeel and Uber’s chief scientist, Zoubin Ghahramani, were early backers. In November 2019, Wayve raised a $20 million Series A led by Eclipse Ventures, with participation from Balderton Capital and other prior investors. The Series A financing was used to fund the company’s first autonomous trials in London, and marked the first time a European self-driving car startup had secured a U.S. VC as lead investor. In October 2021, Ocado Group invested £10 million (approximately $13.6 million) in Wayve as a strategic partner in autonomous grocery delivery. This brought Wayve’s total funding to around $60 million at that time. The Series B round followed in January 2022, when Wayve announced $200 million in new funding led by Eclipse Ventures, with D1 Capital Partners, Moore Strategic Ventures, and Linse Capital. Balderton, Microsoft and Virgin Group joined as strategic backers. Baillie Gifford and Compound also participated; Ocado increased its stake as a strategic investor; and Meta AI head Yann LeCun and Richard Branson also became investors. Wayve’s Series C in May 2024 closed a $1.05 billion, led by Japan’s SoftBank Group. The funding round was the largest-ever for a UK AI company, and included new investor Nvidia, and returning investors Microsoft and Eclipse Ventures, among others. Uber also joined as a stratgic partner and a stakeholder. The Series C round increased Wayve’s total funding raised to about $1.3 billion to date from investors including SoftBank, Microsoft and Nvidia, and lifted Wayve’s valuation into “unicorn” status. In February 2026, Wayve announced a $1.2 billion Series D funding round; later that month, the company reported that $1.5 billion had been raised from, primarily, Mercedes-Benz, Stellantis, Nissan, and existing backers Uber, Microsoft and Nvidia, increasing Wayve's overall valuation to $8.6 billion. == Technology == Wayve’s self-driving approach centers on end-to-end deep learning and a vision-based AI system. Unlike conventional autonomous vehicles that depend on high-definition maps, hand-coded rules, and arrays of expensive lidar sensors, Wayve’s platform learns to drive predominantly using camera data and machine learning algorithms. The company refers to its AI-driven driving software as an “Embodied AI” or AI Driver, emphasizing that the system learns from experience (both real and simulated) to handle complex or novel situations rather than following pre-programmed instructions, not unlike Tesla's approach. The Wayve hardware-agnostic autonomy stack consists of a suite of video cameras, with basic automotive sensors, mounted on the vehicle, and paired with onboard compute units that are powered by GPUs to run the AI models. This vision-only philosophy is similar to Tesla’s Autopilot/FSDB model, but Wayve’s solution is vehicle-agnostic and mapless. Wayve’s strategy is to provide its driving AI as an OEM-ready platform; it plans to license or embed its technology into vehicles made by established automakers rather than build its own cars. Wayve’s development vehicles currently use Nvidia’s Orin system-on-chip as the onboard computer for running the AI model, but CEO Kendall has noted that the software can run on “whatever GPU [an automaker] already has in their vehicles” Wayve has built a cloud infrastructure, largely on Microsoft Azure, to process petabytes of this data, and uses simulation tools (known internally as the “Wayve Infinity” simulator) to synthetically generate and practice rare or dangerous scenarios for the AI to learn from. == Corporate affairs == Wayve is a privately held company headquartered in London, England, with its primary research and development office in the Kings Cross area of London. The company was initially incorporated as Wayve Technologies Ltd in the UK. Wayve has also established a presence in the U.S., in Silicon Valley); in Canada, with a research hub in Vancouver; in Yokohama, Japan; in Leonberg, Germany; and in Herzliya, Israel. The Leadership team includes research scientists and engineers with backgrounds in computer vision, robotics, and automotive systems. President Erez Dagan was hired in 2024, following two decades at Mobileye; chief scientist Jamie Shotton is formerly of Microsoft Research; CEO Alex Kendall, originally from New Zealand with a PhD in computer vision from Cambridge, took over as CEO in 2020 after the departure of his co-founder Amar Shah.

    Read more →