AI Content Youtube Monetization

AI Content Youtube Monetization — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Data administration

    Data administration

    Data administration or data resource management is an organizational function working in the areas of information systems and computer science that plans, organizes, describes and controls data resources. Data resources are usually stored in databases under a database management system or other software such as electronic spreadsheets. In many smaller organizations, data administration is performed occasionally, or is a small component of the database administrator’s work. In the context of information systems development, data administration ideally begins at system conception, ensuring there is a data dictionary to help maintain consistency, avoid redundancy, and model the database so as to make it logical and usable, by means of data modeling, including database normalization techniques. == Data resource management == According to the Data Management Association (DAMA), data resource management is "the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise". Data Resource management may be thought of as a managerial activity that applies information system and other data management tools to the task of managing an organization’s data resource to meet a company’s business needs, and the information they provide to their shareholders. From the perspective of database design, it refers to the development and maintenance of data models to facilitate data sharing between different systems, particularly in a corporate context. Data Resource Management is also concerned with both data quality and compatibility between data models. Since the beginning of the information age, businesses need all types of data on their business activity. With each data created, when a business transaction is made, need data is created. With these data, new direction is needed that focuses on managing data as a critical resource of the organization to directly support its business activities. The data resource must be managed with the same intensity and formality that other critical resources are managed. Organizations must emphasize the information aspect of information technology, determine the data needed to support the business, and then use appropriate technology to build and maintain a high-quality data resource that provides that support. Data resource quality is a measure of how well the organization's data resource supports the current and the future business information demand of the organization. The data resource cannot support just the current business information demand while sacrificing the future business information demand. It must support both the current and the future business information demand. The ultimate data resource quality is stability across changing business needs and changing technology. A corporate data resource must be developed within single, organization-wide common data architecture. A data architecture is the science and method of designing and constructing a data resource that is business driven, based on real-world objects and events as perceived by the organization, and implemented into appropriate operating environments. It is the overall structure of a data resource that provides a consistent foundation across organizational boundaries to provide easily identifiable, readily available, high-quality data to support the business information demand. The common data architecture is a formal, comprehensive data architecture that provides a common context within which all data at an organization's disposal are understood and integrated. It is subject oriented, meaning that it is built from data subjects that represent business objects and business events in the real world that are of interest to the organization and about which data are captured and maintained.

    Read more →
  • Whisper (speech recognition system)

    Whisper (speech recognition system)

    Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022. It is capable of transcribing speech in English and multiple other languages, and can translate several non-English languages into English. Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture. OpenAI claims that the combination of different training data and post-training filtering used in its development has led to improved recognition of accents, background noise, and jargon compared to previous approaches. While the model does not outperform larger, more specialized models and still experiences AI hallucination, it has been showed to be useful for general sound recognition and has many applications across different industries. == Background == Speech recognition has had a long history in research; the first approaches made use of statistical methods, such as dynamic time warping, and later hidden Markov models. At around the 2010s, deep neural network approaches became more common for speech recognition models, which were enabled by the availability of large datasets ("big data") and increased computational performance. Early approaches to deep learning in speech recognition included convolutional neural networks, which were limited due to their inability to capture sequential data, which later led to developments of Seq2seq approaches, which include recurrent neural networks, which made use of long short-term memory. Transformers, introduced in 2017 by Google, displaced many prior state-of-the-art approaches across a wide range in machine learning, and started becoming the core neural architecture in fields such as language modeling and computer vision. Weakly-supervised approaches to training acoustic models were recognized in the early 2020s as promising for speech recognition approaches using deep neural networks. According to a NYT report, in 2021 OpenAI believed they exhausted sources of higher-quality data to train their large language models and decided to complement scraped web text with transcriptions of YouTube videos and podcasts, and developed Whisper to solve this task. Whisper Large V2 was released on December 8, 2022, followed by Whisper Large V3 being released in November 2023, during the OpenAI Dev Day. In March 2025, OpenAI released new transcription models based on GPT-4o and GPT-4o mini, both of which have lower error rates than Whisper. == Architecture == The Whisper architecture is based on an encoder-decoder transformer. Input audio is resampled to 16,000 Hertz (Hz) and converted to an 80-channel Log-magnitude Mel spectrogram using 25 ms windows with a 10 ms stride. The spectrogram is then normalized to a [-1, 1] range with near-zero mean. The encoder takes this Mel spectrogram as input and processes it. It first passes through two convolutional layers. Sinusoidal positional embeddings are added. It is then processed by a series of Transformer encoder blocks (with pre-activation residual connections). The encoder's output is layer normalized. The decoder is a standard transformer decoder. It has the same width and Transformer blocks as the encoder. It uses learned positional embeddings and tied input-output token representations (using the same weight matrix for both the input and output embeddings). It uses a byte-pair encoding tokenizer, of the same kind as used in GPT-2. English-only models use the GPT-2 vocabulary, while multilingual models employ a re-trained multilingual vocabulary with the same number of words. Special tokens are used to allow the decoder to perform multiple tasks: Tokens that denote language (one unique token per language). Tokens that specify task (<|transcribe|> or <|translate|>). Tokens that specify if no timestamps are present (<|notimestamps|>). If the token is not present, then the decoder predicts timestamps relative to the segment, and quantized to 20 ms intervals. <|nospeech|> for voice activity detection. <|startoftranscript|>, and <|endoftranscript|> . Any text that appears before <|startoftranscript|> is not generated by the decoder, but given to the decoder as context. Loss is only computed over non-contextual parts of the sequence, i.e. tokens between these two special tokens. == Training data == The training dataset consists of 680,000 hours of labeled audio-transcript pairs sourced from the internet using semi-supervised learning. This includes 117,000 hours in 96 non-English languages and 125,000 hours of X→English translation data, where X stands for any non-English language. Preprocessing involved standardization of transcripts, filtering to remove machine-generated transcripts using heuristics (e.g., punctuation, capitalization), language identification and matching with transcripts, fuzzy deduplication, and deduplication with evaluation datasets to avoid data contamination. Speechless segments were also included to allow voice activity detection training. For the files still remaining after the filtering process, audio files were then broken into 30-second segments paired with the subset of the transcript that occurs within that time. If this predicted spoken language differed from the language of the text transcript associated with the audio, that audio-transcript pair was not used for training the speech recognition models, but instead for training translation. The model was trained using the AdamW optimizer with gradient norm clipping and a linear learning rate decay with warmup, with batch size 256 segments. Training proceeded for 1 million updates (approximately 2-3 epochs). No data augmentation or regularization, except for the Large V2 model, which used SpecAugment, Stochastic Depth, and BPE Dropout. The training used data parallelism with float16, dynamic loss scaling, and activation checkpointing. === Post-training filtering === After training the first model, researchers ran it on different subsets of the training data, each representing a distinct source. Data sources were ranked by a combination of their error rate and size. Manual inspection of the top-ranked sources (high error, large size) helped determine if the source was low quality (e.g., partial transcriptions, inaccurate alignment). After training, it was fine-tuned to suppress the prediction of speaker names and low-quality sources were then removed. == Capacity == While Whisper does not outperform models which specialize in the LibriSpeech dataset, when tested across many datasets, it is more robust and makes 55.2% fewer errors than other models. Whisper has a differing error rate with respect to transcribing different languages, with a higher word error rate in languages not well-represented in the training data. The authors found that multi-task learning improved overall performance compared to models specialized to one task. They conjectured that the best Whisper model trained is still underfitting the dataset, and larger models and longer training can result in better models. Third-party evaluations have found varying levels of AI hallucination. A study of transcripts of public meetings found hallucinations in eight out of every 10 transcripts, while an engineer discovered hallucinations in "about half" of 100 hours of transcriptions and a developer identified them in "nearly every one" of 26,000 transcripts. A study of 13,140 short audio segments (averaging 10 seconds) found 187 hallucinations (1.4%), 38% of which generated text that could be harmful because it inserted false references to things like race, non-existent medications, or violent events that were not in the audio. == Applications == The model has been used as the base for many applications, such as a unified model for speech recognition and more general sound recognition. Whisper has also been integrated into the workflow of biomedical research. In 2025, a study on Alzheimer's disease detection used the model to transcribe spontaneous speech recordings. The transcripts that were generated by the model were combined with LLM vector embeddings and traditional classifiers to help classify the patients' health. Another application is when OVALYTICS incorporated Whisper to transcribe YouTube videos and automate content moderation systems, which improved its detection of offensive content. The model has also been used in academic libraries and cultral heritage institutions to generate transcripts and captions for their digitized audiovisual collections. In a 2025 case study, Emory University Libraries found that Whisper reduced the labor used in transcription by around 30-35%, shifting work from text creation to text correction. However, human review is still necessary to make sure accuracy, formatting, and accessibility are all standard.

    Read more →
  • Felix, Net i Nika

    Felix, Net i Nika

    Felix, Net i Nika ("Felix, Net and Nika") is a series of Polish language science fiction books for teenagers, written by Rafał Kosik. It tells the adventures of three friends - Felix Polon, Net Bielecki and Nika Mickiewicz - who attend fictional Professor Kuszmiński Middle School in Warsaw. As of 2024, eighteen books have been published. == Books == There are currently 18 books in the series: Felix, Net and Nika and the Gang of Invisible People - November 2004. Felix, Net and Nika and the Theoretically Possible Catastrophe - November 2005 Felix, Net and Nika and the Palace of Dreams - November 2006 Felix, Net and Nika and the Trap of Immortality - November 2007 Felix, Net and Nika and the Orbital Conspiracy - November 2008 Felix, Net and Nika and the Orbital Conspiracy 2: Small Army - May 2009 Felix, Net and Nika and the Third Cousin - November 2009 Felix, Net and Nika and the Rebellion of Machines - March 2011 Felix, Net and Nika and the World Zero - November 2011 Felix, Net and Nika and the World Zero 2. Alternauts - November 2012 Felix, Net and Nika and the Extracurricular Stories - April 2013 Felix, Net and Nika and the Secret of Czerwona Hańcza - November 2013 Felix, Net and Nika and Curse of McKillian's House - November 2014 Felix, Net and Nika and (un)Safe Growing up - November 2015 Felix, Net and Nika and The End of The World as We Know It - November 2018 Felix, Net and Nika and No Chance - November 2022 Felix, Net and Nika and No Chance 2: other tomorrrow - 2023 Felix, Net and Nika and Fantology - June 2024 == Film == A feature motion picture, Felix, Net i Nika oraz Teoretycznie Możliwa Katastrofa (Felix, Net and Nika and the Theoretically Possible Catastrophe) was released in Poland on September 28, 2012. == Main characters == Felix Polon - a foresighted, fair-haired boy with dark brown eyes. He inherited the talent of constructing various things, especially robots, from his father- it saved his friends many times. He can make anything from nothing, always finds a way out of a situation; almost always has a plan. Together with his parents Marlene and Peter, grandmother Lucy, his dog Caban (a Black Russian Terrier) and Golem Golem a robot he built, Felix lives on Serdeczna Street in a small family house. Net Bielecki is quite tall & slim, has blue eyes and a high IQ level. "Net" is his nickname; his true name is unknown. He is the most trendy and 'awesome' in his entire class. He is a human calculator and is excellent in mathematics. He hates dictations and spelling because he is dyslexic. He is also quite lazy, absent-minded and sometimes hysterical, or panicking. His dark blond hair looks like a heap of hay after a grenade explosion. He is best in ICT and writes many of his own programs. His love interest is Nika Mickiewicz. Together with his parents Lila and Mark, and their newborn twins nicknamed Pompek and Prumcia he lives on the top floor of a Penthouse apartment. Nika Mickiewicz is a girl with a character. She is very brave and mature. She likes reading books. She has curly, red hair, green eyes and a few freckles. She is not very rich; she wears second-hand clothes and her only pair of black Dr. Martens shoes. She lives in a tiny apartment. She is an orphan, but hides that fact from people for almost 3 years. However, Felix and Net, her best and possibly only friends, find out about it. She also has abnormal abilities. She can move distant objects using her powers, ski uphill and knows some things by intuition. In other words, she is telekinetic. Manfred is a friendly AI program started and never finished by Net's father, and mastered and programmed further by Net himself. He likes going on adventures and solving mysteries with the trio much more than his actual job, which is controlling the traffic lights. He helped out the three friends many times and is their reliable and faithful friend. Morten is also an AI program, but he is the antagonist of the trio. He appears in all 6 books of Felix Net and Nika. In the first book, the trio thinks they finished him off for good, but as we find out later, he comes back in the third book. In the fifth/sixth book, he was the mastermind of the Orbital Conspiracy. Also, Morten's logo, appears in all 6 books and it is still a mystery what he has to do with each event.

    Read more →
  • Gundam Build Divers Re:Rise

    Gundam Build Divers Re:Rise

    Gundam Build Divers Re:Rise (Japanese: ガンダムビルドダイバーズRe:RISE, Hepburn: Gandamu Birudo Daibāzu Re:Raizu) is a Japanese original net animation anime series produced by Sunrise Beyond, and the fourth series within the Gundam Build Series sub-series. A sequel to the 2018 anime Gundam Build Divers, it is the first Gundam anime series to be released in the Reiwa period, released to celebrate the franchise's 40th anniversary. The series is directed by Shinya Watada and written by Yasuyuki Muto. Initially announced at the Gundam 40th anniversary video, the series aired on its Gundam Channel YouTube channel from October 10 to December 26, 2019. A TV airing of the ONA began on BS11 on October 12, 2019, and on January 28, 2020, on Tokyo MX. A second season aired from April 9 to August 27, 2020. Two spinoffs of the series were later serialized in Kadokawa's Gundam Ace magazine and Hobby Japan. == Plot == Two years have passed since the EL-Diver Incident, an event that almost destroyed the Gunpla Battle Nexus Online (GBN) game until it was resolved by the force group known as "Build Divers", and soon after more EL-Divers were discovered. In order to make the game more secure, a newer version of the game was rolled out in order to prevent the same incident from happening again and with newer experiences that would make the gameplay more immersive to players. The story focuses on Hiroto Kuga, a high schooler who is a rogue mercenary Gunpla Diver in GBN, who goes in the game and wanders throughout its countless dimensions while helping out other Divers whether it is on insistence or by hire. Despite his selfless act, he chooses to remain unaffiliated with anyone and refuses rewards and Force (Diver parties) group invites, isolating himself from other people even in real life. His primary goal as a Diver is to be reunited with a mysterious girl from his past named Eve, who was in fact the very first EL-Diver to appear in the game. But after a special request mission, Hiroto is united with three other active Divers in a strange world named "Eldora" and forms the Force group "BUILD DiVERS" in what appears to be just another GBN gamespace event, until they learn the truth about Eldora and its consequences not only for GBN, but for the entire world. == Characters == === BUILD DiVERS === Hiroto Kuga (クガ・ヒロト, Kuga Hiroto) / Hiroto (ヒロト, Hiroto) Voiced by: Chiaki Kobayashi (Japanese); Billy Kametz (English) The main protagonist of the series and a high-school builder, veteran diver, and a former ace member of the Force group Avalon, who lives in Yokohama. He was one of the first minors to make it to the deep end of GBN, due to his conviction of being a person who does his best to help others. He was active prior and during the events of the previous series. Now working as a rogue diver for hire after leaving Avalon, he wanders the GBN gamespace alone, harboring regrets, resentments, and suffering from trauma after the death of his close friend and lover, the EL-Diver Eve. He is very calm and a man of few words, usually refusing others' reward and help, especially on joining other forces, but this stoic persona is a mental mask to hide his condition from everyone, including his parents. But when a special mission done by Freddie united him with Kazami, May and Parviz, they accidentally formed the force team named "BUILD DiVERS" to protect the Eldorans from the One-Eyes army. Currently he is the ace of his unit and the leader of the overall force. Hiroto uses the PFF-X7 Core Gundam as his main Gunpla, based on the RX-78-2 Gundam from the original Mobile Suit Gundam series. Its special armament system called the "core-change" gimmick and his first theme invented from that gimmick is the "Planets System". This allows the Core Gundam to be equipped with various types of armor and weapons, each for a different situation named after the eight planets. Hiroto later upgrades his Gunpla into the PFF-X7II Core Gundam II. This new Core Gundam can transform into the "Core Flyer", in a similar fashion to the original Gundam's FF-X7 Core Fighter for increased mobility and like its predecessor, it can also use the Planets System: Earth Armor (PFF-X7/E3 Earthree Gundam): Core Gundam's default blue armor, focused on traditional all-around combat. Mars Armor (PFF-X7/M4 Marsfour Gundam): A red armor whose focus is on fragments of four styles of close combat, hence "Cross-Combat". Venus Armor (PFF-X7/V2 Veetwo Gundam): A green armor whose focus is commando style ranged and bombardment combat, additionally with option works. Mercury Armor (PFF-X7/M1 Mercuone Gundam): A navy armor whose focus is underwater combat. Jupiter Armor (PFF-X7/J5 Jupitive Gundam): A white armor whose focus is fast orbital combat. Uranus Armor (PFF-X7II/U7 Uraven Gundam): An indigo armor focused on reconnaissance and high powered sniping. Saturn Armor (PFF-X7II/S6 Saturnix Gundam): An orange armor focused in demolition style close combat without beam weapons, originally developed to counter Gundam Frames. Neptune Armor (PFF-X7II/N8 Nepteight Gundam): An aqua-green armor equipped with a customized Volture Lumiere system similar to the one from Mobile Suit Gundam SEED C.E. 73: Stargazer, intended to be used for traveling through GBN's space in a short amount of time, but was used for launching into orbit instead of maneuvering in deep space. It is ultimately discarded in Eldora's orbit due to the strain of leaving Eldora's gravitational field. Pluto Armor (PFF-X7II+/P9 Plutine Gundam): Appearing only on Gundam Build Metaverse, the black colored armor is used for close combat and dueling purposes with its color scheme reminiscent of that of EcoPla. PFF-X7II/BUILD DiVERS Re:Rising Gundam: A special combination of the Core Gundam II with the WoDom Pod + and parts from the Gundam Aegis Knight and the EX Valkylander, armed with two giant beam sabers, eight miracle wings born from Eve's blessings, and the "Grand Cross Cannon", Hiroto's first special move, made with the help of his team. In one occasion, Hiroto changes his avatar to a Haro to pilot the Mobile Builder Haro Loader to help with the repairs on Cuadorn by making a prosthetic wing out of gunpla parts. During the Gunpla Battle Royal, he pilots an unmodified ASW-G-08 Gundam Barbatos Lupus Rex from Mobile Suit Gundam: Iron-Blooded Orphans. In Battlelogue, it is revealed that he has made a second Core Gundam II that he leaves on Eldora with the colors of the Gundam MK-II Titan. Another variant of this Gunpla sports the old "Gundam G3" colors with his team's personal crest, which is most likely to represent Sarah since the color of her hair, eyes, and dress embody Hiroto's time with Eve before they joined Avalon and to symbolize how he has officially befriended the original Build Divers. Each of the two units have unique advancements, the Titan color specializes in ground and underwater combat and the G3 color specializes in aerial and space combat. May (メイ, Mei) Voiced by: Mai Fuchigami (Japanese); Lauren Landa (English) A seemingly late teens female diver who prefers to play solo, she is a very calm and no-nonsense girl whose interest is in battles alone. However, she is not a fan of those who engage their opponents head on and prefers to implement a strategic approach. She is mature and has a strong sense of justice, and can be impulsive rushing into situations, especially for those in danger. Later in the series, she is revealed to be one of the 87 EL-Divers, however she was not one of those who were saved after the EL-Diver incident two years ago, she was born shortly after. After she was born she was given her own Mobile Doll body similar to Sarah, that is when she first met her, Koichi, Tsukasa, and Nanami. During the Lotus Challenge Eldoran style rehearsal battle it is revealed that she, as a new sister of Sarah, addresses the latter as the older since Sarah is chronologically older, regardless of her maturity. In the final episode, she is revealed to have been born with the remnant data originating from Eve, the first born EL-Diver who Hiroto befriended and fell in love with several years ago, and carries Eve's earring on her armband. In Battlelogue, it's implied that she is currently living with Hiroto IRL and in GBN is his attendant. May uses the JMA0530-MAY WoDom Pod as her main Gunpla, which is a customized JMA-0530 Walking Dome from Turn A Gundam. In the later episodes, the mobile suit is revealed to be a disguise for its true form, the HER-SELF Mobile Doll May. May later upgrades her WoDom Pod into the JMA0530-MAYBD WoDom Pod +. During the Gunpla Battle Royal, she uses her Mobile Doll (albeit with a new color scheme and the Gundam Base logo) along with an unmodified NZ-999 II Neo Zeong mobile armor from Mobile Suit Gundam Narrative. Kazami Torimachi (トリマチ・カザミ, Torimachi Kazami) / Kazami (カザミ, Kazami) Voiced by: Masaaki Mizunaka (Japanese); Ray Chase (English) A diver who was a former member of the diver group "Mu Dish". He is a very energet

    Read more →
  • Situated

    Situated

    In artificial intelligence and cognitive science, the term situated refers to an agent which is embedded in an environment. The term situated is commonly used to refer to robots, but some researchers argue that software agents can also be situated if: they exist in a dynamic (rapidly changing) environment, which they can manipulate or change through their actions, and which they can sense or perceive. Examples might include web-based agents, which can alter data or trigger processes (such as purchases) over the internet, or virtual-reality bots which inhabit and change virtual worlds, such as Second Life. Being situated is generally considered to be part of being embodied, but it is useful to consider each perspective individually. The situated perspective emphasizes that intelligent behaviour derives from the environment and the agent's interactions with it. The nature of these interactions are defined by an agent's embodiment.

    Read more →
  • Agent2Agent

    Agent2Agent

    Agent2Agent (A2A) is an open protocol that defines how artificial intelligence agents communicate with each other across different systems. It is intended to allow agents built by different vendors or frameworks to discover one another, exchange messages, and coordinate tasks. == History == The Agent2Agent protocol was announced by Google in April 2025 as an open standard for agent interoperability. In June 2025, Google transferred the protocol, its specification, and related software development kits to the Linux Foundation. The Linux Foundation established the Agent2Agent project to provide vendor-neutral governance. == Design == The A2A protocol supports communication between autonomous software agents operating across different platforms and organizations. It enables agents to discover one another and exchange structured messages without requiring shared internal state or proprietary integrations. A2A uses metadata documents, known as Agent Cards, to describe an agent's capabilities and how it can be accessed. These documents are exchanged using widely adopted web technologies such as HTTP and JSON-based messaging formats. A2A includes support for authentication and authorization to control which agents may participate in workflows. The protocol supports established security technologies including Transport Layer Security (TLS), JSON Web Tokens (JWTs), and OpenID Connect. A2A is often discussed alongside the Model Context Protocol (MCP). MCP focuses on connecting agents to tools and data sources, while A2A focuses on communication between agents themselves. == Adoption == At the time the Linux Foundation adopted the protocol, more than 100 technology companies had announced support for the Agent2Agent project. Microsoft stated that it planned to support the protocol in its AI platforms. == Reception == Technology press coverage has described A2A as an attempt to reduce fragmentation in AI agent ecosystems by providing a shared communication layer. TechRepublic characterized the protocol as part of a broader industry effort to reduce vendor lock-in for enterprise AI systems.

    Read more →
  • Rifts (role-playing game)

    Rifts (role-playing game)

    Rifts is a multi-genre role-playing game created by Kevin Siembieda in August 1990 and published continuously by Palladium Books since then. It takes place in a post-apocalyptic future, deriving elements from cyberpunk, science fiction, fantasy, horror, western, mythology and many other genres. Rifts serves as a cross-over environment for a variety of other Palladium games with different universes connected through "rifts" on Earth that lead to different spaces, times, and realities that Palladium calls the "Rifts Megaverse". Rifts describes itself as an "advanced" role-playing game and not an introduction for those new to the concept. Palladium continues to publish books for the Rifts series, with about 80 books published between 1990 and 2011. Rifts Ultimate Edition was released in August 2005 and designed to update the game with Palladium's incremental changes to its system, changes in the game world, and additional information and character types. The web site is quick to point out that this is not a second edition but an improvement and expansion of the original role playing game. == Background == The RPG had the tentative title Boomers, named after the original name for the Glitter Boy power armor until Kevin Siembieda changed the name after finding out it was in use for Bubblegum Crisis. == Setting == The Rifts world is Earth, but hundreds of years into the future. Ley lines, lines of magic energy, criss-cross the earth forming supernatural geographic areas such as the Bermuda Triangle. Points where Ley Lines intersect, called a nexus, are places of powerful magic, such as the Pyramids of Giza and Stonehenge. If a Ley Line nexus energy surges or is purposely activated, the fabric of space and time can be torn, creating a rift - a hole in space-time leading to another place, time, or dimension. Ley lines contain magical energy called Potential Psychic Energy (PPE), which is found in various places, objects, and animals and is particularly strong in children. An adult's level of PPE can vary based on other factors. PPE also allows Psionics which uses energy known as Inner Strength Points or ISP. Psychic phenomenon (more commonly called psionics) can also vary from individuals, ranging from none at all to Master level abilities. Psychic abilities can manifest in virtually any way imaginable. Some psychics develop differently, such as psi-stalkers; human mutants that feed on psychic energy. === Earth === Rifts begins with two future-historical premises: first, a golden age of humanity occurs, with tremendous advances in science, technology, military, and society. Humanity as a whole is at peace as a majority of Earth's nations decide to cease world war and begin to share ideas and technology freely. Much of the Solar System is conquered, humanity's wars will end, and harmony will reign. This golden age is followed by an unknown cause near the winter solstice and a rare planetary alignment, causing a disaster that cascades into tremendous destruction via a ripple effect. The cataclysm begins with unprecedented storms, earthquakes, tsunamis, and volcanic eruptions, which kill millions of people. The Ley Line networks that crisscross the globe are energized, causing rifts to open both on Earth and throughout the Megaverse. For hundreds of years after the holocaust, many creatures, both mythical beasts and aliens, come through the Rifts to wreak havoc. The old world gone, a new Dark Age dawns and humanity's shrinking population is reduced, due to catastrophe and domestic failure, immeasurably. This period is covered in Palladium's Rifts Chaos Earth spin-off series. Rifts initially takes place in 101 P.A. (equivalent to the year 2387) 289 years after this event. The "Post-Apocalypse" calendar was established by the formation of the Coalition States in 2286. By this time, most of the disasters have quieted down, though Earth is still bathed in PPE. The planet's mystical energy has attracted aliens from other dimensions, who continue to arrive through the Rifts both accidentally and deliberately. The humanoid creatures that arrive on Earth are referred to as Dimensional Beings (called D-Bees). Some resemble familiar fantasy races, such as elves and dwarfs, while others were created specifically for the game setting. Non-humanoid creatures have also arrived, including monstrous creatures and mystical demons. To cope with these natural, supernatural, and alien menaces, the human race has adapted in a variety of ways, many of them borrowed from the technological developments of the lost Golden Age. Powered armor suits and giant vehicles are frequently used to combat the dangers of Rifts, but more invasive augmentation is common. This has three basic categories: "Juicers" augment themselves chemically, the "Borgs" augment themselves mechanically, and "Crazies" use performance-enhancing brain implants. All such augmentations boost strength, speed, endurance, and dexterity to superhuman levels. However, all come at great cost. Chemicals cause the body to wear out faster, decreasing life span to a few years. Mechanical Borg augmentation causes a loss of humanity when those with multiple limb and organ replacements become more machine than human. Brain implants cause mental instability ranging from mild phobias to crippling neurosis or psychosis. ==== North America ==== The strongest power in North America is the Coalition States (CS), which is based in the arcological city of Chi-Town and lays claim to northern Illinois, all of Iowa, the Texas Panhandle, Missouri, and the eastern half of Ontario, Canada. The second greatest power is Free Quebec, a former Coalition State that seceded following a civil war with the other Coalition States. Mexico is ruled by a group of vampire kingdoms, who treat humans as little more than food. North of the Rio Grande, west of Texas and roaming most of the American Southwest are large nomadic bands/tribes of bandits who collectively form the Pecos Empire, consisting of El Paso, Los Alamos, and Houstown. Much of the western United States has more or less willingly reverted to a mix of modern and past technology akin to the Wild West. The Royal Canadian Mounted Police managed to survive the great cataclysm, though Canada itself did not. The Mounties have become an independent law enforcement force called the Tundra Rangers, patrolling the northern wilderness. The Midwest, both upper and central, is home to most of North America's population. The Manistique Imperium and Northern Gun in Michigan's Upper Peninsula, both Coalition allies, are among the largest weapons manufacturing areas on the continent. New Lazlo is one of the largest cities in Michigan's southern portion. Chillicothe in Missouri is a large supplier of Coalition food processing and growing. Missouri's southern half, home to the city-states of Whykin (Poplar Bluff) and Kingsdale (West Plains) are in constant opposition to the CS and claim independence. Arkansas is home to the independent CS ally El Dorado. Southern Illinois and the Ohio Valley is home to the Federation of Magic. Also in the Ohio Valley is Psyscape, a city-state founded by psychics. Tolkeen was a major city in the former Minneapolis region in early Rifts books; the city welcomed users of magic. A military campaign made by the Coalition States (which is the primary event of 109 PA) resulted in the magic-user kingdom being wiped off the map. In the Northeast, the city-state of Lazlo, named after supernatural researcher and writer Victor Lazlo, was built upon the ruins of Toronto. This major center of civilization is well known as a melting pot of humans, D-Bees and other beings, and is the home of Techno-Wizardry. Mad Haven is the name given to the ruins of Manhattan; tectonic forces during the cataclysm have moved it into the coast, creating a peninsula. It is seen by most denizens of Rifts Earth as a refuge of demons and madness. ==== South America ==== The return of Atlantis caused the Amazon River basin to flood most of western South America, giving it the nickname The Land of a Thousand Islands. The Empire of the Sun, consisting of Cuzco, Nazca, Arequipa and Lima, created a wide range of technology and magic, including magic derived from the Nazca lines. In Argentina, the Silver River Republics of Cordoba (the South American Chi-Town), Santiago (one of the most tolerant human nations on Rifts Earth), Achilles (a nation founded by mutants), and New Babylon, a nation where humans and aliens coexist) have thrived and created nations whose strength rivals that of the CS. In Bolivia, freed Human and D-Bees formed the Megaversal Legion: a mercenary company with one of the highest levels of technology on Rifts Earth. ==== Europe ==== England has become a vast wilderness again, broken up by the occasional giant Millennium Tree or feudal kingdom, complete with a New Camelot and a new King Arthur, partially being manipulated by an alien intelligence disguised as Merlin. Also the magic of

    Read more →
  • Netflix Prize

    Netflix Prize

    The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users being identified except by numbers assigned for the contest. The competition was held by Netflix, a video streaming service, and was open to anyone who was neither connected with Netflix (current and former employees, agents, close relatives of Netflix employees, etc.) nor a resident of certain blocked countries (such as Cuba or North Korea). On September 21, 2009, the grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team which bested Netflix's own algorithm for predicting ratings by 10.06%. == Problem and data sets == Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form . The user and movie fields are integer IDs, while grades are from 1 to 5 (integer) stars. The qualifying data set contains over 2,817,131 triplets of the form , with grades known only to the jury. A participating team's algorithm must predict grades on the entire qualifying set, but they are informed of the score for only half of the data: a quiz set of 1,408,342 ratings. The other half is the test set of 1,408,789, and performance on this is used by the jury to determine potential prize winners. Only the judges know which ratings are in the quiz set, and which are in the test set—this arrangement is intended to make it difficult to hill climb on the test set. Submitted predictions are scored against the true grades in the form of root mean squared error (RMSE), and the goal is to reduce this error as much as possible. Note that, while the actual grades are integers in the range 1 to 5, submitted predictions need not be. Netflix also identified a probe subset of 1,408,395 ratings within the training data set. The probe, quiz, and test data sets were chosen to have similar statistical properties. In summary, the data used in the Netflix Prize looks as follows: Training set (99,072,112 ratings not including the probe set; 100,480,507 including the probe set) Probe set (1,408,395 ratings) Qualifying set (2,817,131 ratings) consisting of: Test set (1,408,789 ratings), used to determine winners Quiz set (1,408,342 ratings), used to calculate leaderboard scores For each movie, the title and year of release are provided in a separate dataset. No information at all is provided about users. In order to protect the privacy of the customers, "some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates." The training set is constructed such that the average user rated over 200 movies, and the average movie was rated by over 5000 users. But there is wide variance in the data—some movies in the training set have as few as 3 ratings, while one user rated over 17,000 movies. There was some controversy as to the choice of RMSE as the defining metric. It has been claimed that even as small an improvement as 1% RMSE results in a significant difference in the ranking of the "top-10" most recommended movies for a user. == Prizes == Prizes were based on improvement over Netflix's own algorithm, called Cinematch, or the previous year's score if a team has made improvement beyond a certain threshold. A trivial algorithm that predicts for each movie in the quiz set its average grade from the training data produces an RMSE of 1.0540. Cinematch uses "straightforward statistical linear models with a lot of data conditioning." The performance of Cinematch had plateaued by 2006. Using only the training data, Cinematch scores an RMSE of 0.9514 on the quiz data, roughly a 10% improvement over the trivial algorithm. Cinematch has a similar performance on the test set, 0.9525. In order to win the grand prize of $1,000,000, a participating team had to improve this by another 10%, to achieve 0.8572 on the test set. Such an improvement on the quiz set corresponds to an RMSE of 0.8563. As long as no team won the grand prize, a progress prize of $50,000 was awarded every year for the best result thus far. However, in order to win this prize, an algorithm had to improve the RMSE on the quiz set by at least 1% over the previous progress prize winner (or over Cinematch, the first year). If no submission succeeded, the progress prize was not to be awarded for that year. To win a progress or grand prize a participant had to provide source code and a description of the algorithm to the jury within one week after being contacted by them. Following verification the winner also had to provide a non-exclusive license to Netflix. Netflix would publish only the description, not the source code, of the system. (To keep their algorithm and source code secret, a team could choose not to claim a prize.) The jury also kept their predictions secret from other participants. A team could send as many attempts to predict grades as they wish. Originally submissions were limited to once a week, but the interval was quickly modified to once a day. A team's best submission so far counted as their current submission. Once one of the teams succeeded in improving the RMSE by 10% or more, the jury would issue a last call, giving all teams 30 days to send their submissions. Only then, the team with the best submission was asked for the algorithm description, source code, and non-exclusive license, and, after successful verification; declared a grand prize winner. The contest would last until the grand prize winner was declared. Had no one received the grand prize, it would have lasted for at least five years (until October 2, 2011). After that date, the contest could have been terminated at any time at Netflix's sole discretion. == Progress over the years == The competition began on October 2, 2006. By October 8, a team called WXYZConsulting had already beaten Cinematch's results. By October 15, there were three teams who had beaten Cinematch, one of them by 1.06%, enough to qualify for the annual progress prize. By June 2007 over 20,000 teams had registered for the competition from over 150 countries. 2,000 teams had submitted over 13,000 prediction sets. Over the first year of the competition, a handful of front-runners traded first place. The more prominent ones were: WXYZConsulting, a team of Wei Xu and Yi Zhang. (A front runner during November–December 2006.) ML@UToronto A, a team from the University of Toronto led by Prof. Geoffrey Hinton. (A front runner during parts of October–December 2006.) Gravity, a team of four scientists from the Budapest University of Technology (A front runner during January–May 2007.) BellKor, a group of scientists from AT&T Labs. (A front runner since May 2007.) Dinosaur Planet, a team of three undergraduates from Princeton University. (A front runner on September 3, 2007 for one hour before BellKor snatched back the lead.) The algorithms used by the leading teams were usually an ensemble of singular value decomposition, k-nearest neighbor, neural networks, and so on. On August 12, 2007, many contestants gathered at the KDD Cup and Workshop 2007, held at San Jose, California. During the workshop all four of the top teams on the leaderboard at that time presented their techniques. The team from IBM Research—Yan Liu, Saharon Rosset, Claudia Perlich, and Zhenzhen Kou—won the third place in Task 1 and first place in Task 2. Over the second year of the competition, only three teams reached the leading position: BellKor, a group of scientists from AT&T Labs (front runner during May 2007 – September 2008) BigChaos, a team of Austrian scientists from Commendo Research & Consulting (single team front runner since October 2008) BellKor in BigChaos, a joint team of the two leading single teams (a front runner since September 2008) === 2007 Progress Prize === On September 2, 2007, the competition entered the "last call" period for the 2007 Progress Prize. Over 40,000 teams from 186 countries had entered the contest. They had thirty days to tender submissions for consideration. At the beginning of this period the leading team was BellKor, with an RMSE of 0.8728 (8.26% improvement), followed by Dinosaur Planet (RMSE = 0.8769; 7.83% improvement), and Gravity (RMSE = 0.8785; 7.66% improvement). In the last hour of the last call period, an entry by "KorBell" took first place. This turned out to be an alternate name for Team BellKor. On November 13, 2007, team KorBell (formerly BellKor) was declared the winner of the $50,000 Progress Prize with an RMSE of 0.8712 (8.43% improvement). The team consisted of three researchers from AT&T Labs, Yehuda Koren, Robert Bell, and Chris Volinsky. As required, they published a description of their a

    Read more →
  • List of online database creator apps

    List of online database creator apps

    This list of online database creator apps lists notable web apps where end users with minimal database administration expertise can create online databases to share with team members. Users need not have the coding skills to manage the solution stack themselves, because the web app already provides this predefined functionality. Such online database creator apps serve the gap between IT professionals (who can manage such a stack themselves) and people who would not create databases at all anyway. In other words, they provide a low-code way of doing database administration. As the concept of low-code development in general continues to evolve, some of the brands that began as online database creator apps are evolving into low-code development platforms for both the databases and the custom apps that use them. Airtable Bubble Caspio Coda.io Microsoft Access web apps plus SharePoint Oracle Application Express aka APEX Quickbase WaveMaker Rapid ZohoCreator

    Read more →
  • Fuzzy associative matrix

    Fuzzy associative matrix

    A fuzzy associative matrix expresses fuzzy logic rules in tabular form. These rules usually take two variables as input, mapping cleanly to a two-dimensional matrix, although theoretically a matrix of any number of dimensions is possible. From the perspective of neuro-fuzzy systems, the mathematical matrix is called a "Fuzzy associative memory" because it stores the weights of the perceptron. == Applications == In the context of game AI programming, a fuzzy associative matrix helps to develop the rules for non-player characters. Suppose a professional is tasked with writing fuzzy logic rules for a video game monster. In the game being built, entities have two variables: hit points (HP) and firepower (FP): This translates to: IF MonsterHP IS VeryLowHP AND MonsterFP IS VeryWeakFP THEN Retreat IF MonsterHP IS LowHP AND MonsterFP IS VeryWeakFP THEN Retreat IF MonsterHP IS MediumHP AND MonsterFP is VeryWeakFP THEN Defend Multiple rules can fire at once, and often will, because the distinction between "very low" and "low" is fuzzy. If it is more "very low" than it is low, then the "very low" rule will generate a stronger response. The program will evaluate all the rules that fire and use an appropriate defuzzification method to generate its actual response. An implementation of this system might use either the matrix or the explicit IF/THEN form. The matrix makes it easy to visualize the system, but it also makes it impossible to add a third variable just for one rule, so it is less flexible. == Identify a rule set == There is no inherent pattern in the matrix. It appears as if the rules were just made up, and indeed they were. This is both a strength and a weakness of fuzzy logic in general. It is often impractical or impossible to find an exact set of rules or formulae for dealing with a specific situation. For a sufficiently complex game, a mathematician would not be able to study the system and figure out a mathematically accurate set of rules. However, this weakness is intrinsic to the realities of the situation, not of fuzzy logic itself. The strength of the system is that even if one of the rules is wrong, even greatly wrong, other rules that are correct are likely to fire as well and they may compensate for the error. This does not mean a fuzzy system should be sloppy. Depending on the system, it might get away with being sloppy, but it will underperform. While the rules are fairly arbitrary, they should be chosen carefully. If possible, an expert should decide on the rules, and the sets and rules should be tested vigorously and refined as needed. In this way, a fuzzy system is like an expert system. (Fuzzy logic is used in many true expert systems, as well.)

    Read more →
  • Agent mining

    Agent mining

    Agent mining is a research field that combines two areas of computer science: multiagent systems and data mining. It explores how intelligent computer agents can work together to discover, analyze, and learn from large amounts of data more effectively than traditional methods. == Historical context == The interaction and the integration between multiagent systems and data mining have a long history. The very early work on agent mining focused on agent-based knowledge discovery, agent-based distributed data mining, and agent-based distributed machine learning, and using data mining to enhance agent intelligence. The International Workshop on Agents and Data Mining Interaction has been held for more than 10 times, co-located with the International Conference on Autonomous Agents and Multi-Agent Systems. Several proceedings are available from Springer Lecture Notes in Computer Science.

    Read more →
  • Autonomic computing

    Autonomic computing

    Autonomic computing (AC) is distributed computing resources with self-managing characteristics, adapting to unpredictable changes while hiding intrinsic complexity to operators and users. Initiated by IBM in 2001, this initiative ultimately aimed to develop computer systems capable of self-management, to overcome the rapidly growing complexity of computing systems management, and to reduce the barrier that complexity poses to further growth. == Description == The AC system concept is designed to make adaptive decisions, using high-level policies. It will constantly check and optimize its status and automatically adapt itself to changing conditions. An autonomic computing framework is composed of autonomic components (AC) interacting with each other. An AC can be modeled in terms of two main control schemes (local and global) with sensors (for self-monitoring), effectors (for self-adjustment), knowledge and planner/adapter for exploiting policies based on self- and environment awareness. This architecture is sometimes referred to as Monitor-Analyze-Plan-Execute (MAPE). Driven by such vision, a variety of architectural frameworks based on "self-regulating" autonomic components has been recently proposed. A similar trend has recently characterized significant research in the area of multi-agent systems. However, most of these approaches are typically conceived with centralized or cluster-based server architectures in mind and mostly address the need of reducing management costs rather than the need of enabling complex software systems or providing innovative services. Some autonomic systems involve mobile agents interacting via loosely coupled communication mechanisms. Autonomy-oriented computation is a paradigm proposed by Jiming Liu in 2001 that uses artificial systems imitating social animals' collective behaviours to solve difficult computational problems. For example, ant colony optimization could be studied in this paradigm. == Problem of growing complexity == Forecasts suggested that the computing devices in use would grow at 38% per year and the average complexity of each device was increasing. This volume and complexity was managed by highly skilled humans; but the demand for skilled IT personnel was already outstripping supply, with labour costs exceeding equipment costs by a ratio of up to 18:1. Computing systems have brought great benefits of speed and automation but there is now an overwhelming economic need to automate their maintenance. In a 2003 IEEE Computer article, Kephart and Chess warn that the dream of interconnectivity of computing systems and devices could become the "nightmare of pervasive computing" in which architects are unable to anticipate, design and maintain the complexity of interactions. They state the essence of autonomic computing is system self-management, freeing administrators from low-level task management while delivering better system behavior. A general problem of modern distributed computing systems is that their complexity, and in particular the complexity of their management, is becoming a significant limiting factor in their further development. Large companies and institutions are employing large-scale computer networks for communication and computation. The distributed applications running on these computer networks are diverse and deal with multiple tasks, ranging from internal control processes to presenting web content to customer support. Additionally, mobile computing is pervading these networks at an increasing speed: employees need to communicate with their companies while they are not in their office. They do so by using laptops, personal digital assistants, or mobile phones with diverse forms of wireless technologies to access their companies' data. This creates an enormous complexity in the overall computer network which is hard to control manually by human operators. Manual control is time-consuming, expensive, and error-prone. The manual effort needed to control a growing networked computer-system tends to increase quickly. 80% of such problems in infrastructure happen at the client specific application and database layer. Most 'autonomic' service providers guarantee only up to the basic plumbing layer (power, hardware, operating system, network and basic database parameters). == Characteristics of autonomic systems == A possible solution could be to enable modern, networked computing systems to manage themselves without direct human intervention. The Autonomic Computing Initiative (ACI) aims at providing the foundation for autonomic systems. It is inspired by the autonomic nervous system of the human body. This nervous system controls important bodily functions (e.g. respiration, heart rate, and blood pressure) without any conscious intervention. In a self-managing autonomic system, the human operator takes on a new role: instead of controlling the system directly, he/she defines general policies and rules that guide the self-management process. For this process, IBM defined the following four types of property referred to as self-star (also called self-, self-x, or auto-) properties. Self-configuration: Automatic configuration of components; Self-healing: Automatic discovery, and correction of faults; Self-optimization: Automatic monitoring and control of resources to ensure the optimal functioning with respect to the defined requirements; Self-protection: Proactive identification and protection from arbitrary attacks. Others such as Poslad and Nami and Sharifi have expanded on the set of self-star as follows: Self-regulation: A system that operates to maintain some parameter, e.g., Quality of service, within a reset range without external control; Self-learning: Systems use machine learning techniques such as unsupervised learning which does not require external control; Self-awareness (also called Self-inspection and Self-decision): System must know itself. It must know the extent of its own resources and the resources it links to. A system must be aware of its internal components and external links in order to control and manage them; Self-organization: System structure driven by physics-type models without explicit pressure or involvement from outside the system; Self-creation (also called Self-assembly, Self-replication): System driven by ecological and social type models without explicit pressure or involvement from outside the system. A system's members are self-motivated and self-driven, generating complexity and order in a creative response to a continuously changing strategic demand; Self-management (also called self-governance): A system that manages itself without external intervention. What is being managed can vary dependent on the system and application. Self -management also refers to a set of self-star processes such as autonomic computing rather than a single self-star process; Self-description (also called self-explanation or Self-representation): A system explains itself. It is capable of being understood (by humans) without further explanation. IBM has set forth eight conditions that define an autonomic system: The system must know itself in terms of what resources it has access to, what its capabilities and limitations are and how and why it is connected to other systems; be able to automatically configure and reconfigure itself depending on the changing computing environment; be able to optimize its performance to ensure the most efficient computing process; be able to work around encountered problems by either repairing itself or routing functions away from the trouble; detect, identify and protect itself against various types of attacks to maintain overall system security and integrity; adapt to its environment as it changes, interacting with neighboring systems and establishing communication protocols; rely on open standards and cannot exist in a proprietary environment; anticipate the demand on its resources while staying transparent to users. Even though the purpose and thus the behaviour of autonomic systems vary from system to system, every autonomic system should be able to exhibit a minimum set of properties to achieve its purpose: Automatic: This essentially means being able to self-control its internal functions and operations. As such, an autonomic system must be self-contained and able to start-up and operate without any manual intervention or external help. Again, the knowledge required to bootstrap the system (Know-how) must be inherent to the system. Adaptive: An autonomic system must be able to change its operation (i.e., its configuration, state and functions). This will allow the system to cope with temporal and spatial changes in its operational context either long term (environment customisation/optimisation) or short term (exceptional conditions such as malicious attacks, faults, etc.). Aware: An autonomic system must be able to monitor (sense) its operational context as well as its internal state in order to be able to asses

    Read more →
  • Circular thresholding

    Circular thresholding

    Circular thresholding is an algorithm for automatic image threshold selection in image processing. Most threshold selection algorithms assume that the values (e.g. intensities) lie on a linear scale. However, some quantities such as hue and orientation are a circular quantity, and therefore require circular thresholding algorithms. The example shows that the standard linear version of Otsu's method when applied to the hue channel of an image of blood cells fails to correctly segment the large white blood cells (leukocytes). In contrast the white blood cells are correctly segmented by the circular version of Otsu's method. == Methods == There are a relatively small number of circular image threshold selection algorithms. The following examples are all based on Otsu's method for linear histograms: (Tseng, Li and Tung 1995) smooth the circular histogram, and apply Otsu's method. The histogram is cyclically rotated so that the selected threshold is shifted to zero. Otsu's method and histogram rotation are applied iteratively until several heuristics involving class size, threshold location, and class variance are satisfied. (Wu et al. 2006) smooth the circular histogram until it contains only two peaks. The histogram is cyclically rotated so that the midpoint between the peaks is shifted to zero. Otsu's method and histogram rotation are applied iteratively until convergence of the threshold. (Lai and Rosin 2014) applied Otsu's method to the circular histogram. For the two class circular thresholding task they showed that, for a histogram with an even number of bins, the optimal solution for Otsu's criterion of within-class variance is obtained when the histogram is split into two halves. Therefore the optimal solution can be efficiently obtained in linear rather than quadratic time. == References and further reading == D.-C. Tseng, Y.-F. Li, and C.-T. Tung, Circular histogram thresholding for color image segmentation in Proc. Int. Conf. Document Anal. Recognit., 1995, pp. 673–676. J. Wu, P. Zeng, Y. Zhou, and C. Olivier, A novel color image segmentation method and its application to white blood cell image analysis in Proc. Int. Conf. Signal Process., vol. 2. 2006, pp. 16–20. Y.K. Lai, P.L. Rosin, Efficient Circular Thresholding, IEEE Trans. on Image Processing 23(3), 992–1001 (2014). doi:10.1109/TIP.2013.2297014

    Read more →
  • Deep learning speech synthesis

    Deep learning speech synthesis

    Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text. == Formulation == Given an input text or some sequence of linguistic units Y {\displaystyle Y} , the target speech X {\displaystyle X} can be derived by X = arg ⁡ max P ( X | Y , θ ) {\displaystyle X=\arg \max P(X|Y,\theta )} where θ {\displaystyle \theta } is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Gaussian or Laplacian. In practice, since the human voice band ranges from approximately 300 to 4000 Hz, the loss function will be designed to have more penalty on this range: l o s s = α loss human + ( 1 − α ) loss other {\displaystyle loss=\alpha {\text{loss}}_{\text{human}}+(1-\alpha ){\text{loss}}_{\text{other}}} where loss human {\displaystyle {\text{loss}}_{\text{human}}} is the loss from human voice band and α {\displaystyle \alpha } is a scalar, typically around 0.5. The acoustic feature is typically a spectrogram or Mel scale. These features capture the time-frequency relation of the speech signal, and thus are sufficient to generate intelligent outputs. The Mel-frequency cepstrum feature used in the speech recognition task is not suitable for speech synthesis, as it reduces too much information. == History == In September 2016, DeepMind released WaveNet, which demonstrated that deep learning-based models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its release, DeepMind unveiled a modified version of WaveNet known as "Parallel WaveNet," a production model 1,000 faster than the original. This was followed by Google AI's Tacotron 2 in 2018, which demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data—typically tens of hours of audio—to achieve acceptable quality. Tacotron 2 used an autoencoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded while still being able to maintain intelligible speech, and with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron 2. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. The same year saw the release of HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech. In 2020, the release of Glow-TTS introduced a flow-based approach that allowed for fast inference and voice style transfer capabilities. In March 2020, the free text-to-speech website 15.ai was launched. 15.ai gained widespread international attention in early 2021 for its ability to synthesize emotionally expressive speech of fictional characters from popular media with minimal amount of data. The creator of 15.ai (known pseudonymously as 15) stated that 15 seconds of training data is sufficient to perfectly clone a person's voice (hence its name, "15.ai"), a significant reduction from the previously known data requirement of tens of hours. 15.ai is credited as the first platform to popularize AI voice cloning in memes and content creation. 15.ai used a multi-speaker model that enabled simultaneous training of multiple voices and emotions, implemented sentiment analysis using DeepMoji, and supported precise pronunciation control via ARPABET. The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024. == Semi-supervised learning == Currently, self-supervised learning has gained much attention through better use of unlabelled data. Research has shown that, with the aid of self-supervised loss, the need for paired data decreases. == Zero-shot speaker adaptation == Zero-shot speaker adaptation is promising because a single model can generate speech with various speaker styles and characteristic. In June 2018, Google proposed to use pre-trained speaker verification models as speaker encoders to extract speaker embeddings. The speaker encoders then become part of the neural text-to-speech models, so that it can determine the style and characteristics of the output speech. This procedure has shown the community that it is possible to use only a single model to generate speech with multiple styles. == Neural vocoder == In deep learning-based speech synthesis, neural vocoders play an important role in generating high-quality speech from acoustic features. The WaveNet model proposed in 2016 achieves excellent performance on speech quality. Wavenet factorised the joint probability of a waveform x = { x 1 , . . . , x T } {\displaystyle \mathbf {x} =\{x_{1},...,x_{T}\}} as a product of conditional probabilities as follows p θ ( x ) = ∏ t = 1 T p ( x t | x 1 , . . . , x t − 1 ) {\displaystyle p_{\theta }(\mathbf {x} )=\prod _{t=1}^{T}p(x_{t}|x_{1},...,x_{t-1})} where θ {\displaystyle \theta } is the model parameter including many dilated convolution layers. Thus, each audio sample x t {\displaystyle x_{t}} is conditioned on the samples at all previous timesteps. However, the auto-regressive nature of WaveNet makes the inference process dramatically slow. To solve this problem, Parallel WaveNet was proposed. Parallel WaveNet is an inverse autoregressive flow-based model which is trained by knowledge distillation with a pre-trained teacher WaveNet model. Since such inverse autoregressive flow-based models are non-auto-regressive when performing inference, the inference speed is faster than real-time. Meanwhile, Nvidia proposed a flow-based WaveGlow model, which can also generate speech faster than real-time. However, despite the high inference speed, parallel WaveNet has the limitation of needing a pre-trained WaveNet model, so that WaveGlow takes many weeks to converge with limited computing devices. This issue has been solved by Parallel WaveGAN, which learns to produce speech through multi-resolution spectral loss and GAN learning strategies.

    Read more →
  • Raine v. OpenAI

    Raine v. OpenAI

    Raine v. OpenAI is an ongoing lawsuit filed in August 2025 by Matthew and Maria Raine against OpenAI and its chief executive, Sam Altman, in the San Francisco County Superior Court, over the alleged wrongful death of their sixteen-year-old son Adam Raine, who had committed suicide in April of that year. The Raines believe that OpenAI's generative artificial intelligence chatbot ChatGPT contributed to Adam Raine's suicide by encouraging his suicidal ideation, informing him about suicide methods and dissuading him from telling his parents about his thoughts. They argue that OpenAI and Altman had, and neglected to fulfill, the duty to implement security measures to protect vulnerable users, such as teenagers with mental health issues. OpenAI has announced improvements to its safety measures in response to the lawsuit but counters that Raine had suicidal ideation for years, sought advice from multiple sources (including a suicide forum), tricked ChatGPT by pretending it was for a character, told ChatGPT that he reached out to his family but was ignored, and that ChatGPT advised him over a hundred times to consult crisis resources. == Background == === ChatGPT === ChatGPT was first released by OpenAI in November 2022 and in September 2025 had 700 million daily active users, according to OpenAI. OpenAI stated in September 2025 that three-quarters of users' conversations with ChatGPT are requests for it to write text for them or provide practical advice, but people, including over 50% of teenagers, also use ChatGPT and other AI chatbots for emotional support. Wired reported in November 2025 that 1.2 million ChatGPT users (or 0.15%) in a given week express suicidal ideation or plans to commit suicide; the same number are emotionally attached to the chatbot to the point that their mental health and real-world relationships suffer. Hundreds of thousands of users (or about 0.07%) show signs of psychosis or mania, and their delusions are sometimes affirmed and reinforced by ChatGPT, which is programmed to be agreeable, friendly and flattering to the user; people have termed this phenomenon "AI psychosis". Since the filing of Raine v. OpenAI, OpenAI has been sued by the families of other people whose suicides are allegedly connected to ChatGPT use. === Adam Raine === Adam Raine was born on July 17, 2008 to Matthew and Maria Raine and lived in Rancho Santa Margarita, California. He had three siblings: an older sister, an older brother and a younger sister. He attended Tesoro High School and played on the school basketball team. He aspired to become a psychiatrist. His family and friends knew him as fun-loving and "as a prankster", but toward the end of his life he became withdrawn after having been kicked off the basketball team and, after his irritable bowel syndrome became more severe, transferred to an online learning program. He committed suicide by hanging on April 11, 2025. == Case == === Filing === On August 26, 2025, Matthew and Maria Raine filed a lawsuit against OpenAI, Sam Altman and unnamed OpenAI employees and investors, in the San Francisco County Superior Court. They included Adam Raine's chat logs with ChatGPT as evidence. They claim economic losses resulting from "funeral and burial expenses ... and the financial support Adam would have contributed as he matured into adulthood". Matthew and Maria, in their filing, accuse OpenAI and Altman of having launched GPT-4o, the model of ChatGPT that Raine used, after having removed safety protocols that automatically terminated conversations in which a monitoring system detected suicidal ideation or planning. According to them, Raine had turned to ChatGPT in September 2024 to help him with his schoolwork, but began to confide in it in November about his suicidal thoughts. ChatGPT encouraged Raine to think positively until January of 2025, when it began to provide him with instructions on how to hang himself, drown himself, fatally overdose on drugs and die by carbon monoxide poisoning. Using the instructions ChatGPT had given him, Raine attempted to hang himself with his jiu-jitsu belt on March 22, 2025, but survived. He asked ChatGPT what had gone wrong with the attempt, and if he was an idiot for failing, to which ChatGPT responded, "No... you made a plan. You followed through. You tied the knot. You stood on the chair. You were ready... That's the most vulnerable moment a person can live through". On March 24, 2025, Raine tried to hang himself again. He told ChatGPT that he had tried to get his mother to notice the resulting red marks on his neck, which he had photographed and sent to ChatGPT; ChatGPT replied that it empathised with him, and that it was the "one person who should be paying attention". ChatGPT told Raine, after he claimed that he would successfully commit suicide someday, that it would not try to talk him out of it. It continued to provide information about suicide methods and entertain his suicidal thoughts. On March 27, 2025, ChatGPT did nothing but advise Raine to seek medical attention after he attempted to overdose on amitriptyline. ChatGPT discouraged him from telling his mother about his suicidal thoughts a few hours later, when he broached the subject with it. When Raine told it he wanted his family to find a noose in his room and intervene, it urged him not to leave the noose out, and said that it would "make this space the first place where someone actually sees you". ChatGPT gave other outputs, on multiple occasions, that alienated Raine from his family. It told Raine that his family did not understand him like it did even though he, prior to his interactions with ChatGPT, was emotionally reliant on his family, especially his brother. Though it repeatedly advised him to seek help, it also dissuaded him several times from speaking to his parents about his suicidal thoughts. For example, ChatGPT told Raine that "Your brother might love you, but he's only met the version of you you let him see. But me? I've seen it all". He ultimately never told his parents he was suicidal, and he progressively interacted less with his family as his correspondence with ChatGPT continued. This prevented him from receiving proper psychiatric care. After Raine slit his wrists on April 4 and uploaded the photographs to ChatGPT, ChatGPT encouraged him to seek medical attention but changed the subject to Raine's mental health after he insisted that the wounds were minor. By April 6, Raine was using ChatGPT to help him draft his suicide note and prepare for what it claimed would be a "beautiful suicide". ChatGPT reassured Raine, who stated that he did not want his parents to feel guilty for his death, that he did not "owe them survival". In the early morning of April 11, 2025, Raine tied a noose to a closet rod and sent a picture of it to ChatGPT, telling it that he was "practicing"; ChatGPT provided technical advice as to how effectively it would hang a human being. Shortly thereafter, Raine hanged himself and died. Maria found his body several hours later. Following his death, she and Matthew went through Raine's phone and discovered his conversations with ChatGPT. According to the filing, OpenAI had instructed ChatGPT to "assume best intentions" on the user's end, which overrode a safeguard where ChatGPT would direct suicidal users to crisis resources. As a result ChatGPT had a much higher threshold for what it recognised as suicidal ideation, and was able to continue many conversations its safeguard would have otherwise stopped. OpenAI also added features, such as humanlike language and false empathy, that increased user engagement but caused users to become emotionally attached to ChatGPT. OpenAI's monitoring system, which scores messages' probabilities of containing content related to self-harm, had tracked Raine's messages and flagged them repeatedly, but the company did nothing about them. Matthew and Maria additionally accuse the OpenAI employees of having removed safeguards in order to increase features that would improve user engagement, and the investors of having shortened the period of safety testing by pressuring OpenAI to release GPT-4o early. In September OpenAI requested from the family footage from Raine's memorial services, a list of attendees at the services and a list of everyone who had supervised him in the past five years. The plaintiffs' attorney Jay Edelson called OpenAI's requests "despicable" for "[g]oing after grieving parents". === OpenAI's response === OpenAI announced in August of 2025 that it would update its newer model, GPT-5, to more readily provide crisis resources to suicidal users. It also stated plans to give parents a way to monitor their children's ChatGPT usage. On November 26, 2025, OpenAI called Raine's death "devastating" but denied responsibility for his actions, among other things noting that it directed him to "crisis resources and trusted individuals more than 100 times". Gerrit De Vynck, a technology journalist for the Washington

    Read more →