AI Email Tools

AI Email Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Medical data breach


    Medical data, including patients' identity information, health status, disease diagnosis and treatment, and biogenetic information, not only involve patients' privacy but are also especially sensitive and valuable; once leaked, they may cause patients physical and mental distress and property loss, and may even harm social stability and national security. At the same time, the development and application of medical AI rely on large amounts of medical data for algorithm training, and the larger and more diverse the data, the more accurate the resulting analysis and predictions tend to be. The application of big data technologies such as data collection, analysis and processing, cloud storage, and information sharing has, however, increased the risk of data leakage. In the United States, the rate of such breaches has increased over time, with 176 million records breached by the end of 2017. In 2024 alone, the U.S. Department of Health and Human Services reported 725 large healthcare data breaches affecting approximately 275 million individual records, a significant escalation in both the frequency and scale of incidents. == Black market for health data == In February 2015, an NPR report claimed that organized crime networks had ways of selling health data on the black market. In 2015, a Beazley employee estimated that medical records could sell on the black market for US$40-50. == How data is lost == Theft, data loss, hacking, and unauthorized account access are ways in which medical data breaches happen. Among reported breaches of medical information in the United States, networked information systems accounted for the largest number of records breached. Many data breaches in the US health care system occur among business associates of health care providers, which have continuing access to patients' data.
    == List of data breaches == In February 2024, a ransomware attack on Change Healthcare, a subsidiary of UnitedHealth Group, compromised the protected health information of approximately 100 million individuals, making it the largest healthcare data breach in United States history. The attack disrupted claims processing for healthcare providers nationwide for several weeks. In May 2024, MediSecure suffered a ransomware cyberattack in Australia. In May 2021, the Health Service Executive in the Republic of Ireland was the victim of a ransomware cyberattack; admission records and test results were present in a sample of the data reviewed by the Financial Times. In October 2018, the Centers for Medicare and Medicaid Services in the US reported that around 75,000 individual records had been affected by a data breach that took place through the ACA Agent and Broker Portal. In 2018, Social Indicators Research published evidence that 173,398,820 (over 173 million) individuals had been affected in the USA from October 2008 (when the data were collected) to September 2017 (when the statistical analysis took place). In 2015, Anthem Inc. lost data for 37 million people in the Anthem medical data breach. In 2014, 4.5 million people using Complete Health Systems had their data stolen. In 2013-14, 1 million people using the Montana Department of Public Health and Human Services had their data stolen. In 2013, 4 million people using Advocate Health and Hospitals Corporation had their data stolen. In 2011, 4.9 million users of Tricare services had their data stolen due to an employee error by Science Applications International Corporation. In 2011, 1.9 million people using Health Net had their data stolen. In 2011, 1 million people using the Nemours Foundation had their data stolen. In 2010, 6,800 people using New York-Presbyterian Hospital and Columbia University Medical Center had their data breached.
    In response, those organizations agreed to pay the United States Department of Health and Human Services a US$4.8 million fine. In 2009, 1 million people using BlueCross BlueShield of Tennessee had their data stolen. == Regulation == In the United States, the Health Insurance Portability and Accountability Act and the Health Information Technology for Economic and Clinical Health Act require companies to report data breaches to affected individuals and the federal government. Under the HIPAA Breach Notification Rule, covered entities must notify affected individuals without unreasonable delay and no later than 60 days after discovering a breach of unsecured protected health information. Breaches affecting 500 or more individuals must also be reported to the HHS Secretary and to prominent media outlets serving the affected state or jurisdiction within the same timeframe; HHS publicly lists these larger breaches on its breach portal, commonly known as the "wall of shame." Breaches affecting fewer than 500 individuals are reported to HHS annually, no later than 60 days after the end of the calendar year in which they were discovered. The regulations implementing the Health Insurance Portability and Accountability Act of 1996 (HIPAA) appear at 45 CFR Parts 160 and 164, Standards for Privacy of Individually Identifiable Health Information and Security Standards for the Protection of Electronic Protected Health Information. HIPAA includes provisions designed to save health care businesses money by encouraging electronic transactions, as well as regulations to protect the security and confidentiality of patient information. The Privacy Rule became effective April 14, 2001, and most covered entities (health plans, health care clearinghouses, and health care providers that conduct certain financial and administrative transactions electronically) had until April 2003 to comply. The Security Rule became effective April 21, 2003.
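The notification timelines described above can be sketched as a small date calculation. This is only an illustration under stated assumptions: the function name `notification_deadlines` and the way the 60 days are counted are hypothetical choices, not an official HHS tool, and the rule also requires notice "without unreasonable delay", so the dates below are outer limits rather than targets.

```python
from datetime import date, timedelta

# Values taken from the rule as summarized in the text above.
LARGE_BREACH_THRESHOLD = 500   # individuals
NOTICE_WINDOW_DAYS = 60


def notification_deadlines(discovered: date, affected: int) -> dict[str, date]:
    """Return the latest permissible notification dates (hypothetical helper).

    Assumption: the 60-day window is counted as calendar days added to the
    discovery date. Real compliance work should follow the regulation text.
    """
    individual_deadline = discovered + timedelta(days=NOTICE_WINDOW_DAYS)
    deadlines = {"individuals": individual_deadline}

    if affected >= LARGE_BREACH_THRESHOLD:
        # Large breaches: HHS and prominent media within the same timeframe.
        deadlines["hhs"] = individual_deadline
        deadlines["media"] = individual_deadline
    else:
        # Smaller breaches: reported to HHS annually, within 60 days
        # after the end of the calendar year of discovery.
        year_end = date(discovered.year, 12, 31)
        deadlines["hhs"] = year_end + timedelta(days=NOTICE_WINDOW_DAYS)
    return deadlines


large = notification_deadlines(date(2024, 3, 1), affected=600)
small = notification_deadlines(date(2022, 6, 15), affected=10)
print(large)
print(small)
```

Keeping the threshold and window as named constants makes it easy to see which numbers come from the rule and which parts are modeling choices.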
    The Health Insurance Portability and Accountability Act (HIPAA) is the baseline set of federal regulations governing medical information. It does three things: (i) establishes a structure for how personal health information is disclosed and establishes the rights of individuals with respect to health information; (ii) specifies security standards for the retention and transmission of electronic patient information; and (iii) requires a common format and data structure for the electronic exchange of health information. California-specific laws: California’s medical privacy laws, primarily the Confidentiality of Medical Information Act (CMIA), the data breach sections of the Civil Code, and sections of the Health and Safety Code, provide HIPAA-like protections, although the terminology is different. HIPAA establishes a federal "minimum standard" that applies where there are gaps in California law, and HIPAA also specifies that stricter state laws take precedence over it. California's health care privacy laws apply to providers of personal health records (PHRs), while HIPAA applies only when the PHR provider is a business associate of a covered entity. Federal law does not grant individuals the right to file a lawsuit in the event of a data breach (only the Attorney General can file a lawsuit), but California law does. This means that California law sets a higher standard for medical privacy, and that individuals in California enjoy stronger legal protections and more ways to hold entities that violate their medical privacy accountable. In the UK, the legal framework for how patient data is cared for and processed consists of the Data Protection Act 2018 (DPA), which incorporates the EU General Data Protection Regulation (GDPR) into law, and the common law duty of confidentiality (CLDC). The data protection legislation requires that the collection and processing of personal data be fair, lawful and transparent.
    This means that the collection and processing of data as defined by data protection legislation must always have a valid lawful basis and must also meet the requirements of the CLDC. In China, Article 18 of the "National Health Care Big Data Standards, Security and Services Management Measures (for Trial Implementation)" (National Health Planning and Development (2018) No. 23), promulgated by the National Health Care Commission in 2018, states: "The responsible unit shall adopt measures such as data classification, important data backup, and encryption authentication to guarantee the security of health care big data." However, the measures do not define important data or its scope. Although the "Information Security Technology-Healthcare Data Security Guide" (the "Guide") issued by the National Standardization Committee also proposes that important data should be evaluated and approved in accordance with the regulations, it likewise does not define what counts as important data.

  • ComfyUI


    ComfyUI is an open source, node-based program that allows users to generate images from a series of text prompts. It uses free diffusion models such as Stable Diffusion as the base model for its image capabilities combined with other tools such as ControlNet and LCM Low-rank adaptation with each tool being represented by a node in the program. == History == ComfyUI was released on GitHub in January 2023. According to comfyanonymous, the creator, a major goal of the project was to improve on existing software designs in terms of the user interface. The creator had been involved with Stability AI but by 3 June 2024 that involvement had ended and an organization called Comfy Org had been created along with the core developers. In July 2024, Nvidia announced support for ComfyUI within its RTX Remix modding software. In August 2024, support was added for the Flux diffusion model developed by Black Forest Labs, and Comfy Org joined the Open Model Initiative created by the Linux Foundation. As of Sept 2025, the project has 89.2k stars on GitHub. ComfyUI is one of the most popular user interfaces for Stable Diffusion, along with Automatic1111. == Features == ComfyUI's main feature is that it is node based. Each node has a function such as "load a model" or "write a prompt". The nodes are connected to form a control-flow graph called a workflow. When a prompt is queued, a highlighted frame appears around the currently executing node, starting from "load checkpoint" and ending with the final image and its save location. Workflows commonly consist of tens of nodes, forming a complex directed acyclic graph. Node types include loading a model, specifying prompts, samplers, schedulers, VAE decoders, face restoration and upscaling models, LoRAs, embeddings, and ControlNets. Several samplers are supported, such as Euler, Euler_a, dpmpp_2m_sde and dpmpp_3m_sde. Workflows can be saved to a file, allowing users to re-use node workflows and share them with other users. 
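The idea of a workflow as a directed acyclic graph of nodes that can be saved and shared can be sketched in a few lines. The node names and the `inputs` layout below are hypothetical and chosen for illustration; they are not ComfyUI's actual workflow schema or API. The sketch only shows how a node graph can be serialized to JSON and how an execution order can be derived from it.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow: each node lists the nodes whose output it consumes.
workflow = {
    "load_checkpoint": {"inputs": []},
    "positive_prompt": {"inputs": ["load_checkpoint"]},
    "negative_prompt": {"inputs": ["load_checkpoint"]},
    "sampler": {"inputs": ["load_checkpoint", "positive_prompt", "negative_prompt"]},
    "vae_decode": {"inputs": ["sampler", "load_checkpoint"]},
    "save_image": {"inputs": ["vae_decode"]},
}


def execution_order(wf: dict) -> list[str]:
    """Return one valid order in which every node runs after its inputs."""
    dependencies = {name: set(node["inputs"]) for name, node in wf.items()}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(dependencies).static_order())


# Round-trip through JSON, as a saved and re-loaded workflow file would be.
restored = json.loads(json.dumps(workflow))
order = execution_order(restored)
print(order)
```

In this sketch the order always starts at the checkpoint loader and ends at the save node, mirroring the execution highlight described above; the relative order of the two prompt nodes is not fixed by the graph.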
    Workflows are stored in JSON and can be embedded in the generated images. Users have also created custom extensions to the base system, which are exposed as new nodes, such as the extension for AnimateDiff, which aims to create videos. ComfyUI has been described as more complex than other diffusion UIs such as Automatic1111. A default node group is also included with the program. As of December 2024, 1,674 nodes were supported. ComfyUI supports multiple text-to-image models, including Stable Diffusion, Flux, and Tencent's Hunyuan-DiT, as well as custom models from Civitai such as Pony. == LLMVision extension compromise == In June 2024, a hacker group called "Nullbulge" compromised an extension of ComfyUI to add malicious code to it. The compromised extension, called ComfyUI_LLMVISION, was used for integrating the interface with the AI language models GPT-4 and Claude 3, and was hosted on GitHub. Nullbulge hosted a list of hundreds of ComfyUI users' login details across multiple services on its website, while users of the extension reported receiving numerous login notifications. vpnMentor conducted security research on the extension and claimed it could "steal crypto wallets, screenshot the user’s screen, expose device information and IP addresses, and steal files that contain certain keywords or extensions". Nullbulge's website claimed that the group targeted users who committed "one of our sins", which included AI-art generation, art theft, promoting cryptocurrency, and any other kind of theft from artists, such as from Patreon. They claimed that they were "a collective of individuals who believe in the importance of protecting artists' rights and ensuring fair compensation for their work" and that they believed that "AI-generated artwork is detrimental to the creative industry and should be discouraged".

  • Sriram Krishnan


    Sriram Krishnan (born 1984) is a tech executive and White House official, currently serving as the Senior White House Policy Advisor on Artificial Intelligence. Krishnan was named a Time Person of the Year in 2025 as an "Architect of Artificial Intelligence." He was described in Time as providing the "wake-up call that we needed" to the other AI builders, leading to "a multiyear, $500 billion initiative dubbed Stargate" to push American-made AI, as well as numerous other AI initiatives. Also in December 2025, President Trump said of Krishnan, "without him, things on AI would not function well" and cited Krishnan as the leading figure behind the American executive order on AI. As the leader of the United States' policy team regarding artificial intelligence, Krishnan plays "a significant role in shaping the administration’s approach to AI and driving measures to advance federal adoption of AI." The role calls for removing barriers to AI adoption within the government, driving vendors toward solutions suitable for federal needs, designing sensible regulation of private-sector AI, and conducting "AI diplomacy". He has stated a policy goal of "reinvigorating US dominance in emerging technologies," including AI. He also represents the United States' interests in AI abroad, such as at the Paris AI Summit. He is one of the authors of the American "AI Action Plan" released in July, 2025, which he contends is necessary to win the "existential race with China" for AI supremacy. Krishnan, a U.S. citizen born in India, is also a venture capitalist, podcaster, product manager and author. Early in his career, he led product teams at Microsoft, Twitter, Yahoo!, Facebook, and Snap. In addition to his work as an investor and technologist, he and his wife, Aarthi Ramamurthy, rose to additional prominence in 2021 as podcast hosts. He served as a general partner at the venture capital firm Andreessen Horowitz and led its London office. 
In 2022, Krishnan announced that he was working with Elon Musk on the rebuilding of Twitter following Musk's acquisition of the company. On December 22, 2024, US president-elect Donald Trump announced that Krishnan would be Senior White House Policy Advisor on Artificial Intelligence in his incoming administration; in 2026 he joined the National Economic Council. == Early life and education == Krishnan was born in Chennai, India. He earned his Bachelor of Technology in Information Technology from SRM University (2001–2005), moved to the United States in 2007 to join Microsoft, and became a naturalized U.S. citizen in 2016. == Career == === Early career === In 2007, he began working at Microsoft where he served as a program manager for Visual Studio. At Facebook, Krishnan built the Facebook Audience Network, a competitive platform to Google's ad technologies. At Twitter, he led product and core user experience, driving a 20% annual user growth rate and launching a redesigned home page and events experience. === Andreessen Horowitz === Krishnan was appointed a general partner of American venture capital firm Andreessen Horowitz ("a16z") in February 2021. He was anticipated to serve consumer and social markets, however he has also theorized on the impact of "deep tech" on society. In 2023 he was appointed to lead the firm's London office, its first non-US location. The office is expected to serve Web3 investments as well as AI and other fields. Krishnan announced that he would leave the firm at the end of 2024. === Social media and AI === In 2022, various news media reported that Krishnan was assisting Elon Musk in the revamp of Twitter following Musk's takeover of the company. Additional reports named Krishnan as the leading candidate for the role of CEO of the newly private company. Krishnan penned a 2023 New York Times opinion column regarding social media, AI, and related fields. 
    He predicted a rise in the number and diversity of online spaces due to decentralization and platforms like Farcaster, Bluesky and Mastodon. === Public office === In 2024, the Financial Times reported that Krishnan was active in international affairs, reintroducing Boris Johnson to Elon Musk following Musk's nomination to the proposed Department of Government Efficiency. Krishnan was also reported as potentially leaving a16z at the end of the year to "be jumping into something I've wanted to spend [his] energy on," which was widely reported as being related to Musk's and Vivek Ramaswamy's work at DOGE. Others reported to be involved include Joe Lonsdale, Marc Andreessen, Bill Ackman, and Travis Kalanick. On December 22, 2024, US president-elect Donald Trump announced that Krishnan would be Senior White House Policy Advisor on Artificial Intelligence in his incoming administration. On February 6, 2025, Reuters reported that Krishnan would accompany Vice President Vance to the Paris AI Summit, a "major artificial intelligence" event later that month. Other members of the White House Office of Science and Technology Policy would also attend the event, joining around 100 other countries to "focus on AI's potential." Krishnan joined a U.S. technology policy delegation to the Middle East in advance of President Trump's visit in May 2025. Conducting "AI diplomacy," Krishnan negotiated the spread of U.S. AI technologies with Crown Prince Mohammed bin Salman of Saudi Arabia, as well as other means to strengthen bilateral trade in artificial intelligence technologies. He explained that the goal of the diplomatic mission was that "we want American A.I. to spread." Krishnan, along with David Sacks and Michael Kratsios, was credited as an author of the American AI Action Plan released in July 2025.
The plan is "the administration’s most significant policy directive" regarding artificial intelligence; it calls for financing to support the global spread of American AI models and a policy to enforce neutrality in models. The Washington Post referred to the plan as a "bold action to ensure that American AI remains at the cutting edge." The AI Action Plan is a continuation of prior efforts to reduce barriers to U.S. production of AI systems and the removal of rules that were considered to hinder such growth. Later in 2025, at the POLITICO AI & Tech Summit, Krishnan called national AI development "an existential race with China." He suggested that private companies are best positioned to create new models, quipping "let them cook." He further suggested that state-by-state regulation of AI technologies may hinder national AI competitiveness. Also in 2025, at the Axios AI+ Summit, Krishnan stated that the United States and China are in a race for AI supremacy, in which the winner will be judged by market share. Winning the race is a "business strategy" to Krishnan. Krishnan was named in the 2025 Time Person of the Year article as an "AI Architect". === The Aarthi and Sriram Show and other media === In early 2021, Krishnan and his wife, Aarthi Ramamurthy, launched a Clubhouse talk show that "focuses on organic conversations on anything from startups to venture capitalism and cryptocurrencies." An early appearance by Elon Musk on the Good Time Show was described as the first show that "broke Clubhouse" by rapidly exceeding the limit of 5,000 simultaneous users. The desire to interact with a larger community led to a variety of later innovations to allow streaming and replaying of Clubhouse chats. On that episode, Elon Musk grilled Robinhood CEO Vlad Tenev regarding the GameStop trading controversy. As of December 2021, the show had over 187,000 subscribers, plus 735,000 subscribers between Krishnan and Ramamurthy's personal Clubhouse accounts. 
    Other guests have included Facebook CEO Mark Zuckerberg, Diane von Fürstenberg, Tony Hawk, MrBeast, and A.R. Rahman. In 2022, the Good Time Show moved to YouTube. It then evolved into a podcast under the name The Aarthi and Sriram Show, with both audio and video content. The Hollywood Reporter reported that the podcast had received more than 1 million downloads by early 2023. == Personal life == Krishnan is married to Aarthi Ramamurthy, co-host of The Aarthi and Sriram Show (formerly the Good Time Show) and a serial entrepreneur. They met in college in 2003 through a Yahoo! chat room related to a coding project, began dating in 2006, and eloped in 2010. == Awards == Time Person of the Year - 2025

  • Dataism


    Dataism is a term that has been used to describe the mindset or philosophy created by the emerging significance of big data. It was first used by David Brooks in The New York Times in 2013. The term has been expanded to describe what historian Yuval Noah Harari, in his book Homo Deus: A Brief History of Tomorrow from 2015, calls an emerging ideology or even a new form of religion, in which "information flow" is the "supreme value". In art, the term was used by Albert-László Barabási to refer to an artistic movement that uses data as its primary source of inspiration. == History == "If you asked me to describe the rising philosophy of the day, I'd say it is Data-ism", wrote David Brooks in The New York Times in February 2013. Brooks argued that in a world of increasing complexity, relying on data could reduce cognitive biases and "illuminate patterns of behavior we haven't yet noticed". In 2015, Steve Lohr's book Data-ism looked at how Big Data is transforming society, using the term to describe the Big Data revolution. In his 2016 book Homo Deus: A Brief History of Tomorrow, Yuval Noah Harari argues that all competing political or social structures can be seen as data processing systems: "Dataism declares that the universe consists of data flows, and the value of any phenomenon or entity is determined by its contribution to data processing" and "we may interpret the entire human species as a single data processing system, with individual humans serving as its chips." According to Harari, a Dataist should want to "maximise dataflow by connecting to more and more media". Harari predicts that the logical conclusion of this process is that, eventually, humans will give algorithms the authority to make the most important decisions in their lives, such as whom to marry and which career to pursue. Harari argues that Aaron Swartz could be called the "first martyr" of Dataism.
    In 2022, Albert-László Barabási coined the term "Dataism" to define an artistic movement that positions data as the central means of understanding nature, society, technology, and human essence. This movement underscores the necessity for art to integrate with data to stay relevant in contemporary society. Dataism responds to the intricacy and interconnectedness of modern social, economic, and technological realms, which exceed individual understanding. Advocating for the use in art of methodologies from fields such as science, business, and politics, Dataism sees this fusion as essential for art to retain its significance and influence. == Criticism == Commenting on Harari's characterisation of Dataism, security analyst Daniel Miessler believes that Dataism does not present the challenge to the ideology of liberal humanism that Harari claims, because humans will simultaneously be able to believe in their own importance and that of data. Harari himself raises some criticisms, such as the problem of consciousness, which Dataism is unlikely to illuminate. Humans may also find out that organisms are not algorithms, he suggests. Dataism implies that all data, even personal data, must be public for the system to work as a whole, a requirement that is already meeting resistance today. Other analysts, such as Terry Ortleib, have looked at the extent to which Dataism poses a dystopian threat to humanity. The Facebook–Cambridge Analytica data scandal showed how political leaders manipulated Facebook's users' data to build specific psychological profiles that went on to manipulate the network. A team of data analysts reproduced the AI technology developed by Cambridge Analytica around Facebook's data and was able to define the following rules: 10 likes enable a machine to know a person as a coworker would, 70 likes as a friend would, 150 likes as a parent would, and 300 likes as a lover would; beyond that, it may be possible to know a person better than they know themselves.

  • Containerization (computing)


    In software engineering, containerization is operating-system-level virtualization or application-level virtualization over multiple resources so that software applications can run in isolated user spaces called containers in any cloud or non-cloud environment, regardless of type or vendor. The term "container" has different meanings in different contexts, and it is important to ensure that the intended definition aligns with the audience's understanding. == Usage == Each container is basically a fully functional and portable cloud or non-cloud computing environment surrounding the application and keeping it independent of other environments running in parallel. Individually, each container simulates a different software application and runs isolated processes by bundling related configuration files, libraries and dependencies. But, collectively, multiple containers share a common operating system kernel (OS). In recent times, containerization technology has been widely adopted by cloud computing platforms like Amazon Web Services, Microsoft Azure, Google Cloud Platform, and IBM Cloud. Containerization has also been pursued by the U.S. Department of Defense as a way of more rapidly developing and fielding software updates, with first application in its F-22 air superiority fighter. == History == The concept of containerization in computing originated from early operating system–level isolation mechanisms. One of the earliest implementations was the chroot system call introduced in Version 7 Unix in 1979, which changed the apparent root directory for a process and its children, providing a basic form of filesystem isolation. In the early 2000s, more advanced forms of operating system–level virtualization were developed. FreeBSD introduced "jails" in 2000, which extended isolation by restricting processes to a subset of system resources. 
    Around the same time, Solaris introduced "zones" (also known as Solaris Containers), providing similar capabilities with resource management and isolation features. Linux later incorporated comparable functionality through kernel features such as namespaces and control groups (cgroups), which enabled isolation of process IDs, network stacks, filesystems, and resource allocation. These features formed the foundation for Linux Containers (LXC), which provided a userspace interface for managing containers. The widespread adoption of containerization accelerated with the release of Docker in 2013, which introduced a standardized format for packaging applications and their dependencies, along with tooling for image distribution and container management. == Types of containers == The two types are OS containers and application containers. == Security issues == Because the OS is shared, security threats can affect the whole containerized system. In containerized environments, security scanners generally protect the OS but not the application containers, which leaves an additional vulnerability. == Container management, orchestration, clustering == Container orchestration or container management is mostly used in the context of application containers. Implementations providing such orchestration include Kubernetes and Docker Swarm. == Container cluster management == Container clusters need to be managed. This includes functionality to create a cluster, upgrade or repair its software, balance the load between existing instances, scale by starting or stopping instances to adapt to the number of users, log activities, and monitor the produced logs or the application itself by querying sensors. Open-source implementations of such software include OKD and Rancher. A number of companies, such as Alibaba, Amazon, Google, and Microsoft, provide container cluster management as a managed service.
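The scaling step mentioned above, starting or stopping instances to adapt to the number of users, can be sketched as a simple sizing rule. Everything here is a hypothetical illustration: the function names, the per-instance capacity, and the instance limits are assumptions, and real orchestrators such as Kubernetes use their own, more elaborate policies.

```python
import math


def desired_instances(active_users: int, users_per_instance: int,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Size the cluster so each instance serves at most users_per_instance.

    Hypothetical rule: round the required count up, then clamp it to the
    configured minimum and maximum.
    """
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    needed = math.ceil(active_users / users_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(current: int, desired: int) -> tuple[str, int]:
    """Translate the size difference into a start/stop/none decision."""
    if desired > current:
        return ("start", desired - current)
    if desired < current:
        return ("stop", current - desired)
    return ("none", 0)


target = desired_instances(active_users=250, users_per_instance=100)
print(target, scaling_action(current=2, desired=target))
```

Clamping to a minimum keeps at least one instance alive when traffic drops to zero, and the maximum caps cost during spikes; both limits are design choices of this sketch rather than properties of any particular platform.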

  • Geoffrey Hinton


    Geoffrey Everest Hinton (born 6 December 1947) is a British-Canadian computer scientist, cognitive scientist, cognitive psychologist and Nobel Prize laureate known for his work on artificial neural networks, which earned him the title "the Godfather of AI". He is University Professor Emeritus at the University of Toronto. From 2013 to 2023, he divided his time working for Google Brain and the University of Toronto before publicly announcing his departure from Google in May 2023, citing concerns about the many risks of artificial intelligence (AI) technology. In 2017, he co-founded and became the chief scientific advisor of the Vector Institute in Toronto. With David Rumelhart and Ronald J. Williams, Hinton was co-author of a highly cited paper published in 1986 that popularised the backpropagation algorithm for training multi-layer neural networks, although they were not the first to propose the approach. Hinton is viewed as a leading figure in the deep learning community. The image-recognition milestone of the AlexNet designed in collaboration with his students Alex Krizhevsky and Ilya Sutskever for the ImageNet challenge 2012 was a breakthrough in the field of computer vision. Hinton received the 2018 Turing Award, together with Yoshua Bengio and Yann LeCun for their work on deep learning. They are sometimes referred to as the "Godfathers of Deep Learning" and have continued to give public talks together. He was also awarded, along with John Hopfield, the 2024 Nobel Prize in Physics for "foundational discoveries and inventions that enable machine learning with artificial neural networks". In May 2023, Hinton announced his resignation from Google to be able to "freely speak out about the risks of AI". He has voiced concerns about deliberate misuse by malicious actors, technological unemployment, and existential risk from artificial general intelligence. 
He noted that establishing safety guidelines will require cooperation among those competing in use of AI in order to avoid the worst outcomes. After receiving the Nobel Prize, he called for urgent research into AI safety to figure out how to control AI systems smarter than humans. == Education == Hinton was born on 6 December 1947 in Wimbledon in the United Kingdom and was educated at Clifton College in Bristol. In 1967, he matriculated as an undergraduate student at King's College, Cambridge and, after switching between different fields such as natural sciences, history of art, and philosophy, eventually graduated with a Bachelor of Arts in experimental psychology in 1970. He spent a year apprenticing carpentry before returning to academic studies. From 1972 to 1975, he continued his study at the University of Edinburgh, where he was awarded a PhD in artificial intelligence in 1978 for research supervised by Christopher Longuet-Higgins, who favored the symbolic AI approach over the neural network approach. == Career == After his PhD, Hinton initially worked at the University of Sussex and at the MRC Applied Psychology Unit. After having difficulty getting funding in Britain, he worked in the US at the University of California, San Diego, and Carnegie Mellon University. He was the founding director of the Gatsby Charitable Foundation Computational Neuroscience Unit at University College London. He is currently University Professor Emeritus in the Department of Computer Science at the University of Toronto, where he has been affiliated since 1987. Upon arrival in Canada, Geoffrey Hinton was appointed at the Canadian Institute for Advanced Research (CIFAR) in 1987 as a Fellow in CIFAR's first research program, Artificial Intelligence, Robotics & Society. In 2004, Hinton and collaborators successfully proposed the launch of a new program at CIFAR, "Neural Computation and Adaptive Perception" (NCAP), which today is named "Learning in Machines & Brains". 
Hinton would go on to lead NCAP for ten years. Among the members of the program are Yoshua Bengio and Yann LeCun, with whom Hinton would go on to win the ACM A.M. Turing Award in 2018. All three Turing winners continue to be members of the CIFAR Learning in Machines & Brains program. Hinton taught a free online course on Neural Networks on the education platform Coursera in 2012. He co-founded DNNresearch Inc. in 2012 with his two graduate students, Alex Krizhevsky and Ilya Sutskever, at the University of Toronto's department of computer science. In March 2013, Google acquired DNNresearch Inc. for $44 million, and Hinton planned to "divide his time between his university research and his work at Google". In May 2023, Hinton publicly announced his resignation from Google. He explained his decision, saying he wanted to "freely speak out about the risks of AI" and added that part of him now regrets his life's work. Notable former PhD students and postdoctoral researchers from his group include Peter Dayan, Sam Roweis, Max Welling, Richard Zemel, Brendan Frey, Radford M. Neal, Yee Whye Teh, Ruslan Salakhutdinov, Ilya Sutskever, Yann LeCun, Alex Graves, Zoubin Ghahramani, and Peter Fitzhugh Brown. == Research == Hinton's research concerns the use of neural networks for machine learning, memory, perception, and symbol processing. He has written or co-written more than 200 peer-reviewed publications. In the 1980s, Hinton was part of the "Parallel Distributed Processing" group at Carnegie Mellon University, which included notable scientists like Terrence Sejnowski, Francis Crick, David Rumelhart, and James McClelland. This group favoured the connectionist approach during the AI winter. Their findings were published in a two-volume set. The connectionist approach adopted by Hinton suggests that capabilities in areas like logic and grammar can be encoded into the parameters of neural networks, and that neural networks can learn them from data. 
Symbolists, on the other side, advocated for explicitly programming knowledge and rules into AI systems. In 1985, Hinton co-invented Boltzmann machines with David Ackley and Terry Sejnowski. His other contributions to neural network research include distributed representations, time delay neural networks, mixtures of experts, Helmholtz machines and products of experts. An accessible introduction to Geoffrey Hinton's research can be found in his articles in Scientific American in September 1992 and October 1993. In 1995, Hinton and colleagues proposed the wake-sleep algorithm, in which a neural network with separate pathways for recognition and generation is trained with alternating "wake" and "sleep" phases. In 2007, Hinton coauthored an unsupervised learning paper titled Unsupervised learning of image transformations. In 2008, he developed the visualization method t-SNE with Laurens van der Maaten. While Hinton was a postdoc at UC San Diego, he, David Rumelhart and Ronald J. Williams applied the backpropagation algorithm to multi-layer neural networks. Their experiments showed that such networks can learn useful internal representations of data. In a 2018 interview, Hinton said that "David Rumelhart came up with the basic idea of backpropagation, so it's his invention." Although this work was important in popularising backpropagation, it was not the first to suggest the approach. Reverse-mode automatic differentiation, of which backpropagation is a special case, was proposed by Seppo Linnainmaa in 1970, and Paul Werbos proposed to use it to train neural networks in 1974. In 2017, Hinton co-authored two open-access research papers about capsule neural networks, extending the concept of "capsule" introduced by Hinton in 2011. The architecture aims to better model part-whole relationships within objects in visual data. 
In 2021, Hinton presented GLOM, a speculative architecture idea also aiming to improve image understanding by modeling part-whole relationships in neural networks. In 2021, Hinton co-authored a widely cited paper proposing a framework for contrastive learning in computer vision. The technique involves pulling together representations of augmented versions of the same image, and pushing apart dissimilar representations. At the 2022 Conference on Neural Information Processing Systems (NeurIPS), Hinton introduced a new learning algorithm for neural networks that he calls the "Forward-Forward" algorithm. The idea is to replace the traditional forward-backwards passes of backpropagation with two forward passes, one with positive (i.e. real) data and the other with negative data that could be generated solely by the network. The Forward-Forward algorithm is well-suited for what Hinton calls "mortal computation", where the knowledge learned is not transferable to other systems and thus dies with the hardware, as can be the case for certain analog computers used for machine learning. == Honours and awards == Hinton is a Fellow of the US Association for the Advancement of Artificial Intelligence (FAAAI) since 1990. He was elected a Fellow of the Royal Society of Canada (FRSC) in 1996, and then a

    Read more →
  • Learning Applied to Ground Vehicles

    Learning Applied to Ground Vehicles

    The Learning Applied to Ground Vehicles (LAGR) program, which ran from 2004 until 2008, had the goal of accelerating progress in autonomous, perception-based, off-road navigation in robotic unmanned ground vehicles (UGVs). LAGR was funded by DARPA, a research agency of the United States Department of Defense. == History and background == While mobile robots had been in existence since the 1960s (e.g. Shakey), progress in creating robots that could navigate on their own, outdoors, off-road, on irregular, obstacle-rich terrain had been slow. In fact, no clear metrics were in place to measure progress. A baseline understanding of off-road capabilities began to emerge with the DARPA PerceptOR program, in which independent research teams fielded robotic vehicles in unrehearsed Government tests that measured average speed and the number of required operator interventions over a fixed course with widely spaced waypoints. These tests exposed the extreme challenges of off-road navigation. While the PerceptOR vehicles were equipped with sensors and algorithms that were state-of-the-art for the beginning of the 21st century, the limited range of their perception technology caused them to become trapped in natural cul-de-sacs. Furthermore, their reliance on pre-scripted behaviors did not allow them to adapt to unexpected circumstances. The overall result was that, except for essentially open terrain with minimal obstacles or along dirt roads, the PerceptOR vehicles were unable to navigate without numerous, repeated operator interventions. The LAGR program was designed to build on the methodology started in PerceptOR while seeking to overcome the technical challenges exposed by the PerceptOR tests. == LAGR goals == The principal goal of LAGR was to accelerate progress in off-road navigation of UGVs. 
Additional, synergistic goals included (1) establishing benchmarking methodology for measuring progress for autonomous robots operating in unstructured environments, (2) advancing machine vision and thus enabling long-range perception, and (3) increasing the number of institutions and individuals who were able to contribute to forefront UGV research. == Structure and rationale of the LAGR program == The LAGR program was designed to focus on developing new science for robot perception and control rather than on new hardware. Thus, it was decided to create a fleet of identical, relatively simple robots that would be supplied to the LAGR researchers, who were members of competitive teams, freeing them to concentrate on algorithm development. The teams were each given two robots of the standard design. They developed new software on these robots, and then sent the code to a government test team that then tested that code on Government robots at various test courses. These courses were located throughout the US and were not previously known to the teams. In this way, the code from all teams could be tested in essentially identical circumstances. After an initial startup period, the code development/test cycle was repeated about once every month. The standard robot was designed and built by the Carnegie Mellon University National Robotics Engineering Center (CMU NREC). The vehicles’ computers were preloaded with a modular “Baseline” perception and navigation system that was essentially the same system that CMU NREC had created for the PerceptOR program and was considered to represent the state-of-the-art at the inception of LAGR. The modular nature of the Baseline system allowed the researchers to replace parts of the Baseline code with their own modules and still have a complete working system without having to create an entire navigation system from scratch. 
Thus, for example, they were able to compare the performance of their own obstacle detection module with that of the Baseline code, while holding everything else fixed. The Baseline code also served as a fixed reference: in any environment and at any time in the program, teams’ code could be compared to the Baseline code. This rapid cycle gave the Government team and the performer teams quick feedback and allowed the Government team to design test courses that challenged the performers in specific perception tasks and whose difficulty was likely to challenge, but not overwhelm, the performers’ current capabilities. Teams were not required to submit new code for every test, but usually did. Despite this leeway, some teams found the rapid test cycle distracting to their long-term progress and would have preferred a longer interval between tests. === Phase II === To advance to Phase II, each team had to modify the Baseline code so that, on the final three government tests of Phase I, robots running the team's code averaged at least 10% faster than a vehicle running the original Baseline code. This rather modest “Go/No Go” metric was chosen to allow teams to choose risky but promising approaches that might not be fully developed in the first 18 months of the program. All 8 teams achieved this metric, with some scoring more than twice the speed of the Baseline on the later tests, which was the objective for Phase II. Note that the Phase I Go/No Go metric was such that teams were not in competition with each other for a limited number of slots in Phase II: any number of teams, from eight to zero, could make the grade. This strategy by DARPA was designed to encourage cooperation and even code sharing among the teams. == The LAGR teams == Eight teams were selected as performers in Phase I, the first 18 months of LAGR. 
The teams were from Applied Perception (Principal Investigator [PI] Mark Ollis), Georgia Tech (PI Tucker Balch), Jet Propulsion Laboratory (PI Larry Matthies), Net-Scale Technologies (PI Urs Muller), NIST (PI James Albus), Stanford University (PI Sebastian Thrun), SRI International (PI Robert Bolles), and University of Pennsylvania (PI Daniel Lee). The Stanford team resigned at the end of Phase I to focus its efforts on the DARPA Grand Challenge; it was replaced by a team from the University of Colorado, Boulder (PI Greg Grudic). Also in Phase II, the NIST team suspended its participation in the competition and instead concentrated on assembling the best software elements from each team into a single system. Roger Bostelman became PI of that effort. == The LAGR vehicle == The LAGR vehicle, which was about the size of a supermarket shopping cart, was designed to be simple to control. (A companion DARPA program, Learning Locomotion, addressed complex motor control.) It was battery powered and had two independently driven wheelchair motors in the front and two caster wheels in the rear. When the front wheels were rotated in the same direction, the robot drove either forward or in reverse. When these wheels were driven in opposite directions, the robot turned. The approximately $30,000 cost of the LAGR vehicle meant that a fleet could be built and distributed to a number of teams, expanding the field of researchers who had traditionally participated in DARPA robotics programs. The vehicle's top speed of about 3 miles per hour and relatively modest weight of ~100 kg meant that it posed a much reduced safety hazard compared to vehicles used in previous unmanned ground vehicle programs, and thus further reduced the budget required for each team to manage its robot. Nevertheless, the LAGR vehicles were sophisticated machines. Their sensor suite included 2 pairs of stereo cameras, an accelerometer, a bumper sensor, wheel encoders, and a GPS. 
The vehicle also had three computers that were user-programmable. == Scientific results == A cornerstone of the program was the incorporation of learned behaviors in the robots. In addition, the program used passive optical systems to accomplish long-range scene analysis. The difficulty of testing UGV navigation in unstructured, off-road environments made accurate, objective measurement of progress a challenging task. While no absolute measure of performance had been defined in LAGR, the relative comparison of a team's code to that of the Baseline code on a given course demonstrated whether progress was being made in that environment. By the conclusion of the program, testing showed that many of the performers had attained leaps in performance. In particular, average autonomous speeds were increased by a factor of 3 and useful visual perception was extended to ranges as far as 100 meters. While LAGR did succeed in extending the useful range of visual perception, this was primarily done by either pixel or patch-based color or texture analysis. Object recognition was not directly addressed. Even though the LAGR vehicle had a WAAS GPS, its position was never determined down to the width of the vehicle, so it was hard for the systems to re-use obstacle maps of areas the robots had previously traversed, since the GPS continually drifted. The drift was especially severe if there was a forest canopy. A few teams developed visual odometry algorithms that essentially eliminated this drift.
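The drive arrangement described for the LAGR vehicle (two independently driven front wheels, with equal wheel speeds moving the robot straight and opposite speeds turning it) matches the standard differential-drive model. The sketch below is only an illustration of that textbook model, not the LAGR Baseline code; the function name, the 0.5 m track width and the time step are all assumptions chosen for the example.

```python
import math

def differential_drive_step(x, y, heading, v_left, v_right, track_width, dt):
    """Advance a differential-drive pose (x, y, heading) by one time step.

    v_left and v_right are wheel ground speeds in m/s; track_width is the
    distance between the two driven wheels in metres (hypothetical value
    in the example below, not a LAGR specification).
    """
    v = (v_left + v_right) / 2.0              # forward speed of the robot centre
    omega = (v_right - v_left) / track_width  # turn rate in rad/s
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += omega * dt
    return x, y, heading

# Equal wheel speeds: the robot drives straight ahead.
straight = differential_drive_step(0.0, 0.0, 0.0, 1.0, 1.0, 0.5, 1.0)

# Opposite wheel speeds: the robot turns on the spot.
spin = differential_drive_step(0.0, 0.0, 0.0, -1.0, 1.0, 0.5, 1.0)
```

With equal speeds the heading stays unchanged while the position advances; with opposite speeds the position stays fixed and only the heading changes, which is the behaviour the vehicle description gives.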

    Read more →
  • MindSpore

    MindSpore

    MindSpore is an open-source software framework for deep learning, machine learning and artificial intelligence developed by Huawei. == Overview == MindSpore provides support for Python by allowing users to define models, control flow, and custom operators using native Python syntax. Unlike graph-based frameworks that require users to learn a DSL or complex APIs, MindSpore adopts a source-to-source (S2S) automatic differentiation approach, allowing Python code to be automatically transformed into optimized computational graphs. It supports the custom OpenHarmony-based HarmonyOS NEXT single core framework system built for HarmonyOS, which includes an AI system stack that comes with Huawei's own large language model, PanGu-Σ, with full MindSpore framework support. In addition, OpenHarmony offers native device-side AI support, with a training interface and an ArkTS programming interface for its NNRt (Neural Network Runtime) backend configurations, via the MindSpore Lite AI framework codebase introduced in API 11 Beta 1 of OpenHarmony 4.1. The MindSpore platform runs on Ascend AI chips, Kirin chips and other HiSilicon NPU chips. CANN (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for AI developed by Huawei. A CANN backend in OpenCV DNN gives developers the ability to run their AI models on Ascend, Kirin and other HiSilicon NPU-enabled chips. MindSpore supports cross-platform development on Android, iOS, Windows, macOS, Linux, the global OpenHarmony-based distribution Eclipse Oniro, and Huawei's Linux-based server operating systems EulerOS and openEuler. == History == On April 24, 2024, Huawei's MindSpore 2.3.RC1 was released to the open-source community with Foundation Model Training, Full-Stack Upgrade of Foundation Model Inference, Static Graph Optimization, IT Features and a new MindSpore Elec MT (MindSpore-powered magnetotelluric) Intelligent Inversion Model.

    Read more →
  • Telebirr

    Telebirr

    Telebirr (Amharic: ቴሌብር) is a mobile payment service developed and launched by Ethio telecom, the state-owned telecommunication and Internet service provider in Ethiopia. It took five months to develop the end-to-end service. It facilitates the delivery of cashless transactions. The deployed platform can currently process up to 100 transactions per second (TPS) and can be scaled up to 1000 TPS. The service is accessible via SMS, USSD, and smartphone applications. Telebirr works in five languages. == Services == Though the service is fully accessible to any customer of Ethio telecom, users need to register through the mobile application called Telebirr, through an authorized agent or Ethio telecom shop, or through Unstructured Supplementary Service Data (USSD), 127# nationally. However, Telebirr also provides a “quick registration” that uses any information that already exists in Ethio telecom's system.

    Read more →
  • Versata

    Versata

    Versata is a privately held software company, one of several business units under the ESW Capital umbrella. Versata acquires underperforming or financially struggling enterprise software companies, integrates them into their portfolio, and makes operational changes to improve the viability and performance of the companies. == History == === Early years (1991–2000) === This company was founded in 1991 with the name Image Innovations; Naren Bakshi was co-founder and president, Kevin Fletcher Tweedy was vice president of technology, and they sold a development tool set named Image Application WorkBench that worked with Plexus Software's imaging platform. In 1997, the company name changed to Vision Software. They sold a small suite of software: Vision Builder for accelerated coding; and Vision StoryBoard Pro for creating software documentation. In 1998, their flagship product was a Java development tool named Vision JADE. In January 2000, the company changed names again, this time to Versata, and their e-business automation system, Versata Logic Suite, had three components: Versata Logic Server to host business rules written in Java, Versata Studio for developing the business rules, and Versata Connectors for connecting the logic server to IBM database servers. === Public company (2000–2006) === They went public in March 2000 during the dot-com bubble, raising about $94 million and reaching a market capitalization of over $2.5 billion despite reporting just $13 million in revenue and a $21 million loss in the prior year. In November 2000, Versata expanded into the business workflow area with the acquisition of Verve, Inc. and its workflow management system by the same name. From early 2001 through mid-2003, Versata's revenues were in quarter-over-quarter decline until Alan Baratz took over as CEO. Five consecutive quarters of growth followed until early 2005, when revenues once again took a downward plunge. 
In mid-2005, the company was notified by NASDAQ that it no longer met NASDAQ's requirements for continued listing, related to maintenance of a minimum amount of shareholders' equity, market value, or net income. In July 2005, Versata was delisted from NASDAQ and began trading publicly on the OTC market (also known as the Pink Sheets). == Versata, a business unit of ESW Capital == In January 2006, Austin-based Trilogy, Inc. acquired the company and took it private. Trilogy then proceeded to merge portions of Trilogy, specifically Trilogy Technology Group, into Versata and began acquiring further companies, reorganizing dramatically and offshoring most technical positions to its office in Bangalore, India. From 2006 to 2008, Versata continued to make acquisitions, mostly in the US. Most of the employees in the acquired companies were laid off, with the majority of the work being offshored to its India office in Bangalore. In early 2009, Versata made another major overhaul of its business model when it asked all its employees in India to work as contractors through oDesk for gDev, an entity incorporated by Trilogy to manage its outsourcing activities. The only employees left in Versata were the ones in the US. == Acquisitions == (a) Corizon was acquired by Metatomix, while Metatomix was part of Versata. (b) Infopia was acquired by Everest Software, while Everest Software was part of Versata. (c) Symphony Commerce was acquired by Quantum Retail, while Quantum Retail was part of Versata. == Legal disputes == === Patent infringement and "poison pill" lawsuits with Selectica === The legal disputes with Selectica began in 2004 (before Trilogy acquired Versata in January 2006) and lasted until 2010. 
While there were many suits and counter-suits, they largely centered around three issues: 2004–2006: Patent infringement in configure, price, and quote (CPQ) software 2005–2007: Patent infringement in contract lifecycle management (CLM) software 2008–2010: The "poison pill" lawsuit In 2004, Selectica and Trilogy had competing CPQ software: Selectica sold Solutions Advisor and Deal Optimization, while Trilogy sold Selling Chain. In April of that year, Trilogy Software sued Selectica for patent infringement. In 2005, before the court ruling, Trilogy made several offers to buy Selectica, but the board rejected them. In January 2006, the court ordered Selectica to pay Trilogy $7.5 million in damages. Four days after the January 2006 judgment in the first lawsuit, Trilogy announced its acquisition of Versata for an undisclosed amount. In 2005, Selectica had acquired the Determine CLM software platform, which included features that overlapped with some offered by Versata. In October 2006, Versata filed a second patent infringement lawsuit. The case was settled in 2007, with Selectica agreeing to pay Trilogy and Versata $10 million, plus up to $7.5 million in additional contingent payments. In 2008, Versata began acquiring Selectica stock. By December, Selectica's board amended its shareholder rights plan to adopt a "poison pill" with an unusually low trigger threshold: if any shareholder acquired more than 4.99% of company stock, their ownership would be diluted. The board explained that the move was meant to protect Selectica's net operating losses (NOLs), which were tax-deductible if the company returned to profitability. Under IRS Section 382, a significant change in stock ownership could cause those NOLs to be disqualified. 
Versata intentionally triggered the poison pill and also offered to sell back the stocks at a profit (greenmailing them), which prompted a legal dispute over whether Selectica's board had the authority to set such a low threshold and whether defending NOLs justified triggering shareholder dilution. The case ultimately reached the Delaware Supreme Court, which upheld the poison pill in October 2010, ruling in favor of Selectica. === Intellectual property lawsuit over joint development with Sun Microsystems === In 1998, Sun Microsystems hired Trilogy to help Sun's developers in California create a software configurator (later named the WC5 Configurator) that Sun's customers could use to modify products they wanted to buy, customizing products to have the features they wanted. Trilogy worked on the WC5 Configurator for several years, then Sun transferred the work to Oracle to finish. Trilogy believed that they owned the copyright to the work they'd done for Sun, and in 2006 after the merger with Versata they sued Sun for more than $100 million in damages. In April 2009, a jury ruled in favor of Sun and rejected Versata's claims. === Patent lawsuit and ruling on patents of abstract ideas with SAP === SAP developed Pricing Engine, a component in their enterprise resource planning (ERP) system. It competed with an older Trilogy product called Pricer, which was part of Trilogy's Selling Chain platform in the mid-1990s before they merged with Versata. In April 2007—the year after Trilogy acquired Versata—Versata filed a lawsuit against SAP for patent infringement. In August 2009, the jury agreed with Versata and awarded them $139 million. The court granted a new trial on damages and in September 2011, in the retrial, the jury awarded Versata $345 million. This then went to the US Court of Appeals, which in May 2013 affirmed the $345 million damages award, plus interest that had accumulated. 
In October 2014, Versata and SAP settled their litigation for an undisclosed amount of money. With the dispute between Versata and SAP settled, in June 2013 the Patent Trial and Appeal Board (PTAB) reviewed the validity of the patent itself, and issued a decision in a Covered Business Method (CBM) review, stating that the disputed items were abstract ideas and thus under the US patent law not patentable. In July 2015, the Federal Circuit agreed with PTAB's decision that the challenged items were not patentable. === Trade secrets and damages dispute with Internet Brands === Internet Brands was formerly known as CarsDirect and AutoData Solutions. Like Trilogy, they made software for automakers that helped customers compare vehicles online. In the late 1990s, Trilogy and Internet Brands tried to combine their products but failed to do so, and after a December 1999 lawsuit they made a settlement agreement in May 2001. In 2008, Versata sued Internet Brands claiming they had violated the settlement agreement by making presentations to potential clients stating they had a license from Versata to use and sell Versata technical solutions; and doing so had cost Versata business with Chrysler. Internet Brands' countersuit argued that Versata had misappropriated trade secrets and asked the jury to use Versata's business relationship with Toyota—including revenue from Toyota contracts—as a benchmark to calculate damages. The jury agreed and used that data to determine a $2 million damages award in favor of Internet Brands’ subsidiary, AutoData Solutions. Versata appealed the decision, and in January 2014 the court upheld the $2 million award to Internet Brands. === Patent challenges a

    Read more →
  • AGROVOC

    AGROVOC

    AGROVOC is a multilingual controlled vocabulary covering areas of interest of the Food and Agriculture Organization of the United Nations (FAO), aiming to promote the visibility of research produced among FAO members. By March 2024, AGROVOC consisted of over 42 000 concepts and up to 1 000 000 terms in more than 42 different languages. It is a collaborative effort, the outcome of consensus among a community of experts coordinated by FAO. == History == FAO first published AGROVOC at the beginning of the 1980s in English, Spanish and French to serve as a controlled vocabulary to index publications in agricultural science and technology, especially for the International System for Agricultural Science and Technology (AGRIS). In the 1990s, AGROVOC shifted from paper printing to a digital format opting for data storage handled by a relational database. In 2004, preliminary experiments with expressing AGROVOC into the Web Ontology Language (OWL) took place. At the same time a web based editing tool was developed, then called WorkBench, nowadays VocBench. In 2009 AGROVOC became an SKOS resource. == Usage == Today, AGROVOC is available in different languages. It is employed for tagging resources, allowing searches in a specific language while providing results in many others, enhancing their visibility worldwide. Additionally, it serves for organizing knowledge to facilitate subsequent data retrieval, tagging website content for search engine discovery, standardizing agricultural information data and acting as a reference for translations. Moreover, it finds applications in fields such as data mining, big data, or artificial intelligence. Updated AGROVOC content is released once a month and is available for public use. == Maintenance == FAO coordinates the editorial activities related to the maintenance of AGROVOC. Content curation is carried out by a community of editors and institutions responsible for each of the language versions. 
VocBench is the tool used to edit and maintain AGROVOC in a distributed way. FAO also facilitates the technical maintenance of AGROVOC. == Copyright and license == Copyright for AGROVOC content in FAO languages (English, French, Spanish, Arabic, Russian and Chinese) is held by FAO, while copyright for content in other languages stays with the institutions that authored it. AGROVOC thesaurus content in English, Russian, French, Spanish, Arabic and Chinese is licensed under the international Creative Commons Attribution License (CC-BY-4.0).

    Read more →
  • CatBoost

    CatBoost

    CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework which, among other features, attempts to handle categorical features using a permutation-driven alternative to the classical algorithm. It runs on Linux, Windows and macOS and is available in Python and R; models built using CatBoost can be used for predictions in C++, Java, C#, Rust, Core ML, ONNX, and PMML. The source code is licensed under the Apache License and available on GitHub. InfoWorld magazine named the library among "The best machine learning tools" in 2017, along with TensorFlow, PyTorch, XGBoost and 8 other libraries. Kaggle listed CatBoost as one of the most frequently used machine learning (ML) frameworks in the world. It was listed among the top 8 most frequently used ML frameworks in the 2020 survey and among the top 7 in the 2021 survey. As of April 2022, CatBoost is installed about 100,000 times per day from the PyPI repository. == Features == CatBoost has gained popularity compared to other gradient boosting algorithms primarily due to the following features: native handling of categorical features; fast GPU training; visualizations and tools for model and feature analysis; use of oblivious (symmetric) trees for faster execution; and ordered boosting to overcome overfitting. == History == In 2009 Andrey Gulin developed MatrixNet, a proprietary gradient boosting library that was used in Yandex to rank search results. Since 2009 MatrixNet has been used in different projects at Yandex, including recommendation systems and weather prediction. In 2014–2015 Andrey Gulin worked with a team of researchers to start a new project called Tensornet, which was aimed at solving the problem of "how to work with categorical data". Their work resulted in several proprietary gradient boosting libraries with different approaches to handling categorical data. 
In 2016 the Machine Learning Infrastructure team led by Anna Dorogush started working on gradient boosting in Yandex, including Matrixnet and Tensornet. They implemented and open-sourced the next version of the gradient boosting library, called CatBoost, which has support for categorical and text data, GPU training, model analysis, and visualization tools. CatBoost was open-sourced in July 2017 and is under active development in Yandex and the open-source community. == Application == JetBrains uses CatBoost for code completion; Cloudflare uses CatBoost for bot detection; Careem uses CatBoost to predict future destinations of rides.
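To give a feel for what a permutation-driven treatment of categorical features can look like, here is a minimal sketch of an ordered target statistic: each row's category is encoded using only the targets of rows that come earlier in a random permutation, so a row never sees its own label. This is a simplified illustration under stated assumptions, not CatBoost's actual implementation; the function name and the `prior`, `prior_weight` and `seed` parameters are hypothetical.

```python
import random

def ordered_target_stats(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode a categorical column with an ordered target statistic.

    Rows are visited in a random order; each row is encoded from the running
    target sum and count of its category over previously visited rows only,
    smoothed towards `prior`.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)   # the random permutation of rows

    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        cat = categories[i]
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # Uses only rows seen before row i in the permutation.
        encoded[i] = (s + prior * prior_weight) / (n + prior_weight)
        # Only now is row i's own target added to the running statistics.
        sums[cat] = s + targets[i]
        counts[cat] = n + 1
    return encoded

cities = ["paris", "paris", "oslo", "paris", "oslo"]
clicked = [1, 0, 1, 1, 0]
print(ordered_target_stats(cities, clicked))
```

The first row visited for any category is encoded with the prior alone, since no earlier rows exist for it; later rows of the same category get progressively better-informed values.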

    Read more →
  • Ulead DVD MovieFactory

    Ulead DVD MovieFactory

    Corel DVD MovieFactory is a video editing and DVD authoring software product for Microsoft Windows, initially made by Ulead Systems and subsequently by Corel. It creates and authors multimedia discs in HD DVD, Blu-ray, DVD Video and DVD Audio formats. It also creates and rips Audio CDs and MP3 CDs. DVD MovieFactory is commonly bundled with many modern Toshiba Satellite laptops. The official Japanese version is known as MovieWriter.

    Read more →
  • Common Crawl

    Common Crawl

    The Common Crawl Foundation (Common Crawl) is a nonprofit 501(c)(3) organization that crawls the web and freely provides its archives and datasets to the public. Common Crawl was founded by Gil Elbaz. The data had been used primarily by researchers and some startups until the 2020s, when AI companies started training large language models using the data. In November 2025, an investigation by The Atlantic revealed that Common Crawl misled publishers when it claimed it respected paywalls in its scraping, and that it was not honoring requests from publishers to have their content removed from its databases. == History == Common Crawl was founded in 2007 in San Francisco. It began publishing its crawls in 2011. By 2013, sites like TinEye were building their products off of Common Crawl. The crawl reduces the reliance of companies and researchers on Google, which has the biggest dataset. Common Crawl was designed to have more and fresher data that was more efficient to analyze and utilize than the Wayback Machine created by the Internet Archive. By 2015, 1.8 billion webpages were on the Common Crawl, which started by crawling a list of URLs donated by the search engine Blekko. They use Amazon Web Services, which provides some of its services for free, allowing computing costs to average $2,000 to $4,000 per month. The Common Crawl website listed 30 studies based on Common Crawl data. Before 2023, Common Crawl was not very well known outside of academic researchers who utilize the data. Common Crawl received its first requests to redact information in 2023 and increasingly started seeing its crawler, CCBot, blocked. In 2023, it began receiving significant financial support from AI companies, including Anthropic and OpenAI, each of which donated $250,000. It was also used to train Google DeepMind's large language model Gemini. By April 2023, Common Crawl was capturing 3.1 billion webpages, with an estimated 5% of pages before 2021 containing hate speech or slurs. 
As of 2024, Common Crawl had been cited in more than 10,000 academic studies. By 2024, The Pile and Common Crawl were the two main datasets used to train AI models. In November 2025, an investigation by technology journalist Alex Reisner for The Atlantic revealed that Common Crawl had misled publishers when it claimed to respect paywalls in its scraping and when it said that it was honoring requests from publishers to have their content removed from its databases. The public search function on its website returned misleading results, showing no entries for websites that had requested removal of their archives, when in fact those sites were still included in the scrapes used by AI companies. As of 2025, Reisner found that CCBot was the bot most widely blocked by the top 1,000 websites. A 2026 article in LWN.net noted that one advantage of services like Common Crawl is that they can limit scraping costs for websites, since companies and researchers can download the data from Common Crawl instead of scraping the sites themselves. In April 2026, Common Crawl experimentally began to distribute its data through Hugging Face Storage Bucket, in addition to its standard storage on Amazon S3. == Organization == Peter Norvig and Joi Ito have served on the advisory board. Rich Skrenta is the executive director. Common Crawl was funded almost exclusively by the Elbaz Family Foundation Trust until 2023, when it started receiving donations from the AI industry. == Refined versions == A number of organizations take raw Common Crawl data and refine it into datasets, such as FineWeb, DCLM and C4, that filter out objectionable content or are otherwise of higher quality for their purposes. === Colossal Clean Crawled Corpus === Google's version of the Common Crawl is called the Colossal Clean Crawled Corpus, or C4 for short. It was constructed in 2019 for the training of the T5 language model series. 
As of 2023, there were concerns over both copyrighted content and racist content in C4. A 2024 study found that 45% of the content was explicitly restricted by websites' terms of service from being used for purposes like AI training by for-profit companies.
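
The text above mentions that sites increasingly block CCBot, but does not say how. One common mechanism, assumed here, is a robots.txt rule that names the crawler's user agent. The following is a minimal sketch of how such a rule can be checked with Python's standard library; the robots.txt content, the `ExampleBot` agent, and the example.com URLs are hypothetical and are not taken from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption for illustration).
# It blocks CCBot entirely and keeps other crawlers out of /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("CCBot", "ExampleBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/x"):
            verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent, url) else "blocked"
            print(f"{agent:10} {url:35} {verdict}")
```

A robots.txt rule only expresses a site's request; it has an effect only if the crawler chooses to honor it, and it is separate from the terms-of-service restrictions counted in the 2024 study.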

    Read more →
  • Liveness test

    Liveness test

    A liveness test, liveness check or liveness detection is an automated method for determining whether a subject is a real person or part of a spoofing attack. The technique is used as part of know your customer (KYC) checks in financial services and during facial age estimation. Liveness detection is a cornerstone of digital safety. == Test process == The threat in face spoofing attacks is that "the attacker only needs to find a good face swap library on Github and understand how to inject the model into the camera feed during the KYC process". Fraudsters usually buy stolen IDs on the dark web to start a deepfake attack. An AI-powered generative adversarial network (GAN) can then generate a face-swapping model that many online verification services fail to detect. Less sophisticated attackers may use face-swapping apps such as SwapFace, DeepFaceLive, and Swapstream; according to Google Trends, interest in these apps increased in 2023. In a video liveness test, users are typically asked to look into a camera and to move, smile or blink, and features of their moving face may then be compared to those of a still image. Artificial intelligence is used to counter presentation attacks, such as deepfakes or users wearing hyperrealistic masks, as well as video injection attacks. Other forms of liveness test include checking for a pulse when using a fingerprint scanner, or checking during speaker recognition that a person's voice is not a recording or artificially generated. == Adoption and certification == A 2022 report published by the security firm Sensity demonstrated that the liveness tests of most US banks were easily cheated with new and publicly available AI-powered techniques. Many of these banks disregarded the results of the report. In the first half of 2023, the security firm iProov detected a 704% increase in face-swap attacks. 
In 2023, in the UK, many Ryanair customers were upset at having to go through multiple ID verification checks, including liveness tests, before boarding, as the airline was using them as a means to deter customers from buying tickets through third-party websites. In the first half of 2024, iBeta Quality Assurance issued 18 new ISO/IEC 30107-3 Presentation Attack Detection certificates, raising the cumulative total to 85 since 2018. In January 2024, the Department of Homeland Security (DHS) opened applications for vendors wishing to have their liveness tests evaluated. Identity fraud peaked during the COVID-19 lockdown, leading government agencies to take stronger measures to secure their digital applications.

    Read more →