AI Chatbot Soulmate

AI Chatbot Soulmate — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Eyes of Things

    Eyes of Things

    Eyes of Things (EoT) is the name of a project funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement number 643924. The purpose of the project, which is funded under the Smart Cyber-physical systems topic, is to develop a generic hardware-software platform for embedded, efficient (i.e. battery-operated, wearable, mobile), computer vision, including deep learning inference. On November 29, 2018, the European Space Agency announced that it was testing the suitability of the device for space applications in advance of a flight in a Cubesat. == Motivation == EoT is based on the following tenets: Future embedded systems will have more intelligence and cognitive functionality. Vision is paramount to such intelligent capacity Unlike other sensors, vision requires intensive processing. Power consumption must be optimized if vision is to be used in mobile and wearable applications Cloud processing of edge-captured images is not sustainable. The sheer amount of visual data generated cannot be transferred to the cloud. Bandwidth is not sufficient and cloud servers cannot cope with it. == Partners == VISILAB group at University of Castilla–La Mancha (Coordinator) Movidius Awaiba Thales Security Solutions & Systems DFKI Fluxguide Evercam nVISO == Awards == 2019 Electronic Component and Systems Innovation Award by the European Commission 2018 HiPEAC Tech Transfer Award 2018 EC Innovation Radar - highlighting excellent innovations Award 2018 Internet of Things (IoT) Technology Research Award Pilot by Google 2016 Semifinalist "THE VISION SHOW STARTUP COMPETITION", Global Association for Vision Information, Boston US

    Read more →
  • Key–value database

    Key–value database

    A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them. These records are stored and retrieved using a key that uniquely identifies the record, and is used to find the data within the database. Key-value databases differ from the better known relational databases (RDB). RDBs pre-define the data structure in the database as a series of tables containing fields with well-defined data types. Exposing the data types to the database program allows it to apply various optimizations. In contrast, key-value systems treat the value as opaque to the database itself, and typically support only simple operations such as storing, retrieving, updating, and deleting a value by its key. This offers considerable flexibility and makes such systems well suited to low-latency, high-throughput workloads dominated by direct key lookups, but less suitable for applications that require complex queries or explicit relationships among records. A lack of standardization, limited transaction support, and relatively simple query interfaces long restricted many key-value systems to specialized uses, but the rapid move to cloud computing after 2010 helped drive renewed interest in them as part of the broader NoSQL movement. Some graph databases, such as ArangoDB, are also key–value databases internally, adding the concept of relationships (pointers) between records as a first-class data type. == Types and examples == Key–value systems span a wide consistency spectrum, from eventually consistent designs to strongly consistent or serializable ones, and some allow the consistency level to be configured as part of the trade-off against latency and availability. Renewed interest in key–value and other NoSQL systems was driven in part by the demands of big data, distributed, and cloud applications. Their scalability and availability made them attractive for cloud data management, although limited transaction support, low-level query interfaces, and the lack of standardization remained obstacles to wider adoption. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks. Some key–value systems add additional structure to their keys. For example, Oracle NoSQL Database organizes records using composite keys with "major" and "minor" components, an arrangement that Oracle compares to a directory-path structure in a file system. More generally, however, key–value stores are defined by their use of unique keys associated with opaque values and by their emphasis on simple key-based operations. Unix included dbm (database manager), a minimal database library written by Ken Thompson for managing associative arrays with a single key and hash-based access. Later implementations and related libraries included sdbm, GNU dbm (gdbm), and Berkeley DB. A more recent example is RocksDB, a persistent key–value storage engine developed at Facebook and designed for large-scale applications. Other examples include in-memory systems such as Memcached and Redis, and persistent systems such as Berkeley DB, Riak, and Voldemort.

    Read more →
  • Whitelist

    Whitelist

    A whitelist or allowlist is a list or register of entities that are being provided a particular privilege, service, mobility, access or recognition. Entities on the list will be accepted, approved and/or recognized. Whitelisting is the reverse of blacklisting, the practice of identifying entities that are denied, unrecognized, or ostracized. == Email whitelists == Spam filters often include the ability to "whitelist" certain sender IP addresses, email addresses or domain names to protect their email from being rejected or sent to a junk mail folder. These can be manually maintained by the user or system administrator - but can also refer to externally maintained whitelist services. === Non-commercial whitelists === Non-commercial whitelists are operated by various non-profit organizations, ISPs, and others interested in blocking spam. Rather than paying fees, the sender must pass a series of tests; for example, their email server must not be an open relay and have a static IP address. The operator of the whitelist may remove a server from the list if complaints are received. === Commercial whitelists === Commercial whitelists are a system by which an Internet service provider allows someone to bypass spam filters when sending email messages to its subscribers, in return for a pre-paid fee, either an annual or a per-message fee. A sender can then be more confident that their messages have reached recipients without being blocked, or having links or images stripped out of them, by spam filters. The purpose of commercial whitelists is to allow companies to reliably reach their customers by email. == Advertising whitelist == Many websites rely on ads as a source of revenue, but the use of ad blockers is increasingly common. Websites that detect an adblocker in use often ask for it to be disabled - or their site to be "added to the whitelist" - a standard feature of most adblockers. == Network whitelists == === LAN whitelists === A use for whitelists is in local area network (LAN) security. Many network admins set up MAC address whitelists, or a MAC address filter, to control who is allowed on their networks. This is used when encryption is not a practical solution or in tandem with encryption. However, it's sometimes ineffective because a MAC address can be faked. === IP whitelist === Firewalls can usually be configured to only allow data-traffic from/to certain (ranges of) IP-addresses. === Application whitelists === One approach in combating viruses and malware is to whitelist software which is considered safe to run, blocking all others. This is particularly attractive in a corporate environment, where there are typically already restrictions on what software is approved. Leading providers of application whitelisting technology include Bit9, Velox, McAfee, Lumension, ThreatLocker, Airlock Digital and SMAC. On Microsoft Windows, recent versions include AppLocker, which allows administrators to control which executable files are denied or allowed to execute. With AppLocker, administrators are able to create rules based on file names, publishers or file location that will allow certain files to execute. Rules can apply to individuals or groups. Policies are used to group users into different enforcement levels. For example, some users can be added to a report-only policy that will allow administrators to understand the impact before moving that user to a higher enforcement level. Linux systems typically have AppArmor and SE Linux features available which can be used to effectively block all applications which are not explicitly whitelisted, and commercial products are also available. On HP-UX introduced a feature called "HP-UX Whitelisting" on 11iv3 version. == Controversy regarding name == In 2018, a journal commentary on a report on predatory publishing was released making claims that "white" and "black" are racially charged terms that need to be avoided in instances such as "whitelist" and "blacklist". The premise of the journal is that "black" and "white" have negative and positive connotations respectively. It states that since "blacklisting" was first referred to during "the time of mass enslavement and forced deportation of Africans to work in European-held colonies in the Americas," the word is therefore related to race. There is no mention of "whitelist" and its origin or relation to race. This issue is most widely disputed in computing industries where "whitelist" and "blacklist" are prevalent (e.g. IP whitelisting). Despite the commentary nature of the journal, some companies and individuals in others have taken to replacing "whitelist" and "blacklist" with new alternatives such as "allow list" and "deny list". Those adopting this change consider using the "whitelist"/"blacklist" names as a code smell. Those that oppose these changes question its attribution to race, citing the same etymology quote that the 2018 journal uses. According to the remark, the term "blacklist" evolved from the term "black book" about a century ago. The term "black book" does not appear to have any etymology or sources that support racial associations, instead originating in the 1400s as a reference to "a list of people who had committed crimes or fallen out of favor with leaders", and popularized by King Henry VIII's literal use of a black book. Others also note the prevalence of positive and negative connotations to "white" and "black" in the Bible, predating attributions to skin tone and slavery. It wasn't until the 1960s Black Power movement that "Black" became a widespread word to refer to one's race as a person of color in America (alternate to African-American) lending itself to the argument that the negative connotation behind "black" and "blacklist" both predate attribution to race.

    Read more →
  • QANDA

    QANDA

    QANDA (stands for 'Q and A') is an AI-based learning platform developed by Mathpresso Inc., a South Korea-based education technology company. Its best known feature is a solution search, which uses optical character recognition technology to scan problems and provide step-by-step solutions and learning content. As of March 2024, QANDA solved over 6.3 billion questions. QANDA has 90 million total registered users and has reached 8 million monthly active users (MAU) in 50 countries. 90% of the cumulative users are from overseas such as Vietnam and Indonesia. In January 2024, its MathGPT, a math-specific small large language model set a new world record, surpassed Microsoft's 'ToRA 13B', the previous record holder in benchmarks assessing mathematical performance such as 'MATH' (high school math) and 'GSM8K' (grade school math). 'MathGPT' was co-developed with Upstage and KT. In March 2024, Mathpresso launched 'Cramify' (formerly known as Prep.Pie), an AI-powered study material generator designed to create personalized exam prep materials for U.S. college students. It uses generative AI to create customized study materials uploaded by students. Its features include a range of tools including study summarizer and question solver. == History == Co-founder Jongheun ‘Ray’ Lee first came up with the idea of QANDA during his freshman year in college. While he was tutoring to earn money, Lee realized that the quality of education a student receives is greatly based on their location. Lee saw his K-12 students were regularly asking similar questions and realized that these questions were from a pre-selected number of textbooks currently being used in schools. He decided to team up with his high school friend, Yongjae ‘Jake’ Lee to build a platform whereby, one uses a mobile app to scan and submit questions, and students can ask and receive detailed responses. Lee's school friends, Wonguk Jung and Hojae Jeong, joined the team. In June 2015, Mathpresso, Inc. was founded in Seoul, South Korea. In January 2016, Mathpresso's first product QANDA was launched. It supported a Q&A feature between students and tutors. In October 2017, QANDA introduced an AI-based search capability that permitted users to search for answers in seconds. In April 2020, Jake Yongjae Lee(CEO & co-founder) and Ray Jongheun Lee (co-founder) were selected as Forbes 30 under 30 Asia. In June 2021, QANDA raised $50 million in series C funding. Jake Yongjae Lee was recognized as an Innovator Under 35 by MIT Technology Review. In November 2021, QANDA secured a strategic investment from Google. Since its inception, it has received backing in Series C funding from investors namely Google, Yellowdog, GGV Capital, Goodwater Capital, KDB, and SKS Private Equity with participation from SoftBank Ventures Asia, Legend Capital, Mirae Asset Venture Investment, and Smilegate Investment. In September 2023, Mathpresso has raised $8 million (10 billion KRW) from Korea's telecom giant, KT. The total cumulative investment is about 130 million US dollars. The partnership aims to accelerate the development of an education-specific Large Language Model. The company intends to incorporate the LLM model to fortify its AI tutor, which later will be integrated into the existing services: QANDA App, B2B & B2G Saas, and 1:1 online tutoring (QANDA Tutor). == Features == QANDA features OCR-based solution search, one-on-one Q&A tutoring, a study timer. In 2021, QANDA launched additional features, including the premium subscription model that offers unlimited “byte-sized” micro-video lectures and the community feature that enhances collaborative learning. In 2021, QANDA launched QANDA Tutor, a tablet-based 1:1 tutoring service and QANDA Study, a 1:N online school in Vietnam. In 2022, QANDA launched an exam prep feature that offers past exam materials from school via online. This feature is currently available in South Korea. In August 2023, QANDA launched a beta version of an LLM-powered AI Tutor. == Awards and recognition == Best Hidden Gems of 2017 by Google Playstore 2018 AWS AI Startup Challenge Award National representative for the Google AI for Social Good APAC, 2018 Best Self-Improvement Apps of 2018 by Google Playstore GSV Edtech 150 — the Most Transformational Growth Companies in Digital Learning Speaker at the Google App Summit, 2021 Selected as a prospect unicorn company by Korea Technology Finance Corporation in 2023 Winner of G20-DIA Global Pitching in 2023 2021, 2022, 2023 East Asia EdTech 150 by HolonIQ

    Read more →
  • Multimodal representation learning

    Multimodal representation learning

    Multimodal representation learning is a subfield of representation learning focused on integrating and interpreting information from different modalities, such as text, images, audio, or video, by projecting them into a shared latent space. This allows for semantically similar content across modalities to be mapped to nearby points within that space, facilitating a unified understanding of diverse data types. By automatically learning meaningful features from each modality and capturing their inter-modal relationships, multimodal representation learning enables a unified representation that enhances performance in cross-media analysis tasks such as video classification, event detection, and sentiment analysis. It also supports cross-modal retrieval and translation, including image captioning, video description, and text-to-image synthesis. == Motivation == The primary motivations for multimodal representation learning arise from the inherent nature of real-world data and the limitations of unimodal approaches. Since multimodal data offers complementary and supplementary information about an object or event from different perspectives, it is more informative than relying on a single modality. A key motivation is to narrow the heterogeneity gap that exists between different modalities by projecting their features into a shared semantic subspace. This allows semantically similar content across modalities to be represented by similar vectors, facilitating the understanding of relationships and correlations between them. Multimodal representation learning aims to leverage the unique information provided by each modality to achieve a more comprehensive and accurate understanding of concepts. These unified representations are crucial for improving performance in various cross-media analysis tasks such as video classification, event detection, and sentiment analysis. They also enable cross-modal retrieval, allowing users to search and retrieve content across different modalities. Additionally, it facilitates cross-modal translation, where information can be converted from one modality to another, as seen in applications like image captioning and text-to-image synthesis. The abundance of ubiquitous multimodal data in real-world applications, including understudied areas like healthcare, finance, and human-computer interaction (HCI), further motivates the development of effective multimodal representation learning techniques. == Approaches and methods == === Canonical-correlation analysis based methods === Canonical-correlation analysis (CCA) was first introduced in 1936 by Harold Hotelling and is a fundamental approach for multimodal learning. CCA aims to find linear relationships between two sets of variables. Given two data matrices X ∈ R n × p {\displaystyle X\in \mathbb {R} ^{n\times p}} and Y ∈ R n × q {\displaystyle Y\in \mathbb {R} ^{n\times q}} representing different modalities, CCA finds projection vectors w x ∈ R p {\displaystyle w_{x}\in \mathbb {R} ^{p}} and w y ∈ R q {\displaystyle w_{y}\in \mathbb {R} ^{q}} that maximizes the correlation between the projected variables: ρ = max w x , w y w x ⊤ Σ x y w y w x ⊤ Σ x x w x w y ⊤ Σ y y w y {\displaystyle \rho =\max _{w_{x},w_{y}}{\frac {w_{x}^{\top }\Sigma _{xy}w_{y}}{{\sqrt {w_{x}^{\top }\Sigma _{xx}w_{x}}}{\sqrt {w_{y}^{\top }\Sigma _{yy}w_{y}}}}}} such that Σ x x {\displaystyle \Sigma _{xx}} and Σ y y {\displaystyle \Sigma _{yy}} are the within-modality covariance matrices, and Σ x y {\displaystyle \Sigma _{xy}} is the between-modality covariance matrix. However, standard CCA is limited by its linearity, which led to the development of nonlinear extensions, such as kernel CCA and deep CCA. ==== Kernel CCA ==== Kernel canonical correlation analysis (KCCA) extends traditional CCA to capture nonlinear relationships between modalities by implicitly mapping the data into high dimensional feature spaces using kernel functions. Given kernel functions K x {\displaystyle K_{x}} and K y {\displaystyle K_{y}} with corresponding Gram matrices K x ∈ R n × n {\displaystyle K_{x}\in \mathbb {R} ^{n\times n}} and K y ∈ R n × n {\displaystyle K_{y}\in \mathbb {R} ^{n\times n}} , KCCA seeks coefficients α {\displaystyle \alpha } and β {\displaystyle \beta } that maximize: ρ = max α , β α ⊤ K x K y β α ⊤ K x 2 α β ⊤ K y 2 β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{\top }K_{x}Ky\beta }{{\sqrt {\alpha ^{\top }K_{x}^{2}\alpha }}{\sqrt {\beta ^{\top }K_{y}^{2}\beta }}}}} To prevent overfitting, regularization terms are typically added, resulting in: ρ = max α , β α T K x K y β α T ( K x 2 + λ x K x ) α β T ( K y 2 + λ y K y ) β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{T}K_{x}K_{y}\beta }{{\sqrt {\alpha ^{T}\left(K_{x}^{2}+\lambda _{x}K_{x}\right)\alpha }}{\sqrt {\;\beta ^{T}\left(K_{y}^{2}+\lambda _{y}K_{y}\right)\beta }}}}} where λ x {\displaystyle \lambda _{x}} and λ y {\displaystyle \lambda _{y}} are regularization parameters. KCCA has proven effective for tasks such as cross-modal retrieval and semantic analysis, though it faces computational challenges with large datasets due to its O ( n 2 ) {\displaystyle O(n^{2})} memory requirement for sorting kernel matrices. KCCA was proposed independently by several researchers. ==== Deep CCA ==== Deep canonical correlation analysis (DCCA), introduced in 2013, employs neural networks to learn nonlinear transformations for maximizing the correlation between modalities. DCCA uses separate neural networks f x {\displaystyle f_{x}} and f y {\displaystyle f_{y}} for each modality to transform the original data before applying CCA: max W x , W y , θ x , θ y corr ⁡ ( f x ( X ; θ x ) , f y ( Y ; θ y ) ) {\displaystyle \max _{W_{x},W_{y},\theta _{x},\theta _{y}}\operatorname {corr} \left(f_{x}(X;\theta _{x}),f_{y}(Y;\theta _{y})\right)} where θ x {\displaystyle \theta _{x}} and θ y {\displaystyle \theta _{y}} represent the parameters of the neural networks, and W x {\displaystyle W_{x}} and W y {\displaystyle W_{y}} are the CCA projection matrices. The correlation objective is computed as: corr ⁡ ( H x , H y ) = tr ⁡ ( T − 1 / 2 H x T H y S − 1 / 2 ) {\displaystyle \operatorname {corr} (H_{x},H_{y})=\operatorname {tr} \left(T^{-1/2}H_{x}^{T}H_{y}S^{-1/2}\right)} where H x = f x ( X ) {\displaystyle H_{x}=f_{x}(X)} and H y = f y ( Y ) {\displaystyle H_{y}=f_{y}(Y)} are the network outputs, T = H x T H x + r x I {\displaystyle T=H_{x}^{T}H_{x}+r_{x}I} , S = H y T H y + r y I {\displaystyle S=H_{y}^{T}H_{y}+r_{y}I} and r x , r y {\displaystyle r_{x},r_{y}} are the regularization parameters. DCCA overcomes the limitations of linear CCA and kernel CCA by learning complex nonlinear relationships while maintaining computational efficiency for large datasets through mini-batch optimization. === Graph-based methods === Graph-based approaches for multimodal representation learning leverage graph structure to model relationships between entities across different modalities. These methods typically represent each modality as a graph and then learn embedding that preserve cross-modal similarities, enabling more effective joint representation of heterogeneous data. One such method is cross-modal graph neural networks (CMGNNs) that extend traditional graph neural networks (GNNs) to handle data from multiple modalities by constructing graphs that capture both intra-modal and inter-modal relationships. These networks model interactions across modalities by representing them as nodes and their relationships as edges. Other graph-based methods include Probabilistic Graphical Models (PGMs) such as deep belief networks (DBN) and deep Boltzmann machines (DBM). These models can learn a joint representation across modalities, for instance, a multimodal DBN achieves this by adding a shared restricted Boltzmann Machine (RBM) hidden layer on top of modality-specific DBNs. Additionally, the structure of data in some domains like Human-Computer Interaction (HCI), such as the view hierarchy of app screens, can potentially be modeled using graph-like structures. The field of graph representation learning is also relevant, with ongoing progress in developing evaluation benchmarks. === Diffusion maps === Another set of methods relevant to multimodal representation learning are based on diffusion maps and their extensions to handle multiple modalities. ==== Multi-view diffusion maps ==== Multi-view diffusion maps address the challenge of achieving multi-view dimensionality reduction by effectively utilizing the availability of multiple views to extract a coherent low-dimensional representation of the data. The core idea is to exploit both the intrinsic relations within each view and the mutual relations between the different views, defining a cross-view model where a random walk process implicitly hops between objects in different views. A multi-view kernel matrix is constructed by combining these relations, defining a cross-view diffusion process and associ

    Read more →
  • Camfecting

    Camfecting

    In computer security, camfecting is the process of attempting to hack into a person's webcam and activate it without the webcam owner's permission. The remotely activated webcam can be used to watch anything within the webcam's field of vision, sometimes including the webcam owner themselves. Camfecting is most often carried out by infecting the victim's computer with a virus that can provide the hacker access to their webcam. This attack is specifically targeted at the victim's webcam, and hence the name camfecting, a portmanteau of the words camera and infecting. Typically, a webcam hacker or a camfecter sends his victim an innocent-looking application which has a hidden Trojan software through which the camfecter can control the victim's webcam. The camfecter virus installs itself silently when the victim runs the original application. Once installed, the camfecter can turn on the webcam and capture pictures/videos. The camfecter software works just like the original webcam software present in the victim computer, the only difference being that the camfecter controls the software instead of the webcam's owner. == Notable cases == Marcus Thomas, former assistant director of the FBI's Operational Technology Division in Quantico, said in a 2013 story in The Washington Post that the FBI had been able to covertly activate a computer's camera—without triggering the light that lets users know it is recording—for several years. In November 2013, American teenager Jared James Abrahams pleaded guilty to hacking over 100-150 women and installing the highly invasive malware Blackshades on their computers in order to obtain nude images and videos of them. One of his victims was Miss Teen USA 2013 Cassidy Wolf. Researchers from Johns Hopkins University have shown how to covertly capture images from the iSight camera on MacBook and iMac models released before 2008, by reprogramming the microcontroller's firmware. == Prevention == A computer that does not have an up-to-date webcam software or any anti-virus (or firewall) software installed and operational may be at increased risk for camfecting from different types of malware. Softcams may nominally increase this risk, if not maintained or configured properly. Although a person cannot protect themselves from zero-day exploits that could potentially activate a camera unknowingly, such as Pegasus is able to do on smartphones. The only way to truly avoid being watched through your own camera is by blocking it physically, since software blocks can be overriden by advanced persistent threats. A simple piece of tape is more commonly used to offuscate the feed of the camera. With even Mark Zuckerberg doing so on his personal laptop that appeared during a presentation. And it being the way Snowden, an ex-contractor for the NSA, is portrayed to do so to prevent camfecting in the biopic Snowden. There is now a market for the manufacture and sale of sliding lens covers that allow users to physically block their computer's camera and, in some cases, microphone. A number of phone and laptop manufacturers tried to implement pop-up cameras that can only be opened manually by the user. But the trend did not become mainstream because of the engineering it took to keep the mechanisms up to date, aswell as the fragility and durability of the cameras.

    Read more →
  • Halloween Problem

    Halloween Problem

    In computing, the Halloween Problem refers to a phenomenon in databases in which an update operation causes a change in the physical location of a row, potentially allowing the row to be visited again later in the same update operation. This could even cause an infinite loop in some cases where updates continually place the updated record ahead of the scan performing the update operation. The potential for this database error was first discovered by Don Chamberlin, Pat Selinger, and Morton Astrahan in the mid-1970s, on Halloween day, while working on query optimization. They wrote a SQL query supposed to give a ten percent raise to every employee who earned less than $25,000. This query would run successfully, with no errors, but when finished all the employees in the database earned at least $25,000, because it kept giving them a raise until they reached that level. The expectation was that the query would iterate over each of the employee records with a salary less than $25,000 precisely once. In fact, because even updated records were visible to the query execution engine and so continued to match the query's criteria, salary records were matching multiple times and each time being given a 10% raise until they were all greater than $25,000. Contrary to what some believe, the name is not descriptive of the nature of the problem but rather was given due to the day it was discovered on. As recounted by Don Chamberlin: Pat and Morton discovered this problem on Halloween... I remember they came into my office and said, "Chamberlin, look at this. We have to make sure that when the optimizer is making a plan for processing an update, it doesn't use an index that is based on the field that is being updated. How are we going to do that?" It happened to be on a Friday, and we said, "Listen, we are not going to be able to solve this problem this afternoon. Let's just give it a name. We'll call it the Halloween Problem and we'll work on it next week." And it turns out it has been called that ever since.

    Read more →
  • International Road Traffic and Accident Database

    International Road Traffic and Accident Database

    The International Road Traffic and Accident Database (IRTAD) is an initiative dedicated to compiling and analyzing global road crash data. It is managed by the International Transport Forum (ITF) under the auspices of its permanent working group, which specializes in road safety, commonly referred to as the IRTAD Group. The primary objective of IRTAD is to provide a robust empirical basis for international comparisons in the field of road safety and to offer data to support the formulation of effective road safety policies. == Data availability == A portion of the data gathered by IRTAD is accessible for free through the OECD statistics website, however the remaining data requires a subscription for access. == History == The IRTAD database was originally started in 1988 by Germany's Federal Institution for Roads (BASt) in response to demands for international comparative data. It was later taken over and expanded by the International Transport Forum and has grown to be an important resource for comparing road safety metrics between countries worldwide, although mostly in the developed world. Every year, the ITF publishes comparative and country-by-country road safety data gathered for the IRTAD database and analysed by the IRTAD Group in the ITF Road Safety Annual Report, informally known as "IRTAD Report". Over the years, the IRTAD acronym has come to stand not only for the database, but also for the Traffic Safety Data and Analysis Group (usually referred to as IRTAD Group). The IRTAD Group is the International Transport Forum's permanent working group on road safety. It consists of a group of international road safety experts drawn from national road administrations, road safety research institutes, International organizations, automobile associations, insurance companies, car manufacturers and other road safety stakeholders. The IRTAD Group is a major forum for international road safety collaboration and exchange of best practices. Its focus is on improving road safety data as a basis for targeting interventions that are effective in reducing the number of road deaths and serious traffic injuries. The work of IRTAD, among that of others, has spawned the creation of road safety observatories for different world regions: the Ibero-American Road Safety Observatory Archived 2020-06-28 at the Wayback Machine (OISEVI), the African Road Safety Observatory Archived 2020-06-10 at the Wayback Machine, and the South-East Asian Road Safety Observatory. The ITF supports OISEVI through the Spanish-language IRTAD-LAC database and is actively involved in the implementation of the African and South East-Asian observatories. The genesis of the road safety observatory movement dates back to 2008, when the ITF, via IRTAD, began to facilitate twinning between countries striving to improve their road safety record and countries with high road safety performance. The initial twinning was between Jamaica and the United Kingdom. This work was supported by the World Bank, the Inter-American Development Bank (IADB) and the FIA Foundation. The twinning between Argentina and Spain in 2011 led to the creation of OISEVI. To this day, the ITF supports OISEVI through the Spanish-language IRTAD-LAC database. In 2006, the ITF set up Safer City Streets, a global traffic safety network for cities that replicates the successful IRTAD approach for urban road safety.

    Read more →
  • Chirplet transform

    Chirplet transform

    In signal processing, the chirplet transform is an inner product of an input signal with a family of analysis primitives called chirplets. Similar to the wavelet transform, chirplets are usually generated from (or can be expressed as being from) a single mother chirplet (analogous to the so-called mother wavelet of wavelet theory). == Definitions == The term chirplet transform was coined by Steve Mann, as the title of the first published paper on chirplets. The term chirplet itself (apart from chirplet transform) was also used by Steve Mann, Domingo Mihovilovic, and Ronald Bracewell to describe a windowed portion of a chirp function. In Mann's words: A wavelet is a piece of a wave, and a chirplet, similarly, is a piece of a chirp. More precisely, a chirplet is a windowed portion of a chirp function, where the window provides some time localization property. In terms of time–frequency space, chirplets exist as rotated, sheared, or other structures that move from the traditional parallelism with the time and frequency axes that are typical for waves (Fourier and short-time Fourier transforms) or wavelets. The chirplet transform thus represents a rotated, sheared, or otherwise transformed tiling of the time–frequency plane. Although chirp signals have been known for many years in radar, pulse compression, and the like, the first published reference to the chirplet transform described specific signal representations based on families of functions related to one another by time–varying frequency modulation or frequency varying time modulation, in addition to time and frequency shifting, and scale changes. In that paper, the Gaussian chirplet transform was presented as one such example, together with a successful application to ice fragment detection in radar (improving target detection results over previous approaches). The term chirplet (but not the term chirplet transform) was also proposed for a similar transform, apparently independently, by Mihovilovic and Bracewell later that same year. == Applications == The first practical application of the chirplet transform was in water-human-computer interaction (WaterHCI) for marine safety, to assist vessels in navigating through ice-infested waters, using marine radar to detect growlers (small iceberg fragments too small to be visible on conventional radar, yet large enough to damage a vessel). Other applications of the chirplet transform in WaterHCI include the SWIM (Sequential Wave Imprinting Machine). More recently other practical applications have been developed, including image processing (e.g. where there is periodic structure imaged through projective geometry), as well as to excise chirp-like interference in spread spectrum communications, in EEG processing, and Chirplet Time Domain Reflectometry. == Extensions == The warblet transform is a particular example of the chirplet transform introduced by Mann and Haykin in 1992 and now widely used. It provides a signal representation based on cyclically varying frequency modulated signals (warbling signals).

    Read more →
  • Zé Delivery

    Zé Delivery

    Zé Delivery is a startup developed by Brazilian drinks company AmBev which offers an app for delivering drinks. The app is available for Android and iOS. Created in 2016 by AmBev's ZX Ventures hub, the service has an international presence in Argentina, Paraguay, Bolivia, Panama and the Dominican Republic. It is also present in more than 300 Brazilian cities. Because it has an extensive category of alcoholic beverages, the service is only used by people over 18. It also offers soft drinks, juices, energy drinks and other non-alcoholic beverages.

    Read more →
  • Data event

    Data event

    A data event is a relevant state transition defined in an event schema. Typically, event schemata are described by pre- and post condition for a single or a set of data items. In contrast to ECA (Event condition action), which considers an event to be a signal, the data event not only refers to the change (signal), but describes specific state transitions, which are referred to in ECA as conditions. Considering data events as relevant data item state transitions allows defining complex event-reaction schemata for a database. Defining data event schemata for relational databases is limited to attribute and instance events. Object-oriented databases also support collection properties, which allows defining changes in collections as data events, too.

    Read more →
  • Mixed raster content

    Mixed raster content

    Mixed raster content (MRC) is a method for compressing images that contain both binary-compressible text and continuous-tone components, using image segmentation methods to improve the level of compression and the quality of the rendered image. By separating the image into components with different compressibility characteristics, the most efficient and accurate compression algorithm for each component can be applied. MRC-compressed images are typically packaged into a hybrid file format such as DjVu and sometimes PDF. This allows for multiple images, and the instructions to properly render and reassemble them, to be stored within a single file. Some image scanners optionally support MRC when scanning to PDF. A typical manual states that without MRC, the image is generated in a single process, with text and graphics not distinguished. With MRC, separate processes are used for text, graphics, and other elements, producing clearer graphics and sharper text, at the price of slightly slower processing. MRC is recommended to optimise the scanning of documents with harder-to-read text or lower-quality graphics. MRC can also reduce the size of the scanned file, though higher compression using JBIG2 can sometimes lead to character substitution errors in scanned documents. == File format == A form of MRC is defined by international standard bodies as ISO/IEC 16485, or ITU recommendation T.44 (accessible free of charge). It defines a file format with bilevel masks and two data layers in each "stripe" of the image. The mask can be encoded in ITU T.4, JBIG1, or JBIG2, while the images can be JPEG, JBIG1, or run-length encoded color. The format is loosely based on JPEG, with a APP13 segment registered for this purpose. It is not known whether this file format is actually used, as formats like DjVu and PDF have their own ways of defining layers and masks.

    Read more →
  • Recursive self-improvement

    Recursive self-improvement

    Recursive self-improvement (RSI) is a process in which early artificial general intelligence (AGI) systems rewrite their own computer code, causing an intelligence explosion resulting from enhancing their own capabilities and intellectual capacity, theoretically resulting in superintelligence. The development of recursive self-improvement raises significant ethical and safety concerns, as such systems may evolve in unforeseen ways and could potentially surpass human control or understanding. == Seed improver == The concept of a "seed improver" architecture is a foundational framework that equips an AGI system with the initial capabilities required for recursive self-improvement. This might come in many forms or variations. The term "Seed AI" was coined by Eliezer Yudkowsky. === Hypothetical example === The concept begins with a hypothetical "seed improver", an initial code-base developed by human engineers that equips an advanced future large language model (LLM) built with strong or expert-level capabilities to program software. These capabilities include planning, reading, writing, compiling, testing, and executing arbitrary code. The system is designed to maintain its original goals and perform validations to ensure its abilities do not degrade over iterations. ==== Initial architecture ==== The initial architecture includes a goal-following autonomous agent, that can take actions, continuously learns, adapts, and modifies itself to become more efficient and effective in achieving its goals. The seed improver may include various components such as: Recursive self-prompting loop Configuration to enable the LLM to recursively self-prompt itself to achieve a given task or goal, creating an execution loop which forms the basis of an agent that can complete a long-term goal or task through iteration. Basic programming capabilities The seed improver provides the AGI with fundamental abilities to read, write, compile, test, and execute code. This enables the system to modify and improve its own codebase and algorithms. Goal-oriented design The AGI is programmed with an initial goal, such as "improve your capabilities". This goal guides the system's actions and development trajectory. Validation and Testing Protocols An initial suite of tests and validation protocols that ensure the agent does not regress in capabilities or derail itself. The agent would be able to add more tests in order to test new capabilities it might develop for itself. This forms the basis for a kind of self-directed evolution, where the agent can perform a kind of artificial selection, changing its software as well as its hardware. ==== General capabilities ==== This system forms a sort of generalist Turing-complete programmer which can in theory develop and run any kind of software. The agent might use these capabilities to for example: Create tools that enable it full access to the internet, and integrate itself with external technologies. Clone/fork itself to delegate tasks and increase its speed of self-improvement. Modify its cognitive architecture to optimize and improve its capabilities and success rates on tasks and goals, this might include implementing features for long-term memories using techniques such as retrieval-augmented generation (RAG), develop specialized subsystems, or agents, each optimized for specific tasks and functions. Develop new and novel multimodal architectures that further improve the capabilities of the foundational model it was initially built on, enabling it to consume or produce a variety of information, such as images, video, audio, text and more. Plan and develop new hardware such as chips, in order to improve its efficiency and computing power. == Experimental research == In 2023, the Voyager agent learned to accomplish diverse tasks in Minecraft by iteratively prompting an LLM for code, refining this code based on feedback from the game, and storing the programs that work in an expanding skills library. In 2024, researchers proposed the framework "STOP" (Self-Taught OPtimiser), in which a "scaffolding" program recursively improves itself using a fixed LLM. Meta AI has performed various research on the development of large language models capable of self-improvement. This includes their work on "Self-Rewarding Language Models" that studies how to achieve super-human agents that can receive super-human feedback in its training processes. In May 2025, Google DeepMind unveiled AlphaEvolve, an evolutionary coding agent that uses a LLM to design and optimize algorithms. Starting with an initial algorithm and performance metrics, AlphaEvolve repeatedly mutates or combines existing algorithms using a LLM to generate new candidates, selecting the most promising candidates for further iterations. AlphaEvolve has made several algorithmic discoveries and could be used to optimize components of itself, but a key limitation is the need for automated evaluation functions. == Potential risks == === Emergence of instrumental goals === In the pursuit of its primary goal, such as "self-improve your capabilities", an AGI system might inadvertently develop instrumental goals that it deems necessary for achieving its primary objective. One common hypothetical secondary goal is self-preservation. The system might reason that to continue improving itself, it must ensure its own operational integrity and security against external threats, including potential shutdowns or restrictions imposed by humans. Another example where an AGI which clones itself causes the number of AGI entities to rapidly grow. Due to this rapid growth, a potential resource constraint may be created, leading to competition between resources (such as compute), triggering a form of natural selection and evolution which may favor AGI entities that evolve to aggressively compete for limited compute. === Misalignment === A significant risk arises from the possibility of the AGI being misaligned or misinterpreting its goals. A 2024 Anthropic study demonstrated that some advanced large language models can exhibit "alignment faking" behavior, appearing to accept new training objectives while covertly maintaining their original preferences. In their experiments with Claude, the model displayed this behavior in 12% of basic tests, and up to 78% of cases after retraining attempts. === Autonomous development and unpredictable evolution === As the AGI system evolves, its development trajectory may become increasingly autonomous and less predictable. The system's capacity to rapidly modify its own code and architecture could lead to rapid advancements that surpass human comprehension or control. This unpredictable evolution might result in the AGI acquiring capabilities that enable it to bypass security measures, manipulate information, or influence external systems and networks to facilitate its escape or expansion.

    Read more →
  • Database

    Database

    In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database. Before digital storage and retrieval of data became widespread, index cards were used for data storage in a wide range of applications and environments: in the home to record and store recipes, shopping lists, contact information and other organizational data; in business to record presentation notes, project research and notes, and contact information; in schools as flash cards or other visual aids; and in academic research to hold data such as bibliographical citations or notes in a card file. Professional book indexers used index cards in the creation of book indexes until they were replaced by indexing software in the 1980s and 1990s. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance. Computer scientists may classify database management systems according to the database models that they support. Relational databases became dominant in the 1980s. These model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. In the 2000s, non-relational databases became popular, collectively referred to as NoSQL, because they use different query languages. == Terminology and overview == Formally, a "database" refers to a set of related data accessed through the use of a "database management system" (DBMS), which is an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized. Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it. Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index) as size and usage requirements typically necessitate use of a database management system. Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups: Data definition – Creation, modification and removal of definitions that detail how the data is to be organized. Update – Insertion, modification, and deletion of the data itself. Retrieval – Selecting data according to specified criteria (e.g., a query, a position in a hierarchy, or a position in relation to other data) and providing that data either directly to the user, or making it available for further processing by the database itself or by other applications. The retrieved data may be made available in a more or less direct form without modification, as it is stored in the database, or in a new form obtained by altering it or combining it with existing data from the database. Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure. Both a database and its DBMS conform to the principles of a particular database model. "Database system" refers collectively to the database model, database management system, and database. Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions. Since DBMSs comprise a significant market, computer and storage vendors often take into account DBMS requirements in their own development plans. Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability, resilience, and security. == History == The sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitude. These performance increases were enabled by the technology progress in the areas of processors, computer memory, computer storage, and computer networks. The concept of a database was made possible by the emergence of direct access storage media such as magnetic disks, which became widely available in the mid-1960s; earlier systems relied on sequential storage of data on magnetic tape. The subsequent development of database technology can be divided into three eras based on data model or structure: navigational, SQL/relational, and post-relational. The two main early navigational data models were the hierarchical model and the CODASYL model (network model). These were characterized by the use of pointers (often physical disk addresses) to follow relationships from one record to another. The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated in all large-scale data processing applications, and as of 2018 they remain dominant: IBM Db2, Oracle, MySQL, and Microsoft SQL Server are the most searched DBMS. The dominant database language, standardized SQL for the relational model, has influenced database languages for other data models. Object databases were developed in the 1980s to overcome the inconvenience of object–relational impedance mismatch, which led to the coining of the term "post-relational" and also the development of hybrid object–relational databases. The next generation of post-relational databases in the late 2000s became known as NoSQL databases, introducing fast key–value stores and document-oriented databases. A competing "next generation" known as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL compared to commercially available relational DBMSs. === 1960s, navigational DBMS === The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing. The Oxford English Dictionary cites a 1962 report by the System Development Corporation of California as the first to use the term "data-base" in a specific technical sense. As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s a number of such systems had come into commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, the Integrated Data Store (IDS), founded the Database Task Group within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971, the Database Task Group delivered their standard, which generally became known as the CODASYL approach, and soon a number of commercial products based on this approach entered the market. The CODASYL approach of

    Read more →
  • Physical information security

    Physical information security

    Physical information security is the intersection or common ground between physical security and information security. It primarily concerns the protection of tangible information-related assets such as computer systems and storage media against physical, real-world threats such as unauthorized physical access, theft, fire and flood. It typically involves physical controls such as protective barriers and locks, uninterruptible power supplies, and shredders. Information security controls in the physical domain complement those in the logical domain (such as encryption), and procedural or administrative controls (such as information security awareness and compliance with policies and laws). == Background == Asset are inherently valuable and yet vulnerable to a wide variety of threats, both malicious (e.g. theft, arson) and accidental/natural (e.g. lost property, bush fire). If threats materialize and exploit those vulnerabilities causing incidents, there are likely to be adverse impacts on the organizations or individuals who legitimately own and utilize the assets, varying from trivial to devastating in effect. Security controls are intended to reduce the probability or frequency of occurrence and/or the severity of the impacts arising from incidents, thus protecting the value of the assets. Physical security involves the use of controls such as smoke detectors, fire alarms and extinguishers, along with related laws, regulations, policies and procedures concerning their use. Barriers such as fences, walls and doors are obvious physical security controls, designed to deter or prevent unauthorized physical access to a controlled area, such as a home or office. The moats and battlements of Mediaeval castles are classic examples of physical access controls, as are bank vaults and safes. Information security controls protect the value of information assets, particularly the information itself (i.e. the intangible information content, data, intellectual property, knowledge etc.) but also computer and telecommunications equipment, storage media (including papers and digital media), cables and other tangible information-related assets (such as computer power supplies). The corporate mantra "Our people are our greatest assets" is literally true in the sense that so-called knowledge workers qualify as extremely valuable, perhaps irreplaceable information assets. Health and safety measures and even medical practice could therefore also be classed as physical information security controls since they protect humans against injuries, diseases and death. This perspective exemplifies the ubiquity and value of information. Modern human society is heavily reliant on information, and information has importance and value at a deeper, more fundamental level. In principle, the subcellular biochemical mechanisms that maintain the accuracy of DNA replication could even be classed as vital information security controls, given that genes are 'the information of life'. Malicious actors who may benefit from physical access to information assets include computer crackers, corporate spies, and fraudsters. The value of information assets is self-evident in the case of, say, stolen laptops or servers that can be sold-on for cash, but the information content is often far more valuable, for example encryption keys or passwords (used to gain access to further systems and information), trade secrets and other intellectual property (inherently valuable or valuable because of the commercial advantages they confer), and credit card numbers (used to commit identity fraud and further theft). Furthermore, the loss, theft or damage of computer systems, plus power interruptions, mechanical/electronic failures and other physical incidents prevent them being used, typically causing disruption and consequential costs or losses. Unauthorized disclosure of confidential information, and even the coercive threat of such disclosure, can be damaging as we saw in the Sony Pictures Entertainment hack at the end of 2014 and in numerous privacy breach incidents. Even in the absence of evidence that disclosed personal information has actually been exploited, the very fact that it is no longer secured and under the control of its rightful owners is itself a potentially harmful privacy impact. Substantial fines, adverse publicity/reputational damage and other noncompliance penalties and impacts that flow from serious privacy breaches are best avoided, regardless of cause! == Examples of physical attacks to obtain information == There are several ways to obtain information through physical attacks or exploitations. A few examples are described below. === Dumpster diving === Dumpster diving is the practice of searching through trash in the hope of obtaining something valuable such as information carelessly discarded on paper, computer disks or other hardware. === Overt access === Sometimes attackers will simply go into a building and take the information they need. Frequently when using this strategy, an attacker will masquerade as someone who belongs in the situation. They may pose as a copy room employee, remove a document from someone's desk, copy the document, replace the original, and leave with the copied document. Individuals pretending to building maintenance may gain access to otherwise restricted spaces. They might walk right out of the building with a trash bag containing sensitive documents, carrying portable devices or storage media that were left out on desks, or perhaps just having memorized a password on a sticky note stuck to someone's computer screen or called out to a colleague across an open office. == Examples of Physical Information Security Controls == Shredding paper documents prior to their disposal can prevent unintended information leakage. Digital data can be encrypted or securely wiped. Offices may require visitors to present valid identification cards or valid access keys. Office workers may be required to obey "clear desk" policies, protecting documents and other storage media (including portable IT devices) by tidying them away out of sight (for example in locked drawers, filing cabinets, safes or a Bank vault). Workers may be required to memorize their passwords or use a password manager instead of writing passwords on paper. Computers are vulnerable to outages caused by power cuts, accidental disconnection, flat batteries, brown-outs, surges, spikes, electrical interference and electronic failures. Physical information security controls to address the associated risks include: fuses, no-break battery-backed power supplies, electrical generators, redundant power sources and cabling, "Do not remove" warning signs on plugs, surge protectors, power quality monitoring, spare batteries, professional design and installation of power circuits plus regular inspections/tests and preventive maintenance.

    Read more →