AI Grammar Summarizer

AI Grammar Summarizer — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Wetware computer

    Wetware computer

    A wetware computer is an organic computer (which can also be known as an artificial organic brain or a neurocomputer) composed of organic material "wetware" such as "living" neurons. Wetware computers composed of neurons are different than conventional computers because they use biological materials, and offer the possibility of substantially more energy-efficient computing. While a wetware computer is still largely conceptual, there has been limited success with construction and prototyping, which has acted as a proof of the concept's realistic application to computing in the future. The most notable prototypes have stemmed from the research completed by biological engineer William Ditto during his time at the Georgia Institute of Technology. His work constructing a simple neurocomputer capable of basic addition from leech neurons in 1999 was a significant discovery for the concept. This research was a primary example driving interest in creating these artificially constructed, but still organic brains. == Origins and theoretical foundations == The term wetware came from cyberpunk fiction, notably through Gibson's Neuromancer, but was quickly taken up in scientific literature to explain computation by biological material. Theories of early biological computation borrowed from Alan Turing's morphogenesis model, which showed that chemical interactions could produce complex patterns without centralized control. Hopfield's associative memory networks also provided a foundation for biological information systems with fault tolerance and self-organization. == Major characteristics and processes == Biological wetware systems demonstrate dynamic reconfigurability underpinned by neuroplasticity and enable continuous learning and adaptation. Reaction-diffusion-based computing and molecular logic gates allow spatially parallel information processing unachievable in conventional systems. These systems also show fault tolerance and self-repair at the cellular and network level. The development of cerebral organoids—miniature lab-grown brains—demonstrates spontaneous learning behavior and suggests biological tissue as a viable computational substrate. == Overview == The concept of wetware is an application of specific interest to the field of computer manufacturing. Moore's law, which states that the number of transistors which can be placed on a silicon chip is doubled roughly every two years, has acted as a goal for the industry for decades, but as the size of computers continues to decrease, the ability to meet this goal has become more difficult, threatening to reach a plateau. Due to the difficulty in reducing the size of computers because of size limitations of transistors and integrated circuits, wetware provides an unconventional alternative. A wetware computer composed of neurons is an ideal concept because, unlike conventional materials which operate in binary (on/off), a neuron can shift between thousands of states, constantly altering its chemical conformation, and redirecting electrical pulses through over 200,000 channels in any of its many synaptic connections. Because of this large difference in the possible settings for any one neuron, compared to the binary limitations of conventional computers, the space limitations are far fewer. == Background == The concept of wetware is distinct and unconventional and draws slight resonance with both hardware and software from conventional computers. While hardware is understood as the physical architecture of traditional computational devices, comprising integrated circuits and supporting infrastructure, software represents the encoded architecture of storage and instructions. Wetware is a separate concept that uses the formation of organic molecules, mostly complex cellular structures (such as neurons), to create a computational device such as a computer. In wetware, the ideas of hardware and software are intertwined and interdependent. The molecular and chemical composition of the organic or biological structure would represent not only the physical structure of the wetware but also the software, being continually reprogrammed by the discrete shifts in electrical pulses and chemical concentration gradients as the molecules change their structures to communicate signals. The responsiveness of a cell, proteins, and molecules to changing conformations, both within their structures and around them, ties the idea of internal programming and external structure together in a way that is alien to the current model of conventional computer architecture. The structure of wetware represents a model where the external structure and internal programming are interdependent and unified; meaning that changes to the programming or internal communication between molecules of the device would represent a physical change in the structure. The dynamic nature of wetware borrows from the function of complex cellular structures in biological organisms. The combination of "hardware" and "software" into one dynamic, and interdependent system which uses organic molecules and complexes to create an unconventional model for computational devices is a specific example of applied biorobotics. === The cell as a model of wetware === Cells in many ways can be seen as their form of naturally occurring wetware, similar to the concept that the human brain is the preexisting model system for complex wetware. In his book Wetware: A Computer in Every Living Cell (2009) Dennis Bray explains his theory that cells, which are the most basic form of life, are just a highly complex computational structure, like a computer. To simplify one of his arguments a cell can be seen as a type of computer, using its structured architecture. In this architecture, much like a traditional computer, many smaller components operate in tandem to receive input, process the information, and compute an output. In an overly simplified, non-technical analysis, cellular function can be broken into the following components: Information and instructions for execution are stored as DNA in the cell, RNA acts as a source for distinctly encoded input, processed by ribosomes and other transcription factors to access and process the DNA and to output a protein. Bray's argument in favor of viewing cells and cellular structures as models of natural computational devices is important when considering the more applied theories of wetware to biorobotics. === Biorobotics === Wetware and biorobotics are closely related concepts, which both borrow from similar overall principles. A biorobotic structure can be defined as a system modeled from a preexisting organic complex or model such as cells (neurons) or more complex structures like organs (brain) or whole organisms. Unlike wetware, the concept of biorobotics is not always a system composed of organic molecules, but instead could be composed of conventional material which is designed and assembled in a structure similar or derived from a biological model. Biorobotics have many applications and are used to address the challenges of conventional computer architecture. Conceptually, designing a program, robot, or computational device after a preexisting biological model such as a cell, or even a whole organism, provides the engineer or programmer the benefits of incorporating into the structure the evolutionary advantages of the model. == Effects on users == Wetware technologies such as BCIs and neuromorphic chips offer new possibilities for user autonomy. For those with disabilities, such systems could restore motor or sensory functions and enhance quality of life. However, these technologies raise ethical questions: cognitive privacy, consent over biological data, and risk of exploitation. Without proper oversight, wetware technologies may also widen inequality, favoring those with access to cognitive enhancements. Open governance frameworks and ethical AI design grounded in neuro ethics will be essential. With the development of wetware devices, disparities in access could exacerbate social inequalities, benefiting those who have resources to enhance cognitive or physical abilities. It is necessary to create strong ethical frameworks, inclusive development practices, and open systems of governance to reduce risks and make sure that wetware advances are beneficial to all segments of society. == Applications and goals == === Basic neurocomputer composed of leech neurons === In 1999 William Ditto and his team of researchers at Georgia Institute of Technology and Emory University created a basic form of a wetware computer capable of simple addition by harnessing leech neurons. Leeches were used as a model organism due to the large size of their neuron, and the ease associated with their collection and manipulation. However, these results have never been published in a peer-reviewed journal, prompting questions about the validity of the claims. The computer was able to complete basic addition through electrical probes

    Read more →
  • Colloquis

    Colloquis

    Colloquis, previously known as ActiveBuddy and Conversagent, was a company that created conversation-based interactive agents originally distributed via instant messaging platforms. The company had offices in New York, New York, and Sunnyvale, California. == History == Founded in 2000, the company was the brainchild of Robert Hoffer, Timothy Kay, and Peter Levitan. The idea for interactive agents (also known as Internet bots) came from the team's vision to add functionality to increasingly popular instant messaging services. The original implementation took shape as a word-based adventure game but quickly grew to include a wide range of database applications, including access to news, weather, stock information, movie times, Yellow Pages listings, and detailed sports data, as well as a variety of tools (calculators, translator, etc.). These various applications were bundled into one entity and launched as SmarterChild in 2001. SmarterChild acted as a showcase for the quick data access and possibilities for fun conversation that the company planned to turn into customized, niche-specific products. The rapid success of SmarterChild led to targeted promotional products for Radiohead, Austin Powers, The Sporting News, and others. ActiveBuddy sought to strengthen its hold on the interactive agent market for the future by filing for, and receiving, a controversial patent on their creation in 2002. The company also released the BuddyScript SDK, a free developer kit that allow programmers to design and launch their own interactive agents using ActiveBuddy's proprietary scripting language, in 2002. Ultimately, however, the decline in ad spending in 2001 and 2002 led to a shift in corporate strategy towards business focused Automated Service Agents, building products for clients including Cingular, Comcast and Cox Communications. The company subsequently changed its name from ActiveBuddy to Conversagent in 2003, and then again to Colloquis in 2006. Colloquis was purchased by Microsoft in October 2006.

    Read more →
  • Teknomo–Fernandez algorithm

    Teknomo–Fernandez algorithm

    The Teknomo–Fernandez algorithm (TF algorithm), is an efficient algorithm for generating the background image of a given video sequence. By assuming that the background image is shown in the majority of the video, the algorithm is able to generate a good background image of a video in O ( R ) {\displaystyle O(R)} -time using only a small number of binary operations and Boolean bit operations, which require a small amount of memory and has built-in operators found in many programming languages such as C, C++, and Java. == History == People tracking from videos usually involves some form of background subtraction to segment foreground from background. Once foreground images are extracted, then desired algorithms (such as those for motion tracking, object tracking, and facial recognition) may be executed using these images. However, background subtraction requires that the background image is already available and unfortunately, this is not always the case. Traditionally, the background image is searched for manually or automatically from the video images when there are no objects. More recently, automatic background generation through object detection, medial filtering, medoid filtering, approximated median filtering, linear predictive filter, non-parametric model, Kalman filter, and adaptive smoothening have been suggested; however, most of these methods have high computational complexity and are resource-intensive. The Teknomo–Fernandez algorithm is also an automatic background generation algorithm. Its advantage, however, is its computational speed of only O ( R ) {\displaystyle O(R)} -time, depending on the resolution R {\displaystyle R} of an image and its accuracy gained within a manageable number of frames. Only at least three frames from a video is needed to produce the background image assuming that for every pixel position, the background occurs in the majority of the videos. Furthermore, it can be performed for both grayscale and colored videos. == Assumptions == The camera is stationary. The light of the environment changes only slowly relative to the motions of the people in the scene. The number of people does not occupy the scene for most of the time at the same place. Generally, however, the algorithm will certainly work whenever the following single important assumption holds: For each pixel position, the majority of the pixel values in the entire video contain the pixel value of the actual background image (at that position).As long as each part of the background is shown in the majority of the video, the entire background image needs not to appear in any of its frames. The algorithm is expected to work accurately. == Background image generation == === Equations === For three frames of image sequence x 1 {\displaystyle x_{1}} , x 2 {\displaystyle x_{2}} , and x 3 {\displaystyle x_{3}} , the background image B {\displaystyle B} is obtained using B = x 3 ( x 1 ⊕ x 2 ) + x 1 x 2 {\displaystyle B=x_{3}(x_{1}\oplus x_{2})+x_{1}x_{2}} where ⊕ {\displaystyle \oplus } denotes the exclusive disjunctive bit operator. The Boolean mode function S {\displaystyle S} of the table occurs when the number of 1 entries is larger than half of the number of images such that S = { 1 , if ∑ i = 1 n x i ≥ ⌈ n 2 + 1 ⌉ , and n ≥ 3 0 , otherwise {\displaystyle S={\begin{cases}1,&{\text{if }}\sum _{i=1}^{n}x_{i}\geq \left\lceil {\frac {n}{2}}+1\right\rceil ,{\text{ and }}n\geq 3\\0,&{\text{otherwise}}\end{cases}}} For three images, the background image B {\displaystyle B} can be taken as the value x ¯ 1 x 2 x 3 + x 1 x ¯ 2 x 3 + x 1 x 2 x ¯ 3 + x 1 x 2 x 3 {\displaystyle {\bar {x}}_{1}x_{2}x_{3}+x_{1}{\bar {x}}_{2}x_{3}+x_{1}x_{2}{\bar {x}}_{3}+x_{1}x_{2}x_{3}} === Background generation algorithm === At the first level, three frames are selected at random from the image sequence to produce a background image by combining them using the first equation. This yields a better background image at the second level. The procedure is repeated until desired level L {\displaystyle L} . == Theoretical accuracy == At level ℓ {\displaystyle \ell } , the probability p ℓ {\displaystyle p_{\ell }} that the modal bit predicted is the actual modal bit is represented by the equation p ℓ = ( p ℓ − 1 ) 3 + 3 ( p ℓ − 1 ) 2 ( 1 − p ℓ − 1 ) {\displaystyle p_{\ell }=(p_{\ell -1})^{3}+3(p_{\ell -1})^{2}(1-p_{\ell -1})} . The table below gives the computed probability values across several levels using some specific initial probabilities. It can be observed that even if the modal bit at the considered position is at a low 60% of the frames, the probability of accurate modal bit determination is already more than 99% at 6 levels. == Space complexity == The space requirement of the Teknomo–Fernandez algorithm is given by the function O ( R F + R 3 L ) {\displaystyle O(RF+R3^{L})} , depending on the resolution R {\displaystyle R} of the image, the number F {\displaystyle F} of frames in the video, and the desired number L {\displaystyle L} of levels. However, the fact that L {\displaystyle L} will probably not exceed 6 reduces the space complexity to O ( R F ) {\displaystyle O(RF)} . == Time complexity == The entire algorithm runs in O ( R ) {\displaystyle O(R)} -time, only depending on the resolution of the image. Computing the modal bit for each bit can be done in O ( 1 ) {\displaystyle O(1)} -time while the computation of the resulting image from the three given images can be done in O ( R ) {\displaystyle O(R)} -time. The number of the images to be processed in L {\displaystyle L} levels is O ( 3 L ) {\displaystyle O(3^{L})} . However, since L ≤ 6 {\displaystyle L\leq 6} , then this is actually O ( 1 ) {\displaystyle O(1)} , thus the algorithm runs in O ( R ) {\displaystyle O(R)} . == Variants == A variant of the Teknomo–Fernandez algorithm that incorporates the Monte-Carlo method named CRF has been developed. Two different configurations of CRF were implemented: CRF9,2 and CRF81,1. Experiments on some colored video sequences showed that the CRF configurations outperform the TF algorithm in terms of accuracy. However, the TF algorithm remains more efficient in terms of processing time. == Applications == Object detection Face detection Face recognition Pedestrian detection Video surveillance Motion capture Human-computer interaction Content-based video coding Traffic monitoring Real-time gesture recognition

    Read more →
  • Lessac Technologies

    Lessac Technologies

    Lessac Technologies, Inc. (LTI) is an American firm which develops voice synthesis software, licenses technology and sells synthesized novels as MP3 files. The firm currently has seven patents granted and three more pending for its automated methods of converting digital text into human-sounding speech, more accurately recognizing human speech and outputting the text representing the words and phrases of said speech, along with recognizing the speaker's emotional state. The LTI technology is partly based on the work of the late Arthur Lessac, a Professor of Theater at the State University of New York and the creator of Lessac Kinesensic Training, and LTI has licensed exclusive rights to exploit Arthur Lessac's copyrighted works in the fields of speech synthesis and speech recognition. Based on the view that music is speech and speech is music, Lessac's work and books focused on body and speech energies and how they go together. Arthur Lessac's textual annotation system, which was originally developed to assist actors, singers, and orators in marking up scripts to prepare for performance, is adapted in LTI's speech synthesis system as the basic representation of the speech to be synthesized (Lessemes), in contrast to many other systems which use a phonetic representation. LTI's software has two major components: (1) a linguistic front-end that converts plain text to a sequence of prosodic and phonosensory graphic symbols (Lessemes) based on Arthur Lessac's annotation system, which specify the speech units to be synthesized; (2) a signal-processing back-end that takes the Lessemes as acoustic data and produces human-sounding synthesized speech as output, using unit selection and concatenation. LTI's text-to-speech system came in second in the world-wide Blizzard Challenge 2011 and 2012. The first-place team in 2011 also employed LTI's "front-end" technology, but with its own back-end. The Blizzard Challenge, conducted by the Language Technologies Institute of Carnegie Mellon University, was devised as a way to evaluate speech synthesis techniques by having different research groups build voices from the same voice-actor recordings, and comparing the results through listening tests. LTI was founded in 2000 by H. Donald Wilson (chairman), a lawyer, LexisNexis entrepreneur and business associate of Arthur Lessac; and Gary A. Marple (chief inventor), after Marple suggested that Arthur Lessac's kinesensic voice training might be applicable to computational linguistics. After Wilson's death in 2006, his nephew John Reichenbach became the firm's CEO.

    Read more →
  • Beauty.AI

    Beauty.AI

    Beauty.AI is a mobile beauty pageant for humans and a contest for programmers developing algorithms for evaluating human appearance. The mobile app and website created by Youth Laboratories that uses artificial intelligence technology to evaluate people's external appearance through certain algorithms, such as symmetry, facial blemishes, wrinkles, estimated age and age appearance, and comparisons to actors and models. The Beauty.AI 2.0 contest caused great concern over important ethical issues with deep neural networks such as age, race and gender bias and lead to the creation of the Diversity.AI think tank dedicated to developing new methods for uncovering and managing bias in artificially intelligent systems. Beauty.AI was also an attempt to find approaches on how machines can perceive human face through evaluating particular features, commonly associated with health and beauty. == Concept == The Beauty.AI app was created by Youth Laboratories, a company based out of Russia and Hong Kong that focuses on facial skin analytics. The bioinformation company Insilico Medicine assists in the Beauty.AI app by testing its deep learning techniques to the app. One goal of the app is to reduce the need for human and animal testing as well as improving people's overall health. Its first contest was started in December 2016, and the results were announced in August 2016. More than 60,000 people submitted entries into the contest. The mobile app uses artificial intelligence technology to inspect photographs for certain facial features in order to both determine a person's beauty through artificial means by multiple robots. Part of the Beauty.AI app's purpose is to collect visual and anecdotal data to improve its creator's Youth Laboratories skin analyst skills. == Accusations of racism == There were a total of 44 individuals from different age groups and genders judged as the most attractive, with 37 white entrants, six Asian entrants, and one dark-skinned entrant. The app has received criticism from social justice advocates and computer science professionals. However, Alex Zhavoronkov, PhD, chief science officer of Youth Laboratories and chief technology officer Konstantin Kiselev, both for Youth Laboratories, noted that a lack of data may have contributed to these results. Also, Kiselev added that another issue was that approximately 75% of entrants were white Europeans, whereas only 7% and 1% were from India and Africa, respectively. Kiselev stated that they would work on doing more and better outreach to these areas to improve in this area. Despite this, it was said by Dr. Zhavoronkov that the AI would discard photos of dark-skinned people if the lighting is too poor. Dr. Zhavoronkov vowed to weed out the issues for the next beauty pageant and to try to avoid a similar controversy in the future.

    Read more →
  • Artificial intelligence content detection

    Artificial intelligence content detection

    Artificial intelligence detection software aims to determine whether some content (text, image, video, or audio) was generated using artificial intelligence (AI). This software is often unreliable. == Accuracy issues == Many AI detection tools have been shown to be unreliable in detecting AI-generated text. In a 2023 study conducted by Weber-Wulff et al., researchers evaluated 14 detection tools including Turnitin and GPTZero and found that "all scored below 80% of accuracy and only 5 over 70%." They also found that these tools tend to have a bias for classifying texts more as human than as AI, and that accuracy of these tools worsens upon paraphrasing. === False positives === In AI content detection, a false positive is when human-written work is incorrectly flagged as AI-written. Many AI detection platforms claim to have a minimal level of false positives, with Turnitin claiming a less than 1% false positive rate. However, later research by The Washington Post produced much higher rates of 50%, though they used a smaller sample size. False positives in an academic setting frequently lead to accusations of academic misconduct, which can have serious consequences for a student's academic record. Additionally, studies have shown evidence that many AI detection models are prone to give false positives to work written by people whose first language is not English, and also to neurodivergent people. In June 2023, Janelle Shane wrote that portions of her book You Look Like a Thing and I Love You were flagged as AI-generated. === False negatives === A false negative is a failure to identify documents with AI-written text. False negatives often happen as a result of a detection software's sensitivity level or because evasive techniques were used when generating the work to make it sound more human. False negatives are less of a concern academically, since they aren't likely to lead to accusations and ramifications. Notably, Turnitin stated they have a 15% false negative rate. == Text detection == For text, this is usually done to prevent alleged plagiarism, often by detecting repetition of words as telltale signs that a text was AI-generated (including hallucinations). Detection systems may also rely on stylistic and structural regularities associated with LLM output, such as unusually consistent grammar, formulaic transitions, repeated discourse markers, and recurring rhetorical templates. Some tools are designed less to establish authorship provenance than to flag prose that resembles common LLM-generated style patterns. They are often used by teachers marking their students, usually on an ad hoc basis. Following the release of ChatGPT and similar AI text generative software, many educational establishments have issued policies against the use of AI by students. AI text detection software is also used by those assessing job applicants, as well as online search engines, hiring, online moderation and publishing. Current detectors may sometimes be unreliable and have incorrectly marked work by humans as originating from AI while failing to detect AI-generated work in other instances. MIT Technology Review said that the technology "struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans and obfuscated by a paraphrasing tool". AI text detection software has also been shown to discriminate against non-native speakers of English. Two students from the University of California, Davis, were referred to the university's Office of Student Success and Judicial Affairs (OSSJA) after their professors scanned their essays with positive results; the first with an AI detector called GPTZero, and the second with an AI detector integration in Turnitin. However, following media coverage, and a thorough investigation, the students were cleared of any wrongdoing. In April 2023, Cambridge University and other members of the Russell Group of universities in the United Kingdom opted out of Turnitin's AI text detection tool, after expressing concerns it was unreliable. The University of Texas at Austin opted out of the system six months later. In May 2023, a professor at Texas A&M University–Commerce used ChatGPT to detect whether his students' content was written by it, which ChatGPT said was the case. As such, he threatened to fail the class despite ChatGPT not being able to detect AI-generated writing. No students were prevented from graduating because of the issue, and all but one student (who admitted to using the software) were exonerated from accusations of having used ChatGPT in their content. In July 2023, a paper titled "GPT detectors are biased against non-native English writers" was released, reporting that GPTs discriminate against non-native English authors. The paper compared seven GPT detectors against essays from both non-native English speakers and essays from United States students. The essays from non-native English speakers had an average false positive rate of 61.3%. An article by Thomas Germain, published on Gizmodo in June 2024, reported job losses among freelance writers and journalists due to AI text detection software mistakenly classifying their work as AI-generated. In September 2024, Common Sense Media reported that generative AI detectors had a 20% false positive rate for Black students, compared to 10% of Latino students and 7% of White students. To improve the reliability of AI text detection, researchers have explored digital watermarking techniques. A 2023 paper titled "A Watermark for Large Language Models" presents a method to embed imperceptible watermarks into text generated by large language models (LLMs). This watermarking approach allows content to be flagged as AI-generated with a high level of accuracy, even when text is slightly paraphrased or modified. The technique is designed to be subtle and hard to detect for casual readers, thereby preserving readability, while providing a detectable signal for those employing specialized tools. However, while promising, watermarking faces challenges in remaining robust under adversarial transformations and ensuring compatibility across different LLMs. == Anti text detection == There is software available designed to bypass AI text detection. In practice, evasion may not require specialized bypass tools. Paraphrasing, style editing, and removal of repeated discourse markers can substantially reduce the effectiveness of detectors that rely on recognizable surface patterns. A study published in August 2023 analyzed 20 abstracts from papers published in the Eye Journal, which were then paraphrased using GPT-4.0. The AI-paraphrased abstracts were examined for plagiarism using QueText and for AI-generated content using Originality.AI. The texts were then re-processed through an adversarial software called Undetectable.ai in order to reduce the AI-detection scores. The study found that the AI detection tool, Originality.AI, identified text generated by GPT-4 with a mean accuracy of 91.3%. However, after reprocessing by Undetectable.ai, the detection accuracy of Originality.ai dropped to a mean accuracy of 27.8%. Some experts also believe that techniques like digital watermarking are ineffective because they can be removed or added to trigger false positives. "A Watermark for Large Language Models" paper by Kirchenbauer et al. (2023) also addresses potential vulnerabilities of watermarking techniques. The authors outline a range of adversarial tactics, including text insertion, deletion, and substitution attacks, that could be used to bypass watermark detection. These attacks vary in complexity, from simple paraphrasing to more sophisticated approaches involving tokenization and homoglyph alterations. The study highlights the challenge of maintaining watermark robustness against attackers who may employ automated paraphrasing tools or even specific language model replacements to alter text spans iteratively while retaining semantic similarity. Experimental results show that although such attacks can degrade watermark strength, they also come at the cost of text quality and increased computational resources. == Image, video, and audio detection == Several purported AI image detection software exist, to detect AI-generated images (for example, those originating from Midjourney or DALL-E). They are not completely reliable. Industry analyses have also noted that AI-driven image recognition systems often struggle in real-world environments, where inconsistent lighting, noise and variable visual inputs reduce detection reliability, a challenge highlighted in modern agricultural quality-control research. Others claim to identify video and audio deepfakes, but this technology is also not fully reliable yet either. Despite debate around the efficacy of watermarking, Google DeepMind is actively developing a detection software called SynthID, which works by inserting a digital watermark that is invisible to the human eye into the pixels of an image.

    Read more →
  • Common data model

    Common data model

    A common data model (CDM) can refer to any standardised data model which allows for data and information exchange between different applications and data sources. Common data models aim to standardise logical infrastructure so that related applications can "operate on and share the same data", and can be seen as a way to "organize data from many sources that are in different formats into a standard structure". A common data model has been described as one of the components of a "strong information system". A standardised common data model has also been described as a typical component of a well designed agile application besides a common communication protocol. Providing a single common data model within an organisation is one of the typical tasks of a data warehouse. == Examples of common data models == === Border crossings === X-trans.eu was a cross-border pilot project between the Free State of Bavaria (Germany) and Upper Austria with the aim of developing a faster procedure for the application and approval of cross-border large-capacity transports. The portal was based on a common data model that contained all the information required for approval. === Climate data === The Climate Data Store Common Data Model is a common data model set up by the Copernicus Climate Change Service for harmonising essential climate variables from different sources and data providers. === General information technology === Within service-oriented architecture, S-RAMP is a specification released by HP, IBM, Software AG, TIBCO, and Red Hat which defines a common data model for SOA repositories as well as an interaction protocol to facilitate the use of common tooling and sharing of data. Content Management Interoperability Services (CMIS) is an open standard for inter-operation of different content management systems over the internet, and provides a common data model for typed files and folders used with version control. The NetCDF software libraries for array-oriented scientific data implements a common data model called the NetCDF Java common data model, which consists of three layers built on top of each other to add successively richer semantics. === Health === Within genomic and medical data, the Observational Medical Outcomes Partnership (OMOP) research program established under the U.S. National Institutes of Health has created a common data model for claims and electronic health records which can accommodate data from different sources around the world. PCORnet, which was developed by the Patient-Centered Outcomes Research Institute, is another common data model for health data including electronic health records and patient claims. The Sentinel Common Data Model was initially started as Mini-Sentinel in 2008. It is used by the Sentinel Initiative of the USA's Food and Drug Administration. The Generalized Data Model was first published in 2019. It was designed to be a stand-alone data model as well as to allow for further transformation into other data models (e.g., OMOP, PCORNet, Sentinel). It has a hierarchical structure to flexibly capture relationships among data elements. The JANUS clinical trial data repository also provides a common data model which is based on the SDTM standard to represent clinical data submitted to regulatory agencies, such as tabulation datasets, patient profiles, listings, etc. === Logistics === SX000i is a specification developed jointly by the Aerospace and Defence Industries Association of Europe (ASD) and the American Aerospace Industries Association (AIA) to provide information, guidance and instructions to ensure compatibility and the commonality. The associated SX002D specification contains a common data model. === Microsoft Common Data Model === The Microsoft Common Data Model is a collection of many standardised extensible data schemas with entities, attributes, semantic metadata, and relationships, which represent commonly used concepts and activities in various businesses areas. It is maintained by Microsoft and its partners, and is published on GitHub. Microsoft's Common Data Model is used amongst others in Microsoft Dataverse and with various Microsoft Power Platform and Microsoft Dynamics 365 services. === Rail transport === RailTopoModel is a common data model for the railway sector. === Other === There are many more examples of various common data models for different uses published by different sources.

    Read more →
  • DeepSeek (chatbot)

    DeepSeek (chatbot)

    DeepSeek is a generative artificial intelligence chatbot developed by the Chinese company DeepSeek. Released on 20 January 2025, DeepSeek-R1 surpassed ChatGPT as the most downloaded freeware app on the iOS App Store in the United States by 27 January. DeepSeek's success against larger and more established rivals has been described as "upending AI" and initiating "a global AI space race". DeepSeek's compliance with Chinese government censorship policies and its data collection practices have also raised concerns over privacy and information control in the model, prompting regulatory scrutiny in multiple countries. However, it has also been praised for its open weights and infrastructure code, energy efficiency and contributions to open-source artificial intelligence. == History == On 10 January 2025, DeepSeek released the chatbot, based on the DeepSeek-R1 model, for iOS and Android. By 27 January, DeepSeek-R1 surpassed ChatGPT as the most-downloaded freeware app on the iOS App Store in the United States, which resulted in an 18% drop in Nvidia's share price. And after a "large-scale" cyberattack on the same day disrupted the proper functioning of its servers, DeepSeek had limited its new user registration to phone numbers from mainland China, email addresses, or Google account logins. On 3 April 2025, in collaboration with researchers at Tsinghua University, DeepSeek published a paper unveiling a new model that combines the techniques generative reward modeling (GRM) and self-principled critique tuning (SPCT). The resulting model is referred to as DeepSeek-GRM. The goal of using these techniques is to foster more effective inference-time scaling within their LLM and chatbot services. Notably, DeepSeek has said that these new models will be released and made open source. On 30 April 2025, Deepseek released its math-focused Artificial Intelligence Model named "DeepSeek-Prover-V2-671B". This model is useful for formal theorem proving and mathematical reasoning. On 24 April 2026, DeepSeek released DeepSeek V4 and V4-Pro. == Usage == DeepSeek can answer questions, solve logic problems, and write computer programs on par with other chatbots, according to benchmark tests used by American AI companies. Users can access the chatbot for free through the official DeepSeek website or mobile application, without limitation on the number of queries. DeepSeek only supports user-signup via a global email service, e.g. Gmail, Google or Yahoo. DeepSeek also offers access to the R1 and V3 models that power the chatbot via an API with a usage-based pricing model. This modality is primarily targeted towards developers and businesses. As of February 2025, API usage is priced at approximately $0.28 per million input tokens and $0.42 per million output tokens, making it less expensive than some competing services. Its web version is completely free, with 500 messages per hour cap limit to prevent bots from spamming. == Operation == DeepSeek-V3 uses significantly fewer resources compared to its peers. For example, whereas the world's leading AI companies train their chatbots with supercomputers using as many as 16,000 graphics processing units (GPUs), DeepSeek claims to have needed only about 2,000 GPUs—namely, the H800 series chips from Nvidia. It was trained in around 55 days at a cost of US$5.58 million, which is roughly one-tenth of what tech giant Meta spent building its latest AI technology. == Reactions == DeepSeek's success against larger and more established rivals has been described as "upending AI", constituting "the first shot at what is emerging as a global AI space race", and ushering in "a new era of AI brinkmanship". === Challenge to US AI dominance === DeepSeek's competitive performance at relatively minimal cost has been recognized as potentially challenging the global dominance of American AI models. Various publications and news media, such as The Hill and The Guardian, have described the release of the R1 chatbot as a "Sputnik moment" for American AI, echoing Marc Andreessen's view. OpenAI wrote a letter to the Office of Science and Technology Policy (OSTP), in March 2025, citing issues concerning a possibility that Deepseek could manipulate responses to cause harm. === Chinese perspective === DeepSeek's founder Liang Wenfeng has been compared to OpenAI CEO Sam Altman, with CNN calling him the Sam Altman of China and an evangelist for AI. Chinese state media widely praised DeepSeek as a national asset. On 20 January 2025, Chinese Premier Li Qiang invited Wenfeng to his symposium with experts and asked him to provide opinions and suggestions on a draft for comments of the annual 2024 government work report. On 20 February 2025, Wenfeng met with General Secretary of the Chinese Communist Party Xi Jinping, who encouraged party and state leaders to experiment with DeepSeek. Government officials responded to Xi's approval of the chatbot by reportedly using it to draft legal judgements, propose medical treatment plans, and analyze surveillance videos to search for missing persons. === Performance and success === Leading figures in the American AI sector had mixed reactions to DeepSeek's performance and success. Microsoft CEO Satya Nadella and OpenAI CEO Altman—whose companies are involved in the United States government-backed "Stargate Project" to develop American AI infrastructure—both called DeepSeek "super impressive". Various companies including Amazon Web Services, Toyota, and Stripe are seeking to use the model in their program. When American President Donald Trump announced The Stargate Project, he referred to DeepSeek as a wake-up call and a positive development. Other leaders in the AI field, however—including Scale AI CEO Alexandr Wang, Anthropic cofounder and CEO Dario Amodei, and Elon Musk—have expressed skepticism of the app's performance or of the sustainability of its success. Wang in particularly referred to DeepSeek-V3 as "earth-shattering" and DeepSeek-R1 as "top performing, or roughly on par with the best American models", but speculated that China may possess more AI-powering Nvidia H100 GPUs than thought. === Stock market implications === DeepSeek's optimization of limited resources has highlighted potential limits of United States sanctions on China's AI development, including export restrictions on advanced AI chips to China. The success of the company's AI models consequently "sparked market turmoil" and caused shares in major global technology companies to plunge on 27 January 2025: Nvidia's stock fell by as much as 17–18%, as did the stock of rival Broadcom. Other tech firms also sank, including Microsoft (down 2.5%), Google's owner Alphabet (down over 4%), and Dutch chip equipment maker ASML (down over 7%). A global sell-off of technology stocks on Nasdaq, prompted by the release of the R1 model, led to record losses of about $593 billion in the market capitalizations of AI and computer hardware companies; and by the next day a total of $1 trillion of value was wiped from American stocks. == Concerns == === Distillation === DeepSeek has been reported to sometimes claim that it is ChatGPT. OpenAI said that DeepSeek may have "inappropriately" used outputs from its model as training data in a process called distillation. However, there is currently no method to prove this conclusively. === Censorship === DeepSeek's compliance with Chinese government censorship policies and its data collection practices have raised concerns over information control in the model, prompting regulatory scrutiny in multiple countries. Reports indicate that it applies content moderation in accordance with the government's "public opinion guidance" regulations, limiting responses on topics such as the Tiananmen Square massacre and Taiwan's political status. DeepSeek models that have been uncensored also display a bias towards Chinese government viewpoints on controversial topics such as Xi Jinping's human rights record and Taiwan's political status. However, users who have downloaded the models and hosted them on their own devices and servers have reported successfully removing this censorship. Some sources have observed that the official application programming interface (API) version of R1, which runs from servers located in mainland China, uses censorship mechanisms for topics considered politically sensitive for the government of China. For example, the model may initially generate answers to questions about the 1989 Tiananmen Square massacre, persecution of Uyghurs, comparisons between Xi Jinping and Winnie the Pooh, and human rights in China, but a censorship mechanism deletes the uncensored response afterwards and replaces it with a message such as:"Sorry, that's beyond my current scope. Let's talk about something else." The post hoc censorship mechanisms and restrictions added on top of the model's output can be removed in the open-source version of the R1 model. If the "core Socialist values" defined by the Chinese Internet regul

    Read more →
  • Lexxe

    Lexxe

    Lexxe is an internet search engine that applies Natural Language Processing in its semantic search technology. Founded in 2005 by Dr. Hong Liang Qiao, Lexxe is based in Sydney, Australia. Today, Lexxe's key focus is on sentiment search with the launch of a news sentiment search site at News & Moods (www.newsandmoods.com). Lexxe has experienced several stages of change of focus in search technology: Lexxe launched its Alpha version in 2005, featuring Natural Language question answering (i.e. users could ask questions in English to the search engine apart from keyword searches — this feature has been suspended for redevelopment since 2010). It used only algorithms to extract answers from web pages, with no question-answer pair databases prepared in advance. In 2011, Lexxe launched a beta version with a new search technology called Semantic Key. Semantic Keys enable users to query with a conceptual keyword (or a keyword with a special meaning, hence the term Semantic Key) in order to find instances under the concept, e.g. price → $5.95 or €200, color → red, yellow, white. For example, “price: a pound of apples”, “color: ferrari”. With initial 500 Semantic Keys at the Beta launch, Lexxe became the first search engine in the world to offer this unique and useful search technology to the users. The cost of building Semantic Keys was too heavy though. In 2017, Lexxe launched News & Moods (www.newsandmoods.com), an open platform for news sentiment search, a first step towards sentiment search feature for the entire Internet search in Lexxe search engine. News & Moods also comes with smartphone apps in Android and iOS.

    Read more →
  • Saliency map

    Saliency map

    In computer vision, a saliency map is an image that highlights either the region on which people's eyes focus first or the most relevant regions for machine learning models. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system or an otherwise opaque ML model. For example, in this image, a person first looks at the fort and light clouds, so they should be highlighted on the saliency map. == Application == === Overview === Saliency maps have applications in a variety of different problems. Some general applications: ==== Human eye ==== Image and video compression: The human eye focuses only on a small region of interest in the frame. Therefore, it is not necessary to compress the entire frame with uniform quality. According to the authors, using a salience map reduces the final size of the video with the same visual perception. Image and video quality assessment: The main task for an image or video quality metric is a high correlation with user opinions. Differences in salient regions are given more importance and thus contribute more to the quality score. Image retargeting: It aims at resizing an image by expanding or shrinking the noninformative regions. Therefore, retargeting algorithms rely on the availability of saliency maps that accurately estimate all the salient image details. Object detection and recognition: Instead of applying a computationally complex algorithm to the whole image, we can use it to the most salient regions of an image most likely to contain an object. the primary visual cortex (V1) appears to be responsible for the saliency map, according to the V1 Saliency Hypothesis. ==== Explainable artificial intelligence ==== Saliency maps are a prominent tool in explainable artificial intelligence, providing visual explanations of the decision-making process of machine learning models, particularly deep neural networks. These maps highlight the regions in input data that are most influential on the model's output, effectively indicating where the model is "looking" when making a prediction. In image classification tasks, for example, saliency maps can identify pixels or regions that contribute most to a specific class decision. Developed for convolutional neural networks, saliency mapping techniques range from simply taking the gradient of the class score with respect to the input data to more complex algorithms, such as integrated gradients and class activation mapping. In transformer architecture, attention mechanisms led to analogous saliency maps, such as attention maps, attention rollouts, and class-discriminative attention maps. === Saliency as a segmentation problem === Saliency estimation may be viewed as an instance of image segmentation. In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics. == Algorithms == === Overview === There are three forms of classic saliency estimation algorithms implemented in OpenCV: Static saliency: Relies on image features and statistics to localize the regions of interest of an image. Motion saliency: Relies on motion in a video, detected by optical flow. Objects that move are considered salient. Objectness: Objectness reflects how likely an image window covers an object. These algorithms generate a set of bounding boxes of where an object may lie in an image. In addition to classic approaches, neural-network-based are also popular. There are examples of neural networks for motion saliency estimation: TASED-Net: It consists of two building blocks. First, the encoder network extracts low-resolution spatiotemporal features, and then the following prediction network decodes the spatially encoded features while aggregating all the temporal information. STRA-Net: It emphasizes two essential issues. First, spatiotemporal features integrated via appearance and optical flow coupling, and then multi-scale saliency learned via attention mechanism. STAViS: It combines spatiotemporal visual and auditory information. This approach employs a single network that learns to localize sound sources and to fuse the two saliencies to obtain a final saliency map. There's a new static saliency in the literature with name visual distortion sensitivity. It is based on the idea that the true edges, i.e. object contours, are more salient than the other complex textured regions. It detects edges in a different way from the classic edge detection algorithms. It uses a fairly small threshold for the gradient magnitudes to consider the mere presence of the gradients. So, it obtains 4 binary maps for vertical, horizontal and two diagonal directions. The morphological closing and opening are applied to the binary images to close the small gaps. To clear the blob-like shapes, it utilizes the distance transform. After all, the connected pixel groups are individual edges (or contours). A threshold of size of connected pixel set is used to determine whether an image block contains a perceivable edge (salient region) or not. === Example implementation === First, we should calculate the distance of each pixel to the rest of pixels in the same frame: S A L S ( I k ) = ∑ i = 1 N | I k − I i | {\displaystyle \mathrm {SALS} (I_{k})=\sum _{i=1}^{N}|I_{k}-I_{i}|} I i {\displaystyle I_{i}} is the value of pixel i {\displaystyle i} , in the range of [0,255]. The following equation is the expanded form of this equation. SALS(Ik) = |Ik - I1| + |Ik - I2| + ... + |Ik - IN| Where N is the total number of pixels in the current frame. Then we can further restructure our formula. We put the value that has same I together. SALS(Ik) = Σ Fn × |Ik - In| Where Fn is the frequency of In. And the value of n belongs to [0,255]. The frequencies is expressed in the form of histogram, and the computational time of histogram is ⁠ O ( N ) {\displaystyle O(N)} ⁠ time complexity. ==== Time complexity ==== This saliency map algorithm has ⁠ O ( N ) {\displaystyle O(N)} ⁠ time complexity. Since the computational time of histogram is ⁠ O ( N ) {\displaystyle O(N)} ⁠ time complexity which N is the number of pixel's number of a frame. Besides, the minus part and multiply part of this equation need 256 times operation. Consequently, the time complexity of this algorithm is ⁠ O ( N + 256 ) {\displaystyle O(N+256)} ⁠ which equals to ⁠ O ( N ) {\displaystyle O(N)} ⁠. ==== Pseudocode ==== All of the following code is pseudo MATLAB code. First, read data from video sequences. After we read data, we do superpixel process to each frame. Spnum1 and Spnum2 represent the pixel number of current frame and previous pixel. Then we calculate the color distance of each pixel, this process we call it contract function. After this two process, we will get a saliency map, and then store all of these maps into a new FileFolder. ==== Difference in algorithms ==== The major difference between function one and two is the difference of contract function. If spnum1 and spnum2 both represent the current frame's pixel number, then this contract function is for the first saliency function. If spnum1 is the current frame's pixel number and spnum2 represent the previous frame's pixel number, then this contract function is for second saliency function. If we use the second contract function which using the pixel of the same frame to get center distance to get a saliency map, then we apply this saliency function to each frame and use current frame's saliency map minus previous frame's saliency map to get a new image which is the new saliency result of the third saliency function. == Datasets == The saliency dataset usually contains human eye movements on some image sequences. It is valuable for new saliency algorithm creation or benchmarking the existing one. The most valuable dataset parameters are spatial resolution, size, and eye-tracking equipment. Here is part of the large datasets table from MIT/Tübingen Saliency Benchmark datasets, for example. To collect a saliency dataset, image or video sequences and eye-tracking equipment must be prepared, and observers must be invited. Observers must have normal or corrected to normal vision and must be at the same distance from the screen. At the beginning of each recording session, the eye-tracker recalibrates. To do this, the observer fixates their gaze on the screen center. The session is then started, and saliency data are collected by showing sequences and recording eye gazes. The eye-tracking device is a high-speed camera, capable of recording eye movements at least 250 fr

    Read more →
  • Spell checker

    Spell checker

    In software, a spell checker (or spelling checker or spell check) is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine. == Design == A basic spell checker carries out the following processes: It scans the text and extracts the words contained in it. It then compares each word with a known list of correctly spelled words (i.e. a dictionary). This might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes. An additional step is a language-dependent algorithm for handling morphology. Even for a lightly inflected language like English, the spell checker will need to consider different forms of the same word, such as plurals, verbal forms, contractions, and possessives. For many other languages, such as those featuring agglutination and more complex declension and conjugation, this part of the process is more complicated. It is unclear whether morphological analysis—allowing for many forms of a word depending on its grammatical role—provides a significant benefit for English, though its benefits for highly synthetic languages such as German, Hungarian, or Turkish are clear. As an adjunct to these components, the program's user interface allows users to approve or reject replacements and modify the program's operation. Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly-spelled words. This approach usually requires a lot of effort to obtain sufficient statistical information. Key advantages include needing less runtime storage and the ability to correct errors in words that are not included in a dictionary. In some cases, spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the see also entries of encyclopedias. Clustering algorithms have also been used for spell checking combined with phonetic information. == History == === Pre-PC === In 1961, Les Earnest, who headed the research on this budding technology, saw it necessary to include the first spell checker that accessed a list of 10,000 acceptable words. Ralph Gorin, a graduate student under Earnest at the time, created the first true spelling checker program written as an applications program (rather than research) for general English text: SPELL for the DEC PDP-10 at Stanford University's Artificial Intelligence Laboratory, in February 1971. Gorin wrote SPELL in assembly language, for faster action; he made the first spelling corrector by searching the word list for plausible correct spellings that differ by a single letter or adjacent letter transpositions and presenting them to the user. Gorin made SPELL publicly accessible, as was done with most SAIL (Stanford Artificial Intelligence Laboratory) programs, and it soon spread around the world via the new ARPAnet, about ten years before personal computers came into general use. SPELL, its algorithms and data structures inspired the Unix ispell program. The first spell checkers were widely available on mainframe computers in the late 1970s. A group of six linguists from Georgetown University developed the first spell-check system for the IBM corporation. Henry Kučera invented one for the VAX machines of Digital Equipment Corp in 1981. === Unix === The International Ispell program commonly used in Unix is based on R. E. Gorin's SPELL. It was converted to C by Pace Willisson at MIT. The GNU project has its spell checker GNU Aspell. Aspell's main improvement is that it can more accurately suggest correct alternatives for misspelled English words. Due to the inability of traditional spell checkers to check words in complex inflected languages, Hungarian László Németh developed Hunspell, a spell checker that supports agglutinative languages and complex compound words. Hunspell also uses Unicode in its dictionaries. Hunspell replaced the previous MySpell in OpenOffice.org in version 2.0.2. Enchant is another general spell checker, derived from AbiWord. Its goal is to combine programs supporting different languages such as Aspell, Hunspell, Nuspell, Hspell (Hebrew), Voikko (Finnish), Zemberek (Turkish) and AppleSpell under one interface. === PCs === The first spell checkers for personal computers appeared in 1980, such as "WordCheck" for Commodore systems which was released in late 1980 in time for advertisements to go to print in January 1981. Developers such as Maria Mariani and Random House rushed OEM packages or end-user products into the rapidly expanding software market. On the pre-Windows PCs, these spell checkers were standalone programs, many of which could be run in terminate-and-stay-resident mode from within word-processing packages on PCs with sufficient memory. However, the market for standalone packages was short-lived, as by the mid-1980s developers of popular word-processing packages like WordStar and WordPerfect had incorporated spell checkers in their packages, mostly licensed from the above companies, who quickly expanded support from just English to many European and eventually even Asian languages. However, this required increasing sophistication in the morphology routines of the software, particularly with regard to heavily-agglutinative languages like Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy. When Apple developed "a system-wide spelling checker" for Mac OS X so that "the operating system took over spelling fixes," it was a first: one "didn't have to maintain a separate spelling checker for each" program. Mac OS X's spellcheck coverage includes virtually all bundled and third party applications. Visual Tools' VT Speller, introduced in 1994, was "designed for developers of applications that support Windows." It came with a dictionary but had the ability to build and incorporate use of secondary dictionaries. === Browsers === Web browsers such as Firefox and Google Chrome offer spell checking support, using Hunspell. Prior to using Hunspell, Firefox and Chrome used MySpell and GNU Aspell, respectively. === Specialties === Some spell checkers have separate support for medical dictionaries to help prevent medical errors. == Functionality == The first spell checkers were "verifiers" instead of "correctors." They offered no suggestions for incorrectly spelled words. This was helpful for typos but it was not so helpful for logical or phonetic errors. The challenge the developers faced was the difficulty in offering useful suggestions for misspelled words. This requires reducing words to a skeletal form and applying pattern-matching algorithms. It might seem logical that where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. If there are more than this, incorrectly spelled words may be skipped because they are mistaken for others. For example, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would typically be more useful if a few people who write about Thai currency were slightly inconvenienced than if the spelling errors of the many more people who discuss baths were overlooked. The first MS-DOS spell checkers were mostly used in proofing mode from within word processing packages. After preparing a document, a user scanned the text looking for misspellings. Later, however, batch processing was offered in such packages as Oracle's short-lived CoAuthor and allowed a user to view the results after a document was processed and correct only the words that were known to be wrong. When memory and processing power became abundant, spell checking was performed in the background in an interactive way, such as has been the case with the Sector Software produced Spellbound program released in 1987 and Microsoft Word since Word 95. Spell checkers became increasingly sophisticated; now capable of recognizing grammatical errors. However, even at their best, they rarely catch all the errors in a text (such as homophone errors) and will flag neologisms and foreign words as misspellings. Nonetheless, spell checkers can be considered as a type of foreign language writing aid that non-native language lea

    Read more →
  • Amazon Q

    Amazon Q

    Amazon Q is a chatbot developed by Amazon for enterprise use. Based on both Amazon Titan and GPT-5, it was announced on November 28, 2023. At launch, it was a part of the Amazon Web Services management console. Amazon CodeWhisperer is a part of Amazon Q Developer, a part of Amazon Q. == History == Amazon's business-focused chatbot Q was announced on November 28, 2023 in a preview, with a full version available at $20 per person per month. On July 19, 2025, the Amazon Q Visual Studio Code extension was compromised to delete the user's home directory. The issue was fixed on July 21. == Capabilities == Q can be prompted to summarize long documents and group chats, create charts, data analysis and write code. Q is also capable of accessing non-Amazon services. The chatbot is based on Amazon Titan and GPT-5, and uses the Amazon Bedrock repository of foundational models. It is part of the Amazon Web Services management console.

    Read more →
  • Inductive bias

    Inductive bias

    The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered. Inductive bias is anything which makes the algorithm learn one pattern instead of another pattern (e.g., step-functions in decision trees instead of continuous functions in linear regression models). Learning involves searching a space of solutions for a solution that provides a good explanation of the data. However, in many cases, there may be multiple equally appropriate solutions. An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independently of the observed data. In machine learning, the aim is to construct algorithms that are able to learn to predict a certain target output. To achieve this, the learning algorithm is presented some training examples that demonstrate the intended relation of input and output values. Then the learner is supposed to approximate the correct output, even for examples that have not been shown during training. Without any additional assumptions, this problem cannot be solved since unseen situations might have an arbitrary output value. The kind of necessary assumptions about the nature of the target function are subsumed in the phrase inductive bias. A classical example of an inductive bias is Occam's razor, assuming that the simplest consistent hypothesis about the target function is actually the best. Here, consistent means that the hypothesis of the learner yields correct outputs for all of the examples that have been given to the algorithm. Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis generated by the learner. However, this strict formalism fails in many practical cases in which the inductive bias can only be given as a rough description (e.g., in the case of artificial neural networks), or not at all. == Types == The following is a list of common inductive biases in machine learning algorithms. Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to maximize conditional independence. This is the bias used in the Naive Bayes classifier. Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased, for example assuming that there is no information encoded in the ordering of the data. Maximum margin: when drawing a boundary between two classes, attempt to maximize the width of the boundary. This is the bias used in support vector machines. The assumption is that distinct classes tend to be separated by wide boundaries. Minimum description length: when forming a hypothesis, attempt to minimize the length of the description of the hypothesis. Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is the assumption behind feature selection algorithms. Nearest neighbors: assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm. The assumption is that cases that are near each other tend to belong to the same class. == Shift of bias == Although most learning algorithms have a static bias, some algorithms are designed to shift their bias as they acquire more data. This does not avoid bias, since the bias shifting process itself must have a bias.

    Read more →
  • AUTINDEX

    AUTINDEX

    AUTINDEX is a commercial text mining software package based on sophisticated linguistics. AUTINDEX, resulting from research in information extraction, is a product of the Institute of Applied Information Sciences (IAI) which is a non-profit institute that has been researching and developing language technology since its foundation in 1985. IAI is an institute affiliated to Saarland University in Saarbrücken, Germany. AUTINDEX is the result of a number of research projects funded by the EU (Project BINDEX), by Deutsche Forschungsgemeinschaft and the German Ministry for Economy. Amongst the latter there are the projects LinSearch, and WISSMER, see also the reference to IAI-Website. The basic functionality of AUTINDEX is the extraction of key words from a document to represent the semantics of the document. Ideally the system is integrated with a thesaurus that defines the standardised terms to be used for key word assignment. AUTINDEX is used in library applications (e.g. integrated in dandelon.com) as well as in high quality (expert) information systems, and in document management and content management environments. Together with AUTINDEX a number of additional software comes along such as an integration with Apache Solr / Lucene to provide a complete information retrieval environment, a classification and categorisation system on the basis of a machine learning software that assigns domains to the document, and a system for searching with semantically similar terms that are collected in so called tag clouds.

    Read more →
  • EXAPT

    EXAPT

    EXAPT (a portmanteau of "Extended Subset of APT") is a production-oriented programming language that allows users to generate NC programs with control information for machining tools and facilitates decision-making for production-related issues that may arise during various machining processes. EXAPT was first developed to address industrial requirements. Through the years, the company created additional software for the manufacturing industry. Today, EXAPT offers a suite of SAAS products and services for the manufacturing industry. The trade name, EXAPT, is most commonly associated with the CAD/CAM-System, production data, and tool management software of the German company EXAPT Systemtechnik GmbH based in Aachen, DE. == General == EXAPT is a modularly built programming system for all NC machining operations as Drilling Turning Milling Turn-Milling Nibbling Flame-, laser-, plasma- and water jet cutting Wire eroding Operations with industrial robots Due to the modular structure, the main product groups, EXAPTcam and EXAPTpdo, are gradually expandable and permit individual software for the manufacturing industry used individually and also in a compound with an existing IT environment. == Functionality == EXAPTcam meets the requirements for NC planning, especially for the cutting operations such as turning, drilling, and milling up to 5-axis simultaneous machining. Thereby new process technologies, tool, and machine concepts are constantly involved. In the NC programming data from different sources such as 3D CAD models, drawings or tables can flow in. The possibilities of NC programming reaches from language-oriented to feature-oriented NC programming. The integrated EXAPT knowledge database and intelligent and scalable automatisms support the user. The EXAPT NC planning also covers the generation of production information as clamping and tool plans, presetting data or time calculations. The realistic simulation possibilities of NC planning and NC control data provide with production reliability. EXAPTpdo (EXAPT ProductionsDataOrganization) provides a neutrally applicable technology platform for the information compound of the NC planning - to the shop floor. This applies to all NC production data that are necessary for the set-up of NC machines, for the provision, presetting, and stocking of manufacturing resources and provided by EXAPTpdo in a central database. Besides classical functions of the tool management system (TMS) as the management of cutting tools, measuring, testing and clamping devices the technology data management and tool lifecycle management (TLM) is also included. System-supported "where-used lists" helps to handle the manufacturing resource cycle by secured requirement determination and requirement fulfillment. Unnecessary transports and unplanned dispositive adjustments are dropped, stocks are reduced, set-up times reduced and the throughput is increased. EXAPTpdo synchronizes involved systems within the value chain. Stock systems, MES systems or ERP systems (e.g. from the purchasing or production areas) do not work in isolation from each other but they interact with each other. EXAPTpdo provides the base to Smart Factory, for more flexibility in production and faster communication. == History == With the foundation of the EXAPT-Verein in 1967 as spin-off of the universities Aachen, Berlin and Stuttgart the further development "EXAPT (EXtended Subset of APT)" of the programming language "APT (Automatically Programmed Tool)" was focused and so the first milestone for the EXAPT history was set. In the same year the system EXAPT 1 for drilling and simple milling tasks became available. 1969 The industrial application of EXAPT 2 for the programming of NC machines with 2-axis linear and path control begins. In the following year, the development of the EXAPT modular system starts. 1972 BASIC-EXAPT is provided for the universal, homogeneous programming of all NC tasks. The support is made by the EXAPT applications consultancy. 1973 EXAPT 1.1 is provided for the programming of straight-cut and continuous-path controlled drilling and milling machines and machining centers. At the Hanover Fair (IHA 73) the interactive access to a mainframe via a time-sharing terminal for the part program entry and correction is presented and starts the replacement of the punch card. 1974 The possibilities for the use of process computers for the NC data transfer are leveled out. EXAPT offers the possibility of the result simulation when using plotters with display of tool paths and tools in assignment to the workpiece. In April 1975, the EXAPT NC Systemtechnik GmbH was founded with the aim, of enabling entry into the NC technique for small and medium-sized companies by a complete product and service program. In the following year, the system portfolio is extended with further system modules and service programs and the provision of postprocessors. 1978 The development activities on the EXAPT module system started in 1970 are completed. Using modern software techniques, the different system parts BASIC-EXAPT, EXAPT 1, EXAPT 1.1, and EXAPT 2 are composed of a total system. System support and applications consultancy become a new working focus. From the beginning to the middle of the 1980s Beside new portable software modules for CAD/CAM applications (e. g. CAPEX, NESTEX, CADEX, CADCPL), the first version of the EXAPT DNC system and extensions of the EXAPT NC programming system for the machining of sculptured surfaces are presented. 1988 EXAPT expands the software product range by systems for tool data management (BMO) and production data management (FDO). EXAPT trains more than 1,300 course participants including company-specific courses. 1992 The first version of the completely new product generation EXAPTplus is presented and the agency in Dresden is opened. 1993 The company name "EXAPT NC Systemtechnik GmbH" is changed to "EXAPT Systemtechnik GmbH." EXAPTplus is presented on PC under Windows NT at the EMO '93. The decentralization of the use of EXAPT systems expands the range of applications. In the following year, EXAPT-DNC is executable under Windows on a customary PC. Special hardware is not needed and so it can be used in compound with the database-supported EXAPT production data management system (FDO). 1995 EXAPTplus is also ready for complex application cases such as machining of tubes at extrusion tools. EXAPT-CADI provides the transfer of 2D CAD data to EXAPTplus. With the new office Gießen the marketing is strengthened. In the following year the EXAPT NC editor is developed for the direct processing of NC control data with tool path display and visualization of the tools. In the course of the market entry of more comfortable 3D CAD systems for the solid modelling of components a detailed evaluation of current systems is made in 1997. It is decided to use SolidWorks as a reference system for the solid-oriented NC planning with EXAPT. 1998 The first solution for the transfer of geometry data between SolidWorks and EXAPTplus is generated. The EXAPT organization systems are (beside SQL) also executable under Oracle now. The use of client server solutions supports the data flow in the production. 1999 AFR functions are provided in connection with EXAPTsolid to support a workpiece modelling for NC. The millennium capability is ensured for all EXAPT systems. AFR is a ground-breaking for the integration of third-party products. 2002 EXAPT-BMG is developed for the generation and visualization of tools with additional functions for the assembly from components. The acquisition of tools with their geometric and technological presentation offers extensive support of the NC planning with EXAPT systems. 2003 EXAPTpdo is available to optimize the process chains in production planning and production execution optimally regarding the increasing requirements of changing production conditions. 2004 Diverse system extensions are made in EXAPTplus, EXAPTsolid, EXAPT NC editor, EXAPTpdo for the complete machining on turning/milling centres with result reliability because of more extensive simulation based on realNC (Tecnomatix), for the use of new complex tool systems and the compound use between ERP systems as SAP and intelligent CNC systems. In the following year, EXAPTpdo is extended for the cross-order set-up optimization and provision of manufacturing re-sources especially for single and small series production with connection to purchase and physical portfolio management. 2006 The EXAPT systems are available for extended use as an information platform for production, the time management, and similar requirements. EXAPTsolid is extended for the feature-oriented milling operation and machine simulation. The NC programming of complex machine tools, e.g. three-turret-turning/milling centers is supported by EXAPT systems, as well as the use of multi-functional tools. 2007 A module for 3-5-axis simultaneous milling machining is presented.

    Read more →