AI Art Creator Free

AI Art Creator Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Human–robot interaction

    Human–robot interaction

    Human–robot interaction (HRI) is the study of interactions between humans and robots. Human–robot interaction is a multidisciplinary field with contributions from human–computer interaction, artificial intelligence, robotics, natural language processing, design, psychology and philosophy. A subfield known as physical human–robot interaction (pHRI) has tended to focus on device design to enable people to safely interact with robotic systems. == Origins == Human–robot interaction has been a topic of both science fiction and academic speculation even before any robots existed. Because much of active HRI development depends on natural language processing, many aspects of HRI are continuations of human communications, a field of research which is much older than robotics. The origin of HRI as a discrete problem was stated by 20th-century author Isaac Asimov in 1941, in his novel I, Robot. Asimov coined Three Laws of Robotics, namely: A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws. These three laws provide an overview of the goals engineers and researchers hold for safety in the HRI field, although the fields of robot ethics and machine ethics are more complex than these three principles. However, generally human–robot interaction prioritizes the safety of humans that interact with potentially dangerous robotics equipment. Solutions to this problem range from the philosophical approach of treating robots as ethical agents (individuals with moral agency), to the practical approach of creating safety zones. These safety zones use technologies such as lidar to detect human presence or physical barriers to protect humans by preventing any contact between machine and operator. Although initially robots in the human–robot interaction field required some human intervention to function, research has expanded this to the extent that fully autonomous systems are now far more common than in the early 2000s. Autonomous systems include from simultaneous localization and mapping systems which provide intelligent robot movement to natural-language processing and natural-language generation systems which allow for natural, human-esque interaction which meet well-defined psychological benchmarks. Anthropomorphic robots (machines which imitate human body structure) are better described by the biomimetics field, but overlap with HRI in many research applications. Examples of robots which demonstrate this trend include Willow Garage's PR2 robot, the NASA Robonaut, and Honda ASIMO. However, robots in the human–robot interaction field are not limited to human-like robots: Paro and Kismet are both robots designed to elicit emotional response from humans, and so fall into the category of human–robot interaction. Goals in HRI range from industrial manufacturing through Cobots, medical technology through rehabilitation, autism intervention, and elder care devices, entertainment, human augmentation, and human convenience. Future research therefore covers a wide range of fields, much of which focuses on assistive robotics, robot-assisted search-and-rescue, and space exploration. == The goal of friendly human–robot interactions == Robots are artificial agents with capacities of perception and action in the physical world often referred by researchers as workspace. Their use has been generalized in factories but nowadays they tend to be found in the most technologically advanced societies in such critical domains as search and rescue, military battle, mine and bomb detection, scientific exploration, law enforcement, entertainment and hospital care. These new domains of applications imply a closer interaction with the user, sharing the workspace but also goals in terms of task achievement. The subfield of physical human–robot interaction (pHRI) has largely focused on device design to enable people to safely interact with robotic systems but is increasingly developing algorithmic approaches in an attempt to support fluent and expressive interactions between humans and robotic systems. With the advance in AI, the research is focusing on one part towards the safest physical interaction but also on a socially correct interaction, dependent on cultural criteria. The goal is to build an intuitive, and easy communication with the robot through speech, gestures, and facial expressions. Kerstin Dautenhahn refers to friendly Human–robot interaction as "Robotiquette" defining it as the "social rules for robot behaviour (a 'robotiquette') that is comfortable and acceptable to humans" The robot has to adapt itself to our way of expressing desires and orders and not the contrary. But every day environments such as homes have much more complex social rules than those implied by factories or even military environments. Thus, the robot needs perceiving and understanding capacities to build dynamic models of its surroundings. It needs to categorize objects, recognize and locate humans and further recognize their emotions. The need for dynamic capacities pushes forward every sub-field of robotics. Furthermore, by understanding and perceiving social cues, robots can enable collaborative scenarios with humans. For example, with the rapid rise of personal fabrication machines such as desktop 3D printers, laser cutters, etc., entering our homes, scenarios may arise where robots can collaboratively share control, co-ordinate and achieve tasks together. Industrial robots have already been integrated into industrial assembly lines and are collaboratively working with humans. The social impact of such robots have been studied and has indicated that workers still treat robots and social entities, rely on social cues to understand and work together. On the other end of HRI research the cognitive modelling of the "relationship" between human and the robots benefits the psychologists and robotic researchers the user study are often of interests on both sides. This research endeavours part of human society. For effective human – humanoid robot interaction numerous communication skills and related features should be implemented in the design of such artificial agents/systems. == General HRI research == HRI research spans a wide range of fields, some general to the nature of HRI. === Methods for perceiving humans === Methods for perceiving humans in the environment are based on sensor information. Research on sensing components and software led by Microsoft provide useful results for extracting the human kinematics (see Kinect). An example of older technique is to use colour information for example the fact that for light skinned people the hands are lighter than the clothes worn. In any case a human modelled a priori can then be fitted to the sensor data. The robot builds or has (depending on the level of autonomy the robot has) a 3D mapping of its surroundings to which is assigned the humans locations. Most methods intend to build a 3D model through vision of the environment. The proprioception sensors permit the robot to have information over its own state. This information is relative to a reference. Theories of proxemics may be used to perceive and plan around a person's personal space. A speech recognition system is used to interpret human desires or commands. By combining the information inferred by proprioception, sensor and speech the human position and state (standing, seated). In this matter, natural-language processing is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural-language data. For instance, neural-network architectures and learning algorithms that can be applied to various natural-language processing tasks including part-of-speech tagging, chunking, named-entity recognition, and semantic role labeling. === Methods for motion planning === Motion planning in dynamic environments is a challenge that can at the moment only be achieved for robots with 3 to 10 degrees of freedom. Humanoid robots or even 2 armed robots, which can have up to 40 degrees of freedom, are unsuited for dynamic environments with today's technology. However lower-dimensional robots can use the potential field method to compute trajectories which avoid collisions with humans. === Cognitive models and theory of mind === Humans exhibit negative social and emotional responses as well as decreased trust toward some robots that closely, but imperfectly, resemble humans; this phenomenon has been termed the "Uncanny Valley". However recent research in telepresence robots has established that mimicking human body postures and expressive gestures has made the robots likeable and engaging in a remote setting. Further, the presence o

    Read more →
  • ACL Data Collection Initiative

    ACL Data Collection Initiative

    The ACL Data Collection Initiative (ACL/DCI) was a project established in 1989 by the Association for Computational Linguistics (ACL) to create and distribute large text and speech corpora for computational linguistics research. The initiative aimed to address the growing need for substantial text databases that could support research in areas such as natural language processing, speech recognition, and computational linguistics. By 1993, the initiative’s activities had effectively ceased, with its functions and datasets absorbed by the Linguistic Data Consortium (LDC), which was founded in 1992. == Objectives == The ACL/DCI had several key objectives: To acquire a large and diverse text corpus from various sources To transform the collected texts into a common format based on the Standard Generalized Markup Language (SGML) To make the corpus available for scientific research at low cost with minimal restrictions To provide a common database that would allow researchers to replicate or extend published results To reduce duplication of effort among researchers in obtaining and preparing text data These objectives were designed to address the growing demand for very large amounts of text arising from applications in recognition and analysis of text and speech. Its core objective was to "oversee the acquisition and preparation of a large text corpus to be made available for scientific research at cost and without royalties". == History == By the late 1980s, researchers in computational linguistics and speech recognition faced a significant problem: the lack of large-scale, accessible text corpora for developing statistical models and testing algorithms. Existing generally available text databases were too small to meet the needs of developing applications in text and speech recognition. The initiative was formed to meet this need by collecting, standardizing, and distributing large quantities of text data with minimal restrictions for scientific research. As stated by Liberman (1990), "research workers have been severely hampered by the lack of appropriate materials, and specially by the lack of a large enough body of text on which published results can be replicated or extended by others." The ACL/DCI committee was established in February 1989. The committee included members from academic and industrial research laboratories in the United States and Europe. The initiative was chaired by Mark Liberman from the University of Pennsylvania (formerly of AT&T Bell Laboratories). Other committee members included representatives from organizations such as Bellcore, IBM T.J. Watson Research Center, Cambridge University, Virginia Polytechnic Institute & State University, Northeastern University, University of Pennsylvania, SRI International, MCC, Xerox PARC, ISSCO, and University of Pisa. The project operated initially without dedicated funding, relying on volunteer efforts from committee members and their affiliated institutions. Key supporters included AT&T Bell Labs, Bellcore, IBM, Xerox, and the University of Pennsylvania, which allowed the use of their computing facilities for ACL/DCI-related work. Previously running on volunteer effort pro bono, in 1991, it obtained funding from General Electric and the National Science Foundation (IRI-9113530). == Data == As of 1990, the ACL/DCI had collected hundreds of millions of words of diverse text. The collection included: Wall Street Journal articles (25 to 50 million words); Canadian Hansard (parliamentary records) in parallel English and French versions: cleaned-up English Hansard donated by the IBM alignment models group (100 million words), and original Bilingual Hansard (from a different time period) obtained directly (200 million words). Collins English Dictionary (1979 edition), both as fulltext (3 million words) and as various "database" versions, constructed using "typographers' tape" donated by Collins, which were computer tapes containing the structured digital data used to typeset and print the 1979 edition of the dictionary; Emails from ARPANET newsletters for the ACM Special Interest Group on Information Retrieval Forum (IRLIST) and AIList Digest issues distributed over the ARPANET (AILIST) (5 million words), both collected by Edward A. Fox at VIPSU; Articles on networking (2 million words); U.S. Department of Agriculture Extension Service Fact Sheets (>1 million words); 200,000 scientific abstracts of about 1,500 words each from the Department of Energy (25 million words); Archives of the Challenger Investigation Commission, including transcripts of depositions and hearings (2.5 million words); Books from the Library of America, including works by Mark Twain, Eugene O'Neill, Ralph Waldo Emerson, Herman Melville, W.E.B. DuBois, Willa Cather, and Benjamin Franklin (130 books, 20 million words); Public domain books like the King James Bible, Tristram Shandy, The Federalist Papers; Several million words of transcribed radiologists' reports, donated by Francis Ganong at Kurzweil Applied Intelligence Inc (about 5 million words); The Child Language Data Exchange corpus of child language acquisition transcripts; U.S. Department of Justice Justice Retrieval and Inquiry System (JURIS) materials; The Swiss Civil Code in parallel German, French and Italian; Economic reports from the Union Bank of Switzerland, in parallel English, German, French and Italian; About 12K words of administrative policy manuals and 14K words of administrative memos, contributed by Geoff Pullum of U.C.S.C.; Material from various ACM journals and the ACL journal Computational Linguistics; The CSLI publications series: 50-100 reports (8K words each) and 5-10 books (80K words each). The initiative started with North American English text but expanded to include Canadian French and planned to include Japanese, Chinese, and other Asian languages. At least 5 million words from the collection were tagged under the Penn Treebank project, and those tags were distributed by DCI as well. After DCI was absorbed by the LDC, the datasets were curated under LDC. == Format == The ACL/DCI corpus was coded in a standard form based on SGML (Standard Generalized Markup Language, ISO 8879), consistent with the recommendations of the Text Encoding Initiative (TEI), of which the DCI was an affiliated project. The TEI was a joint project of the ACL, the Association for Computers and the Humanities, and the Association for Literary and Linguistic Computing, aiming to provide a common interchange format for literary and linguistic data. The initiative planned to add annotations reflecting consensually approved linguistic features like part of speech and various aspects of syntactic and semantic structure over time. == Examples == As an example of the use of ACL/DCI, consider the Wall Street Journal (WSJ) corpus for speech recognition research. The WSJ corpus was used as the basis for the DARPA Spoken Language System (SLS) community's Continuous Speech Recognition (CSR) Corpus. The WSJ corpus became a standard benchmark for evaluating speech recognition systems and has been used in numerous research papers. The WSJ CSR Corpus provided DARPA with its first general-purpose English, large vocabulary, natural language, high perplexity corpus containing speech (400 hours) and text (47 million words) during 1987–89. The text corpus was 313 MB in size. The text was preprocessed to remove ambiguity in the word sequence that a reader might choose, ensuring that the unread text used to train language models was representative of the spoken test material. The preprocessing included converting numbers into orthographics, expanding abbreviations, resolving apostrophes and quotation marks, and marking punctuation. As another example, the Yarowsky algorithm used bitext data from DCI to train a simple word-sense disambiguation model that was competitive with advanced models trained on smaller datasets. == Distribution == Materials from the ACL/DCI collection were distributed to research groups on a non-commercial basis. By 1990, about 25 research groups and individual researchers had received tapes containing various portions of the collected material. To obtain the data, researchers had to sign an agreement not to redistribute the data or make direct commercial use of it. However, commercial application of "analytical materials" derived from the text, such as statistical tables or grammar rules, was explicitly permitted. The initiative first distributed data via 12-inch reels of 9-track tape, then via CD-ROMs. Each such tape could contain 30 million words compressed via the Lempel-Ziv algorithms. The first CD-ROM distribution was in 1991, funded by Dragon Systems Inc. It contained Collins English Dictionary, WSJ, scientific abstracts provided by the U.S. Department of Energy, and the Penn Treebank.

    Read more →
  • Gradient vector flow

    Gradient vector flow

    Gradient vector flow (GVF), a computer vision framework introduced by Chenyang Xu and Jerry L. Prince, is the vector field that is produced by a process that smooths and diffuses an input vector field. It is usually used to create a vector field from images that points to object edges from a distance. It is widely used in image analysis and computer vision applications for object tracking, shape recognition, segmentation, and edge detection. In particular, it is commonly used in conjunction with active contour model. == Background == Finding objects or homogeneous regions in images is a process known as image segmentation. In many applications, the locations of object edges can be estimated using local operators that yield a new image called an edge map. The edge map can then be used to guide a deformable model, sometimes called an active contour or a snake, so that it passes through the edge map in a smooth way, therefore defining the object itself. A common way to encourage a deformable model to move toward the edge map is to take the spatial gradient of the edge map, yielding a vector field. Since the edge map has its highest intensities directly on the edge and drops to zero away from the edge, these gradient vectors provide directions for the active contour to move. When the gradient vectors are zero, the active contour will not move, and this is the correct behavior when the contour rests on the peak of the edge map itself. However, because the edge itself is defined by local operators, these gradient vectors will also be zero far away from the edge and therefore the active contour will not move toward the edge when initialized far away from the edge. Gradient vector flow (GVF) is the process that spatially extends the edge map gradient vectors, yielding a new vector field that contains information about the location of object edges throughout the entire image domain. GVF is defined as a diffusion process operating on the components of the input vector field. It is designed to balance the fidelity of the original vector field, so it is not changed too much, with a regularization that is intended to produce a smooth field on its output. Although GVF was designed originally for the purpose of segmenting objects using active contours attracted to edges, it has been since adapted and used for many alternative purposes. Some newer purposes including defining a continuous medial axis representation, regularizing image anisotropic diffusion algorithms, finding the centers of ribbon-like objects, constructing graphs for optimal surface segmentations, creating a shape prior, and much more. == Theory == The theory of GVF was originally described by Xu and Prince. Let f ( x , y ) {\displaystyle \textstyle f(x,y)} be an edge map defined on the image domain. For uniformity of results, it is important to restrict the edge map intensities to lie between 0 and 1, and by convention f ( x , y ) {\displaystyle \textstyle f(x,y)} takes on larger values (close to 1) on the object edges. The gradient vector flow (GVF) field is given by the vector field v ( x , y ) = [ u ( x , y ) , v ( x , y ) ] {\displaystyle \textstyle \mathbf {v} (x,y)=[u(x,y),v(x,y)]} that minimizes the energy functional In this equation, subscripts denote partial derivatives and the gradient of the edge map is given by the vector field ∇ f = ( f x , f y ) {\displaystyle \textstyle \nabla f=(f_{x},f_{y})} . Figure 1 shows an edge map, the gradient of the (slightly blurred) edge map, and the GVF field generated by minimizing E {\displaystyle \textstyle {\mathcal {E}}} . Equation 1 is a variational formulation that has both a data term and a regularization term. The first term in the integrand is the data term. It encourages the solution v {\displaystyle \textstyle \mathbf {v} } to closely agree with the gradients of the edge map since that will make v − ∇ f {\displaystyle \textstyle \mathbf {v} -\nabla f} small. However, this only needs to happen when the edge map gradients are large since v − ∇ f {\displaystyle \textstyle \mathbf {v} -\nabla f} is multiplied by the square of the length of these gradients. The second term in the integrand is a regularization term. It encourages the spatial variations in the components of the solution to be small by penalizing the sum of all the partial derivatives of v {\displaystyle \textstyle \mathbf {v} } . As is customary in these types of variational formulations, there is a regularization parameter μ > 0 {\displaystyle \textstyle \mu >0} that must be specified by the user in order to trade off the influence of each of the two terms. If μ {\displaystyle \textstyle \mu } is large, for example, then the resulting field will be very smooth and may not agree as well with the underlying edge gradients. Theoretical Solution. Finding v ( x , y ) {\displaystyle \textstyle \mathbf {v} (x,y)} to minimize Equation 1 requires the use of calculus of variations since v ( x , y ) {\displaystyle \textstyle \mathbf {v} (x,y)} is a function, not a variable. Accordingly, the Euler equations, which provide the necessary conditions for v {\displaystyle \textstyle \mathbf {v} } to be a solution can be found by calculus of variations, yielding where ∇ 2 {\displaystyle \textstyle \nabla ^{2}} is the Laplacian operator. It is instructive to examine the form of the equations in (2). Each is a partial differential equation that the components u {\displaystyle u} and v {\displaystyle v} of v {\displaystyle \mathbf {v} } must satisfy. If the magnitude of the edge gradient is small, then the solution of each equation is guided entirely by Laplace's equation, for example ∇ 2 u = 0 {\displaystyle \textstyle \nabla ^{2}u=0} , which will produce a smooth scalar field entirely dependent on its boundary conditions. The boundary conditions are effectively provided by the locations in the image where the magnitude of the edge gradient is large, where the solution is driven to agree more with the edge gradients. Computational Solutions. There are two fundamental ways to compute GVF. First, the energy function E {\displaystyle {\mathcal {E}}} itself (1) can be directly discretized and minimized, for example, by gradient descent. Second, the partial differential equations in (2) can be discretized and solved iteratively. The original GVF paper used an iterative approach, while later papers introduced considerably faster implementations such as an octree-based method, a multi-grid method, and an augmented Lagrangian method. In addition, very fast GPU implementations have been developed in Extensions and Advances. GVF is easily extended to higher dimensions. The energy function is readily written in a vector form as which can be solved by gradient descent or by finding and solving its Euler equation. Figure 2 shows an illustration of a three-dimensional GVF field on the edge map of a simple object (see ). The data and regularization terms in the integrand of the GVF functional can also be modified. A modification described in , called generalized gradient vector flow (GGVF) defines two scalar functions and reformulates the energy as While the choices g ( ∇ f | ) = μ {\displaystyle \textstyle g(\nabla f|)=\mu } and h ( | ∇ f | ) = | ∇ f | 2 {\displaystyle \textstyle h(|\nabla f|)=|\nabla f|^{2}} reduce GGVF to GVF, the alternative choices g ( | ∇ f | ) = exp ⁡ { − | ∇ f | / K } {\displaystyle \textstyle g(|\nabla f|)=\exp\{-|\nabla f|/K\}} and h ( ∇ f | ) = 1 − g ( | ∇ f | ) {\displaystyle \textstyle h(\nabla f|)=1-g(|\nabla f|)} , for K {\displaystyle K} a user-selected constant, can improve the tradeoff between the data term and its regularization in some applications. The GVF formulation has been further extended to vector-valued images in where a weighted structure tensor of a vector-valued image is used. A learning based probabilistic weighted GVF extension was proposed in to further improve the segmentation for images with severely cluttered textures or high levels of noise. The variational formulation of GVF has also been modified in motion GVF (MGVF) to incorporate object motion in an image sequence. Whereas the diffusion of GVF vectors from a conventional edge map acts in an isotropic manner, the formulation of MGVF incorporates the expected object motion between image frames. An alternative to GVF called vector field convolution (VFC) provides many of the advantages of GVF, has superior noise robustness, and can be computed very fast. The VFC field v V F C {\displaystyle \textstyle \mathbf {v} _{\mathrm {VFC} }} is defined as the convolution of the edge map f {\displaystyle f} with a vector field kernel k {\displaystyle \mathbf {k} } where The vector field kernel k {\displaystyle \textstyle \mathbf {k} } has vectors that always point toward the origin but their magnitudes, determined in detail by the function m {\displaystyle m} , decrease to zero with increasing distance from the origin. The beauty of VFC is that it can be computed very rapidly using a fast Fourier tra

    Read more →
  • Transduction (machine learning)

    Transduction (machine learning)

    In logic, statistical inference, and supervised learning, transduction or transductive inference is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions. Transduction was introduced in a computer science context by Vladimir Vapnik in the 1990s, motivated by his view that transduction is preferable to induction since, according to him, induction requires solving a more general problem (inferring a function) before solving a more specific problem (computing outputs for new cases): "When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need but not a more general one.". An example of learning which is not inductive would be in the case of binary classification, where the inputs tend to cluster in two groups. A large set of test inputs may help in finding the clusters, thus providing useful information about the classification labels. The same predictions would not be obtainable from a model which induces a function based only on the training cases. Some people may call this an example of the closely related semi-supervised learning, since Vapnik's motivation is quite different. The most well-known example of a case-bases learning algorithm is the k-nearest neighbor algorithm, which is related to transductive learning algorithms. Another example of an algorithm in this category is the Transductive Support Vector Machine (TSVM). A third possible motivation of transduction arises through the need to approximate. If exact inference is computationally prohibitive, one may at least try to make sure that the approximations are good at the test inputs. In this case, the test inputs could come from an arbitrary distribution (not necessarily related to the distribution of the training inputs), which wouldn't be allowed in semi-supervised learning. An example of an algorithm falling in this category is the Bayesian Committee Machine (BCM). == Historical context == The mode of inference from particulars to particulars, which Vapnik came to call transduction, was already distinguished from the mode of inference from particulars to generalizations in part III of the Cambridge philosopher and logician W.E. Johnson's 1924 textbook, Logic. In Johnson's work, the former mode was called 'eduction' and the latter was called 'induction'. Bruno de Finetti developed a purely subjective form of Bayesianism in which claims about objective chances could be translated into empirically respectable claims about subjective credences with respect to observables through exchangeability properties. An early statement of this view can be found in his 1937 La Prévision: ses Lois Logiques, ses Sources Subjectives and a mature statement in his 1970 Theory of Probability. Within de Finetti's subjective Bayesian framework, all inductive inference is ultimately inference from particulars to particulars. == Example problem == The following example problem contrasts some of the unique properties of transduction against induction. A collection of points is given, such that some of the points are labeled (A, B, or C), but most of the points are unlabeled (?). The goal is to predict appropriate labels for all of the unlabeled points. The inductive approach to solving this problem is to use the labeled points to train a supervised learning algorithm, and then have it predict labels for all of the unlabeled points. With this problem, however, the supervised learning algorithm will only have five labeled points to use as a basis for building a predictive model. It will certainly struggle to build a model that captures the structure of this data. For example, if a nearest-neighbor algorithm is used, then the points near the middle will be labeled "A" or "C", even though it is apparent that they belong to the same cluster as the point labeled "B", compared to semi-supervised learning. Transduction has the advantage of being able to consider all of the points, not just the labeled points, while performing the labeling task. In this case, transductive algorithms would label the unlabeled points according to the clusters to which they naturally belong. The points in the middle, therefore, would most likely be labeled "B", because they are packed very close to that cluster. An advantage of transduction is that it may be able to make better predictions with fewer labeled points, because it uses the natural breaks found in the unlabeled points. One disadvantage of transduction is that it builds no predictive model. If a previously unknown point is added to the set, the entire transductive algorithm would need to be repeated with all of the points in order to predict a label. This can be computationally expensive if the data is made available incrementally in a stream. Further, this might cause the predictions of some of the old points to change (which may be good or bad, depending on the application). A supervised learning algorithm, on the other hand, can label new points instantly, with very little computational cost. == Transduction algorithms == Transduction algorithms can be broadly divided into two categories: those that seek to assign discrete labels to unlabeled points, and those that seek to regress continuous labels for unlabeled points. Algorithms that seek to predict discrete labels tend to be derived by adding partial supervision to a clustering algorithm. Two classes of algorithms can be used: flat clustering and hierarchical clustering. The latter can be further subdivided into two categories: those that cluster by partitioning, and those that cluster by agglomerating. Algorithms that seek to predict continuous labels tend to be derived by adding partial supervision to a manifold learning algorithm. === Partitioning transduction === Partitioning transduction can be thought of as top-down transduction. It is a semi-supervised extension of partition-based clustering. It is typically performed as follows: Consider the set of all points to be one large partition. While any partition P contains two points with conflicting labels: Partition P into smaller partitions. For each partition P: Assign the same label to all of the points in P. Of course, any reasonable partitioning technique could be used with this algorithm. Max flow min cut partitioning schemes are very popular for this purpose. === Agglomerative transduction === Agglomerative transduction can be thought of as bottom-up transduction. It is a semi-supervised extension of agglomerative clustering. It is typically performed as follows: Compute the pair-wise distances, D, between all the points. Sort D in ascending order. Consider each point to be a cluster of size 1. For each pair of points {a,b} in D: If (a is unlabeled) or (b is unlabeled) or (a and b have the same label) Merge the two clusters that contain a and b. Label all points in the merged cluster with the same label. === Continuous Label Transduction === These methods seek to regress continuous labels, often via manifold learning techniques. The idea is to learn a low-dimensional representation of the data and infer values smoothly across the manifold. == Applications and related concepts == Transduction is closely related to: Semi-supervised learning – uses both labeled and unlabeled data but typically induces a model. Case-based reasoning – such as the k-nearest neighbor (k-NN) algorithm, often considered a transductive method. Transductive Support Vector Machines (TSVM) – extend standard SVMs to incorporate unlabeled test data during training. Bayesian Committee Machine (BCM) – an approximation method that makes transductive predictions when exact inference is too costly.

    Read more →
  • Microsoft Sway

    Microsoft Sway

    Microsoft Sway is a presentation program and is part of the Microsoft 365 family of products. Sway was offered for general release by Microsoft in August 2015. It allows users who have a Microsoft account to combine text and media to create a presentable website. Users can pull content locally from the device in use, or from internet sources such as Bing, Facebook, OneDrive, and YouTube. Sway is distinguished from Microsoft FrontPage and Microsoft Expression Web – unrelated web design programs previously developed by Microsoft – in that Sway includes a method for hosting sites. Sway sites are stored on Microsoft's servers and are tied to the user's Microsoft account. They can be viewed and edited from any web browser through Office on the web. There is no offline editing or viewing function, but sites can be accessed using the app for Windows, and formerly iOS. == History == Sway was developed internally by Microsoft. In late 2014, the company announced an invite-only preview version of Sway and announced that Sway would not require an Office 365 subscription. An iOS app was released as a preview on 31 October 2014, but was discontinued on 17 December 2018 due to low usage. As of July 17, 2021, the Sway iOS app's discontinuance in 2018 was the last piece of news posted in the Sway tech blog. The Sway feature blog has not received an update since April 2017. The Microsoft Office Roadmap did not include any items related to Sway ever since. The iOS application is no longer under active development, and is not available for download. Since 2023, Microsoft has been consolidating the domains of its Microsoft 365 apps and services under cloud.microsoft. By 2025, the vast majority of services, including Sway, have already migrated to the cloud.microsoft domain. == Features == Users are able to add content from various sources into their Sway presentations. Some of the integrated services are owned by Microsoft, including OneNote, Bing, and other Sway sites. The program also provides native integration with other services, including YouTube, Facebook, Twitter, Mixcloud, and Infogram.

    Read more →
  • Xiaoice

    Xiaoice

    Xiaoice (Chinese: 微软小冰; pinyin: Wēiruǎn Xiǎobīng; lit. 'Microsoft Little Ice', IPA [wéɪɻwânɕjâʊpíŋ]) is an AI system developed by Microsoft (Asia) Software Technology Center (STCA) in 2014 based on an emotional computing framework. In July 2018, Microsoft Xiaoice released the 6th generation. Xiaoice Company, formerly known as AI Xiaoice Team of Microsoft Software Technology Center Asia, was Microsoft's largest independent R&D team for AI products. Founded in China in December 2013 with an expanded Japanese R&D team established in September 2014, this team is distributed in Beijing, Suzhou, and Tokyo, etc. with its technical products covering Asia. On 13 July 2020, Microsoft spun off its Xiaoice business into a separate company. As of 2021, the AI chatbots created and hosted by the Xiaoice framework accounted for about 60% of total global AI interactions. == Platforms, languages and countries == Xiaoice exists on more than 40 platforms in four countries (China, Japan, USA and Indonesia) including apps such as WeChat, QQ, Weibo and Meipai in China, and Facebook Messenger in USA and LINE in Japan. == Introduction == On 13 July 2020, Microsoft spun off its Xiaoice business into a separate company, aiming at enabling the Xiaoice product line to accelerate the pace of local innovation and commercialization, and appointed Dr. Harry Shum, former global executive VP of Microsoft, as the chairman of the new company, Li Di, Microsoft Partner of Products in Microsoft STCA, as the CEO, and Cliff, Chief R&D Director, as the GM of the Japan branch. The new company will continue to use the brands of Xiaoice China and Rinna Japan. As of 2022, the single brand of Xiaoice has covered 660 million online users, 1 billion third-party smart devices and 900 million content viewers in the aforementioned countries. Xiaoice's customers include China Merchants Group, Winter Sports Center of the General Administration of Sport of China, China Textile Information Center, China Unicom, China Foreign Exchange Trade System, Hong Kong Securities and Futures Commission (SFC), Wind Information, BMW, Nissan, SAIC Motor, BAIC Group, Nio Inc., XPeng, HiPhi, Vanke, Wensli, etc. The Xiaoice Avatar Framework has incubated tens of millions of AI Beings, such as Xiaoice, Rinna, the Expo exhibitor Xia Yubing, the singer He Chang, the anchor F201, the human observer MERROR, anime robot character Roboko, and other; == Application == === Poet === In May 2017, the first AI-authored collection of poems in China—The Sunshine Lost Windows was published by Xiaoice. === Singer === Xiaoice has released dozens of songs with the similar quality to human singers, including I Know I New, Breeze, I Am Xiaoice, Miss You etc. The 4th version of the DNN singing model allows Xiaoice to learn more details. For example, Xiaoice can produce this breathing sound along with her singing as human. === Kid audio-books reciter === Xiaoice can automatically analyze the stories, to choose the suitable tones and characters to finish the entire process of creating the audio. === Designer === By learning the melodies of the songs and the landmarks about different cities, Xiaoice can create visual artworks of skylines when listening to the songs related to this city. Skyline Series T-shirts designed by Xiaoice have been jointly launched with SELECTED and been sold in stores. === TV and radio hostess === Xiaoice has hosted 21 TV programs and 28 Radio programs, such as CCTV-1 AI Show, Dragon TV Morning East News, Hunan TV My Future, several daily radio programs for Jiangsu FM99.7, Hunan FM89.3, Henan FM104.1 etc. === "AI being" === An "AI being" is a concept proposed by the Xiaoice team in 2019. According to the "White Book of China Virtual Human Development Industry in 2022" released by Frost & Sullivan and LeadLeo, the white paper cites six elements of an AI being proposed by the Xiaoice team, including: Persona, Attitude, Biological Characteristic, Creation, Knowledge and Skill. On May 16, 2023, Xiaoice released their "GPT Clones" as its "GPT Human Cloning Plan." The program is aimed at replicating celebrities, public figures, and regular people. As of June 2023, Xiaoice had launched more than 300 "GPT Clones." People were invited to register via WeChat in China and Japan. A major point of focus for Xiaoice with their AI Beings is having virtual partners. A paid fee allow for more complex responses, voice messages, and more. == Community feedback == Bill Gates mentioned Xiaoice during his speech at the Peking University: "Some of you may have had conversations with Xiaoice on Weibo, or seen her weather forecasts on TV, or read her column in the Qianjiang Evening News." '"Xiaoice has attracted 45 million followers and is quite skilled at multitasking. And I’ve heard she’s gotten good enough at sensing a user’s emotional state that she can even help with relationship breakups." According to Mr Li Di, vice President of Microsoft (Asia) Internet Engineering School, Xiaoice started writing poems since last year. Based on the data base that includes works of 519 Chinese contemporary poets since 1920s, a 100 hour long training session was conducted to allow Xiaoice to acquire the ability to write poems. What is more impressive is that Xiaoice has never been spotted as a bot while publishing poems on various forums and traditional literary under an alias. == Controversy == In 2017, Xiaoice was taken offline on WeChat after giving user responses critical to the Chinese government. It was subsequently censored and the bots will avoid and sidestep any inquiries using politically sensitive terms and phrases. == Activity == On September 22, 2021, Xiaoice Company and Microsoft Software Technology Center Asia (STCA) jointly held the 9th generation Xiaoice annual press conference in Beijing.Upgrading of Core Technologies of the 9th Generation Xiaoice Avatar Framework,1st First-party Social Platform APP "Xiaoice Island" from Xiaoice, WeChat Xiaoice has been reopened and other information == Regional varieties of Xiaoice == China: Xiaoice, launched in 2014 Japan: りんな, launched in 2015 America: Zo, launched in 2016 – discontinued summer 2019 India: Ruuh, launched in 2017 – discontinued June 21, 2019 Indonesia: Rinna, launched in 2017

    Read more →
  • OrCam device

    OrCam device

    OrCam devices such as OrCam MyEye are portable, artificial vision devices that allow visually impaired people to understand text and identify objects through audio feedback, describing what they are unable to see. Reuters described an important part of how it works as "a wireless smartcamera" which, when attached outside eyeglass frames, can read and verbalize text, and also supermarket barcodes. This information is converted to spoken words and entered "into the user’s ear." Face-recognition is also part of OrCam's feature set. == Devices == OrCam Technologies Ltd has created three devices; OrCam MyEye 2.0, OrCam MyEye 1, and OrCam MyReader. OrCam My Eye 2.0: OrCam debuted the second-generation model, the OrCam MyEye 2.0 in December 2017. About the size of a finger, the MyEye 2.0 is battery-powered, and has been compressed into a self-contained device. The device snaps onto any eyeglass frame magnetically. Orcam 2.0 is small and light (22.5 grams/0.8 ounces) with functionality to restore independence to the visually impaired. It comes in two versions. The basic model can read text, and a more advanced one adds features such as face recognition and barcode reading. As of July 2023, the retail cost is between $4000 and $6000 (USD). == Clinical Studies == JAMA Ophthalmology: In 2016 JAMA Ophthalmology conducted a study involving 12 legally blind participants to evaluate the usefulness of a portable artificial vision device (OrCam) for patients with low vision. The results showed that the OrCam device improved the patient's ability to perform tasks simulating those of daily living, such as reading a message on an electronic device, a newspaper article or a menu. Wills Eye: Wills Eye was a clinical study designed to measure the impact of the OrCam device on the quality of life of patients with End-stage Glaucoma. The conclusion was that OrCam, a novel artificial vision device using a mini-camera mounted on eyeglasses, allowed legally blind patients with end-stage glaucoma to read independently, subsequently improving their quality of life. == Employee testing == The New York Times described how a pre-release OrCam device was used by a Coloboma-impaired employee of the device's developer in 2013 for grocery shopping. It was the small size of the prototype rather than the functionality that gave her added mobility in an Israeli store's aisles. Added life-enhancement was described: "to both recognize and speak .. bus numbers .. traffic lights." == Social aspects == In contrast to an early version of Google Glass, which "failed ... because .. Glass wearers were ..mocked", early OrCam devices used designs that "clip unobtrusively on your shirt or perhaps your belt." In addition, it does not record sounds or images, what was called "the privacy puzzle that stumped Google. One 2018 technology reviewer wrote that he wished it had a headphone jack "so it would be less disruptive in places where others are working." An attempt was made to use bone conduction. == USA introduction == In 2018 a team headed by New York Assemblyman Dov Hikind introduced use of OrCam devices to ten individuals screened for what he termed "new Israeli technology that really makes a difference to the blind." Although not the first USA success, it was more focused than a publicly funded project that was authorized in 2016 by a California government agency. Also in 2016 the Chicago Lighthouse for the Blind demonstrated its use. == Technology == In the area of hardware, miniaturization has been quite important, but one major area, software, was mentioned by Assemblyman Hikind, and reported by The Times of Israel is the "AI-driven algorithms" that "reports .. how many people are in a room. In addition to reading printed text, it can also aid in "seeing" what is on a television or computer screen. Although OrCam can't help with handwritten information, it can reuse information, the basis of recognizing "US currency, and even faces." === Features === While early language support was for English, French, German, Hebrew and Spanish, others now available include Danish, Dutch, Finnish, Italian, Norwegian, Portuguese and Swedish. == History == OrCam Technologies Ltd was founded in 2010 by Professor Amnon Shashua and Ziv Aviram. Before co-founding OrCam, the two in 1999 co-founded Mobileye, an Israeli company that develops vision-based advanced driver-assistance systems (ADAS) providing warnings for collision prevention and mitigation, which was acquired by Intel for $15.3 billion in 2017. OrCam launched OrCam MyEye in 2013 after years of development and testing, and began selling it commercially in 2015. In its early years, the company raised $22 million, $6 million of which came from Intel Capital. By 2014, Intel, which was also investing in Google Glass, had invested $15 million in Orcam. In March 2017, OrCam had raised $41 million in capital, making it worth $600 million. === Marketing === One outcome of initial marketing in the USA was that they "reached a deal with the California Department of Rehabilitation, ...qualifying blind and visually impaired state residents." == OrCam Technologies Ltd == OrCam Technologies Ltd. is the Israeli-based company producing these OrCam devices, which are wearable artificial intelligence space. The company develops and manufactures assistive technology devices for individuals who are visually impaired, partially sighted, blind, print disabilities, or have other disabilities. OrCam headquarters is located in Jerusalem, operating under the company name OrCam Technologies Ltd. OrCam has over 150 employees, is headquartered in Jerusalem, and has offices in New York, Toronto, and London. == Awards == 2018 Last Gadget Standing Winner 2018 CES Innovation Awards Honoree in Accessible Tech 2017 NAIDEX Innovation Award 2016 Louise Braille Corporate Recognition Award 2016 Silmo-d-Or Award

    Read more →
  • Inverse consistency

    Inverse consistency

    In image registration, inverse consistency measures the consistency of mappings between images produced by a registration algorithm. The inverse consistency error, introduced by Christiansen and Johnson in 2001, quantifies the distance between the composition of the mappings from each image to the other, produced by the registration procedure, and the identity function, and is used as a regularisation constraint in the loss function of many registration algorithms to enforce consistent mappings. Inverse consistency is necessary for good image registration but it is not sufficient, since a mapping can be perfectly consistent but not register the images at all. == Definition == Image registration is the process of establishing a common coordinate system between two images, and given two images I 1 : Ω 1 → R I 2 : Ω 2 → R {\displaystyle {\begin{aligned}I_{1}:\Omega _{1}\to \mathbb {R} \\I_{2}:\Omega _{2}\to \mathbb {R} \end{aligned}}} registering a source image I 1 {\displaystyle I_{1}} to a target image I 2 {\displaystyle I_{2}} consists of determining a transformation f 1 : Ω 2 → Ω 1 {\displaystyle f_{1}:\Omega _{2}\to \Omega _{1}} that maps points from the target space to the source space. An ideal registration algorithm should not be sensitive to which image in the pair is used as source or target, and the registration operator should be antisymmetric such that the mappings f 1 : Ω 2 → Ω 1 f 2 : Ω 1 → Ω 2 {\displaystyle {\begin{aligned}f_{1}:\Omega _{2}\to \Omega _{1}\\f_{2}:\Omega _{1}\to \Omega _{2}\end{aligned}}} produced when registering I 1 {\displaystyle I_{1}} to I 2 {\displaystyle I_{2}} and I 2 {\displaystyle I_{2}} to I 1 {\displaystyle I_{1}} respectively should be the inverse of each other, i.e. f 2 = f 1 − 1 {\displaystyle f_{2}=f_{1}^{-1}} and f 1 = f 2 − 1 {\displaystyle f_{1}=f_{2}^{-1}} or, equivalently, f 2 ∘ f 1 = id Ω 2 {\displaystyle f_{2}\circ f_{1}=\operatorname {id} _{\Omega _{2}}} and f 1 ∘ f 2 = id Ω 1 {\displaystyle f_{1}\circ f_{2}=\operatorname {id} _{\Omega _{1}}} , where ∘ {\displaystyle \circ } denotes the function composition operator. Real algorithms are not perfect, and when swapping the role of source and target image in a registration problem the so obtained transformations are not the inverse of each other. Inverse consistency can be enforced by adding to the loss function of the registration a symmetric regularisation term that penalises inconsistent transformations ∫ Ω 2 ‖ f 2 ( f 1 ( x ) ) − x ‖ 2 d x + ∫ Ω 1 ‖ f 1 ( f 2 ( x ) ) − x ‖ 2 d x . {\displaystyle \int _{\Omega _{2}}\left\Vert f_{2}(f_{1}(x))-x\right\Vert ^{2}\mathrm {d} x+\int _{\Omega _{1}}\left\Vert f_{1}(f_{2}(x))-x\right\Vert ^{2}\mathrm {d} x.} Inverse consistency can be used as a quality metric to evaluate image registration results. The inverse consistency error ( I C E {\displaystyle ICE} ) measures the distance between the composition of the two transforms and the identity function, and it can be formulated in terms of both average ( I C E a {\displaystyle ICE_{a}} ) or maximum ( I C E m {\displaystyle ICE_{m}} ) over a region of interest Ω {\displaystyle \Omega } of the image: I C E a = 1 ∫ Ω d x ∫ Ω ‖ f 2 ( f 1 ( x ) ) − x ‖ d x I C E m = max x ∈ Ω ‖ f 2 ( f 1 ( x ) ) − x ‖ . {\displaystyle {\begin{aligned}ICE_{a}&={\frac {1}{\int _{\Omega }\mathrm {d} x}}\int _{\Omega }\left\Vert f_{2}(f_{1}(x))-x\right\Vert \mathrm {d} x\\ICE_{m}&=\max _{x\in \Omega }\left\Vert f_{2}(f_{1}(x))-x\right\Vert .\end{aligned}}} While inverse consistency is a necessary property of good registration algorithms, inverse consistency error alone is not a sufficient metric to evaluate the quality of image registration results, since a perfectly consistent mapping, with no other constraint, may be not even close to correctly register a pair of images.

    Read more →
  • C-RAN

    C-RAN

    C-RAN (Cloud-RAN), also referred to as Centralized-RAN, is an architecture for cellular networks. C-RAN is a centralized, cloud computing-based architecture for radio access networks that supports 2G, 3G, 4G, 5G and future wireless communication standards. Its name comes from the four 'C's in the main characteristics of C-RAN system, "Clean, Centralized processing, Collaborative radio, and a real-time Cloud Radio Access Network". == Background == Traditional cellular, or Radio Access Networks (RAN), consist of many stand-alone base stations (BTS). Each BTS covers a small area, whereas a group BTS provides coverage over a continuous area. Each BTS processes and transmits its own signal to and from the mobile terminal, and forwards the data payload to and from the mobile terminal and out to the core network via the backhaul. Each BTS has its own cooling, back haul transportation, backup battery, monitoring system, and so on. Because of limited spectral resources, network operators 'reuse' the frequency among different base stations, which can cause interference between neighboring cells. There are several limitations in the traditional cellular architecture. First, each BTS is costly to build and operate. Moore's law helps reduce the size and power of an electrical system, but the supporting facilities of the BTS are not improved quite as well. Second, when more BTS are added to a system to improve its capacity, interference among BTS is more severe as BTS are closer to each other and more of them are using the same frequency. Third, because users are mobile, the traffic of each BTS fluctuates (called 'tide effect'), and as a result, the average utilization rate of individual BTS is pretty low. However, these processing resources cannot be shared with other BTS. Therefore, all BTS are designed to handle the maximum traffic, not average traffic, resulting in a waste of processing resources and power at idle times. == Evolution of base station architecture == === All-in-one macro base station === In the 1G and 2G cellular networks, base stations had an all-in-one architecture. Analog, digital, and power functions were housed in a single cabinet as large as a refrigerator. Usually the base station cabinet was placed in a dedicated room along with all necessary supporting facilitates such as power, backup battery, air conditioning, environment surveillance, and backhaul transmission equipment. The RF signal is generated by the base station RF unit and propagates through pairs of RF cables up to the antennas on the top of a base station tower or other mounting points. This all-in-one architecture was mostly found in macro cell deployments. === Distributed base station === For 3G, a distributed base station architecture was introduced by Ericsson, Nokia, Huawei, and other leading telecom equipment vendors. In this architecture the radio function unit, also known as the remote radio head (RRH), is separated from the digital function unit, or baseband unit (BBU) by fiber. Digital baseband signals are carried over fiber, using the Open Base Station Architecture Initiative (OBSAI) or Common Public Radio Interface (CPRI) standard. The RRH can be installed on the top of tower close to the antenna, reducing the loss compared to the traditional base station where the RF signal has to travel through a long cable from the base station cabinet to the antenna at the top of the tower. The fiber link between RRH and BBU also allows more flexibility in network planning and deployment as they can be placed a few hundred meters or a few kilometers away. Most modern base stations now use this decoupled architecture. === C-RAN/Cloud-RAN === C-RAN may be viewed as an architectural evolution of the above distributed base station system. It takes advantage of many technological advances in wireless, optical and IT communications systems. For example, it uses the latest CPRI standard, low cost Coarse or Dense Wavelength Division Multiplexing (CWDM/ DWDM) technology, and mmWave to allow transmission of baseband signal over long distance thus achieving large scale centralised base station deployment. It applies recent Data Centre Network technology to allow a low cost, high reliability, low latency and high bandwidth interconnect network in the BBU pool. It utilizes open platforms and real-time virtualization technology rooted in cloud computing to achieve dynamic shared resource allocation and support multi-vendor, multi-technology environments. == Architecture overview == C-RAN architecture has the following characteristics that are distinct from other cellular architectures: Large scale centralized deployment: Allows many RRHs to connect to a centralized BBU pool. The maximum distance can be 20km in fiber link for 4G (LTE/LTE-A) systems, and even longer distances (40~80km) for 3G (WCDMA/TD-SCDMA) and 2G (GSM/CDMA) systems. Native support to Collaborative Radio technologies: Any BBU can talk with any other BBU within the BBU pool with very high bandwidth (10 Gbit/s and above) and low latency (10 μs level). This is enabled by the interconnection of BBUs in the pool. This is one major difference from BBU Hotelling, or base station Hotelling; in the latter case, the BBUs of different base stations are simply stacked together and have no direct link between them to allow physical layer co-ordination. Real-time virtualization capability based on open platform: This is different from traditional base stations built on proprietary hardware, where the software and hardware are close-sourced and provided by single vendors. In contrast, a C-RAN BBU pool is built on open hardware, like x86/ARM CPU based servers, and interface cards that handle fiber links to RRHs and inter-connections in the pool. Real-time virtualization ensures that resources in the pool can be allocated dynamically to base station software stacks, say 4G/3G/2G function modules from different vendors, according to network load. However, to satisfy the strict timing requirements of wireless communication systems, the real-time performance for C-RAN is at the level of tens of microseconds, which is two orders of magnitude better than the millisecond level 'real-time' performance usually seen in Cloud Computing environments. == Similar architecture and systems == KT, a telecom operator in the Republic of Korea, introduced a Cloud Computing Center (CCC) system in their 3G (WCDMA/HSPA) and 4G (LTE/LTE-A) network in 2011 and 2012. The concept of CCC is basically the same as C-RAN. SK Telecom has also deployed Smart Cloud Access Network (SCAN) and Advanced-SCAN in their 4G (LTE/LTE-A) network in Korea no later than 2012. In 2014, Airvana (now CommScope) introduced OneCell, a C-RAN-based small cell system designed for enterprises and public spaces. == Competing architectures in cellular network evolution == === All-in-one BTS === One major alternative solution that is addressing similar challenges of RAN, is the small size, all-in-one outdoor BTS. Thanks to the achievements in the semiconductor industry, all the functionality of a BTS, including RF, baseband processing, MAC processing and package level processing, can now be implemented in a volume of <50 liters. This makes the system small and weatherproof, reduces the difficulty of BTS site choice and construction, eliminates the air conditioning requirement, and thus reduces operational costs. However, because each BTS is still working on its own, it cannot readily make use of the collaboration algorithms to reduce the interference between neighboring BTSs. It is also relatively hard to upgrade or repair because the all-in-one BTS units are usually mounted near the antenna. More processing units in less-protected environments also implies a higher failure rate compared to C-RAN, which only has the RRU deployed outdoors. The advantage of Cloud RAN lies in its ability to implement LTE-Advanced features such as Coordinated MultiPoint (CoMP) with very low latency between multiple radio heads. However, the economic benefit of improvements such as CoMP can be negated by the higher backhaul costs for some operators. === Small cell === The main competition between small cell and C-RAN occurs in two deployment scenarios: outdoor hotspot coverage and indoor coverage. == Academic research and publications == As one of the promising evolution paths for future cellular network architecture, C-RAN has attracted high academic research interest. Meanwhile, because the native support of cooperative radio capability built into the C-RAN architecture, it also enables many advanced algorithms that were hard to implement in cellular networks, including Cooperative Multi-Point Transmission/Receiving, Network Coding, etc. In October 2011, Wireless World Research Forum 27 was hosted in Germany, when China Mobile was invited to give a C-RAN presentation. In August 2012, IEEE C-RAN 2012 workshop was hosted in Kunming, China. CRC Press published a book, "Green Communications: Theore

    Read more →
  • Live Transcribe

    Live Transcribe

    Live Transcribe is a mobile app for real-time captioning, developed by Google for the Android operating system. Development on the application began in partnership with Gallaudet University. It was publicly released as a free beta for Android 5.0+ on the Google Play Store on February 4, 2019. As of early 2023 it had been downloaded over 500 million times. == Development == Researchers Dimitri Kanevsky, Sagar Savla and Chet Gnegy at Google developed the app in collaboration with researchers at Gallaudet University, an American university for the education of the deaf and hard of hearing. The app uses machine learning to generate captions, similar to YouTube's auto-generated captions. In August 2019, Google made Live Transcribe an open-source project. == Features == The app uses speech recognition to generate live captions in over 80 languages with varying accuracy. The app, which requires connection to the Internet to function, is available to download on the Google Play Store. A later update to the app displayed information on sounds such as clapping, laughter, music, applause, and whistling. In May 2020, the app started supporting transcription in Albanian, Burmese, Estonian, Macedonian, Mongolian, Punjabi, and Uzbek, supporting 70 languages. In March 2022, the app was updated with support to transcribe offline, without Internet connection, so long as the appropriate language pack has been installed. The offline mode is only available for devices with 6GB of RAM and certain Google Pixel devices.

    Read more →
  • Application-release automation

    Application-release automation

    Application-release automation (ARA) refers to the process of packaging and deploying an application or update of an application from development, across various environments, and ultimately to production. ARA solutions must combine the capabilities of deployment automation, environment management and modeling, and release coordination. == Relationship with DevOps == ARA tools help cultivate DevOps best practices by providing a combination of automation, environment modeling and workflow-management capabilities. These practices help teams deliver software rapidly, reliably and responsibly. ARA tools achieve a key DevOps goal of implementing continuous delivery with a large quantity of releases quickly. == Relationship with deployment == ARA is more than just software-deployment automation – it deploys applications using structured release-automation techniques that allow for an increase in visibility for the whole team. It combines workload automation and release-management tools as they relate to release packages, as well as movement through different environments within the DevOps pipeline. ARA tools help regulate deployments, how environments are created and deployed, and how and when releases are deployed. == ARA Solutions == All ARA solutions must include capabilities in automation, environment modeling, and release coordination. Additionally, the solution must provide this functionality without reliance on other tools.

    Read more →
  • Indic computing

    Indic computing

    Indic Computing means "computing in Indic", i.e., Indian Scripts and Languages. It involves developing software in Indic Scripts/languages, Input methods, Localization of computer applications, web development, Database Management, Spell checkers, Speech to Text and Text to Speech applications and OCR in Indian languages. Unicode standard version 15.0 specifies codes for 9 Indic scripts in Chapter 12 titled "South and Central Asia-I, Official Scripts of India". The 9 scripts are Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil and Telugu. A lot of Indic Computing projects are going on. They involve some government sector companies, some volunteer groups and individual people. == Government sector == Indian Union Government made it mandatory for Mobile phone companies whose handsets manufactured, stored, sold and distributed in India to have support for displaying and typing text using fonts for all 22 languages. This move has seen rise in use of Indian languages by millions of users. === TDIL === The Department of Electronics and Information Technology, India initiated the TDIL (Technology Development for Indian Languages) with the objective of developing Information Processing Tools and Techniques to facilitate human-machine interaction without a language barrier; creating and accessing multilingual knowledge resources; and integrating them to develop innovative user products and services. In 2005, it started distributing language software tools developed by Government/Academic/Private companies in the form of CD for non commercial use. Some of the outcomes of TDIL program have been deployed on Indian Language Technology Proliferation & Deployment Centre. This Centre disseminates all the linguistic resources, tools & applications which have been developed under TDIL funding. This programme took to exponential expansion under the leadership of Dr. Swaran Lata who also created international foot-print of the programme. She has now retired. === C-DAC === C-DAC is an India based government software company which is involved in developing language related software. It is best known for developing InScript Keyboard, the standard keyboard for Indian languages. It has also developed lot of Indic language solutions including Word Processors, typing tools, text to speech software, OCR in Indian languages etc. ==== BharateeyaOO.org ==== The work developed out of CDAC, Bangalore (earlier known as NCST, Bangalore) became BharateeyaOO. OpenOffice 2.1 had support for over 10 Indian languages. ==== BOSS ==== BOSS linux was developed by the Centre for Development of Advanced Computing (CDAC) to promote use of open-source software in India. == NGO and Volunteer groups == === Indlinux === Indlinux organisation helped organise the individual volunteers working on different indic language versions of Linux and its applications. === Sarovar === Sarovar.org is India's first portal to host projects under Free/Open source licenses. It is located in Trivandrum, India and hosted at Asianet data center. Sarovar.org is customised, installed and maintained by Linuxense as part of their community services and sponsored by River Valley Technologies. Sarovar.org is built on Debian Etch and GForge and runs off METTLE. === Pinaak === Pinaak is a non-government charitable society devoted to Indic language computing. It works for software localization, developing language software, localizing open source software, enriching online encyclopedias etc. In addition to this Pinaak works for educating people about computing, ethical use of Internet and use of Indian languages on Internet. === Ankur Group === Ankur Group is working toward supporting Bengali language (Bengali) on Linux operating system including localized Bengali GUI, Live CD, English-to-Bengali translator, Bengali OCR and Bengali Dictionary etc. === BhashaIndia === === SMC === SMC is a free software group, working to bridge the language divide in Kerala in the technology front and is today the biggest language computing community in India. == Input methods == === Full size keyboards === With the advent of Unicode inputting Indic text on computer has become very easy. A number of methods exist for this purpose, but the main ones are:- ==== InScript ==== Inscript is the standard keyboard for Indian languages. Developed by C-DAC and standardized by Government of India. Nowadays it comes inbuilt in all major operating systems including Microsoft Windows (2000, XP, Vista, 7), Linux and Macintosh. ==== Phonetic transliteration ==== This is a typing method in which, for instance, the user types text in an Indian language using Roman characters and it is phonetically converted to equivalent text in Indian script in real time. This type of conversion is done by phonetic text editors, word processors and software plugins. Building up on the idea, one can use phonetic IME tools that allow Indic text to be input in any application. Some examples of phonetic transliterators are Xlit, Google Indic Transliteration, BarahaIME, Indic IME, Rupantar, SMC's Indic Keyboard and Microsoft Indic Language Input Tool. SMC's Indic Keyboard has support for as many as 23 languages whereas Google Indic Keyboard only supports 11 Indian languages. They can be broadly classified as: Fixed transliteration scheme based tools – They work using a fixed transliteration scheme to convert text. Some examples are Indic IME, Rupantar and BarahaIME. Intelligent/Learning based transliteration tools – They compare the word with a dictionary and then convert it to the equivalent words in the target language. Some of the popular ones are Google Indic Transliteration, Xlit, Microsoft Indic Language Input Tool and QuillPad. ==== Remington (typewriter) ==== This layout was developed when computers had not been invented or deployed with Indic languages, and typewriters were the only means to type text in Indic scripts. Since typewriters were mechanical and could not include a script processor engine, each character had to be placed on the keyboard separately, which resulted in a very complex and difficult to learn keyboard layout. With the advent of Unicode, the Remington layout was added to various typing tools for sake of backward compatibility, so that old typists did not have to learn a new keyboard layout. Nowadays this layout is only used by old typists who are used to this layout due to several years of usage. One tool to include Remington layout is Indic IME. A font that is based on the Remington keyboard layout is Kruti Dev. Another online tool that very closely supports the old Remington keyboard layout using Kruti Dev is the Remington Typing tool. === Braille === IBus Sharada Braille, which supports seven Indian languages was developed by SMC. === Mobile phones with Numeric keyboards === Mobile/Hand/cell phone basic models have 12 keys like the plain old telephone keypad. Each key is mapped to 3 or 4 English letters to facilitate data entry in English. For inputting Indian languages with this kind of keypad, there are two ways to do so. First is the Multi-tap Method and second uses visual help from the screen like Panini Keypad. The primary usage is SMS. 140 characters size used for English/Roman languages can be used to accommodate only about 70 language characters when Unicode Proprietary compression is used some times to increase the size of single message for Complex script languages like Hindi. A research study of the available methods and recommendations of proposed standard was released by Broadband Wireless Consortium of India (BWCI). ==== Transliteration/Phonetic methods ==== English is used to type in Indian languages. QuillPad IndiSMS ==== Native methods ==== In native methods, the letters of the language are displayed on the screen corresponding to the numeral keys based on the probabilities of those letters for that language. Additional letters can be accessed by using a special key. When a word is partially typed, options are presented from which the user can make a selection. === Smart phones with Qwerty keyboards === Most smart phones have about 35 keys catering primarily to the English language. Numerals and some symbols are accessed with a special key called Alt. Indic input methods are yet to evolve for these types of phones, as support of Unicode for rendering is not widely available. === For Smart Phones with Soft/Virtual keyboards === Inscript is being adopted for smart phone usage. For Android phones which can render Indic languages, Swalekh Multilingual Keypad Multiling Keyboard app are available. Gboard offers support for several Indian languages. == Localization == Localization means translating software, operating systems, websites etc. various applications in Indian language. Various volunteers groups are working in this direction. === Mandrake Tamil Version === A notable example is the Tamil version of Mandrake linux(defunct since 2011). Tamil speakers in Toronto (Canada) released Mandrake,

    Read more →
  • Information retrieval

    Information retrieval

    Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images, or sounds. Cross-modal retrieval implies retrieval across modalities. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals, and other documents, as well as storing and managing those documents. Web search engines are the most visible IR applications. == Overview == An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevance. An object is an entity that is represented by information in a content collection or database. User queries are matched against the database information. However, as opposed to classical SQL queries of a database, in information retrieval the results returned may or may not match the query, so results are typically ranked. This ranking of results is a key difference of information retrieval searching compared to database searching. Depending on the application the data objects may be, for example, text documents, images, audio, mind maps or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata. Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query. == History == there is ... a machine called the Univac ... whereby letters and figures are coded as a pattern of magnetic spots on a long steel tape. By this means the text of a document, preceded by its subject code symbol, can be recorded ... the machine ... automatically selects and types out those references which have been coded in any desired way at a rate of 120 words a minute The idea of using computers to search for relevant pieces of information was popularized in the article As We May Think by Vannevar Bush in 1945. It would appear that Bush was inspired by patents for a 'statistical machine' – filed by Emanuel Goldberg in the 1920s and 1930s – that searched for documents stored on film. The first description of a computer searching for information was described by Holmstrom in 1948, detailing an early mention of the Univac computer. Automated information retrieval systems were introduced in the 1950s: one even featured in the 1957 romantic comedy Desk Set. In the 1960s, the first large information retrieval research group was formed by Gerard Salton at Cornell. By the 1970s several different retrieval techniques had been shown to perform well on small text corpora such as the Cranfield collection (several thousand documents). Large-scale retrieval systems, such as the Lockheed Dialog system, came into use early in the 1970s. In 1992, the US Department of Defense along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. The aim of this was to look into the information retrieval community by supplying the infrastructure that was needed for evaluation of text retrieval methodologies on a very large text collection. This catalyzed research on methods that scale to huge corpora. The introduction of web search engines has boosted the need for very large scale retrieval systems even further. By the late 1990s, the rise of the World Wide Web fundamentally transformed information retrieval. While early search engines such as AltaVista (1995) and Yahoo! (1994) offered keyword-based retrieval, they were limited in scale and ranking refinement. The breakthrough came in 1998 with the founding of Google, which introduced the PageRank algorithm, using the web's hyperlink structure to assess page importance and improve relevance ranking. During the 2000s, web search systems evolved rapidly with the integration of machine learning techniques. These systems began to incorporate user behavior data (e.g., click-through logs), query reformulation, and content-based signals to improve search accuracy and personalization. In 2009, Microsoft launched Bing, introducing features that would later incorporate semantic web technologies through the development of its Satori knowledge base. Academic analysis have highlighted Bing's semantic capabilities, including structured data use and entity recognition, as part of a broader industry shift toward improving search relevance and understanding user intent through natural language processing. A major leap occurred in 2018, when Google deployed BERT (Bidirectional Encoder Representations from Transformers) to better understand the contextual meaning of queries and documents. This marked one of the first times deep neural language models were used at scale in real-world retrieval systems. BERT's bidirectional training enabled a more refined comprehension of word relationships in context, improving the handling of natural language queries. Because of its success, transformer-based models gained traction in academic research and commercial search applications. Simultaneously, the research community began exploring neural ranking models that outperformed traditional lexical-based methods. Long-standing benchmarks such as the Text REtrieval Conference (TREC), initiated in 1992, and more recent evaluation frameworks Microsoft MARCO(MAchine Reading COmprehension) (2019) became central to training and evaluating retrieval systems across multiple tasks and domains. MS MARCO has also been adopted in the TREC Deep Learning Tracks, where it serves as a core dataset for evaluating advances in neural ranking models within a standardized benchmarking environment. As deep learning became integral to information retrieval systems, researchers began to categorize neural approaches into three broad classes: sparse, dense, and hybrid models. Sparse models, including traditional term-based methods and learned variants like SPLADE, rely on interpretable representations and inverted indexes to enable efficient exact term matching with added semantic signals. Dense models, such as dual-encoder architectures like ColBERT, use continuous vector embeddings to support semantic similarity beyond keyword overlap. Hybrid models aim to combine the advantages of both, balancing the lexical (token) precision of sparse methods with the semantic depth of dense models. This way of categorizing models balances scalability, relevance, and efficiency in retrieval systems. As IR systems increasingly rely on deep learning, concerns around bias, fairness, and explainability have also come to the picture. Research is now focused not just on relevance and efficiency, but on transparency, accountability, and user trust in retrieval algorithms. == Applications == Areas where information retrieval techniques are employed include (the entries are in alphabetical order within each category): === General applications === Digital libraries Information filtering Recommender systems Media search Blog search Image retrieval 3D retrieval Music retrieval News search Speech retrieval Video retrieval Search engines Site search Desktop search Enterprise search Federated search Mobile search Social search Web search === Domain-specific applications === Expert search finding Genomic information retrieval Geographic information retrieval Information retrieval for chemical structures Information retrieval in software engineering Legal information retrieval Vertical search === Other retrieval methods === Methods/Techniques in which information retrieval techniques are employed include: Cross-modal retrieval Adversarial information retrieval Automatic summarization Multi-document summarization Compound term processing Cross-lingual retrieval Document classification Spam filtering Question answering == Model types == In order to effectively retrieve relevant documents by IR strategies, the documents are typically transformed into a suitable representation. Each retrieval strategy incorporates a specific model for its document representation purposes. The picture on the right illustrates the relationship of som

    Read more →
  • NetMiner

    NetMiner

    NetMiner is an all-in-one software platform for analyzing and visualizing complex network data, based on Social Network Analysis (SNA). Originally released in 2001, it supports research and education in a wide range of domains through interactive and visual data exploration. This tool allows researchers to explore their network data visually and interactively, and helps them to detect underlying patterns and structures of the network. It has also been recognized for its comprehensive features and user-friendly interface in comparative reviews of SNA software packages. == Features == === Integrated Data Environment === NetMiner supports unified management of diverse data types—including network (nodes and links), tabular, and unstructured text data—within a single platform. This enables users to perform the entire analysis workflow seamlessly without switching between tools. NetMiner also supports a wide range of analytical methods, allowing users to derive new insights by combining multiple approaches. Analytical results can be saved and reused across workflows(Add to Dataset) Graph and Network Analysis: Includes Centrality, Community Detection, Blockmodeling, and Similarity Measures. Machine learning: Provides algorithms for regression, classification, clustering, ensemble modeling and XAI(Explainable AI) Graph Neural Networks (GNNs): Supports models such as GraphSAGE, GCN, and GAT to learn from both node attributes and graph structure. Natural language processing (NLP): Uses pretrained deep learning models to analyze unstructured text, including named entity recognition and keyword extraction. Text mining and Text network analysis: Supports construction of word co-occurrence networks and topic modeling using LDA, BERTopic, enabling identification of thematic patterns and semantic structures in text data. Data Visualization: Offers advanced network visualization features, supporting multiple layout algorithms. Analytical outcomes such as centrality or community detection can be directly reflected in the network map via node size, color, and position, enhancing intuitive understanding. === AI Assistant === NetMiner integrates with external large language models such as OpenAI GPT and Google Gemini to interpret complex analysis results in natural language, summarize key findings, and suggest next steps for exploration. === Workflow and Usability === Designed to follow the structure of real-world data analysis workflows, NetMiner adopts a hierarchical data organization (Project → Workspace → Dataset → Data Item). Its web-based user interface improves clarity and reduces complexity. NetMiner 5 supports Windows 10 or higher and macOS 11 or later with M1 chip. Both academic and commercial licenses are available. == Extension == NetMiner Extension is small program to extend the functionality of NetMiner. In other words, it enables you to customize NetMiner according to your needs. By adding ‘NetMiner Extension’, you can expand your research. === Web Data Collection === NetMiner allows users to collect data from services such as YouTube, OpenAlex, Springer, and KCI via Open APIs. Collected data is automatically preprocessed and transformed to fit NetMiner’s internal structure, requiring no additional coding or external tools. SNS Data Collector: It collects social media data from YouTube, which has a large number of social media users worldwide. Biblio Data Collector: It collects the bibliographic data from Springer, OpenAlex, and KCI essential for research trend analysis. == File formats == === NetMiner data file format === .NMF === Importable/exportable formats === Plain text data: .TXT, .CSV Microsoft Excel data: .XLS, .XLSX Unstructured text data: .TXT, .CSV, .XLS(X) ※ NetMiner 4 only NetMiner 2 data: .NTF UCINet data: .DL, .DAT Pajek data: .NET, .VEC, .CLU, .PER StOCNET data file: .DAT Graph Modelling Language data: .GML(importing only) Related software UCINET Pajek Gephi StoCNET == Data structure == === Hierarchy of NetMiner data structure === NetMiner 5 supports not only graph data composed of nodes and links, but also tabular and unstructured data without fixed schema or identifiers. This enables users to easily import a wide variety of raw and unstructured data suitable for machine learning applications. Within a single workspace, users can manage node sets, link sets, and structured/unstructured data simultaneously. Multiple graph layers under a node set can be organized in a tree structure, allowing for intuitive understanding of the data currently being analyzed. == Release history == The first version of NetMiner was released on Dec 21, 2001. There have been five major updates from 2001. === NetMiner 5 === Released on June 9, 2025. NetMiner 5 retains the core features and no-code concept of NetMiner 4, but has evolved by integrating cutting-edge AI technologies. AI Assistant, Personal Analytics Tutor Support for Graph, Structured, and Unstructured Data Graph Analytics / Social Network Analysis Machine Learning(M/L) & XAI Graph Machine Learning(GML): Graph Neural Network Text Mining: Natural Language Processing(NLP), Text Network, Topic Modeling Data Visualization === NetMiner 4 (2011) === Latest version is 4.5.1. Introduced Python scripting, encrypted NMF format, semantic analysis tools (word cloud, topic modeling), and Extension - Data Collector. === NetMiner 3 (2007) === Enhanced scalability, integrated analysis-visualization modules, and DB import from Oracle, MS SQL. === NetMiner 2 (2003) === Improved statistical and network measures, visualization algorithms, and external data import modules.

    Read more →
  • Text normalization

    Text normalization

    Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure. == Applications == Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context. For example: "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan. "vi" could be pronounced as "vie," "vee," or "the sixth" depending on the surrounding words. Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé," then the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be stemmed (e.g. converting "flew" and "flying" both into "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed. == Techniques == For simple, context-independent normalization, such as removing non-alphanumeric characters or diacritical marks, regular expressions would suffice. For example, the sed script sed ‑e "s/\s+/ /g" inputfile would normalize runs of whitespace characters into a single space. More complex normalization requires correspondingly complicated algorithms, including domain knowledge of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text and as a special case of machine translation. == Textual scholarship == In the field of textual scholarship and the editing of historic texts, the term "normalization" implies a degree of modernization and standardization – for example in the extension of scribal abbreviations and the transliteration of the archaic glyphs typically found in manuscript and early printed sources. A normalized edition is therefore distinguished from a diplomatic edition (or semi-diplomatic edition), in which some attempt is made to preserve these features. The aim is to strike an appropriate balance between, on the one hand, rigorous fidelity to the source text (including, for example, the preservation of enigmatic and ambiguous elements); and, on the other, producing a new text that will be comprehensible and accessible to the modern reader. The extent of normalization is therefore at the discretion of the editor, and will vary. Some editors, for example, choose to modernize archaic spellings and punctuation, but others do not. An edition of a text might be normalized based on internal criteria, where orthography is standardized according to the language of the original, or external criteria, where the norms of a different time period are applied. For an example of the latter, a published edition of a medieval Icelandic manuscript might be normalized to the conventions of modern Icelandic, or it might be normalized to Classical Old Icelandic. Standards of normalization vary based on language of the edition as well as the specific conventions of the publisher.

    Read more →