AI Coding Unity

AI Coding Unity — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Kubity

    Kubity

    Kubity is a cloud-based 3D communication tool that works on desktop computers, the web, smartphones, tablets, augmented reality gear, and virtual reality glasses. Kubity is powered by several proprietary 3D processing engines including "Paragone" and "Etna" that prepare the 3D file for transfer over mobile devices. Kubity has practical applications for architecture, interior design, engineering, product design, film, and video games among others. The majority of its users create 3D models using SketchUp or Autodesk Revit software. Kubity products include the Kubity web app and Kubity Go (a mobile application for iOS and Android). Kubity is compatible across many platforms, devices and operating systems including: iOS, Android, Firefox, Chrome, Windows, MacOS, and Linux. == History == Kubity was created by SPK Technology (ex Kubity S.A.S.), a Paris-based software company specializing in automatic 3D data optimization and visualization. Founded in 2012 by a group of software engineers and an urban projects developer, they united around a simple idea: create a way for anyone, anywhere to simply and intuitively explore 3D models on smartphones and computers. In order to bring architects, engineers and designers together with their clients around a 3D model, it was essential to develop an interactive platform that supported multiple desktop and mobile devices for instantaneous and fluid 3D navigation. With specifications in place, 15 engineers fused together several technologies: 3D design, data compression, decimation and rendering optimization, web and mobile transfer, and virtual reality headset integration. In January 2014, the first public Kubity prototype (1.0 Amethyst) was launched to a small group of beta testers with a plug-in that allowed users to import 3D models from SketchUp to their browser. A global release was announced in April 2014 at the SketchUp Basecamp in Vail, Colorado. In May 2015, Kubity launched a web application that worked using WebGL technology (2.0 Citrine). For the first time, users were able to drag and drop any SketchUp file in a web browser without having to install a plug-in. In December 2015, Kubity launched a mobile application on the App Store for iPhone, iPod, and iPad as well as on Google Play for Android devices (3.0 Druzy). In November 2016, Kubity launched support for Oculus Rift and HTC Vive (4.0 Emerald). Beginning in November 2017, Kubity launched a full suite rollout of mobile applications over six months that included Kubity AR for augmented reality, Kubity VR for virtual reality, and Kubity Mirror for remote presentations and screen mirroring (5.0 Feldspar). In September 2018, a one-click plugin for SketchUp and Revit (Kubity PRO), along with a mobile-first revamp of Kubity Go was launched, allowing PRO-to-Go device pairing for automatic mobile sync (6.0 Gypsum). In early 2019, the Kubity Go application was updated to include fully integrated AR, VR, and screen mirroring functionalities, killing off the dedicated companion apps Kubity AR, Kubity VR and Kubity Mirror in the process (7.0 Heliotrope). In January 2020, support for the Kubity PRO plugin for SketchUp and Revit was migrated to a SketchUp-only web app. == Technology == Kubity is powered by a proprietary 3D crystallization engine known as "Paragone"; a technology developed by SPK Technology. Paragone takes constrained information from a 3D file and runs it through the "BlockWave" algorithm (US Patent 10,482.629), also developed by SPK Technology. BlockWave is a multiphase optimization algorithm that combines 3D design, data compression, decimation and rendering optimization, web and mobile transfer, and mixed reality headset integration to create a crystallized universal format of the original file. One phase of the BlockWave algorithm is based on the quadric-based polygonal surface simplification algorithm, performed using predefined heuristics, and is associated with a plurality of simplified versions of the 3D model, each version being associated with a predefined level of detail adapted to the user specific end device. BlockWave extracts data content, geometry and textures, then sets quadrics for each top of the original 3D model, and identifies pairs of adjacent tops linked by vertices. The algorithm uses a local collapsing operator and a top-plan error metric to obtain a fixed number of faces or a maximum defined error; 3D meshing is simplified by replacing two points with one, then deleting the degrading faces and updating adjacent relations. Once decimation is completed, texture optimization is set using texture target parameters allowing maximized GPU memory to improve computing time. With texture encoding completed, the crystallized universal 3D file can now be easily opened on any user-specific end device and played across most digital devices with real-time rendering. == Features == === 3D Crystallization === A user converts (or crystallizes) a 3D file by exporting it with the Kubity web app. Crystallization adds features like AR/VR and cinematic fly-through tour as well as assigns the model a dedicated QR code. === Automatic Mobile Sync === When a 3D model is exported, it is automatically synced to Kubity Go on the user's mobile device. From there, it can be accessed, explored, and shared with others with or without an internet connection. === Security and Management === User models can be managed all in one place on Kubity Go or in a browser from their account. Models can be renamed, password-protected, shared, and played. === Augmented Reality === Developed using Apple ARKit and Google ARCore technology, Kubity Go's augmented reality feature maps the environment in a room detecting horizontal planes like tables and floors to track and place 3D objects. By blending digital objects and information with the environment, Kubity allows users to interact with 3D models in true augmented reality. Built-in communication features allows users to instantly share 3D models with anyone over text, email, social media, or direct device-to-device with a QR Code. Platform Support AR supports devices running iOS11 including: iPhone SE, iPhone 6s, iPhone 6s Plus, iPhone 7, iPhone 7 Plus, iPhone 8, iPhone X, all iPad Pro models, and iPad (2017). AR for Android requires Android 7.0 or later and access to the Google Play Store. === Virtual Reality === VR allows users to explore SketchUp models and Revit projects on-the-go right from a mobile device using Oculus Go, Google Cardboard, Samsung Gear VR, or the glasses-free Magic Window feature. Kubity's virtual reality feature is compatible with Oculus Go, Google Cardboard viewers and other cardboard compatible devices including clip-on style VR glasses like Homido Mini, as well as the mobile virtual reality headset, Samsung Gear VR. Samsung Gear VR supports: Galaxy S6, Galaxy S6 Edge, Galaxy S6 Edge+, Samsung Galaxy Note 5, Galaxy S7, Galaxy S7 Edge, Galaxy S8, Galaxy S8+, Samsung Galaxy Note Fan Edition, Samsung Galaxy Note 8, Samsung Galaxy A8/A8+ (2018), and Samsung Galaxy S9/Galaxy S9+. === Screen Mirroring === Screen mirroring allows a user to sync the sender device to a receiver on a webpage, then control from the sender device to give a remote presentation of the 3D model. Devices are easily synced by entering a six-digit number displayed on the receiving computer. == Platform support == On iOS, the Kubity application is compatible with devices running on the version 9.0 or higher. On Android, Kubity is compatible with devices running on the version 4.4 “Kit Kat” or higher. The web version of Kubity applications currently support web browsers compatible with WebGL2 : Mozilla Firefox and Google Chrome. AR is compatible with devices running iOS11 including: iPhone SE, iPhone 6s, iPhone 6s Plus, iPhone 7, iPhone 7 Plus, iPhone 8, iPhone X, all iPad Pro models, and iPad (2017), and Android devices. Requires Android 7.0 or later and access to the Google Play Store. VR is compatible with Google Cardboard viewers and other cardboard compatible devices including clip-on style VR glasses like Homido Mini, as well as the Samsung Gear VR and Oculus Go. Samsung Gear VR supports: Galaxy S6, Galaxy S6 Edge, Galaxy S6 Edge+, Samsung Galaxy Note 5, Galaxy S7, Galaxy S7 Edge, Galaxy S8, Galaxy S8+, Samsung Galaxy Note Fan Edition, Samsung Galaxy Note 8, Samsung Galaxy A8/A8+ (2018) and Samsung Galaxy S9/Galaxy S9+.

    Read more →
  • Alice AI (AI model family)

    Alice AI (AI model family)

    Alice AI is a neural network family developed by the Russian company Yandex LLC. Alice AI can create and revise texts, generate new ideas and capture the context of the conversation with the user. Alice AI is trained using a dataset which includes information from books, magazines, newspapers and other open sources available on the internet. The neural network may get facts wrong and hallucinate, but as it learns, it will produce increasingly accurate answers. == Usage == YandexGPT is integrated into virtual assistant Alice (an analog of Siri and Alexa) and is available in Yandex services and applications. The company gives businesses access to the neural network’s API through the public cloud platform Yandex Cloud and develops its own B2B solutions on its basis. Since July 2023, 800 companies have participated in the closed testing of YandexGPT. IT developers, banks, retail businesses, and companies from other industries can use the technology in two modes — API and Playground (an interface in the Yandex Cloud console for testing models and hypotheses). Two model versions are available to businesses: one works in asynchronous mode and is better able to handle complex tasks, while the other is suitable for creating quick responses in real time. As a result, YandexGPT has been tested in dozens of scenarios such as content tasks, tech support, creating chatbots, virtual assistants, etc. == History == In February 2023, Yandex announced that it was working on its own version of the ChatGPT generative neural network while developing a language model from the YaLM (Yet another Language Model) family. The project was tentatively named YaLM 2.0, which was later changed to YandexGPT. On May 17, the company unveiled a neural network called YandexGPT (YaGPT) and enabled its virtual assistant Alice to interact with the new language model. On June 15, 2023, Yandex added the YandexGPT language model to the image generation application Shedevrum. This enabled its users to create fully-fledged posts complete with a title, text, and relevant illustration. In July 2023, YandexGPT launched new features enabling businesses to create virtual assistants and chatbots, as well as generate and structure texts. On September 7, 2023, Yandex presented a new version of the language model, YandexGPT 2, at the Practical ML Conf. Compared to the previous one, the new version is able to perform more types of tasks, and the quality of answers has improved. The developers claimed that YandexGPT 2 answered user questions better than the first version in 67% of cases. From October 6, 2023, YandexGPT can create short retellings of online Russian-language videos on the Internet. It can summarize videos that are from two minutes to four hours long and contain speech.

    Read more →
  • Residual neural network

    Residual neural network

    A residual neural network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition, and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of that year. As a point of terminology, "residual connection" refers to the specific architectural motif of x ↦ f ( x ) + x {\displaystyle x\mapsto f(x)+x} , where f {\displaystyle f} is an arbitrary neural network module. The motif had been used previously (see §History for details). However, the publication of ResNet made it widely popular for feedforward networks, appearing in neural networks that are seemingly unrelated to ResNet. The residual connection stabilizes the training and convergence of deep neural networks with hundreds of layers, and is a common motif in deep neural networks, such as transformer models (e.g., BERT, and GPT models such as ChatGPT), the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system. == Mathematics == === Residual connection === In a multilayer neural network model, consider a (non-residual) subnetwork with a certain number of stacked layers (e.g., 2 or 3). Let H ( x ; α ) {\displaystyle H(x;\alpha )} denote the subnetwork. Suppose H ∗ {\displaystyle H^{}} is the desired optimal output of this subnetwork. Residual learning simply adds x {\displaystyle x} directly to the output, such that the optimal learned output now becomes be H ∗ − x {\displaystyle H^{}-x} , which is interpreted as a "residual" with respect to x {\displaystyle x} . The operation of "adding x {\displaystyle x} " is implemented via a "skip connection" that performs an identity mapping to connect the input of the subnetwork with its output. This connection is referred to as a "residual connection" in later work. Let F ( x ; α ) = H ( x ; a ) + x {\displaystyle F(x;\alpha )=H(x;a)+x} . The function F {\displaystyle F} is often represented by matrix multiplication interlaced with activation functions and normalization operations (e.g., batch normalization or layer normalization). As a whole, one of these subnetworks is referred to as a "residual block". A deep residual network is constructed by simply stacking these blocks. Long short-term memory (LSTM) has a memory mechanism that serves as a residual connection. In an LSTM without a forget gate, an input x t {\displaystyle x_{t}} is processed by a function F {\displaystyle F} and added to a memory cell c t {\displaystyle c_{t}} , resulting in c t + 1 = c t + F ( x t ) {\displaystyle c_{t+1}=c_{t}+F(x_{t})} . An LSTM with a forget gate essentially functions as a highway network. To stabilize the variance of the layers' inputs, it is recommended to replace the residual connections x + f ( x ) {\displaystyle x+f(x)} with x / L + f ( x ) {\displaystyle x/L+f(x)} , where L {\displaystyle L} is the total number of residual layers. === Projection connection === If the function F {\displaystyle F} is of type F : R n → R m {\displaystyle F:\mathbb {R} ^{n}\to \mathbb {R} ^{m}} where n ≠ m {\displaystyle n\neq m} , then F ( x ) + x {\displaystyle F(x)+x} is undefined. To handle this special case, a projection connection is used: y = F ( x ) + P ( x ) {\displaystyle y=F(x)+P(x)} where P {\displaystyle P} is typically a linear projection, defined by P ( x ) = M x {\displaystyle P(x)=Mx} where M {\displaystyle M} is a m × n {\displaystyle m\times n} matrix. The matrix is trained via backpropagation, as is any other parameter of the model. === Signal propagation === The introduction of identity mappings facilitates signal propagation in both forward and backward paths. ==== Forward propagation ==== If the output of the ℓ {\displaystyle \ell } -th residual block is the input to the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th residual block (assuming no activation function between blocks), then the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th input is: x ℓ + 1 = F ( x ℓ ) + x ℓ {\displaystyle x_{\ell +1}=F(x_{\ell })+x_{\ell }} Applying this formulation recursively, e.g.: x ℓ + 2 = F ( x ℓ + 1 ) + x ℓ + 1 = F ( x ℓ + 1 ) + F ( x ℓ ) + x ℓ {\displaystyle {\begin{aligned}x_{\ell +2}&=F(x_{\ell +1})+x_{\ell +1}\\&=F(x_{\ell +1})+F(x_{\ell })+x_{\ell }\end{aligned}}} yields the general relationship: x L = x ℓ + ∑ i = ℓ L − 1 F ( x i ) {\displaystyle x_{L}=x_{\ell }+\sum _{i=\ell }^{L-1}F(x_{i})} where L {\textstyle L} is the index of a residual block and ℓ {\textstyle \ell } is the index of some earlier block. This formulation suggests that there is always a signal that is directly sent from a shallower block ℓ {\textstyle \ell } to a deeper block L {\textstyle L} . ==== Backward propagation ==== The residual learning formulation provides the added benefit of mitigating the vanishing gradient problem to some extent. However, it is crucial to acknowledge that the vanishing gradient issue is not the root cause of the degradation problem, which is tackled through the use of normalization. To observe the effect of residual blocks on backpropagation, consider the partial derivative of a loss function E {\displaystyle {\mathcal {E}}} with respect to some residual block input x ℓ {\displaystyle x_{\ell }} . Using the equation above from forward propagation for a later residual block L > ℓ {\displaystyle L>\ell } : ∂ E ∂ x ℓ = ∂ E ∂ x L ∂ x L ∂ x ℓ = ∂ E ∂ x L ( 1 + ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) ) = ∂ E ∂ x L + ∂ E ∂ x L ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) {\displaystyle {\begin{aligned}{\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial x_{L}}{\partial x_{\ell }}}\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}\left(1+{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\right)\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}+{\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\end{aligned}}} This formulation suggests that the gradient computation of a shallower layer, ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} , always has a later term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} that is directly added. Even if the gradients of the F ( x i ) {\displaystyle F(x_{i})} terms are small, the total gradient ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} resists vanishing due to the added term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} . == Variants of residual blocks == === Basic block === A basic block is the simplest building block studied in the original ResNet. This block consists of two sequential 3x3 convolutional layers and a residual connection. The input and output dimensions of both layers are equal. === Bottleneck block === A bottleneck block consists of three sequential convolutional layers and a residual connection. The first layer in this block is a 1×1 convolution for dimension reduction (e.g., to 1/2 of the input dimension); the second layer performs a 3×3 convolution; the last layer is another 1×1 convolution for dimension restoration. The models of ResNet-50, ResNet-101, and ResNet-152 are all based on bottleneck blocks. === Pre-activation block === The pre-activation residual block applies activation functions before applying the residual function F {\displaystyle F} . Formally, the computation of a pre-activation residual block can be written as: x ℓ + 1 = F ( ϕ ( x ℓ ) ) + x ℓ {\displaystyle x_{\ell +1}=F(\phi (x_{\ell }))+x_{\ell }} where ϕ {\displaystyle \phi } can be any activation (e.g. ReLU) or normalization (e.g. LayerNorm) operation. This design reduces the number of non-identity mappings between residual blocks, and allows an identity mapping directly from the input to the output. This design was used to train models with 200 to over 1000 layers, and was found to consistently outperform variants where the residual path is not an identity function. The pre-activation ResNet with 200 layers took 3 weeks to train for ImageNet on 8 GPUs in 2016. Since GPT-2, transformer blocks have been mostly implemented as pre-activation blocks. This is often referred to as "pre-normalization" in the literature of transformer models. == Applications == Originally, ResNet was designed for computer vision. All transformer architectures include residual connections. Indeed, very deep transformers cannot be trained without them. The original ResNet paper made no claim on being inspired by biological systems. However, later research has related ResNet to biologically-plausible algorithms. A study published in Science in 2023 disclosed the complete connectome of an insect brain (specifically that of a fruit fly larva). This study discovered "multilayer shortcuts" that resemble the skip connections in artificial neural networks, including ResNets. == History == === Previous work === Residual connections were noticed in neu

    Read more →
  • Inception (deep learning architecture)

    Inception (deep learning architecture)

    Inception is a family of convolutional neural network (CNN) for computer vision, introduced by researchers at Google in 2014 as GoogLeNet (later renamed Inception v1). The series was historically important as an early CNN that separates the stem (data ingest), body (data processing), and head (prediction), an architectural design that persists in all modern CNN. == Version history == === Inception v1 === In 2014, a team at Google developed the GoogLeNet architecture, an instance of which won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The name came from the LeNet of 1998, since both LeNet and GoogLeNet are CNNs. They also called it "Inception" after a "we need to go deeper" internet meme, a phrase from Inception (2010) the film. Because later, more versions were released, the original Inception architecture was renamed again as "Inception v1". The models and the code were released under Apache 2.0 license on GitHub. The Inception v1 architecture is a deep CNN composed of 22 layers. Most of these layers were "Inception modules". The original paper stated that Inception modules are a "logical culmination" of Network in Network and (Arora et al, 2014). Since Inception v1 is deep, it suffered from the vanishing gradient problem. The team solved it by using two "auxiliary classifiers", which are linear-softmax classifiers inserted at 1/3-deep and 2/3-deep within the network, and the loss function is a weighted sum of all three: L = 0.3 L a u x , 1 + 0.3 L a u x , 2 + L r e a l {\displaystyle L=0.3L_{aux,1}+0.3L_{aux,2}+L_{real}} These were removed after training was complete. This was later solved by the ResNet architecture. The architecture consists of three parts stacked on top of one another: The stem (data ingestion): The first few convolutional layers perform data preprocessing to downscale images to a smaller size. The body (data processing): The next many Inception modules perform the bulk of data processing. The head (prediction): The final fully-connected layer and softmax produces a probability distribution for image classification. This structure is used in most modern CNN architectures. === Inception v2 === Inception v2 was released in 2015, in a paper that is more famous for proposing batch normalization. It had 13.6 million parameters. It improves on Inception v1 by adding batch normalization, and removing dropout and local response normalization which they found became unnecessary when batch normalization is used. === Inception v3 === Inception v3 was released in 2016. It improves on Inception v2 by using factorized convolutions. As an example, a single 5×5 convolution can be factored into 3×3 stacked on top of another 3×3. Both has a receptive field of size 5×5. The 5×5 convolution kernel has 25 parameters, compared to just 18 in the factorized version. Thus, the 5×5 convolution is strictly more powerful than the factorized version. However, this power is not necessarily needed. Empirically, the research team found that factorized convolutions help. It also uses a form of dimension-reduction by concatenating the output from a convolutional layer and a pooling layer. As an example, a tensor of size 35 × 35 × 320 {\displaystyle 35\times 35\times 320} can be downscaled by a convolution with stride 2 to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} , and by maxpooling with pool size 2 × 2 {\displaystyle 2\times 2} to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} . These are then concatenated to 17 × 17 × 640 {\displaystyle 17\times 17\times 640} . Other than this, it also removed the lowest auxiliary classifier during training. They found that the auxiliary head worked as a form of regularization. They also proposed label-smoothing regularization in classification. For an image with label c {\displaystyle c} , instead of making the model to predict the probability distribution δ c = ( 0 , 0 , … , 0 , 1 ⏟ c -th entry , 0 , … , 0 ) {\displaystyle \delta _{c}=(0,0,\dots ,0,\underbrace {1} _{c{\text{-th entry}}},0,\dots ,0)} , they made the model predict the smoothed distribution ( 1 − ϵ ) δ c + ϵ / K {\displaystyle (1-\epsilon )\delta _{c}+\epsilon /K} where K {\displaystyle K} is the total number of classes. === Inception v4 === In 2017, the team released Inception v4, Inception ResNet v1, and Inception ResNet v2. Inception v4 is an incremental update with even more factorized convolutions, and other complications that were empirically found to improve benchmarks. Inception ResNet v1 and v2 are both modifications of Inception v4, where residual connections are added to each Inception module, inspired by the ResNet architecture. === Xception === Xception ("Extreme Inception") was published in 2017. It is a linear stack of depthwise separable convolution layers with residual connections. The design was proposed on the hypothesis that in a CNN, the cross-channels correlations and spatial correlations in the feature maps can be entirely decoupled. Training each network took 3 days on 60 K80 GPUs, or approximately 0.5 petaFLOP-days.

    Read more →
  • Flask (web framework)

    Flask (web framework)

    Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions. However, Flask supports extensions that can add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies and several common framework related tools. Applications that use the Flask framework include Pinterest and LinkedIn. == History == Flask was created by Armin Ronacher of Pocoo, an international group of Python enthusiasts formed in 2004. According to Ronacher, the idea was originally an April Fool's joke that was popular enough to make into a serious application. The name is a play on the earlier Bottle framework. When Ronacher and Georg Brandl created a bulletin board system written in Python in 2004, the Pocoo projects Werkzeug and Jinja were developed. In April 2016, the Pocoo team was disbanded and development of Flask and related libraries passed to the newly formed Pallets project. Flask has become popular among Python enthusiasts. As of October 2020, it has the second-most number of stars on GitHub among Python web-development frameworks, only slightly behind Django, and was voted the most popular web framework in the Python Developers Survey for years between and including 2018 and 2022. == Components == The microframework Flask is part of the Pallets Projects (formerly Pocoo), and based on several others of them, all under a BSD license. === Werkzeug === Werkzeug (German for "tool") is a utility library for the Python programming language for Web Server Gateway Interface (WSGI) applications. Werkzeug can instantiate objects for request, response, and utility functions. It can be used as the basis for a custom software framework and supports Python 2.7 and 3.5 and later. === Jinja === Jinja, also by Ronacher, is a template engine for the Python programming language. Similar to the Django web framework, it handles templates in a sandbox. === MarkupSafe === MarkupSafe is a string handling library for the Python programming language. The eponymous MarkupSafe type extends the Python string type and marks its contents as "safe"; combining MarkupSafe with regular strings automatically escapes the unmarked strings, while avoiding double escaping of already marked strings. === ItsDangerous === ItsDangerous is a safe data serialization library for the Python programming language. It is used to store the session of a Flask application in a cookie without allowing users to tamper with the session contents. === Click === Click is a Python package used by Flask to create command-line interfaces (CLI) by providing a simple and composable way to define commands, arguments, and options. == Features == Development server and debugger Integrated support for unit testing RESTful request dispatching Uses Jinja templating Support for secure cookies (client side sessions) 100% WSGI 1.0 compliant Unicode-based Complete documentation Google App Engine compatibility Extensions available to extend functionality == Example == The following code shows a simple web application that displays "Hello World!" when visited: === Render Template with Flask === ==== Jinja in HTML for the Render Template ====

    Read more →
  • Gorn address

    Gorn address

    A Gorn address (Gorn, 1967) is a method of identifying and addressing any node within a tree data structure. This notation is often used for identifying nodes in a parse tree defined by phrase structure rules. The Gorn address is a sequence of zero or more integers conventionally separated by dots, e.g., 0 or 1.0.1. The root which Gorn calls can be regarded as the empty sequence. And the j {\displaystyle j} -th child of the i {\displaystyle i} -th child has an address i . j {\displaystyle i.j} , counting from 0. It is named after American computer scientist Saul Gorn.

    Read more →
  • VLLM

    VLLM

    vLLM is an open-source software framework for inference and serving of large language models and related multimodal models. Originally developed at the University of California, Berkeley's Sky Computing Lab, the project is centered on PagedAttention, a memory-management method for transformer key–value caches, and supports features such as continuous batching, distributed inference, quantization, and OpenAI-compatible APIs. According to a project maintainer, the "v" in vLLM originally referred to "virtual", inspired by virtual memory. == History == vLLM was introduced in 2023 by researchers affiliated with the Sky Computing Lab at UC Berkeley. Its core ideas were described in the 2023 paper Efficient Memory Management for Large Language Model Serving with PagedAttention, which presented the system as a high-throughput and memory-efficient serving engine for large language models. In 2025, the PyTorch Foundation announced that vLLM had become a Foundation-hosted project. PyTorch's project page states that the University of California, Berkeley contributed vLLM to the Linux Foundation in July 2024. In January 2026, TechCrunch reported that the creators of vLLM had launched the startup Inferact to commercialize the project, raising $150 million in seed funding. == Architecture == According to its 2023 paper, vLLM was designed to improve the efficiency of large language model serving by reducing memory waste in the key–value cache used during transformer inference. The paper introduced PagedAttention, an algorithm inspired by virtual memory and paging techniques in operating systems, and described vLLM as using block-level memory management and request scheduling to increase throughput while maintaining similar latency. The project documentation and repository describe support for continuous batching, chunked prefill, speculative decoding, prefix caching, quantization, and multiple forms of distributed inference and serving. PyTorch has described vLLM as a high-throughput, memory-efficient inference and serving engine that supports a range of hardware back ends, including NVIDIA and AMD GPUs, Google TPUs, AWS Trainium, and Intel processors.

    Read more →
  • Calais (Reuters product)

    Calais (Reuters product)

    Calais is a service created by Thomson Reuters that automatically extracts semantic information from web pages in a format that can be used on the semantic web. Calais was launched in January 2008, and is free to use. The technology is now available via the website of Refinitiv, a provider of financial market data and infrastructure founded in 2018, that is a subsidiary of London Stock Exchange Group. The Calais Web service reads unstructured text and returns Resource Description Framework formatted results identifying entities, facts and events within the text. The service appears to be based on technology acquired when Reuters purchased ClearForest in 2007. The technology has also been used to automatically tag blog articles, and organize museum collections. Calais uses natural language processing technologies delivered via a web service interface.

    Read more →
  • MLOps

    MLOps

    MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. It bridges the gap between machine learning development and production operations, ensuring that models are robust, scalable, and aligned with business goals. The word is a compound of "machine learning" and the continuous delivery practice (CI/CD) of DevOps in the software field. Machine learning models are tested and developed in isolated experimental systems. When an algorithm is ready to be launched, MLOps is practiced between data scientists, DevOps, and machine learning engineers to transition the algorithm to production systems. Similar to DevOps or DataOps approaches, MLOps seeks to increase automation and improve the quality of production models, while also focusing on business and regulatory requirements. While MLOps started as a set of best practices, it is slowly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics. == Definition == MLOps is a paradigm, including aspects like best practices, sets of concepts, as well as a development culture when it comes to the end-to-end conceptualization, implementation, monitoring, deployment, and scalability of machine learning products. Most of all, it is an engineering practice that leverages three contributing disciplines: machine learning, software engineering (especially DevOps), and data engineering. MLOps is aimed at productionizing machine learning systems by bridging the gap between development (Dev) and operations (Ops). Essentially, MLOps aims to facilitate the creation of machine learning products by leveraging these principles: CI/CD automation, workflow orchestration, reproducibility; versioning of data, model, and code; collaboration; continuous ML training and evaluation; ML metadata tracking and logging; continuous monitoring; and feedback loops. == History == Interest in operationalizing machine learning systems began to grow in the mid-2010s as ML projects started moving from experimentation to production use. The challenges associated with sustaining such systems were highlighted in a 2015 paper. The predicted growth in machine learning included an estimated doubling of ML pilots and implementations from 2017 to 2018, and again from 2018 to 2020. Reports show a majority (up to 88%) of corporate machine learning initiatives are struggling to move beyond test stages. However, those organizations that actually put machine learning into production saw a 3–15% profit margin increases. The MLOps market size was USD 2,191.8 Million in 2024, and is projected to be USD 16,613.4 Million in 2030. == Architecture == Machine Learning systems can be categorized in eight different categories: data collection, data processing, feature engineering, data labeling, model design, model training and optimization, endpoint deployment, and endpoint monitoring. Each step in the machine learning lifecycle is built in its own system, but requires interconnection. These are the minimum systems that enterprises need to scale machine learning within their organization. == Goals == There are a number of goals enterprises want to achieve through MLOps systems successfully implementing ML across the enterprise, including: Deployment and automation Reproducibility of models and predictions Diagnostics Governance and regulatory compliance Scalability Collaboration Business uses Monitoring and management A standard practice, such as MLOps, takes into account each of the aforementioned areas, which can help enterprises optimize workflows and avoid issues during implementation. Vendors such as Adaptive ML deliver commercial reinforcement learning operations (RLOps) and MLOps-infrastructure, targeting organizations deploying large language models in production. A common architecture of an MLOps system would include data science platforms where models are constructed and the analytical engines where computations are performed, with the MLOps tool orchestrating the movement of machine learning models, data and outcomes between the systems.

    Read more →
  • Meesho

    Meesho

    Meesho Limited (short for Meri shop, transl. My shop) is an Indian e-commerce company, headquartered in Bengaluru. Founded by Vidit Aatrey and Sanjeev Barnwal in December 2015, Meesho is an online marketplace in categories such as fashion, home and kitchen, beauty and personal care, electronics accessories, and daily use products. == History == Meesho Private Limited, formerly Fashnear Technologies Private Limited, was established by IIT Delhi graduates Vidit Aatrey and Sanjeev Barnwal in December, 2015 In 2016, the founders came up with the idea of re-establishing the platform as Meesho, one that would enable country-wide shipping for resellers with the use of social media sites as tools for marketing. In February 2019, the platform reported having around 209,000 users and about 1.2 million monthly orders, and in March 2020, it reported approximately 563,000 users and 3.1 million monthly orders. In 2021, the Meesho mobile application was ranked among the most downloaded shopping apps globally. In 2022, Meesho had about 120 million monthly users and about 910 million orders were made through the platform, with a gross merchandise value (GMV) of about $5 billion. According to report as of August 2023 Meesho delisted 42 lakh counterfeit listings and 10 lakh restricted products under its initiative Project Suraksha. During the same period, the platform blocked access for over 12,000 user accounts flagged for policy violations. The Court granted injunctive relief by directing domain registrars to suspend the infringing websites. Additionally, the Court ordered law enforcement authorities to initiate criminal investigations, freeze associated financial accounts against the identified offenders. In 2023, Meesho became the fastest shopping app to cross over 500 million downloads. In 2024, Meesho introduced Valmo, a logistics marketplace, to provide shipment services to sellers by aggregating multiple logistics providers. Meesho employs over 3,000 small businesses and 10-12 large firms for warehousing and sorting operations within its logistics framework. According to media reports, Valmo operating in approximately 15,000 pincodes in India with around 6,000 partners. It is reported to handle over 50% of Meesho's daily orders. In November 2024, Meesho introduced a generative AI-powered voice bot for customer support, managing approximately 60,000 calls daily in English and Hindi. According to media reports, the system resolves the majority of queries without human assistance, with only a small fraction of calls requiring manual intervention. According to media reports, in 2024, Meesho prevented over 22 million suspicious or potentially fraudulent transactions on its platform. The company initiated legal proceedings, resulting in the filing of twelve cases, including nine specifically targeting over forty individuals in the cities of Kolkata and Ranchi. The company filed a suit in the Delhi High Court for a permanent injunction against parties operating deceptive websites misappropriating its brand identity. Meesha went public through an initial public offering in December 2025, raising $603 million. It is listed on both the BSE and NSE. == Recognition == In 2023, Meesho was named one of the most influential companies of the year by Time (magazine).

    Read more →
  • Transfer learning

    Transfer learning

    Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task. For example, for image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing or transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency. Since transfer learning makes use of training with multiple objective functions it is related to cost-sensitive machine learning and multi-objective optimization. == History == In 1976, Bozinovski and Fulgosi published a paper addressing transfer learning in neural network training. The paper gives a mathematical and geometrical model of the topic. In 1981, a report considered the application of transfer learning to a dataset of images representing letters of computer terminals, experimentally demonstrating positive and negative transfer learning. In 1992, Lorien Pratt formulated the discriminability-based transfer (DBT) algorithm. By 1998, the field had advanced to include multi-task learning, along with more formal theoretical foundations. Influential publications on transfer learning include the book Learning to Learn in 1998, a 2009 survey and a 2019 survey. Ng said in his NIPS 2016 tutorial that TL would become the next driver of machine learning commercial success after supervised learning. In the 2020 paper, "Rethinking Pre-Training and self-training", Zoph et al. reported that pre-training can hurt accuracy, and advocate self-training instead. == Definition == The definition of transfer learning is given in terms of domains and tasks. A domain D {\displaystyle {\mathcal {D}}} consists of: a feature space X {\displaystyle {\mathcal {X}}} and a marginal probability distribution P ( X ) {\displaystyle P(X)} , where X = { x 1 , . . . , x n } ∈ X {\displaystyle X=\{x_{1},...,x_{n}\}\in {\mathcal {X}}} . Given a specific domain, D = { X , P ( X ) } {\displaystyle {\mathcal {D}}=\{{\mathcal {X}},P(X)\}} , a task consists of two components: a label space Y {\displaystyle {\mathcal {Y}}} and an objective predictive function f : X → Y {\displaystyle f:{\mathcal {X}}\rightarrow {\mathcal {Y}}} . The function f {\displaystyle f} is used to predict the corresponding label f ( x ) {\displaystyle f(x)} of a new instance x {\displaystyle x} . This task, denoted by T = { Y , f ( x ) } {\displaystyle {\mathcal {T}}=\{{\mathcal {Y}},f(x)\}} , is learned from the training data consisting of pairs { x i , y i } {\displaystyle \{x_{i},y_{i}\}} , where x i ∈ X {\displaystyle x_{i}\in {\mathcal {X}}} and y i ∈ Y {\displaystyle y_{i}\in {\mathcal {Y}}} . Given a source domain D S {\displaystyle {\mathcal {D}}_{S}} and learning task T S {\displaystyle {\mathcal {T}}_{S}} , a target domain D T {\displaystyle {\mathcal {D}}_{T}} and learning task T T {\displaystyle {\mathcal {T}}_{T}} , where D S ≠ D T {\displaystyle {\mathcal {D}}_{S}\neq {\mathcal {D}}_{T}} , or T S ≠ T T {\displaystyle {\mathcal {T}}_{S}\neq {\mathcal {T}}_{T}} , transfer learning aims to help improve the learning of the target predictive function f T ( ⋅ ) {\displaystyle f_{T}(\cdot )} in D T {\displaystyle {\mathcal {D}}_{T}} using the knowledge in D S {\displaystyle {\mathcal {D}}_{S}} and T S {\displaystyle {\mathcal {T}}_{S}} . == Applications == Algorithms for transfer learning are available in Markov logic networks and Bayesian networks. Transfer learning has been applied to cancer subtype discovery, building utilization, general game playing, text classification, digit recognition, medical imaging and spam filtering. In 2020, it was discovered that, due to their similar physical natures, transfer learning is possible between electromyographic (EMG) signals from the muscles and classifying the behaviors of electroencephalographic (EEG) brainwaves, from the gesture recognition domain to the mental state recognition domain. It was noted that this relationship worked in both directions, showing that electroencephalographic can likewise be used to classify EMG. The experiments noted that the accuracy of neural networks and convolutional neural networks were improved through transfer learning both prior to any learning (compared to standard random weight distribution) and at the end of the learning process (asymptote). That is, results are improved by exposure to another domain. Moreover, the end-user of a pre-trained model can change the structure of fully-connected layers to improve performance.

    Read more →
  • Retrieval-augmented generation

    Retrieval-augmented generation

    Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources. RAG improves LLMs by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have caused chatbots to describe policies that don't exist, or recommend nonexistent legal cases to lawyers that are looking for citations to support their arguments. RAG also reduces the need to retrain LLMs with new data, saving on computational and financial costs. Beyond efficiency gains, RAG also allows LLMs to include sources in their responses, so users can verify the cited sources. This provides greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance. The term retrieval-augmented generation (RAG) was introduced in a 2020 paper that described combining a parametric language model with a non-parametric external memory accessed through retrieval at inference time. == RAG and LLM limitations == LLMs can provide incorrect information. For example, when Google first demonstrated its LLM tool "Google Bard" (later re-branded to Gemini), the LLM provided incorrect information about the James Webb Space Telescope. This error contributed to a $100 billion decline in Google's stock value. RAG is used to prevent these errors, but it does not solve all the problems. For example, LLMs can generate misinformation even when pulling from factually correct sources if they misinterpret the context. MIT Technology Review gives the example of an AI-generated response stating, "The United States has had one Muslim president, Barack Hussein Obama." The model retrieved this from an academic book rhetorically titled Barack Hussein Obama: America's First Muslim President? The LLM did not "know" or "understand" the context of the title, generating a false statement. LLMs with RAG are programmed to prioritize new information. This technique has been called "prompt stuffing." Without prompt stuffing, the LLM's input is generated by a user; with prompt stuffing, additional relevant context is added to this input to guide the model's response. This approach provides the LLM with key information early in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge. == Process == Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set. Ars Technica notes that "when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information" ("augmentation"). IBM states that "in the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize" an answer. === RAG key stages === Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of a large vector space. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval. Given a user query, a document retriever is first called to select the most relevant documents that will be used to augment the query. This comparison can be done using a variety of methods, which depend in part on the type of indexing used. The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query. Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals. Finally, the LLM can generate output based on both the query and the retrieved documents. Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning. == Applications == Retrieval-augmented generation is used in applications where generated responses need to be grounded in external or frequently updated information. Commonly cited use cases include search engines, question-answering systems, customer support chatbots, enterprise knowledge assistants, content generation, recommendation systems, retail and e-commerce, and industrial or manufacturing workflows. In healthcare, RAG has been studied as a way to ground large language model outputs in external medical knowledge sources, although reviews have noted continuing challenges around evaluation, ethics, and clinical reliability. == Improvements == Improvements to the basic process above can be applied at different stages in the RAG flow. === Encoder === These methods focus on the encoding of text as either dense or sparse vectors. Sparse vectors, which encode the identity of a word, are typically dictionary-length and contain mostly zeros. Dense vectors, which encode meaning, are more compact and contain fewer zeros. Various enhancements can improve the way similarities are calculated in the vector stores (databases). Performance improves by optimizing how vector similarities are calculated. Dot products enhance similarity scoring, while approximate nearest neighbor (ANN) searches improve retrieval efficiency over K-nearest neighbors (KNN) searches. Accuracy may be improved with Late Interactions, which allow the system to compare words more precisely after retrieval. This helps refine document ranking and improve search relevance. Hybrid vector approaches may be used to combine dense vector representations with sparse one-hot vectors, taking advantage of the computational efficiency of sparse dot products over dense vector operations. Other retrieval techniques focus on improving accuracy by refining how documents are selected. Some retrieval methods combine sparse representations, such as SPLADE, with query expansion strategies to improve search accuracy and recall. === Retriever-centric methods === These methods aim to enhance the quality of document retrieval in vector databases: Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text within documents. Supervised retriever optimization aligns retrieval probabilities with the generator model's likelihood distribution. This involves retrieving the top-k vectors for a given prompt, scoring the generated response's perplexity, and minimizing KL divergence between the retriever's selections and the model's likelihoods to refine retrieval. Reranking techniques can refine retriever performance by prioritizing the most relevant retrieved documents during training. === Language model === By redesigning the language model with the retriever in mind, a 25-time smaller network can get comparable perplexity as its much larger counterparts. Because it is trained from scratch, this method (Retro) incurs the high cost of training runs that the original RAG scheme avoided. The hypothesis is that by giving domain knowledge during training, Retro needs less focus on the domain and can devote its smaller weight resources only to language semantics. The redesigned language model is shown here. It has been reported that Retro is not reproducible, so modifications were made to make it so. The more reproducible version is called Retro++ and includes in-context RAG. === Chunking === Chunking involves various strategies for breaking up the data into vectors so the retriever can find details in it. Three types of chunking strategies are: Fixed length with overlap. This is fast and easy. Overlapping consecutive chunks helps to maintain semantic context across chunks. Syntax-based chunks can break the document up into sentences. Libraries such as spaCy or NLTK can also help. File format-based chunking. Certain file types have natural chunks built in, and it's best to respect them. For example, code files are best chunked and vectorized as whole functions or classes. HTML files should leave

    or base64 encoded elements

    Read more →
  • Human–robot collaboration

    Human–robot collaboration

    Human-Robot Collaboration is the study of collaborative processes in human and robot agents work together to achieve shared goals. Many new applications for robots require them to work alongside people as capable members of human-robot teams. These include robots for homes, hospitals, and offices, space exploration and manufacturing. Human-Robot Collaboration (HRC) is an interdisciplinary research area comprising classical robotics, human-computer interaction, artificial intelligence, process design, layout planning, ergonomics, cognitive sciences, and psychology. Industrial applications of human-robot collaboration involve Collaborative Robots, or cobots, that physically interact with humans in a shared workspace to complete tasks such as collaborative manipulation or object handovers. == Collaborative Activity == Collaboration is defined as a special type of coordinated activity, one in which two or more agents work jointly with each other, together performing a task or carrying out the activities needed to satisfy a shared goal. The process typically involves shared plans, shared norms and mutually beneficial interactions. Although collaboration and cooperation are often used interchangeably, collaboration differs from cooperation as it involves a shared goal and joint action where the success of both parties depend on each other. For effective human-robot collaboration, it is imperative that the robot is capable of understanding and interpreting several communication mechanisms similar to the mechanisms involved in human-human interaction. The robot must also communicate its own set of intents and goals to establish and maintain a set of shared beliefs and to coordinate its actions to execute the shared plan. In addition, all team members demonstrate commitment to doing their own part, to the others doing theirs, and to the success of the overall task. == Theories Informing Human-Robot Collaboration == Human-human collaborative activities are studied in depth in order to identify the characteristics that enable humans to successfully work together. These activity models usually aim to understand how people work together in teams, how they form intentions and achieve a joint goal. Theories on collaboration inform human-robot collaboration research to develop efficient and fluent collaborative agents. === Belief Desire Intention Model === The belief-desire-intention (BDI) model is a model of human practical reasoning that was originally developed by Michael Bratman. The approach is used in intelligent agents research to describe and model intelligent agents. The BDI model is characterized by the implementation of an agent's beliefs (the knowledge of the world, state of the world), desires (the objective to accomplish, desired end state) and intentions (the course of actions currently under execution to achieve the desire of the agent) in order to deliberate their decision-making processes. BDI agents are able to deliberate about plans, select plans and execute plans. === Shared Cooperative Activity === Shared Cooperative Activity defines certain prerequisites for an activity to be considered shared and cooperative: mutual responsiveness, commitment to the joint activity and commitment to mutual support. An example case to illustrate these concepts would be a collaborative activity where agents are moving a table out the door, mutual responsiveness ensures that movements of the agents are synchronized; a commitment to the joint activity reassures each team member that the other will not at some point drop his side; and a commitment to mutual support deals with possible breakdowns due to one team member's inability to perform part of the plan. === Joint Intention Theory === Joint Intention Theory proposes that for joint action to emerge, team members must communicate to maintain a set of shared beliefs and to coordinate their actions towards the shared plan. In collaborative work, agents should be able to count on the commitment of other members, therefore each agent should inform the others when they reach the conclusion that a goal is achievable, impossible, or irrelevant. == Approaches to Human-Robot Collaboration == The approaches to human-robot collaboration include human emulation (HE) and human complementary (HC) approaches. Although these approaches have differences, there are research efforts to develop a unified approach stemming from potential convergences such as Collaborative Control. === Human Emulation === The human emulation approach aims to enable computers to act like humans or have human-like abilities in order to collaborate with humans. It focuses on developing formal models of human-human collaboration and applying these models to human-computer collaboration. In this approach, humans are viewed as rational agents who form and execute plans for achieving their goals and infer other people's plans. Agents are required to infer the goals and plans of other agents, and collaborative behavior consists of helping other agents to achieve their goals. === Human Complementary === The human complementary approach seeks to improve human-computer interaction by making the computer a more intelligent partner that complements and collaborates with humans. The premise is that the computer and humans have fundamentally asymmetric abilities. Therefore, researchers invent interaction paradigms that divide responsibility between human users and computer systems by assigning distinct roles that exploit the strengths and overcome the weaknesses of both partners. == Key Aspects == Specialization of Roles: Based on the level of autonomy and intervention, there are several human-robot relationships including master-slave, supervisor–subordinate, partner–partner, teacher–learner and fully autonomous robot. In addition to these roles, homotopy (a weighting function that allows a continuous change between leader and follower behaviors) was introduced as a flexible role distribution. Establishing shared goal(s): Through direct discussion about goals or inference from statements and actions, agents must determine the shared goals they are trying to achieve. Allocation of Responsibility and Coordination: Agents must decide how to achieve their goals, determine what actions will be done by each agent, and how to coordinate the actions of individual agents and integrate their results. Shared context: Agents must be able to track progress toward their goals. They must keep track of what has been achieved and what remains to be done. They must evaluate the effects of actions and determine whether an acceptable solution has been achieved. Communication: Any collaboration requires communication to define goals, negotiate over how to proceed and who will do what, and evaluate progress and results. Adaptation and learning: Collaboration over time require partners to adapt themselves to each other and learn from one's partner both directly or indirectly. Time and space: The time-space taxonomy divides human-robot interaction into four categories based on whether the humans and robots are using computing systems at the same time (synchronous) or different times (asynchronous) and while in the same place (collocated) or in different places (non-collocated). Ergonomics: Human factors and ergonomics are one of the key aspects for a sustainable human-robot collaboration. The robot control system can use biomechanical models and sensors to optimize various ergonomic metrics, such as muscle fatigue.

    Read more →
  • Vicarious (company)

    Vicarious (company)

    Vicarious was an artificial intelligence company based in the San Francisco Bay Area, California. They use the theorized computational principles of the brain to attempt to build software that can think and learn like a human. Vicarious describes its technology as "a turnkey robotics solution integrator using artificial intelligence to automate tasks too complex and versatile for traditional automations". Alphabet Inc acquired the company in 2022 for an undisclosed amount. == Founders == The company was founded in 2010 by D. Scott Phoenix and Dileep George. Before co-founding Vicarious, Phoenix was Entrepreneur in Residence at Founders Fund and CEO of Frogmetrics, a touchscreen analytics company he co-founded through the Y Combinator incubator program. Previously, George was Chief Technology Officer at Numenta, a company he co-founded with Jeff Hawkins and Donna Dubinsky while completing his PhD at Stanford University. == Funding == The company launched in February 2011 with funding from Founders Fund, Dustin Moskovitz, Adam D’Angelo (former Facebook CTO and co-founder of Quora), Felicis Ventures, and Palantir co-founder Joe Lonsdale. In August 2012, in its Series A round of funding, it raised an additional $15 million. The round was led by Good Ventures; Founders Fund, Open Field Capital and Zarco Investment Group also participated. The company received $40 million in its Series B round of funding. The round was led by individuals including Mark Zuckerberg, Elon Musk, and others. An additional undisclosed amount was later contributed by Amazon.com CEO Jeff Bezos, Yahoo! co-founder Jerry Yang, Skype co-founder Janus Friis and Salesforce.com CEO Marc Benioff. == Recursive Cortical Network == Vicarious is developing machine learning software based on the computational principles of the human brain. One such software is a vision system known as the Recursive Cortical Network (RCN), it is a generative graphical visual perception system that interprets the contents of photographs and videos in a manner similar to humans. The system is powered by a balanced approach that takes sensory data, mathematics, and biological plausibility into consideration. On October 22, 2013, beating CAPTCHA, Vicarious announced its model was reliably able to solve modern CAPTCHAs, with character recognition rates of 90% or better when trained on one style. However, Luis von Ahn, a pioneer of early CAPTCHA and founder of reCAPTCHA, expressed skepticism, stating: "It's hard for me to be impressed since I see these every few months." He pointed out that 50 similar claims to that of Vicarious had been made since 2003. Vicarious later published their findings in peer-reviewed journal Science. Vicarious has indicated that its AI was not specifically designed to complete CAPTCHAs and its success at the task is a product of its advanced vision system. Because Vicarious's algorithms are based on insights from the human brain, it is also able to recognize photographs, videos, and other visual data.

    Read more →
  • LaMDA

    LaMDA

    LaMDA (Language Model for Dialogue Applications) is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year. In June 2022, LaMDA gained widespread attention when Google engineer Blake Lemoine made claims that the chatbot had become sentient. The scientific community has largely rejected Lemoine's claims, though it has led to conversations about the efficacy of the Turing test, which measures whether a computer can pass for a human. In February 2023, Google announced Gemini (then Bard), a conversational artificial intelligence chatbot powered by LaMDA, to counter the rise of OpenAI's ChatGPT. == History == === Background === On January 28, 2020, Google unveiled Meena, a neural network-powered chatbot with 2.6 billion parameters, which Google claimed to be superior to all other existing chatbots. The company previously hired computer scientist Ray Kurzweil in 2012 to develop multiple chatbots for the company, including one named Danielle. The Google Brain research team, who developed Meena, hoped to release the chatbot to the public in a limited capacity, but corporate executives refused on the grounds that Meena violated Google's "AI principles around safety and fairness". Meena was later renamed LaMDA as its data and computing power increased, and the Google Brain team again sought to deploy the software to the Google Assistant, the company's virtual assistant software, in addition to opening it up to a public demo. Both requests were once again denied by company leadership. LaMDA's two lead researchers, Daniel de Freitas and Noam Shazeer, eventually left the company in frustration. === First generation === Google announced the LaMDA conversational large language model during the Google I/O keynote on May 18, 2021, powered by artificial intelligence. The acronym stands for "Language Model for Dialogue Applications". Built on the seq2seq architecture, transformer-based neural networks developed by Google Research in 2017, LaMDA was trained on human dialogue and stories, allowing it to engage in open-ended conversations. Google states that responses generated by LaMDA have been ensured to be "sensible, interesting, and specific to the context". LaMDA has access to multiple symbolic text processing systems, including a database, a real-time clock and calendar, a mathematical calculator, and a natural language translation system, giving it superior accuracy in tasks supported by those systems, and making it among the first dual process chatbots. LaMDA is also not stateless because its "sensibleness" metric is fine-tuned by "pre-conditioning" each dialog turn by prepending many of the most recent dialog interactions, on a user-by-user basis. LaMDA is tuned on nine unique performance metrics: sensibleness, specificity, interestingness, safety, groundedness, informativeness, citation accuracy, helpfulness, and role consistency. Tests by Google indicated that LaMDA surpassed human responses in the area of interestingness. The pre-training dataset consists of 2.97B documents, 1.12B dialogs, and 13.39B utterances, for a total of 1.56T words. The largest LaMDA model has 137B non-embedding parameters. === Second generation === On May 11, 2022, Google unveiled LaMDA 2, the successor to LaMDA, during the 2022 Google I/O keynote. The new incarnation of the model draws examples of text from numerous sources, using it to formulate unique "natural conversations" on topics that it may not have been trained to respond to. === Sentience claims === On June 11, 2022, The Washington Post reported that Google engineer Blake Lemoine had been placed on paid administrative leave after Lemoine told company executives Blaise Agüera y Arcas and Jen Gennai that LaMDA had become sentient. Lemoine came to this conclusion after the chatbot made questionable responses to questions regarding self-identity, moral values, religion, and Isaac Asimov's Three Laws of Robotics. Google refuted these claims, insisting that there was substantial evidence to indicate that LaMDA was not sentient. In an interview with Wired, Lemoine reiterated his claims that LaMDA was "a person" as dictated by the Thirteenth Amendment to the U.S. Constitution, comparing it to an "alien intelligence of terrestrial origin". He further revealed that he had been dismissed by Google after he hired an attorney on LaMDA's behalf after the chatbot requested that Lemoine do so. On July 22, Google fired Lemoine, asserting that Blake had violated their policies "to safeguard product information" and rejected his claims as "wholly unfounded". Internal controversy instigated by the incident prompted Google executives to decide against releasing LaMDA to the public, which it had previously been considering. Lemoine's claims were widely pushed back by the scientific community. Many experts rejected the idea that LaMDA was sentient, including former New York University psychology professor Gary Marcus, David Pfau of Google sister company DeepMind, Erik Brynjolfsson of the Institute for Human-Centered Artificial Intelligence at Stanford University, and University of Surrey professor Adrian Hilton. Yann LeCun, who leads Meta Platforms' AI research team, stated that neural networks such as LaMDA were "not powerful enough to attain true intelligence". University of California, Santa Cruz professor Max Kreminski noted that LaMDA's architecture did not "support some key capabilities of human-like consciousness" and that its neural network weights were "frozen", assuming it was a typical large language model. Philosopher Nick Bostrom noted, however, that the lack of precise and consensual criteria for determining whether a system is conscious warrants some uncertainty. IBM Watson lead developer David Ferrucci compared how LaMDA appeared to be human in the same way Watson did when it was first introduced. Former Google AI ethicist Timnit Gebru called Lemoine a victim of a "hype cycle" initiated by researchers and the media. Lemoine's claims have also generated discussion on whether the Turing test remained useful to determine researchers' progress toward achieving artificial general intelligence, with Will Omerus of the Post opining that the test actually measured whether machine intelligence systems were capable of deceiving humans, while Brian Christian of The Atlantic said that the controversy was an instance of the ELIZA effect. == Products == === AI Test Kitchen === With the unveiling of LaMDA 2 in May 2022, Google also launched the AI Test Kitchen, a mobile application for the Android operating system powered by LaMDA capable of providing lists of suggestions on-demand based on a complex goal. Originally open only to Google employees, the app was set to be made available to "select academics, researchers, and policymakers" by invitation sometime in the year. In August, the company began allowing users in the U.S. to sign up for early access. In November, Google released a "season 2" update to the app, integrating a limited form of Google Brain's Imagen text-to-image model. A third iteration of the AI Test Kitchen was in development by January 2023, expected to launch at I/O later that year. Following the 2023 I/O keynote in May, Google added MusicLM, an AI-powered music generator first previewed in January, to the AI Test Kitchen app. In August, the app was delisted from Google Play and the Apple App Store, instead moving completely online. === Bard === On February 6, 2023, Google announced Bard, a conversational AI chatbot powered by LaMDA, in response to the unexpected popularity of OpenAI's ChatGPT chatbot. Google positions the chatbot as a "collaborative AI service" rather than a search engine. Bard became available for early access on March 21. === Other products === In addition to Bard, Pichai also unveiled the company's Generative Language API, an application programming interface also based on LaMDA, which he announced would be opened up to third-party developers in March 2023. == Architecture == LaMDA is a decoder-only Transformer language model. It is pre-trained on a text corpus that includes both documents and dialogs consisting of 1.56 trillion words, and is then trained with fine-tuning data generated by manually annotated responses for "sensibleness, interestingness, and safety". LaMDA was retrieval-augmented to improve the accuracy of facts provided to the user. Three different models were tested, with the largest having 137 billion non-embedding parameters:

    Read more →