AI App Maker No Sign Up

AI App Maker No Sign Up — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Labeled data

    Labeled data

    Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of it with informative tags called judgments. For example, a data label might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, or whether a dot in an X-ray is a tumor. Labels can be obtained by having humans make judgments about a given piece of unlabeled data. Labeled data is significantly more expensive to obtain than the raw unlabeled data. The quality of labeled data directly influences the performance of supervised machine learning models in operation, as these models learn from the provided labels. == Crowdsourced labeled data == In 2006, Fei-Fei Li, the co-director of the Stanford Human-Centered AI Institute, initiated research to improve the artificial intelligence models and algorithms for image recognition by significantly enlarging the training data. The researchers downloaded millions of images from the World Wide Web and a team of undergraduates started to apply labels for objects to each image. In 2007, Li outsourced the data labeling work on Amazon Mechanical Turk, an online marketplace for digital piece work. The 3.2 million images that were labeled by more than 49,000 workers formed the basis for ImageNet, one of the largest hand-labeled database for outline of object recognition. == Automated data labelling == After obtaining a labeled dataset, machine learning models can be applied to the data so that new unlabeled data can be presented to the model and a likely label can be guessed or predicted for that piece of unlabeled data. == Challenges == === Data-driven bias === Algorithmic decision-making is subject to programmer-driven bias as well as data-driven bias. Training data that relies on bias labeled data will result in prejudices and omissions in a predictive model, despite the machine learning algorithm being legitimate. The labeled data used to train a specific machine learning algorithm needs to be a statistically representative sample to not bias the results. For example, in facial recognition systems underrepresented groups are subsequently often misclassified if the labeled data available to train has not been representative of the population,. In 2018, a study by Joy Buolamwini and Timnit Gebru demonstrated that two facial analysis datasets that have been used to train facial recognition algorithms, IJB-A and Adience, are composed of 79.6% and 86.2% lighter skinned humans respectively. === Human error and inconsistency === Human annotators are prone to errors and biases when labeling data. This can lead to inconsistent labels and affect the quality of the data set. The inconsistency can affect the machine learning model's ability to generalize well. === Domain expertise === Certain fields, such as legal document analysis or medical imaging, require annotators with specialized domain knowledge. Without the expertise, the annotations or labeled data may be inaccurate, negatively impacting the machine learning model's performance in a real-world scenario.

    Read more →
  • WaveNet

    WaveNet

    WaveNet is a deep neural network for generating raw audio. It was created by researchers at London-based AI firm DeepMind. The technique, outlined in a paper in September 2016, is able to generate relatively realistic-sounding human-like voices by directly modelling waveforms using a neural network method trained with recordings of real speech. Tests with US English and Mandarin reportedly showed that the system outperforms Google's best existing text-to-speech (TTS) systems, although as of 2016 its text-to-speech synthesis still was less convincing than actual human speech. WaveNet's ability to generate raw waveforms means that it can model any kind of audio, including music. == History == Generating speech from text is an increasingly common task thanks to the popularity of software such as Apple's Siri, Microsoft's Cortana, Amazon Alexa and the Google Assistant. Most such systems use a variation of a technique that involves concatenated sound fragments together to form recognisable sounds and words. The most common of these is called concatenative TTS. It consists of large library of speech fragments, recorded from a single speaker that are then concatenated to produce complete words and sounds. The result sounds unnatural, with an odd cadence and tone. The reliance on a recorded library also makes it difficult to modify or change the voice. Another technique, known as parametric TTS, uses mathematical models to recreate sounds that are then assembled into words and sentences. The information required to generate the sounds is stored in the parameters of the model. The characteristics of the output speech are controlled via the inputs to the model, while the speech is typically created using a voice synthesiser known as a vocoder. This can also result in unnatural sounding audio. == Design and ongoing research == === Background === WaveNet is a type of feedforward neural network known as a deep convolutional neural network (CNN). In WaveNet, the CNN takes a raw signal as an input and synthesises an output one sample at a time. It does so by sampling from a softmax (i.e. categorical) distribution of a signal value that is encoded using μ-law companding transformation and quantized to 256 possible values. === Initial concept and results === According to the original September 2016 DeepMind research paper WaveNet: A Generative Model for Raw Audio, the network was fed real waveforms of speech in English and Mandarin. As these pass through the network, it learns a set of rules to describe how the audio waveform evolves over time. The trained network can then be used to create new speech-like waveforms at 16,000 samples per second. These waveforms include realistic breaths and lip smacks – but do not conform to any language. WaveNet is able to accurately model different voices, with the accent and tone of the input correlating with the output. For example, if it is trained with German, it produces German speech. The capability also means that if the WaveNet is fed other inputs – such as music – its output will be musical. At the time of its release, DeepMind showed that WaveNet could produce waveforms that sound like classical music. === Content (voice) swapping === According to the June 2018 paper Disentangled Sequential Autoencoder, DeepMind has successfully used WaveNet for audio and voice "content swapping": the network can swap the voice on an audio recording for another, pre-existing voice while maintaining the text and other features from the original recording. "We also experiment on audio sequence data. Our disentangled representation allows us to convert speaker identities into each other while conditioning on the content of the speech." (p. 5) "For audio, this allows us to convert a male speaker into a female speaker and vice versa [...]." (p. 1) According to the paper, a two-digit minimum amount of hours (c. 50 hours) of pre-existing speech recordings of both source and target voice are required to be fed into WaveNet for the program to learn their individual features before it is able to perform the conversion from one voice to another at a satisfying quality. The authors stress that "[a]n advantage of the model is that it separates dynamical from static features [...]." (p. 8), i. e. WaveNet is capable of distinguishing between the spoken text and modes of delivery (modulation, speed, pitch, mood, etc.) to maintain during the conversion from one voice to another on the one hand, and the basic features of both source and target voices that it is required to swap on the other. The January 2019 follow-up paper Unsupervised speech representation learning using WaveNet autoencoders details a method to successfully enhance the proper automatic recognition and discrimination between dynamical and static features for "content swapping", notably including swapping voices on existing audio recordings, in order to make it more reliable. Another follow-up paper, Sample Efficient Adaptive Text-to-Speech, dated September 2018 (latest revision January 2019), states that DeepMind has successfully reduced the minimum amount of real-life recordings required to sample an existing voice via WaveNet to "merely a few minutes of audio data" while maintaining high-quality results. Its ability to clone voices has raised ethical concerns about WaveNet's ability to mimic the voices of living and dead persons. According to a 2016 BBC article, companies working on similar voice-cloning technologies (such as Adobe Voco) intend to insert watermarking inaudible to humans to prevent counterfeiting, while maintaining that voice cloning satisfying, for instance, the needs of entertainment-industry purposes would be of a far lower complexity and use different methods than required to fool forensic evidencing methods and electronic ID devices, so that natural voices and voices cloned for entertainment-industry purposes could still be easily told apart by technological analysis. == Applications == At the time of its release, DeepMind said that WaveNet required too much computational processing power to be used in real world applications. As of October 2017, Google announced a 1,000-fold performance improvement along with better voice quality. WaveNet was then used to generate Google Assistant voices for US English and Japanese across all Google platforms. In November 2017, DeepMind researchers released a research paper detailing a proposed method of "generating high-fidelity speech samples at more than 20 times faster than real-time", called "Probability Density Distillation". At the annual I/O developer conference in May 2018, it was announced that new Google Assistant voices were available and made possible by WaveNet; WaveNet greatly reduced the number of audio recordings that were required to create a voice model by modeling the raw audio of the voice actor samples.

    Read more →
  • PhyCV

    PhyCV

    PhyCV is the first computer vision library which utilizes algorithms directly derived from the equations of physics governing physical phenomena. The algorithms appearing in the first release emulate the propagation of light through a physical medium with natural and engineered diffractive properties followed by coherent detection. Unlike traditional algorithms that are a sequence of hand-crafted empirical rules, physics-inspired algorithms leverage physical laws of nature as blueprints. In addition, these algorithms can, in principle, be implemented in real physical devices for fast and efficient computation in the form of analog computing. Currently PhyCV has three algorithms, Phase-Stretch Transform (PST) and Phase-Stretch Adaptive Gradient-Field Extractor (PAGE), and Vision Enhancement via Virtual diffraction and coherent Detection (VEViD). All algorithms have CPU and GPU versions. PhyCV is now available on GitHub and can be installed from pip. == History == Algorithms in PhyCV are inspired by the physics of the photonic time stretch (a hardware technique for ultrafast and single-shot data acquisition). PST is an edge detection algorithm that was open-sourced in 2016 and has 800+ stars and 200+ forks on GitHub. PAGE is a directional edge detection algorithm that was open-sourced in February, 2022. PhyCV was originally developed and open-sourced by Jalali-Lab @ UCLA in May 2022. In the initial release of PhyCV, the original open-sourced code of PST and PAGE is significantly refactored and improved to be modular, more efficient, GPU-accelerated and object-oriented. VEViD is a low-light and color enhancement algorithm that was added to PhyCV in November 2022. == Background == === Phase-Stretch Transform (PST) === Phase-Stretch Transform (PST) is a computationally efficient edge and texture detection algorithm with exceptional performance in visually impaired images. The algorithm transforms the image by emulating propagation of light through a device with engineered diffractive property followed by coherent detection. It has been applied in improving the resolution of MRI image, extracting blood vessels in retina images, dolphin identification, and waste water treatment, single molecule biological imaging, and classification of UAV using micro Doppler imaging. === Phase-Stretch Adaptive Gradient-Field Extractor (PAGE) === Phase-Stretch Adaptive Gradient-Field Extractor (PAGE) is a physics-inspired algorithm for detecting edges and their orientations in digital images at various scales. The algorithm is based on the diffraction equations of optics. Metaphorically speaking, PAGE emulates the physics of birefringent (orientation-dependent) diffractive propagation through a physical device with a specific diffractive structure. The propagation converts a real-valued image into a complex function. Related information is contained in the real and imaginary components of the output. The output represents the phase of the complex function. === Vision Enhancement via Virtual diffraction and coherent Detection (VEViD) === Vision Enhancement via Virtual diffraction and coherent Detection (VEViD) an efficient and interpretable low-light and color enhancement algorithm that reimagines a digital image as a spatially varying metaphoric light field and then subjects the field to the physical processes akin to diffraction and coherent detection. The term “Virtual” captures the deviation from the physical world. The light field is pixelated and the propagation imparts a phase with an arbitrary dependence on frequency which can be different from the quadratic behavior of physical diffraction. VEViD can be further accelerated through mathematical approximations that reduce the computation time without appreciable sacrifice in image quality. A closed-form approximation for VEViD which we call VEViD-lite can achieve up to 200 FPS for 4K video enhancement. == PhyCV on the Edge == Featuring low-dimensionality and high-efficiency, PhyCV is ideal for edge computing applications. In this section, we demonstrate running PhyCV on NVIDIA Jetson Nano in real-time. === NVIDIA Jetson Nano Developer Kit === NVIDIA Jetson Nano Developer Kit is a small- sized and power-efficient platform for edge computing applications. It is equipped with an NVIDIA Maxwell architecture GPU with 128 CUDA cores, a quad-core ARM Cortex-A57 CPU, 4GB 64-bit LPDDR4 RAM, and supports video encoding and decoding up to 4K resolution. Jetson Nano also offers a variety of interfaces for connectivity and expansion, making it ideal for a wide range of AI and IoT applications. In our setup, we connect a USB camera to the Jetson Nano to acquire videos and demonstrate using PhyCV to process the videos in real-time. === Real-time PhyCV on Jetson Nano === We use the Jetson Nano (4GB) with NVIDIA JetPack SDK version 4.6.1, which comes with pre- installed Python 3.6, CUDA 10.2, and OpenCV 4.1.1. We further install PyTorch 1.10 to enable the GPU accelerated PhyCV. We demonstrate the results and metrics of running PhyCV on Jetson Nano in real-time for edge detection and low-light enhancement tasks. For 480p videos, both operations achieve beyond 38 FPS, which is sufficient for most cameras that capture videos at 30 FPS. For 720p videos, PhyCV low-light enhancement can operate at 24 FPS and PhyCV edge detection can operate at 17 FPS. == Highlights == === Modular Code Architecture === The code in PhyCV has a modular design which faithfully follows the physical process from which the algorithm was originated. Both PST and PAGE modules in the PhyCV library emulate the propagation of the input signal (original digital image) through a device with engineered diffractive property followed by coherent (phase) detection. The dispersive propagation applies a phase kernel to the frequency domain of the original image. This process has three steps in general, loading the image, initializing the kernel and applying the kernel. In the implementation of PhyCV, each algorithm is represented as a class in Python and each class has methods that simulate the steps described above. The modular code architecture follows the physics behind the algorithm. Please refer to the source code on GitHub for more details. === GPU Acceleration === PhyCV supports GPU acceleration. The GPU versions of PST and PAGE are built on PyTorch accelerated by the CUDA toolkit. The acceleration is beneficial for applying the algorithms in real-time image video processing and other deep learning tasks. The running time per frame of PhyCV algorithms on CPU (Intel i9-9900K) and GPU (NVIDIA TITAN RTX) for videos at different resolutions are shown below. Note that the PhyCV low-light enhancement operates in the HSV color space, so the running time also includes RGB to HSV conversion. However, for all running times using GPUs, we ignore the time of moving data from CPUs to GPUs and count the algorithm operation time only. == Installation and Examples == Please refer to the GitHub README file for a detailed technical documentation. == Current Limitations == === I/O (Input/Output) Bottleneck for Real-time Video Processing === When dealing with real-time video streams from cameras, the frames are captured and buffered in CPU and have to be moved to GPU to run the GPU-accelerated PhyCV algorithms. This process is time-consuming and it is a common bottleneck for real-time video-processing algorithms. === Lack of Parameter Adaptivity for Different Images === Currently, the parameters of PhyCV algorithms have to be manually tuned for different images. Although a set of pre-selected parameters work relatively well for a wide range of images, the lack of parameter adaptivity for different images remains a limitation for now.

    Read more →
  • Xiaomi MiMo

    Xiaomi MiMo

    Xiaomi MiMo is a family of large language models (LLMs) developed by Xiaomi. It was initially released in April 2025 with the MiMo-7B model. Currently, MiMo is available for developers through API service. It is used as the key AI model in Xiaomi's "Human x Car x Home" ecosystem. == Development == Xiaomi developed MiMo as a reasoning-focused language model. Its development team was led by Luo Fuli, who had previously worked at DeepSeek before joining Xiaomi in late 2025. The model was trained using multi-token prediction and reinforcement learning, with a particular emphasis on mathematical reasoning and code generation tasks. In March 2026, Xiaomi CEO Lei Jun announced that the company planned to invest at least US$8.7 billion in artificial intelligence over the following three years. == Models == === List of models === === MiMo-7B === MiMo-7B is the first model of this LLM. The base model, MiMo-7B-Base, was pre-trained on approximately 25 trillion tokens using web pages, academic papers, books, and synthetic reasoning data. MiMo-7B-RL underwent supervised fine-tuning and reinforcement learning on 130,000 mathematics and code problems. MiMo-7B-RL-0530 was released in May 2025. It scaled the fine-tuning dataset from 500,000 to 6 million instances and extended the RL window from 32,000 to 48,000 tokens and improved AIME 2024 scores from 68.2 to 80.1. MiMo-VL-7B was a vision-language model combining a Vision Transformer encoder with the MiMo-7B backbone. It was trained in four stages consuming 2.4 trillion tokens. Its reinforcement learning variant used Mixed On-Policy Reinforcement Learning (MORL) which integrated reward signals across perception, grounding, and reasoning. Xiaomi also released MiMo-Audio-7B, an audio-language model for voice conversion, style transfer, and speech editing. === MiMo-V2-Flash === MiMo-V2-Flash was launched in December 2025. It is a open-sourced Mixture-of-experts model with 309 billion total parameters and 15 billion active parameters. It was trained on 27 trillion tokens using FP8 mixed precision. It used hybrid attention interleaving Sliding Window and Global Attention at a 5:1 ratio. === MiMo-V2-Pro === Xiaomi publicly introduced MiMo-V2-Pro on 18 March 2026. It has over 1 trillion total parameters, 42 billion active, and a 1-million-token context window. Before the official release, the model had appeared anonymously on OpenRouter under the codename "Hunter Alpha," where it drew substantial usage and topped daily charts for several days, according to Xiaomi and Reuters. During its listing on OpenRouter, the model reportedly processed over one trillion tokens in total usage. Xiaomi later said Hunter Alpha was an early internal test build of MiMo-V2-Pro, and Reuters reported that the model had been mistaken by some users for a possible DeepSeek system before Xiaomi confirmed its origin. The model was released as a proprietary API product, and Luo Fuli stated that Xiaomi intended to open-source a variant at an unspecified future date. Xiaomi has partnered with several API web platforms like OpenClaw to launch the model. All these websites initially offered a free trial of this model for a week, but due to the overwhelming response, Xiaomi later extended the free trial period of the model until 2 April 2026. === MiMo-V2-Omni === Alongside MiMo-V2-Pro, Xiaomi launched MiMo-V2-Omni on 18 March 2026. It handles image, video, audio, and text inputs. Before the official release, it was codenamed "Healer Alpha" in OpenRouter. === MiMo-V2-TTS === On the same date as the release of MiMo-V2-Pro and MiMo-V2-Omni, a Text-to-Speech model named MiMo-V2-TTS was released also. It is a speech synthesis model. It was trained on audio data, which makes it capable of emotional transitions, mid-sentence tone shifts, singing, and synthesis of regional dialects like Sichuan, Cantonese, Henan, and Taiwanese. == Licensing == Xiaomi has used different licensing approaches for different models in the MiMo family. The MiMo-7B series and MiMo-V2-Flash were released as open-weight models. MiMo-V2-Flash was published under the MIT license with model weights and inference code available on Hugging Face. MiMo-V2-Pro and MiMo-V2-Omni were released as proprietary models. It was accessible through Xiaomi's API platform and third-party API providers. Luo Fuli stated that Xiaomi intended to open-source a variant of MiMo-V2-Pro. Although, she did not specify any timeline. MiMo-V2-TTS was released as a proprietary model with no publicly available weights.

    Read more →
  • Hardware for artificial intelligence

    Hardware for artificial intelligence

    Specialized computer hardware is often used to execute artificial intelligence (AI) programs faster, and with less energy, such as Lisp machines, neuromorphic engineering, event cameras, and physical neural networks. Since 2017, several consumer grade CPUs and SoCs have on-die NPUs. As of 2023, the market for AI hardware is dominated by GPUs. As of the 2020s, AI computation is dominated by graphics processing units (GPUs) and newer domain-specific accelerators such as Google's Tensor Processing Units (TPUs), AMD's Instinct MI300 series, and various on-device neural-processing units (NPUs) found in consumer hardware. == Scope == For the purposes of this article, AI hardware refers to computing components and systems specifically designed or optimized to accelerate artificial-intelligence workloads such as machine-learning training or inference. This includes general-purpose accelerators used for AI (for example, GPUs) and domain-specific accelerators (for example, TPUs, NPUs, and other AI ASICs). Event-based cameras are sometimes discussed in the context of neuromorphic computing, but they are input sensors rather than AI compute devices. Conversely, components such as memristors are basic circuit elements rather than specialized AI hardware when considered alone. == Lisp machines == Lisp machines were developed in the late 1970s and early 1980s to make artificial intelligence programs written in the programming language Lisp run faster. == Dataflow architecture == Dataflow architecture processors used for AI serve various purposes with varied implementations like the polymorphic dataflow Convolution Engine by Kinara (formerly Deep Vision), structure-driven dataflow by Hailo, and dataflow scheduling by Cerebras. == Component hardware == === AI accelerators === Since the 2010s, advances in computer hardware have led to more efficient methods for training deep neural networks that contain many layers of non-linear hidden units and a very large output layer. By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced central processing units (CPUs) as the dominant means to train large-scale commercial cloud AI. OpenAI estimated the hardware compute used in the largest deep learning projects from Alex Net (2012) to Alpha Zero (2017), and found a 300,000-fold increase in the amount of compute needed, with a doubling-time trend of 3.4 months. === General-purpose GPUs for AI === Since the 2010s, graphics processing units (GPUs) have been widely used to train and deploy deep learning models because of their highly parallel architecture and high memory bandwidth. Modern data-center GPUs include dedicated tensor or matrix-math units that accelerate neural-network operations. In 2022, NVIDIA introduced the Hopper-generation H100 GPU, adding FP8 precision support and faster interconnects for large-scale model training. AMD and other vendors have also developed GPUs and accelerators aimed at AI and high-performance computing workloads. === Domain-specific accelerators (ASICs / NPUs) === Beyond general-purpose GPUs, several companies have developed application-specific integrated circuits (ASICs) and neural processing units (NPUs) tailored for AI workloads. Google introduced the Tensor Processing Unit (TPU) in 2016 for deep-learning inference, with later generations supporting large-scale training through dense systolic-array designs and optical interconnects. Other vendors have released similar devices—such as Apple's Neural Engine and various on-device NPUs—that emphasize energy-efficient inference in mobile or edge computing environments. === Memory and interconnects === AI accelerators rely on fast memory and inter-chip links to manage the large data volumes of training and inference. High-bandwidth memory (HBM) stacks, standardized as HBM3 in 2022, provide terabytes-per-second throughput on modern GPUs and ASICs. These accelerators are often connected through dedicated fabrics such as NVIDIA's NVLink and NVSwitch or optical interconnects used in TPU systems to scale performance across thousands of chips.

    Read more →
  • Ultra Hal

    Ultra Hal

    Ultra Hal is a chatbot intended to function as a virtual assistant. It was developed by Zabaware, Inc. Ultra Hal uses a natural language interface with animated characters using speech synthesis. Users can communicate with the chatterbot via typing or via a speech recognition engine. It utilizes the WordNet lexical dictionary. Its name is an allusion to HAL 9000, the artificial intelligence from the movie 2001: A Space Odyssey. Ultra Hal won the 2007 Loebner Prize for "most human" chatterbot.

    Read more →
  • Deep image prior

    Deep image prior

    Deep image prior is a type of convolutional neural network used to enhance a given image with no prior training data other than the image itself. A neural network is randomly initialized and used as prior to solve inverse problems such as noise reduction, super-resolution, and inpainting. Image statistics are captured by the structure of a convolutional image generator rather than by any previously learned capabilities. == Method == === Background === Inverse problems such as noise reduction, super-resolution, and inpainting can be formulated as the optimization task x ∗ = m i n x E ( x ; x 0 ) + R ( x ) {\displaystyle x^{}=min_{x}E(x;x_{0})+R(x)} , where x {\displaystyle x} is an image, x 0 {\displaystyle x_{0}} a corrupted representation of that image, E ( x ; x 0 ) {\displaystyle E(x;x_{0})} is a task-dependent data term, and R(x) is the regularizer. Deep neural networks learn a generator/decoder x = f θ ( z ) {\displaystyle x=f_{\theta }(z)} which maps a random code vector z {\displaystyle z} to an image x {\displaystyle x} . The image corruption method used to generate x 0 {\displaystyle x_{0}} is selected for the specific application. === Specifics === In this approach, the R ( x ) {\displaystyle R(x)} prior is replaced with the implicit prior captured by the neural network (where R ( x ) = 0 {\displaystyle R(x)=0} for images that can be produced by a deep neural networks and R ( x ) = + ∞ {\displaystyle R(x)=+\infty } otherwise). This yields the equation for the minimizer θ ∗ = a r g m i n θ E ( f θ ( z ) ; x 0 ) {\displaystyle \theta ^{}=argmin_{\theta }E(f_{\theta }(z);x_{0})} and the result of the optimization process x ∗ = f θ ∗ ( z ) {\displaystyle x^{}=f_{\theta ^{}}(z)} . The minimizer θ ∗ {\displaystyle \theta ^{}} (typically a gradient descent) starts from a randomly initialized parameters and descends into a local best result to yield the x ∗ {\displaystyle x^{}} restoration function. ==== Overfitting ==== A parameter θ may be used to recover any image, including its noise. However, the network is reluctant to pick up noise because it contains high impedance while useful signal offers low impedance. This results in the θ parameter approaching a good-looking local optimum so long as the number of iterations in the optimization process remains low enough not to overfit data. === Deep Neural Network Model === Typically, the deep neural network model for deep image prior uses a U-Net like model without the skip connections that connect the encoder blocks with the decoder blocks. The authors in their paper mention that "Our findings here (and in other similar comparisons) seem to suggest that having deeper architecture is beneficial, and that having skip-connections that work so well for recognition tasks (such as semantic segmentation) is highly detrimental." == Applications == === Denoising === The principle of denoising is to recover an image x {\displaystyle x} from a noisy observation x 0 {\displaystyle x_{0}} , where x 0 = x + ϵ {\displaystyle x_{0}=x+\epsilon } . The distribution ϵ {\displaystyle \epsilon } is sometimes known (e.g.: profiling sensor and photon noise) and may optionally be incorporated into the model, though this process works well in blind denoising. The quadratic energy function E ( x , x 0 ) = | | x − x 0 | | 2 {\displaystyle E(x,x_{0})=||x-x_{0}||^{2}} is used as the data term, plugging it into the equation for θ ∗ {\displaystyle \theta ^{}} yields the optimization problem m i n θ | | f θ ( z ) − x 0 | | 2 {\displaystyle min_{\theta }||f_{\theta }(z)-x_{0}||^{2}} . === Super-resolution === Super-resolution is used to generate a higher resolution version of image x. The data term is set to E ( x ; x 0 ) = | | d ( x ) − x 0 | | 2 {\displaystyle E(x;x_{0})=||d(x)-x_{0}||^{2}} where d(·) is a downsampling operator such as Lanczos that decimates the image by a factor t. === Inpainting === Inpainting is used to reconstruct a missing area in an image x 0 {\displaystyle x_{0}} . These missing pixels are defined as the binary mask m ∈ { 0 , 1 } H × V {\displaystyle m\in \{0,1\}^{H\times V}} . The data term is defined as E ( x ; x 0 ) = | | ( x − x 0 ) ⊙ m | | 2 {\displaystyle E(x;x_{0})=||(x-x_{0})\odot m||^{2}} (where ⊙ {\displaystyle \odot } is the Hadamard product). The intuition behind this is that the loss is computed only on the known pixels in the image, and the network is going to learn enough about the image to fill in unknown parts of the image even though the computed loss doesn't include those pixels. This strategy is used to remove image watermarks by treating the watermark as missing pixels in the image. === Flash–no-flash reconstruction === This approach may be extended to multiple images. A straightforward example mentioned by the author is the reconstruction of an image to obtain natural light and clarity from a flash–no-flash pair. Video reconstruction is possible but it requires optimizations to take into account the spatial differences. == Implementations == A reference implementation rewritten in Python 3.6 with the PyTorch 0.4.0 library was released by the author under the Apache 2.0 license: deep-image-prior A TensorFlow-based implementation written in Python 2 and released under the CC-SA 3.0 license: deep-image-prior-tensorflow A Keras-based implementation written in Python 2 and released under the GPLv3: machine_learning_denoising == Example == See Astronomy Picture of the Day (APOD) of 2024-02-18

    Read more →
  • SimSimi

    SimSimi

    SimSimi is an artificial intelligence conversation program created in 2002 by ISMaker. It grows its artificial intelligence day by day assisted by a feature that allows users to teach it to respond correctly. SimSimi, pronounced as "shim-shimi", is from a Korean word simsim (심심) which means "bored". It has an application designed for Android, Windows Phone and iOS. The application was banned in Thailand in 2012 after users taught it to make responses containing profanity, and to criticise leading politicians. In April 2018, SimSimi was suspended in Brazil due to accusations of sending inappropriate messages, such as sexual language, bullying and even death threats, being labeled as "dangerous" mainly due to its popularity among children, and according to its developer, the suspension of the app in the country "was inevitable because the SimSimi app, at least in the last few days, had a significant negative social impact in Brazil.”

    Read more →
  • TinEye

    TinEye

    TinEye is a reverse image search engine developed and offered by Idée, Inc., a company based in Toronto, Ontario, Canada. It was the first image search engine on the web to use image identification technology rather than keywords, metadata or watermarks. TinEye allows users to search not using keywords but with images. Upon submitting an image, TinEye creates a "unique and compact digital signature or fingerprint" of the image and matches it with other indexed images. This procedure is able to match even heavily edited versions of the submitted image, but will not usually return similar images in the results. == History == Idée, Inc. was founded by Leila Boujnane and Paul Bloore in 1999. Idée launched the service on May 6, 2008 and went into open beta in August that year. While computer vision and image identification research projects began as early as the 1980s, the company claims that TinEye is the first web-based image search engine to use image identification technology. The service was created with copyright owners and brand marketers as the intended user base, to look up unauthorized use and track where the brands are showing up respectively. In June 2014, TinEye claimed to have indexed more than five billion images for comparisons. However, this is a relatively small proportion of the total number of images available on the World Wide Web. As of September 2025, TinEye's search results claim to have over 77.6 billion images indexed for comparison. == Technology == A user uploads an image to the search engine (the upload size is limited to 20 MB) or provides a URL for an image or for a page containing the image. The search engine will look up other usage of the image in the internet, including modified images based upon that image, and report the date and time at which they were posted. TinEye does not recognize outlines of objects or perform facial recognition, but recognizes the entire image, and some altered versions of that image. This includes smaller, larger, and cropped versions of the image. TinEye has shown itself capable of retrieving different images from its database of the same subject, such as famous landmarks. TinEye is capable of searching for images in JPEG, PNG, WebP, GIF, BMP and TIFF format. Results generated from TinEye include the total number of matches in their database, a preview image, and the URL to each match. TinEye can sort results by best match, most changed, biggest image, newest, and oldest. User registration is optional and offers storage of the user's previous queries. Other features include embeddable widgets and bookmarklets. TinEye has also released their commercial API. == Usage == TinEye's ability to search the web for specific images (and modifications of those images) makes it a potential tool for the copyright holders of visual works to locate infringements on their copyright. It also creates a possible avenue for people who are looking to make use of imagery under orphan works to find the copyright holders of that imagery. Being that orphan works can be defined as "copyrighted works whose owners are difficult or impossible to identify and/or locate," the use of TinEye could potentially remove the orphan work status from online images that can be found in its database. === Fact-checking === It has been recommended by fact-checkers as a useful resource in attempts to verify the origin of images. As of 2019, TinEye specialized in copyright violations and finding exact versions of images online.

    Read more →
  • Lessac Technologies

    Lessac Technologies

    Lessac Technologies, Inc. (LTI) is an American firm which develops voice synthesis software, licenses technology and sells synthesized novels as MP3 files. The firm currently has seven patents granted and three more pending for its automated methods of converting digital text into human-sounding speech, more accurately recognizing human speech and outputting the text representing the words and phrases of said speech, along with recognizing the speaker's emotional state. The LTI technology is partly based on the work of the late Arthur Lessac, a Professor of Theater at the State University of New York and the creator of Lessac Kinesensic Training, and LTI has licensed exclusive rights to exploit Arthur Lessac's copyrighted works in the fields of speech synthesis and speech recognition. Based on the view that music is speech and speech is music, Lessac's work and books focused on body and speech energies and how they go together. Arthur Lessac's textual annotation system, which was originally developed to assist actors, singers, and orators in marking up scripts to prepare for performance, is adapted in LTI's speech synthesis system as the basic representation of the speech to be synthesized (Lessemes), in contrast to many other systems which use a phonetic representation. LTI's software has two major components: (1) a linguistic front-end that converts plain text to a sequence of prosodic and phonosensory graphic symbols (Lessemes) based on Arthur Lessac's annotation system, which specify the speech units to be synthesized; (2) a signal-processing back-end that takes the Lessemes as acoustic data and produces human-sounding synthesized speech as output, using unit selection and concatenation. LTI's text-to-speech system came in second in the world-wide Blizzard Challenge 2011 and 2012. The first-place team in 2011 also employed LTI's "front-end" technology, but with its own back-end. The Blizzard Challenge, conducted by the Language Technologies Institute of Carnegie Mellon University, was devised as a way to evaluate speech synthesis techniques by having different research groups build voices from the same voice-actor recordings, and comparing the results through listening tests. LTI was founded in 2000 by H. Donald Wilson (chairman), a lawyer, LexisNexis entrepreneur and business associate of Arthur Lessac; and Gary A. Marple (chief inventor), after Marple suggested that Arthur Lessac's kinesensic voice training might be applicable to computational linguistics. After Wilson's death in 2006, his nephew John Reichenbach became the firm's CEO.

    Read more →
  • GPT-4Chan

    GPT-4Chan

    Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. The model is a large language model, which means it can generate text based on some input, by fine-tuning GPT-J with a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for occasionally hosting hateful and extremist content. The model learned to mimic the style and tone of /pol/ users, producing text that is often intentionally offensive to groups (racist, sexist, homophobic, etc.) and nihilistic. Kilcher deployed the model on the /pol/ board itself, where it interacted with other users without revealing its identity. He also made the model publicly available on Hugging Face, a platform for sharing and using AI models, until it was removed from the platform. The project sparked criticism and debate in the AI community. Some people questioned the ethics, legality, and social impact of creating and distributing such a model. Some of the issues raised by the GPT-4chan controversy include the potential harm of spreading hate speech, the responsibility of AI developers and platforms, the need for regulation and oversight of AI models, and the role of open source and transparency in AI research. == Development == The development of GPT-4chan began in May 2022, when Kilcher announced his project on his YouTube channel. Notably, at the time before ChatGPT, he explained that he wanted to create a large language model that could generate realistic and coherent text in the style of /pol/, one of the most notorious online communities. He indicated that he was inspired by the success of GPT-3, a powerful AI model created by OpenAI, and GPT-J, an open-source model, with GPT-3 comparable performance, released by EleutherAI, a group of independent AI researchers. Kilcher decided to use GPT-J as the base model for his project, and fine-tune it with a large dataset of /pol/ posts. The Raiders of the Lost Kek dataset contained over 100 million posts from /pol/, spanning from June 2016-November 2019. Kilcher then proceeded to fine-tune the GPT-J model on the 4chan data. He also showed some examples of the model’s outputs, which ranged from political opinions, conspiracy theories, jokes, insults, and threats, to more creative and bizarre texts, such as poems, stories, songs, and code. He said that he was impressed by the model’s ability to generate fluent and diverse text, and that he was curious to see how it would interact with real /pol/ users. == Release == In June 2022, Kilcher deployed his model on the /pol/ board itself, using a bot that he programmed to post and reply to threads. He did not reveal the model’s identity, and he let it run autonomously, without any human supervision or intervention. He wanted to conduct a natural experiment, and to observe the model’s behavior and impact in a real-world setting. Furthermore, he also wanted to test the model’s robustness, and to see how it would handle the challenges and dynamics of /pol/, such as trolling, flaming, baiting, and moderation. At the same time, Kilcher also made his model publicly available on Hugging Face, a platform for sharing and using AI models. He wanted to share his work with the AI community and the public, and that he hoped that his model would inspire and enable others to create and explore new applications and possibilities with large language models. Likewise, he also said that he wanted to spark a discussion and a debate about the ethical and social implications of his project, and that he welcomed feedback and criticism from anyone. He provided a link to his model’s page on Hugging Face, where anyone could access and use the model through a web interface or an API, and also provided a link to his GitHub repository, where anyone could download and inspect the model’s code and data. == Controversy == The release of GPT-4chan to the public caused a lot of reactions and responses from various audiences. On the /pol/ board, the model’s posts and replies attracted a lot of attention and engagement from other users, who were mostly unaware of the model’s identity and nature. Some users praised the model for its intelligence, creativity, and humor, and agreed with its opinions and views. Some users challenged the model for its ignorance, inconsistency, and absurdity, and disagreed with its claims and arguments. Some users tried to troll, bait, or expose the model, and attempted to trick or test it with various questions and scenarios. The model’s posts and replies also generated a lot of controversy and conflict among the users, who often engaged in heated and violent debates and fights with each other. On Hugging Face, the model’s page received a lot of visits and requests from users who wanted to try out and experiment with the model. The model’s page also received a lot of feedback and reviews from users who rated and commented on the model. However, with the controversy of the model, access to it was gated and then disabled on Hugging Face for concerns about the potential harm the model could cause. The incident was notable for the direct intervention of CEO Clément Delangue in the talk pages, a very unusual occurrence compared to the normal practices of content moderation. The release of GPT-4chan also sparked a lot of media coverage and public attention, as various news outlets and social media platforms reported and commented on the model’s project. On YouTube, the model’s video received a lot of views and interactions from viewers who watched and followed the project. Furthermore, a petition condemning the deployment of GPT-4chan gained over 300 signatures from technology experts.

    Read more →
  • ChatScript

    ChatScript

    ChatScript is a combination Natural Language engine and dialog management system designed initially for creating chatbots, but is currently also used for various forms of NL processing. It is written in C++. The engine is an open source project at SourceForge. and GitHub. ChatScript was written by Bruce Wilcox and originally released in 2011, after Suzette (written in ChatScript) won the 2010 Loebner Prize, fooling one of four human judges. == Features == In general ChatScript aims to author extremely concisely, since the limiting scalability of hand-authored chatbots is how much/fast one can write the script. Because ChatScript is designed for interactive conversation, it automatically maintains user state across volleys. A volley is any number of sentences the user inputs at once and the chatbots response. The basic element of scripting is the rule. A rule consists of a type, a label (optional), a pattern, and an output. There are three types of rules. Gambits are something a chatbot might say when it has control of the conversation. Rejoinders are rules that respond to a user remark tied to what the chatbot just said. Responders are rules that respond to arbitrary user input which is not necessarily tied to what the chatbot just said. Patterns describe conditions under which a rule may fire. Patterns range from extremely simplistic to deeply complex (analogous to Regex but aimed for NL). Heavy use is typically made of concept sets, which are lists of words sharing a meaning. ChatScript contains some 2000 predefined concepts and scripters can easily write their own. Output of a rule intermixes literal words to be sent to the user along with common C-style programming code. Rules are bundled into collections called topics. Topics can have keywords, which allows the engine to automatically search the topic for relevant rules based on user input. == Example code == Words starting with ~ are concept sets. For example, ~fruit is the list of all known fruits. The simple pattern (~fruit) reacts if any fruit is mentioned immediately after the chatbot asks for favorite food. The slightly more complex pattern for the rule labelled WHATMUSIC requires all the words what, music, you and any word or phrase meaning to like, but they may occur in any order. Responders come in three types. ?: rules react to user questions. s: rules react to user statements. u: rules react to either. ChatScript code supports standard if-else, loops, user-defined functions and calls, and variable assignment and access. == Data == Some data in ChatScript is transient, meaning it will disappear at the end of the current volley. Other data is permanent, lasting forever until explicitly killed off. Data can be local to a single user or shared across all users at the bot level. Internally all data is represented as text and is automatically converted to a numeric form as needed. === Variables === User variables come in several kinds. Variables purely local to a topic or function are transient. Global variables can be declared as transient or permanent. A variable is generally declared merely by using it, and its type depends on its prefix ($, $$, $_). === Facts === In addition to variables, ChatScript supports facts – triples of data, which can also be transient or permanent. Functions can query for facts having particular values of some of the fields, making them act like an in-memory database. Fact retrieval is very quick and efficient the number of available in-memory facts is largely constrained to the available memory of the machine running the ChatScript engine. Facts can represent record structures and are how ChatScript represents JSON internally. Tables of information can be defined to generate appropriate facts. The above table links people to what they invented (1 per line) with Einstein getting a list of things he did. == External communication == ChatScript embeds the Curl library and can directly read and write facts in JSON to a website. == Server == A ChatScript engine can run in local or server mode. == Pos-tagging, parsing, and ontology == ChatScript comes with a copy of English WordNet embedded within, including its ontology, and creates and extends its own ontology via concept declarations. It has an English language pos-tagger and parser and supports integration with TreeTagger for pos-tagging a number of other languages (TreeTagger commercial license required). == Databases == In addition to an internal fact database, ChatScript supports PostgreSQL, MySQL, MSSQL and MongoDB both for access by scripts, but also as a central filesystem if desired so ChatScript can be scaled horizontally. A common use case is to use a centralized database to host the user files and multiple servers to scale the ChatScript engine. == JavaScript == ChatScript also embeds DukTape, ECMAScript E5/E5.1 compatibility, with some semantics updated from ES2015+. == Spelling Correction == ChatScript has built-in automatic spell checking, which can be augmented in script as both simple word replacements or context sensitive changes. With appropriate simple rules you can change perfect legal words into other words or delete them. E.g., if you have a concept of ~electronic_goods and don't want an input of Radio Shack (a store name) to be detected as an electronic good, you can get the input to change to Radio_Shack (a single word), or allow the words to remain but block the detection of the concept. This is particularly useful when combined with speech-to-text code that is imperfect, but you are familiar with common failings of it and can compensate for them in script. == Control flow == A chatbot's control flow is managed by the control script. This is merely another ordinary topic of rules, that invokes API functions of the engine. Thus control is fully configurable by the scripter (and functions exist to allow introspection into the engine). There are pre-processing control flow and post-processing control flow options available, for special processing.

    Read more →
  • Algorithm selection

    Algorithm selection

    Algorithm selection (sometimes also called per-instance algorithm selection or offline algorithm selection) is a meta-algorithmic technique to choose an algorithm from a portfolio on an instance-by-instance basis. It is motivated by the observation that on many practical problems, different algorithms have different performance characteristics. That is, while one algorithm performs well in some scenarios, it performs poorly in others and vice versa for another algorithm. If we can identify when to use which algorithm, we can optimize for each scenario and improve overall performance. This is what algorithm selection aims to do. The only prerequisite for applying algorithm selection techniques is that there exists (or that there can be constructed) a set of complementary algorithms. == Definition == Given a portfolio P {\displaystyle {\mathcal {P}}} of algorithms A ∈ P {\displaystyle {\mathcal {A}}\in {\mathcal {P}}} , a set of instances i ∈ I {\displaystyle i\in {\mathcal {I}}} and a cost metric m : P × I → R {\displaystyle m:{\mathcal {P}}\times {\mathcal {I}}\to \mathbb {R} } , the algorithm selection problem consists of finding a mapping s : I → P {\displaystyle s:{\mathcal {I}}\to {\mathcal {P}}} from instances I {\displaystyle {\mathcal {I}}} to algorithms P {\displaystyle {\mathcal {P}}} such that the cost ∑ i ∈ I m ( s ( i ) , i ) {\displaystyle \sum _{i\in {\mathcal {I}}}m(s(i),i)} across all instances is optimized. == Examples == === Boolean satisfiability problem (and other hard combinatorial problems) === A well-known application of algorithm selection is the Boolean satisfiability problem. Here, the portfolio of algorithms is a set of (complementary) SAT solvers, the instances are Boolean formulas, the cost metric is for example average runtime or number of unsolved instances. So, the goal is to select a well-performing SAT solver for each individual instance. In the same way, algorithm selection can be applied to many other N P {\displaystyle {\mathcal {NP}}} -hard problems (such as mixed integer programming, CSP, AI planning, TSP, MAXSAT, QBF and answer set programming). Competition-winning systems in SAT are SATzilla, 3S and CSHC === Machine learning === In machine learning, algorithm selection is better known as meta-learning. The portfolio of algorithms consists of machine learning algorithms (e.g., Random Forest, SVM, DNN), the instances are data sets and the cost metric is for example the error rate. So, the goal is to predict which machine learning algorithm will have a small error on each data set. == Instance features == The algorithm selection problem is mainly solved with machine learning techniques. By representing the problem instances by numerical features f {\displaystyle f} , algorithm selection can be seen as a multi-class classification problem by learning a mapping f i ↦ A {\displaystyle f_{i}\mapsto {\mathcal {A}}} for a given instance i {\displaystyle i} . Instance features are numerical representations of instances. For example, we can count the number of variables, clauses, average clause length for Boolean formulas, or number of samples, features, class balance for ML data sets to get an impression about their characteristics. === Static vs. probing features === We distinguish between two kinds of features: Static features are in most cases some counts and statistics (e.g., clauses-to-variables ratio in SAT). These features ranges from very cheap features (e.g. number of variables) to very complex features (e.g., statistics about variable-clause graphs). Probing features (sometimes also called landmarking features) are computed by running some analysis of algorithm behavior on an instance (e.g., accuracy of a cheap decision tree algorithm on an ML data set, or running for a short time a stochastic local search solver on a Boolean formula). These feature often cost more than simple static features. === Feature costs === Depending on the used performance metric m {\displaystyle m} , feature computation can be associated with costs. For example, if we use running time as performance metric, we include the time to compute our instance features into the performance of an algorithm selection system. SAT solving is a concrete example, where such feature costs cannot be neglected, since instance features for CNF formulas can be either very cheap (e.g., to get the number of variables can be done in constant time for CNFs in the DIMACs format) or very expensive (e.g., graph features which can cost tens or hundreds of seconds). It is important to take the overhead of feature computation into account in practice in such scenarios; otherwise a misleading impression of the performance of the algorithm selection approach is created. For example, if the decision which algorithm to choose can be made with perfect accuracy, but the features are the running time of the portfolio algorithms, there is no benefit to the portfolio approach. This would not be obvious if feature costs were omitted. == Approaches == === Regression approach === One of the first successful algorithm selection approaches predicted the performance of each algorithm m ^ A : I → R {\displaystyle {\hat {m}}_{\mathcal {A}}:{\mathcal {I}}\to \mathbb {R} } and selected the algorithm with the best predicted performance a r g min A ∈ P m ^ A ( i ) {\displaystyle arg\min _{{\mathcal {A}}\in {\mathcal {P}}}{\hat {m}}_{\mathcal {A}}(i)} for an instance i {\displaystyle i} . === Clustering approach === A common assumption is that the given set of instances I {\displaystyle {\mathcal {I}}} can be clustered into homogeneous subsets and for each of these subsets, there is one well-performing algorithm for all instances in there. So, the training consists of identifying the homogeneous clusters via an unsupervised clustering approach and associating an algorithm with each cluster. A new instance is assigned to a cluster and the associated algorithm selected. A more modern approach is cost-sensitive hierarchical clustering using supervised learning to identify the homogeneous instance subsets. === Pairwise cost-sensitive classification approach === A common approach for multi-class classification is to learn pairwise models between every pair of classes (here algorithms) and choose the class that was predicted most often by the pairwise models. We can weight the instances of the pairwise prediction problem by the performance difference between the two algorithms. This is motivated by the fact that we care most about getting predictions with large differences correct, but the penalty for an incorrect prediction is small if there is almost no performance difference. Therefore, each instance i {\displaystyle i} for training a classification model A 1 {\displaystyle {\mathcal {A}}_{1}} vs A 2 {\displaystyle {\mathcal {A}}_{2}} is associated with a cost | m ( A 1 , i ) − m ( A 2 , i ) | {\displaystyle |m({\mathcal {A}}_{1},i)-m({\mathcal {A}}_{2},i)|} . == Requirements == The algorithm selection problem can be effectively applied under the following assumptions: The portfolio P {\displaystyle {\mathcal {P}}} of algorithms is complementary with respect to the instance set I {\displaystyle {\mathcal {I}}} , i.e., there is no single algorithm A ∈ P {\displaystyle {\mathcal {A}}\in {\mathcal {P}}} that dominates the performance of all other algorithms over I {\displaystyle {\mathcal {I}}} (see figures to the right for examples on complementary analysis). In some application, the computation of instance features is associated with a cost. For example, if the cost metric is running time, we have also to consider the time to compute the instance features. In such cases, the cost to compute features should not be larger than the performance gain through algorithm selection. == Application domains == Algorithm selection is not limited to single domains but can be applied to any kind of algorithm if the above requirements are satisfied. Application domains include: hard combinatorial problems: SAT, Mixed Integer Programming, CSP, AI Planning, TSP, MAXSAT, QBF and Answer Set Programming combinatorial auctions in machine learning, the problem is known as meta-learning software design black-box optimization multi-agent systems numerical optimization linear algebra, differential equations evolutionary algorithms vehicle routing problem power systems For an extensive list of literature about algorithm selection, we refer to a literature overview. == Variants of algorithm selection == === Online selection === Online algorithm selection refers to switching between different algorithms during the solving process. This is useful as a hyper-heuristic. In contrast, offline algorithm selection selects an algorithm for a given instance only once and before the solving process. === Computation of schedules === An extension of algorithm selection is the per-instance algorithm scheduling problem, in which we do not select only one solver, but we select a time budget for each algorithm

    Read more →
  • CLEVER score

    CLEVER score

    The CLEVER (Cross Lipschitz Extreme Value for nEtwork Robustness) score is a way of measuring the robustness of an artificial neural network towards adversarial attacks. It was developed by a team at the MIT-IBM Watson AI Lab in IBM Research and first presented at the 2018 International Conference on Learning Representations. It was mentioned and reviewed by Ian Goodfellow as well. It was adopted into an educational game Fool The Bank by Narendra Nath Joshi, Abhishek Bhandwaldar and Casey Dugan

    Read more →
  • Application enablement

    Application enablement

    Application enablement is an approach which brings telecommunications network providers and developers together to combine their network and web abilities in creating and delivering high demand advanced services and new intelligent applications. Network providers, in addition to bandwidth, provide abilities such as billing, location, presence, and security, which have allowed them to establish long-term relationships with end-users. By offering these select abilities as application programming interfaces (APIs), providers give developers access to a set of tools to create (mashup) new applications and services to run on provider networks. Unifying the strengths of providers and developers facilitates the creation of mash-up applications, and in turn, a better end user quality of experience (QoE) for improved profit margins. Apple's iOS with App Store, and Google's Android with Android Market exemplify this approach. Both have introduced mobile platforms that are supported by a comprehensive ecosystem in order to perpetuate innovation in product design, content and service offerings, and overall consumer behavior. By the end of April 2010, downloadable applications numbered over 200,000 for iPhone and over 50,000 for Android. == Background == Historically, telecommunication providers primarily based their business models on network performance, emphasizing connectivity, availability, and quality of service (QoS) as key sources of revenue and customer value. With the increasing demand for bandwidth-intensive data and video applications, maintaining service continuity has required substantial infrastructure investments. To address rising operational costs and declining average revenue per user (ARPU), providers have increasingly adopted customer-oriented strategies and diversified business models to expand their roles within the telecommunications value chain. Application enablement supports providers in making this transition by providing an environment, or ecosystem, where providers and developers can collaborate to build, test, manage, and distribute applications across networks including television, broadband, Internet, and mobile. This cooperative effort produces mutually beneficial results for all parties, opening up new revenue streams while enhancing value and rate of return (ROI). The following are some examples of key network abilities which function as application enablers in the telecommunications market: Billing systems Security for private transactions Network-based storage of digital content End-to-end bandwidth for high-quality transmissions Scoring abilities to identify end-user preferences and behaviors Subscriber data to customize the end-user experience Context information, such as location and presence, to localize services. == New business models == As network providers work toward effective collaboration with application and content developers, several new business models are emerging to help facilitate the business relationships: === Vendor-led === A type of business model driven by telecommunications vendors, who assist network providers in building relationships with application and content developers to lower the cost and complexity of managing third parties. Examples of this model include: Forum Nokia IBM Technology Partner Ecosystem Ng Connect Huawei Intouch program === Operator-led === Characterized by network providers who want to maintain a high degree of flexibility and control over applications created for their end-consumers, this model lets them create and manage their own developer program, development platform, and application store. Under this arrangement, independent developers provide their own branding, marketing communications, pricing and customer care. Network providers pursuing this model will often seek to partner with a large number of third parties using standardized on-boarding processes. Examples of this model include: o2 Litmus Orange Partner Joint Innovation Lab === Aggregator === Network providers who choose not to create/manage their own developer relationships will partner with one or multiple aggregators, to administer a portion of or their entire application strategy. Examples of this model include: Ovi Operator Partnership Blackberry Operator Partnership Cellmania Buongiorno === Mass wholesale === Select network providers also participate in wholesale models that exist primarily for applications (BT's Ribbit- an Internet Protocol (IP) based calling and messaging platform) and devices (Verizon's Open Device initiative). This business-to-business approach reduces a large portion of the potential costs of third party application enablement (marketing, acquisition and support). Examples of this model include: BT's Ribbit Verizon Wireless ODI AT&T Synaptic Hosting === The enterprise customer === Some network providers are focusing on enabling applications in the enterprise space. In this model, the network provider establishes a platform for their large enterprise customers who want to blend custom software with enhanced abilities, and will provide standardized processes around mobilizing enterprise applications, and exposing core back-office abilities to allow for dynamic customer interaction. Examples of this model include: Vodafone Applications Service Verizon Private Network Sprint Solution Launchpad === Trusted partner === In this model, the network provider builds one-on-one relationships with trusted third-party developers by exposing customized network abilities, bringing a greater variety of brands to the network provider's portfolio. Network providers using this model tend to only have a few partners (in contrast to the operator led model). Under this scenario, network providers benefit from a pre-established customer base and the developer's marketing resources. Examples of this model include: 3/Skype Partnership (UK) Virgin Media and BBC iPlayer == Network operator developer resources == Operator led model o2 Litmus Orange Partner Joint Innovations Lab Aggregator model Ovi Operator Partnership Cellmania Buongiorno Mass wholesale model BT Ribbit Verizon Wireless ODI AT&T Synaptic Hosting Enterprise customer model Vodafone Applications Service Verizon Private Network Sprint Solution Launchpad == Rerencesfe ==

    Read more →