AI For Business Specialization Upenn

AI For Business Specialization Upenn — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Line detection

    Line detection

    In image processing, line detection is an algorithm that takes a collection of n edge points and finds all the lines on which these edge points lie. The most popular line detectors are the Hough transform and convolution-based techniques. == Hough transform == The Hough transform can be used to detect lines and the output is a parametric description of the lines in an image, for example ρ = r cos(θ) + c sin(θ). If there is a line in a row and column based image space, it can be defined ρ, the distance from the origin to the line along a perpendicular to the line, and θ, the angle of the perpendicular projection from the origin to the line measured in degrees clockwise from the positive row axis. Therefore, a line in the image corresponds to a point in the Hough space. The Hough space for lines has therefore these two dimensions θ and ρ, and a line is represented by a single point corresponding to a unique set of these parameters. The Hough transform can then be implemented by choosing a set of values of ρ and θ to use. For each pixel (r, c) in the image, compute r cos(θ) + c sin(θ) for each values of θ, and place the result in the appropriate position in the (ρ, θ) array. At the end, the values of (ρ, θ) with the highest values in the array will correspond to strongest lines in the image == Convolution-based technique == In a convolution-based technique, the line detector operator consists of a convolution masks tuned to detect the presence of lines of a particular width n and a θ orientation. Here are the four convolution masks to detect horizontal, vertical, oblique (+45 degrees), and oblique (−45 degrees) lines in an image. a) Horizontal mask(R1) (b) Vertical (R3) (C) Oblique (+45 degrees)(R2) (d) Oblique (−45 degrees)(R4) In practice, masks are run over the image and the responses are combined given by the following equation: R(x, y) = max(|R1 (x, y)|, |R2 (x, y)|, |R3 (x, y)|, |R4 (x, y)|) If R(x, y) > T, then discontinuity As can be seen below, if mask is overlay on the image (horizontal line), multiply the coincident values, and sum all these results, the output will be the (convolved image). For example, (−1)(0)+(−1)(0)+(−1)(0) + (2)(1) +(2)(1)+(2)(1) + (−1)(0)+(−1)(0)+(−1)(0) = 6 pixels on the second row, second column in the (convolved image) starting from the upper left corner of the horizontal lines. page 82 == Example == These masks above are tuned for light lines against a dark background, and would give a big negative response to dark lines against a light background. == Code example == The code was used to detect only the vertical lines in an image using Matlab and the result is below. The original image is the one on the top and the result is below it. As can be seen on the picture on the right, only the vertical lines were detected

    Read more →
  • Ericom Connect

    Ericom Connect

    Ericom Connect is a remote access/application publishing solution produced by Ericom Software that provides secure, centrally managed access to physical or hosted desktops and applications running on Microsoft Windows and Linux systems. == Product overview == Ericom Connect is desktop virtualization and application virtualization software that allows users to run applications remotely, without installing them on the local computer or device. The software is noted for its scalability, ease of deployment, and compatibility with any type of infrastructure, cloud or physical. Ericom Connect uses AccessPad (native client for desktops), AccessToGo (native client for mobile), or AccessNow, one of the first HTML5 RDP solutions to support clientless access to Windows desktops and applications from any device with an HTML5-compatible browser, including Macintosh computers, mobile devices, and Google Chromebooks. Other notable features include performance monitoring, built-in real-time analytics & BI, support for two-factor authentication (using RSA SecurID), multi-tenancy and multi-datacenter support via a single unified web interface, and a “Launch Simulation” feature that allows users to visualize and simulate actual step-by-step user processes directly from within the administration console. In addition to scalability, by distributing configurations, logs, etc., across multiple servers there is no single point of failure, as can be the case if all configuration information is stored on one server. == History == Ericom Connect was introduced in 2015. Ericom Connect is a successor to Ericom PowerTerm Web Connect. PowerTerm Web Connect used an architecture similar to what was then current with Citrix and VMWare, relying on a centralized SQL server, a connection broker, image management for different hypervisors, and a variety of clients. Ericom Connect uses a new grid architecture that provides more scalability, reliability, and flexibility than before.

    Read more →
  • SmarterChild

    SmarterChild

    SmarterChild was a chatbot available on AOL Instant Messenger and Windows Live Messenger (previously MSN Messenger) networks. == History == SmarterChild was an apparently intelligent agent or "bot" developed by ActiveBuddy, Inc., with offices in New York and Sunnyvale. It was widely distributed across global instant messaging networks. SmarterChild became very popular, attracting over 30 million Instant Messenger "buddies" on AIM (AOL), MSN and Yahoo Messenger over the course of its lifetime. Founded in 2000, ActiveBuddy was the brainchild of Robert Hoffer and Timothy Kay, who later brought seasoned advertising executive Peter Levitan on board as CEO. The concept for conversational instant messaging bots came from the founder's vision to add natural language comprehension functionality to the increasingly popular AIM instant messaging application. The original implementation took shape as a demo that Kay programmed in Perl in his Los Altos garage to connect a single buddy name, "ActiveBuddy", to look up stock symbols, and later allow AIM users to play Colossal Cave Adventure, a word-based adventure game, and MIT's Boris Katz Start Question Answering System but quickly grew to include a wide range of database applications the company called 'knowledge domains' including instant access to news, weather, stock information, movie times, yellow pages listings, and detailed sports data, as well as a variety of tools (personal assistant, calculators, translator, etc.). None of the individual domains which the company had named “stocksBuddy”, “sportsBuddy”, etc. ever launched publicly. When Stephen Klein came on board as COO — and eventually CEO — he insisted that all of the disparate test “buddies” be launched together with the company’s highly-developed colloquial chat domain. He suggested using “SmarterChild”, a username coined by Tim Kay which Tim was using to test various things. The bundled domains were launched publicly as SmarterChild (on AIM initially) in June 2001. SmarterChild provided information wrapped in fun and quirky conversation. The company generated no revenue from SmarterChild, but used it as a demonstration of the power of what Klein called “conversational computing”. The company subsequently marketed Automated Service Agents—delivering immediate answers to customer service inquiries—-to large corporations, like Comcast, Cingular, TimeWarner Cable, etc. SmarterChild's popularity spawned targeted marketing-oriented bots for Radiohead, Austin Powers, Intel, Keebler, The Sporting News and others. ActiveBuddy co-founders, Kay and Hoffer, as co-inventors, were issued two controversial U.S. patents in 2002. ActiveBuddy changed its name to Colloquis (briefly Conversagent) and targeted development of consumer-facing enterprise customer service agents, which the company marketed as Automated Service Agents. Microsoft acquired Colloquis in October 2006 and proceeded to de-commission SmarterChild and kill off the Automated Service Agent business as well. Robert Hoffer, ActiveBuddy co-founder, licensed the technology from Microsoft after Microsoft abandoned the Colloquis technology.

    Read more →
  • Keyword extraction

    Keyword extraction

    Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document. Key phrases, key terms, key segments or just keywords are the terminology which is used for defining the terms that represent the most relevant information contained in the document. Although the terminology is different, function is the same: characterization of the topic discussed in a document. The task of keyword extraction is an important problem in text mining, information extraction, information retrieval and natural language processing (NLP). == Keyword assignment vs. extraction == Keyword assignment methods can be roughly divided into: keyword assignment (keywords are chosen from controlled vocabulary or taxonomy) and keyword extraction (keywords are chosen from words that are explicitly mentioned in original text). Methods for automatic keyword extraction can be supervised, semi-supervised, or unsupervised. Unsupervised methods can be further divided into simple statistics, linguistics or graph-based, or ensemble methods that combine some or most of these methods.

    Read more →
  • Cheekd

    Cheekd

    Cheekd is a dating app based in New York City. It was founded in 2010 by Lori Cheek. == History == The service debuted with the name "Cheek'd". Founder Lori Cheek appeared on the television program, Shark Tank in February 2014, but did not succeed in obtaining funding from any of the five judges. She said Cheek’d only had 1000 subscribers at that time. === Business card model === Cheek'd offered two plans, paid and free. For $25, subscribers got a set of 50 business cards that could be given out once someone caught their eye. Each card had a phrase, an online code, and a URL to the subscriber's account. Recipients could look up the giver's profile. In addition to purchasing cards, there was a $9.95 monthly membership fee. === Smartphone app === In 2015, the service's name changed from "Cheek'd" to "Cheekd". The new app used Bluetooth technology to alert users whenever a compatible user was within a 30-foot radius, instead of using cards. == Patent lawsuit == The original business card-based model for Cheekd had been claimed as a patented process by Lori Cheek, as U.S. patent 8,543,465. In September 2017, a complaint was filed, alleging that the idea was not original to Lori Cheek. Cheek responded, stating that the complaint was baseless, and a complete fabrication. The lawsuit Pirri v. Cheek was dismissed in a pre-trial conference in New York's Federal Court on April 5, 2018.

    Read more →
  • GPTs

    GPTs

    GPTs are custom versions of ChatGPT with added instructions and extra knowledge. GPTs can be used and created from the GPT Store. Any user can easily create them without any programming knowledge. GPTs can be tailored for specific writing styles, topics, or tasks. The ability to create GPTs was introduced in November 2023, and by January 2024, more than 3 million GPTs had been published. == Features and uses == GPTs can be configured to answer complex questions in specific fields, solve problems, provide image-based information, or create digital content. They can be programmed as educational tools, purchasing guides, or technical advisors, as well as for many others applications. GPTs are accessed from the GPT Store section of the ChatGPT web page. The “Explore GPT” link opens the store where the most popular GPTs in each section are highlighted. The GPTs are organized by categories. The store also uses a rating system based on user experiences similar to that used by other app stores such as Apple's App Store or Google Play. Those with the best ratings appear at the top of each category. According to La Vanguardia, the most popular categories are: Personal assistants Learning to program Image generation Creative writing Gaming Entertainment It is expected that in the future the creators of GPTs will be able to monetize them. Companies like Moderna are using GPTs to assist in various specific business tasks. The company has created 750 GPTs for its own internal use. == Configuration == Creating GPTs does not require prior programming knowledge. Free users can use existing GPTs but cannot create their own. Paying subscribers can use the editor on the ChatGPT site to configure the GPT's name, image and description, instructions and access to APIs, along with visibility options. == Criticism == The implementation and use of GPTs has not been without criticism. The GPT Store has been criticized for the proliferation of low-quality GPTs and spam due to a lack of effective moderation. There are also concerns about data privacy and security, as GPTs may collect and use personal information in ways that are not always transparent to users.

    Read more →
  • Contextual image classification

    Contextual image classification

    Contextual image classification, a topic of pattern recognition in computer vision, is an approach of classification based on contextual information in images. "Contextual" means this approach is focusing on the relationship of the nearby pixels, which is also called neighbourhood. The goal of this approach is to classify the images by using the contextual information. == Introduction == Similar as processing language, a single word may have multiple meanings unless the context is provided, and the patterns within the sentences are the only informative segments we care about. For images, the principle is same. Find out the patterns and associate proper meanings to them. As the image illustrated below, if only a small portion of the image is shown, it is very difficult to tell what the image is about. Even try another portion of the image, it is still difficult to classify the image. However, if we increase the contextual of the image, then it makes more sense to recognize. As the full images shows below, almost everyone can classify it easily. During the procedure of segmentation, the methods which do not use the contextual information are sensitive to noise and variations, thus the result of segmentation will contain a great deal of misclassified regions, and often these regions are small (e.g., one pixel). Compared to other techniques, this approach is robust to noise and substantial variations for it takes the continuity of the segments into account. Several methods of this approach will be described below. == Applications == === Functioning as a post-processing filter to a labelled image === This approach is very effective against small regions caused by noise. And these small regions are usually formed by few pixels or one pixel. The most probable label is assigned to these regions. However, there is a drawback of this method. The small regions also can be formed by correct regions rather than noise, and in this case the method is actually making the classification worse. This approach is widely used in remote sensing applications. === Improving the post-processing classification === This is a two-stage classification process: For each pixel, label the pixel and form a new feature vector for it. Use the new feature vector and combine the contextual information to assign the final label to the === Merging the pixels in earlier stages === Instead of using single pixels, the neighbour pixels can be merged into homogeneous regions benefiting from contextual information. And provide these regions to classifier. === Acquiring pixel feature from neighbourhood === The original spectral data can be enriched by adding the contextual information carried by the neighbour pixels, or even replaced in some occasions. This kind of pre-processing methods are widely used in textured image recognition. The typical approaches include mean values, variances, texture description, etc. === Combining spectral and spatial information === The classifier uses the grey level and pixel neighbourhood (contextual information) to assign labels to pixels. In such case the information is a combination of spectral and spatial information. === Powered by the Bayes minimum error classifier === Contextual classification of image data is based on the Bayes minimum error classifier (also known as a naive Bayes classifier). Present the pixel: A pixel is denoted as x 0 {\displaystyle x_{0}} . The neighbourhood of each pixel x 0 {\displaystyle x_{0}} is a vector and denoted as N ( x 0 ) {\displaystyle N(x_{0})} . The values in the neighbourhood vector is denoted as f ( x i ) {\displaystyle f(x_{i})} . Each pixel is presented by the vector ξ = ( f ( x 0 ) , f ( x 1 ) , … , f ( x k ) ) {\displaystyle \xi =\left(f(x_{0}),f(x_{1}),\ldots ,f(x_{k})\right)} x i ∈ N ( x 0 ) ; i = 1 , … , k {\displaystyle x_{i}\in N(x_{0});\quad i=1,\ldots ,k} The labels (classification) of pixels in the neighbourhood N ( x 0 ) {\displaystyle N(x_{0})} are presented as a vector η = ( θ 0 , θ 1 , … , θ k ) {\displaystyle \eta =\left(\theta _{0},\theta _{1},\ldots ,\theta _{k}\right)} θ i ∈ { ω 0 , ω 1 , … , ω k } {\displaystyle \theta _{i}\in \left\{\omega _{0},\omega _{1},\ldots ,\omega _{k}\right\}} ω s {\displaystyle \omega _{s}} here denotes the assigned class. A vector presents the labels in the neighbourhood N ( x 0 ) {\displaystyle N(x_{0})} without the pixel x 0 {\displaystyle x_{0}} η ^ = ( θ 1 , θ 2 , … , θ k ) {\displaystyle {\hat {\eta }}=\left(\theta _{1},\theta _{2},\ldots ,\theta _{k}\right)} The neighbourhood: Size of the neighbourhood. There is no limitation of the size, but it is considered to be relatively small for each pixel x 0 {\displaystyle x_{0}} . A reasonable size of neighbourhood would be 3 × 3 {\displaystyle 3\times 3} of 4-connectivity or 8-connectivity ( x 0 {\displaystyle x_{0}} is marked as red and placed in the centre). The calculation: Apply the minimum error classification on a pixel x 0 {\displaystyle x_{0}} , if the probability of a class ω r {\displaystyle \omega _{r}} being presenting the pixel x 0 {\displaystyle x_{0}} is the highest among all, then assign ω r {\displaystyle \omega _{r}} as its class. θ 0 = ω r if P ( ω r ∣ f ( x 0 ) ) = max s = 1 , 2 , … , R P ( ω s ∣ f ( x 0 ) ) {\displaystyle \theta _{0}=\omega _{r}\quad {\text{ if }}\quad P(\omega _{r}\mid f(x_{0}))=\max _{s=1,2,\ldots ,R}P(\omega _{s}\mid f(x_{0}))} The contextual classification rule is described as below, it uses the feature vector x 1 {\displaystyle x_{1}} rather than x 0 {\displaystyle x_{0}} . θ 0 = ω r if P ( ω r ∣ ξ ) = max s = 1 , 2 , … , R P ( ω s ∣ ξ ) {\displaystyle \theta _{0}=\omega _{r}\quad {\text{ if }}\quad P(\omega _{r}\mid \xi )=\max _{s=1,2,\ldots ,R}P(\omega _{s}\mid \xi )} Use the Bayes formula to calculate the posteriori probability P ( ω s ∣ ξ ) {\displaystyle P(\omega _{s}\mid \xi )} P ( ω s ∣ ξ ) = p ( ξ ∣ ω s ) P ( ω s ) p ( ξ ) {\displaystyle P(\omega _{s}\mid \xi )={\frac {p(\xi \mid \omega _{s})P(\omega _{s})}{p\left(\xi \right)}}} The number of vectors is the same as the number of pixels in the image. For the classifier uses a vector corresponding to each pixel x i {\displaystyle x_{i}} , and the vector is generated from the pixel's neighbourhood. The basic steps of contextual image classification: Calculate the feature vector ξ {\displaystyle \xi } for each pixel. Calculate the parameters of probability distribution p ( ξ ∣ ω s ) {\displaystyle p(\xi \mid \omega _{s})} and P ( ω s ) {\displaystyle P(\omega _{s})} Calculate the posterior probabilities P ( ω r ∣ ξ ) {\displaystyle P(\omega _{r}\mid \xi )} and all labels θ 0 {\displaystyle \theta _{0}} . Get the image classification result. == Algorithms == === Template matching === The template matching is a "brute force" implementation of this approach. The concept is first create a set of templates, and then look for small parts in the image match with a template. This method is computationally high and inefficient. It keeps an entire templates list during the whole process and the number of combinations is extremely high. For a m × n {\displaystyle m\times n} pixel image, there could be a maximum of 2 m × n {\displaystyle 2^{m\times n}} combinations, which leads to high computation. This method is a top down method and often called table look-up or dictionary look-up. === Lower-order Markov chain === The Markov chain also can be applied in pattern recognition. The pixels in an image can be recognised as a set of random variables, then use the lower order Markov chain to find the relationship among the pixels. The image is treated as a virtual line, and the method uses conditional probability. === Hilbert space-filling curves === The Hilbert curve runs in a unique pattern through the whole image, it traverses every pixel without visiting any of them twice and keeps a continuous curve. It is fast and efficient. === Markov meshes === The lower-order Markov chain and Hilbert space-filling curves mentioned above are treating the image as a line structure. The Markov meshes however will take the two dimensional information into account. === Dependency tree === The dependency tree is a method using tree dependency to approximate probability distributions.

    Read more →
  • Backend as a service

    Backend as a service

    Backend as a service (BaaS), sometimes also referred to as mobile backend as a service (MBaaS), is a service for providing web app and mobile app developers with a way to easily build a backend to their frontend applications. Features available include user management, push notifications, and integration with social networking services. These services are provided via the use of custom software development kits (SDKs) and application programming interfaces (APIs). BaaS is a relatively recent development in cloud computing, with most BaaS startups dating from 2011 or later. Some of the most popular service providers are AWS Amplify and Firebase. == Purpose == Web and mobile apps require a similar set of features on the backend, including notification service, integration with social networks, and cloud storage. Each of these services has its own API that must be individually incorporated into an app, a process that can be time-consuming and complicated for app developers. BaaS providers form a bridge between the frontend of an application and various cloud-based backends via a unified API and SDK. Providing a consistent way to manage backend data means that developers do not need to redevelop their own backend for each of the services that their apps need to access, potentially saving both time and money. Although similar to other cloud-computing business models, such as serverless computing, software as a service (SaaS), infrastructure as a service (IaaS), and platform as a service (PaaS), BaaS is distinct from these other services in that it specifically addresses the cloud-computing needs of web and mobile app developers by providing a unified means of connecting their apps to cloud services. == Features == BaaS providers offer different set of features and backend tools. Some of the most common features include: Database management. Most BaaS solutions provide SQL and/or NoSQL database management services for applications. Developers can store their app data without deploying and managing databases themselves. BaaS usually provides client SDKs, REST and GraphQL APIs for the frontend to interact with databases. File storage. BaaS providers often offer storage solutions for media files, user uploads, and other binary data. Applications can upload, download, and delete files through provided SDKs and APIs. Authentication and authorization. Some BaaS offer authentication and authorization services that allow developers to easily manage app users. This includes user sign-up, login, password reset, social media login integration through OAuth, user group and permission management etc. Notification service. Some BaaS providers such as Firebase and AWS Amplify have notification services that can send custom emails to users and push native notifications on mobile platforms. This is especially useful for applications that need to send messages, alerts, and reminders. Cloud functions. Some BaaS allow developers to deploy and run serverless functions. The functions are usually stateless and can be triggered by various ways including HTTP requests, SDK invocation, background server events, and cloud scheduled executions. Different providers offer runtime support for different languages, some of the popular languages are JavaScript/TypeScript (Node.js, Deno), Python, Java/Kotlin. Cloud functions extend the potential and flexibility of BaaS by allowing developers to write custom functionalities for their apps, working in a way similar to a traditional REST API backend framework. Usage analytics. Analytics data about application usage is often included in BaaS. This allows developers to monitor user behaviors and make decisions correspondingly in marketing strategies and performance optimizations. UI design. Some BaaS providers, such as AWS Amplify and Backendless, offer user interface designing tools that help developers design the frontend UI of web and mobile apps. While this may be useful for small teams and individual developers, UI design assistance may not be conventional in BaaS as it goes beyond the scope of backend infrastructure. Real-Time. Real-time features in a BaaS platform ensure that data updates and synchronizations occur instantly across all clients, making changes immediately visible to users. This is crucial for applications like live chat and collaborative tools, using technologies like WebSockets to maintain continuous server-client connections. == Service providers == BaaS providers have a broad focus, providing SDKs and APIs that work for app development on multiple platforms with different technology stacks, such as JavaScript (for Web apps), Flutter, Java/Kotlin (for Android apps), Swift/Objective-C (for iOS/MacOS/WatchOS/TvOS apps), .NET (for Windows) and others. BaaS providers also come in different types, suiting developers of different needs. === Cloud-based BaaS === Most BaaS providers host backend platforms on their cloud servers. They also manage the infrastructure, security, and scalability of the platforms. Developers can access the backend services via a web interface or the provided APIs. Some examples of cloud-based BaaS include Firebase (hosted on Google Cloud Platform), AWS Amplify (hosted on Amazon Web Services), and Microsoft Azure Mobile Apps (hosted on Microsoft Azure). === Self-hosted BaaS === Self-hosted BaaS allow developers to host backend on their own servers, providing more flexibility and potential to customization compared to cloud-based BaaS, which often is more difficult to migrate from. However, developers are also in charge of managing the infrastructure, security, and scalability of their servers. === Mobile BaaS === Mobile backend as a service (MBaaS) is a type of BaaS specifically for applications deployed in mobile systems. While some references use MBaaS interchangeably for BaaS, BaaS can have a wider variety of support such as for web apps and desktop apps. == Business model == BaaS providers generate revenue from their services in various ways, often using a freemium model. Under this model, a client receives a certain number of free active users or API calls per month, and pays a fee for each user or call over this limit. Alternatively, clients can pay a set fee for a package which allows for a greater number of calls or active users per month. There are also flat fee plans that make the pricing more predictable. Some of the providers offer the unlimited API calls inside their free plan offerings. Another business model that has been used by a lot of BaaS providers is PAYG (pay as you go), which has a flexible cost based on developers' usage of database, storage, bandwidth, function calls, user numbers etc.

    Read more →
  • Pooling layer

    Pooling layer

    In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors. It has several uses. It removes redundant information, thus reducing the amount of computation and memory required, which makes the model more robust to small variations in the input; and it increases the receptive field of neurons in later layers in the network. == Convolutional neural network pooling == Pooling is most commonly used in convolutional neural networks (CNN). Below is a description of pooling in 2-dimensional CNNs. The generalization to n-dimensions is immediate. As notation, we consider a tensor x ∈ R H × W × C {\displaystyle x\in \mathbb {R} ^{H\times W\times C}} , where H {\displaystyle H} is height, W {\displaystyle W} is width, and C {\displaystyle C} is the number of channels. A pooling layer outputs a tensor y ∈ R H ′ × W ′ × C ′ {\displaystyle y\in \mathbb {R} ^{H'\times W'\times C'}} . We define two variables f , s {\displaystyle f,s} called "filter size" (aka "kernel size") and "stride". Sometimes, it is necessary to use a different filter size and stride for horizontal and vertical directions. In such cases, we define 4 variables: f H , f W , s H , s W {\displaystyle f_{H},f_{W},s_{H},s_{W}} . The receptive field of an entry in the output tensor, y {\displaystyle y} , are all the entries in x {\displaystyle x} that can affect that entry. === Max pooling === Max Pooling (MaxPool) is commonly used in CNNs to reduce the spatial dimensions of feature maps. Define M a x P o o l ( x | f , s ) 0 , 0 , 0 = max ( x 0 : f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{0,0,0}=\max(x_{0:f-1,0:f-1,0})} where 0 : f − 1 {\displaystyle 0:f-1} means the range 0 , 1 , … , f − 1 {\displaystyle 0,1,\dots ,f-1} . Note that we need to avoid the off-by-one error. The next input is M a x P o o l ( x | f , s ) 1 , 0 , 0 = max ( x s : s + f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{1,0,0}=\max(x_{s:s+f-1,0:f-1,0})} and so on. The receptive field of y i , j , c {\displaystyle y_{i,j,c}} is x i s + f − 1 , j s + f − 1 , c {\displaystyle x_{is+f-1,js+f-1,c}} , so in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is:is+f-1,js:js+f-1,c})} If the horizontal and vertical filter size and strides differ, then in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s H : i s H + f H − 1 , j s W : j s W + f W − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is_{H}:is_{H}+f_{H}-1,js_{W}:js_{W}+f_{W}-1,c})} More succinctly, we can write y k = max ( { x k ′ | k ′ in the receptive field of k } ) {\displaystyle y_{k}=\max(\{x_{k'}|k'{\text{ in the receptive field of }}k\})} . If H {\displaystyle H} is not expressible as k s + f {\displaystyle ks+f} where k {\displaystyle k} is an integer, then for computing the entries of the output tensor on the boundaries, max pooling would attempt to take as inputs variables off the tensor. In this case, how those non-existent variables are handled depends on the padding conditions, illustrated on the right. Global Max Pooling (GMP) is a specific kind of max pooling where the output tensor has shape R C {\displaystyle \mathbb {R} ^{C}} and the receptive field of y c {\displaystyle y_{c}} is all of x 0 : H , 0 : W , c {\displaystyle x_{0:H,0:W,c}} . That is, it takes the maximum over each entire channel. It is often used just before the final fully connected layers in a CNN classification head. === Average pooling === Average pooling (AvgPool) is similarly defined A v g P o o l ( x | f , s ) i , j , c = a v e r a g e ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) = 1 f 2 ∑ k ∈ i s : i s + f − 1 ∑ l ∈ j s : j s + f − 1 x k , l , c {\displaystyle \mathrm {AvgPool} (x|f,s)_{i,j,c}=\mathrm {average} (x_{is:is+f-1,js:js+f-1,c})={\frac {1}{f^{2}}}\sum _{k\in is:is+f-1}\sum _{l\in js:js+f-1}x_{k,l,c}} Global Average Pooling (GAP) is defined similarly to GMP. It was first proposed in Network-in-Network. Similarly to GMP, it is often used just before the final fully connected layers in a CNN classification head. === Interpolations === There are some interpolations of max pooling and average pooling. Mixed Pooling is a linear sum of max pooling and average pooling. That is, M i x e d P o o l ( x | f , s , w ) = w M a x P o o l ( x | f , s ) + ( 1 − w ) A v g P o o l ( x | f , s ) {\displaystyle \mathrm {MixedPool} (x|f,s,w)=w\mathrm {MaxPool} (x|f,s)+(1-w)\mathrm {AvgPool} (x|f,s)} where w ∈ [ 0 , 1 ] {\displaystyle w\in [0,1]} is either a hyperparameter, a learnable parameter, or randomly sampled anew every time. Lp Pooling is similar to average pooling, but uses Lp norm average instead of average: y k = ( 1 N ∑ k ′ in the receptive field of k | x k ′ | p ) 1 / p {\displaystyle y_{k}=\left({\frac {1}{N}}\sum _{k'{\text{ in the receptive field of }}k}|x_{k'}|^{p}\right)^{1/p}} where N {\displaystyle N} is the size of receptive field, and p ≥ 1 {\displaystyle p\geq 1} is a hyperparameter. If all activations are non-negative, then average pooling is the case of p = 1 {\displaystyle p=1} , and max pooling is the case of p → ∞ {\displaystyle p\to \infty } . Square-root pooling is the case of p = 2 {\displaystyle p=2} . Stochastic pooling samples a random activation x k ′ {\displaystyle x_{k'}} from the receptive field with probability x k ′ ∑ k ″ x k ″ {\displaystyle {\frac {x_{k'}}{\sum _{k''}x_{k''}}}} . It is the same as average pooling in expectation. Softmax pooling is like max pooling, but uses softmax, i.e. ∑ k ′ e β x k ′ x k ′ ∑ k ″ e β x k ″ {\displaystyle {\frac {\sum _{k'}e^{\beta x_{k'}}x_{k'}}{\sum _{k''}e^{\beta x_{k''}}}}} where β > 0 {\displaystyle \beta >0} . Average pooling is the case of β ↓ 0 {\displaystyle \beta \downarrow 0} , and max pooling is the case of β ↑ ∞ {\displaystyle \beta \uparrow \infty } Local Importance-based Pooling generalizes softmax pooling by ∑ k ′ e g ( x k ′ ) x k ′ ∑ k ″ e g ( x k ″ ) {\displaystyle {\frac {\sum _{k'}e^{g(x_{k'})}x_{k'}}{\sum _{k''}e^{g(x_{k''})}}}} where g {\displaystyle g} is a learnable function. === Other poolings === Spatial pyramidal pooling applies max pooling (or any other form of pooling) in a pyramid structure. That is, it applies global max pooling, then applies max pooling to the image divided into 4 equal parts, then 16, etc. The results are then concatenated. It is a hierarchical form of global pooling, and similar to global pooling, it is often used just before a classification head. Region of Interest Pooling (also known as RoI pooling) is a variant of max pooling used in R-CNNs for object detection. It is designed to take an arbitrarily-sized input matrix, and output a fixed-sized output matrix. Covariance pooling computes the covariance matrix of the vectors { x k , l , 0 : C − 1 } k ∈ i s : i s + f − 1 , l ∈ j s : j s + f − 1 {\displaystyle \{x_{k,l,0:C-1}\}_{k\in is:is+f-1,l\in js:js+f-1}} which is then flattened to a C 2 {\displaystyle C^{2}} -dimensional vector y i , j , 0 : C 2 − 1 {\displaystyle y_{i,j,0:C^{2}-1}} . Global covariance pooling is used similarly to global max pooling. As average pooling computes the average, which is a first-degree statistic, and covariance is a second-degree statistic, covariance pooling is also called "second-order pooling". It can be generalized to higher-order poolings. Blur Pooling means applying a blurring method before downsampling. For example, the Rect-2 blur pooling means taking an average pooling at f = 2 , s = 1 {\displaystyle f=2,s=1} , then taking every second pixel (identity with s = 2 {\displaystyle s=2} ). == Vision Transformer pooling == In Vision Transformers (ViT), there are the following common kinds of poolings. BERT-like pooling uses a dummy [CLS] token, "classification". For classification, the output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution, which is the network's prediction of class probability distribution. This is the one used by the original ViT and Masked Autoencoder. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multi headed attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. It then applies a feedforward layer F F N {\displaystyle \mathrm {FFN} } on each vector, resulting in a matrix V = [ F F N ( v 1 ) , … , F F N ( v n ) ] {\displaystyle V=[\mathrm {FFN} (v_{1}),\dots ,\mathrm {FFN} (v_{n})]} . This is then sent to a multi-headed attention, resulting in M u l t i h e a d e d A

    Read more →
  • Ayoba

    Ayoba

    Ayoba is an African communication platform developed in South Africa. It is owned by Progressive Tech Holdings in Mauritius and managed by SIMFY Africa. Launched on May 4, 2019, as of April 2024, it has over 35 million active users. == History == Ayoba was first published on Google Play in February 2019. Its first marketing campaign and brand launch took place in Cameroon on May 4, 2019. In June 2019, the platform introduced its first eight channels. In November 2019, the platform reached one million active users, which increased to two million by June 2020. Subsequently, ayoba expanded its services, including the launch of games for Android in February 2020, Momo (Mobile Money) in Cameroon in May 2020, and MicroApps in May 2020. It also launched music and voice and video calling features in 12 territories in August 2020. The first version of ayoba for iOS was released in September 2020. In December of the same year, games and Messaging 2.0 were launched on the platform. In November 2020, it won Best Mobile Application at the African Digital Awards. In 2021, it won OTT Brand of the Year at the Marketing World Awards in Ghana. In December 2022, it received Top Innovative Technology and Telecom Product of the Year at the National Communications Awards in December 2022. In June 2023 ayoba partnered with BoomPlay and as of April 2024, it had 35 million monthly active users. Ayoba has partnered with Jumia Ghana to offer exclusive deals to users. Ayoba users can get a 10% discount on selected Jumia purchases through the app, with no data charges for MTN users. This partnership aims to make online shopping more affordable and accessible by integrating Jumia's offers into the ayoba app. Ayoba supports over 35 million users across Africa and provides services in 22 languages. To access the deals, users can download the ayoba app from the Google Play Store, iOS Store, or the official website. == Platform features == Chat, Call and Share: ayoba enables instant messaging, voice notes, picture sharing, and file sharing with contacts, even if they do not have the app installed. The app supports voice and video calls on both Android and iOS, as well as group chats, help channel and SMS continuity (non ayoba users receive messages as SMS, their responses appear in the ayoba app). Music: ayoba offers a free music player with daily updates on international and African music. Users can find playlists for different genres. Games: ayoba provides a selection of interactive games, including action, adventure, and children's games available on both Android and iOS. Mobile Money Transfers: In certain territories, ayoba supports mobile money transfers using MTN Mobile Money (MoMo) for transactions within the app. MicroApps: ayoba features individual MicroApps within the platform that offer content and services, including streaming channels, podcasts, and specialized apps. The availability of these apps may vary by country. == Operations == ayoba primarily focuses on the following territories: Nigeria, Cameroon, South Africa, Ghana, Côte d'Ivoire, Uganda, Republic of Congo, Benin, Zambia, Tanzania, Kenya, Senegal, Togo, Guinea Bissau, Guinea Conakry, Sudan, South Sudan, and Liberia. The company operates from its offices in Cape Town and Johannesburg, South Africa. David Gillaranz served as the CEO from 2019 to 2021, and Burak Akinci has been the CEO since 2021.

    Read more →
  • Shape context

    Shape context

    Shape context is a feature descriptor used in object recognition. Serge Belongie and Jitendra Malik proposed the term in their paper "Matching with Shape Contexts" in 2000. == Theory == The shape context is intended to be a way of describing shapes that allows for measuring shape similarity and the recovering of point correspondences. The basic idea is to pick n points on the contours of a shape. For each point pi on the shape, consider the n − 1 vectors obtained by connecting pi to all other points. The set of all these vectors is a rich description of the shape localized at that point but is far too detailed. The key idea is that the distribution over relative positions is a robust, compact, and highly discriminative descriptor. So, for the point pi, the coarse histogram of the relative coordinates of the remaining n − 1 points, h i ( k ) = # { q ≠ p i : ( q − p i ) ∈ bin ( k ) } {\displaystyle h_{i}(k)=\#\{q\neq p_{i}:(q-p_{i})\in {\mbox{bin}}(k)\}} is defined to be the shape context of p i {\displaystyle p_{i}} . The bins are normally taken to be uniform in log-polar space. The fact that the shape context is a rich and discriminative descriptor can be seen in the figure below, in which the shape contexts of two different versions of the letter "A" are shown. (a) and (b) are the sampled edge points of the two shapes. (c) is the diagram of the log-polar bins used to compute the shape context. (d) is the shape context for the point marked with a circle in (a), (e) is that for the point marked as a diamond in (b), and (f) is that for the triangle. As can be seen, since (d) and (e) are the shape contexts for two closely related points, they are quite similar, while the shape context in (f) is very different. For a feature descriptor to be useful, it needs to have certain invariances. In particular it needs to be invariant to translation, scaling, small perturbations, and, depending on the application, rotation. Translational invariance comes naturally to shape context. Scale invariance is obtained by normalizing all radial distances by the mean distance α {\displaystyle \alpha } between all the point pairs in the shape although the median distance can also be used. Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers using synthetic point set matching experiments. One can provide complete rotational invariance in shape contexts. One way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). This results in a completely rotationally invariant descriptor. But of course this is not always desired since some local features lose their discriminative power if not measured relative to the same frame. Many applications in fact forbid rotational invariance e.g. distinguishing a "6" from a "9". == Use in shape matching == A complete system that uses shape contexts for shape matching consists of the following steps (which will be covered in more detail in the Details of Implementation section): Randomly select a set of points that lie on the edges of a known shape and another set of points on an unknown shape. Compute the shape context of each point found in step 1. Match each point from the known shape to a point on an unknown shape. To minimize the cost of matching, first choose a transformation (e.g. affine, thin plate spline, etc.) that warps the edges of the known shape to the unknown (essentially aligning the two shapes). Then select the point on the unknown shape that most closely corresponds to each warped point on the known shape. Calculate the "shape distance" between each pair of points on the two shapes. Use a weighted sum of the shape context distance, the image appearance distance, and the bending energy (a measure of how much transformation is required to bring the two shapes into alignment). To identify the unknown shape, use a nearest-neighbor classifier to compare its shape distance to shape distances of known objects. == Details of implementation == === Step 1: Finding a list of points on shape edges === The approach assumes that the shape of an object is essentially captured by a finite subset of the points on the internal or external contours on the object. These can be simply obtained using the Canny edge detector and picking a random set of points from the edges. Note that these points need not and in general do not correspond to key-points such as maxima of curvature or inflection points. It is preferable to sample the shape with roughly uniform spacing, though it is not critical. === Step 2: Computing the shape context === This step is described in detail in the Theory section. === Step 3: Computing the cost matrix === Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k). As shape contexts are distributions represented as histograms, it is natural to use the χ2 test statistic as the "shape context cost" of matching the two points: C S = 1 2 ∑ k = 1 K [ g ( k ) − h ( k ) ] 2 g ( k ) + h ( k ) {\displaystyle C_{S}={\frac {1}{2}}\sum _{k=1}^{K}{\frac {[g(k)-h(k)]^{2}}{g(k)+h(k)}}} The values of this range from 0 to 1. In addition to the shape context cost, an extra cost based on the appearance can be added. For instance, it could be a measure of tangent angle dissimilarity (particularly useful in digit recognition): C A = 1 2 ‖ ( cos ⁡ ( θ 1 ) sin ⁡ ( θ 1 ) ) − ( cos ⁡ ( θ 2 ) sin ⁡ ( θ 2 ) ) ‖ {\displaystyle C_{A}={\frac {1}{2}}{\begin{Vmatrix}{\dbinom {\cos(\theta _{1})}{\sin(\theta _{1})}}-{\dbinom {\cos(\theta _{2})}{\sin(\theta _{2})}}\end{Vmatrix}}} This is half the length of the chord in unit circle between the unit vectors with angles θ 1 {\displaystyle \theta _{1}} and θ 2 {\displaystyle \theta _{2}} . Its values also range from 0 to 1. Now the total cost of matching the two points could be a weighted-sum of the two costs: C = ( 1 − β ) C S + β C A {\displaystyle C=(1-\beta )C_{S}+\beta C_{A}\!\,} Now for each point pi on the first shape and a point qj on the second shape, calculate the cost as described and call it Ci,j. This is the cost matrix. === Step 4: Finding the matching that minimizes total cost === Now, a one-to-one matching π ( i ) {\displaystyle \pi (i)} that matches each point pi on shape 1 and qj on shape 2 that minimizes the total cost of matching, H ( π ) = ∑ i C ( p i , q π ( i ) ) {\displaystyle H(\pi )=\sum _{i}C\left(p_{i},q_{\pi (i)}\right)} is needed. This can be done in O ( N 3 ) {\displaystyle O(N^{3})} time using the Hungarian method, although there are more efficient algorithms. To have robust handling of outliers, one can add "dummy" nodes that have a constant but reasonably large cost of matching to the cost matrix. This would cause the matching algorithm to match outliers to a "dummy" if there is no real match. === Step 5: Modeling transformation === Given the set of correspondences between a finite set of points on the two shapes, a transformation T : R 2 → R 2 {\displaystyle T:\mathbb {R} ^{2}\to \mathbb {R} ^{2}} can be estimated to map any point from one shape to the other. There are several choices for this transformation, described below. ==== Affine ==== The affine model is a standard choice: T ( p ) = A p + o {\displaystyle T(p)=Ap+o\!} . The least squares solution for the matrix A {\displaystyle A} and the translational offset vector o is obtained by: o = 1 n ∑ i = 1 n ( p i − q π ( i ) ) , A = ( Q + P ) t {\displaystyle o={\frac {1}{n}}\sum _{i=1}^{n}\left(p_{i}-q_{\pi (i)}\right),A=(Q^{+}P)^{t}} Where P = ( 1 p 11 p 12 ⋮ ⋮ ⋮ 1 p n 1 p n 2 ) {\displaystyle P={\begin{pmatrix}1&p_{11}&p_{12}\\\vdots &\vdots &\vdots \\1&p_{n1}&p_{n2}\end{pmatrix}}} with a similar expression for Q {\displaystyle Q\!} . Q + {\displaystyle Q^{+}\!} is the pseudoinverse of Q {\displaystyle Q\!} . ==== Thin plate spline ==== The thin plate spline (TPS) model is the most widely used model for transformations when working with shape contexts. A 2D transformation can be separated into two TPS function to model a coordinate transform: T ( x , y ) = ( f x ( x , y ) , f y ( x , y ) ) {\displaystyle T(x,y)=\left(f_{x}(x,y),f_{y}(x,y)\right)} where each of the ƒx and ƒy have the form: f ( x , y ) = a 1 + a x x + a y y + ∑ i = 1 n ω i U ( ‖ ( x i , y i ) − ( x , y ) ‖ ) , {\displaystyle f(x,y)=a_{1}+a_{x}x+a_{y}y+\sum _{i=1}^{n}\omega _{i}U\left({\begin{Vmatrix}(x_{i},y_{i})-(x,y)\end{Vmatrix}}\right),} and the kernel function U ( r ) {\displaystyle U(r)\!} is defined by U ( r ) = r 2 log ⁡ r 2 {\displaystyle U(r)=r^{2}\log r^{2}\!} . The exact details of how to solve for the parameters can be found elsewhere but it essentially involves solving a linear system of equations. The bending energy (a measure of how much transformation is needed to align the points) will also be easily obtained. ==== Regularized TPS ==== The TPS formulation above has exact matching requirement for the pairs of points on the two shapes. For noisy data, it is best to

    Read more →
  • Automatic acquisition of sense-tagged corpora

    Automatic acquisition of sense-tagged corpora

    The knowledge acquisition bottleneck is perhaps the major impediment to solving the word-sense disambiguation (WSD) problem. Unsupervised learning methods rely on knowledge about word senses, which is barely formulated in dictionaries and lexical databases. Supervised learning methods depend heavily on the existence of manually annotated examples for every word sense, a requisite that can so far be met only for a handful of words for testing purposes, as it is done in the Senseval exercises. == Existing methods == Therefore, one of the most promising trends in WSD research is using the largest corpus ever accessible, the World Wide Web, to acquire lexical information automatically. WSD has been traditionally understood as an intermediate language engineering technology which could improve applications such as information retrieval (IR). In this case, however, the reverse is also true: Web search engines implement simple and robust IR techniques that can be successfully used when mining the Web for information to be employed in WSD. The most direct way of using the Web (and other corpora) to enhance WSD performance is the automatic acquisition of sense-tagged corpora, the fundamental resource to feed supervised WSD algorithms. Although this is far from being commonplace in the WSD literature, a number of different and effective strategies to achieve this goal have already been proposed. Some of these strategies are: acquisition by direct Web searching (searches for monosemous synonyms, hypernyms, hyponyms, parsed gloss' words, etc.), Yarowsky algorithm (bootstrapping), acquisition via Web directories, and acquisition via cross-language meaning evidences. == Summary == === Optimistic results === The automatic extraction of examples to train supervised learning algorithms reviewed has been, by far, the best explored approach to mine the web for word-sense disambiguation. Some results are certainly encouraging: In some experiments, the quality of the Web data for WSD equals that of human-tagged examples. This is the case of the monosemous relatives plus bootstrapping with Semcor seeds technique and the examples taken from the ODP Web directories. In the first case, however, Semcor-size example seeds are necessary (and only available for English), and it has only been tested with a very limited set of nouns; in the second case, the coverage is quite limited, and it is not yet clear whether it can be grown without compromising the quality of the examples retrieved. It has been shown that a mainstream supervised learning technique trained exclusively with web data can obtain better results than all unsupervised WSD systems which participated at Senseval-2. Web examples made a significant contribution to the best Senseval-2 English all-words system. === Difficulties === There are, however, several open research issues related to the use of Web examples in WSD: High precision in the retrieved examples (i.e., correct sense assignments for the examples) does not necessarily lead to good supervised WSD results (i.e., the examples are possibly not useful for training). The most complete evaluation of Web examples for supervised WSD indicates that learning with Web data improves over unsupervised techniques, but the results are nevertheless far from those obtained with hand-tagged data, and do not even beat the most-frequent-sense baseline. Results are not always reproducible; the same or similar techniques may lead to different results in different experiments. Compare, for instance, Mihalcea (2002) with Agirre and Martínez (2004), or Agirre and Martínez (2000) with Mihalcea and Moldovan (1999). Results with Web data seem to be very sensitive to small differences in the learning algorithm, to when the corpus was extracted (search engines change continuously), and on small heuristic issues (e.g., differences in filters to discard part of the retrieved examples). Results are strongly dependent on bias (i.e., on the relative frequencies of examples per word sense). It is unclear whether this is simply a problem of Web data, or an intrinsic problem of supervised learning techniques, or just a problem of how WSD systems are evaluated (indeed, testing with rather small Senseval data may overemphasize sense distributions compared to sense distributions obtained from the full Web as corpus). In any case, Web data has an intrinsic bias, because queries to search engines directly constrain the context of the examples retrieved. There are approaches that alleviate this problem, such as using several different seeds/queries per sense or assigning senses to Web directories and then scanning directories for examples; but this problem is nevertheless far from being solved. Once a Web corpus of examples is built, it is not entirely clear whether its distribution is safe from a legal perspective. === Future === Besides automatic acquisition of examples from the Web, there are some other WSD experiments that have profited from the Web: The Web as a social network has been successfully used for cooperative annotation of a corpus (OMWE, Open Mind Word Expert project), which has already been used in three Senseval-3 tasks (English, Romanian and Multilingual). The Web has been used to enrich WordNet senses with domain information: topic signatures and Web directories, which have in turn been successfully used for WSD. Also, some research benefited from the semantic information that the Wikipedia maintains on its disambiguation pages. It is clear, however, that most research opportunities remain largely unexplored. For instance, little is known about how to use lexical information extracted from the Web in knowledge-based WSD systems; and it is also hard to find systems that use Web-mined parallel corpora for WSD, even though there are already efficient algorithms that use parallel corpora in WSD.

    Read more →
  • Product-family engineering

    Product-family engineering

    Product-family engineering (PFE), also known as product-line engineering (PLE), is based on the ideas of "domain engineering" created by the Software Engineering Institute, a term coined by James Neighbors in his 1980 dissertation at University of California, Irvine. Software product lines are quite common in our daily lives, but before a product family can be successfully established, an extensive process has to be followed. This process is known as product-family engineering. Product-family engineering can be defined as a method that creates an underlying architecture of an organization's product platform. It provides an architecture that is based on commonality as well as planned variabilities. The various product variants can be derived from the basic product family, which creates the opportunity to reuse and differentiate on products in the family. Product-family engineering is conceptually similar to the widespread use of vehicle platforms in the automotive industry. Product-family engineering is a relatively new approach to the creation of new products, recently evolving to Model-Based Product Line Engineering (MBPLE), emphasizing the centrality of a model-centric approach in PLE. It focuses on the process of engineering new products in such a way that it is possible to reuse product components and apply variability with decreased costs and time. Product-family engineering is all about reusing components and structures as much as possible, according to the ISO/IEC 26550/2015 and the latest ISO/IEC 26580/2021 that introduced the concept of feature-based Product Line Engineering. Several studies have proven that using a product-family engineering approach for product development can have several benefits. Here is a list of some of them: Higher productivity Higher quality Faster time-to-market Lower labor needs The Nokia case mentioned below also illustrates these benefits. In 2025 the publishing of the book Model-Based Product Line Engineering (MBPLE): The feature-based path to product lines success by Marco Forlingieri, Tim Weilkiens and Hugo Guillermo Chalé-Gongora formalized the foundation of the discipline, including best practices and new industrial cases. == Overall process == The product family engineering process consists of several phases. The three main phases are: Phase 1: Product management Phase 2: Domain engineering Phase 3: Product engineering The process has been modeled on a higher abstraction level. This has the advantage that it can be applied to all kinds of product lines and families, not only software. The model can be applied to any product family. Figure 1 (below) shows a model of the entire process. Below, the process is described in detail. The process description contains elaborations of the activities and the important concepts being used. All concepts printed in italic are explained in Table 1. === Phase 1: product management === The first phase is the starting up of the whole process. In this phase some important aspects are defined especially with regard to economic aspects. This phase is responsible for outlining market strategies and defining a scope, which tells what should and should not be inside the product family. ==== Evaluate business visioning ==== During this first activity all context information relevant for defining the scope of the product line is collected and evaluated. It is important to define a clear market strategy and take external market information into account, such as consumer demands. The activity should deliver a context document that contains guidelines, constraints and the product strategy. ==== Define product line scope ==== Scoping techniques are applied to define which aspects are within the scope. This is based upon the previous step in the process, where external factors have been taken into account. The output is a product portfolio description, which includes a list of current and future products and also a product roadmap. It can be argued whether phase 1, product management, is part of the product-family-engineering process, because it could be seen as an individual business process that is more focused on the management aspects instead of the product aspect. However phase 2 needs some important input from this phase, as a large piece of the scope is defined in this phase. So from this point of view it is important to include the product-management phase (phase 1) into the entire process as a base for the domain-engineering process. === Phase 2: domain engineering === During the domain-engineering phases, the variable and common requirements are gathered for the whole product line. The goal is to establish a reusable platform. The output of this phase is a set of common and variable requirements for all products in the product line. ==== Analyze domain requirements ==== This activity includes all activities for analyzing the domain with regard to concept requirements. The requirements are categorized and split up into two new activities. The output is a document with the domain analysis. As can be seen in Figure 1 the process of defining common requirements is a parallel process with defining variable requirements. Both activities take place at the same time. ==== Define common requirements ==== Includes all activities for eliciting and documenting the common requirements of the product line, resulting in a document with reusable common requirements. ==== Define variable requirements ==== Includes all activities for eliciting and documenting the variable requirements of the product line, resulting in a document with variable requirements. ==== Design domain ==== This process step consists of activities for defining the reference architecture of the product line. This generates an abstract structure for all products in the product line. ==== Implement domain ==== During this step a detailed design of the reusable components and the implementation of these components are created. ==== Test domain ==== Validates and verifies the reusability of components. Components are tested against their specifications. After successful testing of all components in different use cases and scenarios, the domain engineering phase has been completed. === Phase 3: product engineering === In the final phase a product X is being engineered. This product X uses the commonalities and variability from the domain engineering phase, so product X is being derived from the platform established in the domain engineering phase. It basically takes all common requirements and similarities from the preceding phase plus its own variable requirements. Using the base from the domain engineering phase and the individual requirements of the product engineering phase a complete and new product can be built. After the product has been fully tested and approved, the product X can be delivered. ==== Define product requirements ==== Developing the product requirements specification for the individual product and reuse the requirements from the preceding phase. ==== Design product ==== All activities for producing the product architecture. Makes use of the reference architecture from the step "design domain", it selects and configures the required parts of the reference architecture and incorporates product specific adaptations. ==== Build product ==== During this process the product is built, using selections and configurations of the reusable components. ==== Test product ==== During this step the product is verified and validated against its specifications. A test report gives information about all tests that were carried out, this gives an overview of possible errors in the product. If the product in the next step is not accepted, the process will loop back to "build product", in Figure 1 this is indicated as "[unsatisfied]". ==== Deliver and support product ==== The final step is the acceptance of the final product. If it has been successfully tested and approved to be complete, it can be delivered. If the product does not satisfy to the specifications, it has to be rebuilt and tested again. The next figure shows the overall process of product-family engineering as described above. It is a full process overview with all concepts attached to the different steps. == Process data diagram == On the left side the entire process from the top to bottom has been drawn. All activities on the left side are linked to the concepts on the right side through dotted lines. Every concept has a number, which reflects the association with other concepts. == List of concepts == Below the list with concepts will be explained. Most concept definitions are extracted from Pohl, Bockle, & Linden (2005) and also some new definitions have been added. Table 1: List of concepts == Example == There are some good examples of the use of product family engineering, which were quite successful. The abstract model of product family engineering allows different kinds of uses, most of them are related to the consumer electronics m

    Read more →
  • StatMuse

    StatMuse

    StatMuse Inc. is an American artificial intelligence company founded in 2014. It operates an eponymous website that hosts a database of sports statistics covering the four major North American sports leagues, the Women's National Basketball Association (WNBA), NCAA Division I men's basketball, NCAA Division I Football Bowl Subdivision, the Big Five association football leagues in Europe, and various professional golf tours. == History == The company was founded by friends Adam Elmore and Eli Dawson in 2014. In email correspondence to the Springfield News-Leader, Elmore detailed that he and Dawson, fans of the National Basketball Association (NBA), were compelled to create StatMuse after they realized there was no online platform where they could search "Lebron James most points" [sic] and quickly get a result "showing his highest scoring games." As a startup, the company's goal was to utilize a type of artificial intelligence called natural language processing (NLP) for sports. In 2015, the company was part of the second group of startups accepted into the Disney Accelerator program. The company secured support from several investors, including The Walt Disney Company, Techstars, Allen & Company, the NFL Players Association, Greycroft and NBA Commissioner David Stern. As part of their partnership with Disney, StatMuse signed a content deal with ESPN (owned by Disney) to provide stats content on social media and television during the 2015–16 NBA season. Initially, the company only had stats available for the NBA, but eventually expanded to provide stats for the other major North American sports leagues. The company's initial demographic was players of fantasy sports, but it eventually expanded to target general sports fans as well. StatMuse offers responses to user queries in the voices of sports-related public figures. Dawson shared with VentureBeat that StatMuse brings people in and records them saying different words and phrases. These celebrity voices were made accessible through Google's Google Assistant service, Microsoft's Cortana virtual assistant, and Amazon's Echo devices. The company launched its phone app in September 2017. The app allows users to access StatMuse's sports statistics database by submitting queries in their natural language. Upon the launch of the phone app, Fitz Tepper of TechCrunch wrote that: "The technology isn't perfect – some of the pauses between words are a bit awkward, making it clear that some phrases are being stitched together on the fly. But this is the exception, and on the whole, most responses sound pretty good." StatMuse plug-ins for Slack and Facebook Messenger were also made, providing text-based sports stats. In 2019, StatMuse received investment from the Google Assistant Investment program. The service launched a premium option dubbed StatMuse+ in May 2023, offering options that had previously been included for free, such as unlimited searches and full results in data tables. The premium version also included early access to new features and a personalized search history, as well as not having ads. The app received a variety of feedback. In January 2024, the service launched a Premier League version of the website dubbed StatMuse FC. It is planned to introduce more leagues on the website.

    Read more →
  • Xiaomi MiMo

    Xiaomi MiMo

    Xiaomi MiMo is a family of large language models (LLMs) developed by Xiaomi. It was initially released in April 2025 with the MiMo-7B model. Currently, MiMo is available for developers through API service. It is used as the key AI model in Xiaomi's "Human x Car x Home" ecosystem. == Development == Xiaomi developed MiMo as a reasoning-focused language model. Its development team was led by Luo Fuli, who had previously worked at DeepSeek before joining Xiaomi in late 2025. The model was trained using multi-token prediction and reinforcement learning, with a particular emphasis on mathematical reasoning and code generation tasks. In March 2026, Xiaomi CEO Lei Jun announced that the company planned to invest at least US$8.7 billion in artificial intelligence over the following three years. == Models == === List of models === === MiMo-7B === MiMo-7B is the first model of this LLM. The base model, MiMo-7B-Base, was pre-trained on approximately 25 trillion tokens using web pages, academic papers, books, and synthetic reasoning data. MiMo-7B-RL underwent supervised fine-tuning and reinforcement learning on 130,000 mathematics and code problems. MiMo-7B-RL-0530 was released in May 2025. It scaled the fine-tuning dataset from 500,000 to 6 million instances and extended the RL window from 32,000 to 48,000 tokens and improved AIME 2024 scores from 68.2 to 80.1. MiMo-VL-7B was a vision-language model combining a Vision Transformer encoder with the MiMo-7B backbone. It was trained in four stages consuming 2.4 trillion tokens. Its reinforcement learning variant used Mixed On-Policy Reinforcement Learning (MORL) which integrated reward signals across perception, grounding, and reasoning. Xiaomi also released MiMo-Audio-7B, an audio-language model for voice conversion, style transfer, and speech editing. === MiMo-V2-Flash === MiMo-V2-Flash was launched in December 2025. It is a open-sourced Mixture-of-experts model with 309 billion total parameters and 15 billion active parameters. It was trained on 27 trillion tokens using FP8 mixed precision. It used hybrid attention interleaving Sliding Window and Global Attention at a 5:1 ratio. === MiMo-V2-Pro === Xiaomi publicly introduced MiMo-V2-Pro on 18 March 2026. It has over 1 trillion total parameters, 42 billion active, and a 1-million-token context window. Before the official release, the model had appeared anonymously on OpenRouter under the codename "Hunter Alpha," where it drew substantial usage and topped daily charts for several days, according to Xiaomi and Reuters. During its listing on OpenRouter, the model reportedly processed over one trillion tokens in total usage. Xiaomi later said Hunter Alpha was an early internal test build of MiMo-V2-Pro, and Reuters reported that the model had been mistaken by some users for a possible DeepSeek system before Xiaomi confirmed its origin. The model was released as a proprietary API product, and Luo Fuli stated that Xiaomi intended to open-source a variant at an unspecified future date. Xiaomi has partnered with several API web platforms like OpenClaw to launch the model. All these websites initially offered a free trial of this model for a week, but due to the overwhelming response, Xiaomi later extended the free trial period of the model until 2 April 2026. === MiMo-V2-Omni === Alongside MiMo-V2-Pro, Xiaomi launched MiMo-V2-Omni on 18 March 2026. It handles image, video, audio, and text inputs. Before the official release, it was codenamed "Healer Alpha" in OpenRouter. === MiMo-V2-TTS === On the same date as the release of MiMo-V2-Pro and MiMo-V2-Omni, a Text-to-Speech model named MiMo-V2-TTS was released also. It is a speech synthesis model. It was trained on audio data, which makes it capable of emotional transitions, mid-sentence tone shifts, singing, and synthesis of regional dialects like Sichuan, Cantonese, Henan, and Taiwanese. == Licensing == Xiaomi has used different licensing approaches for different models in the MiMo family. The MiMo-7B series and MiMo-V2-Flash were released as open-weight models. MiMo-V2-Flash was published under the MIT license with model weights and inference code available on Hugging Face. MiMo-V2-Pro and MiMo-V2-Omni were released as proprietary models. It was accessible through Xiaomi's API platform and third-party API providers. Luo Fuli stated that Xiaomi intended to open-source a variant of MiMo-V2-Pro. Although, she did not specify any timeline. MiMo-V2-TTS was released as a proprietary model with no publicly available weights.

    Read more →