AI Data Water

AI Data Water — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Alipay

    Alipay

    Alipay (simplified Chinese: 支付宝; traditional Chinese: 支付寶; pinyin: zhīfùbǎo) is a third-party mobile and online payment platform, established in Hangzhou, China, in February 2004 by Alibaba Group and its founder Jack Ma. In 2015, Alipay moved its headquarters to Pudong, Shanghai, although its parent company Ant Financial remains Hangzhou-based. Alipay overtook PayPal as the world's largest mobile (digital) payment platform in 2013. As of June 2020, Alipay serves over 1.3 billion users and 80 million merchants. According to the statistics of the fourth quarter of 2018, Alipay has a 55.32% share of the third-party payment market in mainland China, and it continues to grow. Along with WeChat, Alipay has been described to be China's super-app with a wide range of functionalities including ridesharing, travel booking and medical appointments. == History == The service was first launched in 2003, by Taobao. The People's Bank of China, China's central bank, issued licensing regulations in June 2010 for third-party payment providers. It also issued separate guidelines for foreign-funded payment institutions. Because of this, Alipay, which accounted for half of China's non-bank online payment market, was restructured as a domestic company controlled by Alibaba CEO Jack Ma in order to facilitate the regulatory approval for the license. The 2010 transfer of Alipay's ownership was controversial, with media reports in 2011 that Yahoo! and Softbank (Alibaba Group's controlling shareholders) were not informed of the sale for nominal value. Chinese business publication Century Weekly criticised Ma, who stated that Alibaba Group's board of directors was aware of the transaction. The incident was criticised in foreign and Chinese media as harming foreign trust in making Chinese investments. The ownership dispute was resolved by Alibaba Group, Yahoo!, and Softbank in July 2011. In 2013, Alipay launched a financial product platform called Yu'e Bao. Alipay partnered with Tianhong Asset Management to launch the it. Yu'e Bao offers an online money market account in which Alipay customers can deposit money and receive a higher interest rate than that available from banks. It soon became China's largest online money market fund and prompted competitors like Baidu and Tencent to introduce alternatives. Alibaba (the parent company of Alipay) reported having 152 million Yu'e Bao users in mid-2016, with 810 billion RMB (US$117 billion) in funds under management. In 2015, Alipay's parent company was re-branded as Ant Financial Services Group. In 2017, Alipay unveiled their facial recognition payment service. In 2020, Alipay upgraded from a payment financial instrument to an open platform for digital life. In 2021, the mandate by the Ministry of Industry and Information Technology (MIIT) to open up the "walled garden" ecosystems of the major tech companies has led to the introduction of interoperability of payment QR codes of Alipay and competing WeChat Pay and UnionPay's Cloud QuickPass platforms. In response to the increase in Alipay's payment volume due to use on Alibaba's e-commerce sites and others, Chinese regulators introduced new rules in 2020. The new rules focused on Alipay because the payment volume exploded due to its use on Alibaba's e-commerce sites and other platforms. By the second quarter in 2020, Alipay held 55.6% of China's third party mobile payment market. The People's Bank of China made rules that required payment firms to place money with regulators and anti-monopoly reviews would be triggered if the amount exceeded 50% market share. The rules included that the People's Bank of China mandate an online-payment clearing route through the NetsUnion Clearing Corporation, a centralized, state-overseen clearing body, and that unused consumer funds be held by a third-party payment provider in a non-interest-bearing account. These measures increased transparency and reduced systemic risk. When Alipay operates outside of China, it must comply with local financial regulations, which may treat specific functions such as money-market funds or investment-linked products. In Singapore, such services may require prior authorization from securities or financial-services regulators before they can be offered to residents. == Services == Alipay states that it operates with more than 65 financial institutions including Visa and MasterCard to provide payment services for Taobao and Tmall as well as more than 460,000 online and local Chinese businesses. Alipay is used in smartphones with their Alipay Wallet app. QR code payment codes are used for local in-store payments. The Alipay app also provides features such as credit card bill payments, bank account managements, P2P transfer, prepay mobile phone top-up, bus and train ticket purchases, food orders, vehicles for hire, insurance selections and a digital identification document storage. Alipay also allows online check-out on most Chinese-based websites such as Taobao and Tmall. The Alipay app allows users to add their own services provided from different companies to create a more personalised experience. Since late 2008, Alipay has promoted public service payment services and has covered more than 300 cities nationwide, supporting more than 1,200 partner organizations. In addition to utility bills such as water and electricity, Alipay also extends their services to areas such as paying transportation fines, property fees, and cable television fees. Common online payment services also include hydropower coal payment, tuition payment and traffic fine. On 15 January 2009, Alipay launched a credit card repayment service, supporting 39 domestic bank-issued credit cards. It is currently the most popular third-party repayment platform. The main advantages are free credit card bills checking, repayments with no administrative fee, as well as automatic repayment, repayment reminders and other value-added services. In the first quarter of 2014, 76% of credit cards were also paid by Alipay Wallet. From December 2013, several chain convenience store companies, including Meiyijia, Hongqi Chain, and Qishiduo C-STORE and 7-Eleven, have successively supported Alipay payment; in December, Beijing taxi drivers began to accept Alipay to pay the fare. Subsequently, Wanda Cinema, Joy City, Wangfujing and other large-scale retail companies as well as movie theaters, KTV, and catering companies have access to Alipay. From 26 March 2019, the service fee will be charged for the payment of credit card through Alipay. Customers only pay the portion of the payment that exceeds 2,000 yuan at 0.1%. In addition to this, in 2019, Walgreens accepted Alipay as payment in 3,000 US stores. Walgreen's products are available to Chinese customers through Alibaba's Tmall online marketplace. The payment application can also be used on Alibaba.com's site and Taobao as a means of payment. A Nielsen report suggests that over 90% of Chinese tourists would be willing to use mobile payment overseas if given the option. Many Chinese tourists do not have international credit cards, and so Alipay is a payment option. Digital payments have become the norm in China as the government pushes a cashless system even in rural and village areas. In November 2019, Alipay introduced Tourpass, a service component that allows non-Chinese users to use its mobile payment feature by pre-loading Chinese Yuan equivalent foreign currency into the app. In 2020, Alipay used a QR code system to help in containing the COVID-19 outbreak. The health code system tags users one of three colors according to their location, basic health information and travel history. "Beauty filters" were included to Alipay's face-scan payment system in a new upgrade that was released in July 2019. The market has responded well to the "beauty filters," which make users seem better when they use the program to make payments. Alipay Tap is a payment function launched by Alipay in July 2024. Alipay+ NFC enables wallets to offer tap-to-pay acceptance across Mastercard's global contactless network, all within your existing wallet infrastructure. == Foreign expansion == Outside of China, more than 300 worldwide merchants use Alipay to sell directly to consumers in China. It currently supports transactions in 18 foreign currencies. Since the launch of Alipay in the Mainland China, Ant Financial introduced a series of expansion of the services to other countries. Other than expanding into individual countries, the system would also be integrated with online payment platform providers. Ant Group had acquired a majority stake into 2C2P, a Singapore-based provider used by merchants worldwide in April 2022, and would eventually integrate Alipay with 2C2P. === Asia === ==== Bangladesh ==== In 2018, Alipay bought 20% shares in Bangladeshi mobile financial service provider bKash Limited. ==== Hong Kong ==== In 2017, Ant Financial expanded to Hong Kong. In a joint venture with CK Hutchison, as Alipay Payment Ser

    Read more →
  • Actor-critic algorithm

    Actor-critic algorithm

    The actor-critic algorithm (AC) is a family of reinforcement learning (RL) algorithms that combine policy-based RL algorithms such as policy gradient methods, and value-based RL algorithms such as value iteration, Q-learning, SARSA, and TD learning. An AC algorithm consists of two main components: an "actor" that determines which actions to take according to a policy function, and a "critic" that evaluates those actions according to a value function. Some AC algorithms are on-policy, some are off-policy. Some apply to either continuous or discrete action spaces. Some work in both cases. == Overview == The actor-critic methods can be understood as an improvement over pure policy gradient methods like REINFORCE via introducing a baseline. === Actor === The actor uses a policy function π ( a | s ) {\displaystyle \pi (a|s)} , while the critic estimates either the value function V ( s ) {\displaystyle V(s)} , the action-value Q-function Q ( s , a ) , {\displaystyle Q(s,a),} the advantage function A ( s , a ) {\displaystyle A(s,a)} , or any combination thereof. The actor is a parameterized function π θ {\displaystyle \pi _{\theta }} , where θ {\displaystyle \theta } are the parameters of the actor. The actor takes as argument the state of the environment s {\displaystyle s} and produces a probability distribution π θ ( ⋅ | s ) {\displaystyle \pi _{\theta }(\cdot |s)} . If the action space is discrete, then ∑ a π θ ( a | s ) = 1 {\displaystyle \sum _{a}\pi _{\theta }(a|s)=1} . If the action space is continuous, then ∫ a π θ ( a | s ) d a = 1 {\displaystyle \int _{a}\pi _{\theta }(a|s)da=1} . The goal of policy optimization is to improve the actor. That is, to find some θ {\displaystyle \theta } that maximizes the expected episodic reward J ( θ ) {\displaystyle J(\theta )} : J ( θ ) = E π θ [ ∑ t = 0 T γ t r t ] {\displaystyle J(\theta )=\mathbb {E} _{\pi _{\theta }}\left[\sum _{t=0}^{T}\gamma ^{t}r_{t}\right]} where γ {\displaystyle \gamma } is the discount factor, r t {\displaystyle r_{t}} is the reward at step t {\displaystyle t} , and T {\displaystyle T} is the time-horizon (which can be infinite). The goal of policy gradient method is to optimize J ( θ ) {\displaystyle J(\theta )} by gradient ascent on the policy gradient ∇ J ( θ ) {\displaystyle \nabla J(\theta )} . As detailed on the policy gradient method page, there are many unbiased estimators of the policy gradient: ∇ θ J ( θ ) = E π θ [ ∑ 0 ≤ j ≤ T ∇ θ ln ⁡ π θ ( A j | S j ) ⋅ Ψ j | S 0 = s 0 ] {\displaystyle \nabla _{\theta }J(\theta )=\mathbb {E} _{\pi _{\theta }}\left[\sum _{0\leq j\leq T}\nabla _{\theta }\ln \pi _{\theta }(A_{j}|S_{j})\cdot \Psi _{j}{\Big |}S_{0}=s_{0}\right]} where Ψ j {\textstyle \Psi _{j}} is a linear sum of the following: ∑ 0 ≤ i ≤ T ( γ i R i ) {\textstyle \sum _{0\leq i\leq T}(\gamma ^{i}R_{i})} . γ j ∑ j ≤ i ≤ T ( γ i − j R i ) {\textstyle \gamma ^{j}\sum _{j\leq i\leq T}(\gamma ^{i-j}R_{i})} : the REINFORCE algorithm. γ j ∑ j ≤ i ≤ T ( γ i − j R i ) − b ( S j ) {\textstyle \gamma ^{j}\sum _{j\leq i\leq T}(\gamma ^{i-j}R_{i})-b(S_{j})} : the REINFORCE with baseline algorithm. Here b {\displaystyle b} is an arbitrary function. γ j ( R j + γ V π θ ( S j + 1 ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\left(R_{j}+\gamma V^{\pi _{\theta }}(S_{j+1})-V^{\pi _{\theta }}(S_{j})\right)} : TD(1) learning. γ j Q π θ ( S j , A j ) {\textstyle \gamma ^{j}Q^{\pi _{\theta }}(S_{j},A_{j})} . γ j A π θ ( S j , A j ) {\textstyle \gamma ^{j}A^{\pi _{\theta }}(S_{j},A_{j})} : Advantage Actor-Critic (A2C). γ j ( R j + γ R j + 1 + γ 2 V π θ ( S j + 2 ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\left(R_{j}+\gamma R_{j+1}+\gamma ^{2}V^{\pi _{\theta }}(S_{j+2})-V^{\pi _{\theta }}(S_{j})\right)} : TD(2) learning. γ j ( ∑ k = 0 n − 1 γ k R j + k + γ n V π θ ( S j + n ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\left(\sum _{k=0}^{n-1}\gamma ^{k}R_{j+k}+\gamma ^{n}V^{\pi _{\theta }}(S_{j+n})-V^{\pi _{\theta }}(S_{j})\right)} : TD(n) learning. γ j ∑ n = 1 ∞ λ n − 1 1 − λ ⋅ ( ∑ k = 0 n − 1 γ k R j + k + γ n V π θ ( S j + n ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\sum _{n=1}^{\infty }{\frac {\lambda ^{n-1}}{1-\lambda }}\cdot \left(\sum _{k=0}^{n-1}\gamma ^{k}R_{j+k}+\gamma ^{n}V^{\pi _{\theta }}(S_{j+n})-V^{\pi _{\theta }}(S_{j})\right)} : TD(λ) learning, also known as GAE (generalized advantage estimate). This is obtained by an exponentially decaying sum of the TD(n) learning terms. === Critic === In the unbiased estimators given above, certain functions such as V π θ , Q π θ , A π θ {\displaystyle V^{\pi _{\theta }},Q^{\pi _{\theta }},A^{\pi _{\theta }}} appear. These are approximated by the critic. Since these functions all depend on the actor, the critic must learn alongside the actor. The critic is learned by value-based RL algorithms. For example, if the critic is estimating the state-value function V π θ ( s ) {\displaystyle V^{\pi _{\theta }}(s)} , then it can be learned by any value function approximation method. Let the critic be a function approximator V ϕ ( s ) {\displaystyle V_{\phi }(s)} with parameters ϕ {\displaystyle \phi } . The simplest example is TD(1) learning, which trains the critic to minimize the TD(1) error: δ i = R i + γ V ϕ ( S i + 1 ) − V ϕ ( S i ) {\displaystyle \delta _{i}=R_{i}+\gamma V_{\phi }(S_{i+1})-V_{\phi }(S_{i})} The critic parameters are updated by gradient descent on the squared TD error: ϕ ← ϕ − α ∇ ϕ ( δ i ) 2 = ϕ + α δ i ∇ ϕ V ϕ ( S i ) {\displaystyle \phi \leftarrow \phi -\alpha \nabla _{\phi }(\delta _{i})^{2}=\phi +\alpha \delta _{i}\nabla _{\phi }V_{\phi }(S_{i})} where α {\displaystyle \alpha } is the learning rate. Note that the gradient is taken with respect to the ϕ {\displaystyle \phi } in V ϕ ( S i ) {\displaystyle V_{\phi }(S_{i})} only, since the ϕ {\displaystyle \phi } in γ V ϕ ( S i + 1 ) {\displaystyle \gamma V_{\phi }(S_{i+1})} constitutes a moving target, and the gradient is not taken with respect to that. This is a common source of error in implementations that use automatic differentiation, and requires "stopping the gradient" at that point. Similarly, if the critic is estimating the action-value function Q π θ {\displaystyle Q^{\pi _{\theta }}} , then it can be learned by Q-learning or SARSA. In SARSA, the critic maintains an estimate of the Q-function, parameterized by ϕ {\displaystyle \phi } , denoted as Q ϕ ( s , a ) {\displaystyle Q_{\phi }(s,a)} . The temporal difference error is then calculated as δ i = R i + γ Q θ ( S i + 1 , A i + 1 ) − Q θ ( S i , A i ) {\displaystyle \delta _{i}=R_{i}+\gamma Q_{\theta }(S_{i+1},A_{i+1})-Q_{\theta }(S_{i},A_{i})} . The critic is then updated by θ ← θ + α δ i ∇ θ Q θ ( S i , A i ) {\displaystyle \theta \leftarrow \theta +\alpha \delta _{i}\nabla _{\theta }Q_{\theta }(S_{i},A_{i})} The advantage critic can be trained by training both a Q-function Q ϕ ( s , a ) {\displaystyle Q_{\phi }(s,a)} and a state-value function V ϕ ( s ) {\displaystyle V_{\phi }(s)} , then let A ϕ ( s , a ) = Q ϕ ( s , a ) − V ϕ ( s ) {\displaystyle A_{\phi }(s,a)=Q_{\phi }(s,a)-V_{\phi }(s)} . Although, it is more common to train just a state-value function V ϕ ( s ) {\displaystyle V_{\phi }(s)} , then estimate the advantage by A ϕ ( S i , A i ) ≈ ∑ j ∈ 0 : n − 1 γ j R i + j + γ n V ϕ ( S i + n ) − V ϕ ( S i ) {\displaystyle A_{\phi }(S_{i},A_{i})\approx \sum _{j\in 0:n-1}\gamma ^{j}R_{i+j}+\gamma ^{n}V_{\phi }(S_{i+n})-V_{\phi }(S_{i})} Here, n {\displaystyle n} is a positive integer. The higher n {\displaystyle n} is, the more lower is the bias in the advantage estimation, but at the price of higher variance. The Generalized Advantage Estimation (GAE) introduces a hyperparameter λ {\displaystyle \lambda } that smoothly interpolates between Monte Carlo returns ( λ = 1 {\displaystyle \lambda =1} , high variance, no bias) and 1-step TD learning ( λ = 0 {\displaystyle \lambda =0} , low variance, high bias). This hyperparameter can be adjusted to pick the optimal bias-variance trade-off in advantage estimation. It uses an exponentially decaying average of n-step returns with λ {\displaystyle \lambda } being the decay strength. == Variants == Asynchronous Advantage Actor-Critic (A3C): Parallel and asynchronous version of A2C. Soft Actor-Critic (SAC): Incorporates entropy maximization for improved exploration. Deep Deterministic Policy Gradient (DDPG): Specialized for continuous action spaces.

    Read more →
  • Learning rate

    Learning rate

    In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns". In the adaptive control literature, the learning rate is commonly referred to as gain. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting. While the descent direction is usually determined from the gradient of the loss function, the learning rate determines how big a step is taken in that direction. Too high a learning rate will make the learning jump over minima, but too low a learning rate will either take too long to converge or get stuck in an undesirable local minimum. In order to achieve faster convergence, prevent oscillations and getting stuck in undesirable local minima the learning rate is often varied during training either in accordance to a learning rate schedule or by using an adaptive learning rate. The learning rate and its adjustments may also differ per parameter, in which case it is a diagonal matrix that can be interpreted as an approximation to the inverse of the Hessian matrix in Newton's method. The learning rate is related to the step length determined by inexact line search in quasi-Newton methods and related optimization algorithms. == Learning rate schedule == Initial rate can be left as system default or can be selected using a range of techniques. A learning rate schedule changes the learning rate during learning and is most often changed between epochs/iterations. This is mainly done with two parameters: decay and momentum. There are many different learning rate schedules but the most common are time-based, step-based and exponential. Decay serves to settle the learning in a nice place and avoid oscillations, a situation that may arise when too high a constant learning rate makes the learning jump back and forth over a minimum, and is controlled by a hyperparameter. Momentum is analogous to a ball rolling down a hill; we want the ball to settle at the lowest point of the hill (corresponding to the lowest error). Momentum both speeds up the learning (increasing the learning rate) when the error cost gradient is heading in the same direction for a long time and also avoids local minima by 'rolling over' small bumps. Momentum is controlled by a hyperparameter analogous to a ball's mass which must be chosen manually—too high and the ball will roll over minima which we wish to find, too low and it will not fulfil its purpose. The formula for factoring in the momentum is more complex than for decay but is most often built in with deep learning libraries such as Keras. Time-based learning schedules alter the learning rate depending on the learning rate of the previous time iteration. Factoring in the decay the mathematical formula for the learning rate is: η n + 1 = η 0 1 + d n {\displaystyle \eta _{n+1}={\frac {\eta _{0}}{1+dn}}} where η {\displaystyle \eta } is the learning rate, η 0 {\displaystyle \eta _{0}} is the original learning rate, d {\displaystyle d} is a decay parameter and n {\displaystyle n} is the iteration step. Step-based learning schedules changes the learning rate according to some predefined steps. The decay application formula is here defined as: η n = η 0 d ⌊ 1 + n r ⌋ {\displaystyle \eta _{n}=\eta _{0}d^{\left\lfloor {\frac {1+n}{r}}\right\rfloor }} where η n {\displaystyle \eta _{n}} is the learning rate at iteration n {\displaystyle n} , η 0 {\displaystyle \eta _{0}} is the initial learning rate, d {\displaystyle d} is how much the learning rate should change at each drop (0.5 corresponds to a halving) and r {\displaystyle r} corresponds to the drop rate, or how often the rate should be dropped (10 corresponds to a drop every 10 iterations). The floor function ( ⌊ … ⌋ {\displaystyle \lfloor \dots \rfloor } ) here drops the value of its input to 0 for all values smaller than 1. Exponential learning schedules are similar to step-based, but instead of steps, a decreasing exponential function is used. The mathematical formula for factoring in the decay is: η n = η 0 e − d n {\displaystyle \eta _{n}=\eta _{0}e^{-dn}} where d {\displaystyle d} is a decay parameter. == Adaptive learning rate == The issue with learning rate schedules is that they all depend on hyperparameters that must be manually chosen for each given learning session and may vary greatly depending on the problem at hand or the model used. To combat this, there are many different types of adaptive gradient descent algorithms such as Adagrad, Adadelta, RMSprop, and Adam which are generally built into deep learning libraries such as Keras.

    Read more →
  • Label noise

    Label noise

    Label noise refers to errors or inaccuracies in the class labels of data instances. This is a widespread issue in machine learning datasets, arising from human annotator mistakes, unclear labeling instructions, automated labeling methods, or adversarial attacks in supervised learning. Label noise can be roughly divided into random noise, where labels are flipped independently of input features, and systematic noise, where mislabeling is dependent on certain patterns or biases in the data. Label noise can be damaging to model performance, especially for complex models that may overfit to noisy labels rather than generalizable patterns. Many approaches have been proposed to deal with the effects of label noise, including robust loss functions, noise-tolerant algorithms, data cleaning methods, and semi-supervised learning approaches. To reduce the impact of wrong labels during training, techniques like label smoothing, sample reweighting and using trusted validation sets are used. The role of noise-robust training paradigms and curriculum learning strategies to improve resilience against mislabeled data is also explored in recent research.

    Read more →
  • Cloud printing

    Cloud printing

    There are, in essence, three kinds of Cloud printing. == Benefits == 76% of IT teams have moved, or plan to move, their print workflows to the cloud due to its simplicity. Consumers can print easily to any printer from their PC, tablet or smartphone, while the Cloud print service monitors the supplies level. Many printer vendors such as Lexmark propose an automatic supplies shipment based on the real-time analysis of the printer supplies and user behavior to ensure printing will always be possible. For IT department, Cloud Printing eliminates the need for print servers and represents the only way to print from Cloud virtual desktops and servers. For consumers, cloud ready printers eliminate the need for PC connections and print drivers, enabling them to print from mobile devices. As for publishers and content owners, cloud printing allows them to "avoid the cost and complexity of buying and managing the underlying hardware, software and processes" required for the production of professional print products. Leveraging cloud print for print on demand also allows businesses to cut down on the costs associated with mass production. Moreover, cloud printing can be considered more eco-friendly, as it significantly reduces the amount of paper used (13% reduction in print jobs yearly) and lowers carbon emissions from transportation. As many companies move their IT to the Cloud, some adopting the Windows 365 and Azure Virtual Desktop services from Microsoft, the connection from the Cloud environment to the on-premise printers become an issue as opening ports for incoming print flow traffic is not an option. In 2020, at the exact same time Google discontinued its Google Print offer, Microsoft has announced its Universal Print service offer, aimed at making printing compatible with Cloud Desktop environments, making printing driver-free and simple with no client to install on PC. With Universal Print Microsoft has built a disrupting architecture with a value proposition commodifying printers, removing print servers and drivers, allowing to move printers to VLAN for security purpose and printing from anywhere. Clients are free to use any printer from any model as they all work the same, clients are not tied anymore to any printer brand and that gave a significant boost to the Cloud print market. That Microsoft Universal Print architecture provides APIs to third-party developers who can develop add-ons such as Celiveo 365 to extend Microsoft Cloud Print with added features such as access control on printers and copiers, follow-me pull print, data encryption, advanced usage reporting or charge back. == Providers of Consumer Cloud Printing Solutions == Before 2020 only a handful of providers used to work towards a professional cloud print solution, operating in their own niche or focus on mobile devices. In 2020 Microsoft has boosted that market by announcing its Universal Print Cloud printing service and since then many publishers have started to propose solutions for that growing market. The Covid pandemic also created the need for employees to be able to print at home when using the corporate IT software. Closed VPN often prevent accessing home network printers from corporate laptops and Full Public Cloud solutions are meant to be a solution to that problem. After the decision by Google to terminate Google Cloud Print service on 31 December 2020, most printer vendors released their own mobile cloud solution to fill the gap, while Hewlett-Packard implemented its own cloud print with their ePrint solution. Those solutions are often proprietary, only working on printers proposed by the vendor. Google has decided to let third-party developers develop Cloud Print solutions and to limit its scope to certifying the best Print Management offers compatible with its Chrome Enterprise Cloud ecosystem. == Providers of Corporate Cloud Printing solutions == While many print solutions claim to be "Cloud Printing", there are actually three categories: full Private Cloud, full Public Cloud, and Hybrid Cloud. Their differences are real and have an impact on the overall TCO as the more software there is on-site, the more hidden cost there are. In the Full Public Cloud category, independent SaaS vendors like Celiveo, ezeep , Printix , and Y Soft support a wide range of printer brands and models, allowing clients to buy the best printer without being locked on any brand. They are leveraging cloud computing technology to offer cloud-based print infrastructure and cloud-based printing software as a Service (SaaS). These solutions have integrations to cloud enabled printers or provide embedded printer agents. They feature allow users to print to any printer in any network, isolated network or not, even if that printer is otherwise not reachable from the user's computer. This also allows IT departments to move printers to VLAN for maximum security, like what they are doing with IP phones. Google Chrome Enterprise Cloud ecosystem has its own technical particularities and Google certifies Print Management solutions, ensuring they comply with Google technical requirement, yet letting each solution differentiate from others with specific features or security. Many of solutions for Chrome Enterprise are Hybrid, a few are Full Public Cloud. Industry experts believe that as these services become more popular, users will no longer consider printers as necessary assets but rather as devices that they can access on demand when the need to generate a printed page presents itself. == Caveats of Cloud Printing == == Security == Print jobs flow through Public Internet. It is therefore important to verify no Man-in-the-Middle attack can be performed. The only technical solution is to ensure each printer and PC uses a non-self-generated cryptographic token or certificate allowing TLS mutual authentication and specific data encryption. Self-generated printer certificates are unknown from the Cloud and prevent trusted authentication. Microsoft has implemented its Zero Trust Access security in its Universal Print service, it generates a unique certificate on printers compatible with its service. Other Cloud Printing SaaS providers have followed Microsoft on that High Security path. Print jobs data stored on the Cloud is sensitive as it contains user information as well as all information appearing on pages. Good practices require such data is encrypted at rest and in motion, using asymmetric PKI keys instead of fixed encryption keys. Some solutions require to open incoming traffic ports on the firewall to let Cloud services communicate with printers attached behind that firewall (most of the time for IPP/IPPS flows), some other solutions use a pull model where the communication is always initiated by the printer and no firewall port needs to be open. In terms of security the later is to be preferred.

    Read more →
  • Gödel machine

    Gödel machine

    A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. The machine was invented by Jürgen Schmidhuber (first proposed in 2003), but is named after Kurt Gödel who inspired the mathematical theories. The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transfer of knowledge between multiple related tasks, and may lead to design of more robust and general learning architectures. Though theoretically possible, no full implementation has been created. The Gödel machine is often compared with Marcus Hutter's AIXI, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds proof that another algorithm for its search code will be better. == Limitations == Traditional problems solved by a computer only require one input and provide some output. Computers of this sort had their initial algorithm hardwired. This does not take into account the dynamic natural environment, and thus was a goal for the Gödel machine to overcome. The Gödel machine has limitations of its own, however. According to Gödel's First Incompleteness Theorem, any formal system that encompasses arithmetic is either flawed or allows for statements that cannot be proved in the system. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove. == Variables of interest == There are three variables that are particularly useful in the run time of the Gödel machine. At some time t {\displaystyle t} , the variable time {\displaystyle {\text{time}}} will have the binary equivalent of t {\displaystyle t} . This is incremented steadily throughout the run time of the machine. Any input meant for the Gödel machine from the natural environment is stored in variable x {\displaystyle x} . It is likely the case that x {\displaystyle x} will hold different values for different values of variable time {\displaystyle {\text{time}}} . The outputs of the Gödel machine are stored in variable y {\displaystyle y} , where y ( t ) {\displaystyle y(t)} would be the output bit-string at some time t {\displaystyle t} . At any given time t {\displaystyle t} , where ( 1 ≤ t ≤ T ) {\displaystyle (1\leq t\leq T)} , the goal is to maximize future success or utility. A typical utility function follows the pattern u ( s , E n v ) : S × E → R {\displaystyle u(s,\mathrm {Env} ):S\times E\rightarrow \mathbb {R} } : u ( s , E n v ) = E μ [ ∑ τ = time T r ( τ ) ∣ s , E n v ] {\displaystyle u(s,\mathrm {Env} )=E_{\mu }{\Bigg [}\sum _{\tau ={\text{time}}}^{T}r(\tau )\mid s,\mathrm {Env} {\Bigg ]}} where r ( t ) {\displaystyle r(t)} is a real-valued reward input (encoded within s ( t ) {\displaystyle s(t)} ) at time t {\displaystyle t} , E μ [ ⋅ ∣ ⋅ ] {\displaystyle E_{\mu }[\cdot \mid \cdot ]} denotes the conditional expectation operator with respect to some possibly unknown distribution μ {\displaystyle \mu } from a set M {\displaystyle M} of possible distributions ( M {\displaystyle M} reflects whatever is known about the possibly probabilistic reactions of the environment), and the above-mentioned time = time ⁡ ( s ) {\displaystyle {\text{time}}=\operatorname {time} (s)} is a function of state s {\displaystyle s} which uniquely identifies the current cycle. Note that we take into account the possibility of extending the expected lifespan through appropriate actions. == Instructions used by proof techniques == The nature of the six proof-modifying instructions below makes it impossible to insert an incorrect theorem into proof, thus trivializing proof verification. === get-axiom(n) === Appends the n-th axiom as a theorem to the current theorem sequence. Below is the initial axiom scheme: Hardware Axioms formally specify how components of the machine could change from one cycle to the next. Reward Axioms define the computational cost of hardware instruction and the physical cost of output actions. Related Axioms also define the lifetime of the Gödel machine as scalar quantities representing all rewards/costs. Environment Axioms restrict the way new inputs x are produced from the environment, based on previous sequences of inputs y. Uncertainty Axioms/String Manipulation Axioms are standard axioms for arithmetic, calculus, probability theory, and string manipulation that allow for the construction of proofs related to future variable values within the Gödel machine. Initial State Axioms contain information about how to reconstruct parts or all of the initial state. Utility Axioms describe the overall goal in the form of utility function u. === apply-rule(k, m, n) === Takes in the index k of an inference rule (such as Modus tollens, Modus ponens), and attempts to apply it to the two previously proved theorems m and n. The resulting theorem is then added to the proof. === delete-theorem(m) === Deletes the theorem stored at index m in the current proof. This helps to mitigate storage constraints caused by redundant and unnecessary theorems. Deleted theorems can no longer be referenced by the above apply-rule function. === set-switchprog(m, n) === Replaces switchprog S pm:n, provided it is a non-empty substring of S p. === check() === Verifies whether the goal of the proof search has been reached. A target theorem states that given the current axiomatized utility function u (Item 1f), the utility of a switch from p to the current switchprog would be higher than the utility of continuing the execution of p (which would keep searching for alternative switchprogs). === state2theorem(m, n) === Takes in two arguments, m and n, and attempts to convert the contents of Sm:n into a theorem. == Example applications == === Time-limited NP-hard optimization === The initial input to the Gödel machine is the representation of a connected graph with a large number of nodes linked by edges of various lengths. Within given time T it should find a cyclic path connecting all nodes. The only real-valued reward will occur at time T. It equals 1 divided by the length of the best path found so far (0 if none was found). There are no other inputs. The by-product of maximizing expected reward is to find the shortest path findable within the limited time, given the initial bias. === Fast theorem proving === Prove or disprove as quickly as possible that all even integers > 2 are the sum of two primes (Goldbach’s conjecture). The reward is 1/t, where t is the time required to produce and verify the first such proof. === Maximizing expected reward with bounded resources === A cognitive robot that needs at least 1 liter of gasoline per hour interacts with a partially unknown environment, trying to find hidden, limited gasoline depots to occasionally refuel its tank. It is rewarded in proportion to its lifetime, and dies after at most 100 years or as soon as its tank is empty or it falls off a cliff, and so on. The probabilistic environmental reactions are initially unknown but assumed to be sampled from the axiomatized Speed Prior, according to which hard-to-compute environmental reactions are unlikely. This permits a computable strategy for making near-optimal predictions. One by-product of maximizing expected reward is to maximize expected lifetime.

    Read more →
  • Hardware for artificial intelligence

    Hardware for artificial intelligence

    Specialized computer hardware is often used to execute artificial intelligence (AI) programs faster, and with less energy, such as Lisp machines, neuromorphic engineering, event cameras, and physical neural networks. Since 2017, several consumer grade CPUs and SoCs have on-die NPUs. As of 2023, the market for AI hardware is dominated by GPUs. As of the 2020s, AI computation is dominated by graphics processing units (GPUs) and newer domain-specific accelerators such as Google's Tensor Processing Units (TPUs), AMD's Instinct MI300 series, and various on-device neural-processing units (NPUs) found in consumer hardware. == Scope == For the purposes of this article, AI hardware refers to computing components and systems specifically designed or optimized to accelerate artificial-intelligence workloads such as machine-learning training or inference. This includes general-purpose accelerators used for AI (for example, GPUs) and domain-specific accelerators (for example, TPUs, NPUs, and other AI ASICs). Event-based cameras are sometimes discussed in the context of neuromorphic computing, but they are input sensors rather than AI compute devices. Conversely, components such as memristors are basic circuit elements rather than specialized AI hardware when considered alone. == Lisp machines == Lisp machines were developed in the late 1970s and early 1980s to make artificial intelligence programs written in the programming language Lisp run faster. == Dataflow architecture == Dataflow architecture processors used for AI serve various purposes with varied implementations like the polymorphic dataflow Convolution Engine by Kinara (formerly Deep Vision), structure-driven dataflow by Hailo, and dataflow scheduling by Cerebras. == Component hardware == === AI accelerators === Since the 2010s, advances in computer hardware have led to more efficient methods for training deep neural networks that contain many layers of non-linear hidden units and a very large output layer. By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced central processing units (CPUs) as the dominant means to train large-scale commercial cloud AI. OpenAI estimated the hardware compute used in the largest deep learning projects from Alex Net (2012) to Alpha Zero (2017), and found a 300,000-fold increase in the amount of compute needed, with a doubling-time trend of 3.4 months. === General-purpose GPUs for AI === Since the 2010s, graphics processing units (GPUs) have been widely used to train and deploy deep learning models because of their highly parallel architecture and high memory bandwidth. Modern data-center GPUs include dedicated tensor or matrix-math units that accelerate neural-network operations. In 2022, NVIDIA introduced the Hopper-generation H100 GPU, adding FP8 precision support and faster interconnects for large-scale model training. AMD and other vendors have also developed GPUs and accelerators aimed at AI and high-performance computing workloads. === Domain-specific accelerators (ASICs / NPUs) === Beyond general-purpose GPUs, several companies have developed application-specific integrated circuits (ASICs) and neural processing units (NPUs) tailored for AI workloads. Google introduced the Tensor Processing Unit (TPU) in 2016 for deep-learning inference, with later generations supporting large-scale training through dense systolic-array designs and optical interconnects. Other vendors have released similar devices—such as Apple's Neural Engine and various on-device NPUs—that emphasize energy-efficient inference in mobile or edge computing environments. === Memory and interconnects === AI accelerators rely on fast memory and inter-chip links to manage the large data volumes of training and inference. High-bandwidth memory (HBM) stacks, standardized as HBM3 in 2022, provide terabytes-per-second throughput on modern GPUs and ASICs. These accelerators are often connected through dedicated fabrics such as NVIDIA's NVLink and NVSwitch or optical interconnects used in TPU systems to scale performance across thousands of chips.

    Read more →
  • TinyML

    TinyML

    TinyML (short for tiny machine learning) is an area of machine learning that focuses on deploying and running models on low-power, resource-constrained embedded systems such as microcontrollers and edge devices. TinyML supports on-device inference with low latency and minimal reliance on cloud connectivity, which makes it suitable for applications in the Internet of Things (IoT), wearable devices, and real-time systems. == History == The idea of running machine learning models on embedded systems has gained traction in the late 2010s, as model compression, quantization, and efficient neural network architectures progressed. The term TinyML was popularized in 2019 with the publication of the book TinyML by Pete Warden and Daniel Situnayake and the creation of the TinyML Foundation.

    Read more →
  • Imix video cube

    Imix video cube

    The Imix (also known as ImMix) Video Cube is one of the first computer non-linear editing systems that was a full broadcast quality online video finishing machine. After its release in 1994, Imix released a more advanced version, the Imix Turbo Cube, which boasted 4 channels of real time layered visual effects. It was a hardware computer system controlled by an Apple Macintosh computer.

    Read more →
  • Artificial psychology

    Artificial psychology

    Artificial psychology (AP) has had multiple meanings dating back to 19th century, with recent usage related to artificial intelligence (AI).Artificial psychology is a theoretical field related to artificial intelligence, cognitive science, and psychology, which explores how advanced AI systems may develop human-like decision-making processes. In 1999, Zhiliang Wang and Lun Xie presented a theory of artificial psychology based on artificial intelligence. They analyze human psychology using information science research methods and artificial intelligence research to probe deeper into the human mind. == Main Theory == Dan Curtis (b. 1963) proposed AP is a theoretical discipline. The theory considers the situation when an artificial intelligence approaches the level of complexity where the intelligence meets two conditions: Condition I A: Makes all of its decisions autonomously B: Is capable of making decisions based on information that is New Abstract Incomplete C: The artificial intelligence is capable of reprogramming itself based on the new data, allowing it to evolve. D: And is capable of resolving its own programming conflicts, even in the presence of incomplete data. This means that the intelligence autonomously makes value-based decisions, referring to values that the intelligence has created for itself. Condition II All four criteria are met in situations that are not part of the original operating program When both conditions are met, then, according to this theory, the possibility exists that the intelligence will reach irrational conclusions based on real or created information. At this point, the criteria are met for intervention which will not necessarily be resolved by simple re-coding of processes due to extraordinarily complex nature of the codebase itself; but rather a discussion with the intelligence in a format which more closely resembles classical (human) psychology. If the intelligence cannot be reprogrammed by directly inputting new code, but requires the intelligence to reprogram itself through a process of analysis and decision based on information provided by a human, in order for it to overcome behavior which is inconsistent with the machines purpose or ability to function normally, then artificial psychology is by definition, what is required. The level of complexity that is required before these thresholds are met is currently a subject of extensive debate. The theory of artificial psychology does not address the specifics of what those levels may be, but only that the level is sufficiently complex that the intelligence cannot simply be recoded by a software developer, and therefore dysfunctionality must be addressed through the same processes that humans must go through to address their own dysfunctionalities. Along the same lines, artificial psychology does not address the question of whether or not the intelligence is conscious. As of 2022, the level of artificial intelligence does not approach any threshold where any of the theories or principles of artificial psychology can even be tested, and therefore, artificial psychology remains a largely theoretical discipline. Even at a theoretical level, artificial psychology remains an advanced stage of artificial intelligence.

    Read more →
  • Hidden layer

    Hidden layer

    In artificial neural networks, a hidden layer is a layer of artificial neurons that is neither an input layer nor an output layer. The simplest examples appear in multilayer perceptrons (MLP), as illustrated in the diagram. An MLP without any hidden layer is essentially just a linear model. With hidden layers and activation functions, however, nonlinearity is introduced into the model. In typical machine learning practice, the weights and biases are initialized, then iteratively updated during training via backpropagation.

    Read more →
  • Computational humor

    Computational humor

    Computational humor is a branch of computational linguistics and artificial intelligence which uses computers in humor research. It is a relatively new area, with the first dedicated conference organized in 1996. The first "computer model of a sense of humor" was suggested by Suslov as early as 1992. Investigation of the general scheme of the information processing show a possibility of a specific malfunction, conditioned by the necessity of a quick deletion from consciousness of a false version. This specific malfunction can be identified with a humorous effect on the psychological grounds; however, an essentially new ingredient, a role of timing, is added to a well known role of ambiguity. In biological systems, a sense of humour inevitably develops in the course of evolution, because its biological function consists in quickening the transmission of processed information into consciousness and in a more effective use of brain resources. A realization of this algorithm in neural networks explains naturally the mechanism of laughter: deletion of a false version corresponds to zeroing of some part of the neural network and excessive energy of neurons is thrown out to the motor cortex, arousing muscular contractions. Unfortunately, a practical realization of this algorithm needs extensive databases, whose creation in the automatic regime was suggested only recently . As a result, this magistral direction was not developed properly and subsequent investigations (see below) accepted somewhat specialized colouring. == Joke generators == === Pun generation === An approach to analysis of humor is classification of jokes. A further step is an attempt to generate jokes basing on the rules that underlie classification. Simple prototypes for computer pun generation were reported in the early 1990s, based on a natural language generator program, VINCI. Graeme Ritchie and Kim Binsted in their 1994 research paper described a computer program, JAPE, designed to generate question-answer-type puns from a general, i.e., non-humorous, lexicon. (The program name is an acronym for "Joke Analysis and Production Engine".) Some examples produced by JAPE are: Q: What is the difference between leaves and a car? A: One you brush and rake, the other you rush and brake. Q: What do you call a strange market? A: A bizarre bazaar. Since then the approach has been improved, and the latest report, dated 2007, describes the STANDUP joke generator, implemented in the Java programming language. The STANDUP generator was tested on children within the framework of analyzing its usability for language skills development for children with communication disabilities, e.g., because of cerebral palsy. (The project name is an acronym for "System To Augment Non-speakers' Dialog Using Puns" and an allusion to standup comedy.) Children responded to this "language playground" with enthusiasm, and showed marked improvement on certain types of language tests. The two young people, who used the system over a ten-week period, regaled their peers, staff, family and neighbors with jokes such as: "What do you call a spicy missile? A hot shot!" Their joy and enthusiasm at entertaining others was inspirational. === Other === Stock and Strapparava described a program to generate funny acronyms. == Joke recognition == A statistical machine learning algorithm to detect whether a sentence contained a "That's what she said" double entendre was developed by Kiddon and Brun (2011). There is an open-source Python implementation of Kiddon & Brun's TWSS system. A program to recognize knock-knock jokes was reported by Taylor and Mazlack. This kind of research is important in analysis of human–computer interaction. An application of machine learning techniques for the distinguishing of joke texts from non-jokes was described by Mihalcea and Strapparava (2006). Takizawa et al. (1996) reported on a heuristic program for detecting puns in the Japanese language. == Applications == A possible application for assistance in language acquisition is described in the section "Pun generation". Another envisioned use of joke generators is in cases of a steady supply of jokes where quantity is more important than quality. Another obvious, yet remote, direction is automated joke appreciation. It is known that humans interact with computers in ways similar to interacting with other humans that may be described in terms of personality, politeness, flattery, and in-group favoritism. Therefore, the role of humor in human–computer interaction is being investigated. In particular, humor generation in user interface to ease communications with computers was suggested. Craig McDonough implemented the Mnemonic Sentence Generator, which converts passwords into humorous sentences. Based on the incongruity theory of humor, it is suggested that the resulting meaningless but funny sentences are easier to remember. For example, the password AjQA3Jtv is converted into "Arafat joined Quayle's Ant, while TARAR Jeopardized thurmond's vase," an example chosen by combining politicians names with verbs and common nouns. == Related research == John Allen Paulos is known for his interest in mathematical foundations of humor. His book Mathematics and Humor: A Study of the Logic of Humor demonstrates structures common to humor and formal sciences (mathematics, linguistics) and develops a mathematical model of jokes based on catastrophe theory. Conversational systems which have been designed to take part in Turing test competitions generally have the ability to learn humorous anecdotes and jokes. Because many people regard humor as something particular to humans, its appearance in conversation can be quite useful in convincing a human interrogator that a hidden entity, which could be a machine or a human, is in fact a human.

    Read more →
  • ELMo

    ELMo

    ELMo (embeddings from language model) is a word embedding method for representing a sequence of words as a corresponding sequence of vectors. It was created by researchers at the Allen Institute for Artificial Intelligence, and University of Washington and first released in February 2018. It is a bidirectional LSTM which takes character-level as inputs and produces word-level embeddings, trained on a corpus of about 30 million sentences and 1 billion words. The architecture of ELMo accomplishes a contextual understanding of tokens. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. ELMo was historically important as a pioneer of self-supervised generative pretraining followed by fine-tuning, where a large model is trained to reproduce a large corpus, then the large model is augmented with additional task-specific weights and fine-tuned on supervised task data. It was an instrumental step in the evolution towards transformer-based language modelling. == Architecture == ELMo is a multilayered bidirectional LSTM on top of a token embedding layer. The output of all LSTMs concatenated together consists of the token embedding. The input text sequence is first mapped by an embedding layer into a sequence of vectors. Then two parts are run in parallel over it. The forward part is a 2-layered LSTM with 4096 units and 512 dimension projections, and a residual connection from the first to second layer. The backward part has the same architecture, but processes the sequence back-to-front. The outputs from all 5 components (embedding layer, two forward LSTM layers, and two backward LSTM layers) are concatenated and multiplied by a linear matrix ("projection matrix") to produce a 512-dimensional representation per input token. ELMo was pretrained on a text corpus of 1 billion words. The forward part is trained by repeatedly predicting the next token, and the backward part is trained by repeatedly predicting the previous token. After the ELMo model is pretrained, its parameters are frozen, except for the projection matrix, which can be fine-tuned to minimize loss on specific language tasks. This is an early example of the pretraining-fine-tune paradigm. The original paper demonstrated this by improving state of the art on six benchmark NLP tasks. === Contextual word representation === The architecture of ELMo accomplishes a contextual understanding of tokens. For example, the first forward LSTM of ELMo would process each input token in the context of all previous tokens, and the first backward LSTM would process each token in the context of all subsequent tokens. The second forward LSTM would then incorporate those to further contextualize each token. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. For example, consider the sentenceShe went to the bank to withdraw money.In order to represent the token "bank", the model must resolve its polysemy in context. The first forward LSTM would process "bank" in the context of "She went to the", which would allow it to represent the word to be a location that the subject is going towards. The first backward LSTM would process "bank" in the context of "to withdraw money", which would allow it to disambiguate the word as referring to a financial institution. The second forward LSTM can then process "bank" using the representation vector provided by the first backward LSTM, thus allowing it to represent it to be a financial institution that the subject is going towards. == Historical context == ELMo is one link in a historical evolution of language modelling. Consider a simple problem of document classification, where we want to assign a label (e.g., "spam", "not spam", "politics", "sports") to a given piece of text. The simplest approach is the "bag of words" approach, where each word in the document is treated independently, and its frequency is used as a feature for classification. This was computationally cheap but ignored the order of words and their context within the sentence. GloVe and Word2Vec built upon this by learning fixed vector representations (embeddings) for words based on their co-occurrence patterns in large text corpora. Like BERT (but unlike "bag of words" such as Word2Vec and GloVe), ELMo word embeddings are context-sensitive, producing different representations for words that share the same spelling. It was trained on a corpus of about 30 million sentences and 1 billion words. Previously, bidirectional LSTM was used for contextualized word representation. ELMo applied the idea to a large scale, achieving state of the art performance. After the 2017 publication of Transformer architecture, the architecture of ELMo was changed from a multilayered bidirectional LSTM to a Transformer encoder, giving rise to BERT. BERT has a similar pretrain-fine-tune workflow, but uses a Transformer with implications for more parallelizable training.

    Read more →
  • AZFinText

    AZFinText

    Arizona Financial Text System (AZFinText) is a textual-based quantitative financial prediction system written by Robert P. Schumaker of University of Texas at Tyler and Hsinchun Chen of the University of Arizona. == System == This system differs from other systems in that it uses financial text as one of its key means of predicting stock price movement. This reduces the information lag-time problem evident in many similar systems where new information must be transcribed (e.g., such as losing a costly court battle or having a product recall), before the quant can react appropriately. AZFinText overcomes these limitations by utilizing the terms used in financial news articles to predict future stock prices twenty minutes after the news article has been released. It is believed that certain article terms can move stocks more than others. Terms such as factory exploded or workers strike will have a depressing effect on stock prices whereas terms such as earnings rose will tend to increase stock prices. The AZFinText system analyzes financial news to identify the patterns in how investors react to such specific information. It uses methods like sentiment analysis and term weighting to examine the text of news articles. This system is designed to find price differences that occur when the market responds to news stories. This approach provides an alternative and easier method for predicting stock market movements. == Overview of research == The foundation of AZFinText can be found in the ACM TOIS article. Within this paper, the authors tested several different prediction models and linguistic textual representations. From this work, it was found that using the article terms and the price of the stock at the time the article was released was the most effective model and using proper nouns was the most effective textual representation technique. Combining the two, AZFinText netted a 2.84% trading return over the five-week study period. AZFinText was then extended to study what combination of peer organizations help to best train the system. Using the premise that IBM has more in common with Microsoft than GM, AZFinText studied the effect of varying peer-based training sets. To do this, AZFinText trained on the various levels of GICS and evaluated the results. It was found that sector-based training was most effective, netting an 8.50% trading return, outperforming Jim Cramer, Jim Jubak and DayTraders.com during the study period. AZFinText was also compared against the top 10 quantitative systems and outperformed 6 of them. A third study investigated the role of portfolio building in a textual financial prediction system. From this study, Momentum and Contrarian stock portfolios were created and tested. Using the premise that past winning stocks will continue to win and past losing stocks will continue to lose, AZFinText netted a 20.79% return during the study period. It was also noted that traders were generally overreacting to news events, creating the opportunity of abnormal returns. A fourth study looked into using author sentiment as an added predictive variable. Using the premise that an author can unwittingly influence market trades simply by the terms they use, AZFinText was tested using tone and polarity features. It was found that Contrarian activity was occurring within the market, where articles of a positive tone would decrease in price and articles of a negative tone would increase in price. A further study investigated what article verbs have the most influence on stock price movement. From this work, it was found that planted, announcing, front, smaller and crude had the highest positive impact on stock price. == Notable publicity == AZFinText has been the topic of discussion by numerous media outlets. Some of the more notable ones include The Wall Street Journal, MIT's Technology Review, Dow Jones Newswire, WBIR in Knoxville, TN, Slashdot and other media outlets.

    Read more →
  • Competition in artificial intelligence

    Competition in artificial intelligence

    Competition in artificial intelligence refers to the rivalry among companies, research institutions, and governments to develop and deploy the most capable artificial intelligence (AI) systems. The competition spans multiple domains, including large language models (LLMs), autonomous vehicles, robotics, computer vision systems, natural language processing (NLP), and AI-optimized hardware. == Background == Competition in AI is driven by potential economic, strategic, and scientific advantages. Breakthroughs in AI can enhance productivity, enable new products and services, and provide geopolitical leverage. The field has experienced rapid progress since the mid-2010s, particularly in machine learning and artificial neural networks, leading to intense rivalry among leading actors. == Corporate competition == Major technology companies are among the most visible competitors in AI. In the United States, firms such as OpenAI, Google DeepMind, Meta Platforms, Microsoft, Anthropic, and Nvidia compete in building advanced LLMs, generative AI platforms, and AI-optimized graphics processing units (GPUs). In China, companies such as Baidu, Alibaba Group, Tencent, and startups such DeepSeek have become leaders in AI deployment, often with state backing. The "[war for talent]" in AI research has become a defining feature of corporate competition. Leading firms often recruit top AI researchers from rivals, sometimes offering multi-million-dollar compensation packages. == National competition == Governments see leadership in AI as a strategic priority. The United States has funded AI research for military, economic, and societal applications, while China has set a target to lead the world in AI by 2030 through its "New Generation Artificial Intelligence Development Plan". Other nations, including the UK, India, Israel, Russia, South Korea, and members of the European Union, have launched national AI strategies. In February 2026 Anthropic said Chinese companies - DeepSeek, Moonshot AI, and MiniMax - were conducting "distillation attacks" in an attempt to copy their model's capabilities, and warned that business wars were closely tied to geopolitical ones: "foreign labs that illicitly distill American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems." == Sectors of competition == === Large language models and chatbots competition === Competition to produce the most capable generative text models, with benchmarks such as MMLU and ARC used to evaluate performance has been on scale since the emergence of AI. These systems leverage deep learning, especially transformer architectures, to understand and generate human-like language. Companies and research groups globally compete to develop chatbots that are more capable, reliable, and context-aware. Among the most well-known chatbots is ChatGPT, developed by OpenAI. Since its public release in 2022, ChatGPT has rapidly gained widespread attention for its ability to engage in coherent and versatile conversations, assist with creative writing, and solve complex problems. In response, technology firms introduced competing chatbots aiming to challenge or surpass ChatGPT's capabilities. Notably, DeepSeek, a Chinese AI company, launched an advanced chatbot integrated with their R1 language model, emphasizing strong natural language understanding and multilingual support. Similarly, Grok, developed by xAI (company), integrates conversational AI into vehicles and digital assistants, combining natural language processing with real-time data for personalized user interaction. These chatbots not only compete in language tasks but also demonstrate strategic reasoning capabilities by playing complex games such as chess and Go. This form of competition is reminiscent of historic AI milestones set by programs such as Deep Blue and AlphaGo. The OpenAI’s ChatGPT has been tested in playing chess at various levels, while DeepSeek’s chatbot showcased its prowess in online chess tournaments in early 2024, winning several matches against human and AI opponents. Grok, leveraging Tesla's vast data infrastructure, has demonstrated real-time strategic decision-making in simulation environments that include chess-like games. The competition pushes rapid innovation, with firms racing to improve chatbot conversational depth, reduce biases, increase factual accuracy, and integrate multimodal inputs like images and videos. At the same time, the competition raises questions about AI safety, ethical use, and the societal impacts of increasingly human-like chatbots. === Autonomous vehicles === Companies such as Waymo, Tesla, and Baidu are racing to deploy safe and reliable self-driving car technology. === AI chips === Rivalry between Nvidia, AMD, Intel, and Huawei in designing processors optimized for AI workloads. === Military applications === Development of AI-enabled drones, surveillance systems, and decision-support tools, with associated ethical debates. == Events == In 2023, OpenAI released GPT-4, prompting competitors such as Google DeepMind to accelerate the release of their own models, including Gemini. In 2024, Chinese AI company DeepSeek launched the R1 model, leading OpenAI to release an open-source system, GPT-OSS, as a strategic countermeasure. In 2022, Tesla and Waymo both expanded autonomous taxi services in U.S. cities, competing for regulatory approval and public trust. The U.S. Department of Defense's Project Maven and China's AI-enabled surveillance programs have been cited as examples of military AI rivalry. In 2025, Microsoft hired several senior engineers from Google DeepMind, highlighting the ongoing "talent poaching" competition in the AI sector. == Risks and concerns == Critics warn that unrestrained competition in AI can undermine safety, ethics, and governance. Concerns include the proliferation of biased or unsafe models, escalation in autonomous weapons, and reduced cooperation on safety standards.

    Read more →