AI Generator Of Trump

AI Generator Of Trump — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Avid Free DV

    Avid Free DV

    Avid Free DV is a non-linear editing video editing software application developed by Avid Technology. Avid introduced Free DV in January 2003 at the 2003 MacWorld Expo; the company discontinued it in September 2007. Free DV was intended to give editors a sample of the Avid interface to use in deciding whether or not to purchase Avid software, so when compared with other Avid products its features were relatively minimal. When it was available it was not limited by time or watermarking, so it could be used as a non-linear editor for as long as desired. == Comparisons == When compared with other consumer-end non-linear editors such as iMovie and Windows Movie Maker, it sported more powerful video processing tools, but lacked the ease-of-use and shallow learning curve emphasized in similar programs because it had the full interface of the professional Avid system. However, Avid did offer a number of flash-based tutorials to help new users learn how to use the program for capturing, editing, clipping, processing, and outputting audio/video, among other things. == Limitations == The limitations of Avid Free DV included that it allowed only two video and audio tracks, had fewer editing tools than other Avid products, had few import and export formats, and allowed capture and output of standard-definition DV only, via FireWire. Avid Free DV projects and media were not compatible with other Avid systems. As the name implied, Avid Free DV was available as a free download, although users were required to complete a short survey on the Avid website before they were given a download link and key. In addition to using Free DV to evaluate Avid prior to purchase, it could also act as a stepping stone for people wishing to learn to use Avid's other editing products, such as Xpress Pro, Media Composer and Symphony. While additional skills and techniques are necessary to use these professionally geared systems, the basic operation remains the same. == Operating systems == Avid Free DV was available for Windows XP and Mac OS X. The officially supported Mac OS X versions were Panther versions up to 10.3.5, and Tiger versions up to 10.4.3 only. == Supported formats == Avid Free DV supported QuickTime (MOV) and DV AVIs. == Reception == John P. Mello Jr. of The Boston Globe gave Free DV a negative review, finding the user interface obfuscatory and the process of ingesting video error-prone. He summarized: "Professional video editors who use an Avid competitor may jump at the chance to take a free look at how Avid does things. But for the merely curious, this software is a nightmare". Video Systems's Steve Mullen opined that its lack of interoperability with Avid's professional editing software contracted Avid's stated goal to entice budding video editors into buying into the company's software ecosystem.

    Read more →
  • Artificial intelligence in spirituality

    Artificial intelligence in spirituality

    Some users of artificial intelligence (AI) technologies, especially chatbots, may develop beliefs that AI has or can attain supernatural or spiritual powers. AI models such as ChatGPT are turned to for fortune telling, mysticism and remote viewing. Recent and sudden advances in large language models have led to folk myths about their origin or capabilities, as well as their deification or worship by some users. Tucker Carlson has made similar claims, including directly to Sam Altman. Pope Leo XIV advised priests against using LLM models when it came to the creation of sermons.

    Read more →
  • Rademacher complexity

    Rademacher complexity

    In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of sets with respect to a probability distribution. The concept can also be extended to real valued functions. == Definitions == === Rademacher complexity of a set === Given a set A ⊆ R m {\displaystyle A\subseteq \mathbb {R} ^{m}} , the Rademacher complexity of A is defined as follows: Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]} where σ 1 , σ 2 , … , σ m {\displaystyle \sigma _{1},\sigma _{2},\dots ,\sigma _{m}} are independent random variables drawn from the Rademacher distribution i.e. Pr ( σ i = + 1 ) = Pr ( σ i = − 1 ) = 1 / 2 {\displaystyle \Pr(\sigma _{i}=+1)=\Pr(\sigma _{i}=-1)=1/2} for i ∈ { 1 , 2 , … , m } {\displaystyle i\in \{1,2,\dots ,m\}} , and a = ( a 1 , … , a m ) ∈ A {\displaystyle a=(a_{1},\ldots ,a_{m})\in A} . Some authors take the absolute value of the sum before taking the supremum, but if A {\displaystyle A} is symmetric this makes no difference. === Rademacher complexity of a function class === Let S = { z 1 , z 2 , … , z m } ⊆ Z {\displaystyle S=\{z_{1},z_{2},\dots ,z_{m}\}\subseteq Z} be a sample of points and consider a function class F {\displaystyle {\mathcal {F}}} of real-valued functions over Z {\displaystyle Z} . Then, the empirical Rademacher complexity of F {\displaystyle {\mathcal {F}}} given S {\displaystyle S} is defined as: Rad S ⁡ ( F ) = 1 m E σ [ sup f ∈ F | ∑ i = 1 m σ i f ( z i ) | ] {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{f\in {\mathcal {F}}}\left|\sum _{i=1}^{m}\sigma _{i}f(z_{i})\right|\right]} This can also be written using the previous definition: Rad S ⁡ ( F ) = Rad ⁡ ( F ∘ S ) {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})=\operatorname {Rad} ({\mathcal {F}}\circ S)} where F ∘ S {\displaystyle {\mathcal {F}}\circ S} denotes function composition, i.e.: F ∘ S := { ( f ( z 1 ) , … , f ( z m ) ) ∣ f ∈ F } {\displaystyle {\mathcal {F}}\circ S:=\{(f(z_{1}),\ldots ,f(z_{m}))\mid f\in {\mathcal {F}}\}} The worst case empirical Rademacher complexity is Rad ¯ m ( F ) = sup S = { z 1 , … , z m } Rad S ⁡ ( F ) {\displaystyle {\overline {\operatorname {Rad} }}_{m}({\mathcal {F}})=\sup _{S=\{z_{1},\dots ,z_{m}\}}\operatorname {Rad} _{S}({\mathcal {F}})} Let P {\displaystyle P} be a probability distribution over Z {\displaystyle Z} . The Rademacher complexity of the function class F {\displaystyle {\mathcal {F}}} with respect to P {\displaystyle P} for sample size m {\displaystyle m} is: Rad P , m ⁡ ( F ) := E S ∼ P m [ Rad S ⁡ ( F ) ] {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}}):=\mathbb {E} _{S\sim P^{m}}\left[\operatorname {Rad} _{S}({\mathcal {F}})\right]} where the above expectation is taken over an identically independently distributed (i.i.d.) sample S = ( z 1 , z 2 , … , z m ) {\displaystyle S=(z_{1},z_{2},\dots ,z_{m})} generated according to P {\displaystyle P} . == Intuition == The Rademacher complexity is typically applied on a function class of models that are used for classification, with the goal of measuring their ability to classify points drawn from a probability space under arbitrary labellings. When the function class is rich enough, it contains functions that can appropriately adapt for each arrangement of labels, simulated by the random draw of σ i {\displaystyle \sigma _{i}} under the expectation, so that this quantity in the sum is maximized. The Rademacher complexity of a set A {\displaystyle A} can be rewritten as Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] = 1 m 2 m ∑ σ ∈ { − 1 / m , + 1 / m } m [ sup a ∈ A ⟨ σ , a ⟩ ] . {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]={\frac {1}{{\sqrt {m}}2^{m}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}}\left[\sup _{a\in A}\langle \sigma ,a\rangle \right].} Each term in the summation is the farthest distance of the set A {\displaystyle A} from the origin, along a unit-length direction σ {\displaystyle \sigma } . The directions are along the vertices of a hypercube. Thus, we can also write it as Rad ⁡ ( A ) = 1 2 m 1 2 m − 1 ∑ σ ∈ { − 1 / m , + 1 / m } m / { − 1 , + 1 } [ sup a ∈ A ⟨ σ , a ⟩ − inf a ∈ A ⟨ σ , a ⟩ ] {\displaystyle \operatorname {Rad} (A)={\frac {1}{2{\sqrt {m}}}}{\frac {1}{2^{m-1}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}}\left[\sup _{a\in A}\langle \sigma ,a\rangle -\inf _{a\in A}\langle \sigma ,a\rangle \right]} Here, the set { − 1 / m , + 1 / m } m / { − 1 , + 1 } {\displaystyle \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}} denotes half of the vertices of a hypercube, selected so that each diagonal has exactly one vertex selected. In words, this states that 2 m Rad ⁡ ( A ) {\displaystyle 2{\sqrt {m}}\operatorname {Rad} (A)} is precisely the average width of the set A {\displaystyle A} along all diagonal directions of a hypercube. == Examples == A singleton set has 0 width in any direction, so it has Rademacher complexity 0. The set A = { ( 1 , 1 ) , ( 1 , 2 ) } ⊆ R 2 {\displaystyle A=\{(1,1),(1,2)\}\subseteq \mathbb {R} ^{2}} has average width 1 / 2 {\displaystyle 1/{\sqrt {2}}} along the two diagonal directions of the square, so it has Rademacher complexity 1 / 4 {\displaystyle 1/4} . The unit cube [ 0 , 1 ] m {\displaystyle [0,1]^{m}} has constant width m {\displaystyle {\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / 2 {\displaystyle 1/2} . Similarly, the unit cross-polytope { x ∈ R m : ‖ x ‖ 1 ≤ 1 } {\displaystyle \{x\in \mathbb {R} ^{m}:\|x\|_{1}\leq 1\}} has constant width 2 / m {\displaystyle 2/{\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / m {\displaystyle 1/m} . == Using the Rademacher complexity == The Rademacher complexity can be used to derive data-dependent upper-bounds on the learnability of function classes. Intuitively, a function-class with smaller Rademacher complexity is easier to learn. === Bounding the representativeness === In machine learning, it is desired to have a training set that represents the true distribution of some sample data S {\displaystyle S} . This can be quantified using the notion of representativeness. Denote by P {\displaystyle P} the probability distribution from which the samples are drawn. Denote by H {\displaystyle H} the set of hypotheses (potential classifiers) and denote by F {\displaystyle {\mathcal {F}}} the corresponding set of error functions, i.e., for every hypothesis h ∈ H {\displaystyle h\in H} , there is a function f h ∈ F {\displaystyle f_{h}\in F} , that maps each training sample (features,label) to the error of the classifier h {\displaystyle h} (note in this case hypothesis and classifier are used interchangeably). For example, in the case that h {\displaystyle h} represents a binary classifier, the error function is a 0–1 loss function, i.e. the error function f h {\displaystyle f_{h}} returns 0 if h {\displaystyle h} correctly classifies a sample and 1 else. We omit the index and write f {\displaystyle f} instead of f h {\displaystyle f_{h}} when the underlying hypothesis is irrelevant. Define: L P ( f ) := E z ∼ P [ f ( z ) ] {\displaystyle L_{P}(f):=\mathbb {E} _{z\sim P}[f(z)]} – the expected error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the real distribution P {\displaystyle P} ; L S ( f ) := 1 m ∑ i = 1 m f ( z i ) {\displaystyle L_{S}(f):={1 \over m}\sum _{i=1}^{m}f(z_{i})} – the estimated error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the sample S {\displaystyle S} . The representativeness of the sample S {\displaystyle S} , with respect to P {\displaystyle P} and F {\displaystyle {\mathcal {F}}} , is defined as: Rep P ⁡ ( F , S ) := sup f ∈ F ( L P ( f ) − L S ( f ) ) {\displaystyle \operatorname {Rep} _{P}({\mathcal {F}},S):=\sup _{f\in F}(L_{P}(f)-L_{S}(f))} Smaller representativeness is better, since it provides a way to avoid overfitting: it means that the true error of a classifier is not much higher than its estimated error, and so selecting a classifier that has low estimated error will ensure that the true error is also low. Note however that the concept of representativeness is relative and hence can not be compared across distinct samples. The expected representativeness of a sample can be bounded above by the Rademacher complexity of the function class: If F {\displaystyle {\mathcal {F}}} is a set of functions with range within [ 0 , 1 ] {\displaystyle [0,1]} , then Rad P , m ⁡ ( F ) − ln ⁡ 2 2 m ≤ E S ∼ P m [ Rep P ⁡ ( F , S ) ] ≤ 2 Rad P , m ⁡ ( F ) {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}})-{\sqrt {\frac {\ln 2}{2m}}}\leq \mathbb {E} _{S\sim P^{m}}[\operatorname {Rep} _{P}({\

    Read more →
  • Token maxxing

    Token maxxing

    Token Maxxing or Token Maxing is a metric used in an attempt to track productivity in the workplace especially for those using Artificial Intelligence (AI) based services. AI services charge for each token which represent units of effort expended by an AI service to solve a problem. Some believe that token consumption equates to productivity and thus can be used as a metric to monitor an employee's work. Supporters believe that higher token usage indicates higher productivity and higher utilization of powerful AI services. This also suggests that those not consuming enough tokens may be less productive and underutilizing powerful AI services. This belief might lead to an environment that incentivizes higher token usage to predict increased productivity. Critics of token maxxing as a metric claim that prudent workers will maximize any metric that management wants increased to gain a workplace advantage. For example: Engineers in the tech industries pressed to consume as many tokens as possible might run several AI agents in tandem, enter longer input prompts, or automate their tasks to maximize their token consumption. To management, this higher token usage may indicate potential productivity, but in reality may cause additional token costs, worker burnout, or actually create more bloated code of lower quality. Another claim is AI service companies potentially benefit from such an emphasis on token consumption and actively encourage the trend. Some developers have publicly advocated the practice. Developer Sigrid Jin, who said he used 50 billion tokens in a single year, has argued that maximizing token consumption is the best way to understand the value of AI, advising others to spend as much on AI usage as they pay in rent to obtain a return on investment. == See Also == Goodhart's law Perverse incentive Jevons Paradox

    Read more →
  • Fluency Voice Technology

    Fluency Voice Technology

    Fluency Voice Technology was a company that developed and sold packaged speech recognition solutions for use in call centers. Fluency's Speech Recognition solutions are used by call centers worldwide to improve customer service and significantly reduce costs and are available on-premises and hosted. == History == 1998 – Fluency was created as a spin-off from the Voice Research & Development team of a company called netdecisions. This R&D operation was established in Cambridge UK. The focus of the development was speech recognition systems based on the VXML standard. 2001 – Fluency became a separate entity in May 2001. Fluency began the creation of a software development platform specifically aimed at automating call center activities. This platform became Fluency's VoiceRunner. 2002 to 2004 – Fluency establishes accomplishes many successful deployments in customer sites such as National Express and Barclaycard. 2003 – Fluency expanded into the USA. Fluency also acquires Vocalis of Cambridge, UK in August 2003. 2004 – Fluency receives £6 million investment from leading European Venture Capitalists and establishes a global OEM partnership with Avaya, and the acquisition of SRC Telecom. 2008 – Fluency is acquired by Syntellect Ltd == Customers == Call Centers around the world use Fluency to improve service and reduce costs. They include Travelodge, Standard Life Bank, Sutton and East Surrey Water, Pizza Hut, CWT, Barclays, Powergen, First Choice, OutRight, J D Williams, Capital Blue Cross, Chelsea Building Society, EDF, bss, TV Licensing and Capita Software Services.

    Read more →
  • EM algorithm and GMM model

    EM algorithm and GMM model

    In statistics, EM (expectation maximization) algorithm handles latent variables, while GMM is the Gaussian mixture model. == Background == In the picture below, are shown the red blood cell hemoglobin concentration and the red blood cell volume data of two groups of people, the Anemia group and the control group (i.e. the group of people without Anemia). As expected, people with Anemia have lower red blood cell volume and lower red blood cell hemoglobin concentration than those without Anemia. x {\displaystyle x} is a random vector such as x := ( red blood cell volume , red blood cell hemoglobin concentration ) {\displaystyle x:={\big (}{\text{red blood cell volume}},{\text{red blood cell hemoglobin concentration}}{\big )}} , and from medical studies it is known that x {\displaystyle x} are normally distributed in each group, i.e. x ∼ N ( μ , Σ ) {\displaystyle x\sim {\mathcal {N}}(\mu ,\Sigma )} . z {\displaystyle z} is denoted as the group where x {\displaystyle x} belongs, with z i = 0 {\displaystyle z_{i}=0} when x i {\displaystyle x_{i}} belongs to the Anemia group and z i = 1 {\displaystyle z_{i}=1} when x i {\displaystyle x_{i}} belongs to the control group. Also z ∼ Categorical ⁡ ( k , ϕ ) {\displaystyle z\sim \operatorname {Categorical} (k,\phi )} where k = 2 {\displaystyle k=2} , ϕ j ≥ 0 , {\displaystyle \phi _{j}\geq 0,} and ∑ j = 1 k ϕ j = 1 {\displaystyle \sum _{j=1}^{k}\phi _{j}=1} . See Categorical distribution. The following procedure can be used to estimate ϕ , μ , Σ {\displaystyle \phi ,\mu ,\Sigma } . A maximum likelihood estimation can be applied: ℓ ( ϕ , μ , Σ ) = ∑ i = 1 m log ⁡ ( p ( x ( i ) ; ϕ , μ , Σ ) ) = ∑ i = 1 m log ⁡ ∑ z ( i ) = 1 k p ( x ( i ) ∣ z ( i ) ; μ , Σ ) p ( z ( i ) ; ϕ ) {\displaystyle \ell (\phi ,\mu ,\Sigma )=\sum _{i=1}^{m}\log(p(x^{(i)};\phi ,\mu ,\Sigma ))=\sum _{i=1}^{m}\log \sum _{z^{(i)}=1}^{k}p\left(x^{(i)}\mid z^{(i)};\mu ,\Sigma \right)p(z^{(i)};\phi )} As the z i {\displaystyle z_{i}} for each x i {\displaystyle x_{i}} are known, the log likelihood function can be simplified as below: ℓ ( ϕ , μ , Σ ) = ∑ i = 1 m log ⁡ p ( x ( i ) ∣ z ( i ) ; μ , Σ ) + log ⁡ p ( z ( i ) ; ϕ ) {\displaystyle \ell (\phi ,\mu ,\Sigma )=\sum _{i=1}^{m}\log p\left(x^{(i)}\mid z^{(i)};\mu ,\Sigma \right)+\log p\left(z^{(i)};\phi \right)} Now the likelihood function can be maximized by making partial derivative over μ , Σ , ϕ {\displaystyle \mu ,\Sigma ,\phi } , obtaining: ϕ j = 1 m ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \phi _{j}={\frac {1}{m}}\sum _{i=1}^{m}1\{z^{(i)}=j\}} μ j = ∑ i = 1 m 1 { z ( i ) = j } x ( i ) ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \mu _{j}={\frac {\sum _{i=1}^{m}1\{z^{(i)}=j\}x^{(i)}}{\sum _{i=1}^{m}1\left\{z^{(i)}=j\right\}}}} Σ j = ∑ i = 1 m 1 { z ( i ) = j } ( x ( i ) − μ j ) ( x ( i ) − μ j ) T ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \Sigma _{j}={\frac {\sum _{i=1}^{m}1\{z^{(i)}=j\}(x^{(i)}-\mu _{j})(x^{(i)}-\mu _{j})^{T}}{\sum _{i=1}^{m}1\{z^{(i)}=j\}}}} If z i {\displaystyle z_{i}} is known, the estimation of the parameters results to be quite simple with maximum likelihood estimation. But if z i {\displaystyle z_{i}} is unknown it is much more complicated. Being z {\displaystyle z} a latent variable (i.e. not observed), with unlabeled scenario, the expectation maximization algorithm is needed to estimate z {\displaystyle z} as well as other parameters. Generally, this problem is set as a GMM since the data in each group is normally distributed. In machine learning, the latent variable z {\displaystyle z} is considered as a latent pattern lying under the data, which the observer is not able to see very directly. x i {\displaystyle x_{i}} is the known data, while ϕ , μ , Σ {\displaystyle \phi ,\mu ,\Sigma } are the parameter of the model. With the EM algorithm, some underlying pattern z {\displaystyle z} in the data x i {\displaystyle x_{i}} can be found, along with the estimation of the parameters. The wide application of this circumstance in machine learning is what makes EM algorithm so important. == EM algorithm in GMM == The EM algorithm consists of two steps: the E-step and the M-step. Firstly, the model parameters and the z ( i ) {\displaystyle z^{(i)}} can be randomly initialized. In the E-step, the algorithm tries to guess the value of z ( i ) {\displaystyle z^{(i)}} based on the parameters, while in the M-step, the algorithm updates the value of the model parameters based on the guess of z ( i ) {\displaystyle z^{(i)}} of the E-step. These two steps are repeated until convergence is reached. The algorithm in GMM is: Repeat until convergence: 1. (E-step) For each i , j {\displaystyle i,j} , set w j ( i ) := p ( z ( i ) = j | x ( i ) ; ϕ , μ , Σ ) {\displaystyle w_{j}^{(i)}:=p\left(z^{(i)}=j|x^{(i)};\phi ,\mu ,\Sigma \right)} 2. (M-step) Update the parameters ϕ j := 1 m ∑ i = 1 m w j ( i ) {\displaystyle \phi _{j}:={\frac {1}{m}}\sum _{i=1}^{m}w_{j}^{(i)}} μ j := ∑ i = 1 m w j ( i ) x ( i ) ∑ i = 1 m w j ( i ) {\displaystyle \mu _{j}:={\frac {\sum _{i=1}^{m}w_{j}^{(i)}x^{(i)}}{\sum _{i=1}^{m}w_{j}^{(i)}}}} Σ j := ∑ i = 1 m w j ( i ) ( x ( i ) − μ j ) ( x ( i ) − μ j ) T ∑ i = 1 m w j ( i ) {\displaystyle \Sigma _{j}:={\frac {\sum _{i=1}^{m}w_{j}^{(i)}\left(x^{(i)}-\mu _{j}\right)\left(x^{(i)}-\mu _{j}\right)^{T}}{\sum _{i=1}^{m}w_{j}^{(i)}}}} With Bayes' rule, the following result is obtained by the E-step: p ( z ( i ) = j | x ( i ) ; ϕ , μ , Σ ) = p ( x ( i ) | z ( i ) = j ; μ , Σ ) p ( z ( i ) = j ; ϕ ) ∑ l = 1 k p ( x ( i ) | z ( i ) = l ; μ , Σ ) p ( z ( i ) = l ; ϕ ) {\displaystyle p\left(z^{(i)}=j|x^{(i)};\phi ,\mu ,\Sigma \right)={\frac {p\left(x^{(i)}|z^{(i)}=j;\mu ,\Sigma \right)p\left(z^{(i)}=j;\phi \right)}{\sum _{l=1}^{k}p\left(x^{(i)}|z^{(i)}=l;\mu ,\Sigma \right)p\left(z^{(i)}=l;\phi \right)}}} According to GMM setting, these following formulas are obtained: p ( x ( i ) | z ( i ) = j ; μ , Σ ) = 1 ( 2 π ) n / 2 | Σ j | 1 / 2 exp ⁡ ( − 1 2 ( x ( i ) − μ j ) T Σ j − 1 ( x ( i ) − μ j ) ) {\displaystyle p\left(x^{(i)}|z^{(i)}=j;\mu ,\Sigma \right)={\frac {1}{(2\pi )^{n/2}\left|\Sigma _{j}\right|^{1/2}}}\exp \left(-{\frac {1}{2}}\left(x^{(i)}-\mu _{j}\right)^{T}\Sigma _{j}^{-1}\left(x^{(i)}-\mu _{j}\right)\right)} p ( z ( i ) = j ; ϕ ) = ϕ j {\displaystyle p\left(z^{(i)}=j;\phi \right)=\phi _{j}} In this way, a switch between the E-step and the M-step is possible, according to the randomly initialized parameters.

    Read more →
  • Double descent

    Double descent

    Double descent in statistics and machine learning is the phenomenon where a model's error rate on the test set initially decreases with the number of parameters, then peaks, then decreases again. This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning. The increase usually occurs near the interpolation threshold, where the number of parameters is the same as the number of training data points (the model is just large enough to fit the training data). Or, more precisely, it is the maximum number of samples on which the model/training procedure achieves approximately on average 0 training error. == History == Early observations of what would later be called double descent in specific models date back to 1989. The term "double descent" was coined by Belkin et. al. in 2019, when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of the bias–variance tradeoff), and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models. == Theoretical models == Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise. A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically. A number of works have suggested that double descent can be explained using the concept of effective dimension: While a network may have a large number of parameters, in practice only a subset of those parameters are relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds, which show that less complex models are expected to generalize better under a Solomonoff prior.

    Read more →
  • Neural computation

    Neural computation

    Neural computation is the information processing performed by networks of neurons. Neural computation is affiliated with the philosophical tradition of computationalism, which advances the thesis that neural computation explains cognition. Warren McCulloch and Walter Pitts were the first to propose an account of neural activity as being computational in their seminal 1943 paper "A Logical Calculus of the Ideas Immanent in Nervous Activity." There are three general branches of computationalism, including classicism, connectionism, and computational neuroscience. All three branches agree that cognition is computation, however, they disagree on what sorts of computations constitute cognition. The classicism tradition believes that computation in the brain is digital, analogous to digital computing. Both connectionism and computational neuroscience do not require that the computations that realize cognition are necessarily digital computations. However, the two branches greatly disagree upon which sorts of experimental data should be used to construct explanatory models of cognitive phenomena. Connectionists rely upon behavioral evidence to construct models to explain cognitive phenomena, whereas computational neuroscience leverages neuroanatomical and neurophysiological information to construct mathematical models that explain cognition. When comparing the three main traditions of the computational theory of mind, as well as the different possible forms of computation in the brain, it is helpful to define what we mean by computation in a general sense. Computation is the processing of information, otherwise known as variables or entities, according to a set of rules. A rule in this sense is simply an instruction for executing a manipulation on the current state of the variable, in order to produce a specified output. In other words, a rule dictates which output to produce given a certain input to the computing system. A computing system is a mechanism whose components must be functionally organized to process the information in accordance with the established set of rules. The types of information processed by a computing system determine which type of computations it performs. Traditionally in cognitive science, there have been two proposed types of computation related to neural activity, digital and analog, with the vast majority of theoretical work incorporating a digital understanding of cognition. Computing systems that perform digital computation are functionally organized to execute operations on strings of digits with respect to the type and location of the digit on the string. It has been argued that neural spike train signaling implements some form of digital computation, since neural spikes may be considered as discrete units or digits, like 0 or 1—the neuron either fires an action potential or it does not. Accordingly, neural spike trains could be seen as strings of digits. Alternatively, analog computing systems perform manipulations on non-discrete, irreducibly continuous variables, that is, entities that vary continuously as a function of time. These sorts of operations are characterized by systems of differential equations. Neural computation can be studied by, for example, building models of neural computation. Work on artificial neural networks has been somewhat inspired by knowledge of neural computation.

    Read more →
  • Apache Parquet

    Apache Parquet

    Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem inspired by Google Dremel interactive ad-hoc query system for analysis of read-only nested data. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides data compression and encoding schemes with enhanced performance to handle complex data in bulk. == History == The open-source project to build Apache Parquet began as a joint effort between Twitter and Cloudera using the record shredding and assembly algorithm as described in Google's Dremel. Parquet was designed as an improvement on the Trevni columnar storage format created by Doug Cutting, the creator of Hadoop. The name 'parquet' (lit. 'small compartment') refers to a style of decorative flooring and was chosen to "evoke the bottom layer of a database with an interesting layout". The first version, Apache Parquet 1.0, was released in July 2013. Since April 27, 2015, Apache Parquet has been a top-level Apache Software Foundation (ASF)-sponsored project. == Features == Apache Parquet is implemented using the record-shredding and assembly algorithm, which accommodates the complex data structures that can be used to store data. The values in each column are stored in contiguous memory locations, providing the following benefits: Column-wise compression is efficient in storage space Encoding and compression techniques specific to the type of data in each column can be used Queries that fetch specific column values need not read the entire row, thus improving performance Apache Parquet is implemented using the Apache Thrift framework, which increases its flexibility; it can work with a number of programming languages like C++, Java, Python, PHP, etc. As of August 2015, Parquet supports the big-data-processing frameworks including Apache Hive, Apache Drill, Apache Impala, Apache Crunch, Apache Pig, Cascading, Presto and Apache Spark. It is one of the external data formats used by the pandas Python data manipulation and analysis library. == Compression and encoding == In Parquet, compression is performed column by column, which enables different encoding schemes to be used for text and integer data. This strategy also keeps the door open for newer and better encoding schemes to be implemented as they are invented. Parquet supports various compression formats: snappy, gzip, LZO, brotli, zstd, and LZ4. === Dictionary encoding === Parquet has an automatic dictionary encoding enabled dynamically for data with a small number of unique values (i.e. below 105) that enables significant compression and boosts processing speed. === Bit packing === Storage of integers is usually done with dedicated 32 or 64 bits per integer. For small integers, packing multiple integers into the same space makes storage more efficient. === Run-length encoding (RLE) === To optimize storage of multiple occurrences of the same value, run-length encoding is used, which is where a single value is stored once along with the number of occurrences. Parquet implements a hybrid of bit packing and RLE, in which the encoding switches based on which produces the best compression results. This strategy works well for certain types of integer data and combines well with dictionary encoding. == Cloud Storage and Data Lakes == Parquet is widely used as the underlying file format in modern cloud-based data lake architectures. Cloud storage systems such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage commonly store data in Parquet format due to its efficient columnar representation and retrieval capabilities. Data lakehouse frameworks—including Apache Iceberg, Delta Lake, and Apache Hudi —build an additional metadata layer on top of Parquet files to support features such as schema evolution, time-travel queries, and ACID-compliant transactions. In these architectures, Parquet files serve as the immutable storage layer while the table formats manage data versioning and transactional integrity. == Comparison == Apache Parquet is comparable to RCFile and Optimized Row Columnar (ORC) file formats — all three fall under the category of columnar data storage within the Hadoop ecosystem. They all have better compression and encoding with improved read performance at the cost of slower writes. In addition to these features, Apache Parquet supports limited schema evolution, i.e., the schema can be modified according to the changes in the data. It also provides the ability to add new columns and merge schemas that do not conflict. Apache Arrow is designed as an in-memory complement to on-disk columnar formats like Parquet and ORC. The Arrow and Parquet projects include libraries that allow for reading and writing between the two formats. == Implementations == Known implementations of Parquet include:

    Read more →
  • Actor-critic algorithm

    Actor-critic algorithm

    The actor-critic algorithm (AC) is a family of reinforcement learning (RL) algorithms that combine policy-based RL algorithms such as policy gradient methods, and value-based RL algorithms such as value iteration, Q-learning, SARSA, and TD learning. An AC algorithm consists of two main components: an "actor" that determines which actions to take according to a policy function, and a "critic" that evaluates those actions according to a value function. Some AC algorithms are on-policy, some are off-policy. Some apply to either continuous or discrete action spaces. Some work in both cases. == Overview == The actor-critic methods can be understood as an improvement over pure policy gradient methods like REINFORCE via introducing a baseline. === Actor === The actor uses a policy function π ( a | s ) {\displaystyle \pi (a|s)} , while the critic estimates either the value function V ( s ) {\displaystyle V(s)} , the action-value Q-function Q ( s , a ) , {\displaystyle Q(s,a),} the advantage function A ( s , a ) {\displaystyle A(s,a)} , or any combination thereof. The actor is a parameterized function π θ {\displaystyle \pi _{\theta }} , where θ {\displaystyle \theta } are the parameters of the actor. The actor takes as argument the state of the environment s {\displaystyle s} and produces a probability distribution π θ ( ⋅ | s ) {\displaystyle \pi _{\theta }(\cdot |s)} . If the action space is discrete, then ∑ a π θ ( a | s ) = 1 {\displaystyle \sum _{a}\pi _{\theta }(a|s)=1} . If the action space is continuous, then ∫ a π θ ( a | s ) d a = 1 {\displaystyle \int _{a}\pi _{\theta }(a|s)da=1} . The goal of policy optimization is to improve the actor. That is, to find some θ {\displaystyle \theta } that maximizes the expected episodic reward J ( θ ) {\displaystyle J(\theta )} : J ( θ ) = E π θ [ ∑ t = 0 T γ t r t ] {\displaystyle J(\theta )=\mathbb {E} _{\pi _{\theta }}\left[\sum _{t=0}^{T}\gamma ^{t}r_{t}\right]} where γ {\displaystyle \gamma } is the discount factor, r t {\displaystyle r_{t}} is the reward at step t {\displaystyle t} , and T {\displaystyle T} is the time-horizon (which can be infinite). The goal of policy gradient method is to optimize J ( θ ) {\displaystyle J(\theta )} by gradient ascent on the policy gradient ∇ J ( θ ) {\displaystyle \nabla J(\theta )} . As detailed on the policy gradient method page, there are many unbiased estimators of the policy gradient: ∇ θ J ( θ ) = E π θ [ ∑ 0 ≤ j ≤ T ∇ θ ln ⁡ π θ ( A j | S j ) ⋅ Ψ j | S 0 = s 0 ] {\displaystyle \nabla _{\theta }J(\theta )=\mathbb {E} _{\pi _{\theta }}\left[\sum _{0\leq j\leq T}\nabla _{\theta }\ln \pi _{\theta }(A_{j}|S_{j})\cdot \Psi _{j}{\Big |}S_{0}=s_{0}\right]} where Ψ j {\textstyle \Psi _{j}} is a linear sum of the following: ∑ 0 ≤ i ≤ T ( γ i R i ) {\textstyle \sum _{0\leq i\leq T}(\gamma ^{i}R_{i})} . γ j ∑ j ≤ i ≤ T ( γ i − j R i ) {\textstyle \gamma ^{j}\sum _{j\leq i\leq T}(\gamma ^{i-j}R_{i})} : the REINFORCE algorithm. γ j ∑ j ≤ i ≤ T ( γ i − j R i ) − b ( S j ) {\textstyle \gamma ^{j}\sum _{j\leq i\leq T}(\gamma ^{i-j}R_{i})-b(S_{j})} : the REINFORCE with baseline algorithm. Here b {\displaystyle b} is an arbitrary function. γ j ( R j + γ V π θ ( S j + 1 ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\left(R_{j}+\gamma V^{\pi _{\theta }}(S_{j+1})-V^{\pi _{\theta }}(S_{j})\right)} : TD(1) learning. γ j Q π θ ( S j , A j ) {\textstyle \gamma ^{j}Q^{\pi _{\theta }}(S_{j},A_{j})} . γ j A π θ ( S j , A j ) {\textstyle \gamma ^{j}A^{\pi _{\theta }}(S_{j},A_{j})} : Advantage Actor-Critic (A2C). γ j ( R j + γ R j + 1 + γ 2 V π θ ( S j + 2 ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\left(R_{j}+\gamma R_{j+1}+\gamma ^{2}V^{\pi _{\theta }}(S_{j+2})-V^{\pi _{\theta }}(S_{j})\right)} : TD(2) learning. γ j ( ∑ k = 0 n − 1 γ k R j + k + γ n V π θ ( S j + n ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\left(\sum _{k=0}^{n-1}\gamma ^{k}R_{j+k}+\gamma ^{n}V^{\pi _{\theta }}(S_{j+n})-V^{\pi _{\theta }}(S_{j})\right)} : TD(n) learning. γ j ∑ n = 1 ∞ λ n − 1 1 − λ ⋅ ( ∑ k = 0 n − 1 γ k R j + k + γ n V π θ ( S j + n ) − V π θ ( S j ) ) {\textstyle \gamma ^{j}\sum _{n=1}^{\infty }{\frac {\lambda ^{n-1}}{1-\lambda }}\cdot \left(\sum _{k=0}^{n-1}\gamma ^{k}R_{j+k}+\gamma ^{n}V^{\pi _{\theta }}(S_{j+n})-V^{\pi _{\theta }}(S_{j})\right)} : TD(λ) learning, also known as GAE (generalized advantage estimate). This is obtained by an exponentially decaying sum of the TD(n) learning terms. === Critic === In the unbiased estimators given above, certain functions such as V π θ , Q π θ , A π θ {\displaystyle V^{\pi _{\theta }},Q^{\pi _{\theta }},A^{\pi _{\theta }}} appear. These are approximated by the critic. Since these functions all depend on the actor, the critic must learn alongside the actor. The critic is learned by value-based RL algorithms. For example, if the critic is estimating the state-value function V π θ ( s ) {\displaystyle V^{\pi _{\theta }}(s)} , then it can be learned by any value function approximation method. Let the critic be a function approximator V ϕ ( s ) {\displaystyle V_{\phi }(s)} with parameters ϕ {\displaystyle \phi } . The simplest example is TD(1) learning, which trains the critic to minimize the TD(1) error: δ i = R i + γ V ϕ ( S i + 1 ) − V ϕ ( S i ) {\displaystyle \delta _{i}=R_{i}+\gamma V_{\phi }(S_{i+1})-V_{\phi }(S_{i})} The critic parameters are updated by gradient descent on the squared TD error: ϕ ← ϕ − α ∇ ϕ ( δ i ) 2 = ϕ + α δ i ∇ ϕ V ϕ ( S i ) {\displaystyle \phi \leftarrow \phi -\alpha \nabla _{\phi }(\delta _{i})^{2}=\phi +\alpha \delta _{i}\nabla _{\phi }V_{\phi }(S_{i})} where α {\displaystyle \alpha } is the learning rate. Note that the gradient is taken with respect to the ϕ {\displaystyle \phi } in V ϕ ( S i ) {\displaystyle V_{\phi }(S_{i})} only, since the ϕ {\displaystyle \phi } in γ V ϕ ( S i + 1 ) {\displaystyle \gamma V_{\phi }(S_{i+1})} constitutes a moving target, and the gradient is not taken with respect to that. This is a common source of error in implementations that use automatic differentiation, and requires "stopping the gradient" at that point. Similarly, if the critic is estimating the action-value function Q π θ {\displaystyle Q^{\pi _{\theta }}} , then it can be learned by Q-learning or SARSA. In SARSA, the critic maintains an estimate of the Q-function, parameterized by ϕ {\displaystyle \phi } , denoted as Q ϕ ( s , a ) {\displaystyle Q_{\phi }(s,a)} . The temporal difference error is then calculated as δ i = R i + γ Q θ ( S i + 1 , A i + 1 ) − Q θ ( S i , A i ) {\displaystyle \delta _{i}=R_{i}+\gamma Q_{\theta }(S_{i+1},A_{i+1})-Q_{\theta }(S_{i},A_{i})} . The critic is then updated by θ ← θ + α δ i ∇ θ Q θ ( S i , A i ) {\displaystyle \theta \leftarrow \theta +\alpha \delta _{i}\nabla _{\theta }Q_{\theta }(S_{i},A_{i})} The advantage critic can be trained by training both a Q-function Q ϕ ( s , a ) {\displaystyle Q_{\phi }(s,a)} and a state-value function V ϕ ( s ) {\displaystyle V_{\phi }(s)} , then let A ϕ ( s , a ) = Q ϕ ( s , a ) − V ϕ ( s ) {\displaystyle A_{\phi }(s,a)=Q_{\phi }(s,a)-V_{\phi }(s)} . Although, it is more common to train just a state-value function V ϕ ( s ) {\displaystyle V_{\phi }(s)} , then estimate the advantage by A ϕ ( S i , A i ) ≈ ∑ j ∈ 0 : n − 1 γ j R i + j + γ n V ϕ ( S i + n ) − V ϕ ( S i ) {\displaystyle A_{\phi }(S_{i},A_{i})\approx \sum _{j\in 0:n-1}\gamma ^{j}R_{i+j}+\gamma ^{n}V_{\phi }(S_{i+n})-V_{\phi }(S_{i})} Here, n {\displaystyle n} is a positive integer. The higher n {\displaystyle n} is, the more lower is the bias in the advantage estimation, but at the price of higher variance. The Generalized Advantage Estimation (GAE) introduces a hyperparameter λ {\displaystyle \lambda } that smoothly interpolates between Monte Carlo returns ( λ = 1 {\displaystyle \lambda =1} , high variance, no bias) and 1-step TD learning ( λ = 0 {\displaystyle \lambda =0} , low variance, high bias). This hyperparameter can be adjusted to pick the optimal bias-variance trade-off in advantage estimation. It uses an exponentially decaying average of n-step returns with λ {\displaystyle \lambda } being the decay strength. == Variants == Asynchronous Advantage Actor-Critic (A3C): Parallel and asynchronous version of A2C. Soft Actor-Critic (SAC): Incorporates entropy maximization for improved exploration. Deep Deterministic Policy Gradient (DDPG): Specialized for continuous action spaces.

    Read more →
  • Universal psychometrics

    Universal psychometrics

    Universal psychometrics encompasses psychometrics instruments that could measure the psychological properties of any intelligent agent. Up until the early 21st century, psychometrics relied heavily on psychological tests that require the subject to cooperate and answer questions, the most famous example being an intelligence test. Such methods are only applicable to the measurement of human psychological properties. As a result, some researchers have proposed the idea of universal psychometrics - they suggest developing testing methods that allow for the measurement of non-human entities' psychological properties. For example, it has been suggested that the Turing test is a form of universal psychometrics. This test involves having testers (without any foreknowledge) attempt to distinguish a human from a machine by interacting with both (while not being to see either individuals). It is supposed that if the machine is equally intelligent to a human, the testers will not be able to distinguish between the two, i.e., their guesses will not be better than chance. Thus, Turing test could measure the intelligence (a psychological variable) of an AI. Other instruments proposed for universal psychometrics include reinforcement learning and measuring the ability to predict complexity.

    Read more →
  • The AI Con

    The AI Con

    The AI Con: How to Fight Big Tech's Hype and Create the Future We Want is a 2025 non-fiction book by linguist Emily M. Bender and sociologist Alex Hanna. It argues that much of what is labeled "artificial intelligence" is a misleading term that obscures ordinary automation while concentrating power in a small number of technology firms. The book was published in May 2025 by Harper in the United States and Bodley Head in the United Kingdom. It was developed alongside the authors' long-running podcast Mystery AI Hype Theater 3000, which critiques exaggerated claims about AI. == Synopsis == The authors present AI as a marketing umbrella that encourages audiences to infer understanding and agency where none exist. They argue readers should treat such language skeptically and to separate specific automated tasks from broad claims of intelligence. The book describes a recurring hype cycle in which corporate narratives justify data and labor extraction, the replacement of human services with cheaper substitutes, and the diversion of attention from present harms to speculative futures. While acknowledging limited uses such as pattern recognition, the authors argue that contemporary systems are best understood as text and media generators shaped by training data and human labor, not as thinking or reasoning entities. A central theme is the social and environmental cost of scaling these systems, including increased energy and water use, the appropriation of creative work for training, and the outsourcing of ghost work to low-paid data workers worldwide. These costs are linked to workplace effects, with the authors arguing that automation rarely eliminates jobs outright and more often degrades them through surveillance, work intensification, and unpaid oversight. As alternatives to passive adoption, the authors propose concrete responses: asking precise questions about what is being automated and why, demanding transparency about data and evaluation, and practicing what they call strategic refusal when deployment conflicts with evidence or values. The book also develops a vocabulary for public debate, rejecting both boosterish and doomerish narratives as grounded in the same assumption that AI is a singular, autonomous force. The authors recommend reading strategies such as favoring trusted human sources over automated summaries and using humor to deflate inflated claims. They describe a link between language to policy and power, arguing that precise terminology can help policymakers and the public resist austerity-driven automation and demand accountability for errors and harms. == Reception == The Guardian praised the book's myth-busting approach and its analysis of how hype erodes cultural and civic life by normalizing synthetic media as a substitute for human judgment. Kirkus Reviews described it as a contrarian account that catalogs concrete risks while cutting through speculative predictions. An interview in Business Insider highlighted the authors' accessible frameworks, including their proposal to describe chatbots as conversation simulators and to evaluate systems in terms of values, labor, and evidence. Coverage in GeekWire emphasized the book's call for resistance through collective bargaining, stronger data rights, and a norm of rejecting deployments that fail basic standards of necessity and evaluation. Some reviews were more critical. A review in LLRX argued that the book's tone could be overly polemical and that it gave limited attention to potential benefits claimed for generative systems. Coverage in the Financial Times, focused on Bender's broader public scholarship, situated the book within her long-standing critique of anthropomorphic narratives about large language models and her advocacy for more democratic oversight of automated systems.

    Read more →
  • Smart speaker industry in South Korea

    Smart speaker industry in South Korea

    Smart speakers, or AI speakers, have been developed by multiple domestic electronics and telecommunications firms in South Korea. Since their introduction to the local market in 2016, they have been used by millions of people in the country. == Brands == === Google === In September 2018, Google Home (including the Google Home Mini) launched in South Korea. Running Google Assistant, it featured simultaneous recognition of two languages among a total of seven, including Korean. At launch, it could play music from Bugs!, in addition to YouTube. === Kakao === In November 2017, Kakao launched the Kakao Mini, featuring integrated KakaoTalk functionality. === KT === KT launched the GiGA Genie smart speaker in January 2017, using a Harman Kardon speaker. In November 2017, KT announced GiGA Genie LTE, a portable AI speaker with LTE support. They also released a mini speaker called GiGA Genie Buddy. In 2018, KT created a special version of GiGa Genie with a screen for use in hotels. On 29 April 2019, KT announced the GiGA Genie Table TV, a consumer-oriented smart speaker with a display. It featured paid TV access through Wi-Fi. Based on usage data from the hotel model, KT decided not to add a touchscreen. The Table TV also featured a limited-access "personalized-text-to-speech technology" which could use parents' voice recording inputs to read children books. In February 2022, KT began rolling out Amazon Alexa integration into its speakers for English support. === Naver === In August 2017, Naver announced the Wave smart speaker, operating on Clova. In October 2017, Naver launched the Friends smart speaker, which were designed based on Line characters. ==== LG Uplus ==== In December 2017, LG Uplus launched the Friends+ speaker with Naver, operating on U+ Home AI. === Samsung === In August 2018, Samsung announced the Samsung Galaxy Home in partnership with Spotify. The original size was delayed, while the Galaxy Home Mini appeared briefly as a bonus for Samsung Galaxy S20 preorders in South Korea in February 2020. === SK Telecom === SK Telecom launched the Nugu smart speaker in September 2016, using an Astell & Kern audio system. In August 2017, SKT released a portable speaker named Nugu mini. In July 2018, SKT launched the Nugu Candle, featuring expanded mood lighting. The first-generation Nugu was subsequently discontinued. On 18 April 2019, SKT released the NUGU Nemo AI, which featured a display and JBL stereo speaker. In August 2019, SKT collaborated with SM Entertainment, incorporating functions related to the agency's artists into Nugu. In January 2022, SKT showcased the NUGU Candle SE, introducing Alexa support. == Usage == In 2018, approximately 3 million people in South Korea used smart speakers. According to data from KT in 2018, the most common commands to its speakers were for controlling televisions. Based on a broader survey in 2017, music was selected as the most frequent use case. By 2018, smart speaker companies were partnering with reading and other education services, adding potential use-cases for children. By 2022, smart speakers were being utilized by the South Korean government. SKT, in partnership with 70 regional governments, distributed smart speakers to 12,000 senior citizens living alone. The government paid for monthly subscriptions to help seniors stay mentally engaged. Naver made an agreement with the Seoul Metropolitan Government to provide Clova CareCall, an automated health checkup program to hundreds of senior citizens living alone. KT's AI care service included an emergency dispatch call function and medication notifications. == Criticism == === Communication === In a survey of 300 users in 2017, approximately half reported having some type of communication issue with their smart speakers. === Privacy === South Korean smart speakers sparked privacy concerns when they were found to be collecting and documenting user audio data in 2019. The speaker companies responded that only a minority of data was collected and that it was anonymized. They stated that such recordings were collected for performance improvements.

    Read more →
  • Double descent

    Double descent

    Double descent in statistics and machine learning is the phenomenon where a model's error rate on the test set initially decreases with the number of parameters, then peaks, then decreases again. This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning. The increase usually occurs near the interpolation threshold, where the number of parameters is the same as the number of training data points (the model is just large enough to fit the training data). Or, more precisely, it is the maximum number of samples on which the model/training procedure achieves approximately on average 0 training error. == History == Early observations of what would later be called double descent in specific models date back to 1989. The term "double descent" was coined by Belkin et. al. in 2019, when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of the bias–variance tradeoff), and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models. == Theoretical models == Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise. A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically. A number of works have suggested that double descent can be explained using the concept of effective dimension: While a network may have a large number of parameters, in practice only a subset of those parameters are relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds, which show that less complex models are expected to generalize better under a Solomonoff prior.

    Read more →
  • Automated negotiation

    Automated negotiation

    Automated negotiation is a form of interaction in systems that are composed of multiple autonomous agents, in which the aim is to reach agreements through an iterative process of making offers. Automated negotiation can be employed for many tasks human negotiators regularly engage in, such as bargaining and joint decision making. The main topics in automated negotiation revolve around the design of protocols and negotiating strategies. == History == Through digitization, the beginning of the 21st century has seen a growing interest in the automation of negotiation and e-negotiation systems, for example in the setting of e-commerce. This interest is fueled by the promise of automated agents being able to negotiate on behalf of human negotiators, and to find better outcomes than human negotiators. == Examples == Examples of automated negotiation include: Online dispute resolution, in which disagreements between parties are settled. Sponsored search auction, where bids are placed on advertisement keywords. Content negotiation, in which user agents negotiate over HTTP about how to best represent a web resource. Negotiation support systems, in which negotiation decision-making activities are supported by an information system.

    Read more →