AI App Video Generator

AI App Video Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Magisto

    Magisto

    Magisto provided an online video editing tool (both as a web application and a mobile app) for automated video editing and production. In 2019, the company was acquired by Vimeo for an estimated US$200 million. The Magisto app contained a library of music. The music, largely by independent artists, was sorted by mood and is licensed for in-app use. Magisto had a freemium business model where users can create basic video clips for free. In addition, advanced business, professional and personal service tiers are available via various subscription plans, unlocking more features; such as longer videos, HD, premium themes, customization, and control features. == History == Magisto was founded in 2009 as SightEra (LTD) by Oren Boiman (CEO) and Alex Rav-Acha (CTO). Boiman, frustrated with the amount of time it took editing together videos of his daughter, wanted an easier to use application to capture and share videos. Boiman, a computer scientist that graduated from Tel Aviv University, followed with graduate work in computer vision at the Weizmann Institute of Science. Boiman developed several patent-pending image analysis technologies that analyze unedited videos to identify the most interesting parts. The system recognized faces, animals, landscapes, action sequences, movements and other important content within the video, as well as analyzing speech and audio. These scenes are then edited together, along with music and effects. Magisto was launched publicly on September 20, 2011, as a video editing software web application through which users could upload unedited video footage, choose a title and soundtrack and have their video edited for them automatically. On the following day, Magisto was added to YouTube Create's collection of video production applications. The Magisto iPhone app was launched publicly at the 2012 International Consumer Electronics Show (CES) in Las Vegas. At CES, the company was also awarded first place in the 2012 CES Mobile App Showdown. In August 2012, Magisto launched the Android app on Google Play. In September 2012, Magisto launched a Google Chrome App and announced Google Drive integration. In March 2013, Magisto claimed it had 5 million users. Google listed Magisto as an "Editors’ Choice" on its list of "Best Apps of 2013". In September 2013, the company claimed that 10 million users had downloaded the App. In February 2014, Magisto claimed that they had 20 million users, with 2 million new users per month. The company also confirmed investment from Mail.Ru. In September 2014, Magisto rolled out a feature called 'Instagram Ready' which allowed users to upload 15 second clips that are automatically formatted for Instagram. In the same month, Magisto launched a feature for iOS and Android users, called 'Surprise Me', which created video from still photography on users’ smartphones. In October 2014, Magisto was placed 9th on the 2014 Deloitte Israel Technology Fast 50 list and named as a finalist in the Red Herring's Top 100 Europe award. In July 2015, Magisto released an editing theme dedicated to Jerry Garcia. In April 2019, the company was acquired by Vimeo, the IAC-owned platform for hosting, sharing and monetizing streamed video, for an estimated $200 million. === Financing === In 2011, the company received more than $5.5 million in a Series B venture round funding from Magma Venture Partners and Horizons Ventures. In September 2011, at the same time as the public launch of their web application, Magisto announced a $5.5 million Series B funding round led by Li Ka-shing’s Horizons Ventures. Li Ka-Shing is known for making early-stage investments in companies like Facebook, Spotify, SecondMarket and Siri. In October 2013, the company received $13 million in funding from Qualcomm and Sandisk. In 2014, the company received $2 million in Venture Funding from Magma Venture Partners, Qualcomm Ventures, Horizons Ventures and the Mail.Ru Group. == Awards == Magisto won first place at Technonomy3, an annual Internet Technology start-up competition in Israel. Judges of the competition included Jeff Pulver, TechCrunch editor Mike Butcher, investor Yaron Samid, Bessemer Venture Partners Israel partner Adam Fisher and Brad McCarty of The Next Web. Magisto won first place at CES 2012 Mobile app competition, during the launch of Magisto iOS mobile app. Magisto was awarded twice the Google Play Editor's Choice and was part of iPhone App Store Best App awards for 2013 and 2014, and Wired Essential iPad Apps. Magisto was declared by Deloitte as the 7th fastest growing company in Europe, the Middle East, and Africa in 2016.

    Read more →
  • EfficientNet

    EfficientNet

    EfficientNet is a family of convolutional neural networks (CNNs) for computer vision published by researchers at Google AI in 2019. Its key innovation is compound scaling, which uniformly scales all dimensions of depth, width, and resolution using a single parameter. EfficientNet models have been adopted in various computer vision tasks, including image classification, object detection, and segmentation. == Compound scaling == EfficientNet introduces compound scaling, which, instead of scaling one dimension of the network at a time, such as depth (number of layers), width (number of channels), or resolution (input image size), uses a compound coefficient ϕ {\displaystyle \phi } to scale all three dimensions simultaneously. Specifically, given a baseline network, the depth, width, and resolution are scaled according to the following equations: depth multiplier: d = α ϕ width multiplier: w = β ϕ resolution multiplier: r = γ ϕ {\displaystyle {\begin{aligned}{\text{depth multiplier: }}d&=\alpha ^{\phi }\\{\text{width multiplier: }}w&=\beta ^{\phi }\\{\text{resolution multiplier: }}r&=\gamma ^{\phi }\end{aligned}}} subject to α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} and α ≥ 1 , β ≥ 1 , γ ≥ 1 {\displaystyle \alpha \geq 1,\beta \geq 1,\gamma \geq 1} . The α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} condition is such that increasing ϕ {\displaystyle \phi } by a factor of ϕ 0 {\displaystyle \phi _{0}} would increase the total FLOPs of running the network on an image approximately 2 ϕ 0 {\displaystyle 2^{\phi _{0}}} times. The hyperparameters α {\displaystyle \alpha } , β {\displaystyle \beta } , and γ {\displaystyle \gamma } are determined by a small grid search. The original paper suggested 1.2, 1.1, and 1.15, respectively. Architecturally, they optimized the choice of modules by neural architecture search (NAS), and found that the inverted bottleneck convolution (which they called MBConv) used in MobileNet worked well. The EfficientNet family is a stack of MBConv layers, with shapes determined by the compound scaling. The original publication consisted of 8 models, from EfficientNet-B0 to EfficientNet-B7, with increasing model size and accuracy. EfficientNet-B0 is the baseline network, and subsequent models are obtained by scaling the baseline network by increasing ϕ {\displaystyle \phi } . == Variants == EfficientNet has been adapted for fast inference on edge TPUs and centralized TPU or GPU clusters by NAS. EfficientNet V2 was published in June 2021. The architecture was improved by further NAS search with more types of convolutional layers. It also introduced a training method, which progressively increases image size during training, and uses regularization techniques like dropout, RandAugment, and Mixup. The authors claim this approach mitigates accuracy drops often associated with progressive resizing.

    Read more →
  • Similarity learning

    Similarity learning

    Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. == Learning setup == There are four common setups for similarity and metric distance learning. Regression similarity learning In this setup, pairs of objects are given ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} together with a measure of their similarity y i ∈ R {\displaystyle y_{i}\in R} . The goal is to learn a function that approximates f ( x i 1 , x i 2 ) ∼ y i {\displaystyle f(x_{i}^{1},x_{i}^{2})\sim y_{i}} for every new labeled triplet example ( x i 1 , x i 2 , y i ) {\displaystyle (x_{i}^{1},x_{i}^{2},y_{i})} . This is typically achieved by minimizing a regularized loss min W ∑ i l o s s ( w ; x i 1 , x i 2 , y i ) + r e g ( w ) {\displaystyle \min _{W}\sum _{i}loss(w;x_{i}^{1},x_{i}^{2},y_{i})+reg(w)} . Classification similarity learning Given are pairs of similar objects ( x i , x i + ) {\displaystyle (x_{i},x_{i}^{+})} and non similar objects ( x i , x i − ) {\displaystyle (x_{i},x_{i}^{-})} . An equivalent formulation is that every pair ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} is given together with a binary label y i ∈ { 0 , 1 } {\displaystyle y_{i}\in \{0,1\}} that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not. Ranking similarity learning Given are triplets of objects ( x i , x i + , x i − ) {\displaystyle (x_{i},x_{i}^{+},x_{i}^{-})} whose relative similarity obey a predefined order: x i {\displaystyle x_{i}} is known to be more similar to x i + {\displaystyle x_{i}^{+}} than to x i − {\displaystyle x_{i}^{-}} . The goal is to learn a function f {\displaystyle f} such that for any new triplet of objects ( x , x + , x − ) {\displaystyle (x,x^{+},x^{-})} , it obeys f ( x , x + ) > f ( x , x − ) {\displaystyle f(x,x^{+})>f(x,x^{-})} (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. Locality sensitive hashing (LSH) Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing. == Metric learning == Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. When the objects x i {\displaystyle x_{i}} are vectors in R d {\displaystyle R^{d}} , then any matrix W {\displaystyle W} in the symmetric positive semi-definite cone S + d {\displaystyle S_{+}^{d}} defines a distance pseudo-metric of the space of x through the form D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ W ( x 1 − x 2 ) {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }W(x_{1}-x_{2})} . When W {\displaystyle W} is a symmetric positive definite matrix, D W {\displaystyle D_{W}} is a metric. Moreover, as any symmetric positive semi-definite matrix W ∈ S + d {\displaystyle W\in S_{+}^{d}} can be decomposed as W = L ⊤ L {\displaystyle W=L^{\top }L} where L ∈ R e × d {\displaystyle L\in R^{e\times d}} and e ≥ r a n k ( W ) {\displaystyle e\geq rank(W)} , the distance function D W {\displaystyle D_{W}} can be rewritten equivalently D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ L ⊤ L ( x 1 − x 2 ) = ‖ L ( x 1 − x 2 ) ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }L^{\top }L(x_{1}-x_{2})=\|L(x_{1}-x_{2})\|_{2}^{2}} . The distance D W ( x 1 , x 2 ) 2 = ‖ x 1 ′ − x 2 ′ ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=\|x_{1}'-x_{2}'\|_{2}^{2}} corresponds to the Euclidean distance between the transformed feature vectors x 1 ′ = L x 1 {\displaystyle x_{1}'=Lx_{1}} and x 2 ′ = L x 2 {\displaystyle x_{2}'=Lx_{2}} . Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance. == Applications == Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like K-nearest neighbor algorithm which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches. == Scalability == Metric and similarity learning scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET. == Software == metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn. OpenMetricLearning is a Python framework to train and validate the models producing high-quality embeddings. == Further information == For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.

    Read more →
  • EM algorithm and GMM model

    EM algorithm and GMM model

    In statistics, EM (expectation maximization) algorithm handles latent variables, while GMM is the Gaussian mixture model. == Background == In the picture below, are shown the red blood cell hemoglobin concentration and the red blood cell volume data of two groups of people, the Anemia group and the control group (i.e. the group of people without Anemia). As expected, people with Anemia have lower red blood cell volume and lower red blood cell hemoglobin concentration than those without Anemia. x {\displaystyle x} is a random vector such as x := ( red blood cell volume , red blood cell hemoglobin concentration ) {\displaystyle x:={\big (}{\text{red blood cell volume}},{\text{red blood cell hemoglobin concentration}}{\big )}} , and from medical studies it is known that x {\displaystyle x} are normally distributed in each group, i.e. x ∼ N ( μ , Σ ) {\displaystyle x\sim {\mathcal {N}}(\mu ,\Sigma )} . z {\displaystyle z} is denoted as the group where x {\displaystyle x} belongs, with z i = 0 {\displaystyle z_{i}=0} when x i {\displaystyle x_{i}} belongs to the Anemia group and z i = 1 {\displaystyle z_{i}=1} when x i {\displaystyle x_{i}} belongs to the control group. Also z ∼ Categorical ⁡ ( k , ϕ ) {\displaystyle z\sim \operatorname {Categorical} (k,\phi )} where k = 2 {\displaystyle k=2} , ϕ j ≥ 0 , {\displaystyle \phi _{j}\geq 0,} and ∑ j = 1 k ϕ j = 1 {\displaystyle \sum _{j=1}^{k}\phi _{j}=1} . See Categorical distribution. The following procedure can be used to estimate ϕ , μ , Σ {\displaystyle \phi ,\mu ,\Sigma } . A maximum likelihood estimation can be applied: ℓ ( ϕ , μ , Σ ) = ∑ i = 1 m log ⁡ ( p ( x ( i ) ; ϕ , μ , Σ ) ) = ∑ i = 1 m log ⁡ ∑ z ( i ) = 1 k p ( x ( i ) ∣ z ( i ) ; μ , Σ ) p ( z ( i ) ; ϕ ) {\displaystyle \ell (\phi ,\mu ,\Sigma )=\sum _{i=1}^{m}\log(p(x^{(i)};\phi ,\mu ,\Sigma ))=\sum _{i=1}^{m}\log \sum _{z^{(i)}=1}^{k}p\left(x^{(i)}\mid z^{(i)};\mu ,\Sigma \right)p(z^{(i)};\phi )} As the z i {\displaystyle z_{i}} for each x i {\displaystyle x_{i}} are known, the log likelihood function can be simplified as below: ℓ ( ϕ , μ , Σ ) = ∑ i = 1 m log ⁡ p ( x ( i ) ∣ z ( i ) ; μ , Σ ) + log ⁡ p ( z ( i ) ; ϕ ) {\displaystyle \ell (\phi ,\mu ,\Sigma )=\sum _{i=1}^{m}\log p\left(x^{(i)}\mid z^{(i)};\mu ,\Sigma \right)+\log p\left(z^{(i)};\phi \right)} Now the likelihood function can be maximized by making partial derivative over μ , Σ , ϕ {\displaystyle \mu ,\Sigma ,\phi } , obtaining: ϕ j = 1 m ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \phi _{j}={\frac {1}{m}}\sum _{i=1}^{m}1\{z^{(i)}=j\}} μ j = ∑ i = 1 m 1 { z ( i ) = j } x ( i ) ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \mu _{j}={\frac {\sum _{i=1}^{m}1\{z^{(i)}=j\}x^{(i)}}{\sum _{i=1}^{m}1\left\{z^{(i)}=j\right\}}}} Σ j = ∑ i = 1 m 1 { z ( i ) = j } ( x ( i ) − μ j ) ( x ( i ) − μ j ) T ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \Sigma _{j}={\frac {\sum _{i=1}^{m}1\{z^{(i)}=j\}(x^{(i)}-\mu _{j})(x^{(i)}-\mu _{j})^{T}}{\sum _{i=1}^{m}1\{z^{(i)}=j\}}}} If z i {\displaystyle z_{i}} is known, the estimation of the parameters results to be quite simple with maximum likelihood estimation. But if z i {\displaystyle z_{i}} is unknown it is much more complicated. Being z {\displaystyle z} a latent variable (i.e. not observed), with unlabeled scenario, the expectation maximization algorithm is needed to estimate z {\displaystyle z} as well as other parameters. Generally, this problem is set as a GMM since the data in each group is normally distributed. In machine learning, the latent variable z {\displaystyle z} is considered as a latent pattern lying under the data, which the observer is not able to see very directly. x i {\displaystyle x_{i}} is the known data, while ϕ , μ , Σ {\displaystyle \phi ,\mu ,\Sigma } are the parameter of the model. With the EM algorithm, some underlying pattern z {\displaystyle z} in the data x i {\displaystyle x_{i}} can be found, along with the estimation of the parameters. The wide application of this circumstance in machine learning is what makes EM algorithm so important. == EM algorithm in GMM == The EM algorithm consists of two steps: the E-step and the M-step. Firstly, the model parameters and the z ( i ) {\displaystyle z^{(i)}} can be randomly initialized. In the E-step, the algorithm tries to guess the value of z ( i ) {\displaystyle z^{(i)}} based on the parameters, while in the M-step, the algorithm updates the value of the model parameters based on the guess of z ( i ) {\displaystyle z^{(i)}} of the E-step. These two steps are repeated until convergence is reached. The algorithm in GMM is: Repeat until convergence: 1. (E-step) For each i , j {\displaystyle i,j} , set w j ( i ) := p ( z ( i ) = j | x ( i ) ; ϕ , μ , Σ ) {\displaystyle w_{j}^{(i)}:=p\left(z^{(i)}=j|x^{(i)};\phi ,\mu ,\Sigma \right)} 2. (M-step) Update the parameters ϕ j := 1 m ∑ i = 1 m w j ( i ) {\displaystyle \phi _{j}:={\frac {1}{m}}\sum _{i=1}^{m}w_{j}^{(i)}} μ j := ∑ i = 1 m w j ( i ) x ( i ) ∑ i = 1 m w j ( i ) {\displaystyle \mu _{j}:={\frac {\sum _{i=1}^{m}w_{j}^{(i)}x^{(i)}}{\sum _{i=1}^{m}w_{j}^{(i)}}}} Σ j := ∑ i = 1 m w j ( i ) ( x ( i ) − μ j ) ( x ( i ) − μ j ) T ∑ i = 1 m w j ( i ) {\displaystyle \Sigma _{j}:={\frac {\sum _{i=1}^{m}w_{j}^{(i)}\left(x^{(i)}-\mu _{j}\right)\left(x^{(i)}-\mu _{j}\right)^{T}}{\sum _{i=1}^{m}w_{j}^{(i)}}}} With Bayes' rule, the following result is obtained by the E-step: p ( z ( i ) = j | x ( i ) ; ϕ , μ , Σ ) = p ( x ( i ) | z ( i ) = j ; μ , Σ ) p ( z ( i ) = j ; ϕ ) ∑ l = 1 k p ( x ( i ) | z ( i ) = l ; μ , Σ ) p ( z ( i ) = l ; ϕ ) {\displaystyle p\left(z^{(i)}=j|x^{(i)};\phi ,\mu ,\Sigma \right)={\frac {p\left(x^{(i)}|z^{(i)}=j;\mu ,\Sigma \right)p\left(z^{(i)}=j;\phi \right)}{\sum _{l=1}^{k}p\left(x^{(i)}|z^{(i)}=l;\mu ,\Sigma \right)p\left(z^{(i)}=l;\phi \right)}}} According to GMM setting, these following formulas are obtained: p ( x ( i ) | z ( i ) = j ; μ , Σ ) = 1 ( 2 π ) n / 2 | Σ j | 1 / 2 exp ⁡ ( − 1 2 ( x ( i ) − μ j ) T Σ j − 1 ( x ( i ) − μ j ) ) {\displaystyle p\left(x^{(i)}|z^{(i)}=j;\mu ,\Sigma \right)={\frac {1}{(2\pi )^{n/2}\left|\Sigma _{j}\right|^{1/2}}}\exp \left(-{\frac {1}{2}}\left(x^{(i)}-\mu _{j}\right)^{T}\Sigma _{j}^{-1}\left(x^{(i)}-\mu _{j}\right)\right)} p ( z ( i ) = j ; ϕ ) = ϕ j {\displaystyle p\left(z^{(i)}=j;\phi \right)=\phi _{j}} In this way, a switch between the E-step and the M-step is possible, according to the randomly initialized parameters.

    Read more →
  • Computer Graphics International

    Computer Graphics International

    Computer Graphics International (CGI) is one of the oldest annual international conferences on computer graphics. It is organized by the Computer Graphics Society (CGS). Researchers across the whole world are invited to share their experiences and novel achievements in various fields - like computer graphics and human-computer interaction. Former conferences have been held recently in Hong Kong (China), Geneva (Switzerland), Shanghai (China), Geneva (virtually), Calgary (Canada), Bintan (Indonesia) and Yokohama (Japan). == Awards == Starting in the year of 2013, CGI has given yearly a Best Paper Award and a Career Achievement Award. == Venues ==

    Read more →
  • World model (artificial intelligence)

    World model (artificial intelligence)

    A world model in artificial intelligence is a machine learning system that builds an internal representation of an environment. The model predicts how that environment changes over time in response to actions. Researchers design world models to help agents plan, reason, and act without constant real-world trial and error. World models differ from systems that merely classify or generate outputs. They simulate dynamics such as physics, object interactions, and causality. Early ideas date to the 1990s. Modern versions power robots, autonomous driving, and interactive video generation. == History == Jürgen Schmidhuber introduced the term world model in machine learning in 1990. He proposed recurrent neural networks that predict future states from observations and use those predictions to train agents. David Ha and Schmidhuber revived the concept in a 2018 paper. Their agents learned to drive virtual cars and play video games inside self-generated simulations. Yann LeCun advanced the idea in a 2022 position paper titled "A Path Towards Autonomous Machine Intelligence". He argued that intelligence requires predictive models of the world rather than pure pattern matching. LeCun proposed the joint embedding predictive architecture (JEPA) as a practical foundation. LeCun and collaborators developed several JEPA variants. V-JEPA 2 reached state-of-the-art performance on video understanding and physical reasoning at the time. It supports zero-shot robot control in unfamiliar environments. Introduced in March 2026, LeWorldModel trains stably end-to-end from raw pixels and uses two loss terms and avoids hand-crafted heuristics. LeCun founded Advanced Machine Intelligence Labs in 2026 to further develop world models. Google DeepMind introduced Genie in 2024. The model learned interactive environments from unlabeled internet videos. Genie 2 followed in late 2024 and added three-dimensional generation. The Genie series set benchmarks for general-purpose simulation. Genie 3 was introduced in August 2025. It produces photorealistic, real-time interactive worlds from text prompts which are displayed at 24 frames per second and explored in real time with text or image prompts. The model supports persistent three-dimensional worlds and real-time interaction. Waymo adopted Genie 3 in February 2026 and used it to create a specialized world model for autonomous driving simulation, called the Waymo World Model. It produces synchronized camera and lidar outputs and creates edge cases that real robotaxis rarely encounter. The edge cases were reported to be unusual by PCMag. General Intuition announced a $133.7 million seed round. World Labs raised $1 billion. AMI raised $1.03 billion. In April 2026, Alibaba announced Happy Oyster, its world model designed for real-time and “flowy” world model. It includes a directing mode for world building based on text and image prompts and a wandering mode for exploring the resulting world. It can generate 3-minute in-world video clips. Also in April, World Labs, co-founded by Li Fei Fei, unveiled Spark 2.0, an open-source 3D Gaussian splatting rendering engine that targets smartphone-class devices. In June 2026, Nvidia released Cosmos 3, a family of open-weight models. It combines previously independent physical reasoning, world simulation, and action generation. Cosmos 3 integrates can process and generate text, image, video, audio, and action sequences. The model employs a Mixture-of-Transformers" (MoT) approach. An autoregressive (AR) transformer handles reasoning and next-token prediction, while a diffusion transformer (DT) does multimodal generation. Encoders (ViT for vision, VAE for visual/audio, and domain-specific for actions) and generate a shared representation space using 3D multi-dimensional rotary position embedding (mRoPE) for spatial and temporal information. The family includes Cosmos3-Nano (16B parameters) for workstations; Cosmos3-Super (64B parameters) for research. == Architecture == World models process raw sensory data such as video frames or lidar scans. They compress this input into compact latent representations. The system then predicts future representations rather than pixel-by-pixel reconstructions. Many modern world models use joint embedding predictive architecture (JEPA). An encoder turns observations into embeddings. A predictor estimates one or a suite of embeddings from the current one and an action. In some cases a critic chooses one embedding as the best result. A regularizer keeps embeddings well-behaved. The model trains by minimizing prediction error in embedding space. This approach avoids the high cost of generating every detail. Some architectures add explicit components. A fast reactive path handles immediate responses. A slower deliberative path performs longer-horizon planning. Video prediction accuracy or robot success rates are key metrics, but do not always predict real-world performance. Generative world models such as Genie 3 combine these with a simulator. They accept text prompts or layouts and output consistent video, lidar, or three-dimensional scenes. World models often train with self-supervised learning. They use large unlabeled datasets of video or robot interactions. Self-supervised learning can speed learning. Reinforcement learning can fine-tune a model for specific tasks. == Applications == World models support robot learning. Agents train inside simulations and transfer skills to the physical world. This reduces the need for dangerous or expensive real-world trials. Autonomous vehicles use world models to test rare events. Waymo's system simulates tornadoes or unusual pedestrian behavior. Companies train planners without putting vehicles on public roads. Interactive entertainment benefits from world models. Genie 3 lets users generate playable environments from simple descriptions. Game studios prototype levels faster. Scientific simulation gains from these models. Researchers model physical systems or biological processes at scale. Planners in logistics or urban design test strategies inside accurate digital twins. == Comparison with large language models == Both world models and large language models (LLMs) use inferencing on their inputs to make predictions. LLMs operate on textual inputs. They predict the next token in text sequences. They excel at language-oriented tasks such as translation or summarization. However, they lack understanding of physics. World models operate on sensor inputs such as pixels. They predict state changes in that data in latent space. This design supports planning and causal reasoning. LLMs generate fluent text but often fail at consistent physical predictions. Their architecture employs transformers with refinements such as mixture of experts. World models divide an inferencing task into work performed by encoders, predictors, simulators, and other pieces. They typically handle multimodal inputs such as video, lidar, radar, and audio, guided by textual prompting. LLMs power chatbots and code assistants. World models drive embodied agents that act in dynamic environments, such as autonomous driving. The two may be combined in hybrid systems. For example, a LLM handles instructions, while a world model manages low-level control. World model proponents such as LeCun claim that because LLMs are trained only on text, they have no ability to predict anything beyond text, such as real-world events. == Benchmarks == World model benchmarks test physical understanding, long-term consistency, planning, and generalization from sensor data. Meta introduced three benchmarks for V-JEPA 2. IntPhys 2 measures a model's ability to detect physics violations. It presents pairs of videos that diverge when one breaks physical rules. Humans score near 100% accuracy. V-JEPA 2 achieves little better than random chance on many conditions. Minimal Video Pairs (MVPBench) tests physical understanding through multiple-choice questions based on short video clips. It probes object interactions and causality. Something-Something tests action recognition. Epic-Kitchens-100 tests human action anticipation. DeepMind benchmark: Interactive evaluation measures consistency over minutes of interaction, memory of off-screen objects, and response to user actions or text prompts. Waymo benchmark: Output generation quality: Metrics include realism, controllability (via text prompts), and usefulness for training planners in simulated worlds. However, pixel reconstruction error rate with episodic rewards often fails. Other: Epic-Kitchens-100 (often measured with Recall@5) Ego4D 50 Salads, Breakfast, etc. Potential benchmarks: Zero-shot transfer to robots Long-horizon planning Implausible prediction rate

    Read more →
  • Sparkles emoji

    Sparkles emoji

    The Sparkles emoji (U+2728 ✨ SPARKLES) is an emoji that has one large star surrounded by smaller stars. Originating from Japan to represent sparkles used in anime and manga, the sparkles are often used as emphasis in text by surrounding words or phrases with it. It is the third most-used emoji in the world on Twitter as of 2021. Since the early 2020s it has been used by major software companies to represent artificial intelligence, marketing the technology as "like magic". == Development == According to Emojipedia, the Sparkles emoji was first used by Japanese mobile operators SoftBank, Docomo and au in the late 1990s. The emoji was added to Unicode 6.0 in 2010 and Emoji 1.0 in 2015. On some platforms the Sparkles emoji has been multicoloured whilst on other platforms it has been one colour. Twitter and Microsoft's Sparkles have changed from being multicoloured to being a single colour. Samsung's version of the emoji previously had a night sky in the background. == Usage == === Interpersonal communication === The Sparkles emoji was originally meant to represent the usage of sparkles in Japanese anime and manga, where the sparkles are used to represent beauty, happiness or awe. The emoji has several meanings and depends upon context. Starting in the late 2010s, the emoji started being used to surround words or phrases to be used as emphasis, an example from the book Because Internet being "I would simply ✨pass away✨". It can also be used as sarcasm, irony or as a way to mock people. Without emoji this could be represented with tildes or asterisks, for example, "~tildes~" or "~asterisk plus tilde~" or "~~true sparkle exuberance~~". The sparkles emoji can be used to represent stars in text, be used to represent cleanliness or can be used to mean "orgasm" whilst sexting. In September 2021 the Sparkles emoji overtook the Pleading Face (🥺) emoji to become the third most-used emoji in the world according to Emojipedia, with approximately 1 per cent of all tweets containing the Sparkles emoji. === Artificial intelligence === In the early 2020s, the Sparkles emoji started being used as an icon to represent artificial intelligence (AI). Companies who use the emoji this way include Google, OpenAI, Samsung, Microsoft, Adobe, Spotify and Zoom. As of August 2024, seven of the top 10 software companies by market capitalisation use the Sparkles emojis with AI. OpenAI has different versions of the Sparkles for different versions of the models that ChatGPT uses. One explanation is that Sparkles is being used by these companies as a way to market AI as "magic". Marketing technology as "magic" has been used before AI, particularly by Apple. Another explanation given by designers and marketers choosing to use Sparkles to signify AI is simply that other platforms are doing it, making it familiar to users. Around 2024, some of these companies started removing two of the smaller stars from the emoji in their AI services and have kept the one large star, an example being Google's Gemini chatbot. In early 2024, the Nielsen Norman Group provided test subjects with the star in isolation and found that people did not associate the symbol with AI, but instead mostly with "optimisation" or "favourite or save an item".

    Read more →
  • Connectionist expert system

    Connectionist expert system

    Connectionist expert systems are artificial neural network (ANN) based expert systems where the ANN generates inferencing rules e.g., fuzzy-multi layer perceptron where linguistic and natural form of inputs are used. Apart from that, rough set theory may be used for encoding knowledge in the weights better and also genetic algorithms may be used to optimize the search solutions better. Symbolic reasoning methods may also be incorporated (see hybrid intelligent system). (Also see expert system, neural network, clinical decision support system.)

    Read more →
  • Charge-coupled device

    Charge-coupled device

    A charge-coupled device (CCD) is an integrated circuit containing an array of linked, or coupled, capacitors. Under the control of an external circuit, each capacitor can transfer its electric charge to a neighboring capacitor. CCD sensors are a major technology used in digital imaging. In a CCD image sensor, pixels are represented by p-doped metal–oxide–semiconductor (MOS) capacitors. These MOS capacitors, the basic building blocks of a CCD, are biased above the threshold for inversion when image acquisition begins, allowing the conversion of incoming photons into electron charges at the semiconductor-oxide interface; the CCD is then used to read out these charges. Although CCDs are not the only technology to allow for light detection, CCD image sensors are widely used in professional, medical, and scientific applications where high-quality image data are required. In applications with less exacting quality demands, such as consumer and professional digital cameras, active pixel sensors, also known as CMOS sensors (complementary MOS sensors), are generally used. However, the large quality advantage CCDs enjoyed early on has narrowed over time and since the late 2010s CMOS sensors are the dominant technology, having largely if not completely replaced CCD image sensors. == History == The basis for the CCD is the metal–oxide–semiconductor (MOS) structure, with MOS capacitors being the basic building blocks of a CCD, and a depleted MOS structure used as the photodetector in early CCD devices. In the late 1960s, Willard Boyle and George E. Smith at Bell Labs were researching MOS technology while working on semiconductor bubble memory. They realized that an electric charge was the analog of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. This led to the invention of the charge-coupled device by Boyle and Smith in 1969. They conceived of the design of what they termed, in their notebook, "Charge 'Bubble' Devices". The initial paper describing the concept in April 1970 listed possible uses as memory, a delay line, and an imaging device. The device could also be used as a shift register. The essence of the design was the ability to transfer charge along the surface of a semiconductor from one storage capacitor to the next. The first experimental device demonstrating the principle was a row of closely spaced metal squares on an oxidized silicon surface electrically accessed by wire bonds. It was demonstrated by Gil Amelio, Michael Francis Tompsett and George Smith in April 1970. This was the first experimental application of the CCD in image sensor technology, and used a depleted MOS structure as the photodetector. The first patent (U.S. patent 4,085,456) on the application of CCDs to imaging was assigned to Tompsett, who filed the application in 1971. The first working CCD made with integrated circuit technology was a simple 8-bit shift register, reported by Tompsett, Amelio and Smith in August 1970. This device had input and output circuits and was used to demonstrate its use as a shift register and as a crude eight pixel linear imaging device. Development of the device progressed at a rapid rate. By 1971, Bell researchers led by Michael Tompsett were able to capture images with simple linear devices. Several companies, including Fairchild Semiconductor, RCA and Texas Instruments, picked up on the invention and began development programs. Fairchild's effort, led by ex-Bell researcher Gil Amelio, was the first with commercial devices, and by 1974 had a linear 500-element device and a 2D 100 × 100 pixel device. Peter L. P. Dillon, a scientist at Kodak Research Labs, invented the first color CCD image sensor by overlaying a color filter array on this Fairchild 100 x 100 pixel Interline CCD starting in 1974. Steven Sasson, an electrical engineer working for the Kodak Apparatus Division, invented a digital still camera using this same Fairchild 100 × 100 CCD in 1975. The interline transfer (ILT) CCD device was proposed by L. Walsh and R. Dyck at Fairchild in 1973 to reduce smear and eliminate a mechanical shutter. To further reduce smear from bright light sources, the frame-interline-transfer (FIT) CCD architecture was developed by K. Horii, T. Kuroda and T. Kunii at Matsushita (now Panasonic) in 1981. The first KH-11 KENNEN reconnaissance satellite equipped with charge-coupled device array (800 × 800 pixels) technology for imaging was launched in December 1976. Under the leadership of Kazuo Iwama, Sony started a large development effort on CCDs involving a significant investment. Eventually, Sony managed to mass-produce CCDs for their camcorders. Before this happened, Iwama died in August 1982. Subsequently, a CCD chip was placed on his tombstone to acknowledge his contribution. The first mass-produced consumer CCD video camera, the CCD-G5, was released by Sony in 1983, based on a prototype developed by Yoshiaki Hagiwara in 1981. Early CCD sensors suffered from shutter lag. This was largely resolved with the invention of the pinned photodiode (PPD). It was invented by Nobukazu Teranishi, Hiromitsu Shiraki and Yasuo Ishihara at NEC in 1980. They recognized that lag can be eliminated if the signal carriers could be transferred from the photodiode to the CCD. This led to their invention of the pinned photodiode, a photodetector structure with low lag, low noise, high quantum efficiency and low dark current. It was first publicly reported by Teranishi and Ishihara with A. Kohono, E. Oda and K. Arai in 1982, with the addition of an anti-blooming structure. The new photodetector structure invented at NEC was given the name "pinned photodiode" (PPD) by B.C. Burkey at Kodak in 1984. In 1987, the PPD began to be incorporated into most CCD devices, becoming a fixture in consumer electronic video cameras and then digital still cameras. Since then, the PPD has been used in nearly all CCD sensors and then CMOS sensors. In January 2006, Boyle and Smith were awarded the National Academy of Engineering Charles Stark Draper Prize, and in 2009 they were awarded the Nobel Prize for Physics for their invention of the CCD concept. Michael Tompsett was awarded the 2010 National Medal of Technology and Innovation, for pioneering work and electronic technologies including the design and development of the first CCD imagers. He was also awarded the 2012 IEEE Edison Medal for "pioneering contributions to imaging devices including CCD Imagers, cameras and thermal imagers". == Basics of operation == In a CCD for capturing images, there is a photoactive region (an epitaxial layer of silicon), and a transmission region made out of a shift register (the CCD, properly speaking). An image is projected through a lens onto the capacitor array (the photoactive region), causing each capacitor to accumulate an electric charge proportional to the light intensity at that location. A one-dimensional array, used in line-scan cameras, captures a single slice of the image, whereas a two-dimensional array, used in video and still cameras, captures a two-dimensional picture corresponding to the scene projected onto the focal plane of the sensor. Once the array has been exposed to the image, a control circuit causes each capacitor to transfer its contents to its neighbor (operating as a shift register). The last capacitor in the array dumps its charge into a charge amplifier, which converts the charge into a voltage. By repeating this process, the controlling circuit converts the entire contents of the array in the semiconductor to a sequence of voltages. In a digital device, these voltages are then sampled, digitized, and usually stored in memory; in an analog device (such as an analog video camera), they are processed into a continuous analog signal (e.g. by feeding the output of the charge amplifier into a low-pass filter), which is then processed and fed out to other circuits for transmission, recording, or other processing. == Detailed physics of operation == === Charge generation === Before the MOS capacitors are exposed to light, they are biased into the depletion region; in n-channel CCDs, the silicon under the bias gate is slightly p-doped or intrinsic. The gate is then biased at a positive potential, above the threshold for strong inversion, which will eventually result in the creation of an n channel below the gate as in a MOSFET. However, it takes time to reach this thermal equilibrium: up to hours in high-end scientific cameras cooled at low temperature. Initially after biasing, the holes are pushed far into the substrate, and no mobile electrons are at or near the surface; the CCD thus operates in a non-equilibrium state called deep depletion. Then, when electron–hole pairs are generated in the depletion region, they are separated by the electric field, the elec

    Read more →
  • Coupled pattern learner

    Coupled pattern learner

    Coupled Pattern Learner (CPL) is a machine learning algorithm which couples the semi-supervised learning of categories and relations to forestall the problem of semantic drift associated with boot-strap learning methods. == Coupled Pattern Learner == Semi-supervised learning approaches using a small number of labeled examples with many unlabeled examples are usually unreliable as they produce an internally consistent, but incorrect set of extractions. CPL solves this problem by simultaneously learning classifiers for many different categories and relations in the presence of an ontology defining constraints that couple the training of these classifiers. It was introduced by Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr. and Tom M. Mitchell in 2009. == CPL overview == CPL is an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors. Basic idea behind CPL is that semi-supervised training of a single type of extractor such as ‘coach’ is much more difficult than simultaneously training many extractors that cover a variety of inter-related entity and relation types. Using prior knowledge about the relationships between these different entities and relations CPL makes unlabeled data as a useful constraint during training. For e.g., ‘coach(x)’ implies ‘person(x)’ and ‘not sport(x)’. == CPL description == === Coupling of predicates === CPL primarily relies on the notion of coupling the learning of multiple functions so as to constrain the semi-supervised learning problem. CPL constrains the learned function in two ways. Sharing among same-arity predicates according to logical relations Relation argument type-checking === Sharing among same-arity predicates === Each predicate P in the ontology has a list of other same-arity predicates with which P is mutually exclusive. If A is mutually exclusive with predicate B, A’s positive instances and patterns become negative instances and negative patterns for B. For example, if ‘city’, having an instance ‘Boston’ and a pattern ‘mayor of arg1’, is mutually exclusive with ‘scientist’, then ‘Boston’ and ‘mayor of arg1’ will become a negative instance and a negative pattern respectively for ‘scientist.’ Further, Some categories are declared to be a subset of another category. For e.g., ‘athlete’ is a subset of ‘person’. === Relation argument type-checking === This is a type checking information used to couple the learning of relations and categories. For example, the arguments of the ‘ceoOf’ relation are declared to be of the categories ‘person’ and ‘company’. CPL does not promote a pair of noun phrases as an instance of a relation unless the two noun phrases are classified as belonging to the correct argument types. === Algorithm description === Following is a quick summary of the CPL algorithm. Input: An ontology O, and a text corpus C Output: Trusted instances/patterns for each predicate for i=1,2,...,∞ do foreach predicate p in O do EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances; FILTER candidates that violate coupling; RANK candidate instances/patterns; PROMOTE top candidates; end end ==== Inputs ==== A large corpus of Part-Of-Speech tagged sentences and an initial ontology with predefined categories, relations, mutually exclusive relationships between same-arity predicates, subset relationships between some categories, seed instances for all predicates, and seed patterns for the categories. ==== Candidate extraction ==== CPL finds new candidate instances by using newly promoted patterns to extract the noun phrases that co-occur with those patterns in the text corpus. CPL extracts, Category Instances Category Patterns Relation Instances Relation Patterns ==== Candidate filtering ==== Candidate instances and patterns are filtered to maintain high precision, and to avoid extremely specific patterns. An instance is only considered for assessment if it co-occurs with at least two promoted patterns in the text corpus, and if its co-occurrence count with all promoted patterns is at least three times greater than its co-occurrence count with negative patterns. ==== Candidate ranking ==== CPL ranks candidate instances using the number of promoted patterns that they co-occur with so that candidates that occur with more patterns are ranked higher. Patterns are ranked using an estimate of the precision of each pattern. ==== Candidate promotion ==== CPL ranks the candidates according to their assessment scores and promotes at most 100 instances and 5 patterns for each predicate. Instances and patterns are only promoted if they co-occur with at least two promoted patterns or instances, respectively. == Meta-Bootstrap Learner == Meta-Bootstrap Learner (MBL) was also proposed by the authors of CPL. Meta-Bootstrap learner couples the training of multiple extraction techniques with a multi-view constraint, which requires the extractors to agree. It makes addition of coupling constraints on top of existing extraction algorithms, while treating them as black boxes, feasible. MBL assumes that the errors made by different extraction techniques are independent. Following is a quick summary of MBL. Input: An ontology O, a set of extractors ε Output: Trusted instances for each predicate for i=1,2,...,∞ do foreach predicate p in O do foreach extractor e in ε do Extract new candidates for p using e with recently promoted instances; end FILTER candidates that violate mutual-exclusion or type-checking constraints; PROMOTE candidates that were extracted by all extractors; end end Subordinate algorithms used with MBL do not promote any instance on their own, they report the evidence about each candidate to MBL and MBL is responsible for promoting instances. == Applications == In their paper authors have presented results showing the potential of CPL to contribute new facts to existing repository of semantic knowledge, Freebase

    Read more →
  • Action model learning

    Action model learning

    Action model learning (sometimes abbreviated action learning) is an area of machine learning concerned with the creation and modification of a software agent's knowledge about the effects and preconditions of the actions that can be executed within its environment. This knowledge is usually represented in a logic-based action description language and used as input for automated planners. Learning action models is important when goals change. When an agent acted for a while, it can use its accumulated knowledge about actions in the domain to make better decisions. Thus, learning action models differs from reinforcement learning. It enables reasoning about actions instead of expensive trials in the world. Action model learning is a form of inductive reasoning, where new knowledge is generated based on the agent's observations. The usual motivation for action model learning is the fact that manual specification of action models for planners is often a difficult, time-consuming, and error-prone task (especially in complex environments). == Action models == Given a training set E {\displaystyle E} consisting of examples e = ( s , a , s ′ ) {\displaystyle e=(s,a,s')} , where s , s ′ {\displaystyle s,s'} are observations of a world state from two consecutive time steps t , t ′ {\displaystyle t,t'} and a {\displaystyle a} is an action instance observed in time step t {\displaystyle t} , the goal of action model learning in general is to construct an action model ⟨ D , P ⟩ {\displaystyle \langle D,P\rangle } , where D {\displaystyle D} is a description of domain dynamics in action description formalism like STRIPS, ADL or PDDL and P {\displaystyle P} is a probability function defined over the elements of D {\displaystyle D} . However, many state of the art action learning methods assume determinism and do not induce P {\displaystyle P} . In addition to determinism, individual methods differ in how they deal with other attributes of domain (e.g. partial observability or sensoric noise). == Action learning methods == === State of the art === Recent action learning methods take various approaches and employ a wide variety of tools from different areas of artificial intelligence and computational logic. As an example of a method based on propositional logic, we can mention SLAF (Simultaneous Learning and Filtering) algorithm, which uses agent's observations to construct a long propositional formula over time and subsequently interprets it using a satisfiability (SAT) solver. Another technique, in which learning is converted into a satisfiability problem (weighted MAX-SAT in this case) and SAT solvers are used, is implemented in ARMS (Action-Relation Modeling System). Two mutually similar, fully declarative approaches to action learning were based on logic programming paradigm Answer Set Programming (ASP) and its extension, Reactive ASP. In another example, bottom-up inductive logic programming approach was employed. Several different solutions are not directly logic-based. For example, the action model learning using a perceptron algorithm or the multi level greedy search over the space of possible action models. In the older paper from 1992, the action model learning was studied as an extension of reinforcement learning. Nonetheless, further algorithms can be found that operate under different assumptions: FAMA can work even when some observations are missing, and it produces a general (lifted) planning model. It treats learning an action model like a planning problem, making sure the learned model matches the observations given. NOLAM can learn general action models even from noisy or imperfect data. LOCM focuses only on the order of actions in the data, ignoring any details about the states between those actions. The family of safe action model (SAM) learning methods create models that guarantee any plans made with them will actually work in the real world. There's also an extension called N-SAM that can learn action models with numeric conditions and effects. Additionally, numeric action models like N-SAM can be used to improve reinforcement learning (RL) performance through the RAMP algorithm. === Literature === Most action learning research papers are published in journals and conferences focused on artificial intelligence in general (e.g. Journal of Artificial Intelligence Research (JAIR), Artificial Intelligence, Applied Artificial Intelligence (AAI) or AAAI conferences). Despite mutual relevance of the topics, action model learning is usually not addressed in planning conferences like the International Conference on Automated Planning and Scheduling (ICAPS).

    Read more →
  • Artificial intelligence in spirituality

    Artificial intelligence in spirituality

    Some users of artificial intelligence (AI) technologies, especially chatbots, may develop beliefs that AI has or can attain supernatural or spiritual powers. AI models such as ChatGPT are turned to for fortune telling, mysticism and remote viewing. Recent and sudden advances in large language models have led to folk myths about their origin or capabilities, as well as their deification or worship by some users. Tucker Carlson has made similar claims, including directly to Sam Altman. Pope Leo XIV advised priests against using LLM models when it came to the creation of sermons.

    Read more →
  • FreshBooks

    FreshBooks

    FreshBooks is accounting software operated by 2ndSite Inc. primarily for small and medium-sized businesses. It is a web-based software as a service (SaaS) model, that can be accessed through a desktop or mobile device. The company was founded in 2003 and is based in Toronto, Canada. == History == FreshBooks was founded in 2004 by Mike McDerment, Levi Cooperman, and Joe Sawada in Toronto, Ontario. McDerment incorporated a second company, BillSpring in January 2015 to work on new product development. It was rolled back into FreshBooks as an updated interface in 2016. Initially FreshBooks functioned like an electronic invoicing program targeting IT professionals. After the release of the new interface, the initial release of FreshBooks was referred to as "FreshBooks Classic." FreshBooks Classic was discontinued in 2022 after migrating users to the new platform. FreshBooks Classic's front-end application was built in PHP, and the backend services were built in Python while the new FreshBooks uses the same backend services with a JavaScript single-page application. == Product == FreshBooks is a subscription-based accounting software platform that provides features such as invoicing, accounts payable, expense and time tracking, retainers, fixed asset depreciation, purchase orders, payroll integrations, mileage tracking, double-entry accounting, and standard business reporting. Financial data is stored in the cloud on a unified ledger, enabling access from desktop and mobile devices. The platform includes a free API for integration with external applications and supports multiple tax rates and currencies. It also offers project management and payroll functionalities. Pricing is based on a recurring monthly fee. FreshBooks supports country-specific tax calculations, including GST and HST in Canada, sales taxes in the United States, and MTD compliance in the UK. == Operations == FreshBooks has its headquarters in Toronto, Canada with operations in North America, Europe and Australia. Founder Mike McDerment was the chief executive officer of the company from 2003 until 2021, when he stepped down and was replaced by Don Epperson, but stayed as the executive chair. Don Epperson had previously joined FreshBooks as executive director in 2019. == Funding == FreshBooks was initially self-funded. In 2014, the company raised a Series A venture investment of $30 million led by the venture capital firm Oak Investment Partners, with participation by Georgian Partners and Atlas Venture. In 2017, FreshBooks announced that it raised another $43 million in funding from Accomplice, Georgian Partners and Oak Investment Partners. On August 10, 2021, FreshBooks announced that it had secured $80.75 million in Series E funding and $50 million in debt financing. FreshBooks also reached a valuation of more than $1 billion.

    Read more →
  • Automation in construction

    Automation in construction

    Automation in construction is the combination of methods, processes, and systems that allow for greater machine autonomy in construction activities. Construction automation may have multiple goals, including but not limited to, reducing jobsite injuries, decreasing activity completion times, and assisting with quality control and quality assurance. Some systems may be fielded as a direct response to increasing skilled labor shortages in some countries. Opponents claim that increased automation may lead to less construction jobs and that software leaves heavy equipment vulnerable to hackers. Research insights on this subject are today published in several journals such as Automation in Construction by Elsevier. == Uses of automation in construction == Equipment control and management: Automation can be used to control and monitor construction equipment, such as cranes, excavators, and bulldozers. Material handling: Automated systems can be used to handle, transport, and place materials such as concrete, bricks, and stones. Surveying: Automated survey equipment and drones can be used to collect and analyze data on construction sites. Quality control: Automated systems can be used to monitor and control the quality of materials and construction processes. Safety management: Automated systems can be used to monitor and control safety conditions on construction sites. Scheduling and planning: Automated systems can be used to manage schedules, resources, and costs. Waste management: Automated systems can be used to manage and dispose of waste materials generated during construction. 3D printing: Automated 3D printing can be used to create prototypes, models, and even full-scale building components. == Autonomous heavy equipment == Advances in sensors, machine learning, and autonomous vehicle technology have led to the development of self-operating construction equipment and retrofit systems designed to automate excavators, bulldozers, tracked loaders, skid steer loaders, and haul trucks, allowing them to perform tasks with limited human supervision. Since 2017, tech companies have developed autonomous or semi-autonomous retrofit kits that can be installed on existing construction machinery. Examples include Bedrock Robotics, Built Robotics, and SafeAI, which develop sensor and software systems that enable excavators and other earthmoving machines to operate with varying degrees of autonomy. Major equipment manufacturers have also introduced autonomous capabilities: Caterpillar and John Deere have developed autonomous or semi-autonomous systems for construction and mining equipment, including haul trucks and earthmoving machines. == Transportation сonstruction == Kratos Defense & Security Solutions fielded the world’s first Autonomous Truck-Mounted Attenuator (ATMA) in 2017, in conjunction with Royal Truck & Equipment. == Benefits of automation in construction == The use of automation in construction has become increasingly prevalent in recent years due to its numerous benefits. Automation in construction refers to the use of machinery, software, and other technologies to perform tasks that were previously done manually by workers. One of the most significant benefits of automation in construction is increased productivity. Automation can help speed up construction processes, reduce project completion times, and improve overall efficiency. For example, using automated machinery for tasks such as concrete pouring, bricklaying, and welding can significantly increase the speed and accuracy of these tasks, allowing for more work to be completed in a shorter amount of time. Another benefit of automation in construction is improved safety. By automating tasks that are hazardous to workers, such as demolition or working at height, companies can reduce the risk of accidents and injuries on site. Automation can also help to reduce worker fatigue, which can be a significant factor in accidents and mistakes. Overall, the use of automation in construction can improve productivity, reduce costs, increase safety, and improve the quality of construction projects. As technology continues to advance, the use of automation is likely to become even more prevalent in the construction industry.

    Read more →
  • Histogram of oriented displacements

    Histogram of oriented displacements

    Histogram of oriented displacements (HOD) is a 2D trajectory descriptor. The trajectory is described using a histogram of the directions between each two consecutive points. Given a trajectory T = {P1, P2, P3, ..., Pn}, where Pt is the 2D position at time t. For each pair of positions Pt and Pt+1, calculate the direction angle θ(t, t+1). Value of θ is between 0 and 360. A histogram of the quantized values of θ is created. If the histogram is of 8 bins, the first bin represents all θs between 0 and 45. The histogram accumulates the lengths of the consecutive moves. For each θ, a specific histogram bin is determined. The length of the line between Pt and Pt+1 is then added to the specific histogram bin. To show the intuition behind the descriptor, consider the action of waving hands. At the end of the action, the hand falls down. When describing this down movement, the descriptor does not care about the position from which the hand started to fall. This fall will affect the histogram with the appropriate angles and lengths, regardless of the position where the hand started to fall. HOD records for each moving point: how much it moves in each range of directions. HOD has a clear physical interpretation. It proposes that, a simple way to describe the motion of an object, is to indicate how much distance it moves in each direction. If the movement in all directions are saved accurately, the movement can be repeated from the initial position to the final destination regardless of the displacements order. However, the temporal information will be lost, as the order of movements is not stored-this is what we solve by applying the temporal pyramid, as shown in section \ref{sec:temp-pyramid}. If the angles quantization range is small, classifiers that use the descriptor will overfit. Generalization needs some slack in directions-which can be done by increasing the quantization range.

    Read more →