Alex Krizhevsky

Alex Krizhevsky

Alex Krizhevsky is a Canadian computer scientist most noted for his work on artificial neural networks and deep learning. In 2012, Krizhevsky, Ilya Sutskever and their PhD advisor Geoffrey Hinton, at the University of Toronto, developed a powerful visual-recognition network AlexNet using only two GeForce-branded GPU cards. This revolutionized research in neural networks. Previously neural networks were trained on CPUs. The transition to GPUs opened the way to the development of advanced AI models. == AlexNet == Motivated by Sutskever and inspired by Hinton, Krizhevsky developed AlexNet to expand the limits in image recognition and classification. Building on Convolutional Neural Networks and Sutskever’s Deep Neural Network approach of deepening the neural layers far beyond the convention of the time—as well as adding Dropout for training resilience—AlexNet won the ImageNet challenge in 2012. The team presented their paper for AlexNet at NeurIPS (NIPS) 2012. Shortly after AlexNet’s debut, Krizhevsky and Sutskever sold their startup, DNN Research Inc., to Google. Krizhevsky left Google in September 2017 after losing interest in the work, to work at the company Dessa in support of new deep-learning techniques. Many of his numerous papers on machine learning and computer vision are frequently cited by other researchers. He is also the main author of the CIFAR-10 and CIFAR-100 datasets. == Legacy == AlexNet is widely credited with igniting the deep learning revolution. Its success demonstrated the effectiveness of deep neural networks trained on GPUs, leading to rapid progress across multiple domains of artificial intelligence beyond computer vision. The techniques and momentum generated by AlexNet helped shape the development of modern natural language processing models, including large-scale transformer-based models such as BERT and GPT, which power tools like ChatGPT.

AI Mode

AI Mode is a search feature used within Google Search. In March 2025, Google introduced an experimental "AI Mode" within its search platform, enabling users to input complex, multi-part queries and receive comprehensive, AI-generated responses. This feature uses Google's Gemini model, which enhances the system's reasoning capabilities and supports multimodal inputs, including text, images, and voice. Users need to be signed in to be able to use the image generation features. Initially, AI Mode was available to Google One AI Premium subscribers in the United States, who could access it through the Search Labs platform. This phased rollout allowed Google to gather user feedback and refine the feature before a broader release.

Apache OpenNLP

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution. These tasks are usually required to build more advanced text processing services.

Geometric hashing

In computer science, geometric hashing is a method for efficiently finding two-dimensional objects represented by discrete points that have undergone an affine transformation, though extensions exist to other object representations and transformations. In an off-line step, the objects are encoded by treating each pair of points as a geometric basis. The remaining points can be represented in an invariant fashion with respect to this basis using two parameters. For each point, its quantized transformed coordinates are stored in the hash table as a key, and indices of the basis points as a value. Then a new pair of basis points is selected, and the process is repeated. In the on-line (recognition) step, randomly selected pairs of data points are considered as candidate bases. For each candidate basis, the remaining data points are encoded according to the basis and possible correspondences from the object are found in the previously constructed table. The candidate basis is accepted if a sufficiently large number of the data points index a consistent object basis. Geometric hashing was originally suggested in computer vision for object recognition in 2D and 3D, but later was applied to different problems such as structural alignment of proteins. == Geometric hashing in computer vision == Geometric hashing is a method used for object recognition. Let’s say that we want to check if a model image can be seen in an input image. This can be accomplished with geometric hashing. The method could be used to recognize one of the multiple objects in a base, in this case the hash table should store not only the pose information but also the index of object model in the base. === Example === For simplicity, this example will not use too many point features and assume that their descriptors are given by their coordinates only (in practice local descriptors such as SIFT could be used for indexing). ==== Training Phase ==== Find the model's feature points. Assume that 5 feature points are found in the model image with the coordinates ( 12 , 17 ) ; {\displaystyle (12,17);} ( 45 , 13 ) ; {\displaystyle (45,13);} ( 40 , 46 ) ; {\displaystyle (40,46);} ( 20 , 35 ) ; {\displaystyle (20,35);} ( 35 , 25 ) {\displaystyle (35,25)} , see the picture. Introduce a basis to describe the locations of the feature points. For 2D space and similarity transformation the basis is defined by a pair of points. The point of origin is placed in the middle of the segment connecting the two points (P2, P4 in our example), the x ′ {\displaystyle x'} axis is directed towards one of them, the y ′ {\displaystyle y'} is orthogonal and goes through the origin. The scale is selected such that absolute value of x ′ {\displaystyle x'} for both basis points is 1. Describe feature locations with respect to that basis, i.e. compute the projections to the new coordinate axes. The coordinates should be discretised to make recognition robust to noise, we take the bin size 0.25. We thus get the coordinates ( − 0.75 , − 1.25 ) ; {\displaystyle (-0.75,-1.25);} ( 1.00 , 0.00 ) ; {\displaystyle (1.00,0.00);} ( − 0.50 , 1.25 ) ; {\displaystyle (-0.50,1.25);} ( − 1.00 , 0.00 ) ; {\displaystyle (-1.00,0.00);} ( 0.00 , 0.25 ) {\displaystyle (0.00,0.25)} Store the basis in a hash table indexed by the features (only transformed coordinates in this case). If there were more objects to match with, we should also store the object number along with the basis pair. Repeat the process for a different basis pair (Step 2). It is needed to handle occlusions. Ideally, all the non-colinear pairs should be enumerated. We provide the hash table after two iterations, the pair (P1, P3) is selected for the second one. Hash Table: Most hash tables cannot have identical keys mapped to different values. So in real life one won’t encode basis keys (1.0, 0.0) and (-1.0, 0.0) in a hash table. ==== Recognition Phase ==== Find interesting feature points in the input image. Choose an arbitrary basis. If there isn't a suitable arbitrary basis, then it is likely that the input image does not contain the target object. Describe coordinates of the feature points in the new basis. Quantize obtained coordinates as it was done before. Compare all the transformed point features in the input image with the hash table. If the point features are identical or similar, then increase the count for the corresponding basis (and the type of object, if any). For each basis such that the count exceeds a certain threshold, verify the hypothesis that it corresponds to an image basis chosen in Step 2. Transfer the image coordinate system to the model one (for the supposed object) and try to match them. If successful, the object is found. Otherwise, go back to Step 2. === Finding mirrored pattern === It seems that this method is only capable of handling scaling, translation, and rotation. However, the input image may contain the object in mirror transform. Therefore, geometric hashing should be able to find the object, too. There are two ways to detect mirrored objects. For the vector graph, make the left side positive, and the right side negative. Multiplying the x position by -1 will give the same result. Use 3 points for the basis. This allows detecting mirror images (or objects). Actually, using 3 points for the basis is another approach for geometric hashing. === Geometric hashing in higher-dimensions === Similar to the example above, hashing applies to higher-dimensional data. For three-dimensional data points, three points are also needed for the basis. The first two points define the x-axis, and the third point defines the y-axis (with the first point). The z-axis is perpendicular to the created axis using the right-hand rule. Notice that the order of the points affects the resulting basis

Recursive transition network

A recursive transition network ("RTN") is a graph theoretical schematic used to represent the rules of a context-free grammar. RTNs have application to programming languages, natural language and lexical analysis. Any sentence that is constructed according to the rules of an RTN is said to be "well-formed". The structural elements of a well-formed sentence may also be well-formed sentences by themselves, or they may be simpler structures. This is why RTNs are described as recursive. == Notes and references ==

PNGOUT

PNGOUT is a freeware command line optimizer for PNG images written by Ken Silverman. The transformation is lossless, meaning that the resulting image is visually identical to the source image. According to its author, this program can often get higher compression than other optimizers by 5–10%. It is possible to compress some inflated PNGs to a size below 1% of the original file. PNGOUT was also available as a plug-in for the freeware image viewer IrfanView and can be enabled as an option when saving files. It allows editing of various PNGOUT settings via a dialog box. PNGOUT integration was removed in IrfanView version 4.58 in favour of OptiPNG. In 2006, a commercial version of PNGOUT with a graphical user interface, known as PNGOUTWin, was released by Ardfry Imaging, a small company Silverman co-founded in 2005. There is also a freeware GUI frontend to PNGOUT available, known as PNGGauntlet. == Main operation == The main function of PNGOUT is to reduce the size of image data contained in the IDAT chunk. This chunk is compressed using the deflate algorithm. Deflate algorithms can vary in speed and compression ratio, with higher compression ratios generally implying lower speed. Ken Silverman wrote a deflate compressor for PNGOUT that is slower than the ones used in most graphics software, but produces smaller files. PNGOUT also performs automatic bit depth, color, and palette reduction where appropriate.

Brave Leo

Brave Leo is a large language model-based chatbot developed by Brave Software and included with the Brave browser. == History == In November 2023, the company said versions for iOS and Android would be available "in the coming months". == Features == Since January 2024, Leo has used the open-source Mixtral 8x7B from Mistral AI as its default large language model, in addition to LLaMA 2 from Meta Platforms and Claude from Anthropic, both of which have been used previously. Leo can suggest follow-up questions, and summarize webpages, PDFs, and videos. Leo has a $15 (US) per month premium version that enables more requests and uses larger LLMs. == Privacy == The answers given by Leo are not saved. Brave uses the slogan Love Privacy to emphasize its focus on user privacy and data protection. The phrase has been featured in Brave's official marketing campaigns and has been cited in media coverage of the browser's privacy-first approach. == Controversies == In 2023, PC World reported that Leo evades questions about US elections.