Kernel method

Kernel method

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing. Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick". Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. Algorithms capable of operating with kernels include the kernel perceptron, support-vector machines (SVM), Gaussian processes, principal components analysis (PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters and many others. Most kernel algorithms are based on convex optimization or eigenproblems and are statistically well-founded. Typically, their statistical properties are analyzed using statistical learning theory (for example, using Rademacher complexity). == Motivation and informal explanation == Kernel methods can be thought of as instance-based learners: rather than learning some fixed set of parameters corresponding to the features of their inputs, they instead "remember" the i {\displaystyle i} -th training example ( x i , y i ) {\displaystyle (\mathbf {x} _{i},y_{i})} and learn for it a corresponding weight w i {\displaystyle w_{i}} . Prediction for unlabeled inputs, i.e., those not in the training set, are treated by the application of a similarity function k {\displaystyle k} , called a kernel, between the unlabeled input x ′ {\displaystyle \mathbf {x'} } and each of the training inputs x i {\displaystyle \mathbf {x} _{i}} . For instance, a kernelized binary classifier typically computes a weighted sum of similarities y ^ = sgn ⁡ ∑ i = 1 n w i y i k ( x i , x ′ ) , {\displaystyle {\hat {y}}=\operatorname {sgn} \sum _{i=1}^{n}w_{i}y_{i}k(\mathbf {x} _{i},\mathbf {x'} ),} where y ^ ∈ { − 1 , + 1 } {\displaystyle {\hat {y}}\in \{-1,+1\}} is the kernelized binary classifier's predicted label for the unlabeled input x ′ {\displaystyle \mathbf {x'} } whose hidden true label y {\displaystyle y} is of interest; k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is the kernel function that measures similarity between any pair of inputs x , x ′ ∈ X {\displaystyle \mathbf {x} ,\mathbf {x'} \in {\mathcal {X}}} ; the sum ranges over the n labeled examples { ( x i , y i ) } i = 1 n {\displaystyle \{(\mathbf {x} _{i},y_{i})\}_{i=1}^{n}} in the classifier's training set, with y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} ; the w i ∈ R {\displaystyle w_{i}\in \mathbb {R} } are the weights for the training examples, as determined by the learning algorithm; the sign function sgn {\displaystyle \operatorname {sgn} } determines whether the predicted classification y ^ {\displaystyle {\hat {y}}} comes out positive or negative. Kernel classifiers were described as early as the 1960s, with the invention of the kernel perceptron. They rose to great prominence with the popularity of the support-vector machine (SVM) in the 1990s, when the SVM was found to be competitive with neural networks on tasks such as handwriting recognition. == Mathematics: the kernel trick == The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all x {\displaystyle \mathbf {x} } and x ′ {\displaystyle \mathbf {x'} } in the input space X {\displaystyle {\mathcal {X}}} , certain functions k ( x , x ′ ) {\displaystyle k(\mathbf {x} ,\mathbf {x'} )} can be expressed as an inner product in another space V {\displaystyle {\mathcal {V}}} . The function k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is often referred to as a kernel or a kernel function. The word "kernel" is used in mathematics to denote a weighting function for a weighted sum or integral. Certain problems in machine learning have more structure than an arbitrary weighting function k {\displaystyle k} . The computation is made much simpler if the kernel can be written in the form of a "feature map" φ : X → V {\displaystyle \varphi \colon {\mathcal {X}}\to {\mathcal {V}}} which satisfies k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ V . {\displaystyle k(\mathbf {x} ,\mathbf {x'} )=\langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle _{\mathcal {V}}.} The key restriction is that ⟨ ⋅ , ⋅ ⟩ V {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {V}}} must be a proper inner product. On the other hand, an explicit representation for φ {\displaystyle \varphi } is not necessary, as long as V {\displaystyle {\mathcal {V}}} is an inner product space. The alternative follows from Mercer's theorem: an implicitly defined function φ {\displaystyle \varphi } exists whenever the space X {\displaystyle {\mathcal {X}}} can be equipped with a suitable measure ensuring the function k {\displaystyle k} satisfies Mercer's condition. Mercer's theorem is similar to a generalization of the result from linear algebra that associates an inner product to any positive-definite matrix. In fact, Mercer's condition can be reduced to this simpler case. If we choose as our measure the counting measure μ ( T ) = | T | {\displaystyle \mu (T)=|T|} for all T ⊂ X {\displaystyle T\subset X} , which counts the number of points inside the set T {\displaystyle T} , then the integral in Mercer's theorem reduces to a summation ∑ i = 1 n ∑ j = 1 n k ( x i , x j ) c i c j ≥ 0. {\displaystyle \sum _{i=1}^{n}\sum _{j=1}^{n}k(\mathbf {x} _{i},\mathbf {x} _{j})c_{i}c_{j}\geq 0.} If this summation holds for all finite sequences of points ( x 1 , … , x n ) {\displaystyle (\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n})} in X {\displaystyle {\mathcal {X}}} and all choices of n {\displaystyle n} real-valued coefficients ( c 1 , … , c n ) {\displaystyle (c_{1},\dots ,c_{n})} (cf. positive definite kernel), then the function k {\displaystyle k} satisfies Mercer's condition. Some algorithms that depend on arbitrary relationships in the native space X {\displaystyle {\mathcal {X}}} would, in fact, have a linear interpretation in a different setting: the range space of φ {\displaystyle \varphi } . The linear interpretation gives us insight about the algorithm. Furthermore, there is often no need to compute φ {\displaystyle \varphi } directly during computation, as is the case with support-vector machines. Some cite this running time shortcut as the primary benefit. Researchers also use it to justify the meanings and properties of existing algorithms. Theoretically, a Gram matrix K ∈ R n × n {\displaystyle \mathbf {K} \in \mathbb {R} ^{n\times n}} with respect to { x 1 , … , x n } {\displaystyle \{\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n}\}} (sometimes also called a "kernel matrix"), where K i j = k ( x i , x j ) {\displaystyle K_{ij}=k(\mathbf {x} _{i},\mathbf {x} _{j})} , must be positive semi-definite (PSD). Empirically, for machine learning heuristics, choices of a function k {\displaystyle k} that do not satisfy Mercer's condition may still perform reasonably if k {\displaystyle k} at least approximates the intuitive idea of similarity. Regardless of whether k {\displaystyle k} is a Mercer kernel, k {\displaystyle k} may still be referred to as a "kernel". If the kernel function k {\displaystyle k} is also a covariance function as used in Gaussian processes, then the Gram matrix K {\displaystyle \mathbf {K} } can also be called a covariance matrix. == Applications == Application areas of kernel methods are diverse and include geostatistics, kriging, inverse distance weighting, 3D reconstruction, bioinformatics, cheminformatics, information extraction and handwriting recognition. == Popular kernels == Fisher kernel Graph kernels Kernel smoother Polynomial kernel Radial basis function kern

Visual Turing Test

The Visual Turing Test is “an operator-assisted device that produces a stochastic sequence of binary questions from a given test image”. The query engine produces a sequence of questions that have unpredictable answers given the history of questions. The test is only about vision and does not require any natural language processing. The job of the human operator is to provide the correct answer to the question or reject it as ambiguous. The query generator produces questions such that they follow a “natural story line”, similar to what humans do when they look at a picture. == History == Research in computer vision dates back to the 1960s when Seymour Papert first attempted to solve the problem. This unsuccessful attempt was referred to as the Summer Vision Project. The reason why it was not successful was because computer vision is more complicated than what people think. The complexity is in alignment with the human visual system. Roughly 50% of the human brain is devoted in processing vision, which indicates that it is a difficult problem. Later there were attempts to solve the problems with models inspired by the human brain. Perceptrons by Frank Rosenblatt, which is a form of the neural networks, was one of the first such approaches. These simple neural networks could not live up to their expectations and had certain limitations due to which they were not considered in future research. Later with the availability of the hardware and some processing power the research shifted to image processing which involves pixel-level operations, like finding edges, de-noising images or applying filters to name a few. There was some great progress in this field but the problem of vision which was to make the machines understand the images was still not being addressed. During this time the neural networks also resurfaced as it was shown that the limitations of the perceptrons can be overcome by Multi-layer perceptrons. Also in the early 1990s convolutional neural networks were born which showed great results on digit recognition but did not scale up well on harder problems. The late 1990s and early 2000s saw the birth of modern computer vision. One of the reasons this happened was due to the availability of key, feature extraction and representation algorithms. Features along with the already present machine learning algorithms were used to detect, localise and segment objects in Images. While all these advancements were being made, the community felt the need to have standardised datasets and evaluation metrics so the performances can be compared. This led to the emergence of challenges like the Pascal VOC challenge and the ImageNet challenge. The availability of standard evaluation metrics and the open challenges gave directions to the research. Better algorithms were introduced for specific tasks like object detection and classification. Visual Turing Test aims to give a new direction to the computer vision research which would lead to the introduction of systems that will be one step closer to understanding images the way humans do. == Current evaluation practices == A large number of datasets have been annotated and generalised to benchmark performances of difference classes of algorithms to assess different vision tasks (e.g., object detection/recognition) on some image domain (e.g., scene images). One of the most famous datasets in computer vision is ImageNet which is used to assess the problem of object level Image classification. ImageNet is one of the largest annotated datasets available and has over one million images. The other important vision task is object detection and localisation which refers to detecting the object instance in the image and providing the bounding box coordinates around the object instance or segmenting the object. The most popular dataset for this task is the Pascal dataset. Similarly there are other datasets for specific tasks like the H3D dataset for human pose detection, Core dataset to evaluate the quality of detected object attributes such as colour, orientation, and activity. Having these standard datasets has helped the vision community to come up with well performing algorithms for all these tasks. The next logical step is to create a larger task encompassing of these smaller subtasks. Having such a task would lead to building systems that would understand images, as understanding images would inherently involve detecting objects, localising them and segmenting them. == Details == The Visual Turing Test (VTT) unlike the Turing test has a query engine system which interrogates a computer vision system in the presence of a human co-ordinator. It is a system that generates a random sequence of binary questions specific to the test image, such that the answer to any question k is unpredictable given the true answers to the previous k − 1 questions (also known as history of questions). The test happens in the presence of a human operator who serves two main purposes: removing the ambiguous questions and providing the correct answers to the unambiguous questions. Given an Image infinite possible binary questions can be asked and a lot of them are bound to be ambiguous. These questions if generated by the query engine are removed by the human moderator and instead the query engine generates another question such that the answer to it is unpredictable given the history of the questions. The aim of the Visual Turing Test is to evaluate the Image understanding of a computer system, and an important part of image understanding is the story line of the image. When humans look at an image, they do not think that there is a car at ‘x’ pixels from the left and ‘y’ pixels from the top, but instead they look at it as a story, for e.g. they might think that there is a car parked on the road, a person is exiting the car and heading towards a building. The most important elements of the story line are the objects and so to extract any story line from an image the first and the most important task is to instantiate the objects in it, and that is what the query engine does. === Query engine === The query engine is the core of the Visual Turing Test and it comprises two main parts : Vocabulary and Questions ==== Vocabulary ==== Vocabulary is a set of words that represent the elements of the images. This vocabulary when used with appropriate grammar leads to a set of questions. The grammar is defined in the next section in a way that it leads to a space of binary questions. The vocabulary V {\displaystyle {\mathcal {V}}} consist of three components: Types of Objects T {\displaystyle {\mathcal {T}}} Type-dependent attributes of objects A ( t ) {\displaystyle {\mathcal {A}}(t)} Type-dependent relationships between two objects R ( t , t ′ ) {\displaystyle {\mathcal {R}}(t,t')} For Images of urban street scenes the types of objects include people, vehicle and buildings. Attributes refer to the properties of these objects, for e.g. female, child, wearing a hat or carrying something, for people and moving, parked, stopped, one tire visible or two tires visible for vehicles. Relationships between each pair of object classes can be either “ordered” or “unordered”. The unordered relationships may include talking, walking together and the ordered relationships include taller, closer to the camera, occluding, being occluded etc. Additionally all of this vocabulary is used in context of rectangular image regions w \in W which allow for the localisation of objects in the image. An extremely large number of such regions are possible and this complicates the problem, so for this test, regions at specific scales are only used which include 1/16 the size of image, 1/4 the size of image, 1/2 the size of image or larger. ==== Questions ==== The question space is composed of four types of questions: Existence questions: The aim of the existence questions is to find new objects in the image that have not been uniquely identified previously. They are of the form : Qexist = 'Is there an instance of an object of type t with attributes A partially visible in region w that was not previously instantiated?' Uniqueness questions: A uniqueness question tries to uniquely identify an object to instantiate it. Quniq = 'Is there a unique instance of an object of type t with attributes A partially visible in region w that was not previously instantiated?' The uniqueness questions along with the existence questions form the instantiation questions. As mentioned earlier instantiating objects leads to other interesting questions and eventually a story line. Uniqueness questions follow the existence questions and a positive answer to it leads to instantiation of an object. Attribute questions: An attribute question tries to find more about the object once it has been instantiated. Such questions can query about a single attribute, conjunction of two attributes or disjunction of two attributes. Qatt(ot) = {'Does object ot have attribute a?' , 'Does object

Layer (deep learning)

A layer in a deep learning model is a structure or network topology in the model's architecture, which takes information from the previous layers and then passes it to the next layer. == Layer types == The first type of layer is the Dense layer, also called the fully-connected layer, and is used for abstract representations of input data. In this layer, neurons connect to every neuron in the preceding layer. In multilayer perceptron networks, these layers are stacked together. The Convolutional layer is typically used for image analysis tasks. In this layer, the network detects edges, textures, and patterns. The outputs from this layer are then fed into a fully-connected layer for further processing. See also: CNN model. The Pooling layer is used to reduce the size of data input. The Recurrent layer is used for text processing with a memory function. Similar to the Convolutional layer, the output of recurrent layers are usually fed into a fully-connected layer for further processing. See also: RNN model. The Normalization layer adjusts the output data from previous layers to achieve a regular distribution. This results in improved scalability and model training. A Hidden layer is any of the layers in a Neural Network that aren't the input or output layers. == Differences with layers of the neocortex == There is an intrinsic difference between deep learning layering and neocortical layering: deep learning layering depends on network topology, while neocortical layering depends on intra-layers homogeneity.

Omar Al Olama

Omar Sultan Al Olama (Arabic: عمر سلطان العلماء; born 16 February 1990) is Minister of State for Artificial Intelligence, Digital Economy, and Remote Work Applications in the United Arab Emirates. He was appointed in October 2017 by Vice President and Prime Minister of the UAE and Ruler of Dubai, Sheikh Mohammed bin Rashid Al Maktoum. The UAE was the first country to appoint a minister for artificial intelligence. == Early life and education == Al Olama was born on 16 February 1990 in Dubai. He has a bachelor's degree in Business and Administration and Management from the American University in Dubai, and a Diploma in Excellence and Project Management from the American University in Sharjah. == Career == Between February 2012 and May 2014, Al Olama was member of the corporate planning at the UAE's Prime Minister's Office. From November 2015 to November 2016, he was Deputy Head of Minister's Office at the UAE's Prime Minister's Office. Between December 2015 and October 2017, he was Secretary General of the World Organization of Racing Drones. In November 2017, he was appointed member of the Board of Trustees of Dubai Future Foundation and Deputy Managing Director of the Foundation. In July 2016, Al Olama was appointed the managing director, and later in 2021 appointed Vice-Chair of the World Government Summit. In 2021, Al Olama was appointed as the Chairman of the Dubai Chamber of Digital Economy, a sub-section of Dubai Chamber of Commerce and Industry. During the cabinet reshuffle in 2023, Al Olama was appointed as the Director General of the Prime Minister's Office, concurrently maintaining his role as the Minister of State for Artificial Intelligence, Digital Economy and Remote Work Applications. == Memberships == In November 2017, Al Olama was appointed as a member of the Future of Digital Economy and Society Council, part of the World Economic Forum (WEF). Later in 2023, the World Economic Forum selected Al Olama to join the steering committee of the AI Governance Alliance, a group comprising 10 global leaders in the digital and technological fields. In 2019, Al Olama was appointed as Chair of the Advisory Board of the Mohamed bin Zayed University of Artificial Intelligence. In 2022, Al Olama was appointed by the UAE Cabinet as Vice-Chair of the Higher Committee for Government Digital Transformation, and also appointed by the Government of Dubai as Vice-Chair of the Higher Committee for Future Technology. In 2022, Al Olama was appointed Chairman of the oversight committee of the Dubai Future District Fund. Since 2023, Al Olama has been on the High-Level Advisory Body on Artificial Intelligence. In 2023, Al Olama, recognized as the world's first minister for artificial intelligence, was included in Time Magazine's inaugural list of the 100 most influential people in AI.

Generative AI Copyright Disclosure Act

The Generative AI Copyright Disclosure Act is a piece of legislation introduced by California Representative Adam Schiff in the United States Congress on April 9, 2024. It concerns the transparency of companies regarding their use of copyrighted work to train their generative artificial intelligence (AI) models. The legislation requires the submission of a notice regarding the identity and the uniform resource locator (URL) address of the copyrighted works used in the training data to the Register of Copyrights at least 30 days before the public release of the new or updated version of the AI model; it does not ban the use of copyrighted works for AI training. The bill's requirements would apply retroactively to prior AI models. Violation penalties would start at US$5,000. The legislation does not have a maximum penalty assessment that can be charged. The bill by Schiff was introduced a few days after The New York Times published an article regarding the business activities of major tech firms, including Google and Meta, in the training of their generative AI platforms on April 6, 2024. The legislation is supported by the Professional Photographers of America (PPA), SAG-AFTRA, the Writers Guild of America, the International Alliance of Theatrical Stage Employees (IATSE), the Recording Industry Association of America (RIAA), and others.

Lessac Technologies

Lessac Technologies, Inc. (LTI) is an American firm which develops voice synthesis software, licenses technology and sells synthesized novels as MP3 files. The firm currently has seven patents granted and three more pending for its automated methods of converting digital text into human-sounding speech, more accurately recognizing human speech and outputting the text representing the words and phrases of said speech, along with recognizing the speaker's emotional state. The LTI technology is partly based on the work of the late Arthur Lessac, a Professor of Theater at the State University of New York and the creator of Lessac Kinesensic Training, and LTI has licensed exclusive rights to exploit Arthur Lessac's copyrighted works in the fields of speech synthesis and speech recognition. Based on the view that music is speech and speech is music, Lessac's work and books focused on body and speech energies and how they go together. Arthur Lessac's textual annotation system, which was originally developed to assist actors, singers, and orators in marking up scripts to prepare for performance, is adapted in LTI's speech synthesis system as the basic representation of the speech to be synthesized (Lessemes), in contrast to many other systems which use a phonetic representation. LTI's software has two major components: (1) a linguistic front-end that converts plain text to a sequence of prosodic and phonosensory graphic symbols (Lessemes) based on Arthur Lessac's annotation system, which specify the speech units to be synthesized; (2) a signal-processing back-end that takes the Lessemes as acoustic data and produces human-sounding synthesized speech as output, using unit selection and concatenation. LTI's text-to-speech system came in second in the world-wide Blizzard Challenge 2011 and 2012. The first-place team in 2011 also employed LTI's "front-end" technology, but with its own back-end. The Blizzard Challenge, conducted by the Language Technologies Institute of Carnegie Mellon University, was devised as a way to evaluate speech synthesis techniques by having different research groups build voices from the same voice-actor recordings, and comparing the results through listening tests. LTI was founded in 2000 by H. Donald Wilson (chairman), a lawyer, LexisNexis entrepreneur and business associate of Arthur Lessac; and Gary A. Marple (chief inventor), after Marple suggested that Arthur Lessac's kinesensic voice training might be applicable to computational linguistics. After Wilson's death in 2006, his nephew John Reichenbach became the firm's CEO.

Lisp machine

Lisp machines are general-purpose computers designed to efficiently run Lisp as their main software and programming language, usually via hardware support. They are an example of a high-level language computer architecture. In a sense, they were the first commercial single-user workstations. Despite being modest in number (perhaps 7,000 units total as of 1988) Lisp machines commercially pioneered some now-commonplace technologies, including networking innovations such as Chaosnet, and effective garbage collection. Several firms built and sold Lisp machines in the 1980s: Symbolics (3600, 3640, XL1200, MacIvory, and other models), Lisp Machines Incorporated (LMI Lambda), Texas Instruments (Explorer, MicroExplorer), and Xerox (Interlisp-D workstations). The operating systems were written in Lisp Machine Lisp, Interlisp (Xerox), and later partly in Common Lisp. == History == === Historical context === Artificial intelligence (AI) computer programs of the 1960s and 1970s intrinsically required what was then considered a huge amount of computer power, as measured in processor time and memory space. The power requirements of AI research were exacerbated by the Lisp symbolic programming language, when commercial hardware was designed and optimized for assembly- and Fortran-like programming languages. At first, the cost of such computer hardware meant that it had to be shared among many users. As integrated circuit technology shrank the size and cost of computers in the 1960s and early 1970s, and the memory needs of AI programs began to exceed the address space of the most common research computer, the Digital Equipment Corporation (DEC) PDP-10, researchers considered a new approach: a computer designed specifically to develop and run large artificial intelligence programs, and tailored to the semantics of the Lisp language. To provide consistent performance for interactive programs, these machines would often not be shared, but would be dedicated to a single user at a time. === Initial development === In 1973, Richard Greenblatt and Thomas Knight, programmers at Massachusetts Institute of Technology (MIT) Artificial Intelligence Laboratory (AI Lab), began what would become the MIT Lisp Machine Project when they first began building a computer hardwired to run certain basic Lisp operations, rather than run them in software, in a 24-bit tagged architecture. The machine also did incremental (or Arena) garbage collection. More specifically, since Lisp variables are typed at runtime rather than compile time, a simple addition of two variables could take five times as long on conventional hardware, due to test and branch instructions. Lisp Machines ran the tests in parallel with the more conventional single instruction additions. If the simultaneous tests failed, then the result was discarded and recomputed; this meant in many cases a speed increase by several factors. This simultaneous checking approach was used as well in testing the bounds of arrays when referenced, and other memory management necessities (not merely garbage collection or arrays). Type checking was further improved and automated when the conventional byte word of 32 bits was lengthened to 36 bits for Symbolics 3600-model Lisp machines and eventually to 40 bits or more (usually, the excess bits not accounted for by the following were used for error-correcting codes). The first group of extra bits were used to hold type data, making the machine a tagged architecture, and the remaining bits were used to implement compressed data representation (CDR) coding (wherein the usual linked list elements are compressed to occupy roughly half the space), aiding garbage collection by reportedly an order of magnitude. A further improvement was two microcode instructions which specifically supported Lisp functions, reducing the cost of calling a function to as little as 20 clock cycles, in some Symbolics implementations. The first machine was called the CONS machine (named after the list construction operator cons in Lisp). Often it was affectionately referred to as the Knight machine, perhaps since Knight wrote his master's thesis on the subject; it was extremely well received. It was subsequently improved into a version called CADR (a pun; in Lisp, the cadr function, which returns the second item of a list, is pronounced /ˈkeɪ.dəɹ/ or /ˈkɑ.dəɹ/, as some pronounce the word "cadre") which was based on essentially the same architecture. About 25 of what were essentially prototype CADRs were sold within and without MIT for ~$50,000; it quickly became the favorite machine for hacking – many of the most favored software tools were quickly ported to it (e.g. Emacs was ported from ITS in 1975). It was so well received at an AI conference held at MIT in 1978 that Defense Advanced Research Projects Agency (DARPA) began funding its development. === Commercializing MIT Lisp machine technology === In 1979, Russell Noftsker, being convinced that Lisp machines had a bright commercial future due to the strength of the Lisp language and the enabling factor of hardware acceleration, proposed to Greenblatt that they commercialize the technology. In a counter-intuitive move for an AI Lab hacker, Greenblatt acquiesced, hoping perhaps that he could recreate the informal and productive atmosphere of the Lab in a real business. These ideas and goals were considerably different from those of Noftsker. The two negotiated at length, but neither would compromise. As the proposed firm could succeed only with the full and undivided assistance of the AI Lab hackers as a group, Noftsker and Greenblatt decided that the fate of the enterprise was up to them, and so the choice should be left to the hackers. The ensuing discussions of the choice divided the lab into two factions. In February 1979, matters came to a head. The hackers sided with Noftsker, believing that a commercial venture-fund-backed firm had a better chance of surviving and commercializing Lisp machines than Greenblatt's proposed self-sustaining start-up. Greenblatt lost the battle. It was at this juncture that Symbolics, Noftsker's enterprise, slowly came together. While Noftsker was paying his staff a salary, he had no building or any equipment for the hackers to work on. He bargained with Patrick Winston that, in exchange for allowing Symbolics' staff to keep working out of MIT, Symbolics would let MIT use internally and freely all the software Symbolics developed. A consultant from CDC, who was trying to put together a natural language computer application with a group of West-coast programmers, came to Greenblatt, seeking a Lisp machine for his group to work with, about eight months after the disastrous conference with Noftsker. Greenblatt had decided to start his own rival Lisp machine firm, but he had done nothing. The consultant, Alexander Jacobson, decided that the only way Greenblatt was going to start the firm and build the Lisp machines that Jacobson desperately needed was if Jacobson pushed and otherwise helped Greenblatt launch the firm. Jacobson pulled together business plans, a board, a partner for Greenblatt (one F. Stephen Wyle). The newfound firm was named LISP Machine, Inc. (LMI), and was funded by CDC orders, via Jacobson. Around this time Symbolics (Noftsker's firm) began operating. It had been hindered by Noftsker's promise to give Greenblatt a year's head start, and by severe delays in procuring venture capital. Symbolics still had the major advantage that while 3 or 4 of the AI Lab hackers had gone to work for Greenblatt, 14 other hackers had signed onto Symbolics. Two AI Lab people were not hired by either: Richard Stallman and Marvin Minsky. Stallman, however, blamed Symbolics for the decline of the hacker community that had centered around the AI lab. For two years, from 1982 to the end of 1983, Stallman worked by himself to clone the output of the Symbolics programmers, with the aim of preventing them from gaining a monopoly on the lab's computers. Regardless, after a series of internal battles, Symbolics did get off the ground in 1980/1981, selling the CADR as the LM-2, while Lisp Machines, Inc. sold it as the LMI-CADR. Symbolics did not intend to produce many LM-2s, since the 3600 family of Lisp machines was supposed to ship quickly, but the 3600s were repeatedly delayed, and Symbolics ended up producing ~100 LM-2s, each of which sold for $70,000. Both firms developed second-generation products based on the CADR: the Symbolics 3600 and the LMI-LAMBDA (of which LMI managed to sell ~200). The 3600, which shipped a year late, expanded on the CADR by widening the machine word to 36-bits, expanding the address space to 28-bits, and adding hardware to accelerate certain common functions that were implemented in microcode on the CADR. The LMI-LAMBDA, which came out a year after the 3600, in 1983, was compatible with the CADR (it could run CADR microcode), but hardware differences existed. Texas Instruments (TI) joined the fray whe