Elon Reeve Musk ( EE-lon; born June 28, 1971) is a businessman and former public official known for his leadership of Tesla and SpaceX. Musk has been the wealthiest person in the world since 2025; as of June 2026, Forbes estimates his net worth to be US$834 billion. Born into the wealthy Musk family in Pretoria, South Africa, Musk emigrated in 1989 to Canada; he has Canadian citizenship since his mother was born there. He received bachelor's degrees in 1997 from the University of Pennsylvania before moving to California to pursue business ventures. In 1995, Musk co-founded the software company Zip2. Following its sale in 1999, he co-founded X.com, an online payment company that later merged to form PayPal, which was acquired by eBay in 2002. Musk also became an American citizen in 2002. In 2002, Musk founded the space technology company SpaceX, becoming its CEO and chief engineer; the company has since led innovations in reusable rockets and commercial spaceflight. Musk joined the automaker Tesla as an early investor in 2004 and became its CEO and product architect in 2008; it has since become a leader in electric vehicles. In 2015, he co-founded OpenAI to advance artificial intelligence (AI) research, but later left; growing discontent with the organization's direction and leadership in the AI boom in the 2020s led him to establish xAI, which became a subsidiary of SpaceX in 2026. In 2022, he acquired the social network Twitter, implementing significant changes, and rebranding it as X in 2023. His other businesses include the neurotechnology company Neuralink, which he co-founded in 2016, and the tunneling company the Boring Company, which he founded in 2017. In November 2025, Tesla approved a pay package worth $1 trillion for Musk, which he is to receive over 10 years if he meets specific goals. Musk is a supporter of global far-right politics, figures, and political parties. He was the largest donor in the 2024 U.S. presidential election, where he supported Donald Trump. After Trump was inaugurated as president in January 2025, Musk served as Senior Advisor to the President and as the de facto head of the Department of Government Efficiency (DOGE). Shortly before a public feud with Trump, Musk left the Trump administration in May 2025 and returned to managing his companies. Musk's political activities, statements and views have made him a polarizing figure. He has been criticized for making unscientific and misleading statements, including spreading COVID-19 misinformation, promoting conspiracy theories, and affirming antisemitic, racist, and transphobic comments. His acquisition of Twitter was controversial due to a subsequent increase in hate speech and the spread of misinformation on the service, following his pledge to decrease censorship. His role in the second Trump administration attracted public backlash, particularly in response to DOGE. == Early life and education == Elon Reeve Musk was born on June 28, 1971, in Pretoria, South Africa's administrative capital. He is of British and Pennsylvania Dutch ancestry. His mother, Maye (née Haldeman), is a model and dietitian born in Saskatchewan, Canada, and raised in South Africa. Musk therefore holds both South African and Canadian citizenship from birth. His father, Errol Musk, is a South African electromechanical engineer, pilot, sailor, consultant, emerald dealer, and property developer, who partly owned a rental lodge at Timbavati Private Nature Reserve. His maternal grandfather, Joshua N. Haldeman, who died in a plane crash when Elon was a toddler, was an American-born Canadian chiropractor, aviator and political activist in the Technocracy movement who moved to South Africa in 1950. Haldeman's anti-government, anti-democratic and conspiracist views, which included the promotion of far-right antisemitic conspiracy theories, "fanatical" support of apartheid, and according to Errol Musk, support of Nazism, have been suggested as an influence on Elon. During his childhood, Elon was told stories by his grandmother of Haldeman's travels and exploits, and Elon has suggested that all of Haldeman's descendants have his "desire for adventure, exploration – doing crazy things". Elon has a younger brother, Kimbal, a younger sister, Tosca, and four paternal half-siblings. Musk was baptized as a child in the Anglican Church of Southern Africa. The Musk family was wealthy during Elon's youth. Despite both Elon and Errol previously stating that Errol was a part owner of a Zambian emerald mine, in 2023, Errol recounted that the deal he made was to receive "a portion of the emeralds produced at three small mines". Errol was elected to the Pretoria City Council as a representative of the anti-apartheid Progressive Party and has said that his children shared their father's dislike of apartheid. After his parents divorced in 1979, Elon, aged around 9, chose to live with his father because he had an Encyclopædia Britannica set and a computer. Elon later regretted his decision and became estranged from his father. Elon has recounted trips to a wilderness school that he described as a "paramilitary Lord of the Flies" where "bullying was a virtue" and children were encouraged to fight over rations. In one incident, after an altercation with a fellow pupil, Elon was thrown down concrete steps and beaten severely, leading to him being hospitalized for his injuries. Elon described his father berating him after he was discharged from the hospital. Errol denied berating Elon and claimed, "The [other] boy had just lost his father to suicide, and Elon had called him stupid. Elon had a tendency to call people stupid. How could I possibly blame that child?" Elon was an enthusiastic reader of books, and had attributed his success in part to having read The Lord of the Rings, the Foundation series, and The Hitchhiker's Guide to the Galaxy. At age ten, he developed an interest in computing and video games, teaching himself how to program from the VIC-20 user manual. At age twelve, Elon sold his BASIC-based game Blastar to PC and Office Technology magazine for approximately $500 (equivalent to $1,600 in 2025). === Education === Musk attended Waterkloof House Preparatory School, Bryanston High School, and then Pretoria Boys High School, where he graduated. Musk was a decent but unexceptional student, earning a 61/100 in Afrikaans and a B on his senior math certification. Musk applied for a Canadian passport through his Canadian-born mother to avoid South Africa's mandatory military service, which would have forced him to participate in the apartheid regime, as well as to ease his path to immigration to the United States. While waiting for his application to be processed, he attended the University of Pretoria for five months. Musk arrived in Canada in June 1989, connected with a second cousin in Saskatchewan, and worked odd jobs, including at a farm and a lumber mill. In 1990, he entered Queen's University in Kingston, Ontario. Two years later, he transferred to the University of Pennsylvania, where he studied until 1995. Although Musk has said that he earned his degrees in 1995, the University of Pennsylvania did not award them until 1997 – a Bachelor of Arts in physics and a Bachelor of Science in economics from the university's Wharton School. He reportedly hosted large, ticketed house parties to help pay for tuition, and wrote a business plan for an electronic book-scanning service similar to Google Books. In 1994, Musk held two internships in Silicon Valley: one at energy storage startup Pinnacle Research Institute, which investigated electrolytic supercapacitors for energy storage, and another at Palo Alto–based startup Rocket Science Games. In 1995, he was accepted to a graduate program in materials science at Stanford University, but did not enroll. Musk decided to join the Internet boom of the 1990s, applying for a job at Netscape, to which he reportedly never received a response. The Washington Post reported that Musk lacked legal authorization to remain and work in the United States after failing to enroll at Stanford. In response, Musk said he was allowed to work at that time and that his student visa transitioned to an H1-B. According to numerous former business associates and shareholders, Musk said he was on a student visa at the time. == Business career == === Zip2 === In 1995, Musk, his brother Kimbal, and Greg Kouri founded the web software company Zip2 with funding from a group of angel investors. They housed the venture at a small rented office in Palo Alto. Replying to Rolling Stone, Musk denounced the notion that they started their company with funds borrowed from Elon's father Errol Musk, but in a tweet, he recognized that his father contributed 10% of a later funding round. The company developed and marketed an Internet city guide for the newspaper publishing industry, with maps, directions, and yellow pages. According to Musk, "The website was up during the day and I was coding it
Intelligent database
Until the 1980s, databases were viewed as computer systems that stored record-oriented and business data such as manufacturing inventories, bank records, and sales transactions. A database system was not expected to merge numeric data with text, images, or multimedia information, nor was it expected to automatically notice patterns in the data it stored. In the late 1980s the concept of an intelligent database was put forward as a system that manages information (rather than data) in a way that appears natural to users and which goes beyond simple record keeping. The term was introduced in 1989 by the book Intelligent Databases by Kamran Parsaye, Mark Chignell, Setrag Khoshafian and Harry Wong. The concept postulated three levels of intelligence for such systems: high level tools, the user interface and the database engine. The high level tools manage data quality and automatically discover relevant patterns in the data with a process called data mining. This layer often relies on the use of artificial intelligence techniques. The user interface uses hypermedia in a form that uniformly manages text, images and numeric data. The intelligent database engine supports the other two layers, often merging relational database techniques with object orientation. In the twenty-first century, intelligent databases have now become widespread, e.g. hospital databases can now call up patient histories consisting of charts, text and x-ray images just with a few mouse clicks, and many corporate databases include decision support tools based on sales pattern analysis.
Point-set registration
In computer vision, pattern recognition, and robotics, point-set registration, also known as point-cloud registration or scan matching, is the process of finding a spatial transformation (e.g., scaling, rotation and translation) that aligns two point clouds. The purpose of finding such a transformation includes merging multiple data sets into a globally consistent model (or coordinate frame), and mapping a new measurement to a known data set to identify features or to estimate its pose. Raw 3D point cloud data are typically obtained from Lidars and RGB-D cameras. 3D point clouds can also be generated from computer vision algorithms such as triangulation, bundle adjustment, and more recently, monocular image depth estimation using deep learning. For 2D point set registration used in image processing and feature-based image registration, a point set may be 2D pixel coordinates obtained by feature extraction from an image, for example corner detection. Point cloud registration has extensive applications in autonomous driving, motion estimation and 3D reconstruction, object detection and pose estimation, robotic manipulation, simultaneous localization and mapping (SLAM), panorama stitching, virtual and augmented reality, and medical imaging. As a special case, registration of two point sets that only differ by a 3D rotation (i.e., there is no scaling and translation), is called the Wahba Problem and also related to the orthogonal procrustes problem. == Formulation == The problem may be summarized as follows: Let { M , S } {\displaystyle \lbrace {\mathcal {M}},{\mathcal {S}}\rbrace } be two finite size point sets in a finite-dimensional real vector space R d {\displaystyle \mathbb {R} ^{d}} , which contain M {\displaystyle M} and N {\displaystyle N} points respectively (e.g., d = 3 {\displaystyle d=3} recovers the typical case of when M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} are 3D point sets). The problem is to find a transformation to be applied to the moving "model" point set M {\displaystyle {\mathcal {M}}} such that the difference (typically defined in the sense of point-wise Euclidean distance) between M {\displaystyle {\mathcal {M}}} and the static "scene" set S {\displaystyle {\mathcal {S}}} is minimized. In other words, a mapping from R d {\displaystyle \mathbb {R} ^{d}} to R d {\displaystyle \mathbb {R} ^{d}} is desired which yields the best alignment between the transformed "model" set and the "scene" set. The mapping may consist of a rigid or non-rigid transformation. The transformation model may be written as T {\displaystyle T} , using which the transformed, registered model point set is: The output of a point set registration algorithm is therefore the optimal transformation T ⋆ {\displaystyle T^{\star }} such that M {\displaystyle {\mathcal {M}}} is best aligned to S {\displaystyle {\mathcal {S}}} , according to some defined notion of distance function dist ( ⋅ , ⋅ ) {\displaystyle \operatorname {dist} (\cdot ,\cdot )} : where T {\displaystyle {\mathcal {T}}} is used to denote the set of all possible transformations that the optimization tries to search for. The most popular choice of the distance function is to take the square of the Euclidean distance for every pair of points: where ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} denotes the vector 2-norm, s m {\displaystyle s_{m}} is the corresponding point in set S {\displaystyle {\mathcal {S}}} that attains the shortest distance to a given point m {\displaystyle m} in set M {\displaystyle {\mathcal {M}}} after transformation. Minimizing such a function in rigid registration is equivalent to solving a least squares problem. == Types of algorithms == When the correspondences (i.e., s m ↔ m {\displaystyle s_{m}\leftrightarrow m} ) are given before the optimization, for example, using feature matching techniques, then the optimization only needs to estimate the transformation. This type of registration is called correspondence-based registration. On the other hand, if the correspondences are unknown, then the optimization is required to jointly find out the correspondences and transformation together. This type of registration is called simultaneous pose and correspondence registration. === Rigid registration === Given two point sets, rigid registration yields a rigid transformation which maps one point set to the other. A rigid transformation is defined as a transformation that does not change the distance between any two points. Typically such a transformation consists of translation and rotation. In rare cases, the point set may also be mirrored. In robotics and computer vision, rigid registration has the most applications. === Non-rigid registration === Given two point sets, non-rigid registration yields a non-rigid transformation which maps one point set to the other. Non-rigid transformations include affine transformations such as scaling and shear mapping. However, in the context of point set registration, non-rigid registration typically involves nonlinear transformation. If the eigenmodes of variation of the point set are known, the nonlinear transformation may be parametrized by the eigenvalues. A nonlinear transformation may also be parametrized as a thin plate spline. === Other types === Some approaches to point set registration use algorithms that solve the more general graph matching problem. However, the computational complexity of such methods tend to be high and they are limited to rigid registrations. In this article, we will only consider algorithms for rigid registration, where the transformation is assumed to contain 3D rotations and translations (possibly also including a uniform scaling). The PCL (Point Cloud Library) is an open-source framework for n-dimensional point cloud and 3D geometry processing. It includes several point registration algorithms. == Correspondence-based registration == Correspondence-based methods assume the putative correspondences m ↔ s m {\displaystyle m\leftrightarrow s_{m}} are given for every point m ∈ M {\displaystyle m\in {\mathcal {M}}} . Therefore, we arrive at a setting where both point sets M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} have N {\displaystyle N} points and the correspondences m i ↔ s i , i = 1 , … , N {\displaystyle m_{i}\leftrightarrow s_{i},i=1,\dots ,N} are given. === Outlier-free registration === In the simplest case, one can assume that all the correspondences are correct, meaning that the points m i , s i ∈ R 3 {\displaystyle m_{i},s_{i}\in \mathbb {R} ^{3}} are generated as follows:where l > 0 {\displaystyle l>0} is a uniform scaling factor (in many cases l = 1 {\displaystyle l=1} is assumed), R ∈ SO ( 3 ) {\displaystyle R\in {\text{SO}}(3)} is a proper 3D rotation matrix ( SO ( d ) {\displaystyle {\text{SO}}(d)} is the special orthogonal group of degree d {\displaystyle d} ), t ∈ R 3 {\displaystyle t\in \mathbb {R} ^{3}} is a 3D translation vector and ϵ i ∈ R 3 {\displaystyle \epsilon _{i}\in \mathbb {R} ^{3}} models the unknown additive noise (e.g., Gaussian noise). Specifically, if the noise ϵ i {\displaystyle \epsilon _{i}} is assumed to follow a zero-mean isotropic Gaussian distribution with standard deviation σ i {\displaystyle \sigma _{i}} , i.e., ϵ i ∼ N ( 0 , σ i 2 I 3 ) {\displaystyle \epsilon _{i}\sim {\mathcal {N}}(0,\sigma _{i}^{2}I_{3})} , then the following optimization can be shown to yield the maximum likelihood estimate for the unknown scale, rotation and translation:Note that when the scaling factor is 1 and the translation vector is zero, then the optimization recovers the formulation of the Wahba problem. Despite the non-convexity of the optimization (cb.2) due to non-convexity of the set SO ( 3 ) {\displaystyle {\text{SO}}(3)} , seminal work by Berthold K.P. Horn showed that (cb.2) actually admits a closed-form solution, by decoupling the estimation of scale, rotation and translation. Similar results were discovered by Arun et al. In addition, in order to find a unique transformation ( l , R , t ) {\displaystyle (l,R,t)} , at least N = 3 {\displaystyle N=3} non-collinear points in each point set are required. More recently, Briales and Gonzalez-Jimenez have developed a semidefinite relaxation using Lagrangian duality, for the case where the model set M {\displaystyle {\mathcal {M}}} contains different 3D primitives such as points, lines and planes (which is the case when the model M {\displaystyle {\mathcal {M}}} is a 3D mesh). Interestingly, the semidefinite relaxation is empirically tight, i.e., a certifiably globally optimal solution can be extracted from the solution of the semidefinite relaxation. === Robust registration === The least squares formulation (cb.2) is known to perform arbitrarily badly in the presence of outliers. An outlier correspondence is a pair of measurements s i ↔ m i {\displaystyle s_{i}\leftrightarrow m_{i}} that departs from the generative model (cb.1). In this case, one can consider a differen
Powerset (company)
Powerset was an American company based in San Francisco, California, that, in 2006, was developing a natural language search engine for the Internet. On July 1, 2008, Powerset was acquired by Microsoft for an estimated $100 million (~$143 million in 2024). Powerset was working on building a natural language search engine that could find targeted answers to user questions (as opposed to keyword based search). For example, when confronted with a question like "Which U.S. state has the highest income tax?", conventional search engines ignore the question phrasing and instead do a search on the keywords "state", "highest", "income", and "tax". Powerset on the other hand, attempts to use natural language processing to understand the nature of the question and return pages containing the answer. The company was in the process of "building a natural language search engine that reads and understands every sentence on the Web". The company has licensed natural language technology from PARC, the former Xerox Palo Alto Research Center. On May 11, 2008, the company unveiled a tool for searching a fixed subset of English Wikipedia using conversational phrases rather than keywords. Acquisition by Microsoft: One significant milestone in Powerset's history was its acquisition by Microsoft on July 1, 2008, for an estimated $100 million. This acquisition was part of Microsoft's broader strategy to enhance its search capabilities and compete more effectively with other search engine providers, particularly Google. Natural Language Search Engine: Powerset's primary focus was on developing a natural language search engine capable of understanding and interpreting user queries in a more human-like manner. Instead of simply matching keywords, Powerset aimed to comprehend the meaning behind the words, allowing for more accurate and contextually relevant search results. Technology and Partnerships: Powerset had licensed natural language technology from PARC, the Xerox Palo Alto Research Center. This technology likely played a crucial role in the development of Powerset's NLP capabilities. Wikipedia Search Tool: In May 2008, Powerset unveiled a search tool that allowed users to search a fixed subset of English Wikipedia using conversational phrases rather than traditional keywords. This demonstrated the potential of Powerset's NLP technology in providing more precise and relevant search results. == Powerlabs == In a form of beta testing, Powerset opened an online community called Powerlabs on September 17, 2007. Business Week said: "The company hopes the site will marshal thousands of people to help build and improve its search engine before it goes public next year." Said The New York Times: "[Powerset Labs] goes far beyond the 'alpha' or 'beta' testing involved in most software projects, when users put a new product through rigorous testing to find its flaws. Powerset doesn’t have a product yet, but rather a collection of promising natural language technologies, which are the fruit of years of research at Xerox PARC." Powerlabs' initial search results are taken from Wikipedia. == Notable people == Barney Pell (born March 18, 1968, in Hollywood, California) was co-founder and CEO of Powerset. Pell received his Bachelor of Science degree in symbolic systems from Stanford University in 1989, where he graduated Phi Beta Kappa and was a National Merit Scholar. Pell received a PhD in computer science from Cambridge University in 1993, where he was a Marshall Scholar. He has worked at NASA, as chief strategist and vice president of business development at StockMaster.com (acquired by Red Herring in March, 2000) and at Whizbang! Labs. Prior to joining Powerset, Pell was an Entrepreneur-in-Residence at Mayfield Fund, a venture capital firm in Silicon Valley. Pell is also a founder of Moon Express, Inc., a U.S. company awarded a $10M commercial lunar contract by NASA and a competitor in the Google Lunar X PRIZE. Steve Newcomb was the COO and co-founder of Powerset. Prior to joining Powerset, he was a co-founder of Loudfire, General Manager at Promptu, and was on the board of directors at Jaxtr. He left Powerset in October 2007 to form Virgance, a social startup incubator. Lorenzo Thione (born in Como, Italy) was the product architect and co-founder of Powerset. Prior to joining Powerset, he worked at FXPAL in natural language processing and related research fields. Thione earned his master's degree in software engineering from the University of Texas at Austin. Ronald Kaplan, former manager of research in Natural Language Theory and Technology at PARC, served as the company's CTO and CSO. Ryan Ferrier is a member of the founding team of Powerset. He managed personnel and internal operations. After 2008 he went on to co-found Serious Business, which made Facebook applications and was later bought by Zynga. Another Powerset alumnus, Alex Le, became CTO of Serious Business and went on to become an executive producer at Zynga when it bought the company. Siqi Chen founded a stealth startup in mobile computing after leaving Powerset. Tom Preston-Werner worked at Powerset and left after the acquisition to found GitHub. == Investors == Powerset attracted a wide range of investors, many of whom had considerable experience in the venture capital field. The company received $12.5 million (~$18.2 million in 2024) in Series A funding during November 2007, co-led by the venture capital firms Foundation Capital and The Founders Fund. Among the better-known investors: Esther Dyson, founding chairman of ICANN, founder of the newsletter Release 1.0 and editor at Cnet Peter Thiel, founder and former CEO of PayPal Luke Nosek, founder of PayPal Todd Parker. Managing Partner, Hidden River Ventures Reid Hoffman, executive vice president of PayPal and founder of LinkedIn First Round Capital, seed-stage venture firm
Information extraction
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction. Recent advances in NLP techniques have allowed for significantly improved performance compared to previous years. An example is the extraction from newswire reports of corporate mergers, such as denoted by the formal relation: MergerBetween ( c o m p a n y 1 , c o m p a n y 2 , d a t e ) {\displaystyle \operatorname {MergerBetween} (\mathrm {company} _{1},\mathrm {company} _{2},\mathrm {date} )} , from an online news sentence such as: "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow automated reasoning about the logical form of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context. Information extraction is the part of a greater puzzle which deals with the problem of devising automatic methods for text management, beyond its transmission, storage and display. The discipline of information retrieval (IR) has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents. Another complementary approach is that of natural language processing (NLP) which has solved the problem of modelling human language processing with considerable success when taking into account the magnitude of the task. In terms of both difficulty and emphasis, IE deals with tasks in between both IR and NLP. In terms of input, IE assumes the existence of a set of documents in which each document follows a template, i.e. describes one or more entities or events in a manner that is similar to those in other documents but differing in the details. An example, consider a group of newswire articles on Latin American terrorism with each article presumed to be based upon one or more terroristic acts. We also define for any given IE task a template, which is a(or a set of) case frame(s) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terroristic act, and the date on which the event happened. An IE system for this problem is required to "understand" an attack article only enough to find data corresponding to the slots in this template. == History == Information extraction dates back to the late 1970s in the early days of NLP. An early commercial system from the mid-1980s was JASPER built for Reuters by the Carnegie Group Inc with the aim of providing real-time financial news to financial traders. Beginning in 1987, IE was spurred by a series of Message Understanding Conferences. MUC is a competition-based conference that focused on the following domains: MUC-1 (1987), MUC-3 (1989): Naval operations messages. MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries. MUC-5 (1993): Joint ventures and microelectronics domain. MUC-6 (1995): News articles on management changes. MUC-7 (1998): Satellite launch reports. Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), who wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism. == Present significance == The present significance of IE pertains to the growing amount of information available in unstructured form. Tim Berners-Lee, inventor of the World Wide Web, refers to the existing Internet as the web of documents and advocates that more of the content be made available as a web of data. Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by means of transformation into relational form, or by marking-up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. == Tasks and subtasks == Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. The overall goal being to create a more easily machine-readable text to process the sentences. Typical IE tasks and subtasks include: Template filling: Extracting a fixed set of fields from a document, e.g. extract perpetrators, victims, time, etc. from a newspaper article about a terrorist attack. Event extraction: Given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks. Knowledge Base Population: Fill a database of facts given a set of documents. Typically the database is in the form of triplets, (entity 1, relation, entity 2), e.g. (Barack Obama, Spouse, Michelle Obama) Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims at detecting entities without having any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would denote detecting that the phrase "M. Smith" does refer to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or, "might be") the specific person whom that sentence is talking about. Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. If we take the two sentences "M. Smith likes fishing. But he doesn't like biking", it would be beneficial to detect that "he" is referring to the previously detected person "M. Smith". Relationship extraction: identification of relations between entities, such as: PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.") PERSON located in LOCATION (extracted from the sentence "Bill is in France.") Semi-structured information extraction which may refer to any IE that tries to restore some kind of information structure that has been lost through publication, such as: Table extraction: finding and extracting tables from documents. Table information extraction : extracting information in structured manner from the tables. This task is more complex than table extraction, as table extraction is only the first step, while understanding the roles of the cells, rows, columns, linking the information inside the table and understanding the information presented in the table are additional tasks necessary for table information extraction. Comments extraction : extracting comments from the actual content of articles in order to restore the link between authors of each of the sentences Language and vocabulary analysis Terminology extraction: finding the relevant terms for a given corpus Audio extraction Template-based music extraction: finding relevant characteristic in an audio signal taken from a given repertoire; for instance time indexes of occurrences of percussive sounds can be extracted in order to represent the essential rhythmic component of a music piece. Note that this list is not exhaustive and that the exact meaning of IE activities is not commonly accepted and that many approaches combine multiple sub-tasks of IE in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE. IE on non-text documents is becoming an increasingly interesting topic in research, and information extracted from multimedia documents can now be expressed in a high level structure as it is done on text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources. == World Wide Web applications == IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for developing IE systems that help people
Agent-assisted automation
Agent-assisted automation is a type of call center technology that automates elements of what the call center agent 1) does with his/her desktop tools and/or 2) says to customers during the call using pre-recorded audio. It is a relatively new category of call center technology that shows promise in improving call center productivity and compliance. == Types of agent-assisted automation == === Pre-recorded audio === Pre-recorded audio (sometimes referred to as soundboard (computer program) or as soundboard technology) is another form of agent-assisted automation. The purpose of using pre-recorded messages is to increase the probability (and in some cases error-proof the process so) that the right information is provided to customers at the right time. The required disclosures are pre-recorded to ensure accuracy and understandability. By integrating the recordings with the customer relationship management software, the right combination of disclosures can be played based on the combination of goods and services the customer purchased. The integration with the customer relationship management software also ensures that the order cannot be submitted until the disclosures are played, essentially error-proofing (poka-yoke) the process of ensuring the customer gets all the required consumer protection information. Phone surveys are ideal applications of this technology. Whether surveying market preferences or political views, the pre-recorded audio with an agent listening allows the questions to be asked in the same way every time, uninfluenced by the agents' fatigue levels, accents, or their own views. === Fraud prevention === Fraud prevention is a specialized type of agent-assisted automation focused on reducing ID theft and credit card fraud. ID theft and credit card fraud are huge threats for call centers and their customers and few good solutions exist, but new agent-assisted automation solutions are producing promising results. The technology allows the agents to remain on the phone while the customers use their phone key pads to enter the information. The tones are masked and the information passes directly into the customer relationship management system or payment gateway in the case of credit card transactions. The automation essentially makes it impossible for call center agents and also call center personnel that might be monitoring the calls to steal the credit card number, social security number, or other personally identifiable information. === Outbound telemarketing === Another specialized application space of agent-assisted automation is in outbound telemarketing, which goes under numerous headings including outbound prospecting, cold calling, solicitation, fund-raising, etc. Turnover is high among agents engaged in this kind of work because the task is tedious and emotionally difficult. It is tedious because the agent spends the bulk of their day, not talking to qualified leads, but in getting wrong numbers and answering machines. == Benefits == Just as automation has benefited manufacturing by reducing the mental and physical effort required of workers while simultaneously improving throughput, quality, and safety, agent-assisted automation is improving call center results while reducing the tiring aspects of the job for agents. In some cases, the agent-assisted automation streamlines the process and allows calls to be handled more quickly. By eliminating cutting and pasting from one application to another, by auto-navigating applications, and by providing a single view of the customer, agent-assisted automation can reduce call handle time and increase agent productivity. Second, in theory, the more steps that can be automated and the more logic that can be built into the call flow (e.g., if the customer buys items 2 and 9, then disclosures a, c, and f are read by the pre-recorded audio), then companies may be able to reduce the amount of training that is required of the agents while at the same time ensuring more consistency and accuracy. However, no published studies have reported this result yet. But an even larger problem in call centers is between-agent variation in behavior and results. Agents differ in the amount of training and coaching they receive, they differ in the amount of experience they have, their jobs are repetitious and tiring, and the process and procedures the agents are supposed to follow constantly change. Moreover, there are significant individual differences between agents in their intelligence, personality, motivations, etc. which all affect performance. Despite the large amount of money call centers have spent over decades trying to reduce between-agent variation, the problem is still so prevalent that one large study of customer interactions with call centers found that a customer's experience was completely a function of the quality of the agent who happened to answer the phone. Therefore, the most significant benefit of agent-assisted automation may prove to be in how the automation error-proofs or poka-yoke the process and ensures that something that needs to be done or said happens every time. Properly implemented, the between-agent variation for whatever step of the process the automation is applied to may be able to be reduced to near zero. This is especially important in a collection agency whose processes and procedures are closely regulated by the Fair Debt Collection Practices Act.
Realization (linguistics)
In linguistics, realization is the process by which some kind of surface representation is derived from its underlying representation; that is, the way in which some abstract object of linguistic analysis comes to be produced in actual language. Phonemes are often said to be realized by speech sounds. The different sounds that can realize a particular phoneme are called its allophones. Realization is also a subtask of natural language generation, which involves creating an actual text in a human language (English, French, etc.) from a syntactic representation. There are a number of software packages available for realization, most of which have been developed by academic research groups in NLG. The remainder of this article concerns realization of this kind. == Example == For example, the following Java code causes the simplenlg system [2] to print out the text The women do not smoke.: In this example, the computer program has specified the linguistic constituents of the sentence (verb, subject), and also linguistic features (plural subject, negated), and from this information the realiser has constructed the actual sentence. == Processing == Realisation involves three kinds of processing: Syntactic realisation: Using grammatical knowledge to choose inflections, add function words and also to decide the order of components. For example, in English the subject usually precedes the verb, and the negated form of smoke is do not smoke. Morphological realisation: Computing inflected forms, for example the plural form of woman is women (not womans). Orthographic realisation: Dealing with casing, punctuation, and formatting. For example, capitalising The because it is the first word of the sentence. The above examples are very basic, most realisers are capable of considerably more complex processing. == Systems == A number of realisers have been developed over the past 20 years. These systems differ in terms of complexity and sophistication of their processing, robustness in dealing with unusual cases, and whether they are accessed programmatically via an API or whether they take a textual representation of a syntactic structure as their input. There are also major differences in pragmatic factors such as documentation, support, licensing terms, speed and memory usage, etc. It is not possible to describe all realisers here, but a few of the emerging areas are: Simplenlg [3]: a document realizing engine with an api which intended to be simple to learn and use, focused on limiting scope to only finding the surface area of a document. KPML [4]: this is the oldest realiser, which has been under development under different guises since the 1980s. It comes with grammars for ten different languages. FUF/SURGE [5]: a realiser which was widely used in the 1990s, and is still used in some projects today OpenCCG [6]: an open-source realiser which has a number of nice features, such as the ability to use statistical language models to make realisation decisions.