AI Art Prints

AI Art Prints — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Scale space implementation

    Scale space implementation

    In the areas of computer vision, image analysis and signal processing, the notion of scale-space representation is used for processing measurement data at multiple scales, and specifically enhance or suppress image features over different ranges of scale (see the article on scale space). A special type of scale-space representation is provided by the Gaussian scale space, where the image data in N dimensions is subjected to smoothing by Gaussian convolution. Most of the theory for Gaussian scale space deals with continuous images, whereas one when implementing this theory will have to face the fact that most measurement data are discrete. Hence, the theoretical problem arises concerning how to discretize the continuous theory while either preserving or well approximating the desirable theoretical properties that lead to the choice of the Gaussian kernel (see the article on scale-space axioms). This article describes basic approaches for this that have been developed in the literature, see also for an in-depth treatment regarding the topic of approximating the Gaussian smoothing operation and the Gaussian derivative computations in scale-space theory, and for a complementary treatment regarding hybrid discretization methods. == Statement of the problem == The Gaussian scale-space representation of an N-dimensional continuous signal, f C ( x 1 , ⋯ , x N , t ) , {\displaystyle f_{C}\left(x_{1},\cdots ,x_{N},t\right),} is obtained by convolving fC with an N-dimensional Gaussian kernel: g N ( x 1 , ⋯ , x N , t ) . {\displaystyle g_{N}\left(x_{1},\cdots ,x_{N},t\right).} In other words: L ( x 1 , ⋯ , x N , t ) = ∫ u 1 = − ∞ ∞ ⋯ ∫ u N = − ∞ ∞ f C ( x 1 − u 1 , ⋯ , x N − u N , t ) ⋅ g N ( u 1 , ⋯ , u N , t ) d u 1 ⋯ d u N . {\displaystyle L\left(x_{1},\cdots ,x_{N},t\right)=\int _{u_{1}=-\infty }^{\infty }\cdots \int _{u_{N}=-\infty }^{\infty }f_{C}\left(x_{1}-u_{1},\cdots ,x_{N}-u_{N},t\right)\cdot g_{N}\left(u_{1},\cdots ,u_{N},t\right)\,du_{1}\cdots du_{N}.} However, for implementation, this definition is impractical, since it is continuous. When applying the scale space concept to a discrete signal fD, different approaches can be taken. This article is a brief summary of some of the most frequently used methods. == Separability == Using the separability property of the Gaussian kernel g N ( x 1 , … , x N , t ) = G ( x 1 , t ) ⋯ G ( x N , t ) {\displaystyle g_{N}\left(x_{1},\dots ,x_{N},t\right)=G\left(x_{1},t\right)\cdots G\left(x_{N},t\right)} the N-dimensional convolution operation can be decomposed into a set of separable smoothing steps with a one-dimensional Gaussian kernel G along each dimension L ( x 1 , ⋯ , x N , t ) = ∫ u 1 = − ∞ ∞ ⋯ ∫ u N = − ∞ ∞ f C ( x 1 − u 1 , ⋯ , x N − u N , t ) G ( u 1 , t ) d u 1 ⋯ G ( u N , t ) d u N , {\displaystyle L(x_{1},\cdots ,x_{N},t)=\int _{u_{1}=-\infty }^{\infty }\cdots \int _{u_{N}=-\infty }^{\infty }f_{C}(x_{1}-u_{1},\cdots ,x_{N}-u_{N},t)G(u_{1},t)\,du_{1}\cdots G(u_{N},t)\,du_{N},} where G ( x , t ) = 1 2 π t e − x 2 2 t {\displaystyle G(x,t)={\frac {1}{\sqrt {2\pi t}}}e^{-{\frac {x^{2}}{2t}}}} and the standard deviation of the Gaussian σ is related to the scale parameter t according to t = σ2. Separability will be assumed in all that follows, even when the kernel is not exactly Gaussian, since separation of the dimensions is the most practical way to implement multidimensional smoothing, especially at larger scales. Therefore, the rest of the article focuses on the one-dimensional case. == The sampled Gaussian kernel == When implementing the one-dimensional smoothing step in practice, the presumably simplest approach is to convolve the discrete signal fD with a sampled Gaussian kernel: L ( x , t ) = ∑ n = − ∞ ∞ f ( x − n ) G ( n , t ) {\displaystyle L(x,t)=\sum _{n=-\infty }^{\infty }f(x-n)\,G(n,t)} where G ( n , t ) = 1 2 π t e − n 2 2 t {\displaystyle G(n,t)={\frac {1}{\sqrt {2\pi t}}}e^{-{\frac {n^{2}}{2t}}}} (with t = σ2) which in turn is truncated at the ends to give a filter with finite impulse response L ( x , t ) = ∑ n = − M M f ( x − n ) G ( n , t ) {\displaystyle L(x,t)=\sum _{n=-M}^{M}f(x-n)\,G(n,t)} for M chosen sufficiently large (see error function) such that 2 ∫ M ∞ G ( u , t ) d u = 2 ∫ M t ∞ G ( v , 1 ) d v < ε . {\displaystyle 2\int _{M}^{\infty }G(u,t)\,du=2\int _{\frac {M}{\sqrt {t}}}^{\infty }G(v,1)\,dv<\varepsilon .} A common choice is to set M to a constant C times the standard deviation of the Gaussian kernel M = C σ + 1 = C t + 1 {\displaystyle M=C\sigma +1=C{\sqrt {t}}+1} where C is often chosen somewhere between 3 and 6. Using the sampled Gaussian kernel can, however, lead to implementation problems, in particular when computing higher-order derivatives at finer scales by applying sampled derivatives of Gaussian kernels. When accuracy and robustness are primary design criteria, alternative implementation approaches should therefore be considered. For small values of ε (10−6 to 10−8) the errors introduced by truncating the Gaussian are usually negligible. For larger values of ε, however, there are many better alternatives to a rectangular window function. For example, for a given number of points, a Hamming window, Blackman window, or Kaiser window will do less damage to the spectral and other properties of the Gaussian than a simple truncation will. Notwithstanding this, since the Gaussian kernel decreases rapidly at the tails, the main recommendation is still to use a sufficiently small value of ε such that the truncation effects are no longer important. == The discrete Gaussian kernel == A more refined approach is to convolve the original signal with the discrete Gaussian kernel T(n, t) L ( x , t ) = ∑ n = − ∞ ∞ f ( x − n ) T ( n , t ) {\displaystyle L(x,t)=\sum _{n=-\infty }^{\infty }f(x-n)\,T(n,t)} where T ( n , t ) = e − t I n ( t ) {\displaystyle T(n,t)=e^{-t}I_{n}(t)} and I n ( t ) {\displaystyle I_{n}(t)} denotes the modified Bessel functions of integer order, n. This is the discrete counterpart of the continuous Gaussian in that it is the solution to the discrete diffusion equation (discrete space, continuous time), just as the continuous Gaussian is the solution to the continuous diffusion equation. This filter can be truncated in the spatial domain as for the sampled Gaussian L ( x , t ) = ∑ n = − M M f ( x − n ) T ( n , t ) {\displaystyle L(x,t)=\sum _{n=-M}^{M}f(x-n)\,T(n,t)} or can be implemented in the Fourier domain using a closed-form expression for its discrete-time Fourier transform: T ^ ( θ , t ) = ∑ n = − ∞ ∞ T ( n , t ) e − i θ n = e t ( cos ⁡ θ − 1 ) . {\displaystyle {\widehat {T}}(\theta ,t)=\sum _{n=-\infty }^{\infty }T(n,t)\,e^{-i\theta n}=e^{t(\cos \theta -1)}.} With this frequency-domain approach, the scale-space properties transfer exactly to the discrete domain, or with excellent approximation using periodic extension and a suitably long discrete Fourier transform to approximate the discrete-time Fourier transform of the signal being smoothed. Moreover, higher-order derivative approximations can be computed in a straightforward manner (and preserving scale-space properties) by applying small support central difference operators to the discrete scale space representation. As with the sampled Gaussian, a plain truncation of the infinite impulse response will in most cases be a sufficient approximation for small values of ε, while for larger values of ε it is better to use either a decomposition of the discrete Gaussian into a cascade of generalized binomial filters or alternatively to construct a finite approximate kernel by multiplying by a window function. If ε has been chosen too large such that effects of the truncation error begin to appear (for example as spurious extrema or spurious responses to higher-order derivative operators), then the options are to decrease the value of ε such that a larger finite kernel is used, with cutoff where the support is very small, or to use a tapered window. == Recursive filters == Since computational efficiency is often important, low-order recursive filters are often used for scale-space smoothing. For example, Young and van Vliet use a third-order recursive filter with one real pole and a pair of complex poles, applied forward and backward to make a sixth-order symmetric approximation to the Gaussian with low computational complexity for any smoothing scale. By relaxing a few of the axioms, Lindeberg concluded that good smoothing filters would be "normalized Pólya frequency sequences", a family of discrete kernels that includes all filters with real poles at 0 < Z < 1 and/or Z > 1, as well as with real zeros at Z < 0. For symmetry, which leads to approximate directional homogeneity, these filters must be further restricted to pairs of poles and zeros that lead to zero-phase filters. To match the transfer function curvature at zero frequency of the discrete Gaussian, which ensures an approximate semi-group property of additive t, two poles at Z = 1 + 2 t − ( 1 + 2 t ) 2 − 1 {\displaystyle

    Read more →
  • How to Choose an AI Logo Maker

    How to Choose an AI Logo Maker

    Trying to pick the best AI logo maker? An AI logo maker is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI logo maker slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Cortana (virtual assistant)

    Cortana (virtual assistant)

    Cortana is a discontinued virtual assistant developed by Microsoft that used the Bing search engine to perform tasks such as setting reminders and answering questions for users. Cortana was available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending on the software platform and region in which it was used. In 2019, Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations. It was split from the Windows 10 search bar in April 2019. In January 2020, the Cortana mobile app was removed from certain markets, and on March 31, 2021, the Cortana mobile app was shut down globally. On June 2, 2023, Microsoft announced that support for the Cortana standalone app on Microsoft Windows would end in late 2023 and would be replaced by Microsoft Copilot, an AI chatbot. Support for Cortana in the Microsoft Outlook and Microsoft 365 mobile apps was discontinued in fall of 2023. == History == === Beginnings (2009–2014) === The development of Cortana started in 2009 in the Microsoft Speech products team with general manager Zig Serafin and Chief Scientist Larry Heck. Heck and Serafin established the vision, mission, and long-range plan for Microsoft's digital personal assistant and they built a team with the expertise to create the initial prototypes for Cortana. Some of the key researchers in these early efforts included Microsoft Research researchers Dilek Hakkani-Tür, Gokhan Tur, Andreas Stolcke, and Malcolm Slaney, research software developer Madhu Chinthakunta, and user experience designer Lisa Stifelman. To develop the Cortana digital assistant, the team interviewed human personal assistants. The interviews inspired a number of unique features in Cortana, including the assistant's "notebook" feature. Originally, Cortana was meant to be only a codename, but a petition on Windows Phone's UserVoice site proved to be popular and made the codename official. Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows. It was named after Cortana, a synthetic intelligence character in Microsoft's Halo video game franchise originating in Bungie folklore, with Jen Taylor, the character's voice actress, returning to voice the personal assistant's US-specific version. === Expansion (2015–2018) === In January 2015, Microsoft announced the availability of Cortana for Windows 10 desktops and mobile devices as part of merging Windows Phone into the operating system at large. On May 26, 2015, Microsoft announced that Cortana would also be available on other mobile platforms. An Android release was set for July 2015, but the Android APK file containing Cortana was leaked ahead of its release. It was officially released, along with an iOS version, in December 2015. During E3 2015, Microsoft announced that Cortana would come to the Xbox One as part of a universally designed Windows 10 update for the console. Microsoft integrated Cortana into numerous products such as Microsoft Edge. Microsoft's Cortana assistant was deeply integrated into the browser. Cortana was able to find opening hours when on restaurant sites, show retail coupons for websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana integration with products such as GigJam. Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company. Microsoft's "Windows in the car" concept included Cortana. The concept makes it possible for drivers to make restaurant reservations and see places before they go there. At Microsoft Build 2016, Microsoft announced plans to integrate Cortana into Skype (Microsoft's video-conferencing and instant messaging service) as a bot to allow users to order food, book trips, transcribe video messages and make calendar appointments through Cortana in addition to other bots. As of 2016, Cortana was able to underline certain words and phrases in Skype conversations that relate to contacts and corporations. A writer from Engadget has criticised the Cortana integration in Skype for responding only to very specific keywords, feeling as if she was "chatting with a search engine" due to the impersonal way the bots replied to certain words such as "Hello" causing the Bing Music bot to bring up Adele's song of that name. Microsoft also announced at Microsoft Build 2016 that Cortana would be able to cloud-synchronise notifications between Windows 10 Mobile's and Windows 10's Action Center, as well as notifications from Android devices. In December 2016, Microsoft announced the preview of Calendar.help, a service that enabled people to delegate the scheduling of meetings to Cortana. Users interact with Cortana by including her in email conversations. Cortana would then check people's availability in Outlook Calendar or Google Calendar, and work with others Cc'd on the email to schedule the meeting. The service relied on automation and human-based computation. In May 2017, Microsoft announced INVOKE, a voice-activated speaker featuring Cortana, in collaboration with Harman Kardon. The premium speaker has a cylindrical design and offers 360-degree sound, the ability to make and receive calls with Skype, and all of the other features currently available with Cortana. In 2017, Microsoft partnered with Amazon to integrate Echo and Cortana with each other, allowing users of each smart assistant to summon the other via a command. This feature preview was released in August 2018. Windows 10 users were able to just say "Hey Cortana, open Alexa" and Echo users were able to say "Alexa, open Cortana" to summon the other assistant. === Decreasing focus and discontinuation (2019–2024) === In January 2019, Microsoft CEO Satya Nadella stated that he no longer saw Cortana as a direct competitor against Alexa and Siri. Shortly thereafter, Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations. It was split from the Windows 10 search bar in April 2019. In January 2020, the Cortana mobile app was removed from certain markets, and then, on July 24, 2020, Cortana was removed from the Xbox dashboard as part of a redesign. On January 31, 2021, Microsoft removed the Cortana mobile application in many markets, including the UK, Australia, Germany, Mexico, China, Spain, Canada, and India. On March 31, 2021, Microsoft shut down the Cortana apps globally for iOS and Android and removed the apps entirely from their corresponding app stores. To access previously recorded content, users had to use Cortana on Windows 10 or other specialized Microsoft applications. Microsoft also reduced emphasis on Cortana in Windows with the 2021 release of Windows 11. Cortana was not used during the device setup process or pinned to the taskbar by default. On June 2, 2023, Microsoft announced the Cortana standalone app on Windows 10 and Windows 11 which would shut down later in the year. In its support article, Microsoft listed several alternatives, most of which have since been rebranded as Microsoft Copilot. They also added that the change would not impact Cortana in Office 365 and Teams environments. On August 11, 2023, Microsoft updated the Cortana standalone app in Windows, informing that it was deprecated and can no longer be used. Microsoft's support article announcing the deprecation of Cortana was updated to reflect this change. Along with the deprecation of the standalone app, it was announced that Cortana support in Teams mobile, Microsoft Teams displays, and Teams rooms would end in late 2023. The support article states that Cortana in the “Play my emails” feature of the Microsoft Outlook mobile app would continue to be available. Later in June 2024, the support article was updated, stating that Cortana in the voice search and the "Play my emails" feature is now removed from the Microsoft Outlook mobile app, officially marking the discontinuation of Cortana across all Microsoft products. On May 22, 2024, Microsoft announced the Windows 11 24H2 update, which removed Cortana, Tips, and WordPad from systems. == Functionality == Cortana was able to set reminders, recognize natural voice without the requirement for keyboard input, and answer questions using information from the Bing search engine. Searches using Windows 10 are made only with the Microsoft Bing search engine, and all links will open with Microsoft Edge, except when a screen reader such as Narrator was being used, where the links will open in Internet Explorer. Windows Phone 8.1's universal Bing SmartSearch features were incorporated into Cortana, which replaced the

    Read more →
  • SDL plc

    SDL plc

    SDL plc was a British multinational professional services company based in Maidenhead, Berkshire, United Kingdom. SDL specialized in language translation software and services (including interpretation services). It was listed on the London Stock Exchange until it was acquired by RWS Group in November 2020. == Name == SDL is an abbreviation for "Software and Documentation Localization". == History == The company was founded by Mark Lancaster with nine employees in 1992. It opened its first overseas office in France in 1996 and was first listed on the London Stock Exchange in 1999. The company grew organically and via acquisitions. SDL acquired Polylang Multimedia in 1998, International Translation & Publishing (ITP) in 2000, Alpnet in 2001, and the machine translation (MT) assets of Transparent Language in 2001. It bought Trados, a rival translation memory (TM) developer, in 2005. In 2007, the company acquired Tridion, a content management system vendor, and PASS Engineering, developers of the Passolo software. In 2008, it bought Idiom Technologies, a global information system management business. In July 2009 SDL acquired XyEnterprise in an all-cash transaction to add XML Professional Publisher as well as Contenta content management software and LiveContent to manage and deliver XML. This unit combined with Trisoft formerly Infoshare. In December 2009, SDL acquired Fredhopper, a Dutch eCommerce onsite search and navigation, onsite targeting and targeted advertising software vendor. Later that same year, it bought Xopus, another Dutch company and the leader in online XML editing. In May 2011 SDL acquired Dutch-based Media Asset Management company, Calamares, in 2012 the campaign management and social media analytics company, Alterian, and in 2013, bemoko, a supplier of internet software for mobile devices. In January 2016, having undertaken a strategic review, SDL announced the divestment of Fredhopper and Alterian as non-complementary to its new strategy. In August 2020 RWS Group announced a proposed takeover of the company for £809 million. The transaction was completed on 4 November 2020. == Operations == SDL provided software for language translation purposes.

    Read more →
  • Retained mode

    Retained mode

    Retained mode in computer graphics is a major pattern of API design in graphics libraries, in which the graphics library, instead of the client, retains the scene (complete object model of the rendering primitives) to be rendered and the client calls into the graphics library do not directly cause actual rendering, but make use of extensive indirection to resources, managed – thus retained – by the graphics library. It does not preclude the use of double-buffering. Immediate mode is an alternative approach. Historically, retained mode has been the dominant style in GUI libraries; however, both can coexist in the same library and are not necessarily exclusionary in practice. == Overview == In retained mode the client calls do not directly cause actual rendering, but instead update an abstract internal model (typically a list of objects) which is maintained within the library's data space. This allows the library to optimize when actual rendering takes place along with the processing of related objects. Some techniques to optimize rendering include: managing double buffering treatment of hidden surfaces by backface culling/occlusion culling (Z-buffering) only transferring data that has changed from one frame to the next from the application to the library Example of coexistence with immediate mode in the same library is OpenGL. OpenGL has immediate mode functions that can use previously defined server side objects (textures, vertex buffers and index buffers, shaders, etc.) without resending unchanged data. Examples of retained mode rendering systems include Windows Presentation Foundation, SceneKit on macOS, and PHIGS.

    Read more →
  • Writer invariant

    Writer invariant

    Writer invariant, also called authorial invariant or author's invariant, is a property of a text which is invariant of its author, that is, it will be similar in all texts of a given author and different in texts of different authors. It can be used to find plagiarism or discover who is real author of anonymously published text. Writer invariant is also an author's pattern of writing a letter in handwritten text recognition. While it is generally recognised that writer invariants exist, it is not agreed what properties of a text should be used. Among the first ones used was distribution of word lengths; other proposed invariants include average sentence length, average word length, noun, verb or adjective usage frequency, vocabulary richness, and frequency of function words, or specific function words. Of these, average sentence lengths can be very similar in works of different authors or vary significantly even within a single work; average word lengths likewise turn out to be very similar in works of different authors. Analysis of function words shows promise because they are used by authors unconsciously.

    Read more →
  • Anil K. Jain (computer scientist, born 1948)

    Anil K. Jain (computer scientist, born 1948)

    Anil Kumar Jain (born 1948) is an Indian-American computer scientist and University Distinguished Professor in the Department of Computer Science and Engineering at Michigan State University. He is one of the most highly cited researchers in computer science, and is internationally recognized for his foundational contributions to pattern recognition, computer vision, and biometric recognition, particularly in fingerprint recognition and face recognition. Jain is a member of the United States National Academy of Engineering, a Foreign Member of the Chinese Academy of Sciences, and a Foreign Fellow of the Indian National Academy of Engineering. He is a Fellow of the ACM, IEEE, AAAS, IAPR, and SPIE. His research has shaped the field of biometrics and has been applied in systems used worldwide for identity verification, law enforcement, and border security. In 2024, he was awarded the BBVA Foundation Frontiers of Knowledge Award in the category of Information and Communication Technologies. == Early life and education == Born in Basti, India, Jain received his Bachelor of Technology in electrical engineering from the Indian Institute of Technology, Kanpur in 1969. He then moved to the United States, where he earned his M.S. in 1970 and Ph.D. in 1973 from Ohio State University. His doctoral dissertation, titled Some Aspects of Dimensionality and Sample Size Problems in Statistical Pattern Recognition, was supervised by Robert B. McGhee and laid the groundwork for his subsequent research in pattern recognition. == Career == Jain began his academic career at Wayne State University, where he taught from 1972 to 1974. In 1974, he joined the faculty of Michigan State University, where he has remained for over five decades and currently holds the position of University Distinguished Professor. Throughout his career, Jain has conducted pioneering research in data clustering, fingerprint recognition, and face recognition. His work has been published in leading scientific journals including Scientific American, Nature, IEEE Spectrum, and MIT Technology Review. He served as Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence from 1991 to 1994. Jain has also contributed to national security and policy through his service on several advisory bodies. He served as a member of the U.S. National Academies panels on Information Technology, Whither Biometrics, and Improvised Explosive Devices (IED). He has also served on the Defense Science Board, the Forensic Science Standards Board, and the AAAS Latent Fingerprint Working Group. In 2014, Jain was named Innovator of the Year at Michigan State University for transferring several technologies on face and fingerprint recognition to major players in the biometrics industry. He holds eight U.S. and Korean patents related to biometric technologies. == Research contributions == Jain's research spans pattern recognition, computer vision, machine learning, and biometric recognition. His contributions have been particularly influential in several areas: === Biometric recognition === Jain is considered one of the foremost authorities on biometric recognition systems. His research group at Michigan State University has developed algorithms and systems for fingerprint, face, and iris recognition that have been widely adopted in both academic research and commercial applications. His work on fingerprint matching algorithms has been instrumental in establishing standards for automated fingerprint identification systems (AFIS) used by law enforcement agencies worldwide. In recent years, Jain and his research team have made significant advances in child fingerprint recognition, demonstrating that digital scans of a young child's fingerprint can be correctly recognized one year later with over 99 percent accuracy for children as young as six months old. This research has important implications for child identification in developing countries, where it can be used to track immunization records and provide access to medical care. === Data clustering === Jain's survey article "Data clustering: a review" (1999), co-authored with M. N. Murty and P. J. Flynn, is one of the most highly cited papers in computer science. His 2010 paper "Data Clustering: 50 Years Beyond K-Means" provided a comprehensive overview of the evolution of clustering methods and remains an essential reference in the field. === Statistical pattern recognition === Jain's work on statistical pattern recognition, including his influential survey "Statistical pattern recognition: A review" (2000) with R. P. W. Duin and Jianchang Mao, has shaped the theoretical foundations of the field. == Citation metrics and academic impact == Jain is among the most highly cited researchers in computer science. Based on his Google Scholar profile, he had an h-index of 200 in 2020, which was the highest among computer scientists identified in a survey published by UCLA at the time. As of August 2023, his h-index on Google Scholar is 211. He has since been surpassed by Yoshua Bengio, a researcher of similar subjects (neural networks and deep learning for artificial intelligence), who had an h-index of 224 as of August 2023. Another source reported that as of December 2022, he had the highest discipline h-index (D-index) in computer science. == Honors and awards == Jain has received numerous awards and honors recognizing his contributions to computer science and engineering: === Academy memberships === Member, United States National Academy of Engineering (2016) — elected "for contributions to the engineering and practice of biometrics" Foreign Fellow, Indian National Academy of Engineering (2016) Foreign Member, Chinese Academy of Sciences (2019) Member, The World Academy of Sciences (2019) Fellow, National Academy of Inventors === Professional society fellowships === Fellow, ACM Fellow, IEEE (1988) — for contributions to image processing Fellow, AAAS Fellow, International Association for Pattern Recognition Fellow, SPIE === Major awards === BBVA Foundation Frontiers of Knowledge Award in Information and Communication Technologies (2024) IAPR King-Sun Fu Prize (2008) IEEE W. Wallace McDowell Award (2007) — the highest technical honor awarded by the IEEE Computer Society, for pioneering contributions to theory, technique, and practice of pattern recognition, computer vision, and biometric recognition systems IEEE Computer Society Technical Achievement Award (2003) IAPR Pierre Devijver Award (2002) Humboldt Research Award (2002) Guggenheim Fellowship (2001) Fulbright Fellowship (1998) IEEE ICDM Research Contribution Award (2008) === Best paper awards === IEEE Transactions on Neural Networks (1996) Pattern Recognition journal (1987, 1991, 2005) === Honorary doctorates === Universidad Autónoma de Madrid (2018) Hong Kong University of Science and Technology (2021) == Legacy and endowments == Two endowed funds have been established in Jain's honor at Michigan State University, recognizing his lasting impact on the field and the university. In 2015, a former visiting scholar from Jain's laboratory made an anonymous $400,000 gift to create the Anil K. Jain Endowed Graduate Fellowship, which supports doctoral-level research in pattern recognition, computer vision, and biometric recognition. In 2022, the Anil K. and Nandita K. Jain Endowed Professorship was established through $1 million in contributions from multiple donors, including a substantial gift from the Jain family, to support faculty recruitment and retention in the Department of Computer Science and Engineering. == Selected publications == === Books === 1988. Algorithms For Clustering Data. With Richard C. Dubes. Prentice Hall. 1993. Markov Random Fields: Theory and Applications. With Rama Chellappa eds. Academic Press. 1999. Biometrics: Personal Identification in Networked Society. With Ruud M. Bolle and Sharath Pankanti eds. Springer. 2003. Handbook of Fingerprint Recognition. (2nd edition 2009). With D. Maio, D. Maltoni, S. Prabhakar. Springer. 2005. Handbook of Face Recognition. (2nd edition 2011). With S. Z. Li ed. Springer. 2006. Handbook of Multibiometrics. With A. Ross and K. Nandakumar. Springer. 2007. Handbook of Biometrics. With P. Flynn and A. Ross eds. Springer. 2011. Introduction to Biometrics. With A. Ross and K. Nandakumar. Springer. 2015. Encyclopedia of Biometrics (Second Edition). With Stan Li. Springer. === Research articles === Cross, George R. and Anil K. Jain. "Markov random field texture models". IEEE Transactions on Pattern Analysis and Machine Intelligence (1983): 25–39. Jain, Anil K., and Farshid Farrokhnia. "Unsupervised texture segmentation using Gabor filters". Pattern Recognition 24.12 (1991): 1167–1186. Jain, Anil K., and Douglas Zongker. "Feature selection: Evaluation, application, and small sample performance". IEEE Transactions on Pattern Analysis and Machine Intelligence, 19.2 (1997): 153–158. Jain, Anil K., L. Hong, S. Pankanti, R. Bolle. "An Identity-A

    Read more →
  • Powerset construction

    Powerset construction

    In the theory of computation and automata theory, the powerset construction or subset construction is a standard method for converting a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) that recognizes the same formal language. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2n states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs. The construction, sometimes called the Rabin–Scott powerset construction (or subset construction) to distinguish it from similar constructions for other types of automata, was first published by Michael O. Rabin and Dana Scott in 1959. == Intuition == To simulate the operation of a DFA on a given input string, one needs to keep track of a single state at any time: the state that the automaton will reach after seeing a prefix of the input. In contrast, to simulate an NFA, one needs to keep track of a set of states: all of the states that the automaton could reach after seeing the same prefix of the input, according to the nondeterministic choices made by the automaton. If, after a certain prefix of the input, a set S of states can be reached, then after the next input symbol x the set of reachable states is a deterministic function of S and x. Therefore, the sets of reachable NFA states play the same role in the NFA simulation as single DFA states play in the DFA simulation, and in fact the sets of NFA states appearing in this simulation may be re-interpreted as being states of a DFA. == Construction == The powerset construction applies most directly to an NFA that does not allow state transformations without consuming input symbols (aka: "ε-moves"). Such an automaton may be defined as a 5-tuple (Q, Σ, T, q0, F), in which Q is the set of states, Σ is the set of input symbols, T is the transition function (mapping a state and an input symbol to a set of states), q0 is the initial state, and F is the set of accepting states. The corresponding DFA has states corresponding to subsets of Q. The initial state of the DFA is {q0}, the (one-element) set of initial states. The transition function of the DFA maps a state S (representing a subset of Q) and an input symbol x to the set T(S,x) = ∪{T(q,x) | q ∈ S}, the set of all states that can be reached by an x-transition from a state in S. A state S of the DFA is an accepting state if and only if at least one member of S is an accepting state of the NFA. In the simplest version of the powerset construction, the set of all states of the DFA is the powerset of Q, the set of all possible subsets of Q. However, many states of the resulting DFA may be useless as they may be unreachable from the initial state. An alternative version of the construction creates only the states that are actually reachable. === NFA with ε-moves === For an NFA with ε-moves (also called an ε-NFA), the construction must be modified to deal with these by computing the ε-closure of states: the set of all states reachable from some given state using only ε-moves. Van Noord recognizes three possible ways of incorporating this closure computation in the powerset construction: Compute the ε-closure of the entire automaton as a preprocessing step, producing an equivalent NFA without ε-moves, then apply the regular powerset construction. This version, also discussed by Hopcroft and Ullman, is straightforward to implement, but impractical for automata with large numbers of ε-moves, as commonly arise in natural language processing application. During the powerset computation, compute the ε-closure { q ′ | q → ε ∗ q ′ } {\displaystyle \{q'~|~q\to _{\varepsilon }^{}q'\}} of each state q that is considered by the algorithm (and cache the result). During the powerset computation, compute the ε-closure { q ′ | ∃ q ∈ Q ′ , q → ε ∗ q ′ } {\displaystyle \{q'~|~\exists q\in Q',q\to _{\varepsilon }^{}q'\}} of each subset of states Q' that is considered by the algorithm, and add its elements to Q'. === Multiple initial states === If NFAs are defined to allow for multiple initial states, the initial state of the corresponding DFA is the set of all initial states of the NFA, or (if the NFA also has ε-moves) the set of all states reachable from initial states by ε-moves. == Example == The NFA below has four states; state 1 is initial, and states 3 and 4 are accepting. Its alphabet consists of the two symbols 0 and 1, and it has ε-moves. The initial state of the DFA constructed from this NFA is the set of all NFA states that are reachable from state 1 by ε-moves; that is, it is the set {1,2,3}. A transition from {1,2,3} by input symbol 0 must follow either the arrow from state 1 to state 2, or the arrow from state 3 to state 4. Additionally, neither state 2 nor state 4 have outgoing ε-moves. Therefore, T({1,2,3},0) = {2,4}, and by the same reasoning the full DFA constructed from the NFA is as shown below. As can be seen in this example, there are five states reachable from the start state of the DFA; the remaining 11 sets in the powerset of the set of NFA states are not reachable. == Complexity == Because the DFA states consist of sets of NFA states, an n-state NFA may be converted to a DFA with at most 2n states. For every n, there exist n-state NFAs such that every subset of states is reachable from the initial subset, so that the converted DFA has exactly 2n states, giving Θ(2n) worst-case time complexity. A simple example requiring nearly this many states is the language of strings over the alphabet {0,1} in which there are at least n characters, the nth from last of which is 1. It can be represented by an (n + 1)-state NFA, but it requires 2n DFA states, one for each n-character suffix of the input; cf. picture for n=4. == Applications == Brzozowski's algorithm for DFA minimization uses the powerset construction, twice. It converts the input DFA into an NFA for the reverse language, by reversing all its arrows and exchanging the roles of initial and accepting states, converts the NFA back into a DFA using the powerset construction, and then repeats its process. Its worst-case complexity is exponential, unlike some other known DFA minimization algorithms, but in many examples it performs more quickly than its worst-case complexity would suggest. Safra's construction, which converts a non-deterministic Büchi automaton with n states into a deterministic Muller automaton or into a deterministic Rabin automaton with 2O(n log n) states, uses the powerset construction as part of its machinery.

    Read more →
  • CPanel

    CPanel

    cPanel is a web hosting control panel software developed by cPanel, L.L.C. It provides a graphical interface (GUI) and automation tools designed to simplify the process of hosting a web site for the website owner or "end user". It enables administration through a standard web browser using a three-tier structure. While cPanel is limited to managing a single hosting account, cPanel & WHM allow the administration of the entire server. In addition to the GUI, cPanel also has command line and API-based access that allows third-party software vendors, web hosting organizations, and developers to automate standard system administration processes. cPanel & WHM is designed to function either as a dedicated server or virtual private server. The latest cPanel & WHM version supports installation on AlmaLinux, Rocky Linux, CloudLinux OS, and Ubuntu. == History == cPanel is currently developed by cPanel, L.L.C., a privately owned company headquartered in Houston, Texas, United States. WebPros is the parent company of cPanel, L.L.C. It was originally designed in 1996 as the control panel for Speed Hosting, a now-defunct web hosting company. The original author of cPanel, J. Nick Koston, had a stake in Speed Hosting. Webking quickly began using cPanel after its merger with Speed Hosting. The new company moved its servers to Virtual Development Inc. (VDI), a now-defunct hosting facility. Following an agreement between Koston and VDI, cPanel was only available to customers hosted directly at VDI. At the time, there was little competition in the control panel market, with the main choices being VDI and Alabanza. Eventually, due to Koston leaving for college, he and William Jensen signed an agreement in which cPanel was split into a separate program called WebPanel; this version was run by VDI. Without the lead programmer, VDI was not able to continue any work on cPanel and eventually stopped supporting it completely. Koston kept working on cPanel while also working at BurstNET. Eventually, he left BurstNET to focus fully on cPanel. cPanel 3 was released in 1999: main additions over cPanel 2 were an automatic upgrade and the Web Host Manager (WHM). The interface was also improved when Carlos Rego of WizardsHosting made what became the default theme of cPanel. With the release of cPanel 11, cPanel adopted a four-tier versioning system, "Parent.Major.Minor.Patch" (e.g., 11.32.0.3). As of version 11.52, the "Parent" representation is deprecated, with 11.54 stylized as "Version 54." cPanel 11.30 is the last major version to support FreeBSD. On August 20, 2018 cPanel L.L.C. announced that it had signed an agreement to be acquired by a group led by Oakley Capital (who also own Plesk and SolusVM). While Koston sold his interest in cPanel, he will continue to be an owner of the company that owns cPanel. In April 2026, a severe vulnerability was discovered that affected all cPanel and WHM versions after 11.40, affectively allowing unauthenticated remote attackers to access the control panel. According to some web hosters the vulnerability was already being actively exploited, with some attempts even dating back to late February 2026. == Add-ons == cPanel provides front-ends for a number of common operations, including the management of PGP keys, crontab tasks, mail and FTP accounts, and mailing lists. Several add-ons exist, some for an additional fee, including auto installers such as Installatron, Fantastico, Softaculous, and WHMSonic (SHOUTcast/radio Control Panel Add-on). The add-ons need to be enabled by the server administrator in WHM to be accessible to the cPanel user. WHM manages some software packages separately from the underlying operating system, applying upgrades to Apache, PHP, MySQL and MariaDB, Exim, FTP, and related software packages automatically. This ensures that these packages are kept up-to-date and compatible with WHM, but makes it more difficult to install newer versions of these packages. It also makes it difficult to verify that the packages have not been tampered with, since the operating system's package management verification system cannot be used to do so. == WHM == WHM, short for WebHost Manager, is a web-based tool which is used for server administration. There are at least two tiers of WHM, often referred to as "root WHM", and non-root WHM (or Reseller WHM). Root WHM is used by server administrators and non-root WHM (with fewer privileges) is used by others, like entity departments, and resellers to manage hosting accounts often referred to as cPanel accounts on a web server. WHM is also used to manage SSL certificates (both server self generated and CA provided SSL certificates), cPanel users, hosting packages, DNS zones, themes, and authentication methods. The default automatic SSL (AutoSSL) provided by cPanel is powered by Let's Encrypt. Additionally, WHM can also be used to manage FTP, Mail (POP, IMAP, and SMTP) and SSH services on the server. As well as being accessible by the root administrator, WHM is also accessible to users with reseller privileges. Reseller users of cPanel have a smaller set of features than the root user, generally limited by the server administrator, to features which they determine will affect their customers' accounts rather than the server as a whole. From root WHM, the server administrator can perform maintenance operations such as upgrading and recompiling Apache and PHP, installing Perl modules, and upgrading RPMs installed on the system. == Enkompass == A version of cPanel & WHM for Microsoft Windows, called Enkompass, was declared end-of-life as of February 2014. Version 3 remained available for download, but without further development or support. In the preceding years, Enkompass had been available for free as product development slowed. == Pricing == On June 27, 2019 cPanel announced a new account-based pricing structure. After backlash from their customers, cPanel issued a second announcement but did not change the new structure.

    Read more →
  • Regularization perspectives on support vector machines

    Regularization perspectives on support vector machines

    Within mathematical analysis, Regularization perspectives on support-vector machines provide a way of interpreting support-vector machines (SVMs) in the context of other regularization-based machine-learning algorithms. SVM algorithms categorize binary data, with the goal of fitting the training set data in a way that minimizes the average of the hinge-loss function and L2 norm of the learned weights. This strategy avoids overfitting via Tikhonov regularization and in the L2 norm sense and also corresponds to minimizing the bias and variance of our estimator of the weights. Estimators with lower Mean squared error predict better or generalize better when given unseen data. Specifically, Tikhonov regularization algorithms produce a decision boundary that minimizes the average training-set error and constrain the Decision boundary not to be excessively complicated or overfit the training data via a L2 norm of the weights term. The training and test-set errors can be measured without bias and in a fair way using accuracy, precision, Auc-Roc, precision-recall, and other metrics. Regularization perspectives on support-vector machines interpret SVM as a special case of Tikhonov regularization, specifically Tikhonov regularization with the hinge loss for a loss function. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to generalize without overfitting. SVM was first proposed in 1995 by Corinna Cortes and Vladimir Vapnik, and framed geometrically as a method for finding hyperplanes that can separate multidimensional data into two categories. This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other machine-learning techniques for avoiding overfitting, like regularization, early stopping, sparsity and Bayesian inference. However, once it was discovered that SVM is also a special case of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms. This has enabled detailed comparisons between SVM and other forms of Tikhonov regularization, and theoretical grounding for why it is beneficial to use SVM's loss function, the hinge loss. == Theoretical background == In the statistical learning theory framework, an algorithm is a strategy for choosing a function f : X → Y {\displaystyle f\colon \mathbf {X} \to \mathbf {Y} } given a training set S = { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle S=\{(x_{1},y_{1}),\ldots ,(x_{n},y_{n})\}} of inputs x i {\displaystyle x_{i}} and their labels y i {\displaystyle y_{i}} (the labels are usually ± 1 {\displaystyle \pm 1} ). Regularization strategies avoid overfitting by choosing a function that fits the data, but is not too complex. Specifically: f = argmin f ∈ H { 1 n ∑ i = 1 n V ( y i , f ( x i ) ) + λ ‖ f ‖ H 2 } , {\displaystyle f={\underset {f\in {\mathcal {H}}}{\operatorname {argmin} }}\left\{{\frac {1}{n}}\sum _{i=1}^{n}V(y_{i},f(x_{i}))+\lambda \|f\|_{\mathcal {H}}^{2}\right\},} where H {\displaystyle {\mathcal {H}}} is a hypothesis space of functions, V : Y × Y → R {\displaystyle V\colon \mathbf {Y} \times \mathbf {Y} \to \mathbb {R} } is the loss function, ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} is a norm on the hypothesis space of functions, and λ ∈ R {\displaystyle \lambda \in \mathbb {R} } is the regularization parameter. When H {\displaystyle {\mathcal {H}}} is a reproducing kernel Hilbert space, there exists a kernel function K : X × X → R {\displaystyle K\colon \mathbf {X} \times \mathbf {X} \to \mathbb {R} } that can be written as an n × n {\displaystyle n\times n} symmetric positive-definite matrix K {\displaystyle \mathbf {K} } . By the representer theorem, f ( x i ) = ∑ j = 1 n c j K i j , and ‖ f ‖ H 2 = ⟨ f , f ⟩ H = ∑ i = 1 n ∑ j = 1 n c i c j K ( x i , x j ) = c T K c . {\displaystyle f(x_{i})=\sum _{j=1}^{n}c_{j}\mathbf {K} _{ij},{\text{ and }}\|f\|_{\mathcal {H}}^{2}=\langle f,f\rangle _{\mathcal {H}}=\sum _{i=1}^{n}\sum _{j=1}^{n}c_{i}c_{j}K(x_{i},x_{j})=c^{T}\mathbf {K} c.} == Special properties of the hinge loss == The simplest and most intuitive loss function for categorization is the misclassification loss, or 0–1 loss, which is 0 if f ( x i ) = y i {\displaystyle f(x_{i})=y_{i}} and 1 if f ( x i ) ≠ y i {\displaystyle f(x_{i})\neq y_{i}} , i.e. the Heaviside step function on − y i f ( x i ) {\displaystyle -y_{i}f(x_{i})} . However, this loss function is not convex, which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0–1 loss. The hinge loss, V ( y i , f ( x i ) ) = ( 1 − y f ( x ) ) + {\displaystyle V{\big (}y_{i},f(x_{i}){\big )}={\big (}1-yf(x){\big )}_{+}} , where ( s ) + = max ( s , 0 ) {\displaystyle (s)_{+}=\max(s,0)} , provides such a convex relaxation. In fact, the hinge loss is the tightest convex upper bound to the 0–1 misclassification loss function, and with infinite data returns the Bayes-optimal solution: f b ( x ) = { 1 , p ( 1 ∣ x ) > p ( − 1 ∣ x ) , − 1 , p ( 1 ∣ x ) < p ( − 1 ∣ x ) . {\displaystyle f_{b}(x)={\begin{cases}1,&p(1\mid x)>p(-1\mid x),\\-1,&p(1\mid x) Read more →

  • Mehryar Mohri

    Mehryar Mohri

    Mehryar Mohri is a professor and theoretical computer scientist at the Courant Institute of Mathematical Sciences. He is also heading the Machine Learning Theory (ML Theory) team at Google Research. == Career == Prior to joining the Courant Institute, Mohri was a research department head and later technology leader at AT&T Bell Labs, where he was a member of the technical staff for about ten years. Mohri has also taught as an assistant professor at the University of Paris 7 (1992-1993) and Ecole Polytechnique (1992-1994). == Research == Mohri's main area of research is machine learning, in particular learning theory. He is also an expert in automata theory and algorithms. He is the author of several core algorithms that have served as the foundation for the design of many deployed speech recognition and natural language processing systems. == Publications == Mohri is the author of the reference book Foundations of Machine Learning used as a textbook in many graduate-level machine learning courses. Mohri is also a member of the Lothaire group of mathematicians with the pseudonym M. Lothaire and contributed to the book on Applied Combinatorics on Words. He is the author of more than 250 conference and journal publications. == Organizational affiliations == Mohri is currently the President of the Association for Algorithmic Learning Theory (AALT) and the Steering Committee Chair for the ALT conference. He is also Editorial Board member of Machine Learning and TheoretiCS, Action Editor of the Journal of Machine Learning Research (JMLR) and a member of the advisory board for the Journal of Automata, Languages and Combinatorics.

    Read more →
  • Corinna Cortes

    Corinna Cortes

    Corinna Cortes (born 31 March 1961) is a Danish computer scientist known for her contributions to machine learning. She is a Vice President at Google Research in New York City. Cortes is an ACM Fellow and a recipient of the Paris Kanellakis Award for her work on theoretical foundations of support vector machines. == Early life and education == Corinna Cortes was born in 1961 in Denmark. Cortes received her Master of Science degree in physics from University of Copenhagen in 1989. She received her PhD in computer science from the University of Rochester in 1993 for research supervised by Randal C. Nelson. == Career and research == Cortes joined AT&T Bell Labs as a researcher in 1993. Since 2003, she has served as Vice President of Google Research, New York City, and since 2011, as adjunct professor at the UCPH Department of Computer Science. She is serves as an editorial board member of the journal Machine Learning. Cortes' research covers a wide range of topics in machine learning, including support vector machines (SVM) and data mining. SVM is one of the most frequently used algorithms in machine learning, which is used in many practical applications, including medical diagnosis and weather forecasting. At AT&T, Cortes was a contributor to the design of Hancock programming language. === Awards and honours === In 2008, she jointly with Vladimir Vapnik received the Paris Kanellakis Award for the development of a highly effective algorithm for supervised learning known as support vector machines (SVM). She was named an ACM Fellow in 2023 for theoretical and practical contributions to machine learning, industrial leadership and service to the field. == Personal life == Corinna has two children and is also a competitive runner.

    Read more →
  • Software component

    Software component

    A software component is a modular unit of software that encapsulates specific functionality. The desired characteristics of a component are reusability and maintainability. == Value == Components allow software developers to assemble software with reliable parts rather than writing code for every aspect. It makes implementation more like factory assembly than custom building. == Attributes == Desirable attributes of a component include but are not limited to: Cohesive – encapsulates related functionality Reusable Robust Substitutable – can be replaced by another component with the same interface Documented Tested == Third-party == Some components are built in-house by the same organization or team building the software system. Some are third-party, developed elsewhere and assembled into the software system. == Component-based software engineering == For large-scale systems, component-based development encourages a disciplined process to manage complexity. == Framework == Some components conform to a framework technology that allows them to be consumed in a well-known way. Examples include: CORBA, COM, Enterprise JavaBeans, and the .NET Framework. == Modeling == Component design is often modeled visually. In Unified Modeling Language (UML) 2.0 a component is shown as a rectangle, and an interface is shown as a lollipop to indicate a provided interface and as a socket to indicate consumption of an interface. == History == The idea of reusable software components was promoted by Douglas McIlroy in his presentation at the NATO Software Engineering Conference of 1968. (One goal of that conference was to resolve the so-called software crisis of the time.) In the 1970s, McIlroy put this idea into practice with the addition of the pipeline feature to the Unix operating system. Brad Cox refined the concept of a software component in the 1980s. He attempted to create an infrastructure and market for reusable third-party components by inventing the Objective-C programming language. IBM introduced System Object Model (SOM) in the early 1990s. Microsoft introduced Component Object Model (COM) in the early 1990s. Microsoft built many domain-specific component technologies on COM, including Distributed Component Object Model (DCOM), Object Linking and Embedding (OLE), and ActiveX.

    Read more →
  • Suffix automaton

    Suffix automaton

    In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string which allows the storage, processing, and retrieval of compressed information about all its substrings. The suffix automaton of a string S {\displaystyle S} is the smallest directed acyclic graph with a dedicated initial vertex and a set of "final" vertices, such that paths from the initial vertex to final vertices represent the suffixes of the string. In terms of automata theory, a suffix automaton is the minimal partial deterministic finite automaton that recognizes the set of suffixes of a given string S = s 1 s 2 … s n {\displaystyle S=s_{1}s_{2}\dots s_{n}} . The state graph of a suffix automaton is called a directed acyclic word graph (DAWG), a term that is also sometimes used for any deterministic acyclic finite state automaton. Suffix automata were introduced in 1983 by a group of scientists from the University of Denver and the University of Colorado Boulder. They suggested a linear time online algorithm for its construction and showed that the suffix automaton of a string S {\displaystyle S} having length at least two characters has at most 2 | S | − 1 {\textstyle 2|S|-1} states and at most 3 | S | − 4 {\textstyle 3|S|-4} transitions. Further works have shown a close connection between suffix automata and suffix trees, and have outlined several generalizations of suffix automata, such as compacted suffix automaton obtained by compression of nodes with a single outgoing arc. Suffix automata provide efficient solutions to problems such as substring search and computation of the largest common substring of two and more strings. == History == The concept of suffix automaton was introduced in 1983 by a group of scientists from University of Denver and University of Colorado Boulder consisting of Anselm Blumer, Janet Blumer, Andrzej Ehrenfeucht, David Haussler and Ross McConnell, although similar concepts had earlier been studied alongside suffix trees in the works of Peter Weiner, Vaughan Pratt and Anatol Slissenko. In their initial work, Blumer et al. showed a suffix automaton built for the string S {\displaystyle S} of length greater than 1 {\displaystyle 1} has at most 2 | S | − 1 {\displaystyle 2|S|-1} states and at most 3 | S | − 4 {\displaystyle 3|S|-4} transitions, and suggested a linear algorithm for automaton construction. In 1983, Mu-Tian Chen and Joel Seiferas independently showed that Weiner's 1973 suffix-tree construction algorithm while building a suffix tree of the string S {\displaystyle S} constructs a suffix automaton of the reversed string S R {\textstyle S^{R}} as an auxiliary structure. In 1987, Blumer et al. applied the compressing technique used in suffix trees to a suffix automaton and invented the compacted suffix automaton, which is also called the compacted directed acyclic word graph (CDAWG). In 1997, Maxime Crochemore and Renaud Vérin developed a linear algorithm for direct CDAWG construction. In 2001, Shunsuke Inenaga et al. developed an algorithm for construction of CDAWG for a set of words given by a trie. == Definitions == Usually when speaking about suffix automata and related concepts, some notions from formal language theory and automata theory are used, in particular: "Alphabet" is a finite set Σ {\displaystyle \Sigma } that is used to construct words. Its elements are called "characters"; "Word" is a finite sequence of characters ω = ω 1 ω 2 … ω n {\displaystyle \omega =\omega _{1}\omega _{2}\dots \omega _{n}} . "Length" of the word ω {\displaystyle \omega } is denoted as | ω | = n {\displaystyle |\omega |=n} ; "Formal language" is a set of words over given alphabet; "Language of all words" is denoted as Σ ∗ {\displaystyle \Sigma ^{}} (where the "" character stands for Kleene star), "empty word" (the word of zero length) is denoted by the character ε {\displaystyle \varepsilon } ; "Concatenation of words" α = α 1 α 2 … α n {\displaystyle \alpha =\alpha _{1}\alpha _{2}\dots \alpha _{n}} and β = β 1 β 2 … β m {\displaystyle \beta =\beta _{1}\beta _{2}\dots \beta _{m}} is denoted as α ⋅ β {\displaystyle \alpha \cdot \beta } or α β {\displaystyle \alpha \beta } and corresponds to the word obtained by writing β {\displaystyle \beta } to the right of α {\displaystyle \alpha } , that is, α β = α 1 α 2 … α n β 1 β 2 … β m {\displaystyle \alpha \beta =\alpha _{1}\alpha _{2}\dots \alpha _{n}\beta _{1}\beta _{2}\dots \beta _{m}} ; "Concatenation of languages" A {\displaystyle A} and B {\displaystyle B} is denoted as A ⋅ B {\displaystyle A\cdot B} or A B {\displaystyle AB} and corresponds to the set of pairwise concatenations A B = { α β : α ∈ A , β ∈ B } {\displaystyle AB=\{\alpha \beta :\alpha \in A,\beta \in B\}} ; If the word ω ∈ Σ ∗ {\displaystyle \omega \in \Sigma ^{}} may be represented as ω = α γ β {\displaystyle \omega =\alpha \gamma \beta } , where α , β , γ ∈ Σ ∗ {\displaystyle \alpha ,\beta ,\gamma \in \Sigma ^{}} , then words α {\displaystyle \alpha } , β {\displaystyle \beta } and γ {\displaystyle \gamma } are called "prefix", "suffix" and "subword" (substring) of the word ω {\displaystyle \omega } correspondingly; If T = T 1 … T n {\displaystyle T=T_{1}\dots T_{n}} and T l T l + 1 … T r = S {\displaystyle T_{l}T_{l+1}\dots T_{r}=S} (with 1 ≤ l ≤ r ≤ n {\displaystyle 1\leq l\leq r\leq n} ) then S {\displaystyle S} is said to "occur" in T {\displaystyle T} as a subword. Here l {\displaystyle l} and r {\displaystyle r} are called left and right positions of occurrence of S {\displaystyle S} in T {\displaystyle T} correspondingly. == Automaton structure == Formally, deterministic finite automaton is determined by 5-tuple A = ( Σ , Q , q 0 , F , δ ) {\displaystyle {\mathcal {A}}=(\Sigma ,Q,q_{0},F,\delta )} , where: Σ {\displaystyle \Sigma } is an "alphabet" that is used to construct words, Q {\displaystyle Q} is a set of automaton "states", q 0 ∈ Q {\displaystyle q_{0}\in Q} is an "initial" state of automaton, F ⊂ Q {\displaystyle F\subset Q} is a set of "final" states of automaton, δ : Q × Σ ↦ Q {\displaystyle \delta :Q\times \Sigma \mapsto Q} is a partial "transition" function of automaton, such that δ ( q , σ ) {\displaystyle \delta (q,\sigma )} for q ∈ Q {\displaystyle q\in Q} and σ ∈ Σ {\displaystyle \sigma \in \Sigma } is either undefined or defines a transition from q {\displaystyle q} over character σ {\displaystyle \sigma } . Most commonly, deterministic finite automaton is represented as a directed graph ("diagram") such that: Set of graph vertices corresponds to the state of states Q {\displaystyle Q} , Graph has a specific marked vertex corresponding to initial state q 0 {\displaystyle q_{0}} , Graph has several marked vertices corresponding to the set of final states F {\displaystyle F} , Set of graph arcs corresponds to the set of transitions δ {\displaystyle \delta } , Specifically, every transition δ ( q 1 , σ ) = q 2 {\textstyle \delta (q_{1},\sigma )=q_{2}} is represented by an arc from q 1 {\displaystyle q_{1}} to q 2 {\displaystyle q_{2}} marked with the character σ {\displaystyle \sigma } . This transition also may be denoted as q 1 σ ⟶ q 2 {\textstyle q_{1}{\begin{smallmatrix}{\sigma }\\[-5pt]{\longrightarrow }\end{smallmatrix}}q_{2}} . In terms of its diagram, the automaton recognizes the word ω = ω 1 ω 2 … ω m {\displaystyle \omega =\omega _{1}\omega _{2}\dots \omega _{m}} only if there is a path from the initial vertex q 0 {\displaystyle q_{0}} to some final vertex q ∈ F {\displaystyle q\in F} such that concatenation of characters on this path forms ω {\displaystyle \omega } . The set of words recognized by an automaton forms a language that is set to be recognized by the automaton. In these terms, the language recognized by a suffix automaton of S {\displaystyle S} is the language of its (possibly empty) suffixes. === Automaton states === "Right context" of the word ω {\displaystyle \omega } with respect to language L {\displaystyle L} is a set [ ω ] R = { α : ω α ∈ L } {\displaystyle [\omega ]_{R}=\{\alpha :\omega \alpha \in L\}} that is a set of words α {\displaystyle \alpha } such that their concatenation with ω {\displaystyle \omega } forms a word from L {\displaystyle L} . Right contexts induce a natural equivalence relation [ α ] R = [ β ] R {\displaystyle [\alpha ]_{R}=[\beta ]_{R}} on the set of all words. If language L {\displaystyle L} is recognized by some deterministic finite automaton, there exists unique up to isomorphism automaton that recognizes the same language and has the minimum possible number of states. Such an automaton is called a minimal automaton for the given language L {\displaystyle L} . Myhill–Nerode theorem allows it to define it explicitly in terms of right contexts: In these terms, a "suffix automaton" is the minimal deterministic finite automaton recognizing the language of suffixes of the word S = s 1 s 2 … s n {\displaystyle S=s_{1}s_{2}\dots s_{n}} . The right context of the word ω {\displaystyle \omeg

    Read more →
  • Jerome H. Friedman

    Jerome H. Friedman

    Jerome Harold Friedman (born December 29, 1939) is an American statistician, consultant and Professor of Statistics at Stanford University, known for his contributions in the field of statistics and data mining. == Biography == Friedman studied at Chico State College for two years before transferring to the University of California, Berkeley in 1959, where he received his AB in Physics in 1962, and his PhD in High Energy Particle Physics in 1967. In 1968 he started his academic career as research physicist at the Lawrence Berkeley National Laboratory. In 1972 he started at Stanford University as leader of the Computation Research Group at the Stanford Linear Accelerator Center, where he would participate until 2003. In the year 1976–77 he was a visiting scientist at CERN in Geneva. From 1981 to 1984 he was visiting professor at the University of California, Berkeley. In 1982 he was appointed Professor of Statistics at Stanford University. In 1984 he was elected as a Fellow of the American Statistical Association. In 2002 he was awarded the SIGKDD Innovation Award by the Association for Computing Machinery (ACM). In 2010 he was elected as a member of the National Academy of Sciences (Applied mathematical sciences). == Publications == Friedman has authored and co-authored many publications in the field of data-mining including "nearest neighbor classification, logistical regressions, and high dimensional data analysis. His primary research interest is in the area of machine learning." A selection: Friedman, Jerome H. & Tukey, John W. (1974). "A projection pursuit algorithm for exploratory data analysis". IEEE Transactions on Computers. 23 (9): 881–890. doi:10.1109/T-C.1974.224051. OSTI 1442925. S2CID 7997450. Friedman, Jerome H. & Stuetzle, Werner (1981). "Projection pursuit regression". Journal of the American Statistical Association. 76 (376): 817–823. doi:10.1080/01621459.1981.10477729. OSTI 1445517. Friedman, Jerome H. (1991). "Multivariate adaptive regression splines". Annals of Statistics. 19 (1): 1–67. CiteSeerX 10.1.1.382.970. doi:10.1214/aos/1176347963. JSTOR 2241837. Friedman, Jerome H. (2001). "Greedy function approximation: a gradient boosting machine". Annals of Statistics. 29 (5): 1189–1232. doi:10.1214/aos/1013203451. JSTOR 2699986.

    Read more →