AI Content Udemy

AI Content Udemy — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • DataViva

    DataViva

    DataViva is an information visualization engine created by the Strategic Priorities Office of the government of Minas Gerais. DataViva makes official data about exports, industries, locations and occupations available for the entirety of Brazil through eight apps and more than 100 million possible visualizations. The first dataset (also available at ALICEWEB) is provided by MDIC (Ministry of Development, Industry and Foreign Trade) / SECEX (Secretariat of Foreign Trade), an official institution of the Government of Brazil, and shows foreign trade statistics for all exporting municipalities in the country. The other dataset, provided by the Ministério do Trabalho e Emprego (MTE, Ministry of Labor and Employment), shows information about all the industries and occupations in Brazil (RAIS, Annual Social Information Report). The platform consists of eight core applications, each of which allows a different way of visualizing the available data. Some applications are descriptive, that is, they show data aggregated at various levels in a simple and comparative way, such as treemapping. Others are prescriptive, using calculations that allow an analytic visualization of the data, based on theories such as the Product Space. All the applications are generated using D3plus, an open-source JavaScript library built on top of D3.js by Alexander Simoes and Dave Landry. Inspired by The Observatory of Economic Complexity, DataViva is an open-data, open-source, free-to-use tool. It was developed in partnership with Datawheel, co-founded by MIT Media Lab Professor César Hidalgo, and is maintained by the Government of Minas Gerais.

    Read more →
  • How to Choose an AI Clip Maker

    How to Choose an AI Clip Maker

    Curious about the best AI clip maker? An AI clip maker is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI clip maker slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Li Sheng (computer scientist)

    Li Sheng (computer scientist)

    Li Sheng (Chinese: 李生; born 1943) is a professor at the School of Computer Science and Engineering, Harbin Institute of Technology (HIT), China. He began his research on Chinese-English machine translation in 1985, making him one of the earliest Chinese scholars in this field. He went on to work on a wide range of topics in natural language processing, including machine translation, information retrieval, question answering and applied artificial intelligence. He served on the final review committee for the computer area at NSF China. Born and raised in Heilongjiang province, he graduated in 1965 from the computer specialty of HIT, one of the earliest computer specialties in Chinese universities. He then joined the staff of the computer specialty of HIT, which was eventually granted department status in 1985. Also from 1985, he was appointed to a series of administrative positions at HIT, e.g. Dean of the Computer Department (1987–1988), Director of the R&D Division (1988–1990), Chief R&D Officer and several other key leadership positions. After resigning all his administrative positions in 2004, Li devoted himself to directing the MOE-Microsoft Joint Key Lab of NLP & Speech (HIT), making it a leading NLP research group with more than 100 staff and students working on various aspects of NLP. So far, the lab has received dozens of technology awards from ministries of the central government and from provincial governments of China. Its research progress is reported annually at top-tier conferences including ACL, IJCAI and SIGIR. As one of the pioneers of NLP research in China, he has contributed to NLP in China not only through technological innovation but also through the education of talent. So far, his research group has graduated more than 60 Ph.D. and almost 200 M.E. students majoring in NLP.
Most of them now work as chief researchers in NLP groups at universities and companies in China, including several internationally known NLP scholars such as Wang Haifeng of Baidu, Zhou Ming of Microsoft Research, Zhang Min (张民) of Soochow University (China), and Zhao Tiejun (赵铁军) and Liu Ting (刘挺) of HIT. Owing to his contributions to Chinese language processing, Li was elected President of the Chinese Information Processing Society of China (CIPSC) in 2011. He grew this top-level academic organization in China to more than 3000 registered members, and promoted NLP into several national projects for research or industry development. In addition, the CIPSC is now strengthening its cooperation with international NLP organizations including ACL. == Machine Intelligence & Translation Laboratory (MI&TLAB) == The laboratory originates from the Machine Translation Research Group of the Computer Science Department, Harbin Institute of Technology, which was started by Li in 1985. It is one of the earliest institutions engaged in MT research in China, and is known for its investigations into Chinese-English machine translation. It now operates under the Research Center on Language Technology, School of Computer Science and Technology, HIT. Details of staff and publications can be found at https://mitlab.hit.edu.cn. == MOE-MS Joint Key Lab of Natural Language Processing and Speech (HIT) == In June 2000, the Joint HIT-Microsoft Machine Translation Lab was founded by MI&T Lab and Microsoft Research (China). It was the third joint lab established by Microsoft Research (China) with Chinese universities, and the only one focusing on machine translation. Building on this joint lab, the cooperation between HIT and Microsoft gradually extended to machine translation, information retrieval, speech recognition and processing, and natural language understanding.
In October 2004, the joint key lab was recognized as one of the 10 joint key labs supported by Microsoft Research Asia and the Ministry of Education of China. In July 2006, the Shenzhen extension of the lab was launched. More than 200 staff and students have undertaken research projects, including some sponsored by the National Natural Science Foundation of China and the National 863 program of China. Since 2005, the lab has also organized a summer camp at Harbin Institute of Technology, in which approximately 150 faculty members and students from universities in China have participated. This summer workshop was held annually until 2014, when it was formally reorganized as the summer school series of the Chinese Information Processing Society of China. Through the lab, a Microsoft Research Asia-HIT joint PhD program was established in 2012. == CEMT-I MT System == In May 1989, CEMT-I passed the formal project appraisal in Harbin, China. Capable of translating technical paper titles from Chinese to English, it was not only the first MT system completed by Li and his group but also, according to public reports, the first Chinese-English translation system to pass a technical appraisal by the Chinese government. It was awarded the Second Prize of Ministry Level Technology Innovation by the former National Aerospace Industry Corporation in 1990. == Daya Translation Workstation == Owing to the technical achievements of Li's group in Chinese-English machine translation, the former National Aerospace Industry Corporation of China sponsored the commercial development of the "Daya Translation Station (MT)" in 1993. Designed as a comprehensive English composition aid for Chinese users, the system was completed and brought to market in 1995. In 1997, it was awarded the Second Prize of Ministry Level Technology Innovation by the former National Aerospace Industry Corporation.
== BT863 MT System == From 1994, the research in Li's lab was supported by the National 863 Hi-tech Research and Development Program. During this period, the BT863 system explored the use of a single engine for both Chinese-English and English-Chinese translation. The system achieved the best performance among Chinese-English MT systems in the formal technical evaluation of the National 863 program, and received the Third Prize of Ministry Level Technology Innovation from the former National Aerospace Industry Corporation in 1997. == Next Generation IR == This is a key project funded by NSF China (with joint sponsorship from MSRA) that started in 2008. In contrast to his previous NSF grants on various NLP issues, Li's last project as PI explored key technologies in personalized IR, together with researchers from Tsinghua University and the Institute of Software, Chinese Academy of Sciences. With publications in top-tier journals and conferences (including SIGIR papers from his own group), the project was rated as an "A-level" achievement by the NSF China office in 2012.

    Read more →
  • Brendan Frey

    Brendan Frey

    Brendan John Frey FRSC (born 29 August 1968) is a Canadian computer scientist, entrepreneur, and engineer. He is Founder and CEO of Deep Genomics, Cofounder of the Vector Institute for Artificial Intelligence and Professor of Engineering and Medicine at the University of Toronto. Frey is a pioneer in the development of machine learning and artificial intelligence methods, their use in accurately determining the consequences of genetic mutations, and in designing medications that can slow, stop or reverse the progression of disease. As far back as 1995, Frey co-invented one of the first deep learning methods, called the wake-sleep algorithm, the affinity propagation algorithm for clustering and data summarization, and the factor graph notation for probability models. In the late 1990s, Frey was a leading researcher in the areas of computer vision, speech recognition, and digital communications. == Education == Frey studied computer engineering and physics at the University of Calgary (BSc 1990) and the University of Manitoba (MSc 1993), and then studied neural networks and graphical models as a doctoral candidate at the University of Toronto under the supervision of Geoffrey Hinton (PhD 1997). He was an invited participant of the Machine Learning program at the Isaac Newton Institute for Mathematical Sciences in Cambridge, UK (1997) and was a Beckman Fellow at the University of Illinois at Urbana Champaign (1999). == Career == Following his undergraduate studies, Frey worked as a junior research scientist at Bell-Northern Research from 1990 to 1991. After completing his postdoctoral studies at the University of Illinois at Urbana-Champaign, Frey was an assistant professor in the Department of Computer Science at the University of Waterloo, from 1999 to 2001. 
In 2001, Frey joined the Department of Electrical and Computer Engineering at the University of Toronto and was cross-appointed to the Department of Computer Science, the Banting and Best Department of Medical Research and the Terrence Donnelly Centre for Cellular and Biomolecular Research. From 2008 to 2009, he was a visiting researcher at Microsoft Research (Cambridge, UK) and a visiting professor in the Cavendish Laboratories and Darwin College at Cambridge University. Between 2001 and 2014, Frey consulted for several groups at Microsoft Research and acted as a member of its Technical Advisory Board. In 2002, a personal crisis led Frey to face the fact that there was a tragic gap between our ability to measure a patient's mutations and our ability to understand and treat the consequences. Recognizing that biology is too complex for humans to understand, that in the decades to come there would be an exponential growth in biology data, and that machine learning is the best technology we have for discovering relationships in large datasets, Frey set out to build machine learning systems that could accurately predict genome and cell biology. Frey’s group pioneered much of the early work in the field and over the next 15 years published more papers in leading-edge journals than any other academic or industrial research lab. In 2015, Frey founded Deep Genomics, with the goal of building a company that can produce effective and safe genetic medicines more rapidly and with a higher rate of success than was previously possible. The company has received 240 million dollars in funding to date from leading Bay Area investors, including the backers of SpaceX and Tesla.

    Read more →
  • Inductive bias

    Inductive bias

    The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered. Inductive bias is anything which makes the algorithm learn one pattern instead of another pattern (e.g., step-functions in decision trees instead of continuous functions in linear regression models). Learning involves searching a space of solutions for a solution that provides a good explanation of the data. However, in many cases, there may be multiple equally appropriate solutions. An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independently of the observed data. In machine learning, the aim is to construct algorithms that are able to learn to predict a certain target output. To achieve this, the learning algorithm is presented some training examples that demonstrate the intended relation of input and output values. Then the learner is supposed to approximate the correct output, even for examples that have not been shown during training. Without any additional assumptions, this problem cannot be solved since unseen situations might have an arbitrary output value. The kind of necessary assumptions about the nature of the target function are subsumed in the phrase inductive bias. A classical example of an inductive bias is Occam's razor, assuming that the simplest consistent hypothesis about the target function is actually the best. Here, consistent means that the hypothesis of the learner yields correct outputs for all of the examples that have been given to the algorithm. Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis generated by the learner. 
However, this strict formalism fails in many practical cases in which the inductive bias can only be given as a rough description (e.g., in the case of artificial neural networks), or not at all. == Types == The following is a list of common inductive biases in machine learning algorithms. Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to maximize conditional independence. This is the bias used in the Naive Bayes classifier. Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased, for example assuming that there is no information encoded in the ordering of the data. Maximum margin: when drawing a boundary between two classes, attempt to maximize the width of the boundary. This is the bias used in support vector machines. The assumption is that distinct classes tend to be separated by wide boundaries. Minimum description length: when forming a hypothesis, attempt to minimize the length of the description of the hypothesis. Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is the assumption behind feature selection algorithms. Nearest neighbors: assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm. The assumption is that cases that are near each other tend to belong to the same class. == Shift of bias == Although most learning algorithms have a static bias, some algorithms are designed to shift their bias as they acquire more data. This does not avoid bias, since the bias shifting process itself must have a bias.
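As a minimal illustration of the idea of inductive bias (a sketch, not taken from any particular library), the following Python compares two learners that both reproduce the same three training points exactly yet disagree on an unseen input. The function names `fit_line` and `fit_nearest_neighbor` and the toy data are assumptions introduced only for this example.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b (bias: the target function is linear)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_nearest_neighbor(xs, ys):
    """1-nearest-neighbor predictor (bias: nearby inputs share the same output)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Hypothetical training data: three points lying on the line y = x.
train_x = [0.0, 1.0, 2.0]
train_y = [0.0, 1.0, 2.0]

line = fit_line(train_x, train_y)
nn = fit_nearest_neighbor(train_x, train_y)

# Both hypotheses are consistent with every training example,
# but their biases lead to different predictions for the unseen input x = 10.
print(line(10.0), nn(10.0))  # 10.0 2.0
```

The training data alone cannot say which prediction is right; only the assumption built into each learner decides between extrapolating the trend and repeating the closest known value.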

    Read more →
  • Best AI Marketing Tools in 2026

    Best AI Marketing Tools in 2026

    Trying to pick the best AI marketing tool? An AI marketing tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI marketing tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Sparse dictionary learning

    Sparse dictionary learning

    Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims to find a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than any one of the signals being observed. These two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in sparsity and flexibility of the representation. One of the most important applications of sparse dictionary learning is in the field of compressed sensing or signal recovery. In compressed sensing, a high-dimensional signal can be recovered with only a few linear measurements, provided that the signal is sparse or near-sparse. Since not all signals satisfy this condition, it is crucial to find a sparse representation of that signal such as the wavelet transform or the directional gradient of a rasterized matrix. Once a matrix or a high-dimensional vector is transferred to a sparse space, different recovery algorithms like basis pursuit, CoSaMP, or fast non-iterative algorithms can be used to recover the signal. One of the key principles of dictionary learning is that the dictionary has to be inferred from the input data. The emergence of sparse dictionary learning methods was stimulated by the fact that in signal processing, one typically wants to represent the input data using a minimal amount of components. Before this approach, the general practice was to use predefined dictionaries such as Fourier or wavelet transforms. 
However, in certain cases, a dictionary that is trained to fit the input data can significantly improve the sparsity, which has applications in data decomposition, compression, and analysis, and has been used in the fields of image denoising and classification, and video and audio processing. Sparsity and overcomplete dictionaries have immense applications in image compression, image fusion, and inpainting. == Problem statement == Given the input dataset $X=[x_{1},...,x_{K}],\ x_{i}\in \mathbb{R}^{d}$, we wish to find a dictionary $\mathbf{D}\in \mathbb{R}^{d\times n}$, $\mathbf{D}=[d_{1},...,d_{n}]$, and a representation $R=[r_{1},...,r_{K}],\ r_{i}\in \mathbb{R}^{n}$, such that both $\|X-\mathbf{D}R\|_{F}^{2}$ is minimized and the representations $r_{i}$ are sparse enough. This can be formulated as the following optimization problem: $\underset{\mathbf{D}\in \mathcal{C},\ r_{i}\in \mathbb{R}^{n}}{\text{argmin}}\sum_{i=1}^{K}\|x_{i}-\mathbf{D}r_{i}\|_{2}^{2}+\lambda \|r_{i}\|_{0}$, where $\mathcal{C}\equiv \{\mathbf{D}\in \mathbb{R}^{d\times n}:\|d_{i}\|_{2}\leq 1\ \forall i=1,...,n\}$ and $\lambda >0$. $\mathcal{C}$ is required to constrain $\mathbf{D}$ so that its atoms would not reach arbitrarily high values allowing for arbitrarily low (but non-zero) values of $r_{i}$. $\lambda$ controls the trade-off between the sparsity and the minimization error. The minimization problem above is not convex because of the ℓ0-"norm" and solving this problem is NP-hard.
In some cases the L1-norm is known to ensure sparsity, and so, with the ℓ0 penalty replaced by the L1-norm, the above becomes a convex optimization problem with respect to each of the variables $\mathbf{D}$ and $\mathbf{R}$ when the other one is fixed, but it is not jointly convex in $(\mathbf{D},\mathbf{R})$. === Properties of the dictionary === The dictionary $\mathbf{D}$ defined above can be "undercomplete" if $n<d$ or "overcomplete" if $n>d$, with the latter being a typical assumption for a sparse dictionary learning problem. The case of a complete dictionary does not provide any improvement from a representational point of view and thus isn't considered. Undercomplete dictionaries represent the setup in which the actual input data lies in a lower-dimensional space. This case is strongly related to dimensionality reduction and techniques like principal component analysis which require atoms $d_{1},...,d_{n}$ to be orthogonal. The choice of these subspaces is crucial for efficient dimensionality reduction, but it is not trivial. Dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification. However, its main downside is limiting the choice of atoms. Overcomplete dictionaries, however, do not require the atoms to be orthogonal (they will never form a basis anyway), thus allowing for more flexible dictionaries and richer data representations. An overcomplete dictionary which allows for sparse representation of a signal can be a well-known transform matrix (wavelet transform, Fourier transform), or it can be formulated so that its elements are changed in such a way that it sparsely represents the given signal in the best way. Learned dictionaries are capable of giving sparser solutions as compared to predefined transform matrices.
== Algorithms == As the optimization problem described above can be solved as a convex problem with respect to either dictionary or sparse coding while the other one of the two is fixed, most of the algorithms are based on the idea of iteratively updating one and then the other. The problem of finding an optimal sparse coding $R$ with a given dictionary $\mathbf{D}$ is known as sparse approximation (or sometimes just sparse coding problem). A number of algorithms have been developed to solve it (such as matching pursuit and LASSO) and are incorporated in the algorithms described below. === Method of optimal directions (MOD) === The method of optimal directions (or MOD) was one of the first methods introduced to tackle the sparse dictionary learning problem. The core idea of it is to solve the minimization problem subject to the limited number of non-zero components of the representation vector: $\min_{\mathbf{D},R}\{\|X-\mathbf{D}R\|_{F}^{2}\}\ \text{s.t.}\ \forall i\ \|r_{i}\|_{0}\leq T$. Here, $F$ denotes the Frobenius norm. MOD alternates between getting the sparse coding using a method such as matching pursuit and updating the dictionary by computing the analytical solution of the problem given by $\mathbf{D}=XR^{+}$, where $R^{+}$ is a Moore-Penrose pseudoinverse. After this update $\mathbf{D}$ is renormalized to fit the constraints and the new sparse coding is obtained again. The process is repeated until convergence (or until a sufficiently small residue). MOD has proved to be a very efficient method for low-dimensional input data $X$, requiring just a few iterations to converge. However, due to the high complexity of the matrix-inversion operation, computing the pseudoinverse in high-dimensional cases is in many cases intractable.
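The sparse approximation step that MOD alternates with can be sketched with plain matching pursuit. The following is a minimal pure-Python sketch under stated assumptions: the atoms are taken to be unit-norm, and the function name `matching_pursuit`, its parameters, and the toy dictionary are hypothetical names introduced for illustration rather than any library's API.

```python
import math


def matching_pursuit(x, atoms, max_iterations, tol=1e-12):
    """Greedy sparse approximation of x over a list of unit-norm atoms.

    Each iteration picks the atom most correlated with the current residual
    and subtracts its contribution, so the result has at most
    `max_iterations` non-zero coefficients.
    Returns (coefficients, residual).
    """
    residual = list(x)
    coeffs = [0.0] * len(atoms)
    for _ in range(max_iterations):
        # Inner product of every atom with the current residual.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        if abs(scores[best]) < tol:
            break  # residual is (numerically) orthogonal to all atoms
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Toy overcomplete dictionary in R^2 (n = 3 atoms > d = 2), all unit-norm.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]

# The signal lies along the third atom, so one coefficient suffices.
coeffs, residual = matching_pursuit([2.0, 2.0], atoms, max_iterations=1)
print(coeffs, residual)
```

With only the two standard basis atoms, the same signal would need two non-zero coefficients; the extra diagonal atom is what makes the one-term representation possible. The sketch covers only the sparse coding half of MOD's alternation; the dictionary update, whose pseudoinverse becomes expensive for high-dimensional data, is not shown.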
The cost of this pseudoinverse step in MOD has inspired the development of other dictionary learning methods. === K-SVD === K-SVD is an algorithm that performs SVD at its core to update the atoms of the dictionary one by one, and is essentially a generalization of K-means. It enforces that each element of the input data $x_{i}$ is encoded by a linear combination of not more than $T_{0}$ elements in a way identical to the MOD approach: $\min_{\mathbf{D},R}\{\|X-\mathbf{D}R\|_{F}^{2}\}\ \text{s.t.}\ \forall i\ \|r_{i}\|_{0}\leq T_{0}$. The essence of this algorithm is to first fix the dictionary, find the best possible $R$ under the above constraint (using Orthogonal Matching Pursuit), and then iteratively update the atoms of dictionary $\mathbf{D}$ in the following manner: $\|X-\mathbf{D}R\|_{F}^{2}=\left\|X-\sum_{i=1}^{K}d_{i}x_{T}^{i}\right\|_{F}^{2}=\|E_{k}-d_{k}x_{T}^{k}\|_{F}^{2}$. The next steps of the algorithm include rank-1 approximation of the residual matrix $E_{k}$, updating $d_{k}$ and enforcing the s

    Read more →
  • Transfer-based machine translation

    Transfer-based machine translation

    Transfer-based machine translation is a type of machine translation (MT). It is currently one of the most widely used methods of machine translation. In contrast to the simpler direct model of MT, transfer MT breaks translation into three steps: analysis of the source language text to determine its grammatical structure, transfer of the resulting structure to a structure suitable for generating text in the target language, and finally generation of this text. Transfer-based MT systems are thus capable of using knowledge of the source and target languages. == Design == Both transfer-based and interlingua-based machine translation have the same idea: to make a translation it is necessary to have an intermediate representation that captures the "meaning" of the original sentence in order to generate the correct translation. In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT, it has some dependence on the language pair involved. The way in which transfer-based machine translation systems work varies substantially, but in general they follow the same pattern: they apply sets of linguistic rules which are defined as correspondences between the structure of the source language and that of the target language. The first stage involves analysing the input text for morphology and syntax (and sometimes semantics) to create an internal representation. The translation is generated from this representation using both bilingual dictionaries and grammatical rules. It is possible with this translation strategy to obtain fairly high quality translations, with accuracy in the region of 90% (although this is highly dependent on the language pair in question, for example the distance between the two). == Operation == In a rule-based machine translation system the original text is first analysed morphologically and syntactically in order to obtain a syntactic representation. 
This representation can then be refined to a more abstract level, putting emphasis on the parts relevant for translation and ignoring other types of information. The transfer process then converts this final representation (still in the original language) to a representation of the same level of abstraction in the target language. These two representations are referred to as "intermediate" representations. From the target language representation, the stages are then applied in reverse. == Analysis and transformation == Various methods of analysis and transformation can be used before obtaining the final result. These may be augmented with statistical approaches, yielding hybrid systems. The methods chosen and their emphasis depend largely on the design of the system; however, most systems include at least the following stages: Morphological analysis. Surface forms of the input text are classified as to part-of-speech (e.g. noun, verb, etc.) and sub-category (number, gender, tense, etc.). All of the possible "analyses" for each surface form are typically output at this stage, along with the lemma of the word. Lexical categorisation. In any given text some of the words may have more than one meaning, causing ambiguity in analysis. Lexical categorisation looks at the context of a word to try to determine the correct meaning in the context of the input. This can involve part-of-speech tagging and word sense disambiguation. Lexical transfer. This is basically dictionary translation; the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen. Structural transfer. While the previous stages deal with words, this stage deals with larger constituents, for example phrases and chunks. Typical features of this stage include concordance of gender and number, and re-ordering of words or phrases. Morphological generation.
From the output of the structural transfer stage, the target language surface forms are generated. == Transfer types == One of the main features of transfer-based machine translation systems is a phase that "transfers" an intermediate representation of the text in the original language to an intermediate representation of text in the target language. This can work at one of two levels of linguistic analysis, or somewhere in between. The levels are: Superficial transfer (or syntactic). This level is characterised by transferring "syntactic structures" between the source and target languages. It is suitable for languages in the same family or of the same type, for example in the Romance languages between Spanish, Catalan, French, Italian, etc. Deep transfer (or semantic). This level constructs a semantic representation that is dependent on the source language. This representation can consist of a series of structures which represent the meaning. In these transfer systems predicates are typically produced. The translation also typically requires structural transfer. This level is used to translate between more distantly related languages (e.g. Spanish-English or Spanish-Basque, etc.)
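The stages listed earlier (morphological analysis, lexical transfer, structural transfer, morphological generation) can be sketched as a toy pipeline. The following Python is a minimal illustration of superficial (syntactic) transfer from Spanish to English, not a description of any real system: the tiny lexicon, the single reordering rule, and all function names are assumptions made for this example, and lexical categorisation is skipped because every toy word has exactly one analysis.

```python
# Hypothetical toy lexicon: surface form -> (lemma, part of speech, number).
ANALYSES = {
    "la": ("el", "DET", "sg"),
    "las": ("el", "DET", "pl"),
    "casa": ("casa", "NOUN", "sg"),
    "casas": ("casa", "NOUN", "pl"),
    "blanca": ("blanco", "ADJ", "sg"),
    "blancas": ("blanco", "ADJ", "pl"),
}

# Hypothetical bilingual dictionary: source lemma -> target lemma.
BILINGUAL = {"el": "the", "casa": "house", "blanco": "white"}


def analyse(tokens):
    """Morphological analysis: look up lemma, part of speech and number."""
    return [ANALYSES[token] for token in tokens]


def lexical_transfer(units):
    """Lexical transfer: replace each source lemma by its dictionary translation."""
    return [(BILINGUAL[lemma], pos, number) for lemma, pos, number in units]


def structural_transfer(units):
    """Structural transfer: reorder NOUN ADJ (Spanish) to ADJ NOUN (English)."""
    out = list(units)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(units):
    """Morphological generation: only nouns inflect for number in this toy English."""
    words = []
    for lemma, pos, number in units:
        words.append(lemma + "s" if pos == "NOUN" and number == "pl" else lemma)
    return " ".join(words)


def translate(sentence):
    return generate(structural_transfer(lexical_transfer(analyse(sentence.split()))))


print(translate("las casas blancas"))  # the white houses
```

Because the two languages differ here only in word order and agreement, a syntactic rule is enough; a deep (semantic) transfer system would build a meaning representation instead of swapping adjacent units.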

    Read more →
  • Application software

    Application software

    Application software is software that is intended for end-user use – not operating, administering or programming a computer. It includes programs such as word processors, web browsers, media players, and mobile applications used in daily tasks. An application (app, application program, software application) is any program that can be categorized as application software. Application is a subjective classification that is often used to differentiate from system and utility software. Application software represents the user-facing layer of computing systems, designed to translate complex system capabilities into task-oriented, goal-driven workflows. Unlike system software, which focuses on hardware orchestration and resource management, application software is centered on problem abstraction, user interaction, and domain-specific functionality. The abbreviation app became popular with the 2008 introduction of the iOS App Store, to refer to applications for mobile devices such as smartphones and tablets. Later, with the release of the Mac App Store in 2010 and the Windows Store in 2011, it began to be used to refer to end-user software in general, regardless of platform. Applications may be bundled with the computer and its system software or published separately. Applications may be proprietary or open-source. == Terminology == === Meaning program and software === When used as an adjective, application can have a broader meaning than that described in this article. For example, concepts such as application programming interface (API), application server, application virtualization, application lifecycle management and portable application refer to programs and software in general. === Distinction between system and application software === The distinction between system and application software is subjective and has been the subject of controversy. For example, one of the key questions in the United States v. Microsoft Corp. 
antitrust trial was whether Microsoft's Internet Explorer web browser was part of its Windows operating system or a separate piece of application software. As another example, the GNU/Linux naming controversy is, in part, due to disagreement about the relationship between the Linux kernel and the operating systems built over this kernel. In some types of embedded systems, the application software and the operating system software may be indistinguishable by the user, as in the case of software used to control a VCR, DVD player, or microwave oven. The above definitions may exclude some applications that may exist on some computers in large organizations. For an alternative definition of an app: see Application Portfolio Management. === Killer application === A killer application (killer app, coined in the late 1980s) is an application that is so popular that it causes demand for its host platform to increase. For example, VisiCalc was the first modern spreadsheet software for the Apple II and helped sell the then-new personal computers into offices. For the BlackBerry, it was its email software. === Software suite === A software suite consists of multiple applications bundled together. They usually have related functions, features, and user interfaces, and may be able to interact with each other, e.g. open each other's files. Business applications often come in suites, e.g. Microsoft Office, LibreOffice and iWork, which bundle together a word processor, a spreadsheet, etc.; but suites exist for other purposes, e.g. graphics or music. == Ways to classify == As there are so many applications and since their attributes vary so dramatically, there are many different ways to classify them. === By legal aspects === Proprietary software is protected under an exclusive copyright, and a software license grants limited usage rights. Such applications may allow add-ons from third parties. 
Free and open-source software (FOSS) can be run, distributed, sold, and extended for any purpose. FOSS software released under a free license may be perpetual and also royalty-free. Perhaps, the owner, the holder or third-party enforcer of any right (copyright, trademark, patent, or ius in re aliena) are entitled to add exceptions, limitations, time decays or expiring dates to the license terms of use. Public-domain software is a type of FOSS that is royalty-free and can be run, distributed, modified, reversed, republished, or created in derivative works without any copyright attribution and therefore revocation. It can even be sold, but without transferring the public domain property to other single subjects. Public-domain software can be released under a (un)licensing legal statement, which enforces those terms and conditions for an indefinite duration (for a lifetime, or forever). === By platform === An application can be categorized by the host platform on which it runs. Notable platforms include operating system (native), web browser, cloud computing and mobile. For example a web application runs in a web browser whereas a more traditional, native application runs in the environment of a computer's operating system. There has been a contentious debate regarding web applications replacing native applications for many purposes, especially on mobile devices such as smartphones and tablets. Web apps have indeed greatly increased in popularity for some uses, but the advantages of applications make them unlikely to disappear soon, if ever. Furthermore, the two can be complementary, and even integrated. === Horizontal vs. vertical === Application software can be seen as either horizontal or vertical. Horizontal applications are more popular and widespread, because they are general purpose, for example word processors or databases. Vertical applications are niche products, designed for a particular type of industry or business, or department within an organization. 
Integrated suites of software will try to handle every specific aspect possible of, for example, manufacturing or banking worker, accounting, or customer service. === By purpose === There are many types of application software: Enterprise Addresses the needs of an entire organization's processes and data flows, across several departments, often in a large distributed environment. Examples include enterprise resource planning systems, customer relationship management (CRM) systems, data replication engines, and supply chain management software. Departmental Software is a sub-type of enterprise software with a focus on smaller organizations or groups within a large organization. (Examples include travel expense management and IT Helpdesk.) Enterprise infrastructure Provides common capabilities needed to support enterprise software systems. (Examples include databases, email servers, and systems for managing networks and security.) Application platform as a service (aPaaS) A cloud computing service that offers development and deployment environments for application services. Knowledge worker Lets users create and manage information. Individual media editors may aid in multiple information worker tasks. Content access Used primarily to access content without editing, but may include software that allows for content editing. Such software addresses the needs of individuals and groups to consume digital entertainment and published digital content. (Examples include media players, web browsers, and help browsers.) Educational Related to content access software, but has the content or features adapted for use by educators or students. For example, it may deliver evaluations (tests), track progress through material, or include collaborative capabilities. Simulation Simulates physical or abstract systems for either research, training, or entertainment purposes. 
Media development Generates print and electronic media for others to consume, most often in a commercial or educational setting. This includes graphic-art software, desktop publishing software, multimedia development software, HTML editors, digital-animation editors, digital audio and video composition, and many others. Engineering Used in developing hardware and software products. This includes computer-aided design (CAD), computer-aided engineering (CAE), computer language editing and compiling tools, integrated development environments, and application programmer interfaces. Entertainment Refers to video games, screen savers, programs to display motion pictures or play recorded music, and other forms of entertainment which can be experienced through the use of a computing device. == Taxonomy == This section is a taxonomy of kinds of applications. This organization is but one of many different ways to organize them. A kind is included in only one category even if it logically fits in multiple. === General-purpose === Calculator Spreadsheet Web browser Web mapping E-commerce Social media === Communication === Chat Email Presentation software Phone Messages Networking software Web conferencing === Documentation === Desktop

    Read more →
  • Cheng Xiang Zhai

    Cheng Xiang Zhai

    ChengXiang Zhai is a computer scientist. He is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. == Biography == Zhai received the BS (1984), MS (1987, under Guoliang Zheng), and PhD (1990, under Jiafu Xu) in Computer Science from Nanjing University. He spent 1990 to 1993 working at Nanjing University's State Key Laboratory for Novel Software Technology. In 1993, he left for America to pursue a second PhD, this time at Carnegie Mellon University (CMU) with David A. Evans. Evans then left to spend more time with the company ClariTech. Zhai obtained from CMU a MS (1997) in computational linguistics and then started working with John Lafferty. He finally received from CMU a PhD in Language and Information Technologies in 2002. Since then, he has been an Assistant Professor (2002–2008), Associate Professor (2008–2013), Professor (2013–2018), and Donald Biggar Willett Professor (2018–) at the UIUC Department of Computer Science. He also holds joint appointments with the Carl R. Woese Institute for Genomic Biology, Department of Statistics, and School of Information Sciences at UIUC. == Awards == ACM SIGIR Gerard Salton Award, 2021, "for significant and sustained contributions to information retrieval and data science. His work has defined many of the theoretical foundations of the language modeling approach, yielding major insights into areas such as smoothing methods, relevance feedback, topic diversification, and text representations that incorporate positional information. He and his collaborators have also pioneered the axiomatic approach to information retrieval, which continues to provide inspiration for retrieval model and evaluation research." ACM SIGIR Academy inductee, 2021 ACM Fellow, 2017, "for contributions to information retrieval and text data mining." 
ACM SIGIR Test of Time Award, 2016, for paper A study of smoothing methods for language models applied to Ad Hoc information retrieval ACM SIGIR Test of Time Award, 2016, for paper Document language models, query models, and risk minimization for information retrieval ACM SIGIR Test of Time Award, 2014, for paper Beyond independent relevance: methods and evaluation metrics for subtopic retrieval ACM Distinguished Member, 2009 Presidential Early Career Award for Scientists and Engineers (PECASE), 2004, "for his work on user-centered, adaptive intelligent information access. His techniques expect to improve search-engine performance, support better information organization and enable understanding of large volumes of information. Zhai's work in information retrieval is expected to enhance curricula and provide new educational tools for the growing information technology workforce." ACM SIGIR Best Paper Award, 2004, for paper A formal study of information retrieval heuristics == Personal == Zhai's son Alex has earned three medals at the International Mathematical Olympiad.

    Read more →
  • AI Bug Finders: Free vs Paid (2026)

    AI Bug Finders: Free vs Paid (2026)

    Curious about the best AI bug finder? An AI bug finder is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI bug finder slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Ann Copestake

    Ann Copestake

    Ann Alicia Copestake is professor of computational linguistics and head of the Department of Computer Science and Technology at the University of Cambridge and a fellow of Wolfson College, Cambridge. == Education == Copestake was educated at the University of Cambridge where she was awarded a Bachelor of Arts degree in Natural Sciences. After two years working for Unilever Research she completed the Cambridge Diploma in Computer Science. She went on to study at the University of Sussex where she was awarded a PhD in 1992 for research on lexical semantics supervised by Gerald Gazdar. == Career and research == Copestake started doing research in Natural language processing and Computational Linguistics at the University of Cambridge in 1985. Since then she has been a visiting researcher at Xerox PARC (1993/4) and the University of Stuttgart (1994/5). From July 1994 to October 2000 she worked at the Center for the Study of Language and Information (CSLI) at Stanford University, as a Senior Researcher. Copestake was appointed a University Lecturer at Cambridge in October 2000. In the UK, her research has been funded by the Engineering and Physical Sciences Research Council (EPSRC) and Arts and Humanities Research Council (AHRC). According to Google Scholar and Scopus her most cited publications include papers on minimal recursion semantics, multiword expressions, polysemy, named-entity recognition and feature structure grammars.

    Read more →
  • Ericom Connect

    Ericom Connect

    Ericom Connect is a remote access/application publishing solution produced by Ericom Software that provides secure, centrally managed access to physical or hosted desktops and applications running on Microsoft Windows and Linux systems. == Product overview == Ericom Connect is desktop virtualization and application virtualization software that allows users to run applications remotely, without installing them on the local computer or device. The software is noted for its scalability, ease of deployment, and compatibility with any type of infrastructure, cloud or physical. Ericom Connect uses AccessPad (native client for desktops), AccessToGo (native client for mobile), or AccessNow, one of the first HTML5 RDP solutions to support clientless access to Windows desktops and applications from any device with an HTML5-compatible browser, including Macintosh computers, mobile devices, and Google Chromebooks. Other notable features include performance monitoring, built-in real-time analytics & BI, support for two-factor authentication (using RSA SecurID), multi-tenancy and multi-datacenter support via a single unified web interface, and a “Launch Simulation” feature that allows users to visualize and simulate actual step-by-step user processes directly from within the administration console. In addition to scalability, by distributing configurations, logs, etc., across multiple servers there is no single point of failure, as can be the case if all configuration information is stored on one server. == History == Ericom Connect was introduced in 2015. Ericom Connect is a successor to Ericom PowerTerm Web Connect. PowerTerm Web Connect used an architecture similar to what was then current with Citrix and VMWare, relying on a centralized SQL server, a connection broker, image management for different hypervisors, and a variety of clients. Ericom Connect uses a new grid architecture that provides more scalability, reliability, and flexibility than before.

    Read more →
  • Black in AI

    Black in AI

    Black in AI, formally called the Black in AI Workshop, is a technology research organization and affinity group, founded by computer scientists Timnit Gebru and Rediet Abebe in 2017. It started as a conference workshop, later pivoting into an organization. Black in AI increases the presence and inclusion of Black people in the field of artificial intelligence (AI) by creating space for sharing ideas, fostering collaborations, mentorship, and advocacy. == History == Black in AI was created in 2017 to address issues of lack of diversity in AI workshops, and was started as its own workshop within the Conference on Neural Information Processing Systems (NeurIPS). Because of algorithmic bias, ethical issues, and underrepresentation of Black people in AI roles, there has been an ongoing need for the AI community to unite and focus on these issues. Black in AI has strived to continue the progress of improving the presence of people of color in the field of artificial intelligence. In 2018 and 2019, the Black in AI workshop had many immigration visa issues to Canada, which spurred the conference to be planned for 2020 in Addis Ababa, Ethiopia. On December 7, 2020, Black in AI held its fourth annual workshop and first virtual workshop (due to the COVID-19 pandemic). In 2021, Black in AI, alongside the groups Queer in AI and Widening NLP, released a public statement refusing funding from Google in an act of protest of Google's treatment of Timnit Gebru, Margaret Mitchell, and April Christina Curley in the events that occurred in December 2020. == Founders == Rediet Abebe is an Ethiopian computer scientist who specializes in algorithms and artificial intelligence. She is a Computer Science Assistant Professor at the University of California, Berkeley. She was previously a Junior Fellow at Harvard's Society of Fellows. She was the first Black woman to receive a Ph.D. in computer science at Cornell University. 
She "designs and analyzes algorithms, discrete optimizations, network-based, [and] computational strategies to increase access to opportunity for historically disadvantaged populations," according to her web bio. Timnit Gebru was born in Ethiopia and moved to the United States at the age of fifteen. She got her B.S. and M.S. in electrical engineering from Stanford University, as well as a PhD from the Stanford Artificial Intelligence Laboratory, where she studied computer vision under Fei-Fei Li. She formerly worked as a postdoctoral researcher at Microsoft Research in the Fairness Accountability Transparency, and Ethics (FATE) division. She's also worked with Apple, where she assisted in the development of signal-processing algorithms for the original iPad. == Grants == Black in AI received grants and support from private foundations like MacArthur Foundation and Rockefeller Foundation. The organization received $10,000 in 2018 for its annual workshop and $150,000 in 2019 for its long-term organizational planning. In 2020, during the pandemic, the organization received a grant of $300,000 by MacArthur Foundation in order to provide broad organizational support. In 2022, Rockefeller Foundation announced $300,000 to fight prejudice in artificial intelligence (AI) across the globe and incorporate equity into this rapidly expanding field. == Programs == "Black in AI works in academics, advocacy, entrepreneurship, financial support, and summer research programs." The Black in AI Academic Program is a resource for Black junior researchers applying to graduate schools, navigating graduate school, and transitioning into the postgraduate employment market. They provide online education sessions, offer scholarships to cover application fees, pair participants with peer and senior mentors, and distribute crowdsourced papers that simplify the application process. 
They also undertake research projects to investigate and highlight the difficulties that young Black researchers face, as well as push for structural reforms to eliminate these barriers and build equitable research settings. Moses Namara is a Facebook Research Fellow at Clemson University and a PhD candidate in Human-Centered Computing (HCC). He is the mentor for the new Black in AI Academic Program. During the graduate school admissions season in 2021, Black in AI served more than 200 potential graduate program candidates in some capacity. Furthermore, the organization's study identified greater problems encountered by Black graduate school candidates, such as the high cost of graduate school admissions examinations (GREs), which are known to be biased against those from low-income backgrounds. Black in AI's attempts to encourage institutions to eliminate the obstacles were supported by the findings. Black in AI is also developing a program to help and connect Black tech startups with investors. Black in AI also mentors early-career Black AI academics and is forming relationships with Historically Black Colleges and Universities to extend its academic program. In 2021, Black in AI launched two summer research programs, one for undergraduate internships and another for unconstrained research mentorship, including one aimed explicitly at empowering Black women's AI research projects. == Conferences and workshops == At NeurIPS 2017, the first Black in AI event took place on December 8, 2017 in Long Beach, California. The goal was to bring together experts in the area to share ideas and debate efforts aimed at increasing the participation of Black people in artificial intelligence, both for diversity and to avoid data bias. Black AI researchers had the opportunity to share their work at the workshop's oral and poster sessions. The second workshop was hosted in Montréal, Canada, on December 7, 2018. 
According to AI experts, visa issues stymie efforts to make their field more inclusive, efforts that would make technology that discriminates against or disadvantages individuals who aren't white or Western less likely. Hundreds of participants who were supposed to attend or present work at the Black in AI session on Friday were unable to fly to Canada; many of the participants were from African countries. The third workshop was held at NeurIPS 2019, one of the premier machine learning conferences, in Vancouver, Canada. The workshop was able to give travel scholarships and visa support to hundreds of academics who would not have been able to attend NeurIPS without the help of sponsors. For instance, Ramon Vilarino of the University of Sao Paulo, who presented a poster at the conference on his study of geographical and racial prejudice in credit scoring in Brazil, would not have been able to attend NeurIPS without the help of Black in AI. Twenty-four academics from Africa and South America were denied visas to attend this session during the conference, according to Victor Silva, the workshop organizer. He noted that, less than a month before the conference, 40 applicants from both continents had been given visas but that more than 70 applications were still waiting. It was the second year in a row that visa restrictions stopped scholars from attending, after several African scholars were unable to reach the 2018 meeting in Montreal. The AAAI announced the first Black in AI lunch, which was held in conjunction with AAAI-19. The lunch was hosted on Tuesday, January 29, 2019. This event was intended to promote networking, discussion of various AI career options, and the exchange of ideas in order to boost the number of Black researchers in the area. The fourth Black in AI workshop, which was held in conjunction with NeurIPS 2020, took place the week of December 7, 2020. The workshop was scheduled to take place in Vancouver, British Columbia. Due to the pandemic, the session was held for the first time in a virtual format. 
Victor Silva, an AI4Society student, served as the event's chair. The fifth annual Black in AI workshop was also held virtually in 2021. Oral presentations, guest keynote speakers, a combined poster session with other affinity groups, sponsored sessions, and startup showcases were all featured. The goal of the session was to raise the visibility of Black scholars at NeurIPS.

    Read more →
  • Multiple sequence alignment

    Multiple sequence alignment

    Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis and can highlight homologous features between sequences. Alignments highlight mutation events such as point mutations (single amino acid or nucleotide changes), insertion mutations and deletion mutations, and alignments are used to assess sequence conservation and infer the presence and activity of protein domains, tertiary structures, secondary structures, and individual amino acids or nucleotides. Multiple sequence alignments require more sophisticated methodologies than pairwise alignments, as they are more computationally complex. Most multiple sequence alignment programs use heuristic methods rather than global optimization because identifying the optimal alignment between more than a few sequences of moderate length is prohibitively computationally expensive. However, heuristic methods generally cannot guarantee high-quality solutions and have been shown to fail to yield near-optimal solutions on benchmark test cases. 
== Problem statement == Given $m$ sequences $S_i$, $i = 1, \ldots, m$, similar to the form below:

$$S := \begin{cases} S_1 = (S_{11}, S_{12}, \ldots, S_{1n_1}) \\ S_2 = (S_{21}, S_{22}, \ldots, S_{2n_2}) \\ \quad\vdots \\ S_m = (S_{m1}, S_{m2}, \ldots, S_{mn_m}) \end{cases}$$

A multiple sequence alignment is taken of this set of sequences $S$ by inserting any amount of gaps needed into each of the $S_i$ sequences of $S$ until the modified sequences, $S'_i$, all conform to length $L \geq \max\{n_i \mid i = 1, \ldots, m\}$ and no column of the modified sequences consists of only gaps. The mathematical form of an MSA of the above sequence set is shown below:

$$S' := \begin{cases} S'_1 = (S'_{11}, S'_{12}, \ldots, S'_{1L}) \\ S'_2 = (S'_{21}, S'_{22}, \ldots, S'_{2L}) \\ \quad\vdots \\ S'_m = (S'_{m1}, S'_{m2}, \ldots, S'_{mL}) \end{cases}$$

To return from each particular sequence $S'_i$ to $S_i$, remove all gaps. == Graphing approach == A general approach when calculating multiple sequence alignments is to use graphs to identify all of the different alignments. When finding alignments via graph, a complete alignment is created in a weighted graph that contains a set of vertices and a set of edges. Each of the graph edges has a weight based on a certain heuristic that helps to score each alignment or subset of the original graph. 
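The conditions in the problem statement above can be checked mechanically. The following is a minimal sketch, not a reference implementation: the function name `is_valid_msa` and the use of `-` as the gap symbol are assumptions made for illustration. It tests that all aligned rows share one length L, that L is at least the longest input length, that no column is all gaps, and that removing gaps restores each input.

```python
GAP = "-"  # assumed gap symbol; the source text does not fix one


def is_valid_msa(original, aligned):
    """Return True if `aligned` satisfies the MSA conditions for `original`."""
    if not aligned or len(original) != len(aligned):
        return False
    length = len(aligned[0])
    # Every modified sequence must have the same length L.
    if any(len(row) != length for row in aligned):
        return False
    # L must be at least the length of the longest input sequence.
    if length < max(len(seq) for seq in original):
        return False
    # No column may consist only of gaps.
    for column in zip(*aligned):
        if all(symbol == GAP for symbol in column):
            return False
    # Removing all gaps from S'_i must give back S_i.
    return all(row.replace(GAP, "") == seq for seq, row in zip(original, aligned))
```

For example, with inputs `["ACGT", "AGT", "CGT"]`, the rows `["ACGT", "A-GT", "-CGT"]` pass the check, while padding every row with a trailing gap fails it because the last column would contain only gaps.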
=== Tracing alignments === When determining the best suited alignments for each MSA, a trace is usually generated. A trace is a set of realized, or corresponding and aligned, vertices that has a specific weight based on the edges that are selected between corresponding vertices. When choosing traces for a set of sequences it is necessary to choose a trace with a maximum weight to get the best alignment of the sequences. == Alignment methods == There are various alignment methods used within multiple sequence alignment to maximize scores and correctness of alignments. Each is usually based on a certain heuristic with an insight into the evolutionary process. Most try to replicate evolution to get the most realistic alignment possible to best predict relations between sequences. === Dynamic programming === A direct method for producing an MSA uses the dynamic programming technique to identify the globally optimal alignment solution. For proteins, this method usually involves two sets of parameters: a gap penalty and a substitution matrix assigning scores or probabilities to the alignment of each possible pair of amino acids based on the similarity of the amino acids' chemical properties and the evolutionary probability of the mutation. For nucleotide sequences, a similar gap penalty is used, but a much simpler substitution matrix, wherein only identical matches and mismatches are considered, is typical. The scores in the substitution matrix may be either all positive or a mix of positive and negative in the case of a global alignment, but must be both positive and negative, in the case of a local alignment. For n individual sequences, the naive method requires constructing the n-dimensional equivalent of the matrix formed in standard pairwise sequence alignment. The search space thus increases exponentially with increasing n and is also strongly dependent on sequence length. 
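The matrix that the naive method generalizes to n dimensions is easiest to see in the pairwise case. As a minimal sketch, assuming nucleotide-style match/mismatch scoring and a linear gap penalty, the global (Needleman-Wunsch) alignment score of two sequences can be computed as follows. The function name and the values `match=1`, `mismatch=-1`, `gap=-2` are illustrative assumptions, not parameters taken from any particular program.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]
```

The table has (len(a)+1) × (len(b)+1) cells. Adding a third sequence would add a third axis to the table, which is why the cost grows exponentially with the number of sequences.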
Expressed with the big O notation commonly used to measure computational complexity, a naïve MSA takes O(Length^Nseqs) time to produce. Finding the global optimum for n sequences this way has been shown to be an NP-complete problem. In 1989, building on the Carrillo-Lipman algorithm, Altschul introduced a practical method that uses pairwise alignments to constrain the n-dimensional search space. In this approach pairwise dynamic programming alignments are performed on each pair of sequences in the query set, and only the space near the n-dimensional intersection of these alignments is searched for the n-way alignment. The MSA program optimizes the sum of all of the pairs of characters at each position in the alignment (the so-called sum of pair score) and has been implemented in a software program for constructing multiple sequence alignments. In 2019, Hosseininasab and van Hoeve showed that by using decision diagrams, MSA may be modeled in polynomial space complexity. === Progressive alignment construction === The most widely used approach to multiple sequence alignments uses a heuristic search known as the progressive technique (also known as the hierarchical or tree method) developed by Da-Fei Feng and Doolittle in 1987. Progressive alignment builds up a final MSA by combining pairwise alignments beginning with the most similar pair and progressing to the most distantly related. All progressive alignment methods require two stages: a first stage in which the relationships between the sequences are represented as a phylogenetic tree, called a guide tree, and a second step in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree. The initial guide tree is determined by an efficient clustering method such as neighbor-joining or unweighted pair group method with arithmetic mean (UPGMA), and may use distances based on the number of identical two-letter sub-sequences (as in FASTA rather than a dynamic programming alignment). 
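The sum-of-pairs score mentioned above adds up a pairwise score over every pair of rows in every column of the alignment. The sketch below is one hedged illustration of that idea. The scoring values, the `-` gap symbol, and the convention that two aligned gaps score 0 are assumptions chosen for the example, and real programs typically use a substitution matrix instead of a flat match/mismatch score.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned pair of symbols (illustrative values)."""
    if x == GAP and y == GAP:
        return 0  # assumed convention: a gap aligned with a gap is neutral
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch


def sum_of_pairs(aligned):
    """Sum pair_score over all row pairs in every column of the alignment."""
    return sum(
        pair_score(x, y)
        for column in zip(*aligned)
        for x, y in combinations(column, 2)
    )
```

For m rows each column contributes m(m-1)/2 pair scores, so scoring a given alignment is cheap; the expensive part is searching for the alignment that maximizes the total.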
Progressive alignments are not guaranteed to be globally optimal. The primary problem is that when errors are made at any stage in growing the MSA, these errors are then propagated through to the final result. Performance is also particularly bad when all of the sequences in the set are rather distantly related. Most modern progressive methods modify their scoring function with a secondary weighting function that assigns scaling factors to individual members of the query set in a nonlinear fashion based on their phylogenetic distance from their nearest neighbors. This corrects for non-random selection of the sequences given to the alignment program. Progressive alignment methods are efficient enough to implement on a large scale for many (100s to 1000s) sequences. A popular progressive alignment method has been the Clustal family. ClustalW is used extensively for phylogenetic tree construction, in spite of the author's explicit warnings that unedited alignments should not be used in such studies and as input for protein structure prediction by homology modeling. The European Bioinformatics Institute (EMBL-EBI) announced that ClustalW2 would expire in August 2015. It recommends Clustal Omega, which uses seeded guide trees and HMM profile-profile techniques for protein alignments. An alternative tool for progressive DNA alignments is multiple alignment using fast Fourier transform (MAFFT). Another common progressive alignment method named T-Coffee is slower than Clustal and its derivatives but generally produces more accurate alignments for distantly related sequence sets. T-Coffee calculates pairwise alignments by combining the direct alignment of the pair with indirect alignments that align each sequence of the pair to a third sequence. It uses the output from Clustal as well as another local alignment program LALIGN, which finds multiple regions of local alignment between two sequences. 
The resulting alignment and phylogenetic tree are used as a guide to produce new and more accurate w

    Read more →