AI Content Vs Human Content

AI Content Vs Human Content — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Autoscaling

    Autoscaling

    Autoscaling, (also written as auto scaling, auto-scaling, or known as automatic scaling), is a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load on the farm. For example, the number of servers running behind a web application may be increased or decreased automatically based on the number of active users on the site. Since such metrics may change dramatically throughout the course of the day, and servers are a limited resource that cost money to run even while idle, there is often an incentive to run "just enough" servers to support the current load while still being able to support sudden and large spikes in activity. Autoscaling is helpful for such needs, as it can reduce the number of active servers when activity is low, and launch new servers when activity is high. Autoscaling is closely related to, and builds upon, the idea of load balancing. == Advantages == Autoscaling offers the following advantages: For companies running their own web server infrastructure, autoscaling typically means allowing some servers to go to sleep during times of low load, saving on electricity costs (as well as water costs if water is being used to cool the machines). For companies using infrastructure hosted in the cloud, autoscaling can mean lower bills, because most cloud providers charge based on total usage rather than maximum capacity. Even for companies that cannot reduce the total compute capacity they run or pay for at any given time, autoscaling can help by allowing the company to run less time-sensitive workloads on machines that get freed up by autoscaling during times of low traffic. Autoscaling solutions, such as the one offered by Amazon Web Services, can also take care of replacing unhealthy instances and therefore protecting somewhat against hardware, network, and application failures. Autoscaling can offer greater uptime and more availability in cases where production workloads are variable and unpredictable. Autoscaling differs from having a fixed daily, weekly, or yearly cycle of server use in that it is responsive to actual usage patterns, and thus reduces the potential downside of having too few or too many servers for the traffic load. For instance, if traffic is usually lower at midnight, then a static scaling solution might schedule some servers to sleep at night, but this might result in downtime on a night where people happen to use the Internet more (for instance, due to a viral news event). Autoscaling, on the other hand, can handle unexpected traffic spikes better. == Terminology == In the list below, we use the terminology used by Amazon Web Services (AWS). However, alternative names are noted and terminology that is specific to the names of Amazon services is not used for the names. == Practice == === Amazon Web Services (AWS) === Amazon Web Services launched the Amazon Elastic Compute Cloud (EC2) service in August 2006, that allowed developers to programmatically create and terminate instances (machines). At the time of initial launch, AWS did not offer autoscaling, but the ability to programmatically create and terminate instances gave developers the flexibility to write their own code for autoscaling. Third-party autoscaling software for AWS began appearing around April 2008. These included tools by Scalr and RightScale. RightScale was used by Animoto, which was able to handle Facebook traffic by adopting autoscaling. On May 18, 2009, Amazon launched its own autoscaling feature along with Elastic Load Balancing, as part of Amazon Elastic Compute Cloud. Autoscaling is now an integral component of Amazon's EC2 offering. Autoscaling on Amazon Web Services is done through a web browser or the command line tool. In May 2016 Autoscaling was also offered in AWS ECS Service. On-demand video provider Netflix documented their use of autoscaling with Amazon Web Services to meet their highly variable consumer needs. They found that aggressive scaling up and delayed and cautious scaling down served their goals of uptime and responsiveness best. In an article for TechCrunch, Zev Laderman, the co-founder and CEO of Newvem, a service that helps optimize AWS cloud infrastructure, recommended that startups use autoscaling in order to keep their Amazon Web Services costs low. Various best practice guides for AWS use suggest using its autoscaling feature even in cases where the load is not variable. That is because autoscaling offers two other advantages: automatic replacement of any instances that become unhealthy for any reason (such as hardware failure, network failure, or application error), and automatic replacement of spot instances that get interrupted for price or capacity reasons, making it more feasible to use spot instances for production purposes. Netflix's internal best practices require every instance to be in an autoscaling group, and its conformity monkey terminates any instance not in an autoscaling group in order to enforce this best practice. === Microsoft's Windows Azure === On June 27, 2013, Microsoft announced that it was adding autoscaling support to its Windows Azure cloud computing platform. Documentation for the feature is available on the Microsoft Developer Network. === Oracle Cloud === Oracle Cloud Platform allows server instances to automatically scale a cluster in or out by defining an auto-scaling rule. These rules are based on CPU and/or memory utilization and determine when to add or remove nodes. === Google Cloud Platform === On November 17, 2014, the Google Compute Engine announced a public beta of its autoscaling feature for use in Google Cloud Platform applications. As of March 2015, the autoscaling tool is still in Beta. === Facebook === In a blog post in August 2014, a Facebook engineer disclosed that the company had started using autoscaling to bring down its energy costs. The blog post reported a 27% decline in energy use for low traffic hours (around midnight) and a 10-15% decline in energy use over the typical 24-hour cycle. === Kubernetes Horizontal Pod Autoscaler === Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replicaset based on observed CPU utilization (or, with beta support, on some other, application-provided metrics) == Alternative autoscaling decision approaches == Autoscaling by default uses reactive decision approach for dealing with traffic scaling: scaling only happens in response to real-time changes in metrics. In some cases, particularly when the changes occur very quickly, this reactive approach to scaling is insufficient. Two other kinds of autoscaling decision approaches are described below. === Scheduled autoscaling approach === This is an approach to autoscaling where changes are made to the minimum size, maximum size, or desired capacity of the autoscaling group at specific times of day. Scheduled scaling is useful, for instance, if there is a known traffic load increase or decrease at specific times of the day, but the change is too sudden for reactive approach based autoscaling to respond fast enough. AWS autoscaling groups support scheduled scaling. === Predictive autoscaling === This approach to autoscaling uses predictive analytics. The idea is to combine recent usage trends with historical usage data as well as other kinds of data to predict usage in the future, and autoscale based on these predictions. For parts of their infrastructure and specific workloads, Netflix found that Scryer, their predictive analytics engine, gave better results than Amazon's reactive autoscaling approach. In particular, it was better for: Identifying huge spikes in demand in the near future and getting capacity ready a little in advance Dealing with large-scale outages, such as failure of entire availability zones and regions Dealing with variable traffic patterns, providing more flexibility on the rate of scaling out or in based on the typical level and rate of change in demand at various times of day On November 20, 2018, AWS announced that predictive scaling would be available as part of its autoscaling offering.

    Read more →
  • Structured support vector machine

    Structured support vector machine

    The structured supportvector machine is a machine learning algorithm that generalizes the support vector machine (SVM) classifier. Whereas the SVM classifier supports binary classification, multiclass classification and regression, the structured SVM allows training of a classifier for general structured output labels. As an example, a sample instance might be a natural language sentence, and the output label is an annotated parse tree. Training a classifier consists of showing pairs of correct sample and output label pairs. After training, the structured SVM model allows one to predict for new sample instances the corresponding output label; that is, given a natural language sentence, the classifier can produce the most likely parse tree. == Training == For a set of n {\displaystyle n} training instances ( x i , y i ) ∈ X × Y {\displaystyle ({\boldsymbol {x}}_{i},y_{i})\in {\mathcal {X}}\times {\mathcal {Y}}} , i = 1 , … , n {\displaystyle i=1,\dots ,n} from a sample space X {\displaystyle {\mathcal {X}}} and label space Y {\displaystyle {\mathcal {Y}}} , the structured SVM minimizes the following regularized risk function. min w ‖ w ‖ 2 + C ∑ i = 1 n max y ∈ Y ( 0 , Δ ( y i , y ) + ⟨ w , Ψ ( x i , y ) ⟩ − ⟨ w , Ψ ( x i , y i ) ⟩ ) {\displaystyle {\underset {\boldsymbol {w}}{\min }}\quad \|{\boldsymbol {w}}\|^{2}+C\sum _{i=1}^{n}{\underset {y\in {\mathcal {Y}}}{\max }}\left(0,\Delta (y_{i},y)+\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y)\rangle -\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y_{i})\rangle \right)} The function is convex in w {\displaystyle {\boldsymbol {w}}} because the maximum of a set of affine functions is convex. The function Δ : Y × Y → R + {\displaystyle \Delta :{\mathcal {Y}}\times {\mathcal {Y}}\to \mathbb {R} _{+}} measures a distance in label space and is an arbitrary function (not necessarily a metric) satisfying Δ ( y , z ) ≥ 0 {\displaystyle \Delta (y,z)\geq 0} and Δ ( y , y ) = 0 ∀ y , z ∈ Y {\displaystyle \Delta (y,y)=0\;\;\forall y,z\in {\mathcal {Y}}} . The function Ψ : X × Y → R d {\displaystyle \Psi :{\mathcal {X}}\times {\mathcal {Y}}\to \mathbb {R} ^{d}} is a feature function, extracting some feature vector from a given sample and label. The design of this function depends very much on the application. Because the regularized risk function above is non-differentiable, it is often reformulated in terms of a quadratic program by introducing one slack variable ξ i {\displaystyle \xi _{i}} for each sample, each representing the value of the maximum. The standard structured SVM primal formulation is given as follows. min w , ξ ‖ w ‖ 2 + C ∑ i = 1 n ξ i s.t. ⟨ w , Ψ ( x i , y i ) ⟩ − ⟨ w , Ψ ( x i , y ) ⟩ + ξ i ≥ Δ ( y i , y ) , i = 1 , … , n , ∀ y ∈ Y {\displaystyle {\begin{array}{cl}{\underset {{\boldsymbol {w}},{\boldsymbol {\xi }}}{\min }}&\|{\boldsymbol {w}}\|^{2}+C\sum _{i=1}^{n}\xi _{i}\\{\textrm {s.t.}}&\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y_{i})\rangle -\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y)\rangle +\xi _{i}\geq \Delta (y_{i},y),\qquad i=1,\dots ,n,\quad \forall y\in {\mathcal {Y}}\end{array}}} == Inference == At test time, only a sample x ∈ X {\displaystyle {\boldsymbol {x}}\in {\mathcal {X}}} is known, and a prediction function f : X → Y {\displaystyle f:{\mathcal {X}}\to {\mathcal {Y}}} maps it to a predicted label from the label space Y {\displaystyle {\mathcal {Y}}} . For structured SVMs, given the vector w {\displaystyle {\boldsymbol {w}}} obtained from training, the prediction function is the following. f ( x ) = argmax y ∈ Y ⟨ w , Ψ ( x , y ) ⟩ {\displaystyle f({\boldsymbol {x}})={\underset {y\in {\mathcal {Y}}}{\textrm {argmax}}}\quad \langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}},y)\rangle } Therefore, the maximizer over the label space is the predicted label. Solving for this maximizer is the so-called inference problem and similar to making a maximum a-posteriori (MAP) prediction in probabilistic models. Depending on the structure of the function Ψ {\displaystyle \Psi } , solving for the maximizer can be a hard problem. == Separation == The above quadratic program involves a very large, possibly infinite number of linear inequality constraints. In general, the number of inequalities is too large to be optimized over explicitly. Instead the problem is solved by using delayed constraint generation where only a finite and small subset of the constraints is used. Optimizing over a subset of the constraints enlarges the feasible set and will yield a solution that provides a lower bound on the objective. To test whether the solution w {\displaystyle {\boldsymbol {w}}} violates constraints of the complete set inequalities, a separation problem needs to be solved. As the inequalities decompose over the samples, for each sample ( x i , y i ) {\displaystyle ({\boldsymbol {x}}_{i},y_{i})} the following problem needs to be solved. y n ∗ = argmax y ∈ Y ( Δ ( y i , y ) + ⟨ w , Ψ ( x i , y ) ⟩ − ⟨ w , Ψ ( x i , y i ) ⟩ − ξ i ) {\displaystyle y_{n}^{}={\underset {y\in {\mathcal {Y}}}{\textrm {argmax}}}\left(\Delta (y_{i},y)+\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y)\rangle -\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y_{i})\rangle -\xi _{i}\right)} The right hand side objective to be maximized is composed of the constant − ⟨ w , Ψ ( x i , y i ) ⟩ − ξ i {\displaystyle -\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y_{i})\rangle -\xi _{i}} and a term dependent on the variables optimized over, namely Δ ( y i , y ) + ⟨ w , Ψ ( x i , y ) ⟩ {\displaystyle \Delta (y_{i},y)+\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y)\rangle } . If the achieved right hand side objective is smaller or equal to zero, no violated constraints for this sample exist. If it is strictly larger than zero, the most violated constraint with respect to this sample has been identified. The problem is enlarged by this constraint and resolved. The process continues until no violated inequalities can be identified. If the constants are dropped from the above problem, we obtain the following problem to be solved. y i ∗ = argmax y ∈ Y ( Δ ( y i , y ) + ⟨ w , Ψ ( x i , y ) ⟩ ) {\displaystyle y_{i}^{}={\underset {y\in {\mathcal {Y}}}{\textrm {argmax}}}\left(\Delta (y_{i},y)+\langle {\boldsymbol {w}},\Psi ({\boldsymbol {x}}_{i},y)\rangle \right)} This problem looks very similar to the inference problem. The only difference is the addition of the term Δ ( y i , y ) {\displaystyle \Delta (y_{i},y)} . Most often, it is chosen such that it has a natural decomposition in label space. In that case, the influence of Δ {\displaystyle \Delta } can be encoded into the inference problem and solving for the most violating constraint is equivalent to solving the inference problem.

    Read more →
  • Black in AI

    Black in AI

    Black in AI, formally called the Black in AI Workshop, is a technology research organization and affinity group, founded by computer scientists Timnit Gebru and Rediet Abebe in 2017. It started as a conference workshop, later pivoting into an organization. Black in AI increases the presence and inclusion of Black people in the field of artificial intelligence (AI) by creating space for sharing ideas, fostering collaborations, mentorship, and advocacy. == History == Black in AI was created in 2017 to address issues of lack of diversity in AI workshops, and was started as its own workshop within the Conference on Neural Information Processing Systems (NeurIPS) conference. Because of algorithmic bias, ethical issues, and underrepresentation of Black people in AI roles; there has been an ongoing need for unity within the AI community to have focus on these issues. Black in AI has strived to continue the progress of improving the presence of people of color in the field of artificial intelligence. In 2018 and 2019, the Black in AI workshop had many immigration visa issues to Canada, which spurred the conference to be planned for 2020 in Addis Ababa, Ethiopia. On December 7, 2020, Black in AI held its fourth annual workshop and first virtual workshop (due to the COVID-19 pandemic). In 2021, Black in AI, alongside the groups Queer in AI and Widening NLP, released a public statement refusing funding from Google in an act of protest of Google's treatment of Timnit Gebru, Margaret Mitchell, and April Christina Curley in the events that occurred in December 2020. == Founders == Rediet Abebe is an Ethiopian computer scientist who specializes in algorithms and artificial intelligence. She is a Computer Science Assistant Professor at the University of California, Berkeley. She was previously a Junior Fellow at Harvard's Society of Fellows. She was the first Black woman to receive a Ph.D. in computer science at Cornell University. She "designs and analyzes algorithms, discrete optimizations, network-based, [and] computational strategies to increase access to opportunity for historically disadvantaged populations," according to her web bio. Timnit Gebru was born in Ethiopia and moved to the United States at the age of fifteen. She got her B.S. and M.S. in electrical engineering from Stanford University, as well as a PhD from the Stanford Artificial Intelligence Laboratory, where she studied computer vision under Fei-Fei Li. She formerly worked as a postdoctoral researcher at Microsoft Research in the Fairness Accountability Transparency, and Ethics (FATE) division. She's also worked with Apple, where she assisted in the development of signal-processing algorithms for the original iPad. == Grants == Black in AI received grants and support from private foundations like MacArthur Foundation and Rockefeller Foundation. The organization received $10,000 in 2018 for its annual workshop and $150,000 in 2019 for its long-term organizational planning. In 2020, during the pandemic, the organization received a grant of $300,000 by MacArthur Foundation in order to provide broad organizational support. In 2022, Rockefeller Foundation announced $300,000 to fight prejudice in artificial intelligence (AI) across the globe and incorporate equity into this rapidly expanding field. == Programs == "Black in AI works in academics, advocacy, entrepreneurship, financial support, and summer research programs." The Black in AI Academic Program is a resource for Black junior researchers applying to graduate schools, navigating graduate school, and transitioning into the postgraduate employment market. They provide online education sessions, offer scholarships to cover application fees, pair participants with peer and senior mentors, and distribute crowdsourced papers that simplify the application process. They also undertake research projects to investigate and highlight the difficulties that Black young researchers face, as well as push for structural reforms to eliminate these barriers and build equitable research settings. Moses Namara is a Facebook Research Fellow at Clemson University and a PhD candidate in Human-Centered Computing (HCC). He is the mentor for the new Black in AI Academic Program. During the graduate school admissions season in 2021, Black in AI served more than 200 potential graduate program candidates in some capacity. Furthermore, the organization's study identified greater problems encountered by Black graduate school candidates, such as the high cost of graduate school admissions examinations (GREs), which are known to be biased against those from low-income backgrounds. Black in AI's attempts to encourage institutions to eliminate the obstacles were supported by the findings. Black in AI is also developing a program to help and connect Black tech startups with investors. Black in AI also mentors early-career Black AI academics and is forming relationships with Historically Black Colleges and Universities to extend its academic program. In 2021, Black in AI launched two summer research programs, one for undergraduate internships and another for unconstrained research mentorship, including one aimed explicitly at empowering Black women's AI research projects. == Conferences and workshops == At NeurIPS 2017, the first Black in AI event took place in December 8, 2017 in Long Beach, California. The goal was to bring together experts in the area to share ideas and debate efforts aimed at increasing the participation of Black people in artificial intelligence, both for diversity and to avoid data bias. Black AI researchers had the opportunity to share their work at the workshop's oral and poster sessions. The second workshop was hosted in Montréal, Canada, on December 7, 2018. According to AI experts, visa issues stymie efforts to make their area more inclusive, making technology that discriminates or disadvantages individuals who aren't white or Western less likely. Hundreds of participants who were supposed to attend or present work at the Black in AI session on Friday were unable to fly to Canada; many of the participants were from African countries. The third workshop was held in NeurIPS 2019, one of the premier machine learning conferences Vancouver, Canada. The workshop was able to give travel scholarships and visa support to hundreds of academics who would not have been able to attend NeurIPS without the help of sponsors. For instance, Ramon Vilarino of the University of Sao Paulo, who presented a poster at the conference on his study of geographical and racial prejudice in credit scoring in Brazil, would not have been able to attend NeurIPS without the help of Black in AI. Twenty-four academics from Africa and South America were denied visas to attend this session during the conference, according to Victor Silva, the workshop organizer. He noted that, less than a month before the conference, 40 applicants from both continents had been given visas but that more than 70 applications were still waiting. For the second year in a row, visa restrictions have stopped several African scholars from attending the 2018 meeting in Montreal. The AAAI announced the first Black in AI lunch, which was held in conjunction with AAAI-19. The lunch was hosted on Tuesday, January 29, 2019. This event was intended to promote networking, discussion of various AI career options, and the exchange of ideas in order to boost the number of Black researchers in the area. The fourth Black in AI workshop, which was held in conjunction with NeurIPS 2020, took place the week of December 7, 2020. The workshop was scheduled to take place in Vancouver, British Columbia. Due to the pandemic, the session was held for the first time in a virtual format. Victor Silva, an AI4Society student, served as the event's chair. The fifth annual Black in AI workshop was also held virtually in 2021. Oral presentations, guest keynote speakers, a combined poster session with other affinity groups, sponsored sessions, and startup showcases was all featured. The goal of the session was to raise the visibility of black scholars at NeurIPS.

    Read more →
  • ROUGE (metric)

    ROUGE (metric)

    ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference or a set of references (human-produced) summary or translation. ROUGE metrics range between 0 and 1, with higher scores indicating higher similarity between the automatically produced summary and the reference. == Metrics == The following five evaluation metrics are available. ROUGE-N: Overlap of n-grams between the system and reference summaries. ROUGE-1 refers to the overlap of unigrams (each word) between the system and reference summaries. ROUGE-2 refers to the overlap of bigrams between the system and reference summaries. ROUGE-L: Longest Common Subsequence (LCS) based statistics. Longest common subsequence problem takes into account sentence-level structure similarity naturally and identifies longest co-occurring in sequence n-grams automatically. ROUGE-W: Weighted LCS-based statistics that favors consecutive LCSes. ROUGE-S: Skip-bigram based co-occurrence statistics. Skip-bigram is any pair of words in their sentence order. ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.

    Read more →
  • Intrinsic dimension

    Intrinsic dimension

    In mathematics, the intrinsic dimension of a subset can be thought of as the minimal number of variables needed to represent the subset. The concept has widespread applications in geometry, dynamical systems, signal processing, statistics, and other fields. Due to its widespread applications and vague conceptualization, there are many different ways to define it rigorously. Consequently, the same set might have different intrinsic dimensions according to different definitions. The intrinsic dimension can be used as a lower bound of what dimension it is possible to compress a data set into through dimension reduction, but it can also be used as a measure of the complexity of the data set or signal. For a data set or signal of N variables, its intrinsic dimension M satisfies 0 ≤ M ≤ N, although estimators may yield higher values. == Exact dimension == === Differential === In differential geometry, given a differentiable manifold N and a submanifold M, the intrinsic dimension of M is its dimension. Suppose N has n dimensions and M has m dimensions, then that means around any point in M, there exists a local coordinate system ( x 1 , … , x m , x m + 1 , … , x n ) {\displaystyle (x_{1},\dots ,x_{m},x_{m+1},\dots ,x_{n})} of N, such that the manifold M is simply the subset of N defined by x m + 1 = 0 , … , x n = 0 {\displaystyle x_{m+1}=0,\dots ,x_{n}=0} . === Metric === Given a mere metric space, we can still define its intrinsic dimension. The most general case is the Hausdorff dimension, though for metric spaces occurring in practice, the box-counting dimension and the packing dimension often are identical to the Hausdorff dimension. Let X , d {\textstyle X,d} be a metric space and A ⊂ X {\textstyle A\subset X} be totally bounded. Define the covering number N ( A , ε ) = min { k : A ⊂ ⋃ i = 1 k B ( x i , ε ) } . {\displaystyle N(A,\varepsilon )=\min \left\{k:A\subset \bigcup _{i=1}^{k}B\left(x_{i},\varepsilon \right)\right\}.} The metric entropy is H ( A , ε ) = log ⁡ N ( A , ε ) {\textstyle H(A,\varepsilon )=\log N(A,\varepsilon )} (any log base). The upper and lower metric entropy dimensions are dim ¯ E A = lim sup ε ↓ 0 H ( A , ε ) log ⁡ ( 1 / ε ) , dim _ E A = lim inf ε ↓ 0 H ( A , ε ) log ⁡ ( 1 / ε ) . {\displaystyle {\overline {\dim }}_{E}A=\limsup _{\varepsilon \downarrow 0}{\frac {H(A,\varepsilon )}{\log(1/\varepsilon )}},\quad {\underline {\dim }}_{E}A=\liminf _{\varepsilon \downarrow 0}{\frac {H(A,\varepsilon )}{\log(1/\varepsilon )}}.} If they are equal, then dim E ⁡ A {\textstyle \operatorname {dim} _{E}A} is that common value, called the metric entropy dimension. The entropy dimensions are usually used in information theory, and especially coding theory, since entropy is involved in its definition. === Topological === If X {\displaystyle X} is merely a topological space, then we can still define its intrinsic dimension, using the topological dimension or Lebesgue covering dimension. An open cover of a topological space X is a family of open sets Uα such that their union is the whole space, ∪ α {\displaystyle \cup _{\alpha }} Uα = X. The order or ply of an open cover A {\displaystyle {\mathfrak {A}}} = {Uα} is the smallest number m (if it exists) for which each point of the space belongs to at most m open sets in the cover: in other words Uα1 ∩ ⋅⋅⋅ ∩ Uαm+1 = ∅ {\displaystyle \emptyset } for α1, ..., αm+1 distinct. A refinement of an open cover A {\displaystyle {\mathfrak {A}}} = {Uα} is another open cover B {\displaystyle {\mathfrak {B}}} = {Vβ}, such that each Vβ is contained in some Uα. The covering dimension of a topological space X is defined to be the minimum value of n such that every finite open cover A {\displaystyle {\mathfrak {A}}} of X has an open refinement B {\displaystyle {\mathfrak {B}}} with order n + 1. The refinement B {\displaystyle {\mathfrak {B}}} can always be chosen to be finite. Thus, if n is finite, Vβ1 ∩ ⋅⋅⋅ ∩ Vβn+2 = ∅ {\displaystyle \emptyset } for β1, ..., βn+2 distinct. If no such minimal n exists, the space is said to have infinite covering dimension. == Introductory example == Let f ( x 1 , x 2 ) {\textstyle f(x_{1},x_{2})} be a two-variable function (or signal) which is of the form f ( x 1 , x 2 ) = g ( x 1 ) {\textstyle f(x_{1},x_{2})=g(x_{1})} for some one-variable function g which is not constant. This means that f varies, in accordance to g, with the first variable or along the first coordinate. On the other hand, f is constant with respect to the second variable or along the second coordinate. It is only necessary to know the value of one, namely the first, variable in order to determine the value of f. Hence, it is a two-variable function but its intrinsic dimension is one. A slightly more complicated example is f ( x 1 , x 2 ) = g ( x 1 + x 2 ) {\textstyle f(x_{1},x_{2})=g(x_{1}+x_{2})} . f is still intrinsic one-dimensional, which can be seen by making a variable transformation y 1 = x 1 + x 2 {\textstyle y_{1}=x_{1}+x_{2}} and y 2 = x 1 − x 2 {\textstyle y_{2}=x_{1}-x_{2}} which gives f ( y 1 + y 2 2 , y 1 − y 2 2 ) = g ( y 1 ) {\textstyle f\left({\frac {y_{1}+y_{2}}{2}},{\frac {y_{1}-y_{2}}{2}}\right)=g\left(y_{1}\right)} . Since the variation in f can be described by the single variable y1 its intrinsic dimension is one. For the case that f is constant, its intrinsic dimension is zero since no variable is needed to describe variation. For the general case, when the intrinsic dimension of the two-variable function f is neither zero or one, it is two. In the literature, functions which are of intrinsic dimension zero, one, or two are sometimes referred to as i0D, i1D or i2D, respectively. == Signal processing == In signal processing of multidimensional signals, the intrinsic dimension of the signal describes how many variables are needed to generate a good approximation of the signal. For an N-variable function f, the set of variables can be represented as an N-dimensional vector x: f = f ( x ) where x = ( x 1 , … , x N ) {\textstyle f=f\left(\mathbf {x} \right){\text{ where }}\mathbf {x} =\left(x_{1},\dots ,x_{N}\right)} . If for some M-variable function g and M × N matrix A it is the case that for all x; f ( x ) = g ( A x ) , {\textstyle f(\mathbf {x} )=g(\mathbf {Ax} ),} M is the smallest number for which the above relation between f and g can be found, then the intrinsic dimension of f is M. The intrinsic dimension is a characterization of f, it is not an unambiguous characterization of g nor of A. That is, if the above relation is satisfied for some f, g, and A, it must also be satisfied for the same f and g′ and A′ given by g ′ ( y ) = g ( B y ) {\textstyle g'\left(\mathbf {y} \right)=g\left(\mathbf {By} \right)} and A ′ = B − 1 A {\textstyle \mathbf {A'} =\mathbf {B} ^{-1}\mathbf {A} } where B is a non-singular M × M matrix, since f ( x ) = g ′ ( A ′ x ) = g ( B A ′ x ) = g ( A x ) {\textstyle f\left(\mathbf {x} \right)=g'\left(\mathbf {A'x} \right)=g\left(\mathbf {BA'x} \right)=g\left(\mathbf {Ax} \right)} . == The Fourier transform of signals of low intrinsic dimension == An N variable function which has intrinsic dimension M < N has a characteristic Fourier transform. Intuitively, since this type of function is constant along one or several dimensions its Fourier transform must appear like an impulse (the Fourier transform of a constant) along the same dimension in the frequency domain. === A simple example === Let f be a two-variable function which is i1D. This means that there exists a normalized vector n ∈ R 2 {\textstyle \mathbf {n} \in \mathbb {R} ^{2}} and a one-variable function g such that f ( x ) = g ( n T x ) {\textstyle f(\mathbf {x} )=g(\mathbf {n} ^{\operatorname {T} }\mathbf {x} )} for all x ∈ R 2 {\textstyle \mathbf {x} \in \mathbb {R} ^{2}} . If F is the Fourier transform of f (both are two-variable functions) it must be the case that F ( u ) = G ( n T u ) ⋅ δ ( m T u ) {\textstyle F\left(\mathbf {u} \right)=G\left(\mathbf {n} ^{\mathrm {T} }\mathbf {u} \right)\cdot \delta \left(\mathbf {m} ^{\mathrm {T} }\mathbf {u} \right)} . Here G is the Fourier transform of g (both are one-variable functions), δ is the Dirac impulse function and m is a normalized vector in R 2 {\textstyle \mathbb {R} ^{2}} perpendicular to n. This means that F vanishes everywhere except on a line which passes through the origin of the frequency domain and is parallel to m. Along this line F varies according to G. === The general case === Let f be an N-variable function which has intrinsic dimension M, that is, there exists an M-variable function g and M × N matrix A such that f ( x ) = g ( A x ) ∀ x {\textstyle f(\mathbf {x} )=g(\mathbf {Ax} )\quad \forall \mathbf {x} } . Its Fourier transform F can then be described as follows: F vanishes everywhere except for a subspace of dimension M The subspace M is spanned by the rows of the matrix A In the subspace, F varies according to G the Fourier transform of g == Generalizations == The type of intrinsic dimension described above assume

    Read more →
  • NFA minimization

    NFA minimization

    In automata theory (a branch of theoretical computer science), NFA minimization is the task of transforming a given nondeterministic finite automaton (NFA) into an equivalent NFA that has a minimum number of states, transitions, or both. While efficient algorithms exist for DFA minimization, NFA minimization is PSPACE-complete. No efficient (polynomial time) algorithms are known, and under the standard assumption that P ≠ PSPACE, none exist. The most efficient known algorithm is the Kameda–Weiner algorithm. == Non-uniqueness of minimal NFA == Unlike deterministic finite automata, minimal NFAs may not be unique. There may be multiple NFAs with the same number of states that accept the same regular language, but for which there is no equivalent NFA or DFA with fewer states.

    Read more →
  • Markov chain central limit theorem

    Markov chain central limit theorem

    In the mathematical theory of random processes, the Markov chain central limit theorem has a conclusion somewhat similar in form to that of the classic central limit theorem (CLT) of probability theory, but the quantity in the role taken by the variance in the classic CLT has a more complicated definition. See also the general form of Bienaymé's identity. == Statement == Suppose that: the sequence X 1 , X 2 , X 3 , … {\textstyle X_{1},X_{2},X_{3},\ldots } of random elements of some set is a Markov chain that has a stationary probability distribution; and the initial distribution of the process, i.e. the distribution of X 1 {\textstyle X_{1}} , is the stationary distribution, so that X 1 , X 2 , X 3 , … {\textstyle X_{1},X_{2},X_{3},\ldots } are identically distributed. In the classic central limit theorem these random variables would be assumed to be independent, but here we have only the weaker assumption that the process has the Markov property; and g {\textstyle g} is some (measurable) real-valued function for which var ⁡ ( g ( X 1 ) ) < + ∞ . {\textstyle \operatorname {var} (g(X_{1}))<+\infty .} Now let μ = E ⁡ ( g ( X 1 ) ) , μ ^ n = 1 n ∑ k = 1 n g ( X k ) σ 2 := lim n → ∞ var ⁡ ( n μ ^ n ) = lim n → ∞ n var ⁡ ( μ ^ n ) = var ⁡ ( g ( X 1 ) ) + 2 ∑ k = 1 ∞ cov ⁡ ( g ( X 1 ) , g ( X 1 + k ) ) . {\displaystyle {\begin{aligned}\mu &=\operatorname {E} (g(X_{1})),\\{\widehat {\mu }}_{n}&={\frac {1}{n}}\sum _{k=1}^{n}g(X_{k})\\\sigma ^{2}&:=\lim _{n\to \infty }\operatorname {var} ({\sqrt {n}}{\widehat {\mu }}_{n})=\lim _{n\to \infty }n\operatorname {var} ({\widehat {\mu }}_{n})=\operatorname {var} (g(X_{1}))+2\sum _{k=1}^{\infty }\operatorname {cov} (g(X_{1}),g(X_{1+k})).\end{aligned}}} Then as n → ∞ , {\textstyle n\to \infty ,} we have n ( μ ^ n − μ ) → D Normal ( 0 , σ 2 ) , {\displaystyle {\sqrt {n}}({\hat {\mu }}_{n}-\mu )\ {\xrightarrow {\mathcal {D}}}\ {\text{Normal}}(0,\sigma ^{2}),} where the decorated arrow indicates convergence in distribution. == Monte Carlo Setting == The Markov chain central limit theorem can be guaranteed for functionals of general state space Markov chains under certain conditions. In particular, this can be done with a focus on Monte Carlo settings. An example of the application in a MCMC (Markov Chain Monte Carlo) setting is the following: Consider a simple hard spheres model on a grid. Suppose X = { 1 , … , n 1 } × { 1 , … , n 2 } ⊆ Z 2 {\displaystyle X=\{1,\ldots ,n_{1}\}\times \{1,\ldots ,n_{2}\}\subseteq Z^{2}} . A proper configuration on X {\displaystyle X} consists of coloring each point either black or white in such a way that no two adjacent points are white. Let χ {\displaystyle \chi } denote the set of all proper configurations on X {\displaystyle X} , N χ ( n 1 , n 2 ) {\displaystyle N_{\chi }(n_{1},n_{2})} be the total number of proper configurations and π be the uniform distribution on χ {\displaystyle \chi } so that each proper configuration is equally likely. Suppose our goal is to calculate the typical number of white points in a proper configuration; that is, if W ( x ) {\displaystyle W(x)} is the number of white points in x ∈ χ {\displaystyle x\in \chi } then we want the value of E π W = ∑ x ∈ χ W ( x ) N χ ( n 1 , n 2 ) {\displaystyle E_{\pi }W=\sum _{x\in \chi }{\frac {W(x)}{N_{\chi }{\bigl (}n_{1},n_{2}{\bigr )}}}} If n 1 {\displaystyle n_{1}} and n 2 {\displaystyle n_{2}} are even moderately large then we will have to resort to an approximation to E π W {\displaystyle E_{\pi }W} . Consider the following Markov chain on χ {\displaystyle \chi } . Fix p ∈ ( 0 , 1 ) {\displaystyle p\in (0,1)} and set X 1 = x 1 {\displaystyle X_{1}=x_{1}} where x 1 ∈ χ {\displaystyle x_{1}\in \chi } is an arbitrary proper configuration. Randomly choose a point ( x , y ) ∈ X {\displaystyle (x,y)\in X} and independently draw U ∼ U n i f o r m ( 0 , 1 ) {\displaystyle U\sim \mathrm {Uniform} (0,1)} . If u ≤ p {\displaystyle u\leq p} and all of the adjacent points are black then color ( x , y ) {\displaystyle (x,y)} white leaving all other points alone. Otherwise, color ( x , y ) {\displaystyle (x,y)} black and leave all other points alone. Call the resulting configuration X 1 {\displaystyle X_{1}} . Continuing in this fashion yields a Harris ergodic Markov chain { X 1 , X 2 , X 3 , … } {\displaystyle \{X_{1},X_{2},X_{3},\ldots \}} having π {\displaystyle \pi } as its invariant distribution. It is now a simple matter to estimate E π W {\displaystyle E_{\pi }W} with w n ¯ = ∑ i = 1 n W ( X i ) / n {\displaystyle {\overline {w_{n}}}=\sum _{i=1}^{n}W(X_{i})/n} . Also, since χ {\displaystyle \chi } is finite (albeit potentially large) it is well known that X {\displaystyle X} will converge exponentially fast to π {\displaystyle \pi } which implies that a CLT holds for w n ¯ {\displaystyle {\overline {w_{n}}}} . == Implications == Not taking into account the additional terms in the variance which stem from correlations (e.g. serial correlations in markov chain monte carlo simulations) can result in the problem of pseudoreplication when computing e.g. the confidence intervals for the sample mean.

    Read more →
  • Margin (machine learning)

    Margin (machine learning)

    In machine learning, the margin of a single data point is defined to be the distance from the data point to a decision boundary. Note that there are many distances and decision boundaries that may be appropriate for certain datasets and goals. A margin classifier is a classification model that utilizes the margin of each example to learn such classification. There are theoretical justifications (based on the VC dimension) as to why maximizing the margin (under some suitable constraints) may be beneficial for machine learning and statistical inference algorithms. For a given dataset, there may be many hyperplanes that could classify it. One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the classes. Hence, one should choose the hyperplane such that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, and the linear classifier it defines is known as a maximum margin classifier (or, equivalently, the perceptron of optimal stability).

    Read more →
  • SimSimi

    SimSimi

    SimSimi is an artificial intelligence conversation program created in 2002 by ISMaker. It grows its artificial intelligence day by day assisted by a feature that allows users to teach it to respond correctly. SimSimi, pronounced as "shim-shimi", is from a Korean word simsim (심심) which means "bored". It has an application designed for Android, Windows Phone and iOS. The application was banned in Thailand in 2012 after users taught it to make responses containing profanity, and to criticise leading politicians. In April 2018, SimSimi was suspended in Brazil due to accusations of sending inappropriate messages, such as sexual language, bullying and even death threats, being labeled as "dangerous" mainly due to its popularity among children, and according to its developer, the suspension of the app in the country "was inevitable because the SimSimi app, at least in the last few days, had a significant negative social impact in Brazil.”

    Read more →
  • Aapo Hyvärinen

    Aapo Hyvärinen

    Aapo Johannes Hyvärinen (born 1970 in Helsinki) is a Finnish professor of computer science at the University of Helsinki and known for his research in independent component analysis. == Education and career == Hyvärinen was born in Helsinki and studied mathematics at the University of Helsinki and received his Doctor of Technology in information science in 1997 at the Helsinki University of Technology under the supervision of Erkki Oja. His doctoral thesis, titled "Independent component analysis: A neural network approach", introduced the FastICA algorithm. Since then, Hyvärinen has conducted research especially in relation to the independent component analysis, as well as score matching (also known as Hyvärinen scoring rule). In November 2007, he was appointed as a professor at the University of Helsinki. Hyvärinen has been a member of the Finnish Academy of Sciences since 2016. From August 2016 to March 2019, he held a professorship in machine learning at the Gatsby Computational Neuroscience Unit of the University College London.

    Read more →
  • Katie Bouman

    Katie Bouman

    Katherine Louise Bouman (; born 1989) is an American engineer and computer scientist working in the field of computational imaging. She led the development of an algorithm for imaging black holes, known as Continuous High-resolution Image Reconstruction using Patch priors (CHIRP), and was a member of the Event Horizon Telescope team that captured the first image of a black hole. The California Institute of Technology, which hired Bouman as an assistant professor in June 2019, awarded her a named professorship in 2020. In 2021, asteroid 291387 Katiebouman was named after her. In 2024, she became an associate professor. == Early life and education == Bouman grew up in West Lafayette, Indiana. Her father, Charles Bouman, is a professor of electrical and computer engineering and biomedical engineering at Purdue University. As a high school student, Bouman conducted imaging research at Purdue University. She graduated from West Lafayette Junior-Senior High School in 2007. Bouman studied electrical engineering at the University of Michigan and graduated summa cum laude in 2011. She earned her master's degree in 2013 and obtained a doctoral degree in electrical engineering and computer science in 2017 from the Massachusetts Institute of Technology (MIT). At MIT, she was a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). This group also worked closely with MIT's Haystack Observatory and with the Event Horizon Telescope. She was supported by a National Science Foundation Graduate Fellowship. Her master's thesis, Estimating Material Properties of Fabric through the Observation of Motion, was awarded the Ernst Guillemin Award for best Master's Thesis in electrical engineering. Her Ph.D. dissertation, Extreme imaging via physical model inversion: seeing around corners and imaging black holes, was supervised by William T. Freeman. Prior to receiving her doctoral degree, Bouman delivered a TEDx talk, How to Take a Picture of a Black Hole, which explained algorithms that could be used to capture the first image of a black hole. == Research and career == After earning her doctorate, Bouman joined Harvard University as a postdoctoral fellow on the Event Horizon Telescope Imaging team. Bouman joined Event Horizon Telescope project in 2013. She led the development of an algorithm for imaging black holes, known as Continuous High-resolution Image Reconstruction using Patch priors (CHIRP). CHIRP inspired image validation procedures used in acquiring the first image of a black hole in April 2019, and Bouman played a significant role in the project by verifying images, selecting parameters for filtering images taken by the Event Horizon Telescope, and participating in the development of a robust imaging framework that compared the results of different image reconstruction techniques. Her group is analyzing the Event Horizon Telescope's images to learn more about general relativity in a strong gravitational field. Bouman received significant media attention after a photo, showing her reaction to the detection of the black hole shadow in the EHT images, went viral. Some people in the media and on the Internet misleadingly implied that Bouman was a "lone genius" behind the image. However, Bouman herself repeatedly noted that the result came from the work of a large collaboration, showing the importance of teamwork in science. Bouman also became the target of online harassment, to the extent that her colleague Andrew Chael made a statement on Twitter criticizing "awful and sexist attacks on my colleague and friend", including attempts to undermine her contributions by crediting him solely with work accomplished by the team. Bouman joined the California Institute of Technology (Caltech) as an assistant professor in June 2019, where she works on new systems for computational imaging using computer vision and machine learning. In 2024, she was promoted to associate professor of computing and mathematical sciences, electrical engineering and astronomy as well as a Rosenberg Scholar. Bouman received a named professorship at Caltech in 2020. In 2021, Bouman was awarded the Royal Photographic Society Progress Medal and Honorary Fellowship. == Recognition == She was recognized as one of the BBC's 100 women of 2019. In 2024, Bouman was awarded a Sloan Research Fellowship.

    Read more →
  • How to Choose an Conversational AI Platform

    How to Choose an Conversational AI Platform

    Trying to pick the best conversational AI platform? An conversational AI platform is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right conversational AI platform slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • LumenVox

    LumenVox

    LumenVox is a privately held speech recognition software company based in San Diego, California. LumenVox has been described as one of the market leaders in the speech recognition software industry. == History == LumenVox was founded in 2001 as subsidiary of Progressive Computing. According to LumenVox CEO Edward Miller, when Progressive had initially looked to add speech recognition to its own phone system, it found the existing offerings too expensive and recognized a niche in the market for a more affordable speech recognition product. This led to the development of LumenVox with an aim to bring speech recognition to small-to-midsized businesses. LumenVox is one of the major providers of automatic speech recognition for telephone systems, and as of 2006, became the second largest provider of speech recognition software. == Products == The primary LumenVox product is the LumenVox Speech Engine. It is a speaker-independent automatic speech recognizer that uses the Speech Recognition Grammar Specification for building and defining grammars. It has been integrated with several of the major voice platforms, including Avaya Voice Portal/Interactive Response, Aculab, and BroadSoft's BroadWorks. The Speech Engine was originally derived from CMU Sphinx, but LumenVox has added considerable development effort to make it a commercial-ready product. LumenVox also offers a product called the Speech Tuner, which provides a graphical means of testing and troubleshooting speech recognition applications. == Open source support == LumenVox was recognized as one of the top VoIP companies in 2008 for its work in providing its offerings to the open source community, an effort by the company that began in 2006 when it partnered with Digium. At that time, Digium, maintainer of the open source Asterisk PBX, integrated the LumenVox Speech Engine into Asterisk. This made LumenVox the first commercially available speech recognition engine for Asterisk. As one of the earlier commercial software integrations with Asterisk, the LumenVox integration has been described as one of the applications that helped to mainstream Asterisk. In 2009, LumenVox also began offering access to the Speech Engine as a monthly subscription, bringing the cost of entry down even lower for open source users. LumenVox is also integrated with the open source UniMRCP project, which provides open source client and server libraries for the Media Resource Control Protocol.

    Read more →
  • Structured prediction

    Structured prediction

    Structured prediction or structured output learning is an umbrella term for supervised machine learning techniques that involves predicting structured objects, rather than discrete or real values. Similar to commonly used supervised learning techniques, structured prediction models are typically trained by means of observed data in which the predicted value is compared to the ground truth, and this is used to adjust the model parameters. Due to the complexity of the model and the interrelations of predicted variables, the processes of model training and inference are often computationally infeasible, so approximate inference and learning methods are used. == Applications == An example application is the problem of translating a natural language sentence into a syntactic representation such as a parse tree. This can be seen as a structured prediction problem in which the structured output domain is the set of all possible parse trees. Structured prediction is used in a wide variety of domains including bioinformatics, natural language processing (NLP), speech recognition, and computer vision. === Example: sequence tagging === Sequence tagging is a class of problems prevalent in NLP in which input data are often sequential, for instance sentences of text. The sequence tagging problem appears in several guises, such as part-of-speech tagging (POS tagging) and named entity recognition. In POS tagging, for example, each word in a sequence must be 'tagged' with a class label representing the type of word: The main challenge of this problem is to resolve ambiguity: in the above example, the words "sentence" and "tagged" in English can also be verbs. While this problem can be solved by simply performing classification of individual tokens, this approach does not take into account the empirical fact that tags do not occur independently; instead, each tag displays a strong conditional dependence on the tag of the previous word. This fact can be exploited in a sequence model such as a hidden Markov model or conditional random field that predicts the entire tag sequence for a sentence (rather than just individual tags) via the Viterbi algorithm. == Techniques == Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models. The main techniques are: Conditional random fields Structured support vector machines Structured k-nearest neighbours Recurrent neural networks, in particular Elman networks Transformers. === Structured perceptron === One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins. This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the Viterbi algorithm when used on sequence data) and can be described abstractly as follows: First, define a function ϕ ( x , y ) {\displaystyle \phi (x,y)} that maps a training sample x {\displaystyle x} and a candidate prediction y {\displaystyle y} to a vector of length n {\displaystyle n} ( x {\displaystyle x} and y {\displaystyle y} may have any structure; n {\displaystyle n} is problem-dependent, but must be fixed for each model). Let G E N {\displaystyle GEN} be a function that generates candidate predictions. Then: Let w {\displaystyle w} be a weight vector of length n {\displaystyle n} For a predetermined number of iterations: For each sample x {\displaystyle x} in the training set with true output t {\displaystyle t} : Make a prediction y ^ {\displaystyle {\hat {y}}} : y ^ = a r g m a x { y ∈ G E N ( x ) } ( w T , ϕ ( x , y ) ) {\displaystyle {\hat {y}}={\operatorname {arg\,max} }\,\{y\in GEN(x)\}\,(w^{T},\phi (x,y))} Update w {\displaystyle w} (from y ^ {\displaystyle {\hat {y}}} towards t {\displaystyle t} ): w = w + c ( − ϕ ( x , y ^ ) + ϕ ( x , t ) ) {\displaystyle w=w+c(-\phi (x,{\hat {y}})+\phi (x,t))} , where c {\displaystyle c} is the learning rate. In practice, finding the argmax over G E N ( x ) {\displaystyle {GEN}({x})} is done using an algorithm such as Viterbi or a max-sum, rather than an exhaustive search through an exponentially large set of candidates. The idea of learning is similar to that for multiclass perceptrons.

    Read more →
  • Raymond J. Mooney

    Raymond J. Mooney

    Raymond J. Mooney is an American computer scientist, professor of computer science, and director of the Artificial Intelligence laboratory at the University of Texas at Austin. His research focuses on machine learning and natural language processing. He was educated at O'Fallon Township High School in O'Fallon, Illinois and earned a BS, MS, and Ph.D. in computer science at the University of Illinois at Urbana-Champaign, where he was advised by Gerald DeJong. He is a fellow of the Association for Computing Machinery (ACM), Association for Computational Linguistics (ACL), and Association for the Advancement of Artificial Intelligence (AAAI).

    Read more →