Vasant Honavar

Vasant G. Honavar is an Indian-American computer scientist, and artificial intelligence, machine learning, big data, data science, causal inference, knowledge representation, bioinformatics and health informatics researcher and professor. == Early life and education == Vasant Honavar was born at Pune, India to Bhavani G. and Gajanan N. Honavar. He received his early education at the Vidya Vardhaka Sangha High School and M.E.S. College in Bangalore, India. He received a B.E. in Electronics & Communications Engineering from the B.M.S. College of Engineering in Bangalore, India in 1982, when it was affiliated with Bangalore University, an M.S. in electrical and computer engineering in 1984 from Drexel University, and an M.S. in computer science in 1989, and a Ph.D. in 1990, respectively, from the University of Wisconsin–Madison, where he studied Artificial Intelligence and worked with Leonard Uhr. == Career == Honavar is on the faculty of Informatics and Intelligent Systems Department in the Penn State College of Information Sciences and Technology at Pennsylvania State University where he currently holds the Dorothy Foehr Huck and J. Lloyd Huck Chair in Biomedical Data Sciences and Artificial Intelligence and previously held the Edward Frymoyer Endowed Chair in Information Sciences and Technology. He serves on the faculties of the graduate programs in Computer Science, Informatics, Bioinformatics and Genomics, Neuroscience, Operations Research, Public Health Sciences, and of undergraduate programs in Data Science and Artificial Intelligence methods and applications. Honavar serves as the director of the Artificial Intelligence Research Laboratory, Director of Strategic Initiatives for the Institute for Computational and Data Sciences and the director of the Center for Artificial Intelligence Foundations and Scientific Applications at Pennsylvania State University. Honavar served on the Leadership Team of the Northeast Big Data Innovation Hub. Honavar served on the Computing Research Association's Computing Community Consortium Council during 2014-2017, where he chaired the task force on Convergence of Data and Computing, and was a member of the task force on Artificial Intelligence. Honavar was the first Sudha Murty Distinguished Visiting Chair of Neurocomputing and Data Science by the Indian Institute of Science, Bangalore, India. Honavar was named a Distinguished Member of the Association for Computing Machinery for "outstanding scientific contributions to computing"; and elected a Fellow of the American Association for the Advancement of Science for his "distinguished research contributions and leadership in data science". As a Program Director in the Information Integration and Informatics program in the Information and Intelligent Systems Division of the Computer and Information Science and Engineering Directorate of the US National Science Foundation during 2010-13, Honavar led the Big Data Program. Honavar was a professor of computer science at Iowa State University where he led the Artificial Intelligence Research Laboratory which he founded in 1990 and was instrumental in establishing an interdepartmental graduate program in Bioinformatics and Computational Biology (and served as its Chair during 2003–2005). Honavar has held visiting professorships at Carnegie Mellon University, the University of Wisconsin–Madison, and at the Indian Institute of Science. == Research == Honavar's research has contributed to advances in artificial intelligence, machine learning, causal inference, knowledge representation, neural networks, semantic web, big data analytics, and bioinformatics and computational biology. He was a program chair of the Association for the Advancement of Artificial Intelligence(AAAI)'s 36th Conference on Artificial Intelligence. He has published over 300 research articles, including many highly cited ones, as well as several books on these topics. His recent work has focused on federated machine learning algorithms for constructing predictive models from distributed data and linked open data, learning predictive models from high dimensional longitudinal data, reasoning with federated knowledge bases, detecting algorithmic bias, big data analytics, analysis and prediction of protein-protein, protein-RNA, and protein-DNA interfaces and interactions, social network analytics, health informatics, secrecy-preserving query answering, representing and reasoning about preferences, and causal inference from complex, e.g., relational, data, large language models, diffusion models, and meta analysis. Honavar has been active in fostering national and international scientific collaborations in Artificial Intelligence, Data Sciences, and their applications in addressing national, international, and societal priorities in accelerating science, improving health, transforming agriculture through partnerships that bring together academia, non-profits, and industry. He is also active in making the science policy case for major national research initiatives such as AI for accelerating science and AI for combating the epidemic of diseases of despair. == Honors == National Science Foundation Director's Award for Superior Accomplishment, 2013 National Science Foundation Director's Award for Collaborative Integration, 2012 Margaret Ellen White Graduate Faculty Award, Iowa State University, 2011 Outstanding Career Achievement in Research Award, College of Liberal Arts and Sciences, Iowa State University, 2008 Regents Award for Faculty Excellence, Iowa Board of Regents, 2007 Edward Frymoyer Endowed Chair in Information Sciences and Technology, Penn State College of Information Sciences and Technology, Pennsylvania State University, 2013 Senior Faculty Research Excellence Award, Penn State College of Information Sciences and Technology, Pennsylvania State University, 2016 125 People of Impact, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 2016 Sudha Murty Distinguished (Visiting) Chair of Neurocomputing and Data Science, Indian Institute of Science, 2016-2021 ACM Distinguished Member, 2018 AAAS Fellow American Association for the Advancement of Science, 2018 EAI Fellow European Alliance for Innovation, 2019 Dorothy Foehr Huck and J. Lloyd Huck Chair in Biomedical Data Sciences and Artificial Intelligence, Pennsylvania State University, 2021

Document classification

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification. The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied. Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: the content-based approach and the request-based approach. == "Content-based" versus "request-based" classification == Content-based classification is classification in which the weight given to particular subjects in a document determines the class to which the document is assigned. It is, for example, a common rule for classification in libraries, that at least 20% of the content of a book should be about the class to which the book is assigned. In automatic classification it could be the number of times given words appears in a document. Request-oriented classification (or -indexing) is classification in which the anticipated request from users is influencing how documents are being classified. The classifier asks themself: “Under which descriptors should this entity be found?” and “think of all the possible queries and decide for which ones the entity at hand is relevant” (Soergel, 1985, p. 230). Request-oriented classification may be classification that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may classify/index documents differently when compared to a historical library. It is probably better, however, to understand request-oriented classification as policy-based classification: The classification is done according to some ideals and reflects the purpose of the library or database doing the classification. In this way it is not necessarily a kind of classification or indexing based on user studies. Only if empirical data about use or users are applied should request-oriented classification be regarded as a user-based approach. == Classification versus indexing == Sometimes a distinction is made between assigning documents to classes ("classification") versus assigning subjects to documents ("subject indexing") but as Frederick Wilfrid Lancaster has argued, this distinction is not fruitful. "These terminological distinctions,” he writes, “are quite meaningless and only serve to cause confusion” (Lancaster, 2003, p. 21). The view that this distinction is purely superficial is also supported by the fact that a classification system may be transformed into a thesaurus and vice versa (cf., Aitchison, 1986, 2004; Broughton, 2008; Riesthuis & Bliedung, 1991). Therefore, assigning a subject term to a document in an index is equivalent to assigning that document to the class of documents indexed by that term (all documents indexed or classified as X belong to the same class of documents). == Automatic document classification (ADC) == Automatic document classification tasks can be divided into three sorts: supervised document classification where some external mechanism (such as human feedback) provides information on the correct classification for documents, unsupervised document classification (also known as document clustering), where the classification must be done entirely without reference to external information, and semi-supervised document classification, where parts of the documents are labeled by the external mechanism. There are several software products under various license models available. === Techniques === Automatic document classification techniques include: Artificial neural network Concept Mining Decision trees such as ID3 or C4.5 Expectation maximization (EM) Instantaneously trained neural networks Latent semantic indexing Multiple-instance learning Naive Bayes classifier Natural language processing approaches Rough set-based classifier Soft set-based classifier Support vector machines (SVM) K-nearest neighbour algorithms tf–idf == Applications == Classification techniques have been applied to spam filtering, a process which tries to discern E-mail spam messages from legitimate emails email routing, sending an email sent to a general address to a specific address or mailbox depending on topic language identification, automatically determining the language of a text genre classification, automatically determining the genre of a text readability assessment, automatically determining the degree of readability of a text, either to find suitable materials for different age groups or reader types or as part of a larger text simplification system sentiment analysis, determining the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. health-related classification using social media in public health surveillance article triage, selecting articles that are relevant for manual literature curation, for example as is being done as the first step to generate manually curated annotation databases in biology

Feature detection (web development)

Feature detection (also feature testing) is a technique used in web development for handling differences between runtime environments (typically web browsers or user agents), by programmatically testing for clues that the environment may or may not offer certain functionality. This information is then used to make the application adapt in some way to suit the environment: to make use of certain APIs, or tailor for a better user experience. Its proponents claim it is more reliable and future-proof than other techniques like user agent sniffing and browser-specific CSS hacks. == Techniques == A feature test can take many forms. It is essentially any snippet of code which gives some level of confidence that a required feature is indeed supported. However, in contrast to other techniques, feature detection usually focuses on performing actions which directly relate to the feature to be detected, rather than heuristics. === JavaScript === JavaScript feature detection can inspect the DOM and the local JavaScript environment to test whether browser features or APIs are supported. The simplest technique is to check for the existence of a relevant object or property. For example, the Geolocation API (used for accessing the device's knowledge of its geographical location, possibly obtained from a GPS navigation device) exposes a geolocation property on the navigator object in the DOM; the presence of which implies the Geolocation API is supported: if ('geolocation' in navigator) { // Geolocation API is supported } For a higher level of confidence, some feature tests will attempt to invoke the feature then look for clues that it behaved properly. For example, a test for support for cookies might attempt to set a value as a cookie and then verify it can be read back. === CSS === In CSS, the at-rule @supports introduced in 2015 allows to test if a given feature is supported. For instance the following code activates the declarations only if the user agent supports display: flex: == Undetectables == Some browser features are considered undetectable, because no clues are known to give sufficient confidence that a feature is supported. These are often because of limited information available to the JavaScript environment in the browser; generally features must be exposed via the DOM in some way in order to be detectable using JavaScript. When undetectables are encountered, it is common to turn to user agent sniffing as an alternative mechanism, or to employ defensive coding to minimise the impact if the feature turns out not to be supported. The Modernizr project maintains a record of known undetectables on their wiki.

Flapit

Flapit is a split-flap display that reveals real-time social media statistics such as Twitter followers or Yelp ratings. The product is designed to show off a bricks-and-mortar company's online community and increase its online presence by letting offline customers interact with the connected counter. The idea came from a product launched by the retailer C&A called the Fashion Like. The device can be customised via a web app and API to display any promotional messages, internal stats or discounts. It has 7 digits including numbers, letters and currency symbols Special messages such as Thank You or Like Us can be displayed on the first flap and are translated into Italian, German, French, Chinese, Japanese, Russian, Portuguese, Spanish and English. The Flapit counter was officially presented to the press at the CES Las Vegas 2015 and received favorable reviews from major specialised press

Bookmarklet

A bookmarklet is a bookmark stored in a web browser that contains JavaScript commands that add new features to the browser. They are stored as the URL of a bookmark in a web browser or as a hyperlink on a web page. Bookmarklets are usually small snippets of JavaScript executed when a user clicks on them. When clicked, bookmarklets can perform a wide variety of operations, such as running a search query from selected text or extracting data from a table. Another name for bookmarklet is favelet or favlet, derived from favorites (synonym of bookmark). == History == Steve Kangas of bookmarklets.com coined the word bookmarklet when he started to create short scripts based on a suggestion in Netscape's JavaScript guide. Before that, Tantek Çelik called these scripts favelets and used that word as early as on 6 September 2001 (personal email). Brendan Eich, who developed JavaScript at Netscape, gave this account of the origin of bookmarklets: They were a deliberate feature in this sense: I invented the javascript: URL along with JavaScript in 1995, and intended that javascript: URLs could be used as any other kind of URL, including being bookmark-able. In particular, I made it possible to generate a new document by loading, e.g. javascript:'hello, world', but also (key for bookmarklets) to run arbitrary script against the DOM of the current document, e.g. javascript:alert(document.links[0].href). The difference is that the latter kind of URL uses an expression that evaluates to the undefined type in JS. I added the void operator to JS before Netscape 2 shipped to make it easy to discard any non-undefined value in a javascript: URL. The increased implementation of Content Security Policy (CSP) in websites has caused problems with bookmarklet execution and usage (2013–2015), with some suggesting that this hails the end or death of bookmarklets. William Donnelly created a work-around solution for this problem (in the specific instance of loading, referencing and using JavaScript library code) in early 2015 using a Greasemonkey userscript (Firefox / Pale Moon browser add-on extension) and a simple bookmarklet-userscript communication protocol. It allows (library-based) bookmarklets to be executed on any and all websites, including those using CSP and having an https:// URI scheme. However, if/when browsers support disabling/disallowing inline script execution using CSP, and if/when websites begin to implement that feature, it will "break" this "fix". == Concept == Web browsers use URIs for the href attribute of the tag and for bookmarks. The URI scheme, such as http or ftp, and which generally specifies the protocol, determines the format of the rest of the string. Browsers also implement javascript: URIs that to a parser is just like any other URI. The browser recognizes the specified javascript scheme and treats the rest of the string as a JavaScript program which is then executed. The expression result, if any, is treated as the HTML source code for a new page displayed in place of the original. The executing script has access to the current page, which it may inspect and change. If the script returns an undefined type (rather than, for example, a string), the browser will not load a new page, with the result that the script simply runs against the current page content. This permits changes such as in-place font size and color changes without a page reload. An immediately invoked function that returns no value or an expression preceded by the void operator will prevent the browser from attempting to parse the result of the evaluation as a snippet of HTML markup: == Usage == Bookmarklets are saved and used as normal bookmarks. As such, they are simple "one-click" tools which add functionality to the browser. For example, they can: Modify the appearance of a web page within the browser (e.g., change font size, background color, etc.) Extract data from a web page (e.g., hyperlinks, images, text, etc.) Remove redirects from (e.g. Google) search results, to show the actual target URL Submit the current page to a blogging service such as Posterous, link-shortening service such as bit.ly, or bookmarking service such as Delicious Query a search engine or online encyclopedia with highlighted text or by a dialog box Submit the current page to a link validation service or translation service Set commonly chosen configuration options when the page itself provides no way to do this Control HTML5 audio and video playback parameters such as speed, position, toggling looping, and showing/hiding playback controls, the first of which can be adjusted beyond HTML5 players' typical range setting. Installing a bookmarklet follows the same process as adding a normal bookmark; the only difference is that in place of the URL destination field is JavaScript code preceded by javascript:. Once created, bookmarklets can be run by clicking on them.

Inferential theory of learning

Inferential Theory of Learning (ITL) is an area of machine learning which describes inferential processes performed by learning agents. ITL has been continuously developed by Ryszard S. Michalski, starting in the 1980s. The first known publication of ITL was in 1983. In the ITL learning process is viewed as a search (inference) through hypotheses space guided by a specific goal. The results of learning need to be stored. Stored information will later be used by the learner for future inferences. Inferences are split into multiple categories including conclusive, deduction, and induction. In order for an inference to be considered complete it was required that all categories must be taken into account. This is how the ITL varies from other machine learning theories like Computational Learning Theory and Statistical Learning Theory; which both use singular forms of inference. == Usage == The most relevant published usage of ITL was in scientific journal published in 2012 and used ITL as a way to describe how agent-based learning works. According to the journal "The Inferential Theory of Learning (ITL) provides an elegant way of describing learning processes by agents".

Digital edition

A digital edition is an online magazine or online newspaper delivered in electronic form which is formatted identically to the print version. Digital editions are often called digital facsimiles to underline the likeness to the print version. Digital editions have the benefit of reduced cost to the publisher and reader by avoiding the time and the expense to print and deliver paper edition. This format is considered more environmentally friendly due to the reduction of paper and energy use. These editions also often feature interactive elements such as hyperlinks both within the publication itself and to other internet resources, search option and bookmarking, and can also incorporate multimedia such as video or animation to enhance articles themselves or for advertisement purposes. Some delivery methods also include animation and sound effects that replicate turning of the page to further enhance the experience of their print counterparts. Magazine publishers have traditionally relied on two revenue sources: selling ads and selling magazines. Additionally some publishers are using other electronic publication methods such as RSS to reach out to readers and inform them when new digital editions are available. Current technologies are generally either reader-based, requiring a download of an application and subsequent download of each edition, or browser-based, often using Macromedia Flash, requiring no application download (such as Adobe Acrobat). Some application-based readers allow users to access editions while not connected to internet. Dedicated hardware such as the Amazon Kindle and the iPad is also available for reading digital editions of select books, popular national magazines such as Time, The Atlantic, and Forbes and popular national newspapers such as the New York Times, Wall Street Journal, and Washington Post. Archives of print newspapers, in some cases dating hundreds of years back, are being digitized and made available online. Google is indexing existing digital archives produced by the newspapers themselves or by third parties. Newspaper and magazine archival began with microform film formats solving the problem of efficiently storing and preserving. This format, however, lacked accessibility. Many libraries, especially state libraries in the United States are archiving their collections digitally and converting existing microfilm to digital format. The Library of Congress provides project planning assistance and the National Endowment for the Humanities procures funding through grants from its National Digital Newspaper Program. Digital magazines, ezines, e-editions and emags are sometimes referred to as digital editions, however some of these formats are published only in digital format unlike digital editions which replicate a printed edition as well. == Digital magazines == Digital-replica magazines number in thousands—consumer and business publications, house magazines for associations, institutions and corporations – and conversion from print to digital was still increasing as of 2009. A 2008 report funded by digital-replica technology providers and auditing agencies counted 1,786 digital-replica editions having more than 7 million circulation among business-to-business publications, of which 230 editions were audited The same report counted 1,470 digital-replica editions of consumer magazines having 5.5 million digital circulation, of which 240 editions were audited. These authors estimated that by year end of 2009 there would be 8,000 digital magazines, having a combined distribution of more than 30 million people. Surveys have shown that, while not all subscribers prefer a digital edition, some do because of the environmental benefit and also because digital magazines are searchable and may easily be passed along or linked to. One such survey funded by a digital publisher reported on inputs from more than 30,000 subscribers to business, consumer and other digital magazines. == Digital magazine business models == === Reduced printing and distribution costs === The publishers' choice to save by moving some or all subscribers from print to digital is widely accepted. Oracle magazine, which has 176,000 of its 516,000 subscribers receiving digital according to its June 2009 BPA circulation statement, is said to be the most widely circulated digital edition of a business-to-business publication. Publishers who do this need to choose whether to make some issues all-digital, move some subscribers to digital edition, add some digital-only subscribers, or send all subscribers the digital edition. === Paid subscription revenue === In 2009, a major consumer magazine, PC Magazine, went all-digital, charging an annual subscription fee for its digital-replica edition. Many consumer magazines and newspapers are already available in eReader formats that are sold through booksellers. === Sponsorship and advertising revenue === Digital editions often carry special "front cover" advertising, or advertising on the email message alerting the subscriber of the digital edition. Publishers also produce special digital-only inserts and rich-media ads or advertorials. === Designed-for-digital issues === Another approach is to fully replace printed issues with digital ones, or to use digital editions for extra issues that would otherwise have to be printed.