Egocentric vision

Egocentric vision

Egocentric vision or first-person vision is a sub-field of computer vision that entails analyzing images and videos captured by a wearable camera, which is typically worn on the head or on the chest and naturally approximates the visual field of the camera wearer. Consequently, visual data capture the part of the scene on which the user focuses to carry out the task at hand and offer a valuable perspective to understand the user's activities and their context in a naturalistic setting. The wearable camera looking forwards is often supplemented with a camera looking inward at the user's eye and able to measure a user's eye gaze, which is useful to reveal attention and to better understand the user's activity and intentions. == History == The idea of using a wearable camera to gather visual data from a first-person perspective dates back to the 70s, when Steve Mann invented "Digital Eye Glass", a device that, when worn, causes the human eye itself to effectively become both an electronic camera and a television display. Subsequently, wearable cameras were used for health-related applications in the context of Humanistic Intelligence and Wearable AI. Egocentric vision is best done from the point-of-eye, but may also be done by way of a neck-worn camera when eyeglasses would be in-the-way. This neck-worn variant was popularized by way of the Microsoft SenseCam in 2006 for experimental health research works. The interest of the computer vision community into the egocentric paradigm has been arising slowly entering the 2010s and it is rapidly growing in recent years, boosted by both the impressive advances in the field of wearable technology and by the increasing number of potential applications. The prototypical first-person vision system described by Kanade and Hebert, in 2012 is composed by three basic components: a localization component able to estimate the surrounding, a recognition component able to identify object and people, and an activity recognition component, able to provide information about the current activity of the user. Together, these three components provide a complete situational awareness of the user, which in turn can be used to provide assistance to the user or to the caregiver. Following this idea, the first computational techniques for egocentric analysis focused on hand-related activity recognition and social interaction analysis. Also, given the unconstrained nature of the video and the huge amount of data generated, temporal segmentation and summarization were among the first problems addressed. After almost ten years of egocentric vision (2007–2017), the field is still undergoing diversification. Emerging research topics include: Social saliency estimation Multi-agent egocentric vision systems Privacy preserving techniques and applications Attention-based activity analysis Social interaction analysis Hand pose analysis Ego graphical User Interfaces (EUI) Understanding social dynamics and attention Revisiting robotic vision and machine vision as egocentric sensing Activity forecasting Gaze prediction == Technical challenges == Today's wearable cameras are small and lightweight digital recording devices that can acquire images and videos automatically, without the user intervention, with different resolutions and frame rates, and from a first-person point of view. Therefore, wearable cameras are naturally primed to gather visual information from our everyday interactions since they offer an intimate perspective of the visual field of the camera wearer. Depending on the frame rate, it is common to distinguish between photo-cameras (also called lifelogging cameras) and video-cameras. The former (e.g., Narrative Clip and Microsoft SenseCam), are commonly worn on the chest, and are characterized by a very low frame rate (up to 2fpm) that allows to capture images over a long period of time without the need of recharging the battery. Consequently, they offer considerable potential for inferring knowledge about e.g. behaviour patterns, habits or lifestyle of the user. However, due to the low frame-rate and the free motion of the camera, temporally adjacent images typically present abrupt appearance changes so that motion features cannot be reliably estimated. The latter (e.g., Google Glass, GoPro), are commonly mounted on the head, and capture conventional video (around 35fps) that allows to capture fine temporal details of interactions. Consequently, they offer potential for in-depth analysis of daily or special activities. However, since the camera is moving with the wearer head, it becomes more difficult to estimate the global motion of the wearer and in the case of abrupt movements, the images can result blurred. In both cases, since the camera is worn in a naturalistic setting, visual data present a huge variability in terms of illumination conditions and object appearance. Moreover, the camera wearer is not visible in the image and what he/she is doing has to be inferred from the information in the visual field of the camera, implying that important information about the wearer, such for instance as pose or facial expression estimation, is not available. == Applications == A collection of studies published in a special theme issue of the American Journal of Preventive Medicine has demonstrated the potential of lifelogs captured through wearable cameras from a number of viewpoints. In particular, it has been shown that used as a tool for understanding and tracking lifestyle behaviour, lifelogs would enable the prevention of noncommunicable diseases associated to unhealthy trends and risky profiles (such as obesity and depression). In addition, used as a tool of re-memory cognitive training, lifelogs would enable the prevention of cognitive and functional decline in elderly people. More recently, egocentric cameras have been used to study human and animal cognition, human-human social interaction, human-robot interaction, human expertise in complex tasks. Other applications include navigation/assistive technologies for the blind, monitoring and assistance of industrial workflows, and augmented reality interfaces.

Grammatik

Grammatik was the first grammar-checking program for home computers. Aspen Software of Albuquerque, NM, released the earliest version of this diction and style checker for personal computers. It was first released no later than 1981, and was inspired by the Writer's Workbench. Grammatik was first available for the TRS-80, and soon had versions for CP/M and the IBM PC. Reference Software International of San Francisco, California, acquired Grammatik in 1985. Development of Grammatik continued, and it became an actual grammar checker that could detect writing errors beyond simple style checking. Subsequent versions were released for MS-DOS, Windows, Macintosh, and Unix. Grammatik was ultimately acquired by WordPerfect Corporation and is integrated into the WordPerfect word processor.

Rnn (software)

rnn is an open-source machine learning framework that implements recurrent neural network architectures, such as LSTM and GRU, natively in the R programming language, that has been downloaded over 100,000 times (from the RStudio servers alone). The rnn package is distributed through the Comprehensive R Archive Network under the open-source GPL v3 license. == Workflow == The below example from the rnn documentation show how to train a recurrent neural network to solve the problem of bit-by-bit binary addition. == sigmoid == The sigmoid functions and derivatives used in the package were originally included in the package, from version 0.8.0 onwards, these were released in a separate R package sigmoid, with the intention to enable more general use. The sigmoid package is a dependency of the rnn package and therefore automatically installed with it. == Reception == With the release of version 0.3.0 in April 2016 the use in production and research environments became more widespread. The package was reviewed several months later on the R blog The Beginner Programmer as "R provides a simple and very user friendly package named rnn for working with recurrent neural networks.", which further increased usage. The book Neural Networks in R by Balaji Venkateswaran and Giuseppe Ciaburro uses rnn to demonstrate recurrent neural networks to R users. It is also used in the r-exercises.com course "Neural network exercises". The RStudio CRAN mirror download logs show that the package is downloaded on average about 2,000 per month from those servers , with a total of over 100,000 downloads since the first release, according to RDocumentation.org, this puts the package in the 15th percentile of most popular R packages .

Visual hierarchy

Visual hierarchy, in Gestalt psychology, describes how particular elements in a visual field stand out more than others in a pattern, creating a perceived order of importance. Although it can occur naturally, the term is most often used in design—especially graphic design and cartography—where elements are arranged to appear more important than others. This order is created by the visual contrast between forms in a field of perception. Objects with highest contrast to their surroundings are recognized first by the human mind. == Evidence == There is some scientific evidence for visual hierarchy using eye tracking. For example, one study found that when people agree that a graphic design is good, they exhibit more similar eye movements; measured by the Fréchet distance. == Theory == The concept of visual hierarchy is based in Gestalt psychological theory, an early 20th-century German theory that proposes that the human brain has innate organizing tendencies that “structure individual elements, shapes or forms into a coherent, organized whole,” especially when processing visual information. The German word Gestalt translates into “form,” “pattern,” or “shape” in English. When an element in a visual field disconnects from the ‘whole’ created by the brain's perceptual organization, it “stands out” to the viewer. The shapes that disconnect most severely from their surroundings stand out the most. This is commonly encapsulated as the Von Restorff effect, which states that isolation attracts attention. === Physical characteristics === The brain distinguishes objects based on differences in their physical appearances. These characteristics fall into four categories: color, size, alignment, and character. Each type of contrast can be used to construct a visual hierarchy. The same characteristics are also sometimes categorized (especially among cartographers) according to the visual variables of Jacques Bertin. Color encompasses the hue, saturation, value, and perceived texture of forms. Dark figures will stand out on a light background, light figures will stand out on a dark background, brightly colored figures will stand out on a muted background, and so on. The fluorescent colors used for tennis balls and other sports equipment is intended to make them instantly stand out against almost any natural visual field. Size has a strong influence on visual hierarchy. Large elements typically attract attention, provided that they can be recognized as figures. Alignment is the arrangement of forms relative to one another. For example, items in the upper left corner of a page are often seen first (at least for those readers accustomed to western languages), the center of the field has prominence. Negative space can also be employed: a figure isolated among large amounts of white space will stand out more than one amid other figures. Character includes several kinds of contrasts based on shape. For example, complex patterns attract more attention than simple or predictable patterns, intricate shapes attract more attention than generalized ones. Even large-scale patterns can attract attention if they contrast with the pattern in the remainder of the visual field. Camouflage is an example of eliminating contrast in character in color and/or character specifically to reduce visual hierarchy. The "squint test" is often suggested as a simple, if unscientific, method to evaluate the visual hierarchy of a graphical product like a map or web page. When viewed out of focus (or from a great distance), the viewer is not distracted by details, but can only see overall (gestalt) patterns such as visual hierarchy. All of the above patterns, except some aspects of character, are recognizable by this method. == Application == Visual hierarchy is an important concept in the field of graphic design, a field that specializes in visual organization. Designers attempt to control visual hierarchy to guide the eye to information in a specific order for a specific purpose. One could compare visual hierarchy in graphic design to grammatical structure in writing in terms of the importance of each principle to these fields. === Cartography === In cartographic design, visual hierarchy is used to emphasize certain important features on a map over less important features. Typically, a map has a purpose that dictates a conceptual hierarchy of what should be more or less important, so one of the goals of the choice of map symbols is to match the visual hierarchy to the conceptual hierarchy. The Visual hierarchy of a map may apply to individual geographic features (such as making a single country stand out), to map layers of related features (e.g., making lakes stand out more than roads), and to the entire layout of map and non-map elements (e.g., making the title look more important than the scale bar). Like the main map elements, such features have weight, and the properties that apply to visual hierarchy of map layers also apply to other elements on the page. Size and alignment are the two main determinants of the visual hierarchy for these features. Cartographers often utilize principles of negative space and figure-ground contrast to design an appropriate visual hierarchy by employing contrast between unused space and layout features. === User experience design and behavioral design === In user experience design and behavioural design, such as web design, visual hierarchy is used to prioritize navigational structures and content, so that audiences focus on elements that facilitate system usage, or increases the chance that they notice content that contains psychological nudges. Color is one of many factors used in the design of a visual hierarchy, and a key factor due to the high salience of color perception.

MIT Computer Science and Artificial Intelligence Laboratory

Computer Science and Artificial Intelligence Laboratory (CSAIL) is a research institute at the Massachusetts Institute of Technology (MIT) formed by the 2003 merger of the Laboratory for Computer Science (LCS) and the Artificial Intelligence Laboratory (AI Lab). Housed within the Ray and Maria Stata Center, CSAIL is the largest on-campus laboratory as measured by research scope and membership. It is part of the Schwarzman College of Computing but is also overseen by the MIT Vice President of Research. == Research activities == CSAIL's research activities are organized around a number of semi-autonomous research groups, each of which is headed by one or more professors or research scientists. These groups are divided up into seven general areas of research: Artificial intelligence Computational biology Graphics and vision Language and learning Theory of computation Robotics Systems (includes computer architecture, databases, distributed systems, networks and networked systems, operating systems, programming methodology, and software engineering, among others) == History == Computing Research at MIT began with Vannevar Bush's research into a differential analyzer and Claude Shannon's electronic Boolean algebra in the 1930s, the wartime MIT Radiation Laboratory, the post-war Project Whirlwind and the Research Laboratory of Electronics (RLE), and MIT Lincoln Laboratory's SAGE in the early 1950s. At MIT, research in the field of artificial intelligence began in the late 1950s. === Project MAC === On July 1, 1963, Project MAC (the Project on Mathematics and Computation, later backronymed to Multiple Access Computer, Machine Aided Cognitions, or Man and Computer) was launched with a $2 million grant from the Defense Advanced Research Projects Agency (DARPA). Project MAC's original director was Robert Fano of MIT's Research Laboratory of Electronics (RLE). Fano decided to call MAC a "project" rather than a "laboratory" for reasons of internal MIT politics – if MAC had been called a laboratory, then it would have been more difficult to raid other MIT departments for research staff. The program manager responsible for the DARPA grant was J. C. R. Licklider, who had previously been at MIT conducting research in RLE, and would later succeed Fano as director of Project MAC. Project MAC would become famous for groundbreaking research in operating systems, artificial intelligence, and the theory of computation. Its contemporaries included Project Genie at Berkeley, the Stanford Artificial Intelligence Laboratory, and (somewhat later) University of Southern California's (USC's) Information Sciences Institute. An "AI Group" including Marvin Minsky (the director), John McCarthy (inventor of Lisp), and a talented community of computer programmers were incorporated into Project MAC. They were interested principally in the problems of vision, mechanical motion and manipulation, and language, which they view as the keys to more intelligent machines. In the 1960s and 1970s the AI Group developed a time-sharing operating system called Incompatible Timesharing System (ITS) which ran on PDP-6 and later PDP-10 computers. The early Project MAC community included Fano, Minsky, Licklider, Fernando J. Corbató, and a community of computer programmers and enthusiasts among others who drew their inspiration from former colleague John McCarthy. These founders envisioned the creation of a computer utility whose computational power would be as reliable as an electric utility. To this end, Corbató brought the first computer time-sharing system, Compatible Time-Sharing System (CTSS), with him from the MIT Computation Center, using the DARPA funding to purchase an IBM 7094 for research use. One of the early focuses of Project MAC would be the development of a successor to CTSS, Multics, which was to be the first high availability computer system, developed as a part of an industry consortium including General Electric and Bell Laboratories. In 1966, Scientific American featured Project MAC in the September thematic issue devoted to computer science, that was later published in book form. At the time, the system was described as having approximately 100 TTY terminals, mostly on campus but with a few in private homes. Only 30 users could be logged in at the same time. The project enlisted students in various classes to use the terminals simultaneously in problem solving, simulations, and multi-terminal communications as tests for the multi-access computing software being developed. === AI Lab and LCS === In the late 1960s, Minsky's artificial intelligence group was seeking more space, and was unable to get satisfaction from project director Licklider. Minsky found that although Project MAC as a single entity could not get the additional space he wanted, he could split off to form his own laboratory and then be entitled to more office space. As a result, the MIT AI Lab was formed in 1970, and many of Minsky's AI colleagues left Project MAC to join him in the new laboratory, while most of the remaining members went on to form the Laboratory for Computer Science. Talented programmers such as Richard Stallman, who used TECO to develop EMACS, flourished in the AI Lab during this time. Those researchers who did not join the smaller AI Lab formed the Laboratory for Computer Science and continued their research into operating systems, programming languages, distributed systems, and the theory of computation. Two professors, Hal Abelson and Gerald Jay Sussman, chose to remain neutral—their group was referred to variously as Switzerland and Project MAC for the next 30 years. Among much else, the AI Lab led to the invention of Lisp machines and their attempted commercialization by two companies in the 1980s: Symbolics and Lisp Machines Inc. === CSAIL === On the fortieth anniversary of Project MAC's establishment, July 1, 2003, LCS was merged with the AI Lab to form the MIT Computer Science and Artificial Intelligence Laboratory, or CSAIL. This merger created the largest laboratory (over 600 personnel) on the MIT campus. In 2018, CSAIL launched a five-year collaboration program with IFlytek, a company sanctioned the following year for allegedly using its technology for surveillance and human rights abuses in Xinjiang. In October 2019, MIT announced that it would review its partnerships with sanctioned firms such as iFlyTek and SenseTime. In April 2020, the agreement with iFlyTek was terminated. CSAIL moved from the School of Engineering to the newly formed Schwarzman College of Computing by February 2020. == Offices == From 1963 to 2004, Project MAC, LCS, the AI Lab, and CSAIL had their offices at 545 Technology Square, taking over more and more floors of the building over the years. In 2004, CSAIL moved to the new Ray and Maria Stata Center, which was built specifically to house it and other departments. == Outreach activities == The IMARA (from Swahili word for "power") group sponsors a variety of outreach programs that bridge the global digital divide. Its aim is to find and implement long-term, sustainable solutions which will increase the availability of educational technology and resources to domestic and international communities. These projects are run under the aegis of CSAIL and staffed by MIT volunteers who give training, install and donate computer setups in greater Boston, Massachusetts, Kenya, Native American Indian tribal reservations in the American Southwest such as the Navajo Nation, the Middle East, and Fiji Islands. The CommuniTech project strives to empower under-served communities through sustainable technology and education and does this through the MIT Used Computer Factory (UCF), providing refurbished computers to under-served families, and through the Families Accessing Computer Technology (FACT) classes, it trains those families to become familiar and comfortable with computer technology. == Notable researchers == (Including members and alumni of CSAIL's predecessor laboratories) MacArthur Fellows Tim Berners-Lee, Erik Demaine, Dina Katabi, Daniela L. Rus, Regina Barzilay, Peter Shor, Richard Stallman, and Joshua Tenenbaum Turing Award recipients Leonard M. Adleman, Fernando J. Corbató, Shafi Goldwasser, Butler W. Lampson, John McCarthy, Silvio Micali, Marvin Minsky, Ronald L. Rivest, Adi Shamir, Barbara Liskov, and Michael Stonebraker IJCAI Computers and Thought Award recipients Terry Winograd, Patrick Winston, David Marr, Gerald Jay Sussman, Rodney Brooks Rolf Nevanlinna Prize recipients Madhu Sudan, Peter Shor, Constantinos Daskalakis Gödel Prize recipients Shafi Goldwasser (two-time recipient), Silvio Micali, Maurice Herlihy, Charles Rackoff, Johan Håstad, Peter Shor, and Madhu Sudan Grace Murray Hopper Award recipients Robert Metcalfe, Shafi Goldwasser, Guy L. Steele, Jr., Richard Stallman, and W. Daniel Hillis Textbook authors Harold Abelson and Gerald Jay Sussman, Richard Stallman, Thomas H. Cormen, Charles E. Leiserson, Patrick Winston, Ronald L.

Tiimo

Tiimo is an app designed to help neurodivergent individuals with planning their life. In August 2024 the company raised €1.4 million, bringing their total funding to €4.3 million. At that point they had over 500,000 users, including 50,000 paid users. The app has Apple Watch support and a learning platform that includes courses on well-being and neurodiversity. The app was founded by Helene Lassen Nørlem and Melissa Würtz Azari in 2015. After being a finalist in 2024, in December 2025 Tiimo was won Apple’s iPhone App of the Year. The premium version is $10/mo and features an AI chatbot alongside the daily planner.

Otterly.ai

Otterly.ai is an Austrian software company, founded in 2024, that provides tools for generative engine optimization, the practice of monitoring and optimizing results in large language models. == History == Otterly.ai was co-founded in 2024 by Thomas Peham, Klaus-M. Schremser and Josef Trauner. The concept for OtterlyAI was developed in response to the increasing use of generative AI tools in digital search and content discovery. The company announced a technology partnership with SEO platform Semrush in January 2025.