Contextual image classification

Contextual image classification, a topic of pattern recognition in computer vision, is an approach of classification based on contextual information in images. "Contextual" means this approach is focusing on the relationship of the nearby pixels, which is also called neighbourhood. The goal of this approach is to classify the images by using the contextual information. == Introduction == Similar as processing language, a single word may have multiple meanings unless the context is provided, and the patterns within the sentences are the only informative segments we care about. For images, the principle is same. Find out the patterns and associate proper meanings to them. As the image illustrated below, if only a small portion of the image is shown, it is very difficult to tell what the image is about. Even try another portion of the image, it is still difficult to classify the image. However, if we increase the contextual of the image, then it makes more sense to recognize. As the full images shows below, almost everyone can classify it easily. During the procedure of segmentation, the methods which do not use the contextual information are sensitive to noise and variations, thus the result of segmentation will contain a great deal of misclassified regions, and often these regions are small (e.g., one pixel). Compared to other techniques, this approach is robust to noise and substantial variations for it takes the continuity of the segments into account. Several methods of this approach will be described below. == Applications == === Functioning as a post-processing filter to a labelled image === This approach is very effective against small regions caused by noise. And these small regions are usually formed by few pixels or one pixel. The most probable label is assigned to these regions. However, there is a drawback of this method. The small regions also can be formed by correct regions rather than noise, and in this case the method is actually making the classification worse. This approach is widely used in remote sensing applications. === Improving the post-processing classification === This is a two-stage classification process: For each pixel, label the pixel and form a new feature vector for it. Use the new feature vector and combine the contextual information to assign the final label to the === Merging the pixels in earlier stages === Instead of using single pixels, the neighbour pixels can be merged into homogeneous regions benefiting from contextual information. And provide these regions to classifier. === Acquiring pixel feature from neighbourhood === The original spectral data can be enriched by adding the contextual information carried by the neighbour pixels, or even replaced in some occasions. This kind of pre-processing methods are widely used in textured image recognition. The typical approaches include mean values, variances, texture description, etc. === Combining spectral and spatial information === The classifier uses the grey level and pixel neighbourhood (contextual information) to assign labels to pixels. In such case the information is a combination of spectral and spatial information. === Powered by the Bayes minimum error classifier === Contextual classification of image data is based on the Bayes minimum error classifier (also known as a naive Bayes classifier). Present the pixel: A pixel is denoted as x 0 {\displaystyle x_{0}} . The neighbourhood of each pixel x 0 {\displaystyle x_{0}} is a vector and denoted as N ( x 0 ) {\displaystyle N(x_{0})} . The values in the neighbourhood vector is denoted as f ( x i ) {\displaystyle f(x_{i})} . Each pixel is presented by the vector ξ = ( f ( x 0 ) , f ( x 1 ) , … , f ( x k ) ) {\displaystyle \xi =\left(f(x_{0}),f(x_{1}),\ldots ,f(x_{k})\right)} x i ∈ N ( x 0 ) ; i = 1 , … , k {\displaystyle x_{i}\in N(x_{0});\quad i=1,\ldots ,k} The labels (classification) of pixels in the neighbourhood N ( x 0 ) {\displaystyle N(x_{0})} are presented as a vector η = ( θ 0 , θ 1 , … , θ k ) {\displaystyle \eta =\left(\theta _{0},\theta _{1},\ldots ,\theta _{k}\right)} θ i ∈ { ω 0 , ω 1 , … , ω k } {\displaystyle \theta _{i}\in \left\{\omega _{0},\omega _{1},\ldots ,\omega _{k}\right\}} ω s {\displaystyle \omega _{s}} here denotes the assigned class. A vector presents the labels in the neighbourhood N ( x 0 ) {\displaystyle N(x_{0})} without the pixel x 0 {\displaystyle x_{0}} η ^ = ( θ 1 , θ 2 , … , θ k ) {\displaystyle {\hat {\eta }}=\left(\theta _{1},\theta _{2},\ldots ,\theta _{k}\right)} The neighbourhood: Size of the neighbourhood. There is no limitation of the size, but it is considered to be relatively small for each pixel x 0 {\displaystyle x_{0}} . A reasonable size of neighbourhood would be 3 × 3 {\displaystyle 3\times 3} of 4-connectivity or 8-connectivity ( x 0 {\displaystyle x_{0}} is marked as red and placed in the centre). The calculation: Apply the minimum error classification on a pixel x 0 {\displaystyle x_{0}} , if the probability of a class ω r {\displaystyle \omega _{r}} being presenting the pixel x 0 {\displaystyle x_{0}} is the highest among all, then assign ω r {\displaystyle \omega _{r}} as its class. θ 0 = ω r if P ( ω r ∣ f ( x 0 ) ) = max s = 1 , 2 , … , R P ( ω s ∣ f ( x 0 ) ) {\displaystyle \theta _{0}=\omega _{r}\quad {\text{ if }}\quad P(\omega _{r}\mid f(x_{0}))=\max _{s=1,2,\ldots ,R}P(\omega _{s}\mid f(x_{0}))} The contextual classification rule is described as below, it uses the feature vector x 1 {\displaystyle x_{1}} rather than x 0 {\displaystyle x_{0}} . θ 0 = ω r if P ( ω r ∣ ξ ) = max s = 1 , 2 , … , R P ( ω s ∣ ξ ) {\displaystyle \theta _{0}=\omega _{r}\quad {\text{ if }}\quad P(\omega _{r}\mid \xi )=\max _{s=1,2,\ldots ,R}P(\omega _{s}\mid \xi )} Use the Bayes formula to calculate the posteriori probability P ( ω s ∣ ξ ) {\displaystyle P(\omega _{s}\mid \xi )} P ( ω s ∣ ξ ) = p ( ξ ∣ ω s ) P ( ω s ) p ( ξ ) {\displaystyle P(\omega _{s}\mid \xi )={\frac {p(\xi \mid \omega _{s})P(\omega _{s})}{p\left(\xi \right)}}} The number of vectors is the same as the number of pixels in the image. For the classifier uses a vector corresponding to each pixel x i {\displaystyle x_{i}} , and the vector is generated from the pixel's neighbourhood. The basic steps of contextual image classification: Calculate the feature vector ξ {\displaystyle \xi } for each pixel. Calculate the parameters of probability distribution p ( ξ ∣ ω s ) {\displaystyle p(\xi \mid \omega _{s})} and P ( ω s ) {\displaystyle P(\omega _{s})} Calculate the posterior probabilities P ( ω r ∣ ξ ) {\displaystyle P(\omega _{r}\mid \xi )} and all labels θ 0 {\displaystyle \theta _{0}} . Get the image classification result. == Algorithms == === Template matching === The template matching is a "brute force" implementation of this approach. The concept is first create a set of templates, and then look for small parts in the image match with a template. This method is computationally high and inefficient. It keeps an entire templates list during the whole process and the number of combinations is extremely high. For a m × n {\displaystyle m\times n} pixel image, there could be a maximum of 2 m × n {\displaystyle 2^{m\times n}} combinations, which leads to high computation. This method is a top down method and often called table look-up or dictionary look-up. === Lower-order Markov chain === The Markov chain also can be applied in pattern recognition. The pixels in an image can be recognised as a set of random variables, then use the lower order Markov chain to find the relationship among the pixels. The image is treated as a virtual line, and the method uses conditional probability. === Hilbert space-filling curves === The Hilbert curve runs in a unique pattern through the whole image, it traverses every pixel without visiting any of them twice and keeps a continuous curve. It is fast and efficient. === Markov meshes === The lower-order Markov chain and Hilbert space-filling curves mentioned above are treating the image as a line structure. The Markov meshes however will take the two dimensional information into account. === Dependency tree === The dependency tree is a method using tree dependency to approximate probability distributions.

Multimodal representation learning

Multimodal representation learning is a subfield of representation learning focused on integrating and interpreting information from different modalities, such as text, images, audio, or video, by projecting them into a shared latent space. This allows for semantically similar content across modalities to be mapped to nearby points within that space, facilitating a unified understanding of diverse data types. By automatically learning meaningful features from each modality and capturing their inter-modal relationships, multimodal representation learning enables a unified representation that enhances performance in cross-media analysis tasks such as video classification, event detection, and sentiment analysis. It also supports cross-modal retrieval and translation, including image captioning, video description, and text-to-image synthesis. == Motivation == The primary motivations for multimodal representation learning arise from the inherent nature of real-world data and the limitations of unimodal approaches. Since multimodal data offers complementary and supplementary information about an object or event from different perspectives, it is more informative than relying on a single modality. A key motivation is to narrow the heterogeneity gap that exists between different modalities by projecting their features into a shared semantic subspace. This allows semantically similar content across modalities to be represented by similar vectors, facilitating the understanding of relationships and correlations between them. Multimodal representation learning aims to leverage the unique information provided by each modality to achieve a more comprehensive and accurate understanding of concepts. These unified representations are crucial for improving performance in various cross-media analysis tasks such as video classification, event detection, and sentiment analysis. They also enable cross-modal retrieval, allowing users to search and retrieve content across different modalities. Additionally, it facilitates cross-modal translation, where information can be converted from one modality to another, as seen in applications like image captioning and text-to-image synthesis. The abundance of ubiquitous multimodal data in real-world applications, including understudied areas like healthcare, finance, and human-computer interaction (HCI), further motivates the development of effective multimodal representation learning techniques. == Approaches and methods == === Canonical-correlation analysis based methods === Canonical-correlation analysis (CCA) was first introduced in 1936 by Harold Hotelling and is a fundamental approach for multimodal learning. CCA aims to find linear relationships between two sets of variables. Given two data matrices X ∈ R n × p {\displaystyle X\in \mathbb {R} ^{n\times p}} and Y ∈ R n × q {\displaystyle Y\in \mathbb {R} ^{n\times q}} representing different modalities, CCA finds projection vectors w x ∈ R p {\displaystyle w_{x}\in \mathbb {R} ^{p}} and w y ∈ R q {\displaystyle w_{y}\in \mathbb {R} ^{q}} that maximizes the correlation between the projected variables: ρ = max w x , w y w x ⊤ Σ x y w y w x ⊤ Σ x x w x w y ⊤ Σ y y w y {\displaystyle \rho =\max _{w_{x},w_{y}}{\frac {w_{x}^{\top }\Sigma _{xy}w_{y}}{{\sqrt {w_{x}^{\top }\Sigma _{xx}w_{x}}}{\sqrt {w_{y}^{\top }\Sigma _{yy}w_{y}}}}}} such that Σ x x {\displaystyle \Sigma _{xx}} and Σ y y {\displaystyle \Sigma _{yy}} are the within-modality covariance matrices, and Σ x y {\displaystyle \Sigma _{xy}} is the between-modality covariance matrix. However, standard CCA is limited by its linearity, which led to the development of nonlinear extensions, such as kernel CCA and deep CCA. ==== Kernel CCA ==== Kernel canonical correlation analysis (KCCA) extends traditional CCA to capture nonlinear relationships between modalities by implicitly mapping the data into high dimensional feature spaces using kernel functions. Given kernel functions K x {\displaystyle K_{x}} and K y {\displaystyle K_{y}} with corresponding Gram matrices K x ∈ R n × n {\displaystyle K_{x}\in \mathbb {R} ^{n\times n}} and K y ∈ R n × n {\displaystyle K_{y}\in \mathbb {R} ^{n\times n}} , KCCA seeks coefficients α {\displaystyle \alpha } and β {\displaystyle \beta } that maximize: ρ = max α , β α ⊤ K x K y β α ⊤ K x 2 α β ⊤ K y 2 β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{\top }K_{x}Ky\beta }{{\sqrt {\alpha ^{\top }K_{x}^{2}\alpha }}{\sqrt {\beta ^{\top }K_{y}^{2}\beta }}}}} To prevent overfitting, regularization terms are typically added, resulting in: ρ = max α , β α T K x K y β α T ( K x 2 + λ x K x ) α β T ( K y 2 + λ y K y ) β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{T}K_{x}K_{y}\beta }{{\sqrt {\alpha ^{T}\left(K_{x}^{2}+\lambda _{x}K_{x}\right)\alpha }}{\sqrt {\;\beta ^{T}\left(K_{y}^{2}+\lambda _{y}K_{y}\right)\beta }}}}} where λ x {\displaystyle \lambda _{x}} and λ y {\displaystyle \lambda _{y}} are regularization parameters. KCCA has proven effective for tasks such as cross-modal retrieval and semantic analysis, though it faces computational challenges with large datasets due to its O ( n 2 ) {\displaystyle O(n^{2})} memory requirement for sorting kernel matrices. KCCA was proposed independently by several researchers. ==== Deep CCA ==== Deep canonical correlation analysis (DCCA), introduced in 2013, employs neural networks to learn nonlinear transformations for maximizing the correlation between modalities. DCCA uses separate neural networks f x {\displaystyle f_{x}} and f y {\displaystyle f_{y}} for each modality to transform the original data before applying CCA: max W x , W y , θ x , θ y corr ⁡ ( f x ( X ; θ x ) , f y ( Y ; θ y ) ) {\displaystyle \max _{W_{x},W_{y},\theta _{x},\theta _{y}}\operatorname {corr} \left(f_{x}(X;\theta _{x}),f_{y}(Y;\theta _{y})\right)} where θ x {\displaystyle \theta _{x}} and θ y {\displaystyle \theta _{y}} represent the parameters of the neural networks, and W x {\displaystyle W_{x}} and W y {\displaystyle W_{y}} are the CCA projection matrices. The correlation objective is computed as: corr ⁡ ( H x , H y ) = tr ⁡ ( T − 1 / 2 H x T H y S − 1 / 2 ) {\displaystyle \operatorname {corr} (H_{x},H_{y})=\operatorname {tr} \left(T^{-1/2}H_{x}^{T}H_{y}S^{-1/2}\right)} where H x = f x ( X ) {\displaystyle H_{x}=f_{x}(X)} and H y = f y ( Y ) {\displaystyle H_{y}=f_{y}(Y)} are the network outputs, T = H x T H x + r x I {\displaystyle T=H_{x}^{T}H_{x}+r_{x}I} , S = H y T H y + r y I {\displaystyle S=H_{y}^{T}H_{y}+r_{y}I} and r x , r y {\displaystyle r_{x},r_{y}} are the regularization parameters. DCCA overcomes the limitations of linear CCA and kernel CCA by learning complex nonlinear relationships while maintaining computational efficiency for large datasets through mini-batch optimization. === Graph-based methods === Graph-based approaches for multimodal representation learning leverage graph structure to model relationships between entities across different modalities. These methods typically represent each modality as a graph and then learn embedding that preserve cross-modal similarities, enabling more effective joint representation of heterogeneous data. One such method is cross-modal graph neural networks (CMGNNs) that extend traditional graph neural networks (GNNs) to handle data from multiple modalities by constructing graphs that capture both intra-modal and inter-modal relationships. These networks model interactions across modalities by representing them as nodes and their relationships as edges. Other graph-based methods include Probabilistic Graphical Models (PGMs) such as deep belief networks (DBN) and deep Boltzmann machines (DBM). These models can learn a joint representation across modalities, for instance, a multimodal DBN achieves this by adding a shared restricted Boltzmann Machine (RBM) hidden layer on top of modality-specific DBNs. Additionally, the structure of data in some domains like Human-Computer Interaction (HCI), such as the view hierarchy of app screens, can potentially be modeled using graph-like structures. The field of graph representation learning is also relevant, with ongoing progress in developing evaluation benchmarks. === Diffusion maps === Another set of methods relevant to multimodal representation learning are based on diffusion maps and their extensions to handle multiple modalities. ==== Multi-view diffusion maps ==== Multi-view diffusion maps address the challenge of achieving multi-view dimensionality reduction by effectively utilizing the availability of multiple views to extract a coherent low-dimensional representation of the data. The core idea is to exploit both the intrinsic relations within each view and the mutual relations between the different views, defining a cross-view model where a random walk process implicitly hops between objects in different views. A multi-view kernel matrix is constructed by combining these relations, defining a cross-view diffusion process and associ

Interstellar communication

Interstellar communication is the transmission of signals between planetary systems. Sending interstellar messages is potentially much easier than interstellar travel, being possible with technologies and equipment which are currently available. However, the distances from Earth to other potentially inhabited systems introduce prohibitive delays, assuming the limitations of the speed of light. Even an immediate reply to radio communications sent to stars tens of thousands of light-years away would take many human generations to arrive. == Radio == The SETI project has for the past several decades been conducting a search for signals being transmitted by extraterrestrial life located outside the Solar System, primarily in the radio frequencies of the electromagnetic spectrum. Special attention has been given to the Water Hole, the frequency of one of neutral hydrogen's absorption lines, due to the low background noise at this frequency and its symbolic association with the basis for what is likely to be the most common system of biochemistry (but see alternative biochemistry). The regular radio pulses emitted by pulsars were briefly thought to be potential intelligent signals; the first pulsar to be discovered was originally designated "LGM-1", for "Little Green Men." They were quickly determined to be of natural origin, however. Several attempts have been made to transmit signals to other stars as well. (See "Realized projects" at Active SETI.) One of the earliest and most famous was the 1974 radio message sent from the largest radio telescope in the world, the Arecibo Observatory in Puerto Rico. An extremely simple message was aimed at a globular cluster of stars known as M13 in the Milky Way Galaxy and at a distance of 30,000 light years from the Solar System. These efforts have been more symbolic than anything else, however. Further, a possible answer needs double the travel time, i.e. tens of years (near stars) or 60,000 years (M13). == Other methods == It has also been proposed that higher frequency signals, such as lasers operating at visible light frequencies, may prove to be a fruitful method of interstellar communication; at a given frequency it takes surprisingly small energy output for a laser emitter to outshine its local star from the perspective of its target. Other more exotic methods of communication have been proposed, such as modulated neutrino or gravitational wave emissions. These would have the advantage of being essentially immune to interference by intervening matter. Sending physical mail packets between stars may prove to be optimal for many applications. While mail packets would likely be limited to speeds far below that of electromagnetic or other light-speed signals (resulting in very high latency), the amount of information that could be encoded in only a few tons of physical matter could more than make up for it in terms of average bandwidth. The possibility of using interstellar messenger probes for interstellar communication — known as Bracewell probes — was first suggested by Ronald N. Bracewell in 1960, and the technical feasibility of this approach was demonstrated by the British Interplanetary Society's starship study Project Daedalus in 1978. Starting in 1979, Robert Freitas advanced arguments for the proposition that physical space-probes provide a superior mode of interstellar communication to radio signals, then undertook telescopic searches for such probes in 1979 and 1982.

Fear of missing out

Fear of missing out (FOMO) is the feeling of apprehension that one is either not in the know about or missing out on information, events, experiences, or life decisions that could make one's life better. FOMO is also associated with a fear of regret, which may lead to concerns that one might miss an opportunity for social interaction, a novel experience, a memorable event, profitable investment, or the comfort of loved ones. It is characterized by a desire to stay continually connected with what others are doing, and can be described as the fear that deciding not to participate is the wrong choice. FOMO could result from not knowing about a conversation, missing a TV show, not attending a wedding or party, or hearing that others have discovered a new restaurant. In recent years, FOMO has been attributed to a number of negative psychological and behavioral symptoms. FOMO has increased in recent times due to advancements in technology. Social networking sites create many opportunities for FOMO. While it provides opportunities for social engagement, it offers a view into an endless stream of activities in which a person is not involved. Further, a common tendency is to post about positive experiences (such as a great restaurant) rather than negative ones (such as a bad first date). Psychological dependence on social media can lead to FOMO or even pathological Internet use. FOMO is also present in video games, investing, and business marketing. The increasing popularity of the phrase has led to related linguistic and cultural variants. FOMO is associated with worsening depression and anxiety, and a lowered quality of life. FOMO can also affect businesses. Hype and trends can lead business leaders to invest based on perceptions of what others are doing, rather than their own business strategy. This is also the idea of the bandwagon effect, where one individual may see another person or people do something and they begin to think it must be important because everyone is doing it. They might not even understand the meaning behind it, and they may not totally agree with it. Nevertheless, they are still going to participate because they don't want to be left out. == History == Patrick J. McGinnis coined the term FOMO and popularized it in a 2004 op-ed titled "Social Theory at HBS: McGinnis' Two FOs" in The Harbus, the magazine of Harvard Business School, where he was then a student. The article also referred to another related condition, Fear of a Better Option (FOBO), and the role of these two fears in the school's social life. Currently the term has been used as a hashtag on social media and has been mentioned in hundreds of news articles, from online sources like Salon.com to print papers like The New York Times. === Earlier forms === The phrase "fear of missing out" is a common English phrase, especially in the form "fear of missing out on (something)". The term "fear of missing out" (but not the term FOMO) was used earlier in the academic business literature by marketing strategist Dan Herman, who used it in presentations in the late 1990s, and included the phrase in a 2000 paper about "short-term brands", where a motivation for trying these brands is "ambition to exhaust all possibilities and the fear of missing out on something". Herman also believes the concept has evolved to become more wide spread through mobile phone usage, texting, and social media and has helped flesh out the concept of the fear of missing out to the masses. Before the Internet, a related phenomenon, "keeping up with the Joneses", was widely experienced. FOMO generalized and intensified this experience because so much more of people's lives became publicly documented and easily accessed. == Symptoms == === Psychological === Fear of missing out has been associated with a deficit in psychological needs. Self-determination theory contends that an individual's psychological satisfaction in their competence, autonomy, and relatedness consists of three basic psychological needs for human beings. Test subjects with lower levels of basic psychological satisfaction reported a higher level of FOMO. FOMO has also been linked to negative psychological effects in overall mood and general life satisfaction. A study performed on college campuses found that experiencing FOMO on a certain day led to a higher fatigue on that day specifically. Experiencing FOMO continuously throughout the semester also can lead to higher stress levels among students. An individual with an expectation to experience the fear of missing out can also develop a lower level of self-esteem. A study by JWTIntelligence suggests that FOMO can influence the formation of long-term goals and self-perceptions. In this study, around half of the respondents stated that they are overwhelmed by the amount of information needed to stay up-to-date, and that it is impossible to not miss out on something. The process of relative deprivation creates FOMO and dissatisfaction. It reduces psychological well-being. FOMO led to negative social and emotional experiences, such as boredom and loneliness. A 2013 study found that it negatively impacts mood and life satisfaction, reduces self-esteem, and affects mindfulness. Four in ten young people reported FOMO sometimes or often. FOMO was found to be negatively correlated with age, and men were more likely than women to report it. People who experience higher levels of FOMO tend to have a stronger desire for high social status, are more competitive with others of the same gender, and are more interested in short-term relationships. Studies have found that experiencing fear of missing out has been linked to anxiety or depression. === Behavioral === The fear of missing out stems from a feeling of missing social connections or information. This absent feeling is then followed by a need or drive to interact socially to boost connections. The fear of missing out not only leads to negative psychological effects but also has been shown to increase negative behavioral patterns. In aims of maintaining social connections, negative habits are formed or heightened. A 2019 University of Glasgow study surveyed 467 adolescents, and found that the respondents felt societal pressure to always be available. According to John M. Grohol, founder and Editor-in-Chief of Psych Central, FOMO may lead to a constant search for new connections with others, abandoning current connections to do so. The fear of missing out derived from digital connection has been positively correlated with bad technology habits especially in youth. These negative habits included increased screen time, checking social media during school, or texting while driving. Social media use in the presence of others can be referred to as phubbing, the habit of snubbing a physically present person in favour of a mobile phone. Multiple studies have also identified a negative correlation between the hours of sleep and the scale at which individuals experience fear of missing out. A lack of sleep in college students experiencing FOMO can be attributed to the number of social interactions that occur late at night on campuses. == Settings == === Social media === Fear of missing out has a positive correlation with higher levels of social media usage. Social media connects individuals and showcases the lives of others at their peak. This gives people the fear of missing out when they feel like others on social media are taking part in positive life experiences that they personally are not also experiencing. This fear of missing out related to social media has symptoms including anxiety, loneliness, and a feeling of inadequacy compared to others. Self-esteem plays a key role in the levels a person feels when experiencing the fear of missing out, as their self worth is influenced by people they observe on social media. There are two types of anxiety; one related to genetics that is permanent, and one that is temporary. The temporary state of anxiety is the one that is more relevant to the fear of missing out, and is directly related to the individual looking at social media sites for a short period of time. This anxiety is caused by a loss of feeling of belonging through the concept of social exclusion. FOMO-sufferers may increasingly seek access to others' social lives, and consume an escalating amount of real-time information. A survey in 2012 indicated that 83% of respondents said that there is information overload in regards that there is too much to watch and read. Constant information that is available to people through social media causes the fear of missing out as people feel worse about themselves for not staying up to date with relevant information. Social media shows just exactly what people are missing out on in real time including events like parties, opportunities, and other events leading for people to fear missing out on other related future events. Another survey indicates that almost 40% of people from ages 12 through 67 i

Global digital divide

The global digital divide describes global disparities, primarily between developed and developing countries, in regards to access to computing and information resources such as the Internet and the opportunities derived from such access. The Internet is expanding very quickly, and not all countries—especially developing countries—can keep up with the constant changes. The term "digital divide" does not necessarily mean that someone does not have technology; it could mean that there is simply a difference in technology. These differences can refer to, for example, high-quality computers, fast Internet, technical assistance, or telephone services. == Statistics == There is a large inequality worldwide in terms of the distribution of installed telecommunication bandwidth. In 2014 only three countries (China, US, Japan) host 50% of the globally installed bandwidth potential (see pie-chart Figure on the right). This concentration is not new, as historically only ten countries have hosted 70–75% of the global telecommunication capacity (see Figure). The U.S. lost its global leadership in terms of installed bandwidth in 2011, being replaced by China, which hosts more than twice as much national bandwidth potential in 2014 (29% versus 13% of the global total). == Versus the digital divide == The global digital divide is a special case of the digital divide; the focus is set on the fact that "Internet has developed unevenly throughout the world" causing some countries to fall behind in technology, education, labor, democracy, and tourism. The concept of the digital divide was originally popularized regarding the disparity in Internet access between rural and urban areas of the United States of America; the global digital divide mirrors this disparity on an international scale. The global digital divide also contributes to the inequality of access to goods and services available through technology. Computers and the Internet provide users with improved education, which can lead to higher wages; the people living in nations with limited access are therefore disadvantaged. This global divide is often characterized as falling along what is sometimes called the North–South divide of "northern" wealthier nations and "southern" poorer ones. == Obstacles to a solution == Some people argue that necessities need to be considered before achieving digital inclusion, such as an ample food supply and quality health care. Minimizing the global digital divide requires considering and addressing the following types of access: === Physical access === Involves "the distribution of ICT devices per capita…and land lines per thousands". Individuals need to obtain access to computers, landlines, and networks in order to access the Internet. This access barrier is also addressed in Article 21 of the convention on the Rights of Persons with Disabilities by the United Nations. === Financial access === The cost of ICT devices, traffic, applications, technician and educator training, software, maintenance, and infrastructures require ongoing financial means. Financial access and "the levels of household income play a significant role in widening the gap". === Socio-demographic access === Empirical tests have identified that several socio-demographic characteristics foster or limit ICT access and usage. Among different countries, educational levels and income are the most powerful explanatory variables, with age being a third one. While a Global Gender Gap in access and usage of ICT's exist, empirical evidence shows that this is due to unfavorable conditions concerning employment, education and income and not to technophobia or lower ability. In the contexts understudy, women with the prerequisites for access and usage turned out to be more active users of digital tools than men. In the US, for example, the figures for 2018 show 89% of men and 88% of women use the Internet. === Cognitive access === In order to use computer technology, a certain level of information literacy is needed. Further challenges include information overload and the ability to find and use reliable information. === Design access === Computers need to be accessible to individuals with different learning and physical abilities including complying with Section 508 of the Rehabilitation Act as amended by the Workforce Investment Act of 1998 in the United States. === Institutional access === In illustrating institutional access, Wilson states "the numbers of users are greatly affected by whether access is offered only through individual homes or whether it is offered through schools, community centers, religious institutions, cybercafés, or post offices, especially in poor countries where computer access at work or home is highly limited". === Political access === Guillen & Suarez argue that "democratic political regimes enable faster growth of the Internet than authoritarian or totalitarian regimes." The Internet is considered a form of e-democracy, and attempting to control what citizens can or cannot view is in contradiction to this. Recently situations in Iran and China have denied people the ability to access certain websites and disseminate information. Iran has prohibited the use of high-speed Internet in the country and has removed many satellite dishes in order to prevent the influence of Western culture, such as music and television. === Cultural access === Many experts claim that bridging the digital divide is not sufficient and that the images and language needed to be conveyed in a language and images that can be read across different cultural lines. A 2013 study conducted by Pew Research Center noted how participants taking the survey in Spanish were nearly twice as likely not to use the internet. == Examples == In the early 21st century, residents of developed countries enjoy many Internet services which are not yet widely available in developing countries, including: Mobile phones and small electronic communication devices; E-communities and social-networking; Fast broadband Internet connections, enabling advanced Internet applications; Affordable and widespread Internet access, either through personal computers at home or work, through public terminals in public libraries and Internet cafes, and through wireless access points; E-commerce enabled by efficient electronic payment networks like credit cards and reliable shipping services; Virtual globes featuring street maps searchable down to individual street addresses and detailed satellite and aerial photography; Online research systems which enable users to peruse newspaper and magazine articles that may be centuries old, without having to leave home; Electronic readers such as Kindle, Sony Reader, Samsung Papyrus and Iliad by iRex Technologies; Price engines which help consumers find the best possible online prices and similar services which find the best possible prices at local retailers; Electronic services delivery of government services, such as the ability to pay taxes, fees, and fines online. Further civic engagement through e-government and other sources such as finding information about candidates regarding political situations. == Proposed remedies == There are four specific arguments why it is important to "bridge the gap": Economic equality – For example, the telephone is often seen as one of the most important components, because having access to a working telephone can lead to higher safety. If there were to be an emergency, one could easily call for help if one could use a nearby phone. In another example, many work-related tasks are online, and people without access to the Internet may not be able to complete work up to company standards. The Internet is regarded by some as a basic component of civic life that developed countries ought to guarantee for their citizens. Additionally, welfare services, for example, are sometimes offered via the Internet. Social mobility – Computer and Internet use is regarded as being very important to development and success. However, some children are not getting as much technical education as others, because lower socioeconomic areas cannot afford to provide schools with computer facilities. For this reason, some kids are being separated and not receiving the same chance as others to be successful. Democracy – Some people believe that eliminating the digital divide would help countries become healthier democracies. They argue that communities would become much more involved in events such as elections or decision making. Economic growth – It is believed that less-developed nations could gain quick access to economic growth if the information infrastructure were to be developed and well used. By improving the latest technologies, certain countries and industries can gain a competitive advantage. While these four arguments are meant to lead to a solution to the digital divide, there are a couple of other components that need to be considered. The first one is rural living versus s

Eyes of Things

Eyes of Things (EoT) is the name of a project funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement number 643924. The purpose of the project, which is funded under the Smart Cyber-physical systems topic, is to develop a generic hardware-software platform for embedded, efficient (i.e. battery-operated, wearable, mobile), computer vision, including deep learning inference. On November 29, 2018, the European Space Agency announced that it was testing the suitability of the device for space applications in advance of a flight in a Cubesat. == Motivation == EoT is based on the following tenets: Future embedded systems will have more intelligence and cognitive functionality. Vision is paramount to such intelligent capacity Unlike other sensors, vision requires intensive processing. Power consumption must be optimized if vision is to be used in mobile and wearable applications Cloud processing of edge-captured images is not sustainable. The sheer amount of visual data generated cannot be transferred to the cloud. Bandwidth is not sufficient and cloud servers cannot cope with it. == Partners == VISILAB group at University of Castilla–La Mancha (Coordinator) Movidius Awaiba Thales Security Solutions & Systems DFKI Fluxguide Evercam nVISO == Awards == 2019 Electronic Component and Systems Innovation Award by the European Commission 2018 HiPEAC Tech Transfer Award 2018 EC Innovation Radar - highlighting excellent innovations Award 2018 Internet of Things (IoT) Technology Research Award Pilot by Google 2016 Semifinalist "THE VISION SHOW STARTUP COMPETITION", Global Association for Vision Information, Boston US

Commercial skipping

Commercial skipping is a feature of some digital video recorders that makes it possible to automatically skip commercials in recorded programs. This feature created controversy, with major television networks and movie studios claiming it violates copyright and should be banned. == History == After the video cassette recorder (VCR) became popular in the 1980s, the television industry began studying the impact of users fast forwarding through commercials. Advertising agencies fought the trend by making them more entertaining. For many years, video recorders manufactured for the Japanese market have been able to skip advertisements automatically, which is done by detecting when foreign language audio overdub tracks provided for many programmes go silent, as advertisements were broadcast with a single language only. The first digital video recorder (DVR) with a built-in commercial skipping feature was ReplayTV with its "4000 Series" and "5000 Series" units. In 2002, the main television networks and movie studios sued ReplayTV, claiming that skipping advertisements during replay violates copyright. Later, five owners of ReplayTV represented by Electronic Frontier Foundation and attorneys Ira Rothken and Richard Wiebe countersued, asking the federal judge to uphold consumers' rights to record TV shows and skip commercials, claiming that features like commercial skipping help parents protect their kids from excessive consumerism. ReplayTV ended up filing for bankruptcy in 2003 after fighting a copyright infringement suit over the ReplayTV's ability to skip commercials. === Commercial skipping software === In addition to the DVR devices which existed in the private market since the late 1990s, towards the mid-2000s, due to the significant advances in home computers, Home theater PCs started gaining popularity in the private market and many users began using their Home theater PCs in their living room for entertainment purposes. Following this, many DVR programs were developed, including popular programs such as Windows Media Center, which contained all of the features of the DVR devices in addition to advanced features such as HDTV and the use of Multiple TV Tuner Cards. Some independent developers began developing independent software capable of skipping the commercial segments when playing recorded videos, and permanently removing the commercial segments from recorded video files. By 2014, many DVR programs such as Windows Media Center, SageTV and MythTV had the capability to skip commercials segments in recorded TV broadcasts after installing third-party add-ons such as DVRMSToolbox, Comskip and ShowAnalyzer, which use various advanced techniques to locate the commercial segments in the video files and save their locations to text files. The text files can also be fed into programs such as MEncoder or DVRMSToolboxGUI which can delete the commercial segments from the recorded video files. A few third-party tools such as MCEBuddy automate detection and removal/marking of commercials. One of the weaknesses of commercial skippers is that, operating automatically, they may misidentify program material as a commercial. Some programs like MCEBuddy provide the ability to fine-tune commercial detection for groups of files (e.g. by channel or country) and provide tools to manually fine-tune commercial segments for individual files. In May 2012, the US Dish Network began offering a DVR with what it calls AutoHop. The device would automatically skip commercials when displaying programming that the viewer had previously recorded with the PrimeTime Anytime feature. It does not skip ads on any live programs. US broadcasters were angered at the news, and FOX embarked on legal action. Most, but not all, of Fox's claims were dismissed; ultimately an agreement was reached whereby AutoHop would only become available for Fox stations seven days after a program is transmitted; terms of the settlement were not disclosed. == The future of TV advertisements == The introduction of digital video recorders and services with skipping and fast-forward capabilities enables viewers to avoid viewing interruptive advertisements in recorded programs, either manually or automatically. While advertising separate to television shows can be skipped, advertising in TV shows themselves ("product placement") cannot be skipped. Streaming services such as Hulu show shorter advertisements with a countdown timer and tailored to the viewers interests, asking interactive questions like "Is this ad relevant to you?".