AI Detector And Fixer

AI Detector And Fixer — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Open-source software security

    Open-source software security

    Open-source software security is the measure of assurance or guarantee in the freedom from danger and risk inherent to an open-source software system. == Implementation debate == === Benefits === Proprietary software forces the user to accept the level of security that the software vendor is willing to deliver and to accept the rate that patches and updates are released. It is assumed that any compiler that is used creates code that can be trusted, but it has been demonstrated by Ken Thompson that a compiler can be subverted using a compiler backdoor to create faulty executables that are unwittingly produced by a well-intentioned developer. With access to the source code for the compiler, the developer has at least the ability to discover if there is any mal-intention. Kerckhoffs' principle is based on the idea that an enemy can steal a secure military system and not be able to compromise the information. His ideas were the basis for many modern security practices, and followed that security through obscurity is a bad practice. === Drawbacks === Simply making source code available does not guarantee review. An example of this occurring is when Marcus Ranum, an expert on security system design and implementation, released his first public firewall toolkit. At one time, there were over 2,000 sites using his toolkit, but only 10 people gave him any feedback or patches. Having a large amount of eyes reviewing code can "lull a user into a false sense of security". Having many users look at source code does not guarantee that security flaws will be found and fixed. == Metrics and models == There are a variety of models and metrics to measure the security of a system. These are a few methods that can be used to measure the security of software systems. === Number of days between vulnerabilities === It is argued that a system is most vulnerable after a potential vulnerability is discovered, but before a patch is created. By measuring the number of days between the vulnerability and when the vulnerability is fixed, a basis can be determined on the security of the system. There are a few caveats to such an approach: not every vulnerability is equally bad, and fixing a lot of bugs quickly might not be better than only finding a few and taking a little bit longer to fix them, taking into account the operating system, or the effectiveness of the fix. === Poisson process === The Poisson process can be used to measure the rates at which different people find security flaws between open and closed source software. The process can be broken down by the number of volunteers Nv and paid reviewers Np. The rates at which volunteers find a flaw is measured by λv and the rate that paid reviewers find a flaw is measured by λp. The expected time that a volunteer group is expected to find a flaw is 1/(Nv λv) and the expected time that a paid group is expected to find a flaw is 1/(Np λp). === Morningstar model === By comparing a large variety of open source and closed source projects a star system could be used to analyze the security of the project similar to how Morningstar, Inc. rates mutual funds. With a large enough data set, statistics could be used to measure the overall effectiveness of one group over the other. An example of such as system is as follows: 1 Star: Many security vulnerabilities. 2 Stars: Reliability issues. 3 Stars: Follows best security practices. 4 Stars: Documented secure development process. 5 Stars: Passed independent security review. === Coverity scan === Coverity in collaboration with Stanford University has established a new baseline for open-source quality and security. The development is being completed through a contract with the Department of Homeland Security. They are utilizing innovations in automated defect detection to identify critical types of bugs found in software. The level of quality and security is measured in rungs. Rungs do not have a definitive meaning, and can change as Coverity releases new tools. Rungs are based on the progress of fixing issues found by the Coverity Analysis results and the degree of collaboration with Coverity. They start with Rung 0 and currently go up to Rung 2. Rung 0 The project has been analyzed by Coverity's Scan infrastructure, but no representatives from the open-source software have come forward for the results. Rung 1 At rung 1, there is collaboration between Coverity and the development team. The software is analyzed with a subset of the scanning features to prevent the development team from being overwhelmed. Rung 2 There are 11 projects that have been analyzed and upgraded to the status of Rung 2 by reaching zero defects in the first year of the scan. These projects include: AMANDA, ntp, OpenPAM, OpenVPN, Overdose, Perl, PHP, Postfix, Python, Samba, and Tcl.

    Read more →
  • Tropical cryptography

    Tropical cryptography

    In tropical analysis, tropical cryptography refers to the study of a class of cryptographic protocols built upon tropical algebras. In many cases, tropical cryptographic schemes have arisen from adapting classical (non-tropical) schemes to instead rely on tropical algebras. The case for the use of tropical algebras in cryptography rests on at least two key features of tropical mathematics: in the tropical world, there is no classical multiplication (a computationally expensive operation), and the problem of solving systems of tropical polynomial equations has been shown to be NP-hard. == Basic Definitions == The key mathematical object at the heart of tropical cryptography is the tropical semiring ( R ∪ { ∞ } , ⊕ , ⊗ ) {\displaystyle (\mathbb {R} \cup \{\infty \},\oplus ,\otimes )} (also known as the min-plus algebra), or a generalization thereof. The operations are defined as follows for x , y ∈ R ∪ { ∞ } {\displaystyle x,y\in \mathbb {R} \cup \{\infty \}} : x ⊕ y = min { x , y } {\displaystyle x\oplus y=\min\{x,y\}} x ⊗ y = x + y {\displaystyle x\otimes y=x+y} It is easily verified that with ∞ {\displaystyle \infty } as the additive identity, these binary operations on R ∪ { ∞ } {\displaystyle \mathbb {R} \cup \{\infty \}} form a semiring.

    Read more →
  • Social television

    Social television

    Social television is the union of television and social media. Millions of people now share their TV experience with other viewers on social media such as Twitter and Facebook using smartphones and tablets. TV networks and rights holders are increasingly sharing video clips on social platforms to monetise engagement and drive tune-in. The social TV market covers the technologies that support communication and social interaction around TV as well as companies that study television-related social behavior and measure social media activities tied to specific TV broadcasts – many of which have attracted significant investment from established media and technology companies. The market is also seeing numerous tie-ups between broadcasters and social networking players such as Twitter and Facebook. The market is expected to be worth $256bn by 2017. Social TV was named one of the 10 most important emerging technologies by the MIT Technology Review on Social TV in 2010. And in 2011, David Rowan, the editor of Wired magazine, named Social TV at number three of six in his peek into 2011 and what tech trends to expect to get traction. Ynon Kreiz, CEO of the Endemol Group told the audience at the Digital Life Design (DLD) conference in January 2011: "Everyone says that social television will be big. I think it's not going to be big—it's going to be huge". Much of the investment in the earlier years of social TV went into standalone social TV apps. The industry believed these apps would provide an appealing and complimentary consumer experience which could then be monetized with ads. These apps featured TV listings, check-ins, stickers and synchronised second-screen content but struggled to attract users away from Twitter and Facebook. Most of these companies have since gone out of business or been acquired amid a wave of consolidation and the market has instead focused on the activities of the social media channels themselves – such as Twitter Amplify, Facebook Suggested Videos and Snapchat Discover – and the technologies that support them. == Twitter == Twitter and Facebook are both helping users connect around media, which can provoke strong debate and engagement. Both social platforms want to be the 'digital watercooler' and host conversation around TV because the engagement and data about what media people consume can then be used to generate advertising revenue. As an open platform, conversation on Twitter is closely aligned with real-time events. In May 2013, it launched Twitter Amplify – an advertising product for media and consumer brands. With Amplify, Twitter runs video highlights from major live broadcasts, with advertisers' names and messages playing before the clip. By February 2014, all four major U.S. TV networks had signed up to the Amplify program, bringing a variety of premium TV content onto the social platform in the form of in-tweet real-time video clips. In June 2014, Twitter acquired its Twitter Amplify partner in the U.S. SnappyTV, a company that was helping broadcasters and rights holders to share video content both organically across social and via Twitter's Amplify program. Twitter continues to rely on Grabyo, which has also struck numerous deals with some of the largest broadcasters and rights holders in Europe and North America to share video content across Facebook and Twitter. == Facebook == Facebook made significant changes to its platform in 2014 including updates to its algorithm to enhance how it serves video in users' feeds. It also launched video autoplay to get users to watch the videos in their feeds. It rapidly surpassed Twitter and by the end of 2014 it was enjoying three billion video views a day on its platform and had announced a partnership with the NFL, one of Twitter's most active Twitter Amplify partners. In April 2015, at its F8 Developer Conference, it revealed it was working with Grabyo among other technology partners to bring video onto its platform. Then in July it announced it would be launching Facebook Suggested Videos, bringing related videos and ads to anyone that clicks on a video – a move that not only competed with Twitter's commercial video offering but also put it in direct competition with YouTube. == TV Time == TV Time is a television dedicated social network that allows users to keep track of the television series they watch, as well as films. It also allows them to express their reaction to the media they have seen with episode specific voting for favorite characters and emotional reaction to episodes, as well as commenting in episode restrictive pages. This way users are able to avoid spoilers while also finding a precise audience and community for each of their interactions, as opposed to bigger, non-television dedicated social medias such as Facebook and Twitter where the likelihood of unintentionally reading spoilers is much higher. TV Time offers an analytics service called "TVLytics" where the votes and reactions collected from users can be studied for research and television production purposes. == Advertising == According to Businessinsider.com, there are variety of applications for social TV, including support for TV ad sales, optimizing TV ad buys, making ad buys more efficient, as a complement to audience measurement, and eventually, audience forecasting and real-time optimization. Social TV data can ease access to focus groups and may create a positive feedback loop for generating ultra-sticky TV programming and multi-screen ad campaigns. == In numbers == Viewers share their TV experience on social media in real-time as events unfold: between 88-100m Facebook users login to the platform during the primetime hours of 8pm – 11pm in the US. The volume of social media engagement in TV is also rising – according to Nielsen SocialGuide, there was a 38% increase in tweets about TV in 2013 to 263m. For the 2014 Super Bowl, Twitter reported that a record 24.9 million tweets about the game were sent during the telecast, peaking at 381,605 tweets per minute. Facebook reported that 50 million people discussed the Super Bowl, generating 185 million interactions. The 2014 Oscars generated 5m tweets, viewed by an audience of 37m unique Twitter users and delivering 3.3bn impressions globally as conversation and key moments were shared virally across the platform. In 2014 the All England Lawn Tennis Club (AELTC), hosts of Wimbledon, used Grabyo to share video content across social. The videos were viewed 3.5 million times across Facebook and Twitter. In partnered with Grabyo again in 2015 and the videos generated over 48 million views across Facebook and Twitter. == Television shows with social integration == Here are some examples of how TV executives are integrating social elements with TV shows: C-SPAN streamed tweets from US Senators and Representatives during the quorum call The Voice had the judges of the program tweet during the show and the posts scrolls on the bottom of the screen. The use of Twitter also led to an increase in viewers. "Glee" Entertainment Weekly created a second screen viewing platform for the Glee season 3 premiere. == Related publications == Erika Jonietz. "Making TV Social, Virtually" MIT Technology Review. (January 11, 2010) AmigoTV (Alcatel-Lucent; Coppens et al.) – 2004 www.ist-ipmedianet.org/Alcatel_EuroiTV2004_AmigoTV_short_paper_S4-2.pdf Nextream (MIT Media Lab, Martin et al.) – 2010 Social Interactive Television: Immersive Shared Experiences and Perspectives (P. Cesar, D. Geerts, and K. Chorianopoulos (eds.)) – 2009 Social TV and the Emergence of Interactive TV – Multimedia Research Group – November 2010 Interactive Social TV on Service Oriented Environments: Challenges and Enablers (May 2011) == Systems == Boxee – acquired by Samsung GetGlue – acquired by i.TV Grabyo KIT digital Miso TV Tank Top TV WiO Xbox Live

    Read more →
  • Story (social media)

    Story (social media)

    In social media, a story is a function in which the user tells a narrative or provides status messages and information in the form of short, time-limited clips in an automatically running sequence. == Definition == A story is a short sequence of images, videos, or other social media content, which can be accompanied by backgrounds, music, text, stickers, animations, filters or emojis. Social media platforms typically advance through the sequence automatically when presenting a story to a viewer. Although the sequential nature of stories can be used to tell a narrative, the pieces of a story can also be unrelated. Social media platforms that offer stories will typically have a primary story for each user which consists of everything the user posted to their story over a certain period of time, usually the most recent 24 hours. Most stories cannot be changed afterwards and are only available for a short time. Stories are almost exclusively created on a mobile device such as a smartphone or tablet computer and are usually displayed vertically. == History == In October 2013, Snapchat first introduced the story function as a series of Snaps that can together tell a narrative through a chronological order, with each Snap being viewable by all of the poster's friends and deleted after 24 hours. Stories soon surpassed private Snaps to become Snapchat's most-viewed type of post. After 2015, Snapchat introduced a feature allowing users to post private stories viewable by a chosen subset of their friends. Later other apps would copy this feature. In August 2016, Instagram introduced a stories function that deletes the content after 24 hours. Various commenters have accused the site of copying Snapchat. In February 2017, the instant messenger WhatsApp introduced the Now Status stories function in beta, which was later renamed Status. In March 2017, a story function was introduced in Facebook Messenger. In February 2018, Google launched AMP Stories, bringing a story-style format to certain Google search results on mobile devices. In August 2018, YouTube introduced a stories function that initially was limited to pictures, but was later expanded to support short video clips. The feature was shut down in June 2023. In August 2018, the GIF website Giphy introduced a story function. In March 2022, TikTok added a story feature which allowed users to create 15 second long videos that delete after 24 hours. In June 2023, Telegram CEO Pavel Durov announced stories for Telegram would be released in July 2023. In July 2023, the feature was released for premium users, and in August 2023 it was rolled out for all users. == User motivations == In 2022, a study performed by Jia-Dai (Evelyn) Lu and Jhih-Syuan (Elaine) Lin examined the various motivations for updating stories on Instagram. The researchers found a new configuration of motivations for using Instagram Stories: exploration, self-enhancement, perceived functionality, entertainment, social sharing, relationship building, novelty, and surveillance. The findings also highlighted that contribution and creation activities are likely to result in positive emotions, while creation alone predicts negative emotions while updating stories on Instagram. == Usage statistics == In 2019, around 1.5 billion people worldwide every day on average used the stories function in a social network or messenger. Younger people in particular use this function. More than 20% of people aged 18 to 24 use Instagram stories, while it is just under 2% of those over 55. In a Facebook survey of 18,000 participants from 12 countries, 68% said they used the stories function at least once a month. Stories in the areas of fashion and tourism are particularly popular. The website Fanpage Karma analyzed several Instagram accounts and determined the average reach of posts and stories per follower, concluding that posts have a higher reach than stories, which often have less than half the reach.

    Read more →
  • Kolmogorov–Arnold Networks

    Kolmogorov–Arnold Networks

    Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines. == History == KANs (Kolmogorov–Arnold Networks) were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures. In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation. KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, Liu (2015) investigated theoretical applications in optimal transport and image encryption, and more recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation. == Architecture == KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem. Given x = ( x 1 , x 2 , … , x n ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{n})} consisting of n variables, a multivariate continuous function f ( x ) {\displaystyle f(x)} can be represented as: f ( x ) = f ( x 1 , … , x n ) = ∑ q = 1 2 n + 1 Φ q ( ∑ p = 1 n φ q , p ( x p ) ) {\displaystyle f(x)=f(x_{1},\dots ,x_{n})=\sum _{q=1}^{2n+1}\Phi _{q}\left(\sum _{p=1}^{n}\varphi _{q,p}(x_{p})\right)} (1) This formulation contains two nested summations: an outer and an inner sum. The outer sum ∑ q = 1 2 n + 1 {\displaystyle \sum _{q=1}^{2n+1}} aggregates 2 n + 1 {\displaystyle 2n+1} terms, each involving a function Φ q : R → R {\displaystyle \Phi _{q}:\mathbb {R} \to \mathbb {R} } . The inner sum ∑ p = 1 n {\displaystyle \sum _{p=1}^{n}} computes n terms for each q, where each term φ q , p : [ 0 , 1 ] → R {\displaystyle \varphi _{q,p}:[0,1]\to \mathbb {R} } is a continuous function of the single variable x p {\displaystyle x_{p}} . The inner continuous functions φ q , p {\displaystyle \varphi _{q,p}} are universal, independent of f {\displaystyle f} , while the outer functions Φ q {\displaystyle \Phi _{q}} depend on the specific function f {\displaystyle f} being represented. The representation (1) holds for all multivariate functions f {\displaystyle f} as proved in . If f {\displaystyle f} is continuous, then the outer functions Φ q {\displaystyle \Phi _{q}} are continuous; if f {\displaystyle f} is discontinuous, then the corresponding Φ q {\displaystyle \Phi _{q}} are generally discontinuous, while the inner functions φ q , p {\displaystyle \varphi _{q,p}} remain the same universal functions. Liu et al. proposed the name KAN. A general KAN network consisting of L layers takes x to generate the output as: K A N ( x ) = ( Φ L − 1 ∘ Φ L − 2 ∘ ⋯ ∘ Φ 1 ∘ Φ 0 ) x {\displaystyle \mathrm {KAN} (x)=(\Phi ^{L-1}\circ \Phi ^{L-2}\circ \cdots \circ \Phi ^{1}\circ \Phi ^{0})x} (3) Here, Φ l {\displaystyle \Phi ^{l}} is the function matrix of the l-th KAN layer or a set of pre-activations. Let i denote the neuron of the l-th layer and j the neuron of the (l+1)-th layer. The activation function φ j , i l {\displaystyle \varphi _{j,i}^{l}} connects (l, i) to (l+1, j): φ j , i l , l = 0 , … , L − 1 , i = 1 , … , n l , j = 1 , … , n l + 1 {\displaystyle \varphi _{j,i}^{l},\quad l=0,\dots ,L-1,\;i=1,\dots ,n_{l},\;j=1,\dots ,n_{l+1}} (4) where nl is the number of nodes of the l-th layer. Thus, the function matrix Φ l {\displaystyle \Phi ^{l}} can be represented as an n l + 1 × n l {\displaystyle n_{l+1}\times n_{l}} matrix of activations: x l + 1 = ( φ 1 , 1 l ( ⋅ ) φ 1 , 2 l ( ⋅ ) ⋯ φ 1 , n l l ( ⋅ ) φ 2 , 1 l ( ⋅ ) φ 2 , 2 l ( ⋅ ) ⋯ φ 2 , n l l ( ⋅ ) ⋮ ⋮ ⋱ ⋮ φ n l + 1 , 1 l ( ⋅ ) φ n l + 1 , 2 l ( ⋅ ) ⋯ φ n l + 1 , n l l ( ⋅ ) ) x l {\displaystyle x^{l+1}={\begin{pmatrix}\varphi _{1,1}^{l}(\cdot )&\varphi _{1,2}^{l}(\cdot )&\cdots &\varphi _{1,n_{l}}^{l}(\cdot )\\\varphi _{2,1}^{l}(\cdot )&\varphi _{2,2}^{l}(\cdot )&\cdots &\varphi _{2,n_{l}}^{l}(\cdot )\\\vdots &\vdots &\ddots &\vdots \\\varphi _{n_{l+1},1}^{l}(\cdot )&\varphi _{n_{l+1},2}^{l}(\cdot )&\cdots &\varphi _{n_{l+1},n_{l}}^{l}(\cdot )\end{pmatrix}}x^{l}} == Implementations == To make the KAN layers optimizable, the inner function is formed by the combination of spline and basic functions as the formula: φ ( x ) = w b b ( x ) + w s spline ( x ) {\displaystyle \varphi (x)=w_{b}\,b(x)+w_{s}\,{\text{spline}}(x)} where b ( x ) {\displaystyle b(x)} is the basic function, usually defined as s i l u ( x ) = x / ( 1 + e x ) {\displaystyle silu(x)=x/(1+e^{x})} and w b {\displaystyle w_{b}} is the base weight matrix. Also, w s {\displaystyle w_{s}} is the spline weight matrix and spline ( x ) {\displaystyle {\text{spline}}(x)} is the spline function. The spline function can be a sum of B-splines. spline ( x ) = ∑ i c i B i ( x ) {\displaystyle {\text{spline}}(x)=\sum _{i}c_{i}B_{i}(x)} Many studies suggested to use other polynomial and curve functions instead of B-spline to create new KAN variants. == Functions used == The choice of functional basis strongly influences the performance of KANs. Common function families include: B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations. RBFs (include Gaussian RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures. Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation. Rational function: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials. Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning. Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components. Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs. == Usage == In some modern neural architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks: Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs) and physical/mathematical laws. Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments. Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks. Sensor data processing: Kolmogorov–Arnold Networks (KANs) have recently been applied to sensor data processing due to their ability to model complex nonlinear relationships with relatively few parameters and improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems. == Drawbacks == KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.

    Read more →
  • Social media coverage of the Olympics

    Social media coverage of the Olympics

    Over the years, television broadcast rights have distinguished what Olympic-related content can be accessed by fans online. By doing so, mobile-friendly social platforms began to integrate into the Olympics. Athletes and fans use these platforms to share live updates, special moments, and behind-the-scenes specials. Various social media platforms have been used for Olympic content, including Twitter and Facebook. Some marketers credit social media for prompting the official U.S. broadcasters, NBC, to live stream events, including early rounds. == Background == The Olympics is able to advertise to its viewers and its host country with the use of data it collects through Social media marketing. Prominent social media platforms include: Twitter, Facebook, Instagram, Tumblr, YouTube, Google, MSN, Yahoo and many more. Campaign Initiatives and Artificial Intelligence technologies have been used to analyze the social media content of users. Information from consumers such as their preferences, demographics, age and locality are all analyzed to gain consumer insight. Campaign initiatives and AI technologies were used for such purposes in the 2010 Vancouver Winter Olympics and are in use currently. Social media marketing of the Olympics is a new phenomena, beginning prior to the 2008 Beijing Olympics == Variations == There are two classifications of social media marketing recognized by the IOC: Officially sanctioned content from rights holders and sponsors that maximizes the use of Olympic content (imagery, hashtag) Unofficial content that is generated by brands that leverage the excitement of the Olympics == 2008 Beijing Summer Olympics == Social media marketing emerged as a phenomenon during the 2008 Beijing Olympics, which progressed as a marketing and an advertising tactic ever since. The Beijing Olympics became the test subject for social media marketing initiatives started by advertising agencies. In 2008, social media marketing began the transition from one-sided communication to mass communication of the Olympic Games. Although social media marketing of the Olympic Games began in 2008, the audience to the Olympics was still primarily reached through television–reaching an audience of 4.3 billion viewers. At the time, the viewers of the Olympic Games through Internet website platforms made up an audience of approximately 390 million individuals. What was the beginning of Olympic social media marketing, was also the beginning of a more globalized experience of the Olympic Games via social media. Twitter, now a prominent social media platform, began in 2006 and grew to three million active users by the beginning of the 2008 Beijing Olympics. Members of Facebook, another prominent social media platform, tracking the Olympic Games grew from approximately one million during the Olympic Games of Athens 2004 to 90 million during the 2008 Beijing Olympics. Social media use, in general, increased by 24 percent between 2007 and 2008–from 63 percent of U.S. adults to 87 percent of U.S. adults. == 2010 Vancouver Winter Olympics == The International Olympic Committee (IOC) deemed The Vancouver Winter Olympics as "the first social media games” based on its fan base through social media platforms. The IOC launched their Facebook page a month before the games began, attracting 1.5 million fans. Shifting to online viewing attracted a younger audience than past Olympic games with over 60 percent of Facebook fans being under 24 years of age. Athletes like Lindsey Vonn and Shaun White reached fans on social media as the platform posted behind-the-scenes coverage on their experiences. The IOC used social media to create competitions between athletes and fans streamed online. Its YouTube channel hosted a “Best of Us” challenge in which the public could compete in games with their favorite athletes, acquiring three million viewers. Photos spread across social media platforms, such as Flickr, which had 11,000 photos posted by 600 photographers, bringing a new perspective to the games. Twitter contributed constant live updates of the competitions. The IOC's Twitter following doubled to 12,000 followers during the Vancouver Olympics, creating a larger viewer population for the games. The IOC created social media guidelines as more athletes and fans got online to interact with the Olympics. Social media was still relatively new as a marketing platform, so these guidelines confused many individuals. == 2012 London Summer Olympics == The London 2012 Olympic Games succeeded in broadcasting, participation and marketing. For the first time, the IOC broadcast the Olympic Games live and on-demand through YouTube, allowing fans to access the Games anytime, anywhere through live streaming. The combination of conventional broadcasting and mobile platforms reached a global audience of 4.8 billion people. Social media soared with Facebook, Twitter and Google+, attracting 4.7 million followers. Athletes shared photographs, interacted online with fans and updated daily, either in person or via an agent. Instagram was established by 2012, making itself a premier photo-sharing platform perfect for athletes to capture their emotions. Lewis Wiltshire, head of sport for Twitter UK said, "Never before have fans had such direct access to their sporting heroes." Social media created conversation on fan opinions regarding athletes, including 962,756 total mentions of Usain Bolt, “Fastest Man in History,” who defended the 100 meter and 200 meter gold medals. Michael Phelps followed with 828,081 total mentions. Olympic sponsors were active on social media; created several campaigns to promote their brands; and inspired viewers with mass participation and personalized events. The Adidas “Take the Stage” Campaign recognized talent around the world, installing a photo booth and inviting the 550 Olympics athletes to take the stage. (IOC Marketing Report 2012). David Beckham surprised fans at the photo booth in Westfield shopping centre, gaining popularity in UK media. Coca-Cola, Acer Inc., McDonald's, Visa Inc. and several others used similar tactics of participation to attract viewers. == 2014 Sochi Winter Olympics == === Channels === The 2014 Winter Olympic Games were held in Sochi, a city in Krasnodar Krai, Russia, establishing the first “social media Olympics” for Russia. The most popular Russian social media and networking service, VK, created an Olympic page, similar to Facebook's. The Olympic VK page has 2.8 million fans and—the most popular official community on the platform. Throughout the games, VK had 54 million Olympic mentions, an average of 1.5 million per day. Numbers grew on other social media pages: more than 2 million fans joined the Olympic Facebook page, 168,101 followed the Olympic Twitter, 150,000 followed the Olympic Instagram and three million visited the Olympic website in February 2014. There were 90,000 total updates on social media by Sochi 2014 Olympians and teams. The United States was the most active country during the games logging 22,598 posts across Facebook, Twitter, and Instagram. === Engagement === With social media there is also hashtags. The most popular hashtag was #sochi2014 with almost 11,000 uses. The next top five hashtags were #wearewinter, #teamusa, #olympics, #goaus and #wirfuerD. Another popular hashtag was #Sochiproblems, depicting local struggles. Photos of the poor state of Sochi on all platforms made the games the number one trending topic one week before the opening ceremony. #SochiFail and #SochiProblems gave multiple reports of the poor living arrangements, incomplete construction, broken elevators, and polluted waters. This was one way that social media provided awareness to its users. === Media Perceptions === Media perceptions varied during the games; the Olympics was viewed as a confrontation between Eastern and Western Civilizations. The LGBT community took a stand against the games. Sponsors for the games including Coca-Cola, Mcdonald's, and P&G protested against Russian authorities and Russian anti-LGBT laws. Many protests took a stand against Russian laws, which created a discussion between human rights advocates. Advocates believed organizations should not promote certain values in western markets while supporting an anti-human rights government in another market. == 2016 Rio Summer Olympics == Social media marketing was an influential tool in the promotion and analysis of the 2016 Rio Olympics. Thomas Bach, President of the International Olympic Committee said that the power of sport demonstrates that diversity and interconnectedness can enlighten us all. With over 25,000+ sources of accredited media covering the games, the 2016 games were the most consumed Olympic games to date. Marketing for the Rio Olympics began in 2013 and ultimately lasted 3 years. There were 26 million visits to Olympic.org, the official website of the Olympic games, and over 7 billion views of official Olympic content on social media. There were o

    Read more →
  • Social media reach

    Social media reach

    Social media reach is a media analytics metric that refers to the number of users who have come across a particular content on a particular social media platform. Social media platforms have their own individual ways of tracking, analyzing and reporting the traffic on each of the individual platforms. As these platforms are a main source of communication between companies and their target audiences, by conducting research, companies are able to utilize analytical information, such as the reach of their posts, to better understand the interactions between the users and their content. There are multiple underlying factors that will determine what shows up on a newsfeed or timeline. Algorithms, for example, are a type of factor that can alter the reach of a post due to the way the algorithm is coded, which can affect who sees a post and when. Other examples of factors that can impede the reach can include the time at which posts are made, as well as how frequent the posts are between one another. In comparison, an impression is the total number of circumstances where content has been shown on a social timeline, meanwhile, engagement looks at how people interact with the content that they see on a social platform such as like, share or retweet. == Reach on Facebook == Facebook has their own analytic platform which allows the user to see how other users are interacting with their posts, with the use of multiple metrics. This is not something the average user uses, but rather a tool that is used by pages or public figures. For example, Facebook pages that represent a business often look at the activity their posts have generated. There are three types of reach that can be looked at on the Facebook analytic platform. === Types of reach === ==== Organic Reach ==== This type of reach regards the number of distinct users that have seen a specific post on their feed. Organic reach, in other words is the number of people who have seen the post being analyzed on their Facebook newsfeed. Data gathered from this type of reach can give intel to those doing the analysis, such as the demographics of those who have seen the post. ==== Paid Reach ==== This type of reach regards the number of times that distinct users have come across sponsored posts, ads or content. In other words, paid reach is the number of times Facebook users have seen a post that has been paid for by a company. Data collected can give insight, to advertisers or marketers for example, on the activity based around the reach of their post. ==== Viral Reach ==== This type of reach regards the number of views by distinct users on posts that have been commented on or shared by their friends on Facebook. In other words, viral reach looks at the number of people who have seen a post after a friend of theirs commented or shared the original post, therefore it showed on their timeline. Viral reach can be looked at in terms of a collective number of times that the post has been on individual user's timelines. Data collected from viral reach can be used in multiple ways, for example, it can be used to analyze the type of content that gets shared or commented on and can be further used to compare to other posts. === Engaged users === This refers to the number of individual users who have clicked and interacted with a post on Facebook. == Reach on Twitter == Twitter gives access to any of their users to analytics of their tweets as well as their followers. Their dashboard is user friendly, which allows anyone to take a look at the analytics behind their Twitter account. This open access is useful for both the average user and companies as it can provide a quick glance or general outlook of who has seen their tweets. The way that Twitter works is slightly different than the way of Facebook in terms of the reach. On Twitter, especially for users with a higher profile, they are not only engaging with the people who follow them, but also with the followers of their own followers. The reach metric on Twitter looks at the quantity of Twitter users who have been engaged, but also the number of users that follow them as well. This metric is useful to see the if the tweets/content being shared on Twitter are contributing to the growth of audience on this platform. == Reach on Instagram == Instagram gives their users access to their reach, in the Instagram Insights section. Instagram insights can be used to learn more about an account's followers and performance. Reach indicates the total number of unique Instagram accounts that have seen your Instagram post or story. You can find this data by looking at each individual post insights. == Uses of reach == The reach can be a useful metric to analyze for marketers and advertisers. Social media is a platform that is used by marketers to directly target their intended audience with ease. These platforms not only allow marketers to get a better understanding of their audience, but also allow advertisers to insert their ads onto the timelines of specific users to later be able to conduct research to see the reach of their posts/content. The basic goal of marketers is to increase their reach as much as possible to impact bigger audiences of their dream customers and, in the end, make more sales. When doing organic social media marketing, using paid methods like ads or doing influencer marketing whether it is paid or free, it allows marketers to track the performance of their strategy and tweak it based on what works and what does not. == Analytics and reach == Social analytics looks at the data collected based on the interactions of users on social media platforms. A lot of information can be gathered which can provide intel based on user activities on social media. When looking into analytics in regard to social media, each company or group has a different goal in mind to engage their audience. At a glance, the three might seem as if they are very similar, however the differences between them are significant. There are many aspects that can be analyzed from the data gathered from social media platforms, depending on what is being observed, the correct metric would then be selected to further analyze. One example of the many metrics that can be used through social analytics is the reach. == Reach formula == To calculate social media reach one can use the following formula: R = I f ¯ {\displaystyle R={\frac {I}{\bar {f}}}} where R {\displaystyle R} — is social media reach, I {\displaystyle I} stands for the number of impressions, f ¯ {\displaystyle {\bar {f}}} is the average frequency of impressions per user. f ¯ {\displaystyle {\bar {f}}} represents the number of events when the ad is shown to a particular user. The average value should be calculated over the time period with stable settings of advertisement campaign. == Commenting For Better Reach == Commenting For Better Reach also known as "CFBR" is a widely used strategy for organically boosting post reach on social media platforms. Algorithms tend to favor posts with substantial likes and comments, granting them broader exposure compared to less engaging content. Primarily seen on LinkedIn, a platform geared toward professional networking and business connections, the use of CFBR signals active engagement aimed at enhancing post visibility. It is important to note that genuine and meaningful comments are key to effective engagement. Spammy or irrelevant comments not only detract from the conversation but may also limit a post's potential reach and impact.

    Read more →
  • Story (social media)

    Story (social media)

    In social media, a story is a function in which the user tells a narrative or provides status messages and information in the form of short, time-limited clips in an automatically running sequence. == Definition == A story is a short sequence of images, videos, or other social media content, which can be accompanied by backgrounds, music, text, stickers, animations, filters or emojis. Social media platforms typically advance through the sequence automatically when presenting a story to a viewer. Although the sequential nature of stories can be used to tell a narrative, the pieces of a story can also be unrelated. Social media platforms that offer stories will typically have a primary story for each user which consists of everything the user posted to their story over a certain period of time, usually the most recent 24 hours. Most stories cannot be changed afterwards and are only available for a short time. Stories are almost exclusively created on a mobile device such as a smartphone or tablet computer and are usually displayed vertically. == History == In October 2013, Snapchat first introduced the story function as a series of Snaps that can together tell a narrative through a chronological order, with each Snap being viewable by all of the poster's friends and deleted after 24 hours. Stories soon surpassed private Snaps to become Snapchat's most-viewed type of post. After 2015, Snapchat introduced a feature allowing users to post private stories viewable by a chosen subset of their friends. Later other apps would copy this feature. In August 2016, Instagram introduced a stories function that deletes the content after 24 hours. Various commenters have accused the site of copying Snapchat. In February 2017, the instant messenger WhatsApp introduced the Now Status stories function in beta, which was later renamed Status. In March 2017, a story function was introduced in Facebook Messenger. In February 2018, Google launched AMP Stories, bringing a story-style format to certain Google search results on mobile devices. In August 2018, YouTube introduced a stories function that initially was limited to pictures, but was later expanded to support short video clips. The feature was shut down in June 2023. In August 2018, the GIF website Giphy introduced a story function. In March 2022, TikTok added a story feature which allowed users to create 15 second long videos that delete after 24 hours. In June 2023, Telegram CEO Pavel Durov announced stories for Telegram would be released in July 2023. In July 2023, the feature was released for premium users, and in August 2023 it was rolled out for all users. == User motivations == In 2022, a study performed by Jia-Dai (Evelyn) Lu and Jhih-Syuan (Elaine) Lin examined the various motivations for updating stories on Instagram. The researchers found a new configuration of motivations for using Instagram Stories: exploration, self-enhancement, perceived functionality, entertainment, social sharing, relationship building, novelty, and surveillance. The findings also highlighted that contribution and creation activities are likely to result in positive emotions, while creation alone predicts negative emotions while updating stories on Instagram. == Usage statistics == In 2019, around 1.5 billion people worldwide every day on average used the stories function in a social network or messenger. Younger people in particular use this function. More than 20% of people aged 18 to 24 use Instagram stories, while it is just under 2% of those over 55. In a Facebook survey of 18,000 participants from 12 countries, 68% said they used the stories function at least once a month. Stories in the areas of fashion and tourism are particularly popular. The website Fanpage Karma analyzed several Instagram accounts and determined the average reach of posts and stories per follower, concluding that posts have a higher reach than stories, which often have less than half the reach.

    Read more →
  • Energy-based model

    Energy-based model

    An energy-based model (EBM), also called Canonical Ensemble Learning (CEL) or Learning via Canonical Ensemble (LCE), is an application of canonical ensemble formulation from statistical physics for learning from data. The approach prominently appears in generative artificial intelligence. EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models. An EBM learns the characteristics of a target dataset and generates a similar but larger dataset. EBMs detect the latent variables of a dataset and generate new datasets with a similar distribution. Energy-based generative neural networks is a class of generative models, which aim to learn explicit probability distributions of data in the form of energy-based models, the energy functions of which are parameterized by modern deep neural networks. Boltzmann machines are a special form of energy-based models with a specific parametrization of the energy. == Description == For a given input x {\displaystyle x} , the model describes an energy E θ ( x ) {\displaystyle E_{\theta }(x)} such that the Boltzmann distribution P θ ( x ) = e − β E θ ( x ) Z ( θ ) {\displaystyle P_{\theta }(x)={e^{-\beta E_{\theta }(x)} \over Z(\theta )}} is a probability (density), and typically β = 1 {\displaystyle \beta =1} . Since the normalization constant: Z ( θ ) := ∫ x ∈ X e − β E θ ( x ) d x {\displaystyle Z(\theta ):=\int _{x\in X}e^{-\beta E_{\theta }(x)}dx} (also known as the partition function) depends on all the Boltzmann factors of all possible inputs x {\displaystyle x} , it cannot be easily computed or reliably estimated during training simply using standard maximum likelihood estimation. However, for maximizing the likelihood during training, the gradient of the log-likelihood of a single training example x {\displaystyle x} is given by using the chain rule: ∂ θ log ⁡ ( P θ ( x ) ) = E x ′ ∼ P θ [ ∂ θ E θ ( x ′ ) ] − ∂ θ E θ ( x ) ( ∗ ) {\displaystyle \partial _{\theta }\log \left(P_{\theta }(x)\right)=\mathbb {E} _{x'\sim P_{\theta }}[\partial _{\theta }E_{\theta }(x')]-\partial _{\theta }E_{\theta }(x)\,()} The expectation in the above formula for the gradient can be approximately estimated by drawing samples x ′ {\displaystyle x'} from the distribution P θ {\displaystyle P_{\theta }} using Markov chain Monte Carlo (MCMC). Early energy-based models, such as the 2003 Boltzmann machine by Hinton, estimated this expectation via blocked Gibbs sampling. Newer approaches make use of more efficient Stochastic Gradient Langevin Dynamics (LD), drawing samples using: x 0 ′ ∼ P 0 , x i + 1 ′ = x i ′ − α 2 ∂ E θ ( x i ′ ) ∂ x i ′ + ϵ {\displaystyle x_{0}'\sim P_{0},x_{i+1}'=x_{i}'-{\frac {\alpha }{2}}{\frac {\partial E_{\theta }(x_{i}')}{\partial x_{i}'}}+\epsilon } , where ϵ ∼ N ( 0 , α ) {\displaystyle \epsilon \sim {\mathcal {N}}(0,\alpha )} . A replay buffer of past values x i ′ {\displaystyle x_{i}'} is used with LD to initialize the optimization module. The parameters θ {\displaystyle \theta } of the neural network are therefore trained in a generative manner via MCMC-based maximum likelihood estimation: the learning process follows an "analysis by synthesis" scheme, where within each learning iteration, the algorithm samples the synthesized examples from the current model by a gradient-based MCMC method (e.g., Langevin dynamics or Hybrid Monte Carlo), and then updates the parameters θ {\displaystyle \theta } based on the difference between the training examples and the synthesized ones – see equation ( ∗ ) {\displaystyle ()} . This process can be interpreted as an alternating mode seeking and mode shifting process, and also has an adversarial interpretation. Essentially, the model learns a function E θ {\displaystyle E_{\theta }} that associates low energies to correct values, and higher energies to incorrect values. After training, given a converged energy model E θ {\displaystyle E_{\theta }} , the Metropolis–Hastings algorithm can be used to draw new samples. The acceptance probability is given by: P a c c ( x i → x ∗ ) = min ( 1 , P θ ( x ∗ ) P θ ( x i ) ) . {\displaystyle P_{acc}(x_{i}\to x^{})=\min \left(1,{\frac {P_{\theta }(x^{})}{P_{\theta }(x_{i})}}\right).} == History == The term "energy-based models" was first coined in a 2003 JMLR paper where the authors defined a generalisation of independent components analysis to the overcomplete setting using EBMs. Other early work on EBMs proposed models that represented energy as a composition of latent and observable variables. == Characteristics == EBMs demonstrate useful properties: Simplicity and stability. The EBM is the only object that needs to be designed and trained. Separate networks need not be trained to ensure balance. Adaptive computation time. An EBM can generate sharp, diverse samples or (more quickly) coarse, less diverse samples. Given infinite time, this procedure produces true samples. Flexibility. In Variational Autoencoders (VAE) and flow-based models, the generator learns a map from a continuous space to a (possibly) discontinuous space containing different data modes. EBMs can learn to assign low energies to disjoint regions (multiple modes). Adaptive generation. EBM generators are implicitly defined by the probability distribution, and automatically adapt as the distribution changes (without training), allowing EBMs to address domains where generator training is impractical, as well as minimizing mode collapse and avoiding spurious modes from out-of-distribution samples. Compositionality. Individual models are unnormalized probability distributions, allowing models to be combined through product of experts or other hierarchical techniques. == Experimental results == On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM model generated high-quality images relatively quickly. It supported combining features learned from one type of image for generating other types of images. It was able to generalize using out-of-distribution datasets, outperforming flow-based and autoregressive models. EBM was relatively resistant to adversarial perturbations, behaving better than models explicitly trained against them with training for classification. == Applications == Target applications include natural language processing, robotics and computer vision. The first energy-based generative neural network is the generative ConvNet proposed in 2016 for image patterns, where the neural network is a convolutional neural network. The model has been generalized to various domains to learn distributions of videos, and 3D voxels. They are made more effective in their variants. They have proven useful for data generation (e.g., image synthesis, video synthesis, 3D shape synthesis, etc.), data recovery (e.g., recovering videos with missing pixels or image frames, 3D super-resolution, etc), data reconstruction (e.g., image reconstruction and linear interpolation ). == Alternatives == EBMs compete with techniques such as variational autoencoders (VAEs), generative adversarial networks (GANs) or normalizing flows. == Extensions == === Joint energy-based models === Joint energy-based models (JEM), proposed in 2020 by Grathwohl et al., allow any classifier with softmax output to be interpreted as energy-based model. The key observation is that such a classifier is trained to predict the conditional probability p θ ( y | x ) = e f → θ ( x ) [ y ] ∑ j = 1 K e f → θ ( x ) [ j ] for y = 1 , … , K and f → θ = ( f 1 , … , f K ) ∈ R K , {\displaystyle p_{\theta }(y|x)={\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{\sum _{j=1}^{K}e^{{\vec {f}}_{\theta }(x)[j]}}}\ \ {\text{ for }}y=1,\dotsc ,K{\text{ and }}{\vec {f}}_{\theta }=(f_{1},\dotsc ,f_{K})\in \mathbb {R} ^{K},} where f → θ ( x ) [ y ] {\displaystyle {\vec {f}}_{\theta }(x)[y]} is the y-th index of the logits f → {\displaystyle {\vec {f}}} corresponding to class y. Without any change to the logits it was proposed to reinterpret the logits to describe a joint probability density: p θ ( y , x ) = e f → θ ( x ) [ y ] Z ( θ ) , {\displaystyle p_{\theta }(y,x)={\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{Z(\theta )}},} with unknown partition function Z ( θ ) {\displaystyle Z(\theta )} and energy E θ ( x , y ) = − f θ ( x ) [ y ] {\displaystyle E_{\theta }(x,y)=-f_{\theta }(x)[y]} . By marginalization, we obtain the unnormalized density p θ ( x ) = ∑ y p θ ( y , x ) = ∑ y e f → θ ( x ) [ y ] Z ( θ ) =: e − E θ ( x ) , {\displaystyle p_{\theta }(x)=\sum _{y}p_{\theta }(y,x)=\sum _{y}{\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{Z(\theta )}}=:e^{-E_{\theta }(x)},} therefore, E θ ( x ) = − log ⁡ ( ∑ y e f → θ ( x ) [ y ] Z ( θ ) ) , {\displaystyle E_{\theta }(x)=-\log \left(\sum _{y}{\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{Z(\theta )}}\right),} so that any classifier can be used to define an energy function E θ ( x ) {\displaystyle E_{\theta }(x)} .

    Read more →
  • Information security

    Information security

    Information security is the practice of protecting information by mitigating information risks. It is part of information risk management. It typically involves preventing or reducing the probability of unauthorized or inappropriate access to data or the unlawful use, disclosure, disruption, deletion, corruption, modification, inspection, recording, or devaluation of information. It also involves actions intended to reduce the adverse impacts of such incidents. Protected information may take any form, e.g., electronic or physical, tangible (e.g., paperwork), or intangible (e.g., knowledge). Information security's primary focus is the balanced protection of data confidentiality, integrity, and availability (known as the CIA triad, unrelated to the US government organization) while maintaining a focus on efficient policy implementation, all without hampering organization productivity. This is largely achieved through a structured risk management process. To standardize this discipline, academics and professionals collaborate to offer guidance, policies, and industry standards on passwords, antivirus software, firewalls, encryption software, legal liability, security awareness and training, and so forth. This standardization may be further driven by a wide variety of laws and regulations that affect how data is accessed, processed, stored, transferred, and destroyed. While paper-based business operations are still prevalent, requiring their own set of information security practices, enterprise digital initiatives are increasingly being emphasized, with information assurance now typically being dealt with by information technology (IT) security specialists. These specialists apply information security to technology (most often some form of computer system). IT security specialists are almost always found in any major enterprise/establishment due to the nature and value of the data within larger businesses. They are responsible for keeping all of the technology within the company secure from malicious attacks that often attempt to acquire critical private information or gain control of the internal systems. There are many specialist roles in Information Security including securing networks and allied infrastructure, securing applications and databases, security testing, information systems auditing, business continuity planning, electronic record discovery, and digital forensics. == Standards == Information security standards are guidelines generally outlined in published materials that aim to protect a user's or an organization's cyber environment from threats. This environment includes the users themselves, hardware such as devices and networks, software such as applications or services, and any information in storage or transit. These standards comprise security concepts, technologies, and guidelines to deal with an adverse event. They may also include assessment criteria and certification for organizations implementing a minimum level of security. These standards are developed by various international and national bodies to prevent or mitigate cyber-attacks, ensure consistency among developers, and establish a minimum standard in industries susceptible to an attack. The ISO/IEC 27000 family, published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), provides information about the guidelines and requirements for an Information Security Management System (ISMS). The Common Criteria (ISO/IEC 15408) provides guidelines on evaluating and certifying the security of a system. The IEC 62443 establishes security standards for automation and control systems. Similarly, the ISO/SAE 21434, ETSI EN 303 645, and EN 18031 provide standards for road vehicles, the Internet of Things, and radio-based systems respectively. The NIST Cybersecurity Framework (NIST CSF) is a set of guidelines developed by the U.S. National Institute of Standards and Technology to help organizations with risk management. NIST also publishes various Federal Information Processing Standards (FIPS) and Special Publications. The United Kingdom has introduced Cyber Essentials, which is a certification scheme to protect organizations against common security threats. The Australian Cyber Security Centre publishes the Essential Eight mitigation strategies. The Payment Card Industry Data Security Standard (PCI DSS) regulates handling of cardholder data in order to reduce credit card fraud. UL has published standards related to specific industries such as UL 2900-2-3 for security and life safety signaling systems and UL-2900-2-1 for healthcare and wellness systems. == Threats == Information security threats come in many different forms. Some of the most common threats today are software attacks, theft of intellectual property, theft of identity, theft of equipment or information, sabotage, and information extortion. Viruses, worms, phishing attacks, and Trojan horses are a few common examples of software attacks. The theft of intellectual property has also been an extensive issue for many businesses. Identity theft is the attempt to act as someone else usually to obtain that person's personal information or to take advantage of their access to vital information through social engineering. Sabotage usually consists of the destruction of an organization's website in an attempt to cause loss of confidence on the part of its customers. Information extortion consists of theft of a company's property or information as an attempt to receive a payment in exchange for returning the information or property back to its owner, as with ransomware. One of the most functional precautions against these attacks is to conduct periodical user awareness. Governments, military, corporations, financial institutions, hospitals, non-profit organizations, and private businesses amass a great deal of confidential information about their employees, customers, products, research, and financial status. Should confidential information about a business's customers or finances or new product line fall into the hands of a competitor or hacker, a business and its customers could suffer widespread, irreparable financial loss, as well as damage to the company's reputation. From a business perspective, information security must be balanced against cost; the Gordon-Loeb Model provides a mathematical economic approach for addressing this concern. For the individual, information security has a significant effect on privacy, which is viewed very differently in various cultures. == History == Since the early days of communication, diplomats and military commanders understood that it was necessary to provide some mechanism to protect the confidentiality of correspondence and to have some means of detecting tampering. Julius Caesar is credited with the invention of the Caesar cipher c. 50 B.C., which was created in order to prevent his secret messages from being read should a message fall into the wrong hands. However, for the most part protection was achieved through the application of procedural handling controls. Sensitive information was marked up to indicate that it should be protected and transported by trusted persons, guarded and stored in a secure environment or strong box. As postal services expanded, governments created official organizations to intercept, decipher, read, and reseal letters (e.g., the U.K.'s Secret Office, founded in 1653). In the mid-nineteenth century more complex classification systems were developed to allow governments to manage their information according to the degree of sensitivity. For example, the British Government codified this, to some extent, with the publication of the Official Secrets Act in 1889. Section 1 of the law concerned espionage and unlawful disclosures of information, while Section 2 dealt with breaches of official trust. A public interest defense was soon added to defend disclosures in the interest of the state. A similar law was passed in India in 1889, The Indian Official Secrets Act, which was associated with the British colonial era and used to crack down on newspapers that opposed the Raj's policies. A newer version was passed in 1923 that extended to all matters of confidential or secret information for governance. By the time of the First World War, multi-tier classification systems were used to communicate information to and from various fronts, which encouraged greater use of code making and breaking sections in diplomatic and military headquarters. Encoding became more sophisticated between the wars as machines were employed to scramble and unscramble information. The establishment of computer security inaugurated the history of information security. The need for such appeared during World War II. The volume of information shared by the Allied countries during the Second World War necessitated formal alignment of classification systems and procedural controls. An arcane range of markings evol

    Read more →
  • Data security

    Data security

    Data security or data protection is the process of securing digital information to protect it from online threats. Data security or protection means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach. Data security protects computer hardware, software, storage devices, and the data of user devices. Data security also protects the data of organizations, companies and administrative controls. Data security guarantees the protection of individual data, such as identity documents and bank data, and protects against unauthorized access, theft and loss of individual data. Data security also protects data breaches that occurs in companies and industries. Good security measures in industries reduce the probability of data breaches, and employees can rely on the company with their data and private information to be kept secured while companies can continue to maintain a stable reputation. The CIA Triad (Confidentiality, Integrity, and Availability) is what is used to practice what an information security is required to follow. Confidentiality, protects information from being accessed by unauthorized persons. Integrity, makes sure data is trustworthy; and Availability, meaning that data can be accessed by approved users when it is needed; are three goals for data security. Non-repudiation in data security definition, is a device/service that shows where the data originated from and the proof of integrity. == Technologies == === Disk encryption === Disk encryption refers to encryption technology that encrypts data on a hard disk drive. It takes data from a storage device and coverts it into an unreadable format. Disk encryption typically takes form in either software (see disk encryption software) or hardware (see disk encryption hardware) which can be used together. Disk encryption is often referred to as on-the-fly encryption (OTFE) or transparent encryption. Full disk encryption encrypts each individual sector of a disk volume. Files and user data are encrypted to hinder unauthorized users from accessing without a decryption key. A diversifier permits a plaintext of a specific disk sector to be encrypted into different ciphertexts, which does not require additional storage, such as an initialization vector (IV) or message authentication code (MAC). === Software versus hardware-based mechanisms for protecting data === Software-based security solutions encrypt the data to protect it from theft. However, a malicious program or a hacker could corrupt the data to make it unrecoverable, making the system unusable. Hardware-based security solutions prevent read and write access to data, which provides very strong protection against tampering and unauthorized access. Hardware-based security or assisted computer security offers an alternative to software-only computer security. Security tokens such as those using PKCS#11 or a mobile phone may be more secure due to the physical access required in order to be compromised. Access is enabled only when the token is connected and the correct PIN is entered (see two-factor authentication). However, dongles can be used by anyone who can gain physical access to it. Newer technologies in hardware-based security solve this problem by offering full proof of security for data. Working off hardware-based security: A hardware device allows a user to log in, log out and set different levels through manual actions. Many devices use biometric technology to prevent malicious users from logging in, logging out, and changing privilege levels. The current state of a user of the device is read by controllers in peripheral devices such as hard disks. Illegal access by a malicious user or a malicious program is interrupted based on the current state of a user by hard disk and DVD controllers making illegal access to data impossible. Hardware-based access control is more secure than the protection provided by the operating systems as operating systems are vulnerable to malicious attacks by viruses and hackers. The data on hard disks can be corrupted after malicious access is obtained. With hardware-based protection, the software cannot manipulate the user privilege levels. A hacker or a malicious program cannot gain access to secure data protected by hardware or perform unauthorized privileged operations. This assumption is broken only if the hardware itself is malicious or contains a backdoor. The hardware protects the operating system image and file system privileges from being tampered with. Therefore, a completely secure system can be created using a combination of hardware-based security and secure system administration policies. === Backups === Backup is the process of reproducing copies of essential data and storing in a separate, secured place. It is used to ensure data that is lost can be recovered from another source. Backups contains a minimum of one copy of the data that requires preservation. It is considered essential to keep a backup of any data in most industries and the process is recommended for any files of importance to a user. There are 3 types of backups; full backups, incremental backups, and differential backups. Full backups secure all data from a production system, such as a server, database, or other connected data source. It is impossible to lose all data in a full backup if a breach or corruption were to occur. Full backups require a significantly large amount of time to back up and may be time-consuming taking hours to days to complete. Incremental backups only secures changed data since last backup. While all backups are done in full backups, incremental backups only save data that is recently or frequently changed. Incremental backups require lower storage costs making it a prominent solution for growing datasets. === Data Privacy === Data privacy (or information privacy) is the right for individual's data to be secured to obstruct the use of unauthorized access. It gives individuals control over their data and how it can be shared to third parties. The U.S Privacy Protection Law (see Privacy laws of the United States) requires organizations to inform individuals of how their data is collected and when a data breach occurs. By implementing an encryption, it ensures that private data is unreadable to cybercriminals. === Data masking === Data masking of structured data is the process of obscuring (masking) specific data within a database table or cell to ensure that data security is maintained and sensitive information is not exposed to unauthorized personnel. This may include masking the data from users (for example so banking customer representatives can only see the last four digits of a customer's national identity number), developers (who need real production data to test new software releases but should not be able to see sensitive financial data), outsourcing vendors, etc. Data masking is a form of encryption, as it obscures data by modifying particular letters and numbers to keep data concealed and protected from potential hackers. The individual that has access to the code that decrypts the replaced characters are the only ones that can uncover the data. === Data erasure === Data erasure (or data deletion, data destruction) is a method of software-based overwriting that permanently clears all electronic data residing on a hard drive or other digital media to ensure that no sensitive data is lost when an asset is retired or reused. Article 17: Right to be Forgotten states that users have the right to permanently remove all of their private information from their old devices/services to give people more control over their data. Users are able to switch between devices efficiently. == Threats == === Malware === Malware (or malicious software) is designed to destroy, corrupt or gain unauthorized access to a computer for the purpose of stealing, or destroying data. Hackers who use malware typically utilize many types of malware, which includes computer virus, computer worms, ransomware, spyware and Trojan horse to create a vast system of disruption and cause easy data theft. One of the victims of the vast system of disruption includes healthcare workers, who are targeted by compromised systems by infections and then having their data attacked. === Phishing === Phishing is a type of scam that allows hackers to hoax people using psychological and social engineering (using human emotions such as their trust and fear) tactics into giving personal data through emails and messages, and install computer viruses if the individual were to click on a malicious link unknowingly. Attackers are able to create websites that are very similar to original websites, which makes it difficult to detect a fake website, causing individuals to fall for giving in information. Phishing attackers use human emotion to exploit them, such as making them feel fear, urgency, sympathy with the message

    Read more →
  • Semiotics of social networking

    Semiotics of social networking

    The semiotics of social networking discusses the images, symbols and signs used in systems that allow users to communicate and share experiences with each other. Examples of social networking systems include Facebook, Twitter and Instagram. == Semiotics == Semiotics is a discipline that studies images, symbols, signs and other similarly related objects in an effort to understand their use and meaning. Semiotic structuralism seeks the meaning of these objects within a social context. Post-structuralist theories take tools from structuralist semiotics in combination with social interaction, creating social semiotics. Social semiotics is “a branch of the field of semiotics which investigates human signifying practices in specific social and cultural circumstances and which tries to explain meaning-making as a social practice.” “Social semiotics also examines semiotic practices, specific to a culture and community, for the making of various kinds of texts and meanings in various situational contexts and contexts of culturally meaningful activity”. Social semiotics is concerned with studying human interactions. == Social networking == Social networking is the communication among people within a virtual social space. This medium of communication allows insight into the significance of social semiotics. “Millions of people now interact through blogs, collaborate through wikis, play multiplayer games, publish podcasts and video, build relationships through social network sites and evaluate all the above forms of communication through feedback and ranking mechanisms”. Social semiotics “unlike speech, writing necessitates some sort of technology in the form of person device interaction”. Social semiotics functions through the triad of communication or Peircean semiotics in the form of sign, object, interpretant (Chart 1) and “Human, Machine, Tag (Information)” (Chart 2). In Peircean semiotics (Chart 1), "A sign…[in the form of representamen] is something which stands to somebody for something in some respect or capacity. It addresses somebody, that is, creates in the mind of that person an equivalent sign, or perhaps a more developed sign. That sign which it creates I call the interpretant of the first sign. The sign stands for an object, not in all respects, but in reference to a sort of idea which I have something called the ground of the representamen". This example of the triangle of Human, Machine, Tag is shown when looking at tagging photographs on Facebook (Chart 3). The Human takes the photo on a camera and puts the digital file (information) on the Machine, the Machine is then navigated to Facebook where the file is downloaded. The Human has the Machine Tag the photo with information (e. g., names, places, data) for other Humans to see. This process then can be continued (see Chart 2). “Collaborative tagging has been quickly gaining ground because of its ability to recruit the activity of web users into effectively organizing and sharing large amounts of information”.

    Read more →
  • Scene text

    Scene text

    Scene text is text that appears in an image captured by a camera in an outdoor environment. The detection and recognition of scene text from camera captured images are computer vision tasks which became important after smart phones with good cameras became ubiquitous. The text in scene images varies in shape, font, colour and position. The recognition of scene text is further complicated sometimes by non-uniform illumination and focus. To improve scene text recognition, the International Conference on Document Analysis and Recognition (ICDAR) conducts a robust reading competition once in two years. The competition was held in 2003, 2005 and during every ICDAR conference. International association for pattern recognition (IAPR) has created a list of datasets as Reading systems. == Text detection == Text detection is the process of detecting the text present in the image, followed by surrounding it with a rectangular bounding box. Text detection can be carried out using image based techniques or frequency based techniques. In image based techniques, an image is segmented into multiple segments. Each segment is a connected component of pixels with similar characteristics. The statistical features of connected components are utilised to group them and form the text. Machine learning approaches such as support vector machine and convolutional neural networks are used to classify the components into text and non-text. In frequency based techniques, discrete Fourier transform (DFT) or discrete wavelet transform (DWT) are used to extract the high frequency coefficients. It is assumed that the text present in an image has high frequency components and selecting only the high frequency coefficients filters the text from the non-text regions in an image. == Word recognition == In word recognition, the text is assumed to be already detected and located and the rectangular bounding box containing the text is available. The word present in the bounding box needs to be recognized. The methods available to perform word recognition can be broadly classified into top-down and bottom-up approaches. In the top-down approaches, a set of words from a dictionary is used to identify which word suits the given image. Images are not segmented in most of these methods. Hence, the top-down approach is sometimes referred as segmentation free recognition. In the bottom-up approaches, the image is segmented into multiple components and the segmented image is passed through a recognition engine. Either an off the shelf Optical character recognition (OCR) engine or a custom-trained one is used to recognise the text.

    Read more →
  • Initialization vector

    Initialization vector

    In cryptography, an initialization vector (IV) or starting variable is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation. Some cryptographic primitives require the IV only to be non-repeating, and the required randomness is derived internally. In this case, the IV is commonly called a nonce (a number used only once), and the primitives (e.g. CBC) are considered stateful rather than randomized. This is because an IV need not be explicitly forwarded to a recipient but may be derived from a common state updated at both sender and receiver side. (In practice, a short nonce is still transmitted along with the message to consider message loss.) An example of stateful encryption schemes is the counter mode of operation, which has a sequence number for a nonce. The IV size depends on the cryptographic primitive used; for block ciphers it is generally the cipher's block-size. In encryption schemes, the unpredictable part of the IV has at best the same size as the key to compensate for time/memory/data tradeoff attacks. When the IV is chosen at random, the probability of collisions due to the birthday problem must be taken into account. Traditional stream ciphers such as RC4 do not support an explicit IV as input, and a custom solution for incorporating an IV into the cipher's key or internal state is needed. Some designs realized in practice are known to be insecure; the WEP protocol is a notable example, and is prone to related-IV attacks. == Motivation == A block cipher is one of the most basic primitives in cryptography, and frequently used for data encryption. However, by itself, it can only be used to encode a data block of a predefined size, called the block size. For example, a single invocation of the AES algorithm transforms a 128-bit plaintext block into a ciphertext block of 128 bits in size. The key, which is given as one input to the cipher, defines the mapping between plaintext and ciphertext. If data of arbitrary length is to be encrypted, a simple strategy is to split the data into blocks each matching the cipher's block size, and encrypt each block separately using the same key. This method is not secure as equal plaintext blocks get transformed into equal ciphertexts, and a third party observing the encrypted data may easily determine its content even when not knowing the encryption key. To hide patterns in encrypted data while avoiding the re-issuing of a new key after each block cipher invocation, a method is needed to randomize the input data. In 1980, the NIST published a national standard document designated Federal Information Processing Standard (FIPS) PUB 81, which specified four so-called block cipher modes of operation, each describing a different solution for encrypting a set of input blocks. The first mode implements the simple strategy described above, and was specified as the electronic codebook (ECB) mode. In contrast, each of the other modes describe a process where ciphertext from one block encryption step gets intermixed with the data from the next encryption step. To initiate this process, an additional input value is required to be mixed with the first block, and which is referred to as an initialization vector. For example, the cipher-block chaining (CBC) mode requires an unpredictable value, of size equal to the cipher's block size, as additional input. This unpredictable value is added to the first plaintext block before subsequent encryption. In turn, the ciphertext produced in the first encryption step is added to the second plaintext block, and so on. The ultimate goal for encryption schemes is to provide semantic security: by this property, it is practically impossible for an attacker to draw any knowledge from observed ciphertext. It can be shown that each of the three additional modes specified by the NIST are semantically secure under so-called chosen-plaintext attacks. == Properties == Properties of an IV depend on the cryptographic scheme used. A basic requirement is uniqueness, which means that no IV may be reused under the same key. For block ciphers, repeated IV values devolve the encryption scheme into electronic codebook mode: equal IV and equal plaintext result in equal ciphertext. In stream cipher encryption uniqueness is crucially important as plaintext may be trivially recovered otherwise. Example: Stream ciphers encrypt plaintext P to ciphertext C by deriving a key stream K from a given key and IV and computing C as C = P xor K. Assume that an attacker has observed two messages C1 and C2 both encrypted with the same key and IV. Then knowledge of either P1 or P2 reveals the other plaintext since C1 xor C2 = (P1 xor K) xor (P2 xor K) = P1 xor P2. Many schemes require the IV to be unpredictable by an adversary. This is effected by selecting the IV at random or pseudo-randomly. In such schemes, the chance of a duplicate IV is negligible, but the effect of the birthday problem must be considered. As for the uniqueness requirement, a predictable IV may allow recovery of (partial) plaintext. Example: Consider a scenario where a legitimate party called Alice encrypts messages using the cipher-block chaining mode. Consider further that there is an adversary called Eve that can observe these encryptions and is able to forward plaintext messages to Alice for encryption (in other words, Eve is capable of a chosen-plaintext attack). Now assume that Alice has sent a message consisting of an initialization vector IV1 and starting with a ciphertext block CAlice. Let further PAlice denote the first plaintext block of Alice's message, let E denote encryption, and let PEve be Eve's guess for the first plaintext block. Now, if Eve can determine the initialization vector IV2 of the next message she will be able to test her guess by forwarding a plaintext message to Alice starting with (IV2 xor IV1 xor PEve); if her guess was correct this plaintext block will get encrypted to CAlice by Alice. This is because of the following simple observation: CAlice = E(IV1 xor PAlice) = E(IV2 xor (IV2 xor IV1 xor PAlice)). Depending on whether the IV for a cryptographic scheme must be random or only unique the scheme is either called randomized or stateful. While randomized schemes always require the IV chosen by a sender to be forwarded to receivers, stateful schemes allow sender and receiver to share a common IV state, which is updated in a predefined way at both sides. == Block ciphers == Block cipher processing of data is usually described as a mode of operation. Modes are primarily defined for encryption as well as authentication, though newer designs exist that combine both security solutions in so-called authenticated encryption modes. While encryption and authenticated encryption modes usually take an IV matching the cipher's block size, authentication modes are commonly realized as deterministic algorithms, and the IV is set to zero or some other fixed value. == Stream ciphers == In stream ciphers, IVs are loaded into the keyed internal secret state of the cipher, after which a number of cipher rounds are executed prior to releasing the first bit of output. For performance reasons, designers of stream ciphers try to keep that number of rounds as small as possible, but because determining the minimal secure number of rounds for stream ciphers is not a trivial task, and considering other issues such as entropy loss, unique to each cipher construction, related-IVs and other IV-related attacks are a known security issue for stream ciphers, which makes IV loading in stream ciphers a serious concern and a subject of ongoing research. == WEP IV == The 802.11 encryption algorithm called WEP (short for Wired Equivalent Privacy) used a short, 24-bit IV, leading to reused IVs with the same key, which led to it being easily cracked. Packet injection allowed for WEP to be cracked in times as short as several seconds. This ultimately led to the deprecation of WEP. == SSL 2.0 IV == In cipher-block chaining mode (CBC mode), the IV need not be secret, but must be unpredictable (In particular, for any given plaintext, it must not be possible to predict the IV that will be associated to the plaintext in advance of the generation of the IV.) at encryption time. Additionally for the output feedback mode (OFB mode), the IV must be unique. In particular, the (previously) common practice of re-using the last ciphertext block of a message as the IV for the next message is insecure (for example, this method was used by SSL 2.0). If an attacker knows

    Read more →
  • Data product

    Data product

    In data management and product management, a data product is a reusable, active, and standardized data asset designed to deliver measurable value to its users, whether internal or external, by applying the rigorous principles of product thinking and management. It comprises one or more data artifacts (e.g., datasets, models, pipelines) and is enriched with metadata, including governance policies, data quality rules, data contracts, and, where applicable, a software bill of materials (SBOM) to document its dependencies and components. Ownership of a data product is aligned to a specific domain or use case, ensuring accountability, stewardship, and its continuous evolution throughout its lifecycle. Adhering to the FAIR principles – findable, accessible, interoperable, and reusable – a data product is designed to be discoverable, scalable, reusable, and aligned with both business and regulatory standards, driving innovation and efficiency in modern data ecosystems. == History == In 2012, DJ Patil proposed the first documented definition: a data product is a product that facilitates an end goal through the use of data. In 2019, Zhamak Dehghani introduced Data Mesh, with a strong focus on domain-oriented data products. Later, in 2020, she solidifies Data Mesh around four principles, one being Data as a Product, in which she defines Data Product as the node on the mesh that encapsulates three structural components required for its function, providing access to the domain's analytical data as a product. In 2024, Andrea Gioia published one of the first books specifically on data products post Data Mesh announcement. In his book, Gioia defines the concept of pure data product. In 2025, during the Data Day Texas conference, Jean-Georges Perrin and a collective of product managers and data engineers got together to craft the current definition and make it available to the public domain. In July 2025, Bitol, a project of The Linux Foundation, released and early version of the Open Data Product Standard (ODPS) aiming at normalizing data products

    Read more →