AI Code Janitor


  • Speech segmentation

    Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. The term applies both to the mental processes used by humans, and to artificial processes of natural language processing. In the field of automatic pronunciation assessment, the process of segmenting an utterance against expected word(s) is called forced alignment. Speech segmentation is a subfield of general speech perception and an important subproblem of the technologically focused field of speech recognition, and cannot be adequately solved in isolation. As in most natural language processing problems, one must take into account context, grammar, and semantics, and even so the result is often a probabilistic division (statistically based on likelihood) rather than a categorical one. Though it seems that coarticulation—a phenomenon which may happen between adjacent words just as easily as within a single word—presents the main challenge in speech segmentation across languages, some other problems and strategies employed in solving those problems can be seen in the following sections. This problem overlaps to some extent with the problem of text segmentation that occurs in some languages which are traditionally written without inter-word spaces, like Chinese and Japanese, compared to writing systems which indicate speech segmentation between words by a word divider, such as the space. However, even for those languages, text segmentation is often much easier than speech segmentation, because the written language usually has little interference between adjacent words, and often contains additional clues not present in speech (such as the use of Chinese characters for word stems in Japanese). 
== Lexical recognition ==

In natural languages, the meaning of a complex spoken sentence can be understood by decomposing it into smaller lexical segments (roughly, the words of the language), associating a meaning to each segment, and combining those meanings according to the grammar rules of the language. Though lexical recognition is not thought to be used by infants in their first year, due to their highly limited vocabularies, it is one of the major processes involved in speech segmentation for adults. Three main models of lexical recognition exist in current research: first, whole-word access, which argues that words have a whole-word representation in the lexicon; second, decomposition, which argues that morphologically complex words are broken down into their morphemes (roots, stems, inflections, etc.) and then interpreted; and third, the view that whole-word and decomposition models are both used, but that the whole-word model provides some computational advantages and is therefore dominant in lexical recognition.

To give an example, in a whole-word model, the word "cats" might be stored and searched for by letter, first "c", then "ca", "cat", and finally "cats". The same word, in a decompositional model, would likely be stored under the root word "cat" and could be searched for after removing the "s" suffix. "Falling", similarly, would be stored as "fall" and suffixed with the "ing" inflection. Though proponents of the decompositional model recognize that a morpheme-by-morpheme analysis may require significantly more computation, they argue that the unpacking of morphological information is necessary for other processes (such as syntactic structure) which may occur parallel to lexical searches. As a whole, research into systems of human lexical recognition is limited due to little experimental evidence that fully discriminates between the three main models.
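The contrast between whole-word access and decomposition in the "cats" and "falling" examples can be illustrated with a small sketch. It is only a toy model of the two storage schemes, not a claim about how the human lexicon is implemented: the class `PrefixLexicon`, the function `decompose`, and the tiny `ROOTS` and `SUFFIXES` inventories are all hypothetical names introduced here for illustration.

```python
class PrefixLexicon:
    """Whole-word storage: every full form is stored and reached letter by letter."""

    END = "$"  # end-of-word marker (assumes "$" never occurs inside a word)

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for letter in word:
                node = node.setdefault(letter, {})
            node[self.END] = True

    def contains(self, word):
        node = self.root
        for letter in word:  # "c", then "ca", then "cat", then "cats"
            if letter not in node:
                return False
            node = node[letter]
        return self.END in node


# Hypothetical toy inventories, for illustration only.
ROOTS = {"cat", "fall"}
SUFFIXES = ("ing", "s")


def decompose(word):
    """Decompositional lookup: strip a known suffix, then look up the root."""
    if word in ROOTS:
        return (word, "")
    for suffix in SUFFIXES:
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in ROOTS:
            return (root, suffix)
    return None


whole_word = PrefixLexicon(["cat", "cats", "fall", "falling"])
print(whole_word.contains("cats"))  # True
print(decompose("cats"))            # ('cat', 's')
print(decompose("falling"))         # ('fall', 'ing')
```

The whole-word lexicon needs one stored entry per inflected form, while the decompositional lookup stores only roots and pays for an extra suffix-stripping step at search time, which mirrors the trade-off described above.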
In any case, lexical recognition likely contributes significantly to speech segmentation through the contextual clues it provides, given that it is a heavily probabilistic system, based on the statistical likelihood of certain words or constituents occurring together. For example, one can imagine a situation where a person might say "I bought my dog at a ____ shop" and the missing word's vowel is pronounced as in "net", "sweat", or "pet". While the probability of "netshop" is extremely low, since "netshop" isn't currently a compound or phrase in English, and "sweatshop" also seems contextually improbable, "pet shop" is a good fit because it is a common phrase and is also related to the word "dog".

Moreover, an utterance can have different meanings depending on how it is split into words. A popular example, often quoted in the field, is the phrase "How to wreck a nice beach", which sounds very similar to "How to recognize speech". As this example shows, proper lexical segmentation depends on context and semantics, which draw on the whole of human knowledge and experience, and would thus require advanced pattern recognition and artificial intelligence technologies to be implemented on a computer.

Lexical recognition is of particular value in the field of computer speech recognition, since the ability to build and search a network of semantically connected ideas would greatly increase the effectiveness of speech-recognition software. Statistical models can be used to segment and align recorded speech to words or phones. Applications include automatic lip-synch timing for cartoon animation, follow-the-bouncing-ball video subtitling, and linguistic research. Automatic segmentation and alignment software is commercially available.

== Phonotactic cues ==

For most spoken languages, the boundaries between lexical units are difficult to identify; phonotactics are one answer to this issue.
One might expect that the inter-word spaces used by many written languages like English or Spanish would correspond to pauses in their spoken version, but that is true only in very slow speech, when the speaker deliberately inserts those pauses. In normal speech, one typically finds many consecutive words being said with no pauses between them, and often the final sounds of one word blend smoothly or fuse with the initial sounds of the next word. The notion that speech is produced like writing, as a sequence of distinct vowels and consonants, may be a relic of alphabetic heritage for some language communities. In fact, the way vowels are produced depends on the surrounding consonants just as consonants are affected by surrounding vowels; this is called coarticulation. For example, in the word "kit", the [k] is farther forward than when we say "caught". But also, the vowel in "kick" is phonetically different from the vowel in "kit", though we normally do not hear this. In addition, there are language-specific changes which occur in casual speech and make it quite different from spelling. For example, in English, the phrase "hit you" could often be more appropriately spelled "hitcha".

From a decompositional perspective, in many cases, phonotactics play a part in letting speakers know where to draw word boundaries. In English, the word "strawberry" is perceived by speakers as consisting (phonetically) of two parts: "straw" and "berry". Other interpretations such as "stra" and "wberry" are inhibited by English phonotactics, which do not allow the cluster "wb" word-initially. Other such examples are "day/dream" and "mile/stone", which are unlikely to be interpreted as "da/ydream" or "mil/estone" due to the phonotactic probability or improbability of certain clusters.
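The way an illegal word-initial cluster rules out a boundary, as in "stra/wberry" and "da/ydream", can be sketched in a few lines. This is a deliberately crude illustration under stated assumptions: it works on spelling rather than phonemes, and `ILLEGAL_ONSETS` is a tiny hand-picked set introduced here, whereas a realistic model would use phoneme sequences and graded probabilities.

```python
# Hand-picked clusters that cannot begin an English word (illustrative
# assumption; spelling is used as a rough stand-in for pronunciation).
ILLEGAL_ONSETS = {"wb", "yd"}


def blocked_boundaries(word, illegal_onsets=ILLEGAL_ONSETS):
    """Return the splits ruled out because the right part would start illegally."""
    blocked = []
    for i in range(1, len(word)):
        if word[i:i + 2] in illegal_onsets:
            blocked.append((word[:i], word[i:]))
    return blocked


def is_allowed_split(left, right, illegal_onsets=ILLEGAL_ONSETS):
    """True if the right-hand part does not begin with an illegal cluster."""
    return right[:2] not in illegal_onsets


print(blocked_boundaries("strawberry"))     # [('stra', 'wberry')]
print(blocked_boundaries("daydream"))       # [('da', 'ydream')]
print(is_allowed_split("straw", "berry"))   # True
```

A hard filter like this only removes candidates; choosing among the remaining splits would still need lexical or statistical information of the kind discussed under lexical recognition.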
The sentence "Five women left", which could be phonetically transcribed as [faɪvwɪmɘnlɛft], is marked since neither /vw/ in /faɪvwɪmɘn/ nor /nl/ in /wɪmɘnlɛft/ is allowed as a syllable onset or coda in English phonotactics. These phonotactic cues often allow speakers to easily distinguish the boundaries in words.

Vowel harmony in languages like Finnish can also serve to provide phonotactic cues. While the system does not allow front vowels and back vowels to exist together within one morpheme, compounds allow two morphemes to maintain their own vowel harmony while coexisting in a word. Therefore, in compounds such as "selkä/ongelma" ('back problem') where vowel harmony is distinct between two constituents in a compound, the boundary will be wherever the switch in harmony takes place, between the "ä" and the "o" in this case. Still, there are instances where phonotactics may not aid in segmentation. Words with unclear clusters or uncontrasted vowel harmony as in "opinto/uudistus" ('student reform') do not offer phonotactic clues as to how they are segmented. From the perspective of the whole-word model, however, these words are thought to be stored as full words, so the constituent parts would not necessarily be relevant to lexical recognition.

== In infants and non-natives ==

Infants are one major focus of research in speech segmentation. Since infants have not yet acquired a lexicon capable of providing extensive contextual clues or probability-based word searches within their first year, as mentioned above, they must often rely primarily upon phonotactic and rhythmic cues (with prosody being the dominant cue), all

  • Social computing

    Social computing is an area of computer science that is concerned with the intersection of social behavior and computational systems. It is based on creating or fostering existing social conventions and social contexts through the use of software and technology. Blogs, email, instant messaging, social network services, wikis, social bookmarking and other instances of what is often called social software illustrate ideas from social computing. The rise in social computing is attributed to the prevalence of personal devices and increased overall computing power. This enables a growing number of users to participate in sharing content and interact with one another.

== Definitions ==

Humans, and human behavior, are profoundly social. Humans tend to orient to one another and develop abilities to interact with each other and other species. This ranges from expression and gesture through spoken, written, and body language. Humans are influenced by the behavior of those around them and can rely on social context and cues to make decisions. An example of a behavior relying on social contexts is applauding at the end of a play. This is based on the context that the show ended, and other audience members are applauding. Social information provides a basis for inferences, planning, and coordinating activity.

== Examples ==

Common tools include blogs, email, instant messaging, social networking sites, wikis, and social bookmarking platforms. These technologies enable users to generate content, share knowledge, and interact in real time.

== Applications ==

The rise of social computing has highlighted opportunities for businesses. Businesses are interacting on social computing platforms and investing in facilities to support and research social computing. Business models can leverage the massive customer bases that accumulate through social computing channels. Some organizations have started their own blogs and networks (McAfee, 2006; Joe, 2005).
Organizations from diverse industry sectors, such as Google, Cisco, and Fox, have sought to acquire or invest in successful social computing enterprises. A business blog can serve as a source of information and promotion for the company. This allows the company to share content about the company and its initiatives. Businesses have also used social computing to market themselves and interact with customers. A notable example is Wendy's with their X (formerly Twitter) account. The account was primarily used to promote the business and interact with users in a playful or meaningful way. E-commerce web sites have allowed users to leave reviews and feedback on purchases, which has improved the online shopping experience for sellers and consumers. As another example of social computing’s business applications, many e-commerce web sites have adopted online product/vendor feedback/reputation systems. Such systems provide an asynchronous platform for the consumer community to share experiences collectively and influence their purchasing behavior. They also provide a vehicle for eliciting feedback information valuable to the vendors and e-commerce site operators. Consumers can use the feedback systems to make a more educated choice on a purchase by comparing reviews between products or vendors. Sellers can track consumer behaviors and trends regarding a product and adjust their supply according to the demand.

== Challenges and criticism ==

Social computing raises several concerns related to privacy, data security, and algorithmic bias. The widespread collection and analysis of user-generated data can lead to ethical dilemmas, especially when users are unaware of how their information is used. Critics also highlight issues of digital labor, surveillance, and the spread of misinformation, which can influence public opinion and social dynamics.

=== Term appearance ===

The term appeared in the mid 1990s after technology advancements and development of the web.
In 1994, the concept of social computing was first proposed by Schuler. He thought, "Social computing is a computing application, with software as the medium or focus of social relationships."

=== Premise ===

The premise of social computing is that it is possible to design digital systems that support useful functionality by making socially produced information available to their users. This information may be provided directly, as when systems show the number of users who have rated a review as helpful or not. Or the information may be provided after being filtered and aggregated, as is done when systems recommend a product based on what else people with similar purchase history have purchased. Alternatively, the information may be provided indirectly, as is the case with Google's PageRank algorithm, which orders search results based on the number of pages that (recursively) point to them. In all of these cases, information that is produced by a group of people is used to provide or enhance the functioning of a system. Social computing is concerned with systems of this sort and the mechanisms and principles that underlie them.

Social computing can be defined as follows: "Social Computing" refers to systems that support the gathering, representation, processing, use, and dissemination of information that is distributed across social collectivities such as teams, communities, organizations, and markets. Moreover, the information is not "anonymous" but is significantly precise because it is linked to people, who are in turn linked to other people. More recent definitions, however, have foregone the restrictions regarding anonymity of information, acknowledging the continued spread and increasing pervasiveness of social computing. As an example, Hemmatazad, N. (2014) defined social computing as "the use of computational devices to facilitate or augment the social interactions of their users, or to evaluate those interactions in an effort to obtain new information."
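The indirect case mentioned in the premise, ordering pages by how many pages recursively point to them, can be made concrete with a short power-iteration sketch in the spirit of PageRank. It is a textbook-style approximation, not Google's production algorithm: the function name `page_rank`, the damping value of 0.85, the fixed iteration count, and the four-page `toy_web` graph are all assumptions chosen for illustration.

```python
def page_rank(links, damping=0.85, iterations=100):
    """Power-iteration sketch. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outgoing links: spread its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, for illustration only.
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"]}
ranks = page_rank(toy_web)
best = max(ranks, key=ranks.get)
print(best)  # c
```

Page "c" ranks first because three pages link to it, and "d" comes second because its single inbound link comes from the highly ranked "c", which is the recursive effect the text describes.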
Social computing has to do with supporting "computations" that are carried out by groups of people, an idea that has been popularized in James Surowiecki's book, The Wisdom of Crowds. Examples of social computing in this sense include collaborative filtering, online auctions, reputation systems, computational social choice, tagging, and verification games. The social information processing page focuses on this sense of social computing.

== History ==

=== Technology infrastructure ===

Users were able to interact more with websites after the development of Web 2.0. This was an advancement from Web 1.0. Cormode, G. and Krishnamurthy, B. (2008) note that "content creators were few in Web 1.0 with the vast majority of users simply acting as consumers of content." Web 2.0 provided functionalities that allowed for low-cost web-hosting services and introduced features with browser windows that used basic information structure and expanded it to as many devices as possible using HTTP, or Hypertext Transfer Protocol.

Sometimes referred to as "Enterprise 2.0", a term derived from Web 2.0, social software for enterprise generally refers to the use of social computing in corporate intranets and in other medium- and large-scale business environments. It consisted of a class of tools that allowed for networking and social changes to businesses at the time. It was a layering of the business tools on Web 2.0 and brought forth several applications and collaborative software with specific uses.

Finance: Electronic negotiation, which first came up in 1969 and was adapted over time to suit financial markets networking needs, represents an important and desirable coordination mechanism for electronic markets. Negotiation between agents (software agents as well as humans) allows cooperative and competitive sharing of information to determine a proper price.
Recent research and practice have also shown that electronic negotiation is beneficial for the coordination of complex interactions among organizations. Electronic negotiation has recently emerged as a very dynamic, interdisciplinary research area covering aspects from disciplines such as Economics, Information Systems, Computer Science, Communication Theory, Sociology and Psychology.

Social computing has become more widely known because of its relationship to a number of recent trends. These include the growing popularity of social software and Web 3.0, increased academic interest in social network analysis, the rise of open source as a viable method of production, and a growing conviction that all of this can have a profound impact on daily life. A February 13, 2006 paper by market research company Forrester Research suggested that:

=== Developments ===

PLATO was one of the earliest examples of social computing in a live production environment with initially hundreds and soon thousands of users. The PLATO computer system was developed by the University of Illinois at Urbana-Champaign in the 1960s. In the 70s, the system supported social software applications for multi-us

  • Data thinking

    Data Thinking is a framework that integrates data science with the design process. It combines computational thinking, statistical thinking, and domain-specific knowledge to guide the development of data-driven solutions in product development. The framework is used to explore, design, develop, and validate solutions, with a focus on user experience and data analytics, including data collection and interpretation. The framework aims to apply data literacy and inform decision-making through data-driven insights.

== Major components ==

According to "Computational thinking in the era of data science": Data thinking involves understanding that solutions require both data-driven and domain-knowledge-driven rules. Data thinking evaluates whether data accurately represents real-life scenarios and improves data collection where necessary. The framework highlights the importance of preserving domain-specific meaning during data analysis. Data thinking incorporates statistical and logical analysis to identify patterns and irregularities. Data thinking involves testing solutions in real-life contexts and iteratively improving models based on new data. The process requires evaluating problems from multiple abstraction levels and understanding the potential for biases in generalizations.

== Major phases ==

=== Strategic context and risk analysis ===

Analyzing the broader digital strategy and assessing risks and opportunities is a common step before beginning a project. Techniques like coolhunting, trend analysis, and scenario planning can be used to assist with this.

=== Ideation and exploration ===

In this phase, focus areas are identified, and use cases are developed by integrating organizational goals, user needs, and data requirements. Design thinking methods, such as personas and customer journey mapping, are applied.
=== Prototyping ===

A proof of concept is created to test feasibility and refine solutions through iterative evaluation to optimize for effective performance.

=== Implementation and monitoring ===

Solutions are tested and monitored for performance and continual improvement.

== Implementing Data Thinking ==

The following resources explain more about data thinking and its applications: "Data Thinking: Framework for data-based solutions" by StackFuel; "What is Data Thinking? A modern approach to designing a data strategy" by Mantel Group; and "Data Science Thinking" by SpringerLink. These sources provide detailed insights into the methodology, phases, and benefits of adopting Data Thinking in organizational processes.

  • Multiple encryption

    Multiple encryption is the process of encrypting an already encrypted message one or more times, either using the same or a different algorithm. It is also known as cascade encryption, cascade ciphering, cipher stacking, and superencipherment. Superencryption refers to the outer-level encryption of a multiple encryption. Some cryptographers, like Matthew Green of Johns Hopkins University, say multiple encryption addresses a problem that mostly doesn't exist: "Modern ciphers rarely get broken... You’re far more likely to get hit by malware or an implementation bug than you are to suffer a catastrophic attack on Advanced Encryption Standard (AES)." However, the same quote also supplies an argument for multiple encryption, namely poor implementation. Using two different cryptomodules and keying processes from two different vendors requires both vendors' wares to be compromised for security to fail completely.

== Independent keys ==

Picking any two ciphers, if the key used is the same for both, the second cipher could possibly undo the first cipher, partly or entirely. This is true of ciphers where the decryption process is exactly the same as the encryption process (a reciprocal cipher): the second cipher would completely undo the first. If an attacker were to recover the key through cryptanalysis of the first encryption layer, the attacker could possibly decrypt all the remaining layers, assuming the same key is used for all layers. To prevent that risk, one can use keys that are statistically independent for each layer (e.g. independent RNGs). Ideally each key should have separate and different generation, sharing, and management processes.

== Independent Initialization Vectors ==

For en/decryption processes that require sharing an Initialization Vector (IV) / nonce, these are typically shared openly or made known to the recipient (and everyone else).
It is good security policy never to provide the same data in both plaintext and ciphertext when using the same key and IV. Therefore, it is recommended (although at this moment without specific evidence) to use separate IVs for each layer of encryption.

== Importance of the first layer ==

With the exception of the one-time pad, no cipher has been theoretically proven to be unbreakable. Furthermore, some recurring properties may be found in the ciphertexts generated by the first cipher. Since those ciphertexts are the plaintexts used by the second cipher, the second cipher may be rendered vulnerable to attacks based on known plaintext properties (see references below). This is the case when the first layer is a program P that always adds the same string S of characters at the beginning (or end) of all ciphertexts (commonly known as a magic number). When found in a file, the string S allows an operating system to know that the program P has to be launched in order to decrypt the file. This string should be removed before adding a second layer.

To prevent this kind of attack, one can use the method provided by Bruce Schneier: Generate a random pad R of the same size as the plaintext. Encrypt R using the first cipher and key. XOR the plaintext with the pad, then encrypt the result using the second cipher and a different (!) key. Concatenate both ciphertexts in order to build the final ciphertext. A cryptanalyst must break both ciphers to get any information. This will, however, have the drawback of making the ciphertext twice as long as the original plaintext.

Note, however, that a weak first cipher may merely make a second cipher that is vulnerable to a chosen plaintext attack also vulnerable to a known plaintext attack. However, a block cipher must not be vulnerable to a chosen plaintext attack to be considered secure. Therefore, the second cipher described above is not secure under that definition, either. Consequently, both ciphers still need to be broken.
The attack illustrates why strong assumptions are made about secure block ciphers, and why ciphers that are even partially broken should never be used.

== The Rule of Two ==

The Rule of Two is a data security principle from the NSA's Commercial Solutions for Classified Program (CSfC). It specifies two completely independent layers of cryptography to protect data. For example, data could be protected by both hardware encryption at its lowest level and software encryption at the application layer. It could mean using two FIPS-validated software cryptomodules from different vendors to en/decrypt data. The importance of vendor and/or model diversity between the layers of components centers around removing the possibility that the manufacturers or models will share a vulnerability. This way, if one component is compromised, there is still an entire layer of encryption protecting the information at rest or in transit. The CSfC Program offers solutions to achieve diversity in two ways. "The first is to implement each layer using components produced by different manufacturers. The second is to use components from the same manufacturer, where that manufacturer has provided NSA with sufficient evidence that the implementations of the two components are independent of one another." The principle is practiced in the NSA's secure mobile phone called Fishbowl. The phones use two layers of encryption protocols, IPsec and Secure Real-time Transport Protocol (SRTP), to protect voice communications. The Samsung Galaxy S9 Tactical Edition is also an approved CSfC Component.
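The construction attributed to Bruce Schneier in "Importance of the first layer" (random pad R, first cipher over R, second cipher over the plaintext XORed with R, both outputs concatenated) can be sketched as follows. The two "ciphers" here are toy hash-based keystreams built from Python's `hashlib`, chosen only so the example runs with the standard library; they are stand-ins and an assumption of this sketch, not vetted ciphers, and the function names are hypothetical.

```python
import hashlib
import secrets


def _keystream(hash_name, key, length):
    """Toy keystream from hash(key || counter). A stand-in, NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.new(hash_name, key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cipher_one(key, data):
    # Reciprocal toy cipher: the same call encrypts and decrypts.
    return xor_bytes(data, _keystream("sha256", key, len(data)))


def cipher_two(key, data):
    return xor_bytes(data, _keystream("blake2b", key, len(data)))


def encrypt(plaintext, key1, key2):
    pad = secrets.token_bytes(len(plaintext))          # random pad R
    first = cipher_one(key1, pad)                      # R under the first cipher
    second = cipher_two(key2, xor_bytes(plaintext, pad))  # (P xor R) under the second
    return first + second                              # twice the plaintext length


def decrypt(ciphertext, key1, key2):
    half = len(ciphertext) // 2
    pad = cipher_one(key1, ciphertext[:half])
    masked = cipher_two(key2, ciphertext[half:])
    return xor_bytes(masked, pad)


# Independent keys for the two layers; a toy keystream like this must
# never be reused with the same key for a second message.
key1, key2 = secrets.token_bytes(32), secrets.token_bytes(32)
message = b"attack at dawn"
sealed = encrypt(message, key1, key2)
print(len(sealed) == 2 * len(message))         # True
print(decrypt(sealed, key1, key2) == message)  # True
```

Recovering the plaintext requires both the pad (protected by the first layer) and the masked plaintext (protected by the second), which is why both layers must be broken, at the cost of doubling the ciphertext length as noted above.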

  • Active learning (machine learning)

    Active learning is a special case of machine learning in which a learning algorithm can interactively query a human user (or some other information source) to label new data points with the desired outputs. The human user must possess expertise in the problem domain, including the ability to consult authoritative sources when necessary. In statistics literature, it is sometimes also called optimal experimental design. The information source is also called teacher or oracle. There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, learning algorithms can actively query the teacher for labels. Since the learner chooses the examples, the number of examples to learn a concept can often be much lower than the number required in normal supervised learning. However, there is a risk that the algorithm is overwhelmed by uninformative examples. Recent developments are dedicated to multi-label active learning, hybrid active learning and active learning in a single-pass (on-line) context, combining concepts from the field of machine learning (e.g. conflict and ignorance) with adaptive, incremental learning policies in the field of online machine learning. Using active learning allows for faster development of a machine learning algorithm, when comparative updates would require a quantum or super computer. Large-scale active learning projects may benefit from crowdsourcing frameworks such as Amazon Mechanical Turk that include many humans in the active learning loop.

== Definitions ==

Let T be the total set of all data under consideration. For example, in a protein engineering problem, T would include all proteins that are known to have a certain interesting activity and all additional proteins that one might want to test for that activity. During each iteration, i, T is broken up into three subsets:

$\mathbf{T}_{K,i}$: Data points where the label is known.
$\mathbf{T}_{U,i}$: Data points where the label is unknown.
$\mathbf{T}_{C,i}$: A subset of $\mathbf{T}_{U,i}$ that is chosen to be labeled.

Most of the current research in active learning involves the best method to choose the data points for $\mathbf{T}_{C,i}$.

== Scenarios ==

Pool-based sampling: In this approach, which is the best-known scenario, the learning algorithm attempts to evaluate the entire dataset before selecting data points (instances) for labeling. It is often initially trained on a fully labeled subset of the data using a machine-learning method such as logistic regression or SVM that yields class-membership probabilities for individual data instances. The candidate instances are those for which the prediction is most ambiguous. Instances are drawn from the entire data pool and assigned a confidence score, a measurement of how well the learner "understands" the data. The system then selects the instances for which it is the least confident and queries the teacher for the labels. The theoretical drawback of pool-based sampling is that it is memory-intensive and is therefore limited in its capacity to handle enormous datasets, but in practice, the rate-limiting factor is that the teacher is typically a (fatiguable) human expert who must be paid for their effort, rather than computer memory.

Stream-based selective sampling: Here, each consecutive unlabeled instance is examined one at a time with the machine evaluating the informativeness of each item against its query parameters. The learner decides for itself whether to assign a label or query the teacher for each datapoint. In contrast with pool-based sampling, the obvious drawback of stream-based methods is that the learning algorithm does not have sufficient information, early in the process, to make a sound assign-label-versus-ask-teacher decision, and it does not capitalize as efficiently on the presence of already labeled data.
Therefore, the teacher is likely to spend more effort in supplying labels than with the pool-based approach.

Membership query synthesis: This is where the learner generates synthetic data from an underlying natural distribution. For example, if the dataset consists of pictures of humans and animals, the learner could send a clipped image of a leg to the teacher and query if this appendage belongs to an animal or human. This is particularly useful if the dataset is small. The challenge here, as with all synthetic-data-generation efforts, is in ensuring that the synthetic data is consistent in terms of meeting the constraints on real data. As the number of variables/features in the input data increases, and strong dependencies between variables exist, it becomes increasingly difficult to generate synthetic data with sufficient fidelity. For example, to create a synthetic data set for human laboratory-test values, the sum of the various white blood cell (WBC) components in a white blood cell differential must equal 100, since the component numbers are really percentages. Similarly, the enzymes alanine transaminase (ALT) and aspartate transaminase (AST) measure liver function (though AST is also produced by other tissues, e.g., lung, pancreas). A synthetic data point with AST at the lower limit of normal range (8–33 units/L) with an ALT several times above normal range (4–35 units/L) in a simulated chronically ill patient would be physiologically impossible.

== Query strategies ==

Algorithms for determining which data points should be labeled can be organized into a number of different categories, based upon their purpose:

Balance exploration and exploitation: the choice of examples to label is seen as a dilemma between the exploration and the exploitation over the data space representation. This strategy manages this compromise by modelling the active learning problem as a contextual bandit problem. For example, Bouneffouf et al.
propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for this sample point's label. Expected model change: label those points that would most change the current model. Expected error reduction: label those points that would most reduce the model's generalization error. Exponentiated Gradient Exploration for Active Learning: a sequential algorithm named exponentiated gradient (EG)-active has been proposed that can improve any active learning algorithm by an optimal random exploration. Uncertainty sampling: label those points for which the current model is least certain as to what the correct output should be. Query by committee: a variety of models are trained on the current labeled data, and vote on the output for unlabeled data; label those points for which the "committee" disagrees the most. Querying from diverse subspaces or partitions: When the underlying model is a forest of trees, the leaf nodes might represent (overlapping) partitions of the original feature space. This offers the possibility of selecting instances from non-overlapping or minimally overlapping partitions for labeling. Variance reduction: label those points that would minimize output variance, which is one of the components of error. Conformal prediction: predicts that a new data point will have a label similar to old data points in some specified way, and the degree of similarity within the old examples is used to estimate the confidence in the prediction. Mismatch-first farthest-traversal: The primary selection criterion is the prediction mismatch between the current model and nearest-neighbour prediction. It targets wrongly predicted data points. The second selection criterion is the distance to previously selected data, the farthest first. It aims at optimizing the diversity of selected data. 
User-centered labeling strategies: Learning is accomplished by applying dimensionality reduction to graphs and figures like scatter plots. Then the user is asked to label the compiled data (categorical, numerical, relevance scores, relation between two instances). A wide variety of algorithms have been studied that fall into these categories. While the traditional AL strategies can achieve remarkable performance, it is often challenging to predict in advance which strategy is the most suitable in a particular situation. In recent years, meta-learning algorithms have been gaining in popularity. Some of them have been proposed to tackle the problem of learning AL strategies instead of relying on manually designed strategies. A benchmark that compares "meta-learning approaches to active learning" with "traditional heuristic-based active learning" may indicate whether "learning active learning" is at the crossroads. == Minimum marginal hyperplane == Some active learning algorithms are built upon support-vector machines (SVMs) and exploit the structure of the SVM to determine which data points to label. Such methods usually calculate the margin, W, of each u
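The selection step shared by pool-based sampling and uncertainty sampling, as described in this excerpt, can be illustrated with a minimal Python sketch. It assumes a classifier has already produced class-membership probabilities for every instance in the unlabeled pool; the function name `least_confident` and the probability values are hypothetical, chosen only for illustration, and "least confidence" is just one of several possible uncertainty measures.

```python
from typing import List, Sequence


def least_confident(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool instances with the lowest top-class probability.

    Each row of `probabilities` holds the class-membership probabilities the
    current model assigns to one unlabeled instance (an assumed input format).
    """
    # Confidence score: the probability of the most likely class.
    confidence = [max(row) for row in probabilities]
    # Rank pool indices from least to most confident.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical predictions for a pool of four unlabeled instances (two classes).
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]

# These instances would be sent to the teacher (oracle) for labeling.
to_label = least_confident(pool_probs, k=2)
print(to_label)
```

In a full loop, the newly labeled instances would move from the unlabeled set to the labeled set, the model would be retrained, and the selection would be repeated until the labeling budget is exhausted.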

    Read more →
  • Simply Local

    Simply Local

    Simply Local is a decentralized community social networking and neighborhood broadcasting service developed by Simply Local, based in New Delhi. The app is used as a tool by residents to bridge the information gap and know what is happening in the locality. Simply Local creates private geo-fenced networks for people living in an area and provides social and community related services within that network. The user doesn’t post to a single person but broadcasts to a chosen community. One of its primary purposes is also to connect citizens to their elected representatives. Each community is independent of the other and information shared remains telescoped to that particular community. The app has been designed to maintain privacy and security of users and provides decentralized social networking in the sense that it forms an owner-independent, micro community, which is not connected with the world outside. Simply Local is available on Android Play and iOS App Store. It is available in two languages - English and Hindi. Simply Local’s founder and CEO is Nikhil Bapna. == History == 2020 May: Included as a Top 5 Useful App by Zee News. 2020: Used to connect candidates with local residents during the Delhi assembly elections. 2019: Renamed from Gadfly to its current name. 2018: Used for Karnataka State Elections to get detailed information on candidates. 2017: Launched under the name Gadfly as a tool to connect citizens with their elected representatives.

    Read more →
  • Coreu

    Coreu

    COREU (French: Correspondance Européenne – Telex network of European correspondents, also EUKOR-Netzwerk in Austria) is a communication network of the European Union for the communication of the Council of the European Union, the European correspondents of the foreign ministries of the EU member states, permanent representatives of member states in Brussels, the European Commission, and the General Secretariat of the Council of the European Union. The European Parliament is not among the participants. COREU is the European equivalent of the American Secret Internet Protocol Router Network (SIPRNet, also known as Intelink-S). COREU's official aim is fast communication in case of crisis. The network enables closer cooperation in matters regarding foreign affairs. In practice, the system's function exceeds that of mere communication: it also enables decision-making. COREU's first goal is to enable the exchange of information before and after decisions. Relaying upfront negotiations in preparation for meetings is the second goal. In addition, the system allows the editing of documents and decision-making, especially when time is short. While the first two goals are preparatory measures for a shared foreign policy, the third is a methodical variant, shaped by practice, that is defining for the image of the Common Foreign and Security Policy. == Members == (The following information dates from 2013): There is one representative in each of the capital cities in the EU (since 1973). In Germany, for example, this is the European correspondent (EU-KOR) from the Foreign Office. 
In Austria it is the European correspondent from the Referat II.1.a in the Federal Ministry for Europe, Integration and Foreign Affairs. They are the correspondents (since 1982) for the European Commission. They comprise the secretariat for the European Council. They also make up the European External Action Service (EEAS) (responsible for foreign policy issues, since 1987). == Data volume and technical details == COREU follows a spoke-hub distribution paradigm with the hub in Brussels. The network is operated by the European Union Intelligence and Situation Centre (formerly Joint Situation Center, JSC). The technical infrastructure is located in a building of the European Council. COREU may be described as an advanced telex system with encrypted messages via dedicated terminals. Once a message has reached the destination, it is redistributed via the local media. In the other direction, messages from governments are transmitted via local media to the correspondents and from there delivered point-to-point to Brussels via COREU. In 2010, approximately 8500 communications were distributed over this network. == History == A telex-based communication system under the name COREU was established in 1973. Originally, only the ministries of Foreign Affairs in the European capitals were connected to it. This telex system was replaced in 1997 by the mail system CORTESY (COREU Terminal Equipment System). The name was retained despite the technical innovation. COREU was reportedly compromised by hackers working for the People's Liberation Army Strategic Support Force, allowing for the theft of thousands of low-classified documents and diplomatic cables.

    Read more →
  • Information Networking Institute

    Information Networking Institute

    Information Networking Institute (INI) is an academic department within the College of Engineering at Carnegie Mellon University. The institute was established in 1989 as the nation's first research and education center devoted to information networking. The INI also partners with research and outreach entities to extend educational and training programs to a broad audience of people using information networking as part of their daily lives. The INI is the educational partner of Carnegie Mellon CyLab, a university-wide, multidisciplinary research center involving more than 50 faculty and 100 graduate students. == Center of Academic Excellence Designations == Through the work of the INI and CyLab, Carnegie Mellon University has been designated by the National Security Agency and the Department of Homeland Security as a National Center of Academic Excellence in Information Assurance/Cyber Defense Education (CAE-IA/CD) and a National Center of Academic Excellence in Information Assurance/Cyber Defense Research (CAE-R). It has also been designated by the NSA and the U.S. Cyber Command as a National Center of Academic Excellence in Cyber Operations (CAE-Cyber Ops). Through these designations, the INI and CyLab participate in the: Federal CyberCorps Scholarship for Service (SFS) Program - Students pursuing graduate degrees in information security (MSIS or MSISPM) are eligible for scholarships under the SFS program. Information Assurance Scholarship Program (IASP) - Students pursuing graduate degrees in information security and seeking careers with the Department of Defense may be eligible for scholarships under the IASP. 
Capacity Building Program for Faculty from Historically Black and Hispanic Serving Institutions - The INI and CyLab developed a month-long, in-residence summer program to help build information assurance education and research capacity at colleges and universities designated as Minority Serving Institutions – specifically, Historically Black Colleges and Universities (HBCUs) and Hispanic Serving Institutions (HSIs). This program is supported through a grant from the National Science Foundation. == Faculty and researchers == Faculty involved in teaching and advising in the INI programs are conducting research in all aspects of information networking and information security. Affiliated research centers are Carnegie Mellon CyLab and SEI's CERT Division. == Alumni == The INI has graduated over 1,400 alumni who currently occupy positions in a variety of sectors across industry, government and academia.

    Read more →
  • Visual networking

    Visual networking

    Visual networking refers to an emerging class of user applications that combine digital video and social networking capabilities. It is based upon the premise that visual literacy, "the ability to interpret, negotiate and make meaning from information presented in the form of a moving image", is a powerful force in how humans communicate, entertain and learn. Visual networking is characterized by duality: it subsumes entertainment and communications, professional and personal content, video and other digital media, and data networks and social networks to create immersive experiences when, where and how the user wants them. These applications have changed video content from long-form movies and broadcast television programming to a database of segments or "clips", and social network annotations. The generation and distribution of content also takes on a new dimension with Web 2.0 applications: participatory social networks or communities that facilitate interactive creativity, collaboration and sharing between users. == History == The rise of visual networking is a relatively recent phenomenon driven by the emergence of social networking capabilities and the ability to deliver interactive video over a broadband network. It is a natural evolution of the current social networking phenomenon, whereby social networking annotations are layered over broadband video to create highly interactive and immersive experiences between individuals and their content. Until early 2005 this was not considered viable due to the lack of web and broadband infrastructure designed to support the transmission of web video and the still nascent stage of social networks like MySpace and Facebook. The introduction of YouTube in February 2005 marked the first significant combination of broadband video and social network systems designed to allow users to share, rate and tag user-generated and premium content. 
From 2006 to 2008 this trend continued to gain steam as individuals and businesses pursued new combinations of video and social networking across a wide range of entertainment, communication and learning applications. == Broadband video takes off == Video has largely been defined by its use as an entertainment medium. Since the commercial availability of the television in the late '30s video has become the dominant entertainment medium far eclipsing audio and text based entertainment both in terms of time and dollars spent. Within the past decade, video use has rapidly evolved across a broader range of devices, multiple locations and user applications. The popularization of the long-tail and user-generated video has further challenged people's ideas of what's possible with video. A key advantage of video relative to other media is its superior ability to communicate ideas and emotions economically. If a picture is worth a thousand words, then a video may be worth a thousand pictures. Video by its very nature is highly experiential, making communications more compelling, informative and memorable. == Social networking meets video == At the core of visual networking is the concept that people can participate in communities of content and communities of interest. A community of interest is defined as a community of people who share a common interest or passion. These people exchange ideas and thoughts about the given passion, but may know (or care) little about each other outside of this area. Participation in a community of interest can be compelling, entertaining and create a ‘sticky’ community where people return frequently and remain for extended periods. The unparalleled potential of the Internet to promote such connections is only now being fully recognized and exploited, through Web-based groups established for that purpose. 
Based on the six degrees of separation concept (the idea that any two people on the planet could make contact through a chain of no more than five intermediaries), social networking establishes interconnected Internet communities (sometimes known as personal networks) that help people make contacts that would be good for them to know, but that they would be unlikely to have met otherwise. == Transition from search to discovery == The phrase The Long Tail was, according to Chris Anderson, coined by him in October 2004. Anderson argued that products that are in low demand or have low sales volume can collectively make up a market share that rivals or exceeds the relatively few current bestsellers and blockbusters, if the store or distribution channel is large enough. The Long Tail also has implications for the producers of content, especially those whose products could not, for economic reasons, find a place in pre-Internet information distribution channels controlled by book publishers, record companies, movie studios, and television networks. Looked at from the producers' side, the Long Tail has made possible a flowering of creativity across all fields of human endeavor. One example of this is YouTube, where thousands of diverse videos, whose content, production value or lack of popularity make them inappropriate for traditional television, are easily accessible to a wide range of viewers. The benefit to consumers is that they now have an almost infinite choice of content to select from and are able to create their own specific channels based upon their unique needs. A potential negative side effect of the long tail is the rapidly growing inventory of text, audio and video content. The storage and distribution systems of the past restricted the number of songs, videos, and books, making it easier to search for what was relevant to the individual. 
As the long tail has grown, more and more relevant and irrelevant content passes an individual by without their knowledge. This is especially true for video because, unlike text-based files, which can be searched and indexed for easy finding, video typically has only its title as a clue to what is in it. This lack of comprehensive metadata has limited the applicability of traditional search models. Traditional search has been augmented by the emergence of content-based discovery tools that make people aware of relevant content based upon their participation in communities of interest and/or communities of content. The idea is that users may or may not start out searching for something, but they soon begin reacting to things they find, exploring links on pages they stumble upon and taking cues from fellow surfers about where to go. Instead of the old, passive, lean-back style of watching video, viewers are actively seeking content through discovery. People interact with each other, posting comments on what they just saw. Many sites now allow people to vote on videos, ranking and rating them. Ranking is the result of one of a number of algorithms that measure how many people have watched something or how many sites link to it. == Early examples == YouTube is the best early example of a visual networking experience. YouTube is a video sharing website where users can upload, view and share video clips. Unregistered users can watch most videos on the site, while registered users are permitted to upload an unlimited number of videos. Few statistics are publicly available regarding the number of videos on YouTube. However, in July 2006, the company revealed that more than 100 million videos were being watched every day, and 2.5 billion videos were watched in June 2006. 50,000 videos were being added per day in May 2006, and this increased to 65,000 by July. In January 2008 alone, nearly 79 million users watched over 3 billion videos on YouTube. 
Telepresence refers to a set of technologies which allow a person to feel as if they were present, to give the appearance that they were present, or to have an effect, at a location other than their true location. Telepresence requires that the senses of the user, or users, are provided with such stimuli as to give the feeling of being in that other location. Additionally, the user(s) may be given the ability to affect the remote location. In this case, the user's position, movements, actions, voice, etc. may be sensed, transmitted and duplicated in the remote location to bring about this effect. Therefore, information may be traveling in both directions between the user and the remote location. Critical to creating an in-person experience is the presence of high-definition video perfectly synchronized with stereophonic sound. A minimum system usually includes visual feedback. Ideally, the entire field of view of the user is filled with a view of the remote location, and the viewpoint corresponds to the movement and orientation of the user's head. In this way, it differs from television or cinema, where the viewpoint is out of the control of the viewer. == Other applications == While still in their infancy, visual networking applications are beginning to emerge that span both consumer and business markets. === Mobile video === Proliferation of multi-function mobile devices, particularl

    Read more →
  • Peñabot

    Peñabot

    Peñabot is the nickname for automated social media accounts allegedly used by the Mexican government of Enrique Peña Nieto and the PRI political party to keep unfavorable news from reaching the Mexican public. Peñabot accusations are related to the broader issue of fake news in the 21st century. == History of disinformation in Mexican politics == The PRI political party has been reported to use fake news since before Peña Nieto. The main tactic originally was to spread such propaganda through open radio and television networks. This tactic was effective in Mexico because newspaper readership is low and cable TV is largely limited to the middle classes; consequently, the country's two major television networks – Televisa and TV Azteca – exert a significant influence in national politics. Televisa itself owns around two-thirds of the programming on Mexico's TV channels, making it not only Mexico's largest television network but also the largest media network in the Spanish-speaking world. == Peñabots == Analysts have given the name Peñabots to a suspected network of automated accounts on social media used by the Mexican government to spread pro-government propaganda and to marginalize dissenting opinions in social media. The bots were first noticed in the 2012 elections, when they were used to disseminate opinions in support of Enrique Peña Nieto on social networks such as Twitter and Facebook. According to Aristegui Noticias, their usage went against articles 6 and 134 of the Mexican Constitution. Those used by Peña Nieto's government cost an estimated 80 million pesos monthly, which news outlets argued only helped the government spread fake support for the president but brought no benefit to the Mexican people (with whom EPN was highly unpopular). Facebook held approximately 640,321 Peñabots, while Twitter had fewer. 
In July 2017, the Oxford Internet Institute's Computational Propaganda Research Project claimed that many Western democracies, Mexico included, perform social media manipulation, and that in Mexico's case the manipulation comes directly from the government itself. During Peña Nieto's subsequent presidency, analysts noted that Peñabots were used to overpower trending topics that critiqued the government, to flood trending government-critical hashtags with spam, to create fake trends by pushing alternative hashtags, and to push smear campaigns and threats against government-critical activists and journalists. Peñabots could be identified because their pattern of activity was distinct from that of ordinary interaction on social networks. === Meadebots === On Twitter it was reported that about 94% of the followers of the PRI's 2018 presidential candidate, Jose Antonio Meade, were bots. When Antonio Meade presented himself as a candidate for the 2018 presidential election, his social media accounts, such as "@MovimientoMEADE" (created by the PRI's official account @PRI_Nacional), obtained a huge quantity of followers in a short span of time. Some users noticed and drew attention to it, and after investigation it was reported that 94% of these followers were bots (702,000 out of 747,000); the account was eliminated from Twitter after 20 hours. The fake accounts used the hashtags #YoConMeade and #Meade18. It was further revealed that 25% of the followers of Meade's official Twitter account, @JoseAMeadeK, were bots (216,000 fake followers out of 981,000). == Manipulation of news media in Mexico, through television == The Mexican government of Peña Nieto has been accused of using various means to keep unfavorable news from reaching the Mexican people. Many Mexicans have protested this practice as it clearly goes against the freedom of speech. The PRI has been reported to use fake news since before Peña Nieto. The main tactic has been to spread such propaganda through radio and television. 
This tactic is perceived as effective in Mexico because newspaper readership is low and research on the Internet and cable TV are largely limited to the middle classes; consequently, the country's two major television networks – Televisa and TV Azteca – exert a significant influence on national politics. Televisa owns around two-thirds of the programming on Mexico's TV channels, making it not only Mexico's largest television network but also the largest media network in the Spanish-speaking world. In June 2012, before the 2012 Mexican presidential elections, the British newspaper The Guardian published a series of allegations claiming that Televisa sold favorable coverage to top politicians in its news and entertainment shows; this scandal became known as the Televisa controversy. The documents published by The Guardian alleged that a secretive circle within Televisa manipulated news coverage to favor PRI presidential candidate Enrique Peña Nieto, who was the favorite to win. Televisa's secret circle supposedly commissioned videos in 2009 to promote Peña Nieto and lash out at his political rivals. The Guardian documents suggest that Televisa's secret team distributed such videos by e-mail and posted them on Facebook and YouTube, where some can still be seen. Another document was a PowerPoint presentation with a slide explicitly aimed at the rival leftist candidate of the Party of the Democratic Revolution (PRD), Andrés Manuel López Obrador. It was supposedly given to The Guardian by a Televisa employee. The document's authenticity could never be confirmed; however, dates, names, and events largely coincide. Televisa refused to discuss the documents and denied any relationship with the PRI or its presidential candidate, saying that it had provided equal media coverage to all parties. Televisa published an article supposedly showing discrepancies in The Guardian documents and denying the accusations. 
Mexican citizens complained about the perceived favoritism towards Enrique Peña Nieto and the PRI, protesting through the Yo Soy 132 movement, which Televisa covered in detail. However, Televisa's news coverage is perceived to have been biased through a tactic Mexican citizens call cortinas de humo (smoke screens). A smoke screen gives extensive coverage to a news scandal in order to distract citizens from a potential conflict of interest or controversy that could damage the image of the politician favored by the network. An example of a perceived smoke screen would be the news coverage of "Caso Michoacán" and "Caso Paolette", which drew attention away from the parallel "Yo soy 132" movement. A few years later, on September 11, 2016, evidence of media manipulation by Televisa emerged when a Televisa news anchor, reading a teleprompter live on air, mistakenly read aloud the instruction "try that Jaime 'El Bronco' Rodríguez Calderón (Nuevo León's governor) is mentioned as little as possible". The newspaper El Universal caught it on video and published it on social media. Televisa did not mention the story and declined to comment. The lack of news coverage of Nuevo León's governor Jaime Rodríguez is attributed to his being the first elected governor not affiliated with any political party (an independent governor), and to the fact that, unlike the PRI governors preceding him, "El Bronco" does not spend money on publicity at all, preferring to communicate all news through social media such as Twitter and Facebook. While the incident may have proven Televisa's bias, nothing in it incriminated the PRI or Enrique Peña Nieto, though it did further the suspicion that Televisa manipulates news media. 
In contrast, a December 2017 article in The New York Times reported that Enrique Peña Nieto had spent about 2,000 million dollars on publicity during his first five years as president, the largest publicity budget ever spent by a Mexican president. Additionally, 68 percent of news journalists said they did not believe they had enough freedom of speech, and award-winning news reporter Carmen Aristegui was controversially fired shortly after revealing the Mexican White House scandals. == Violence and spying towards news journalists and civil rights activists == Far from only being accused of spreading fake news, the Mexican government of Enrique Peña Nieto (EPN) has also been accused of violence towards news journalists and of spying on them, as well as on civil rights leaders and their families. During his tenure as president, Peña Nieto has been accused of failing to protect news journalists, whose deaths are speculated to have been politically motivated, caused by politicians attempting to prevent them from covering political scandals. The New York Times published a news report on the matter titled "In Mexico it's easy to kill a journalist", which described how, during EPN's government, Mexico became one of the worst countries in which to be a journalist. The assassination of journalist Javier Valdez on May 23, 2017, received national coverage, with multiple news journalists

    Read more →
  • Social media surgery

    Social media surgery

    A social media surgery is a gathering at which volunteer "surgeons" with expertise in using web tools, chiefly social media, offer free advice in using such tools, to representatives ("patients") of non-profit organisations, charities, community groups and activists, with "no boring speeches or jargon". The idea was conceived by Pete Ashton, with Nick Booth of Podnosh Ltd, who ran the first such surgery in Birmingham, England, on 15 October 2008. In July 2009, a spin-off surgery (dubbed the "Social media mob") started in Mosman, Australia, and in January 2010, the first spin-off surgery in Africa was held. On 16 February 2012, it was announced that the Social Media Surgery movement had won "the Prime Minister’s Big Society Award". Prime Minister David Cameron said: This is an excellent initiative - such a simple idea and yet so effective. The popularity of these surgeries and the fact that they have inspired so many others across the country to follow in their footsteps, is testament to its brilliance. Congratulations to Nick and all the volunteers who have shared their time and expertise to help so many local groups make the most of the internet to support their community. A great example of the Big Society in action. The scheme also won the 2013 Adult Learners' Week "BBC Learning Through Technology Award".

    Read more →
  • MobileNet

    MobileNet

    MobileNet is a family of convolutional neural network (CNN) architectures designed for image classification, object detection, and other computer vision tasks. They are designed for small size, low latency, and low power consumption, making them suitable for on-device inference and edge computing on resource-constrained devices like mobile phones and embedded systems. They were originally designed to be run efficiently on mobile devices with TensorFlow Lite. The need for efficient deep learning models on mobile devices led researchers at Google to develop MobileNet. As of June 2025, the family has five versions, each improving upon the previous one in terms of performance and efficiency. == Features == === V1 === MobileNetV1 was published in April 2017. Its main architectural innovation was the incorporation of depthwise separable convolutions. The depthwise separable convolution was first developed by Laurent Sifre during an internship at Google Brain in 2013 as an architectural variation on AlexNet to improve convergence speed and model size. It decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently and a pointwise convolution (1 × 1 convolution) that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost. MobileNetV1 has two hyperparameters: a width multiplier α, which controls the number of channels in each layer (smaller values of α lead to smaller and faster models, at the cost of reduced accuracy), and a resolution multiplier ρ, which controls the input resolution of the images (lower resolutions result in faster processing but potentially lower accuracy). === V2 === MobileNetV2 was published in March 2019. It uses inverted residual layers and linear bottlenecks. 
Inverted residuals modify the traditional residual block structure. Instead of compressing the input channels before the depthwise convolution, they expand them. This expansion is followed by a 3 × 3 depthwise convolution and then a 1 × 1 projection layer that reduces the number of channels back down. This inverted structure helps to maintain representational capacity by allowing the depthwise convolution to operate on a higher-dimensional feature space, thus preserving more information flow during the convolutional process. Linear bottlenecks remove the typical ReLU activation function in the projection layers. This was rationalized by arguing that nonlinear activation loses information in lower-dimensional spaces, which is problematic when the number of channels is already small. === V3 === MobileNetV3 was published in 2019. The publication included MobileNetV3-Small, MobileNetV3-Large, and MobileNetEdgeTPU (optimized for Pixel 4). They were found by a form of neural architecture search (NAS) that takes mobile latency into account, to achieve a good trade-off between accuracy and latency. They used piecewise-linear approximations of the swish and sigmoid activation functions (called "h-swish" and "h-sigmoid"), squeeze-and-excitation modules, and the inverted bottlenecks of MobileNetV2. === V4 === MobileNetV4 was published in September 2024. The publication included a large number of architectures found by NAS. Inspired by Vision Transformers, the V4 series included multi-query attention. It also unified both inverted residual and inverted bottleneck from the V3 series with the "universal inverted bottleneck", which includes these two as special cases. === V5 === MobileNetV5's architecture was published shortly after the release of Gemma 3n in June 2025. While the announcement stated a technical report on MobileNetV5 would be available soon, this has not yet materialised. 
The network is 10 times larger than the largest V4 variant.
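
The cost saving from the depthwise separable factorization described under V1 can be illustrated with a short calculation. The sketch below counts multiply-add operations for one layer, assuming stride 1 and padding that keeps the output the same spatial size as the input; the function names and the example layer sizes are illustrative choices, not taken from any MobileNet codebase. The width multiplier α scales the channel counts and the resolution multiplier ρ scales the feature-map side length.

```python
"""Multiply-add counts for a standard vs. a depthwise separable convolution.

Assumptions (not from the text above): stride 1, output feature map has the
same spatial size as the input, and scaled sizes are truncated to integers.
"""


def standard_conv_mult_adds(kernel: int, in_ch: int, out_ch: int, feature: int) -> int:
    """Cost of one kernel x kernel standard convolution on a feature x feature map."""
    return kernel * kernel * in_ch * out_ch * feature * feature


def separable_conv_mult_adds(
    kernel: int,
    in_ch: int,
    out_ch: int,
    feature: int,
    alpha: float = 1.0,
    rho: float = 1.0,
) -> int:
    """Cost of a depthwise convolution followed by a 1 x 1 pointwise convolution."""
    m = int(alpha * in_ch)    # width multiplier thins the input channels
    n = int(alpha * out_ch)   # ... and the output channels
    f = int(rho * feature)    # resolution multiplier shrinks the feature map
    depthwise = kernel * kernel * m * f * f  # one filter per input channel
    pointwise = m * n * f * f                # 1 x 1 convolution mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 512 -> 512 channels, 14 x 14 feature map.
    std = standard_conv_mult_adds(3, 512, 512, 14)
    sep = separable_conv_mult_adds(3, 512, 512, 14)
    print(f"standard:  {std:,} multiply-adds")
    print(f"separable: {sep:,} multiply-adds")
    print(f"ratio:     {sep / std:.4f}")
    print(f"alpha=0.5: {separable_conv_mult_adds(3, 512, 512, 14, alpha=0.5):,}")
```

With α = ρ = 1, the ratio of separable to standard cost works out to 1/N + 1/K², where N is the number of output channels and K the kernel size, so a 3 × 3 kernel gives roughly an eight- to nine-fold reduction when N is large.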

    Read more →
  • Corporate surveillance

    Corporate surveillance

    Corporate surveillance describes the practice of businesses monitoring and extracting information from their users, clients, or staff. This information may consist of online browsing history, email correspondence, phone calls, location data, and other private details. Acts of corporate surveillance frequently look to boost results, detect potential security problems, or adjust advertising strategies. These practices have been criticized for violating ethical standards and invading personal privacy. Critics and privacy activists have called for businesses to incorporate rules and transparency surrounding their monitoring methods to ensure they are not misusing their position of authority or breaching regulatory standards. Monitoring can feel intrusive and give the impression that the business does not promote ethical behavior among its personnel. Staff satisfaction, productivity, and staff turnover may all suffer as a result of the invasion of privacy. == Monitoring methods == Employers may be authorized to gather information through keystroke logging and mouse tracking, which involves recording the keys individuals interact with and cursor position on computers. In cases where employment contracts permit it, they may also monitor webcam activity on company-provided computers. Employers may be able to view the emails sent from business accounts and may be able to see the websites visited when using a corporate internet connection. The screenshot capability is another tool that enables companies to see what remote workers are doing. This feature, which can be found in tracking software, takes screenshots throughout the day at predetermined or arbitrary intervals. Additionally, people who don't work in offices are observed. For instance, it has been claimed that Amazon has incorporated tracking technology to monitor warehouse staff and delivery drivers. 
== Use of collected information == Information collected by corporations can be used for a variety of purposes, including marketing research, targeted advertising, fraud detection and prevention, ensuring policy adherence, preventing lawsuits, and safeguarding records and company assets. == Privacy concerns == Concerns over corporate privacy have become more important due to companies' collection and manipulation of personal data. Since these practices were recognized, there has been rising concern about both the security and the possible mishandling of the accumulated data. Social media data collection and monitoring has been one of the areas of greatest concern regarding corporate surveillance. Recently, many employers on CareerBuilder have checked their potential candidates' social media activities before the hiring process. This approach can be defensible, since it is important to be aware of a future employee's or applicant's online presence and how it might affect the company's reputation in the future. This is crucial since employers are often held legally responsible for their workers' digital actions. These data can also be used for political gain. The Facebook-Cambridge Analytica data scandal in 2018 revealed that its British branch had surreptitiously sold American psychological data to the Trump campaign. This information was supposed to be private, but protecting user information had reportedly not been a top priority of Facebook at the time. == Laws and regulations == The National Labor Relations Act (NLRA) safeguards workplace democracy by giving workers in the private sector the basic freedom to demand better working conditions and choice of representation without fear of retaliation. The General Data Protection Regulation (GDPR) outlines the broad responsibilities of data controllers and the "processors" that handle personal data on their behalf. 
They must adopt the necessary security measures in accordance with the risk involved in the data processing operations they carry out. The Electronic Communications Privacy Act (ECPA), as amended, provides protection for electronic, oral, and wire communications while they are being created, while they are being sent, and while they are being stored on computers. Email, phone calls, and electronically stored data are covered by the Act. == Sale of customer data == Data collected on individuals and groups can be sold to other corporations as business intelligence, so that they can use it for the aforementioned purposes. It can be used for direct marketing purposes, such as targeted advertisements on Google and Yahoo. These ads are tailored to the individual user of the search engine by analyzing their search history and emails (if they use free webmail services). For example, the world's most popular web search engine stores identifying information for each web search. Google stores an IP address and the search phrase used in a database for up to 2 years. Google also scans the content of emails of users of its Gmail webmail service, in order to create targeted advertising based on what people are talking about in their personal email correspondence. Google is, by far, the largest web advertising agency. Its revenue model is based on receiving payments from advertisers for each page visit resulting from a visitor clicking on a Google AdWords ad, hosted either on a Google service or a third-party website. Millions of sites place Google's advertising banners and links on their websites, in order to share this profit from visitors who click on the ads. Each page containing Google advertisements adds, reads, and modifies cookies on each visitor's computer. These cookies track the user across all of these sites and gather information about their web surfing habits, keeping track of which sites they visit and what they do when they are on these sites. 
This information, along with the information from their email accounts, and search engine histories, is stored by Google to use for building a profile of the user to deliver better-targeted advertising. == Surveillance of workers == In 1993, David Steingard and Dale Fitzgibbons argued that modern management, far from empowering workers, had features of neo-Taylorism, where teamwork perpetuated surveillance and control. They argued that employees had become their own "thought police" and the team gaze was the equivalent of Bentham's panopticon guard tower. A critical evaluation of the Hawthorne Plant experiments has in turn given rise to the notion of a Hawthorne effect, where workers increase their productivity in response to their awareness of being observed or because they are gratified for being chosen to participate in a project. According to the American Management Association and the ePolicy Institute, who undertook a quantitative survey in 2007 about electronic monitoring and surveillance with approximately 300 US companies, "more than one fourth of employers have fired workers for misusing email and nearly one third have fired employees for misusing the Internet." Furthermore, about 30 percent of the companies had also fired employees for usage of "inappropriate or offensive language" and "viewing, downloading, or uploading inappropriate/offensive content." More than 40 percent of the companies monitor email traffic of their workers, and 66 percent of corporations monitor Internet connections. In addition, most companies use software to block websites such as sites with games, social networking, entertainment, shopping, and sports. The American Management Association and the ePolicy Institute also stress that companies track content that is being written about them, for example by monitoring blogs and social media, and scanning all files that are stored in a filesystem. 
== Government use of corporate surveillance data == The United States government often gains access to corporate databases, either by producing a warrant for it, or by asking. The Department of Homeland Security has openly stated that it uses data collected from consumer credit and direct marketing agencies, such as Google, for augmenting the profiles of individuals that it is monitoring. The US government has gathered information from grocery store discount card programs, which track customers' shopping patterns and store them in databases, in order to look for terrorists by analyzing shoppers' buying patterns. == Corporate surveillance of citizens == According to Dennis Broeders, "Big Brother is joined by big business". He argues that corporations are in any event interested in data on their potential customers and that placing some forms of surveillance in the hands of companies results in companies owning video surveillance data for stores and public places. The commercial availability of surveillance systems has led to their rapid spread. Therefore, it is almost impossible for citizens to maintain their anonymity. When businesses can monitor their customers, such customers run the risk of facing prejudice when applying for housing, loans, jobs, and other economic opportun

    Read more →
  • Azuqua

    Azuqua

    Azuqua is an American cloud-based integration and automation company headquartered in Seattle, Washington. It integrates SaaS applications and creates automations that are designed to eliminate manual work. Azuqua's platform can set up workflows between multiple applications so disparate teams can stay in the loop. Azuqua's customers include companies such as Charles Schwab, General Electric, General Motors, HubSpot, and Airbnb. == History == Nikhil Hasija and Craig Unger founded Azuqua in 2011. In 2013, the team participated in Techstars Microsoft's Windows Azure Accelerator, a Seattle-based incubator that helps entrepreneurs gain traction through deep mentor engagement and rapid iteration cycles. In 2014, Azuqua announced that it had received $5 million in Series A funding from Ignition Partners. In 2017, Azuqua saw 65% growth in new customers, a doubling of new SaaS connectors, and 50% growth in overall employee headcount. Azuqua also received Series B funding totaling $10.8 million. This funding was led by Insight Venture Partners, with DFJ and Ignition Partners also joining the round. In March 2018, Azuqua hired Todd Owens as CEO. Owens was previously CEO of Appuri, a customer data platform. Hasija transitioned to the role of Chief Product Officer. Azuqua also hired Dan Kogan as Chief Marketing Officer. Kogan previously worked at Tableau, a BI and analytics company, as a Senior Director of Product Marketing. Okta acquired Azuqua in 2019. 
== Product Description/Features ==
      • Logic Library: Logic functions that can be used for data processing, branching logic, and business rules
      • Drag and Drop Visual Designer: No-code visual designer
      • Use of APIs for each cloud service a business is using to allow the various apps to communicate and share data
      • API Publishing: Integrations and automations can be made available as secure endpoints, webhooks, or open services
      • Connector Builder: Build a connector to an application
      • Connector Library: Pre-built connectors to SaaS applications
      • Error Handling: Automations that execute when an error is detected

    Read more →