AI For Business Guide

AI For Business Guide — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Film recorder

    Film recorder

    A film recorder is a graphical output device for transferring images to photographic film from a digital source. In a typical film recorder, an image is passed from a host computer to a mechanism to expose film through a variety of methods, historically by direct photography of a high-resolution cathode-ray tube (CRT) display. The exposed film can then be developed using conventional developing techniques, and displayed with a slide or motion picture projector. The use of film recorders predates the current use of digital projectors, which eliminate the time and cost involved in the intermediate step of transferring computer images to film stock, instead directly displaying the image signal from a computer. Motion picture film scanners are the opposite of film recorders, copying content from film stock to a computer system. Film recorders can be thought of as modern versions of kinescopes. == Design == === Operation === All film recorders typically work in the same manner. The image is fed from a host computer as a raster stream over a digital interface. A film recorder exposes film through various mechanisms; flying spot (early recorders); photographing a high resolution video monitor; electron beam recorder (Sony HDVS); a CRT scanning dot (Celco); focused beam of light from a light valve technology (LVT) recorder; a scanning laser beam (Arrilaser); or recently, full-frame LCD array chips. For color image recording on a CRT film recorder, the red, green, and blue channels are sequentially displayed on a single gray scale CRT, and exposed to the same piece of film as a multiple exposure through a filter of the appropriate color. This approach yields better resolution and color quality than possible with a tri-phosphor color CRT. The three filters are usually mounted on a motor-driven wheel. The filter wheel, as well as the camera's shutter, aperture, and film motion mechanism are usually controlled by the recorder's electronics and/or the driving software. CRT film recorders are further divided into analog and digital types. The analog film recorder uses the native video signal from the computer, while the digital type uses a separate display board in the computer to produce a digital signal for a display in the recorder. Digital CRT recorders provide a higher resolution at a higher cost compared to analog recorders due to the additional specialized hardware. Typical resolutions for digital recorders were quoted as 2K and 4K, referring to 2048×1366 and 4096×2732 pixels, respectively, while analog recorders provided a resolution of 640×428 pixels in comparison. Higher-quality LVT film recorders use a focused beam of light to write the image directly onto a film loaded spinning drum, one pixel at a time. In one example, the light valve was a liquid-crystal shutter, the light beam was steered with a lens, and text was printed using a pre-cut optical mask. The LVT will record pixel beyond grain. Some machines can burn 120-res or 120 lines per millimeter. The LVT is basically a reverse drum scanner. The exposed film is developed and printed by regular photographic chemical processing. === Formats === Film recorders are available for a variety of film types and formats. The 35 mm negative film and transparencies are popular because they can be processed by any photo shop. Single-image 4×5 film and 8×10 are often used for high-quality, large format printing. Some models have detachable film holders to handle multiple formats with the same camera or with Polaroid backs to provide on-site review of output before exposing film. == Uses == Film recorders are used in digital printing to generate master negatives for offset and other bulk printing processes. For preview, archiving, and small-volume reproduction, film recorders have been rendered obsolete by modern printers that produce photographic-quality hardcopies directly on plain paper. They are also used to produce the master copies of movies that use computer animation or other special effects based on digital image processing. However, most cinemas nowadays use Digital Cinema Packages on hard drives instead of film stock. === Computer graphics === Film recorders were among the earliest computer graphics output devices; for example, the IBM 740 CRT Recorder was announced in 1954. Film recorders were also commonly used to produce slides for slide projectors; but this need is now largely met by video projectors that project images directly from a computer to a screen. The terms "slide" and "slide deck" are still commonly used in presentation programs. === Current uses === Currently, film recorders are primarily used in the motion picture film-out process for the ever increasing amount of digital intermediate work being done. Although significant advances in large venue video projection alleviates the need to output to film, there remains a deadlock between the motion picture studios and theater owners over who should pay for the cost of these very costly projection systems. This, combined with the increase in international and independent film production, will keep the demand for film recording steady for at least a decade. == Key manufacturers == Traditional film recorder manufacturers have all but vanished from the scene or have evolved their product lines to cater to the motion picture industry. Dicomed was one such early provider of digital color film recorders. Polaroid, Management Graphics, Inc, MacDonald-Detwiler, Information International, Inc., and Agfa were other producers of film recorders. Arri is the only current major manufacturer of film recorders. Kodak Lightning I film recorder. One of the first laser recorders. Needed an engineering staff to set up. Kodak Lightning II film recorder used both gas and diode laser to record on to film. The last LVT machines produced by Kodak / Durst-Dice stopped production in 2002. There are no LVT film recorders currently being produced. LVT Saturn 1010 uses a LED exposure (RGB) to 8"x10" film at 1000-3000ppi. LUX Laser Cinema Recorder from Autologic/Information International in Thousand Oaks, California. Sales end in March 2000. Used on the 1997 film “Titanic”. Arri produces the Arrilaser line of laser-based motion picture film recorders. MGI produced the Solitaire line of CRT-based motion picture film recorders. Matrix, originally ImaPRO, a branch of Agfa Division, produced the QCR line of CRT-based motion picture film recorders. CCG, formerly Agfa film recorders, has been a steady manufacturer of film recorders based in Germany. In 2004 CCG introduced Definity, a motion picture film recorder utilizing LCD technology. In 2010 CCG introduced the first full LED LCD film recorder as a new step in film recording. Cinevator was made by Cinevation AS, in Drammen, Norway. The Cinevator was a real-time digital film recorder. It could record IN, IP and prints with and without sound Oxberry produced the Model 3100 film recorder camera system, with interchangeable pin-registered movements (shuttles) for 35 mm (full frame/Silent, 1.33:1) and 16 mm (regular 16, "2R"), and others have adapted the Oxberry movements for CinemaScope, 1.85:1, 1.75:1, 1.66:1, as well as Academy/Sound (1.37:1) in 35 mm and Super-16 in 16 mm ("1R"). For instance, the "Solitaire" and numerous others employed the Oxberry 3100 camera system. == History == Before video tape recorders or VTRs were invented, TV shows were either broadcast live or recorded to film for later showing, using the kinescope process. In 1967, CBS Laboratories introduced the Electronic Video Recording format, which used video and telecined-to-video film sources, which were then recorded with an electron-beam recorder at CBS' EVR mastering plant at the time to 35mm film stock in a rank of 4 strips on the film, which was then slit down to 4 8.75 mm (0.344 in) film copies, for playback in an EVR player. All types of CRT recorders were (and still are) used for film recording. Some early examples used for computer-output recording were the 1954 IBM 740 CRT Recorder, and the 1962 Stromberg-Carlson SC-4020, the latter using a Charactron CRT for text and vector graphic output to either 16 mm motion picture film, 16 mm microfilm, or hard-copy paper output. Later 1970 and 80s-era recording to B&W (and color, with 3 separate exposures for red, green, and blue)) 16 mm film was done with an EBR (Electron Beam Recorder), the most prominent examples made by 3M), for both video and COM (Computer Output Microfilm) applications. Image Transform in Universal City, California used specially modified 3M EBR film recorders that could perform color film-out recording on 16 mm by exposing three 16 mm frames in a row (one red, one green and one blue). The film was then printed to color 16 mm or 35 mm film. The video fed to the recorder could either be NTSC, PAL or SECAM. Later, Image Transform used specially modified VTRs to record 24 frame for their "Image Vision" system. The modified 1 inch type B videotape VTRs would record

    Read more →
  • Tumblr

    Tumblr

    Tumblr ( TUM-blər) is a microblogging and social media platform founded by David Karp in 2007 and operated by American company Tumblr, Inc., a subsidiary of Automattic. The service allows users to post multimedia and other content to a short-form blog. It has attracted significant attention and controversy for hosting a wide range of progressive user-generated content. == History == === Beginnings (2006–2012) === Development of Tumblr began in 2006 during a two-week gap between contracts at David Karp's software consulting company, Davidville. Karp had been interested in tumblelogs (short-form blogs, hence the name Tumblr) for some time and was waiting for one of the established blogging platforms to introduce their own tumblelogging platform. As none had done so after a year of waiting, Karp and developer Marco Arment began working on their own platform. Tumblr was launched in February 2007, and within two weeks had gained 75,000 users. Arment left the company in September 2010 to work on Instapaper. In June 2012, Tumblr featured its first major brand advertising campaign in collaboration with Adidas, who launched an official soccer Tumblr blog and bought ad placements on the user dashboard. This launch came only two months after Tumblr announced it would be moving towards paid advertising on its site. === Ownership by Yahoo! (2013–2018) === On May 20, 2013, it was announced that Yahoo and Tumblr had reached an agreement for Yahoo! Inc. to acquire Tumblr for $1.1 billion in cash. Many of Tumblr's users were unhappy with the news, causing some to start a petition, achieving nearly 170,000 signatures. David Karp remained CEO and the deal was finalized on June 20, 2013. Advertising sales goals were not met and in 2016 Yahoo wrote down $712 million of Tumblr's value. Verizon Communications acquired Yahoo in June 2017, and placed Yahoo and Tumblr under its Oath subsidiary. Karp announced in November 2017 that he would be leaving Tumblr by the end of the year. Jeff D'Onofrio, Tumblr's president and COO, took over leading the company. The site, along with the rest of the Oath division (renamed Verizon Media Group in 2019), continued to struggle under Verizon. In March 2019, Similarweb estimated Tumblr had lost 30% of its user traffic since December 2018, when the site had introduced a stricter content policy with heavier restrictions on adult content (which had been a notable draw to the service). In May 2019, it was reported that Verizon was considering selling the site due to its continued struggles since the purchase (as it had done with another Yahoo property, Flickr, via its sale to SmugMug). Following this news, Pornhub's vice president publicly expressed interest in purchasing Tumblr, with a promise to reinstate the previous adult content policies. === Automattic (2019–present) === On August 12, 2019, Verizon Media announced that it would sell Tumblr to Automattic, the operator of blog service WordPress.com and corporate backer of the open source blog software of the same name. The sale was for an undisclosed amount, but Axios reported that the sale price was less than $3 million, less than 0.3% of Yahoo's original purchase price. Automattic CEO Matt Mullenweg stated that the site will operate as a complementary service to WordPress.com, and that there were no plans to reverse the content policy decisions made during Verizon ownership. In November 2022, Mullenweg stated that Tumblr will add support for the decentralized social networking protocol ActivityPub. In November 2023, most of Tumblr's product development and marketing teams were transferred to other groups within Automattic. Mullenweg stated that focus would shift to core functionality and streamlining existing features. In February 2024, Automattic announced that it would begin selling user data from Tumblr and WordPress.com to Midjourney and OpenAI. Tumblr users are opted-in by default, with an option to opt out. In August 2024, Automattic announced that it would migrate Tumblr's backend to an architecture derived from WordPress, in order to ease development and code sharing between the platforms. The company stated that this migration would not impact the service's user experience and content, and that users "won't even notice a difference from the outside". In January 2025, Mullenweg stated that the migration, once completed, would also "unlock" ActivityPub access for Tumblr, including native support for the company's official ActivityPub plugin for WordPress. In April 2025, Automattic announced layoffs for 16% of its workforce, reducing a large portion of Tumblr staff. On March 16, 2026, Tumblr implemented a change to how notes were assigned to reblogs, making it more similar to sites like Twitter and Bluesky. The change was rolled back the next day after heavy user backlash. == Features == === Blog management === Dashboard: The dashboard is the primary tool for the typical Tumblr user. It is a live feed of recent posts from blogs that they follow. Through the dashboard, users are able to comment, reblog, and like posts from other blogs that appear on their dashboard. The dashboard allows the user to upload text posts, images, videos, quotes, or links to their blog with a click of a button displayed at the top of the dashboard. Users are also able to connect their blogs to their Twitter and Facebook accounts, so that whenever they make a post, it will also be sent as a tweet and a status update. As of June 2022, users can also turn off reblogs on specific posts through the dashboard. Queue: Users are able to set up a schedule to delay posts that they make. They can spread their posts over several hours or even days. Tags: Users can help their audience find posts about certain topics by adding tags. If someone were to upload a picture to their blog and wanted their viewers to find pictures, they would add the tag #picture, and their viewers could use that word to search for posts with the tag #picture. HTML editing: Tumblr allows users to edit their blog's theme using HTML to control the appearance of their blog. Custom themes are able to be shared and used by other users, or sold. Custom domains: Tumblr allows users to use custom domains for their blogs. Users must purchase a domain from Tumblr Domains, an in-house registrar that provides domains that can only be used with Tumblr unless removed from the user's blog and transferred to another registrar. Blogs previously were able to be linked with any domain/subdomain from any registrar, however following the introduction of the Tumblr Domains service, now requires you to purchase a domain directly from Tumblr to be used with a blog. Users who kept their blogs connected to a domain after the introduction got to keep their custom domain, as long as they do not disconnect it from Tumblr or let the domain expire. === Tags === The tagging system on the website operates on a hybrid tagging system, involving both self-tagging (user write their own tags on their posts) and an auto-manual function (the website will recommend popular tags and ones that the user has used before.) Only the first 20 tags added to any post will be indexed by the site. The tags are prefaced by a hashtag and separated by commas, and spaces and special characters are allowed, but only up to 140 characters total per tag. There are two main types used by Tumblr users: descriptive tagging, and opinion or commentary tagging. Descriptive tags are usually introduced by the original poster, and describe what is in the post (e.g. #art, #sky). These are important for the original poster to use, so their post will be indexed and searchable by others wishing to view that subject of content. Tags used as a form of communication are unique to Tumblr, and are typically more personal, expressing opinions, reactions, meta-commentary, background information, and more. Instead of adding onto the reblogged post (with their comments becoming an addition to each subsequent reblog from them) a user may add their comments in the tags, not changing the content or appearance of the original post in any way. Not all users choose to use tags this way, but those who do use tags for commentary may prefer it over adding a comment on the actual post. === Mobile === With Tumblr's 2009 acquisition of Tumblerette, an iOS application created by Jeff Rock and Garrett Ross, the service launched its official iPhone app. The site became available to BlackBerry smartphones on April 17, 2010, via a Mobelux application in BlackBerry World. In June 2012, Tumblr released a new version of its iOS app, Tumblr 3.0, allowing support for Spotify integration, hi-res images and offline access. An app for Android is also available. A Windows Phone app was released on April 23, 2013. An app for Google Glass was released on May 16, 2013. === Inbox and messaging === Tumblr blogs have the option to allow users to submit questions, either as themselves or anonymously, to the blog for a response. Tumblr

    Read more →
  • Instant messaging

    Instant messaging

    Instant messaging (IM) technology is a type of synchronous computer-mediated communication involving the immediate (real-time) transmission of messages between two or more parties over the Internet or another computer network. Originally involving simple text message exchanges, modern instant messaging applications and services (also variously known as instant messenger, messaging app, chat app, chat client, or simply a messenger) tend to also feature the exchange of multimedia, emojis, file transfer, VoIP (voice calling), and video chat capabilities. Instant messaging systems facilitate connections between specified known users (often using a contact list also known as a "buddy list" or "friend list") or in chat rooms, and can be standalone apps or integrated into a wider social media platform, or in a website where it can, for instance, be used for conversational commerce. Originally the term "instant messaging" was distinguished from "text messaging" by being run on a computer network instead of a cellular/mobile network, being able to write longer messages, real-time communication, presence ("status"), and being free (only cost of access instead of per SMS message sent). Instant messaging was pioneered in the early Internet era; the IRC protocol was the earliest to achieve wide adoption. Later in the 1990s, ICQ was among the first closed and commercialized instant messengers, and several rival services appeared afterwards as it became a popular use of the Internet. Beginning with its first introduction in 2005, BlackBerry Messenger became the first popular example of mobile-based IM, combining features of traditional IM and mobile SMS. Instant messaging remains very popular today; IM apps are the most widely used smartphone apps: in 2018 for instance there were 980 million monthly active users of WeChat and 1.3 billion monthly users of WhatsApp, the largest IM network. == Overview == Instant messaging (IM), sometimes also called "messaging" or "texting", consists of computer-based human communication between two users (private messaging) or more (chat room or "group") in real-time, allowing immediate receipt of acknowledgment or reply. This is in direct contrast to email, where conversations are not in real-time, and the perceived quasi-synchrony of the communications by the users (although many systems allow users to send offline messages that the other user receives when logging in). Earlier IM networks were limited to text-based communication, not dissimilar to mobile text messaging. As technology has moved forward, IM has expanded to include voice calling using a microphone, videotelephony using webcams, file transfer, location sharing, image and video transfer, voice notes, and other features. IM is conducted over the Internet or other types of networks (see also LAN messenger). Depending on the IM protocol, the technical architecture can be peer-to-peer (direct point-to-point transmission) or client–server (when all clients have to first connect to the central server). Primary IM services are controlled by their corresponding companies and usually follow the client-server model. At one point, the term "Instant Messenger" was a service mark of AOL Time Warner and could not be used in software not affiliated with AOL in the United States. For this reason, in April 2007, the instant messaging client formerly named Gaim (or gaim) announced that they would be renamed "Pidgin". === Clients === Modern IM services generally provide their own client, either a separately installed application or a browser-based client. They are normally centralised networks run by the servers of the platform's operators, unlike peer-to-peer protocols like XMPP. These usually only work within the same IM network, although some allow limited function with other services (see #Interoperability). Third-party client software applications exist that will connect with most of the major IM services. There is the class of instant messengers that uses the serverless model, which doesn't require servers, and the IM network consists only of clients. There are several serverless messengers: RetroShare, Tox, Bitmessage, Ricochet. See also: LAN messenger. Some examples of popular IM services today include Signal, Telegram, WhatsApp Messenger, WeChat, QQ Messenger, Viber, Line, and Snapchat. The popularity of certain apps greatly differ between different countries. Certain apps have an emphasis on certain uses - for example, Skype focuses on video calling, Slack focuses on messaging and file sharing for work teams, and Snapchat focuses on image messages. Some social networking services offer messaging services as a component of their overall platform, such as Facebook's Facebook Messenger, who also own WhatsApp. Others have a direct IM function as an additional adjunct component of their social networking platforms, like Instagram, Reddit, Tumblr, TikTok, Clubhouse and Twitter; this also includes for example dating websites, such as OkCupid or Plenty of Fish, and online gaming chat platforms. === Features === ==== Private and group messaging ==== Private chat allows users to converse privately with another person or a group. Privacy can also be enhanced in several ways, such as end-to-end encryption by default. Public and group chat features allow users to communicate with multiple people simultaneously. ==== Calling ==== Many major IM services and applications offer a call feature for user-to-user voice calls, conference calls, and voice messages. The call functionality is useful for professionals who utilize the application for work purposes and as a hands-free method. Videotelephony using a webcam is also possible by some. ==== Games and entertainment ==== Some IM applications include in-app games for entertainment. Yahoo! Messenger, for example, introduced these where users could play a game and viewed by friends in real-time. MSN Messenger featured a number of playable games within the interface. Facebook's Messenger has had a built-in option to play games with people in a chat, including games like Tetris and Blackjack. Discord features multiple games built inside the "activities" tab in voice channels. ==== Payments ==== A relatively new feature to instant messaging, peer-to-peer payments are available for financial tasks on top of communication. The lack of a service fee also makes these advantageous to financial applications. IM services such as Facebook Messenger and the WeChat 'super-app' for example offer a payment feature. == History == === Early systems === Though the term dates from the 1990s, instant messaging predates the Internet, first appearing on multi-user operating systems like Compatible Time-Sharing System (CTSS) and Multiplexed Information and Computing Service (Multics) in the mid-1960s. Initially, some of these systems were used as notification systems for services like printing, but quickly were used to facilitate communication with other users logged into the same machine. CTSS facilitated communication via text message for up to 30 people. Parallel to instant messaging were early online chat facilities, the earliest of which was Talkomatic (1973) on the PLATO system, which allowed 5 people to chat simultaneously on a 512 x 512 plasma display (5 lines of text + 1 status line per person). During the bulletin board system (BBS) phenomenon that peaked during the 1980s, some systems incorporated chat features which were similar to instant messaging; Freelancin' Roundtable was one prime example. The first such general-availability commercial online chat service (as opposed to PLATO, which was educational) was the CompuServe CB Simulator in 1980, created by CompuServe executive Alexander "Sandy" Trevor in Columbus, Ohio. As networks developed, the protocols spread with the networks. Some of these used a peer-to-peer protocol (e.g. talk, ntalk and ytalk), while others required peers to connect to a server (see talker and IRC). The Zephyr Notification Service (still in use at some institutions) was invented at MIT's Project Athena in the 1980s to allow service providers to locate and send messages to users. Early instant messaging programs were primarily real-time text, where characters appeared as they were typed. This includes the Unix "talk" command line program, which was popular in the 1980s and early 1990s. Some BBS chat programs (i.e. Celerity BBS) also used a similar interface. Modern implementations of real-time text also exist in instant messengers, such as AOL's Real-Time IM as an optional feature. In the latter half of the 1980s and into the early 1990s, the Quantum Link online service for Commodore 64 computers offered user-to-user messages between concurrently connected customers, which they called "On-Line Messages" (or OLM for short), and later "FlashMail." Quantum Link later became America Online and made AOL Instant Messenger (AIM, discussed later). While the Quantum Link client software ran on a Commodore 64, using only

    Read more →
  • Data independence

    Data independence

    Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes made in the definition and organization of data. Application programs should not, ideally, be exposed to details of data representation and storage. The DBMS provides an abstract view of the data that hides such details. There are two types of data independence: physical and logical data independence. The data independence and operation independence together gives the feature of data abstraction. There are two levels of data independence. == Logical data independence == The logical structure of the data is known as the 'schema definition'. In general, if a user application operates on a subset of the attributes of a relation, it should not be affected later when new attributes are added to the same relation. Logical data independence indicates that the conceptual schema can be changed without affecting the existing schemas. == Physical data independence == The physical structure of the data is referred to as "physical data description". Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues since, conceptually, there is no difference in the operations carried out against the data. There are three types of data independence: Logical data independence: The ability to change the logical (conceptual) schema without changing the External schema (User View) is called logical data independence. For example, the addition or removal of new entities, attributes, or relationships to the conceptual schema or having to rewrite existing application programs. Physical data independence: The ability to change the physical schema without changing the logical schema is called physical data independence. For example, a change to the internal schema, such as using different file organization or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas. View level data independence: always independent no effect, because there doesn't exist any other level above view level. == Data independence == Data independence can be explained as follows: Each higher level of the data architecture is immune to changes of the next lower level of the architecture. The logical scheme stays unchanged even though the storage space or type of some data is changed for reasons of optimization or reorganization. In this, external schema does not change. In this, internal schema changes may be required due to some physical schema were reorganized here. Physical data independence is present in most databases and file environment in which hardware storage of encoding, exact location of data on disk, merging of records, so on this are hidden from user. == Data independence types == The ability to modify schema definition in one level without affecting schema of that definition in the next higher level is called data independence. There are two levels of data independence, they are Physical data independence and Logical data independence. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means we change the physical storage/level without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques. Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to banking system). Logical Data independence means if we add some new columns or remove some columns from table then the user view and programs should not change. For example: consider two users A & B. Both are selecting the fields "EmployeeNumber" and "EmployeeName". If user B adds a new column (e.g. salary) to his table, it will not affect the external view for user A, though the internal schema of the database has been changed for both users A & B. Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data that they access.

    Read more →
  • Mark V. Shaney

    Mark V. Shaney

    Mark V. Shaney is a synthetic Usenet user whose postings in the net.singles newsgroups were generated by Markov chain techniques, based on text from other postings. The username is a play on the words "Markov chain". Many readers were fooled into thinking that the quirky, sometimes uncannily topical posts were written by a real person. The system was designed by Rob Pike with coding by Bruce Ellis. Don P. Mitchell wrote the Markov chain code, initially demonstrating it to Pike and Ellis using the Tao Te Ching as a basis. They chose to apply it to the net.singles netnews group. The program is fairly simple. It ingests the sample text (the Tao Te Ching, or the posts of a Usenet group) and creates a massive list of every sequence of three successive words (triplet) which occurs in the text. It then chooses two words at random, and looks for a word which follows those two in one of the triplets in its massive list. If there is more than one, it picks at random (identical triplets count separately, so a sequence which occurs twice is twice as likely to be picked as one which only occurs once). It then adds that word to the generated text. Then, in the same way, it picks a triplet that starts with the second and third words in the generated text, and that gives a fourth word. It adds the fourth word, then repeats with the third and fourth words, and so on. This algorithm is called a third-order Markov chain (because it uses sequences of three words). == Examples == A classic example, from 1984, originally sent as a mail message, later posted to net.singles is reproduced here: >From mvs Fri Nov 16 17:11 EST 1984 remote from alice It looks like Reagan is going to say? Ummm... Oh yes, I was looking for. I'm so glad I remembered it. Yeah, what I have wondered if I had committed a crime. Don't eat with your assessment of Reagon and Mondale. Up your nose with a guy from a firm that specifically researches the teen-age market. As a friend of mine would say, "It really doesn't matter"... It looks like Reagan is holding back the arms of the American eating public have changed dramatically, and it got pretty boring after about 300 games. People, having a much larger number of varieties, and are very different from what one can find in Chinatowns across the country (things like pork buns, steamed dumplings, etc.) They can be cheap, being sold for around 30 to 75 cents apiece (depending on size), are generally not greasy, can be adequately explained by stupidity. Singles have felt insecure since we came down from the Conservative world at large. But Chuqui is the way it happened and the prices are VERY reasonable. Can anyone think of myself as a third sex. Yes, I am expected to have. People often get used to me knowing these things and then a cover is placed over all of them. Along the side of the $$ are spent by (or at least for ) the girls. You can't settle the issue. It seems I've forgotten what it is, but I don't. I know about violence against women, and I really doubt they will ever join together into a large number of jokes. It showed Adam, just after being created. He has a modem and an autodial routine. He calls my number 1440 times a day. So I will conclude by saying that I can well understand that she might soon have the time, it makes sense, again, to get the gist of my argument, I was in that (though it's a Republican administration). _-_-_-_-Mark Other quotations from Mark's Usenet posts are: "I spent an interesting evening recently with a grain of salt." (Alternatively reported as "While at a conference a few weeks back, I spent an interesting evening with a grain of salt.") "I hope that there are sour apples in every bushel." (see also sour grapes) == History == In The Usenet Handbook Mark Harrison writes that after September 1981, students joined Usenet en masse, "creating the USENET we know today: endless dumb questions, endless idiots posing as savants, and (of course) endless victims for practical jokes." In December, Rob Pike created the netnews group net.suicide as prank, "a forum for bad jokes". Some users thought it was a legitimate forum, some discussed "riding motorcycles without helmets". At first, most posters were "real people", but soon "characters" began posting. Pike created a "vicious" character named Bimmler. At its peak, net.suicide had ten frequent posters; nine were "known to be characters." But ultimately, Pike deleted the newsgroup because it was too much work to maintain; Bimmler messages were created "by hand". The "obvious alternative" was software, running on a Bell Labs computer created by Bruce Ellis, based on the Markov code by Don Mitchell, which became the online character Mark V. Shaney. Kernighan and Pike listed Mark V. Shaney in the acknowledgements in The Practice of Programming, noting its roots in Mitchell's markov, which, adapted as shaney, was used for "humorous deconstructionist activities" in the 1980s. Dewdney pointed out "perhaps Mark V. Shaney's magnum opus: a 20-page commentary on the deconstructionist philosophy of Jean Baudrillard" directed by Pike, with assistance from Henry S. Baird and Catherine Richards, to be distributed by email. The piece was based on Jean Baudrillard's "The Precession of Simulacra", published in Simulacra and Simulation (1981). == Reception == The program was discussed by A. K. Dewdney in the Scientific American "Computer Recreations" column in 1989, by Penn Jillette in his PC Computing column in 1991, and in several books, including the Usenet Handbook, Bots: the Origin of New Species, Hippo Eats Dwarf: A Field Guide to Hoaxes and Other B.S., and non-computer-related journals such as Texas Studies in Literature and Language. Dewdney wrote about the program's output, "The overall impression is not unlike what remains in the brain of an inattentive student after a late-night study session. Indeed, after reading the output of Mark V. Shaney, I find ordinary writing almost equally strange and incomprehensible!" He noted the reactions of newsgroup users, who have "shuddered at Mark V. Shaney's reflections, some with rage and others with laughter:" The opinions of the new net.singles correspondent drew mixed reviews. Serious users of the bulletin board's services sensed satire. Outraged, they urged that someone "pull the plug" on Mark V. Shaney's monstrous rantings. Others inquired almost admiringly whether the program was a secret artificial intelligence project that was being tested in a human conversational environment. A few may even have thought that Mark V. Shaney was a real person, a tortured schizophrenic desperately seeking a like-minded companion. Concluding, Dewdney wrote, "If the purpose of computer prose is to fool people into thinking that it was written by a sane person, Mark V. Shaney probably falls short." A 2012 article in Observer compared Mark V. Shaney's "strangely beautiful" postings to the Horse_ebooks account on Twitter and music reviews at Pitchfork, saying that "this mash-up of gibberish and human sentiment" is what "made Mark V. Shaney so endlessly fascinating".

    Read more →
  • Link encryption

    Link encryption

    Link encryption is an approach to communications security that encrypts and decrypts all network traffic at each network routing point (e.g. network switch, or node through which it passes) until arrival at its final destination. This repeated decryption and encryption is necessary to allow the routing information contained in each transmission to be read and employed further to direct the transmission toward its destination, before which it is re-encrypted. This contrasts with end-to-end encryption where internal information, but not the header/routing information, is encrypted by the sender at the point of origin and only decrypted by the intended recipient. Link encryption offers two main advantages: encryption is automatic so there is less opportunity for human error. if the communications link operates continuously and carries an unvarying level of traffic, link encryption defeats traffic analysis. On the other hand, end-to-end encryption ensures only the intended recipient has access to the plaintext. Link encryption can be used with end-to-end systems by superencrypting the messages. Bulk encryption refers to encrypting a large number of circuits at once, after they have been multiplexed.

    Read more →
  • Contrast set learning

    Contrast set learning

    Contrast set learning is a form of association rule learning that seeks to identify meaningful differences between separate groups by reverse-engineering the key predictors that identify for each particular group. For example, given a set of attributes for a pool of students (labeled by degree type), a contrast set learner would identify the contrasting features between students seeking bachelor's degrees and those working toward PhD degrees. == Overview == A common practice in data mining is to classify, to look at the attributes of an object or situation and make a guess at what category the observed item belongs to. As new evidence is examined (typically by feeding a training set to a learning algorithm), these guesses are refined and improved. Contrast set learning works in the opposite direction. While classifiers read a collection of data and collect information that is used to place new data into a series of discrete categories, contrast set learning takes the category that an item belongs to and attempts to reverse engineer the statistical evidence that identifies an item as a member of a class. That is, contrast set learners seek rules associating attribute values with changes to the class distribution. They seek to identify the key predictors that contrast one classification from another. For example, an aerospace engineer might record data on test launches of a new rocket. Measurements would be taken at regular intervals throughout the launch, noting factors such as the trajectory of the rocket, operating temperatures, external pressures, and so on. If the rocket launch fails after a number of successful tests, the engineer could use contrast set learning to distinguish between the successful and failed tests. A contrast set learner will produce a set of association rules that, when applied, will indicate the key predictors of each failed tests versus the successful ones (the temperature was too high, the wind pressure was too high, etc.). Contrast set learning is a form of association rule learning. Association rule learners typically offer rules linking attributes commonly occurring together in a training set (for instance, people who are enrolled in four-year programs and take a full course load tend to also live near campus). Instead of finding rules that describe the current situation, contrast set learners seek rules that differ meaningfully in their distribution across groups (and thus, can be used as predictors for those groups). For example, a contrast set learner could ask, “What are the key identifiers of a person with a bachelor's degree or a person with a PhD, and how do people with PhD's and bachelor’s degrees differ?” Standard classifier algorithms, such as C4.5, have no concept of class importance (that is, they do not know if a class is "good" or "bad"). Such learners cannot bias or filter their predictions towards certain desired classes. As the goal of contrast set learning is to discover meaningful differences between groups, it is useful to be able to target the learned rules towards certain classifications. Several contrast set learners, such as MINWAL or the family of TAR algorithms, assign weights to each class in order to focus the learned theories toward outcomes that are of interest to a particular audience. Thus, contrast set learning can be thought of as a form of weighted class learning. === Example: Supermarket Purchases === The differences between standard classification, association rule learning, and contrast set learning can be illustrated with a simple supermarket metaphor. In the following small dataset, each row is a supermarket transaction and each "1" indicates that the item was purchased (a "0" indicates that the item was not purchased): Given this data, Association rule learning may discover that customers that buy onions and potatoes together are likely to also purchase hamburger meat. Classification may discover that customers that bought onions, potatoes, and hamburger meats were purchasing items for a cookout. Contrast set learning may discover that the major difference between customers shopping for a cookout and those shopping for an anniversary dinner are that customers acquiring items for a cookout purchase onions, potatoes, and hamburger meat (and do not purchase foie gras or champagne). == Treatment learning == Treatment learning is a form of weighted contrast-set learning that takes a single desirable group and contrasts it against the remaining undesirable groups (the level of desirability is represented by weighted classes). The resulting "treatment" suggests a set of rules that, when applied, will lead to the desired outcome. Treatment learning differs from standard contrast set learning through the following constraints: Rather than seeking the differences between all groups, treatment learning specifies a particular group to focus on, applies a weight to this desired grouping, and lumps the remaining groups into one "undesired" category. Treatment learning has a stated focus on minimal theories. In practice, treatment are limited to a maximum of four constraints (i.e., rather than stating all of the reasons that a rocket differs from a skateboard, a treatment learner will state one to four major differences that predict for rockets at a high level of statistical significance). This focus on simplicity is an important goal for treatment learners. Treatment learning seeks the smallest change that has the greatest impact on the class distribution. Conceptually, treatment learners explore all possible subsets of the range of values for all attributes. Such a search is often infeasible in practice, so treatment learning often focuses instead on quickly pruning and ignoring attribute ranges that, when applied, lead to a class distribution where the desired class is in the minority. === Example: Boston housing data === The following example demonstrates the output of the treatment learner TAR3 on a dataset of housing data from the city of Boston (a nontrivial public dataset with over 500 examples). In this dataset, a number of factors are collected for each house, and each house is classified according to its quality (low, medium-low, medium-high, and high). The desired class is set to "high", and all other classes are lumped together as undesirable. The output of the treatment learner is as follows: Baseline class distribution: low: 29% medlow: 29% medhigh: 21% high: 21% Suggested Treatment: [PTRATIO=[12.6..16), RM=[6.7..9.78)] New class distribution: low: 0% medlow: 0% medhigh: 3% high: 97% With no applied treatments (rules), the desired class represents only 21% of the class distribution. However, if one filters the data set for houses with 6.7 to 9.78 rooms and a neighborhood parent-teacher ratio of 12.6 to 16, then 97% of the remaining examples fall into the desired class (high-quality houses). == Algorithms == There are a number of algorithms that perform contrast set learning. The following subsections describe two examples. === STUCCO === The STUCCO contrast set learner treats the task of learning from contrast sets as a tree search problem where the root node of the tree is an empty contrast set. Children are added by specializing the set with additional items picked through a canonical ordering of attributes (to avoid visiting the same nodes twice). Children are formed by appending terms that follow all existing terms in a given ordering. The formed tree is searched in a breadth-first manner. Given the nodes at each level, the dataset is scanned and the support is counted for each group. Each node is then examined to determine if it is significant and large, if it should be pruned, and if new children should be generated. After all significant contrast sets are located, a post-processor selects a subset to show to the user - the low order, simpler results are shown first, followed by the higher order results which are "surprising and significantly different." The support calculation comes from testing a null hypothesis that the contrast set support is equal across all groups (i.e., that contrast set support is independent of group membership). The support count for each group is a frequency value that can be analyzed in a contingency table where each row represents the truth value of the contrast set and each column variable indicates the group membership frequency. If there is a difference in proportions between the contrast set frequencies and those of the null hypothesis, the algorithm must then determine if the differences in proportions represent a relation between variables or if it can be attributed to random causes. This can be determined through a chi-square test comparing the observed frequency count to the expected count. Nodes are pruned from the tree when all specializations of the node can never lead to a significant and large contrast set. The decision to prune is based on: The minimum deviation size: The maximum difference between the support

    Read more →
  • Social media measurement

    Social media measurement

    Social media measurement, also called social media controlling, is the management practice of evaluating successful social media communications of brands, companies, or other organizations. Key performance indicators may be measured by extracting information from social media channels, such as blogs, wikis, micro-blogs such as Twitter, social networking sites, or video/photo sharing websites, forums from time to time. It is also used by companies to gauge current trends in the industry. The process first gathers data from different websites and then performs analysis based on different metrics like time spent on the page, click through rate, content share, comments, text analytics to identify positive or negative emotions about the brand. Some other social media metrics include share of voice, owned mentions, and earned mentions. The social media measurement process starts with defining a goal that needs to be achieved and defining the expected outcome of the process. The expected outcome varies per the goal and is usually measured by a variety of metrics. This is followed by defining possible social strategies to be used to achieve the goal. Then the next step is designing strategies to be used and setting up configuration tools that ease the process of collecting the data. In the next step, strategies and tools are deployed in real-time. This step involves conducting Quality Assurance tests of the methods deployed to collect the data. And in the final step, data collected from the system is analyzed and if the need arises, it is refined on the run time to enhance the methodologies used. The last step ensures that the result obtained is more aligned with the goal defined in the first step. == Data Acquisition == Acquiring data from social media is in demand of an exploring the user participation and population with the purpose of retrieving and collecting so many kinds of data(ex: comments, downloads etc.). There are several prevalent techniques to acquire data such as Network traffic analysis, Ad-hoc application and Crawling Network Traffic Analysis - Network traffic analysis is the process of capturing network traffic and observing it closely to determine what is happening in the network. It is primarily done to improve the performance, security and other general management of the network. However concerned about the potential tort of privacy on the Internet, network traffic analysis is always restricted by the government. Furthermore, high-speed links are not adaptable to traffic analysis because of the possible overload problem according to the packet sniffing mechanism Ad-hoc Application - Ad-hoc application is a kind of application that provides services and games to social network users by developing the APIs offered by social network companies (Facebook Developer Platform). The infrastructure of Ad-hoc application allows the user to interact with the interface layer instead of the application servers. The API provides a path for application to access information after the user login. Moreover, the size of the data set collected vary with the popularity of the social media platform i.e. social media platforms having high number of users will have more data than platforms having less user base. Scraping is a process in which the APIs collect online data from social media. The data collected from Scraping is in raw format. However, having access to these types of data is a bit difficult because of its commercial value. Crawling - Crawling is a process in which a web crawler creates indexes of all the words in a web-page, stores them, then follows all the hyperlinks and indexes on that page and again stores them. It is the most popular technique for data acquisition and is also well known for its easy operation based on prevalent Object-Orientated Programming Language (Java or Python etc.). And most important, social network companies (YouTube, Flicker, Facebook, Instagram, etc.) are friendly to crawling techniques by providing public APIs == Applications == === For branding === Monitoring social media allows researchers to find insights into a brand's overall visibility on social media, to measure the impact of campaigns, to identify opportunities for engagement, to assess competitor activity and share of voice, and to detect impending crises. It can also provide valuable information about emerging trends and what consumers and clients think about specific topics, brands or products. This is the work of a cross-section of groups that include market researchers, PR staff, marketing teams, social-engagement, and community staff, agencies and sales teams. Several different providers have developed tools to facilitate the monitoring of a variety of social media channels - from blogging to internet video to internet forums. This allows companies to track what consumers say about their brands and actions. Companies can then react to these conversations and interact with consumers through social media platforms. === In government === Apart from commercial applications, social media monitoring has become a pervasive technique applied by public organizations and governments. Monitoring is a tradition within the public sector, and social-media monitoring provides a real-time approach to detecting and responding to social developments. Governments have come to realize the need for strategies to cope with surprises from the rapid expansion of public issues. Sobkowicz introduced a framework with three blocks of social-media opinion tracking, simulating and forecasting. It includes: real-time detection of emotions, topics and opinions information-flow modelling and agent-based simulation modeling of opinion networks Bekkers introduced the application of social media monitoring in the Netherlands. Public organizations in the Netherlands (such as the Tax Agency and the Education Ministry) have started to use social media monitoring to obtain better insights into the sentiments of target groups. On the one hand, the public sector will be enabled to provide timely and efficient answers to the public by using social media monitoring techniques, but on the other hand, they also have to deal with concerns about ethical issues such as transparency and privacy. == Quantifying social media == Social media management software (SMMS) is an application program or software that facilitates an organization's ability to successfully engage in social media across different communication channels. SMMS is used to monitor inbound and outbound conversations, support customer interaction, audit or document social marketing initiatives and evaluate the usefulness of a social media presence. It can be difficult to measure all social media conversations. Due to privacy settings and other issues, not all social media conversations can be found and reported by monitoring tools. However, whilst social media monitoring cannot give absolute figures, it can be extremely useful for identifying trends and for benchmarking, in addition to the uses mentioned above. These findings can, in turn, influence and shape future business decisions. In order to access social media data (posts, Tweets, and meta-data) and to analyze and monitor social media, many companies use software technologies built for business. These range from in-platform analytics dashboards to dedicated third-party platforms, which offer more advanced capabilities including cross-platform audience intelligence, sentiment analysis, and trend detection at scale. == Location-based == Most social media networks allow users to add a location to their posts (reference all of our feeds). The location can be classified as either 'at-the-location' or 'about-the-location'. "'At-the-location' services can be defined as services where location-based content is created at the geographic location. 'About-the-location' services can be defined as services which are referring to a particular location but the content is not necessarily created in this particular physical place." The added information available from geotagged (link to Geotagging article) posts means that they can be displayed on a map. This means that a location can be used as the start of a social media search rather than a keyword or hashtag. This has major implications for disaster relief, event monitoring, safety and security professionals since a large portion of their job is related to tracking and monitoring specific locations. == Technologies used == Various monitoring platforms use different technologies for social media monitoring and measurement. These technology providers may connect to the API provided by social platforms that are created for 3rd party developers to develop their own applications and services that access data. Facebook's Graph API is one such API that social media monitoring solution products would connect to pull data from. Some social media monitoring and analytics companies use calls to data providers each time an end-user d

    Read more →
  • List of C++ software and tools

    List of C++ software and tools

    This is a list of notable software and programming tools for the C++ programming language, including libraries, web frameworks, programming language implementations, compilers, integrated development environments (IDEs), and other related software development utilities. == Compilers and IDEs == AMD Optimizing C/C++ Compiler — proprietary fork of LLVM + Clang for Linux C++Builder — rapid application development (RAD) environment Clang – compiler front end for C, C++, and Objective-C, part of LLVM CLion — C++ IDE by JetBrains Code::Blocks — open-source cross-platform IDE that supports multiple compilers including GCC, Clang and Visual C++ CodeLite — cross-platform IDE for the C/C++ programming languages using the wxWidgets toolkit CodeSynthesis XSD – XML Data Binding compiler Dev-C++ — MinGW or TDM-GCC 64bit port of the GCC as its compiler GCC – GNU Compiler Collection Intel C++ Compiler – proprietary high-performance compiler by Intel KDevelop — IDE part of the KDE project and is based on KDE Frameworks and Qt, the C/C++ backend uses Clang. Microsoft Visual C++ – proprietary C++ compiler and IDE for Windows Oracle Developer Studio — Solaris, OpenSolaris, RHEL, and Oracle Linux operating systems. Qt Creator — part of the SDK for the Qt GUI application development framework and uses the Qt API SlickEdit — text editor and IDE Turbo C++ – legacy C++ IDE and compiler popular in the 1990s Understand — IDE that enables static code analysis through an array of visuals, documentation, and metric tools. Visual Studio — integrated development environment by Microsoft that supports C++ Visual Studio Code — integrated development environment by Microsoft that supports C++ Xcode — Apple IDE to develop macOS, iOS, iPadOS, watchOS, tvOS, and visionOS that supports C++ source code. == Debuggers == Allinea DDT – a graphical debugger dbx — a proprietary source-level debugger GNU Debugger – portable debugger that runs on many Unix-like systems Modular Debugger — a C/C++ source level debugger for Solaris and derivates Undo LiveRecorder — time travel debugger == Libraries == Active Template Library – template-based C++ classes developed by Microsoft Apache MXNet — deep learning framework Apache Xerces – parsing, validating, and serializing and manipulating XML. Asio — networking and low-level I/O library Bitpit — scientific computing and mesh manipulation library Boost — collection of peer-reviewed libraries Botan — cryptography library C++ AMP – easy way to write programs that compile and execute on data-parallel hardware, such as graphics cards and GPUs C++ Standard Library — standard library for the language C++/WinRT — library for Microsoft's Windows Runtime platform, designed to provide access to modern Windows APIs. C3D Toolkit — geometric modeling kernel Caffe — deep learning framework CAPD — library for rigorous numerics and dynamical systems Cassowary — constraint-solving toolkit that efficiently solves systems of linear equalities and inequalities Cinder — library for creative coding ClanLib — cross-platform game SDK CMU Sphinx — speech recognition system Crypto++ — cryptographic algorithms library Dlib — general-purpose cross-platform library Dune — partial differential equations using grid-based methods fastText — text representation and text classification library FLTK — GUI toolkit Geospatial Data Abstraction Library — geospatial data access library GDCM — image library General Polygon Clipper — polygon clipping library GiNaC — computer algebra system that uses Class Library for Numbers for implementing arbitrary-precision arithmetic GLFW — OpenGL and window management library HarfBuzz — text rendering and typesetting library High Efficiency Image File Format — digital container format for storing individual digital images and image sequences ITK — image analysis library Integrated Performance Primitives — domain-specific functions that are highly optimized for diverse Intel architectures Jackets library — GPU computing library JSBSim — open-source flight dynamics model JUCE — framework for audio applications KDE Frameworks — collection of libraries from the KDE project KFRlib — digital signal processing framework LEMON — library for optimization and graph problems LevelDB — key–value database library Libdash — MPEG-DASH streaming library libLAS — reading and writing geospatial data encoded in the ASPRS laser (LAS) file format libsigc++ — typesafe callbacks LibRaw — free and open-source software library for reading raw files from digital cameras libSBML — application programming interface (API) for the SBML (Systems Biology Markup Language) LIBSVM — sequential minimal optimization (SMO) algorithm for kernelized support vector machines Libx — DirectX .X files graphics library Loki — collection of design patterns LIVE555 — multimedia streaming library Metakit — embedded database library Microsoft Cognitive Toolkit — deep learning toolkit Microsoft Foundation Class Library — object-oriented library for developing desktop applications for Windows Microsoft SEAL — homomorphic encryption library mlpack — machine learning and AI library Mobile Robot Programming Toolkit — robotics research library Object Windows Library — Object Windows Library, superseded by VCL Open Cascade — CAD and 3D modeling library Open Asset Import Library — 3D model import library to provide a common API for different 3D asset file formats OpenCV – computer vision and machine learning library OpenFOAM — computational fluid dynamics toolkit OpenH264 — real-time encoding and decoding video streams in the H.264/MPEG-4 AVC format OpenImageIO — image processing library Open Inventor — higher layer of programming for OpenGL OpenNN — neural networks library OpenVDB — sparse volume data library openFrameworks — creative coding toolkit OpenRTM-aist — robotics middleware library Oracle Template Library — database access that supports IBM Db2 and Open Database Connectivity Orfeo toolbox — remote sensing image processing library OR-Tools — operations research and optimization library Parallel Augmented Maps — ordered sets, ordered maps, and augmented maps. Parallel Patterns Library — Microsoft library that provides features for multicore programming PhysX — physics simulation engine POCO C++ Libraries — general-purpose libraries for software development Poppler — PDF rendering library Protocol Buffers — data serialization library Qt — cross-platform widget toolkit QuantLib — quantitative finance library RocksDB — key–value database library ROOT — data analysis framework from CERN ROS — robotics middleware Scintilla — source code editing component SDL – Simple DirectMedia Layer, cross-platform development library for multimedia applications SFML – Simple and Fast Multimedia Library Shark – open-source machine learning library Shogun — machine learning toolbox Skia — 2D graphics library Snappy — compression library Sound Object Library — music and audio development Standard Template Library — library of containers and algorithms Stapl — parallel computing library SymbolicC++ — symbolic computation library TerraLib — GIS library Tesseract OCR — optical character recognition engine Threading Building Blocks — parallel computing library ThreadWeaver — concurrency framework Tiny-dnn — lightweight deep learning library TinyXML — lightweight XML parser Tkrzw — key–value databases VTD-XML — XML processing library wxWidgets — cross-platform GUI toolkit x265 — video encoding library for HEVC XGBoost — gradient boosting library Windows Template Library — Win32 development === Mathematical and numerical libraries === == Tools == Akonadi — a C++/Qt framework and storage service for personal information management BALL – framework and set of algorithms and data structures for molecular modelling and computational structural bioinformatics Boehm garbage collector – conservative garbage collector CEGUI — C++ GUI library ClanLib – video game SDK CMake — cross-platform build system for C++ projects Confidential Consortium Framework – blockchain infrastructure framework DaviX – WebDAV client Doxygen — documentation generator for C++ and other languages FLTK — Fast Light Toolkit, cross-platform GUI library Fox toolkit — C++ GUI toolkit GDB — GNU Project debugger, often used with C and C++ gtkmm — official C++ interface for the popular GUI library GTK HOOPS Visualize — 3D computer graphics HPX — partitioned global address space Parallel programming Runtime System JUCE — cross-platform C++ audio and GUI framework LessTif — free clone of Motif GUI toolkit MFC — Microsoft Foundation Class library Nana — modern C++ GUI toolkit PTK Toolkit — 2D rendering engine and SDK, and portability options. Qt — cross-platform C++ GUI toolkit Rogue Wave — C++ GUI toolkit TnFOX — C++ GUI toolkit Ultimate++ — cross-platform C++ GUI framework Valgrind — tool suite for debugging and profiling C/C++ programs wxWidgets — cross-platform C++ GUI toolkit x265 — encoder for creating digital video streams in the High Efficiency Vid

    Read more →
  • Storyful

    Storyful

    Storyful (stylized as storyful.) is a social media intelligence company headquartered in Dublin, Ireland that is a subsidiary of News Corp, offering services such as social news monitoring, video licensing, and reputation risk management tools for corporate clients. The startup was launched as the first social media newswire, a content aggregator, verifying news sources and online content in Dublin in 2010 by Mark Little, a former journalist with RTÉ News. Storyful was acquired by News Corp in 2013 for USD$25 million. == Background == Mark Little, who had worked as a television journalist for RTÉ One, founded startup Storyful in Dublin, Ireland, in 2010, as a service that "verified news sources and online content". According to Nieman Lab, Storyful had a reputation for content aggregation as a social news agency—finding, verifying, distributing, licensing, and commercializing user-generated content, social media and online content from social networking services, including videos about stories in the news, such as the Syrian Civil War, Arab Spring protests, as well as "smaller viral moments". Storyful aimed to provide authority through its verification and monitoring tools while providing authenticity through user-generated content. On 20 December 2013 News Corp purchased Storyful for US$25 million and opened a New York office in the same building as Fox News' main studios. Little left Storyful in 2015 and Gavin Sheridan, Storyful's director of innovation left in 2014. News Corp CEO Robert Thomson said that through Storyful, News Corp would "define the opportunities that the digital landscape presents, rather than simply adapt to them." After the acquisition, the company expanded its service to include "commercial and creative work". After Murdoch acquired the company, from 2014 through to February 2018, losses "swelled", requiring a series of cash injections from News Corp. During that time the company expanded aggressively globally with a staff of about 200 worldwide up from about 30 in 2014. According to The Guardian, in 2016, journalists were encouraged by Storyful to use the social media monitoring software called Verify developed by Storyful. By installing Verify's web browser extension on their computers, Verify would inform the journalists when social media content had been "verified and cleared". The Guardian revealed that through the Verify plugin, dozens of staff in four offices had access to the journalists browsing activity without them knowing. This data allowed Storyful to actively monitor its own clients' activities on social media and to "turn it into an internal feed" at Storyful that "updates in real time". In November 2018, when a video circulated by Infowars' Paul Joseph Watson appeared to prove that CNN's Jim Acosta's contact with a White House intern was a physical blow, Storyful was able to prove that the 15-second-long clip had been doctored. According to a 21 January 2019 article in CNN Business, Rob McDonagh, the editor of Storyful's U.S. news team, had proven that one of the viral videos that served as catalysts in the January 2019 Lincoln Memorial confrontation at 18 January 2019 Indigenous Peoples March, was posted by a suspicious account, under the handle @2020fight. McDonagh's team validates videos and posts before adding them to their "digest", distinguishing true stories from those that are not. Storyful attempts to validate each post or video before including it in its digest. McDonagh reviewed previous content from @2020fight's account, and found it suspicious because it had a high follower count, a "highly polarized and yet inconsistent political messaging", an "unusually high rate of tweets", and "the use of someone else's image in the profile photo." reporter Donie O'Sullivan said that the @2020fight video that had been posted on 18 January, which had 2.5 million views by 22 January, was the one that "helped frame the news cycle". Currently the website offers a service by which video can be commercially brokered. == Services == Services include a newswire service—one of their "core pillars"—and social news monitoring. By February 2018, Storyful was developing "risk and reputation monitoring" services through which they would source and verify social news, fact-checking it and contextualising it for corporate clients. They were "developing tech tools" to "explore obscure or closed networks" for their intelligence team. can use to explore obscure or closed networks. They "track deviations in social conversations around brands and organisations and catch potential risks before they blow up. Like an alerts system." The company "released a re-booted version of its Newswire platform in 2018. According to FORA, Storyful was developing new tools to combat fake news online. == Clients == When Storyful was acquired by News Corp in 2013, the company already had the Wall Street Journal, the BBC, New York Times, YouTube, ITN and Channel 4 News as clients. By 2018 their clients included CNN, ABC News and Fox News, The New York Times, the Washington Post, in the United States, the Australian Broadcasting Corporation and all of News Corp’s own publications. Most of their "reputation-conscious corporate customers" clients prefer to not be named.

    Read more →
  • Human rights and encryption

    Human rights and encryption

    Human rights and encryption refers to the ways in which digital encryption affects human rights. Encryption can be used as both a detriment and a boon to human rights; for example, encryption can be used to enforce digital rights management for video games. This kind of video game licensing can render software unusable long term and represents the erosion of consumer rights. At the same time, encryption is fundamental part of internet security. Asymmetrical encryption is used extensively online for authentication, providing users confidence their internet traffic is not being misdirected. Encryption is also used to obfuscate information as it travels from end-to-end over the internet, preventing eavesdropping and tampering. Encryption can also provide anonymity, which is an important consideration for freedom of expression. Despite its drawbacks, encryption is essential for a free, open, and trustworthy internet. == Background == === Human rights === Human rights are moral principles or norms for human behaviour that are regularly protected as legal rights in national and international law. They are commonly understood as inalienable, fundamental rights "to which a person is inherently entitled simply because they are a human being". Those rights are "inherent in all human beings" regardless of their nationality, location, language, religion, ethnic origin, or any other status. They are applicable everywhere and at every time and are universal and egalitarian. === Cryptography === Cryptography is a long-standing subfield of both mathematics and computer science. It can generally be defined as "the protection of information and computation using mathematical techniques." Encryption and cryptography are closely interlinked, although "cryptography" has a broader meaning. For example, a digital signature is "cryptography", but not technically "encryption". == Overview == Under international human rights law, freedom of expression is recognized as a human right under Article 19 of the Universal Declaration of Human Rights (UDHR) and the International Covenant on Civil and Political Rights (ICCPR). In Article 19 of the UDHR states that "everyone shall have the right to hold opinions without interference" and "everyone shall have the right to freedom of expression; this right shall include freedom to seek, receive and impart information and ideas of all kinds, regardless of frontiers, either orally, in writing or in print, in the form of art, or through any other media of his choice". Since the 1970s, the availability of digital computing and the invention of public-key cryptography have made encryption more widely available. (Previously, encryption techniques were the domain of nation-state actors.) Cryptographic techniques are also used to protect the anonymity of communicating actors and privacy more generally. The availability and use of encryption continue to lead to complex, important, and highly contentious legal policy debates. Some government agencies have made statements or proposals to lessen such usage and deployment due to hurdles it presents for government access. The rise of commercial end-to-end encryption services have pushed towards more debates around the use of encryption and the legal status of cryptography in general. Encryption, as defined above, is a set of cryptographic techniques to protect information. The normative value of encryption, however, is not fixed but varies with the type and purpose of the cryptographic methods used. Traditionally, encryption (cipher) techniques were used to ensure the confidentiality of communications and prevent access to information and communications by others and intended recipients. Cryptography can also ensure the authenticity of communicating parties and the integrity of communications contents, providing a key ingredient for enabling trust in the digital environment. There is a growing awareness within human rights organizations that encryption plays an important role in realizing a free, open, and trustworthy Internet. UN Special Rapporteur on the promotion and protection of the right to freedom of opinion and expression David Kaye observed, during the Human Rights Council in June 2015, that encryption and anonymity deserve a protected status under the rights to privacy and freedom of expression: "Encryption and anonymity, today's leading vehicles for online security, provide individuals with a means to protect their privacy, empowering them to browse, read, develop and share opinions and information without interference and enabling journalists, civil society organizations, members of ethnic or religious groups, those persecuted because of their sexual orientation or gender identity, activists, scholars, artists and others to exercise the rights to freedom of opinion and expression." == Encryption in media and communication == In the context of media and communication, two types of encryption in media and communication can be distinguished: It could be used as a result of the choice of a service provider or deployed by Internet users. Client-side encryption tools and technologies are relevant for marginalized communities, journalists and other online media actors practicing journalism as a way of protecting their rights. It could prevent unauthorized third party access, but the service provider implementing it would still have access to the relevant user data. End-to-end encryption is an encryption technique that refers to encryption that also prevents service providers themselves from having access to the user's communications. The implementation of these forms of encryption has sparked the most debate since the start of the 21st century. === Service providers deployed techniques to prevent unauthorized third-party access. === Among the most widely deployed cryptographic techniques is the securitization of communications channel between internet users and specific service providers from man-in-the-middle attacks, access by unauthorized third parties. Given the breadth of nuances involved, these cryptographic techniques must be run jointly by both the service user and the service provider in order to work properly. They require service providers, including online news publisher(s) or social network(s), to actively implement them into service design. Users cannot deploy these techniques unilaterally; their deployment is contingent on active participation by the service provider. The TLS protocol, which becomes visible to the normal internet user through the HTTPS header, is widely used for securing online commerce, e-government services and health applications as well as devices that make up networked infrastructures, e.g., routers, cameras. However, although the standard has been around since 1990, the wider spread and evolution of the technology has been slow. As with other cryptographic methods and protocols, the practical challenges related to proper, secure and (wider) deployment are significant and have to be considered. Many service providers still do not implement TLS or do not implement it well. In the context of wireless communications, the use of cryptographic techniques that protect communications from third parties are also important. Different standards have been developed to protect wireless communications: 2G, 3G and 4G standards for communication between mobile phones, base stations and base stations controllers; standards to protect communications between mobile devices and wireless routers ('WLAN'); and standards for local computer networks. One common weakness in these designs is that the transmission points of the wireless communication can access all communications e.g., the telecommunications provider. This vulnerability is exacerbated when wireless protocols only authenticate user devices, but not the wireless access point. Whether the data is stored on a device, or on a local server as in the cloud, there is also a distinction between 'at rest'. Given the vulnerability of cellphones to theft for instance, particular attention may be given to limiting service provided access. This does not exclude the situation that the service provider discloses this information to third parties like other commercial entities or governments. The user needs to trust the service provider to act in their interests. The possibility that a service provider is legally compelled to hand over user information or to interfere with particular communications with particular users, remains. === Privacy-enhancing Technologies === There are services that specifically market themselves with claims not to have access to the content of their users' communication. Service Providers can also take measures that restrict their ability to access information and communication, further increasing the protection of users against access to their information and communications. The integrity of these Privacy Enhancing Technologies (PETs), depends on delicate design decisions as well as the

    Read more →
  • Kerckhoffs's principle

    Kerckhoffs's principle

    Kerckhoffs's principle (also called Kerckhoffs's desideratum, assumption, axiom, doctrine or law) of cryptography was stated by the Dutch cryptographer Auguste Kerckhoffs in the 19th century. The principle holds that a cryptosystem should be secure, even if everything about the system, except the key, is public knowledge. This concept is widely embraced by cryptographers, in contrast to security through obscurity, which is not. Kerckhoffs's principle was phrased by the American mathematician Claude Shannon as "the enemy knows the system", i.e., "one ought to design systems under the assumption that the enemy will immediately gain full familiarity with them". In that form, it is called Shannon's maxim. Another formulation by American researcher and professor Steven M. Bellovin is: In other words—design your system assuming that your opponents know it in detail. (A former official at NSA's National Computer Security Center told me that the standard assumption there was that serial number 1 of any new device was delivered to the Kremlin.) == Origins == The invention of telegraphy radically changed military communications and increased the number of messages that needed to be protected from the enemy dramatically, leading to the development of field ciphers which had to be easy to use without large confidential codebooks prone to capture on the battlefield. It was this environment which led to the development of Kerckhoffs's requirements. Auguste Kerckhoffs was a professor of German language at Ecole des Hautes Etudes Commerciales (HEC) in Paris. In early 1883, Kerckhoffs's article, La Cryptographie Militaire, was published in two parts in the Journal of Military Science, in which he stated six design rules for military ciphers. Translated from French, they are: The system must be practically, if not mathematically, indecipherable; It should not require secrecy, and it should not be a problem if it falls into enemy hands; It must be possible to communicate and remember the key without using written notes, and correspondents must be able to change or modify it at will; It must be applicable to telegraph communications; It must be portable, and should not require several persons to handle or operate; Lastly, given the circumstances in which it is to be used, the system must be easy to use and should not be stressful to use or require its users to know and comply with a long list of rules. Some are no longer relevant given the ability of computers to perform complex encryption. The second rule, now known as Kerckhoffs's principle, is still critically important. == Explanation of the principle == Kerckhoffs viewed cryptography as a rival to, and a better alternative than, steganographic encoding, which was common in the nineteenth century for hiding the meaning of military messages. One problem with encoding schemes is that they rely on humanly-held secrets such as "dictionaries" which disclose for example, the secret meaning of words. Steganographic-like dictionaries, once revealed, permanently compromise a corresponding encoding system. Another problem is that the risk of exposure increases as the number of users holding the secrets increases. Nineteenth century cryptography, in contrast, used simple tables which provided for the transposition of alphanumeric characters, generally given row-column intersections which could be modified by keys which were generally short, numeric, and could be committed to human memory. The system was considered "indecipherable" because tables and keys do not convey meaning by themselves. Secret messages can be compromised only if a matching set of table, key, and message falls into enemy hands in a relevant time frame. Kerckhoffs viewed tactical messages as only having a few hours of relevance. Systems are not necessarily compromised, because their components (i.e. alphanumeric character tables and keys) can be easily changed. === Advantage of secret keys === Using secure cryptography is supposed to replace the difficult problem of keeping messages secure with a much more manageable one, keeping relatively small keys secure. A system that requires long-term secrecy for something as large and complex as the whole design of a cryptographic system obviously cannot achieve that goal. It only replaces one hard problem with another. However, if a system is secure even when the enemy knows everything except the key, then all that is needed is to manage keeping the keys secret. There are a large number of ways the internal details of a widely used system could be discovered. The most obvious is that someone could bribe, blackmail, or otherwise threaten staff or customers into explaining the system. In war, for example, one side will probably capture some equipment and people from the other side. Each side will also use spies to gather information. If a method involves software, someone could do memory dumps or run the software under the control of a debugger in order to understand the method. If hardware is being used, someone could buy or steal some of the hardware and build whatever programs or gadgets needed to test it. Hardware can also be dismantled so that the chip details can be examined under the microscope. === Maintaining security === A generalization some make from Kerckhoffs's principle is: "The fewer and simpler the secrets that one must keep to ensure system security, the easier it is to maintain system security." Bruce Schneier ties it in with a belief that all security systems must be designed to fail as gracefully as possible: Kerckhoffs's principle applies beyond codes and ciphers to security systems in general: every secret creates a potential failure point. Secrecy, in other words, is a prime cause of brittleness—and therefore something likely to make a system prone to catastrophic collapse. Conversely, openness provides ductility. Any security system depends crucially on keeping some things secret. However, Kerckhoffs's principle points out that the things kept secret ought to be those least costly to change if inadvertently disclosed. For example, a cryptographic algorithm may be implemented by hardware and software that is widely distributed among users. If security depends on keeping that secret, then disclosure leads to major logistic difficulties in developing, testing, and distributing implementations of a new algorithm – it is "brittle". On the other hand, if keeping the algorithm secret is not important, but only the keys used with the algorithm must be secret, then disclosure of the keys simply requires the simpler, less costly process of generating and distributing new keys. == Applications == In accordance with Kerckhoffs's principle, the majority of civilian cryptography makes use of publicly known algorithms. By contrast, ciphers used to protect classified government or military information are often kept secret (see Type 1 encryption). However, it should not be assumed that government/military ciphers must be kept secret to maintain security. It is possible that they are intended to be as cryptographically sound as public algorithms, and the decision to keep them secret is in keeping with a layered security posture. == Security through obscurity == It is moderately common for companies to keep the inner workings of a system secret. Some argue this "security by obscurity" makes the product safer and less vulnerable to attack. A counter-argument is that keeping the innards secret may improve security in the short term, but in the long run, only systems that have been published and analyzed should be trusted. Steven Bellovin and Randy Bush commented: Security Through Obscurity Considered Dangerous Hiding security vulnerabilities in algorithms, software, and/or hardware decreases the likelihood they will be repaired and increases the likelihood that they can and will be exploited. Discouraging or outlawing discussion of weaknesses and vulnerabilities is extremely dangerous and deleterious to the security of computer systems, the network, and its citizens. Open Discussion Encourages Better Security The long history of cryptography and cryptoanalysis has shown time and time again that open discussion and analysis of algorithms exposes weaknesses not thought of by the original authors, and thereby leads to better and more secure algorithms. As Kerckhoffs noted about cipher systems in 1883 [Kerc83], "Il faut qu'il n'exige pas le secret, et qu'il puisse sans inconvénient tomber entre les mains de l'ennemi." (Roughly, "the system must not require secrecy and must be able to be stolen by the enemy without causing trouble.")

    Read more →
  • Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective. This method has enabled broad applications across multiple domains, including cross-modal retrieval, text-to-image generation, and aesthetic ranking. == Algorithm == The CLIP method trains a pair of models contrastively. One model takes in a piece of text as input and outputs a single vector representing its semantic content. The other model takes in an image and similarly outputs a single vector representing its visual content. The models are trained so that the vectors corresponding to semantically similar text-image pairs are close together in the shared vector space, while those corresponding to dissimilar pairs are far apart. To train a pair of CLIP models, one would start by preparing a large dataset of image-caption pairs. During training, the models are presented with batches of N {\displaystyle N} image-caption pairs. Let the outputs from the text and image models be respectively v 1 , . . . , v N , w 1 , . . . , w N {\displaystyle v_{1},...,v_{N},w_{1},...,w_{N}} . Two vectors are considered "similar" if their dot product is large. The loss incurred on this batch is the multi-class N-pair loss, which is a symmetric cross-entropy loss over similarity scores: − 1 N ∑ i ln ⁡ e v i ⋅ w i / T ∑ j e v i ⋅ w j / T − 1 N ∑ j ln ⁡ e v j ⋅ w j / T ∑ i e v i ⋅ w j / T {\displaystyle -{\frac {1}{N}}\sum _{i}\ln {\frac {e^{v_{i}\cdot w_{i}/T}}{\sum _{j}e^{v_{i}\cdot w_{j}/T}}}-{\frac {1}{N}}\sum _{j}\ln {\frac {e^{v_{j}\cdot w_{j}/T}}{\sum _{i}e^{v_{i}\cdot w_{j}/T}}}} In essence, this loss function encourages the dot product between matching image and text vectors ( v i ⋅ w i {\displaystyle v_{i}\cdot w_{i}} ) to be high, while discouraging high dot products between non-matching pairs. The parameter T > 0 {\displaystyle T>0} is the temperature, which is parameterized in the original CLIP model as T = e − τ {\displaystyle T=e^{-\tau }} where τ ∈ R {\displaystyle \tau \in \mathbb {R} } is a learned parameter. Other loss functions are possible. For example, Sigmoid CLIP (SigLIP) proposes the following loss function: L = 1 N ∑ i , j ∈ 1 : N f ( ( 2 δ i , j − 1 ) ( e τ w i ⋅ v j + b ) ) {\displaystyle L={\frac {1}{N}}\sum _{i,j\in 1:N}f((2\delta _{i,j}-1)(e^{\tau }w_{i}\cdot v_{j}+b))} where f ( x ) = ln ⁡ ( 1 + e − x ) {\displaystyle f(x)=\ln(1+e^{-x})} is the negative log sigmoid loss, and the Dirac delta symbol δ i , j {\displaystyle \delta _{i,j}} is 1 if i = j {\displaystyle i=j} else 0. == CLIP models == While the original model was developed by OpenAI, subsequent models have been trained by other organizations as well. === Image model === The image encoding models used in CLIP are typically vision transformers (ViT). The naming convention for these models often reflects the specific ViT architecture used. For instance, "ViT-L/14" means a "vision transformer large" (compared to other models in the same series) with a patch size of 14, meaning that the image is divided into 14-by-14 pixel patches before being processed by the transformer. The size indicator ranges from B, L, H, G (base, large, huge, giant), in that order. Other than ViT, the image model is typically a convolutional neural network, such as ResNet (in the original series by OpenAI), or ConvNeXt (in the OpenCLIP model series by LAION). Since the output vectors of the image model and the text model must have exactly the same length, both the image model and the text model have fixed-length vector outputs, which in the original report is called "embedding dimension". For example, in the original OpenAI model, the ResNet models have embedding dimensions ranging from 512 to 1024, and for the ViTs, from 512 to 768. Its implementation of ViT was the same as the original one, with one modification: after position embeddings are added to the initial patch embeddings, there is a LayerNorm. Its implementation of ResNet was the same as the original one, with 3 modifications: In the start of the CNN (the "stem"), they used three stacked 3x3 convolutions instead of a single 7x7 convolution, as suggested by. There is an average pooling of stride 2 at the start of each downsampling convolutional layer (they called it rect-2 blur pooling according to the terminology of ). This has the effect of blurring images before downsampling, for antialiasing. The final convolutional layer is followed by a multiheaded attention pooling. ALIGN a model with similar capabilities, trained by researchers from Google used EfficientNet, a kind of convolutional neural network. === Text model === The text encoding models used in CLIP are typically Transformers. In the original OpenAI report, they reported using a Transformer (63M-parameter, 12-layer, 512-wide, 8 attention heads) with lower-cased byte pair encoding (BPE) with 49152 vocabulary size. Context length was capped at 76 for efficiency. Like GPT, it was decoder-only, with only causally-masked self-attention. Its architecture is the same as GPT-2. Like BERT, the text sequence is bracketed by two special tokens [SOS] and [EOS] ("start of sequence" and "end of sequence"). Take the activations of the highest layer of the transformer on the [EOS], apply LayerNorm, then a final linear map. This is the text encoding of the input sequence. The final linear map has output dimension equal to the embedding dimension of whatever image encoder it is paired with. These models all had context length 77 and vocabulary size 49408. ALIGN used BERT of various sizes. == Dataset == === WebImageText === The CLIP models released by OpenAI were trained on a dataset called "WebImageText" (WIT) containing 400 million pairs of images and their corresponding captions scraped from the internet. The total number of words in this dataset is similar in scale to the WebText dataset used for training GPT-2, which contains about 40 gigabytes of text data. The dataset contains 500,000 text-queries, with up to 20,000 (image, text) pairs per query. The text-queries were generated by starting with all words occurring at least 100 times in English Wikipedia, then extended by bigrams with high mutual information, names of all Wikipedia articles above a certain search volume, and WordNet synsets. The dataset is private and has not been released to the public, and there is no further information on it. ==== Data preprocessing ==== For the CLIP image models, the input images are preprocessed by first dividing each of the R, G, B values of an image by the maximum possible value, so that these values fall between 0 and 1, then subtracting by [0.48145466, 0.4578275, 0.40821073], and dividing by [0.26862954, 0.26130258, 0.27577711]. The rationale was that these are the mean and standard deviations of the images in the WebImageText dataset, so this preprocessing step roughly whitens the image tensor. These numbers slightly differ from the standard preprocessing for ImageNet, which uses [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]. If the input image does not have the same resolution as the native resolution (224×224 for all except ViT-L/14@336px, which has 336×336 resolution), then the input image is first scaled by bicubic interpolation, so that its shorter side is the same as the native resolution, then the central square of the image is cropped out. === Others === ALIGN used over one billion image-text pairs, obtained by extracting images and their alt-tags from online crawling. The method was described as similar to how the Conceptual Captions dataset was constructed, but instead of complex filtering, they only applied a frequency-based filtering. Later models trained by other organizations had published datasets. For example, LAION trained OpenCLIP with published datasets LAION-400M, LAION-2B, and DataComp-1B. == Training == In the original OpenAI CLIP report, they reported training 5 ResNet and 3 ViT (ViT-B/32, ViT-B/16, ViT-L/14). Each was trained for 32 epochs. The largest ResNet model took 18 days to train on 592 V100 GPUs. The largest ViT model took 12 days on 256 V100 GPUs. All ViT models were trained on 224×224 image resolution. The ViT-L/14 was then boosted to 336×336 resolution by FixRes, resulting in a model. They found this was the best-performing model. In the OpenCLIP series, the ViT-L/14 model was trained on 384 A100 GPUs on the LAION-2B dataset, for 160 epochs for a total of 32B samples seen. == Applications == === Cross-modal retrieval === CLIP's cross-modal retrieval enables the alignment of visual and textual data in a shared latent space, allowing users to retrieve images based on text descriptions and vice versa, without the need for explicit image annotations. In text-to-image retrieval, users input descriptive text, and CLIP retrieves images with matching embeddings. In image-to-text retrieval, images are used to find related text content. CLIP’s ability to connect vis

    Read more →
  • Microsoft Security Development Lifecycle

    Microsoft Security Development Lifecycle

    The Microsoft Security Development Lifecycle (SDL) is the approach Microsoft uses to integrate security into DevOps processes (sometimes called a DevSecOps approach). You can use this SDL guidance and documentation to adapt this approach and practices to your organization. == Overview == The practices outlined in the SDL approach are applicable to all types of software development and across all platforms, ranging from traditional waterfall methodologies to modern DevOps approaches. They can generally be applied to the following: Software – whether you are developing software code for firmware, AI applications, operating systems, drivers, IoT Devices, mobile device apps, web services, plug-ins or applets, hardware microcode, low-code/no-code apps, or other software formats. Note that most practices in the SDL are applicable to secure computer hardware development as well. Platforms – whether the software is running on a ‘serverless’ platform approach, on an on-premises server, a mobile device, a cloud hosted VM, a user endpoint, as part of a Software as a Service (SaaS) application, a cloud edge device, an IoT device, or anywhere else. == Practices == The SDL recommends 10 security practices to incorporate into your development workflows. Applying the 10 security practices of SDL is an ongoing process of improvement so a key recommendation is to begin from some point and keep enhancing as you proceed. This continuous process involves changes to culture, strategy, processes, and technical controls as you embed security skills and practices into DevOps workflows. The 10 SDL practices are: Establish security standards, metrics, and governance Require use of proven security features, languages, and frameworks Perform security design review and threat modeling Define and use cryptography standards Secure the software supply chain Secure the engineering environment Perform security testing Ensure operational platform security Implement security monitoring and response Provide security training == Versions ==

    Read more →
  • Honey encryption

    Honey encryption

    Honey encryption is a type of data encryption that "produces a ciphertext, which, when decrypted with an incorrect key as guessed by the attacker, presents a plausible-looking yet incorrect plaintext." == Creators == Ari Juels and Thomas Ristenpart of the University of Wisconsin, the developers of the encryption system, presented a paper on honey encryption at the 2014 Eurocrypt cryptography conference. == Method of protection == A brute-force attack involves repeated decryption with random keys; this is equivalent to picking random plaintexts from the space of all possible plaintexts with a uniform distribution. This is effective because even though the attacker is equally likely to see any given plaintext, most plaintexts are extremely unlikely to be legitimate i.e. the distribution of legitimate plaintexts is non-uniform. Honey encryption defeats such attacks by first transforming the plaintext into a space such that the distribution of legitimate plaintexts is uniform. Thus an attacker guessing keys will see legitimate-looking plaintexts frequently and random-looking plaintexts infrequently. This makes it difficult to determine when the correct key has been guessed. In effect, honey encryption "[serves] up fake data in response to every incorrect guess of the password or encryption key." The security of honey encryption relies on the fact that the probability of an attacker judging a plaintext to be legitimate can be calculated (by the encrypting party) at the time of encryption. This makes honey encryption difficult to apply in certain applications e.g. where the space of plaintexts is very large or the distribution of plaintexts is unknown. It also means that honey encryption can be vulnerable to brute-force attacks if this probability is miscalculated. For example, it is vulnerable to known-plaintext attacks: if the attacker has a crib that a plaintext must match to be legitimate, they will be able to brute-force even Honey Encrypted data if the encryption did not take the crib into account. == Example == An encrypted credit card number is susceptible to brute-force attacks because not every string of digits is equally likely. The number of digits can range from 13 to 19, though 16 is the most common. Additionally, it must have a valid IIN and the last digit must match the checksum. An attacker can also take into account the popularity of various services: an IIN from MasterCard is probably more likely than an IIN from Diners Club Carte Blanche. Honey encryption can protect against these attacks by first mapping credit card numbers to a larger space where they match their likelihood of legitimacy. Numbers with invalid IINs and checksums are not mapped at all (i.e. have probability 0 of legitimacy). Numbers from large brands like MasterCard and Visa map to large regions of this space, while less popular brands map to smaller regions, etc. An attacker brute-forcing such an encryption scheme would only see legitimate-looking credit card numbers when they brute-force, and the numbers would appear with the frequency the attacker would expect from the real world. == Application == Juels and Ristenpart aim to use honey encryption to protect data stored on password manager services. Juels stated that "password managers are a tasty target for criminals," and worries that "if criminals get a hold of a large collection of encrypted password vaults they could probably unlock many of them without too much trouble." Hristo Bojinov, CEO and founder of Anfacto, noted that "Honey Encryption could help reduce their vulnerability. But he notes that not every type of data will be easy to protect this way. … Not all authentication or encryption system yield themselves to being honeyed."

    Read more →