Artificial intelligence engineering

Artificial intelligence engineering

Artificial intelligence engineering (AI engineering) is a technical discipline that focuses on the design, development, and deployment of AI systems. AI engineering involves applying engineering principles and methodologies to create scalable, efficient, and reliable AI-based solutions. It merges aspects of data engineering and software engineering to create real-world applications in diverse domains such as healthcare, finance, autonomous systems, and industrial automation. == Terminology ambiguity == According to Chip Huyen's book AI Engineering: Building Applications with Foundation Models, the term AI engineering refers to the process of building applications that use foundation models, which are typically models developed by a small number of research laboratories and made available as a service. Huyen distinguishes this from machine learning (ML) engineering, which involves building and deploying models developed in-house. She notes that most practical AI systems combine both approaches. For example, a customer-support chatbot may use a generative model to produce responses while also incorporating locally built components such as request classifiers or scoring mechanisms to assess response quality. As a result, the terms AI engineering and ML engineering are often used together or interchangeably in practice. The distinction and broader usage of the term have been discussed in industry publications and interviews, where AI engineering has been described as an emerging discipline focused on productionizing applications built with foundation models. == Key components == AI engineering integrates a variety of technical domains and practices, all of which are essential to building scalable, reliable, and ethical AI systems. === Data engineering and infrastructure === Data serves as the cornerstone of AI systems, necessitating careful engineering to ensure premium quality, wide spread availability, and usability. AI engineers gather large, diverse datasets from multiple sources such as databases, APIs, and real-time streams. This data undergoes cleaning, normalization, and preprocessing, often facilitated by automated data pipelines that manage extraction, transformation, and loading (ETL) processes. Efficient storage solutions, such as SQL (or NoSQL) databases and data lakes, must be selected based on data characteristics and use cases. Security measures, including encryption and access controls, are critical for protecting sensitive information and ensuring compliance with regulations like GDPR. Scalability is essential, frequently involving cloud services and distributed computing frameworks to handle growing data volumes effectively. === Algorithm selection and optimization === Selecting the appropriate algorithm is crucial for the success of any AI system. Engineers evaluate the problem (which could be classification or regression, for example) to determine the most suitable machine learning algorithm, including deep learning paradigms. Once an algorithm is chosen, optimizing it through hyperparameter tuning is essential to enhance efficiency and accuracy. Techniques such as grid search or Bayesian optimization are employed, and engineers often utilize parallelization to expedite training processes, particularly for large models and datasets. For existing models, techniques like transfer learning can be applied to adapt pre-trained models for specific tasks, reducing the time and resources needed for training. === Deep learning engineering === Deep learning is particularly important for tasks involving large and complex datasets. Engineers design neural network architectures tailored to specific applications, such as convolutional neural networks for visual tasks or recurrent neural networks for sequence-based tasks. Transfer learning, where pre-trained models are fine-tuned for specific use cases, helps streamline development and often enhances performance. Optimization for deployment in resource-constrained environments, such as mobile devices, involves techniques like pruning and quantization to minimize model size while maintaining performance. Engineers also mitigate data imbalance through augmentation and synthetic data generation, ensuring robust model performance across various classes. === Natural language processing === Natural language processing (NLP) is a crucial component of AI engineering, focused on enabling machines to understand and generate human language. The process begins with text preprocessing to prepare data for machine learning models. Recent advancements, particularly transformer-based models like BERT and GPT, have greatly improved the ability to understand context in language. AI engineers work on various NLP tasks, including sentiment analysis, machine translation, and information extraction. These tasks require sophisticated models that utilize attention mechanisms to enhance accuracy. Applications range from virtual assistants and chatbots to more specialized tasks like named-entity recognition (NER) and Part of speech (POS) tagging. === Reasoning and decision-making systems === Developing systems capable of reasoning and decision-making is a significant aspect of AI engineering. Whether starting from scratch or building on existing frameworks, engineers create solutions that operate on data or logical rules. Symbolic AI employs formal logic and predefined rules for inference, while probabilistic reasoning techniques like Bayesian networks help address uncertainty. These models are essential for applications in dynamic environments, such as autonomous vehicles, where real-time decision-making is critical. === Security === Security is a critical consideration in AI engineering, particularly as AI systems become increasingly integrated into sensitive and mission-critical applications. AI engineers implement robust security measures to protect models from adversarial attacks, such as evasion and poisoning, which can compromise system integrity and performance. Techniques such as adversarial training, where models are exposed to malicious inputs during development, help harden systems against these attacks. Additionally, securing the data used to train AI models is of paramount importance. Encryption, secure data storage, and access control mechanisms are employed to safeguard sensitive information from unauthorized access and breaches. AI systems also require constant monitoring to detect and mitigate vulnerabilities that may arise post-deployment. In high-stakes environments like autonomous systems and healthcare, engineers incorporate redundancy and fail-safe mechanisms to ensure that AI models continue to function correctly in the presence of security threats. === Ethics and compliance === As AI systems increasingly influence societal aspects, ethics and compliance are vital components of AI engineering. Engineers design models to mitigate risks such as data poisoning and ensure that AI systems adhere to legal frameworks, such as data protection regulations like GDPR. Privacy-preserving techniques, including data anonymization and differential privacy, are employed to safeguard personal information and ensure compliance with international standards. Ethical considerations focus on reducing bias in AI systems, preventing discrimination based on race, gender, or other protected characteristics. By developing fair and accountable AI solutions, engineers contribute to the creation of technologies that are both technically sound and socially responsible. == Workload == An AI engineer's workload revolves around the AI system's life cycle, which is a complex, multi-stage process. This process may involve building models from scratch or using pre-existing models through transfer learning, depending on the project's requirements. Each approach presents unique challenges and influences the time, resources, and technical decisions involved. === Problem definition and requirements analysis === Regardless of whether a model is built from scratch or based on a pre-existing model, the work begins with a clear understanding of the problem. The engineer must define the scope, understand the business context, and identify specific AI objectives that align with strategic goals. This stage includes consulting with stakeholders to establish key performance indicators (KPIs) and operational requirements. When developing a model from scratch, the engineer must also decide which algorithms are most suitable for the task. Conversely, when using a pre-trained model, the workload shifts toward evaluating existing models and selecting the one most aligned with the task. The use of pre-trained models often allows for a more targeted focus on fine-tuning, as opposed to designing an entirely new model architecture. === Data acquisition and preparation === Data acquisition and preparation are critical stages regardless of the development method chosen, as the performance of any AI system relies heavily on high-quality, re

Nouvelle AI

Nouvelle artificial intelligence (Nouvelle AI) is an approach to artificial intelligence pioneered in the 1980s by Rodney Brooks, who was then part of MIT artificial intelligence laboratory. Nouvelle AI differs from classical AI by aiming to produce robots with intelligence levels similar to insects. Researchers believe that intelligence can emerge organically from simple behaviors as these intelligences interacted with the "real world", instead of using the constructed worlds which symbolic AIs typically needed to have programmed into them. == Motivation == The differences between nouvelle AI and symbolic AI are apparent in early robots Shakey and Freddy. These robots contained an internal model (or "representation") of their micro-worlds consisting of symbolic descriptions. As a result, this structure of symbols had to be renewed as the robot moved or the world changed. Shakey's planning programs assessed the program structure and broke it down into the necessary steps to complete the desired action. This level of computation required a large amount time to process, so Shakey typically performed its tasks very slowly. Symbolic AI researchers had long been plagued by the problem of updating, searching, and otherwise manipulating the symbolic worlds inside their AIs. A nouvelle system refers continuously to its sensors rather than to an internal model of the world. It processes the external world information it needs from the senses when it is required. As Brooks puts it, "the world is its own best model--always exactly up to date and complete in every detail." A central idea of nouvelle AI is that simple behaviors combine to form more complex behaviors over time. For example, simple behaviors can include elements like "move forward" and "avoid obstacles." A robot using nouvelle AI with simple behaviors like collision avoidance and moving toward a moving object could possibly come together to produce a more complex behavior like chasing a moving object. === The frame problem === The frame problem describes an issue with using first-order logic (FOL) to express facts about a robot in the world. Representing the state of a robot with traditional FOL requires the use of many axioms (symbolic language) to imply that things about an environment do not change arbitrarily. Nouvelle AI seeks to sidestep the frame problem by dispensing with filling the AI or robot with volumes of symbolic language and instead letting more complex behaviors emerge by combining simpler behavioral elements. === Embodiment === The goal of traditional AI was to build intelligences without bodies, which would only have been able to interact with the world via keyboard, screen, or printer. However, nouvelle AI attempts to build embodied intelligence situated in the real world. Brooks quotes approvingly from the brief sketches that Turing gave in 1948 and 1950 of the "situated" approach. Turing wrote of equipping a machine "with the best sense organs that money can buy" and teaching it "to understand and speak English" by a process that would "follow the normal teaching of a child." This approach was contrasted to the others where they focused on abstract activities such as playing chess. == Brooks' robots == === Insectoid robots === Brooks focused on building robots that acted like simple insects while simultaneously working to remove some traditional AI characteristics. He created insect-like robots, named Allen and Herbert after cognitive science and AI pioneers Allen Newell and Herbert A. Simon. Brooks's insectoid robots contained no internal models of the world. Herbert, for example, discarded a high volume of the information received from its sensors and never stored information for more than two seconds. ==== Allen ==== Allen had a ring of twelve ultrasonic sonars as its primary sensors and three independent behavior-producing modules. These modules were programmed to avoid both stationary and moving objects. With only this module activated, Allen stayed in the middle of a room until an object approached and then it ran away while avoiding obstacles in its way. ==== Herbert ==== Herbert used infrared sensors to avoid obstacles and a laser system to collect 3D data over a distance of about 12 feet. Herbert also carried a number of simple sensors in its "hand." The robot's testing ground was the real world environment of the busy offices and workspaces of the MIT AI lab where it searched for empty soda cans and carried them away, a seemingly goal-oriented activity that emerged as a result of 15 simple behavior units combining. As a parallel, Simon noted that an ant's complicated path is due to the structure of its environment rather than the depth of its thought processes. ==== Other insectoid robots ==== Other robots by Brooks' team were Genghis and Squirt. Genghis had six legs and was able to walk over rough terrain and follow a human. Squirt's behavior modules had it stay in dark corners until it heard a noise, then it would begin to follow the source of the noise. Brooks agreed that the level of nouvelle AI had come near the complexity of a real insect, which raised a question about whether or not insect level-behavior was and is a reasonable goal for nouvelle AI. === Humanoid robots === Brooks' own recent work has taken the opposite direction to that proposed by Von Neumann in the quotations "theorists who select the human nervous system as their model are unrealistically picking 'the most complicated object under the sun,' and that there is little advantage in selecting instead the ant, since any nervous system at all exhibits exceptional complexity." ==== Cog ==== In the 1990s, Brooks decided to pursue the goal of human-level intelligence and, with Lynn Andrea Stein, built a humanoid robot called Cog. Cog is a robot with an extensive collection of sensors, a face, and arms (among other features) that allow it to interact with the world and gather information and experience so as to assemble intelligence organically in the manner described above by Turing. The team believed that Cog would be able to learn and able to find a correlation between the sensory information it received and its actions, and to learn common sense knowledge on its own. As of 2003, all development of the project had ceased.

Conference app

A conference app, also known as an event app or meeting app, is a mobile app developed to help attendees and meeting planners manage their conference experience. It typically includes conference proceedings and venue information, allowing users to create personalized schedules and engage with other users. A conference app can be a native app or web-based. In recent years, conference apps have gained in popularity as a sustainable solution for event management by reducing paper produced by printed materials. Advanced features often include real-time notifications for updates or changes, integration with virtual meeting platforms for hybrid or fully online events, and analytics tools for organizers to measure attendance and engagement. Additionally, some apps support sponsorship and exhibitor features, enabling businesses to showcase their products or services directly within the app.

Masking (art)

In art, craft, and engineering, masking is the use of materials to protect areas from change, or to focus change on other areas. This can describe either the techniques and materials used to control the development of a work of art by protecting a desired area from change; or a phenomenon that (either intentionally or unintentionally) causes a sensation to be concealed from conscious attention. The term is derived from the word mask, in the sense that it hides the face from view. == In painting == Masking materials supplement a painter's dexterity and choice of applicator to control where paint is laid. Examples include the use of a stencil or masking tape to protect areas which are not to be painted. === Solid masks === Most solid masks require an adhesive to hold the mask in place while work is performed. Some, such as masking tape and frisket, come with adhesive pre-applied. Solid masks are readily available in bulk, and are used in large painting jobs. Paper products Kraft paper Butcher paper Masking tape Plastic film Frisket Polyester tape Stencils Silk screen === Liquid masks === Liquid masks are preferred where precision is needed; they prevent paint from seeping underneath, resulting in clean edges. Care must be taken to remove them without damaging the work underneath. Latex or other polymers Molten wax Gesso, typically a substrate for painting, but can also be applied to achieve masking effects == In photography == Masks used for photography are used to enhance the quality of an image. Representations of a scene—whether film, video display, or printed—do not have the dynamic contrast range available to the human eye looking directly at the same scene. Adjusting the contrast in an image helps restore some of the perceived qualities of the original scene. These adjustments are typically performed on "blown-out" highlights, and "crushed" or "muddy" shadow areas, where clipping has occurred; or on desaturated colors. Photographic masks are peculiar in that they are produced from the image they will alter, an exercise in recursion. Masks used to produce other effects are similar to those used in painting. === Controlling exposure === ==== Film ==== The basic methods of controlling exposure are dodging and burning, which respectively lighten (reduce exposure) and darken (increase exposure) areas of an image. The tools a film photographer uses range from shaped pieces of black material (such as studio foil, foam, and paper) to the photographer's hands. To create a photographic mask, a sheet of negative film is contact-exposed to the original film negative or slide positive in a particular way. Both films are then combined to produce a processed positive. The process is similar when applied using digital techniques: the inverse of the working image is reduced to an image mask; filters or other adjustments are then applied, using the mask to selectively block portions of the image. ==== Digital ==== Image editors offer at the very least a "Select All" command and a rectangular "marquee" selection tool. (The word "marquee" describes the "crawling ants" border used to highlight the active region.) Once a selection is created, further changes to the image will be confined to that area. To continue editing the rest of the image, the selection is either "deselected" or the entire image is selected. Advanced suites offer more ways to select portions of an image, as well as ways to combine these selections through. Selection masks can be switched between an editable greyscale image and a mask. They allow the user to create a mask using the suite's painting tools. === Contrast masking === When the contrast range of an image needs to be adjusted, a contrast mask is a simple solution. The processed image resembles what would be achieved when exposing through a neutral density filter, but the effects are focused highly upon the extreme regions of the image. The blocking areas of the mask coincide with the highlights of the image, and the permissive areas with the shadows, resulting in more detail appearing in each. ==== Film ==== The mask is often made from high-quality black-and-white film, such as Kodak Technical Pan, which allows for a degree of softening on the mask. Its processing time is reduced so as to not completely oppose the original negative. Both negatives are combined and registered, and collectively exposed with additional time to compensate for the presence of the mask. ==== Digital ==== Contrast masking is made simpler with digital editing. A grayscale version of the image is produced, either by desaturation or by calculating selected ratios of the image's color channels, inverted, and blurred. The mask and original image are blended together to produce the final processed image. Some image editors allow for refinement of the effect by changing the strength of the blend. Contrast masking can be considered to be the opposite of gamma correction, which adjusts the midtones of an image. Effects similar to contrast masking can be achieved by adjusting the response curves of an image. === Unsharp masking === A derivative of contrast masking is unsharp masking, an unusual term for a process intended to increase the apparent sharpness (acutance) of an image. Unsharp masking uses a blurred form of the image to increase contrast along regions of moderate contrast difference. Around edges, the blur region causes highlights to overexpose and shadows to underexpose. Taken to an extreme, the edges become overly visible and detract from the quality of the image—this is referred to as halation. Unsharp masking does not increase the actual sharpness, as it cannot recover details lost to blurring. ==== Film ==== Unsharp masking allows the photographer to sharpen areas that have become blurred in the original negative, due to long shutter speed/exposure time, or from using a wide aperture/"fast" lens. When creating the unsharp mask, extra space or diffusing material is added between the image and the mask to produce the necessary blur. ==== Digital ==== Unsharp masking has become automated in digital editing, with higher-end suites offering the process as a "tool" or "filter" in their standard sharpening kits—the actual creation of a mask is bypassed in favor of calculations that represent the mask's effect. The process depends on three factors: the radius of the blur, the strength of the effect, and the threshold degree of contrast above which the effect will be applied. (Adjusting the threshold allows the editor to apply the effect selectively upon moderately defined edges and ignore image noise.) Unsharp masking is computationally more complex than other sharpening algorithms, but results in a higher-quality remedy. Deconvolution allows for truer sharpening, but is much more complex than unsharp masking.

Video browsing

Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspecting a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume a smart user that wants features to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts (e.g., only shots showing cars), through some specific characteristics (e.g., color or motion filtering), through user-provided sketches (e.g., a visually drawn sketch), or through content-based similarity search. == History == Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at Siemens, and it was presented at the ACM International Conference in August 1993. They described a shot detection algorithm for compressed video that was originally encoded with discrete cosine transform (DCT) video coding standards such as JPEG, MPEG and H.26x. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as motion vector representation for the frame. By operating on compressed DCT representations, the algorithm significantly reduces the computational requirements for decompression and enables effective video browsing. The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents. === Video Notebook === Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides and optical character recognition and speech recognition to facilitate video search. The software can be either used on the client side (using a browser extension), where the slides and text are extracted while the video is watched (e.g. on a video platform like YouTube or Udemy), or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations like Ernst & Young. === Video Browser Showdown === The Video Browser Showdown (VBS) is an annual live evaluation competition for exploratory video search tools, where international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS also collaborates with TRECVID. The aim of the VBS is to evaluate video browsing tools for efficiency at known-item search (KIS) tasks with a well-defined data set in direct comparison to other tools.

MeituPic

Meitu Xiu Xiu ("Meitu") (Chinese: 美图秀秀) is an image editing software that is mostly used in Mainland China but is also popular in Hong Kong and Taiwan. It is only available on Google Play and App Store in certain countries. It provides tools for editing photos: filters, retouching, collage, scenes, frames, and photo decorations, as well as generative AI features such as text-to-images, AI removal and AI repainting etc. Meitu is one of the apps developed by Meitu, Inc.; it also produced BeautyCam, Wink and X-Design. == History == Meitu's PC version was created in 2008 by Wu Xinhong, the CEO of Meitu. In 2013, its mobile version became one of the first must-have mobile apps in China. Meitu, Inc. is a photo and video-centered app developer, which was founded in 2008 in Xiamen. Currently, the major revenue source of Meitu is premium subscription. Meitu, Inc. was initially funded by Cai Wensheng, a well-known angel investor. The company has an approximately 250 million monthly active users globally. == Function == === Edit === MeituPic provides a number of photo-editing tools. The major functions are auto enhance, edit, enhance, filters, frames, magic brush, mosaic, text, and blur. Auto enhance focuses on the nature of photos taken, while Edit includes functions of cropping, rotation, sharpening, and adjustment of ratio. For Enhance, users can apply slight adjustment on the photo by controlling the levels of brightness, contrast, colour temperature, saturation, highlight, shadow and smart light. Major types of filters are LOMO, beauty, style as well as art. Different frames can be chosen from poster, simple, and fantasy. Magic brush provides a great variety of brushes with different colours and patterns for users to decorate the photos. Mosaic brush enables users to cover certain parts of the photo. Texts can be added to the photo. Choices of different bubbles, font as well as style of words are available. Blurring effect is also available to make the photo less distinct and clear. === Beauty Retouch === There are seven major functions for retouching a photo: automatic retouch, smooth and whiten skin, remove blemish, make slimmer, remove dark circles and bags under the eyes, make taller, and enhance the eyes. Automatic retouch enhances portraits by lightening the skin tone, brightening the eyes, and simulating a face-lift by tapping on just one button. This helps to remove wrinkles and optimizes the skin tone. Acne, blemishes, and other skin imperfections can also be removed. The face-lift and weight-loss functions in the slimming option can be used to reshape the body. The option to make the subject taller can be used to change the perceived height of the subject and give the impression of slimmer, longer legs. The option to enhance the eyes can enlarge and brighten the eyes. === Collage === Collage has four types: template, freestyle, poster, PicStrip, which all maximize to insert nine photos. Template integrates photos in a vertical rectangle tightly. MeituPic has 15 frames or free download function for users. MeituPic also provides different templates according to number of photos inserted. Freestyle separates photos on a background freely. There are two parts of background: custom and more. For custom, users choose from album. For more, there are plain and picture with 18 choices. Poster makes a poster with photos. Users choose a poster among 8 choices or tap ‘more’ to download a new one. PicStrip combines photos vertically making an elongated file. Users choose a frame from 15 choices. Pinching thumb and forefinger together or apart zooms photos in/out. Putting two fingers and turning hand rotates photos. Pressing moves photos to ideal location. After designing, users tap ‘save/share’ on the upper right corner and the photo made is saved into album automatically. == Awards ==

Elowan

Elowan is a plant-robot cyborg. Using its own internal bioelectrical signals, The plant has a robotic extension that makes it move towards light sources. Electrodes are inserted into the leaves, stem, and ground to detect the faint bioelectrical signals the plant produces. Then they are amplified so the robot can read them. So when the plant "wants" to go to light, the cyborg automatically goes to the nearest light source. Future extensions of the robot could provide: Protection, growth frameworks, and nutrients. Other factors that could make the cyborg move are temperature, soil, and gravity conditions Elowan is one in a series of plant-electronic hybrid experiments.