Relational databases have been the backbone of business systems for many years. Over the past decade, however, the rise of Big Data, far larger in volume and more varied in structure, fueled the NoSQL movement. Volumes of customer data gathered from disparate sources had to be integrated to provide crucial insights. This called for a powerful, advanced database that can store, map, and query data relationships at speed: a database that helps identify the crucial focus points in an enterprise and incorporates dynamic information without the need for a pre-defined, rigid schema.

To summarize, there was a need for a general-purpose database management technology that could embrace connected data and enable graph thinking. This technology is the graph database. It should not be viewed as a “replacement” technology, but rather as a complement to the databases that have already been deployed.

One of the main reasons for choosing a graph database is speed. Queries that traverse relationships, which would require expensive joins in a relational database, can run orders of magnitude faster. Because the schema is less rigid, changing data can be handled quickly and naturally. Related records can be retrieved in a fraction of a second since each element is linked directly to its neighbors by edges. As a result, graph databases can handle large volumes of connected data faster than relational databases.

This blog post gives an overview of graph databases, the need for graph databases, and touches on graph models and their properties, use cases, merits, and demerits.

What is a Graph Database?

A graph database is a data management system that uses vertices (nodes) and edges as its building blocks instead of tables.

There are four main types of NoSQL databases, and the graph database is one of them. The key differentiator between these types is the data model they expose to the developer. The building blocks of a graph database are nodes, typed relationships between nodes, and key-value properties that can be attached to both the nodes and the relationships. For example, if two parties, X and Y, are engaged in a conversation, X is a node, Y is a node, and between them is a relationship called “in a conversation”.

The basic idea is that, for each party, the graph expands to show its relationships to other parties. In the example above, X has an “in a conversation” relationship with Y, and the graph makes that relationship visible at a glance. A minimal sketch of this model appears below.
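As a purely illustrative sketch (not tied to any particular graph database product, and with hypothetical property names and values), the same X-and-Y example can be modeled in Python with the networkx library: two nodes carrying key-value properties and a typed relationship between them.

import networkx as nx

# Build a small property graph: nodes with key-value properties,
# and a typed relationship ("IN_A_CONVERSATION") between them.
graph = nx.MultiDiGraph()

graph.add_node("X", kind="Party")
graph.add_node("Y", kind="Party")

# The edge carries its type as the key, plus its own properties.
graph.add_edge("X", "Y", key="IN_A_CONVERSATION", started="2023-01-15")

# List every relationship X participates in.
for source, target, rel_type, props in graph.out_edges("X", keys=True, data=True):
    print(f"{source} -[{rel_type} {props}]-> {target}")

A dedicated graph database offers the same three building blocks (nodes, typed relationships, and properties) natively, along with a query language to traverse them.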

Graph database management systems offer Create, Read, Update, and Delete (CRUD) methods to access and manage data. Graph databases can be used for both OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing). Systems tailored to OLTP are generally optimized for transactional performance and guarantee the ACID properties.

What are Some Possible Graph Database Projects?

Graph databases are great for social applications and have gained great popularity through social networking models. One of the well-known examples is Facebook Graph Search. Facebook’s search app, described as the third pillar of Facebook, introduced a new way of searching the social graph.
Graph databases also have broad applicability outside of social networking. They are used in master data management, network management, content management, CRM systems, online recommendations, virtual assistants that drive conversations, artificial intelligence, machine learning, security, and fraud detection.

Master Data Management/Customer 360

Understand and analyze business data and their relationships, consolidated across various business units to acquire a holistic customer view.

Recommendation

Influence customers to buy the company’s products and to recommend them to others.

Security and Fraud Detection

Determine the fraudulent entity, transaction, or interaction that poses a security risk or compliance issue.

When should you use a Graph Database?

A graph database is not the right fit for every situation; the requirements have to be evaluated case by case.

Data Model
One of the major criteria for determining whether a graph database applies is your data model. If the data model is dominated by relationships, a graph database is a strong fit. Its general-purpose structure allows you to model all kinds of scenarios.

Relationship Type
If your data contains many-to-many relationships, a graph database is preferable. One-to-many relationships are also easier to represent in a graph database. Graph databases store related data as it naturally occurs and eliminate join performance issues. Since relationships are first-class citizens, applications need not infer data connections from foreign keys.

Schema
If your schema is not rigid and is expected to evolve, a graph database is a good fit, even when a relational database could also meet the requirement. The property graph model, one of the most popular variants of the graph data model, is schema-less and optimized for traversal.

Data Analysis
If your data involves complex analysis or expensive queries spanning multiple types of data, a graph database is the best choice to run the queries more efficiently.

How do Graph Databases Work?

The graph model works particularly well for applications where the connections between items of data are the most essential factor: connected data matters more than individual points. Relationships and connections are persisted through every part of the data lifecycle.

The graph model has two primary components: nodes (entities) and edges (relationships between entities). Nodes are the entities in the graph; they can have properties, or attributes, stored as key-value pairs. Relationships have a start node and an end node, and they can be given labels that describe their role in the domain. The main purpose of a graph database is the analysis and visualization of connected data. In a graph database, you typically index node properties to find the starting points of a traversal.

You move around a graph by traversing along specific edge types or across the entire graph. Traversing is simply the act of walking from node to node along relationships. Traversing to nodes via their relationships plays the same role as JOINs on relational tables, but these traversals are faster: the relationships are stored in the graph itself, so retrieving connected information is cheaper. A query touches only the portion of the graph it needs rather than performing grouping operations over the entire dataset, so query performance depends on the amount of data examined, not on the size of the whole dataset. And since relationships and connections persist through the entire data lifecycle, there is no reliance on foreign-key lookups across tables. The sketch below illustrates a simple traversal along a specific relationship type.
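Building on the illustrative networkx sketch above (again only an approximation of a native graph engine, with hypothetical node and relationship names), a traversal starts from an indexed node and follows only the outgoing edges of a chosen relationship type:

import networkx as nx

graph = nx.MultiDiGraph()

# A tiny purchase graph: customers, a product, and typed relationships.
graph.add_node("alice", kind="Customer")
graph.add_node("bob", kind="Customer")
graph.add_node("coffee", kind="Product")
graph.add_edge("alice", "coffee", key="BOUGHT")
graph.add_edge("bob", "coffee", key="BOUGHT")
graph.add_edge("alice", "bob", key="KNOWS")


def traverse(g, start, rel_type):
    # Follow only edges of the given relationship type from the start node.
    return [target for _, target, key in g.out_edges(start, keys=True) if key == rel_type]


# Start from the indexed node "alice" and walk its relationships directly,
# without joining or scanning the rest of the dataset.
print(traverse(graph, "alice", "BOUGHT"))  # ['coffee']
print(traverse(graph, "alice", "KNOWS"))   # ['bob']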

Unique Characteristics

The following are two properties that make graph databases unique:

Storage

Graph storage refers to the underlying structure of the database that contains graph data. It can be classified into Native and Non-Native Storage.

  • Native Storage
    Some graph databases use native graph storage that is specifically designed to store and manage graphs. The database is built on technology designed to be graph-first from the ground up.

  • Non-Native Storage
    Graph capabilities are bolted on as an afterthought: storage comes from an outside source, such as a relational database, an object-oriented database, or some other general-purpose data store. This type of storage is slower than a native approach because all the graph connections have to be translated into a different data model.

Processing

Graph processing can also be classified into Native Graph Processing and Non-Native Graph Processing.

  • Native Graph Processing
    In native graph processing, connected nodes physically point to each other; this is the most efficient means of processing data in a graph. It is also known as index-free adjacency because each node directly references the nodes adjacent to it (see the sketch after this list).

  • Non-Native Graph Processing
    Non-native graph processing often uses a larger number of indexes to complete a read or write transaction, significantly slowing down operations.
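To make the distinction concrete, here is a purely illustrative Python sketch (not how any particular engine is implemented internally): with index-free adjacency, each node record carries direct references to its neighbors, so a traversal step is just pointer-chasing, whereas a non-native engine consults a separate index on every hop.

from dataclasses import dataclass, field


@dataclass
class Node:
    # A node that directly references its adjacent nodes (index-free adjacency).
    name: str
    neighbours: list = field(default_factory=list)


x = Node("X")
y = Node("Y")
x.neighbours.append(y)  # the relationship is stored on the node itself

# Native-style lookup: no index needed, just follow the stored references.
print([n.name for n in x.neighbours])  # ['Y']

# Non-native-style lookup: relationships live in a separate index that must
# be consulted on every hop (an extra lookup per traversal step).
edge_index = {"X": ["Y"]}
print(edge_index.get("X", []))  # ['Y']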

What are Graph Databases good for?

A graph database is good for querying and interpreting a graph. Graph databases are applicable when there is a need for fast, iterative development to meet open-ended business requirements.

  1. Recommendations
    Recommendation engines depend on finding objects with similar properties. But as data volume and complexity grow, the underlying system architecture also becomes more complex. Graph databases can perform sophisticated relationship-based queries in real time to analyze customer behavior at the point of transaction.

  2. Highly-Connected Data (Social Networks):
    Perhaps the most common use case for a graph database is social networks, with their densely connected data and constant user activity.

  3. Anomaly Detection
    A quick analysis of data relationships is essential to uncover fraud in real time, and graph databases provide the necessary performance.

  4. Metadata-based Content Management
    Content can be annotated automatically and transformed into a knowledge base.

  5. Knowledge Graphs
    Used by search engines and businesses alike, knowledge graphs accumulate data from a wide variety of sources, allowing for better digital asset management and easier information retrieval. For example, Google aims to gather knowledge into one large graph, so when you search for a person X, you get the classic text results (put simply, a full-text query ordered by PageRank) along with additional results related to the search keyword.

  6. Systems Management
    Graph databases can be used by enterprise customers to map communications or IT networks and run complex scenario testing, helping them prepare better for outages.

  7. Identity Management
    Graph databases are dynamic by nature and can track changing roles and access authorizations more efficiently than traditional systems.

Disadvantages

Graph databases are not as helpful for operational use cases because they are not efficient at processing high volumes of transactions and they are not good at handling queries that span the entire database. They are not optimized to store and retrieve business entities such as customers or suppliers, which is why you would need to combine a graph database with a relational or NoSQL database.

A graph database is just a data store and does not give you a business-facing user interface to query or manage relationships nor will it provide survivorship functionality or data quality capabilities.

Graph databases do not create better connections; they simply retrieve connected data faster.

Popular Examples of Graph Databases

Many graph databases, tools, and frameworks have sprung up in the past few years, and a majority of them are open source. Here are a few of the popular ones:

• Titan
• ArangoDB
• Neo4j
• AnzoGraph
• AllegroGraph

Mastech InfoTrellis’ Approach to Graph Database

Mastech InfoTrellis can help create a Customer 360 view from data received from disparate sources.

  • Customer-related data links make it easy to view the graph network on the Customer 360 page: create customer context, ascertain connections between parties, and understand how they are interrelated.

  • Business users can view the connection between multiple parties.

  • Enables business users to search for their customers and find relationships to any other data attributes related to that customer (other customers, products, competitors, concepts, places, events, or anything else).

  • Hierarchies of multiple entities can be visualized in a single Customer 360 dashboard.

  • Business users can visually see all the relationships created for that customer.

  • Enables a 360° visual representation of the relationships between entities using the data. For example, marketing teams can explore relationships between objects of interest and use this information to drive targeted marketing messaging and campaigns.
    Having a graph database in the picture saves a lot of time when creating composite views (collections of connected or related source keys).
[Image: Customer 360 view]

Conclusion
Graph databases solve today’s data challenges by focusing not only on data, but also on the connections or relationships between individual database entries. They have various use cases and are available both as community-driven software products and as commercial software with enterprise-grade support.


Aradhana Pandey

Technical Consultant



Informatica Intelligent Cloud Services (IICS) is a cloud-based data integration platform that provides features such as enterprise data integration, application integration, and API management between cloud and on-premises systems.

This Informatica data integration platform delivers faster time-to-value (TTV) on your investment in a data integration tool. You don’t need to buy software licenses or servers to install the software; you pay only for the usage of the service. Faster TTV means you can start building your data integration applications as soon as you have a subscription.

One of the biggest strengths of Informatica Cloud is that it enables business users to create integrations via easy-to-use tools. This makes it easy to create integrations and get faster results.

This blog highlights a few common issues users may encounter while working with the Application Integration module of IICS and the best practices we’ve adopted to overcome those issues. These best practices will help users save time on troubleshooting, increase efficiency and insight quality, and enhance product ROI.

Issues & Best Practices

1. Salesforce Connection Object Runs on Secure Agents Other than Cloud Server
When we set the run-on environment to a secure agent, we are likely to face the following runtime error if the connection object is used inside the process:

Invoke Create_contact_object: Completed with fault: subLanguageExecutionFault (Error executing expression (hutil:getConnection(string($dataSources/source[ $counter ]))). Reason: The connection meta-data for SFDC was not found. Please try to re-publish the connection.)

The reason is that the SFDC connection object generally has hundreds of dependent objects, which may not be published completely when the run-on environment is one of your secure agents. Hence, when the process executes, it may not have the full SFDC metadata from the connection object available to add, update, or delete fields.

Solution: Set the run-on environment to the Cloud Server in the connection object and publish again.

[Screenshot: Salesforce connection object run-on environment setting]

2. Process/Sub-process Publish Issue
There may be an inconsistency between how the process and its subprocess were published. In that case, when the parent process tries to invoke the subprocess during execution, we might get the following error:

Invoke P_SUB_PROCESS_NAME: Completed with fault: AeInvokePrepareException (Failed to initialize process invoke! Process ‘urn:screenflow: process: P_MAIN_PROCESS_NAME: P_MAIN_PROCESS_NAME’ (pid: #) in tenant $public.

Solution: Publish the parent process as well whenever a changed sub-process is published. The best practice is to publish the parent process, because doing so invariably publishes all of its dependent objects.

3. Checkbox Field Not Being Populated Properly
When your target field is a checkbox type, such as an SFDC checkbox, and you pass binary values such as 1 for checked and 0 for unchecked, the target field won’t be populated.

Solution: Always map a checkbox variable to checkbox fields. Alternatively, treat the checkbox field as a Boolean type with the values true for checked and false for unchecked; a formula field can then derive the true/false value based on the required conditions, as shown in the screenshot below:

[Screenshot: Checkbox field mapped via a Boolean formula field]

4. Reading/Parsing JSON Messages as Input in a Process
To parse JSON messages, we can use Process Objects. A process object is a data structure that describes how the XML/JSON in each message is stored. It consists of a group of variables of various types that are validated against the input. The order of the variables does not matter, but each type must match the input type.

The screenshot below displays a sample process object. Inside a process, we can create input, output, or temporary variables of a process object type we have already defined (in this case, Address). Process objects generated inside a service connector are available only within that service connector. Standalone process objects are available for use in any service connector and/or process.

[Screenshot: sample process object definition (Address)]

When you use “Process object” as a type, you can directly use XQuery/XPath to easily parse the JSON/XML. For example, you can have “Address JSON” as an input.

[{"ADDRESS":{"ACTIONTYPE":"ADD","ACTIONDATE":"2018-05-29T10:19:02.800","CONSTITUENTLOOKUPID":"9995565","ADDRESSTYPE":"Home","COUNTRY":"United States","ADDRESS":"8333 Arnold St","CITY":"Dearborn Heights","STATE":"WY","ZIP":"48127-1218"}}]

Obtain the zip code using XPath as follows:

temp_Zip = Address[1]/ZIP

Note: JSON/XML messages can also be parsed without “Process objects”, but the usage of “Process objects” is advisable.
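For comparison, and purely as an illustration outside of IICS, the extraction performed by the XPath expression above could be written in plain Python against the sample payload (field names as in the example; this is not part of the product):

import json

# The sample "Address JSON" payload from above (abbreviated).
payload = '[{"ADDRESS": {"ACTIONTYPE": "ADD", "CONSTITUENTLOOKUPID": "9995565", "ADDRESSTYPE": "Home", "COUNTRY": "United States", "ADDRESS": "8333 Arnold St", "CITY": "Dearborn Heights", "STATE": "WY", "ZIP": "48127-1218"}}]'

records = json.loads(payload)

# Equivalent of the XPath expression Address[1]/ZIP:
# take the first Address record and read its ZIP field.
temp_zip = records[0]["ADDRESS"]["ZIP"]
print(temp_zip)  # 48127-1218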

5. Publish Timed Out Error
When we try to publish any object, whether a connection-related object or a process, and it takes longer than usual, the system will probably throw a publish timeout error. Even after this error message is displayed, Informatica keeps trying to publish the object in the background. We can check the publish status after 5-10 minutes; if nothing has changed, the publish activity has probably gone down a death path.

[Screenshot: Publish timed out error]

Add a URN mapping for the respective secure agent in the console as follows:
1. Go to the Application Integration console.
2. Select the “Deployed Assets” tab and the respective secure agent from the drop-down menu.
3. Add the following URN and URL pair and select the “Add” button.
a. URN: ae:agent-response-mechanism
b. URL: async

[Screenshot: URN Mappings]

After adding the URN mapping, if you still get an error message when you try to re-publish, even after a long interval, it confirms that your previous publish attempt has gone down a death path. A death path is a publish that never finishes. It can have several causes, such as:
• Inaccurate or incomplete metadata supplied while creating the connection object
• A patch-related issue that can be fixed only in consultation with Informatica support


However, these background publish tasks are not visible from the new Informatica Cloud UI. Follow the steps below to open the older UI version:

1. Log in to IICS and launch the Application Integration console.
2. Copy the URL and paste it into an adjacent browser tab.
3. Do not press Enter yet. Edit the URL to remove everything after “/activevos”, and then press Enter to load the page.
4. This should open the traditional version (old UI) of the process console for your organization.
5. Search for “Publish Process” in the Active Processes selection filter by unchecking the “Hide System” and “Hide Public” checkboxes.

[Screenshot: Active Processes selection filter]

6. You will find some “Publish” processes, such as Publish and PublishProcess, in the running state. Click “Publish Process” and verify the input variable to confirm that it is the connection whose publish you initiated.
7. Once you have confirmed this and the process has been hanging for several hours (which is not ideal), you can terminate the “Publish” processes from the “Active Processes” listing by selecting them and choosing the Terminate action from the dropdown.
8. Then, you can try to publish the object again.

6. Connection Instances to a Queue Being Dropped / Event-Driven Process Not Being Triggered on Message Publish
If the secure agent machine on which the queue connection object is published has been rebooted, or if there has been a connectivity issue, the connection instance (consumer) to a queue may be lost. As a result, the attached process is not triggered even though messages are being published to the queue.

Solution: Re-publish the connection object; once you do, you should see a connection instance (consumer) on the corresponding queue.

7. Error Handling – Suspend on Faults
Users can choose to suspend the process when there is a fault by enabling the checkbox in the process Advanced tab. In that case, when a fault occurs, the process is suspended with the status Suspended (Faulting) in the console.

The significance of using “suspend on faults” in a process is outlined below:
• Allows the process to be suspended rather than terminated when something goes wrong
• Suspended processes can be filtered in the Application Integration console
• Extensive support for fixing the process:
    o Activities within the process can be retried manually from the point of failure to resume the suspended process
    o Data in the input fields can also be set or reset before resuming the suspended process

[Screenshot: Error handling – suspend on faults]

Conclusion
The Application Integration module is used for its real-time capabilities, which are delivered through processes and guides. RabbitMQ was used as the source and Salesforce as the target in all the cases discussed above.

As a leading Data & Analytics services provider, Mastech InfoTrellis – a business unit of Mastech Digital – is helping companies around the world in their digital business transformation initiatives. If master data management, enterprise data integration, or big data and analytics are on your agenda, get in touch today for an in-depth consultation.


Rajendran V

Technical Consultant



Artificial Intelligence & Digital Business Transformation

Although Artificial Intelligence has been a buzzword for quite some time now, the positive impact it enables for businesses keeps growing with each year. Across industries, Artificial Intelligence is facilitating true digital business transformation, allowing businesses to take their operations to the next level and deliver more tailored, more sophisticated customer experiences. Below, we take a look at some of the industries where Artificial Intelligence is gaining significant traction.

 

  • Healthcare
    While still relatively nascent, AI is beginning to pick up steam in the healthcare realm. Machine learning can track and identify specific health patterns within a group of individuals. X-ray machines with smart vision can help pinpoint diseases, and natural language processing techniques can improve drug prescription accuracy and safety. Once all of this data can be integrated and understood, more individuals can be helped, at a faster rate, benefitting patients tremendously.
  • Manufacturing & Construction
    AI holds unprecedented value in the world of manufacturing and construction. From enabling safer, more efficient operations to preventing faults before they happen by way of smart analytics, the significant reduction in human error and long-term costs will be a key factor in the sustainability of businesses within these industries.
  • Retail
    The use of AI in the retail industry provides benefits for businesses and customers alike. Retailers will use AI along with augmented and virtual reality applications to extend and improve the buying experience. In fact, it is predicted that chatbots will power a majority of customer interactions in the years to come, stretching the reach of a business from the storefront to the homefront. On the backend, smart, immersive visualization will help product managers and visual merchandisers make better decisions about the way products are displayed and perceived by the public.
  • Fashion & Supply Chain
    The fashion industry, and its famed supply-chain networks, have a lot to gain from AI. With new trends arising and fading faster than designers can change seasonal line-ups, AI helps designers stay a fashionable step ahead of the market by predicting what’s in next. And when AI is integrated into the supply chain as well, the entire business can flow seamlessly from the top downstream. Optimal decisions can be made even under the most complex scenarios, ensuring demand is appropriately accounted for and supply is safely provided.
  • Education
    While education, at its core, hasn’t changed much in its goals, the way it is delivered has. Utilizing AI within education, in schools and workplaces, can help instructors identify student pain points and come up with tailored, engaging learning solutions. When the level of interaction changes, the impact of the education changes proportionally. AI brings forth a much-needed revolution in education, where learning can be equally proactive and reactive, rather than entirely one-sided, providing immense value to learners.

At Mastech Digital – a digital transformation services company – we’re focused on facilitating true digital business transformation for our clients, who represent a variety of industries – more than those listed above. With dedicated, tech-savvy teams, we take the time to understand your needs and work with you to get you the right resources to realize your business goals. Technology can be a game-changer. And Artificial Intelligence is changing the game. The question is: are you?

 

Ryan Bokor
Account Executive