What is the best index strategy or query SELECT when performing a search/lookup BETWEEN IP address (IPv4 and IPv6) ranges?


Question: Is there a better indexing strategy or SELECT query I can use to look up one large data set against another large data set? Or should I consider placing the lookup dimension table in memory (all 125 GB of it)?



Server Configuration:



  • The server is a virtual server running on VMware, so additional hardware can be added in the background without reinstalling the operating system

  • Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)
    Aug 22 2017 17:04:49
    Copyright (C) 2017 Microsoft Corporation
    Standard Edition (64-bit) on Windows Server 2016 Standard 10.0 (Build 14393: ) (Hypervisor)


  • Note: I was previously on SQL Server 2014 Enterprise Edition; I have asked why I was moved to Standard Edition.

  • There is only one instance, running two databases: mine and the DBA's

  • Two filegroups with one file each: PRIMARY (system tables; not the default) and SECONDARY (non-system tables; the default). SECONDARY was meant to scale out to more files as CPUs were added; when the filegroup was created, the server had only 2 CPUs

  • 8 GB memory

  • 500 GB disk storage (iSCSI SAN)

  • 4 CPUs (Intel, I assume)

IIS Exchange Server log table schema:



CREATE TABLE [FWY].[ExchangeServerLogTest](
    [RowKey] [int] IDENTITY(1,1) NOT NULL,
    [SourceFileName] [varchar](50) NOT NULL,
    [SourceServer] [varchar](9) NOT NULL,
    [SourceService] [varchar](6) NOT NULL,
    [EventOccuranceTs] [datetime] NOT NULL,
    [ServiceType] [varchar](50) NOT NULL,
    [UserNameType] [varchar](25) NOT NULL,
    [DomainId] [varchar](50) NULL,
    [DomainName] [varchar](255) NULL,
    [UserNameToLookup] [varchar](255) NOT NULL,
    [UserAgent] [varchar](255) NULL,
    [OutsideProtocolId] [varchar](10) NOT NULL,
    [OutsideIp] [varchar](39) NULL,
    [OutsideIpHex] [varbinary](16) NULL,
    [InsideProtocolId] [varchar](10) NOT NULL,
    [InsideIp] [varchar](39) NULL,
    [InsideIpHex] [varbinary](16) NULL,
    [DeviceId] [varchar](32) NULL,
    [DeviceType] [varchar](25) NULL,
    [DeviceModel] [varchar](75) NULL,
    [AsOfDt] [date] NULL,
    [OutsideProtocolKey] [int] NULL,
    [InsideProtocolKey] [int] NULL,
    CONSTRAINT [PK_ExchangeServerLogTest] PRIMARY KEY CLUSTERED
    (
        [RowKey] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SECONDARY]
) ON [SECONDARY]


Non-Clustered Index:



CREATE NONCLUSTERED INDEX [NCIDX_ExchangeServerLogTest_InsideOutsideProtocolKeyIpHexInclRowKey] ON [FWY].[ExchangeServerLogTest]
(
[InsideProtocolKey] ASC,
[OutsideProtocolKey] ASC,
[InsideIpHex] ASC,
[OutsideIpHex] ASC
)
INCLUDE ( [RowKey]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO


IP geolocation vendor data table schema:



CREATE TABLE [DE].[IpGeoLocation](
    [CreateTs] [datetime] NOT NULL,
    [CreateBy] [varchar](50) NOT NULL,
    [CreateSequenceKey] [int] NULL,
    [UpdateTs] [datetime] NULL,
    [UpdateBy] [varchar](50) NULL,
    [UpdateSequenceKey] [int] NULL,
    [ActiveInd] [int] NOT NULL,
    [RowKey] [int] IDENTITY(1,1) NOT NULL,
    [VendorKey] [int] NULL,
    [VendorTypeKey] [int] NULL,
    [DimensionTypeKey] [int] NULL,
    [ProtocolKey] [int] NULL,
    [ProtocolId] [varchar](10) NOT NULL,
    [EffectiveStartDate] [date] NULL,
    [EffectiveEndDate] [date] NULL,
    [NetworkStartIp] [varchar](39) NOT NULL,
    [NetworkStartIpHex] [varbinary](16) NULL,
    [NetworkEndIp] [varchar](39) NOT NULL,
    [NetworkEndIpHex] [varbinary](16) NULL,
    [Country] [varchar](255) NOT NULL,
    [Region] [varchar](255) NOT NULL,
    [City] [varchar](255) NOT NULL,
    [ConnectionSpeed] [varchar](255) NOT NULL,
    [ConnectionType] [varchar](255) NOT NULL,
    [MetroCode] [int] NOT NULL,
    [Latitude] [numeric](6, 3) NULL,
    [Longitude] [numeric](6, 3) NULL,
    [PostalCode] [varchar](255) NOT NULL,
    [PostalExtension] [varchar](255) NOT NULL,
    [CountryCode] [int] NOT NULL,
    [RegionCode] [int] NOT NULL,
    [CityCode] [int] NOT NULL,
    [ContinentCode] [int] NOT NULL,
    [TwoLetterCountry] [varchar](2) NOT NULL,
    [InternalCode] [int] NOT NULL,
    [AreaCodes] [varchar](255) NOT NULL,
    [CountryConfidenceCode] [int] NOT NULL,
    [RegionConfidenceCode] [int] NOT NULL,
    [CityConfidenceCode] [int] NOT NULL,
    [PostalConfidenceCode] [int] NOT NULL,
    [GmtOffset] [varchar](255) NOT NULL,
    [InDistance] [varchar](255) NOT NULL,
    [TimeZoneName] [varchar](255) NOT NULL,
    CONSTRAINT [PK_IpGeoLocation] PRIMARY KEY CLUSTERED
    (
        [RowKey] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SECONDARY]
) ON [SECONDARY]


Non-Clustered Index:



CREATE NONCLUSTERED INDEX [NCIDX_IpGeoLocation_ProtocolKeyNetworkStartEndIpHexIncRowKey] ON [DE].[IpGeoLocation]
(
[ProtocolKey] ASC,
[NetworkStartIpHex] ASC,
[NetworkEndIpHex] ASC
)
INCLUDE ( [RowKey]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO


IP addresses are converted to bytes with .NET's System.Net.IPAddress class: IPAddress.Parse(ipAddress).GetAddressBytes(). I load the data files with SSIS, using a script component that returns the ProtocolId and the IP address as a byte array. The byte array enters the SSIS pipeline as DT_BYTES and is mapped to a SQL Server VARBINARY(16) column (SQL Server stores the raw bytes, which are displayed as a hexadecimal value).
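As a rough analogue outside SSIS, Python's standard `ipaddress` module has a `packed` attribute that, like `GetAddressBytes()`, returns the address in network (big-endian) byte order: 4 bytes for IPv4 and 16 bytes for IPv6. This is only an illustrative sketch; the helper name `to_varbinary` is hypothetical and the sample addresses are documentation addresses, not data from the log files.

```python
import ipaddress
from typing import Tuple


def to_varbinary(ip_text: str) -> Tuple[int, bytes]:
    """Return (IP version, network-order bytes) for an IPv4 or IPv6 string.

    The bytes play the role of the VARBINARY(16) value loaded by SSIS;
    the version number stands in for the protocol key.
    """
    addr = ipaddress.ip_address(ip_text.strip())
    return addr.version, addr.packed


if __name__ == "__main__":
    for text in ("192.0.2.10", "2001:db8::1"):
        version, raw = to_varbinary(text)
        # Show the value the way SQL Server displays VARBINARY: 0x + hex digits.
        print(version, len(raw), "0x" + raw.hex().upper())
```

Note that the IPv4 value stays 4 bytes long rather than being widened to 16, which is why the two address families need to be kept apart when comparing.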



Looking up IP address ranges



I have two data sets: IIS Exchange Server log records containing IP addresses, and IP geolocation data from a third-party vendor in which each row covers a range of IP addresses. I need to take each IP address from the log file and find its geolocation. Both data sets contain IPv4 and IPv6 addresses, and the addresses arrive as strings. When I load the data, I convert each IP address to a binary value [VARBINARY(16)] so that an address can be matched against the vendor's ranges.
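Range matching on the binary column works because, for two addresses of the same family, comparing their fixed-length big-endian bytes left to right gives the same order as comparing the addresses numerically. The sketch below checks that property, using Python's byte-string comparison as a stand-in for SQL Server's VARBINARY comparison (an assumption for equal-length values). The helper `in_range` is hypothetical; it also shows why the original VARCHAR(39) text could not be used the same way, since text sorts "10..." before "9...".

```python
import ipaddress


def packed(ip_text: str) -> bytes:
    """Network-order bytes of an IPv4 or IPv6 address string."""
    return ipaddress.ip_address(ip_text).packed


def in_range(ip_text: str, start_text: str, end_text: str) -> bool:
    """BETWEEN-style containment test on the raw bytes."""
    ip, start, end = packed(ip_text), packed(start_text), packed(end_text)
    if not (len(ip) == len(start) == len(end)):
        raise ValueError("all three addresses must be the same IP version")
    return start <= ip <= end


if __name__ == "__main__":
    samples = ["9.255.255.255", "10.0.0.1", "10.0.0.255", "10.1.0.0", "192.0.2.10"]
    by_bytes = sorted(samples, key=packed)
    by_int = sorted(samples, key=lambda s: int(ipaddress.ip_address(s)))
    print(by_bytes == by_int)      # byte order matches numeric order
    print(sorted(samples) == by_bytes)  # plain text order does not
    print(in_range("10.0.0.255", "10.0.0.0", "10.0.0.255"))
```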



The problem is the volume of data. The vendor currently provides close to 200 million IP address geolocation ranges (i.e., the dimension lookup table). I knew from the start that performance optimization would be required at every stage (i.e., hardware configuration, table partitioning, and indexing strategy). I have loaded one week's worth of sample log data, which is approximately 150 million records.



Note: Parsing already discards approximately 90% of the log records and only the remaining 10% are loaded, so there is no further performance gain to be made at that stage.



I have created the following indexes on the Exchange log table (FWY.ExchangeServerLogTest):



  1. A clustered index on an integer IDENTITY column called RowKey

  2. A non-clustered index on InsideProtocolKey, OutsideProtocolKey, InsideIpHex, and OutsideIpHex (the protocol keys represent IPv4 or IPv6 as integers), with RowKey included
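The key order of this index matters for the second (OutsideIp) query. A B-tree index is sorted by its leading column first, so the rows for one OutsideProtocolKey value are spread across every InsideProtocolKey group instead of being stored together. The sketch below models the index as a sorted list of key tuples and counts how many separate stretches a filter on each column would have to visit; the protocol key values 1 and 2 and the addresses are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# Key columns of the log-table index, in index order:
# (InsideProtocolKey, OutsideProtocolKey, InsideIpHex, OutsideIpHex)
IndexRow = Tuple[int, int, bytes, bytes]


def seek_ranges_needed(index_rows: List[IndexRow], column: int, value: int) -> int:
    """Count the separate contiguous stretches of the sorted index whose
    `column` equals `value`. One stretch can be read with a single seek;
    more stretches mean repeated seeks or a scan."""
    ordered = sorted(index_rows)  # B-tree order: leading column first
    flags = [row[column] == value for row in ordered]
    return sum(1 for matched, _ in groupby(flags) if matched)


if __name__ == "__main__":
    # Hypothetical protocol keys: 1 = IPv4, 2 = IPv6.
    v4 = bytes([10, 0, 0, 1])
    v6 = bytes(15) + b"\x01"
    rows = [(1, 1, v4, v4), (1, 2, v4, v6), (2, 1, v6, v4), (2, 2, v6, v6)]
    print(seek_ranges_needed(rows, 0, 1))  # filter on InsideProtocolKey
    print(seek_ranges_needed(rows, 1, 1))  # filter on OutsideProtocolKey
```

With only two protocol values the scattering is small, but the same layout means a lookup driven by the outside columns cannot use this index for a single contiguous seek.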

I have created the following indexes on the IP geolocation table (DE.IpGeoLocation):



  1. A clustered index on an integer IDENTITY column called RowKey

  2. A non-clustered index on ProtocolKey (IPv4 or IPv6 represented as an integer), NetworkStartIpHex, and NetworkEndIpHex, with RowKey included

When looking up the IP geolocation, I join the two data sets as follows:



SELECT COUNT(DISTINCT DE.RowKey)
FROM DE.IpGeoLocation DE
INNER JOIN FWY.ExchangeServerLogTest T
ON T.InsideProtocolKey = DE.ProtocolKey
AND T.InsideIpHex BETWEEN DE.NetworkStartIpHex AND DE.NetworkEndIpHex


Estimated Query Execution Plan: Estimated InsideIp Query Execution Plan



Actual Query Execution Plan: Waiting for query to complete



SELECT COUNT(DISTINCT DE.RowKey)
FROM DE.IpGeoLocation DE
INNER JOIN FWY.ExchangeServerLogTest T
ON T.OutsideProtocolKey = DE.ProtocolKey
AND T.OutsideIpHex BETWEEN DE.NetworkStartIpHex AND DE.NetworkEndIpHex


Estimated Execution Plan: Estimated OutsideIp Query Execution Plan



Actual Query Execution Plan: DOES NOT FINISH



Note 2: The protocol key must be part of the join condition; otherwise each IP lookup returns two results, one from an IPv4 range and one from an IPv6 range.
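One way to see why the double match happens: when a 4-byte IPv4 value is compared with 16-byte IPv6 bounds, the comparison is decided by the leading bytes, so an IPv4 address can land inside an unrelated IPv6 range. The sketch uses Python's byte-string ordering as a stand-in for SQL Server's VARBINARY comparison (an assumption; both compare left to right, and the example is chosen so that the outcome is decided within the first four bytes). The IPv6 network `a00::/16` is purely hypothetical.

```python
import ipaddress

# A hypothetical IPv6 range from the vendor file: a00:: through a00:ffff:...:ffff.
v6_range = ipaddress.ip_network("a00::/16")
start = v6_range.network_address.packed    # 16 bytes: 0A 00 00 ... 00
end = v6_range.broadcast_address.packed    # 16 bytes: 0A 00 FF ... FF

# An IPv4 address from the log file, stored as 4 bytes: 0A 00 00 01.
v4 = ipaddress.ip_address("10.0.0.1").packed

# Without a protocol filter, a BETWEEN-style byte comparison places the
# IPv4 value inside this IPv6 range as well as inside its real IPv4 range.
print(start <= v4 <= end)
```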



This looks like a very efficient execution plan: 95% of the estimated cost is an index seek and another 2% is an index scan, so 97% is attributed to index work.
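An index seek is not necessarily cheap for this kind of join. Assuming the seek on the geolocation index uses the equality on ProtocolKey plus the range `NetworkStartIpHex <= ip` (the second key column), the condition `NetworkEndIpHex >= ip` can only be checked row by row as a residual predicate, so each probe reads every range that starts at or below the address. The sketch below models one protocol's slice of that index as a sorted list (integers stand in for the VARBINARY values; all ranges and helper names are hypothetical) and counts rows read per probe. For contrast it also models a commonly used alternative that is not part of the original queries: if the vendor ranges never overlap (an assumption to verify against the data), only the single row with the greatest start at or below the IP can match, which in T-SQL is typically written as a `CROSS APPLY` with `TOP (1) ... ORDER BY NetworkStartIpHex DESC`.

```python
import bisect
from typing import List, Optional, Tuple

# (start, end, row_key) for one protocol; integers stand in for VARBINARY(16).
Range = Tuple[int, int, int]


def build_index(ranges: List[Range]) -> Tuple[List[Range], List[int]]:
    """Sort ranges by start, as the (ProtocolKey, Start, End) index does."""
    ordered = sorted(ranges)
    return ordered, [r[0] for r in ordered]


def probe_between(ordered: List[Range], starts: List[int], ip: int) -> Tuple[List[int], int]:
    """BETWEEN model: read every row with start <= ip, then filter on end.
    Returns (matching row keys, rows read)."""
    upper = bisect.bisect_right(starts, ip)  # rows with start <= ip
    matches = [key for _, end, key in ordered[:upper] if end >= ip]
    return matches, upper


def probe_floor(ordered: List[Range], starts: List[int], ip: int) -> Tuple[Optional[int], int]:
    """TOP (1) ... ORDER BY start DESC model: read only the last row with
    start <= ip and check its end. Valid only for non-overlapping ranges."""
    pos = bisect.bisect_right(starts, ip) - 1
    if pos < 0:
        return None, 0
    _, end, key = ordered[pos]
    return (key if end >= ip else None), 1


if __name__ == "__main__":
    # 1,000 adjacent, non-overlapping ranges of width 100: [0, 99], [100, 199], ...
    ranges = [(i * 100, i * 100 + 99, i + 1) for i in range(1000)]
    ordered, starts = build_index(ranges)
    print(probe_between(ordered, starts, 75_050))  # ([751], 751)
    print(probe_floor(ordered, starts, 75_050))    # (751, 1)
```

Both probes find the same range, but the BETWEEN model reads 751 index rows to do it while the floor model reads one; across millions of log rows and 200 million ranges, that per-probe difference is what dominates.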



Each log row contains both an internal and an external IP address. For the sample data loaded:



  1. The internal IP list contains 3 distinct IP addresses.

  2. The external IP list contains approximately 60,000 distinct IP addresses.
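These distinct counts are small compared with roughly 150 million loaded rows, and the cost of a range lookup is driven by how many probes are made. A technique not used in the original queries, sketched here under the assumption that the geolocation result depends only on the (protocol, IP) pair: resolve each distinct pair once, then reuse the result for every log row with that pair. `lookup` is a hypothetical callable standing in for whatever range search is used.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (protocol key, packed IP address) identifies one lookup.
LookupKey = Tuple[int, bytes]


def geolocate_distinct(
    log_keys: Iterable[LookupKey],
    lookup: Callable[[LookupKey], Optional[int]],
) -> Tuple[List[Optional[int]], int]:
    """Run `lookup` once per distinct key and reuse the result for every
    log row with that key. Returns (per-row results, number of lookups)."""
    cache: Dict[LookupKey, Optional[int]] = {}
    results: List[Optional[int]] = []
    for key in log_keys:
        if key not in cache:
            cache[key] = lookup(key)
        results.append(cache[key])
    return results, len(cache)


if __name__ == "__main__":
    # Hypothetical lookup: pretend the geolocation RowKey is the last byte.
    def fake_lookup(key: LookupKey) -> Optional[int]:
        return key[1][-1]

    a = (1, bytes([10, 0, 0, 1]))
    b = (1, bytes([10, 0, 0, 2]))
    rows = [a, b, a] * 1000  # 3,000 log rows, 2 distinct addresses
    results, lookups = geolocate_distinct(rows, fake_lookup)
    print(len(results), lookups)  # 3000 2
```

In T-SQL the same idea is usually expressed by resolving `SELECT DISTINCT OutsideProtocolKey, OutsideIpHex` first and joining the result back to the log table.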

Results:



  1. A SELECT on the internal IP list takes about 9 minutes to complete.

  2. A SELECT on the external IP list was cancelled after running for 16.25 hours (overnight).

I have not partitioned either the log table or the IP geolocation table. Partitioning might improve performance by streaming data through two separate LUNs, but I am still waiting for a hardware configuration specification from our IT Ops group (they just provisioned new servers, so I don't have that information yet).










  • Besides that, why is there a ProtocolKey, an InsideProtocolKey and an OutsideProtocolKey in the first table and which of the 3 columns is used in the query?

    – ypercubeᵀᴹ
    7 hours ago












  • @ypercubeᵀᴹ The IIS Exchange Log files contain the internal and external IP address on the same row; where the internal and external can be any combination of IP Protocol (i.e., IPv4/IPv4, IPv4/IPv6, or IPv6/IPv4). The data vendor provides IP address ranges where each row only contains one IP protocol type (i.e., IPv4 or IPv6).

    – J Weezy
    7 hours ago











  • @ypercubeᵀᴹ That is a good idea. Though, I have a foreign key constraint to a Protocol table that stores all protocols that we are loading data for, which is an Int field. Conceivably, I could just use a bit field as I don't think IPv6 will be replaced anytime soon. But, then I would lose the foreign key constraint.

    – J Weezy
    7 hours ago












Question: Is there a better indexing strategy or query SELECT that I can use for looking up one large data set against another large data set? Or, should I look at placing the lookup dimension table in memory (all 125 GB of it)?



Server Configuration:



  • The server is a virtual server running on top of VMWare, so additional hardware can be added in the background without having to reinstall the operating system

  • Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)
    Aug 22 2017 17:04:49
    Copyright (C) 2017 Microsoft Corporation
    Standard Edition (64-bit) on Windows Server 2016 Standard 10.0 (Build 14393: ) (Hypervisor)


  • Note: I was previously on SQL Server 2014 Enterprise; I have asked why I was placed on Standard.

  • There is only one instance, which runs 2 databases: mine and the DBAs'.

  • 2 filegroups, with 1 file each: PRIMARY (system tables; not the default) and SECONDARY (non-system tables; the default). SECONDARY was meant to scale out to more files as more CPUs were added; when the filegroup was initially created, the server had only 2 CPUs.

  • 8 GB memory

  • 500 GB disk storage (iSCSI SAN)

  • 4 CPUs (Intel, I assume)

IIS Exchange Server log table schema:



CREATE TABLE [FWY].[ExchangeServerLogTest](
[RowKey] [int] IDENTITY(1,1) NOT NULL,
[SourceFileName] [varchar](50) NOT NULL,
[SourceServer] [varchar](9) NOT NULL,
[SourceService] [varchar](6) NOT NULL,
[EventOccuranceTs] [datetime] NOT NULL,
[ServiceType] [varchar](50) NOT NULL,
[UserNameType] [varchar](25) NOT NULL,
[DomainId] [varchar](50) NULL,
[DomainName] [varchar](255) NULL,
[UserNameToLookup] [varchar](255) NOT NULL,
[UserAgent] [varchar](255) NULL,
[OutsideProtocolId] [varchar](10) NOT NULL,
[OutsideIp] [varchar](39) NULL,
[OutsideIpHex] [varbinary](16) NULL,
[InsideProtocolId] [varchar](10) NOT NULL,
[InsideIp] [varchar](39) NULL,
[InsideIpHex] [varbinary](16) NULL,
[DeviceId] [varchar](32) NULL,
[DeviceType] [varchar](25) NULL,
[DeviceModel] [varchar](75) NULL,
[AsOfDt] [date] NULL,
[OutsideProtocolKey] [int] NULL,
[InsideProtocolKey] [int] NULL,
CONSTRAINT [PK_ExchangeServerLogTest] PRIMARY KEY CLUSTERED
(
[RowKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SECONDARY]
) ON [SECONDARY]


Non-Clustered Index:



CREATE NONCLUSTERED INDEX [NCIDX_ExchangeServerLogTest_InsideOutsideProtocolKeyIpHexInclRowKey] ON [FWY].[ExchangeServerLogTest]
(
[InsideProtocolKey] ASC,
[OutsideProtocolKey] ASC,
[InsideIpHex] ASC,
[OutsideIpHex] ASC
)
INCLUDE ( [RowKey]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO


IP GeoLocation data vendor table schema:



CREATE TABLE [DE].[IpGeoLocation](
[CreateTs] [datetime] NOT NULL,
[CreateBy] [varchar](50) NOT NULL,
[CreateSequenceKey] [int] NULL,
[UpdateTs] [datetime] NULL,
[UpdateBy] [varchar](50) NULL,
[UpdateSequenceKey] [int] NULL,
[ActiveInd] [int] NOT NULL,
[RowKey] [int] IDENTITY(1,1) NOT NULL,
[VendorKey] [int] NULL,
[VendorTypeKey] [int] NULL,
[DimensionTypeKey] [int] NULL,
[ProtocolKey] [int] NULL,
[ProtocolId] [varchar](10) NOT NULL,
[EffectiveStartDate] [date] NULL,
[EffectiveEndDate] [date] NULL,
[NetworkStartIp] [varchar](39) NOT NULL,
[NetworkStartIpHex] [varbinary](16) NULL,
[NetworkEndIp] [varchar](39) NOT NULL,
[NetworkEndIpHex] [varbinary](16) NULL,
[Country] [varchar](255) NOT NULL,
[Region] [varchar](255) NOT NULL,
[City] [varchar](255) NOT NULL,
[ConnectionSpeed] [varchar](255) NOT NULL,
[ConnectionType] [varchar](255) NOT NULL,
[MetroCode] [int] NOT NULL,
[Latitude] [numeric](6, 3) NULL,
[Longitude] [numeric](6, 3) NULL,
[PostalCode] [varchar](255) NOT NULL,
[PostalExtension] [varchar](255) NOT NULL,
[CountryCode] [int] NOT NULL,
[RegionCode] [int] NOT NULL,
[CityCode] [int] NOT NULL,
[ContinentCode] [int] NOT NULL,
[TwoLetterCountry] [varchar](2) NOT NULL,
[InternalCode] [int] NOT NULL,
[AreaCodes] [varchar](255) NOT NULL,
[CountryConfidenceCode] [int] NOT NULL,
[RegionConfidenceCode] [int] NOT NULL,
[CityConfidenceCode] [int] NOT NULL,
[PostalConfidenceCode] [int] NOT NULL,
[GmtOffset] [varchar](255) NOT NULL,
[InDistance] [varchar](255) NOT NULL,
[TimeZoneName] [varchar](255) NOT NULL,
CONSTRAINT [PK_IpGeoLocation] PRIMARY KEY CLUSTERED
(
[RowKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SECONDARY]
) ON [SECONDARY]


Non-Clustered Index:



CREATE NONCLUSTERED INDEX [NCIDX_IpGeoLocation_ProtocolKeyNetworkStartEndIpHexIncRowKey] ON [DE].[IpGeoLocation]
(
[ProtocolKey] ASC,
[NetworkStartIpHex] ASC,
[NetworkEndIpHex] ASC
)
INCLUDE ( [RowKey]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO


IP addresses are converted to bytes using the .NET System.Net.IPAddress class: IPAddress.Parse(IpAddress).GetAddressBytes(). I load the data files with SSIS, and a script component returns the ProtocolId and the IP address as a byte array, which goes into SSIS as DT_BYTES and is mapped to a SQL Server VARBINARY(16) column (the byte array is stored as binary, which SQL Server displays as a hexadecimal value).
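To spot-check values loaded through SSIS, the T-SQL sketch below builds the same 4-byte, network-order value that `GetAddressBytes()` returns for an IPv4 address. It assumes a well-formed dotted-quad string and covers IPv4 only; IPv6 parsing is left to the .NET script component. The variable `@ip` is purely illustrative.

```sql
-- Sketch: IPv4 dotted-quad string -> BINARY(4) in network (big-endian) byte order.
-- Assumes @ip is a valid IPv4 address; PARSENAME numbers the parts right to left.
DECLARE @ip varchar(15) = '192.168.1.10';

SELECT CAST(CAST(PARSENAME(@ip, 4) AS tinyint) AS binary(1))
     + CAST(CAST(PARSENAME(@ip, 3) AS tinyint) AS binary(1))
     + CAST(CAST(PARSENAME(@ip, 2) AS tinyint) AS binary(1))
     + CAST(CAST(PARSENAME(@ip, 1) AS tinyint) AS binary(1)) AS IpHex;  -- 0xC0A8010A
```

Because the bytes are in network order and all values for one protocol have the same length (4 bytes for IPv4, 16 for IPv6), comparing them byte by byte orders them the same way as the numeric addresses. That is why `BETWEEN` on these VARBINARY(16) values works within a single protocol key.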



Lookup IP Address range



I have two data sets: IIS Exchange Server IP log records, and IP geolocation data provided by a third-party vendor, where each geolocation row covers a range of IP addresses. I need to look up the IP address from the log file and get its geolocation. Both data sets contain IPv4 and IPv6 addresses, and the IP address is received in string format. When I load the data, I convert the IP address into a hexadecimal value [VARBINARY(16)] so that I can look up an IP address's geolocation.



The problem is that I am loading a large number of records. The vendor currently provides close to 200 million IP address geolocations (i.e., the dimension lookup table). I knew from the outset that performance optimization would be required at all stages (i.e., hardware configuration, table partitioning, and indexing strategy). I have loaded one week's worth of sample log data, which is approximately 150 million records.



Note: When the log files are parsed, approximately 90% of the records are ignored; we load only 10% of them, so no further performance gain can be made at this stage.



I have created the following indexes on the ExchangeServerLogTest table:



  1. A clustered index on an integer IDENTITY column called RowKey

  2. A non-clustered index on InsideProtocolKey, OutsideProtocolKey, InsideIpHex, and OutsideIpHex (the protocol keys are integers representing IPv4 or IPv6), with RowKey included

I have created the following indexes on the IpGeoLocation table:



  1. A clustered index on an integer IDENTITY column called RowKey

  2. A non-clustered index on ProtocolKey (an integer representing IPv4 or IPv6), NetworkStartIpHex, and NetworkEndIpHex, with RowKey included

When searching for the IP Geolocation, I join the two datasets as follows:



SELECT COUNT(DISTINCT DE.RowKey)
FROM DE.IpGeoLocation DE
INNER JOIN FWY.ExchangeServerLogTest T
ON T.InsideProtocolKey = DE.ProtocolKey
AND T.InsideIpHex BETWEEN DE.NetworkStartIpHex AND DE.NetworkEndIpHex


Estimated Query Execution Plan: Estimated InsideIp Query Execution Plan



Actual Query Execution Plan: Waiting for query to complete



SELECT COUNT(DISTINCT DE.RowKey)
FROM DE.IpGeoLocation DE
INNER JOIN FWY.ExchangeServerLogTest T
ON T.OutsideProtocolKey = DE.ProtocolKey
AND T.OutsideIpHex BETWEEN DE.NetworkStartIpHex AND DE.NetworkEndIpHex


Estimated Execution Plan: Estimated OutsideIp Query Execution Plan



Actual Query Execution Plan: DOES NOT FINISH



Note 2: The protocol key must be included in the join; otherwise, there are two results for each IP lookup: one for IPv4 and one for IPv6.



This seems like a very efficient execution plan, considering that 95% of the cost is on an index seek and another 2% on an index scan; 97% is attributed to index work.



The log files contain both an internal and an external IP address on each row. For the sample data loaded:



  1. The internal IP list contains 3 DISTINCT IP addresses.

  2. The external IP list contains approximately 60,000 DISTINCT IP addresses.

Results:



  1. A SELECT on the internal IP list takes about 9 minutes to complete.

  2. A SELECT on the external IP list was stopped after allowing it to run for 16.25 hours (overnight).
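The distinct counts above (3 and roughly 60,000 IPs against 150 million log rows) suggest deduplicating before the lookup. With the existing (ProtocolKey, NetworkStartIpHex, NetworkEndIpHex) index, the BETWEEN join can seek on ProtocolKey and the start column, but the end-of-range test is then applied as a residual to every range that starts at or before the IP, potentially a large fraction of the table per lookup. The T-SQL below is a hedged sketch of a commonly used alternative for range lookups; it is not proposed in the post or the answer. It assumes that ranges within one ProtocolKey do not overlap. The first query checks that assumption; since the table has ActiveInd and effective-date columns, historical versions could create overlaps, in which case you would likely filter to the current rows first. Verify on a small sample that it returns the same count as the original query before relying on it.

```sql
-- Step 1 (sketch): if this returns any rows, some ranges overlap
-- and the TOP (1) lookup in step 2 may miss matches.
SELECT r.ProtocolKey, COUNT(*) AS OverlapCount
FROM (
    SELECT ProtocolKey,
           NetworkStartIpHex,
           LAG(NetworkEndIpHex) OVER (PARTITION BY ProtocolKey
                                      ORDER BY NetworkStartIpHex) AS PrevEndIpHex
    FROM DE.IpGeoLocation
) AS r
WHERE r.NetworkStartIpHex <= r.PrevEndIpHex
GROUP BY r.ProtocolKey;

-- Step 2 (sketch): look up each distinct outside IP once.
WITH DistinctIp AS
(
    SELECT DISTINCT OutsideProtocolKey AS ProtocolKey,
                    OutsideIpHex       AS IpHex
    FROM FWY.ExchangeServerLogTest
    WHERE OutsideIpHex IS NOT NULL
)
SELECT COUNT(DISTINCT G.RowKey)
FROM DistinctIp AS D
CROSS APPLY
(
    -- The range with the highest start at or before the IP:
    -- a backward seek on the existing ProtocolKey/NetworkStartIpHex index.
    SELECT TOP (1) DE.RowKey, DE.NetworkEndIpHex
    FROM DE.IpGeoLocation AS DE
    WHERE DE.ProtocolKey = D.ProtocolKey
      AND DE.NetworkStartIpHex <= D.IpHex
    ORDER BY DE.NetworkStartIpHex DESC
) AS G
WHERE G.NetworkEndIpHex >= D.IpHex;  -- discard IPs that fall in a gap between ranges
```

The same shape applies to the inside IPs by swapping in InsideProtocolKey and InsideIpHex.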

I have not partitioned either the log table or the IP GeoLocation table. This might provide a performance boost by streaming data through two separate LUNs, but I am still trying to get a hardware configuration specification from our IT Ops group (they just provisioned new servers, so I don't have that info yet).







sql-server performance query-performance index-tuning configuration














edited 7 hours ago







J Weezy

















asked 9 hours ago









J Weezy

1507












  • Besides that, why is there a ProtocolKey, an InsideProtocolKey and an OutsideProtocolKey in the first table and which of the 3 columns is used in the query?

    – ypercubeᵀᴹ
    7 hours ago












  • @ypercubeᵀᴹ The IIS Exchange Log files contain the internal and external IP address on the same row; where the internal and external can be any combination of IP Protocol (i.e., IPv4/IPv4, IPv4/IPv6, or IPv6/IPv4). The data vendor provides IP address ranges where each row only contains one IP protocol type (i.e., IPv4 or IPv6).

    – J Weezy
    7 hours ago











  • @ypercubeᵀᴹ That is a good idea. Though, I have a foreign key constraint to a Protocol table that stores all protocols that we are loading data for, which is an Int field. Conceivably, I could just use a bit field as I don't think IPv6 will be replaced anytime soon. But, then I would lose the foreign key constraint.

    – J Weezy
    7 hours ago












  • Let us continue this discussion in chat.

    – ypercubeᵀᴹ
    7 hours ago



























1 Answer















  • First, I suggest you add two separate indexes, on



    (InsideProtocolKey, InsideIpHex) INCLUDE (RowKey)

    (OutsideProtocolKey, OutsideIpHex) INCLUDE (RowKey)


    and try the queries again. Your 4-column index is not good for the "Outside" query, as those columns appear in the 2nd and 4th positions, and it is only slightly better for the "Inside" query (1st and 3rd positions). Also, each of these 2 indexes will be half the size (20 bytes vs 40 bytes per row).




  • Second, a minor improvement. Since you only have two options for the ProtocolKey column (and its Inside/Outside variations), you could convert all of them from int (4 bytes) to tinyint (1 byte) or even to bit (1 bit) and save 3 bytes per row (or 3 + 7/8).



    It won't be a huge saving, but for big tables it would help. Even for the smaller of your tables, 200M rows x 3 bytes = 600 MB saved for every index in which the columns appear. I'm not entirely sure how bit columns are stored in indexes, but the saving would be either the same as with tinyint (600 MB) or more (up to 775 MB) for the same table size. And again, that is for every index that uses the column.



    Smaller indexes mean less space on disk and, more importantly, less memory, so they are more likely to stay in memory, especially on a server with as little RAM as yours.



  • Third, 8 GB sounds like a very small amount of RAM these days, especially with tables of this size. RAM is cheap (at least until you pass the 128 GB limit of Standard Edition, after which you face the bigger Enterprise licence charge).
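The two indexes from the first point could be created along these lines. The index names are hypothetical. INCLUDE (RowKey) mirrors the suggestion, although RowKey, as the clustering key, is carried in every nonclustered index anyway. The last query is one way to compare the new indexes' size against the existing 4-column index once both exist.

```sql
-- Sketch of the two narrower indexes suggested above (names are hypothetical).
CREATE NONCLUSTERED INDEX NCIDX_ExchangeServerLogTest_InsideProtocolKeyIpHex
    ON FWY.ExchangeServerLogTest (InsideProtocolKey, InsideIpHex)
    INCLUDE (RowKey);

CREATE NONCLUSTERED INDEX NCIDX_ExchangeServerLogTest_OutsideProtocolKeyIpHex
    ON FWY.ExchangeServerLogTest (OutsideProtocolKey, OutsideIpHex)
    INCLUDE (RowKey);

-- Size of every index on the log table (8 KB pages converted to MB).
SELECT i.name AS IndexName,
       SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'FWY.ExchangeServerLogTest')
GROUP BY i.name
ORDER BY UsedMB DESC;
```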














        edited 6 hours ago









        Erik Darling

        21.5k1267108














        answered 7 hours ago









ypercubeᵀᴹ

77.4k11134216


























